Takeaways

So you've seen the clusters. Now what?

The shift in lens

Most podcast analytics ask what people listened to. Cohort asks which apps the downloads came through, then groups shows whose downloads are distributed similarly across the 142-column app dimension.

Against Podcast Index topic categories, the clusters scored an adjusted Rand index (ARI) of 0.016. Effectively random. Against popularity bands and app mix, ARI 0.215. Real signal. Two shows can be very different topically and still get downloaded through the same app mix. Two shows in the same genre can have very different app distributions.
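
For readers who want to see what an ARI comparison involves, here is a minimal sketch using only the standard library. It is the textbook adjusted Rand index formula, not necessarily the code behind the numbers above. The function name and the toy labels are made up for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items.

    1.0 means the two groupings agree exactly (up to renaming the labels);
    values near 0.0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency counts: how many items share each (label_a, label_b) pair.
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_pairs - expected) / (max_index - expected)


# Hypothetical toy data: six shows, their row cluster, and a topic label.
row_clusters = [0, 0, 1, 1, 2, 2]
topics = ["tech", "news", "tech", "news", "tech", "news"]
popularity_bands = ["low", "low", "mid", "mid", "high", "high"]

print(adjusted_rand_index(row_clusters, topics))
print(adjusted_rand_index(row_clusters, popularity_bands))
```

In this toy data the clusters line up with the popularity bands and cut across the topics, which is the same shape of result as the 0.016 versus 0.215 comparison, only exaggerated.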

One important distinction. Similar app distribution is not the same as shared listeners. We measure aggregate downloads per (show, app), not individual listener events. Two shows in the same row cluster reach audiences that use a similar mix of apps. Whether the actual humans overlap is a plausible inference, not an observed measurement.

Key Result
What groups shows together in this data is the app mix their downloads flow through, not what the shows are about. App is a more useful organizing axis than topic.

Three app clusters carry most of the listening

89% of observed downloads flow through three of the sixteen app clusters. They all look like "mainstream platforms" from the outside. The algorithm separated them because the show categories distributed through them differ enough to register as distinct groups.

Walled-garden apps
Apple Podcasts, Spotify, CastBox. The platform-default surface. Audiences who listen through the app that came with their phone or their music service. Category mix is close to the global average.
Dedicated podcast apps
Overcast, Pocket Casts, Podcast Addict, AntennaPod. Audiences who actively chose a podcast app. Over-indexes ~4× on tech and entrepreneurship, ~3× on news. Under-indexes on true crime and education.
Smart-speaker apps
Amazon Music, Alexa devices, iHeartRadio, Audible. Often ambient or passive listening. Over-indexes on health and society topics.
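
The over- and under-indexing figures above compare a category's share inside an app cluster with its share overall. A minimal sketch of that ratio follows, assuming lift is defined as (category share within the cluster) divided by (category share across all downloads). The exact definition used for the heatmap may differ, and the function name and counts below are invented for illustration.

```python
def category_lift(downloads):
    """Compute lift per (app_cluster, category).

    downloads maps (app_cluster, category) -> download count.
    Lift > 1 means the category over-indexes in that cluster,
    lift < 1 means it under-indexes.
    """
    total = sum(downloads.values())
    cluster_totals = {}
    category_totals = {}
    for (cluster, category), count in downloads.items():
        cluster_totals[cluster] = cluster_totals.get(cluster, 0) + count
        category_totals[category] = category_totals.get(category, 0) + count

    lift = {}
    for (cluster, category), count in downloads.items():
        # Skip cells where a share is undefined (empty cluster or category).
        if cluster_totals[cluster] == 0 or category_totals[category] == 0:
            continue
        share_in_cluster = count / cluster_totals[cluster]
        global_share = category_totals[category] / total
        lift[(cluster, category)] = share_in_cluster / global_share
    return lift


# Made-up counts, chosen only so the arithmetic is easy to follow.
toy_downloads = {
    ("dedicated", "tech"): 40,
    ("dedicated", "true crime"): 10,
    ("walled-garden", "tech"): 60,
    ("walled-garden", "true crime"): 390,
}

for cell, value in sorted(category_lift(toy_downloads).items()):
    print(cell, round(value, 2))
```

Here tech is 80% of the "dedicated" downloads but only 20% of all downloads, so its lift in that cluster is about 4. That is how a "~4×" figure should be read.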

For podcasters

If you run a podcast, the cluster picture is useful for a few things.

Which cluster your show lives in

Look at your show's download distribution across apps. The dominant apps place you in one of the clusters above. That cluster predicts how your downloads distribute more reliably than your show's topic does.
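
A rough way to do that lookup yourself is sketched below. The app lists are copied from the three clusters described earlier. Treating cluster membership as a simple app lookup is an assumption for illustration, since the real assignment comes from the clustering itself. The function names and download counts are hypothetical.

```python
# App groupings taken from the three clusters described in this article.
# A plain lookup table is an illustrative shortcut, not the cohort algorithm.
APP_CLUSTERS = {
    "walled-garden": {"Apple Podcasts", "Spotify", "CastBox"},
    "dedicated": {"Overcast", "Pocket Casts", "Podcast Addict", "AntennaPod"},
    "smart-speaker": {"Amazon Music", "Alexa devices", "iHeartRadio", "Audible"},
}


def cluster_shares(downloads_by_app):
    """Return the share of a show's downloads falling in each app cluster."""
    total = sum(downloads_by_app.values())
    if total == 0:
        raise ValueError("show has no downloads")
    shares = {name: 0.0 for name in APP_CLUSTERS}
    shares["other"] = 0.0  # apps outside the three big clusters
    for app, count in downloads_by_app.items():
        for name, apps in APP_CLUSTERS.items():
            if app in apps:
                shares[name] += count / total
                break
        else:
            shares["other"] += count / total
    return shares


def dominant_cluster(downloads_by_app):
    """Name of the cluster carrying the largest share of downloads."""
    shares = cluster_shares(downloads_by_app)
    return max(shares, key=shares.get)


# Hypothetical per-app download counts for one show.
my_show = {
    "Overcast": 500,
    "Pocket Casts": 300,
    "Apple Podcasts": 150,
    "Some Other App": 50,
}

print(cluster_shares(my_show))
print(dominant_cluster(my_show))
```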

Concretely, a tech podcast and a true-crime podcast at the same download scale will probably share more cohort neighbors than two tech podcasts at very different scales. Scale and app mix dominate.

Reading the lift signal

Within each app cluster, certain show categories over-perform or under-perform relative to the global average. The lift heatmap shows the cluster-by-category picture. The practical signal:

Key Result
A marketing budget allocated where your category over-indexes typically returns more than one spread evenly across platforms. The lift signal is the cheapest budget-allocation tool you can read from cohort.

Cross-promotion candidates

Cohort row clusters group shows whose downloads distribute through similar app mixes at similar scale. Your cluster mates reach audiences who use the same set of apps. Cross-promoting through those shows puts you in front of listeners who are already in the right app environment to find you.

The matrix can't tell us whether those listeners overlap directly with your existing audience, only that the platform reach overlaps. That's still useful. Many cross-promo programs are sold on platform-reach overlap, not literal listener overlap.
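
If you only have per-app download counts and no cluster labels, a simple stand-in is to rank other shows by how similar their app mix is to yours. The sketch below uses cosine similarity on app-share vectors. That is a swapped-in technique for illustration, not the cohort clustering, and it ignores scale, so you would still want to filter to shows of similar size. All names and counts are hypothetical.

```python
from math import sqrt


def app_shares(downloads_by_app):
    """Normalize per-app download counts into shares that sum to 1."""
    total = sum(downloads_by_app.values())
    if total == 0:
        return {}
    return {app: count / total for app, count in downloads_by_app.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse share vectors (dicts)."""
    dot = sum(value * b.get(app, 0.0) for app, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_neighbors(target, shows):
    """Rank every other show by app-mix similarity to the target show."""
    target_shares = app_shares(shows[target])
    scores = [
        (name, cosine_similarity(target_shares, app_shares(counts)))
        for name, counts in shows.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up shows: show_b has the same app mix as show_a at twice the scale.
toy_shows = {
    "show_a": {"Overcast": 80, "Spotify": 20},
    "show_b": {"Overcast": 160, "Spotify": 40},
    "show_c": {"Spotify": 90, "Apple Podcasts": 10},
}

for name, score in rank_neighbors("show_a", toy_shows):
    print(name, round(score, 3))
```

A high score says the two shows are reached through the same apps. It says nothing about whether the same people listen to both.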

Key Result
For platform-reach overlap, cohort's row-cluster neighbors beat topic-similar shows from a category browser. For "do these shows share listeners," you'd need per-listener data we don't have.

Beyond individual shows

Things that show up in the data that matter for platforms and tooling.

What the data isn't

The aggregate-downloads-per-(show, app) format imposes real limits on what cohort can support.

Not topic similarity
Row clusters don't model show content. For "shows similar in subject matter," use a different model. Transcript embeddings or category metadata work better for that question.
Not listener overlap
OP3 exposes aggregate counts, not per-listener events. Two shows in the same cluster reach audiences with similar app habits, but whether the same individuals listen to both is inference, not observation. Plausible at similar scale and language, much less so across language boundaries.
Not engagement quality
A download is not a completed listen. The data has no drop-off, no completion percentage, no skip behavior. High downloads can still mean low retention.
Not causation
The audience patterns are observed, not explained. Cohort tells you what is, not why. Inference from cluster membership to listener intent should stay cautious.
