Takeaways

So you've seen the clusters. Now what.

The shift in lens

Most podcast analytics ask what people listened to. Cohort asks which apps the downloads came through, and it groups shows whose downloads are distributed similarly across the 142 app columns.

Against Podcast Index topic categories, the clusters scored an adjusted Rand index (ARI) of 0.016. That is indistinguishable from random. Against popularity bands and app mix, the ARI was 0.215. Real signal. Two shows can be very different topically and still get downloaded through the same app mix. Two shows in the same genre can have very different app distributions.
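For readers who want to see what an ARI comparison involves, here is a minimal standard-library sketch of the adjusted Rand index between two labelings of the same shows. The toy labels are invented for illustration and are not the project's data. The function name `adjusted_rand_index` is an assumption of this sketch; scikit-learn's `adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items.

    About 0 means chance-level agreement; 1 means identical groupings.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree about

    # Count item pairs that sit together in both labelings, and in each one.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one group
    return (together_both - expected) / (maximum - expected)


# Toy example (hypothetical labels): clusters that cut straight across topics.
clusters = [0, 0, 1, 1, 2, 2]
topics = ["tech", "crime", "tech", "crime", "tech", "crime"]
ari_vs_topic = adjusted_rand_index(clusters, topics)

# Renaming cluster ids does not change the score.
ari_relabelled = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

In the toy example the score comes out slightly negative, which is how the index reports agreement at or below chance. Identical groupings score 1 regardless of how the groups are named.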

One important distinction. Similar app distribution is not the same as shared listeners. We measure aggregate downloads per (show, app), not individual listener events. Two shows in the same row cluster reach audiences that use a similar mix of apps. Whether the actual humans overlap is plausible inference, not observed measurement.

Key Result

What groups shows together in this data is the app mix their downloads flow through, not what the shows are about. App is a more useful organizing axis than topic.

Three app clusters carry most of the downloads

About 95% of observed downloads flow through three of the sixteen app clusters. They all look like "mainstream platforms" from the outside. The algorithm separated them because the show categories distributed through them differ enough to register as distinct groups.

Walled-garden apps: 77.3% of downloads

Apple Podcasts, Spotify, CastBox. The platform-default surface. Audiences who listen through the app that came with their phone or their music service. Category mix is close to the global average.

Dedicated podcast apps: 11.8% of downloads

Overcast, Pocket Casts, Podcast Addict, AntennaPod. Audiences who actively chose a podcast app. Over-indexes ~4× on tech and entrepreneurship, ~3× on news. Under-indexes on true crime and education.

Smart-speaker apps: 6.4% of downloads

Amazon Music, Alexa devices, iHeartRadio, Audible. Often ambient or passive listening. Over-indexes on health and society topics.

For podcasters

If you run a podcast, the cluster picture is useful for a few things.

Which cluster your show lives in

Look at your show's download distribution across apps. The dominant apps place you in one of the clusters above. That cluster predicts how your downloads distribute more reliably than your show's topic does.
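As a rough way to do that lookup by hand, the sketch below sums a show's downloads per app cluster and ranks the clusters by share. The app-to-cluster mapping is a partial one copied from the cluster descriptions above, the download counts are made up, and `cluster_shares` is a hypothetical helper. Cohort's actual assignment comes from the clustering itself, not from this majority rule.

```python
# Partial mapping taken from the cluster descriptions; anything else is "other".
APP_TO_CLUSTER = {
    "Apple Podcasts": "walled-garden",
    "Spotify": "walled-garden",
    "CastBox": "walled-garden",
    "Overcast": "dedicated",
    "Pocket Casts": "dedicated",
    "Podcast Addict": "dedicated",
    "AntennaPod": "dedicated",
    "Amazon Music": "smart-speaker",
    "iHeartRadio": "smart-speaker",
    "Audible": "smart-speaker",
}


def cluster_shares(downloads_by_app, app_to_cluster):
    """Return (cluster, share of downloads) pairs, largest share first."""
    totals = {}
    for app, count in downloads_by_app.items():
        cluster = app_to_cluster.get(app, "other")
        totals[cluster] = totals.get(cluster, 0) + count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # no downloads, so no meaningful shares
    return sorted(
        ((cluster, count / grand_total) for cluster, count in totals.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Made-up download counts for one show.
my_show = {
    "Apple Podcasts": 400,
    "Spotify": 250,
    "Overcast": 600,
    "Pocket Casts": 350,
    "Amazon Music": 50,
    "SomeOtherApp": 50,
}
shares = cluster_shares(my_show, APP_TO_CLUSTER)
dominant_cluster = shares[0][0]
```

For this invented show the dedicated-app cluster carries the largest share, so that is the cluster picture to read it against.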

Concretely, a tech podcast and a true-crime podcast at the same download scale will probably share more cohort neighbors than a tech podcast and a different tech podcast at very different scales. Scale and app mix dominate.

Reading the lift signal

Within each app cluster, certain show categories over-perform or under-perform relative to the global average. The lift heatmap shows the cluster-by-category picture. Some practical signals:

Tech & entrepreneurship: dedicated podcast apps over-index ~4×. Marketing budget put into Overcast, Pocket Casts, and podcast-app discovery is likely to convert above baseline.
True crime: the walled-garden apps, chiefly Apple Podcasts and Spotify, are the main download source. Dedicated podcast apps under-index. Platform recommendation slots matter more than dedicated-app placements.
Health & society: smart-speaker apps over-index ~3×. Content that reads well as ambient listening (single-host, slower pace) has a structural fit.
News: dedicated podcast apps over-index. The walled-garden audience treats news as one genre among many rather than a focused listening habit.
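One common way to compute a lift figure like the ones above is to divide a category's share of downloads inside an app cluster by that category's share of all downloads. The sketch below assumes that definition, which may differ in detail from the one behind the heatmap, and uses invented counts. `lift_table` is a hypothetical helper name.

```python
def lift_table(downloads):
    """Compute lift for each (app_cluster, category) cell.

    `downloads` maps (app_cluster, category) -> download count.
    Lift > 1 means the category over-indexes in that cluster.
    """
    cluster_totals = {}
    category_totals = {}
    for (cluster, category), count in downloads.items():
        cluster_totals[cluster] = cluster_totals.get(cluster, 0) + count
        category_totals[category] = category_totals.get(category, 0) + count
    grand_total = sum(cluster_totals.values())

    lifts = {}
    for (cluster, category), count in downloads.items():
        if cluster_totals[cluster] == 0 or category_totals[category] == 0:
            continue  # lift is undefined for empty rows or columns
        share_in_cluster = count / cluster_totals[cluster]
        global_share = category_totals[category] / grand_total
        lifts[(cluster, category)] = share_in_cluster / global_share
    return lifts


# Invented counts: tech is a small category overall but large on dedicated apps.
toy_downloads = {
    ("walled-garden", "tech"): 100,
    ("walled-garden", "true crime"): 900,
    ("dedicated", "tech"): 80,
    ("dedicated", "true crime"): 20,
}
lifts = lift_table(toy_downloads)
tech_on_dedicated = lifts[("dedicated", "tech")]
```

In this toy table tech accounts for 80% of dedicated-app downloads but only about 16% of all downloads, so its lift on dedicated apps is well above 1, while true crime on dedicated apps falls below 1.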

Key Result

Marketing budget allocated where your category over-indexes is likely to return more than budget spread evenly across platforms. The lift signal is the cheapest budget-allocation tool you can read from cohort.

Cross-promotion candidates

Cohort row clusters group shows whose downloads distribute through similar app mixes at similar scale. Your cluster mates reach audiences who use the same set of apps. Cross-promoting through those shows puts you in front of listeners who are already in the right app environment to find you.

Whether those listeners overlap directly with your existing audience isn't something the matrix can tell us, only that the platform reach overlaps. That's still useful. Many cross-promo programs are sold on platform-reach overlap, not literal listener overlap.
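To make "similar app mix" concrete, here is a simplified proxy: turn each show's downloads into app shares and rank other shows by cosine similarity to yours. This is a nearest-neighbor stand-in, not cohort's row clustering, and it ignores download scale, which the row clusters also take into account. Show names, counts, and the helper names are all hypothetical.

```python
from math import sqrt


def app_shares(downloads_by_app):
    """Normalize raw download counts into shares that sum to 1."""
    total = sum(downloads_by_app.values())
    if total == 0:
        return {}
    return {app: count / total for app, count in downloads_by_app.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_shows(target, shows, k=3):
    """Rank the other shows by app-mix similarity to `target`."""
    target_shares = app_shares(shows[target])
    scored = [
        (name, cosine(target_shares, app_shares(downloads)))
        for name, downloads in shows.items()
        if name != target
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Invented shows: two different topics with a similar app mix, one same-topic outlier.
shows = {
    "tech_show": {"Overcast": 600, "Pocket Casts": 300, "Apple Podcasts": 100},
    "crime_show": {"Overcast": 550, "Pocket Casts": 350, "Apple Podcasts": 100},
    "other_tech_show": {"Apple Podcasts": 700, "Spotify": 300},
}
neighbors = nearest_shows("tech_show", shows, k=2)
```

Here the true-crime show ranks ahead of the other tech show because its downloads flow through nearly the same apps. As with the clusters themselves, a high score says the platform reach overlaps, not that the listeners do.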

Key Result

For platform-reach overlap, cohort's row-cluster neighbors beat topic-similar shows from a category browser. For "do these shows share listeners," you'd need per-listener data we don't have.

Beyond individual shows

Things that show up in the data that matter for platforms and tooling.

Walled-garden dominance: 77% of downloads come through Apple Podcasts, Spotify, and CastBox. Platform UX and platform recommendation algorithms shape the majority of podcast listening, full stop.
Engaged-listener cohort: the 12% on dedicated podcast apps over-indexes on tech, entrepreneurship, and news. Niche vertical podcasts can find their audience here in higher concentration than on platform-default apps.
Smart-speaker listening: 6% of downloads come through smart-speaker and connected-device apps. Listening is ambient. Health and society content over-indexes here.
Long tail: the remaining 5% includes language- and region-specific clusters (iVoox for Spanish, Xiao Yu Zhou for Chinese, Anghami for Arabic). These don't show up in aggregate platform metrics but they're meaningful inside their language groups.

What the data isn't

The aggregate-downloads-per-(show, app) format imposes real limits on what cohort can support.

Not topic similarity

Row clusters don't model show content. For "shows similar in subject matter," use a different model. Transcript embeddings or category metadata work better for that question.

Not listener overlap

OP3 (the Open Podcast Prefix Project) exposes aggregate counts, not per-listener events. Two shows in the same cluster reach audiences with similar app habits, but whether the same individuals listen to both is inference, not observation. Plausible at similar scale and language, much less so across language boundaries.

Not engagement quality

A download is not a completed listen. The data has no drop-off, no completion percentage, no skip behavior. High downloads can still mean low retention.

Not causation

The audience patterns are observed correlations, not causal explanations. Cohort tells you what is, not why. Inference from cluster membership to listener intent should stay cautious.

Also see: analysis · algorithm reference · slide deck · the paper