What it is
A single market category contains thousands of keywords, many of which express effectively the same intent. Treating each keyword individually is wasteful and statistically noisy. Theia clusters keywords into demand pockets, which are groups of keywords that:
- Share intent (the consumer is looking for the same thing)
- Connect to the same set of products (the SERPs and Amazon listings overlap)
- Can be addressed with the same content
A UK mirrorless camera category might have 8,300 keywords resolving into 40-60 demand pockets.
How clustering works
Theia clusters keywords in two ways depending on the data available:
1. SERP overlap clustering (Google): Keywords whose top-10 SERP results share at least N URLs are treated as the same demand pocket, where N is the overlap threshold. This is the classic "ranks together = clusters together" approach.
2. Bipartite Leiden clustering (Amazon): Keywords and products form a weighted bipartite graph, with traffic_index as the edge weight. The Leiden algorithm, optimising the Surprise quality function, then finds communities in that graph. This works when SERP data is sparse but ranking traffic is rich (Amazon, retailer search).
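The SERP overlap rule from the first method can be sketched in a few lines of Python. This is a minimal illustration, not Theia's implementation: the function name `cluster_by_serp_overlap`, the threshold value `min_shared=4`, and the placeholder URLs are all assumptions, as is the choice to apply overlap transitively (if A overlaps B and B overlaps C, all three land in one pocket).

```python
from itertools import combinations


def cluster_by_serp_overlap(serps, min_shared=4):
    """Group keywords whose top-10 URL sets share at least `min_shared` URLs.

    serps: dict mapping keyword -> ranked list of result URLs.
    Assumption: overlap is transitive, so groups are merged with union-find.
    """
    keywords = list(serps)
    top10 = {kw: set(urls[:10]) for kw, urls in serps.items()}
    parent = {kw: kw for kw in keywords}

    def find(kw):
        # Follow parent links to the group representative (with path halving).
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]
            kw = parent[kw]
        return kw

    # Pairwise comparison is quadratic; fine for a sketch, slow at scale.
    for a, b in combinations(keywords, 2):
        if len(top10[a] & top10[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for kw in keywords:
        clusters.setdefault(find(kw), []).append(kw)
    return sorted(sorted(members) for members in clusters.values())


# Placeholder URLs ("a", "b", ...) stand in for real SERP results.
serps = {
    "mirrorless camera": ["a", "b", "c", "d", "e"],
    "best mirrorless camera": ["a", "b", "c", "d", "f"],
    "camera bag": ["x", "y", "z", "a"],
}
pockets = cluster_by_serp_overlap(serps, min_shared=4)
```

Here the two camera keywords share four URLs and merge, while "camera bag" shares only one URL with either and stays on its own. Raising `min_shared` gives tighter, more numerous pockets; lowering it gives fewer, broader ones.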
The two methods produce overlapping but distinct clusterings. The final keyword clusters are the union of both, taken after clusters have been merged across languages.
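For the Amazon-side method, the input to community detection is the keyword × product graph. The sketch below shows one way to build it in plain Python. The function name `build_bipartite_graph`, the row layout, and the product IDs are hypothetical; only `traffic_index` as the edge weight comes from the description above. The community detection step itself is not shown.

```python
def build_bipartite_graph(rows):
    """Build a weighted keyword-product bipartite graph.

    rows: iterable of (keyword, product_id, traffic_index) tuples.
    Returns (nodes, types, edges, weights). Keyword nodes come first, then
    products; types[i] is False for a keyword node and True for a product node.
    """
    # Drop pairs with no traffic so they do not create empty edges.
    rows = [r for r in rows if r[2] > 0]
    keywords = sorted({kw for kw, _, _ in rows})
    products = sorted({pid for _, pid, _ in rows})
    nodes = keywords + products
    kw_index = {kw: i for i, kw in enumerate(keywords)}
    prod_index = {pid: len(keywords) + i for i, pid in enumerate(products)}
    types = [False] * len(keywords) + [True] * len(products)

    # Sum traffic_index when the same keyword-product pair appears twice.
    weight_by_edge = {}
    for kw, pid, traffic in rows:
        edge = (kw_index[kw], prod_index[pid])
        weight_by_edge[edge] = weight_by_edge.get(edge, 0.0) + traffic

    edges = sorted(weight_by_edge)
    weights = [weight_by_edge[e] for e in edges]
    return nodes, types, edges, weights


# Illustrative rows; the product IDs and traffic values are made up.
rows = [
    ("mini photo printer", "B001", 0.6),
    ("mini photo printer", "B002", 0.3),
    ("pocket photo printer", "B001", 0.5),
    ("luxury hamper", "B003", 0.9),
]
nodes, types, edges, weights = build_bipartite_graph(rows)
```

The `types`, `edges` and `weights` lists are the kind of input a graph library expects for a bipartite graph, and could then be handed to a Leiden implementation with a Surprise quality function (the `leidenalg` package offers `SurpriseVertexPartition`, for example). In this toy data the two printer keywords connect through product B001, so they would be expected to fall into one community, separate from the hamper keyword.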
Why this matters
For SEO and content strategy:
- You write one piece of content per cluster, not per keyword
- You target the distinctive keyword for each cluster as the primary, with variants as secondary
- You measure share of voice at cluster level, not keyword level
For market sizing:
- A demand pocket has a meaningful search volume (the sum of its member keywords' volumes)
- Pockets can be ranked by volume × commercial intent
- The long tail of low-volume variants is grouped, not discarded
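The market-sizing arithmetic can be sketched as follows. This assumes commercial intent is available as a score between 0 and 1 per pocket, which the text does not specify; the function name `rank_pockets` and all the volumes and intent scores below are invented for illustration.

```python
def rank_pockets(keyword_volumes, keyword_to_pocket, commercial_intent):
    """Sum keyword volumes per pocket, then rank by volume x commercial intent.

    commercial_intent: dict pocket -> score in [0, 1] (an assumed input).
    Returns a list of (pocket, total_volume, score), highest score first.
    """
    totals = {}
    for kw, volume in keyword_volumes.items():
        pocket = keyword_to_pocket[kw]
        totals[pocket] = totals.get(pocket, 0) + volume

    scored = [
        (pocket, total, total * commercial_intent.get(pocket, 0.0))
        for pocket, total in totals.items()
    ]
    return sorted(scored, key=lambda row: row[2], reverse=True)


# Made-up monthly volumes; low-volume variants still count towards the total.
keyword_volumes = {
    "luxury hamper": 1000,
    "christmas hamper": 500,
    "gadget gift": 2000,
    "cool gadget gifts": 400,
}
keyword_to_pocket = {
    "luxury hamper": "Hampers",
    "christmas hamper": "Hampers",
    "gadget gift": "Tech gadgets",
    "cool gadget gifts": "Tech gadgets",
}
commercial_intent = {"Hampers": 0.75, "Tech gadgets": 0.25}

ranking = rank_pockets(keyword_volumes, keyword_to_pocket, commercial_intent)
```

In this toy data "Tech gadgets" has the larger raw volume (2,400 against 1,500), but "Hampers" ranks first once intent is factored in (1,125 against 600).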
Example: UK gift market
The UK gift market we mapped in 2022 had 11,000+ unique keywords, which clustered into 7 demand pockets:
| Demand pocket | Distinctive keyword | Volume |
|---|---|---|
| Personalised gifts | "personalised gift" | 280K/mo |
| Experience days | "experience days" | 110K/mo |
| Same-day flowers | "flowers next day delivery" | 95K/mo |
| Hampers | "luxury hamper" | 68K/mo |
| Subscription boxes | "subscription box gift" | 52K/mo |
| Tech gadgets | "gadget gift" | 31K/mo |
| Mini photo prints | "mini photo printer" | 18K/mo |
7 demand pockets is a strategy. 11,000 keywords is a spreadsheet.
Distinctive vs generic keywords within a cluster
Within each cluster, HHI-weighted distinctiveness ranks the keywords by how characteristic they are of the cluster (HHI is the Herfindahl-Hirschman Index, a measure of concentration). The most distinctive keyword becomes the cluster name and the primary content target. Head terms ("gift", "present") are recognised as generic and excluded from naming.
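The exact scoring formula is not given here, so the sketch below shows only one possible reading of "HHI-weighted". It assumes each keyword's traffic can be split by cluster, computes the HHI of that split (the sum of squared shares, which is 1.0 when all traffic goes to a single cluster), and multiplies it by the keyword's traffic into the cluster being ranked. The function names and all numbers are hypothetical.

```python
def hhi(values):
    """Herfindahl-Hirschman Index: sum of squared shares of the total."""
    values = list(values)
    total = sum(values)
    if total == 0:
        return 0.0
    return sum((v / total) ** 2 for v in values)


def rank_by_distinctiveness(traffic, cluster):
    """Rank keywords for one cluster by traffic-in-cluster x HHI.

    traffic: dict keyword -> dict cluster -> traffic (an assumed input shape).
    Assumption: distinctiveness = traffic into `cluster` weighted by how
    concentrated the keyword's traffic is across all clusters.
    """
    scores = {}
    for kw, by_cluster in traffic.items():
        in_cluster = by_cluster.get(cluster, 0.0)
        if in_cluster > 0:
            scores[kw] = in_cluster * hhi(by_cluster.values())
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up traffic: "gift" is spread evenly over four clusters.
traffic = {
    "gift": {"Hampers": 60, "Tech gadgets": 60, "Experience days": 60, "Personalised gifts": 60},
    "luxury hamper": {"Hampers": 40},
    "hamper gift": {"Hampers": 30, "Tech gadgets": 10},
}
ranking = rank_by_distinctiveness(traffic, "Hampers")
```

Although "gift" sends the most raw traffic into the Hampers cluster, its low concentration pushes it to the bottom of the ranking, and "luxury hamper" comes out on top. This sketch only demotes generic head terms; excluding them from naming outright, as described above, would need a further cut-off on the HHI value.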