Keyword clustering

Grouping individual search queries into demand pockets that share intent. 'Best mirrorless camera 2025', 'top mirrorless for beginners', and 'spiegellose kamera test' are different keywords but the same demand pocket.

What it is

A single market category can contain thousands of keywords, many of which share effectively the same intent. Treating them individually is wasteful and statistically noisy. Theia clusters keywords into demand pockets, groups that:

  • Share intent (the consumer is looking for the same thing)
  • Connect to the same set of products (the SERPs and Amazon listings overlap)
  • Can be addressed with the same content

A UK mirrorless camera category might have 8,300 keywords resolving into 40-60 demand pockets.

How clustering works

Theia clusters keywords in two ways depending on the data available:

1. SERP overlap clustering (Google): Keywords whose top 10 SERP results overlap by ≥ N URLs are treated as the same demand pocket. This is the classic "ranks together = clusters together" approach.

2. Bipartite Leiden clustering (Amazon): Keywords and products form a weighted bipartite graph, with traffic_index as the edge weight. The Leiden algorithm, optimising the Surprise quality function, then finds communities. This works when SERP data is sparse but ranking traffic is rich (Amazon, retailer search).
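
The SERP overlap rule in method 1 can be sketched in a few lines. This is a minimal illustration, not Theia's implementation: the function name, the `min_shared` parameter (standing in for N, which is not specified here), and the example URLs are all hypothetical. It also assumes overlap is transitive, so keywords are merged with a simple union-find.

```python
from itertools import combinations


def cluster_by_serp_overlap(serps, min_shared):
    """Group keywords whose top-10 result URLs share at least `min_shared` URLs.

    serps: dict mapping keyword -> ranked list of result URLs.
    Returns a list of sets of keywords (assumed single-linkage merge).
    """
    parent = {kw: kw for kw in serps}

    def find(kw):
        # Follow parent links to the cluster root, compressing the path.
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]
            kw = parent[kw]
        return kw

    # Only the top 10 URLs of each SERP are compared.
    url_sets = {kw: set(urls[:10]) for kw, urls in serps.items()}

    for a, b in combinations(serps, 2):
        if len(url_sets[a] & url_sets[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for kw in serps:
        clusters.setdefault(find(kw), set()).add(kw)
    return list(clusters.values())


# Hypothetical SERPs (shortened to a few URLs each for readability).
example_serps = {
    "best mirrorless camera 2025": ["a.example/1", "b.example/2", "c.example/3", "d.example/4"],
    "top mirrorless for beginners": ["a.example/1", "b.example/2", "c.example/3", "e.example/5"],
    "camera tripod": ["f.example/6", "g.example/7", "a.example/1"],
}
serp_clusters = cluster_by_serp_overlap(example_serps, min_shared=3)
```

With these inputs the two mirrorless queries share three URLs and merge, while "camera tripod" shares only one URL with each and stays separate. The pairwise comparison is quadratic in the number of keywords, so a real pipeline would likely need an inverted index from URL to keywords.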

The two methods produce overlapping but distinct clusterings. The final keyword clusters are the union of both, after cross-language merging.
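
For the Amazon method, the input graph might be assembled as below. This sketch covers only the graph construction step, under assumed names: the row layout `(keyword, product_id, traffic_index)`, the function name, and the sample values are hypothetical. Community detection itself is left to a Leiden implementation (for example, the leidenalg package offers a Surprise-based partition type), which is not shown here.

```python
from collections import defaultdict


def build_bipartite_edges(rows):
    """Build weighted keyword-product edges from (keyword, product_id, traffic_index) rows.

    Nodes are tagged ("kw", ...) or ("product", ...) so the two sides of the
    bipartite graph never collide, even if a keyword equals a product id.
    Repeated keyword-product pairs have their traffic_index summed (an assumption).
    """
    edges = defaultdict(float)
    for keyword, product_id, traffic_index in rows:
        if traffic_index > 0:  # drop zero-traffic pairs; they carry no signal
            edges[(("kw", keyword), ("product", product_id))] += traffic_index
    return dict(edges)


# Hypothetical rows; the traffic_index values are made up for illustration.
example_rows = [
    ("mirrorless camera", "P1", 60),
    ("mirrorless camera", "P2", 30),
    ("spiegellose kamera", "P1", 50),
    ("mirrorless camera", "P1", 10),
    ("camera tripod", "P3", 0),
]
bipartite_edges = build_bipartite_edges(example_rows)
```

Both the English and the German query attach to product P1, which is the kind of shared structure a community detection step can pick up even when the keywords have no words in common.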

Why this matters

For SEO and content strategy:

  • You write one piece of content per cluster, not per keyword
  • You target the most distinctive keyword in each cluster as the primary target, with variants as secondary targets
  • You measure share of voice at cluster level, not keyword level

For market sizing:

  • A demand pocket has a meaningful search volume (the sum of its member keywords' volumes)
  • Pockets can be ranked by volume × commercial intent
  • The long tail of low-volume variants is grouped, not discarded
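
The market sizing arithmetic above can be sketched as follows. The function name, the `intent_by_pocket` scores (a 0 to 1 commercial intent weight per pocket), and all the numbers are assumptions for illustration, not figures from a real mapping.

```python
def rank_pockets(keyword_volumes, keyword_to_pocket, intent_by_pocket):
    """Sum member keyword volumes per pocket, then rank by volume x commercial intent.

    Returns a list of (pocket, total_volume, score) tuples, highest score first.
    Pockets with no intent score default to a weight of 1.0 (an assumption).
    """
    totals = {}
    for keyword, volume in keyword_volumes.items():
        pocket = keyword_to_pocket[keyword]
        totals[pocket] = totals.get(pocket, 0) + volume

    scored = [
        (pocket, volume, volume * intent_by_pocket.get(pocket, 1.0))
        for pocket, volume in totals.items()
    ]
    return sorted(scored, key=lambda row: row[2], reverse=True)


# Toy inputs: monthly volumes and intent weights are invented.
toy_volumes = {
    "luxury hamper": 1000,
    "christmas hamper": 400,
    "gadget gift": 900,
    "cool gadgets for him": 300,
}
toy_pockets = {
    "luxury hamper": "Hampers",
    "christmas hamper": "Hampers",
    "gadget gift": "Tech gadgets",
    "cool gadgets for him": "Tech gadgets",
}
toy_intent = {"Hampers": 0.9, "Tech gadgets": 0.5}

ranking = rank_pockets(toy_volumes, toy_pockets, toy_intent)
```

Low-volume variants such as "christmas hamper" still contribute to their pocket's total, which is how the long tail is grouped rather than discarded.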

Example: UK gift market

The UK gift market we mapped in 2022 had 11,000+ unique keywords, which clustered into 7 demand pockets:

Pocket               Distinctive keyword            Volume
Personalised gifts   "personalised gift"            280K/mo
Experience days      "experience days"              110K/mo
Same-day flowers     "flowers next day delivery"    95K/mo
Hampers              "luxury hamper"                68K/mo
Subscription boxes   "subscription box gift"        52K/mo
Tech gadgets         "gadget gift"                  31K/mo
Mini photo prints    "mini photo printer"           18K/mo

7 demand pockets is a strategy. 11,000 keywords is a spreadsheet.

Distinctive vs generic keywords within a cluster

Within each cluster, an HHI-weighted (Herfindahl-Hirschman Index) distinctiveness score ranks the keywords by how characteristic they are of that cluster. The most distinctive keyword becomes the cluster name and the primary content target. Head terms ("gift", "present") are recognised as generic and excluded from naming.
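
The exact scoring formula is not given here, so the following is only one plausible reading. It assumes each keyword has a traffic weight towards every cluster, computes the Herfindahl-Hirschman Index (the sum of squared shares) over those weights, and multiplies it by the keyword's share in the cluster being named. The function names and the numbers are hypothetical.

```python
def hhi(weights):
    """Herfindahl-Hirschman Index: sum of squared shares, from 1/n (even spread) up to 1."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum((w / total) ** 2 for w in weights)


def distinctiveness(traffic_by_cluster, cluster):
    """Assumed score: the keyword's share in `cluster`, weighted by its concentration (HHI)."""
    total = sum(traffic_by_cluster.values())
    if total == 0:
        return 0.0
    share = traffic_by_cluster.get(cluster, 0) / total
    return share * hhi(list(traffic_by_cluster.values()))


# Invented traffic splits: one concentrated keyword, one generic head term.
personalised = {"Personalised gifts": 90, "Hampers": 10}
generic_gift = {
    "Personalised gifts": 25,
    "Hampers": 25,
    "Tech gadgets": 25,
    "Experience days": 25,
}

score_personalised = distinctiveness(personalised, "Personalised gifts")
score_generic = distinctiveness(generic_gift, "Personalised gifts")
```

Under this reading, a head term like "gift" spreads evenly across clusters, so both its share and its HHI are low and it scores far below a concentrated keyword, which is consistent with generic terms being excluded from naming.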