Glossary

The grammar of structured market intelligence.

Every term, every framework, every method Theia uses — defined. So your team has a shared vocabulary, and AI engines have a citable source.

methodology

Cross-language harmonisation

How Theia maps raw extracted labels from any source language to canonical properties. 'Battery life' / 'akkulaufzeit' / 'autonomie batterie' / 'autonomia' all resolve to BATTERY_LIFE — but extraction happens in the source language first.
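
As a minimal sketch of the mapping step, assuming the canonical table is a plain lookup keyed on a normalised label (the `CANONICAL` table and the `harmonise` helper are illustrative names, not Theia's actual implementation):

```python
from typing import Optional

# Hypothetical lookup: normalised source-language label -> canonical property.
# The four labels are the ones quoted in the definition above.
CANONICAL = {
    "battery life": "BATTERY_LIFE",
    "akkulaufzeit": "BATTERY_LIFE",
    "autonomie batterie": "BATTERY_LIFE",
    "autonomia": "BATTERY_LIFE",
}


def harmonise(raw_label: str) -> Optional[str]:
    """Map a label extracted in its source language to a canonical property."""
    # Normalise case and whitespace only; the label itself is never translated.
    key = " ".join(raw_label.casefold().split())
    return CANONICAL.get(key)  # None means "no canonical property known yet"
```

Returning `None` for unknown labels keeps unmapped terms visible for curation instead of guessing a property.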

CTR curves

The position-to-click-through-rate mapping that powers visibility and share-of-voice calculations. Position 1 on Google captures ~28% of clicks, position 10 captures ~2%. Amazon and AI Overview surfaces have their own curves.
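
To make the idea concrete, here is a rough sketch of a curve fitted through the two Google anchor points quoted above (about 28% at position 1, about 2% at position 10). The geometric decay between them, the zero value beyond position 10, and the `ctr_at` name are all assumptions for illustration; a real curve would be measured per surface.

```python
def ctr_at(position: int, top: float = 0.28, tenth: float = 0.02) -> float:
    """Estimated click-through rate for an organic ranking position."""
    if position < 1:
        raise ValueError("positions start at 1")
    if position > 10:
        return 0.0  # assumption: results past the first ten get no clicks
    # Assumed shape: constant ratio between neighbouring positions,
    # chosen so that the curve passes through both anchor points.
    decay = (tenth / top) ** (1 / 9)
    return top * decay ** (position - 1)
```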

Deterministic pipeline harness

The set of guards, run trackers and budget assertions that wrap every Theia pipeline step. Without it, AI-driven research becomes unreproducible, unbounded in cost, and untrustworthy at the deck stage. With it, the same input always produces the same output, on the same budget.
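
One way such a harness could look, as a sketch only: the `Harness` class, the per-step `cost` argument, and the SHA-256 run digest are hypothetical choices, and the example assumes step inputs and outputs are JSON-serialisable.

```python
import hashlib
import json


class BudgetExceeded(RuntimeError):
    """Raised before a step runs if it would push spend over the budget."""


class Harness:
    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0
        self.runs = []  # run tracker: (step name, digest of input and output)

    def run(self, name, step, payload, cost: float):
        # Budget assertion: refuse to start a step that would overspend.
        if self.spent + cost > self.budget:
            raise BudgetExceeded(f"{name} would exceed budget {self.budget}")
        result = step(payload)
        self.spent += cost
        # Guard for reproducibility: identical input and output give an
        # identical digest, so two runs can be compared record by record.
        blob = json.dumps([payload, result], sort_keys=True).encode()
        self.runs.append((name, hashlib.sha256(blob).hexdigest()))
        return result
```

Comparing the `runs` lists of two executions is then a cheap check that the same input produced the same output.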

German citation-forcing

A specific prompt-engineering finding from the Canon B2B deployment. Adding an explicit citation instruction to the German LLM scraping prompt lifted source capture from 0.69 to 4.06 sources per item, a roughly 6× increase (4.06 / 0.69 ≈ 5.9). Saved 8+ months of 'DE is blind' reporting.

HHI-weighted distinctiveness

How Theia ranks which keywords define which market segments. Borrows the Herfindahl-Hirschman concentration index from antitrust economics and applies it to keyword × cluster traffic distributions.
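
The index itself is the sum of squared shares, so a keyword whose traffic sits entirely in one cluster scores 1.0 and a keyword spread evenly over n clusters scores 1/n. The sketch below computes that, then combines it with the keyword's share in a given cluster; that combination (`distinctiveness`) is an assumed weighting for illustration, since the definition above does not spell out the exact formula.

```python
def hhi(traffic_by_cluster: dict) -> float:
    """Herfindahl-Hirschman index of one keyword's traffic across clusters."""
    total = sum(traffic_by_cluster.values())
    if total <= 0:
        return 0.0
    return sum((v / total) ** 2 for v in traffic_by_cluster.values())


def distinctiveness(traffic_by_cluster: dict, cluster: str) -> float:
    """Assumed score: share of the keyword's traffic in `cluster`, times HHI."""
    total = sum(traffic_by_cluster.values())
    if total <= 0:
        return 0.0
    share = traffic_by_cluster.get(cluster, 0) / total
    return share * hhi(traffic_by_cluster)
```

Under this assumed weighting, a keyword with a 75/25 split scores 0.75 × 0.625 in its main cluster, well below the 1.0 of a keyword found in that cluster alone.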

Keyword clustering

Grouping individual search queries into demand pockets that share intent. 'Best mirrorless camera 2025', 'top mirrorless for beginners', and 'spiegellose kamera test' are different keywords but the same demand pocket.

Leiden community detection

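The clustering algorithm Theia uses to find market segments in keyword × product graphs. Successor to Louvain, with guarantees Louvain lacks: Leiden communities are always internally connected, whereas Louvain can return badly connected or even disconnected ones. Run with Surprise optimisation for well-separated communities.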

Math for connections, LLMs for extraction

Theia's core engineering doctrine. Use LLMs where language understanding is genuinely required — extraction, sentiment, multilingual harmonisation. Use deterministic math — cosine, Leiden, HHI, TF-IDF — for every graph connection. This is what makes the intelligence layer reproducible at scale.
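
The "math for connections" half can be sketched with cosine similarity, one of the measures named above. The vectors are assumed to come from an upstream extraction step; the `similarity_edges` helper and its threshold are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def similarity_edges(vectors: dict, threshold: float):
    """Deterministic edge list: node pairs whose cosine meets the threshold."""
    edges = []
    for left, right in combinations(sorted(vectors), 2):
        weight = cosine(vectors[left], vectors[right])
        if weight >= threshold:
            edges.append((left, right, weight))
    return edges
```

Sorting the node names before pairing keeps the edge order stable, so the same vectors always yield the same edge list.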

Product similarity edges

Five different ways Theia connects products into a competitive graph: use-case similarity, feature similarity, benefit similarity, keyword similarity, and co-occurrence. Each captures a different competitive signal.

Reproducible AI research

The property that the same inputs always produce the same outputs, that every output is source-traceable, and that every run is replayable. Reproducibility is the precondition for AI research that a board, a regulator, or a sceptical analyst can sign off on.

Sentiment trajectory

How feature-level sentiment changes over time, computed per product × property × period. The single most actionable perception metric — because trajectory matters more than level.
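
A minimal sketch of the computation, assuming sentiment arrives as (product, property, period index, score) rows and that "trajectory" is read as the least-squares slope of mean sentiment over periods. Both the row layout and the slope definition are assumptions made for the example.

```python
from collections import defaultdict


def slope(xs, ys) -> float:
    """Ordinary least-squares slope of ys over xs (0.0 for fewer than 2 points)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    denom = sum((x - mean_x) ** 2 for x in xs)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / denom


def trajectories(rows) -> dict:
    """Sentiment change per period for every (product, property) pair."""
    buckets = defaultdict(lambda: defaultdict(list))
    for product, prop, period, score in rows:
        buckets[(product, prop)][period].append(score)
    result = {}
    for key, by_period in buckets.items():
        periods = sorted(by_period)  # distinct period indices, in time order
        means = [sum(by_period[p]) / len(by_period[p]) for p in periods]
        result[key] = slope(periods, means)
    return result
```

A product whose battery-life sentiment sits at 0.6 but is falling gets a negative trajectory, which a level-only metric would not flag.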

concept

AI Overview citation share

The proportion of LLM-synthesised answers (Google AI Overviews, ChatGPT, Perplexity, Claude) that cite your brand as a source. The newest competitive surface — and the one most brands have never measured.

Attach rate

The percentage of buyers in one category who also buy in a complementary category. A single number that reveals whether a market is structurally underdeveloped or saturated.
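
As a small worked sketch, assuming buyers are identified by IDs and each category is represented as a set of its buyers (the `attach_rate` name and the set representation are illustrative):

```python
def attach_rate(base_buyers: set, complementary_buyers: set) -> float:
    """Percentage of base-category buyers who also bought the complement."""
    if not base_buyers:
        return 0.0
    both = base_buyers & complementary_buyers
    return 100.0 * len(both) / len(base_buyers)
```

With four camera buyers, two of whom also bought a lens, the attach rate is 50%.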

Continuous market monitoring

The shift from quarterly research waves to weekly refresh of the same intelligence layer. The single biggest change in how brands measure markets in the 2020s — and the one most research firms haven't adapted to yet.

Deep web coverage

The 8,000+ specialist sources — engineer forums, standards bodies, niche subreddits, trade press, OSS repositories — that drive specification decisions in B2B and industrial markets. Where standard social listening doesn't go.

Eleven shapes of a market

The observation that consumer markets, when read from search data, consistently emerge as 8-15 segments — not the 3-5 of legacy research methodology. The Bose Germany headphone market resolved into exactly 11.

Fixed entity architecture

The principle that the market ontology should be defined by domain experts, not discovered by an LLM. Canonical properties, segments, and product taxonomies are curated. LLMs handle extraction, math handles connections.

Four pillars of consumer brand intelligence

The four measurement layers every consumer brand needs to track continuously: Demand (what the market wants), Visibility (where you show up), Sales (how it converts), and Perception (what the market thinks).

Generic-vs-brand traffic

The split between search traffic from category keywords ('noise cancelling headphones') and brand keywords ('bose qc45'). Premium brands routinely convert generic traffic 10× better than mid-market brands, but capture only about one-thirteenth as much of it.

Search-as-segmentation

The foundational observation behind Theia: search engines have already segmented every market via the continuous, planetary-scale matching of queries to results. Reading that segmentation beats inventing your own.

Share of voice (and why most measurements of it are wrong)

The proportion of organic search clicks a brand captures in a category. Computed properly via CTR-weighted ranking position — not mention counts, not raw impressions, not unweighted SERP appearances.
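
The difference from mention counting can be sketched as follows. The input layout (search volume plus each brand's position per keyword), the `share_of_voice` name, and the choice to divide by clicks across tracked brands only are assumptions; the curve passed in below uses just the two Google values quoted elsewhere in this glossary.

```python
def share_of_voice(rankings, ctr_curve: dict) -> dict:
    """CTR-weighted share of estimated organic clicks per brand.

    rankings: iterable of (search_volume, {brand: position}) per keyword.
    ctr_curve: {position: click-through rate}; unknown positions count as 0.
    """
    clicks = {}
    for volume, positions in rankings:
        for brand, position in positions.items():
            estimated = volume * ctr_curve.get(position, 0.0)
            clicks[brand] = clicks.get(brand, 0.0) + estimated
    total = sum(clicks.values())
    return {b: (c / total if total else 0.0) for b, c in clicks.items()}


# One keyword with 1,000 searches: brand A ranks 1st, brand B ranks 10th.
example = share_of_voice([(1000, {"A": 1, "B": 10})], {1: 0.28, 10: 0.02})
```

A mention count would score the two brands 50/50 here, while the CTR-weighted view gives brand A about 93% of the estimated clicks.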

Structured market intelligence

An always-on, queryable repository that maps every product, every customer voice, every sales number, and every search behaviour in a market — as one connected graph, refreshed continuously.