We analyzed 250,000 LLM citations across ChatGPT, Perplexity, and Gemini to identify the content patterns that most reliably drive AI citations, and the results surprised us.

Over six months, the AeoAudit research team collected and analyzed 250,000 AI citations from ChatGPT (GPT-4o), Perplexity Pro, and Gemini Advanced. We cross-referenced these citations against 47 content and technical signals to identify which factors most reliably predict citation selection.
Contrary to traditional SEO wisdom, domain authority (DA) had only a modest correlation (r = 0.31) with citation frequency. Answer position mattered far more: whether the core answer appeared in the first 150 words of the content correlated at r = 0.71 with citation frequency.
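The answer-position signal is easy to audit for your own pages. A minimal sketch, assuming a simple keyword heuristic (the function name, keyword approach, and 150-word threshold are illustrative, not the method used in the study):

```python
import re

def answer_in_first_n_words(text: str, answer_keywords: list[str], n: int = 150) -> bool:
    """Return True if every keyword of the core answer appears
    within the first n words of the page text."""
    lead = " ".join(re.findall(r"\S+", text)[:n]).lower()
    return all(kw.lower() in lead for kw in answer_keywords)

page = "AEO is the practice of structuring content so AI systems can cite it. " + "filler " * 300
print(answer_in_first_n_words(page, ["AEO", "practice"]))  # answer appears up front
```

A real audit would match against the query intent rather than a fixed keyword list, but even this crude check flags pages that bury the answer below the fold.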
Content structured as explicit question-answer pairs (using H3 headers phrased as questions followed by direct answers) had a 2.1× higher citation rate than comparable content without this structure.
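One way to check whether existing content follows this question-header pattern is to scan a markdown source for H3 headings phrased as questions. A minimal sketch (the regex and function name are illustrative):

```python
import re

# Match "### ..." headings that end with a question mark.
QUESTION_H3 = re.compile(r"^###\s+(.+\?)\s*$", re.MULTILINE)

def question_headers(markdown: str) -> list[str]:
    """Find H3 headings phrased as questions in a markdown document."""
    return QUESTION_H3.findall(markdown)

doc = """### What is answer engine optimization?
AEO is the practice of structuring content so AI systems can cite it.

### Pricing
See our plans page.
"""
print(question_headers(doc))  # → ['What is answer engine optimization?']
```

Headings like "Pricing" are ignored; the ratio of question-style headings to total headings gives a rough measure of how much of a page uses the Q&A structure.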
Content containing specific numerical data, statistics, or quantified claims was cited 1.8× more frequently than content making equivalent qualitative claims. When the numerical data was attributed to a named study or organization, the citation rate increased by an additional 40%.
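Both signals, quantification and attribution, can be flagged with simple pattern matching. A rough sketch (the patterns and attribution phrases are illustrative assumptions, not the classifier used in the analysis):

```python
import re

# Digits, optionally followed by a percent/multiplier marker.
NUMBER = re.compile(r"\b\d[\d,.]*\s*(?:%|×|percent)?")
# Common phrases that credit a named study or organization.
ATTRIBUTION = re.compile(r"\b(?:according to|study by|reported by|per)\b", re.IGNORECASE)

def claim_profile(sentence: str) -> dict:
    """Classify a sentence as quantified and/or attributed."""
    return {
        "quantified": bool(NUMBER.search(sentence)),
        "attributed": bool(ATTRIBUTION.search(sentence)),
    }

print(claim_profile("According to a 2024 Gartner study, 38% of queries start in AI chat."))
```

Sentences that score true on both dimensions are the ones this finding suggests are most citable.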
Pages with valid structured data were cited 3.2× more frequently than pages without schema markup, even when controlling for content quality and domain authority. FAQPage schema showed the strongest individual effect.
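The JSON-LD shape for FAQPage markup follows schema.org's FAQPage/Question/Answer vocabulary; the helper function below is an illustrative sketch for generating it:

```python
import json

def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize question/answer pairs as FAQPage JSON-LD
    using the schema.org vocabulary."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)

print(faq_jsonld([("What is AEO?", "Answer engine optimization: structuring content for AI citation.")]))
```

The output belongs in a `<script type="application/ld+json">` tag in the page head, mirroring the visible Q&A content on the page.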
These findings support the core hypothesis of AEO: AI systems are retrieval machines optimized for answer quality, not popularity signals. The brands that will win in AI search are those that structure their content to be unambiguously helpful and machine-readable.