We analyzed 250,000 LLM citations across ChatGPT, Perplexity, and Gemini to identify the content patterns that most reliably drive AI citations, and the results surprised us.

Over six months, the AeoAudit research team collected and analyzed 250,000 AI citations from ChatGPT (GPT-4o), Perplexity Pro, and Gemini Advanced. We cross-referenced these citations against 47 content and technical signals to identify which factors most reliably predict citation selection.
Contrary to traditional SEO wisdom, domain authority (DA) had only a modest correlation (r = 0.31) with citation frequency. Answer position mattered far more: whether the core answer appeared in the first 150 words of the content correlated at r = 0.71 with citation frequency.
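The answer-position signal is easy to audit for your own pages. A minimal sketch, assuming a simple keyword heuristic (the function name, keyword approach, and 150-word threshold are illustrative, not the method used in the study):

```python
import re

def answer_in_first_n_words(text: str, answer_keywords: list[str], n: int = 150) -> bool:
    """Return True if every keyword of the core answer appears
    within the first n words of the page text."""
    lead = " ".join(re.findall(r"\S+", text)[:n]).lower()
    return all(kw.lower() in lead for kw in answer_keywords)

page = "AEO is the practice of structuring content so AI systems can cite it. " + "filler " * 300
print(answer_in_first_n_words(page, ["AEO", "practice"]))  # answer appears up front
```

A real audit would match against the query intent rather than a fixed keyword list, but even this crude check flags pages that bury the answer below the fold.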
Content structured as explicit question-answer pairs (using H3 headers phrased as questions followed by direct answers) had a 2.1× higher citation rate than comparable content without this structure.
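One way to check whether existing content follows this question-header pattern is to scan a markdown source for H3 headings phrased as questions. A minimal sketch (the regex and function name are illustrative):

```python
import re

# Match "### ..." headings that end with a question mark.
QUESTION_H3 = re.compile(r"^###\s+(.+\?)\s*$", re.MULTILINE)

def question_headers(markdown: str) -> list[str]:
    """Find H3 headings phrased as questions in a markdown document."""
    return QUESTION_H3.findall(markdown)

doc = """### What is answer engine optimization?
AEO is the practice of structuring content so AI systems can cite it.

### Pricing
See our plans page.
"""
print(question_headers(doc))  # → ['What is answer engine optimization?']
```

Headings like "Pricing" are ignored; the ratio of question-style headings to total headings gives a rough measure of how much of a page uses the Q&A structure.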
Content containing specific numerical data, statistics, or quantified claims was cited 1.8× more frequently than content making equivalent qualitative claims. When the numerical data was attributed to a named study or organization, the citation rate increased by an additional 40%.
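Both signals, quantification and attribution, can be flagged with simple pattern matching. A rough sketch (the patterns and attribution phrases are illustrative assumptions, not the classifier used in the analysis):

```python
import re

# Digits, optionally followed by a percent/multiplier marker.
NUMBER = re.compile(r"\b\d[\d,.]*\s*(?:%|×|percent)?")
# Common phrases that credit a named study or organization.
ATTRIBUTION = re.compile(r"\b(?:according to|study by|reported by|per)\b", re.IGNORECASE)

def claim_profile(sentence: str) -> dict:
    """Classify a sentence as quantified and/or attributed."""
    return {
        "quantified": bool(NUMBER.search(sentence)),
        "attributed": bool(ATTRIBUTION.search(sentence)),
    }

print(claim_profile("According to a 2024 Gartner study, 38% of queries start in AI chat."))
```

Sentences that score true on both dimensions are the ones this finding suggests are most citable.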
Pages with valid structured data were cited 3.2× more frequently than pages without schema markup, even when controlling for content quality and domain authority. FAQPage schema showed the strongest individual effect.
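The JSON-LD shape for FAQPage markup follows schema.org's FAQPage/Question/Answer vocabulary; the helper function below is an illustrative sketch for generating it:

```python
import json

def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize question/answer pairs as FAQPage JSON-LD
    using the schema.org vocabulary."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)

print(faq_jsonld([("What is AEO?", "Answer engine optimization: structuring content for AI citation.")]))
```

The output belongs in a `<script type="application/ld+json">` tag in the page head, mirroring the visible Q&A content on the page.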
These findings support the core hypothesis of AEO: AI systems are retrieval machines optimized for answer quality, not popularity signals. The brands that will win in AI search are those that structure their content to be unambiguously helpful and machine-readable.