# How LLMs Choose Which Sources to Cite | WebPossible

> Large language models do not cite at random. How retrieval and synthesis actually pick sources, and the content patterns that make a model reach for you.

Source: https://webpossible.com/ai-search-optimization/how-llms-choose-sources/
Format: Markdown version for AI agents. The canonical HTML page is at the source URL above.

---

# How LLMs choose their sources

Models do not cite at random, and they do not cite whoever ranks first. Understanding the two-stage logic behind a citation shows you where to focus your fixes.

[Get the AEO / GEO checklist](/resources/aeo-geo-audit-checklist/)

## Two stages, two different bars

| Stage | What gets you through |
| --- | --- |
| **Retrieval** | Relevance, authority, crawlability, clean structure, [schema](/structured-data-for-ai/) |
| **Synthesis** | Specificity, verifiable claims, original data, consistency with the wider web |

Retrieval decides who is in the room. Synthesis decides who gets quoted. Pages that lose are often relevant enough to be retrieved but not trustworthy or specific enough to be cited. That gap is where the work is.
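
To make the two bars concrete, here is a minimal toy sketch in Python. It is an illustration only, not how any particular model or search product works: the `Page` fields, the scores, and both thresholds are hypothetical, and the table's other signals (authority, schema, original data, consistency) are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    relevance: float    # hypothetical 0..1 score: how well the page matches the query
    crawlable: bool     # hypothetical flag: could a crawler fetch and parse it?
    specificity: float  # hypothetical 0..1 score: concrete numbers, named methods
    verifiable: float   # hypothetical 0..1 score: claims carry a checkable source


def retrieve(pages, min_relevance=0.5):
    """Stage 1 (assumed): keep pages that are crawlable and relevant enough."""
    return [p for p in pages if p.crawlable and p.relevance >= min_relevance]


def cite(candidates, min_trust=0.6):
    """Stage 2 (assumed): of the retrieved pages, keep the specific, verifiable ones."""
    def trust(p):
        return (p.specificity + p.verifiable) / 2
    return [p for p in candidates if trust(p) >= min_trust]


# Made-up example pages on a placeholder domain.
pages = [
    Page("https://example.com/benchmark", 0.9, True, 0.9, 0.8),
    Page("https://example.com/generic-overview", 0.8, True, 0.2, 0.3),
    Page("https://example.com/blocked", 0.9, False, 0.9, 0.9),
]

retrieved = retrieve(pages)
cited = cite(retrieved)

print("retrieved:", [p.url for p in retrieved])
print("cited:", [p.url for p in cited])
```

In this toy run the generic overview clears retrieval but not citation, and the blocked page never enters the room despite strong content. Those are the two failure modes the table separates.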

## What makes a model reach for you

### You lower its uncertainty

A model writing an answer is constantly estimating how confident it can be. A specific number with a clear source, a first-hand test, or a named method reduces that uncertainty, so it gets pulled in. Vague or generic claims do the opposite.

### You agree with the world, or prove why you do not

Models are wary of claims that contradict the consensus without evidence. If you take a contrarian position, back it with data. If you contradict the record without support, you become a source to route around.

### You are easy to read

Facts trapped in images, scripts, or sprawling unstructured prose are facts a model might miss or mangle. Clean structure and a Markdown version make your claims much easier to extract correctly. This is the foundation under [GEO](/generative-engine-optimization/).
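
A small sketch of why this matters, using Python's standard-library `html.parser`. It assumes a naive extractor that reads static HTML, does not run JavaScript, and does not read text inside images. Real crawlers vary, and some do render scripts. The page markup and its numbers are made up for illustration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the text a naive extractor would see, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a <script> or <style> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside script/style blocks.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: one fact in plain text, one inside an image, one injected by a script.
page = """
<h2>Results</h2>
<p>Median load time fell from 4.2 s to 1.9 s.</p>
<img src="chart.png">
<script>document.getElementById("kpi").textContent = "Bounce rate: 31%";</script>
<div id="kpi"></div>
"""

text = extract(page)
print(text)
```

Only the heading and the plain-text sentence survive extraction. Whatever the chart image shows and the script-injected bounce rate never reach the extracted text, so under these assumptions a model would have nothing to quote from them.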

Next: the [AI SEO and visibility tools](/ai-search-optimization/ai-seo-tools/) that help you act on this, or the [AI search optimization overview](/ai-search-optimization/).

## How LLMs cite, answered

### How do large language models decide what to cite?

In two stages. First, retrieval gathers candidate sources from an index, a live search, or training data, rewarding relevance, authority, and clean structure. Then synthesis writes the answer from the candidates the model trusts most, rewarding sources that are specific, verifiable, and consistent with everything else it has read. You need to optimize for both stages.

### Why do models skip some relevant pages?

Because relevance gets you into the candidate pool, but trust decides who gets quoted. A page that contradicts the consensus without evidence, hides its facts in images or scripts, or reads like filler adds risk for the model. It would rather cite a source that lowers its uncertainty than one that raises it.

### What content gets cited most?

Content that reduces the model's uncertainty: specific numbers, first-hand testing, named methodology, and clearly attributed facts. Restated consensus rarely gets cited, because it adds nothing the model did not already have from a dozen other pages.

[Run the checklist](/resources/aeo-geo-audit-checklist/)