Every AP prep tool with AI grading claims it works. We measured ours against the only ground truth that exists — official College Board sample essays with published scores — and we publish the results. As of June 2026, no other AP prep platform publishes calibration data.
The method
We collected released sample essays (DBQ, LEQ, and SAQ) from College Board's official scoring materials for AP US History, AP World History, and AP European History — essays where the official score is public. We ran each through AimFive's grader cold, then compared.
The results
- 77% of the 31 official sample essays (across APUSH, AP World, and AP Euro) scored within ±1 point of the official College Board score
- 0.00 average bias: on average, the grader is neither systematically harsh nor systematically lenient
- 94% of SAQ scores within ±1 in checkpoint-scoring validation
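For readers who want to see how figures like these are typically computed, here is a minimal sketch. It assumes "bias" means the mean signed difference (grader score minus official score) and that "within ±1" is a simple tolerance check. The function name `calibration_summary` and the toy scores are hypothetical and are not AimFive's validation data.

```python
from statistics import mean


def calibration_summary(pairs, tolerance=1):
    """Summarize agreement between a grader and official scores.

    pairs: list of (official_score, grader_score) tuples.
    Assumption: bias = mean of (grader - official), so a positive
    value means lenient and a negative value means harsh.
    """
    if not pairs:
        raise ValueError("need at least one scored essay")
    diffs = [grader - official for official, grader in pairs]
    within = sum(1 for d in diffs if abs(d) <= tolerance)
    return {
        "n": len(pairs),
        "within_tolerance": within / len(pairs),
        "mean_bias": mean(diffs),
    }


# Made-up scores for illustration only, not real validation data.
toy_pairs = [(5, 5), (4, 5), (6, 4), (3, 2)]
summary = calibration_summary(toy_pairs)
print(summary)
```

The two numbers answer different questions. A mean bias near zero only says that over-scores and under-scores cancel out, so the within-tolerance rate is reported alongside it to show how far individual essays land from the official score.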
How the grader works
The grader scores point by point against the actual College Board rubric (thesis, contextualization, evidence, analysis), the way a human reader does. A strict judge model decides each rubric point; a separate coach explains what to fix. The coach cannot change your score, so friendlier feedback never inflates results. Subjective points are awarded only when multiple independent passes agree unanimously, which is what eliminated the leniency that plagues most AI graders.
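The judge/coach split and the unanimity rule can be illustrated with a short sketch. This is not AimFive's implementation: `judge_essay`, `coach`, `Score`, and the stand-in passes are hypothetical names, and for simplicity the sketch applies the unanimity rule to every rubric point, not only the subjective ones.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: one judge pass takes (essay, rubric_point)
# and returns whether the point was earned.
JudgePass = Callable[[str, str], bool]


@dataclass(frozen=True)
class Score:
    points: Dict[str, bool]

    @property
    def total(self) -> int:
        return sum(self.points.values())


def judge_essay(essay: str, rubric_points: Sequence[str],
                passes: Sequence[JudgePass]) -> Score:
    """Award a point only if every pass independently awards it."""
    if not passes:
        raise ValueError("need at least one judge pass")
    return Score({
        point: all(run(essay, point) for run in passes)
        for point in rubric_points
    })


def coach(score: Score) -> List[str]:
    """Read the finished score and return feedback; never edits it."""
    return [f"Work on: {name}"
            for name, earned in score.points.items() if not earned]


# Stand-in passes with fixed votes, for illustration only.
votes_a = {"thesis": True, "evidence": True, "analysis": False}
votes_b = {"thesis": True, "evidence": False, "analysis": False}
passes = [lambda essay, p: votes_a[p], lambda essay, p: votes_b[p]]

score = judge_essay("sample essay text", list(votes_a), passes)
feedback = coach(score)
print(score.total, feedback)
```

In this toy run the passes split on "evidence", so the point is withheld. A disagreement between passes is treated as "not earned", which pushes the scorer toward strictness and away from leniency. The coach only receives the finished score as input and returns text, so nothing it says feeds back into the total.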
What this doesn't mean
An AI score is practice feedback, not an official result. Human AP readers resolve genuinely ambiguous essays in ways no AI can promise to match. We publish this data so you know the practice signal is honest — not to claim perfection.
Try a DBQ with rubric feedback · APUSH practice · LEQ practice
AP and Advanced Placement are trademarks of College Board. AimFive is not affiliated with or endorsed by College Board.