How Accurate Is Our AI Essay Grader?

Every AP prep tool with AI grading claims it works. We measured ours against the only ground truth that exists — official College Board sample essays with published scores — and we publish the results. As of June 2026, no other AP prep platform publishes calibration data.

The method

We collected released sample essays (DBQ, LEQ, and SAQ) from College Board's official scoring materials for AP US History, AP World History, and AP European History, all essays whose official score is public. We ran each through AimFive's grader cold, then compared the grader's score with the official one.

The results

77% of essays scored within ±1 point of the official College Board score (31 official sample essays across APUSH, AP World, AP Euro)
0.00 average bias (mean difference from the official score), so the grader is not systematically harsh or lenient
94% of SAQ scores within ±1 in checkpoint-scoring validation
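
For readers who want to see how figures like these are typically computed, here is a minimal sketch. It assumes "within ±1" means the absolute difference between grader and official score is at most one point, and that "average bias" is the mean signed difference (grader minus official). The function name and the sample scores are hypothetical and made up for illustration; they are not AimFive's data or code.

```python
from statistics import mean


def calibration_summary(pairs):
    """Summarize agreement for a list of (grader_score, official_score) pairs.

    Returns (share of essays within +/-1 point, mean signed difference).
    A positive mean would indicate a lenient grader, a negative one a harsh grader.
    """
    diffs = [grader - official for grader, official in pairs]
    within_one = sum(abs(d) <= 1 for d in diffs) / len(diffs)
    bias = mean(diffs)
    return within_one, bias


# Made-up illustrative scores (NOT real results).
sample = [(5, 5), (4, 5), (6, 5), (3, 3), (2, 4), (6, 4)]
within_one, bias = calibration_summary(sample)
print(f"within +/-1: {within_one:.0%}, average bias: {bias:+.2f}")
```

The two numbers answer different questions: the first says how often the grader lands close, the second says whether its misses lean in one direction.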

How the grader works

The grader scores point by point on the actual College Board rubric (thesis, contextualization, evidence, analysis), the way a human reader does. A strict judge model decides each rubric point; a separate coach explains what to fix. The coach cannot change your score, so friendlier feedback never inflates results. Subjective points are awarded only when multiple passes agree unanimously, which is what eliminated the leniency that plagues most AI graders.
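
The unanimous-agreement rule can be sketched roughly as follows. This is an illustration under stated assumptions, not AimFive's implementation: the judge is a hypothetical stub that replays canned verdicts instead of calling a model, the choice of three passes is arbitrary, and which rubric points count as subjective is assumed for the example.

```python
def make_stub_judge(verdicts):
    """Hypothetical stand-in for a judge model: replays canned verdicts per rubric point."""
    iterators = {point: iter(vs) for point, vs in verdicts.items()}

    def judge(essay, point):
        return next(iterators[point])

    return judge


def score_essay(judge, essay, rubric_points, subjective, passes=3):
    """Decide each rubric point; subjective points need every pass to agree."""
    awarded = {}
    for point in rubric_points:
        n = passes if point in subjective else 1  # pass count is an assumption
        awarded[point] = all(judge(essay, point) for _ in range(n))
    return awarded


# Canned verdicts: "analysis" gets one dissenting pass, so it is not awarded.
judge = make_stub_judge({
    "thesis": [True],
    "evidence": [True, True, True],
    "analysis": [True, False, True],
})
result = score_essay(
    judge,
    essay="(essay text)",
    rubric_points=["thesis", "evidence", "analysis"],
    subjective={"evidence", "analysis"},  # assumed split, for illustration only
)
print(result, "total:", sum(result.values()))
```

Requiring every pass to agree makes a borderline point harder to earn than a single pass or a majority vote would, which is why a rule like this pushes against leniency.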

What this doesn't mean

An AI score is practice feedback, not an official result. Human AP readers resolve genuinely ambiguous essays in ways no AI can promise to match. We publish this data so you know the practice signal is honest — not to claim perfection.

Try a DBQ with rubric feedback · APUSH practice · LEQ practice

AP and Advanced Placement are trademarks of College Board. AimFive is not affiliated with or endorsed by College Board.

Other apps give you a score. AimFive shows you the rubric.