Most AI grading tools claim accuracy without explaining how they achieve it. Here's the full methodology behind AimFive's rubric scoring: how it works, what it is calibrated against, what works, and what doesn't.
The Goal
For every DBQ, LEQ, SAQ, or FRQ a student writes, AimFive returns:
- A score per rubric point (earned / not earned, with reasoning).
- Specific feedback on which sentences earned which points.
- One actionable suggestion for the next attempt.
How It Works
- Rubric loading: Each AP essay format has a structured rubric extracted from College Board's published scoring guidelines. AimFive references the exact criteria, not a paraphrase.
- Structural parse: The essay is broken into thesis, body paragraphs, evidence claims, and conclusion. Each part is mapped to relevant rubric criteria.
- Per-criterion evaluation: An LLM evaluates whether each rubric criterion is met by the student's writing — citing specific text from the essay as evidence.
- Score aggregation and feedback: Earned rubric points are summed. For each missed point, the grader generates specific feedback explaining what would have earned it.
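The last two steps above can be sketched in a few lines of Python. This is a minimal illustration, not AimFive's actual code: the `CriterionResult` structure, its field names, and the `aggregate` function are all hypothetical, and the sketch assumes every rubric criterion is worth exactly one point (some real AP rubrics award more than one point per row). The sample results are placeholders, not real essay data.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """Outcome of evaluating one rubric criterion (hypothetical structure)."""
    criterion: str        # rubric point name, e.g. "Thesis"
    earned: bool          # whether the point was awarded
    evidence: str         # text quoted from the essay; empty if none was found
    suggestion: str = ""  # what would have earned the point, if it was missed


def aggregate(results: list[CriterionResult]) -> dict:
    """Sum earned points and collect feedback for each missed point.

    Assumes one point per criterion (an assumption of this sketch).
    """
    score = sum(1 for r in results if r.earned)
    missed = [
        {"criterion": r.criterion, "suggestion": r.suggestion}
        for r in results
        if not r.earned
    ]
    return {"score": score, "max_score": len(results), "missed": missed}


# Placeholder per-criterion results, standing in for the LLM evaluation step.
results = [
    CriterionResult("Thesis", True, "<sentence quoted from the essay>"),
    CriterionResult("Contextualization", False, "",
                    "<what would have earned this point>"),
    CriterionResult("Evidence", True, "<sentence quoted from the essay>"),
]

report = aggregate(results)
print(f"{report['score']} / {report['max_score']}")
for item in report["missed"]:
    print(f"Missed {item['criterion']}: {item['suggestion']}")
```

Keeping the quoted evidence next to each earned/not-earned decision is what lets the feedback point at specific sentences rather than at the essay as a whole.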
Calibration
The grader has been calibrated against:
- Officially released College Board scoring guides for each AP essay format.
- Sample student essays from the College Board's published sample exams.
- AP-experienced teacher annotations on a small holdout set used for evaluation.
Limits — Where It Fails
We're explicit about this because pretending AI grading is perfect makes it less trustworthy, not more.
- Complexity / sophistication point: The hardest rubric point to grade reliably for any grader (including humans). AimFive's agreement with human teachers drops here. We err on the side of NOT awarding the point unless multiple criteria are clearly met.
- Document analysis on DBQs: When students misattribute or misquote documents, our grader sometimes misses the error. We're improving this with each iteration.
- Handwritten responses: AimFive grades typed responses only. Grading handwriting from photos is on the roadmap but not live.
- Languages other than English: AP Spanish Language grading uses a different model and is in beta.
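The conservative rule for the complexity / sophistication point (award it only when multiple criteria are clearly met) could look something like the sketch below. The signal names and the threshold of two are assumptions made for illustration; the article does not say which signals AimFive checks or how many it requires.

```python
def award_complexity_point(signals: dict[str, bool], min_clear: int = 2) -> bool:
    """Award the point only if at least `min_clear` signals are clearly met.

    `signals` maps a hypothetical signal name to whether the evaluator judged
    it clearly present. Anything ambiguous should be passed in as False, so
    the default outcome is NOT awarding the point.
    """
    return sum(signals.values()) >= min_clear


# Hypothetical signals for one essay: only one is clearly met.
one_signal = {
    "explains_nuance": True,
    "connects_across_periods": False,
    "qualifies_argument": False,
}

# Two signals clearly met.
two_signals = {**one_signal, "qualifies_argument": True}

print(award_complexity_point(one_signal))   # point withheld
print(award_complexity_point(two_signals))  # point awarded
```

Biasing toward withholding trades some missed points for fewer unearned ones, which matches the stated preference to err on the side of not awarding the point.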
The Independent Agreement Study
We are running a formal agreement study comparing AimFive grading to three AP-experienced teachers on 90 essays across APUSH, AP World, AP English Language, and AP Biology. Results will be published in our annual State of AP report, whether agreement turns out to be good, bad, or somewhere in between.
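The article does not say which agreement statistic the study will report. As one common option, the sketch below computes raw percent agreement and Cohen's kappa (agreement corrected for chance) between the grader and a single teacher on per-point earned / not-earned decisions. The function names and the sample labels are invented for illustration and are not study data; with three teachers, the same comparison could be run against each teacher in turn.

```python
from collections import Counter


def percent_agreement(a: list[int], b: list[int]) -> float:
    """Fraction of items on which two raters gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters."""
    n = len(a)
    p_o = percent_agreement(a, b)  # observed agreement
    counts_a, counts_b = Counter(a), Counter(b)
    labels = counts_a.keys() | counts_b.keys()
    # Agreement expected by chance, from each rater's label frequencies.
    p_e = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Made-up labels: 1 = point earned, 0 = not earned.
grader = [1, 1, 0, 1, 0, 0, 1, 1]
teacher = [1, 0, 0, 1, 0, 1, 1, 1]

print(f"percent agreement: {percent_agreement(grader, teacher):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(grader, teacher):.2f}")
```

Reporting kappa alongside raw agreement matters because two raters who both award most points will agree often by chance alone.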
How to Report Bad Grading
If AimFive grades an essay wrong, tell us. Email grading@aimfive.com with the essay text and the score you think it should have earned. Every report goes into our training/eval set. We pay $50 per validated case where our grader got the score wrong.
AP and Advanced Placement are trademarks of College Board. AimFive is not affiliated with or endorsed by College Board.