Week-by-Week Progress
| Week | Focus | Key Outcomes |
|---|---|---|
| 1–2 | Setup & Intro to VLMs | Ran first baseline experiments |
| 3–4 | Prompt Engineering Basics | Learned few-shot & CoT prompting |
| 5–6 | Float Value Stability | Designed parsing + clamping logic |
| 7–8 | Regional Consistency | Built region-specific scoring prompts |
| 9–10 | Determinism & Repeatability | Logged nondeterminism patterns |
| 11 | Human-like Grading | Mental mapping + region independence |
| 12 | Benchmarking with Trackers | W&B, Weave, Trackio dashboards integrated |
| 13+ | Fine-tuning & Feedback Loop | Starting dataset prep and calibration |
Each week has been a step toward making AI grading stable, interpretable, and human-aligned.