Week-by-Week Progress
Week | Focus | Key Outcomes |
---|---|---|
1–2 | Setup & Intro to VLMs | Ran first baseline experiments |
3–4 | Prompt Engineering Basics | Learned few-shot & CoT prompting |
5–6 | Float Value Stability | Designed parsing + clamping logic |
7–8 | Regional Consistency | Built region-specific scoring prompts |
9–10 | Determinism & Repeatability | Logged nondeterminism patterns |
11 | Human-like Grading | Mental mapping + region independence |
12 | Benchmarking with Trackers | W&B, Weave, Trackio dashboards integrated |
13+ | Fine-tuning & Feedback Loop | Starting dataset prep and calibration |
Each week has been a step toward making AI grading stable, interpretable, and human-aligned.