Week-by-Week Progress

Week   Focus                        Key Outcomes
-----  ---------------------------  -----------------------------------------
1–2    Setup & Intro to VLMs        Ran first baseline experiments
3–4    Prompt Engineering Basics    Learned few-shot & CoT prompting
5–6    Float Value Stability        Designed parsing + clamping logic
7–8    Regional Consistency         Built region-specific scoring prompts
9–10   Determinism & Repeatability  Logged nondeterminism patterns
11     Human-like Grading           Mental mapping + region independence
12     Benchmarking with Trackers   W&B, Weave, Trackio dashboards integrated
13+    Fine-tuning & Feedback Loop  Starting dataset prep and calibration

Each week has been a step toward making AI grading stable, interpretable, and human-aligned.
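
As an illustration of the "parsing + clamping logic" from weeks 5–6, here is a minimal sketch in Python. It is not the project's actual code: the function name parse_score, the 0.0 to 10.0 score range, and the choice to take the first number in the reply are all assumptions made for the example.

```python
import math
import re
from typing import Optional

# Matches the first signed decimal number in free-form text, e.g. "7.5", "-2", ".8".
_NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)")


def parse_score(text: str, lo: float = 0.0, hi: float = 10.0) -> Optional[float]:
    """Extract the first number from a model reply and clamp it to [lo, hi].

    Hypothetical helper: the range and the first-number rule are assumptions.
    Returns None when the reply contains no usable number.
    """
    match = _NUMBER.search(text)
    if match is None:
        return None
    value = float(match.group())
    if not math.isfinite(value):
        # An absurdly long digit string can overflow to inf; treat it as unparseable.
        return None
    # Clamp so out-of-range replies cannot distort downstream averages.
    return max(lo, min(hi, value))


print(parse_score("Score: 7.5/10"))         # 7.5
print(parse_score("I'd say 12 out of 10"))  # 10.0 (clamped)
print(parse_score("cannot grade this"))     # None
```

Taking the first number is the simplest rule but it can misfire on replies such as "Region 3 score: 8", so a stricter prompt format or a more specific pattern may be needed in practice.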