Week 11 – Prompt Engineering for Visual Scoring & Ground Truth Stability
Dates: August 10 – August 16
Internship: AI/ML Intern at SynerSense Pvt. Ltd.
Mentor: Praveen Kulkarni Sir
Focus
This week focused on refining prompt strategies to improve the accuracy of visual-fat quantification, with particular emphasis on output consistency, region-wise interpretability, and stable mental mappings for float-value scoring.
Goals for the Week
- Enhance the existing prompt using few-shot and chain-of-thought prompting techniques
- Investigate inconsistency across repeated image inputs and design stability measures
- Develop a mental visual scale to standardize fat prominence scores (0.00 to 1.00)
- Create annotated prompt-ready instructions derived from real visual examples
- Ensure precise parsing and region-wise separation for each side (left/right)
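The mental visual scale from the goals above can be made concrete as a small lookup of anchor points. This is a sketch only: the anchor descriptions and cut-off values here are illustrative assumptions, not the actual rubric developed during the week.

```python
# Hypothetical anchor points for the 0.00-1.00 fat-prominence scale.
# Descriptions and values are illustrative, not the team's actual rubric.
VISUAL_SCALE_ANCHORS = {
    0.00: "no visible bulge; region appears flat",
    0.25: "slight contour change, visible only at certain angles",
    0.50: "clearly visible bulge of moderate prominence",
    0.75: "pronounced bulge dominating the region's silhouette",
    1.00: "maximal prominence; bulge defines the region's shape",
}

def nearest_anchor(score: float) -> str:
    """Return the anchor description closest to a given float score."""
    key = min(VISUAL_SCALE_ANCHORS, key=lambda a: abs(a - score))
    return VISUAL_SCALE_ANCHORS[key]
```

Anchors like these give both the prompt and the human validator a shared reference when judging whether a score such as 0.60 is plausible for a given image.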
Tasks Completed
| Task | Status | Notes |
|---|---|---|
| Added CoT and structured reasoning layers to prompt | ✅ Completed | Boosted VLM interpretability and regional independence |
| Designed and discussed image-level consistency mechanisms | ✅ Completed | Proposed “cache + score threshold + output clamp” ensemble logic |
| Constructed mental scoring map per region based on bulge prominence | ✅ Completed | Helps guide both prompting and human validation |
| Rewrote output parser format for strict float-range (0.00–1.00) enforcement | ✅ Completed | Removed random float fluctuations via prompt calibration |
| Logged visual inconsistencies across random generations | ✅ Completed | Documented variation patterns and their correlation with ambiguity |
| Coordinated prompt changes with DevOps team via updated prompt version | ✅ Completed | Used prompt version 10 after modifications |
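The “cache + score threshold + output clamp” ensemble logic from the table above could be sketched roughly as follows. The class name, threshold value, and hashing choice are assumptions for illustration; the actual implementation details were only proposed this week.

```python
import hashlib

class ScoreStabilizer:
    """Sketch of 'cache + score threshold + output clamp' logic:
    scores are clamped to [0.00, 1.00], identical images are keyed by
    hash, and small fluctuations below a threshold snap to the cached
    score instead of producing a new value."""

    def __init__(self, threshold: float = 0.05):
        self.threshold = threshold
        self._cache: dict[str, float] = {}

    @staticmethod
    def _key(image_bytes: bytes) -> str:
        # Hash the raw image so repeated inputs hit the same cache entry.
        return hashlib.sha256(image_bytes).hexdigest()

    def stabilize(self, image_bytes: bytes, raw_score: float) -> float:
        score = max(0.0, min(1.0, raw_score))  # output clamp
        key = self._key(image_bytes)
        cached = self._cache.get(key)
        if cached is not None and abs(score - cached) < self.threshold:
            return cached  # minor fluctuation: reuse the cached score
        self._cache[key] = score
        return score
```

The threshold turns the model's run-to-run jitter into a stable value without hiding genuinely different scores, which only replace the cached entry when they move past the threshold.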
Key Learnings
- Chain-of-thought improves per-region breakdown and avoids cross-region bias.
- Few-shot prompting improves formatting fidelity by modeling ideal completions.
- Visual-to-float mappings require stable mental anchors for human-model alignment.
- Output formatting needs strict post-filtering to avoid hallucinated or malformed strings.
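The strict post-filtering mentioned above can be illustrated with a regex-based parser. The `side_region: score` line format and the region labels are assumptions about the output schema, not the exact format used in the prompt.

```python
import re

# Hypothetical output format: one "left_region: 0.42"-style line per region.
# The real prompt's labels may differ.
LINE_RE = re.compile(r"^(left|right)_(\w+):\s*([01](?:\.\d{1,2})?)$")

def parse_scores(raw_output: str) -> dict[str, float]:
    """Keep only well-formed 'side_region: float' lines whose value is
    inside [0.00, 1.00]; silently drop malformed or hallucinated text."""
    scores: dict[str, float] = {}
    for line in raw_output.strip().splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # post-filter: reject anything off-format
        side, region, value = m.groups()
        score = float(value)
        if 0.0 <= score <= 1.0:  # enforce the strict float range
            scores[f"{side}_{region}"] = score
    return scores
```

Because the regex only admits two-decimal floats led by 0 or 1, values like `1.2` or free-text commentary never reach downstream code, which keeps left/right regions cleanly separated.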
Problems Faced & Solutions
| Problem | Solution |
|---|---|
| Inconsistent float values for the same image | Used mental mapping + prompt enhancement + caching |
| Prompt exceeding token limits after embedding examples | Pruned explanations, shortened examples, and modularized the format section |
| Model output outside the allowed float range (e.g., 1.2, -0.1) | Rewrote scoring boundaries directly in the prompt and added a float clamp |
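Rewriting the scoring boundaries directly into the prompt, as described in the last row, might look like the skeleton below. The wording, example scores, and region names are illustrative assumptions; only the overall structure (few-shot example plus per-region chain-of-thought plus an explicit 0.00–1.00 bound) reflects this week's changes.

```python
# Illustrative prompt skeleton: few-shot example + per-region CoT +
# explicit scoring boundaries. Region names and wording are assumptions.
PROMPT_TEMPLATE = """You are grading fat prominence per region on a 0.00-1.00 scale.
Score each side independently; do not let one region influence another.

Example:
Image: <example image>
Reasoning (left_cheek): moderate bulge visible along the contour -> 0.55
Reasoning (right_cheek): near-flat contour -> 0.10
left_cheek: 0.55
right_cheek: 0.10

Now grade the new image. First reason step by step for each region,
then output one 'side_region: score' line per region.
Every score must lie between 0.00 and 1.00 inclusive.
"""
```

Stating the boundaries inside the prompt reduces out-of-range outputs at the source, while the downstream clamp catches any that still slip through.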
Goals for Next Week
- Begin region-specific visual dataset curation for fine-tuning
- Start work on feedback loop interface between model score and human validation
- Explore statistical calibration techniques using real score distributions
Screenshots (Optional)
Example of new prompt structure, before/after score comparison, cache logs.
“Week 11 was about making the model think like a human grader—clear, structured, and region-wise grounded.”