Week 10 – Output Consistency, Prompt Tuning & Caching
Dates: August 3 – August 9
Internship: AI/ML Intern at SynerSense Pvt. Ltd.
Mentor: Praveen Kulkarni Sir
Focus
Work this week focused on improving the consistency of model outputs by refining prompt design and introducing output caching to handle repeat image submissions.
Goals for the Week
- Minimize output variation for identical inputs using prompt reinforcement and caching
- Refine prompt versioning and float formatting reliability
- Set up a caching mechanism using hashed image data
- Review repeatability across prompt versions and VLM models
Tasks Completed
Task | Status | Notes |
---|---|---|
Refined prompt instructions for deterministic float outputs | ✅ Completed | Added range reminders and removed randomness triggers |
Implemented base64 encoding & hashing for image fingerprinting | ✅ Completed | Used hash-based caching to store previous image analysis results |
Integrated local cache layer in analyze_image() function | ✅ Completed | Prevents unnecessary API calls for repeat submissions (see the caching sketch below) |
Benchmarked prompt output variation across multiple runs | ✅ Completed | Evaluated stability improvements pre/post caching |
Validated prompt structure inside OpenAI’s prompt management UI | ✅ Completed | Ensured strict parsing and structured response remained intact |
Documented float quantification scale for interpretability | ✅ Completed | Drafted internal scale chart for scoring visual fat prominence |
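A minimal sketch of the hash-based caching described above, assuming an in-memory dict for the cache and a placeholder call_vlm() helper standing in for the actual VLM/OpenAI request (the real layer may persist results differently):

```python
import base64
import hashlib

# Hypothetical in-memory cache keyed by image fingerprint.
_analysis_cache: dict[str, dict] = {}

def call_vlm(image_bytes: bytes) -> dict:
    """Stand-in for the real VLM/OpenAI request; returns a dummy score dict here."""
    return {"example_region": 0.42}

def _fingerprint(image_bytes: bytes) -> str:
    """SHA-256 hex digest of the base64-encoded image, used as the cache key."""
    return hashlib.sha256(base64.b64encode(image_bytes)).hexdigest()

def analyze_image(image_bytes: bytes) -> dict:
    """Return the analysis for an image, reusing the cached result on repeat submissions."""
    key = _fingerprint(image_bytes)
    if key not in _analysis_cache:
        _analysis_cache[key] = call_vlm(image_bytes)  # only hit the API on a cache miss
    return _analysis_cache[key]
```

Keying on the content hash rather than the filename means a re-uploaded or renamed copy of the same image still produces a cache hit.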
Key Learnings
- Prompt tuning and structural constraints reduce LLM hallucination and increase repeatability.
- Caching based on input content (image hash) preserves server resources and enhances UX.
- Separating content (region definitions) from formatting improves long-term maintainability (illustrated in the sketch after this list).
- Explicit examples and negative examples in the prompt reinforce compliance.
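A sketch of that content/formatting separation, including one explicit and one negative example; the region names and wording here are illustrative, not the production prompt:

```python
# Content: region definitions live apart from the formatting rules so either can evolve on its own.
REGION_DEFINITIONS = (
    "Score the visual fat prominence of each region: abdomen, thighs, upper_arms."
)

# Formatting: strict output contract reinforced with a good and a bad example.
FORMAT_RULES = (
    "Return ONLY a JSON object mapping each region to a float between 0.00 and 1.00, "
    "rounded to two decimals.\n"
    'Good example: {"abdomen": 0.75, "thighs": 0.40, "upper_arms": 0.20}\n'
    'Bad example (never do this): {"abdomen": "high", "thighs": 1.5}'
)

def build_prompt() -> str:
    """Assemble the final prompt from the content and formatting parts."""
    return f"{REGION_DEFINITIONS}\n\n{FORMAT_RULES}"
```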
Problems Faced & Solutions
Problem | Solution |
---|---|
LLM gave slightly different scores for the same image | Introduced image hashing + local caching; reinforced output format |
Float values exceeding the 0.00–1.00 range | Added guardrails in prompt text; tested invalid ranges to fine-tune (see the validation sketch below) |
Difficulty debugging caching mismatch | Switched from filename check to SHA256-based base64 hash comparison |
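A post-processing guardrail along these lines (a sketch under assumed region names, not the exact production check) complements the prompt-side range reminders by clamping or rejecting out-of-range floats:

```python
def validate_scores(scores: dict[str, float], clamp: bool = True) -> dict[str, float]:
    """Ensure every returned score lies in [0.00, 1.00]; clamp (or raise) on violations."""
    cleaned: dict[str, float] = {}
    for region, value in scores.items():
        value = float(value)
        if not 0.0 <= value <= 1.0:
            if not clamp:
                raise ValueError(f"{region} score {value} is outside 0.00-1.00")
            value = min(max(value, 0.0), 1.0)  # pull out-of-range values back into bounds
        cleaned[region] = round(value, 2)
    return cleaned

# Example: an out-of-range model output gets clamped.
print(validate_scores({"abdomen": 1.37, "thighs": 0.52}))  # {'abdomen': 1.0, 'thighs': 0.52}
```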
Goals for Next Week
- Explore CoT (Chain-of-Thought) and few-shot prompting for region-specific comparison
- Begin work on per-region visual threshold calibration with model feedback
- Build a log system to trace and version response changes for same inputs
Screenshots (Optional)
Suggested screenshots: hashing example, cache-hit logs, or response comparisons across prompt versions.
“Week 10 taught me that smart caching and deterministic prompting are how you tame an LLM into consistency.”