Week 10 – Output Consistency, Prompt Tuning & Caching
Dates: August 3 – August 9
Internship: AI/ML Intern at SynerSense Pvt. Ltd.
Mentor: Praveen Kulkarni Sir
Focus
Work this week focused on improving the consistency of model outputs by refining prompt design and introducing output caching to handle repeat image submissions.
Goals for the Week
- Minimize output variation for identical inputs using prompt reinforcement and caching
- Refine prompt versioning and float formatting reliability
- Set up a caching mechanism using hashed image data
- Review repeatability across prompt versions and VLM models
Tasks Completed
Task | Status | Notes |
---|---|---|
Refined prompt instructions for deterministic float outputs | ✅ Completed | Added range reminders and removed randomness triggers |
Implemented base64 encoding & hashing for image fingerprinting | ✅ Completed | Used hash-based caching to store previous image analysis results |
Integrated local cache layer in analyze_image() function | ✅ Completed | Prevents unnecessary API calls for repeat submissions (see the caching sketch below) |
Benchmarked prompt output variation across multiple runs | ✅ Completed | Evaluated stability improvements pre/post caching |
Validated prompt structure inside OpenAI’s prompt management UI | ✅ Completed | Ensured strict parsing and structured response remained intact |
Documented float quantification scale for interpretability | ✅ Completed | Drafted internal scale chart for scoring visual fat prominence |
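A minimal sketch of the hash-based caching described above, assuming an in-memory dict for the cache and a placeholder call_vlm() helper standing in for the actual VLM/OpenAI request (the real layer may persist results differently):

```python
import base64
import hashlib

# Hypothetical in-memory cache keyed by image fingerprint.
_analysis_cache: dict[str, dict] = {}

def call_vlm(image_bytes: bytes) -> dict:
    """Stand-in for the real VLM/OpenAI request; returns a dummy score dict here."""
    return {"example_region": 0.42}

def _fingerprint(image_bytes: bytes) -> str:
    """SHA-256 hex digest of the base64-encoded image, used as the cache key."""
    return hashlib.sha256(base64.b64encode(image_bytes)).hexdigest()

def analyze_image(image_bytes: bytes) -> dict:
    """Return the analysis for an image, reusing the cached result on repeat submissions."""
    key = _fingerprint(image_bytes)
    if key not in _analysis_cache:
        _analysis_cache[key] = call_vlm(image_bytes)  # only hit the API on a cache miss
    return _analysis_cache[key]
```

Keying on the content hash rather than the filename means a re-uploaded or renamed copy of the same image still produces a cache hit.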
Key Learnings
- Prompt tuning and structural constraints reduce LLM hallucination and increase repeatability.
- Caching based on input content (image hash) preserves server resources and enhances UX.
- Separating content (region definitions) from formatting improves long-term maintainability (illustrated in the sketch after this list).
- Explicit examples and negative examples in the prompt reinforce compliance.
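A sketch of that content/formatting separation, including one explicit and one negative example; the region names and wording here are illustrative, not the production prompt:

```python
# Content: region definitions live apart from the formatting rules so either can evolve on its own.
REGION_DEFINITIONS = (
    "Score the visual fat prominence of each region: abdomen, thighs, upper_arms."
)

# Formatting: strict output contract reinforced with a good and a bad example.
FORMAT_RULES = (
    "Return ONLY a JSON object mapping each region to a float between 0.00 and 1.00, "
    "rounded to two decimals.\n"
    'Good example: {"abdomen": 0.75, "thighs": 0.40, "upper_arms": 0.20}\n'
    'Bad example (never do this): {"abdomen": "high", "thighs": 1.5}'
)

def build_prompt() -> str:
    """Assemble the final prompt from the content and formatting parts."""
    return f"{REGION_DEFINITIONS}\n\n{FORMAT_RULES}"
```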
Problems Faced & Solutions
Problem | Solution |
---|---|
LLM gave slightly different scores for the same image | Introduced image hashing + local caching; reinforced output format |
Float values exceeding the 0.00–1.00 range | Added guardrails in prompt text; tested invalid ranges to fine-tune (see the validation sketch below) |
Difficulty debugging caching mismatch | Switched from filename check to SHA256-based base64 hash comparison |
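A post-processing guardrail along these lines (a sketch under assumed region names, not the exact production check) complements the prompt-side range reminders by clamping or rejecting out-of-range floats:

```python
def validate_scores(scores: dict[str, float], clamp: bool = True) -> dict[str, float]:
    """Ensure every returned score lies in [0.00, 1.00]; clamp (or raise) on violations."""
    cleaned: dict[str, float] = {}
    for region, value in scores.items():
        value = float(value)
        if not 0.0 <= value <= 1.0:
            if not clamp:
                raise ValueError(f"{region} score {value} is outside 0.00-1.00")
            value = min(max(value, 0.0), 1.0)  # pull out-of-range values back into bounds
        cleaned[region] = round(value, 2)
    return cleaned

# Example: an out-of-range model output gets clamped.
print(validate_scores({"abdomen": 1.37, "thighs": 0.52}))  # {'abdomen': 1.0, 'thighs': 0.52}
```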
Goals for Next Week
- Explore CoT (Chain-of-Thought) and few-shot prompting for region-specific comparison
- Begin work on per-region visual threshold calibration with model feedback
- Build a log system to trace and version response changes for same inputs
Screenshots (Optional)
Suggested screenshots: hashing example, cache-hit logs, or response comparisons across prompt versions.
“Week 10 taught me that smart caching and deterministic prompting are how you tame an LLM into consistency.”