Week 10 – Output Consistency, Prompt Tuning & Caching

Dates: August 3 – August 9
Internship: AI/ML Intern at SynerSense Pvt. Ltd.
Mentor: Praveen Kulkarni Sir


Focus

This week focused on improving the consistency of model outputs by refining prompt design and introducing output caching for repeat image submissions.


Goals for the Week

  • Minimize output variation for identical inputs using prompt reinforcement and caching
  • Refine prompt versioning and float formatting reliability
  • Set up a caching mechanism using hashed image data
  • Review repeatability across prompt versions and VLM models

Tasks Completed

| Task | Status | Notes |
|------|--------|-------|
| Refined prompt instructions for deterministic float outputs | ✅ Completed | Added range reminders and removed randomness triggers |
| Implemented base64 encoding & hashing for image fingerprinting | ✅ Completed | Used hash-based caching to store previous image analysis results |
| Integrated local cache layer in `analyze_image()` function | ✅ Completed | Prevents unnecessary API calls for repeat submissions (sketched below) |
| Benchmarked prompt output variation across multiple runs | ✅ Completed | Evaluated stability improvements pre/post caching |
| Validated prompt structure inside OpenAI's prompt management UI | ✅ Completed | Ensured strict parsing and structured response remained intact |
| Documented float quantification scale for interpretability | ✅ Completed | Drafted internal scale chart for scoring visual fat prominence |
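
As a rough sketch of how the cache layer fits around `analyze_image()` — the on-disk cache location and the `call_vlm()` wrapper are assumptions for illustration, not the actual internals:

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".analysis_cache")  # hypothetical cache location
CACHE_DIR.mkdir(exist_ok=True)


def call_vlm(prompt: str, image_b64: str) -> dict:
    """Placeholder for the actual VLM request; swap in the real client call."""
    raise NotImplementedError


def fingerprint(image_b64: str) -> str:
    """SHA-256 over the base64 payload, so identical images map to one key."""
    return hashlib.sha256(image_b64.encode("utf-8")).hexdigest()


def analyze_image(image_b64: str, prompt: str) -> dict:
    """Serve repeat submissions from the cache; hit the API only on a miss."""
    cache_file = CACHE_DIR / f"{fingerprint(image_b64)}.json"
    if cache_file.exists():  # cache hit: no API call needed
        return json.loads(cache_file.read_text())
    result = call_vlm(prompt, image_b64)
    cache_file.write_text(json.dumps(result))
    return result
```

Keying on a content hash rather than the filename is what resolved the cache-mismatch debugging issue noted under Problems Faced below.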

Key Learnings

  • Prompt tuning and structural constraints reduce LLM hallucination and increase repeatability.
  • Caching keyed on input content (the image hash) conserves API calls and improves UX.
  • Separating content (region definitions) from formatting rules improves long-term maintainability.
  • Explicit examples and negative examples in the prompt reinforce compliance (see the sketch after this list).
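
A hedged sketch of the last two points: region definitions kept apart from a fixed format block, with one compliant and one non-compliant example. The region names and wording here are illustrative placeholders, not the actual internal definitions.

```python
# Region names here are placeholders, not the actual internal definitions.
REGIONS = ["abdomen", "thighs", "arms"]

# Formatting rules live apart from the region content, so either side can
# change without touching the other.
FORMAT_RULES = """\
Return one line per region, exactly as: <region>: <score>
- <score> is a float from 0.00 to 1.00 inclusive, always two decimals.
- Do NOT add explanations, units, or any extra text.

Compliant example:
abdomen: 0.45

Non-compliant example (never do this):
abdomen: roughly 45% fat prominence
"""


def build_prompt() -> str:
    """Assemble the final prompt from region content plus fixed format rules."""
    region_block = "\n".join(f"- {r}" for r in REGIONS)
    return (
        "Score visual fat prominence for each body region:\n"
        f"{region_block}\n\n{FORMAT_RULES}"
    )
```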

Problems Faced & Solutions

| Problem | Solution |
|---------|----------|
| LLM gave slightly different scores for the same image | Introduced image hashing + local caching; reinforced the output format |
| Float values exceeded the 0.00–1.00 range | Added guardrails in the prompt text; tested out-of-range cases to tune them (validation sketched below) |
| Difficulty debugging cache mismatches | Switched from filename checks to SHA-256 comparison of the base64 payload |
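
Prompt guardrails alone don't guarantee in-range floats, so a post-hoc check on the parsed values is a natural complement. A minimal sketch follows; the clamping policy is an assumption about how out-of-range values were handled.

```python
def validate_score(raw: str) -> float:
    """Parse a model-emitted score and enforce the 0.00–1.00 contract.

    Clamping is one possible policy; rejecting the response and
    re-prompting would be an equally valid guardrail.
    """
    value = float(raw)  # raises ValueError on non-numeric output
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped, 2)  # two decimals, matching the prompt spec


# e.g. validate_score("1.07") -> 1.0, validate_score("0.456") -> 0.46
```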


Goals for Next Week

  • Explore CoT (Chain-of-Thought) and few-shot prompting for region-specific comparison
  • Begin work on per-region visual threshold calibration with model feedback
  • Build a log system to trace and version response changes for the same inputs

Screenshots (Optional)

Suggested artifacts: a hashing example, cache-hit logs, or response comparisons across prompt versions.


“Week 10 taught me that smart caching and deterministic prompting are how you tame an LLM into consistency.”