Can Prompting Enforce Scoring Consistency?
The Hypothesis
Vision-language models (VLMs) can sometimes produce inconsistent scores for the same image. Can this be mitigated via strategic prompting?
Experiment
- Ran an OpenAI VLM on the same facial image with the same scoring prompt multiple times
- Added progressively stricter format instructions to the prompt (see the sketch after this list)
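Below is a minimal sketch of the setup using the official OpenAI Python SDK. The model name, the 0-100 quality criterion, and the exact prompt wordings are illustrative assumptions, not the original experiment's values.

```python
import base64
import re
from openai import OpenAI  # assumed: official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Three prompt variants of increasing strictness. The wording and the
# 0-100 scale are illustrative assumptions, not the post's exact prompts.
PROMPTS = {
    "baseline": "Rate the quality of this face photo from 0 to 100.",
    "formatted": (
        "Rate the quality of this face photo from 0 to 100. "
        "Respond with ONLY an integer between 0 and 100, nothing else."
    ),
    "anchored": (
        "Rate the quality of this face photo from 0 to 100. "
        "Respond with ONLY an integer between 0 and 100, nothing else. "
        "Reference outputs: a sharp, well-lit studio portrait -> 90; "
        "a blurry, poorly framed snapshot -> 20."
    ),
}

def score_image(image_path: str, prompt: str, runs: int = 10) -> list[int]:
    """Score the same image `runs` times and collect the parsed scores."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    scores = []
    for _ in range(runs):
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumed model; any vision-capable model works
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        )
        # Parse leniently: loosely formatted replies may bury the number in prose.
        match = re.search(r"\d+", resp.choices[0].message.content)
        if match:
            scores.append(int(match.group()))
    return scores
```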
Observations
- Without format instructions: ~8% variance
- With format instructions: variance dropped to ~3%
- With exemplar (reference) outputs included: variance dropped below 2% (one way to measure this spread is sketched below)
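The post does not define its variance metric; one plausible reading is the coefficient of variation (standard deviation as a percentage of the mean score), sketched below. The score lists are hypothetical illustrations of the trend, not the experiment's data.

```python
import statistics

def relative_spread(scores: list[int]) -> float:
    """Coefficient of variation: sample std dev as a percentage of the mean.
    One plausible reading of the "~8% variance" figures above (an assumption)."""
    return 100.0 * statistics.stdev(scores) / statistics.mean(scores)

# Hypothetical runs illustrating the direction of the effect, not real data.
loose = [72, 80, 68, 77, 83]   # no format instructions: wide spread
strict = [74, 75, 73, 75, 74]  # strict format plus anchors: tight spread
print(f"loose:  {relative_spread(loose):.1f}%")
print(f"strict: {relative_spread(strict):.1f}%")
```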
Conclusion
Explicit output structure, range constraints, and exemplar outputs markedly improve scoring consistency.
Prompting is not just about what you say: it is also how you say it, how you format it, and whether the model can trust your anchors.
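Putting the three levers together, a prompt template might look like the following; the task, wording, and anchor values are illustrative assumptions, not the original prompt.

```python
# Illustrative template combining the three levers named above.
ANCHORED_PROMPT = """\
Rate the quality of this face photo.

Explicit structure: respond with exactly one line of the form
score: <integer>

Range constraint: the integer must be between 0 and 100.

Exemplar outputs (anchors):
- sharp, well-lit studio portrait -> score: 90
- blurry, poorly framed snapshot  -> score: 20
"""
```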