Prompt Determinism Test
Objective
To evaluate whether identical prompts yield stable outputs across multiple runs when applied to the same image.
Methodology
- Prompt: Facial region scoring prompt (fixed ID)
- Image: Same image submitted 5 times
- Model: OpenAI Vision API via SDK
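The setup above can be sketched as a small repeat-submission harness. This is a minimal sketch, not the harness used for this test: `score_fn` is a hypothetical callable standing in for the actual SDK call, and the `{region: {side: score}}` output shape is an assumption based on how the results are reported.

```python
from typing import Callable, Dict, List

# Assumed output shape, e.g. {"R1": {"left": 0.55, "right": 0.61}}
Scores = Dict[str, Dict[str, float]]


def run_repeated(score_fn: Callable[[str, bytes], Scores],
                 prompt_id: str, image: bytes, runs: int = 5) -> List[Scores]:
    """Submit the same prompt and image `runs` times and collect each output."""
    return [score_fn(prompt_id, image) for _ in range(runs)]


def same_structure(outputs: List[Scores]) -> bool:
    """True if every run returned the same regions and the same fields per region."""
    def shape(output: Scores) -> Dict[str, List[str]]:
        return {region: sorted(fields) for region, fields in output.items()}

    return all(shape(o) == shape(outputs[0]) for o in outputs)
```

Passing the scoring call in as a parameter keeps the harness independent of any particular SDK version and lets it be exercised with a stub.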
Results
Submission | Output Scores |
---|---|
Run 1 | R1: left=0.55, right=0.61 … |
Run 2 | R1: left=0.54, right=0.60 … |
Run 3 | R1: left=0.56, right=0.62 … |
Run 4 | R1: left=0.55, right=0.61 … |
Run 5 | R1: left=0.55, right=0.60 … |
Observations
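- Minor run-to-run variation in the scores (±0.01–0.02); for R1, left ranges from 0.54 to 0.56 and right from 0.60 to 0.62
- No structural deviation: all five runs returned output in the same format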
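The spread can be checked directly from the table. The sketch below uses only the R1 values reported above (the remaining regions are elided in the table and are not filled in here); the `spread` helper is illustrative, not part of the original test.

```python
# R1 scores copied from the results table; other regions are elided there.
runs = [
    {"left": 0.55, "right": 0.61},
    {"left": 0.54, "right": 0.60},
    {"left": 0.56, "right": 0.62},
    {"left": 0.55, "right": 0.61},
    {"left": 0.55, "right": 0.60},
]


def spread(runs, field):
    """Return (min, max, max - min) for one field across runs, rounded to 2 places."""
    values = [r[field] for r in runs]
    return min(values), max(values), round(max(values) - min(values), 2)


for field in ("left", "right"):
    lo, hi, width = spread(runs, field)
    print(f"{field}: min={lo:.2f} max={hi:.2f} spread={width:.2f}")
# left: min=0.54 max=0.56 spread=0.02
# right: min=0.60 max=0.62 spread=0.02
```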
Conclusion
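Prompt behavior is largely deterministic: the output structure is stable and the scores vary only slightly between runs. To reduce the remaining variation:
- Cache responses so that resubmitting the same prompt and image returns the stored output
- Anchor the expected response format and score values with examples in the prompt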
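The caching suggestion might look like the following. This is a sketch under assumptions: `score_fn` is again a hypothetical stand-in for the SDK call, the cache is an in-memory dictionary, and the key is a SHA-256 hash of the prompt ID and the image bytes. Caching does not make the model itself deterministic; it only guarantees that a repeated submission returns the first stored result.

```python
import hashlib


def cache_key(prompt_id: str, image: bytes) -> str:
    """Build a key from the prompt ID plus the exact image bytes."""
    h = hashlib.sha256()
    h.update(prompt_id.encode("utf-8"))
    h.update(b"\x00")  # separator so the ID and image bytes cannot run together
    h.update(image)
    return h.hexdigest()


class CachedScorer:
    """Wrap a scoring callable so identical (prompt, image) pairs hit the cache."""

    def __init__(self, score_fn):
        self._score_fn = score_fn  # hypothetical: performs the real API call
        self._cache = {}

    def __call__(self, prompt_id: str, image: bytes):
        key = cache_key(prompt_id, image)
        if key not in self._cache:
            self._cache[key] = self._score_fn(prompt_id, image)
        return self._cache[key]
```

Because the key hashes the raw bytes, a re-encoded or resized copy of the same picture would count as a new image and trigger a fresh call.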