Prompt Determinism Test

Objective

To evaluate whether identical prompts yield stable outputs across multiple runs when applied to the same image.

Methodology

  • Prompt: Facial region scoring prompt (fixed ID)
  • Image: Same image submitted 5 times
  • Model: OpenAI Vision API via SDK
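The procedure above can be sketched as a small harness. This is a minimal sketch, not the harness used for this test. `submit` is a hypothetical callable that would wrap the actual Vision API request, and `run_determinism_test` and `summarize_outputs` are illustrative names. Keeping the API call behind a callable lets the harness run without network access.

```python
from collections import Counter
from typing import Callable, List


def run_determinism_test(
    submit: Callable[[str, bytes], str],
    prompt_id: str,
    image_bytes: bytes,
    runs: int = 5,
) -> List[str]:
    """Submit the same (prompt ID, image) pair `runs` times and collect raw outputs.

    `submit` is a hypothetical wrapper around the Vision API call; it is passed in
    so the harness itself makes no assumptions about the SDK.
    """
    return [submit(prompt_id, image_bytes) for _ in range(runs)]


def summarize_outputs(outputs: List[str]) -> Counter:
    """Count distinct outputs; a single key means every run returned identical text."""
    return Counter(outputs)
```

With a stand-in `submit`, `summarize_outputs(run_determinism_test(fake, "P1", b"img"))` reports how many distinct responses the five runs produced.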

Results

Submission   Output scores
Run 1        R1: left=0.55, right=0.61 …
Run 2        R1: left=0.54, right=0.60 …
Run 3        R1: left=0.56, right=0.62 …
Run 4        R1: left=0.55, right=0.61 …
Run 5        R1: left=0.55, right=0.60 …
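To compare runs programmatically, each output line can be parsed into numbers. This is a minimal sketch, assuming outputs follow the `R1: left=..., right=...` pattern shown in the table. The regex and `parse_scores` are illustrative. Fields elided as "…" are ignored, and a line that does not match is treated as a structural deviation.

```python
import re
from typing import Dict, Union

# Matches the region label and the two scores shown in the table; any further
# fields after the right score are ignored.
SCORE_LINE = re.compile(r"(R\d+):\s*left=(\d+(?:\.\d+)?),\s*right=(\d+(?:\.\d+)?)")


def parse_scores(line: str) -> Dict[str, Union[str, float]]:
    """Extract region, left and right scores; raise if the structure deviates."""
    match = SCORE_LINE.search(line)
    if match is None:
        raise ValueError(f"unexpected output structure: {line!r}")
    region, left, right = match.groups()
    return {"region": region, "left": float(left), "right": float(right)}
```

For example, `parse_scores("R1: left=0.54, right=0.60 …")` yields region `R1` with scores 0.54 and 0.60.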

Observations

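  • Minor score variance of about 0.01–0.02 between runs; for R1, left ranged 0.54–0.56 and right 0.60–0.62 (each within ±0.01 of its median)
  • No structural deviation: every run returned the same output format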
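The R1 figures can be checked directly against the results table. This sketch transcribes the five runs and computes, per field, the median, the spread (max minus min) and the largest deviation from the median. `summarize` is an illustrative helper, and only the R1 scores shown in the table are included.

```python
from statistics import median

# R1 scores transcribed from the results table (runs 1-5).
RUNS = {
    "left": [0.55, 0.54, 0.56, 0.55, 0.55],
    "right": [0.61, 0.60, 0.62, 0.61, 0.60],
}


def summarize(values):
    """Return (median, max-min spread, max absolute deviation from the median)."""
    mid = median(values)
    spread = max(values) - min(values)
    max_dev = max(abs(v - mid) for v in values)
    # Round to hide binary floating-point artifacts such as 0.020000000000000018.
    return round(mid, 4), round(spread, 4), round(max_dev, 4)


for field, values in RUNS.items():
    mid, spread, max_dev = summarize(values)
    print(f"{field}: median={mid}, spread={spread}, max deviation=±{max_dev}")
# prints:
# left: median=0.55, spread=0.02, max deviation=±0.01
# right: median=0.61, spread=0.02, max deviation=±0.01
```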

Conclusion

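Prompt behavior is largely deterministic: the output structure was identical across runs, and only the numeric scores showed minor noise. To reduce this variance further:

  • Use caching, so a repeated (prompt, image) pair returns the stored response instead of being scored again
  • Enforce response anchors by including example outputs in the prompt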
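Caching makes repeated submissions return identical output by construction, because the model is called only once per unique (prompt, image) pair. It does not change the model's underlying variance. This is a minimal in-memory sketch, assuming the prompt ID plus the raw image bytes identify a request. `CachedScorer`, `cache_key` and the `submit` callable are hypothetical names, not part of the OpenAI SDK.

```python
import hashlib
from typing import Callable, Dict


def cache_key(prompt_id: str, image_bytes: bytes) -> str:
    """Key on the fixed prompt ID plus a SHA-256 digest of the image content."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return f"{prompt_id}:{digest}"


class CachedScorer:
    """Memoizes responses so an identical (prompt, image) pair is scored once."""

    def __init__(self, submit: Callable[[str, bytes], str]):
        # `submit` is a hypothetical wrapper around the Vision API call.
        self._submit = submit
        self._cache: Dict[str, str] = {}

    def score(self, prompt_id: str, image_bytes: bytes) -> str:
        key = cache_key(prompt_id, image_bytes)
        if key not in self._cache:
            # Only a cache miss reaches the model.
            self._cache[key] = self._submit(prompt_id, image_bytes)
        return self._cache[key]
```

Hashing the image content rather than its filename means a renamed copy of the same image still hits the cache. An in-memory dict is lost on restart, so keeping scores stable across processes would need a persistent store.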