Prompt Determinism Test

Objective

To evaluate whether identical prompts yield stable outputs across multiple runs when applied to the same image.

Methodology

Prompt: Facial region scoring prompt (fixed ID)
Image: Same image submitted 5 times
Model: OpenAI Vision API via SDK
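
The repeated-submission procedure above can be sketched as a small harness. This is an illustration only: `score_fn` is a hypothetical callable standing in for the actual SDK call, and `run_repeated`, `prompt_id`, and `image_bytes` are names assumed here, not taken from the test setup.

```python
from typing import Callable, List


def run_repeated(
    score_fn: Callable[[str, bytes], dict],
    prompt_id: str,
    image_bytes: bytes,
    n: int = 5,
) -> List[dict]:
    """Submit the same prompt and image n times and collect each output.

    score_fn is a hypothetical wrapper around the vision model call; it is
    assumed to return a dict of region scores such as {"left": 0.55}.
    """
    return [score_fn(prompt_id, image_bytes) for _ in range(n)]


def fake_score_fn(prompt_id: str, image_bytes: bytes) -> dict:
    """Stand-in for the real model call so the harness can run offline."""
    return {"left": 0.55, "right": 0.61}


outputs = run_repeated(fake_score_fn, "facial-region-scoring", b"image-bytes")
```

Passing the model call in as a parameter keeps the harness independent of any particular SDK version and lets it run against a stub.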

Results

Submission	Output Scores
Run 1	R1: left=0.55, right=0.61 …
Run 2	R1: left=0.54, right=0.60 …
Run 3	R1: left=0.56, right=0.62 …
Run 4	R1: left=0.55, right=0.61 …
Run 5	R1: left=0.55, right=0.60 …

Observations

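Minor float variance: for the scores shown, each value spans a range of 0.02 across runs (left 0.54–0.56, right 0.60–0.62), i.e. ±0.01 around the median
No deviation in output structure across the five runs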
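
As a rough check of these observations, the spread can be computed directly from the table. The sketch below uses only the R1 left/right values visible in the Results section (the elided entries are not reproduced), and the helper names `spread` and `same_structure` are illustrative.

```python
from statistics import median

# R1 scores transcribed from the Results table; elided values are omitted.
runs = [
    {"left": 0.55, "right": 0.61},
    {"left": 0.54, "right": 0.60},
    {"left": 0.56, "right": 0.62},
    {"left": 0.55, "right": 0.61},
    {"left": 0.55, "right": 0.60},
]


def spread(runs):
    """Per score key: min, max, range, and largest deviation from the median."""
    result = {}
    for key in runs[0].keys():
        vals = [r[key] for r in runs]
        med = median(vals)
        result[key] = {
            "min": min(vals),
            "max": max(vals),
            # Rounded to suppress binary floating-point representation noise.
            "range": round(max(vals) - min(vals), 4),
            "max_dev_from_median": round(max(abs(v - med) for v in vals), 4),
        }
    return result


def same_structure(runs):
    """True if every run returned exactly the same set of score keys."""
    return all(r.keys() == runs[0].keys() for r in runs)


stats = spread(runs)
structure_ok = same_structure(runs)
```

For the values shown, both keys have a range of 0.02 and a maximum deviation from the median of 0.01, and the structure check passes.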

Conclusion

Prompt behavior is largely deterministic: the output structure is stable and scores differ only by small float noise. To reduce the remaining variance:

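Cache the response per prompt ID and image, so repeated submissions of the same image return the stored scores
Anchor the response format and score scale by including worked examples in the prompt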
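
One way the caching suggestion might look is sketched below, assuming an in-memory store keyed on a hash of the prompt ID and the image bytes. `ScoreCache` and `score_fn` are hypothetical names; `score_fn` stands in for whatever function actually calls the model.

```python
import hashlib
from typing import Callable, Dict


class ScoreCache:
    """In-memory cache: identical (prompt ID, image) pairs reuse one result."""

    def __init__(self, score_fn: Callable[[str, bytes], dict]):
        # score_fn is a hypothetical wrapper around the real model call.
        self._score_fn = score_fn
        self._store: Dict[str, dict] = {}

    @staticmethod
    def key(prompt_id: str, image_bytes: bytes) -> str:
        """Derive a stable key from the prompt ID and raw image content."""
        h = hashlib.sha256()
        h.update(prompt_id.encode("utf-8"))
        h.update(b"\x00")  # separator so the ID and image bytes cannot blur
        h.update(image_bytes)
        return h.hexdigest()

    def get(self, prompt_id: str, image_bytes: bytes) -> dict:
        """Return the cached scores, calling the model only on a miss."""
        k = self.key(prompt_id, image_bytes)
        if k not in self._store:
            self._store[k] = self._score_fn(prompt_id, image_bytes)
        return self._store[k]


calls = []


def fake_score_fn(prompt_id: str, image_bytes: bytes) -> dict:
    """Stand-in model call that records how often it is invoked."""
    calls.append(prompt_id)
    return {"left": 0.55, "right": 0.61}


cache = ScoreCache(fake_score_fn)
first = cache.get("facial-region-scoring", b"image-bytes")
second = cache.get("facial-region-scoring", b"image-bytes")
```

Caching makes repeated submissions return identical scores by construction, but it hides the model's variance rather than removing it: a new image or a changed prompt ID still triggers a fresh call.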