Why OpenAI Image Inputs Behave Nondeterministically

Root Causes of Variability

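Latent Sampling: Behind the scenes, OpenAI vision-language models (VLMs) may sample latent features or add augmentation noise when encoding an image, so the same image need not map to identical internal features.
Prompt Reinterpretation: A nonzero sampling temperature or internal randomness in how the prompt is parsed can shift the output slightly from run to run.
System Scaling: When requests are served by different inference nodes, small implementation-level differences (such as hardware or floating-point behavior) can change the result.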

Real-World Impact

You may see a score of R1=0.55 on one run and R1=0.58 on the next for the same image and prompt.
This does not mean the model is broken; it is soft stochasticity, meaning small run-to-run noise.
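
One way to tell ordinary noise from a real change is to repeat the call a few times and look at the spread. The sketch below is only an illustration: summarize_scores, within_noise, and the 0.05 tolerance are hypothetical names and values, not part of any OpenAI API, and the two scores are the ones quoted above.

```python
from statistics import mean, stdev


def summarize_scores(scores):
    """Return (mean, sample standard deviation, range) for repeated scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(scores), stdev(scores), max(scores) - min(scores)


def within_noise(a, b, tolerance=0.05):
    """Treat two scores as equivalent if they differ by at most `tolerance`.

    The default of 0.05 is an assumed value; choose one from your own
    repeated measurements.
    """
    return abs(a - b) <= tolerance


# The two R1 values from the example above.
avg, spread, score_range = summarize_scores([0.55, 0.58])
print(f"mean={avg:.3f} stdev={spread:.3f} range={score_range:.3f}")
print("within noise:", within_noise(0.55, 0.58))
```

With more runs per image, the standard deviation gives a better basis for picking the tolerance than a single pair of scores.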

Mitigation Strategies

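Use prompt anchoring: include examples, a fixed output format, and explicit constraints in the prompt.
Cache results by input hash so that identical inputs are not resubmitted.
Record every score with versioning info, such as the model and prompt version.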
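
The three strategies can be sketched together as follows. This is a minimal illustration under stated assumptions: build_anchored_prompt, input_key, and ScoreCache are hypothetical helpers, and score_fn stands in for whatever function actually sends the image and prompt to the model and returns a score.

```python
import hashlib
import time


def build_anchored_prompt(task, examples, output_format, constraints):
    """Assemble a prompt with fixed examples, output format, and constraints."""
    lines = [task, "", "Examples:"]
    lines += [f"- {ex}" for ex in examples]
    lines += ["", f"Output format: {output_format}", "", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)


def input_key(image_bytes, prompt, model):
    """Hash the image, prompt, and model name into one cache key."""
    h = hashlib.sha256()
    for part in (image_bytes, prompt.encode("utf-8"), model.encode("utf-8")):
        # Length prefix keeps (b"ab", "c") distinct from (b"a", "bc").
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


class ScoreCache:
    """In-memory cache that stores each score with its version info."""

    def __init__(self, score_fn, model, prompt_version):
        self.score_fn = score_fn  # hypothetical: (image_bytes, prompt) -> float
        self.model = model
        self.prompt_version = prompt_version
        self.records = {}

    def get_score(self, image_bytes, prompt):
        key = input_key(image_bytes, prompt, self.model)
        if key not in self.records:
            # Only call the model for inputs not seen before.
            self.records[key] = {
                "score": self.score_fn(image_bytes, prompt),
                "model": self.model,
                "prompt_version": self.prompt_version,
                "recorded_at": time.time(),
            }
        return self.records[key]["score"]
```

Because the key covers the image bytes, the prompt text, and the model name, changing any of them produces a fresh call, while an exact repeat returns the stored score instead of a new, slightly different one.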

True determinism is hard at scale, but structured prompting can bring us close enough.