Why OpenAI Image Inputs Behave Nondeterministically
Root Causes of Variability
- Latent Sampling: The vision pipeline behind OpenAI VLMs may apply latent feature sampling or augmentation-style noise that callers cannot see or control.
- Prompt Reinterpretation: Nonzero sampling temperature and other internal randomness during decoding mean the model's reading of the same prompt can drift slightly from run to run.
- System Scaling: Requests routed to different inference nodes can hit different hardware, kernels, or batch compositions, so the numerics are not guaranteed to be bit-identical.
Real-World Impact
- You may see R1 = 0.55 on one run and R1 = 0.58 on the next for the same image and prompt; the sketch after this list shows one way to quantify that spread.
- This doesn't mean the model is broken; it's just soft stochasticity.
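A quick way to confirm you are seeing run-to-run noise rather than a real regression is to repeat the identical request a few times and look at the spread. The sketch below is illustrative only: `run_once` is a hypothetical stand-in for whatever call submits your image and prompt and returns a single score such as R1.

```python
import statistics

def measure_run_to_run_spread(run_once, n_runs: int = 5) -> dict:
    """Repeat the identical image+prompt evaluation and summarize the drift.

    run_once is a hypothetical placeholder: it should submit the same image
    and prompt each time and return one numeric score (e.g. the R1 value).
    """
    scores = [run_once() for _ in range(n_runs)]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "stdev": statistics.pstdev(scores),
        "spread": max(scores) - min(scores),  # e.g. 0.58 - 0.55 = 0.03
    }
```

If the spread stays within a few hundredths, you are looking at the soft stochasticity described above rather than a genuine quality change.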
Mitigation Strategies
- Use prompt anchoring (examples + format + constraints)
- Cache a hash of each image + prompt to avoid resubmitting identical requests (see the sketch after this list)
- Record every score together with versioning info (model, prompt version, timestamp)
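Here is a minimal sketch of the caching and versioning ideas above. It assumes the image bytes are available locally; `score_fn`, the `prompt_version` label, and the `scores.jsonl` path are hypothetical placeholders, not part of any official API.

```python
import hashlib
import json
import time

_cache = {}  # in-memory cache; swap for Redis or disk as needed

def request_fingerprint(image_bytes: bytes, prompt: str, model: str) -> str:
    """Hash the exact request payload so identical submissions share a key."""
    h = hashlib.sha256()
    h.update(image_bytes)
    h.update(prompt.encode("utf-8"))
    h.update(model.encode("utf-8"))
    return h.hexdigest()

def score_with_cache(image_bytes: bytes, prompt: str, model: str, score_fn) -> dict:
    """Return a cached result for a previously seen image+prompt+model;
    otherwise evaluate once via score_fn (a hypothetical placeholder for
    your own scoring call) and log the record with versioning info."""
    key = request_fingerprint(image_bytes, prompt, model)
    if key in _cache:
        return _cache[key]

    record = {
        "fingerprint": key,
        "model": model,
        "prompt_version": "v1",  # bump whenever the prompt template changes
        "score": score_fn(image_bytes, prompt),
        "timestamp": time.time(),
    }
    _cache[key] = record

    # Append-only log so later score drift can be audited against versions.
    with open("scores.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Keying the cache on the raw image bytes plus the prompt and model name means a re-run with the same inputs reuses the recorded score instead of triggering another nondeterministic call.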
True determinism is hard at scale, but structured prompting can bring us close enough.