Why OpenAI Image Inputs Behave Nondeterministically

Root Causes of Variability

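  1. Latent Sampling: OpenAI does not document how its vision-language models (VLMs) encode images, so internal stochastic steps such as latent feature sampling or augmentation noise are possible but unconfirmed.
  2. Prompt Reinterpretation: At a temperature above 0, each output token is sampled from a probability distribution, so the same image and prompt can produce slightly different readings and answers. An early difference then carries through the rest of the output.
  3. System Scaling: Requests may be served by different inference nodes, hardware, or batch configurations. Floating-point arithmetic is not associative, so a different order of operations can shift the model's logits slightly, which is enough to flip a near-tied token choice even at temperature 0. Backend updates can also change behavior over time.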
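Causes 2 and 3 can be illustrated without calling the API. The sketch below is a toy model, not OpenAI's decoder: the logits are made up, and sample_token is a hypothetical helper that picks greedily at temperature 0 and otherwise samples from a temperature-scaled softmax. The last lines show that floating-point addition is not associative, which is why summing the same values in a different order (as can happen with different batching or hardware) may change a result slightly.

```python
import math
import random

def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of logits (temperature > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Greedy pick at temperature 0; otherwise sample from the softmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical logits for three candidate tokens (illustrative values only).
logits = [2.0, 1.8, 0.5]
rng = random.Random()  # unseeded, so sampled picks differ between runs

greedy = {sample_token(logits, 0, rng) for _ in range(20)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(20)}
print("temperature 0 picks:", greedy)   # always {0}
print("temperature 1 picks:", sampled)  # typically more than one token

# Floating-point addition is not associative: reduction order matters.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b, a, b)  # False 0.6000000000000001 0.6
```

Greedy decoding removes sampling randomness in this toy, but in a served model the logits themselves can differ slightly between runs for the reason the last lines show, so temperature 0 narrows variation without guaranteeing identical outputs.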

Real-World Impact

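  • Scores derived from model outputs can shift between runs: for example, R1 = 0.55 on one run and R1 = 0.58 on another for the same image and prompt.
  • This does not mean the model is broken; it reflects mild, ordinary stochasticity.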
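One way to tell ordinary stochasticity from a real regression is to score the same image and prompt several times and compare a new score against that spread. The sketch below assumes you already have per-run scores; the baseline values are made up for illustration, and run_spread, within_noise, and the 2-standard-deviation threshold are hypothetical choices, not an established test.

```python
import statistics

def run_spread(scores):
    """Summarize repeated scores for one (image, prompt) pair."""
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return {"mean": mean, "stdev": stdev, "range": max(scores) - min(scores)}

def within_noise(new_score, baseline_scores, k=2.0):
    """Treat a score as noise if it lies within k standard deviations of the baseline mean."""
    s = run_spread(baseline_scores)
    return abs(new_score - s["mean"]) <= k * s["stdev"]

# Illustrative R1 values from five repeated runs (not real measurements).
baseline = [0.55, 0.58, 0.56, 0.57, 0.55]
summary = run_spread(baseline)
print(summary)
print(within_noise(0.58, baseline))  # True: inside the observed spread
print(within_noise(0.45, baseline))  # False: worth investigating
```

If every baseline run returns the identical score, the standard deviation is 0 and only an exact match counts as noise, so a small minimum tolerance may be worth adding in practice.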

Mitigation Strategies

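  • Use prompt anchoring: include worked examples, an explicit output format, and constraints, so the model has less room to vary.
  • Cache responses keyed by a hash of the inputs, so repeated queries reuse a stored result instead of being resubmitted and resampled.
  • Record every score with version information: model snapshot, prompt version, request parameters, and the system_fingerprint value when the response includes one.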
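The three strategies can be sketched together. This assumes the image is available as raw bytes and uses an in-memory dict as the cache (a real setup would persist it). The cache key covers the image, the prompt, and the request parameters, because changing any of them can change the output. All names here (anchored_prompt, cache_key, score_with_cache, the record fields, "example-model") are hypothetical, and call_model stands in for whatever function actually sends the request.

```python
import hashlib
import json
from datetime import datetime, timezone

def anchored_prompt(task, examples, output_format, constraints):
    """Build a prompt with fixed examples, an explicit format, and constraints."""
    lines = [task, "", "Examples:"]
    for inp, out in examples:
        lines.append(f"- Input: {inp}\n  Output: {out}")
    lines += ["", f"Output format: {output_format}", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)

def cache_key(image_bytes, prompt, params):
    """SHA-256 over a canonical JSON of image hash, prompt, and parameters."""
    payload = {
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "prompt": prompt,
        "params": params,
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

CACHE = {}      # key -> model output
SCORE_LOG = []  # versioned score records

def score_with_cache(image_bytes, prompt, params, call_model, score_fn, prompt_version):
    key = cache_key(image_bytes, prompt, params)
    if key in CACHE:
        output = CACHE[key]  # reuse: no resubmission, no new sampling
    else:
        output = call_model(image_bytes, prompt, params)
        CACHE[key] = output
    score = score_fn(output)
    SCORE_LOG.append({
        "key": key,
        "score": score,
        "model": params.get("model"),
        "prompt_version": prompt_version,
        "params": params,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return score

# Usage with a stand-in model so the sketch runs offline.
calls = []

def fake_model(image_bytes, prompt, params):
    calls.append(prompt)
    return "a red square"

def fake_score(output):
    return 1.0 if "red" in output else 0.0

prompt = anchored_prompt(
    "Describe the main object.",
    [("photo of a blue circle", "a blue circle")],
    "one short noun phrase",
    ["lowercase only", "no punctuation"],
)
params = {"model": "example-model", "temperature": 0}
img = b"\x89PNG placeholder bytes"
s1 = score_with_cache(img, prompt, params, fake_model, fake_score, "v1")
s2 = score_with_cache(img, prompt, params, fake_model, fake_score, "v1")
print(s1, s2, len(calls), len(SCORE_LOG))  # 1.0 1.0 1 2
```

The second call is served from the cache, so the model is called once while both scores are still logged with their version fields.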

True determinism is hard to guarantee at scale, but structured prompting can bring results close enough.