Hallucinations Aren’t Magic: Why LLMs Make Confident Mistakes and How Benchmarks Encourage Them
OpenAI research argues that hallucinations are a statistical consequence of pretraining, and that binary pass/fail benchmarking incentivizes overconfident guessing; changing evaluations to reward calibrated expressions of uncertainty can reduce hallucinations.
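To make the incentive argument concrete, here is a minimal illustrative sketch (not OpenAI's evaluation code) comparing a model's expected score when it guesses versus abstains, under two grading schemes: binary grading (1 point if correct, 0 otherwise) and a penalized scheme where wrong answers cost `threshold / (1 - threshold)` points and "I don't know" scores 0. The `threshold` value and function names are assumptions for the example.

```python
# Illustrative sketch: why binary grading rewards guessing while a
# penalized scheme rewards calibrated abstention. `p` is the model's
# belief that its candidate answer is correct.

def expected_score_binary(p: float, guess: bool) -> float:
    """Binary grading: +1 if correct, 0 if wrong or abstaining."""
    return p if guess else 0.0

def expected_score_penalized(p: float, guess: bool, threshold: float = 0.75) -> float:
    """Penalized grading (assumed example values): +1 if correct,
    -threshold/(1-threshold) if wrong, 0 if abstaining.
    With this penalty, guessing beats abstaining only when p > threshold."""
    penalty = threshold / (1.0 - threshold)
    return p - (1.0 - p) * penalty if guess else 0.0

if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8):
        print(f"confidence p={p:.1f}")
        print(f"  binary:    guess={expected_score_binary(p, True):+.2f}  "
              f"abstain={expected_score_binary(p, False):+.2f}")
        print(f"  penalized: guess={expected_score_penalized(p, True):+.2f}  "
              f"abstain={expected_score_penalized(p, False):+.2f}")
```

Under binary grading, guessing has a higher expected score than abstaining at every confidence level, even p = 0.2, so a score-maximizing model never says "I don't know." Under the penalized scheme, guessing only pays when confidence exceeds the stated threshold, which is the kind of calibration-rewarding evaluation the summary describes.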