#trajectory-aware 03/07/2025
ReasonFlux-PRM: Revolutionizing Chain-of-Thought Evaluation in Large Language Models
ReasonFlux-PRM is a new trajectory-aware reward model that evaluates both reasoning steps and final answers in large language models, significantly improving their reasoning capabilities and training outcomes.