
Scale AI to set the Pentagon’s path for testing and evaluating large language models

The company will create a comprehensive test and evaluation (T&E) framework for generative AI within the Defense Department.
The Pentagon’s Chief Digital and Artificial Intelligence Office (CDAO) tapped Scale AI to produce a trustworthy means for testing and evaluating large language models that can support — and potentially disrupt — military planning and decision-making.
According to a statement the San Francisco-based company shared exclusively with DefenseScoop, the new one-year contract will supply the CDAO with “a framework to deploy AI safely by measuring model performance, offering real-time feedback for warfighters, and creating specialized public sector evaluation sets to test AI models for military support applications, such as organizing the findings from after action reports.”
