Anthropic is launching a new funding initiative to address shortcomings in current AI benchmarking practices, where existing benchmarks limit the ability to assess the performance and impact of AI models.
Existing benchmarks often fall short of representing how the average person actually uses AI systems. They fail to capture the nuances and complexities of real-world usage, limiting the insight they can offer into AI model performance.
Additionally, many of these benchmarks were developed before the advent of modern generative AI, raising questions about their relevance and applicability.
Anthropic's Funding Initiative
The program aims to identify and fund third-party organizations capable of creating benchmarks that can effectively measure advanced capabilities in AI models.
“Our investment in these evaluations is intended to elevate the entire field of AI safety, providing valuable tools that benefit the whole ecosystem,” Anthropic wrote on its official blog.
The need for new benchmarks that evaluate AI models more accurately is urgent: “Developing high-quality, safety-relevant evaluations remains challenging, and the demand is outpacing the supply,” the blog post adds.
Focus Areas for New Benchmarks
Anthropic's new benchmarks will focus on evaluating AI models' advanced capabilities, particularly in relation to AI security and societal implications.
These benchmarks will assess a model's ability to carry out tasks that have significant implications, such as cyberattacks, weapon enhancement, and manipulation or deception of individuals through deepfakes or misinformation.
Furthermore, Anthropic aims to develop an "early warning system" to identify and assess AI risks related to national security and defense. While details about this system are not disclosed in the blog post, Anthropic emphasizes its commitment to addressing these risks.
The funding program will also support research into benchmarks for "end-to-end" tasks, exploring AI's potential in various domains.
These tasks include aiding scientific research, conversing in multiple languages, mitigating ingrained biases, and filtering out toxicity.
Anthropic intends to develop new platforms that empower subject-matter experts to generate their own assessments and carry out extensive trials involving thousands of users.
The company has hired a dedicated coordinator for the initiative and is exploring opportunities to acquire or scale up projects with strong growth potential.
CEO Dario Amodei has emphasized AI's broader societal impact and the need for comprehensive solutions to address potential inequality.
In an interview with Time Magazine, Amodei highlighted the importance of finding solutions beyond Universal Basic Income to ensure that advancements in AI technology benefit the wider public.