Cybench: A framework for evaluating cybersecurity capabilities and risks of language models
Language Model (LM) agents for cybersecurity that are capable of autonomously identifying
vulnerabilities and executing exploits have the potential to cause real-world impact.
Policymakers, model providers, and other researchers in the AI and cybersecurity
communities are interested in quantifying the capabilities of such agents to help mitigate
cyberrisk and investigate opportunities for penetration testing. Toward that end, we introduce
Cybench, a framework for specifying cybersecurity tasks and evaluating agents on those …
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models. August 2024
AK Zhang, N Perry, R Dulepet, E Jones, JW Lin, J Ji… - URL https://arxiv.org/abs …