Google Scholar

Cybench: A framework for evaluating cybersecurity capabilities and risks of language models

AK Zhang, N Perry, R Dulepet, J Ji, JW Lin… - arXiv preprint arXiv …, 2024 - arxiv.org

Language Model (LM) agents for cybersecurity that are capable of autonomously identifying
vulnerabilities and executing exploits have the potential to cause real-world impact.
Policymakers, model providers, and other researchers in the AI and cybersecurity
communities are interested in quantifying the capabilities of such agents to help mitigate
cyberrisk and investigate opportunities for penetration testing. Toward that end, we introduce
Cybench, a framework for specifying cybersecurity tasks and evaluating agents on those …

Cited by 13

[CITATION][C] Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models. 8 2024

AK Zhang, N Perry, R Dulepet, E Jones, JW Lin, J Ji… - URL https://arxiv.org/abs …

Cited by 4
