I am a full professor in the University of Maryland Computer Science Department (tenure home), Institute of Advanced Computer Studies, INFO, and Language Science Center.

My research focuses on making machine learning more useful, more interpretable, and able to learn from and interact with humans. This helps users sift through decades of documents; discover when individuals lie, reframe, or change the topic in a conversation; or compete against humans in games based in natural language.

Book a meeting with me (collaborators and UMD students).

Recent Publications

  • Yoo Yeon Sung, Maharshi Gor, Eve Fleisig, Ishani Mondal, and Jordan Lee Boyd-Graber. ADVSCORE: A Metric for the Evaluation and Creation of Adversarial Benchmarks. Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, 2025. [Bibtex]
    Accessible Abstract: Adversarial datasets should validate AI robustness by presenting samples that humans handle well but models struggle with. However, as models advance, these datasets risk becoming obsolete. Assessing whether a dataset remains adversarial is challenging due to the absence of a standardized metric for adversarialness. To address this, we introduce AdvScore, a human-grounded evaluation metric that quantifies a dataset's adversarial nature by accounting for the differing abilities of models and humans while also identifying low-quality examples.
  • Nishant Balepur, Feng Gu, Abhilasha Ravichander, Shi Feng, Jordan Boyd-Graber, and Rachel Rudinger. Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?. Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, 2025. [Bibtex]
    Accessible Abstract: Language models like ChatGPT are pretty good at answering questions (e.g. "What is 12 * 12?"), but we show they can surprisingly struggle when asked to do the reverse task: generating questions for answers (e.g. "Give me a question with the answer 144"). We study when these errors happen, what might be causing them, and how they can be addressed.
  • Feng Gu, Wichayaporn Wongkamjan, Jonathan K. Kummerfeld, Denis Peskoff, Jonathan May, and Jordan Boyd-Graber. Personalized Help for Optimizing Low-Skilled Users' Strategy. Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, 2025. [Bibtex]
    Accessible Abstract: AIs can beat humans in game environments; however, how helpful those agents are to humans remains understudied. We augment CICERO, a natural language agent that demonstrates superhuman performance in Diplomacy, to generate both move and message advice based on player intentions. A dozen Diplomacy games with novice and experienced players, under varying advice settings, show that some of the generated advice is beneficial. It helps novices compete with experienced players and in some instances even surpass them. The mere presence of advice can be advantageous, even if players do not follow it.
  • Nishant Balepur, Alexa Siu, Nedim Lipka, Franck Dernoncourt, Tong Sun, Jordan Lee Boyd-Graber, and Puneet Mathur. MoDS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections. Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, 2025. [Bibtex]
    Accessible Abstract: When you ask ChatGPT for advice on questions with multiple perspectives (e.g. "Is pineapple good on pizza?"), you likely want a response that fairly represents all viewpoints. We formulate this task, collect a dataset to test it, and develop MoDS, a system in which multiple ChatGPTs debate like a panel discussion, to generate balanced answers to questions based on multiple sources.
  • Yoo Yeon Sung, Eve Fleisig, Yu Hope, Ishan Upadhyay, and Jordan Lee Boyd-Graber. GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration. ArXiv, Preprint. [Bibtex]
  • Benjamin Börschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, and Lierni Sestorain Saralegu. Meta Answering for Machine Reading. ArXiv, 2020. [Preprint] [Bibtex]
  • Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He, and Jordan Boyd-Graber. Quizbowl: The Case for Incremental Question Answering. ArXiv, 2020. [Webpage] [Bibtex]