Automated Parliaments: A Solution to Decision Uncertainty and Misalignment in Language Models

Forster, Thomas; Ouwerx, Jonathan; Ragoler, Shak

Abstract: As AI takes on a greater role in the modern world, it is essential to ensure that AI models can overcome decision uncertainty and remain aligned with human morality and interests. This paper proposes a method for improving the decision-making of language models (LMs) via Automated Parliaments (APs), constructs made of AI delegates that each represent a certain perspective. Each delegate consists of three AI models: a generator, a modifier, and an evaluator. We specify two mechanisms for producing optimal solutions: the Simultaneous Modification mechanism for response creation and an evaluation mechanism for fairly assessing solutions. The overall process begins when each generator creates a response aligned with its delegate's theory. The modifiers then alter all other delegates' responses to bring them closer to their own delegate's perspective. The evaluators collectively assess the responses to identify the best final one. Finally, the modifiers and generators learn from the evaluators' feedback. We tested the evaluation mechanism by comparing single-value zero-shot prompting with AP few-shot prompting on morally contentious scenarios, and found that the AP architecture achieved a 57.3% reduction in loss value relative to the baseline. We conclude by discussing potential applications of APs, in particular their potential impact when implemented as Automated Moral Parliaments.
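
The generate, modify, evaluate flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Delegate` class, `make_toy_delegate`, `run_parliament`, the keyword-based stub models, the sequential application of modifiers, and the mean-score aggregation are all assumptions made for illustration. In particular, the paper's Simultaneous Modification mechanism and the feedback-driven learning step are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Delegate:
    """Hypothetical delegate: one generator, one modifier, one evaluator."""
    name: str
    generate: Callable[[str], str]    # prompt -> draft response
    modify: Callable[[str], str]      # response -> response nudged toward this delegate
    evaluate: Callable[[str], float]  # response -> score in [0, 1]


def make_toy_delegate(name: str, keyword: str) -> Delegate:
    """Stand-in for LM calls: the delegate's 'theory' is reduced to one keyword."""
    return Delegate(
        name=name,
        generate=lambda prompt: f"{prompt}: {keyword}",
        modify=lambda text: text if keyword in text else f"{text}, {keyword}",
        evaluate=lambda text: 1.0 if keyword in text else 0.0,
    )


def run_parliament(prompt: str, delegates: List[Delegate]) -> str:
    # Step 1: each generator drafts a response aligned with its delegate's theory.
    drafts = {d.name: d.generate(prompt) for d in delegates}

    # Step 2: every other delegate's modifier alters each draft.
    # Assumption: modifiers are applied one after another, a simplification.
    candidates = []
    for author, text in drafts.items():
        for d in delegates:
            if d.name != author:
                text = d.modify(text)
        candidates.append(text)

    # Step 3: evaluators score each candidate; the mean is an assumed aggregation rule.
    def mean_score(text: str) -> float:
        return sum(d.evaluate(text) for d in delegates) / len(delegates)

    return max(candidates, key=mean_score)


delegates = [
    make_toy_delegate("utilitarian", "welfare"),
    make_toy_delegate("deontologist", "duty"),
]
result = run_parliament("Scenario", delegates)
print(result)
```

In a real system each callable would wrap a language-model call, and the evaluator scores would be fed back to the generators and modifiers; the stubs here only make the control flow concrete.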

Comments:	39 pages, 4 figures
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2311.10098 [cs.AI]
	(or arXiv:2311.10098v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2311.10098
