
Confidence in the Reasoning of Large Language Models

Forthcoming. Now Available: Just Accepted Version.
Published on Dec 16, 2024

Abstract

There is a growing literature on reasoning by large language models (LLMs), but discussion of the uncertainty in their responses is still lacking. Our aim is to assess the extent of confidence that LLMs have in their answers and how it correlates with accuracy. Confidence is measured (i) qualitatively, in terms of persistence in keeping their answer when prompted to reconsider, and (ii) quantitatively, in terms of a self-reported confidence score. We investigate the performance of three LLMs (GPT-4o, GPT-4 Turbo, and Mistral) on two benchmark sets of questions on causal judgement and formal fallacies, and on a set of probability and statistical puzzles and paradoxes. Although the LLMs perform significantly better than random guessing, there is wide variability in their tendency to change their initial answers. There is a positive correlation between qualitative confidence and accuracy, but the overall accuracy of the second answer is often worse than that of the first. There is a strong tendency to overstate the self-reported confidence score, and confidence is only partially explained by the underlying token-level probability. The material effect of prompting on qualitative confidence and the strong tendency toward overconfidence indicate that current LLMs do not have an internally coherent sense of confidence.

Keywords: artificial intelligence, BIG-Bench AI tests, chatbots, generative AI, statistical inference, statistical puzzles and paradoxes



12/16/2024: A Just Accepted version of the article is available. This peer-reviewed version has been accepted for its content and is currently being copyedited to conform with HDSR's style and formatting requirements.


©2024 Yudi Pawitan and Chris Holmes. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
