Stanford University study found that the performance of AI chatbot ChatGPT is very unstable


2026-02-12 Update From: SLTechnology News&Howtos


CTOnews.com, Sept. 7 (Xinhua) A new study from Stanford University has found that the capabilities of ChatGPT, a popular generative artificial intelligence (AI) chatbot, fluctuated over the course of a few months.

The Stanford team assessed how ChatGPT handled different tasks over a period of several months and found that its capabilities were inconsistent over time. There are currently two versions of ChatGPT: the free GPT-3.5 model and the smarter, faster paid GPT-4 version. The researchers found that GPT-4 solved mathematical problems effectively in March, identifying prime numbers with 97.6% accuracy. Three months later, its accuracy on the same task had dropped to 2.4%. GPT-3.5, on the other hand, improved, rising from 7.4% accuracy to 86.8%.
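To make the accuracy figures concrete, here is a minimal Python sketch of how a score on a prime-identification task could be computed. It is not the researchers' code: the function names, the yes/no answer format, and the sample replies are all assumptions made up for illustration.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def prime_accuracy(answers: dict[int, str]) -> float:
    """Fraction of yes/no answers that match the true primality of each number."""
    correct = 0
    for number, answer in answers.items():
        # Assumption: a reply that starts with "yes" means "this number is prime".
        said_prime = answer.strip().lower().startswith("yes")
        if said_prime == is_prime(number):
            correct += 1
    return correct / len(answers)


# Hypothetical replies from two snapshots of the same model (made-up data).
march_answers = {7: "Yes", 9: "No", 11: "Yes", 15: "No"}
later_answers = {7: "No", 9: "No", 11: "No", 15: "No"}

print(f"March: {prime_accuracy(march_answers):.1%}")
print(f"Three months later: {prime_accuracy(later_answers):.1%}")
```

In this toy data the later snapshot answers "No" to everything, so it is right only on the composite numbers. That shows how a shift in answering behavior alone can move the score sharply.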

The researchers also noticed similar fluctuations in coding and visual reasoning. James Zou, a computer science professor at Stanford University, said: "When we adjust a large language model to improve its performance on some tasks, it may have a lot of unexpected consequences, which may damage the model's performance on other tasks ... the way this model answers questions has a variety of interdependencies, which may lead to some of the deteriorating behavior we have observed."

The researchers believe that the results do not really reflect how accurate ChatGPT is, but instead show the unintended consequences of fine-tuning the model. In essence, when part of the model is modified to improve one task, other tasks may be affected. The exact cause is hard to determine because no one outside the company knows how ChatGPT works, and its code is not open source.

Over time, the researchers noticed that ChatGPT's answers not only became less accurate, but also stopped explaining the reasoning behind them.

Because of the way ChatGPT works, it can be difficult to study and measure its performance, and this study emphasizes the need to observe and evaluate changes in the performance of the large language models (LLMs) that drive tools such as ChatGPT. The study has been published on arXiv and is awaiting peer review; CTOnews.com links to the paper.
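One simple way to track such changes is to record a score for each model at fixed dates and flag large movements between snapshots. The Python sketch below uses only the prime-identification accuracies quoted in this article. The function name and the 5-percentage-point threshold are assumptions chosen for illustration, not part of the study.

```python
def accuracy_shifts(before: dict[str, float], after: dict[str, float],
                    threshold: float = 0.05) -> dict[str, float]:
    """Return labels whose accuracy changed by more than `threshold` (a fraction)."""
    shifts = {}
    # Only compare labels that were measured in both snapshots.
    for label in before.keys() & after.keys():
        delta = after[label] - before[label]
        if abs(delta) > threshold:
            shifts[label] = round(delta, 3)
    return shifts


# Prime-identification accuracies quoted in the article.
march = {"GPT-4": 0.976, "GPT-3.5": 0.074}
three_months_later = {"GPT-4": 0.024, "GPT-3.5": 0.868}

for model, delta in sorted(accuracy_shifts(march, three_months_later).items()):
    print(f"{model}: {delta:+.1%}")
```

Both models are flagged here, one for a large drop and one for a large gain. A check like this reports only that a score moved, not why.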

