CTOnews.com, July 16 — Recently, some netizens discovered that if you feed the most important legal document in the United States, the U.S. Constitution, into certain tools designed to detect AI-generated writing, you get a surprising result: the Constitution was almost certainly written by artificial intelligence. That is clearly impossible, unless James Madison (the fourth president of the United States and the "Father of the Constitution") was a time traveler. So why do these AI detection tools make such errors? Ars Technica interviewed several experts, as well as the developers of the AI detection tool GPTZero, to find out.
AI writing has caused plenty of controversy in education. For a long time, teachers have relied on traditional methods, using essays to measure how well students have mastered a topic. Many teachers have turned to AI detection tools to catch AI-generated writing, but the evidence so far suggests these tools are not reliable. Because of false positives, the text classifiers behind detectors such as GPTZero, ZeroGPT, and OpenAI's AI Text Classifier cannot be trusted to determine whether a piece of writing was generated by a large language model (LLM).
When a portion of the U.S. Constitution is entered into GPTZero, the tool reports that the text is "probably written entirely by AI." Over the past six months, screenshots of other AI detectors showing similar results have gone viral on social media. The same thing happens with passages from the Bible. To explain why these tools make such obvious mistakes, we first need to understand how they work.
According to CTOnews.com, different AI writing detectors use slightly different methods, but the basic principle is the same: an AI model is trained on a large body of text (including millions of writing samples) and, together with a set of heuristic rules, judges whether a piece of writing is more likely to have been produced by a human or by AI.
At the heart of GPTZero, for example, is a neural network trained on "a large, diverse corpus of human-written and AI-generated text, with a focus on English prose." The system then evaluates and classifies text using attributes such as "perplexity" and "burstiness."
In machine learning, perplexity measures how much a piece of text deviates from what an AI model learned during training. The idea behind measuring perplexity is that when AI models write, they naturally reach for the wording they are most familiar with, which comes from their training data. The closer the output is to the training data, the lower the perplexity. Humans are much messier writers, but they, too, can write with low perplexity, especially when imitating the formal style used in legal writing or certain kinds of academic writing. Moreover, many of the phrases we use are surprisingly common.
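To make the idea concrete, here is a minimal sketch of how perplexity can be computed with an off-the-shelf language model. GPT-2 from the Hugging Face transformers library is used here purely as a stand-in; the article does not say which model GPTZero actually relies on, and the example texts are invented for illustration.

```python
# Minimal perplexity sketch: low values mean the text closely matches
# what the model expects from its training data.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Encode the text and let the model predict each token from the ones before it.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the average cross-entropy loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    # Perplexity is the exponential of the average per-token loss.
    return torch.exp(loss).item()

# Formal, highly conventional prose tends to score lower than unusual wording.
print(perplexity("We the People of the United States, in Order to form a more perfect Union..."))
print(perplexity("Purple umbrellas negotiate loudly with the calendar of broken mirrors."))
```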
For example, suppose we try to guess the next word in the phrase "I'd like a cup of _." Most people would fill in the blank with "water," "coffee," or "tea." A language model trained on a large amount of English text will do the same, because those phrases appear constantly in English writing, and any of those completions yields low perplexity.
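As a rough illustration of that guessing game, the sketch below (again using GPT-2 as a stand-in model) asks which tokens are most likely to follow the prompt. The exact probabilities will differ from model to model; the point is only that everyday completions dominate the distribution, which is what gives such phrases low perplexity.

```python
# Ask a language model for the most likely next tokens after a common phrase.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "I'd like a cup of"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Probability distribution over the token that comes right after the prompt.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r}: {p.item():.3f}")
# Expect ordinary completions such as " coffee" or " tea" near the top.
```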
The other attribute GPTZero measures is "burstiness," the tendency of certain words or phrases to appear in rapid succession, or in "bursts," within a text. In essence, burstiness evaluates the variability of sentence length and structure across a passage. Human writers often have a dynamic style that produces variable sentence lengths and structures, while AI-generated text tends to be more consistent and uniform. Burstiness is not a foolproof indicator of AI-generated content, however. Like perplexity, it has exceptions: a human writer may write in a highly structured, consistent style and earn a low burstiness score, while an AI model can be trained to mimic more human-like variability in sentence length and structure, raising its burstiness score. In fact, as AI language models improve, studies show that their writing looks more and more like human writing.
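As an illustration of the general idea only (not GPTZero's actual formula, which the article does not disclose), a burstiness-style signal can be approximated as the spread of sentence lengths in a passage. The example texts below are invented for demonstration.

```python
# Crude burstiness proxy: how much sentence length varies across a passage.
import re
import statistics

def burstiness(text: str) -> float:
    # Split on sentence-ending punctuation and count words per sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    # A higher standard deviation means sentence lengths vary more,
    # which this heuristic treats as more "bursty" and more human-like.
    return statistics.stdev(lengths)

uniform = "The cat sat on the mat. The dog sat on the rug. The bird sat on the roof."
varied = ("It rained. After weeks of drought, the storm broke suddenly over the "
          "hills and did not stop until morning. Everyone was relieved.")
print(burstiness(uniform))  # low: sentences are all about the same length
print(burstiness(varied))   # higher: sentence lengths vary a lot
```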