AI can't beat AI: the ChatGPT detector frequently wrongs innocent students, and 2.1 million teachers are using it


2026-02-14 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

What can you do when you are wronged by AI?

How would you feel if an AI accused you of "cheating"?

This happened to Lucy Goetz, a high school senior. She wrote an original essay on socialism that earned the highest possible grade.

However, Turnitin's AI writing detector flagged the end of Goetz's essay as generated by ChatGPT.

"I'm glad I have a good relationship with my teachers," said Goetz, who was shocked by the result.

In short, it is a good thing her teacher knows her; otherwise she would never have been able to clear her name.

What is even more surprising is that the ChatGPT detector is now used by 2.1 million teachers.

AI can't beat AI

The flagged portion of Goetz's paper is an outlier, but it shows that detectors sometimes make mistakes.

Obviously, AI can't beat AI. This can have disastrous consequences for many students.

To test the Turnitin detector, Washington Post columnist Geoffrey A. Fowler enlisted five high school students, including Goetz.

They created 16 sample papers covering human-written, AI-generated, and mixed-source text.

What was the result?

Turnitin's detector made errors on more than half of the samples. It accurately identified only 6 of the 16 and failed outright on 3, including wrongly flagging 8 percent of Goetz's original paper.

For the remaining seven papers, Fowler said, "I will only give it a partial score because it is roughly correct in judgment, but misidentifies some writing parts from ChatGPT-generated or mixed sources."
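The tally above can be checked with a few lines of arithmetic. The sketch below simply adds up the three outcome counts reported in the article (6 correct, 7 partially correct, 3 failed); the dictionary keys and the `share` helper are illustrative names, not part of Fowler's test.

```python
# Outcome counts for the 16 sample papers, as reported in the article.
results = {"correct": 6, "partially_correct": 7, "failed": 3}

total = sum(results.values())
at_least_partly_wrong = results["partially_correct"] + results["failed"]


def share(count: int, out_of: int) -> float:
    """Return count as a percentage of out_of, rounded to one decimal place."""
    return round(100 * count / out_of, 1)


print(total)                                 # 16
print(share(results["correct"], total))      # 37.5
print(share(at_least_partly_wrong, total))   # 62.5
```

On this small sample, only 37.5 percent of the papers were judged fully correctly. Sixteen papers are far too few to estimate a true error rate, so the figure is best read as a warning sign rather than a measurement.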

However, Turnitin claims that the overall accuracy of its detector is 98 percent. The company also said that in its own tests, cases like Goetz's paper, known as false positives, occurred less than 1 percent of the time.

Turnitin's AI detector detail page assigns a total score and highlights sentences suspected of being generated by AI. The company said it deliberately marked paragraphs suspected of being generated by AI in blue rather than red, and linked teacher resources below the score.

Rebecca Dell, Goetz's AP English teacher in Concord, Calif., says it is worrying that Turnitin's system for flagging AI text is not always reliable.

Unlike an accusation of plagiarism, an accusation of AI cheating comes with no source document as evidence, which can easily bias teachers against students.

Maybe not everyone is as lucky as Goetz.

"It is particularly frightening for students to be accused of AI cheating," Goetz said. Unless your teacher knows your writing style or trusts you very much, there is no way to prove that you have not cheated.

Why AI detection is so difficult

Since the advent of ChatGPT, students and teachers at many colleges and universities have used it in daily homework and teaching.

However, if left unrestricted, ChatGPT could become the most powerful cheating tool in history, helping students do their homework or even complete exam papers.

As a countermeasure, teachers have come to want an easy-to-use detector. Edward Tian, a 22-year-old Princeton student, developed his own detector, GPTZero.

Even OpenAI officially announced a new detection tool called AI Text Classifier.

However, the performance of these detectors is not satisfactory.

Detecting content created by AI sounds simple. But given a human-written email and a ChatGPT-generated email, most of us can hardly tell them apart.

Eric Wang, vice president of artificial intelligence at Turnitin, says detecting AI writing with software comes down to statistics. From a statistical point of view, what sets AI apart from humans is that it is extremely consistently average.

To put it bluntly, AI output stays at a stable, average level. In practice, however, the distinction is not that clean.

A system like ChatGPT is like an advanced version of autocomplete, looking for the next most probable word to write. "This is actually why it reads so naturally. AI writing is the most probable subset of human writing," Wang says.

Turnitin's detector "identifies when writing is too consistently average." The challenge is that human writing can sometimes look consistently average too.
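The "advanced autocomplete" idea, and the reason formulaic human writing can look machine-made, can be illustrated with a toy model. This is a minimal sketch, not Turnitin's or ChatGPT's actual method: it assumes a tiny made-up corpus, counts which word most often follows each word, and scores a text by how often it matches that top guess. All function names and sample sentences are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict:
    """Count, for every word, which words follow it in the corpus."""
    words = corpus.lower().split()
    follow = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follow[prev][nxt] += 1
    return follow


def next_word(follow: dict, prev: str):
    """Autocomplete step: the most frequent follower of prev, or None."""
    counts = follow.get(prev.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def predictability(follow: dict, text: str) -> float:
    """Fraction of word transitions in text that match the model's top guess."""
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for prev, nxt in pairs if next_word(follow, prev) == nxt)
    return hits / len(pairs)


# Made-up training text in a rigid, report-like style (an assumption).
corpus = (
    "the report shows that demand rises "
    "the report shows that supply falls "
    "the report shows that demand rises"
)
model = train_bigrams(corpus)

formulaic = "the report shows that demand rises"  # human, but formulaic
unusual = "demand shows the supply that rises"    # human, and idiosyncratic

print(next_word(model, "that"))          # demand
print(predictability(model, formulaic))  # 1.0
print(predictability(model, unusual))    # 0.0
```

In this toy setup, a person who writes in the corpus's fixed style scores 1.0, exactly what a model that always picks the likeliest word would produce. A score like this cannot tell the two apart, which is the false-positive risk described above.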

In economics, math and laboratory reports, students tend to follow a fixed writing style, which means they are more likely to be mistaken for AI writing.

This may be why Turnitin mistakenly flagged Goetz's paper, which touched on economics.

Wang says Turnitin has tuned its system to require higher confidence before flagging a sentence as AI-generated, in order to reduce mistakes of this kind.
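The trade-off behind that adjustment can be sketched with made-up numbers. The example assumes a detector that outputs a confidence score for each sentence and flags anything at or above a threshold; the scores, labels, and `evaluate` function are hypothetical and are not Turnitin data.

```python
# Hypothetical (score, truly_ai) pairs. Each score is a made-up detector
# confidence between 0 and 1; truly_ai says who actually wrote the sentence.
samples = [
    (0.95, True), (0.88, True), (0.70, True), (0.55, True),
    (0.85, False), (0.60, False), (0.30, False), (0.10, False),
]


def evaluate(samples, threshold):
    """Return (false_positives, missed_ai) when flagging scores >= threshold."""
    false_positives = sum(
        1 for score, is_ai in samples if score >= threshold and not is_ai
    )
    missed_ai = sum(
        1 for score, is_ai in samples if score < threshold and is_ai
    )
    return false_positives, missed_ai


for threshold in (0.5, 0.8, 0.9):
    print(threshold, evaluate(samples, threshold))
# 0.5 (2, 0)
# 0.8 (1, 2)
# 0.9 (0, 3)
```

Raising the threshold in this toy data removes the false accusations against human writers, but more AI-written sentences slip through. No single setting eliminates both kinds of error.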

Wang also said that the software has made great progress. When Fowler first tested Goetz's paper in late January, the software identified about 50 percent of it as generated by AI. When Turnitin ran his samples through its system again in late March, it flagged only 8 percent of Goetz's paper as generated by AI.

Turnitin detectors also face other important technical limitations.

The six samples it got completely right were all clear-cut cases: either 100 percent student work or 100 percent generated by ChatGPT.

But when tested on papers that mixed AI and human sources, it often misidentified individual sentences or missed the human part altogether. It also could not find traces of ChatGPT in papers processed by Quillbot, a paraphrasing program that rewrites sentences.

In addition, the Turnitin detector may already lag behind the current state of artificial intelligence technology.

ChatGPT, for example, is now powered by GPT-4, which gives it greater creative and stylistic ability.

"I think detectors are unreliable in the long run," said Jim Fan, a scientist at Nvidia. Artificial intelligence will get better and better, and will write in an increasingly human-like way. It is safe to say that, over time, the little quirks of these language models will diminish.

Is AI detection a good idea?

Why release artificial intelligence detectors when there is a potential for error (even if it's only 1 percent)?

"Teachers want a deterrent," said Annie Chechitelli, Turnitin's chief product officer. However, some educators worry that this will actually increase students' anxiety.

On April 4, Turnitin activated the ChatGPT detector for about 10,700 secondary and higher education institutions, giving student assignments a "generated by AI" score and analyzing them sentence by sentence.

Mitchel Sollenberger, associate provost for digital education at the University of Michigan-Dearborn, asked Turnitin not to activate AI detection for his campus at the initial release.

He is concerned that, with about 20,000 student papers passing through Turnitin each semester, teachers may report false results, leading to unfounded academic integrity investigations. Teachers should not have to become experts in third-party software systems.
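A rough calculation shows the scale of the concern. The sketch below combines the roughly 20,000 papers per semester cited above with Turnitin's stated false-positive rate of under 1 percent. It assumes, purely for illustration, that every paper is honest human work and that the rate applies per paper; the function name is hypothetical.

```python
def expected_false_flags(human_papers: int, false_positive_rate: float) -> float:
    """Expected number of honest papers wrongly flagged as AI-generated."""
    return human_papers * false_positive_rate


papers_per_semester = 20_000  # figure cited in the article
claimed_fp_rate = 0.01        # upper bound: Turnitin says "less than 1 percent"

print(expected_false_flags(papers_per_semester, claimed_fp_rate))  # 200.0
```

Under these assumptions, even a detector that meets its stated false-positive rate could wrongly flag up to about 200 honest papers on a single campus each semester.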

Reference:

https://www.washingtonpost.com/technology/2023/04/01/chatgpt-cheating-detection-turnitin/

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era).
