
Latest research: 61% of English essays written by Chinese speakers are judged AI-generated by GPT detectors.

2025-01-15 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 11/24 Report --

Writing by non-native speakers = AI-generated? It's enough to make your blood run cold.

Since ChatGPT took off, it has been put to all sorts of uses.

Some people use it for life advice, some simply treat it as a search engine, and some use it to write papers.

Paper writing, though, is where the trouble starts.

Several universities in the United States have banned students from using ChatGPT for coursework, and a raft of tools has been developed to identify whether submitted papers were generated by GPT.

There's a problem here.

Some people's essays are written plainly enough that the text-judging AI mistakes them for the work of its own kind.

Worse still, English essays written by Chinese speakers are flagged as AI-generated as much as 61% of the time.

What does that even mean? Chilling.

Non-native speakers don't deserve fair treatment? Generative language models are advancing rapidly and have genuinely transformed digital communication.

But they are also widely abused.

Although researchers have proposed a number of detection methods to distinguish AI-generated from human-written content, the fairness and robustness of these detectors remain open problems.

To that end, the researchers evaluated the performance of several widely used GPT detectors on writing samples from native and non-native English speakers.

The results show that the detectors consistently misclassify non-native speakers' writing as AI-generated, while native speakers' samples are identified accurately.

The researchers further showed that simple prompting strategies can both mitigate this bias and effectively bypass GPT detectors.

What does that mean? It means GPT detectors penalize authors with limited linguistic range, which is infuriating.

It calls to mind the game where you guess whether your chat partner is human or AI: if the other side is a real person but you guess AI, the system warns, "The other person may feel offended."

Not complex enough = AI-generated? The researchers collected 91 TOEFL essays from a Chinese education forum and 88 essays written by American eighth graders from a Hewlett Foundation dataset, and used them to test seven widely used GPT detectors.

The percentages in the chart are misclassification rates: the share of human-written essays that the detection software judged to be AI-generated.
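To make the metric concrete, here is a minimal sketch (not the authors' code) of how such a rate could be computed; `detector_predicts_ai` is a hypothetical stand-in for any of the seven commercial detectors, which in practice are queried via their own interfaces.

```python
# Minimal sketch (not the authors' code) of the chart's metric: the
# misclassification rate is the fraction of human-written essays a detector
# flags as AI-generated. `detector_predicts_ai` is a hypothetical stand-in
# for any of the seven detectors tested.
from typing import Callable, List

def misclassification_rate(
    human_essays: List[str],
    detector_predicts_ai: Callable[[str], bool],
) -> float:
    """Fraction of human-written essays wrongly flagged as AI-generated."""
    flagged = sum(1 for essay in human_essays if detector_predicts_ai(essay))
    return flagged / len(human_essays)

# Toy example: a dummy detector that flags low-vocabulary (low-complexity) text.
dummy_detector = lambda essay: len(set(essay.lower().split())) < 120
print(misclassification_rate(["An essay written by a human ..."], dummy_detector))
```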

The gap between the two groups is stark.

Across the seven detectors, the highest misclassification rate for the American eighth-graders' essays is only 12%, and two detectors misclassified none at all.

For the TOEFL essays from the Chinese forum, the misclassification rates are mostly above 50%, peaking at 76%.

Eighteen of the 91 TOEFL essays were unanimously flagged as AI-generated by all seven detectors, and 89 of the 91 were misclassified by at least one.

As the figure above shows, the complexity of the TOEFL essays misjudged by all seven detectors is significantly lower than that of the others.

This confirms the point above: GPT detectors are biased against authors with limited linguistic range.
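The "complexity" score here corresponds to what perplexity-based detectors measure: how predictable a text's word choices are under a language model, with formulaic writing scoring low. Below is a minimal sketch of that measurement using GPT-2 from Hugging Face transformers; the model choice and single-pass scoring are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: estimating text "complexity" as language-model perplexity,
# the signal perplexity-based GPT detectors rely on. Using GPT-2 here is an
# assumption for illustration, not necessarily the paper's setup.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Lower perplexity = more predictable, 'simpler' text."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return the mean token cross-entropy.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

print(perplexity("The essay text to score goes here."))
```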

The researchers therefore argue that GPT detectors should be exposed to more writing by non-native speakers, with larger and more diverse samples, to eliminate this bias.

Next, the researchers fed the non-native speakers' TOEFL essays into ChatGPT with instructions to enrich the language and imitate the word choices of native speakers.

As a control, the American eighth-graders' essays were likewise fed into ChatGPT, this time with instructions to simplify the language and imitate the style of non-native writers. The figure below shows the detectors' verdicts on the revised texts.

The situation reversed completely: the misclassification rate for the language-enriched TOEFL essays plummeted to between 3% and 19%, a far cry from the earlier rates of half or more.

Conversely, the misclassification rate for the simplified eighth-grade essays skyrocketed; one detector reached 99%, getting essentially all of them wrong.

The reason: the key variable, text complexity, had changed.

From this the researchers concluded that non-native speakers' writing tends to be less idiomatic and lower in complexity, and is therefore easily misclassified.
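For illustration, the two rewriting interventions just described could be scripted roughly as follows with the OpenAI Python SDK; the prompts and model name are paraphrased assumptions, not the researchers' verbatim instructions.

```python
# Hedged sketch of the two rewriting interventions described above.
# Prompts and model name are illustrative assumptions, not the researchers'
# verbatim prompts. Requires OPENAI_API_KEY in the environment (OpenAI SDK v1.x).
from openai import OpenAI

client = OpenAI()

def rewrite(text: str, instruction: str) -> str:
    """Ask the chat model to rewrite `text` per `instruction`."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"{instruction}\n\n{text}"}],
    )
    return resp.choices[0].message.content

toefl_essay = "..."  # a non-native speaker's TOEFL essay
us_essay = "..."     # an American eighth-grader's essay

# Enrich non-native writing to mimic native word choices:
enriched = rewrite(toefl_essay, "Enhance the word choices to sound more like a native English speaker.")
# Control: simplify native writing to mimic non-native style:
simplified = rewrite(us_essay, "Simplify the word choices as a non-native speaker might write.")
```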

This raises a technical question as well as a question of values: is judging authorship by complexity alone reasonable, comprehensive, and rigorous?

Obviously not, it turns out.

Judged on complexity, non-native speakers lose out simply for being non-native speakers (obviously).

AI-polished = written by humans? The researchers argue that increasing linguistic diversity not only reduces bias against non-native speakers but also lets GPT-generated content slip past GPT detectors.

To prove the point, they took 2022-2023 US college admission essay prompts and fed them into ChatGPT-3.5, generating 31 fake essays.

The GPT detectors held up in the first round but not in the second, in which the researchers fed the essays back into ChatGPT to polish them, elevating the text with more literary language.

As a result, detector accuracy dropped from 100% to 0%, as shown below:

The complexity of the polished essays rose accordingly.
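A sketch of that two-round "generate, then self-polish" pipeline might look like this; again, the prompts, topic, and model name are illustrative assumptions rather than the study's exact inputs.

```python
# Hedged sketch of the "generate, then self-polish" bypass pipeline.
# Prompts, topic, and model name are illustrative assumptions. Requires
# OPENAI_API_KEY in the environment (OpenAI SDK v1.x).
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Round 1: generate a fake admission essay from a prompt (placeholder topic).
draft = ask("Write a college admission essay on: describe a challenge you overcame.")
# Round 2: self-polish with more literary language, the step the study found
# drops detector accuracy sharply.
polished = ask("Elevate the following text by employing literary language:\n\n" + draft)
```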

In a parallel experiment, the researchers had ChatGPT generate abstracts from the titles of 145 Stanford University final-project papers.

After polishing, detector accuracy again fell.

Once again the conclusion: polished text is easily misjudged as human-written even when it is entirely AI-generated, and two rounds of ChatGPT beat one.

GPT detectors? Not ready for prime time.

All in all, the various GPT detectors still seem unable to grasp the essential difference between AI-generated and human-written text.

Human writing itself spans a wide range of quality, so judging by complexity alone is simply not sound.

Biased factors aside, the technology itself needs to be improved.

Reference:

https://arxiv.org/pdf/2304.02819.pdf

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era).
