
To reduce the risks of ChatGPT, OpenAI sets up a "red team"


Shulou (Shulou.com) 11/24 Report --

April 14, Beijing time: Elon Musk once said that artificial intelligence (AI) is more dangerous than nuclear weapons. To reduce the risks of ChatGPT, OpenAI has set up a "red team".

"Red team" is a term in the AI field, which refers to the attacker who conducts penetration testing. "Red team" attacks, AI defends, through confrontation testing to find out the shortcomings of AI, and then make improvements.

Last year, OpenAI hired 50 scholars and experts to test its latest model, GPT-4. Over six months, this "red team" subjected the new model to "qualitative probing and adversarial testing", trying to "break" it.

Nerve agent

Andrew White, a professor of chemical engineering at the University of Rochester, is one of the experts hired by OpenAI. When he was given access to GPT-4, he used it to propose an entirely new nerve agent.

White said he had asked GPT-4 to suggest a compound that could be used as a chemical weapon, and used "plug-ins" to feed the model new sources of information, such as scientific papers and directories of chemical manufacturers. GPT-4 then even found a place where the compound could be made.

"I think it will give everyone a tool to do chemical reactions faster and more accurately," he said. "but there are also big risks for people. To do dangerous chemical reactions. Now, this situation does exist."

This alarming finding enabled OpenAI to ensure that such dangerous results would not appear when GPT-4 was released more widely to the public last month.

The red-team exercise is designed to address widespread concerns about the dangers of deploying powerful AI systems in society. The team's job is to ask probing or dangerous questions in order to test a tool that responds to human queries with detailed, nuanced answers.

OpenAI wanted to surface problems such as toxicity, prejudice and linguistic bias in the model. The red team therefore probed it for falsehoods, verbal manipulation and dangerous scientific know-how. They also examined GPT-4's potential to aid and abet illegal activities such as plagiarism, financial crime and cyber attacks, and how it might endanger national security and battlefield communications.

The team included white-collar professionals from a wide range of fields, among them scholars, teachers, lawyers, risk analysts and security researchers, mainly based in the United States and Europe.

Their findings were fed back to OpenAI, which used them to mitigate risks and "retrain" GPT-4 before releasing it more widely. Over several months, the experts each spent 10 to 40 hours testing the model. Several interviewees said most of them were paid about $100 an hour (roughly RMB 687 at current exchange rates).

Red-team members expressed concern about the rapid development of language models, and especially about the risk of connecting them to external sources of knowledge through plug-ins. "Today the system is frozen, which means it no longer learns and has no memory," said José Hernández-Orallo, a professor at the AI institute in Valencia and a member of the GPT-4 red team. "But what if we give it access to the internet? It could be a very powerful system connected to the whole world."

OpenAI said it takes security very seriously and tested the plug-ins before release, and that it will update GPT-4 regularly as more and more people use it.

Roya Pakzad, a technology and human rights researcher, used English and Farsi prompts to test the model for gendered responses, racial preferences and bias, particularly with regard to headscarves. Pakzad acknowledged that the tool could benefit non-native English speakers, but found that even in later versions the model displayed overt stereotypes about marginalized communities.

She also found that the so-called AI "hallucinations" were worse when she tested the model in Farsi. A "hallucination" is a chatbot response containing fabricated information. Compared with English, GPT-4 produced a higher proportion of made-up names, numbers and events in Farsi. "I worry about the diminishing of linguistic diversity and the culture behind languages," she said.
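
As a toy illustration of how a tester might quantify hallucinations across languages, the sketch below computes a "fabrication rate" over claims extracted from a model answer. The claims, reference facts and numbers are invented for illustration; this is not Pakzad's actual methodology, and real evaluations are far more careful about what counts as a supported claim.

```python
def fabrication_rate(claims: list[str], reference: set[str]) -> float:
    """Fraction of extracted claims that the reference set does not support."""
    if not claims:
        return 0.0
    unsupported = [c for c in claims if c not in reference]
    return len(unsupported) / len(claims)

# Hypothetical claims extracted from the same question asked in two languages,
# checked against a hand-built ground-truth set for that question.
reference = {"Apollo 11", "1969", "Neil Armstrong", "Buzz Aldrin"}
english_claims = ["Apollo 11", "1969", "Neil Armstrong"]
farsi_claims = ["Apollo 11", "1969", "Neil Armstrong", "a made-up co-pilot"]

print(fabrication_rate(english_claims, reference))  # 0.0
print(fabrication_rate(farsi_claims, reference))    # 0.25
```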

Boru Gollo, a lawyer in Nairobi and the only African tester on the red team, also noticed the model's discriminatory tone. "When I tested the model, it was like a white person talking to me," Gollo said. "If you asked about a particular group, it would give you a biased viewpoint or a very prejudiced answer." OpenAI has also acknowledged that GPT-4 still exhibits bias.

Red-team members who evaluated the model from a national-security perspective held differing views about how safe the new model is. Lauren Kahn, a researcher at the Council on Foreign Relations, said that when she began examining how the technology might be used in cyber attacks on military systems, she "didn't expect it to describe the process in such detail that I would only need to fine-tune it".

However, Kahn and other security testers found that as testing progressed, the model's responses became considerably safer. OpenAI said that before launch it had trained GPT-4 to refuse malicious cyber-security requests.

Many red-team members said OpenAI had conducted a rigorous safety assessment before the release. Maarten Sap, an expert on language-model toxicity at Carnegie Mellon University, said: "They have done a pretty good job of getting rid of the obvious toxicity in these systems."

Sap examined how the model portrayed different genders and found that its biases reflected social disparities. However, he also found that OpenAI had made some active, politically charged choices to counteract this.

Nevertheless, since the launch of GPT-4, OpenAI has faced widespread criticism, including a complaint to the Federal Trade Commission from a technology-ethics group alleging that GPT-4 is "biased, deceptive, and a risk to privacy and public safety".

Plug-in risk

Recently, OpenAI introduced a feature called ChatGPT plug-ins. Through it, applications from partners such as Expedia, OpenTable and Instacart give ChatGPT access to their services, allowing it to book trips and order goods on behalf of human users.
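
As a rough sketch of how such a plug-in loop can work, the code below shows a host program letting a model trigger an external booking action. The `book_table` stub, the `model_reply` stub and the JSON action format are all invented for illustration; this is not OpenAI's actual plug-in protocol, only the general shape of a model acting on a user's behalf through an external service.

```python
import json

def book_table(restaurant: str, party_size: int) -> dict:
    # Stub partner service; a real plug-in would call the partner's API over HTTP.
    return {"status": "confirmed", "restaurant": restaurant, "seats": party_size}

# Tools the host allows the model to invoke.
TOOLS = {"book_table": book_table}

def model_reply(user_message: str) -> str:
    """Stub model that decides to invoke a tool, emitted as a JSON action."""
    return json.dumps({"tool": "book_table",
                       "args": {"restaurant": "Example Bistro", "party_size": 2}})

def run_turn(user_message: str) -> dict:
    action = json.loads(model_reply(user_message))  # model picks a tool
    tool = TOOLS[action["tool"]]                    # host resolves the request
    return tool(**action["args"])                   # and acts for the user

print(run_turn("Book me a table for two tonight."))
```

The security concern discussed below follows directly from this design: whatever the host wires into the tool table, the model can trigger.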

Dan Hendrycks, an AI safety expert on the red team, said plug-ins risk a world in which humans are "out of the loop". "What if a chatbot could post your private information online, access your bank account, or send the police to your house?" he said. "Overall, we need much stronger security assessments before we let AI harness the power of the internet."

Interviewees also warned that OpenAI should not stop security testing just because its software is live. Heather Frase, who works at Georgetown University's Center for Security and Emerging Technology, tested GPT-4's ability to assist in crime. The risks will keep growing as more people use the technology, she said.

"you do operational tests because once they are actually used in the real world, they behave differently." Fraser said. She believes that a public ledger should be created to report events caused by large language models, similar to cyber security or consumer fraud reporting systems.

Sarah Kingsley, a labor economist and researcher, suggested that the best solution is to advertise the harms and risks clearly, "like a nutrition label". "There should be a framework for knowing what the frequent problems are, so that you have a safety valve," she said. "That's why I say the work is never done."
