LeCun strongly recommends a Harvard PhD's guide to using GPT-4 for scientific research, down to each step of the workflow.


Using GPT-4 for scientific research may one day be standard for everyone, but it takes skill to use LLM tools efficiently. Recently, a Harvard PhD shared his experience, and LeCun recommended it.

The emergence of GPT-4 has made many people worry about their own research, some even joking that NLP no longer exists as a field.

Instead of worrying, it is better to put it to work in your research and simply find a new way to compete.

Kareem Carr, a biostatistics PhD at Harvard University, says he has used large language model tools such as GPT-4 in his academic research.

He says these tools are very powerful, but there are also some very painful traps.

His thread about using LLMs was even recommended by LeCun.

Let's take a look at how Kareem Carr wields AI as a tool for scientific research.

First principle: don't ask an LLM for anything you can't verify. Carr starts with the single most important principle:

Never ask a large language model (LLM) for information that you cannot verify by yourself, or ask it to perform tasks that you cannot verify that it has done correctly.

The only exception is when the task is not critical, such as asking an LLM for ideas on decorating your apartment.

For example, "Using best practices for literature reviews, summarize breast cancer research over the past 10 years" is a poor request, because you cannot directly verify whether it has summarized the literature correctly.

Instead, ask: "Give me a list of the top review articles on breast cancer research from the past 10 years."

With a prompt like this, you can not only verify the sources but also assess their reliability yourself.

Tips for writing prompts: It is very easy to ask an LLM to write code or find information for you, but the quality of the output can vary greatly. You can take the following measures to improve it:

Set the context: explicitly tell the LLM what information it should use.

Use terminology and notation that steer the LLM toward the right context.

If you have an idea of how the request should be handled, tell the LLM the specific method to use. For example, "solve this inequality" should become "solve this inequality using the Cauchy-Schwarz inequality, and then apply completing the square".
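To see why naming the methods helps, here is a hedged worked illustration with a hypothetical inequality (Carr does not give the actual one), following exactly those two named steps:

```latex
% Hypothetical target (not from Carr's thread): show (a+b)^2 \le 2(a^2+b^2).
% Step 1, Cauchy-Schwarz applied to the vectors (1,1) and (a,b):
(a+b)^2 = (1 \cdot a + 1 \cdot b)^2 \le (1^2 + 1^2)(a^2 + b^2) = 2(a^2 + b^2).
% Step 2, completing the square confirms the same bound directly:
2(a^2 + b^2) - (a+b)^2 = a^2 - 2ab + b^2 = (a-b)^2 \ge 0.
```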

Keep in mind that these models are far more linguistically sophisticated than you might think, so even very vague prompts can help.

Be more specific: this is not a Google search, so don't worry about whether a website exists that discusses your exact problem.

"how to solve the simultaneous equation of quadratic term? "the hint is not clear. You should ask:" solve the equations of an and b for x = (1max 2) (aqb) and y = (1max 3) (a ^ 2 + ab+ b ^ 2).

Define the output format: take advantage of the flexibility of LLMs to format the output in a way that works best for you, such as:

Code

Mathematical formula

Article

Tutorial

Concise guide

You can even ask for code that generates these outputs, including tables, plots, and charts.
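For example, here is a hedged sketch of the kind of chart-generating code you might ask an LLM for and then run yourself; the data and labels are made up purely for illustration.

```python
# Hypothetical example of LLM-requested plotting code: a labelled line chart.
import matplotlib.pyplot as plt

years = [2014, 2016, 2018, 2020, 2022]   # illustrative x values
papers = [120, 180, 260, 410, 630]       # illustrative y values

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(years, papers, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Number of review articles (illustrative)")
ax.set_title("Chart produced from LLM-written code")
fig.tight_layout()
plt.show()
```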

Getting output from the LLM is only the beginning, because you then need to verify it. That includes:

Find inconsistencies

Google terms from the output to find trustworthy sources that support it

Where possible, write code to test it yourself.

Self-verification matters because LLMs often make strange mistakes that are out of keeping with how expert they seem. For example, an LLM may mention a very advanced mathematical concept yet get confused by simple algebra.
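As a concrete, hypothetical example of "write code to test it yourself": suppose the LLM asserted a closed form for the sum of the first n squares. A quick randomized check like the sketch below will catch the kind of simple algebra slips Carr describes.

```python
# Minimal sketch: numerically spot-check an algebraic identity an LLM claimed.
# Hypothetical claim: sum_{k=1}^{n} k^2 == n*(n+1)*(2n+1)/6.
import random

def claimed_formula(n: int) -> int:
    return n * (n + 1) * (2 * n + 1) // 6

def direct_sum(n: int) -> int:
    return sum(k * k for k in range(1, n + 1))

# Any mismatch on a random input would flag the claim as wrong.
for _ in range(1_000):
    n = random.randint(1, 500)
    assert claimed_formula(n) == direct_sum(n), f"claim fails at n={n}"
print("claim held on all sampled inputs")
```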

Ask again:

The content generated by large language models is stochastic. Sometimes opening a new session and asking your question again will get you a better answer.

In addition, you can use multiple LLM tools. Kareem Carr currently uses Bing AI, GPT-4, GPT-3.5, and Bard AI in his research, depending on his needs; each has its own advantages and disadvantages.

Citations and productivity

In Carr's experience, it is best to ask GPT-4 and Bard AI the same mathematical question at the same time to get different points of view. Bing AI is best suited to web searches. GPT-4 is much smarter than GPT-3.5, but OpenAI currently limits it to 25 messages every 3 hours, which makes it harder to access.

When it comes to citations, references are a particular weak point of LLMs: sometimes the references an LLM gives you exist, and sometimes they don't.

Earlier, a netizen ran into the same problem. He said he asked ChatGPT for references on the mathematical properties of lists, but ChatGPT generated non-existent citations, the so-called "hallucination" problem.

However, Kareem Carr points out that false citations are not entirely useless.

In his experience, the words in fabricated references are usually related to real terms and to researchers in the relevant field, so Googling those terms usually brings you closer to the information you are looking for.

In addition, Bing is also a good choice when searching for sources.

Productivity

There are many unrealistic claims about how much LLMs improve productivity, such as "LLMs can make you 10 or even 100 times more productive."

In Carr's experience, that kind of speed-up only makes sense if you never double-check the work, which would be irresponsible for an academic.

Even so, LLMs have greatly improved Kareem Carr's academic workflow, including:

- Prototyping ideas
- Identifying dead-end ideas
- Accelerating tedious data-reformatting tasks
- Learning new programming languages, packages, and concepts
- Google searches

With the help of current LLMs, Carr says, he spends less time figuring out what to do next; LLMs help him push vague or incomplete ideas toward complete solutions.

LLMs also reduce the time Carr spends on side tasks unrelated to his main goals.

I found that I was entering a state of flow and could just keep going, which means I can work for longer stretches without getting tired.

One last piece of advice: be careful not to get pulled into side quests. The sudden jump in productivity these tools give you can be intoxicating and distracting.

As for his experience with ChatGPT, Carr posted on LinkedIn to share his impressions of using it:

As a data scientist, I have been experimenting with OpenAI's ChatGPT for several weeks. It is not as good as people think.

Despite the initial disappointment, my feeling is that a system like ChatGPT can add great value to the standard data analysis workflow.

At this point, the value is not obvious. It's easy for ChatGPT to get some details wrong with simple things, and it simply doesn't solve problems that require multiple reasoning steps.

The main question for each new task in the future will still be whether it is easier to evaluate and improve ChatGPT's solution, or to start from scratch.

I did find that even a bad ChatGPT solution tends to activate the relevant parts of my brain in a way that starting from scratch does not.

As they always say, it's easier to criticize a plan than to come up with one yourself.

On the need to verify AI output, one netizen remarked that in most cases the AI is about 90% correct, but the remaining 10% of mistakes can be fatal.

Carr joked that if it were 100% correct, he wouldn't have a job.

So why does ChatGPT generate fake references?

It is worth noting that ChatGPT uses a statistical model to guess, based on probability, the next word, sentence, and paragraph that best matches the context the user provides.

Because the language model's source data is so large, it has to be "compressed", and the resulting statistical model loses accuracy.

This means that even when the original data contains true statements, the model's "distortion" introduces a kind of "fuzziness", leading it to produce the most "plausible-sounding" statements.

In short, the model does not have the ability to assess whether the output it produces is equivalent to a true statement.
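To make the mechanism concrete, here is a toy sketch (an assumed, hand-written next-word table, nothing like ChatGPT's real model) of how sampling word by word from estimated probabilities can stitch together a fluent-looking citation whether or not the paper exists:

```python
# Toy sketch: a word-by-word sampler over an assumed probability table.
# It produces plausible-looking "citations" with no notion of whether they are real.
import random

next_word_probs = {
    "Smith": {"et": 1.0},
    "et": {"al.": 1.0},
    "al.": {"(2019)": 0.4, "(2020)": 0.35, "(2021)": 0.25},
}

def sample_next(word: str) -> str:
    """Pick the next word at random, weighted by its estimated probability."""
    candidates, weights = zip(*next_word_probs[word].items())
    return random.choices(candidates, weights=weights, k=1)[0]

tokens = ["Smith"]
while tokens[-1] in next_word_probs:
    tokens.append(sample_next(tokens[-1]))
print(" ".join(tokens))  # e.g. "Smith et al. (2020)", regardless of whether it exists
```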

In addition, the model is trained on public web data crawled by the nonprofit organization Common Crawl and similar sources, with data collected up to 2021.

Because data on the public web is largely unfiltered, it may contain a great deal of misinformation.

Recently, an analysis by NewsGuard found that GPT-4 is actually more likely than GPT-3.5 to generate misinformation, and that its responses are more detailed and convincing.

In January, NewsGuard tested GPT-3.5 for the first time and found that it generated 80 of 100 false narratives. In a follow-up test in March, GPT-4 produced false and misleading responses to all 100 false narratives.

Clearly, verifying sources and testing outputs are necessary parts of using LLM tools.

References:

https://twitter.com/kareem_carr/status/1640003536925917185

https://scholar.harvard.edu/kareemcarr/home

https://www.newsguardtech.com/misinformation-monitor/march-2023/

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era).
