
The future of programming belongs to "pseudocode". Nature column: three ways to accelerate scientific research coding with ChatGPT

2025-01-19 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

ChatGPT makes scientific research programming easier!

With the emergence of generative artificial intelligence tools such as the chatbots ChatGPT and Bard, how to use AI in academic research has become highly controversial; at the same time, the value of AI-generated code for scientific research has been largely overlooked.

Compared with the plagiarism concerns around ChatGPT-generated text, copying AI-generated code is far less controversial, and open science even encourages "code sharing" and "code reuse". Provenance is also easy to trace: for example, using "import" in Python to pull in a dependency package effectively cites it.
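A minimal sketch of that point, using only the standard library: the explicit import line names the dependency, so the provenance of reused code is visible in the code itself.

```python
# The import statement itself records where the functionality comes from,
# much like a citation: anyone reading the code can trace the dependency.
import statistics  # standard-library package, named explicitly


def summarize(values):
    """Return the mean and sample standard deviation of values."""
    return statistics.mean(values), statistics.stdev(values)
```
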

In a recent comment article in Nature Ecology & Evolution, the author team discussed three ways ChatGPT can help scientific programming: brainstorming, breaking down complex tasks, and handling simple but time-consuming tasks.

Article link: https://www.nature.com/articles/s41559-023-02063-3

The researchers explored the capabilities and limitations of using generative AI to enhance scientific coding, using ChatGPT to translate natural language into computer-readable code.

The examples in the experiment focused on common tasks in ecology, evolution, and related fields, and the researchers found that about 80% of the coding tasks could be completed with ChatGPT.

ChatGPT can generate very useful code if the task is broken down into small, manageable blocks and the prompts used as queries are precise.

It is worth noting that running the same experiments with Google's Bard usually produced similar results but with more errors in the code, so the article mainly used ChatGPT.

Lead author Cory Merow is a quantitative ecologist whose main research direction is building mechanistic models to predict how populations and communities respond to environmental change. Since even the best data sets are imperfect for predicting responses to global change, tools are needed to combine data sources and explore data sets to gain insight into possible changes in biological systems.

ChatGPT-assisted scientific coding

ChatGPT is based on the autoregressive model GPT-3, which was fitted to a vast corpus of web pages, books, and other text and can generate text without searching.

So ChatGPT is better at interpolation (predicting text similar to its training data) than extrapolation (predicting new text that differs from the training samples).

The sheer size of the training set is an advantage: GPT-3 has seen so many language patterns that interpolation alone often suffices to generate responses that are useful to humans.

However, for code generation, GPT-3 does not know how to program; it only knows what code looks like and which words are most likely to appear next. It works much like autocomplete, predicting the next chunk of code from a probability model. A chunk is usually smaller than a word and is also called a token.

The probability of generating a fully correct output is the product of the probabilities of all its tokens. Increasing the number of tokens to predict, or reducing the certainty of each chosen token, makes the task harder and lowers the probability of a correct result.

Therefore, to raise the probability of a correct result, shorten the generation task or provide more specific instructions.
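A worked illustration of that product rule, with hypothetical numbers: if each token is predicted correctly with probability p, a k-token completion is fully correct with probability p to the power k, which collapses quickly as k grows.

```python
# Hypothetical per-token accuracy; the exact value is an assumption,
# only the shape of the curve matters for the argument above.
p = 0.95

# Probability that an entire k-token completion is correct: p ** k.
# Longer generations multiply in more chances to go wrong.
success = {k: p ** k for k in (10, 50, 200)}
```

Even with 95% per-token accuracy, a 200-token completion is almost never entirely correct, which is why short, specific requests work better.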

Finally, the researchers warn that some text generated by ChatGPT merely looks like code and may not run, so careful debugging is needed during the coding process.

Brainstorming tool

ChatGPT is good at suggesting multiple data sources, such as plant traits, species distribution ranges, and meteorological data in the ecological field.

Although some of the data provided by ChatGPT is incorrect, these errors can be quickly corrected through the links it provides.

However, ChatGPT cannot write crawlers to download data from websites, probably because R packages and the underlying APIs (such as R's database-access protocols) change too quickly; after all, ChatGPT's training data was collected up to 2021.

ChatGPT can come up with a variety of statistical techniques for a specific problem, generate further guidance based on the user's assumptions in follow-up questions, and provide initial code.

However, this synthesis is only suitable for presenting and exchanging ideas; fact-checking against traditional sources such as papers is still needed.

It should be noted that some websites claim ChatGPT can write book abstracts, but the researchers' tests show that such synthesized summaries were completely wrong, probably because the test books did not appear in the GPT-3 training set.

More difficult tasks require more debugging

ChatGPT is very good at generating template code, providing a short script containing a small number of functions when given specific instructions.

For example, in the following case, the researchers asked ChatGPT to chain the inputs and outputs of four commonly used functions together and to provide sample code that uses this pipeline to simulate data.

You can see that the results generated by ChatGPT are almost perfect and take only a few minutes to debug, but the query in the prompt must be very specific, including the names and functions to be used.
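The article does not reproduce the four ecology functions, so the following is only a hypothetical Python analogue of the pattern: four small functions whose inputs and outputs are chained into one pipeline, driven by simulated data.

```python
import random


def simulate(n, seed=0):
    """Step 1: simulate n noisy observations (stand-in for real data)."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(n)]


def drop_outliers(xs, cutoff=3.0):
    """Step 2: remove extreme values beyond the cutoff."""
    return [x for x in xs if abs(x) < cutoff]


def square(xs):
    """Step 3: transform each value."""
    return [x * x for x in xs]


def mean(xs):
    """Step 4: reduce the series to one summary statistic."""
    return sum(xs) / len(xs)


# Each step's output is the next step's input, exactly the kind of
# explicit chaining the researchers asked ChatGPT to produce.
result = mean(square(drop_outliers(simulate(100))))
```

Naming each step and its interface in the prompt, as the researchers did, is what makes a request like this specific enough to succeed.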

The researchers found that the key to success is:

1. Decompose a complex task into multiple sub-tasks, each needing only a few steps to complete; after all, the code ChatGPT generates is the output of a probabilistic text-prediction model.

2. ChatGPT works best when using functions that already exist, because it only involves interpolation rather than extrapolation.

For example, code that uses regular expressions (regex) to extract information from text is difficult for many developers, but because websites already offer a large number of regex examples that likely appear in ChatGPT's training data, ChatGPT is good at writing regular expressions.
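A task of the kind described, with hypothetical field notes: the regex is fiddly to write by hand but follows patterns that are abundant online.

```python
import re

# Hypothetical semi-structured field notes: species name, then a count.
notes = "Quercus robur: 12; Fagus sylvatica: 7; Pinus nigra: 3"

# Capture a binomial name (Genus species) and the integer after it.
pattern = r"([A-Z][a-z]+ [a-z]+): (\d+)"
counts = {name: int(n) for name, n in re.findall(pattern, notes)}
```
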

3. One of the biggest criticisms of ChatGPT in academic circles is the lack of transparency of its information sources.

For code generation tasks, a degree of transparency can be achieved by specifying a "namespace", that is, explicitly calling the package name when using a function.
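A rough Python illustration of that "namespace" point (the article's examples are in R, where the equivalent is calling a function through its package name): a fully qualified call names the source package in the call itself, whereas a star import would hide it.

```python
import math


def circle_area(r):
    # math.pi rather than a bare pi: the call itself records which
    # package the constant comes from, a small unit of transparency.
    return math.pi * r ** 2
```
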

However, ChatGPT may directly copy an individual's public code without citing it, and researchers still have a responsibility to identify and credit the rightful author of the code.

At the same time, asking for a longer script exposes some of ChatGPT's flaws, such as fabricating function names or parameters, which is why Stack Overflow banned ChatGPT-generated code.

But if the user provides a clear set of execution steps, ChatGPT can still generate a useful workflow template that defines how the steps' inputs and outputs connect, which is probably the most useful way to extrapolate new code with GPT-3.

Currently, ChatGPT cannot convert pseudocode (algorithmic steps described in simple language) into perfect computer executable code, but this may not be far from reality.

ChatGPT is especially helpful for beginners and for unfamiliar programming languages, because beginners tend to write shorter scripts, which are easier to debug.

ChatGPT is better at non-creative tasks

ChatGPT is best at solving time-consuming, formulaic tasks, and can be used to debug, detect, and explain errors in code.

ChatGPT is also very effective at writing function documentation, for example using roxygen2's inline documentation syntax; it reliably identifies all parameters and their classes, but rarely explains how to use the function.
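roxygen2 is R-specific; as a rough Python analogue (the function is hypothetical), the same parameter-and-class documentation looks like a structured docstring:

```python
def rescale(values, lo=0.0, hi=1.0):
    """Linearly rescale values into the interval [lo, hi].

    Parameters
    ----------
    values : list of float
        Input numbers; must contain at least two distinct values.
    lo : float
        Lower end of the target range.
    hi : float
        Upper end of the target range.

    Returns
    -------
    list of float
        Rescaled copies of the inputs.
    """
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]
```

Listing every parameter and its class is exactly the mechanical part the authors found ChatGPT handles well; the harder part, a usage example, is what it tends to omit.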

A key limitation is that ChatGPT's output is limited to roughly 500 words, so it is best suited to generating smaller blocks of code; within that limit it can also generate unit tests that automate checks of code functionality.

Most of the advice given by ChatGPT is helpful in defining the structure of the test and examining the expected object classes.
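A sketch of the kind of unit test described: it fixes the test's structure and checks the expected class of the returned object (the function under test is a hypothetical stand-in).

```python
def mean(xs):
    """Function under test: arithmetic mean of a list."""
    return sum(xs) / len(xs)


def test_mean():
    result = mean([1, 2, 3])
    assert isinstance(result, float)  # check the expected object class
    assert result == 2.0              # check the expected value


test_mean()
```
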

Finally, ChatGPT is very effective at reformatting code to follow standardized code styles (such as Google's).

The future belongs to pseudocode

ChatGPT and other AI-driven natural-language-processing tools are poised to automate developers' simple tasks, such as writing short functions, debugging syntax, commenting, and formatting; beyond that, how far they can go depends on users' willingness (and proficiency) to debug.

The researchers summarized ChatGPT's code-generation capabilities: it can simplify the coding process in science, but manual checking remains necessary, and code that runs does not necessarily perform the desired task, so unit testing or informal interactive testing is still crucial.

In cases where a solution developed by a human may simply be reproduced by ChatGPT, it is important to credit the rightful author of the code.

There are already chatbots that automatically provide links to their sources (for example, Microsoft's Bing), although this step is still in its infancy.

Compared with traditional methods, ChatGPT offers an alternative way to learn coding skills, easing the barrier to writing initial code by converting pseudocode directly into code.

The researchers suspect that future developments will use tools like ChatGPT to automatically debug written code, iteratively generating, running, and proposing new code based on the errors encountered. In their experiments, however, they found ChatGPT's ability to correct code is limited: it works only when very specific instructions target small blocks of code, and the debugging process is far less efficient than manual debugging.

The researchers suspect that as the technology advances, for example with the recently released GPT-4 model, said to be ten times larger than GPT-3, automated debugging will improve.

The future is coming, and now is the time for developers to learn prompt-engineering skills to take advantage of emerging AI tools. The researchers expect AI-assisted code generation to become an increasingly valuable skill in all aspects of the software development that underpins scientific discovery and understanding.

Reference:

https://www.nature.com/articles/s41559-023-02063-3

This article comes from the WeChat official account: Xin Zhiyuan (ID: AI_era)
