AI can write papers! Chinese undergraduate invents an AI paper generator

Shulou (Shulou.com), 06/02 report; updated 2025-04-26

https://www.toutiao.com/a6694829950148542980/

[Xin Zhiyuan guide] AI has become remarkably capable at writing papers! Researchers including Wang Qingyun, a senior at Rensselaer Polytechnic Institute, recently developed PaperRobot, which covers everything from generating ideas and writing abstracts and conclusions to writing "future work" sections. It can even write the title of your next paper.

Are you still worried that you can't come up with a good idea for your paper?

Don't worry! PaperRobot, newly developed by researchers at Rensselaer Polytechnic Institute and Stanford University, is a one-stop shop for generating ideas and for writing abstracts, conclusions and "future work" sections. It can even write the title of your next paper, taking much of the worry out of paper writing.

The paper, titled PaperRobot: Incremental Draft Generation of Scientific Ideas, has been accepted by ACL 2019 and has attracted a lot of attention on Twitter.

Google Brain scientist David Ha (hardmaru) commented: "May a thousand (incremental) ideas bloom."

A Chinese college senior built the AI "paper generator"

The authors are from Rensselaer Polytechnic Institute, DiDi Labs, the University of Illinois at Urbana-Champaign, the University of North Carolina at Chapel Hill and Stanford University. The lead author, Qingyun Wang (Wang Qingyun), is a senior at Rensselaer Polytechnic Institute (he starts a PhD in computer science at UIUC in August this year).

This is not the first time Wang Qingyun has worked on AI that writes papers. His research on generating paper abstracts sparked heated discussion as early as 2017. Wang, who attended Hangzhou No. 2 Middle School, has been a keen inventor since childhood, and two of his inventions have been patented.

Paper address:

https://arxiv.org/pdf/1905.07870.pdf

How does PaperRobot write papers automatically? Put simply, it builds a background knowledge graph from previous papers, produces new scientific ideas, and finally writes out the key elements of a paper.

Its workflow includes:

(1) Read in depth a large number of human-written papers in the target field and construct a comprehensive background knowledge graph (KG).

(2) Generate new ideas by predicting links in the background KG, combining graph attention with contextual text attention.

(3) Using a memory-attention network, incrementally write the key elements of a new paper: generate an abstract from the input title and the predicted related entities, generate the conclusion and future work from the abstract, and finally generate the title of the next paper from the future work.
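
The chaining in step (3) can be pictured with a small sketch. This only illustrates the data flow described above; the function names and the `generate` callable are hypothetical stand-ins, and the real system uses a trained memory-attention network rather than the stub shown here.

```python
from typing import Callable, Dict, List

# Hypothetical interface: (section name, conditioning text) -> generated text.
Generator = Callable[[str, str], str]


def write_draft(title: str, predicted_entities: List[str],
                generate: Generator) -> Dict[str, str]:
    """Chain the three generation steps so each output feeds the next."""
    # Step 1: abstract from the input title plus the predicted related entities.
    conditioning = f"{title} | entities: {', '.join(predicted_entities)}"
    abstract = generate("abstract", conditioning)
    # Step 2: conclusion and future work from the abstract.
    conclusion = generate("conclusion and future work", abstract)
    # Step 3: title of the next paper from the conclusion and future work.
    next_title = generate("next title", conclusion)
    return {"abstract": abstract, "conclusion": conclusion,
            "next_title": next_title}


def stub_generate(section: str, source: str) -> str:
    # Stand-in for a trained model: it only records what it was conditioned on.
    return f"<{section}: {source}>"


draft = write_draft("T", ["a", "b"], stub_generate)
print(draft["next_title"])
```

Passing the generator in as a parameter keeps the sketch independent of any particular model.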

The researchers ran a Turing test on PaperRobot:

PaperRobot generated abstracts, "conclusion and future work" sections and new titles for papers in the biomedical field. These were shown alongside human-written text from the same field, and a biomedical expert was asked to compare them. The results show that the expert preferred the AI output over the human writing in up to 30% of cases for abstracts, 24% for conclusions and future work, and 12% for new titles.

Why did this group of AI researchers choose the biomedical field for their experiments? The reason is simple: there are a great many biomedical papers. They also tried their own field (NLP), but the results were unsatisfactory because the corpus of NLP papers is not large enough.

Below, Xin Zhiyuan translates and summarizes the paper:

Three simple steps with graph networks and attention: AI writing that sometimes beats human writing

Our goal is to build PaperRobot, a paper-writing robot that speeds up scientific discovery and production. Its main tasks are as follows.

Read the existing papers.

There are too many papers, and scientists struggle to keep up with their explosive growth. In the biomedical field, for example, more than 500,000 papers are published on average each year, more than 1.2 million new papers were published in 2016 alone, and the total exceeds 26 million (Van Noorden, 2014).

Human reading capacity, however, has hardly changed. In 2012, US scientists estimated that they read only 264 papers a year on average (about one in every 5,000 available papers), a figure consistent with what they reported in the same survey in 2005.

PaperRobot automatically reads all available papers and builds a background knowledge graph (KG), in which nodes represent entities or concepts and edges represent the relations between them.

In this study, we extract entities and their relations from a large number of published biomedical papers to construct the background knowledge graph. We use the entity and relation extraction system proposed by Wei et al. (2013) to extract three types of entities (diseases, chemicals and genes). We then link all entities to the CTD (Comparative Toxicogenomics Database) and extract 133 subtypes of relations, such as Marker/Mechanism, Therapeutic, and Increase Expression.
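
As a rough illustration of the structure described above, the sketch below stores typed entities as nodes and (head, relation, tail) triples as edges. The class and method names are assumptions made for this example, and the relation label is a placeholder, not a fact extracted from CTD.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class KnowledgeGraph:
    """Entities are nodes; each (head, relation, tail) triple is an edge."""

    def __init__(self) -> None:
        self.entity_type: Dict[str, str] = {}
        self.edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add_entity(self, name: str, entity_type: str) -> None:
        self.entity_type[name] = entity_type

    def add_triple(self, head: str, relation: str, tail: str) -> None:
        for entity in (head, tail):
            if entity not in self.entity_type:
                raise KeyError(f"unknown entity: {entity}")
        # Store the edge in both directions so neighbor lookup is symmetric.
        self.edges[head].add((relation, tail))
        self.edges[tail].add((relation, head))

    def neighbors(self, name: str) -> Set[str]:
        return {other for _, other in self.edges.get(name, set())}


kg = KnowledgeGraph()
kg.add_entity("zinc", "chemical")
kg.add_entity("CD14 molecule", "gene")
kg.add_entity("neuropilin 2", "gene")
# "related_to" is a placeholder label, not one of the 133 CTD relation subtypes.
kg.add_triple("zinc", "related_to", "CD14 molecule")
kg.add_triple("zinc", "related_to", "neuropilin 2")
print(sorted(kg.neighbors("zinc")))
```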

Figure 3 is an example.

Figure 3: Example of biomedical knowledge extraction and link prediction (dotted lines represent predicted links)

Generate new ideas

Scientific discovery can be seen as creating a new node or a new link in a knowledge graph.

Creating new nodes usually means discovering new entities (such as new proteins) through a series of real laboratory experiments, which is probably too difficult for PaperRobot. Creating new edges automatically, using the background knowledge graph as a starting point, is easier.

Research by Foster et al. (2015) shows that more than 60% of 6.4 million biomedical and chemistry papers are incremental work. This inspired us to generate new ideas and hypotheses automatically by predicting new links in the background KG.

We propose a new entity representation that combines the KG structure with unstructured contextual text for link prediction.

In Figure 3 above, the dotted lines represent predicted links. Because calcium and zinc are similar in both contextual text and graph structure, we predict two new neighbors for calcium: CD14 molecule and neuropilin 2, which are neighbors of zinc in the initial background KG.
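
The intuition behind the calcium and zinc example can be sketched with a toy heuristic. This is not the paper's model, which learns entity representations with graph attention and contextual text attention; here, as a simplifying assumption, similarity is just the average of neighbor-set Jaccard overlap and cosine similarity of made-up context vectors. The entities `gene_A`, `disease_B`, `disease_C`, the vectors, and the threshold are all invented for illustration.

```python
import math
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def propose_links(target: str,
                  neighbors: Dict[str, Set[str]],
                  context: Dict[str, List[float]],
                  threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Suggest new neighbors for `target`, borrowed from similar entities."""
    proposals: Dict[str, float] = {}
    for other, other_neighbors in neighbors.items():
        if other == target:
            continue
        # Combine structural similarity with textual-context similarity.
        sim = 0.5 * jaccard(neighbors[target], other_neighbors) \
            + 0.5 * cosine(context[target], context[other])
        if sim < threshold:
            continue
        # Candidates are neighbors of the similar entity that target lacks.
        for candidate in other_neighbors - neighbors[target] - {target}:
            proposals[candidate] = max(proposals.get(candidate, 0.0), sim)
    return sorted(proposals.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: all neighbor sets and context vectors are invented.
neighbors = {
    "calcium": {"gene_A", "disease_B"},
    "zinc": {"gene_A", "disease_B", "CD14 molecule", "neuropilin 2"},
    "aspirin": {"disease_C"},
}
context = {
    "calcium": [0.9, 0.1, 0.3],
    "zinc": [0.8, 0.2, 0.3],
    "aspirin": [0.0, 1.0, 0.0],
}
predicted = propose_links("calcium", neighbors, context)
print([name for name, _ in predicted])
```

With this toy data only zinc clears the similarity threshold, so calcium inherits zinc's two extra neighbors as predicted links.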

Write a new paper about the new ideas

The final step is to communicate the new ideas clearly to readers, which is very difficult; in fact, many scientists are bad writers (Pinker, 2014).

Using a novel memory-attention network architecture, PaperRobot automatically writes the abstract of a new paper from the input title and the predicted related entities, and then writes the conclusion and future work from the abstract. Finally, it writes a new title for the follow-up paper.

This process is shown in figure 1.

Figure 1: PaperRobot paper writing process

We chose biomedicine as our target field because there are a large number of available papers in this field.

Turing tests show that PaperRobot's output is sometimes preferred over human-written content, and that most generated abstracts become informative and well organized after only a small amount of editing by domain experts.

Let's take a look at an abstract written by the AI:

Background: Snail is a multifunctional protein that plays an important role in the pathogenesis of prostate cancer. However, it has been shown to be associated with poor prognosis. The purpose of this study was to investigate the effect of negatively on the expression of maspin in human nasopharyngeal carcinoma cell lines. Methods: Quantitative real-time PCR and western blot analysis were used to determine whether the demethylating agent was investigated by quantitative RT-PCR (qRT-PCR) and Western blotting. Results showed that the binding protein plays a significant role in the regulation of tumor growth and progression.

The overall framework of PaperRobot is shown in figure 2.

Table 1 shows the examples generated from the entire process.

Table 1: Comparison of papers written by humans with those written by the AI (bold marks topic-related entities; italics mark human edits)

(For a detailed description of the algorithm at each step, please read the original paper.)

Experimental process and results

Data collection

We collected biomedical papers from the PMC Open Access Subset. To construct the ground truth for predicting new titles, we assume that if a human-written paper A cites a paper B, the title of A was generated from the "conclusion and future work" of B. We constructed the background knowledge graph from 1,687,060 papers; it contains 30,483 entities and 875,698 relations. Table 2 shows detailed data statistics.

Table 2: Data statistics for paper writing

Automatic evaluation

Previous studies have shown that automatic evaluation of long text generation is a major challenge. Following work on story generation, we use METEOR to measure how relevant the generated text is to the given title, and use perplexity to further evaluate the quality of the language model.

The perplexity scores of our model are computed with a language model trained on other PubMed papers (500,000 titles, 50,000 abstracts, and 50,000 conclusion and future work sections), which were not used for training or testing in our experiments. The results are shown in Table 3. Our framework outperforms all previous approaches.

Table 3: Automatic evaluation of paper writing for diagnostic tasks
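
For readers unfamiliar with perplexity, a minimal sketch follows. It uses the standard definition (the exponential of the average negative log-probability per token); the function name is an assumption for this example, and the log-probabilities would in practice come from the separately trained language model mentioned above.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the average negative natural-log probability."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# A model that assigns probability 0.25 to every token is as uncertain
# as a uniform choice among four options.
uniform_guess = [math.log(0.25)] * 8
print(perplexity(uniform_guess))
```

Lower perplexity means the language model finds the generated text less surprising.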

Turing test

The model was evaluated by a biomedical expert (a non-native speaker) and by non-experts (native speakers). In the test, each judge is asked to compare a string output by the system with a string written by a human and to choose the one of higher quality.

Table 4: Turing test results (%). The percentages indicate how often the human judges chose our model's output. If the output strings (such as abstracts) are based on the same input string (such as a title), the input condition is marked "same"; otherwise it is marked "different".

Among the expert's choices, PaperRobot's abstracts were chosen over human-written ones up to 30% of the time, its "conclusion and future work" sections up to 24% of the time, and its new titles up to 12% of the time. The domain expert did not perform significantly better than the non-experts, because the two groups tend to focus on different aspects: the expert focuses on content (entities, topics, etc.), while the non-experts focus on language.

Human post-editing

To measure how effective PaperRobot is as a writing assistant, we randomly selected 50 abstracts generated by the system in the first iteration and asked a domain expert to edit them until the expert considered them informative and coherent enough. We then compared the abstracts before and after human editing using BLEU, ROUGE and TER, as shown in Table 5. The expert took about 40 minutes to finish editing the 50 abstracts.

The post-edited examples show that most of the edits are stylistic changes.

A Chinese undergraduate with a talent for invention

Qingyun Wang (Wang Qingyun), the first author of the paper, is a senior at Rensselaer Polytechnic Institute, majoring in computer science and mathematics. Starting in August, he will study for a PhD in computer science at the University of Illinois at Urbana-Champaign.

Wang Qingyun is very interested in natural language processing, specializing in natural language generation, information extraction and dialogue systems, and has published a number of related papers during his undergraduate studies.

Surprisingly, Wang Qingyun's resume also lists two patents obtained in middle school: a "remote control convenience table" and a "household waste oil soap making device". The "remote control convenience table" won first prize in the 27th Zhejiang Innovation Competition.

Wang Qingyun in middle school.

It seems Wang has had a talent for invention since he was a child. Needless to say, the AI paper-writing machine is also a fine invention for the benefit of mankind. We look forward to his continued improvements.
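
To give a feel for what an edit-based metric such as TER measures, here is a simplified sketch: the word-level edit distance between the system draft and the post-edited text, divided by the length of the edited text. This is an approximation made for illustration; real TER also counts block shifts as single edits, and the two sentences below are invented examples, not data from the paper.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Minimum number of word insertions, deletions and substitutions."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        curr = [i]
        for j, r in enumerate(ref, 1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis: str, reference: str) -> float:
    """Edits per reference word; shift operations are NOT modeled."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)


system_draft = "snail is a protein that plays important role"
post_edited = "snail is a protein that plays an important role"
print(round(simplified_ter(system_draft, post_edited), 3))
```

A low score means the expert changed little, which matches the observation that a small amount of editing was enough.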

Reference link:

https://arxiv.org/pdf/1905.07870.pdf

http://www.hz2hs.net.cn/news/allinfo/1251.html
