2025-02-21 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 11/24 Report --
[New Zhiyuan Guide] Let AI run the experiments, and let scientists do the thinking.
At present, AI-driven autonomous experiments are mostly scripted in Python and similar languages, but not all experimental scientists are fluent in such programming languages, so the impact of these systems on real-world experiments remains limited.
However, with the function-calling feature of OpenAI's recently released ChatGPT API [1], existing technology is already sufficient to build a Jarvis-style intelligent assistant once seen only in the Iron Man movies.
Recently, Professor Ju Li's team at MIT developed CRESt (Copilot for Real-world Experimental Scientist), an AI assistant for experimental scientists whose ChatGPT-centered back end chains together:
1. Automated experiments on a real-world robotic-arm platform
2. Local or online professional materials databases
3. An active-learning algorithm for optimizing material formulations
Video address: https://youtu.be/POPPVtGueb0
At present, the CRESt front end supports voice input and output with AI-generated speech, as well as seamless switching across platforms.
With CRESt, researchers with no coding experience can run experiments on the automated platform simply by talking to it.
Link to the paper: https://doi.org/10.26434/chemrxiv-2023-tnz1x
The "CRESt operating system" consists mainly of four parts: the user interface, the ChatGPT back end, active learning, and end effectors.
The user interface is based on chatgpt-voice, a GitHub project that supports voice-to-text and text-to-speech interaction. Its lightweight web framework lets users seamlessly continue a conversation on their phones after leaving the lab.
The back end runs independently, so changes to the front end do not affect it.
In addition, the authors integrated ElevenLabs AI voice, which generates highly realistic human speech in real time, into the front end [3].
Text messages received from the front end are then passed to the ChatGPT back end, which is built on CallingGPT.
CallingGPT is another GitHub project: it converts Python functions documented in Google docstring style into the JSON format the ChatGPT API recognizes, so that ChatGPT can call them when it deems necessary.
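The docstring-to-schema idea can be illustrated with a minimal sketch. This is not CallingGPT's actual code; the example function `dispense_liquid` and the helper `to_function_schema` are hypothetical, but the output follows the function-spec shape the ChatGPT API documents (`name`, `description`, `parameters`).

```python
import inspect
import re

def dispense_liquid(volume_ml: float, reagent: str) -> str:
    """Dispense a reagent with the liquid-handling robot.

    Args:
        volume_ml: Volume to dispense, in milliliters.
        reagent: Name of the reagent to dispense.
    """
    return f"Dispensed {volume_ml} mL of {reagent}"

# Map Python annotations to JSON Schema type names.
TYPE_MAP = {float: "number", int: "integer", str: "string", bool: "boolean"}

def to_function_schema(fn):
    """Turn a function with a Google-style docstring into a JSON
    function spec of the kind the ChatGPT API accepts."""
    doc = inspect.getdoc(fn)
    summary = doc.split("\n")[0]
    # Pull "name: description" pairs out of the Args: section.
    arg_docs = dict(re.findall(r"^\s*(\w+): (.+)$",
                               doc.split("Args:")[1], re.M))
    props = {
        name: {"type": TYPE_MAP.get(p.annotation, "string"),
               "description": arg_docs.get(name, "")}
        for name, p in inspect.signature(fn).parameters.items()
    }
    return {"name": fn.__name__,
            "description": summary,
            "parameters": {"type": "object",
                           "properties": props,
                           "required": list(props)}}
```

Registering a lab routine then amounts to writing an ordinary, well-documented Python function.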
CallingGPT also closes the feedback loop between ChatGPT and the local Python library: a function ChatGPT recommends is executed locally at once, and its return value is sent back to ChatGPT.
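The local half of that loop can be sketched as a small dispatcher. The registry, the decorator, and the `set_pump_flow` routine below are hypothetical stand-ins, but the call payload (`name` plus a JSON `arguments` string) and the `"function"`-role reply mirror the ChatGPT function-calling protocol.

```python
import json

# Hypothetical registry mapping function names to local callables.
REGISTRY = {}

def register(fn):
    """Decorator exposing a local function to the model."""
    REGISTRY[fn.__name__] = fn
    return fn

@register
def set_pump_flow(rate_ml_per_min: float) -> str:
    """Set the peristaltic pump flow rate (placeholder hardware call)."""
    return f"Pump flow set to {rate_ml_per_min} mL/min"

def execute_call(call: dict) -> dict:
    """Run the function the model asked for, then package its return
    value as a message to append to the conversation."""
    fn = REGISTRY[call["name"]]
    result = fn(**json.loads(call["arguments"]))
    return {"role": "function", "name": call["name"], "content": str(result)}
```

The returned message is appended to the chat history, so the model sees the outcome of its own action before deciding the next step.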
In addition, the authors embed an active-learning algorithm in CRESt. Thanks to its strong performance on small datasets, active learning is considered one of the machine-learning approaches best suited to experimental science [5-7].
In machine learning projects involving real physical world experiments, data collection is often the biggest challenge.
Unlike the virtual world, every data point in the real physical world may cost a lot of time and money.
Generally speaking, a dataset of 1,000 points is already quite large; under such constraints, sampling the design space efficiently becomes crucial.
The main role of active learning is to interactively suggest the parameter combinations to test in the next batch of experiments, such as the alloy-formulation recommendations shown in the video.
Embedded in CRESt is the BoTorch-based Ax platform developed by Meta [8, 9]. Ax has robust SQL storage: even if the GPT back end is reset, previous active learning can resume by fetching the records stored in the database.
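To make the "suggest the next batch" idea concrete, here is a deliberately simplified stand-in, not Ax itself: a pure-exploration heuristic that proposes the untested composition farthest from everything already measured. CRESt's actual suggester is Ax/BoTorch Bayesian optimization; the function and the two-component alloy encoding below are illustrative assumptions.

```python
import math

def suggest_next(tested, candidates):
    """Toy active-learning suggester: return the candidate composition
    whose nearest tested neighbor is farthest away (pure exploration).

    tested, candidates: lists of composition tuples,
    e.g. (Pt fraction, Ru fraction).
    """
    def min_dist(c):
        # Distance from candidate c to its closest already-tested point.
        return min(math.dist(c, t) for t in tested)
    return max(candidates, key=min_dist)
```

A Bayesian optimizer such as Ax balances this exploration term against exploitation of the surrogate model's predicted performance, which is why it works well on the small datasets typical of experiments.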
The end effectors are a series of subroutines invoked via HTTP requests. Some handle information-retrieval tasks (queries to local or public databases, such as the Materials Project [10]); others act directly on the physical world, as shown in the video: liquid-handling robots, laser cutters, pumps, gas valves, robotic arms, and other automated experimental hardware.
The automation of these devices is implemented mainly with PyAutoGUI, a Python library that simulates human mouse and keyboard actions [11].
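A minimal sketch of that pattern, under stated assumptions: the screen coordinates, the instrument's "Sample ID" field, and both function names are hypothetical, while `click`, `write`, and `press` are real PyAutoGUI calls. Keeping the macro as plain data lets it be built and inspected even on a machine with no display attached.

```python
def start_measurement_macro(sample_id: str):
    """Build a GUI macro (a list of PyAutoGUI steps) that fills in a
    sample ID and starts a run in an instrument's control software.
    Coordinates and UI layout are hypothetical placeholders."""
    return [
        ("click", (412, 300)),     # focus the "Sample ID" text box
        ("write", (sample_id,)),   # type the sample identifier
        ("press", ("enter",)),     # confirm the entry
        ("click", (650, 520)),     # press the "Start" button
    ]

def run_macro(steps):
    """Replay a macro with PyAutoGUI. The import is deferred so the
    macro itself can be constructed headlessly."""
    import pyautogui
    for action, args in steps:
        getattr(pyautogui, action)(*args)
```

In CRESt's architecture, such a routine would sit behind one of the HTTP-triggered end-effector subroutines described above.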
However, the authors expect this workaround to eventually become unnecessary, as most laboratory equipment should, in the near future, offer a dedicated machine-facing communication interface alongside the human one.
Looking ahead, what can large language models bring to the fields of science and engineering?
This is a question the authors' team has been pondering since the advent of ChatGPT. There is no doubt that large language models have demonstrated extraordinary potential as literature organizers; what remains is to feed them more full-text literature during pre-training.
What other possibilities are there? Beyond the experimenter's assistant we built in CRESt, we envision large language models revolutionizing at least the following three roles:
Instrument technical instructor
At present, researchers must understand both the theoretical basis of any technique they want to use and the specific operation of individual instruments (sometimes experience-based "know-how" or "craftsmanship"), which may vary from manufacturer to manufacturer.
The latter often carries non-negligible time costs: a series of training sessions for a common instrument, or reading a 200-page manual for a suite of instruments, plus hundreds of hours of hands-on practice.
But pause and consider: are these steps really necessary?
We foresee that in the near future, researchers will only need to state their needs clearly in natural language, and large language models will translate those requirements into optimal parameter settings (indeed, this is what some instrument specialists do today: understand a customer's needs and translate them into instrument parameters and operations).
When necessary, the model can also surface the relevant section of the manual so the user can check the details.
Technically, an instrument manufacturer only needs to fine-tune a base large language model on the operating experience accumulated by its senior technicians, something that is already feasible today.
Pipeline diagnostics
Combined with multi-sensor robots or drones, large language models can help pinpoint the root causes of poor experimental repeatability.
The ideal future experimental paradigm records all metadata across each sample's entire life cycle. When an unexplained phenomenon occurs, all relevant log data are fed into a multimodal large language model for analysis.
Using its strong hypothesis-generation ability, the model can propose a series of potential causes, and human experts then investigate the hypotheses they judge most likely.
The same method applies to industrial assembly lines: if a significant drop in yield or quality is noticed, a large language model can identify the "culprit" by comparing the line's history.
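The "comparing pipeline history" step can be sketched without any model at all. This toy illustration (the log format, step names, and function are all hypothetical) flags the process step whose logged parameter shifted most between good and bad batches, measured in units of the good-batch spread; an LLM-based diagnostician would reason over richer, multimodal logs in the same spirit.

```python
from statistics import mean, stdev

def flag_drifted_step(good_logs, bad_logs):
    """Flag the process step whose parameter readings drifted the most
    between good and bad batches.

    Log format (hypothetical): {step name: [readings, ...]}.
    """
    def shift(step):
        g, b = good_logs[step], bad_logs[step]
        spread = stdev(g) or 1e-9  # avoid dividing by a zero spread
        return abs(mean(b) - mean(g)) / spread
    return max(good_logs, key=shift)
```

The flagged step is only a candidate cause, which matches the division of labor described above: the machine proposes, the human engineer investigates.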
Human engineers need to step in only when complex real-world operations are required; in addition, the model can directly fine-tune the parameters of the sub-steps most likely to be at fault.
The prerequisite for this role is that large language models can handle large volumes of images and video; performance will hinge on the alignment of multimodal information (sample metadata, visual information, audio, etc.).
Mechanism conjecture
We expect large language models to excel at explaining new experimental phenomena with established scientific principles. A large part of the work in the mechanism-exploration phase is pattern matching (for example, extracting subtle features from a spectrum and comparing them against standard databases), which is well within their capabilities.
In the near future, this workflow will become simple and direct: we need only tell the large language model that we prepared and tested a sample whose composition is xxx, whose processing steps and parameters are xxx, and whose performance is xxx; attach all the characterization results (scanning electron microscopy, X-ray diffraction, etc.); and ask it to give 10 detailed reasons why the sample performs so well.
Human researchers can then select the most plausible explanation from the model's candidate narratives and build on it to complete the mechanistic picture.
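The spectrum-matching step mentioned above can be sketched in a few lines. This is a toy illustration, not a real crystallography tool: the peak positions, tolerance, and reference entries are hypothetical, and real phase identification also weighs peak intensities and instrument corrections.

```python
def match_reference(measured_peaks, reference_db, tol=0.2):
    """Rank reference phases by the fraction of their diffraction
    peaks (2-theta, degrees) found in the measured spectrum within
    a tolerance; best match first."""
    def score(ref_peaks):
        hits = sum(any(abs(m - r) <= tol for m in measured_peaks)
                   for r in ref_peaks)
        return hits / len(ref_peaks)
    return sorted(reference_db,
                  key=lambda name: score(reference_db[name]),
                  reverse=True)
```

In the envisioned workflow, such ranked matches would be one input among many that the model weaves into candidate mechanistic explanations.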
However, this is the most challenging of all the roles we envision for large language models; its prerequisites include:
1. Image input and alignment with scientific terminology
2. The ability to retrieve specific information from professional physical-science databases
3. Pre-training on the full text and appendices of scientific journals
4. The ability to invoke a suite of cutting-edge sub-domain machine-learning or simulation models
Summary

CRESt is only a starting point for large language models assisting scientists. We believe their real potential lies in their ability to generate hypotheses [12].
Humans have a relatively limited knowledge base, but excellent causal reasoning lets us propose a small number of hypotheses that are on point.
In contrast, artificial intelligence has a vast knowledge base and the ability to extract statistical patterns from big data [13], so it can generate a large number of less precise hypotheses in a short time.
This is therefore not a story of artificial intelligence competing with humans, but of artificial intelligence compensating for human limitations.
Under the "AI suggests, humans select" mode of collaboration, both sides can play to their respective strengths.
Reference:
https://doi.org/10.26434/chemrxiv-2023-tnz1x