
AI brain reading goes viral: scan brain images and Stable Diffusion realistically reproduces what you saw


Shulou (Shulou.com) 11/24 report --

Recently, a study claims to be able to use Stable Diffusion to reconstruct brain activity into high-resolution, high-fidelity images. The paper has been accepted to CVPR 2023 and has caused an uproar among netizens: AI brain reading is close at hand.

Even without Hogwarts' magic, you can see what other people are thinking!

The method is simple: visualize brain activity as images with Stable Diffusion.

For example, the bears, planes and trains a subject sees look like this.

When the AI sees only the corresponding brain signals, the images it generates look like the following: all the main elements are captured.

This AI brain-reading work has just been accepted to CVPR 2023, and it instantly blew up the research community.

It's wild! Forget prompt engineering; now you only need to "think" of an image with your head.

Just imagine: using Stable Diffusion to reconstruct visual images from fMRI data could mean non-invasive brain-computer interfaces in the future.

Let AI skip human language and perceive what's going on in the human brain.

By then, even Musk's Neuralink would be chasing this AI ceiling.

No fine-tuning is needed, and AI directly reproduces what you are thinking. So how on earth does AI read the brain?

The latest research comes from a research team at Osaka University in Japan.

Paper address: https://sites.google.com/view/stablediffusion-with-brain/

Researchers at the Graduate School of Frontier Biosciences at Osaka University and CiNet at NICT in Japan reconstruct visual experiences from fMRI data based on a latent diffusion model (LDM), more specifically Stable Diffusion.

The framework of the whole pipeline is also very simple: an image encoder, an image decoder, and a semantic decoder.

By doing so, the team eliminated the need to train and fine-tune complex artificial intelligence models.

All that needs to be trained are simple linear models that map fMRI signals from the lower and higher visual brain regions to individual Stable Diffusion components.

Specifically, the researchers mapped brain regions to the inputs of the image and text encoders: the lower (early) visual brain region is mapped to the image encoder, and the higher visual brain region is mapped to the text encoder. This enables the system to reconstruct using both the image's composition and its semantic content.
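The mapping described above is just a linear regression from fMRI voxels to latent vectors. A minimal sketch of what that could look like with scikit-learn's ridge regression, where the variable names, shapes and regularization strength are assumptions rather than the authors' code:

```python
from sklearn.linear_model import Ridge

# Assumed, precomputed training data (hypothetical names):
#   X_early_train : fMRI from early visual cortex,  shape (n_images, n_voxels_early)
#   X_higher_train: fMRI from higher visual cortex, shape (n_images, n_voxels_higher)
#   Z_train: flattened Stable Diffusion image latents,   shape (n_images, 4*64*64)
#   C_train: flattened CLIP text embeddings of captions, shape (n_images, 77*768)

decoder_z = Ridge(alpha=100.0).fit(X_early_train, Z_train)   # early cortex  -> image latent z
decoder_c = Ridge(alpha=100.0).fit(X_higher_train, C_train)  # higher cortex -> text latent c

# At test time, predict both latents from a held-out brain scan.
z_pred = decoder_z.predict(X_early_test)   # (n_test, 4*64*64)
c_pred = decoder_c.predict(X_higher_test)  # (n_test, 77*768)
```

Because only these two regressions are trained, the expensive generative components of Stable Diffusion stay frozen throughout.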

The first part is the decoding analysis. The LDM used in the study consists of an image encoder ε, an image decoder D, and a text encoder τ.

The researchers decode the latent representation z of the presented image from fMRI signals in the early visual cortex and the associated text representation c from the higher visual cortex, use these as input, and generate the reconstructed image Xzc through the diffusion model and its autoencoder.
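Once z and c have been decoded from the brain, the reconstruction step is essentially the standard Stable Diffusion image-to-image procedure: add noise to the decoded image latent, denoise it under the guidance of the decoded text representation, and decode the result with the autoencoder. A rough sketch with the Hugging Face diffusers library follows; the checkpoint name, step counts, and the z_pred / c_pred tensors are assumptions, and this is a simplified stand-in rather than the authors' released code:

```python
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, DDIMScheduler

model_id = "CompVis/stable-diffusion-v1-4"  # assumed Stable Diffusion checkpoint
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")

# z_pred: image latent decoded from fMRI, reshaped to a tensor of shape (1, 4, 64, 64)
# c_pred: text conditioning decoded from fMRI, reshaped to shape (1, 77, 768)
scheduler.set_timesteps(50)
start = 10                                        # how much of z_pred's structure to keep
t_start = scheduler.timesteps[start:start + 1]
latents = scheduler.add_noise(z_pred, torch.randn_like(z_pred), t_start)

with torch.no_grad():
    for t in scheduler.timesteps[start:]:
        noise_pred = unet(latents, t, encoder_hidden_states=c_pred).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    image = vae.decode(latents / 0.18215).sample  # the reconstructed image Xzc
```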

Then, the researchers also built an encoding model that predicts fMRI signals from the different components of the LDM, in order to probe the model's internal workings.
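The encoding model runs in the opposite direction of the decoder: features taken from a chosen LDM component (the image latent z, the text representation c, or U-Net activations at some denoising step) are regressed onto the measured fMRI signal voxel by voxel, and the resulting prediction accuracy shows which brain areas that component explains best. A schematic sketch, again with ridge regression and hypothetical variable names:

```python
from sklearn.linear_model import Ridge

def fit_encoding_model(feat_train, fmri_train, feat_test, fmri_test):
    """Predict every voxel from LDM-derived features; score with Pearson r per voxel."""
    model = Ridge(alpha=100.0).fit(feat_train, fmri_train)
    pred = model.predict(feat_test)
    pred_z = (pred - pred.mean(0)) / pred.std(0)           # z-score predictions
    true_z = (fmri_test - fmri_test.mean(0)) / fmri_test.std(0)
    return (pred_z * true_z).mean(0)                       # shape: (n_voxels,)

# E.g. compare which voxels are best explained by the image latent z
# versus the text representation c (feature matrices assumed precomputed).
acc_z = fit_encoding_model(Z_train, fmri_train, Z_test, fmri_test)
acc_c = fit_encoding_model(C_train, fmri_train, C_test, fmri_test)
```

Mapping each component's per-voxel accuracy back onto the cortical surface is what produces the layer-by-layer pictures discussed below.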

The researchers experimented with fMRI images from the Natural Scenes Dataset (NSD) and tested whether they could use Stable Diffusion to reconstruct what the subjects saw.

The results show that the encoding model built from the LDM's latent image representation achieves its highest prediction accuracy in the posterior visual cortex of the brain.

The visual reconstruction results for one subject show that an image reconstructed with z alone is visually consistent with the original image but fails to capture its semantic content.

An image reconstructed with c alone has better semantic fidelity but poor visual consistency, while an image reconstructed with both z and c achieves both high semantic fidelity and high resolution.

Reconstructions of the same image across all subjects show that the reconstruction is stable and accurate from subject to subject.

Differences in specific details may come from individuals' differing perceptual experience or from data quality, rather than from errors in the reconstruction process.

Finally, the results of quantitative evaluation are plotted into a chart.

The results show that the method used in the study can capture not only the low-level visual appearance, but also the high-level semantic content of the original stimulus.

From this point of view, the experiment shows that the combination of image and text decoding provides accurate reconstruction.

The researchers said there were differences in accuracy among subjects, but these differences were related to the quality of the fMRI images. According to the team, the quality of the reconstructions is comparable to current SOTA methods, but without the need to train the AI models involved.

At the same time, the team used models fitted to the fMRI data to study the individual building blocks of Stable Diffusion, such as how semantic content emerges during the reverse diffusion process, or what happens inside the U-Net.

In the early stage of the denoising process, the U-Net's bottleneck layer (orange) yields the highest prediction performance. As denoising progresses, the early layers (blue) come to predict activity in the early visual cortex, while the bottleneck layer shifts toward the higher visual cortex.

That is to say, at the beginning of the diffusion process, image information is compressed within the bottleneck layer, and as denoising proceeds, a separation between U-Net layers emerges that mirrors the organization of the visual cortex.

In addition, the team quantitatively interpreted the image transformation at different stages of the diffusion process. In this way, the researchers aim to contribute, from a biological perspective, to a better understanding of diffusion models, which are widely used but still poorly understood.

Have images in the human brain long been decodable by AI? For years, researchers have been using artificial intelligence models to decode information from the human brain.

At the core of most methods is the use of pre-recorded fMRI data as input to a generative AI model for text or images.

In early 2018, for example, a team of researchers from Japan demonstrated how a neural network could reconstruct images from fMRI recordings.

In 2019, a team reconstructed images from the activity of monkey neurons, and Meta's group, led by Jean-Rémi King, has published new work such as deriving text from fMRI data.

In October 2022, a team at the University of Texas at Austin showed that the GPT model could infer text that describes the semantic content a person sees in a video from fMRI scans.

In November 2022, researchers at the National University of Singapore, the Chinese University of Hong Kong and Stanford University used the MinD-Vis diffusion model to reconstruct images from fMRI scans with significantly higher accuracy than the methods available at that time.

Going even further back, some netizens have pointed out that "images based on brainwaves have been available since at least 2008; implying in some way that Stable Diffusion can read people's minds is ridiculous."

That paper, published in Nature by researchers at the University of California, Berkeley, showed that human brain activity can be converted into images using visual decoders.

Tracing the history back further, others point to a 1999 study on reconstructing images from the cerebral cortex in which Stanford's Fei-Fei Li took part.

Fei-Fei Li herself commented and retweeted, saying that she was still an undergraduate intern at the time.

And in 2011, a study at UC Berkeley used functional magnetic resonance imaging (fMRI) and computational models to make an early reconstruction of the brain's "dynamic visual images".

In other words, they recreated footage that people had seen.

But compared with the latest research, those reconstructions were barely recognizable.

About the authors

Yu Takagi

Yu Takagi is an assistant professor at Osaka University. His research interests lie at the intersection of computational neuroscience and artificial intelligence.

During his Ph.D., he studied techniques for predicting individual differences from whole-brain functional connectivity using functional magnetic resonance imaging (fMRI) at the ATR Brain Communication Research Laboratory.

More recently, he has used machine learning techniques to understand dynamic computations in complex decision-making tasks at the Oxford Centre for Human Brain Activity at the University of Oxford and at the Department of Psychology of the University of Tokyo.

Shinji Nishimoto

Shinji Nishimoto is a professor at Osaka University. His research aims at a quantitative understanding of visual and cognitive processing in the brain.

More specifically, Professor Nishimoto's team focuses on understanding neural processing and representation by building predictive models of brain activity evoked under natural perceptual and cognitive conditions.

Some netizens asked the author whether this study could be used to interpret dreams.

"it is possible to apply the same technique to brain activity during sleep, but the accuracy of this application is not clear. "

After reading this study: Legilimency is just around the corner.

Reference:

https://sites.google.com/view/stablediffusion-with-brain/

https://www.biorxiv.org/content/10.1101/2022.11.18.517004v2

This article comes from the WeChat official account: Xin Zhiyuan (ID: AI_era)
