The Evolution of the Machine Mind / Understanding the AI-driven Software 2.0 Intelligence Revolution


Over the past few months, the capital spree among technology companies has come to an end because of the Fed's interest rate hikes. The share prices of US-listed SaaS companies have fallen by as much as 70%, and layoffs and austerity have become unavoidable options. Yet just as the market was howling, DALL·E 2 was released, followed by a wave of dazzling AI companies. These events set off a frenzy in the venture capital world: we now see companies selling products built on generative AI (Generative AI) valued at billions of dollars, even though their revenue is under a million dollars and they have no proven business model. Not long ago, the same story played out with Web3. It feels as if we are about to enter a new era of prosperity, but can artificial intelligence really lead the technology industry back to recovery this time?

This article walks you through the sweeping history of artificial intelligence along four dimensions: the academic progress driven by key people, the emergence of algorithms and ideas, the progress of companies and products, and the way brain science has repeatedly shaped neural networks. Together they give a deep picture of the evolution of the machine mind. Forget the gaudy photo-generation apps; let's learn something closer to the nature of AI. The full text is divided into six chapters:

1. The evolutionary history of AI - the pre-neural-network era, the transition to Machine Learning, opening Pandora's box

2. The rise of Software 2.0 - the shift and evolution of the software paradigm, Software 2.0 and Bug 2.0

3. Intelligence-oriented architecture - Infrastructure 3.0, how to assemble intelligence, the vanguard of intelligent architecture

4. The unified model - the birth of Transformer, foundation models, new opportunities for AI

5. AI in the real world - the new frontier of self-driving, robots and intelligent agents

6. The future of AI evolution - looking inside neural networks, the Thousand Brains Theory, when will artificial general intelligence arrive?

The article is long, about 22,800 words in total, so please set aside an hour or so to read it. You are welcome to bookmark it first and come back later!

Do you think machine intelligence can surpass human intelligence? Read with this question in mind, and I believe you will come away with a systematic answer.

To keep the writing concise, when the same terms are repeated within a paragraph we will use AI (Artificial Intelligence) for artificial intelligence, ML (Machine Learning) for machine learning and DL (Deep Learning) for deep learning, and will generally prefer the English abbreviations.

01. The evolutionary history of AI

As to whether machines can really "know" or "think", it is hard for us to define these terms strictly. Our understanding of human mental processes may be only slightly better than a fish's understanding of swimming.

John McCarthy

As early as 1945, Alan Turing was already thinking about how to use computers to simulate the human brain. He designed the ACE (Automatic Computing Engine) to simulate how the brain works. In a letter to a colleague he wrote: "I am more interested in modelling the workings of the brain than in the practical applications of computing. Although the brain works through complex neuronal circuits, grown from axons and dendrites, we can still build a model in the ACE that allows for this possibility. The actual structure of the ACE does not change; it simply remembers data." This is the origin of machine intelligence, at least in the UK of that time.

1.1 The pre-neural-network era

A neural network is a computer system that simulates the way neurons in the human brain operate.

AI emerged alongside the development of neural networks. In 1957, American psychologist Frank Rosenblatt demonstrated an early neural network, the Perceptron Model, which classified simple images such as triangles and squares through supervised learning. It was a computer with only eight simulated neurons, built from motors and dials connected to 400 light detectors.
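To make the idea concrete, here is a minimal perceptron sketch in Python/NumPy. The toy data, features and learning rate are assumptions chosen only to illustrate the learning rule, not Rosenblatt's original setup.

```python
import numpy as np

# Toy data: each row is a tiny "image" flattened into two features,
# label +1 for one shape class and -1 for the other (assumed, linearly separable).
X = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(X.shape[1])   # weights
b = 0.0                    # bias
lr = 0.1                   # learning rate (assumed)

for epoch in range(20):
    for xi, yi in zip(X, y):
        pred = 1 if (w @ xi + b) > 0 else -1
        if pred != yi:                 # perceptron rule: update only on mistakes
            w += lr * yi * xi
            b += lr * yi

print("weights:", w, "bias:", b)
print("predictions:", [1 if (w @ xi + b) > 0 else -1 for xi in X])
```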

Figure 01: Frank Rosenblatt & Perceptron Model

Building on this research, IBM and Georgetown University implemented the earliest machine translation system, which could translate between English and Russian. In the summer of 1956, at a conference at Dartmouth College, AI was defined as a research field of computer science. The conference was organized by Marvin Minsky, John McCarthy, Claude Shannon and Nathaniel Rochester, who were later called the "founders" of AI.

Figure 02: Participants of the 1956 Dartmouth Summer Research Project on AI

In this "golden" era, DARPA poured most of its funding into AI, and just a decade later ARPANET (the predecessor of the Internet) was invented. The early AI pioneers tried to teach computers to perform complex mental tasks that mimic humans, dividing the field into five sub-areas: reasoning, knowledge representation, planning, natural language processing (NLP) and perception, terms that still sound general today.

From expert systems to Machine Learning

In 1969, Marvin Minsky and Seymour Papert argued in their book Perceptrons: An Introduction to Computational Geometry that, given the hardware limits of the time, neural networks of only a few layers could perform only the most basic computations. This abruptly dampened enthusiasm for research along this route, and the AI field saw its first bubble burst. It never occurred to these pioneers that the speed of computers would grow exponentially, improving hundreds of millions of times over the following decades.

In the 1980s, with improved computer performance and the popularity of the new languages Prolog and Lisp, logic could be expressed through complex program structures such as conditionals and loops. Artificial intelligence at this time meant expert systems (Expert Systems), and iRobot was definitely a star of that era. But after a short boom, limited by hardware storage and by expert systems' inability to solve concrete, hard logic problems, artificial intelligence fell into a dilemma once again.

I doubt whether anything very similar to formal logic can be a good model for human reasoning.

Marvin Minsky

It was not until IBM's Deep Blue defeated chess champion Garry Kasparov in 1997 that new probabilistic reasoning (Probabilistic Reasoning) ideas began to be widely used in AI; IBM's Watson project later used this approach to repeatedly beat humans on the TV quiz show Jeopardy!.

Probabilistic inference is typical machine learning (Machine Learning). Most of today's AI systems are driven by ML: prediction models are trained on historical data and then used to predict the future. This was the first paradigm shift in the field of AI. The algorithm is not told how to solve a task; instead it induces a solution from data and reaches the goal dynamically. It is because of ML that we have the concept of big data (Big Data).

1.2 The transition to Machine Learning

Machine Learning algorithms generally learn either by analyzing data and inferring a model to establish its parameters, or by interacting with an environment and obtaining feedback. Humans may or may not annotate the data, and the environment can be simulated or the real world.

Deep Learning

Deep Learning is a class of Machine Learning algorithms that uses multi-layer neural networks trained with backpropagation (Backpropagation). The field was pioneered almost single-handedly by Geoffrey Hinton. As early as 1986, Hinton and his colleagues published a groundbreaking paper on deep neural networks (DNNs) that introduced backpropagation, an algorithm for adjusting weights: each update moves the network's output closer to the correct answer than before, and it makes multi-layer neural networks practical to train, breaking the spell of perceptron limitations that Minsky had cast in 1969.
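To show what "adjusting weights so the output moves toward the correct answer" means, here is a minimal, assumed illustration of gradient-descent updates on a single weight of a one-neuron network (sigmoid activation, squared-error loss). It is a sketch of the principle only, not the 1986 formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training example (toy values, assumed)
x, target = 0.5, 1.0
w = 0.2        # current weight
lr = 0.5       # learning rate

for step in range(3):
    y = sigmoid(w * x)                # forward pass
    loss = 0.5 * (y - target) ** 2    # squared error
    # Backpropagation: chain rule gives dL/dw = (y - target) * y * (1 - y) * x
    grad_w = (y - target) * y * (1 - y) * x
    w -= lr * grad_w                  # move the weight downhill
    print(f"step {step}: loss={loss:.4f}, w={w:.4f}")
```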

Figure 03: Geoffrey Hinton & Deep Neural Networks

Deep Learning really took off in 2012, when Hinton and two of his students in Toronto showed that deep neural networks trained with backpropagation beat the most advanced systems in image recognition, nearly halving the previous error rate. Because of his work and contribution to the field, Hinton's name has almost become synonymous with Deep Learning.

Data is the new oil.

Deep Learning is a revolutionary field, but it needs data to work as expected. One of the most important datasets is ImageNet, created by Fei-Fei Li. Li, a former director of Stanford University's artificial intelligence laboratory and former chief scientist of AI/ML at Google Cloud, saw as early as 2009 that data was crucial to the development of Machine Learning algorithms, and published the ImageNet paper at the computer vision and pattern recognition conference (CVPR) that same year.

Figure 04: Fei-Fei Li & ImageNet

This dataset proved enormously useful to researchers, and as a result it became more and more famous, providing the benchmark for the most important annual DL competition. In just seven years, ImageNet raised the winning algorithms' accuracy at classifying objects in images from 72% to 98%, surpassing average human ability.

ImageNet became the go-to dataset of the DL revolution, or more precisely, of the revolution led by AlexNet, the convolutional neural network (CNN) from Hinton's group. ImageNet not only powered the DL revolution but also set a precedent for other datasets. Since its creation, dozens of new datasets have been introduced, with richer data and more accurate classification.

Neural network explosion

With the support of Deep Learning theory and datasets, deep neural network algorithms have exploded since 2012: convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM) and so on, each with different characteristics. In a recurrent neural network, for example, higher-level neurons are connected directly back to lower-level neurons, giving the network a form of memory.

Kunihiko Fukushima, a computer researcher from Japan, created an artificial neural network model based on the way vision works in the human brain. The architecture is modeled on two types of neurons in the primary visual cortex, the part of the brain that processes visual information: simple cells and complex cells. Simple cells detect local features, such as edges; complex cells aggregate the results produced by simple cells within a region. For example, a simple cell may detect the edge of a chair, a complex cell aggregates that information, and the result is passed on to the next, higher level of simple cells; step by step, the complete object is recognized.

Figure 05: How a deep neural network recognizes the structure of an object (TensorFlow)

CNNs are based on a cascade of these two types of cells and are mainly used for pattern-recognition tasks. They are computationally more efficient and faster than most other architectures and have beaten most other algorithms in many applications, including natural language processing and image recognition. Every time we learn a little more about how the brain works, neural network algorithms and models take another step forward.
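A minimal sketch of the simple-cell / complex-cell cascade described above, in Python/NumPy: a small "simple cell" edge filter followed by a "complex cell" max-pooling step. The filter values and the tiny image are assumptions chosen only to illustrate the idea.

```python
import numpy as np

# Tiny grayscale "image": a vertical edge between dark (0) and bright (1) columns
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# "Simple cell": a 2x2 vertical-edge detector applied at every position (a convolution)
edge_filter = np.array([[-1, 1],
                        [-1, 1]], dtype=float)

responses = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        patch = image[i:i+2, j:j+2]
        responses[i, j] = np.sum(patch * edge_filter)

# "Complex cell": max-pooling over the response map, it fires if an
# edge was detected anywhere in its region
complex_cell_output = responses.max()

print("simple-cell responses:\n", responses)
print("complex-cell output:", complex_cell_output)
```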

1.3 Opening Pandora's box

From 2012 to the present, the use of deep neural networks has exploded and made astonishing progress. Most research in Machine Learning is now focused on Deep Learning; it is as if Pandora's box has been opened.

Figure 06: The evolutionary history of AI

GAN

The GAN (Generative Adversarial Network), born in 2014, is another important milestone in Deep Learning. It helps neural networks learn from less data and generate more synthetic images, which can then be used to identify and build better neural networks. Ian Goodfellow, the creator of GANs, came up with the idea in a bar in Montreal: two neural networks play a cat-and-mouse game, one creating fake images that look real while the other tries to determine whether they are real.
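As a loose illustration of that cat-and-mouse game, here is a deliberately tiny adversarial training loop in Python/NumPy: a one-parameter "generator" learns to shift noise toward a real data distribution while a logistic-regression "discriminator" tries to tell real from fake. The model forms, numbers and learning rates are all assumptions for illustration, far simpler than a real GAN.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

real_mean = 3.0      # "real" data ~ N(3, 1)
theta = 0.0          # generator: g(z) = z + theta, with z ~ N(0, 1)
w, b = 0.1, 0.0      # discriminator: D(x) = sigmoid(w*x + b)
lr = 0.05

for step in range(2000):
    real = rng.normal(real_mean, 1.0, 32)
    fake = rng.normal(0.0, 1.0, 32) + theta

    # Discriminator step: label real=1, fake=0 (binary cross-entropy gradients)
    for x_batch, label in ((real, 1.0), (fake, 0.0)):
        d = sigmoid(w * x_batch + b)
        w -= lr * np.mean((d - label) * x_batch)
        b -= lr * np.mean(d - label)

    # Generator step: push fakes toward being classified as real;
    # gradient of -log D(fake) with respect to theta is -(1 - D) * w
    d_fake = sigmoid(w * fake + b)
    theta -= lr * np.mean(-(1.0 - d_fake) * w)

print("generator shift theta is roughly", round(theta, 2), "(real mean was", real_mean, ")")
```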

Figure 07: Simulating the evolution of generated portraits with GANs

GANs help create images as well as real-world software simulations. Nvidia uses the technology extensively in its reality simulation systems, where developers can train and test other kinds of software. Instead of compressing data directly, you can use one neural network to "compress" images and another to regenerate the original videos or images. Demis Hassabis has mentioned in one of his papers that the memory replay of the brain's hippocampus works through a similar mechanism.

Large-scale neural networks

The way the brain works certainly doesn't depend on someone programming with rules.

Geoffrey Hinton

The race for large-scale neural networks began with Google Brain, founded in 2011 and now part of Google Research. It drove the development of TensorFlow, proposed the Transformer as a general-purpose model architecture and built BERT on top of it, which we will discuss in detail in Chapter 4.

DeepMind, acquired by Google for $525 million in 2014, is one of the legends of this era. It focuses on game-playing algorithms; its mission is to "solve intelligence" and then use that intelligence to "solve everything else". DeepMind's team developed a new algorithm, the Deep Q-Network (DQN), which can learn from experience. In October 2015, AlphaGo beat a professional human player at Go for the first time, and in March 2016 it defeated world champion Lee Sedol; AlphaGo Zero then used a new self-play algorithm to put the game forever beyond human reach.

The other legend is OpenAI, a research organization co-founded in 2015 by Elon Musk, Sam Altman, Peter Thiel and Reid Hoffman, and DeepMind's major competitor. OpenAI's mission is artificial general intelligence (AGI): a highly autonomous system that outperforms humans at most economically valuable work. GPT-3, launched in 2020, is one of the best natural language processing (NLP) tools for text generation. Through its API it can handle natural language translation, dialogue and copywriting, even write code (Codex), as well as the wildly popular image generation (DALL·E).

Gartner AI Hype Cycle

Gartner's technology hype cycle (Hype Cycle) is worth a look. This is their 2022 estimate of the maturity of various technologies in the AI field, and it lets you quickly see what stage of development each technology discussed in this chapter has reached.

Figure 08: Gartner AI Hype Cycle 2022

Neural networks hit a setback in the 1960s and only found new life after 2012. One reason backpropagation took so long to develop is that it requires computers to perform matrix multiplication. In the late 1970s, the Cray-1, one of the most powerful supercomputers in the world, had a floating-point speed of about 50 MFLOPS. Today GPU computing power is measured in TFLOPS (trillions of FLOPs); the Nvidia Volta GPUs used in data centers reach 125 TFLOPS. A single chip is 2.5 million times faster than the fastest computer in the world 50 years ago. Technological progress is multi-dimensional: a theory or method that arrives before its time can release enormous energy once the other technical conditions fall into place.
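For the record, the "2.5 million times" figure is simply the ratio of the two numbers quoted above:

$$\frac{125\ \text{TFLOPS}}{50\ \text{MFLOPS}} = \frac{125 \times 10^{12}\ \text{FLOPS}}{50 \times 10^{6}\ \text{FLOPS}} = 2.5 \times 10^{6}$$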

02. The rise of Software 2.0

Computer languages of the future will focus more on goals and less on the procedures the programmer specifies.

Marvin Minsky

The concept of Software 2.0 was first proposed by Andrej Karpathy. A prodigy who emigrated to Canada from Slovakia as a child, he studied under Geoffrey Hinton at the University of Toronto and then earned his doctorate in Fei-Fei Li's group at Stanford, focusing on NLP and computer vision, and joined OpenAI as a founding team member; his path runs through the key figures and historical turning points of Deep Learning. In 2017 Elon Musk poached him for Tesla to lead Autopilot R&D, which later led to the rewrite of FSD (Full Self-Driving).

According to Andrej Karpathy's definition, "Software 2.0 is written in a much more abstract, human-unfriendly language, such as the weights of a neural network. No human is involved in writing this code; a typical neural network may have millions of weights, so coding directly in weights is practically impossible." Andrej says he has tried, and it is more or less beyond human ability.

Figure 09: Andrej Karpathy and neural network weights

2.1 The paradigm shift

When creating a deep neural network, the programmer writes only a few lines of code and lets the network learn by itself, computing weights and forming connections instead of hand-written code. This new style of software development began with Machine Learning frameworks like TensorFlow, and we call this new way of coding Software 2.0. Before the rise of Deep Learning, most artificial intelligence programs were hand-written in programming languages such as Python and JavaScript: humans wrote every line of code and determined every rule of the program.

Figure 10: How does Machine Learning work? (TensorFlow)

By contrast, with Deep Learning, programmers give the program a goal rather than a procedure: win a game of Go, say, or provide the right input and output data, such as giving the algorithm emails labelled "SPAM" and other emails that are not. They write a rough code skeleton (a neural network architecture) that defines a searchable subset of program space, then use whatever computing power is available to search that space for a program that works. With neural networks we restrict the search to a continuous subset, and the search becomes very efficient through backpropagation and stochastic gradient descent (Stochastic Gradient Descent).
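A minimal sketch of the contrast, assuming a toy spam task in Python/NumPy: in the Software 1.0 style a human writes the rule; in the Software 2.0 style the human supplies only labelled examples and a tiny "architecture" (here, logistic regression on two features), and stochastic gradient descent finds the weights. The features and data are invented for illustration.

```python
import numpy as np

# --- Software 1.0: a hand-written rule
def is_spam_v1(num_links, has_money_word):
    return num_links > 3 or has_money_word      # a human decides the threshold

# --- Software 2.0: the "program" is a set of learned weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Labelled examples: [num_links, has_money_word] -> 1 = spam, 0 = not spam
X = np.array([[5, 1], [7, 0], [4, 1], [0, 0], [1, 0], [2, 1]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 1], dtype=float)

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(500):                        # stochastic gradient descent
    for xi, yi in zip(X, y):
        p = sigmoid(w @ xi + b)
        w -= lr * (p - yi) * xi                 # gradient of cross-entropy loss
        b -= lr * (p - yi)

def is_spam_v2(num_links, has_money_word):
    return sigmoid(w @ np.array([num_links, has_money_word]) + b) > 0.5

print(is_spam_v1(5, 0), is_spam_v2(5, 0))       # both flag a link-heavy email
```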

A neural network is not just another classifier; it represents a paradigm shift in how we develop software. It is Software 2.0.

In Software 1.0, people write code that is compiled into an executable binary; in Software 2.0, people provide data and a neural network framework, and training "compiles" the data into a binary neural network. In most practical applications, network architectures and training systems are increasingly standardized into a commodity, so most Software 2.0 development consists of model design plus data cleaning and labelling. This fundamentally changes the iteration loop of software development, and teams split into two parts: 2.0 programmers are responsible for models and data, while 1.0 programmers maintain and iterate the infrastructure, analysis tools and visual interfaces around them.

Marc Andreessen's classic essay was titled "Why Software Is Eating the World." It can now be updated: software (1.0) is eating the world, and artificial intelligence (2.0) is eating software!

2.2 The evolution of software

The move from 1.0 to 2.0 has passed through an intermediate state called the "data product." It appeared when top software companies understood the commercial potential of big data and began using Machine Learning to build data products. The figure below, from Ahmad Mustapha's article "The Rise of Software 2.0", shows this transition well.

Figure 11: The three states of software product evolution

This intermediate state is also known as big data and recommendation algorithms. In real life such products include Amazon product recommendations that predict what customers will be interested in, Facebook friend suggestions, Netflix movie recommendations and TikTok short-video recommendations. What else? Waze's routing algorithm, the ranking algorithm behind Airbnb, and so on; the list is dazzling.

Data products share several important features: 1) they are not the main function of the software, but usually enhance the experience to drive better engagement and sales; 2) they can evolve as data accumulates; 3) most are based on traditional ML and, crucially, remain explainable.

But some industries are changing so that Machine Learning becomes the main body. This shift to the 2.0 technology stack happens when we give up writing explicit code to solve complex problems, and many areas have advanced by leaps and bounds over the past few years. Speech recognition used to involve a lot of preprocessing, Gaussian mixture models and hidden Markov models, but today it has been almost completely replaced by neural networks. As early as 1985, Fred Jelinek, the well-known information theorist and speech recognition expert, made an oft-quoted joke: "Every time I fire a linguist, the performance of our speech recognition system goes up."

Figure 12: Examples of the Software 2.0 transition

Beyond the familiar cases of image and speech recognition, speech synthesis, machine translation and game-playing, AI is showing early signs of transforming many traditional systems. For example, "The Case for Learned Index Structures" replaces core components of a data management system with neural networks, which are up to 70% faster than cache-optimized B-Trees while saving an order of magnitude in memory.
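A minimal sketch of the learned-index idea in Python/NumPy, under simplifying assumptions: fit a model (here just a straight line) that maps a key to its approximate position in a sorted array, then correct with a small local search. This only illustrates the concept from the paper, not its actual implementation.

```python
import numpy as np

# A sorted array of keys (the "index" data); roughly uniform keys for illustration
keys = np.sort(np.random.default_rng(1).integers(0, 10_000, 1_000))

# "Train" a model that predicts position from key value (least-squares line)
positions = np.arange(len(keys))
slope, intercept = np.polyfit(keys, positions, 1)

def learned_lookup(key, max_err=100):
    guess = int(slope * key + intercept)           # model prediction
    lo = max(0, guess - max_err)                   # small correction window
    hi = min(len(keys), guess + max_err)
    offset = np.searchsorted(keys[lo:hi], key)     # local search inside the window
    return lo + offset

k = int(keys[500])
print("learned:", learned_lookup(k), "exact:", int(np.searchsorted(keys, k)))
```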

The Software 2.0 paradigm therefore has these new characteristics: 1) Deep Learning is the main body, and all functionality is built around the inputs and outputs of a neural network, as in speech recognition and self-driving; 2) interpretability is not central: a good big-data recommendation system can tell a customer why they saw an ad, but you cannot find such rules inside a neural network, at least not yet; 3) R&D investment is high while application development investment is low: most of today's successes come from the research departments of universities and technology companies, and there are certainly more papers than applications.

2.3 The advantages of Software 2.0

Why should we migrate complex programs to Software 2.0? Andrej Karpathy gives a simple answer in "Software 2.0": they perform better in practice!

Easily written into the chip

Because the instruction set of a neural network is relatively small, mainly matrix multiplication (Matrix Multiplication) and thresholding at zero (Thresholding at Zero), it is much easier to put it on a chip, for example with custom ASICs or neuromorphic chips (Alan Turing considered this when designing the ACE). Small, inexpensive chips could ship with a pre-trained convolutional network that recognizes speech, synthesizes audio and processes visual signals. When we are surrounded by low-power intelligence, the world will be very different (for better or worse).
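That "small instruction set" really is just these two operations. Here is a minimal sketch in Python/NumPy of a network layer expressed as a matrix multiplication followed by thresholding at zero (ReLU); the shapes and random values are arbitrary illustrations.

```python
import numpy as np

def layer(x, W, b):
    """One neural network layer: matrix multiply, then threshold at zero (ReLU)."""
    return np.maximum(W @ x + b, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                      # input vector
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)

hidden = layer(x, W1, b1)                   # hidden layer
output = W2 @ hidden + b2                   # final linear layer (no threshold)
print(output)
```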

Very agile.

Agile development means flexibility and efficiency. If you have a piece of C++ code and someone asks you to double its speed, you need to tune it systematically or even rewrite it. In Software 2.0, however, we can delete half the channels in the network and retrain: it then runs exactly twice as fast, with somewhat worse output, which feels like magic. Conversely, if you get more data or more computing power, you can make the program work better simply by adding channels and retraining.

Modules can be integrated into an optimal whole.

Anyone who has done software development knows that program modules usually communicate through public functions, APIs or remote calls. However, if two Software 2.0 modules that were originally trained separately need to interact, we can simply backpropagate through the whole and optimize them jointly. Imagine how amazing it would be if your browser could automatically tune low-level system instructions to make pages load faster; in Software 2.0, this is the default behavior.

It does better than you.

Last but not least, a neural network is better than any vertical code you could come up with, at least for anything involving images, video, sound or speech; better, at the very least, than the code you would write yourself.

2.4 Bug 2.0

In traditional software, that is, Software 1.0, programs live in source code, which can run from thousands of lines to hundreds of millions. Google's entire codebase is said to contain about 2 billion lines. However large the code, traditional software engineering practice shows that encapsulation and modular design help create maintainable code in which bugs are easy to isolate and fix.

But in the new paradigm, the program is stored as the weights of a neural network architecture, and programmers write very little code. Software 2.0 brings two new problems: inexplicability and data contamination.

Because engineers cannot read the weights of a trained neural network (though there has been real progress in interpreting neural networks, which we will discuss in Chapter 6), we cannot know why a correct answer is correct, or what caused a mistake. This is very different from big-data algorithms. Most applications only care about results and need no explanation, but for safety-critical areas such as self-driving and medical applications, it really matters.

In the 2.0 stack, the data determines the connections of the neural network, so incorrect datasets and labels will confuse it. Bad data may come from mistakes, deliberate design, or targeted adversarial confusion (a new ethical issue for artificial intelligence). For example, if the automatic spelling correction of iOS is contaminated by unexpected training data, we will never get the right result when typing certain characters: training treats the contaminated data as an important correction, and once training and deployment are complete, the error spreads like a virus to millions of iPhones. So for Bug 2.0, both the data and the program's results need to be tested thoroughly to ensure these edge cases do not cause the program to fail.

In the short term, Software 2.0 will become more and more common. Problems that cannot be expressed through explicit algorithms and software logic will move to the 2.0 paradigm; the real world does not fit into neat encapsulation. As Minsky said, software development should pay more attention to goals than to procedures. This paradigm has a chance to upend the entire development ecosystem: Software 1.0 will serve as the peripheral system for Software 2.0, the two working together to build intelligence-oriented architectures. It is increasingly clear that when we build artificial general intelligence (AGI), it will certainly be written in Software 2.0.

03. Intelligence-oriented architecture

Looking back over a decade of spectacular Deep Learning progress, we have focused all our attention on algorithmic breakthroughs, innovations in training models and the magical performance of intelligent applications. That is understandable, but the infrastructure underneath intelligent systems is rarely mentioned.

Just as in the early days of computing, when developing even a simple application required experts in assembly language, compilers and operating systems, deploying artificial intelligence at scale today requires mountains of data and distributed systems. As the economists Andrew McAfee and Erik Brynjolfsson quipped in their book Machine, Platform, Crowd: Harnessing Our Digital Future: "our era of machine intelligence is still human-driven."

Fortunately, the emergence of GANs has greatly reduced the cost of training that once relied entirely on human-generated data, and Google AI keeps working to democratize AI infrastructure. But all this is still very early. We need a new intelligent infrastructure to turn crowdsourced data into crowdsourced intelligence, releasing the potential of artificial intelligence from expensive scientific institutions and a few elite organizations and turning it into engineering.

3.1 Infrastructure 3.0

Applications and infrastructure evolve in step.

Infrastructure 1.0 - C/S (the client/server era)

The commercial Internet matured in the late 1990s thanks to the x86 instruction set (Intel), standardized operating systems (Microsoft), relational databases (Oracle), Ethernet (Cisco) and networked data storage (EMC). Amazon, eBay, Yahoo, and even the earliest Google and Facebook, were built on what we call Infrastructure 1.0.

Infrastructure 2.0 - Cloud (the cloud era)

Amazon AWS, Google Cloud and Microsoft Azure defined a new kind of infrastructure: durable, scalable and programmable, with no physical deployment required. Some of it is open source, such as Linux, MySQL, Docker, Kubernetes, Hadoop and Spark, but most of it costs money, such as the edge service Cloudflare, the database service MongoDB, the messaging service Twilio and the payment service Stripe. Taken together, all of this defines the era of cloud computing.

In the final analysis, this generation of technology extended the Internet to billions of end users and efficiently stored the information obtained from them. The innovations of Infrastructure 2.0 catalyzed the rapid growth of data which, combined with rapid advances in computing power and algorithms, set the stage for today's Machine Learning era.

The question Infrastructure 2.0 focused on was "how do we connect the world?" Today's technology reframes it: "how do we understand the world?" The difference is the difference between connection and cognition: first you connect, then you understand. The services of the 2.0 architecture feed data into the new architecture, a kind of crowdsourcing in the broad sense; training algorithms then infer logic (a neural network) from the data, and that logic is used to understand and predict the world. This new architecture, which collects and processes data, trains models and finally deploys applications, is Infrastructure 3.0: an intelligence-oriented architecture. In fact our brains work the same way, which I will explain in detail in Chapter 6.

Figure 13: Hidden Technical Debt in Machine Learning Systems

In a real-world Machine Learning system, only a small part is ML code, the small black box in the middle of the figure; the surrounding infrastructure is huge and complicated. A "smart" application is extremely data-intensive and computationally expensive, characteristics that make ML a poor fit for the von Neumann computing paradigm that has dominated for more than 70 years. For Machine Learning to reach its full potential, it must leave the academic hall and become an engineering discipline. That means new abstractions, interfaces, systems and tools that let developers easily build and deploy these smart applications.

3.2 How to assemble intelligence

Successfully building and deploying artificial intelligence is a complex process involving multiple independent systems. First the data must be collected, cleaned and labelled; then the features on which predictions will be based must be determined; finally the developer must train the model, validate it and optimize it continuously. From start to finish this can take months or years, even for the leading companies and research institutions in the industry.
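A minimal sketch of that end-to-end loop as plain Python stubs. The stage names follow the paragraph above; everything inside them is a placeholder assumption, since real pipelines hand these stages to the specialized systems listed further below.

```python
# A skeleton of the build-and-deploy loop described above (all stages are stubs).
def collect_and_clean(raw_records):
    return [r for r in raw_records if r is not None]     # drop bad rows

def label(records):
    return [(r, int(r > 0)) for r in records]             # toy labelling rule

def extract_features(labelled):
    return [([x, x * x], y) for x, y in labelled]         # hand-picked features

def train(dataset):
    return {"weights": [0.1, 0.2]}                        # stand-in for a trained model

def validate(model, dataset):
    return 0.9                                            # stand-in validation metric

def deploy(model):
    print("serving model", model)

data = extract_features(label(collect_and_clean([3, None, -1, 5])))
model = train(data)
if validate(model, data) > 0.8:                           # iterate until good enough
    deploy(model)
```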

Fortunately, beyond the algorithms and models themselves, the efficiency of every step in assembling an intelligent architecture is improving: more computing power and distributed computing frameworks, faster networks and more powerful tools. At every layer of the stack we are beginning to see new platforms and tools optimized for the Machine Learning paradigm, a space full of opportunity.

Figure 14: Intelligence Infrastructure (from Determined AI)

Following the classification used by Amplify Partners, an investor specializing in intelligent architecture, the technology stack can be briefly described as follows:

High-performance chips optimized for Machine Learning, with multiple built-in compute cores and high-bandwidth memory (HBM), capable of highly parallel, fast matrix multiplication and floating-point neural network calculations, such as Nvidia's H100 Tensor Core GPU and Google's TPU.

System software that squeezes full efficiency out of the hardware and can compile computation down to the transistor level. CUDA, launched by Nvidia in 2006, has held its leading position ever since; it is a software layer that gives direct access to the GPU's virtual instruction set for kernel-level parallel computing.

Distributed computing frameworks (Distributed Computing Frameworks) for training and inference, which can efficiently scale model training across multiple nodes.

Data and metadata management systems, designed for creating, managing, training on and predicting from data, providing a reliable, unified and reusable management pipeline.

Extremely low-latency serving infrastructure, enabling machines to perform intelligent operations quickly on real-time data and context.

Machine Learning continuous integration platforms (MLOps), model interpreters, quality assurance and visual testing tools that can monitor, debug and optimize models and applications at scale.

End-to-end ML platforms (End to End ML Platform) that encapsulate the whole Machine Learning workflow, abstracting away the complexity of the process and making it easy to use. Almost every 2.0-architecture company with large amounts of user data has an integrated in-house 3.0 system: Uber's Michelangelo platform trains on travel and booking data, Google's TFX is an end-to-end ML platform for the public, and there are many startups in this space, such as Determined AI.

Overall, Infrastructure 3.0 will unleash the potential of AI/ML and support the construction of intelligent systems at human scale. As with the previous two generations of architecture, even though the infrastructure giants of the last generation have already entered the market, each paradigm shift brings new projects, platforms and companies that challenge the incumbents.

The vanguard of intelligent architecture

The key moment when Deep Learning was chosen by the big technology companies came in 2010. At a Japanese dinner in Palo Alto, Stanford professor Andrew Ng met Google CEO Larry Page and Sebastian Thrun, the brilliant computer scientist then in charge of Google X. Just two years earlier, Andrew had written a paper on the effectiveness of applying GPUs to DL models. Bear in mind that DL was deeply unfashionable in 2008, when classical algorithms ruled.

At about the same time, Nvidia's CEO Jensen Huang also realized how important GPUs were to DL. He put it this way: "Deep Learning is like the brain. It is unreasonably effective, and you can teach it to do anything. There is one huge obstacle: it requires a massive amount of computation. And we make GPUs, a near-ideal computing tool for Deep Learning."

The details of this story come from an in-depth Forbes report in 2016. Since then, Nvidia and Google have each pursued the intelligent-architecture path of Deep Learning, one starting from the GPU at the edge, the other from the TPU in the cloud.

Figure 15: Nvidia AI vs Google AI

Most of the money Nvidia makes today still comes from the gaming industry. AMD and many startups also sell GPUs and acceleration chips, but Nvidia's strength in the software stack is unmatched by these hardware companies, because it has CUDA, which controls everything from the kernel to the algorithm and lets thousands of chips work together. This end-to-end control allows Nvidia to develop cloud computing services, self-driving hardware and embedded hardware for intelligent robots, as well as higher-level AI applications and the Omniverse simulated world.

Google embraced AI in a characteristically academic way. It first founded Google Brain to try large-scale neural network training, which lit up the technology tree in this field; inspired ideas like GANs also came out of Google (Ian Goodfellow was working at Google Brain at the time). Google launched TensorFlow and the TPU (Tensor Processing Unit) around 2015, having acquired DeepMind in 2014 to expand its research strength. Google AI prefers to offer the public AI/ML computing power and full-pipeline tools in the cloud, and then fold intelligence into its own product lines through investment and acquisitions.

Now almost all the tech giants are building out their "intelligent" infrastructure. Microsoft invested $1 billion in OpenAI in 2019 to become its largest institutional shareholder. Facebook has set up an AI research organization second in size only to its Reality Labs, involved in everything the Metaverse needs that touches "intelligence"; at the end of this year it also reached a $20 billion partnership with AMD to use their chips in new "intelligent" data centers. Then there is Tesla, which besides building electric cars has built the world's largest supercomputer, Dojo, to train FSD's neural networks and prepare the brain of the future Optimus (Tesla's humanoid robot).

Just as the past two decades saw the rise of the "cloud computing stack," the next few years should see a huge ecosystem of infrastructure and tools grow up around the intelligent architecture, Infrastructure 3.0. Google is at the forefront, trying to rewrite much of its code in the Software 2.0 paradigm and run it on this new intelligent architecture, because a potentially unified "model" has appeared. It is still very early, but machine intelligence may soon hold a coherent understanding of the world, just as our cerebral cortex does.

04. The unified model

Imagine walking into a hardware store and seeing a new kind of hammer on the shelf. You may have heard of this hammer: it is faster and more accurate than other hammers, and over the past few years it has made many other hammers look obsolete. Add an attachment and give it a twist, and it becomes a saw, as fast and as accurate as any other saw. In fact, leading experts in the tool field say this hammer may be a sign that all tools will converge into a single device.

A similar story is playing out in AI's toolbox. That versatile new hammer is a neural network called the Transformer (the model, not the cartoon robots). It was originally designed for natural language, but has recently begun to reshape other areas of the AI industry.

4.1 The birth of Transformer

In 2017, researchers at Google Brain and the University of Toronto published a paper called "Attention Is All You Need", which introduced a natural language processing (NLP) model, the Transformer, probably the most important invention in Deep Learning since GANs. In 2018, Google implemented and open-sourced BERT, the first natural language processing model based on the Transformer. Although the research came from Google, it was quickly adopted by OpenAI, which built GPT-1 and, most recently, the wildly popular GPT-3. Other companies and open-source teams followed with their own Transformer models, such as Cohere, AI21 and Eleuther (a project dedicated to keeping AI open source), along with innovations in other areas, such as the image generators DALL·E 2, MidJourney, Stable Diffusion, Disco Diffusion, Imagen and many others.

Figure 16: Of the eight authors of "Attention Is All You Need", six have started companies, four of which are related to artificial intelligence, and another founded a blockchain project called Near.ai.

Natural language processing was clearly defined as a subject when the AI discipline was founded in the 1950s, but its accuracy and expressiveness improved dramatically only with Deep Learning. The sequence transduction model (Seq2Seq) is a DL model used in NLP that has been very successful at machine translation, text summarization and image captioning; Google has used it in search suggestions, machine translation and other products since 2016. A sequence transduction model receives and encodes items one by one at the input (words, letters, image features, or any data a computer can read) and decodes output items one by one at the output.

In machine translation, the input sequence is a series of words, and after the complex matrix mathematics inside the trained neural network, a series of translated target words emerges at the output.

The Transformer is also a sequence transduction model for NLP. The paper describes the new network architecture succinctly and clearly: it relies entirely on the attention mechanism (Attention), with no recurrence (RNN) and no convolution (CNN) at all. Two machine translation experiments showed that the model is better in quality, easier to parallelize, and needs far less training time.
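To get a rough sense of what "attention" computes, here is a minimal sketch of scaled dot-product attention in Python/NumPy, the core operation of the Transformer paper; the tiny random matrices stand in for learned projections of a three-token sequence and are purely illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # how much each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_k = 3, 4                        # e.g. the three tokens "owl", "caught", "it"
Q = rng.normal(size=(seq_len, d_k))        # queries
K = rng.normal(size=(seq_len, d_k))        # keys
V = rng.normal(size=(seq_len, d_k))        # values

out, attn = scaled_dot_product_attention(Q, K, V)
print(np.round(attn, 2))   # each row shows how one token distributes attention over the sequence
```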

Curious readers who want to understand exactly how the Transformer model works should read Giuliano Giacaglia's article "How Transformers Work".

4.2 Foundation Models

In a paper published in August 2021, "On the Opportunities and Risks of Foundation Models", researchers at Stanford's CRFM and HAI named Transformer-based models Foundation Models, arguing that they are driving a new paradigm shift in AI. In fact, 70 per cent of the AI papers posted on arXiv over the past two years mention the Transformer, a fundamental shift from a 2017 IEEE survey which found that RNNs and CNNs were then the most popular models.

From NLP to Generative AI

Maithra Raghu, a computer scientist at Google Brain, analyzed the Vision Transformer to determine how it "sees" images. Unlike a CNN, which starts from small patches to find details such as edges or colors, the Transformer can take in the entire image from the very beginning.

The difference is easier to understand in language, the domain where the Transformer was born. Take the sentences: "The owl found a squirrel. It tried to catch it, but only got the end of its tail." The structure of the second sentence is puzzling: what does each "it" refer to? A CNN that only looks at the words immediately around "it" would struggle, but by connecting every word with every other word you can work out that the owl did the catching and the squirrel lost part of its tail. This relating of everything to everything else is the "Attention" mechanism, and it is also how humans make sense of the world.

The Transformer's versatility at converting data from one-dimensional strings (such as sentences) into two-dimensional arrays (such as images) suggests the model can handle many other types of data. Just a decade ago, the different branches of AI barely spoke to each other. As computer scientist Atlas Wang put it: "I think the Transformer is so popular because it implies the potential to become universal. It may be an important step toward a great convergence of neural network architectures, a general method for computer vision that may also apply to other machine intelligence tasks."

For more Generative AI cases built on Transformer models, I recommend my friend Rokey's article "Wizards and Spells in the AI Era", probably the most detailed and clearly written piece on the Chinese internet.

Emergence and homogenization

The significance of Foundation Models can be summed up in two words: emergence and homogenization. Emergence is unknown and unpredictable; it is the source of innovation and scientific discovery. Homogenization means that the methodology for building Machine Learning systems is converging across a wide range of applications: you can do many different things in one unified way, but it also creates a single point of failure. The data contamination we discussed in the Bug 2.0 section will be amplified quickly and spread to every field.

Figure 18: The emergence of artificial intelligence (from the Stanford researchers' August 2021 paper)

The evolutionary history of AI is a process of continuous emergence and homogenization. With the introduction of ML, we could learn from examples (probabilistic inference by algorithms); with DL, high-level features used for prediction emerged; with foundation models (Foundation Models), still more advanced capabilities such as in-context learning emerged. At the same time, ML homogenized the algorithms (e.g. RNN), DL homogenized the model architectures (e.g. CNN), and foundation models homogenize the model itself (e.g. GPT-3).

If a foundation model can centralize data from many modalities, it can then be adapted to a wide variety of tasks.

Figure 19: The transformations of Foundation Models (from the Stanford researchers' August 2021 paper)

Beyond the familiar fields of translation, text creation, image generation, speech synthesis and video generation, foundation models are also being used in professional domains.

In December 2020, DeepMind's AlphaFold 2 pushed the accuracy of protein structure prediction above 90%, far ahead of all its competitors. In an article in the journal Nature, they described reading amino acid chains like text strings and using that data to predict possible protein folding structures, which could accelerate drug discovery. Similar work is happening at drug companies: AstraZeneca and NVIDIA jointly developed MegaMolBART, which can be trained on an unlabelled database of compounds to greatly improve efficiency.

Large-scale language models

This generality makes training very large neural networks worthwhile. Natural language is the most abundant of all trainable data; it lets a foundation model learn in context and be transformed into whatever media content is needed. Natural language = programming mode = universal interface.

As a result, large-scale language models (LLMs) have become a must-have for tech giants and startups alike. In this arms race, deep pockets are an advantage: hundreds of millions of dollars can be spent on GPUs to train LLMs. OpenAI's GPT-3 has 175 billion parameters, DeepMind's Gopher has 280 billion, Google's GLaM and LaMDA have 1.2 trillion and 137 billion respectively, and Megatron-Turing NLG, built by Microsoft with Nvidia, has 530 billion.

But one feature of AI is that it is emergent, and in most cases the challenges are scientific rather than engineering. In Machine Learning there is still enormous room for improvement in algorithms and architectures. While there seems to be plenty of room for incremental engineering iteration and efficiency gains, a growing number of LLM startups are raising smaller rounds ($10 million to $50 million) on the bet that the future belongs to better model architectures rather than pure scale.

4.3 New opportunities for AI

As model scale and natural language understanding improve further (so far, simply by scaling up training and parameters), we can expect a great deal of professional creative work and enterprise software to be changed or even upended. Most of a business actually runs on "selling language": marketing copy, email communication, customer service, and more specialized work such as legal counsel are all expressions of language, and those expressions can be converted into sound, images and video, or into more realistic models for use in the metaverse. Machines that can directly understand or generate documents will be one of the most disruptive changes since the mobile Internet and cloud computing revolutions around 2010. Following the pattern of the mobile era, we will eventually have three types of companies:

1. Platform and infrastructure

Mobile platforms ended up with iPhone and Android, and there was no opening after that. But the competition among foundation model builders, such as OpenAI, Google, Cohere, AI21, Stability.ai and the other companies building LLMs, is just beginning, and there are many emerging open-source options such as Eleuther. In the cloud computing era, the code-sharing community GitHub came to host almost half of Software 1.0; communities that share neural network models, such as Hugging Face, should likewise become the hubs and talent centers of intelligence in the Software 2.0 era.

2. Independent applications on the platform

Because of the location, sensing and camera hardware of mobile devices, services like Instagram, Uber and Doordash could not exist without the phone. Likewise, a wave of new applications built on LLM services or custom-trained Transformer models is appearing, such as Jasper (creative copywriting) and Synthesia (synthetic voice and video), spanning creator and visual tools, sales and marketing, customer support, doctors and lawyers, assistants, code, testing, security and more, none of which would exist without these Machine Learning breakthroughs.

Sequoia Capital (SequoiaCap) recently published a widely read article, "Generative AI: A Creative New World", which analyzes the market and applications in detail; and, as noted at the beginning, the entire investment community has begun hunting AI after the rout of Web3 speculation.

Figure 21: Application categories on top of the models (Gen AI market map V2)

3. Intelligence in existing products

In the mobile Internet revolution, most of the valuable mobile services were still captured by the giants of the previous era. When many startups tried to build "Mobile CRM" apps, the winners were the existing CRMs that added mobile support; Salesforce was not replaced by a mobile app. Likewise, Gmail and Microsoft Office were not replaced by mobile apps; their mobile versions do just fine. In the end, Machine Learning will be built into the CRM tools with the largest user bases, and Salesforce will not be replaced by a brand-new ML-driven CRM, just as Google Workspace is folding its AI results into the product.

We are in the early stages of the intelligence revolution, and it is hard to predict what will happen. Apps like Uber, where you press a button on your phone and a stranger comes to pick you up, seem ordinary now, but you would never have imagined such an app or interface when smartphones first appeared. The same will be true of native AI applications, so keep an open mind: the most interesting forms are still waiting to be discovered.

We have felt the power of foundation models, but can this approach really produce intelligence and consciousness? Today's artificial intelligence looks much more like a tool than an intelligent agent. GPT-3, for instance, keeps learning during training, but once the model is trained its parameter weights are fixed, and no new learning happens as the model is used. Imagine your brain frozen in an instant, able to process information but never to learn anything new. Is that the kind of intelligence you want? That is how Transformer models work today. If they became able to learn continuously, the way neurons in the brain are constantly forming new connections, their more advanced forms might represent a new kind of intelligence. We will come back to this in Chapter 6; before that, let's look at how AI fares in the real world.

05. AI in the real world

People used to worry about unmanned elevators in much the same way we hear worries about driverless cars today.

Garry Kasparov

Real-world AI (Real World AI) is, in Elon Musk's definition, "AI that imitates humans in perceiving and understanding the world around them": intelligent machines that can coexist with the human world. Most of the problems discussed in the first four chapters have the same shape: you feed in data or state a goal, and the AI returns a result or achieves the goal, rarely interacting with the real-world environment. In the real world, first, collecting large amounts of data is extremely difficult unless, like Tesla, you have millions of camera-equipped cars connected in real time to collect it for you; second, perception, planning and action require neural networks combined with intelligent algorithms, just as the brain controls human behavior, which is an extreme challenge for both research and engineering. Since the birth of the Transformer model, however, AI that can take on the real world has made new progress.

5.1 The new frontier of self-driving

Just a few weeks ago, Ford-backed Argo AI announced it was shutting down, casting a shadow over the already controversial field of self-driving. At present, no company making self-driving software is truly profitable, except Comma.ai, founded by George Hotz, the legendary software engineer and veteran hacker whom Elon Musk failed to poach back in the day.

The choice of technical route

A self-driving car is really a robot that has to solve both hardware and software problems. It needs cameras, radar or other hardware to perceive its surroundings, while the software plans a route based on the environment and the car's physical location, and finally drives the vehicle to its destination.

There are currently two main schools of self-driving: pure vision systems and lidar-based systems. Google's Waymo is the pioneer of the lidar approach, as was the now-defunct Argo AI; in fact most companies belong to this school, because the advantage is obvious. Lidar perceives the three-dimensional world accurately and can get on the road without overly complex neural network training, but the cost of high-powered lidar is a big problem. The only companies betting on pure vision are Tesla and Comma, which rely entirely on cameras and software without any auxiliary sensing hardware.

Lidar has another problem: the world it sees has no color or texture, so it must work together with cameras to depict the real world. Mixing the two kinds of data makes the algorithms extremely complex, so Tesla abandoned lidar entirely, and even ultrasonic radar, with cost savings being an important reason. Another reason is that real-world roads are designed for human drivers, and humans manage the task with vision alone, so why not artificial intelligence? That reasoning is very Elon Musk: it simply requires pouring more R&D into the neural networks.

Waymo and Tesla are the leaders in self-driving. Mike Ramsey, a vice president at Gartner, put it this way: "If the goal is to provide driving assistance to the public, then Tesla is very close; if the goal is vehicles that can drive safely on their own, then Waymo is winning." Waymo operates at Level 4: it can drive itself within limited geographic conditions without driver supervision, but the technology behind it is not ready for the mass market outside its test areas, and it is expensive. Since 2015 it has taken Tesla more than six years to catch up with Waymo's current test results, while using less and less self-driving hardware at lower and lower cost. Tesla's strategy is interesting: "autopilot must adapt to any road and make the car think like a human." If it succeeds, it will scale far better.

Let the car see and think.

Tesla's bet on AI began with the arrival of Andrej Karpathy in 2017; one person really can change an industry. The AI team Andrej led completely rebuilt the original Autopilot stack and used the latest Transformer-based neural networks to train the vision-only self-driving system FSD Beta 10. At AI Day 2021, the Tesla AI team shared these results without reservation, partly in order to recruit more talent.

To make cars think like people, Tesla simulates the way the human brain processes visual information, a complex pipeline built from multiple neural networks and logical algorithms.

Figure 22: The architecture of Tesla Autopilot. The FSD driving pipeline works roughly as follows:

1. Visual image collection: six 1280x960 cameras around the car capture 12-bit video in low light and identify the various objects and triggers (road conditions) in the environment.

2. Vector-space generation: the world humans see is a three-dimensional model that the brain builds and updates in real time from sensory data. Tesla uses the same mechanism, projecting all the information around the car into a four-dimensional vector space (3D plus time) and rendering a dynamic bird's-eye view (BEV), so the car can move and make predictions in three-dimensional space and be controlled precisely. The Transformer-based HydraNets used earlier have been upgraded to the newer Occupancy Networks, which identify how objects occupy 3D space more accurately.

3. Neural-network route planning: Monte Carlo tree search (MCTS) guided by neural-network evaluations quickly searches for and plans the car's own path, and the same algorithm also plans for all the moving objects around it and can revise the plan in time. Making your own decisions based on how others react: isn't that exactly human thinking? (A minimal sketch of neural-network-guided MCTS follows below.)
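To make the planning step concrete, here is a minimal, hypothetical sketch of Monte Carlo tree search guided by a learned prior. It is not Tesla's actual planner: the `policy_value` heuristic and the toy one-dimensional world are invented stand-ins for a trained policy/value network and the real vector space.

```python
import math

# Toy 1-D world: state is an integer position, actions move left/right/stay.
ACTIONS = [-1, 0, 1]
GOAL = 5

def policy_value(state):
    """Stand-in for a neural network: returns action priors and a value estimate.
    Here it is a hand-written heuristic; in a real planner a trained network
    would produce these numbers."""
    priors = {a: 1.0 / len(ACTIONS) for a in ACTIONS}
    value = -abs(GOAL - state) / 10.0          # closer to the goal = higher value
    return priors, value

class Node:
    def __init__(self, state, prior):
        self.state, self.prior = state, prior
        self.children = {}                     # action -> Node
        self.visits, self.value_sum = 0, 0.0

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    # PUCT rule: exploit the value seen so far, explore actions the prior likes.
    def score(action, child):
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.value() + u
    return max(node.children.items(), key=lambda ac: score(*ac))

def mcts(root_state, simulations=200):
    root = Node(root_state, prior=1.0)
    for _ in range(simulations):
        node, path = root, [root]
        # 1. Selection: walk down the tree using PUCT.
        while node.children:
            action, node = select_child(node)
            path.append(node)
        # 2. Expansion + evaluation: ask the "network" for priors and a value.
        priors, value = policy_value(node.state)
        for a, p in priors.items():
            node.children[a] = Node(node.state + a, prior=p)
        # 3. Backup: propagate the value estimate up the visited path.
        for n in path:
            n.visits += 1
            n.value_sum += value
    # Act: pick the most-visited action at the root.
    return max(root.children.items(), key=lambda ac: ac[1].visits)[0]

print("best first move from position 0:", mcts(0))
```

The PUCT selection rule biases the search toward actions the prior favors, and that bias fades as real visit counts accumulate; swapping the hand-written heuristic for a trained network is what turns this into learned planning.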

Tesla FSD's ability to perceive and decide so quickly depends on neural-network training on the Dojo supercomputer. This is similar to how OpenAI and Google train LLMs, except that the data does not come from the internet: every Tesla running on the road feeds real 3D training data back to Dojo through Shadow Mode.

Perhaps it was inevitable that nature chose the eye as the most important organ for gathering information. One theory holds that the Cambrian explosion 530 million years ago was driven in part by the emergence of vision: being able to see the world let new species move and navigate in a rapidly changing environment, plan their actions, and interact with their surroundings first, greatly improving their odds of survival. By the same token, if machines can see, will that too set off an explosion of new species?

5.2 Not a robot, but an intelligent agent

Not all robots have the intelligence to perceive the real world. A robot carrying goods around a warehouse does not need much deep learning, because its environment is known and predictable, and the same is true of most self-driving vehicles operating in constrained settings. Boston Dynamics' famously impressive dancing robots have the best motion-control technology in the world, but those choreographed moves are simply rules written into a program. Many viewers felt that the slow movements of Tesla Optimus, the robot Tesla unveiled in September, were no match for Boston Dynamics, but a good robot brain and a design that can be mass-produced matter more.

For autonomous driving, the core of interacting with the real world is safety, that is, avoiding collisions. For AI-powered robots, the core is the interaction itself: understanding speech, grasping and avoiding objects, and carrying out human instructions. The FSD technology that drives Tesla's cars will also drive the Tesla Optimus robot, which shares the same heart (the FSD Computer) and the same brain (Tesla Dojo). But training robots is even harder than training autopilot: there are no millions of Optimus units already in the field collecting real-world data for you, which is where the virtual worlds of the Metaverse concept can come into play.

Simulated reality in the Virtual World

Building new foundation models that let robots perceive the world will require large datasets spanning many different environments. Virtual environments, robot interactions, human videos, and natural language can all serve as data sources for these models. Agents trained on such data inside virtual environments form their own category, EAI (Embodied AI). Here Fei-Fei Li is once again at the forefront: her team released BEHAVIOR, a standardized simulation benchmark containing 100 everyday human activities such as picking up toys, wiping tables, and cleaning floors, on which any EAI agent can be tested in any virtual world. The hope is that the project will contribute as much to training data for embodied AI as ImageNet did for computer vision.

Naturally, Meta and Nvidia are not absent from virtual-world simulation. Dhruv Batra, a computer scientist at the Georgia Institute of Technology and a research director on the Meta AI team, has created a virtual world called AI Habitat that aims to speed up simulation: an agent can accumulate 20 years of simulated experience in just 20 minutes of wall-clock time, truly "a minute in the metaverse, a year in the real world". Nvidia, in addition to providing compute modules for robots, offers an extensible robot simulator and synthetic-data generation tool built on its Omniverse platform, providing a realistic virtual environment and physics engine for developing, testing, and managing intelligent agents.

Robots are, in essence, embodied intelligent agents, and many researchers have found that training them in virtual worlds is cheap and effective. As more companies enter the field, demand for data and training will grow, and a new foundation model for EAI is bound to emerge; the potential is enormous.
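As a rough illustration of what "training in a virtual world" looks like in code, here is a minimal sketch of an agent/environment loop in the style of gym-like simulators. The `ToyRoom` environment and `random_policy` are invented for the example; real platforms such as AI Habitat or Omniverse expose much richer observation, action, and physics APIs.

```python
import random

class ToyRoom:
    """A stand-in for a simulated environment: the agent must reach position 10."""
    def reset(self):
        self.pos = 0
        return self.pos                      # observation

    def step(self, action):                  # action: -1 or +1
        self.pos += action
        done = self.pos >= 10
        reward = 1.0 if done else -0.01      # small time penalty per step
        return self.pos, reward, done

def random_policy(observation):
    return random.choice([-1, 1])

def run_episode(env, policy, max_steps=500):
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total

# Simulated experience is cheap compared with collecting the same episodes on real hardware.
returns = [run_episode(ToyRoom(), random_policy) for _ in range(1000)]
print("average return over 1000 simulated episodes:", sum(returns) / len(returns))
```

The economics described above follow from this structure: because each simulated step is just a function call, an agent can rack up years of experience in minutes of wall-clock time.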

Amazon Prime's latest sci-fi series The Peripheral, adapted from William Gibson's 2014 novel of the same name, lets its heroine inhabit an intelligent agent in the future through a brain-computer interface. We have always assumed the Metaverse is where humans go to escape the real world; for robots, training in the Metaverse is how they prepare to conquer the real one.

ARK Invest noted in their Big Ideas 2022 report that, by Wright's law, the production cost of AI, measured per RCU (AI Relative Compute Unit), can fall by 39% a year, and software improvements can contribute a further 37% annual cost reduction over the next eight years. In other words, the combination of hardware and software could cut the cost of AI training by roughly 60% a year through 2030.
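As a back-of-the-envelope check, assuming the hardware and software reductions compound multiplicatively (an assumption not spelled out in the report):

```python
hw, sw = 0.39, 0.37                   # annual cost reductions from hardware and software
annual_factor = (1 - hw) * (1 - sw)   # fraction of the cost remaining after one year
print(f"combined annual decline: {1 - annual_factor:.0%}")                  # about 62%, i.e. roughly 60%
print(f"fraction of today's cost after 8 years: {annual_factor ** 8:.2e}")  # on the order of 5e-4
```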

Figure 26: the AI market in 2030 is projected at $87 trillion. The market capitalization of AI hardware and software companies could grow at an annualized rate of about 50 percent, from $2.5 trillion in 2021 to $87 trillion in 2030.

By automating the tasks of knowledge workers, AI should raise productivity and significantly reduce unit labor costs, as the explosion of generative AI already suggests; but real-world AI still has a long way to go before it significantly reduces the cost of manual labor. We assumed AI would put manual workers out of work first; it turns out it may lay off knowledge workers first.

06. The future of AI evolution

The science-fiction writer Arthur C. Clarke said: "Any sufficiently advanced technology is indistinguishable from magic." Back in the 19th century it would have been unthinkable to imagine cars travelling at more than 100 kilometers per hour on highways, or video calls from a mobile phone with someone on the other side of the world. Since the Dartmouth Workshop founded the field of artificial intelligence in 1956, we have taken great strides toward the old dream of machines that perform intellectual tasks better than humans. Some believe this may never happen, or only in the very distant future, but new models keep bringing us closer to the truth of how the brain works, and a thorough understanding of the brain is the road to general artificial intelligence (AGI).

6.1 Seeing inside the neural network

Scientists have found that when different neural networks are trained on the same dataset, the same neurons appear in those networks. This led to a hypothesis: universal features exist across different networks. In other words, if networks with different architectures are trained on the same dataset, certain neurons are likely to appear in all of the architectures.

That is not the only surprise. The same feature detectors also show up in different neural networks: curve detectors (Curve Detectors), for example, are found in AlexNet, InceptionV1, VGG19, and ResNet V2-50. Beyond that, the researchers found more complex forms such as Gabor filters, which are common in biological neurons and resemble the classic "complex cells" described in neuroscience. Could the neurons of our brains also exist, in some form, in artificial neural networks?

Figure 27: OpenAI Microscope modules. The OpenAI research team says these neural networks are interpretable. Through their Microscope project you can visualize the inside of a network: some neurons represent abstract concepts such as edges or curves, while others represent features such as a dog's eyes or nose. The connections between neurons also implement meaningful algorithms, such as simple logic circuits (AND, OR, XOR), operating over high-level visual features.
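A common way to "look inside" a network, and the basic idea behind tools like Microscope, is activation maximization: start from noise and optimize the input image so that a chosen neuron or channel fires strongly. The sketch below uses a small, randomly initialized PyTorch CNN as a stand-in for a real trained model, so the resulting pattern is meaningless; it only shows the mechanics.

```python
import torch
import torch.nn as nn

# A tiny stand-in CNN; in practice you would load a trained model such as InceptionV1.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=5, padding=2),
)
model.eval()

def visualize_channel(model, channel, steps=200, lr=0.05):
    """Gradient ascent on the input so that `channel` of the last layer activates strongly."""
    img = torch.randn(1, 3, 64, 64, requires_grad=True)   # start from noise
    optimizer = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        activation = model(img)[0, channel]                # feature map of the chosen channel
        loss = -activation.mean()                          # maximize the mean activation
        loss.backward()
        optimizer.step()
    return img.detach()

pattern = visualize_channel(model, channel=3)
print("optimized input shape:", pattern.shape)             # the image the channel "likes"
```

Run against a trained vision model, this kind of loop is what produces the curve and edge visualizations described above.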

Transformer in the brain

Tim Behrens and James Whittington, two neuroscientists at University College London, have helped show that some structures in our brains function mathematically like the Transformer model. As described in the article "How Transformers Seem to Mimic Parts of the Brain", a Transformer can closely replicate the activity patterns observed in the hippocampus.

Last year Martin Schrimpf, a computational neuroscientist at the Massachusetts Institute of Technology, analyzed 43 different neural network models and compared them against functional magnetic resonance imaging (fMRI) and cortical electrode recordings of neural activity in the brain. He found that Transformers are currently the most brain-like networks, predicting almost all of the variation found in the imaging data. Computer scientist Yujin Tang recently designed a Transformer model and deliberately fed it large amounts of data in random, shuffled order, mimicking the way the human body transmits sensory signals to the brain. Like our brains, their Transformer successfully processed disordered streams of information.
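One property that makes this plausible: self-attention without positional encodings treats its inputs as an unordered set, so an attention-pooled output does not change when the sensory stream is shuffled. Below is a minimal numpy illustration of that property; it is not the model from Tang's paper, just the bare mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(query, tokens, d=16):
    """Single query attending over a set of input tokens; the output is order-independent."""
    scores = query @ tokens.T / np.sqrt(d)      # (1, n) attention scores
    weights = softmax(scores)
    return weights @ tokens                     # weighted sum of tokens

rng = np.random.default_rng(0)
tokens = rng.normal(size=(10, 16))              # 10 "sensory" inputs, 16-dim each
query = rng.normal(size=(1, 16))

out_original = attention_pool(query, tokens)
out_shuffled = attention_pool(query, tokens[rng.permutation(10)])
print("same output after shuffling inputs:", np.allclose(out_original, out_shuffled))
```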

Although research is advancing by leaps and bounds, the general-purpose Transformer is only a small step toward a precise model of how the brain works: a starting point of the exploration, not its end. Schrimpf also points out that even the best-performing Transformers are limited; they work well on words and phrases but not on large-scale language tasks such as storytelling. It is a promising direction, but the field is extremely complex.

Jeff Hawkins is the founder of Palm Computing and Handspring and one of the inventors of the PalmPilot and the Treo. After his entrepreneurial career he turned to neuroscience and founded the Redwood Center for Theoretical Neuroscience, and has focused on how the human brain works ever since. His book "A Thousand Brains" explains his most important research in detail; Zhanlu Culture published the Chinese edition ("Thousand Brains Intelligence") in September this year.

The neocortex (Neocortex) is the organ of intelligence. Almost everything we regard as intelligence, such as vision, language, music, mathematics, science, and engineering, is created by the neocortex. Hawkins proposes a new framework for how it works, the "Thousand Brains Theory": your brain is organized into thousands of independent computing units called cortical columns (Cortical Columns). These columns all process information from the outside world in the same way, and each builds a complete model of the world. But because each column connects to the rest of the body differently, each has its own unique frame of reference. Your brain reconciles all of these models by voting. The brain's basic job, then, is not to build a single unified thought, but to manage the thousands of individual models it holds at every moment.

We can think of a computer running a Transformer-trained neural network as an extremely crude artificial cortical column: pour all kinds of data into it and it outputs predictions (see chapters 4 and 5). But the neocortex contains more than 200,000 such small computers working as a distributed system, wired to the inputs of the various sense organs, and, most importantly, the brain needs no pre-training: its neurons grow and complete the learning on their own, as if the supercomputer used for training and the computer used for prediction were fused into one. Until scientists reverse-engineer the brain, progress toward AGI will keep struggling.

The Thousand Brains Theory is essentially a sensory-motor theory (Sensory-Motor Theory). It explains how we learn and recognize objects by seeing, moving, and perceiving three-dimensional space. In this theory each cortical column holds a model of the complete object, so it knows what it should sense at every location on that object. If a column knows the current position of its input and how the eyes are moving, it can predict the new position and what it will sense there. It is like looking at a map of a town and predicting what you will see if you start walking in a certain direction. Doesn't this process sound a lot like Tesla's pure-vision autopilot? Perceive, model, predict, and act.

Learn like the brain.

Self-supervision: the computational unit of the neocortex is the cortical column, and each column is a complete sensory-motor system that takes input and produces behavior. A column constantly predicts what its next input will be, whether the future position of a moving object or the next word in a sentence. Prediction is how a cortical column tests and updates its model: when reality differs from the prediction, the error lets the brain make a correction. That is self-supervision. Today's most advanced networks, such as BERT, RoBERTa, and XLM-R, achieve "self-supervision" through pre-training.

Continuous learning: the brain learns continuously through the way its neurons are organized. When a neuron learns a new pattern, it forms new synapses on one dendritic branch, and these new synapses do not affect synapses learned earlier on other branches. Learning something new therefore does not force the neuron to forget or modify what it already knows. Most artificial neurons in today's AI systems lack this ability: they go through one long training run and are then deployed as-is, which is one reason they are inflexible and must be retrained again and again to cope with changing conditions and new knowledge.

Multi-model mechanism: the neocortex is made up of tens of thousands of cortical columns, each of which learns its own model of an object. Voting is what makes this multi-model design work: each column operates somewhat independently, but long-range connections in the neocortex let the columns vote on the object they are perceiving. The "brain" of an intelligent machine should likewise be built from many nearly identical elements (models) that can be connected to a variety of moving sensors (see the sketch after these principles).

Its own frames of reference: knowledge in the brain is stored in frames of reference. Frames of reference are also used to predict, plan, and move; thinking happens as the brain activates one location after another in a frame of reference and retrieves the knowledge stored there. Machines likewise need to learn a model of the world, of how the things we interact with change and where they sit relative to each other, and they need frames of reference to represent this kind of information. Frames of reference are the backbone of knowledge.
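Tying a few of these ideas together, here is a loose, toy sketch of "columns" that learn by self-supervised prediction and then vote on a consensus. The linear predictors and the rotating-vector world are invented for illustration and have nothing to do with real cortical circuitry; they only mirror the predict-compare-correct loop and the voting step described above.

```python
import numpy as np

rng = np.random.default_rng(1)

class Column:
    """A toy 'cortical column': a linear predictor of the next input, updated online.
    Learning is self-supervised: the error between the prediction and the actual
    next input is the only teaching signal (no labels, no separate training phase)."""
    def __init__(self, dim):
        self.w = rng.normal(scale=0.1, size=(dim, dim))

    def predict(self, x):
        return self.w @ x

    def update(self, x, x_next, lr=0.05):
        error = self.predict(x) - x_next           # prediction error drives learning
        self.w -= lr * np.outer(error, x)          # continuous, online correction

def vote(predictions):
    """Columns 'vote' by averaging their individual predictions into a consensus."""
    return np.mean(predictions, axis=0)

# A simple world: the next input is a fixed rotation of the current one.
dim, theta = 4, 0.3
rotation = np.eye(dim)
rotation[:2, :2] = [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]

columns = [Column(dim) for _ in range(5)]          # five independent models of the world
x = rng.normal(size=dim)
for _ in range(2000):                              # lifelong loop: predict, compare, correct
    x_next = rotation @ x
    for c in columns:
        c.update(x, x_next)
    x = x_next

consensus = vote([c.predict(x) for c in columns])
print("consensus prediction error:", np.linalg.norm(consensus - rotation @ x))
```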

Why general artificial intelligence (AGI) is needed

According to Hawkins, AI will transition from the dedicated solutions we see today to more general-purpose solutions, and the general-purpose ones will dominate in the future, for two main reasons:

The first is the same reason general-purpose computers beat dedicated computers: they are ultimately more cost-effective, which leads to faster technological progress. As more people use the same design, more effort goes into improving the most popular designs and the ecosystems that support them, so costs fall and performance rises. This was the basic engine of the exponential growth that shaped industry and society in the second half of the twentieth century.

The second reason AI will become general-purpose is that some of the most important future applications of machine intelligence will require the flexibility of general-purpose solutions. Elon Musk, for example, hopes that robots with general-purpose intelligence will help explore Mars. Such applications will have to handle many unexpected problems and devise novel solutions, which today's dedicated deep learning models cannot do.

6.3 When will artificial intelligence become general?

General artificial intelligence (AGI) is the ultimate goal of the AI field, and arguably the ultimate direction of evolution once machine computing was invented. Looking back over more than sixty years of the machine heart's evolution, we seem to have found a way to imitate the human brain. To complete the puzzle, machine learning still needs improvements in data, compute, and models.

Data should be the easiest piece of the puzzle. Measured in seconds, the ImageNet dataset is already close to the amount of visual signal a person receives from birth to college graduation; and HN Detection, the Google model built to read street numbers on the walls of houses and buildings, was trained on a dataset comparable to the amount of data a person takes in over a lifetime. Learning, as humans do, from less data and at a higher level of abstraction is the direction in which neural networks are developing.

Compute can be broken into two parts: the size of the neural network's parameters (the number of neurons and their connections) and the cost per unit of computation. As the chart below shows, artificial neural networks are still an order of magnitude smaller than the human brain, but they are already comparable to some mammals.

Figure 29: the size of neural networks compared with the neuron counts of animals and humans. The computing power we get for every dollar has been growing exponentially, and the amount of compute used to train large foundation models now doubles every 3.5 months.
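A doubling every 3.5 months compounds quickly; as a rough calculation, assuming the trend simply continues:

```python
doubling_months = 3.5
growth_per_year = 2 ** (12 / doubling_months)
print(f"growth per year: about {growth_per_year:.0f}x")            # roughly 11x per year
print(f"growth over 5 years: about {growth_per_year ** 5:,.0f}x")  # roughly 145,000x
```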

Figure 30: 122 years of Moore's law, computing power generated per dollar. Some argue that physical limits mean computing power cannot keep up this trend, but past trends do not support that view: over time, more money, resources, and talent flow into the field, and with these compounding effects, better software (algorithms and models) and hardware keep being developed. Moreover, the same physical limits also constrain the human brain, so AGI should be achievable.

When AI becomes smarter than humans, we call that moment the singularity. Some predict the singularity could arrive as early as 2045. Nick Bostrom and Vincent C. Müller surveyed hundreds of AI experts at a series of conferences in 2017, asking in which year the singularity (or human-level machine intelligence) would occur, with the following results:

Optimistic estimate (10% likelihood), median year: 2022

Realistic estimate (50% likelihood), median year: 2040

Pessimistic estimate (90% likelihood), median year: 2075

In the eyes of AI experts, then, it is quite likely that machines will be as smart as humans within the next 20 years or so.

That would mean machines could do every task better than humans, and once computers surpass us, some believe they will keep improving. In other words, if we can build machines as smart as we are, there is no reason not to believe they can then make themselves smarter still, leading to the emergence of superintelligence in an accelerating spiral of machine-heart evolution.

From tool evolution to digital life

If the experts' predictions above hold, machines will eventually be self-aware and superintelligent. By then our notion of machine consciousness will have changed fundamentally, and we will be facing true digital lifeforms (DILIs, Digital Lifeforms).

Once there are DILIs that can evolve rapidly and are self-aware, interesting questions about species competition arise. What would cooperation and competition between DILIs and humans be based on? If you make a self-aware DILI simulate pain, are you torturing a sentient being?

These DILIs will be able to copy and edit themselves on servers (it is reasonable to assume that at some point most of the world's code will be written by machines that can replicate themselves), which could greatly accelerate their evolution. Imagine being able to create 100,000,000 clones of yourself at once, modify different aspects of each clone, and set your own fitness functions and selection criteria; DILIs could do all of this, given enough digital and energy resources. These questions are discussed in detail in Life 3.0 and Superintelligence: Paths, Dangers, Strategies.

These problems may arrive sooner than we expect. In his recent article "AI Revolution", Elad Gil notes that AI researchers at OpenAI, Google, and the core startups broadly agree that true AGI is still five to twenty years away, and it may well stay "five years away" forever, just like full self-driving. In any case, one of the potential threats to human survival is competition with our digital offspring.

In his famous book The Structure of Scientific Revolutions, the historian Thomas Kuhn argued that most scientific progress takes place within a widely accepted theoretical framework, which he called a scientific paradigm. Occasionally an established paradigm is overthrown and replaced by a new one, which Kuhn called a scientific revolution. We are in the middle of the intelligent revolution of AI.

Finally, here is "I Am AI", a song composed with AI and updated every year at Nvidia's GTC conference, a reminder of how AI has already seeped into our lives across industries.

References

Letter from Alan Turing to W Ross Ashby-Alan Mathison Turing

Software 2.0-Andrej Karpathy

The Rise of Software 2.0-Ahmad Mustapha

Infrastructure 3.0: Building blocks for the AI revolution-Lenny Pruss, Amplify Partners

Will Transformers Take Over Artificial Intelligence?-Stephen Ornes

AI Revolution-Transformers and Large Language Models (LLMs)-Elad Gil

What Is a Transformer Model?-RICK MERRITT

Wizards and spells in the Age of AI-Rokey Zhang

Generative AI: A Creative New World-SONYA HUANG, PAT GRADY AND GPT-3

What Real-World AI From Tesla Could Mean-CleanTechNica

A Look at Tesla's Occupancy Networks-Think Autonomous

By Exploring Virtual Worlds, AI Learns in New Ways-Allison Whitten

Self-Taught AI Shows Similarities to How the Brain Works-Anil Ananthaswamy

How Transformers Seem to Mimic Parts of the Brain-Stephen Ornes

Attention Is All You Need-PAPER by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

On the Opportunities and Risks of Foundation Models-PAPER by CRFM & HAI of Stanford University

Making Things Think-BOOK by Giuliano Giacaglia

A Thousand Brains (Chinese edition: Thousand Brains Intelligence)-BOOK by Jeff Hawkins

This article comes from the WeChat official account INDIGO (ID: indigo-dm), author: JEDI LU.
