
Ten computer tools that changed science


This article comes from the official WeChat account Fanpu (ID: fanpu2019). Original title: "Ten computer tools that changed science: which one have you used?" Compiled by: Crow Youth.

From Fortran to arXiv.org, from BLAST in biology to AlexNet in artificial intelligence, these technological advances have changed science and the world.

In 2019, the Event Horizon Telescope (EHT) produced humanity's first image of a black hole. The picture, with its bright halo, is not an ordinary photograph: it was synthesized by algorithms from data captured by radio telescopes, and the programming code involved was later released. Synthesizing images with computer programs has become an increasingly common practice.

Behind every major discovery in modern science, from astronomy to biology, there is a computer. Yet computers cannot replace human thinking: even the most powerful machine is useless without software that can solve scientific problems and researchers who know how to write and use that software. Today, such software has penetrated every aspect of scientific research.

Previously, Nature selected ten software tools that have had a significant impact on the scientific community. Which of them have you used, or are still using?

Image source: Paweł Jońca / Nature

1. Programming language pioneer: the Fortran compiler (1957)

The first modern computers were not easy to operate. Programming at the time really was done by hand: researchers connected rows of circuits with wires. Later, with the advent of machine language and assembly language, users could write computer programs in code, but only if they had a deep understanding of computer architecture, which was beyond the reach of many scientists.

In the 1950s, with the gradual development of symbolic languages, the situation began to change, above all with the appearance of the "formula translation" language Fortran, developed by John Backus, an engineer at IBM. With Fortran, users could write programs using instructions that humans can read, such as x = 3 + 5, and a compiler converted those instructions into fast, efficient machine code.

The CDC 3600 computer, delivered to the National Center for Atmospheric Research in 1963, was programmed in Fortran. | University Corporation for Atmospheric Research / Science Photo Library

Even after the invention of Fortran, however, programming was still no easy task. There were no keyboards or screens; programmers had to record code on punch cards, and a complex simulation might require tens of thousands of them. Even so, Fortran made programming far less remote, and many scientists outside computer science could write their own code to solve problems in their fields.

Today, more than 60 years on, Fortran is still widely used in climate modeling, fluid dynamics, computational chemistry and many other fields. Because it runs fast and has a small memory footprint, Fortran turns up in any discipline that involves heavy linear algebra and needs powerful computers to crunch numbers quickly, and those old codes are still active in laboratories and on supercomputers around the world.

2. Signal processor: the fast Fourier transform (1965)

When radio astronomers scan the sky, they capture complex signals that change over time. To understand the nature of these radio waves, they need to see how the signal varies as a function of frequency. The Fourier transform converts a signal from a function of time into a function of frequency. The problem is that the straightforward Fourier transform is inefficient: a dataset of size N requires N² operations.

In 1965, the American mathematicians James Cooley and John Tukey devised the fast Fourier transform (FFT) to speed up the process. The FFT uses a recursive "divide and conquer" strategy, in which a function calls itself repeatedly, reducing the cost of computing a Fourier transform to N log₂(N) steps. The larger N is, the greater the speed-up: for 1,000 data points the gain is about 100-fold; for 1 million data points it is about 50,000-fold.
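As an illustration (not part of the original history), here is a minimal sketch in Python that compares a naive N² discrete Fourier transform with NumPy's FFT, which implements the Cooley-Tukey algorithm; both give the same frequency-domain result, the FFT just gets there far faster.

```python
import numpy as np

def naive_dft(x):
    """Direct O(N^2) discrete Fourier transform, for comparison only."""
    n = len(x)
    k = np.arange(n)
    # DFT matrix: W[k, m] = exp(-2*pi*i*k*m / n)
    w = np.exp(-2j * np.pi * np.outer(k, k) / n)
    return w @ x

# A toy time-domain signal: two sine waves plus a little noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1024, endpoint=False)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
signal += 0.1 * rng.standard_normal(t.size)

# Both produce the same frequency-domain picture; the FFT needs
# O(N log N) operations instead of O(N^2).
slow = naive_dft(signal)
fast = np.fft.fft(signal)
print(np.allclose(slow, fast))  # True, up to floating-point error
```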

The Murchison Widefield Array, a radio telescope in Western Australia, uses fast Fourier transforms to process its data. | John Goldsmith / Celestial Visions

In fact, the German mathematician Gauss invented the FFT in 1805 but never published it. Cooley and Tukey rediscovered the method and opened up applications of the FFT in digital signal processing, image analysis, structural biology and other fields. In the eyes of many, it is one of the greatest inventions in applied mathematics and engineering.

Paul Adams of Lawrence Berkeley National Laboratory recalls that when he analyzed the structure of the bacterial protein GroEL in 1995, the calculation took several days even with the FFT and a supercomputer. "Without the FFT, it's hard to imagine how long it would have taken."

3. Molecular cataloging: biological databases (1965)

Today, databases are such an indispensable part of scientific research that it is easy to overlook the fact that they are driven by software. Over the past few decades, databases have grown enormously in scale and shaped many fields, but perhaps none has been changed as much as biology.

Today's vast databases of genomes and proteins trace back to the work of the bioinformatics pioneer Margaret Dayhoff. In the early 1960s, while biologists were working out the amino acid sequences of proteins, Dayhoff began to collate this information, looking for clues to the evolutionary relationships between species. In 1965, she and her collaborators published the Atlas of Protein Sequence and Structure, describing the sequences, structures and similarities of the 65 proteins known at the time; cataloging the data onto punch cards made it possible to search the collection and to expand it.

Digital biological databases followed. The Protein Data Bank (PDB) went into operation in 1971 and now records more than 170,000 macromolecular structures in detail. In 1982, the US National Institutes of Health (NIH) released GenBank, a database documenting DNA and the proteins it encodes.

These resources soon proved their value. In 1983, two independent teams noticed that a particular human growth factor was very similar in sequence to a viral protein that causes cancer in monkeys. The finding revealed one mechanism of viral carcinogenesis: by mimicking a growth factor, a virus can induce uncontrolled cell growth.

Because of this discovery, many biologists who had shown no interest in computers or statistics suddenly realized that sequence comparison could teach them something about cancer. It also inspired researchers more broadly: besides designing experiments to test specific hypotheses, they could mine open databases for connections that no one had thought of.

This power grows dramatically when different databases are linked together. The joint search engine Entrez, for example, lets researchers move freely between DNA, proteins and the literature.
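As a present-day illustration (not part of the original 1980s systems), a minimal sketch of an Entrez query using Biopython; the search term and the email address are placeholder assumptions.

```python
from Bio import Entrez

# NCBI asks clients to identify themselves; this address is a placeholder.
Entrez.email = "your.name@example.org"

# Search the protein database for a term and fetch the matching record IDs.
handle = Entrez.esearch(db="protein",
                        term="GroEL AND Escherichia coli[Organism]",
                        retmax=5)
record = Entrez.read(handle)
handle.close()
print(record["IdList"])

# The same IDs can then be passed to Entrez.efetch to pull full records,
# or followed into the linked nucleotide and literature databases.
```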

The Protein Data Bank holds profiles of more than 170,000 molecular structures, including the bacterial expressome shown here. | David S. Goodsell and RCSB PDB

4. Weather forecast: the general circulation model (1969)

At the end of World War II, the computing pioneer John von Neumann began turning computers that a few years earlier had calculated ballistic trajectories and weapon designs toward the problem of weather prediction. Until then, forecasts were made from experience and intuition; von Neumann's team set out to predict the weather by numerical calculation based on the laws of physics.

Scientists had in fact been familiar with the relevant mathematical equations for years, but early meteorologists could not put them to practical use: the weather is so complex that the calculation was far beyond human computing power. In 1922, the British physicist Lewis Fry Richardson published the first attempt to forecast the weather with a mathematical model. To predict future weather, you feed in the current atmospheric conditions, calculate how they will change over a short interval, and repeat, over and over, an extremely time-consuming process. It took Richardson months to produce a forecast for the next few hours, and the result was so unreliable that the predicted changes could not have occurred "under any known land conditions".

The advent of computers made this mathematical approach genuinely feasible. In the late 1940s, von Neumann formed his weather-forecasting team, and in 1955 a second team, at the Geophysical Fluid Dynamics Laboratory (GFDL), also began to model the climate. These groups went on to produce the first successful atmospheric general circulation model (GCM), and by 1969 they had successfully coupled atmospheric and ocean models.

The GCM of that era was relatively coarse: it covered only one-sixth of the Earth's surface, divided into 500 km grid squares, and split the atmosphere into just nine layers. Today's weather models divide the Earth's surface into 25 × 25 km squares and the atmosphere into dozens of levels. Even so, the early model was a milestone in scientific computing, and it was used to test, for the first time, the effect of rising carbon dioxide levels on the climate.
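Richardson's recipe of "compute the change over a short interval, then repeat" is the heart of all such models. A deliberately toy sketch in Python, assuming a 1-D heat-diffusion "state" rather than a real atmosphere, shows the time-stepping idea (the grid size, time step and diffusivity are arbitrary choices, not values from any real model).

```python
import numpy as np

# Toy numerical forecasting: take the current state, compute its change over
# a short time step, repeat. The "state" here is temperature along a 1-D
# line evolving by diffusion; a real GCM does this for a 3-D atmosphere.
n_cells, dx, dt, kappa = 100, 1.0, 0.1, 1.0   # grid spacing, time step, diffusivity
temperature = np.zeros(n_cells)
temperature[40:60] = 10.0                      # a warm patch in the middle

for step in range(1000):
    # Discrete Laplacian: how much warmer/cooler each cell's neighbours are.
    laplacian = np.roll(temperature, 1) - 2 * temperature + np.roll(temperature, -1)
    # Advance the state by one short interval (forward Euler).
    temperature = temperature + dt * kappa * laplacian / dx**2

print(temperature.max())  # the warm patch has spread out and cooled
```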

5. The basis of scientific computing: BLAS (1979)

Scientific computing often involves relatively simple mathematical operations on vectors and matrices, but before the 1970s there was no generally accepted set of tools for performing them. Programmers doing scientific work therefore spent much of their time writing code for basic mathematics rather than concentrating on the scientific problem itself.

What programming needed was a standard. In 1979 it arrived: the Basic Linear Algebra Subprograms, or BLAS. BLAS reduced matrix and vector computation to basic building blocks such as addition and subtraction. The standard continued to develop until 1990, by which time it defined dozens of fundamental subroutines for vector and matrix mathematics.

BLAS is perhaps the most important interface ever defined for scientific computing. It gives common functions standardized names; BLAS-based code works the same way on any computer; and the existence of a standard lets computer manufacturers optimize their BLAS implementations for fast operation on their own hardware. It is fair to say that BLAS provides the foundation of scientific computing.
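To illustrate the interface, a minimal sketch in Python: SciPy exposes the underlying BLAS routines directly, and NumPy's matrix multiply is typically backed by an optimized BLAS (which one depends on how NumPy was built).

```python
import numpy as np
from scipy.linalg.blas import dgemm  # double-precision general matrix multiply

a = np.random.rand(3, 4)
b = np.random.rand(4, 2)

# Calling the BLAS level-3 routine directly: C = alpha * A @ B
c_blas = dgemm(alpha=1.0, a=a, b=b)

# NumPy's @ operator dispatches to the same kind of optimized BLAS routine.
c_numpy = a @ b
print(np.allclose(c_blas, c_numpy))  # True
```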

Before BLAS was introduced in 1979, researchers working on the Cray-1 supercomputer at Lawrence Livermore National Laboratory had no standard for linear algebra computation. | Science History Images / Alamy

6. Microscope essential: NIH Image (1987)

In the early 1980s, the brain-imaging laboratory of the National Institutes of Health (NIH) had a scanner that could digitize X-ray films but no way to display or analyze the images on a computer. So Wayne Rasband, the programmer working there, wrote a program to do exactly that.

The program was originally written for a $150,000 PDP-11 computer. Then, in 1987, Apple released the Macintosh II, and Rasband ported the software to this new, user-friendly platform, creating the image-analysis system NIH Image.

NIH Image's successors include ImageJ and Fiji, which let researchers view and analyze images on any computer. They have become basic tools for biologists: anyone who has used a microscope is familiar with them.

ImageJ offers a seemingly simple, minimalist user interface that has barely changed since the 1990s. Yet the tool is almost infinitely extensible: it is compatible with a wide range of file formats, has a flexible plug-in architecture, and includes a macro recorder that can save workflows by recording mouse actions. People have designed a great variety of plug-ins, some that automatically identify cells, others that track moving targets, and users can easily tailor ImageJ to their own needs.
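ImageJ itself is written in Java; as a rough illustration of what a cell-identification plug-in does, here is a minimal sketch in Python using scikit-image (the filename, size threshold and use of Otsu thresholding are assumptions for the example, not ImageJ's own method).

```python
import numpy as np
from skimage import io, filters, measure, morphology

# Load a grayscale microscope image (placeholder path).
image = io.imread("nuclei.tif", as_gray=True)

# Threshold to separate stained nuclei from background (Otsu's method),
# then remove small specks of noise.
mask = image > filters.threshold_otsu(image)
mask = morphology.remove_small_objects(mask, min_size=50)

# Label connected regions and report how many nuclei were found and their sizes.
labels = measure.label(mask)
regions = measure.regionprops(labels)
print(f"{len(regions)} nuclei, mean area = {np.mean([r.area for r in regions]):.1f} px")
```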

With the help of a plug-in, ImageJ can automatically recognize the nuclei in a microscope image. | Ignacio Arganda-Carreras / ImageJ

7. Sequence search: BLAST (1990)

When it comes to search, we "Google" things; in genetics, scientists "BLAST" a molecular sequence. A software name turning into a verb is probably the best indicator of how universally it is used. (Editor's note: on the verbing or adjectivizing of a person's name, see section 5 of "Emmett: eight ways of the talented mathematician with a rough journey".)

The changes wrought by evolution, such as substitutions, deletions and rearrangements, are recorded in molecular sequences. By searching for similarities between sequences, especially the amino acid sequences of proteins, researchers can uncover evolutionary relationships and gain insight into the function of genes. The crux of the problem is doing this quickly and comprehensively across rapidly expanding databases of molecular information.

The bioinformatics pioneer Margaret Dayhoff (who built the prototype biological database described above) made a key contribution in 1978. She designed the PAM matrix, in which each entry gives the probability of one amino acid being replaced by another. This lets researchers score the relatedness of two proteins not only by how similar their sequences are, but also by the evolutionary distance between them.

In 1985, the FASTP algorithm went further, combining the PAM matrix with fast search. A few years later the more powerful BLAST was developed, and it was released in 1990.

BLAST can not only search rapidly growing databases quickly, but also find matches that are more distant in evolutionary terms and calculate how likely those matches are to have arisen by chance. It is fast and easy to use. For genomic biology, then in its infancy, BLAST was a transformative tool that let scientists work out what role an unknown gene might play from the functions of related genes.
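To illustrate the idea of substitution-matrix scoring (not the BLAST algorithm itself), here is a minimal sketch in Python with a tiny, made-up scoring table; real tools use full 20 × 20 matrices such as PAM or BLOSUM and also handle gaps.

```python
# A toy substitution-scoring example; the values below are invented for
# illustration, not taken from a real PAM or BLOSUM matrix.
SCORES = {
    ("L", "L"): 4, ("L", "I"): 2, ("I", "I"): 4,   # similar hydrophobic residues
    ("K", "K"): 5, ("K", "R"): 3, ("R", "R"): 5,   # similar basic residues
    ("G", "G"): 6,
}

def pair_score(a, b, mismatch=-2):
    """Look up the substitution score for a pair of amino acids (symmetric)."""
    return SCORES.get((a, b), SCORES.get((b, a), mismatch))

def ungapped_score(seq1, seq2):
    """Score two equal-length sequences position by position (no gaps)."""
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))

# Two short peptide fragments that differ only by conservative substitutions
# score much higher than an unrelated pair.
print(ungapped_score("GLKIR", "GIKLR"))
print(ungapped_score("GLKIR", "GGGGG"))
```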

8. Preprint platform: arXiv.org (1991)

In the late 1980s, high-energy physicists usually mailed copies of their submitted manuscripts to peers for comment, but only to a small circle. Scientists lower down the food chain had to rely on the generosity of the big names, and many equally ambitious researchers were shut out of that circle simply because they were not at top institutions.

In 1991, Paul Ginsparg, a physicist then working at Los Alamos National Laboratory, wrote an email autoresponder in an attempt to level the playing field. Subscribers received a daily list of preprints, each associated with an article identifier. With a single email, users around the world could submit or retrieve an article from the laboratory's computer system, get a list of new articles, and search by author or title.

Ginsparg's plan was to hold articles for three months and to limit the service to high-energy physics. But a colleague persuaded him to keep the articles indefinitely, and at that moment it turned from a bulletin board into an archive. Papers poured in, from fields well beyond high-energy physics. Ginsparg moved the system to the World Wide Web in 1993 and, in 1998, gave it its current name: arXiv.org.

Thirty years on, arXiv holds about 1.8 million preprints, all available free of charge, and attracts more than 15,000 submissions and some 30 million downloads per month. It gives researchers a fast, convenient way to publicize their work, avoiding the time and trouble demanded by traditional peer-reviewed journals.
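The original email interface is long gone, but as a present-day illustration, arXiv exposes a public query API. A minimal sketch in Python follows; the search term is an arbitrary example.

```python
import urllib.parse
import urllib.request

# Query arXiv's public API for preprints matching a search term.
params = urllib.parse.urlencode({
    "search_query": 'all:"fast Fourier transform"',
    "start": 0,
    "max_results": 3,
})
url = "http://export.arxiv.org/api/query?" + params

with urllib.request.urlopen(url) as response:
    feed = response.read().decode("utf-8")

# The response is an Atom XML feed; each <entry> holds one preprint's
# identifier, title, authors and abstract.
print(feed[:500])
```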

From 1991 to 2021, the number of preprints submitted to arXiv each month kept growing. | arXiv.org

The success of arXiv has spawned a boom in preprint sites serving other disciplines, including biology, medicine and sociology. Its impact can be seen today in the tens of thousands of preprints published on the virus that causes COVID-19. (Editor's note: see "Is it reliable to preprint this paper?") An approach that was regarded as heresy outside the world of particle physics 30 years ago has long since come to be taken for granted.

9. Data browser: IPython Notebook (2011)

Python is an interpreted language: a program runs the code line by line. Programmers can work in an interactive tool called a read-evaluate-print loop (REPL), typing code that an interpreter executes immediately. A REPL allows rapid exploration and iteration, but Python's standard REPL was not well suited to scientific computing; for example, it did not let users easily preload code modules or open data visualizations.
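As an illustration of the idea, here is a deliberately minimal read-evaluate-print loop in a few lines of Python; real interpreters such as IPython do far more, and the standard library offers a fuller version in code.InteractiveConsole.

```python
# A toy read-evaluate-print loop: read a line, evaluate it, print the result.
namespace = {}

while True:
    try:
        line = input(">>> ")                 # read
    except EOFError:
        break
    if line.strip() in ("quit", "exit"):
        break
    try:
        result = eval(line, namespace)       # evaluate (expressions only, for brevity)
        if result is not None:
            print(result)                    # print
    except Exception as err:
        print(f"error: {err}")
```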

So in 2001, Fernando Pérez, then a graduate student, wrote his own version: IPython, an interactive Python interpreter of 259 lines of code. Ten years later, IPython was brought to the browser as the IPython Notebook, which set off a revolution in data science.

The IPython Notebook really is like a notebook, combining code, results, images and text in a single document. Unlike other similar projects, it was open source and welcomed contributions from any developer, and it supported Python, a programming language popular with scientists. In 2014, IPython evolved into Jupyter, which supports about 100 languages and lets users explore data on remote supercomputers as easily as on their own machines.

For data scientists, Jupyter has effectively become a standard. In 2018 there were 2.5 million Jupyter notebooks on the code-sharing platform GitHub; today there are nearly 10 million, including the code behind the 2016 discovery of gravitational waves and the 2019 black-hole image.

10. Fast learner: AlexNet (2012)

There are two types of artificial intelligence (AI): one uses written rules; the other has computers "learn" by simulating the neural structure of the brain. For a long time, AI researchers considered the latter approach unworkable. But in 2012, Alex Krizhevsky and Ilya Sutskever, two graduate students of the computer scientist Geoffrey Hinton, proved that it was not.

They designed AlexNet, a deep-learning neural network, to compete in the 2012 ImageNet Large Scale Visual Recognition Challenge. The researchers trained it on a database of 1 million images of everyday objects, then tested the resulting algorithm on a separate, independent set of images and measured how often it classified them correctly. At the time, the best algorithms misclassified about one in four images; AlexNet nearly halved the error rate, to about 16%.
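For orientation, a minimal sketch using PyTorch's torchvision, which ships an AlexNet implementation. This only builds the network and runs a forward pass on a random image-sized tensor; it does not reproduce the 2012 training run.

```python
import torch
from torchvision import models

# Build the AlexNet architecture (randomly initialized here; recent torchvision
# versions accept a `weights` argument to load pretrained ImageNet weights).
model = models.alexnet(weights=None)
model.eval()

# One fake RGB image at ImageNet's usual 224 x 224 input size.
batch = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    logits = model(batch)            # scores for 1,000 ImageNet classes

print(logits.shape)                  # torch.Size([1, 1000])
print(logits.argmax(dim=1))          # index of the highest-scoring class
```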

AlexNet's success in 2012 rested on three factors: a large enough training dataset, excellent programming, and the power of GPUs, processors originally designed to speed up computer graphics that let the researchers run their algorithm about 30 times faster. But that was not the whole story: the real algorithmic breakthrough had come three years earlier, when Hinton's lab built a neural network that recognized speech more accurately than conventional AI systems refined over decades. The improvement was modest, but it marked a genuine technological breakthrough.

These achievements heralded the rise of deep learning across many fields. Today, phones that understand voice queries and image-analysis tools that pick out cells in micrographs in biology labs all rely on deep-learning algorithms. That is why AlexNet counts among the tools that have changed science and the world.

References:

[1] https://www.nature.com/articles/d41586-021-00075-2

[2] https://www.britannica.com/biography/Lewis-Fry-Richardson

[3] https://en.wikipedia.org/wiki/General_circulation_model
