Updated 2025-02-23, from SLTechnology News&Howtos
Shulou(Shulou.com)11/24 Report--
This article comes from the WeChat official account Programming Technology Universe (ID: xuanyuancoding), author: Xuanyuan Zhifeng O
Long time no see. My name is Ah Q, and I'm an employee of CPU Workshop No.1. Our CPU has 8 workshops, that is, 8 cores. Each core can execute two threads at the same time, so that makes 8 cores and 16 threads, and we are very fast.
In Workshop No.1 where I work, besides me, who is responsible for executing instructions, there are also Little A, who is responsible for reading instructions, Little Fatty, who is responsible for decoding instructions, and Old K, who is responsible for writing back results. We perform our respective duties and complete the work of executing programs together.
A simple loop
That day, we came across this piece of code:

    void array_add(int data[], int len) {
        for (int i = 0; i < len; i++) {
            data[i] += 1;
        }
    }

It took hundreds of iterations to execute this code, and each iteration was simple and repetitive, which exhausted me.
Old K, who was in charge of writing back the results, was also worn out and sweating. He complained: "Every time, it's take one number out, add 1, and write it back. If we could take several numbers at a time and process them in a batch, that would be great."
Old K's words made my eyes light up. Right, can I operate in batches?
As I thought about it, I kept on working.
The busy day ended quickly, and it was evening again. After the computer was turned off, I gathered everyone together.
"Brothers, remember that loop we encountered during the day?"
"Which loop do you mean? We execute a lot of loops every day," said Little A.
"The loop that adds one to every element of an array of integers."
"I remember. What about it? Is there a problem?"
I looked at Old K and said, "I was thinking about what Old K said today. In a loop like this, I take out a number, add 1, and write it back, one number at a time. That is too inefficient. If we upgrade ourselves so that we can take out several numbers at once and add 1 to them in a batch, wouldn't that be much faster?"
Old K was intrigued. "That's great. What are you going to do?"
"I haven't figured that out yet. Any suggestions?"
Little Fatty, who is responsible for decoding instructions, said: "You could add a new instruction that fetches several numbers at a time and adds 1 to each of them."
"No, no, no. You can't hard-code it like that. Today it's plus 1. What if it's plus 2 next time? The instruction can't be limited to adding 1."
"What if a different value is added to each data item?"
"Now that you mention it, what if it's not addition at all, but subtraction or multiplication?"
"Oh, and..."
Everyone joined the discussion. None of us had expected that one small addition loop would suddenly raise so many questions.
Parallel computing
As the discussion deepened, I felt that it had gone beyond what our Workshop No.1 could decide on its own. It needed to be reported to the leadership, who could bring representatives of all eight workshops together to discuss it.
As soon as the leaders heard that there were new technologies to improve performance, they immediately became interested and soon organized a meeting to discuss the plan.
"Everyone is here, right? Ah Q, tell everyone the purpose of this meeting," the leader said.
I stood up and began to tell everyone about our problems and ideas.
"It's like this. The other day we encountered a loop in Workshop No.1. The loop body is very simple: it adds 1 to each element of an array. To do this, we take out each element, add 1 to it, and write it back. We feel that adding 1 to the elements one by one is too slow. If we could take out several at a time and add 1 to them in parallel, it would surely be faster than handling them one by one."
As soon as I finished, everyone started whispering.
"I see, this is parallel computing!" Xiao Hu from Workshop No.2 went straight to the key point.
"Ah Q, do you have a plan?" asked Xiao Liu from Workshop No.6.
"Not yet. That's the purpose of today's meeting, because the situation is a little complicated and we need to come up with ideas together."
"Doesn't seem complicated."
"The example I just gave is only a simple case. The value being added in parallel may not be a fixed number. It may be one array being added to another array. It may not be integer addition but floating-point addition. It may not be addition at all but subtraction or multiplication, or not even arithmetic but logical operations."
As soon as I finished speaking, everyone started whispering again.
"I was thinking about this series of things you said. Are we going to add a set of instructions specifically for parallel computing?" Xiao Hu said.
"This is a big project."
"Yeah..."
At this point, Xiao Liu asked: "When we calculate, we read the data into a register, but a register can only hold one number at a time. How can we read several numbers at once?"
"We may need to add some registers with larger capacity, for example 128 bits wide, which can hold four 32-bit integers at the same time."
"Is that necessary? We're a general-purpose CPU, not a chip that specializes in mathematical calculations. Why are we doing this?" The representative of Workshop No.4 raised doubts.
Not to be outdone, I replied: "It is absolutely necessary. In image, video, and audio processing, there is a great deal of this kind of computation. We have to improve our ability to process this data."
Seeing that we were arguing, the leader slapped the table and the meeting went quiet.
"I think Ah Q is right. We really do need to improve our ability to process this kind of data. However, there is no need to make it so complicated. Just support parallel integer operations first. Don't worry about adding registers either. You can borrow the registers of the floating-point unit (FPU) for now. Let's settle on this first, and you can continue to discuss the specific plan later." Then he left the conference room.
The leader is indeed a leader. He sorted everything out for us in a few words.
SIMD
After another intense discussion, we finally settled on a plan.
We borrowed the registers of the floating-point unit and gave them new names: MM0-MM7. Because each one is a 64-bit register, it can hold two 32-bit integers, four 16-bit integers, or eight 8-bit integers at the same time.
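To make the idea of several small integers sharing one 64-bit register concrete, here is a minimal portable C sketch. It does not use MMX itself. Instead it emulates a lane-wise addition of eight 8-bit integers inside an ordinary `uint64_t`, roughly what a packed byte add does in hardware. The names `lane_u8` and `packed_add_u8` are hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return 8-bit lane i (0 = lowest byte) of a 64-bit word. */
uint8_t lane_u8(uint64_t v, int i) {
    assert(i >= 0 && i < 8);
    return (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: add eight 8-bit lanes at once.
   Each lane wraps modulo 256 and no carry leaks into the neighbouring lane. */
uint64_t packed_add_u8(uint64_t a, uint64_t b) {
    const uint64_t HIGH = 0x8080808080808080ULL;  /* top bit of every lane */
    /* Add the low 7 bits of each lane; the carry stays inside the lane. */
    uint64_t low = (a & ~HIGH) + (b & ~HIGH);
    /* Fold in the top bits with XOR, discarding the carry out of each lane. */
    return low ^ ((a ^ b) & HIGH);
}
```

For example, adding a word whose every lane is 1 to a word whose lowest lane is 255 leaves that lane at 0 without disturbing the lane next to it, which is the key difference from an ordinary 64-bit addition.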
We also added a new instruction set called MMX to perform integer operations in parallel.
We call this technique of processing multiple data simultaneously in one instruction Single Instruction Multiple Data (SIMD).
With this instruction set, we can handle this kind of integer arithmetic problem much faster.
However, two troublesome problems gradually emerged:
The first problem: because MMX borrows the FPU registers, the FPU cannot be used while SIMD instructions are executing, and vice versa. Using both at the same time causes trouble, so frequently switching between the two modes is a nuisance.
The second and more important problem: our instruction set can only handle parallel integer operations, but parallel floating-point operations are becoming more and more common, especially in image and video processing and in data processing for deep learning. MMX is no help there.
We reported these problems to the leaders, and seeing the achievements we had made, the leaders finally agreed to continue upgrading.
This time, we added a new instruction set called SSE, along with eight 128-bit registers, XMM0-XMM7, so we no longer needed to share registers with the FPU. The bit width was also doubled, so each register could hold more data and more data could be processed at the same time.
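As a rough illustration of what the 128-bit registers make possible, the sketch below adds one float array to another four elements at a time using SSE intrinsics from `<xmmintrin.h>`. It assumes an x86 compiler with SSE enabled (the default on x86-64). The function name `array_add_f32` is made up for this example and is not part of the story.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical example: dst[i] = a[i] + b[i] for every i in [0, n). */
void array_add_f32(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    /* Main loop: one 128-bit register holds four 32-bit floats. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb)); /* 4 additions in one instruction */
    }
    /* Tail: handle the remaining 0 to 3 elements one at a time. */
    for (; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

The unaligned load and store variants are used so that the caller does not have to guarantee 16-byte alignment. The scalar tail loop keeps the function correct when the length is not a multiple of four.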
Later, we kept modifying and upgrading. Beyond supporting parallel processing of floating-point numbers, we introduced the new-generation AVX instruction set, which widened the registers once again to 256 bits. Now our SIMD technology is more advanced, and our data processing is more and more powerful!
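To close the loop on the `array_add` function from the start of the story, here is one way it might look with SIMD. This is a sketch under stated assumptions, not the code a compiler would necessarily generate: it uses SSE2 intrinsics from `<emmintrin.h>` (integer operations on XMM registers arrived with SSE2, a later extension of SSE), it assumes `int` is 32 bits, and the name `array_add_simd` is invented here.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_add_epi32, ... */

/* Hypothetical SIMD variant of array_add: adds 1 to every element.
   Assumes int is 32 bits wide, so four ints fit in one 128-bit register. */
void array_add_simd(int data[], int len) {
    const __m128i ones = _mm_set1_epi32(1);  /* four lanes, each holding 1 */
    int i = 0;
    /* Main loop: load 4 ints, add 1 to all of them at once, write them back. */
    for (; i + 4 <= len; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)&data[i]);
        v = _mm_add_epi32(v, ones);
        _mm_storeu_si128((__m128i *)&data[i], v);
    }
    /* Tail: the last 0 to 3 elements are handled the old way, one by one. */
    for (; i < len; i++) {
        data[i] += 1;
    }
}
```

With 256-bit AVX2 registers the same pattern would process eight integers per step (for instance with `_mm256_add_epi32`), but that requires extra compiler flags and CPU support, so the sketch stays with 128-bit SSE2.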