Source: Shulou (Shulou.com), SLTechnology News&Howtos. Updated 2025-04-04.
Preface
Some time ago, I saw on a third-party platform that I had written more than 100,000 words. That is hard to imagine, given that in high school I had to use line breaks skillfully just to fill an 800-word composition (those who know have surely done the same).
In this line of work, I have developed the habit of verifying for myself anything that can be verified.
So last Friday, in the spare time of an all-night overtime session, I wrote a tool:
https://github.com/crossoverJie/NOWS
Built with Spring Boot, it needs only a single command to count how many words you have written:
java -jar nows-0.0.1-SNAPSHOT.jar /xx/Hexo/source/_posts
Pass in the article directory to scan and it outputs the result (currently only Markdown files ending in .md are supported).
Of course, the result was a pleasant one (more than 400,000 words). My early blog posts included a lot of pasted code, and some English words were not filtered out, so the figure differs considerably from the platform's.
Counting only Chinese characters would give an accurate figure, and the tool has a built-in extension point so that users can customize their counting strategy, as shown below.
In fact, this tool is quite simple, with little code, and there is not much in it worth discussing. But thinking back, whether in interviews or in conversations with readers online, I have noticed a common pattern:
Most novice developers have read about multithreading but have little hands-on practice with it. Some do not even know what multithreading is used for in real development.
So, based on this simple tool, I would like to offer these readers a practical, easy-to-understand multithreading case.
At the very least, it should let you know:
Why do you need multithreading?
How to implement a multithreaded program?
What are the problems and solutions caused by multithreading?
Single-threaded counting
Before we talk about multithreading, let's look at how to implement the tool with a single thread.
The requirement is simple: scan a directory and read all the files under it.
The implementation has the following steps:
Read all files in a directory.
Keep the paths of all files in memory.
Go through all the files and count the words in each one.
Let's look at how the first two steps are implemented. When the scan reaches a subdirectory, it has to continue reading the files inside that directory.
Such a scenario is a natural fit for recursion:
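// Needs java.io.File and java.util.List; allFile is a List<String> field of the enclosing class.
public List<String> getAllFile(String path) {
    File f = new File(path);
    File[] files = f.listFiles();
    if (files == null) {
        // listFiles() returns null when path is not a readable directory
        return allFile;
    }
    for (File file : files) {
        if (file.isDirectory()) {
            // Recurse into subdirectories
            String directoryPath = file.getPath();
            getAllFile(directoryPath);
        } else {
            String filePath = file.getPath();
            if (!filePath.endsWith(".md")) {
                continue;
            }
            allFile.add(filePath);
        }
    }
    return allFile;
}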
After reading, the file paths are kept in a collection.
Note that the recursion depth needs to be controlled to avoid a stack overflow (StackOverflowError).
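One way to keep the depth under control is to let the JDK do the traversal and pass it an explicit limit. The following is only a minimal sketch and not the NOWS implementation: the class name MarkdownScanner and the MAX_DEPTH value are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of NOWS.
class MarkdownScanner {

    // Assumed limit on how many directory levels the scan descends.
    static final int MAX_DEPTH = 32;

    /** Returns the paths of all .md files under root, at most MAX_DEPTH levels deep. */
    static List<String> findMarkdownFiles(String root) throws IOException {
        // Files.walk traverses lazily; try-with-resources closes the directory handles.
        try (Stream<Path> paths = Files.walk(Paths.get(root), MAX_DEPTH)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(Path::toString)
                    .filter(p -> p.endsWith(".md"))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        String root = args.length > 0 ? args[0] : ".";
        findMarkdownFiles(root).forEach(System.out::println);
    }
}
```

Compared with the hand-written recursion, the depth limit is a single argument and no shared list has to be filled as a side effect.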
Finally, the file contents are read with the Stream API from Java 8, which keeps the code concise:
Stream<String> stringStream = Files.lines(Paths.get(path), StandardCharsets.UTF_8);
List<String> collect = stringStream.collect(Collectors.toList());
The next step is to count the words and filter out some special text (for example, I want to filter out all spaces, line breaks, hyperlinks and so on).
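Reading and counting a single file could look roughly like this. It is a sketch under assumptions: SimpleWordCounter, countLine and countFile are hypothetical names, only whitespace is filtered, and every remaining character counts as one "word", which matches Chinese text better than English.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of NOWS.
class SimpleWordCounter {

    /** Counts the characters left in one line after removing all whitespace. */
    static long countLine(String line) {
        String stripped = line.replaceAll("\\s+", "");
        // codePointCount treats a surrogate pair as a single character.
        return stripped.codePointCount(0, stripped.length());
    }

    /** Sums countLine over every line of a UTF-8 file. */
    static long countFile(String path) throws IOException {
        // Files.lines keeps the file open, so it is closed with try-with-resources.
        try (Stream<String> lines = Files.lines(Paths.get(path), StandardCharsets.UTF_8)) {
            return lines.mapToLong(SimpleWordCounter::countLine).sum();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(countFile(args[0]));
        }
    }
}
```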
Extensibility
The simplest approach is to iterate over collect from the code above and replace the content to be filtered with empty strings.
But everyone's needs may differ. For example, I only want to filter out spaces, line breaks and hyperlinks, while others may need to remove all English words or even keep line breaks (just like when writing a composition).
All this calls for a more flexible approach.
If you have read the earlier article "Designing an interceptor using the chain of responsibility pattern", it should be easy to see that the chain of responsibility pattern fits this scenario.
The details of the chain of responsibility pattern will not be repeated here. If you are interested, please refer to that article.
Let's look directly at the implementation:
Define the abstract interface and processing method of the chain of responsibility:
public interface FilterProcess {
    /**
     * Process the text.
     * @param msg the text to process
     * @return the processed text
     */
    String process(String msg);
}
Implementation of handling spaces and line feeds:
public class WrapFilterProcess implements FilterProcess {
    @Override
    public String process(String msg) {
        // Remove all whitespace, including spaces and line breaks
        msg = msg.replaceAll("\\s*", "");
        return msg;
    }
}
Implementation for handling hyperlinks:
public class HttpFilterProcess implements FilterProcess {
    @Override
    public String process(String msg) {
        // Remove everything from the scheme up to the next whitespace
        msg = msg.replaceAll("((https|http|ftp|rtsp|mms)?:\\/\\/)[^\\s]+", "");
        return msg;
    }
}
With these in place, you add the handlers to the chain of responsibility during initialization and provide an API for the client to call.
That completes a simple word count tool.
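The initialization step described above might be sketched as follows. This is an illustration only: FilterChain, addProcess and defaultChain are hypothetical names, the FilterProcess interface is repeated so the block compiles on its own, and the hyperlink pattern is simplified compared with the one shown earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a handler chain, not the NOWS FilterProcessManager.
class FilterChainSketch {

    /** Same shape as the FilterProcess interface in the article. */
    interface FilterProcess {
        String process(String msg);
    }

    /** Holds the handlers and runs them in registration order. */
    static class FilterChain {
        private final List<FilterProcess> processes = new ArrayList<>();

        FilterChain addProcess(FilterProcess process) {
            processes.add(process);
            return this; // allows chained registration
        }

        /** Passes the text through every handler and returns the remaining length. */
        int process(String msg) {
            for (FilterProcess p : processes) {
                msg = p.process(msg);
            }
            return msg.length();
        }
    }

    /** Builds a chain that strips hyperlinks first, then whitespace. */
    static FilterChain defaultChain() {
        return new FilterChain()
                .addProcess(msg -> msg.replaceAll("(https?|ftp)://[^\\s]+", ""))
                .addProcess(msg -> msg.replaceAll("\\s+", ""));
    }

    public static void main(String[] args) {
        System.out.println(defaultChain().process("see https://example.com now"));
    }
}
```

The order of registration matters in this sketch: if whitespace were removed first, the text following a link would be glued to it and swallowed by the hyperlink pattern.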
Multithreaded mode
With only a few dozen blog posts on my machine, one run is very fast. But what if there were tens of thousands, hundreds of thousands or even millions of files?
The function would still work, but the running time would clearly grow many times over.
This is where multithreading shows its advantage: multiple threads each read their share of the files, and the results are summarized at the end.
The process then becomes:
Read all files in a directory.
Hand the file paths to different threads to process.
Summarize the final results.
Problems caused by multithreading
Using multithreading does not mean everything is fine. Let's look at the first problem: shared resources.
Put simply, how do we make sure the multithreaded total matches the single-threaded total?
In my local environment, the single-threaded run gives:
Total: 414142 words.
Next, switch to multithreading:
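List<String> allFile = scannerFile.getAllFile(strings[0]);
logger.info("allFile size=[{}]", allFile.size());
for (String msg : allFile) {
    // One task per file, submitted to the thread pool
    executorService.execute(new ScanNumTask(msg, filterProcessManager));
}
// ScanNumTask.java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class ScanNumTask implements Runnable {
    private static Logger logger = LoggerFactory.getLogger(ScanNumTask.class);
    private String path;
    private FilterProcessManager filterProcessManager;
    public ScanNumTask(String path, FilterProcessManager filterProcessManager) {
        this.path = path;
        this.filterProcessManager = filterProcessManager;
    }
    @Override
    public void run() {
        // try-with-resources closes the file; reading inside the try avoids a
        // NullPointerException when the file cannot be opened
        try (Stream<String> stringStream = Files.lines(Paths.get(path), StandardCharsets.UTF_8)) {
            List<String> collect = stringStream.collect(Collectors.toList());
            for (String msg : collect) {
                filterProcessManager.process(msg);
            }
        } catch (Exception e) {
            logger.error("IOException", e);
        }
    }
}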
A thread pool is used to manage the threads. For more about thread pools, see "How to use and understand thread pools gracefully".
Execution result:
No matter how many times it is executed, the value is always less than the expected value.
Let's look at how the total is accumulated.
import org.springframework.stereotype.Component;
@Component
public class TotalWords {
    private long sum = 0;
    public void sum(int count) {
        sum += count;
    }
    public long total() {
        return sum;
    }
}
As you can see, it is just accumulation on a primitive type. So why is the value smaller than expected?
I think most people would say that when multiple threads run, some threads overwrite the values written by others.
But that is only the symptom; it does not explain the root cause.
Memory visibility
The core reason lies in the Java memory model (JMM).
Here is the explanation from my earlier article "The volatile keyword you should know":
In the Java memory model (JMM), all variables are stored in main memory, and each thread has its own working memory (cache).
When a thread works, it copies the data from main memory into its working memory. Every operation on the data is then performed in working memory (which is more efficient); a thread cannot directly operate on main memory or on another thread's working memory, and it flushes the updated data back to main memory afterwards.
As a rough picture, main memory can be thought of as heap memory and working memory as storage private to each thread, although working memory is an abstraction over CPU registers and caches rather than literally the stack.
As shown in the following figure:
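So when running concurrently, thread B may read the value from before thread A's update. And because sum += count is a read, an add and a write rather than one atomic step, thread B then writes back a total that is missing thread A's contribution.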
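The visibility half of this can be illustrated with a stop flag. The sketch below is an assumption-laden illustration (VolatileFlagSketch and stopWorker are hypothetical names): the field is declared volatile, which guarantees that the worker sees the write made by the main thread.

```java
// Hypothetical demo of visibility with a volatile flag.
class VolatileFlagSketch {

    // volatile: a write by one thread is visible to later reads by other threads.
    private static volatile boolean running = true;

    /** Starts a worker that spins until the flag is cleared; returns true if it stopped. */
    static boolean stopWorker() throws InterruptedException {
        running = true;
        Thread worker = new Thread(() -> {
            while (running) {
                Thread.yield(); // busy-wait until another thread clears the flag
            }
        });
        worker.setDaemon(true); // never keeps the JVM alive
        worker.start();
        Thread.sleep(50);       // let the worker enter its loop
        running = false;        // visible to the worker because the field is volatile
        worker.join(2000);      // wait up to 2 seconds for the worker to exit
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stopped: " + stopWorker());
    }
}
```

Note that volatile only addresses visibility. It would not make sum += count safe, because that statement is still a separate read and write.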
I will not go further into this here; interested readers can look through my previous blog posts.
Let's go straight to the solution. The JDK has already thought of these problems for us.
The java.util.concurrent package contains many concurrency tools you may find useful.
AtomicLong fits this case well, since it can modify the data atomically.
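To see the difference between a plain long and an AtomicLong side by side, a small experiment like the following can help. It is a sketch with hypothetical names (CounterRaceSketch, run) and an assumed pool size of 4, and it waits for the pool to terminate before reading the totals.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical experiment comparing an unsynchronized counter with AtomicLong.
class CounterRaceSketch {

    private static long plainSum = 0; // unsynchronized counter
    private static final AtomicLong atomicSum = new AtomicLong();

    /** Runs `tasks` tasks, each adding 1 to both counters `perTask` times. */
    static long[] run(int tasks, int perTask) throws InterruptedException {
        plainSum = 0;
        atomicSum.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                for (int j = 0; j < perTask; j++) {
                    plainSum += 1;          // read-modify-write, updates can be lost
                    atomicSum.addAndGet(1); // atomic increment
                }
            });
        }
        pool.shutdown();
        // Wait for the tasks so that the totals are final before reading them.
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return new long[] {plainSum, atomicSum.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = run(8, 100_000);
        System.out.println("plain=" + r[0] + ", atomic=" + r[1] + ", expected=" + 8 * 100_000);
    }
}
```

The plain counter usually ends up below the expected value, though the exact number differs from run to run; the atomic counter always matches it.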
Take a look at the modified implementation:
import java.util.concurrent.atomic.AtomicLong;
import org.springframework.stereotype.Component;
@Component
public class TotalWords {
    private AtomicLong sum = new AtomicLong();
    public void sum(int count) {
        sum.addAndGet(count);
    }
    public long total() {
        return sum.get();
    }
}
It uses just two of AtomicLong's API methods. Run the program again, however, and you will find that the result is still wrong.
It is even zero.
Inter-thread communication
So a new problem has appeared. Let's look at how the total is obtained:
List<String> allFile = scannerFile.getAllFile(strings[0]);
logger.info("allFile size=[{}]", allFile.size());
for (String msg : allFile) {
    executorService.execute(new ScanNumTask(msg, filterProcessManager));
}
executorService.shutdown();
long total = totalWords.total();
long end = System.currentTimeMillis();
logger.info("total sum=[{}],[{}] ms", total, end - start);
You may or may not have spotted the problem: when the total is finally printed, there is no way to know whether the other threads have finished.
Because executorService.execute() returns immediately, none of the threads has finished when the total is fetched and printed, which produces this result.
I have written about inter-thread communication before: "In-depth understanding of thread communication".
There are several general ways to do it.
Here we use the thread pool approach:
After shutting down the thread pool, add a loop that waits for it to terminate:
executorService.shutdown();
// Poll every 100 ms until all submitted tasks have finished
while (!executorService.awaitTermination(100, TimeUnit.MILLISECONDS)) {
    logger.info("worker running");
}
long total = totalWords.total();
long end = System.currentTimeMillis();
logger.info("total sum=[{}],[{}] ms", total, end - start);
Running it again, the result is correct no matter how many times we try.
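Polling awaitTermination is one option. Another, shown here purely as an alternative sketch and not as what NOWS does, is to have each task return its own count and let invokeAll block until all of them are done. FutureSumSketch and countAll are hypothetical names, and plain strings stand in for file contents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical alternative: tasks return results instead of updating a shared counter.
class FutureSumSketch {

    /** Counts the non-whitespace characters of each text on a pool and sums the results. */
    static long countAll(List<String> texts, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (String text : texts) {
                // Each task returns its own count; nothing is shared between threads.
                tasks.add(() -> (long) text.replaceAll("\\s+", "").length());
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countAll(Arrays.asList("a b", "cd e", "f"), 4));
    }
}
```

With this shape the summing happens on one thread only, so neither an atomic counter nor a waiting loop is needed.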
Efficiency improvement
Some readers may point out that this approach does not seem to improve efficiency much.
That is because there are few local files and each file takes very little time to process.
Opening too many threads can even cause frequent context switching and reduce efficiency.
To make the efficiency gain visible, I let the current thread sleep for 100 milliseconds for each file processed, simulating a longer processing time.
Let's see how long a single thread takes.
Total time: [8404] ms
And here is the time taken with a thread pool of size 4:
Total time: [2350] ms
As you can see, the improvement in efficiency is obvious.
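The simulation can be approximated with a few lines. This is a sketch under assumptions: SleepTimingSketch is a hypothetical name, the 20 simulated files are an arbitrary choice, and the single-threaded run is modelled as a pool of size 1. Timings will differ by machine, so no figures are claimed here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical timing experiment; sleeping stands in for real file processing.
class SleepTimingSketch {

    /** Simulates processing one file by sleeping, then records it as done. */
    private static void processOne(long sleepMillis, AtomicLong done) {
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        done.incrementAndGet();
    }

    /** Returns {files processed, elapsed ms} for `files` simulated files on `threads` threads. */
    static long[] run(int files, int threads, long sleepMillis) throws InterruptedException {
        AtomicLong done = new AtomicLong();
        long start = System.currentTimeMillis();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < files; i++) {
            pool.execute(() -> processOne(sleepMillis, done));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return new long[] {done.get(), System.currentTimeMillis() - start};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] single = run(20, 1, 100);
        long[] pooled = run(20, 4, 100);
        System.out.println("1 thread: " + single[1] + " ms, 4 threads: " + pooled[1] + " ms");
    }
}
```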
Further thinking
This is just one use of multithreading. I believe readers who have made it this far now understand it better.
Here is a post-reading exercise with a similar scenario:
Tens of millions of mobile phone numbers are stored in Redis or another storage medium. Each number is unique, and all of them need to be traversed as quickly as possible.
Readers with ideas are welcome to leave a comment at the end of the article and join the discussion.
Summary
I hope readers who have finished the article now have their own answers to the questions raised at the beginning:
Why do you need multithreading?
How to implement a multithreaded program?
What are the problems and solutions caused by multithreading?