2025-01-16 Update From: SLTechnology News&Howtos
This article shows how to do parallel computation over large text files in Java. The editor finds the approach practical and shares it here for reference.
If the goal is simply to read text faster, BufferedReader is a good choice. The fastest method is MappedByteBuffer, but its advantage over BufferedReader is modest. In other words, it is faster, but only by a limited margin, so do not expect a several-fold performance improvement.
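As a rough illustration of the two reading styles, the following sketch counts lines with a BufferedReader and counts newline bytes through a MappedByteBuffer. The class name ReadComparison, the method names and the 1 GB chunk size are assumptions made for this example, not part of the original article. Timings depend heavily on the disk and the OS file cache, so treat any measurement as indicative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class ReadComparison {
    /** Counts lines by reading them one at a time through a BufferedReader. */
    public static long countLinesBuffered(Path file) throws IOException {
        long lines = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    /**
     * Counts '\n' bytes through memory-mapped chunks. A single mapping cannot
     * exceed 2 GB, so the file is mapped in 1 GB pieces (an arbitrary choice).
     * The result equals the line count only if the last line ends with '\n'.
     */
    public static long countNewlinesMapped(Path file) throws IOException {
        final long chunk = 1L << 30;
        long newlines = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += chunk) {
                MappedByteBuffer buf =
                        ch.map(FileChannel.MapMode.READ_ONLY, pos, Math.min(chunk, size - pos));
                while (buf.hasRemaining()) {
                    if (buf.get() == '\n') {
                        newlines++;
                    }
                }
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // path to a text file, passed on the command line
        long t0 = System.nanoTime();
        long a = countLinesBuffered(file);
        long t1 = System.nanoTime();
        long b = countNewlinesMapped(file);
        long t2 = System.nanoTime();
        System.out.println("BufferedReader:   " + a + " lines, " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("MappedByteBuffer: " + b + " newlines, " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Note that the second run usually benefits from the file already being in the OS cache, so a fair comparison would run each method several times in alternating order.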
When reading large text, the performance bottleneck is mainly IO. It is normal for read calls to take up most of the time: the hard disk itself is not fast, and converting the data into objects after it has been read into memory is also time-consuming.
To go faster, use a parallel approach: read and process the data with multiple threads at the same time. However, multithreaded programs are troublesome to write in Java, and when one file is read in parallel segments, the segment boundaries have to be adjusted so that no line is split between two segments.
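The boundary adjustment can be sketched as follows: split the file into equal byte ranges, then move every interior boundary forward to the start of the next line. The class and method names (SegmentBoundaries, lineAlignedBoundaries, nextLineStart) are hypothetical, and the sketch assumes lines end with a '\n' byte, which holds for ASCII and UTF-8 text.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class SegmentBoundaries {
    /**
     * Returns parts + 1 offsets; segment i covers bytes [b[i], b[i + 1]).
     * Every interior offset is moved forward to the start of a line,
     * so each segment contains only whole lines (it may be empty).
     */
    public static long[] lineAlignedBoundaries(RandomAccessFile file, int parts)
            throws IOException {
        long length = file.length();
        long[] bounds = new long[parts + 1];
        bounds[parts] = length;
        for (int i = 1; i < parts; i++) {
            // Start from the equal-size split point, but never move backwards.
            long guess = Math.max(length / parts * i, bounds[i - 1]);
            bounds[i] = nextLineStart(file, guess, length);
        }
        return bounds;
    }

    /** Returns the first offset >= pos that is the start of a line (or the file length). */
    private static long nextLineStart(RandomAccessFile file, long pos, long length)
            throws IOException {
        if (pos == 0) {
            return 0;
        }
        file.seek(pos - 1); // if the previous byte is '\n', pos already starts a line
        int b;
        while ((b = file.read()) != -1) {
            if (b == '\n') {
                return file.getFilePointer();
            }
        }
        return length; // no further newline: the rest belongs to the previous segment
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SegmentBoundaries <file> <parts>
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r")) {
            long[] bounds = lineAlignedBoundaries(file, Integer.parseInt(args[1]));
            for (int i = 0; i + 1 < bounds.length; i++) {
                System.out.println("segment " + i + ": [" + bounds[i] + ", " + bounds[i + 1] + ")");
            }
        }
    }
}
```

Only a few bytes are read per boundary, so the unbuffered RandomAccessFile.read() calls are not a concern here. The segments themselves should still be read through a buffered stream.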
Take the following scenario as an example: group the orders by customer and total the sales of each customer. Part of the source data looks like this:
O_ORDERKEY O_CUSTKEY O_ORDERDATE O_TOTALPRICE
10262 RATTC 1996-07-22 14487.0
10263 ERNSH 1996-07-23 43818.0
10264 FOLKO 2007-07-24 1101.0
10265 BLONP 1996-07-25 5528.0
10266 WARTH 1996-07-26 7719.0
10267 FRANK 1996-07-29 20858.0
10268 GROSR 1996-07-30 19887.0
10269 WHITC 1996-07-31 456.0
10270 WARTH 1996-08-01 13654.0
...
Expected results: one row per customer, with O_CUSTKEY and the total of O_TOTALPRICE as AMOUNT.
Part of the multithreaded Java code would have to be written like this:
...
final int DOWN_THREAD_NUM = 8;
CountDownLatch doneSignal = new CountDownLatch(DOWN_THREAD_NUM);
RandomAccessFile[] outArr = new RandomAccessFile[DOWN_THREAD_NUM];
try {
    long length = new File(OUT_FILE_NAME).length();
    // bytes per thread, plus the remainder that the last thread picks up
    long numPerThred = length / DOWN_THREAD_NUM;
    long left = length % DOWN_THREAD_NUM;
    for (int i = 0; i < DOWN_THREAD_NUM; i++) {
        outArr[i] = new RandomAccessFile(OUT_FILE_NAME, "rw");
        ...
        if (i == DOWN_THREAD_NUM - 1) {
            // last segment: extend the end offset by the remainder
            new ReadThread(i * numPerThred, (i + 1) * numPerThred + left, outArr[i], keywords, doneSignal).start();
            ...
        } else {
            new ReadThread(i * numPerThred, (i + 1) * numPerThred, outArr[i], keywords, doneSignal).start();
            ...
        }
    }
}
...
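For comparison, here is a self-contained sketch of the whole task in plain Java. It is not the code elided above: it swaps the CountDownLatch and ReadThread approach for an ExecutorService with Futures, and the class name ParallelGroupSum and all method names are hypothetical. It assumes whitespace-separated columns in the order shown in the sample data, a header on the first line, and '\n' line endings.

```java
import java.io.*;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

class ParallelGroupSum {
    /** Sums O_TOTALPRICE per O_CUSTKEY, reading `threads` line-aligned segments in parallel. */
    public static Map<String, Double> sumByCustomer(Path file, int threads) throws Exception {
        long[] bounds = boundaries(file, threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Double>>> parts = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                final long start = bounds[i], end = bounds[i + 1];
                parts.add(pool.submit(() -> sumSegment(file, start, end)));
            }
            Map<String, Double> totals = new TreeMap<>();
            for (Future<Map<String, Double>> part : parts) { // merge the partial results
                part.get().forEach((k, v) -> totals.merge(k, v, Double::sum));
            }
            return totals;
        } finally {
            pool.shutdown();
        }
    }

    /** Splits the file into `parts` byte ranges whose interior boundaries sit at line starts. */
    static long[] boundaries(Path file, int parts) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            long[] b = new long[parts + 1];
            b[parts] = length;
            for (int i = 1; i < parts; i++) {
                long pos = Math.max(length / parts * i, b[i - 1]);
                if (pos > 0) {
                    raf.seek(pos - 1);
                    int c;
                    while ((c = raf.read()) != -1 && c != '\n') { /* skip to end of line */ }
                    pos = (c == -1) ? length : raf.getFilePointer();
                }
                b[i] = pos;
            }
            return b;
        }
    }

    /** Aggregates the whole lines found in bytes [start, end) of the file. */
    static Map<String, Double> sumSegment(Path file, long start, long end) throws IOException {
        Map<String, Double> totals = new HashMap<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ch.position(start);
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    limit(Channels.newInputStream(ch), end - start), StandardCharsets.UTF_8));
            if (start == 0) reader.readLine(); // the segment at offset 0 holds the header
            String line;
            while ((line = reader.readLine()) != null) {
                String[] f = line.trim().split("\\s+");
                if (f.length < 4) continue; // ignore blank or malformed lines
                totals.merge(f[1], Double.parseDouble(f[3]), Double::sum);
            }
        }
        return totals;
    }

    /** Wraps a stream so that at most `max` bytes can be read from it. */
    static InputStream limit(InputStream in, long max) {
        return new InputStream() {
            long left = max;
            @Override public int read() throws IOException {
                if (left <= 0) return -1;
                int b = in.read();
                if (b >= 0) left--;
                return b;
            }
            @Override public int read(byte[] buf, int off, int len) throws IOException {
                if (left <= 0) return -1;
                int n = in.read(buf, off, (int) Math.min(len, left));
                if (n > 0) left -= n;
                return n;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors(); // one segment per core
        sumByCustomer(Paths.get(args[0]), cores)
                .forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Because the partial sums are merged in a different order than a sequential pass would add them, floating-point totals can differ in the last digits. Even in this compact form, most of the code deals with segmentation and thread plumbing rather than with the grouping itself.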
This would be much easier with the SPL aggregator, which encapsulates Java multithreading and provides built-in segmented, parallel processing of large files, so the code is far shorter and demands less of the programmer. The problem above, for example, is solved in 2 lines (the aggregator has a built-in parallel option @m; if the number of parallel tasks is not set, it defaults to the number of CPU cores):
     A
1    =file("/workspace/orders.txt").cursor@mt()
2    =A1.groups(O_CUSTKEY;sum(O_TOTALPRICE):AMOUNT)
In fact, parallel processing of large text is troublesome in Java in many situations, including grouping, sorting, and join computations, but all of these are simple with the SPL aggregator.
That is how to do parallel computation over large text files in Java. Some of these points are likely to come up in day-to-day work, and I hope the article is useful when they do.