How to understand Java 8 parallel streams


2025-07-03 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how to understand Java 8 parallel streams. The approach introduced here is simple, fast and practical, so let's walk through it.

Parallel streams

Understanding and enabling parallel streams

What is a parallel stream? A parallel stream splits the contents of a stream into multiple data blocks and processes each block on a different thread. For example, suppose we have the following requirement:

There is a List of Apple objects, and each Apple in the list has only a weight. We also know that apples cost 5 CNY / kg. Now we need to calculate the price of each apple (the division by 1000 in the code implies that the weight is stored in grams). The traditional way is as follows:

    List<Apple> appleList = new ArrayList<>(); // pretend the list has been populated
    for (Apple apple : appleList) {
        apple.setPrice(5.0 * apple.getWeight() / 1000);
    }

We iterate over the Apple objects in the list to calculate the price of each one. The time complexity of this algorithm is O(list.size()), so the running time grows linearly with the size of the list. Parallel streams can shorten this time considerably when the work per element is expensive enough.

A parallel stream processes the collection as follows:

    appleList.parallelStream().forEach(apple -> apple.setPrice(5.0 * apple.getWeight() / 1000));

The difference from an ordinary stream is the call to parallelStream(). You can also call stream.parallel() to convert an ordinary stream into a parallel one.

A parallel stream can also be converted back to a sequential stream with the sequential() method. Note, however, that parallel() and sequential() do not change the stream itself; they only set a flag. If a pipeline contains several parallel() / sequential() calls, the last one takes effect for the whole pipeline.
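The "last call wins" behaviour can be observed with isParallel(), which reports the current value of that flag. The following is a minimal sketch; the class and method names are made up for illustration.

```java
import java.util.stream.IntStream;

public class ParallelFlagDemo {

    // Switches to parallel, then back to sequential; the last call decides.
    public static boolean endsWithSequential() {
        return IntStream.range(0, 10)
                .parallel()      // sets the parallel flag
                .map(i -> i * 2)
                .sequential()    // clears the flag again
                .isParallel();
    }

    // Same pipeline with the two calls swapped.
    public static boolean endsWithParallel() {
        return IntStream.range(0, 10)
                .sequential()
                .map(i -> i * 2)
                .parallel()
                .isParallel();
    }

    public static void main(String[] args) {
        System.out.println(endsWithSequential()); // prints false
        System.out.println(endsWithParallel());   // prints true
    }
}
```

Because the flag belongs to the pipeline as a whole, it is not possible to run one stage in parallel and the next one sequentially within the same pipeline.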

Parallel streams are this convenient, but where do their threads come from? How many are there, and how can they be configured?

By default, parallel streams run on the common ForkJoinPool. Its default parallelism is one less than the number of available processors; because the thread that invokes the terminal operation also takes part in the work, the number of threads working on a stream matches the number of cores. The pool size can be changed with the system property java.util.concurrent.ForkJoinPool.common.parallelism. But this value is global: changing it affects every parallel stream. The stream API currently offers no supported way to configure a dedicated number of threads for an individual stream. Generally speaking, the number of processor cores is a good choice.
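To see what these settings look like on a given machine, the values can be printed at runtime. This is only a small sketch; the class name CommonPoolInfo and its helper methods are invented for the example, while the ForkJoinPool and Runtime calls are standard JDK API.

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolInfo {

    // The system property named in the text above.
    static final String PARALLELISM_PROPERTY =
            "java.util.concurrent.ForkJoinPool.common.parallelism";

    public static int availableCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("available processors:    " + availableCores());
        System.out.println("common pool parallelism: " + commonPoolParallelism());
        // null unless the property was set explicitly, e.g. on the command line
        System.out.println("property override:       " + System.getProperty(PARALLELISM_PROPERTY));
    }
}
```

The property is read when the common pool is initialized, so it has to be set before the pool is first used, for example on the command line: java -Djava.util.concurrent.ForkJoinPool.common.parallelism=8 CommonPoolInfo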

Testing the performance of parallel streams

To make the performance easier to measure, we let the thread sleep for 1 second after calculating each apple's price, which stands in for other IO-related work, and log the total execution time. First, the sequential version:

    public static void main(String[] args) throws InterruptedException {
        List<Apple> appleList = initAppleList();

        Date begin = new Date();
        for (Apple apple : appleList) {
            apple.setPrice(5.0 * apple.getWeight() / 1000);
            Thread.sleep(1000); // simulate other IO-related work
        }
        Date end = new Date();

        log.info("number of apples: {}, time: {} s", appleList.size(), (end.getTime() - begin.getTime()) / 1000);
    }

Parallel version

    List<Apple> appleList = initAppleList();

    Date begin = new Date();
    appleList.parallelStream().forEach(apple -> {
        apple.setPrice(5.0 * apple.getWeight() / 1000);
        try {
            Thread.sleep(1000); // simulate other IO-related work
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    });
    Date end = new Date();

    log.info("number of apples: {}, time: {} s", appleList.size(), (end.getTime() - begin.getTime()) / 1000);

Timing results

The results are in line with our prediction. My computer has a quad-core i5 processor. With parallelism enabled, each of the four cores runs one thread, and the task completes in about 1 s!
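The two snippets above depend on an Apple class, an initAppleList() helper and a logger that the article does not show. For readers who want to reproduce the measurement, here is a self-contained sketch under stated assumptions: the Apple class, the list of sample weights, the delay parameter and all method names below are hypothetical stand-ins, and the timings will differ from machine to machine.

```java
import java.util.ArrayList;
import java.util.List;

public class AppleTimingDemo {

    // Hypothetical stand-in for the article's Apple class (weight in grams).
    public static class Apple {
        private final double weight;
        private double price;

        public Apple(double weight) { this.weight = weight; }
        public double getWeight() { return weight; }
        public double getPrice() { return price; }
        public void setPrice(double price) { this.price = price; }
    }

    // Hypothetical replacement for initAppleList(): apples of 100 g, 200 g, ...
    public static List<Apple> initAppleList(int count) {
        List<Apple> apples = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            apples.add(new Apple(100.0 * i));
        }
        return apples;
    }

    // Prices one apple, then sleeps to simulate a blocking IO call.
    static void priceWithDelay(Apple apple, long delayMillis) {
        apple.setPrice(5.0 * apple.getWeight() / 1000);
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status
        }
    }

    // Returns the elapsed time in milliseconds for the sequential stream.
    public static long timeSequential(List<Apple> apples, long delayMillis) {
        long start = System.nanoTime();
        apples.stream().forEach(a -> priceWithDelay(a, delayMillis));
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Returns the elapsed time in milliseconds for the parallel stream.
    public static long timeParallel(List<Apple> apples, long delayMillis) {
        long start = System.nanoTime();
        apples.parallelStream().forEach(a -> priceWithDelay(a, delayMillis));
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<Apple> apples = initAppleList(4);
        System.out.println("sequential: " + timeSequential(apples, 1000) + " ms");
        System.out.println("parallel:   " + timeParallel(apples, 1000) + " ms");
    }
}
```

Each apple is modified by exactly one thread, so this forEach does not share mutable state between threads; that is what makes it safe to run in parallel.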

Can parallel streams be used freely?

Splittability affects the speed of the stream

After the test above, it is easy to conclude that parallel streams are so fast that we can abandon external iteration (foreach / fori / iter) entirely and use the internal iteration provided by Stream.

Is this really the case? Are parallel streams really that perfect? The answer is, of course, no. You can copy the code below and test it on your own computer. After testing, you will find that a parallel stream is not always the fastest way to process data.


When summing the first n numbers, the versions built on the iterate method are always slower than the loop, whether parallel or not. For the non-parallel version this is understandable: stream operations add overhead, while a plain loop is closer to the machine. But why is the parallel version slow as well? There are two points to note:

1. iterate generates boxed objects, which must be unboxed into primitive numbers before they can be summed.

2. iterate is difficult to divide into independent blocks that can be executed in parallel.

This is an interesting point, and we must be aware that some stream operations are easier to parallelize than others. With iterate, each application of the function depends on the result of the previous application. So in this case the stream cannot be split into chunks effectively, and parallelization only adds overhead.

LongStream.rangeClosed() has neither of these two pain points. It generates primitive values, so no boxing or unboxing is needed. In addition, the numbers 1 to n can be split directly into, for example, four parts: 1 to n/4, n/4 to 2n/4, ..., 3n/4 to n. As a result, the parallel rangeClosed() version is faster than the external iteration of the for loop.

    // ParallelStreams.java
    package lambdasinaction.chap7;

    import java.util.stream.*;

    public class ParallelStreams {

        // External iteration with a plain for loop.
        public static long iterativeSum(long n) {
            long result = 0;
            for (long i = 0; i <= n; i++) {
                result += i;
            }
            return result;
        }

        // Sequential stream built with iterate (boxed Long values).
        public static long sequentialSum(long n) {
            return Stream.iterate(1L, i -> i + 1).limit(n).reduce(Long::sum).get();
        }

        // Parallel stream built with iterate.
        public static long parallelSum(long n) {
            return Stream.iterate(1L, i -> i + 1).limit(n).parallel().reduce(Long::sum).get();
        }

        // Sequential primitive stream over a range.
        public static long rangedSum(long n) {
            return LongStream.rangeClosed(1, n).reduce(Long::sum).getAsLong();
        }

        // Parallel primitive stream over a range.
        public static long parallelRangedSum(long n) {
            return LongStream.rangeClosed(1, n).parallel().reduce(Long::sum).getAsLong();
        }
    }

    // ParallelStreamsHarness.java
    package lambdasinaction.chap7;

    import java.util.concurrent.*;
    import java.util.function.*;

    public class ParallelStreamsHarness {

        public static final ForkJoinPool FORK_JOIN_POOL = new ForkJoinPool();

        public static void main(String[] args) {
            System.out.println("Iterative Sum done in: " + measurePerf(ParallelStreams::iterativeSum, 10_000_000L) + " msecs");
            System.out.println("Sequential Sum done in: " + measurePerf(ParallelStreams::sequentialSum, 10_000_000L) + " msecs");
            System.out.println("Parallel forkJoinSum done in: " + measurePerf(ParallelStreams::parallelSum, 10_000_000L) + " msecs");
            System.out.println("Range forkJoinSum done in: " + measurePerf(ParallelStreams::rangedSum, 10_000_000L) + " msecs");
            System.out.println("Parallel range forkJoinSum done in: " + measurePerf(ParallelStreams::parallelRangedSum, 10_000_000L) + " msecs");
        }

        // Runs f ten times and returns the fastest run in milliseconds.
        public static <T, R> long measurePerf(Function<T, R> f, T input) {
            long fastest = Long.MAX_VALUE;
            for (int i = 0; i < 10; i++) {
                long start = System.nanoTime();
                R result = f.apply(input);
                long duration = (System.nanoTime() - start) / 1_000_000; // nanoseconds to milliseconds
                System.out.println("Result: " + result);
                if (duration < fastest) fastest = duration;
            }
            return fastest;
        }
    }
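The splittability argument can also be checked without a benchmark, by looking at the Spliterator behind each stream. The sketch below (class and method names are illustrative only) asks whether the source knows its exact size up front, which is what lets the framework cut it into balanced chunks, and then splits a small range once by hand.

```java
import java.util.Spliterator;
import java.util.stream.LongStream;
import java.util.stream.Stream;

public class SplittabilityDemo {

    // A range knows exactly how many elements it holds.
    public static boolean rangeIsSized() {
        return LongStream.rangeClosed(1, 8).spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    // iterate only knows how to derive the next element from the previous one.
    public static boolean iterateIsSized() {
        return Stream.iterate(1L, i -> i + 1).spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    // Splits the range 1..8 once and returns the sizes of the two parts.
    public static long[] splitRangeOnce() {
        Spliterator.OfLong rest = LongStream.rangeClosed(1, 8).spliterator();
        Spliterator.OfLong prefix = rest.trySplit(); // prefix takes the first part
        return new long[] { prefix.estimateSize(), rest.estimateSize() };
    }

    public static void main(String[] args) {
        System.out.println("range SIZED:   " + rangeIsSized());   // prints true
        System.out.println("iterate SIZED: " + iterateIsSized()); // prints false
        long[] parts = splitRangeOnce();
        System.out.println("split sizes:   " + parts[0] + " + " + parts[1]);
    }
}
```

A sized, splittable source lets each worker take a chunk of known length, whereas an iterate-based source has to be consumed element by element before anything can be handed to another thread.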

The problem of modifying shared variables

Although parallel streams make multithreading easy, they do not solve the problem of modifying shared variables from multiple threads. The following code has a shared variable total and uses a sequential stream and a parallel stream, respectively, to calculate the sum of the first n natural numbers.

    public static long sideEffectSum(long n) {
        Accumulator accumulator = new Accumulator();
        LongStream.rangeClosed(1, n).forEach(accumulator::add);
        return accumulator.total;
    }

    public static long sideEffectParallelSum(long n) {
        Accumulator accumulator = new Accumulator();
        LongStream.rangeClosed(1, n).parallel().forEach(accumulator::add);
        return accumulator.total;
    }

    public static class Accumulator {
        private long total = 0;

        // Not thread-safe: += is a read-modify-write on shared state.
        public void add(long value) {
            total += value;
        }
    }

Sequential execution prints the same result every time: 50000005000000 (the sum for n = 10,000,000). The results of parallel execution vary from run to run, because every access to total is a data race. For the causes of data races, you can take a look at the blog post about volatile. Therefore, parallel streams are not recommended when the code modifies a shared variable.
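When the goal is simply to compute a sum, the shared accumulator can be avoided altogether. The sketch below shows two race-free alternatives; it is not part of the original code, and the class and method names are chosen for this example. The first lets the stream perform the reduction itself, the second keeps a shared accumulator but uses the thread-safe LongAdder from java.util.concurrent.atomic.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.LongStream;

public class SafeParallelSum {

    // Reduction: each thread sums its own chunk, then the partial sums are combined.
    public static long reducedSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // Shared accumulator, but one that is designed for concurrent updates.
    public static long adderSum(long n) {
        LongAdder total = new LongAdder();
        LongStream.rangeClosed(1, n).parallel().forEach(total::add);
        return total.sum();
    }

    public static void main(String[] args) {
        System.out.println(reducedSum(10_000_000L)); // prints 50000005000000
        System.out.println(adderSum(10_000_000L));   // prints 50000005000000
    }
}
```

The reduction is usually the better fit for streams, because it has no shared mutable state at all; the LongAdder variant is mainly useful when a side effect cannot be avoided.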

Notes on using parallel streams

Keep the following points in mind when using parallel streams:

Use the primitive streams LongStream / IntStream / DoubleStream instead of Stream to process numbers wherever possible, to avoid the extra overhead of frequent boxing and unboxing.

Consider the total computational cost of the stream's pipeline. Let N be the number of elements to process and Q the time needed to process one element. N * Q is then the total cost of the operation, and the higher Q is, the more likely a parallel stream will pay off.

For example, several types of resources arrive from the front end and need to be stored in the database, and each resource type corresponds to a different table. We can treat the number of types as N and the network time for reaching the database plus the insertion time as Q. The network time is usually relatively large, so this operation is well suited to parallel processing. Of course, once the number of types exceeds the number of cores, the performance gain shrinks to some extent. A better optimization method will be presented in future blog posts.

Parallel streams are not recommended for small amounts of data.

Parallel streams are recommended for stream sources that are easy to split into blocks.

The following table rates the splittability of streams created from some common sources:

    Source            Splittability
    ArrayList         Excellent
    LinkedList        Poor
    IntStream.range   Excellent
    Stream.iterate    Poor
    HashSet           Good
    TreeSet           Good

By now you should have a deeper understanding of how Java 8 parallel streams work. The best next step is to try them out in practice.

