
What is the performance optimization technology in big data?


Many inexperienced readers are unsure what performance optimization techniques in big data actually refer to, so this article summarizes the common causes of performance problems and their solutions. Hopefully it helps you resolve such issues.

1. Code-related issues

When you encounter a performance problem, the first thing to do is check whether it is related to the business code -- not by reading through the code to solve the problem, but by using logs or the code itself to rule out low-level, business-code-related mistakes. The best place for performance optimization is inside the application itself.

For example, check the business logs for a large number of errors; most performance problems at the application and framework layers can be found in the logs (an unreasonable log level, for instance, can cause runaway logging in production). Beyond that, checking the main code logic -- improper use of for loops, NPEs, regular expressions, mathematical calculations, and other common problems -- often uncovers issues that can be fixed by simply modifying the code.

Do not reflexively reach for caching, asynchronization, JVM tuning, and other buzzwords; a complex-looking problem may have a simple solution, and the 80/20 rule still holds in performance optimization. Of course, knowing the common coding pitfalls speeds up problem analysis, and some of the bottleneck-oriented optimization ideas around CPU, memory, the JVM, and so on also show up here at the code level.

Here are some high-frequency coding points that can easily cause performance problems.

1) Regular expressions can consume a lot of CPU (greedy patterns, for example, may trigger backtracking). Be careful with String methods backed by regexes, such as split() and replaceAll(), and always precompile regular expressions.
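As a minimal illustrative sketch (the class and field names are made up), precompiling the pattern once and reusing it avoids recompiling the regex on every call to split()/replaceAll():

import java.util.regex.Pattern;

public class LogParser {
    // Compile once and reuse; recompiling the pattern on every call is the expensive part.
    private static final Pattern FIELD_SEPARATOR = Pattern.compile(",");

    public String[] parse(String line) {
        // Equivalent to line.split(","), but without building a new Pattern each time.
        return FIELD_SEPARATOR.split(line);
    }
}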

2) On earlier JDK versions (Java 6 and before), heavy use of String.intern() can overflow the method area (the permanent generation). On later JDKs, if the string table is sized too small and too many strings are interned, intern() still incurs a large performance overhead.
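A contrived sketch of that failure mode (purely illustrative): interning a huge number of distinct strings fills the JVM string table, which lived in the permanent generation on Java 6 and earlier, and on later JDKs an undersized -XX:StringTableSize turns intern() into long hash-chain lookups.

import java.util.ArrayList;
import java.util.List;

public class InternDemo {
    public static void main(String[] args) {
        List<String> refs = new ArrayList<>();
        for (int i = 0; i < 10_000_000; i++) {
            // Every distinct value is added to the string table and kept reachable here.
            refs.add(("user-" + i).intern());
        }
        System.out.println(refs.size());
    }
}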

3) When logging exceptions, if the stack information is already clear, consider not printing the full stack; constructing an exception stack trace is costly. Note: when the same exception is thrown repeatedly at the same location, the JIT will optimize it and directly throw a preallocated exception of the matching type, so the stack trace will no longer appear in the logs.
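One related, well-known technique (a sketch, assuming the stack trace is genuinely not needed) is to define exceptions that skip stack capture entirely; note also that the JIT behavior just mentioned can be turned off with -XX:-OmitStackTraceInFastThrow if you need the full traces back.

public class FastBusinessException extends RuntimeException {
    public FastBusinessException(String message) {
        // writableStackTrace = false: no stack is captured, which removes most of the
        // construction cost for exceptions thrown at very high frequency.
        super(message, null, false, false);
    }
}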

4) Avoid unnecessary boxing and unboxing between wrapper types and primitive types, and keep types consistent; autoboxing that happens too frequently severely hurts performance.
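The classic illustration of the cost (a minimal sketch): a boxed Long accumulator in a hot loop creates an object on every addition, while a primitive long does not.

public class BoxingDemo {
    // Boxed accumulator: each "sum += i" unboxes, adds, and re-boxes a new Long.
    static long sumBoxed(int n) {
        Long sum = 0L;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Primitive accumulator: no allocation inside the loop.
    static long sumPrimitive(int n) {
        long sum = 0L;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }
}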

5) Choose the Stream API deliberately. For complex, parallelizable operations, the Stream API simplifies the code and exploits multi-core CPUs; for simple operations or single-core hosts, explicit iteration is recommended.
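A rough sketch of that trade-off (the method names and workload are invented for illustration):

import java.util.List;
import java.util.stream.Collectors;

public class StreamChoice {
    // CPU-heavy work over a large collection: a parallel stream can use all cores.
    static List<Integer> transformAll(List<Integer> input) {
        return input.parallelStream()
                    .map(StreamChoice::expensiveComputation)
                    .collect(Collectors.toList());
    }

    // Trivial work (or a single-core host): a plain loop avoids the stream overhead.
    static long sum(List<Integer> input) {
        long total = 0;
        for (int value : input) {
            total += value;
        }
        return total;
    }

    private static int expensiveComputation(int value) {
        return value * value; // stand-in for real CPU-bound work
    }
}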

6) Create thread pools manually with ThreadPoolExecutor, sizing the thread count and queue according to the workload to avoid the risk of resource exhaustion. Uniformly named threads also make later troubleshooting easier.
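A minimal sketch of such a manually created pool (the sizes, queue length, and thread-name prefix are illustrative and must be derived from the actual workload):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class OrderPools {
    private static final ThreadFactory NAMED_FACTORY = new ThreadFactory() {
        private final AtomicInteger counter = new AtomicInteger(1);
        @Override
        public Thread newThread(Runnable r) {
            // A recognizable name makes jstack output much easier to read later.
            return new Thread(r, "order-worker-" + counter.getAndIncrement());
        }
    };

    // Explicit sizes and a bounded queue; CallerRunsPolicy applies back pressure
    // instead of letting tasks or threads grow without limit.
    public static final ThreadPoolExecutor ORDER_POOL = new ThreadPoolExecutor(
            4, 8, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(1000),
            NAMED_FACTORY,
            new ThreadPoolExecutor.CallerRunsPolicy());
}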

7) Choose concurrent containers to match the business scenario. For a Map-like container, for example: if strong data consistency is required, use Hashtable or "Map + lock"; if reads vastly outnumber writes, use CopyOnWriteArrayList; for small data volumes with no strong consistency requirements and infrequent changes, use ConcurrentHashMap; for large data volumes with frequent reads and writes and no strong consistency requirements, use ConcurrentSkipListMap.

8) Typical lock optimization ideas include: reducing lock granularity, coarsening locks inside loops, shortening lock hold time (choosing read-write locks appropriately), and so on. Also consider the JDK's optimized concurrency classes, such as LongAdder instead of AtomicLong for counters in statistics scenarios with weak consistency requirements, and ThreadLocalRandom instead of Random.
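A small sketch of the two substitutions mentioned (class and field names are illustrative):

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

public class RequestStats {
    // LongAdder spreads contention across internal cells, which suits write-heavy
    // counters whose exact value is only read occasionally (e.g. metrics reporting).
    private final LongAdder requestCount = new LongAdder();

    public void onRequest() {
        requestCount.increment();
    }

    public long snapshot() {
        return requestCount.sum(); // may be slightly stale under concurrent updates
    }

    public long randomBackoffMillis() {
        // ThreadLocalRandom avoids the shared-seed contention of java.util.Random.
        return ThreadLocalRandom.current().nextLong(100);
    }
}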

There are many more code-level optimizations beyond those listed above. Among these points, some common optimization ideas can be extracted, such as:

Trading space for time: use memory or disk in exchange for more valuable CPU or network, for example by introducing a cache (see the sketch after this list).

Trading time for space: save memory or network resources at the cost of some CPU, for example by splitting one large network transfer into several smaller ones.

Other technologies such as parallelization, asynchronization, pooling and so on.
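A minimal space-for-time sketch (the names are hypothetical, and a production cache would also need size bounds and expiry): repeated lookups cost a hash probe instead of CPU work or a network round trip.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ExchangeRateCache {
    private final Map<String, Double> cache = new ConcurrentHashMap<>();

    // Memory is spent keeping results around so the expensive call runs at most once per key.
    public double rateFor(String currencyPair) {
        return cache.computeIfAbsent(currencyPair, this::loadFromRemote);
    }

    private double loadFromRemote(String currencyPair) {
        return 1.0; // stand-in for the expensive remote call being traded away
    }
}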

2. CPU-related issues

As mentioned earlier, pay more attention to CPU load: high CPU utilization alone is usually not a problem, whereas CPU load is the key basis for judging whether the system's compute resources are healthy.

2.1 High CPU utilization and high load average

This situation is common in CPU-intensive applications, where a large number of threads are in the runnable state and very few are waiting on I/O. Common operations that consume a lot of CPU are:

Regular expression operations

Mathematical operation

Serialization / deserialization

Reflection operation

Infinite loops or unreasonably large numbers of loop iterations

Defects in base or third-party components

The general approach to troubleshooting high CPU usage: print the thread stacks several times (more than 5) with jstack; this usually locates the thread stacks that consume the most CPU. Alternatively, profiling (event-based sampling or instrumentation) can produce an on-CPU flame graph of the application over a period of time, which also locates the problem quickly.

Frequent GC (Young GC, Old GC, Full GC) in the application can also drive up CPU utilization and load. Troubleshooting idea: use jstat -gcutil to continuously output the GC count and time statistics of the current application. Load increases caused by frequent GC are usually accompanied by insufficient available memory; check the amount of available memory on the machine with commands such as free or top.

Could high CPU utilization be caused by the CPU itself being the bottleneck? Possibly. Use vmstat to view the detailed CPU utilization breakdown. High user-mode CPU utilization (us) means user-mode processes are occupying the CPU; if this value stays above 50% for a long time, focus on the application's own performance problems. High kernel-mode CPU utilization (sy) means the kernel is occupying the CPU, so focus on kernel threads or system call performance. If us + sy exceeds 80%, there may simply not be enough CPU.

2.2 Low CPU utilization and high load average

If CPU utilization is not high, the application is not busy computing but doing something else. Low CPU utilization with a high load average is easy to understand and is common in I/O-intensive processes: after all, the load average is the sum of R-state and D-state processes, and removing the first kind leaves the D-state processes (D state is usually caused by waiting for I/O, such as disk I/O or network I/O).

Troubleshooting and verification idea: use vmstat 1 to periodically output system resource usage and watch the %wa (iowait) column, which is the percentage of CPU time spent waiting for disk I/O. A value above 30% indicates serious disk I/O waits, which may be caused by heavy random disk access or direct I/O (bypassing the system cache), or by a bottleneck on the disk itself. Cross-check with the output of iostat or dstat: for example, if %wa (iowait) rises while large disk read requests are observed, the disk reads are probably the cause.

In addition, long-running network requests (i.e., network I/O) also raise the CPU load average, for example MySQL slow queries or fetching data through RPC interfaces. Troubleshooting this usually requires a combined analysis of the application's upstream and downstream dependencies and the trace logs from instrumented middleware.

2.3 The number of CPU context switches becomes high

First use vmstat to check the system-wide context switch count, then use pidstat to observe the process's voluntary context switches (cswch) and involuntary context switches (nvcswch). Voluntary context switches are caused by thread state transitions inside the application, for example calling sleep(), join(), wait(), or using Lock or synchronized; involuntary context switches happen when a thread exhausts its time slice or is preempted by the scheduler due to execution priority.

A high number of voluntary context switches means the CPU is waiting to acquire resources, for example a shortage of system resources such as I/O or memory. A high number of involuntary context switches usually means there are too many threads in the application, causing fierce competition for CPU time slices and frequent forced scheduling by the system; this can be corroborated with the thread count and thread state distribution from jstack.

3. Memory-related issues

As mentioned earlier, memory is divided into system memory and process memory (including that of Java application processes). Most of the memory problems we encounter fall on process memory; bottlenecks caused by system resources themselves are relatively rare. For a Java process, its built-in memory management automatically solves two problems -- how to allocate memory to objects and how to reclaim the memory allocated to them -- with the garbage collection mechanism at its core.

Although garbage collection effectively prevents memory leaks and ensures memory is used effectively, it is not a cure-all: unreasonable parameter configuration and code logic can still cause a series of memory problems. In addition, early garbage collectors were limited in capability and collection efficiency, and their many GC parameters relied heavily on the developer's tuning experience. For example, an improperly sized maximum heap may cause problems such as heap overflow or heap oscillation.

Let's take a look at a few common ideas for analyzing memory problems.

3.1 Insufficient system memory

Java applications generally monitor memory levels at the single-machine or cluster level. If single-machine memory utilization exceeds 95%, or cluster memory utilization exceeds 80%, there may be a latent memory problem (note: the memory level here refers to system memory).

Except in some extreme cases, a shortage of system memory is most likely caused by the Java application itself. With the top command we can see the actual memory footprint of the Java application process, where RES is the process's resident memory usage and VIRT its virtual memory footprint. The size relationship is: VIRT > RES > the heap size actually used by the Java application. Besides heap memory, the overall memory footprint of a Java process also includes the method area / metaspace, the JIT code cache, and so on, and is mainly composed as follows:

Java application memory usage = Heap + Code Cache + Metaspace + Symbol tables + Thread stacks + Direct buffers (off-heap memory) + JVM structures (the JVM's own overhead) + Mapped files (memory-mapped files) + Native libraries + ...

The memory footprint of the Java process can be viewed with the jstat -gc command; the current heap partition sizes and metaspace usage appear in its output. Off-heap memory statistics and usage can be obtained through NMT (Native Memory Tracking, introduced in the HotSpot VM in Java 8). The memory used by thread stacks is easily overlooked: although thread stack memory is lazily allocated and does not immediately consume the full -Xss size, too many threads still lead to unnecessary memory consumption; a script such as jstackmem can be used to count overall thread memory usage.

Troubleshooting ideas for insufficient system memory:

First, use free to check the amount of free memory, then use vmstat to check specific memory usage and the memory growth trend. This stage can usually locate the processes that occupy the most memory.

Analyze cache/buffer memory usage. If this value does not change much over time, it can be ignored; if the cache/buffer size keeps growing, use tools such as pcstat, cachetop, and slabtop to analyze exactly what is being cached or buffered.

After excluding the effect of cache / buffer on system memory, if you find that the memory is still growing, it is very likely that there is a memory leak.

3.2 Java memory overflow

A memory overflow occurs when the application requests a new object instance and the memory required is larger than the heap's available space. There are many kinds of memory overflow, and the error log generally contains the OutOfMemoryError keyword. Common types and analysis ideas are as follows:

1) java.lang.OutOfMemoryError: Java heap space. Causes: objects can no longer be allocated in the heap (young and old generations); references to some objects are held for a long time without being released so the garbage collector cannot reclaim them; or a large number of Finalizer objects are used, which are not in the GC's normal collection cycle. Heap overflow is generally caused by a memory leak; if you have confirmed there is no leak, you can increase the heap size appropriately.

2) java.lang.OutOfMemoryError: GC overhead limit exceeded. Cause: the garbage collector spends more than 98% of its time on collection but reclaims less than 2% of the heap, usually because of a memory leak or a heap that is too small.

3) java.lang.OutOfMemoryError: Metaspace or java.lang.OutOfMemoryError: PermGen space. Troubleshooting ideas: check whether dynamically loaded classes are not unloaded in time, whether large numbers of string constants are interned, whether the permanent generation / metaspace is sized too small, and so on.

4) java.lang.OutOfMemoryError: unable to create new native Thread. Cause: the virtual machine cannot obtain enough memory when allocating stack space for a new thread. You can appropriately reduce each thread's stack size and the total number of threads the application creates. In addition, the total number of processes/threads that can be created is also limited by the system's free memory and by operating system limits, so check those carefully.

Note: this is different from StackOverflowError, which is caused by method calls nested so deeply that the allocated stack memory is not enough to create a new stack frame. There are also other OutOfMemoryError types, such as swap space exhaustion, native method stack overflow, and oversized array allocation; they are less common and are not covered one by one here.

3.3 Java memory leak

Java memory leaks are a developer's nightmare: unlike a memory overflow, which is simple and leaves a clear error site, a memory leak shows up as ever-rising memory utilization and slower and slower responses after the application has run for a while, until the process finally appears to hang ("fake death").
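A contrived sketch of the pattern behind many Java leaks (names and sizes are invented): a long-lived collection keeps references to objects that are no longer needed, so the collector can never reclaim them and heap usage only grows.

import java.util.HashMap;
import java.util.Map;

public class SessionRegistry {
    // The static map lives as long as the class; entries are added per request
    // but never removed, so old sessions stay strongly reachable forever.
    private static final Map<String, byte[]> SESSIONS = new HashMap<>();

    public static void onRequest(String sessionId) {
        SESSIONS.put(sessionId, new byte[64 * 1024]); // ~64 KB retained per new session
    }
    // Fix: remove entries on logout/expiry, or use a bounded/expiring cache instead.
}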

Java memory leaks may lead to insufficient system memory, a hung ("fake dead") process, OOM, and so on. There are essentially two ways to troubleshoot them:

Regularly output statistics of objects in the heap through jmap to locate objects whose number and size continue to grow.

Profile the application with a profiler to find the hot spots of memory allocation.

In addition, when heap memory keeps growing, it is recommended to dump a heap snapshot and base further analysis on it. Although a snapshot is only an instantaneous value, it is still meaningful.

3.4 Garbage collection

GC (garbage collection) metrics are an important yardstick for judging whether a Java process's memory usage is healthy. The core indicators are the frequency and duration of GC pauses (including Minor GC and Major GC), plus the memory details of each collection; the former can be obtained directly with the jstat tool, while the latter requires analyzing GC logs. Note that the FGC/FGCT columns in jstat output count the GC pauses (i.e., Stop-the-World phases) of an old-generation collection. For example, with the CMS collector, each old-generation collection increases the value by 2 (the initial mark and remark phases both stop the world, so the count goes up by 2).

When is GC tuning needed? It depends on the application's specifics, such as response time requirements, throughput requirements, and system resource constraints. Some rules of thumb: GC frequency and time have risen markedly; the average GC pause exceeds 500 ms; Full GC runs more often than once a minute; and so on. If the GC exhibits some of these characteristics, tuning is warranted.

Due to the wide variety of garbage collectors and different tuning strategies for different applications, several general GC tuning strategies are introduced below.

1) Select an appropriate garbage collector. Choose based on the application's latency and throughput requirements combined with each collector's characteristics. It is recommended to use G1 instead of the CMS collector: G1's performance is being improved steadily and is catching up with or surpassing CMS even on machines with 8 GB of memory or less, and its parameters are easier to tune. The CMS collector's parameters, by contrast, are overly complex, it tends to cause space fragmentation and high CPU consumption, and it has effectively been deprecated. The ZGC collector newly introduced in Java 11 achieves basically concurrent marking and collection in all phases and is worth looking forward to.

2) Set the heap size reasonably. Do not set the heap too large; it is recommended not to exceed 75% of system memory, to avoid exhausting system memory. Set the maximum heap size equal to the initial heap size to avoid heap oscillation. Sizing the young generation is the critical part: when we tune GC frequency and duration, we are usually tuning the young generation, including the ratio of young to old generations and the ratio of Eden to Survivor spaces. These ratios must also take object promotion age into account, so there is a lot to weigh in the whole process. If you use the G1 collector, there is much less to consider for young generation sizing, since the adaptive strategy decides the collection set (CSet) for each collection. Tuning the young generation is the core of GC tuning and depends heavily on experience, but generally speaking, a high Young GC frequency means the young generation is too small (or the Eden/Survivor configuration is unreasonable), while long Young GC times mean the young generation is too large.
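Purely as an illustration (the sizes are placeholders that depend entirely on the workload and available system memory), the sizing advice above maps to startup flags such as:

# Illustrative sizing only: -Xms equal to -Xmx avoids heap resizing oscillation;
# -Xmn fixes the young generation; SurvivorRatio=8 means Eden : one Survivor space = 8 : 1
java -Xms4g -Xmx4g -Xmn1g -XX:SurvivorRatio=8 -jar app.jar
# With G1 it is usually better to leave -Xmn unset and let adaptive sizing manage the young generation.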

3) Reduce the frequency of Full GC. Frequent Full GC or old-generation GC very likely indicates a memory leak causing objects to be held for a long time; analyzing a heap dump snapshot usually locates the problem quickly. An inappropriate ratio of young to old generations, which causes objects to be allocated directly in the old generation too often, can also trigger Full GC; this requires combined analysis of the business code and memory snapshots. In addition, GC parameters can supply much of the key information needed for tuning: for example, -XX:+PrintGCApplicationStoppedTime, -XX:+PrintSafepointStatistics, and -XX:+PrintTenuringDistribution give the GC pause distribution, safepoint time statistics, and object promotion age distribution, and -XX:+PrintFlagsFinal shows the final effective GC parameters.

4. Disk I/O and network I/O

4.1 Troubleshooting ideas for disk I/O problems

Use a tool such as iostat to output disk-related metrics, focusing on %wa (iowait) and %util; based on the output you can judge whether disk I/O is abnormal. For example, a high %util indicates heavy I/O activity.

Use pidstat to locate the specific process and watch the size and rate of the data it reads and writes.

Use lsof plus the process ID to view the list of files the abnormal process has open (including directories, block devices, dynamic libraries, network sockets, etc.); combined with the business code, this usually locates the source of the I/O. For deeper analysis, tools such as perf can trace and pinpoint the origin of the I/O.

Note that a rise in %wa (iowait) does not necessarily mean there is a disk I/O bottleneck; it is only the percentage of CPU time spent waiting for I/O, and it is normal if the application's main activity during that period really is I/O.

4.2 Possible causes of network I/O bottlenecks:

The object transferred in a single call is too large, which may cause slow request responses and frequent GC.

An unreasonable choice of network I/O model leads to low overall QPS and long response times for the application.

The thread pool for RPC calls is configured unreasonably. Use jstack to count the thread state distribution; many threads in the TIMED_WAITING or WAITING state deserve attention. For example, when the database connection pool is too small, many threads show up in the thread stacks contending for the connection pool's lock.

The RPC call timeout setting is unreasonable, resulting in many failed requests.

Thread stack snapshots of Java applications are very useful. Besides the unreasonable thread pool configuration mentioned above, other scenarios, such as high CPU usage or slow application responses, can also be investigated starting from the thread stack.

5. Useful one-line commands

This section lists a number of commands that help locate performance problems quickly.

1) View the number of current network connections on the system

netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

2) View the top 50 object types in the heap (to locate memory leaks)

jmap -histo:live $pid | sort -n -r -k 2 | head -n 50

3) list the top 10 processes by CPU/ memory usage

# Memory
ps axo %mem,pid,euser,cmd | sort -nr | head -10
# CPU
ps -aeo pcpu,user,pid,cmd | sort -nr | head -10

4) display the CPU utilization and idle rate of the system as a whole

grep "cpu " /proc/stat | awk '{total=$2+$3+$4+$5; idle=$5; printf "utilization: %.1f%%  idle: %.1f%%\n", ($2+$3+$4)*100/total, idle*100/total}'

5) count threads by thread state (enhanced version)

jstack $pid | grep java.lang.Thread.State: | sort | uniq -c | awk '{sum+=$1; split($0, a, ":"); gsub(/^[ \t]+|[ \t]+$/, "", a[2]); printf "%s: %s\n", a[2], $1} END {printf "TOTAL: %s\n", sum}'

6) View the stack information of the top 10 threads that consume the most CPU

It is recommended to use the show-busy-java-threads script, which can quickly troubleshoot Java CPU performance problems (a high us value in top): it automatically finds the threads in the running Java process that consume the most CPU and prints their stack traces, so the method calls causing the problem can be identified. This script has already been used in Alibaba's online operations environment. Link: https://github.com/oldratlee/useful-scripts/.

7) Flame graph generation (perf, perf-map-agent, and FlameGraph need to be installed):

# 1. Collect stacks and symbol tables while the application is running (30-second sampling, 99 events per second)
sudo perf record -F 99 -p $pid -g -- sleep 30; ./jmaps
# 2. Use perf script to generate the analysis result; the resulting flamegraph.svg is the flame graph
sudo perf script | ./pkgsplit-perf.pl | grep java | ./flamegraph.pl > flamegraph.svg

8) list the top 10 processes by Swap partition usage

for file in /proc/*/status; do awk '/VmSwap|Name|^Pid/ {printf $2 " " $3} END {print ""}' $file; done | sort -k 3 -n -r | head -10

9) JVM memory usage and garbage collection status statistics

# Show the cause of the last or current garbage collection
jstat -gccause $pid
# Show the capacity and usage of each generation
jstat -gccapacity $pid
# Show the young generation capacity and usage
jstat -gcnewcapacity $pid
# Show the old generation capacity
jstat -gcoldcapacity $pid
# Show garbage collection statistics (continuous output at 1-second intervals)
jstat -gcutil $pid 1000

10) some other daily commands

# Quickly kill all Java processes
ps aux | grep java | awk '{print $2}' | xargs kill -9
# Find the top 10 files that occupy the most disk space under a directory
find / -type f -print0 | xargs -0 du -h | sort -rh | head -n 10

Although performance optimization is important, do not put too much effort into it too early (of course, sound architecture design and coding are still necessary); premature optimization is the root of all evil. On the one hand, optimization work done in advance may not fit rapidly changing business requirements and may even hinder new requirements and features; on the other hand, premature optimization increases application complexity and reduces maintainability. When to optimize, and to what extent, is a question that has to be weighed from many angles.

After reading the above, have you grasped what performance optimization technology in big data refers to? If you want to learn more skills or find out more, you are welcome to follow the industry information channel. Thank you for reading!
