How to detect the loss of Web service requests

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article describes how to detect lost Web service requests: requests that clients send but that never show up in the Web server's own monitoring. It walks through a simple, practical way to find them.

Problem description

Recently, users have occasionally reported timeouts on some HTTP APIs, yet the trace monitoring on the web server shows no anomalies, such as responses with an HTTP return code of 503. This usually means the problem lies in the web container and the client cannot connect at all. This article focuses on how to monitor such problems.

We use a typical Web service architecture: the application reaches our LVS (Linux Virtual Server) machine through the domain name, and the LVS forwards to multiple Web servers.

The LVS cannot be traced, and on the Web server (Tomcat) requests pile up, so the scope of the impact cannot be evaluated there. After much thought, we decided to add an Nginx layer between the LVS and Tomcat to record what users actually experience. Nginx is a free, open-source, high-performance HTTP server. Instrumenting Nginx gives us first-hand data on real user access. We originally intended to compute statistics from the Nginx access log, but after reading the Alibaba Cloud link tracing documentation we realized that tracing can connect the HTTP instrumentation in Nginx with the one in Tomcat, which lets us pinpoint the problem in more detail.
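The access-log approach mentioned above can be sketched roughly as follows. This is a minimal illustration, assuming Nginx writes its default "combined" log format; the sample lines and the function name `count_statuses` are made up for the example and do not come from the article's environment.

```python
import re
from collections import Counter

# In the "combined" format the status code follows the quoted request line:
# ... "$request" $status $body_bytes_sent ...
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')

def count_statuses(lines):
    """Return a Counter mapping HTTP status code (as a string) to line count."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:  # lines that do not look like access-log entries are skipped
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample lines, not taken from the article's logs.
sample = [
    '10.0.0.1 - - [19/Jul/2025:10:00:00 +0800] "GET /api/demo HTTP/1.1" 200 12 "-" "load-test"',
    '10.0.0.1 - - [19/Jul/2025:10:00:01 +0800] "GET /api/demo HTTP/1.1" 499 0 "-" "load-test"',
    '10.0.0.1 - - [19/Jul/2025:10:00:02 +0800] "GET /api/demo HTTP/1.1" 200 12 "-" "load-test"',
]
status_counts = count_statuses(sample)
print(status_counts)
```

Counts like these show how many requests ended in each status, but not what happened to an individual request downstream, which is the gap that tracing fills.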

Prepare the environment and reproduce the problem. First, compile and install Nginx and the Jaeger Agent; for the detailed installation steps, refer to the Alibaba Cloud link tracing documentation. Then, to reproduce the timeout problem in the test environment, write a small program that starts 200 threads, each of which sends 500 requests to the service in a row, for a total of 100000 requests.
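A load generator of this shape might look like the sketch below. It is not the program used in the article: the names `run_load` and `http_get`, the URL, and the one-second client timeout are all placeholder assumptions.

```python
import threading
import urllib.request

def run_load(send, threads=200, requests_per_thread=500):
    """Call send() from `threads` threads, `requests_per_thread` times each.

    Returns a tuple (succeeded, failed).
    """
    lock = threading.Lock()
    totals = {"ok": 0, "failed": 0}

    def worker():
        ok = failed = 0
        for _ in range(requests_per_thread):
            try:
                send()
                ok += 1
            except Exception:
                # Timeouts and connection errors are counted, not raised.
                failed += 1
        with lock:
            totals["ok"] += ok
            totals["failed"] += failed

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return totals["ok"], totals["failed"]

def http_get(url="http://localhost:8080/api/demo", timeout=1.0):
    """Send one GET request. The URL and timeout are placeholders."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
```

With the defaults, `run_load(http_get)` would submit 200 x 500 = 100000 requests. The client-side timeout matters here, because a client that gives up early is what produces the timeouts being investigated.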

Investigation process

Compare the Web server's statistics with the trace statistics from the Nginx server. If the two request counts differ, requests have been lost. The detailed trace data then shows why each request was lost.
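If both sides record a shared trace ID, the comparison can go beyond raw counts and name the missing requests. A minimal sketch, assuming the IDs have already been exported from the tracing backend into plain lists (the IDs below are invented):

```python
def find_lost_requests(nginx_trace_ids, web_trace_ids):
    """Return the trace IDs seen by Nginx but never recorded by the Web server."""
    return set(nginx_trace_ids) - set(web_trace_ids)

# Hypothetical trace IDs for illustration.
nginx_seen = ["t1", "t2", "t3", "t4"]
web_seen = ["t1", "t3"]
lost = find_lost_requests(nginx_seen, web_seen)
print(sorted(lost))  # ['t2', 't4']
```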

1. Web server data statistics

After the requests are sent, the Web server turns out to have handled 98717 requests in total, 1283 fewer than the client sent.

2. Nginx server statistics

Looking at Nginx, there are 100000 requests in total, which means Nginx received every request, but only 98717 of them were processed by the Web service (as counted by the javax.servlet.Filter instrumentation).

3. Problem analysis

Inspect the traces on the Nginx side and look at the HTTP return codes that Nginx recorded for some of the requests, as shown in the following figure:

Compared with a normal HTTP trace, a request for which Nginx returned 499 has only one Span, the Nginx one, whereas a request with HTTP return code 200 shows the complete call chain (the Web service's Span in addition to the Nginx Span), as shown below:

We can explain this as follows: client traffic arrives at the Web server, and if the Web server cannot handle it (because the traffic exceeds its capacity, or because the Web server itself suffers from Full GC, OOM, deadlocks, or a slow thread pool), then requests on which the client's timeout fires show up in Nginx as 499, the non-standard status code Nginx logs when the client closes the connection before a response is sent. Such requests never reach the javax.servlet.Filter, so the Web server has no record of them.

Can we therefore assume that every request with an HTTP return code of 499 is a request the server failed to process?

4. Further investigation

We found a total of 2719 requests for which Nginx returned 499, more than the 1283 requests lost by the Web service. Why do the numbers not match? A closer look at the data showed requests for which Nginx returned 499 but the Web service returned 200. These requests did enter the Web service handler, but the client timed out before the Web service returned its response. Without tracing to link the two sides, it is hard to explain from the Nginx log or the Web service log alone how one request can be a 499 on Nginx and a 200 on the Web service, as shown in the following figure:

By linking the traces of Nginx and the Web container (Tomcat), we can check the status of an HTTP request at every hop and easily locate the problem.
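The two kinds of 499 described above can be told apart once the spans are grouped by trace. The sketch below assumes each span has been reduced to a dict with the hypothetical keys `trace_id`, `service` ("nginx" or "web") and `status`; real Jaeger spans carry more fields and different names. If every lost request shows up as a 499 on Nginx, the article's figures would suggest roughly 2719 - 1283 = 1436 requests of the second kind, though the article does not state that number.

```python
from collections import Counter, defaultdict

def classify_traces(spans):
    """Group spans by trace ID and label each trace by what happened to it."""
    by_trace = defaultdict(dict)
    for span in spans:
        by_trace[span["trace_id"]][span["service"]] = span["status"]

    labels = Counter()
    for statuses in by_trace.values():
        nginx_status = statuses.get("nginx")
        web_status = statuses.get("web")
        if nginx_status == 499 and web_status is None:
            # Only the Nginx span exists: the request never reached the filter.
            labels["499, never reached web service"] += 1
        elif nginx_status == 499:
            # The Web service handled it, but the client had already given up.
            labels["499, processed by web service"] += 1
        else:
            labels["other"] += 1
    return labels

# Hypothetical spans for illustration.
spans = [
    {"trace_id": "t1", "service": "nginx", "status": 200},
    {"trace_id": "t1", "service": "web", "status": 200},
    {"trace_id": "t2", "service": "nginx", "status": 499},
    {"trace_id": "t3", "service": "nginx", "status": 499},
    {"trace_id": "t3", "service": "web", "status": 200},
]
trace_labels = classify_traces(spans)
print(trace_labels)
```

Only the first 499 category counts as a request the Web service never saw. The second category was processed, so it appears in the Web server's statistics even though the client experienced a timeout.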

That concludes this look at how to detect lost Web service requests. Theory is best paired with practice, so try it in your own environment.
