Analysis and Solutions for Common IBM WebSphere Portal Downtime and Low-Performance Problems


2025-07-06 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/03 report

Using IBM WebSphere Portal to build an enterprise portal is a sound choice, but because Portal is a complex product, downtime and poor performance are a frequent headache for users. It is common for a customer's portal, shortly after launch, to show blank pages, become unreachable, or even go down entirely. Drawing on the author's more than ten years of experience implementing the product, this article analyzes the common causes of poor performance and downtime in IBM WebSphere Portal and proposes practical, easy-to-follow solutions.

WebSphere Portal performance bottlenecks usually fall into the following eight layers. Poor performance in any one of them can cause slow access, an unresponsive system, blank pages, pages that cannot be displayed, or even downtime, as shown in the figure.

Next, we introduce the problems and optimization strategies of each of these eight layers in turn.

I. Perceived-performance optimization (Ajax asynchronous calls)

Perceived-performance optimization means that, once every other layer has been optimized as far as possible, techniques such as Ajax asynchronous calls are used to make the system feel faster to users. In essence, however, the overall performance of the system does not improve.

(1) Problem analysis

This problem typically appears when users land on the home page after logging in, especially when the home page hosts many portlets, several of which call back-end resources or run business logic and respond slowly. If the home page is displayed only after every portlet has finished initializing, users face a long wait and a very poor experience.

(2) Optimization strategy

IBM WebSphere Portal consists of two parts: the portal container and the portlet container. The portal container is loaded first; it compiles and processes the theme and skin, the class libraries, the database connections, and the other resources it runs on. The portlet container then handles login and authenticates against LDAP, after which each portlet runs its init() method to load its resources and then its rendering methods (such as doView()) to execute its business logic. The results are passed back to the portal, rendered as HTML, and assembled together with the HTML produced by the portal container into a single page. Portlets take different amounts of time to finish, and by default the portal waits until every portlet has returned all of its data before presenting the page, so the slowest portlet becomes the shortest stave in the barrel: it sets the response time of the whole page. Fortunately, WebSphere Portal has supported Ajax-based asynchronous loading since version 7.0, and the optimization at this layer relies on asynchronous portlet loading. The portal container packages and presents the page even when only one portlet has finished, and the remaining portlets are then loaded one by one. Users see the home page and a few portlets within a short time; their attention is drawn to the content already shown, so they barely notice the remaining portlets filling in, and the overall experience feels much better.
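
To make the "shortest stave" effect concrete, here is a small server-side simulation in plain Java, assuming each portlet can be modeled as a task with a fixed latency. It is not WebSphere Portal's own asynchronous rendering mechanism, which this sketch does not attempt to reproduce; the class and method names are hypothetical. It only illustrates the idea: send the page shell first, then each portlet fragment as soon as it is ready.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ProgressivePortalPage {
    // Hypothetical stand-in for a portlet: waits for its back end, then returns an HTML fragment.
    static String renderPortlet(String name, long latencyMs) {
        try {
            Thread.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "<div id=\"" + name + "\"></div>";
    }

    // Emits the shell first, then each fragment in the order it finishes,
    // instead of waiting for the slowest portlet before sending anything.
    static List<String> renderProgressively(Map<String, Long> portletLatenciesMs) {
        List<String> output = Collections.synchronizedList(new ArrayList<>());
        output.add("<shell/>"); // theme and skin are available immediately
        ExecutorService pool = Executors.newFixedThreadPool(portletLatenciesMs.size());
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (Map.Entry<String, Long> p : portletLatenciesMs.entrySet()) {
                pending.add(CompletableFuture
                        .supplyAsync(() -> renderPortlet(p.getKey(), p.getValue()), pool)
                        .thenAccept(output::add)); // append each fragment as soon as it is ready
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(output);
    }

    public static void main(String[] args) {
        System.out.println(renderProgressively(Map.of("news", 300L, "weather", 50L, "todo", 150L)));
    }
}
```

With latencies of 300, 50, and 150 ms, the shell is available immediately and the fragments arrive roughly in order of latency, whereas a page that blocks on all portlets would show nothing until the 300 ms portlet finished.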

This design principle should be applied throughout portal construction. For example, the home page theme may rotate six images back and forth as an animated carousel. In extreme cases, when the network is slow and the images are large, starting the animation only after all six images have fully downloaded makes the experience even worse. A better approach is to display the first image as soon as it has downloaded and start the carousel logic only after the other five have arrived. This greatly improves how users perceive the page.

II. JVM heap and web container thread pool optimization

JVM heap and thread pool optimization refers to tuning the JVM that hosts the portal container, including its heap size and garbage collection policy, as well as the web container's thread pool.

(1) Problem analysis

WebSphere Portal runs on a standalone server inside a WebSphere Application Server (WAS) container. Because it is a JVM, its memory is limited. Every user request needs a region of JVM memory for temporary data and shares the CPU and its cache with all other requests. Low-quality code in particular often keeps holding memory, for example by filling collections without ever calling clear() on them, and the JVM's garbage collector alone cannot reclaim memory that is still referenced. In practice, 3.5 GB of heap can easily be used up.

When the memory in use exceeds the heap configured for the JVM and garbage collection cannot free enough of it, there is no room to process new user requests. New requests must wait for the JVM to free memory; if it never does, users see a page that stays blank until it times out with errors such as "the page cannot respond" or "the web page cannot be displayed", and eventually the server hangs or crashes. That is downtime.
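
One common Java form of the "memory that is never cleared" problem is an unbounded cache, for example a static map of per-user data that only ever grows. A minimal sketch of one fix, assuming least-recently-used eviction is acceptable for the cached data: a size-bounded cache built on the standard `LinkedHashMap` access-order mode. The class name is illustrative and not part of any Portal API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU cache: once the limit is exceeded the least recently used
// entry is evicted, so the cache cannot grow until the heap is exhausted.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration order is least to most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // called after each put; evict the eldest entry when over the limit
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(3);
        for (int i = 1; i <= 5; i++) {
            cache.put("user" + i, "profile" + i);
        }
        System.out.println(cache.keySet()); // [user3, user4, user5]
    }
}
```

Note that `LinkedHashMap` is not thread-safe; a cache shared across request threads would need external synchronization, for example `Collections.synchronizedMap`.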

(2) Optimization strategy

WebSphere Portal runs on a standalone server inside a WAS container. On a 64-bit machine the author usually caps the JVM maximum heap at 3.5 GB, because beyond that size, relying on the JVM's garbage collection alone becomes much less efficient in an oversized heap. The size of the young (new) generation also needs to be configured. On an older 32-bit machine, the maximum heap should not exceed 1.8 GB, because a 32-bit process can typically address only about 2 GB of memory.
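
The heap limits above are set in the WAS administrative console's JVM settings. As a quick sanity check from inside a running JVM, the sketch below reads the effective maximum heap with standard `Runtime` and `java.lang.management` calls and compares it with the author's rules of thumb. The thresholds encode this article's guideline, not an IBM limit, and using the `sun.arch.data.model` system property to detect a 32-bit JVM is an assumption: not every JVM is guaranteed to set it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapGuideline {
    // Rules of thumb from this article, not IBM-mandated limits.
    static final long MAX_HEAP_64_BIT_MB = 3584; // about 3.5 GB
    static final long MAX_HEAP_32_BIT_MB = 1843; // about 1.8 GB

    static boolean withinGuideline(long maxHeapMb, boolean is64Bit) {
        return maxHeapMb <= (is64Bit ? MAX_HEAP_64_BIT_MB : MAX_HEAP_32_BIT_MB);
    }

    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // Assumption: treat the JVM as 64-bit unless this property explicitly says "32".
        boolean is64Bit = !"32".equals(System.getProperty("sun.arch.data.model"));
        System.out.printf("max heap %d MB, used %d MB, within guideline: %b%n",
                maxHeapMb, heap.getUsed() / (1024 * 1024), withinGuideline(maxHeapMb, is64Bit));
    }
}
```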

At the same time, under heavy traffic the portal's own multithreading can easily tie up many WebContainer threads that are not released in time. The author usually increases the WebContainer thread pool size about tenfold. If the log shows an error similar to "some threads is hung, waiting for...", the thread pool has most likely run out. Before enlarging it, however, first rule out program errors that exhaust the pool; only if those are excluded is the pool probably just too small. This error is very serious: it directly makes the system unresponsive, with blank pages or pages that cannot be displayed, and quickly leads to downtime.
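
Why hung requests exhaust a pool, and why enlarging it helps only when threads are eventually released, can be shown with a plain `java.util.concurrent` model. This is an assumption-laden stand-in for the WAS WebContainer pool, whose queuing behavior may differ; here, rejection by `AbortPolicy` simply makes exhaustion visible. Every simulated request blocks on a latch, as if waiting for a back end that never answers.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolExhaustionDemo {
    // Submits `requests` tasks that all block (simulated hung requests) to a fixed-size
    // pool with a bounded queue, and counts how many requests are turned away.
    static int rejectedRequests(int poolSize, int queueCapacity, int requests) {
        CountDownLatch backendResponds = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy()); // reject once all threads and queue slots are taken
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        backendResponds.await(); // the request "hangs" here
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        backendResponds.countDown(); // let the stuck threads finish so the pool can shut down
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(rejectedRequests(2, 1, 4));
        System.out.println(rejectedRequests(4, 1, 4));
    }
}
```

`rejectedRequests(2, 1, 4)` returns 1 (two requests running, one queued, one turned away) and `rejectedRequests(4, 1, 4)` returns 0. With enough blocked requests, though, any finite pool fills up, which is why code that hangs has to be ruled out before the pool is enlarged.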

III. Theme and skin tuning

Theme and skin optimization concerns the one or more sets of custom portal themes and skins (Themes and Skins) that are developed to meet a customer's customization needs. The author has encountered a number of customers whose theme or skin problems led to poor system performance or downtime.

(1) Problem analysis

Poor performance or downtime caused by themes and skins shows up in two ways. First, there are too many themes or skins. To enrich the portal's visual effects, many users ask developers to build multiple sets of themes and skins, sometimes a separate theme for almost every main page and a separate skin for every portlet under each theme. Each theme or skin in Portal is assembled and compiled from more than 30 files in its own folder, so loading many of them means loading hundreds or even thousands of JSP, JSPF, CSS, JS, and JPG files into memory, which consumes far too many system resources. This often happens to customers with high demands for visual effects. Second, the theme or skin files contain low-quality code: for example, an endless loop in a skin, or code that reads resources from other systems or even the Internet. When the external system does not return results in time, the theme or skin keeps waiting for the resource until a timeout occurs, and in the meantime the user sees a blank page or a page that cannot be displayed.

(2) Optimization strategy

The author strongly recommends solving through parameterization whatever can be solved that way. For example, the portals of several departments can share a single theme and differ only in parameterized logo images, text, and similar settings. Likewise, each theme should have no more than five skins, with different portlet styles achieved through parameters.
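
A minimal sketch of the parameterization idea, assuming each department's differences reduce to a few key-value settings: one shared header template, with the logo and title looked up per department. The department names, image paths, and the `renderHeader` helper are hypothetical; a real theme would more likely read such values from portal configuration than from a hard-coded map.

```java
import java.util.Map;

class ParameterizedTheme {
    // Hypothetical per-department parameters; the template itself is shared.
    static final Map<String, Map<String, String>> DEPARTMENTS = Map.of(
            "finance", Map.of("logo", "/images/finance-logo.png", "title", "Finance Portal"),
            "hr", Map.of("logo", "/images/hr-logo.png", "title", "HR Portal"));

    static final Map<String, String> DEFAULTS =
            Map.of("logo", "/images/default-logo.png", "title", "Enterprise Portal");

    // One header template for every department; only the parameters differ.
    static String renderHeader(String department) {
        Map<String, String> p = DEPARTMENTS.getOrDefault(department, DEFAULTS);
        return "<header><img src=\"" + p.get("logo") + "\"/><h1>" + p.get("title") + "</h1></header>";
    }

    public static void main(String[] args) {
        System.out.println(renderHeader("finance"));
    }
}
```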

At the code level, the project team should rigorously check themes and skins for low-quality logic such as endless loops, reading large amounts of data, failing to release memory promptly, and waiting on responses from dependent systems. Customers who cannot read the code can work backward to judge its quality: run an endurance (soak) test of the system with LoadRunner, using sustained load to make any low-quality code consume and hold as much memory, CPU, and other resources as possible. Dingya Technology can provide free training and stress-testing guidance to users across the country.

IV. Optimization of SQL execution efficiency

Poor portal performance or downtime is caused not only by running out of memory but also by exhausting the CPU and the hard disk. SQL execution efficiency is one of the important factors that can strain all three at the same time.

(1) Problem analysis

Inefficient SQL usually causes damage in two ways. (1) Slow queries, or queries that return very large result sets. For example, one customer queried 40 million user login records at once and printed them in a theme file. Such volumes of data not only occupy memory but also consume CPU and may even be written to files on disk, eventually filling the disk and bringing the system down. Slow queries also make users wait a long time. (2) Executing SQL statements too many times, or in an endless loop. When fetching one set of data requires running several queries against multiple tables and assembling the results, or when, under extreme conditions, SQL runs in an endless loop, a great deal of CPU is consumed and no result is returned, so downtime is inevitable.
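
Both failure modes can be guarded against in JDBC code. The sketch below, using only standard `java.sql` calls, caps the rows a query may return with `setMaxRows`, fails fast with `setQueryTimeout` (honored only if the driver supports it), and builds an `IN (?, ?, ...)` placeholder list so that N single-row lookups can become one query. The `USER_LOGIN_LOG` and `USERS` tables and their columns are hypothetical, and `recentLogins` needs a real database connection to run.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BoundedLoginQuery {
    // Hypothetical table: USER_LOGIN_LOG(USER_ID, LOGIN_TIME).
    static List<String> recentLogins(Connection conn, String userId, int maxRows) throws SQLException {
        String sql = "SELECT LOGIN_TIME FROM USER_LOGIN_LOG WHERE USER_ID = ? ORDER BY LOGIN_TIME DESC";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, userId);
            ps.setMaxRows(maxRows); // the driver hands back at most maxRows rows
            ps.setQueryTimeout(10); // seconds; fail fast instead of tying up a thread (driver permitting)
            List<String> times = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    times.add(rs.getString(1));
                }
            }
            return times;
        }
    }

    // Builds "?, ?, ?" so that N single-row lookups can become one "WHERE ID IN (...)" query.
    static String placeholders(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return String.join(", ", Collections.nCopies(n, "?"));
    }

    public static void main(String[] args) {
        // Hypothetical USERS table; each ? would be bound with setString/setLong before executing.
        System.out.println("SELECT ID, NAME FROM USERS WHERE ID IN (" + placeholders(3) + ")");
    }
}
```

`setMaxRows` limits what the driver returns; putting the limit into the SQL itself (the syntax is database-specific) lets the database avoid producing the extra rows at all.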

(2) Optimization strategy

Use rigorous code review to identify slow queries, large-volume queries, endless loops, and similar problems, and design table structures and SQL statements sensibly. As in the previous section, customers who cannot read the code can judge its quality indirectly with a LoadRunner endurance test, using sustained load to make low-quality code consume and hold as much memory, CPU, and other resources as possible. Dingya Technology can provide free training and stress-testing guidance to users across the country. An endurance test means recording scripts that cover as many pages and functional paths as possible, configuring LoadRunner with zero think time, and running it for up to 72 hours. As the Chinese saying goes, "time reveals a person's true heart": even if SQL efficiency is only slightly low or resources are released a little late, high-intensity load magnifies the problem until it is finally exposed.
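
LoadRunner is a commercial tool with its own scripting, so the following is not its API. It is a small plain-Java stand-in that shows the shape of an endurance test: several virtual users call a request back to back with zero think time until a deadline, while successes and failures are counted. The `request` passed in is an assumption; in practice it would be an HTTP call to a recorded portal page, and the duration would be hours rather than seconds.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class SoakTest {
    // Runs `users` virtual users that call `request` with zero think time until
    // `duration` has elapsed; returns {successes, failures}.
    static long[] run(Runnable request, int users, Duration duration) {
        AtomicLong ok = new AtomicLong();
        AtomicLong failed = new AtomicLong();
        Instant end = Instant.now().plus(duration);
        ExecutorService pool = Executors.newFixedThreadPool(users);
        for (int u = 0; u < users; u++) {
            pool.execute(() -> {
                while (Instant.now().isBefore(end)) { // no pause between requests
                    try {
                        request.run();
                        ok.incrementAndGet();
                    } catch (RuntimeException e) {
                        failed.incrementAndGet(); // e.g. a timeout or an error page
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(duration.toMillis() + 60_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] {ok.get(), failed.get()};
    }

    public static void main(String[] args) {
        // Hypothetical target that fails about 1% of the time; replace with a real page request.
        long[] r = run(() -> {
            if (Math.random() < 0.01) {
                throw new IllegalStateException("simulated timeout");
            }
        }, 4, Duration.ofSeconds(2));
        System.out.printf("ok=%d failed=%d%n", r[0], r[1]);
    }
}
```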

V. Database performance optimization

Database performance optimization covers tuning the database's own parameters as well as the parameters of the data source connection pools that WebSphere uses.

(1) Problem analysis

IBM WebSphere Portal accesses the database through data source connection pools, so a poorly configured pool can cause long waits or downtime in database-related logic. For example, if some portlets access the database frequently while the pool is too small, many read and write operations queue for a connection and may even time out and bring the system down. The most direct symptom is that many users' portlets never finish initializing, because their database operations can proceed only after other logic releases a connection back to the pool.

(2) Optimization strategy

Usually we set the maximum number of connections in each data source pool, on the WAS console, to 3 to 6 times the system default, depending on how heavily each customer's portal content uses the database. Too few connections make portlet logic queue for a connection and lengthen response times. Too many reduce the memory available to the system, because each pooled connection occupies about 3 MB of memory: configuring several hundred connections consumes more than 1 GB just for database connections. With a 3.5 GB heap, that leaves little memory for the portal's own processing, which can also degrade performance or cause downtime. The connection pool's timeout settings also deserve attention: an inappropriate timeout can leave requests waiting for a response or leave the system unable to process normal user requests.

The most accurate settings come from practice. Put simply: try several parameter combinations, run a LoadRunner stress test for each with a scenario designed to be as close as possible to how users actually use the production system, and choose the connection pool configuration from the group that performs best.
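
The memory arithmetic above can be written out directly. The sketch assumes the article's rough figure of about 3 MB of heap per pooled connection (the real figure depends on the JDBC driver), a 3.5 GB (3584 MB) heap, and a default maximum of 10 connections per data source; that default is our assumption about WAS, not something the article states.

```java
class PoolMemoryBudget {
    // Article's rough figure: about 3 MB of heap per pooled connection (driver-dependent).
    static final double MB_PER_CONNECTION = 3.0;

    static double poolMemoryMb(int connections) {
        return connections * MB_PER_CONNECTION;
    }

    static double heapLeftMb(double maxHeapMb, int connections) {
        return maxHeapMb - poolMemoryMb(connections);
    }

    public static void main(String[] args) {
        int defaultMax = 10; // assumption: default maximum connections per data source
        for (int multiplier : new int[] {3, 6, 40}) {
            int connections = defaultMax * multiplier;
            System.out.printf("%d connections: pool %.0f MB, heap left %.0f MB%n",
                    connections, poolMemoryMb(connections), heapLeftMb(3584, connections));
        }
    }
}
```

It prints `30 connections: pool 90 MB, heap left 3494 MB`, `60 connections: pool 180 MB, heap left 3404 MB`, and `400 connections: pool 1200 MB, heap left 2384 MB`. Only the "several hundred connections" case takes a significant share of the heap.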

In addition, the database server and the portal server are usually separate machines, and very often there is a firewall between them. We must help customers configure firewall policies sensibly: portal downtime caused by the firewall cutting long-lived connections between the portal and the database is very common.
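
A common defense is to make sure the pool discards idle connections before the firewall silently drops them, and to validate a connection before reusing it. The sketch assumes the pool removes an idle connection no later than its unused timeout plus one reaper interval, modeled on the WAS pool's "Unused timeout" and "Reap time" settings as we understand them; the exact semantics should be checked in the WAS documentation. `Connection.isValid` is standard JDBC 4; the helper names are hypothetical.

```java
import java.sql.Connection;
import java.sql.SQLException;

class FirewallSafePool {
    // Assumption: an idle connection is discarded at most (unusedTimeout + reapTime) seconds
    // after its last use, so that sum must stay below the firewall's idle-connection timeout.
    static boolean idleConnectionsDiscardedInTime(int unusedTimeoutSec, int reapTimeSec,
                                                  int firewallIdleTimeoutSec) {
        return unusedTimeoutSec + reapTimeSec < firewallIdleTimeoutSec;
    }

    // Checks a connection before reuse; a connection silently cut by a firewall fails here.
    static boolean stillUsable(Connection conn, int timeoutSec) {
        try {
            return conn != null && !conn.isClosed() && conn.isValid(timeoutSec);
        } catch (SQLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative numbers: 1800 s unused timeout, 180 s reap time, 3600 s firewall idle timeout.
        System.out.println(idleConnectionsDiscardedInTime(1800, 180, 3600));
    }
}
```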

VI. Data / dataset optimization

Data and dataset optimization concerns the volume of data read and written by the portal container or the portlet container. Too much data consumes a lot of time and space, leading to poor system performance or downtime.

(1) Problem analysis

When code in Portal themes and skins or in portlet application logic operates on large volumes of data, it consumes a lot of time and space: time in the form of CPU processing, and space in the form of memory. Both a busy CPU and high memory consumption are dangerous. The most common issues concern segmented reading and writing, paginated display, and large tables:

(1) Reading and writing large amounts of data. When the database reads or writes a large amount of data, it consumes a lot of CPU time and memory, and as soon as user concurrency rises, performance problems follow easily.

(2) Paginated display. Users usually read only the first three pages; reading hundreds of pages at once takes a huge amount of memory.

(3) Data migration for large tables. Some tables, such as user login logs, accumulate ever more data; reads and writes become slower, resource consumption grows, and performance keeps declining.

(2) Optimization strategy

The adjustment strategy for each of these common problems is introduced below. Most of this section is the same as in ordinary Java development, so general Java practice applies.

(1) Segmented reading and writing. Logic code should avoid reading or writing a large volume of data in one go, such as writing more than 100,000 records to the database or file system at once, or reading millions of records, which is bound to be extremely inefficient. Redesign the logic and optimize how data is read.

(2) Paginated display. Read at most 3 pages at a time, and read more data only when the user pages forward.

(3) Data migration for large tables. Some tables store large amounts of data, and a single table is best kept below the tens-of-millions-of-records level. This typically applies to user login logs: as time passes the table keeps growing and reading it becomes expensive, so older records are best migrated out of it.
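
The first two strategies can be sketched in a few lines of plain Java: split a large write into fixed-size batches (each of which could be sent with one JDBC `executeBatch()` call), and compute the offset/limit window for a single page so that only that page is read. The batch and page sizes here are illustrative assumptions, not tuned values.

```java
import java.util.ArrayList;
import java.util.List;

class SegmentedAccess {
    // Splits a large write into fixed-size batches; the sublists are views of `items`.
    static <T> List<List<T>> chunks(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            result.add(items.subList(from, Math.min(from + chunkSize, items.size())));
        }
        return result;
    }

    // Row window {offset, limit} for a 1-based page number; only this window is read.
    static int[] pageWindow(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return new int[] {(page - 1) * pageSize, pageSize};
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 250_000; i++) {
            rows.add(i);
        }
        System.out.println(chunks(rows, 10_000).size()); // 25
        int[] w = pageWindow(3, 20);
        System.out.println("OFFSET " + w[0] + " LIMIT " + w[1]); // OFFSET 40 LIMIT 20
    }
}
```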

VII. Portlet code and logic optimization

Portlet development is really no different from traditional Java development. Poor performance caused by portlets waiting for database connections or running endless loops is the core concern of optimization at the portlet level.

(1) Problem analysis

Low performance caused by portlet code usually has two sources: (1) inefficient logic or endless loops make the portlet respond unusually slowly, for example when it reads each user's to-do items from more than a dozen systems; (2) the portlet waits for resources that load late or never load, and has no error handling for the case where they cannot be read, so it waits indefinitely and it is no surprise that system threads hang.

(2) Optimization strategy

This requires careful design of the portlet's logic. For a unified to-do portlet, for example, a push model can be used: when a business system generates a new to-do item, it actively pushes the item into a portal cache database (which can be regarded as a broker layer), and the portlet simply reads the to-do entries from a table in that cache database. This can improve the portlet's performance by tens of times.
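
A minimal in-memory sketch of the push model, assuming business systems can call into the portal side (for example through a web service or a message queue) whenever they create a to-do item. The `ConcurrentHashMap` stands in for the cache table in the broker layer; the class and method names are hypothetical. The point is that the portlet's read path becomes one local lookup instead of a dozen remote calls per page view.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class TodoCache {
    // Stand-in for the cache table in the broker layer: userId -> to-do entries.
    private final Map<String, List<String>> byUser = new ConcurrentHashMap<>();

    // Called by a business system when it creates a to-do item (the "push").
    void push(String userId, String sourceSystem, String title) {
        byUser.computeIfAbsent(userId, k -> new CopyOnWriteArrayList<>()).add(sourceSystem + ": " + title);
    }

    // Called by the to-do portlet: a single local read, no calls to the source systems.
    List<String> todosFor(String userId) {
        return List.copyOf(byUser.getOrDefault(userId, List.of()));
    }

    public static void main(String[] args) {
        TodoCache cache = new TodoCache();
        cache.push("alice", "OA", "Approve leave request");
        cache.push("alice", "ERP", "Review purchase order");
        System.out.println(cache.todosFor("alice")); // [OA: Approve leave request, ERP: Review purchase order]
    }
}
```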

Second, eliminate resource waits and endless loops in portlets through strict code review. Checking for these problems is no different from ordinary Java development, so this article does not go into detail.
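
For the resource-waiting case, a standard Java pattern is to bound every external call with a timeout and a fallback, so that a slow or failed dependency degrades one portlet instead of hanging a WebContainer thread. A sketch using `CompletableFuture.completeOnTimeout` (Java 9 and later); the helper name and fallback text are hypothetical. Two caveats: the timed-out task keeps running in the background, and inside a managed container such as WAS a container-managed executor may be preferable to the common pool used here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class GuardedResourceCall {
    // Calls an external resource but waits at most timeoutMs; returns `fallback` on timeout or error.
    static String fetchWithTimeout(Supplier<String> remoteCall, long timeoutMs, String fallback) {
        return CompletableFuture.supplyAsync(remoteCall)
                .completeOnTimeout(fallback, timeoutMs, TimeUnit.MILLISECONDS) // stop waiting, use fallback
                .exceptionally(e -> fallback) // a failing dependency also degrades gracefully
                .join();
    }

    public static void main(String[] args) {
        // Hypothetical remote call that answers quickly; a slow one would yield the fallback.
        System.out.println(fetchWithTimeout(() -> "fresh to-do list", 500, "To-do list temporarily unavailable"));
    }
}
```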

VIII. Network transmission optimization (packet size)

Optimizing display and packaging logic matters most for customers whose network speed is limited, especially deployments in the cloud or on the Internet, where excessive data transfer makes responses so slow that users believe the system is down. Note: if transfers time out, a real outage can follow.

(1) Problem analysis

All user requests leave the server through its single network uplink, which carries a bit stream to each user's client computer (or mobile phone). Suppose each visit to the portal requires transferring 3 MB of data, including CSS files, JS files, the HTML produced by logical processing, and JPG/PNG images. With 300 concurrent users, that is 3 MB × 300 = 900 MB of traffic. If the server's bandwidth is 100 Mbit/s, it can transmit 12.5 MB per second, so even ignoring the time spent on logical processing and network overhead, the transfer takes 900 MB ÷ 12.5 MB/s = 72 seconds. That is a huge amount of data: each user waits 72 seconds for the transfer alone.
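
The same bandwidth estimate as a small Java function, assuming the link is shared evenly by all concurrent users and ignoring protocol overhead and processing time, as the estimate above does. The conversion uses 8 bits per byte, so 100 Mbit/s is 12.5 MB/s.

```java
class TransferTime {
    // Seconds needed to send `pageMb` megabytes to each of `users` concurrent users over a
    // link of `linkMbit` megabits per second (8 bits per byte; overhead ignored).
    static double seconds(double pageMb, int users, double linkMbit) {
        double linkMBps = linkMbit / 8.0;
        return pageMb * users / linkMBps;
    }

    public static void main(String[] args) {
        System.out.println(seconds(3, 300, 100)); // 72.0
    }
}
```

Cutting the page weight from 3 MB to 1 MB brings the estimate down to 24 seconds, which is why the strategy below focuses on making each page smaller.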

(2) Optimization strategy

Without affecting how images look, we recommend reducing image sizes as much as possible. Developers should, without affecting functionality, slim down JS files, CSS files, and similar assets to the minimum size, minimizing the impact of network transfer on user experience. System architects and project managers should research thoroughly, take the real number of concurrent users fully into account, and help users purchase or allocate adequate network bandwidth.
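
Besides resizing images and minifying JS and CSS, HTTP compression of text assets is a common complementary technique; it is a different technique from the minification described above. The sketch below uses the standard `java.util.zip.GZIPOutputStream` to measure how much a repetitive CSS file shrinks. Where compression would actually be enabled (web server, proxy, or application) is outside the scope of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class AssetCompression {
    // Returns the gzip-compressed size in bytes of a text asset such as a CSS or JS file.
    static int gzipSize(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size(); // read after close(), so the gzip trailer is included
    }

    public static void main(String[] args) {
        String css = ".portlet-title { font-weight: bold; color: #333; }\n".repeat(500);
        System.out.printf("raw %d bytes, gzip %d bytes%n",
                css.getBytes(StandardCharsets.UTF_8).length, gzipSize(css));
    }
}
```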

