Oracle RAC uses Jumbo Frames

2025-01-30 Update From: SLTechnology News&Howtos


Let's first take a look at what Jumbo Frames are.

We know that in the TCP/IP protocol suite, the unit of communication at the Ethernet data link layer is the frame. A frame is at most 1518 bytes: 14 bytes are reserved for the frame header and 4 bytes for the CRC checksum, leaving an MTU (Maximum Transmission Unit) of 1500 bytes, a size fixed in the era of traditional 10 Mbps network cards. After further removing the 40 bytes of TCP/IP headers, the effective data per frame is 1460 bytes. Later 100 Mbps and 1000 Mbps network cards kept the same 1500-byte MTU for compatibility, but for a gigabit card this means more interrupts and more processing time. Gigabit cards therefore use "Jumbo Frames" to expand the frame to 9000 bytes. Why 9000 bytes instead of something larger? Because a 32-bit CRC checksum loses its error-detection efficiency beyond about 12000 bytes, while 9000 bytes is sufficient for 8 KB payloads such as an NFS datagram.
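The byte accounting above can be sketched in a few lines (a minimal illustration using only the figures quoted in the text; VLAN tags and IP/TCP options are ignored):

```python
# Byte accounting for a standard Ethernet frame, using the figures above.
ETH_HEADER = 14    # destination MAC + source MAC + EtherType
ETH_CRC = 4        # frame check sequence (32-bit CRC)
MTU = 1500         # largest IP packet one frame can carry

frame_size = ETH_HEADER + MTU + ETH_CRC
print(frame_size)  # 1518 bytes on the wire

IP_HEADER = 20     # IPv4 header without options
TCP_HEADER = 20    # TCP header without options
payload = MTU - IP_HEADER - TCP_HEADER
print(payload)     # 1460 bytes of effective data
```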

Eg:

[root@vm1 ~]# ifconfig

eth0      Link encap:Ethernet  HWaddr 08:00:27:37:9C:D0

          inet addr:192.168.0.103  Bcast:192.168.0.255  Mask:255.255.255.0

          inet6 addr: fe80::a00:27ff:fe37:9cd0/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:9093 errors:0 dropped:0 overruns:0 frame:0

          TX packets:10011 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:749067 (731.5 KiB)  TX bytes:4042337 (3.8 MiB)

If an 8 KB block is transferred from one node to another over a network path configured with an MTU of about 1500 bytes (1.5 KB), six packets are required. The 8 KB buffer is split into six IP packets sent to the receiving side. At the receiving end, the six IP packets are received and the 8 KB buffer is recreated. The reassembled buffer is eventually passed to the application for further processing.
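The six-packet figure follows directly from the MTU arithmetic; here is a quick sketch (assuming a 20-byte IPv4 header per packet and ignoring transport-layer headers for simplicity):

```python
import math

def fragments_needed(payload_bytes, mtu, ip_header=20):
    """How many IP packets are needed to carry one buffer at a given MTU.

    Each packet carries at most (mtu - ip_header) bytes of data; this is a
    simplification that ignores transport-layer headers.
    """
    return math.ceil(payload_bytes / (mtu - ip_header))

print(fragments_needed(8192, 1500))  # 6 packets for an 8 KB block
print(fragments_needed(8192, 9000))  # 1 packet with Jumbo Frames
```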

Figure 1

Figure 1 shows how the data blocks are split and reassembled. In this figure, the LMS process sends an 8 KB block to a remote process. During transmission, the 8 KB buffer is divided into six IP packets, and these IP packets are sent to the receiving side through the network. On the receiving side, a kernel thread reassembles the six IP packets and stores the 8 KB block in the socket buffer. The foreground process reads it from the socket buffer into the PGA and copies it into the database buffer cache.

The above process involves fragmentation and reassembly, and this constant splitting and recombining quietly increases the CPU utilization of the database node. In this situation, we should choose Jumbo Frames.

Now that our network environments can reach gigabit, 10 gigabit, or even higher speeds, we can set the MTU with the following command (provided your environment has gigabit Ethernet switches and gigabit Ethernet NICs):

# ifconfig eth0 mtu 9000

To make it permanent:

# vi /etc/sysconfig/network-scripts/ifcfg-eth0

Add

MTU=9000
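For reference, the resulting ifcfg-eth0 might look something like this (a sketch only; the address values are taken from the ifconfig output above, and any gateway or bonding directives your environment needs are omitted):

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.0.103
NETMASK=255.255.255.0
ONBOOT=yes
MTU=9000
```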

See http://www.cyberciti.biz/faq/rhel-centos-debian-ubuntu-jumbo-frames-configuration/ for more details.

The following article tests the above settings thoroughly:

https://blogs.oracle.com/XPSONHA/entry/jumbo_frames_for_rac_interconn_1

The test steps and results are excerpted as follows:

Theory tells us properly configured Jumbo Frames can eliminate 10% of overhead on UDP traffic.

So how to test?

I guess an 'end to end' test would be the best way. So my first test is a 30-minute Swingbench run against a two-node RAC, without too much stress to begin with.

The MTU of the network bond (and of its slave NICs) will be 1500 initially.

After the test, collect the results on the total transactions, the average transactions per second, the maximum transaction rate (results.xml), interconnect traffic (AWR) and CPU usage. Then do exactly the same, but now with an MTU of 9000 bytes. For this we need to make sure the switch settings are also modified to use an MTU of 9000.

B.t.w.: yes, it's possible to measure the network only, but real-life end-to-end testing with a real Oracle application talking to RAC feels like the best approach to see what the impact is on, for example, the avg. transactions per second.

In order to make the test as reliable as possible, some remarks:

-use guaranteed snapshots to flash back the database to its original state

-stop/start the database (to clean the cache)

B.t.w.: before starting the test with an MTU of 9000 bytes, the correct setting had to be proved.

One way to do this is to use ping with a packet size (-s) of 8972 bytes, which is 9000 minus 20 bytes of IP header and 8 bytes of ICMP header, while prohibiting fragmentation (-M do).

One can then send Jumbo Frames and see whether they go through without fragmentation.

[root@node01 rk]# ping -s 8972 -M do node02-ic -c 5

PING node02-ic. (192.168.23.32) 8972 (9000) bytes of data.

8980 bytes from node02-ic. (192.168.23.32): icmp_seq=0 ttl=64 time=0.914 ms

As you can see, this is not a problem. For packets larger than 9000 bytes, however, it is a problem:

[root@node01 rk]# ping -s 8973 -M do node02-ic -c 5

PING node02-ic. (192.168.23.32) 8973 (9001) bytes of data.

From node02-ic. (192.168.23.52) icmp_seq=0 Frag needed and DF set (mtu = 9000)

--- node02-ic. ping statistics ---

5 packets transmitted, 5 received, 0% packet loss, time 4003ms

rtt min/avg/max = 0.859/0.955/1.167 ms, pipe 2

Bringing the MTU size back to 1500 should also make it impossible to send 9000-byte packets with DF set:

[root@node01 rk]# ping -s 8972 -M do node02-ic -c 5

PING node02-ic. (192.168.23.32) 8972 (9000) bytes of data.

--- node02-ic. ping statistics ---

5 packets transmitted, 0 received, 100% packet loss, time 3999ms

With the MTU size back at 1500, sending 'normal' packets should work again:

[root@node01 rk]# ping node02-ic -M do -c 5

PING node02-ic. (192.168.23.32) 56 (84) bytes of data.

64 bytes from node02-ic. (192.168.23.32): icmp_seq=0 ttl=64 time=0.174 ms

--- node02-ic. ping statistics ---

5 packets transmitted, 5 received, 0% packet loss, time 3999ms

rtt min/avg/max = 0.174 ms, pipe 2

Another way to verify the correct usage of the MTU size is the command 'netstat -a -i -n' (the MTU column should show 9000 when you are performing tests with Jumbo Frames):

Kernel Interface table
Iface     MTU Met     RX-OK RX-ERR RX-DRP RX-OVR     TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0    1500   0  10371535      0      0      0  15338093      0      0      0 BMmRU
bond0:1  1500   0 - no statistics available -                                   BMmRU
bond1    9000   0  83383378      0      0      0  89645149      0      0      0 BMmRU
eth0     9000   0        36      0      0      0  88805888      0      0      0 BMsRU
eth2     1500   0   8036210      0      0      0  14235498      0      0      0 BMsRU
eth3     9000   0  83383342      0      0      0    839261      0      0      0 BMsRU
eth4     1500   0   2335325      0      0      0   1102595      0      0      0 BMsRU
eth5     1500   0 252075239      0      0      0 252020454      0      0      0 BMRU
eth6     1500   0         0      0      0      0         0      0      0      0 BM

As you can see, my interconnect is on bond1 (built on eth0 and eth3), all at 9000 bytes.

Not finished yet, no conclusions yet, but here is my first result.

You will notice the results are not that significant.

MTU 1500:

TotalFailedTransactions: 0

AverageTransactionsPerSecond: 1364

MaximumTransactionRate: 107767

TotalCompletedTransactions: 4910834

MTU 9000:

TotalFailedTransactions: 1

AverageTransactionsPerSecond: 1336

MaximumTransactionRate: 109775

TotalCompletedTransactions: 4812122

In a chart this will look like this:

As you can see, the difference in the number of transactions between the two tests isn't really significant, but the UDP traffic is lower! Still, I expected more from this test, so I have to put more stress on the system.

I noticed the failed transaction, and found "ORA-12155: TNS-received bad datatype in NSWMARKER packet". I verified this and I am sure it is not related to the MTU size, because I only changed the MTU size for the interconnect and there is no TNS traffic on that network.

As said, I will now continue with tests that have much more stress on the systems:

-number of users changed from 80 to 150 per database

-number of databases changed from 1 to 2

-more network traffic:

-rebuilt the Swingbench indexes without the 'REVERSE' option

-altered the sequences, lowering the increment-by value to 1 and the cache size to 3 (instead of 800)

-full table scans running all the time on each instance

-longer runs (4 hours instead of half an hour)

Now what you see is already improving. For the 4-hour test, the number of extra UDP packets sent with an MTU size of 1500 compared to an MTU size of 9000 is about 2.5 to 3 million, see this chart:

Imagine what an impact this has. Each packet you don't send saves you the network overhead of the packet itself and a lot of CPU cycles that you don't need to spend.

The load average of the Linux box also decreases, from an average of 16 to 14.

In terms of completed transactions on different MTU sizes within the same timeframe, the chart looks like this:

To conclude this test two very high load runs are performed. Again, one with an MTU of 1500 and one with an MTU of 9000.

In the charts below you will see less CPU consumption when using 9000 bytes for MTU.

Also, fewer packets are sent, although I think that number is not that significant compared to the total number of packets sent.

My final thoughts on this test:

1. You will hardly notice the benefits of using Jumbo Frames on a system with no stress

2. You will notice the benefits of using Jumbo Frames on a stressed system: such a system will use less CPU and have less network overhead.

This means Jumbo Frames help you scale out better than regular frames.

Depending on the interconnect usage of your applications, the results may of course vary. With interconnect-intensive applications you will see the benefits earlier than with applications that have relatively little interconnect activity.

I would use Jumbo Frames to scale better, since they save CPU and reduce network traffic, and this way leave room for growth.
