What is the reason for the high speed of Netty FastThreadLocal? 07/06 Update SLTechnology News&Howtos


2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--


Preface

Recently, while reading the Netty source code, I came across a class called FastThreadLocal. The JDK already ships with a ThreadLocal class, so the name suggests that FastThreadLocal is faster than the JDK's version. Let's do a simple analysis of where it is faster and why.

Performance testing

ThreadLocal is mainly used in multithreaded code to access data that belongs to the current thread. Callers do not need to worry about thread safety, which makes it convenient to use. To illustrate the difference, two scenarios are tested: multiple threads operating on the same ThreadLocal, and many ThreadLocals within a single thread.

1. Multiple threads operating on the same ThreadLocal

Test code was written for ThreadLocal and FastThreadLocal separately. The relevant parts are shown below:

    import java.util.concurrent.CountDownLatch;

    public static void test2() throws Exception {
        CountDownLatch cdl = new CountDownLatch(10000);
        ThreadLocal<String> threadLocal = new ThreadLocal<String>();
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    threadLocal.set(Thread.currentThread().getName());
                    for (int k = 0; k < 100000; k++) {
                        threadLocal.get();
                    }
                    cdl.countDown();
                }
            }, "Thread" + (i + 1)).start();
        }
        cdl.await();
        System.out.println(System.currentTimeMillis() - startTime + "ms");
    }

The code above creates 10,000 threads. Each thread sets a value on the shared ThreadLocal and then calls get 100,000 times. A CountDownLatch measures the total elapsed time. The result is about 1000 ms. Next, FastThreadLocal is tested with similar code:

    import io.netty.util.concurrent.FastThreadLocal;
    import io.netty.util.concurrent.FastThreadLocalThread;

    public static void test2() throws Exception {
        CountDownLatch cdl = new CountDownLatch(10000);
        FastThreadLocal<String> threadLocal = new FastThreadLocal<String>();
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            new FastThreadLocalThread(new Runnable() {
                @Override
                public void run() {
                    threadLocal.set(Thread.currentThread().getName());
                    for (int k = 0; k < 100000; k++) {
                        threadLocal.get();
                    }
                    cdl.countDown();
                }
            }, "Thread" + (i + 1)).start();
        }
        cdl.await();
        System.out.println(System.currentTimeMillis() - startTime + "ms");
    }

The result is also about 1000 ms. In this scenario the two kinds of ThreadLocal show no real performance difference. Next, the second scenario is tested.

2. Multiple ThreadLocals in a single thread

Again, test code was written for ThreadLocal and FastThreadLocal separately. The relevant parts are as follows:

    public static void test1() throws InterruptedException {
        int size = 10000;
        ThreadLocal[] tls = new ThreadLocal[size];
        for (int i = 0; i < size; i++) {
            tls[i] = new ThreadLocal();
        }
        new Thread(new Runnable() {
            @Override
            public void run() {
                long startTime = System.currentTimeMillis();
                for (int i = 0; i < size; i++) {
                    tls[i].set("value" + i);
                }
                for (int i = 0; i < size; i++) {
                    for (int k = 0; k < 100000; k++) {
                        tls[i].get();
                    }
                }
                System.out.println(System.currentTimeMillis() - startTime + "ms");
            }
        }).start();
    }

The code above creates 10,000 ThreadLocals. A single thread sets a value on each one and then calls get 100,000 times on each. The result is about 2000 ms. Next, FastThreadLocal is tested with similar code:

    public static void test1() {
        int size = 10000;
        FastThreadLocal[] tls = new FastThreadLocal[size];
        for (int i = 0; i < size; i++) {
            tls[i] = new FastThreadLocal();
        }
        new FastThreadLocalThread(new Runnable() {
            @Override
            public void run() {
                long startTime = System.currentTimeMillis();
                for (int i = 0; i < size; i++) {
                    tls[i].set("value" + i);
                }
                for (int i = 0; i < size; i++) {
                    for (int k = 0; k < 100000; k++) {
                        tls[i].get();
                    }
                }
                System.out.println(System.currentTimeMillis() - startTime + "ms");
            }
        }).start();
    }

The result is about 30 ms, a gap of roughly two orders of magnitude. The difference only appears with a very large number of accesses. The rest of this article looks at how ThreadLocal works and why FastThreadLocal is faster.

How ThreadLocal works

Since set and get are the methods we use most often, let's look at their source code:

    public void set(T value) {
        Thread t = Thread.currentThread();
        ThreadLocalMap map = getMap(t);
        if (map != null)
            map.set(this, value);
        else
            createMap(t, value);
    }

    ThreadLocalMap getMap(Thread t) {
        return t.threadLocals;
    }

Roughly, this code gets the current thread and reads that thread's threadLocals field, which is a ThreadLocalMap. If the map is null, a new map is created. Otherwise the value is stored with the current ThreadLocal as the key. Let's look further at ThreadLocalMap's set method:

    private void set(ThreadLocal<?> key, Object value) {
        // We don't use a fast path as with get() because it is at
        // least as common to use set() to create new entries as
        // it is to replace existing ones, in which case, a fast
        // path would fail more often than not.
        Entry[] tab = table;
        int len = tab.length;
        int i = key.threadLocalHashCode & (len-1);
        for (Entry e = tab[i]; e != null; e = tab[i = nextIndex(i, len)]) {
            ThreadLocal<?> k = e.get();
            if (k == key) {
                e.value = value;
                return;
            }
            if (k == null) {
                replaceStaleEntry(key, value, i);
                return;
            }
        }
        tab[i] = new Entry(key, value);
        int sz = ++size;
        if (!cleanSomeSlots(i, sz) && sz >= threshold)
            rehash();
    }

In short, ThreadLocalMap stores its data in an array, much like HashMap. Each ThreadLocal is assigned a threadLocalHashCode when it is created. The array index is computed as threadLocalHashCode & (len - 1), which for a power-of-two length is the same as taking the hash modulo the length, so hash collisions can occur. HashMap resolves collisions with buckets of linked lists (or trees). ThreadLocalMap instead uses open addressing: on a collision it calls nextIndex to move on to the following slot (linear probing), which degrades as collisions pile up. Now let's look at the get method:

    public T get() {
        Thread t = Thread.currentThread();
        ThreadLocalMap map = getMap(t);
        if (map != null) {
            ThreadLocalMap.Entry e = map.getEntry(this);
            if (e != null) {
                @SuppressWarnings("unchecked")
                T result = (T) e.value;
                return result;
            }
        }
        return setInitialValue();
    }

Likewise, get first obtains the current thread and then that thread's ThreadLocalMap. It then looks up the value in the map using the current ThreadLocal as the key:

    private Entry getEntry(ThreadLocal<?> key) {
        int i = key.threadLocalHashCode & (table.length - 1);
        Entry e = table[i];
        if (e != null && e.get() == key)
            return e;
        else
            return getEntryAfterMiss(key, i, e);
    }

    private Entry getEntryAfterMiss(ThreadLocal<?> key, int i, Entry e) {
        Entry[] tab = table;
        int len = tab.length;
        while (e != null) {
            ThreadLocal<?> k = e.get();
            if (k == key)
                return e;
            if (k == null)
                expungeStaleEntry(i);
            else
                i = nextIndex(i, len);
            e = tab[i];
        }
        return null;
    }

As with set, the array index is computed from the hash. If there is no collision, the entry is returned directly; otherwise the table is walked slot by slot. From this analysis we can roughly conclude the following:

1. The ThreadLocalMap lives in each Thread and uses the ThreadLocal as its key. When multiple threads operate on the same ThreadLocal, each thread actually inserts one entry into its own ThreadLocalMap, so there are no collisions. This explains why the first test showed no difference.

2. ThreadLocalMap resolves collisions by linear probing (walking to the next slot), which can significantly hurt performance.

3. FastThreadLocal avoids collisions by a different approach, which is where its performance gain comes from.
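Point 2 can be made concrete with a small model. The sketch below is hypothetical: the names `ProbingDemo`, `Key` and `ProbingTable` are illustrative, not JDK code. It hands out hash codes from a global counter using the same 0x61c88647 increment as the JDK's ThreadLocal. Each key is placed at `hash & (length - 1)`, and on a collision the code walks to the next slot, as nextIndex does. Resizing, weak references and stale-entry cleanup are deliberately left out.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of ThreadLocalMap's open addressing (not the JDK's code):
// fixed power-of-two capacity, no resizing, no weak references, plus a probe counter.
public class ProbingDemo {

    // The same increment the JDK uses to generate ThreadLocal.threadLocalHashCode.
    private static final int HASH_INCREMENT = 0x61c88647;
    private static final AtomicInteger nextHashCode = new AtomicInteger();

    // Stand-in for a ThreadLocal: it only carries a hash code from the global sequence.
    static final class Key {
        final int hash = nextHashCode.getAndAdd(HASH_INCREMENT);
    }

    static final class ProbingTable {
        private final Key[] keys;
        private final Object[] values;
        int probes; // extra slots visited because a home slot was taken by another key

        ProbingTable(int capacity) { // must be a power of two and must never become full
            keys = new Key[capacity];
            values = new Object[capacity];
        }

        // Step to the next slot, wrapping around, like ThreadLocalMap.nextIndex.
        private int nextIndex(int i) {
            return (i + 1) & (keys.length - 1);
        }

        void set(Key key, Object value) {
            int i = key.hash & (keys.length - 1); // home slot
            while (keys[i] != null && keys[i] != key) {
                i = nextIndex(i);
                probes++;
            }
            keys[i] = key;
            values[i] = value;
        }

        Object get(Key key) {
            int i = key.hash & (keys.length - 1);
            while (keys[i] != null) {
                if (keys[i] == key) {
                    return values[i];
                }
                i = nextIndex(i);
                probes++;
            }
            return null;
        }
    }

    // Creates 3 * stride keys, stores every stride-th one in a 16-slot table,
    // reads back the last stored key, and returns the total number of probe steps.
    public static int probesFor(int stride) {
        Key[] created = new Key[3 * stride];
        for (int i = 0; i < created.length; i++) {
            created[i] = new Key();
        }
        ProbingTable table = new ProbingTable(16);
        for (int i = 0; i < 3; i++) {
            table.set(created[i * stride], "value" + i);
        }
        if (!"value2".equals(table.get(created[2 * stride]))) {
            throw new IllegalStateException("lookup failed");
        }
        return table.probes;
    }

    public static void main(String[] args) {
        System.out.println("consecutive keys, probes = " + probesFor(1));
        System.out.println("every 16th key, probes = " + probesFor(16));
    }
}
```

Running main should print 0 probes for the consecutive keys and 5 for the every-16th keys. Because the increment is odd, consecutively created ThreadLocals land in distinct slots of a power-of-two table. ThreadLocals whose creation order differs by a multiple of the table length share a home slot, so set and get have to walk past each other.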

Let's move on to see how FastThreadLocal optimizes performance.

Why is Netty's FastThreadLocal fast?

Netty provides two classes, FastThreadLocal and FastThreadLocalThread, where FastThreadLocalThread extends Thread. Again, let's analyze the source of the commonly used set and get methods:

    public final void set(V value) {
        if (value != InternalThreadLocalMap.UNSET) {
            set(InternalThreadLocalMap.get(), value);
        } else {
            remove();
        }
    }

    public final void set(InternalThreadLocalMap threadLocalMap, V value) {
        if (value != InternalThreadLocalMap.UNSET) {
            if (threadLocalMap.setIndexedVariable(index, value)) {
                addToVariablesToRemove(threadLocalMap, this);
            }
        } else {
            remove(threadLocalMap);
        }
    }

The method first checks whether the value is the InternalThreadLocalMap.UNSET sentinel; setting UNSET is treated as a remove. As with ThreadLocal, the data is stored in a per-thread map, here an InternalThreadLocalMap:

    public static InternalThreadLocalMap get() {
        Thread thread = Thread.currentThread();
        if (thread instanceof FastThreadLocalThread) {
            return fastGet((FastThreadLocalThread) thread);
        } else {
            return slowGet();
        }
    }

    private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
        InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
        if (threadLocalMap == null) {
            thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
        }
        return threadLocalMap;
    }

The InternalThreadLocalMap is likewise stored on the thread, in this case as a field of the FastThreadLocalThread. For an ordinary Thread, slowGet falls back to a JDK ThreadLocal to hold the map. The difference is that the slot is not located from a hash of the ThreadLocal. Instead, FastThreadLocal's index field is used directly. The index is assigned when the FastThreadLocal is instantiated:

    private final int index;

    public FastThreadLocal() {
        index = InternalThreadLocalMap.nextVariableIndex();
    }

Now look at the nextVariableIndex method:

    static final AtomicInteger nextIndex = new AtomicInteger();

    public static int nextVariableIndex() {
        int index = nextIndex.getAndIncrement();
        if (index < 0) {
            nextIndex.decrementAndGet();
            throw new IllegalStateException("too many thread-local indexed variables");
        }
        return index;
    }

InternalThreadLocalMap has a static nextIndex counter that generates the array indices. Because it is static, all FastThreadLocal instances share it, so each one gets a unique index and the indices are consecutive. Here is setIndexedVariable in InternalThreadLocalMap:

    public boolean setIndexedVariable(int index, Object value) {
        Object[] lookup = indexedVariables;
        if (index < lookup.length) {
            Object oldValue = lookup[index];
            lookup[index] = value;
            return oldValue == UNSET;
        } else {
            expandIndexedVariableTableAndSet(index, value);
            return true;
        }
    }

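indexedVariables is an Object array that holds the values, with index used directly as the array subscript. If index is beyond the end of the array, the array is expanded. The method returns true when the slot previously held UNSET, that is, when this FastThreadLocal is set for the first time on this thread. In that case set registers it via addToVariablesToRemove so it can be cleaned up later. FastThreadLocal's get method likewise reads the value directly by index: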

    public final V get(InternalThreadLocalMap threadLocalMap) {
        Object v = threadLocalMap.indexedVariable(index);
        if (v != InternalThreadLocalMap.UNSET) {
            return (V) v;
        }
        return initialize(threadLocalMap);
    }

    public Object indexedVariable(int index) {
        Object[] lookup = indexedVariables;
        return index < lookup.length ? lookup[index] : UNSET;
    }

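Reading directly by index is very fast, but this design has a cost: it can waste space. Because indices come from one global counter, each thread's indexedVariables array must be long enough to reach the largest index that thread uses, even if the thread only uses a handful of FastThreadLocals.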
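The space-for-time trade-off can be seen in a small model of the same idea. This is a hypothetical sketch, not Netty's code. `IndexedDemo`, `IndexedLocal` and `tableLengthAfterSet` are illustrative names. For simplicity, the per-thread array is kept in a JDK ThreadLocal, whereas Netty keeps it in a field of FastThreadLocalThread. The growth policy (next power of two above the index) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of FastThreadLocal + InternalThreadLocalMap (not Netty's code):
// one global index counter, and one Object[] per thread addressed directly by that index.
public class IndexedDemo {

    // Sentinel for "no value", playing the role of InternalThreadLocalMap.UNSET.
    static final Object UNSET = new Object();

    // Static and shared, so every IndexedLocal gets a unique, consecutive index.
    private static final AtomicInteger nextIndex = new AtomicInteger();

    // Per-thread array, analogous to indexedVariables. Held in a JDK ThreadLocal here for
    // simplicity; Netty stores its map in a field of FastThreadLocalThread instead.
    private static final ThreadLocal<Object[]> storage =
            ThreadLocal.withInitial(() -> newTable(32));

    private static Object[] newTable(int size) {
        Object[] table = new Object[size];
        Arrays.fill(table, UNSET);
        return table;
    }

    public static final class IndexedLocal<V> {
        public final int index = nextIndex.getAndIncrement();

        public void set(V value) {
            Object[] lookup = storage.get();
            if (index >= lookup.length) {
                // Grow to the next power of two above index (overflow near
                // Integer.MAX_VALUE is ignored); new slots start out UNSET.
                int newSize = Integer.highestOneBit(index) << 1;
                Object[] grown = Arrays.copyOf(lookup, newSize);
                Arrays.fill(grown, lookup.length, newSize, UNSET);
                storage.set(grown);
                lookup = grown;
            }
            lookup[index] = value; // O(1): no hashing and no probing
        }

        @SuppressWarnings("unchecked")
        public V get() {
            Object[] lookup = storage.get();
            Object v = index < lookup.length ? lookup[index] : UNSET;
            return v == UNSET ? null : (V) v;
        }
    }

    // Sets `local` in a brand-new thread and returns that thread's array length afterwards.
    public static int tableLengthAfterSet(IndexedLocal<String> local) {
        int[] length = new int[1];
        Thread t = new Thread(() -> {
            local.set("x");
            length[0] = storage.get().length;
        });
        t.start();
        try {
            t.join(); // join() also makes length[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return length[0];
    }

    public static void main(String[] args) {
        List<IndexedLocal<String>> locals = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            locals.add(new IndexedLocal<>());
        }
        IndexedLocal<String> last = locals.get(locals.size() - 1);
        last.set("hello");
        System.out.println(last.get());
        System.out.println(tableLengthAfterSet(last));
    }
}
```

In a fresh JVM, main should print hello and then 16384. The second thread stores a single value, yet its array has to reach index 9999. A lookup stays a bounds check plus an array read, but every thread that touches a high-index variable pays for all the slots below it.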

Summary

The analysis above shows that performance problems are only likely when a large number of ThreadLocals are read and written. FastThreadLocal achieves O(1) reads by trading space for time. One question remains open: why does ThreadLocal use its own ThreadLocalMap internally rather than a HashMap (array + red-black tree)?


