ThreadLocal in 1.4 and 1.5 has a memory leak. It isn’t very snappy about freeing local values. It does not handle the
InheritableThreadLocal.childValue template methods with sufficient care.
In 1.1 Swing (IIRC), 1.2 and 1.3,
ThreadLocal had a synchronised map from
Thread to local value. This works fine, but the lock hurts performance.
From 1.4, each
Thread has a map from
ThreadLocal to a local value. As only a single thread ever uses each map it can be fast. The mapping uses a form of weak hash map with strong references to the local values. Any strong reference path from the local value to the
ThreadLocal protects the
ThreadLocal, and hence the value, from garbage collection. The dead values cannot therefore be collected until the thread exits.
+--------+ +----------+ +----------------+ | | | |«weak» | | | Thread |-------->| Entry<T> |-------->| ThreadLocal<T> | | |1 *| |* 1| | +--------+ +----------+ +----------------+ | V +----------+ | | | T | | | +----------+
It seems reasonable to assume you would never do anything as daft as make a
ThreadLocal‘s value reference the
ThreadLocal. However, we know these sort of object structures can be created all too easily.
Consider a web app with a
ThreadLocal assigned to a static final variable referencing an object of a class defined within WEB-INF/lib. Now
Thread has a strong reference through a map to the local value, all objects have an implicit reference to their
Class to its
ClassLoader, and that to the
Class with the static reference to the
ThreadLocal. So we have a strong reference path from
ThreadLocal. As servlet containers tend not to bind threads to web apps, reloading will leak severely. Applets use
ThreadGroups so are not affected, although they have problems of their own.
Leak fixing (too much)
Thread needs to have a weak, rather than strong, references to local values as well as to
ThreadLocal keys. In fact a weak reference to the
Thread hash map entry works better. Now we have the problem that the local value can be collected at any point, even while still valid.
+--------+ +----------+ +----------------+ | |«weak» | | | | | Thread |-------->| Entry<T> |---------| ThreadLocal<T> | | |1 *| |* 1| | +--------+ +----------+ +----------------+ | V +----------+ | | | T | | | +----------+
ThreadLocal needs to maintain strong references to the initialised values (or hash map entries), map from
Thread will clear associated references when it exits. As pre-1.3
ThreadLocal, such a map is very slow as it is shared between threads and therefore needs to be synchronised. However, as we are only using it to ensure strong referencing, it need never be read. The only time synchronisation is necessary is on creating the initial value for a thread and removing it when the thread exits (unless the
ThreadLocal itself has already been collected).
1.4/1.5 is somewhat tardy about releasing local values, even when it doesn’t leak. Repeatedly creating, initialising and dropping thread locals, although a rare case, retains a surprisingly quantity of memory.
ThreadLocal goes to considerable lengths to detect and remove stale entries in the course of its normal work. In exchange for some performance, entries are released earlier than might be expected, but still only on demand and if you’re lucky.
With our fixed thread-local map keeping a weak reference to the entire entry rather than just the key, the garbage collector by itself takes out everything except the cleared
WeakReference object. Reclamation of the value is very much faster. The single remaining remnant can wait to be reclaimed.
With relaxed demands on the weak-entry hash map, we can go for a more conventional algorithm. It still needs to be custom as the entire entry has a weak reference to it. I have gone for a reasonably conventional hash map, but with rotating circular singularly-linked lists for buckets. Stale entries are cleaned out only after a threshold is reached, and before checking a lower threshold for resizing.
The hash map’s array elements point to the last read link for that index. So lookup usually follow a fast code path. In rare cases another lookup with matching hash code happens in between. Things only start to go bad when three or more entries with colliding hash codes are repeatedly read in the reverse order they appear in.
A conventional way to use
ThreadLocal is to subclass and implement the
initialValue factory method. Use of the thread-local becomes easy as you don’t need to check that you’ve set it to some valid value (and null may be valid) at every read. Also, if you’re in the habit of frequently calling
As ever you need to be really, really careful when invoking a call-back. You must ensure that the calling class is re-entrant. In the case of
initialValue, if it does not throw it must only be called once.
ThreadLocal gets confused if an
initialValue implementation initialises another
ThreadLocal. The hash map get resized, leaving the half initialised thread-local writing to an out of use table, or the table may just be jumbled.
We also want to make sure an exception throw from
initialValue does not leave the thread-local map corrupt and that
StackOverflowError do not occur after
initialValue has return successfully.
So clearly we need to allocate our hash map entry and
WeakReference before calling
initialValue. They should also have been placed in both maps appropriately. In order for re-entrant initialisations not to doubly insert entries for one
ThreadLocal it is necessary to mark the entry as valid or not.
As all entries need to be found when setting their value, but only valid entries for getting, separate read-key and write-key references is a convenient approach. The entry and
WeakReference are created with only the write-key set. They are then placed into both maps. The initial-value is found. As a final step, if no exceptions have been thrown, the entry’s value and read-key are set and
get() returns exception-free.
I noticed when playing around with this stuff that merely loading a class, which happens lazily so lots of threads do it, causes the thread-locals map to be initialised for that thread. Seems a bit wasteful, and we can do better in terms of performance. The class responsible is
StringCoding which gets called when a String is constructed from a byte array. As this class is in
java.lang, instead of a
ThreadLocal, we can just add a field onto
In general, if you have control of
Thread construction, adding a field will be faster and possibly less buggy than using
Threads, garbage collections, dynamic memory allocation, exceptions and re-entrant call-backs: a nasty combination but solvable. I think.
Update 14-Apr-2005: Added to the Bug Parade as Bug #6254531. Note, the patch does indeed suck.
- Bug #4252910: speed improvement for ThreadLocal.java
- Bug #5025230 : (thread) Creating thread local variables from within ThreadLocal.initialValue()
- “ThreadLocals are EVIL!!!” — Alef Arendsen
- “ThreadLocals and memory leaks revisited” — Alef Arendsen