Tag Archives: Amnesia fixing

[repost]Fixing ThreadLocal

original:http://jroller.com/tackline/entry/fixing_threadlocal

ThreadLocal in 1.4 and 1.5 has a memory leak. It isn’t very snappy about freeing local values. It does not handle the initialValue and InheritableThreadLocal.childValue template methods with sufficient care.

The leak

In 1.1 Swing (IIRC), 1.2 and 1.3, ThreadLocal had a synchronised map from Thread to local value. This works fine, but the lock hurts performance.

From 1.4, each Thread has a map from ThreadLocal to a local value. As only a single thread ever uses each map it can be fast. The mapping uses a form of weak hash map with strong references to the local values. Any strong reference path from the local value to the ThreadLocal protects the ThreadLocal, and hence the value, from garbage collection. The dead values cannot therefore be collected until the thread exits.

+--------+         +----------+         +----------------+
|        |         |          |«weak»   |                |
| Thread |-------->| Entry<T> |-------->| ThreadLocal<T> |
|        |1       *|          |*       1|                |
+--------+         +----------+         +----------------+
                        |
                        V
                   +----------+
                   |          |
                   |    T     |
                   |          |
                   +----------+

It seems reasonable to assume you would never do anything as daft as make a ThreadLocal‘s value reference the ThreadLocal. However, we know these sort of object structures can be created all too easily.

Consider a web app with a ThreadLocal assigned to a static final variable referencing an object of a class defined within WEB-INF/lib. Now Thread has a strong reference through a map to the local value, all objects have an implicit reference to their Class, and Class to its ClassLoader, and that to the Class with the static reference to the ThreadLocal. So we have a strong reference path from Thread to ThreadLocal. As servlet containers tend not to bind threads to web apps, reloading will leak severely. Applets use ThreadGroups so are not affected, although they have problems of their own.

Leak fixing (too much)

Clearly Thread needs to have a weak, rather than strong, references to local values as well as to ThreadLocal keys. In fact a weak reference to the ThreadLocal-to-Thread hash map entry works better. Now we have the problem that the local value can be collected at any point, even while still valid.

+--------+         +----------+         +----------------+
|        |«weak»   |          |         |                |
| Thread |-------->| Entry<T> |---------| ThreadLocal<T> |
|        |1       *|          |*       1|                |
+--------+         +----------+         +----------------+
                        |
                        V
                   +----------+
                   |          |
                   |    T     |
                   |          |
                   +----------+

Amnesia fixing

ThreadLocal needs to maintain strong references to the initialised values (or hash map entries), map from Threads. Thread will clear associated references when it exits. As pre-1.3 ThreadLocal, such a map is very slow as it is shared between threads and therefore needs to be synchronised. However, as we are only using it to ensure strong referencing, it need never be read. The only time synchronisation is necessary is on creating the initial value for a thread and removing it when the thread exits (unless the ThreadLocal itself has already been collected).

About time

1.4/1.5 is somewhat tardy about releasing local values, even when it doesn’t leak. Repeatedly creating, initialising and dropping thread locals, although a rare case, retains a surprisingly quantity of memory. ThreadLocal goes to considerable lengths to detect and remove stale entries in the course of its normal work. In exchange for some performance, entries are released earlier than might be expected, but still only on demand and if you’re lucky.

With our fixed thread-local map keeping a weak reference to the entire entry rather than just the key, the garbage collector by itself takes out everything except the cleared WeakReference object. Reclamation of the value is very much faster. The single remaining remnant can wait to be reclaimed.

With relaxed demands on the weak-entry hash map, we can go for a more conventional algorithm. It still needs to be custom as the entire entry has a weak reference to it. I have gone for a reasonably conventional hash map, but with rotating circular singularly-linked lists for buckets. Stale entries are cleaned out only after a threshold is reached, and before checking a lower threshold for resizing.

The hash map’s array elements point to the last read link for that index. So lookup usually follow a fast code path. In rare cases another lookup with matching hash code happens in between. Things only start to go bad when three or more entries with colliding hash codes are repeatedly read in the reverse order they appear in.

Call-back etiquette

A conventional way to use ThreadLocal is to subclass and implement the initialValue factory method. Use of the thread-local becomes easy as you don’t need to check that you’ve set it to some valid value (and null may be valid) at every read. Also, if you’re in the habit of frequently calling set, don’t.

As ever you need to be really, really careful when invoking a call-back. You must ensure that the calling class is re-entrant. In the case of initialValue, if it does not throw it must only be called once.

The current ThreadLocal gets confused if an initialValue implementation initialises another ThreadLocal. The hash map get resized, leaving the half initialised thread-local writing to an out of use table, or the table may just be jumbled.

We also want to make sure an exception throw from initialValue does not leave the thread-local map corrupt and that OutOfMemoryError, StackOverflowError do not occur after initialValue has return successfully.

So clearly we need to allocate our hash map entry and WeakReference before calling initialValue. They should also have been placed in both maps appropriately. In order for re-entrant initialisations not to doubly insert entries for one ThreadLocal it is necessary to mark the entry as valid or not.

As all entries need to be found when setting their value, but only valid entries for getting, separate read-key and write-key references is a convenient approach. The entry and WeakReference are created with only the write-key set. They are then placed into both maps. The initial-value is found. As a final step, if no exceptions have been thrown, the entry’s value and read-key are set and get() returns exception-free.

A shortcut

I noticed when playing around with this stuff that merely loading a class, which happens lazily so lots of threads do it, causes the thread-locals map to be initialised for that thread. Seems a bit wasteful, and we can do better in terms of performance. The class responsible is StringCoding which gets called when a String is constructed from a byte array. As this class is in java.lang, instead of a ThreadLocal, we can just add a field onto Thread.

In general, if you have control of Thread construction, adding a field will be faster and possibly less buggy than using ThreadLocal.

Summary

Threads, garbage collections, dynamic memory allocation, exceptions and re-entrant call-backs: a nasty combination but solvable. I think.

Update 14-Apr-2005: Added to the Bug Parade as Bug #6254531. Note, the patch does indeed suck.

See:

(2005-04-14 15:37:54.0) Permalink Comments [4]

Comments:
Even when a ThreadLocal is garbage collected, the WeakReference only clears the key of the entry. At this point, Entry.get() returns null and the ThreadLocal itself can be gc’d, but the hard reference to the value remains for much longer. This unfinalizable value ends up making it’s class and class loader unfinalizable until its expunged from the table, which happens annoingly slowly.

These would be eventually cleaned up when the ThreadLocal’s private rehash() method calls expungeStaleEntries() or if cleanSomeSlots(i,j) happens to get it, but it’s annoying that there’s no way to force stale entries to be cleared eagerly, especially when it might keep my whole webapp classloader from being finalizable.

Posted by bwtaylor on April 27, 2007 at 09:21 AM WEST #

Would it not be a better solution for app servers to use a context classloader that exposed a version of ThreadLocal that would then only be resolved to a single thread?

In this way, there would never be data locking needed. Wouldn’t any self referencing structures then be released when the context classloader was dereferenced on a reload?Posted by Gregg Wonderly on October 09, 2007 at 02:11 PM WEST #

Another slant on the leak issue is that the problem is the ad-hoc pairing of the weak-reference to the key and the strong reference to the value. Suppose we could add a new java.lang.ref class that allowed you to pair these values such that a reference chain from the value would not be treated as a strong reference to the paired-object. In that way the weak-reference to the key would be cleared and all other clearing would follow.

Feasible?Posted by David Holmes on October 10, 2007 at 07:46 AM WEST #

Gregg: Two problems. Firstly ClassLoader prevents loading of java.* classes (presumably for non-technical reasons). Secondly, as currently implemented ThreadLocal don’t actually include any thread local data. All the values are reference through the Thread object (via Thread.currentThread). So Thread would need to be loaded by a disposable class loader. But then you might as well just cycle the threads as any “good” (web) app server would do now.

David: As RFE “(ref) Add joined weak references / weak properties”. I’ve had one of my JDC bugs there for sometime (are Sun employees supposed to keep votes?). I don’t know how feasible it is to implement. Certainly it would be handy for a number of object lifetime contention issues.

http://bugs.sun.com/view_bug.do?bug_id=4630118