Demystifying Labyrinthine: 2013

Monday, December 16, 2013

Abstract Factory Design Pattern

With a wide range of vendors in the market, applications often have to support several protocols at the same time. Integrating every vendor in one go is tedious, so to reduce the burden on the application developer, part of the responsibility is shared with the vendors.

Abstraction has worked wonderfully for developers and companies in the past, and abstraction is the basis of this pattern too.

Here, the application delegates the responsibility of object creation to the vendor. The vendor writes a factory that instantiates the objects, whereas the application only uses a reference to the interface.

Using this pattern, a framework is defined that produces objects following a general pattern; at runtime, this factory is paired with a concrete factory to produce objects that follow the pattern of, say, a certain country. In other words, the Abstract Factory is a super-factory which creates other factories (a factory of factories).

Pic taken from http://howtodoinjava.com/2012/10/29/abstract-factory-pattern-in-java/

As you can see in the image, the application needs to build certain types of cars. So the application provides a CarFactory interface, which delegates the specific requirements to its implementing classes.

As is evident, the application provides a default factory, but there are other factories defined by region, i.e. Asia and US. These can be divided further into segments, e.g. by country or by class of car.

The advantage of this pattern is that object creation is delegated to the individual vendors.
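The arrangement described above can be sketched in a few lines of Java. This is only a minimal illustration: the names Car, CarFactory, AsiaCarFactory, USCarFactory and CarShowroom are assumptions made for this sketch and are not taken from the referenced article.

```java
// Product interface: the application only ever sees this type.
interface Car {
    String describe();
}

// Factory interface owned by the application; vendors implement it.
interface CarFactory {
    Car buildCar(String model);
}

// Hypothetical concrete factory supplied for the Asian market.
class AsiaCarFactory implements CarFactory {
    @Override
    public Car buildCar(String model) {
        return () -> model + " built for Asia";
    }
}

// Hypothetical concrete factory supplied for the US market.
class USCarFactory implements CarFactory {
    @Override
    public Car buildCar(String model) {
        return () -> model + " built for US";
    }
}

// Application code: it is paired with a concrete factory at runtime
// and never names a concrete Car class itself.
class CarShowroom {
    private final CarFactory factory;

    CarShowroom(CarFactory factory) {
        this.factory = factory;
    }

    String order(String model) {
        return factory.buildCar(model).describe();
    }

    public static void main(String[] args) {
        System.out.println(new CarShowroom(new AsiaCarFactory()).order("Sedan"));
        System.out.println(new CarShowroom(new USCarFactory()).order("Sedan"));
    }
}
```

Swapping the region only changes which factory is handed to CarShowroom; the ordering code stays the same.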

Tuesday, July 23, 2013

Threads: All About Volatile

I wanted to improve my threading fundamentals, so I planned to make a few notes for myself that will help me in the future (obviously at interview time). The following questions and answers have been taken from stackoverflow.com. This post is a sticky note for me; please do not use the answers as a reference.

Here we will concentrate on basic questions about volatile. Volatile is quite a common word in the threading world, but it is not used that much in code. Let’s dig a little deeper to figure out what volatile does.

Do you ever use the volatile keyword in Java?

When exactly do you use the volatile keyword in Java?

Most of the information has been taken from https://www.ibm.com/developerworks/java/library/j-jtp06197/. I have tried to extract the gist of the article.

Volatile variables can do a subset of what a synchronized block can do. In general, a lock offers two features: mutual exclusion and visibility. Mutual exclusion means that only one thread at a time can access the shared resource, whereas visibility means that changes made by one thread are seen by the other threads, so that they do not work with stale or inconsistent data. Volatile variables provide only the visibility feature.

If a variable is declared volatile, any thread reading the variable will always see its most recently written value.

From Java 5 onward, accessing a volatile variable involves a memory barrier: the JVM synchronizes main memory with all cached copies, so that a read returns the latest value.

Volatile variables must only be used where the value assigned does not depend on the previous value.

volVar += 1; // Not a correct use of a volatile variable: the read, the add and the write are separate steps

That’s why volatile variables cannot be used as counters.
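To make the counter problem concrete, here is a small sketch that increments a volatile int and an AtomicInteger from several threads. The class and method names (CounterSketch, runCounters) are made up for this note. The volatile total may come out lower than expected because increments can be lost, while the AtomicInteger total is always exact.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterSketch {
    // Visibility only: volatileCount++ is a read, an add and a write,
    // so two threads can overwrite each other's update.
    static volatile int volatileCount = 0;
    // Atomic read-modify-write: no update is lost.
    static final AtomicInteger atomicCount = new AtomicInteger();

    // Returns { volatile total, atomic total } after all threads finish.
    static int[] runCounters(int threads, int incrementsPerThread) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    volatileCount++;
                    atomicCount.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { volatileCount, atomicCount.get() };
    }

    public static void main(String[] args) {
        int[] totals = runCounters(4, 100_000);
        System.out.println("volatile int:  " + totals[0] + " (may be below 400000)");
        System.out.println("AtomicInteger: " + totals[1]);
    }
}
```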

Volatile variables are preferred over synchronized blocks for their simplicity (there is less code to write) and scalability (no locks are held while processing), but volatile has its own restrictions as well.

Conditions for correct use of volatile

You can use volatile variables instead of locks only under a restricted set of circumstances. Both of the following criteria must be met for volatile variables to provide the desired thread-safety:

Writes to the variable do not depend on its current value.
The variable does not participate in invariants with other variables.

Patterns for using volatile correctly

Though people discourage the use of volatile variables, there are still a few well-defined patterns for using them.

Pattern #1: status flags

In the producer/consumer problem, we change the status of a flag so that the other thread can enter the critical section and read or write the data. Access to the block is managed with a boolean, and this boolean can be a volatile variable.

volatile boolean stopRequested;
public void stop() { stopRequested = true; }
public void doWork() { 
    while (!stopRequested) { 
        // do stuff
    }
}

If you look at the operations performed on the volatile variable, you will see that its current state is never considered while updating it. So this is a correct use of a volatile variable.

Pattern #2: one-time safe publication

Without synchronization, it is sometimes possible to see an up-to-date object reference but stale data in the object itself. It is quite a tricky problem, but it can be handled with a volatile variable.

A key requirement for this pattern is that the object being published must either be thread-safe or effectively immutable (effectively immutable means that its state is never modified after its publication). The volatile reference may guarantee the visibility of the object in its as-published form, but if the state of the object is going to change after publication, then additional synchronization is required.

public class BackgroundFloobleLoader {
    public volatile Flooble theFlooble;

    public void initInBackground() {
        // do lots of stuff
        theFlooble = new Flooble();  // this is the only write to theFlooble
    }
}

public class SomeOtherClass {
    public void doWork() {
        while (true) { 
            // do some stuff...
            // use the Flooble, but only if it is ready
            if (floobleLoader.theFlooble != null) 
                doSomething(floobleLoader.theFlooble);
        }
    }
}

Pattern #3: independent observations

This is similar to a temperature reading: a sensor thread periodically writes the current temperature to a variable that multiple threads can read. Each write is independent of the previous value, so a volatile variable is sufficient.

There are a few more patterns listed in the article, but those need a deeper understanding of Java-related technologies.
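A minimal sketch of the temperature idea, assuming a hypothetical TemperatureSensor class (the name and methods are mine, not the article's). Every write simply replaces the old reading, so no lock is needed for readers to see the latest value.

```java
class TemperatureSensor {
    // Latest observation; NaN until the first reading arrives.
    private volatile double lastReading = Double.NaN;

    // Called by the sensor thread: the new value does not depend on the old one.
    void record(double celsius) {
        lastReading = celsius;
    }

    // Called by any thread that wants the most recent reading.
    double latest() {
        return lastReading;
    }

    public static void main(String[] args) throws InterruptedException {
        TemperatureSensor sensor = new TemperatureSensor();
        Thread writer = new Thread(() -> sensor.record(21.5));
        writer.start();
        writer.join();
        System.out.println("Latest reading: " + sensor.latest());
    }
}
```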

Wednesday, July 17, 2013

Threads: Why is it not good practice to synchronize on Boolean?

Here is some information I am posting for my own reference.

Why is it not good practice to synchronize on Boolean?

Instead of writing something new, I am taking the same example, which is itself copied from another blog post.

private Boolean isOn = false;
private String statusMessage = "I'm off";
public void doSomeStuffAndToggleTheThing(){
   // Do some stuff
   synchronized(isOn){
      if(isOn){
         isOn = false;
         statusMessage = "I'm off";
         // Do everything else to turn the thing off
      } else {
         isOn = true;
         statusMessage = "I'm on";
         // Do everything else to turn the thing on
      }
   }
}

As it turns out, it is not possible to get a clean lock on a Boolean. You can lock only on objects, so through the autoboxing mechanism true and false are boxed to the Boolean.TRUE and Boolean.FALSE objects. In a JVM there is only one object for Boolean.TRUE and another for Boolean.FALSE. So if an application is written in such a way that multiple threads lock on Booleans at various places, unrelated code can end up contending for the same lock without our knowledge, and it will be difficult to figure out the reason without knowing how the Boolean class works.

Taken from the answer by McDowell

This is a terrible idea. isOn will reference the same object as Boolean.FALSE which is publicly available. If any other piece of badly written code also decides to lock on this object, two completely unrelated transactions will have to wait on each other. Locks are performed on object instances, not on the variables that reference them:

Any Boolean that is created through autoboxing (isOn = true) is the same object as Boolean.TRUE which is a singleton in the ClassLoader across all objects. Your lock object should be local to the class it is used in. The proper pattern if you need to lock around a boolean is to define a private final lock object:
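The code from the original answer is not reproduced in these notes. A minimal sketch of the private final lock pattern, using the hypothetical names ToggleSwitch and lock, might look like this:

```java
class ToggleSwitch {
    // Dedicated lock object: never reassigned and never visible outside this class.
    private final Object lock = new Object();
    private boolean isOn = false;
    private String statusMessage = "I'm off";

    void toggle() {
        synchronized (lock) {
            // The flag and the message change together under the same lock.
            isOn = !isOn;
            statusMessage = isOn ? "I'm on" : "I'm off";
        }
    }

    String status() {
        synchronized (lock) {
            return statusMessage;
        }
    }

    public static void main(String[] args) {
        ToggleSwitch toggleSwitch = new ToggleSwitch();
        toggleSwitch.toggle();
        System.out.println(toggleSwitch.status());
    }
}
```

Because the lock reference never changes, every thread always synchronizes on the same object, whatever the current value of isOn.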

Even if we assume that no other thread in our application locks on a Boolean object, the code above is still vulnerable and is a good candidate for a race condition.

Let’s say one thread enters the synchronized block when isOn is false. In this case the Boolean.FALSE object is locked, and the thread is in the critical section. Now isOn is changed to true, and after this another thread tries to enter the same section. Since isOn now refers to Boolean.TRUE, which is not locked by any other thread, thread 2 grabs that lock and enters the critical section.

In this scenario, both threads are in the critical section simultaneously, which is not what we want.

This issue can be fixed in the following ways.

There are loads of ways you could fix this:

Synchronize on this
Synchronize on a private final Object specifically designated for the purpose (neater if someone else might extend our class)
Replace isOn with a final AtomicBoolean that you can alter using get and set methods rather than assignments (you’ll still need to synchronize for testing the state)
Redesign the class to avoid this sort of faffing about (such as using constant message Strings for each state)

This is a fairly subtle consequence of the Java abstraction, and could have caused me a huge headache if it had gone unchecked – leaving one of those wonderfully annoying intermittent bugs floating around (the kind that take forever to track down and debug).

BOTTOM LINE: Avoid synchronizing on a field whose reference might change inside the critical section (boxed primitives such as Boolean in particular), and be aware of autoboxing.

Tuesday, July 16, 2013

WSConsume Task and Jaxb Issues

I do not have much experience with WSDL and JAXB, so I run into various issues while working with WSDL files once in a while. Most of those issues are well covered on the internet, but this one took plenty of time to solve.

Following is my Ant entry to run the wsconsume utility.

The content of the TEST_binding XML is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<jxb:bindings version="1.0" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
              xmlns:jxb="http://java.sun.com/xml/ns/jaxb"
              xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc"
              xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <jxb:bindings schemaLocation="TEST.wsdl#types?schema1">
        <jxb:schemaBindings>
            <jxb:package name="com.test " />
        </jxb:schemaBindings>
        <jxb:bindings node="//xs:complexType[@name='test']/xs:simpleContent/xs:extension[@base='s:string']/xs:attribute[@name='value']">
            <jxb:property name="testValue" />
        </jxb:bindings>
    </jxb:bindings>
</jxb:bindings>

My file TEST.wsdl is stored in the /META-INF/wsdl/TEST folder, so things were kind of messed up from the start. A previous developer had already built a monster, and I had to deal with it without knowing much about the functionality.

When I ran the build XML with this setup, I encountered the following issue.

"<COMPLETE_PATH>/TEST.wsdl" is not a part of this compilation. Is this a mistake for "file: <COMPLETE_PATH>/TEST.wsdl #types?schema1"?

I found a few links which suggested copying the WSDL into a wsdl folder.

So I moved the WSDL files to the /META-INF/wsdl/TEST/wsdl folder.

On rerunning the build, I got a file-not-found exception. As my application was already messed up, I messed it up a little more and kept copies of the files in both the /META-INF/wsdl/TEST/wsdl and /META-INF/wsdl/TEST/ folders.

I reran the build, but faced the same issue again. With some help from Google, I found an alternative to TEST.wsdl#types?schema1, so I tried that.

I replaced TEST.wsdl#types?schema1 with TEST.wsdl#types1 and ran the build again.

This time I did not get the compilation error, but a different one, shown below.

[wsconsume] org.apache.cxf.tools.common.ToolException: java.lang.reflect.UndeclaredThrowableException

[wsconsume] at org.apache.cxf.tools.wsdlto.WSDLToJavaContainer.execute(WSDLToJavaContainer.java:279)

[wsconsume] at org.apache.cxf.tools.common.toolspec.ToolRunner.runTool(ToolRunner.java:103)

[wsconsume] at org.apache.cxf.tools.wsdlto.WSDLToJava.run(WSDLToJava.java:113)

[wsconsume] Caused by: java.lang.NoSuchMethodException: javax.xml.bind.annotation.XmlElementRef.required()

[wsconsume] at java.lang.Class.getDeclaredMethod(Unknown Source)

It means that the required() method did not exist in the JAXB library on the classpath. Java 6 ships with an older version of JAXB, while Java 7 ships with a newer one. You can choose either to use Java 7 or to provide the newer JAXB as an endorsed library. For me the endorsed library path did not work, so I ran the build on Java 7; it compiled and created the respective jar for me.

I know I have not given reasons and facts, but I have tried to write down all the problems I faced. Hope it helps.

Sunday, July 14, 2013

Threads: calling thread.start() within its own constructor

This post will not provide a lot of information, but it is good to know.

calling thread.start() within its own constructor

public class MyNewThread implements Runnable {
    Thread t;
    int counter;
    String name;

    MyNewThread() {
        t = new Thread(this, "Data Thread");
        t.start(); // 'this' escapes here, before counter and name are assigned
        counter = 0;
        name = "Default";
    }

    @Override
    public void run() {
        // New thread code here
    }
}

public class Main {
    public static void main(String[] args) throws Exception {
        new MyNewThread();
        // First thread code here
    }
}

On building the program, the compiler warns against starting the thread during initialization.

The simplest answer is as follows:

In this instance the MyNewThread.this is said to escape the constructor. That means that the object is available to reference but all of its fields that are being built in the constructor may not be created. To take this to another level what if b was final you would expect it to be available but it is not ensured. This is known as partially constructed objects and is perfectly legal in java.

For memory-safety reasons, you shouldn't expose a reference to an object or that object's fields to another thread from within its constructor. Assuming that your custom thread has instance variables, by starting it from within the constructor, you are guaranteed to violate, though legal, the Java Memory Model guidelines.

Another good reason for not starting the thread in constructor is convention. Anyone familiar with Java will assume that the thread has not been started. Worse yet, if they write their own threading code which interacts in some way with yours, then some threads will need to call start and others won't.

This may not seem compelling when you're working by yourself, but eventually you'll have to work with other people, and it's good to develop good coding habits so that you'll have an easy time working with others and code written with the standard conventions.

Various other such memory-safety issues are described at http://www.ibm.com/developerworks/java/library/j-jtp0618/index.html

The following text is specific to our case.

“Don't start threads from within constructors

A special case of the problem in Listing 4 is starting a thread from within a constructor, because often when an object owns a thread, either that thread is an inner class or we pass the this reference to its constructor (or the class itself extends the Thread class). If an object is going to own a thread, it is best if the object provides a start() method, just like Thread does, and starts the thread from the start() method instead of from the constructor. While this does expose some implementation details (such as the possible existence of an owned thread) of the class via the interface, which is often not desirable, in this case the risks of starting the thread from the constructor outweigh the benefit of implementation hiding.”
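A minimal sketch of the advice quoted above, assuming a hypothetical DataWorker class: the constructor only assigns fields, and the thread is created and started from a separate start() method once construction has finished. The createAndStart helper and the observedName field are additions for this illustration only.

```java
class DataWorker implements Runnable {
    private final String name;
    private Thread thread;                 // created in start(), not in the constructor
    private volatile String observedName;  // what run() saw; kept only for this demo

    DataWorker(String name) {
        this.name = name;                  // 'this' does not escape the constructor
    }

    // Convenience factory: construct fully first, then start.
    static DataWorker createAndStart(String name) {
        DataWorker worker = new DataWorker(name);
        worker.start();
        return worker;
    }

    synchronized void start() {
        thread = new Thread(this, "Data Thread");
        thread.start();
    }

    @Override
    public void run() {
        observedName = name;               // the object is fully constructed by now
    }

    // Waits for the worker thread to finish and reports what it saw.
    synchronized String awaitObservedName() {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observedName;
    }

    public static void main(String[] args) {
        System.out.println(DataWorker.createAndStart("Default").awaitObservedName());
    }
}
```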

Thursday, July 11, 2013

Threads: Avoid synchronized(this) in Java

This is quite an interesting question, and the questioner has followed up every answer quite religiously. Though the question seems naïve, the questioner has taken the discussion to a different planet altogether.

Avoid synchronized(this) in Java?

Whenever a question pops up on SO about Java synchronization, some people are very eager to point out that synchronized(this) should be avoided. Instead, they claim, a lock on a private reference is to be preferred.

Some of the given reasons are:

some evil code may steal your lock (very popular this one, also has an "accidentally" variant)

Try to avoid synchronizing on this because that would allow everybody from the outside who had a reference to that object to block my synchronization. Instead, I create a local synchronization object:

public class Foo {

private final Object syncObject = new Object();

…

}

Now I can use that object for synchronization without fear of anybody “stealing” the lock.

all synchronized methods within the same class use the exact same lock, which reduces throughput

The issue is that a synchronized method is actually just syntax sugar for getting the lock on this and holding it for the duration of the method. Thus, public synchronized void setInstanceVar() would be equivalent to something like this:

public void setInstanceVar() {
    synchronized(this) {
        instanceVar++;
    }
}

This is bad for two reasons:

All synchronized methods within the same class use the exact same lock, which reduces throughput
Anyone can get access to the lock, including members of other classes.

you are (unnecessarily) exposing too much information

The equivalent of that is:

synchronized (this)

(And no, you shouldn't generally do it in either C# or Java. Prefer locking on private references which nothing else has access to. You may be aware of that already, of course - but I didn't want to leave an answer without the warning :)

People argue that synchronized(this) is an idiom that is used a lot (also in Java libraries), is safe and well understood. It should not be avoided because you have a bug and you don't have a clue of what is going on in your multithreaded program. In other words: if it is applicable, then use it.

Answer

Some evil code may steal your lock (very popular this one, also has an "accidentally" variant)

I'm more worried about accidentally. What it amounts to is that this use of this is part of your class' exposed interface, and should be documented. Sometimes the ability of other code to use your lock is desired. This is true of things like Collections.synchronizedMap (see the javadoc).

All synchronized methods within the same class use the exact same lock, which reduces throughput

This is overly simplistic thinking; just getting rid of synchronized(this) won't solve the problem. Proper synchronization for throughput will take more thought.

You are (unnecessarily) exposing too much information

This is a variant of #1. Use of synchronized(this) is part of your interface. If you don't want/need this exposed, don't do it.

But in the same thread, one more person showed that a defensive approach is required in some cases when dealing with threads.

Let's say your system is a servlet container, and the object we're considering is the ServletContext implementation. Its getAttribute method must be thread-safe, as context attributes are shared data; so you declare it as synchronized. Let's also imagine that you provide a public hosting service based on your container implementation.

I'm your customer and deploy my "good" servlet on your site. It happens that my code contains a call to getAttribute.

A hacker, disguised as another customer, deploys his malicious servlet on your site. It contains the following code in the init method:

synchronized (this.getServletConfig().getServletContext()) {

while (true) {}

}

Assuming we share the same servlet context (allowed by the spec as long as the two servlets are on the same virtual host), my call on getAttribute is locked forever. The hacker has achieved a DoS on my servlet.

This attack is not possible if getAttribute is synchronized on a private lock, because 3rd-party code cannot acquire this lock.
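The claim can be checked with a small sketch. PrivateLockContext below is a hypothetical stand-in for the context object, not a real servlet API: its attribute map is guarded by a private lock, so a caller that holds the object's own monitor (as the malicious servlet does) cannot block getAttribute.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class PrivateLockContext {
    // Third-party code cannot obtain a reference to this lock.
    private final Object lock = new Object();
    private final Map<String, Object> attributes = new HashMap<>();

    Object getAttribute(String name) {
        synchronized (lock) {
            return attributes.get(name);
        }
    }

    void setAttribute(String name, Object value) {
        synchronized (lock) {
            attributes.put(name, value);
        }
    }

    // Plays the hostile caller: holds the context's own monitor
    // while a second thread reads an attribute.
    static Object readWhileMonitorHeld(PrivateLockContext ctx, String name) {
        ExecutorService reader = Executors.newSingleThreadExecutor();
        try {
            synchronized (ctx) {
                Callable<Object> task = () -> ctx.getAttribute(name);
                Future<Object> result = reader.submit(task);
                // Would time out if getAttribute needed the monitor of ctx.
                return result.get(5, TimeUnit.SECONDS);
            }
        } catch (Exception e) {
            throw new IllegalStateException("reader was blocked or failed", e);
        } finally {
            reader.shutdownNow();
        }
    }

    public static void main(String[] args) {
        PrivateLockContext ctx = new PrivateLockContext();
        ctx.setAttribute("user", "alice");
        System.out.println(readWhileMonitorHeld(ctx, "user"));
    }
}
```

If getAttribute were declared synchronized instead, the reader thread would wait for the monitor held by the caller and the timed get would fail.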

I admit that the example is contrived and an over simplistic view of how a servlet container works, but IMHO it proves the point.

So I would make my design choice based on security consideration: will I have complete control over the code that has access to the instances? What would be the consequence of a thread's holding a lock on an instance indefinitely?

The discussion can be concluded by saying that we must first define the purpose of threading and the other relevant aspects of the application, and implement synchronization accordingly.

Wednesday, July 10, 2013

Threads: Difference between Synchronized Block and Synchronized Method

What is the difference between a synchronized method and synchronized block in Java?

What is the difference between synchronized methods and blocks?

Let’s look at some code to understand synchronized methods and synchronized blocks.

Synchronized method

public synchronized void synchronizedMethod(){
//Do something in this method
}

Synchronized block

public void synchronizedBlock(){

//Do Something before Synchronization

synchronized(this){

//Do Something here in synchronized block.

}

//Do Something after Synchronization

}

When a static method is synchronized, the Class object of the class (e.g. the one returned by Class.forName()) is locked, whereas in the case of an instance method, the instance itself (this) is locked.

When a block is synchronized, the developer has to choose the object to lock. The JVM does not lock anything implicitly here.

What the JLS says
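Which monitor is held in each case can be verified with Thread.holdsLock. The LockTargets class below is a hypothetical example written for this note:

```java
class LockTargets {
    // A static synchronized method holds the monitor of the Class object.
    static synchronized boolean staticMethodHoldsClassLock() {
        return Thread.holdsLock(LockTargets.class);
    }

    // An instance synchronized method holds the monitor of 'this'.
    synchronized boolean instanceMethodHoldsThisLock() {
        return Thread.holdsLock(this);
    }

    // A synchronized block holds only the monitor the developer chose.
    boolean blockHoldsOnlyChosenLock(Object chosen) {
        synchronized (chosen) {
            return Thread.holdsLock(chosen) && !Thread.holdsLock(this);
        }
    }

    public static void main(String[] args) {
        LockTargets targets = new LockTargets();
        System.out.println(staticMethodHoldsClassLock());
        System.out.println(targets.instanceMethodHoldsThisLock());
        System.out.println(targets.blockHoldsOnlyChosenLock(new Object()));
    }
}
```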

JLS 14.19 The synchronized Statement

A synchronized statement acquires a mutual-exclusion lock on behalf of the executing thread, executes a block, and then releases the lock. While the executing thread owns the lock, no other thread may acquire the lock.

JLS 8.4.3.6 synchronized Methods

A synchronized method acquires a monitor before it executes. For a class (static) method, the monitor associated with the Class object for the method's class is used. For an instance method, the monitor associated with this (the object for which the method was invoked) is used.

These are the same locks that can be used by the synchronized statement; thus, the code:

class Test {
    int count;
    synchronized void bump() { count++; }
    static int classCount;
    static synchronized void classBump() {
        classCount++;
    }
}

Has exactly the same effect as:

class BumpTest {
    int count;
    void bump() {
        synchronized (this) {
            count++;
        }
    }
    static int classCount;
    static void classBump() {
        try {
            synchronized (Class.forName("BumpTest")) {
                classCount++;
            }
        } catch (ClassNotFoundException e) {
            ...
        }
    }
}

Difference between block and method

Lock scope

For synchronized methods, the lock is held for the whole method, whereas with synchronized blocks the lock is held only for the duration of the block (otherwise known as the critical section).

Objects

In a synchronized method, the implicit monitor (this, or the Class object for a static method) is obtained, whereas in the case of blocks the developer has to decide which object to lock.

Let’s look at some JVM bytecode and IBM’s advice

When the JVM executes a synchronized method, the executing thread identifies that the method's method_info structure has the ACC_SYNCHRONIZED flag set, then it automatically acquires the object's lock, calls the method, and releases the lock. If an exception occurs, the thread automatically releases the lock.

Synchronizing a method block, on the other hand, bypasses the JVM's built-in support for acquiring an object's lock and exception handling and requires that the functionality be explicitly written in byte code. If you read the byte code for a method with a synchronized block, you will see more than a dozen additional operations to manage this functionality. Listing 1 shows calls to generate both a synchronized method and a synchronized block.

package com.geekcap;

public class SynchronizationExample {
    private int i;

    public synchronized int synchronizedMethodGet() {
        return i;
    }

    public int synchronizedBlockGet() {
        synchronized( this ) {
            return i;
        }
    }
}

The synchronizedMethodGet() method generates the following byte code:

0: aload_0
1: getfield #2
4: ireturn

And here's the byte code from the synchronizedBlockGet() method:

0: aload_0
1: dup
2: astore_1
3: monitorenter
4: aload_0
5: getfield #2
8: aload_1
9: monitorexit
10: ireturn
11: astore_2
12: aload_1
13: monitorexit
14: aload_2
15: athrow

Creating the synchronized block yielded 16 bytes of bytecode (14 instructions), whereas synchronizing the method yielded just 5 bytes (3 instructions).

So here is a piece of advice: whenever you think of using a synchronized block, think of minimizing the synchronized code. If you want to synchronize the complete code within the method, do not strain yourself; use a synchronized method and let the JVM optimize the code for you.

A quote from Effective Java 2nd Edition, Item 67: Avoid excessive synchronization:

As a rule, you should do as little work as possible inside synchronized regions.

Threads: Mutex and Semaphores

What is mutex and semaphore in Java ? What is the main difference?

Let’s understand it from Sun’s point of view.

Semaphores

A counting semaphore. Conceptually, a semaphore maintains a set of permits. Each acquire() blocks if necessary until a permit is available, and then takes it. Each release() adds a permit, potentially releasing a blocking acquirer. However, no actual permit objects are used; the Semaphore just keeps a count of the number available and acts accordingly.

Semaphores are often used to restrict the number of threads that can access some (physical or logical) resource.

Before obtaining an item each thread must acquire a permit from the semaphore, guaranteeing that an item is available for use. When the thread has finished with the item it is returned back to the pool and a permit is returned to the semaphore, allowing another thread to acquire that item. Note that no synchronization lock is held when acquire() is called as that would prevent an item from being returned to the pool. The semaphore encapsulates the synchronization needed to restrict access to the pool, separately from any synchronization needed to maintain the consistency of the pool itself.
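The pool described above can be sketched roughly as follows. This is a simplified illustration, not the Javadoc's own example: ConnectionPoolSketch and its methods are hypothetical names, and it uses the non-blocking tryAcquire() so that the demo never waits, where a real pool would typically call the blocking acquire().

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;

class ConnectionPoolSketch {
    private final Semaphore available;                     // one permit per pooled item
    private final Deque<String> items = new ArrayDeque<>();

    ConnectionPoolSketch(int size) {
        available = new Semaphore(size);
        for (int i = 0; i < size; i++) {
            items.push("item-" + i);
        }
    }

    // Takes an item if a permit is free; returns null instead of blocking.
    String tryTake() {
        if (!available.tryAcquire()) {
            return null;
        }
        // The permit guarantees an item exists; this lock only keeps the deque consistent.
        synchronized (items) {
            return items.pop();
        }
    }

    // Returns the item first, then hands the permit back to the semaphore.
    void giveBack(String item) {
        synchronized (items) {
            items.push(item);
        }
        available.release();
    }

    int freePermits() {
        return available.availablePermits();
    }

    public static void main(String[] args) {
        ConnectionPoolSketch pool = new ConnectionPoolSketch(2);
        String first = pool.tryTake();
        pool.tryTake();
        System.out.println("third take: " + pool.tryTake());
        pool.giveBack(first);
        System.out.println("free permits: " + pool.freePermits());
    }
}
```

Note that the semaphore and the lock on the deque are separate, which mirrors the last sentence of the quoted text.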

Mutual Exclusion/Mutex/Intrinsic Locks

A lock is a tool for controlling access to a shared resource by multiple threads. Commonly, a lock provides exclusive access to a shared resource: only one thread at a time can acquire the lock and all access to the shared resource requires that the lock be acquired first. However, some locks may allow concurrent access to a shared resource, such as the read lock of a ReadWriteLock.

The use of synchronized methods or statements provides access to the implicit monitor lock associated with every object, but forces all lock acquisition and release to occur in a block-structured way: when multiple locks are acquired they must be released in the opposite order, and all locks must be released in the same lexical scope in which they were acquired.

Analogy

Mutex:
Is a key to a toilet. One person can have the key - occupy the toilet - at a time. When finished, the person gives (frees) the key to the next person in the queue.
Officially: "Mutexes are typically used to serialize access to a section of re-entrant code that cannot be executed concurrently by more than one thread. A mutex object only allows one thread into a controlled section, forcing other threads which attempt to gain access to that section to wait until the first thread has exited from that section."

Semaphore:
Is the number of free identical toilet keys. For example, say we have four toilets with identical locks and keys. The semaphore count - the count of keys - is set to 4 at the beginning (all four toilets are free), then the count value is decremented as people come in. If all toilets are full, i.e. there are no free keys left, the semaphore count is 0. Now, when e.g. one person leaves the toilet, the semaphore is increased to 1 (one free key), and the key is given to the next person in the queue.

Officially: "A semaphore restricts the number of simultaneous users of a shared resource up to a maximum number. Threads can request access to the resource (decrementing the semaphore), and can signal that they have finished using the resource (incrementing the semaphore)."

A mutex is locking mechanism used to synchronize access to a resource. Only one task (can be a thread or process based on OS abstraction) can acquire the mutex. It means there will be ownership associated with mutex, and only the owner can release the lock (mutex).

Semaphore is a signaling mechanism (“I am done, you can carry on” kind of signal). For example, if you are listening to songs (assume it as one task) on your mobile and at the same time your friend calls you, an interrupt will be triggered upon which an interrupt service routine (ISR) will signal the call processing task to wake up.

Difference between Mutex and Semaphore

Suppose a thread is running which accepts client connections, and it can handle 10 clients simultaneously. Each new client then takes one count from the semaphore until it reaches 10. When the semaphore has 10 flags set, your thread won't accept new connections.

Mutexes are usually used for guarding stuff. Suppose your 10 clients can access multiple parts of the system. Then you can protect a part of the system with a mutex so that when 1 client is connected to that sub-system, no one else has access.

Semaphores have no notion of ownership, this means that any thread can release a semaphore (this can lead to many problems in itself but can help with "death detection").

Whereas a mutex does have the concept of ownership (i.e. you can only release a mutex you have acquired).

Difference between Mutex and Binary Semaphore

Source for the answer

A binary semaphore is a specialized form of semaphore where the count is 1, so most developers think that the two concepts can be implemented in a similar manner. But there are a few differences between a mutex and a binary semaphore; or rather, let us try to figure out the problems with a binary semaphore.

The problem with the semaphore is that any thread can increment it or decrement it. In particular, if the semaphore’s value is 0 (“locked”), another thread can increment it (“unlock”), even if this is not the thread which locked it!
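The ownership difference is easy to observe in Java. The sketch below uses java.util.concurrent.Semaphore as the binary semaphore and, as an assumption of this note, ReentrantLock as the mutex; OwnershipSketch and its method names are made up for the illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

class OwnershipSketch {
    // True if a thread that never acquired the permit can still release it.
    static boolean semaphoreReleasedByOtherThread() {
        Semaphore binary = new Semaphore(1);
        binary.acquireUninterruptibly();          // "locked" by the calling thread
        runAndJoin(new Thread(binary::release));  // "unlocked" by a different thread
        return binary.availablePermits() == 1;
    }

    // True if unlocking from a non-owner thread is rejected.
    static boolean lockRejectsNonOwnerUnlock() {
        ReentrantLock mutex = new ReentrantLock();
        mutex.lock();                             // owned by the calling thread
        boolean[] rejected = { false };
        runAndJoin(new Thread(() -> {
            try {
                mutex.unlock();                   // not the owner
            } catch (IllegalMonitorStateException e) {
                rejected[0] = true;
            }
        }));
        mutex.unlock();                           // the owner releases normally
        return rejected[0];
    }

    private static void runAndJoin(Thread thread) {
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("semaphore released by other thread: " + semaphoreReleasedByOtherThread());
        System.out.println("lock rejects non-owner unlock: " + lockRejectsNonOwnerUnlock());
    }
}
```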

Another problem is that most semaphore implementations allow sharing semaphores between processes.

A third problem is priority inversion: a semaphore locked by a low-priority process can block the whole OS if higher-priority processes need to wait for the semaphore to be released. This condition, though, happens only on an OS that uses fixed-priority preemptive scheduling; most RTOSes for embedded devices are prone to it.

For further reading

· MUTEX VS. SEMAPHORES – PART 1: SEMAPHORES

· MUTEX VS. SEMAPHORES – PART 2: THE MUTEX

· MUTEX VS. SEMAPHORES – PART 3 (FINAL PART): MUTUAL EXCLUSION PROBLEMS

Threads: Why threads are not garbage collected?

I just loved this question. Java Thread Garbage collected or not

import java.util.Calendar;

public class RunningThreadProblem {

    public static void main(String[] args) {
        Thread cur = new Thread(new ImplementedThread());
        cur.start(); // Line 1
        cur = null;  // Line 2: drops our only reference; the thread keeps running
    }
}

// Top-level (package-private) class so that the static main() can instantiate it.
class ImplementedThread implements Runnable {

    @Override
    public void run() {
        // Prints the current time once per second, forever.
        while (true) {
            try {
                System.out.println("Time -> " + Calendar.getInstance().getTimeInMillis());
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}

The output of this code is as follows:

Time -> 1373472403175

Time -> 1373472404176

Time -> 1373472405176

Time -> 1373472406176

Time -> 1373472407176

Time -> 1373472408176

Time -> 1373472409177

Time -> 1373472410177

Time -> 1373472411177

Time -> 1373472412177

Even after this output the program kept running, and I had to terminate the process. The GC did not collect the thread even though I had set the reference to null (Line 2) as soon as I had started the thread (Line 1). Based on general garbage collection concepts, the Thread object should have been eligible for collection. So why was this thread not garbage collected?

Answer

Running threads are, by definition, immune to GC. The GC begins its work by scanning "roots", which are deemed always reachable; roots include global variables ("static fields" in Java-talk) and the stacks of all running threads (it can be imagined that the stack of a running thread references the corresponding Thread instance).

When the garbage collector determines whether your object is 'reachable' or not, it is always doing so using the set of garbage collector roots as reference points.
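One way to observe this is through a WeakReference, which is cleared as soon as its target is no longer strongly reachable. The sketch below is just an illustration (the class and method names are made up, and System.gc() is only a hint to the JVM), but while the thread is alive the weak reference cannot be cleared, whatever the collector does.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.CountDownLatch;

class ThreadReachabilityDemo {

    // Returns true if the Thread object is still reachable after we drop our
    // only strong reference and request a collection while the thread runs.
    static boolean runningThreadSurvivesGc() {
        CountDownLatch stop = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                stop.await();                 // keep the thread alive until told to stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        WeakReference<Thread> ref = new WeakReference<>(worker);
        worker = null;                        // drop our strong reference to the Thread
        System.gc();                          // only a hint; the JVM may ignore it

        boolean stillReachable = ref.get() != null;
        stop.countDown();                     // let the worker finish
        return stillReachable;
    }

    public static void main(String[] args) {
        System.out.println("still reachable while running: " + runningThreadSurvivesGc());
    }
}
```

This prints true, because a live thread is itself a GC root and keeps its Thread object reachable.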

Tuesday, July 9, 2013

Threads: All about ThreadLocal

There is no specific source for this post, but I got the question from this post. Source

What is ThreadLocal?

The Java docs provide the following information about ThreadLocal:

public class ThreadLocal<T> extends Object

This class provides thread-local variables. These variables differ from their normal counterparts in that each thread that accesses one (via its get or set method) has its own, independently initialized copy of the variable. ThreadLocal instances are typically private static fields in classes that wish to associate state with a thread (e.g., a user ID or Transaction ID).

For example, in the class below, the private static ThreadLocal instance (serialNum) maintains a "serial number" for each thread that invokes the class's static SerialNum.get() method, which returns the current thread's serial number. (A thread's serial number is assigned the first time it invokes SerialNum.get(), and remains unchanged on subsequent calls.)

public class SerialNum {

    // The next serial number to be assigned
    private static int nextSerialNum = 0;

    private static final ThreadLocal<Integer> serialNum = new ThreadLocal<Integer>() {
        // Called once per thread, on that thread's first get(); synchronized
        // because every thread increments the shared nextSerialNum counter.
        @Override
        protected synchronized Integer initialValue() {
            return nextSerialNum++;
        }
    };

    public static int get() {
        return serialNum.get();
    }
}

Each thread holds an implicit reference to its copy of a thread-local variable as long as the thread is alive and the ThreadLocal instance is accessible; after a thread goes away, all of its copies of thread-local instances are subject to garbage collection (unless other references to these copies exist).

ThreadLocal variables are not shared across threads. Each thread that accesses one maintains its own copy of the value, instead of all threads sharing a single resource (as they would with an ordinary static or instance field). Synchronization is therefore not required for such variables. In Java this is one way to achieve thread safety; another way is to use immutable classes.
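To make the "own copy per thread" idea concrete, here is a small sketch (PerThreadCounterDemo and its methods are made-up names for illustration). One ThreadLocal instance is shared by every thread, yet each thread increments only its own value, and no synchronization is used.

```java
class PerThreadCounterDemo {

    // One ThreadLocal instance shared by all threads; each thread sees its own value.
    private static final ThreadLocal<Integer> COUNTER = ThreadLocal.withInitial(() -> 0);

    // Increments the calling thread's counter `times` times and returns the result.
    static int incrementInCurrentThread(int times) {
        for (int i = 0; i < times; i++) {
            COUNTER.set(COUNTER.get() + 1);   // touches only this thread's copy
        }
        return COUNTER.get();
    }

    // Runs the increments in a fresh thread and returns that thread's final count.
    static int incrementInNewThread(int times) {
        int[] result = new int[1];
        Thread t = new Thread(() -> result[0] = incrementInCurrentThread(times));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(incrementInNewThread(3)); // 3
        System.out.println(incrementInNewThread(5)); // 5, not 8: a new thread starts from 0
    }
}
```

If COUNTER were a plain static int, the second call would continue from the first thread's total and the increments would need synchronization.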

How are ThreadLocal and local variables different?

In most discussions it is said that, when there is a clear demarcation between the uses of the two, instance variables are preferred over ThreadLocals.

ThreadLocals are used to preserve the state of a thread. When we do not want to write a synchronized block, we try to avoid shared instance variables. The alternative to instance variables is either to pass every member to each method the thread calls or to define the specific member as a ThreadLocal. In both cases the state of the thread remains intact.

When and how should I use a ThreadLocal variable?

In Java, if you have a datum that can vary per-thread, your choices are to pass that datum around to every method that needs (or may need) it, or to associate the datum with the thread. Passing the datum around everywhere may be workable if all your methods already need to pass around a common "context" variable.

If that's not the case, you may not want to clutter up your method signatures with an additional parameter. In a non-threaded world, you could solve the problem with the Java equivalent of a global variable. In a threaded world, the equivalent of a global variable is a thread-local variable.

Essentially, use a ThreadLocal when you need a variable's value to depend on the current thread and it isn't convenient for you to attach the value to the thread in some other way (for example, by subclassing Thread).

A typical case is where some other framework has created the thread that your code is running in, e.g. a servlet container, or where it just makes more sense to use ThreadLocal because your variable is then "in its logical place" (rather than a variable hanging from a Thread subclass or in some other hash map).
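A rough sketch of that "associate the datum with the thread" choice is shown here. RequestContext, AuditLog and ContextDemo are hypothetical names I made up for illustration; they are not part of any framework. The point is that describe() reads the transaction ID without having it in its signature.

```java
class RequestContext {

    // Hypothetical holder for a per-thread transaction ID.
    private static final ThreadLocal<String> TRANSACTION_ID = new ThreadLocal<>();

    static void begin(String id) {
        TRANSACTION_ID.set(id);
    }

    static String currentTransactionId() {
        return TRANSACTION_ID.get();
    }

    static void end() {
        TRANSACTION_ID.remove();
    }
}

class AuditLog {

    // Deep in the call stack: no transaction-ID parameter in the signature.
    static String describe(String action) {
        return "[" + RequestContext.currentTransactionId() + "] " + action;
    }
}

class ContextDemo {

    // Binds the ID to the current thread for the duration of one request.
    static String handle(String transactionId, String action) {
        RequestContext.begin(transactionId);
        try {
            return AuditLog.describe(action);
        } finally {
            RequestContext.end();             // always unbind, even on exceptions
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("tx-42", "debit account")); // [tx-42] debit account
    }
}
```

The try/finally in handle() matters when the thread comes from a pool, because the same thread will later serve unrelated requests.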

1) ThreadLocals are fantastic for implementing per-thread singleton classes or per-thread context information such as a transaction ID.

2) You can wrap any non-thread-safe object in a ThreadLocal and its use becomes thread-safe, because each instance is only ever used by a single thread. One of the classic examples of ThreadLocal is sharing SimpleDateFormat. Since SimpleDateFormat is not thread-safe, a global formatter may not work, but a per-thread formatter certainly will.

3) ThreadLocal provides another way to extend Thread. If you want to preserve or carry information from one method call to another, you can carry it in a ThreadLocal. This can provide immense flexibility, as you don't need to modify any method signatures.
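The SimpleDateFormat case from point 2 might look like the sketch below. PerThreadFormatter is a made-up name, and the pattern, locale and time zone are arbitrary choices that keep the output predictable.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class PerThreadFormatter {

    // Each thread lazily gets its own SimpleDateFormat on its first call to get().
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    });

    // Safe to call from any thread: the formatter used is never shared.
    static String format(Date date) {
        return FORMAT.get().format(date);
    }

    // Returns true if another thread received a different formatter instance.
    static boolean otherThreadGetsOwnInstance() {
        SimpleDateFormat mine = FORMAT.get();
        SimpleDateFormat[] theirs = new SimpleDateFormat[1];
        Thread t = new Thread(() -> theirs[0] = FORMAT.get());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return theirs[0] != null && theirs[0] != mine;
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L)));          // 1970-01-01
        System.out.println(otherThreadGetsOwnInstance());  // true
    }
}
```

Within one thread the same formatter is reused on every call, so the cost of creating it is paid once per thread rather than once per call.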

Performance of ThreadLocal variable

It can be difficult to write efficient code that is safe for multithreaded access. Java's ThreadLocal class provides a powerful, easy-to-use solution, while avoiding the drawbacks of other approaches. Plus, ThreadLocal implementations are more efficient, particularly in later JVMs. If you are trying to improve the performance of frequently used classes that use nonthreadsafe resources that are expensive to create (such as XML parsers or connections to a database), try a ThreadLocal implementation. (Source)

How does ThreadLocal usage reduce re-usability?

They reduce reusability in much the same way that global variables do: when your method's computations depend on state which is external to the method, but not passed as parameters (i.e. class fields for example), your method is less reusable, because it's tightly coupled to the state of the object/class in which it resides (or worse, on a different class entirely).

ThreadLocal and Memory Leaks

Since a ThreadLocal is a reference to data within a given Thread, you can end up with classloading leaks when using ThreadLocals in application servers which use thread pools. You need to be very careful about cleaning up any ThreadLocals you get() or set() by using the ThreadLocal's remove() method.

If you do not clean up when you're done, any references it holds to classes loaded as part of a deployed webapp will remain in the permanent heap and will never get garbage collected. Redeploying/undeploying the webapp will not clean up each Thread's reference to your webapp's class(es) since the Thread is not something owned by your webapp. Each successive deployment will create a new instance of the class which will never be garbage collected.

You will end up with out of memory exceptions due to java.lang.OutOfMemoryError: PermGen space and after some googling will probably just increase -XX:MaxPermSize instead of fixing the bug.

If you use a ThreadLocal to store an object instance, there is a high risk that the object stored in the thread local is never garbage collected when your app runs inside an app server such as WebLogic Server, which manages a pool of worker threads, even after the class that created the ThreadLocal instance has itself been garbage collected.
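The retention problem can be reproduced in miniature with a single-thread executor standing in for the app server's pool. This is a sketch under that assumption, with made-up names (PooledThreadLocalDemo, USER): a value set by one task is still attached to the pooled thread when the next task runs, unless the first task calls remove().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PooledThreadLocalDemo {

    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    // Runs two tasks on the same pooled thread and returns what the second task sees.
    static String valueSeenBySecondTask(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable firstTask = () -> {
                USER.set("alice");
                try {
                    USER.get();               // stand-in for the task's real work
                } finally {
                    if (cleanUp) {
                        USER.remove();        // detach the value from the pooled thread
                    }
                }
            };
            Callable<String> secondTask = USER::get;

            pool.submit(firstTask).get();     // wait for the first task to finish
            return pool.submit(secondTask).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(valueSeenBySecondTask(false)); // alice
        System.out.println(valueSeenBySecondTask(true));  // null
    }
}
```

Without the cleanup, the pooled thread keeps a reference to the value for as long as the thread lives, which is exactly how a webapp's classes stay pinned after a redeploy.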

Clean up ThreadLocal resources

Josh Bloch (co-author of java.lang.ThreadLocal along with Doug Lea) wrote:

"The use of thread pools demands extreme care. Sloppy use of thread pools in combination with sloppy use of thread locals can cause unintended object retention, as has been noted in many places."

People were complaining about the bad interaction of ThreadLocal with thread pools even then. But Josh did sanction:

"Per-thread instances for performance. Aaron's SimpleDateFormat example (above) is one example of this pattern."

Some Lessons

· If you put any kind of objects into any object pool, you must provide a way to remove them 'later'.

· If you 'pool' using a ThreadLocal, you have limited options for doing that. Either: a) you know that the Thread(s) where you put values will terminate when your application is finished; OR b) you can later arrange for the same thread that invoked ThreadLocal#set() to invoke ThreadLocal#remove() whenever your application terminates.

· As such, your use of ThreadLocal as an object pool is going to exact a heavy price on the design of your application and your class. The benefits don't come for free.

· As such, use of ThreadLocal is probably a premature optimization, even though Joshua Bloch urged you to consider it in 'Effective Java'.