Thursday, July 1, 2010

Equals and HashCode

In java each class internally extends the Object class, which in turns implies that all the methods which are available to the Object class are also available to all the class in the project.

Object class has numerous methods but among them there are few methods which should be overridden for each class. Among those method 2 methods are equals and hashCode.

Source code for equals method is as shown below:

public boolean equals(Object obj) {
    return (this == obj);
}

According to this, whenever references of parameter object is similar to the reference of this, equals returns true true otherwise false.


Source code for hashCode method is as shown below:



public native int hashCode();

This method returns a integer which is unique to each object and if will return same hashCode is called n number of times on the same object.


Now questions comes why do I need to overrides these methods which is already implemented by Java?


Lets look at the example mentioned below:



class UniqueIdentifier {
    String name = "";
    int id = 0;  
    public UniqueIdentifier(int ids, String n) {
        name = n;
        id = ids;
    }  
}
 
public class UnderstandingEqualsHashCode {        
    public static void main(String[] args){
        UniqueIdentifier person1 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person2 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person3 = person1;
        
        System.out.println("Person1:" + person1.toString());
        System.out.println("Person2:" + person2.toString());
        System.out.println("Person3:" + person3.toString());
        
        System.out.println("Person1 HashCode:" + person1.hashCode());
        System.out.println("Person2 HashCode:" + person2.hashCode());
        System.out.println("Person3 HashCode:" + person3.hashCode());
 
        if(person1.equals(person2)){
            System.out.println("Person1 and person2 are same.");
        }else{
            System.out.println("Person1 and person2 are different.");
        }
        
        if(person1.equals(person3)){
            System.out.println("Person1 and person3 are same.");
        }
        
    }
    
}
If you will execute the above code the output will be:

Person1:UniqueIdentifier@19821f
Person2:UniqueIdentifier@addbf1
Person3:UniqueIdentifier@19821f
Person1 HashCode:1671711
Person2 HashCode:11394033
Person3 HashCode:1671711
Person1 and person2 are different.
Person1 and person3 are same.

Shocked?

First 3 lines shows the results of the toString method, where you can see Class name along with the hexadecimal codes. So here we can see toString returns the class name appended by hashCode in hexadecimal format.

As you can see, for a human person1 and person2 are the equal objects but Java does not considering them same as it has to allocate the different memory and accordingly a different hashCode, whereas person1 and person3 are same for java because person1 and person3 are referring to the same memory location.

Now it must be clear why do we need to implement equals… But if we are implementing equals then due to a contract between hashCode and equals, it becomes programmers responsibility to define the hashCode as well. Though java does not force programmer to implement both.

The general contract between hashCode() and equals() methods is:


  • Whenever hashCode is invoked on the same object more than once during an execution of a Java application, it must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

Look at the example mentioned below



class UniqueIdentifier {
    String name = "";
    int id = 0;
    public UniqueIdentifier(int ids, String n) {
        name = n;
        id = ids;
    }
    public boolean equals(Object obj){
        if( obj instanceof UniqueIdentifier ){
            UniqueIdentifier ui = (UniqueIdentifier)obj;
            if(name.equals(ui.name) && (id == ui.id)){
                return true;
            }
        }
        return false;
    }
}
 
public class UnderstandingEqualsHashCode {
    public static void main(String[] args){
        UniqueIdentifier person1 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person2 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person3 = person1;
        UniqueIdentifier person4 = new UniqueIdentifier(105, "XYZ");
        Object abs = new Object();
        System.out.println("Person1:" + person1.toString());
        System.out.println("Person2:" + person2.toString());
        System.out.println("Person3:" + person3.toString());
        System.out.println("Person1 HashCode:" + person1.hashCode());
        System.out.println("Person2 HashCode:" + person2.hashCode());
        System.out.println("Person3 HashCode:" + person3.hashCode());
        if(person1.equals(person2)){
            System.out.println("Person1 and person2 are same.");
        }else{
            System.out.println("Person1 and person2 are different.");
        }
        if(person1.equals(person4)){
            System.out.println("Person1 and person4 are same.");
        }else{
            System.out.println("Person1 and person4 are different.");
        }
        if(person1.equals(abs)){
            System.out.println("Person1 and abs are same.");
        }else{
            System.out.println("Person1 and abs are different.");
        }
        if(person1.equals(person3)){
            System.out.println("Person1 and person3 are same.");
        }
    }
}


In the above mentioned example equals method has been implemented whereas hashCode is not implemented yet.


There 4 conditions have been tested for equality among



  • Person1 and person2.
  • Person1 and person4
  • Person1 and person3
  • Person1 and abs

Results of the above tests are


Person1:UniqueIdentifier@19821f
Person2:UniqueIdentifier@addbf1
Person3:UniqueIdentifier@19821f
Person1 HashCode:1671711
Person2 HashCode:11394033
Person3 HashCode:1671711
Person1 and person2 are same.
Person1 and person4 are different.
Person1 and abs are different.
Person1 and person3 are same.

Results are little shocking… Is not it? Even though hashCode is not implemented, equality test between person1 and person2 shows different results…

I just want to state equality test does not depend whether hashCode has been implemented or not, but it is good to have it.

Ok, when it does not matter for the equality test then why bother about implementing it?

Have a look on the code mentioned below:


import java.util.HashSet;
import java.util.Set;
 
class UniqueIdentifier {
    String name = "";
    int id = 0;
    public UniqueIdentifier(int ids, String n) {
        name = n;
        id = ids;
    }
    public boolean equals(Object obj){
        if( obj instanceof UniqueIdentifier ){
            UniqueIdentifier ui = (UniqueIdentifier)obj;
            if(name.equals(ui.name) && (id == ui.id)){
                return true;
            }
        }
        return false;
    }
}
 
public class UnderstandingEqualsHashCode {
    public static void main(String[] args){
        UniqueIdentifier person1 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person2 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person3 = person1;
        UniqueIdentifier person4 = new UniqueIdentifier(105, "XYZ");
        UniqueIdentifier person5 = new UniqueIdentifier(105, "XYZ");
        
        System.out.println("Person1 HashCode:" + person1.hashCode());
        System.out.println("Person2 HashCode:" + person2.hashCode());
        System.out.println("Person3 HashCode:" + person3.hashCode());
        System.out.println("Person4 HashCode:" + person4.hashCode());
        System.out.println("Person5 HashCode:" + person5.hashCode());
        
        Set<UniqueIdentifier> personSet = new HashSet<UniqueIdentifier>();
        personSet.add(person1);
        personSet.add(person2);
        personSet.add(person3);
        personSet.add(person4);
        personSet.add(person5);
        personSet.add(person1);
        
        System.out.println("Size:" + personSet.size());
    }
}


Here I have created 5 references of the UniqueIdentifier class and I added all of them in a HashSet. A set can only have unique entries. So uniqueness of the object is defined by equal method. If java finds there is an element equal to object to be added, then it discard the object without adding it, else creates a new entry in the set.


On the basis of logic stated above, it is clear size of the Set should be 2. Now lets look for the output of program listed above.


Person1 HashCode:11394033
Person2 HashCode:4384790
Person3 HashCode:11394033
Person4 HashCode:9634993
Person5 HashCode:1641745
Size:4


How did the count reach to 4, where it is quite obvious it should contain only 2 records? There should be some bug in the code, but it is not. Okay to understand little more lets understand the logic to add object in HashSet class.



HashSet.java
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
 
HashMap.java
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
 
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}


From the above code it is quite clear, that hashCode is used to identify the bucket of the element and after identification the bucket it tries to iterate through the elements of the existing bucket and try equating them. In last example where hashCode was not implemented, element will fetch object’s hashcode and try to find the bucket if it could not find any it will add the element separately as a different entry.


Have a look on the code mentioned below:



import java.util.HashSet;
import java.util.Set;
 
class UniqueIdentifier {
    String name = "";
    int id = 0;
    public UniqueIdentifier(int ids, String n) {
        name = n;
        id = ids;
    }
    public boolean equals(Object obj){
        if( obj instanceof UniqueIdentifier ){
            UniqueIdentifier ui = (UniqueIdentifier)obj;
            if(name.equals(ui.name) && (id == ui.id)){
                return true;
            }
        }
        return false;
    }
    
    public int hashCode(){
        return (name.hashCode() + id);
    }
    
    public String toString(){
        return("Name:" + name + " Id" + id);
    }
}
 
public class UnderstandingEqualsHashCode {
    public static void main(String[] args){
        UniqueIdentifier person1 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person2 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person3 = person1;
        UniqueIdentifier person4 = new UniqueIdentifier(105, "XYZ");
        UniqueIdentifier person5 = new UniqueIdentifier(105, "XYZ");
        
        System.out.println("Person1 HashCode:" + person1.hashCode());
        System.out.println("Person2 HashCode:" + person2.hashCode());
        System.out.println("Person3 HashCode:" + person3.hashCode());
        System.out.println("Person4 HashCode:" + person4.hashCode());
        System.out.println("Person5 HashCode:" + person5.hashCode());
        
        Set<UniqueIdentifier> personSet = new HashSet<UniqueIdentifier>();
        personSet.add(person1);
        personSet.add(person2);
        personSet.add(person3);
        personSet.add(person4);
        personSet.add(person5);
        personSet.add(person1);
        
        System.out.println("Size:" + personSet.size());
    }
}

Here in this code, i have implemented hashCode, equals and toString (Not necessary) methods. Now if you look at the results you will see the results that is expected while writing the class and adding them into hashSet.


Here is the result of the updated class.

Person1 HashCode:87517
Person2 HashCode:87517
Person3 HashCode:87517
Person4 HashCode:87522
Person5 HashCode:87522
Size:2

Miracle, this is what i expected. So it is quite clear if we want to support equal in a class it makes sense to implement hashCode to achieve the desired results. 

Though it is possible to define a constant hashCode, as it meets with the contract between hashCode and equals, but defining constant hashCode will abolish the purpose of hashing and worsen the performance of the hashing based algorithm. So we need to take care of the equals, hashCode contract and provide the best possible hashing algorithm to make the hash based algorithms more efficient and fast.


Disclaimer: All the material here is the result of research and understanding of other material. If there is some content which is either incorrect or inappropriate to the context of the article I am always open for discussion.