Saturday, July 24, 2010

Breadth First Search Algorithm

Here we will discuss one among the most important topic of DS i.e. Graphs. Breadth First Search Algorithm

BFS is the simplest algorithm to find a node in a graph.

Given a graph G = (V, E) and a vertex s to search, breadth first search algorithm finds all the elements reachable element from s, along with finding the element it computes the distance of the each reachable vertices and creates a BFT with root as s. For any vertex u reachable from s, the path in the BFT from s to u corresponds to the Shortest path in G.
Breadth First Search expands frontier between discovered and undiscovered uniformly across the breadth. i.e. As part of this algorithm first all the node at height K is discovered before discovering any node at depth k+1… BFS works both for directed and undirected graphs.

Algorithm

In this algo to keep track the progress, three colors will be used.
  • White: Initial stage of all the vertices.
  • Black: Vertex has been marked black if it does not have any white vertices next to it.
  • Grey: A vertex is marked grey if it has adjacent vertices of any color.
BFS constructs BFT having s as source. Whenever a white vertex v is discovered, in the course of scanning all adjacency vertices of u (Already discovered vertex), new edge(u,v) is added to tree. This is the way the whole graph will be accessed. To store grey vertices, queue data structure is used. Algo is using queue to use the vertices in the order they have been discovered.
Graphic display of the algorithm
G = (V,E)
V = {r, s, t , u, v, w, x, y};
E = {r-s, s-w, r-v, w-t, w-x, t-u, t-x, u-y, x-y}
First step: All the vertices are marked as white except source vertices. We will mark source vertex as Grey and add it into the queue.
bfs1
Second Step: It will retrieve one element from the queue and mark the fetched element as Black and will push its adjacent elements to the queue marking them as grey.
bfs2  \
here as you can see, s has been deleted from the queue and now marked as black. Adjacent element of s i.e. w and r are added to the queue.
Next: Second step will be followed till queue do not have any more elements. At each vertex level the distance of the vertex is also stored.
bfs9
 bfs3
 bfs4
bfs5
bfs6
bfs7
bfs8
Pseudo Code
Step 1: Assign white color to all the vertices, infinite for each vertex and to make it a tree it is necessary to have parent information, assign parent vertex as null to all vertices.
Step 2: Add source to the queue. Assign distance 0. Change the color of the source as grey.
Step 3: Retrieve first element from the queue. Mark the retrieved element as black.
    1. Find all the adjacent white vertices of the retrieve vertex.
    2. Mark them grey.
    3. Set the distance of the adjacent vertices to one more than the distance of existing vertex.
    4. Add them to queue.
Step 4: Repeat the step 3 until queue is empty.
Pseudo Code in programmatically
   1: BFS(G,s)
   2:  
   3: for each vertex u in V[G] - {s}
   4:     do  color[u] = white
   5:         dist[u] = inf
   6:         parent[u] = nil
   7:         
   8: color[s] = G
   9: dist[s] = 0
  10: parent[s] = nil
  11: Q.add(s)
  12:  
  13: while( Q is empty){
  14:     
  15:     do     u = delete(Q){
  16:         for each v in adj[u] {
  17:             
  18:             do if color[v] = w {
  19:                 
  20:                 color[v] = G
  21:                 parent[v] = u
  22:                 dist[v] = dist[u] + 1
  23:                 Q.add(v)
  24:                 
  25:             }
  26:             
  27:         }
  28:     }
  29:     
  30:     color[u] = B
  31:     
  32: }
Java Code
There is the Java code for the pseudo code mentioned above. I do not think there is need to explain this here.
   1: import java.util.HashMap;
   2: import java.util.List;
   3: import java.util.Map;
   4: import java.util.Queue;
   5: import java.util.Set;
   6: import java.util.concurrent.SynchronousQueue;
   7:  
   8:  
   9: public class BFS {
  10:  
  11:     Map<String, List<String>> graph;
  12:     BFTNode sourceNode;
  13:     Map<String, BFTNode> tree = new HashMap<String, BFTNode>();
  14:     
  15:     BFS(Map<String, List<String>> g, String source){
  16:         graph = g;
  17:         sourceNode = new BFTNode(source);
  18:     }
  19:     
  20:     void createBFTNodes(){
  21:         Set<String> vertices = graph.keySet();
  22:         for(String vertex : vertices){
  23:             tree.put(vertex, new BFTNode(vertex));
  24:             
  25:         }
  26:     }
  27:     
  28:     void algorithm(){
  29:         
  30:         createBFTNodes();
  31:         sourceNode.setColor(1);
  32:  
  33:         Queue<BFTNode> Q = new SynchronousQueue<BFTNode>();
  34:         Q.add(sourceNode);
  35:  
  36:         do{
  37:             
  38:             BFTNode node = Q.remove();
  39:             List<String> adjacentVertices = graph.get(node.getVertex());
  40:             
  41:             for(String adjacentVertex : adjacentVertices){
  42:                 
  43:                 BFTNode temp = new BFTNode(adjacentVertex);
  44:                 temp.setColor(1);
  45:                 temp.setDist(node.getDist() + 1);
  46:                 temp.setParent(node);
  47:                 Q.add(temp);
  48:                 tree.put(adjacentVertex, temp);
  49:                 
  50:             }
  51:             
  52:             node.setColor(2);
  53:             tree.put(node.getVertex(), node);
  54:             
  55:         }while(Q.size() > 0);
  56:         
  57:     }
  58:  
  59: }
  60:  
  61: //This class has getter/Setter method for all the property that a tree node require.
  62: class BFTNode{
  63:     
  64:     private int dist = 0;
  65:     private int color = 0;
  66:     private BFTNode parent = null;
  67:     private String vertex = "";
  68:     
  69:     public BFTNode(String name) {
  70:         vertex = name;
  71:     }
  72:     
  73:     public String getVertex() {
  74:         return vertex;
  75:     }
  76:  
  77:     public int getDist() {
  78:         return dist;
  79:     }
  80:  
  81:     public void setDist(int dist) {
  82:         this.dist = dist;
  83:     }
  84:  
  85:     public int getColor() {
  86:         return color;
  87:     }
  88:  
  89:     public void setColor(int color) {
  90:         this.color = color;
  91:     }
  92:  
  93:     public BFTNode getParent() {
  94:         return parent;
  95:     }
  96:  
  97:     public void setParent(BFTNode parent) {
  98:         this.parent = parent;
  99:     }
 100:  
 101:     public boolean equals(Object anotherObject){
 102:         if (anotherObject instanceof BFTNode){
 103:             return vertex.equals(((BFTNode)anotherObject).vertex);
 104:         }
 105:         return false;
 106:     }
 107:  
 108:     public int hashCode(){
 109:         return vertex.hashCode();
 110:     }
 111:     
 112:     public String toString(){
 113:         if(parent != null)
 114:             return ("Vertex:" + vertex + " Color:" + color + " Dist:" + dist + " Parent:" + parent.vertex);
 115:         else
 116:             return ("Vertex:" + vertex + " Color:" + color + " Dist:" + dist + " Parent: Null");
 117:     }
 118: }
Analysis
It is always mandatory to know how is the performance and resource utilization of the algorithm.
Time taken for the algorithm is O(V+E). This algorithm has linear time complexity which particularly depends upon the sum of the vertices and edges of the graph.
Other Algos implementing this algorithm internally
  1. Dijkstra’s Single Source Shortest Path Algorithm
  2. Prim’s Minimum Spanning tree Algorithm
Disclaimer: All the material here is the result of research and understanding of other material. If there is some content which is either incorrect or inappropriate to the context of the article I am always open for discussion.

Thursday, July 1, 2010

Equals and HashCode

In java each class internally extends the Object class, which in turns implies that all the methods which are available to the Object class are also available to all the class in the project.

Object class has numerous methods but among them there are few methods which should be overridden for each class. Among those method 2 methods are equals and hashCode.

Source code for equals method is as shown below:

public boolean equals(Object obj) {
    return (this == obj);
}

According to this, whenever references of parameter object is similar to the reference of this, equals returns true true otherwise false.


Source code for hashCode method is as shown below:



public native int hashCode();

This method returns a integer which is unique to each object and if will return same hashCode is called n number of times on the same object.


Now questions comes why do I need to overrides these methods which is already implemented by Java?


Lets look at the example mentioned below:



class UniqueIdentifier {
    String name = "";
    int id = 0;  
    public UniqueIdentifier(int ids, String n) {
        name = n;
        id = ids;
    }  
}
 
public class UnderstandingEqualsHashCode {        
    public static void main(String[] args){
        UniqueIdentifier person1 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person2 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person3 = person1;
        
        System.out.println("Person1:" + person1.toString());
        System.out.println("Person2:" + person2.toString());
        System.out.println("Person3:" + person3.toString());
        
        System.out.println("Person1 HashCode:" + person1.hashCode());
        System.out.println("Person2 HashCode:" + person2.hashCode());
        System.out.println("Person3 HashCode:" + person3.hashCode());
 
        if(person1.equals(person2)){
            System.out.println("Person1 and person2 are same.");
        }else{
            System.out.println("Person1 and person2 are different.");
        }
        
        if(person1.equals(person3)){
            System.out.println("Person1 and person3 are same.");
        }
        
    }
    
}
If you will execute the above code the output will be:

Person1:UniqueIdentifier@19821f
Person2:UniqueIdentifier@addbf1
Person3:UniqueIdentifier@19821f
Person1 HashCode:1671711
Person2 HashCode:11394033
Person3 HashCode:1671711
Person1 and person2 are different.
Person1 and person3 are same.

Shocked?

First 3 lines shows the results of the toString method, where you can see Class name along with the hexadecimal codes. So here we can see toString returns the class name appended by hashCode in hexadecimal format.

As you can see, for a human person1 and person2 are the equal objects but Java does not considering them same as it has to allocate the different memory and accordingly a different hashCode, whereas person1 and person3 are same for java because person1 and person3 are referring to the same memory location.

Now it must be clear why do we need to implement equals… But if we are implementing equals then due to a contract between hashCode and equals, it becomes programmers responsibility to define the hashCode as well. Though java does not force programmer to implement both.

The general contract between hashCode() and equals() methods is:


  • Whenever hashCode is invoked on the same object more than once during an execution of a Java application, it must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

Look at the example mentioned below



class UniqueIdentifier {
    String name = "";
    int id = 0;
    public UniqueIdentifier(int ids, String n) {
        name = n;
        id = ids;
    }
    public boolean equals(Object obj){
        if( obj instanceof UniqueIdentifier ){
            UniqueIdentifier ui = (UniqueIdentifier)obj;
            if(name.equals(ui.name) && (id == ui.id)){
                return true;
            }
        }
        return false;
    }
}
 
public class UnderstandingEqualsHashCode {
    public static void main(String[] args){
        UniqueIdentifier person1 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person2 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person3 = person1;
        UniqueIdentifier person4 = new UniqueIdentifier(105, "XYZ");
        Object abs = new Object();
        System.out.println("Person1:" + person1.toString());
        System.out.println("Person2:" + person2.toString());
        System.out.println("Person3:" + person3.toString());
        System.out.println("Person1 HashCode:" + person1.hashCode());
        System.out.println("Person2 HashCode:" + person2.hashCode());
        System.out.println("Person3 HashCode:" + person3.hashCode());
        if(person1.equals(person2)){
            System.out.println("Person1 and person2 are same.");
        }else{
            System.out.println("Person1 and person2 are different.");
        }
        if(person1.equals(person4)){
            System.out.println("Person1 and person4 are same.");
        }else{
            System.out.println("Person1 and person4 are different.");
        }
        if(person1.equals(abs)){
            System.out.println("Person1 and abs are same.");
        }else{
            System.out.println("Person1 and abs are different.");
        }
        if(person1.equals(person3)){
            System.out.println("Person1 and person3 are same.");
        }
    }
}


In the above mentioned example equals method has been implemented whereas hashCode is not implemented yet.


There 4 conditions have been tested for equality among



  • Person1 and person2.
  • Person1 and person4
  • Person1 and person3
  • Person1 and abs

Results of the above tests are


Person1:UniqueIdentifier@19821f
Person2:UniqueIdentifier@addbf1
Person3:UniqueIdentifier@19821f
Person1 HashCode:1671711
Person2 HashCode:11394033
Person3 HashCode:1671711
Person1 and person2 are same.
Person1 and person4 are different.
Person1 and abs are different.
Person1 and person3 are same.

Results are little shocking… Is not it? Even though hashCode is not implemented, equality test between person1 and person2 shows different results…

I just want to state equality test does not depend whether hashCode has been implemented or not, but it is good to have it.

Ok, when it does not matter for the equality test then why bother about implementing it?

Have a look on the code mentioned below:


import java.util.HashSet;
import java.util.Set;
 
class UniqueIdentifier {
    String name = "";
    int id = 0;
    public UniqueIdentifier(int ids, String n) {
        name = n;
        id = ids;
    }
    public boolean equals(Object obj){
        if( obj instanceof UniqueIdentifier ){
            UniqueIdentifier ui = (UniqueIdentifier)obj;
            if(name.equals(ui.name) && (id == ui.id)){
                return true;
            }
        }
        return false;
    }
}
 
public class UnderstandingEqualsHashCode {
    public static void main(String[] args){
        UniqueIdentifier person1 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person2 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person3 = person1;
        UniqueIdentifier person4 = new UniqueIdentifier(105, "XYZ");
        UniqueIdentifier person5 = new UniqueIdentifier(105, "XYZ");
        
        System.out.println("Person1 HashCode:" + person1.hashCode());
        System.out.println("Person2 HashCode:" + person2.hashCode());
        System.out.println("Person3 HashCode:" + person3.hashCode());
        System.out.println("Person4 HashCode:" + person4.hashCode());
        System.out.println("Person5 HashCode:" + person5.hashCode());
        
        Set<UniqueIdentifier> personSet = new HashSet<UniqueIdentifier>();
        personSet.add(person1);
        personSet.add(person2);
        personSet.add(person3);
        personSet.add(person4);
        personSet.add(person5);
        personSet.add(person1);
        
        System.out.println("Size:" + personSet.size());
    }
}


Here I have created 5 references of the UniqueIdentifier class and I added all of them in a HashSet. A set can only have unique entries. So uniqueness of the object is defined by equal method. If java finds there is an element equal to object to be added, then it discard the object without adding it, else creates a new entry in the set.


On the basis of logic stated above, it is clear size of the Set should be 2. Now lets look for the output of program listed above.


Person1 HashCode:11394033
Person2 HashCode:4384790
Person3 HashCode:11394033
Person4 HashCode:9634993
Person5 HashCode:1641745
Size:4


How did the count reach to 4, where it is quite obvious it should contain only 2 records? There should be some bug in the code, but it is not. Okay to understand little more lets understand the logic to add object in HashSet class.



HashSet.java
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
 
HashMap.java
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
 
    modCount++;
    addEntry(hash, key, value, i);
    return null;
}


From the above code it is quite clear, that hashCode is used to identify the bucket of the element and after identification the bucket it tries to iterate through the elements of the existing bucket and try equating them. In last example where hashCode was not implemented, element will fetch object’s hashcode and try to find the bucket if it could not find any it will add the element separately as a different entry.


Have a look on the code mentioned below:



import java.util.HashSet;
import java.util.Set;
 
class UniqueIdentifier {
    String name = "";
    int id = 0;
    public UniqueIdentifier(int ids, String n) {
        name = n;
        id = ids;
    }
    public boolean equals(Object obj){
        if( obj instanceof UniqueIdentifier ){
            UniqueIdentifier ui = (UniqueIdentifier)obj;
            if(name.equals(ui.name) && (id == ui.id)){
                return true;
            }
        }
        return false;
    }
    
    public int hashCode(){
        return (name.hashCode() + id);
    }
    
    public String toString(){
        return("Name:" + name + " Id" + id);
    }
}
 
public class UnderstandingEqualsHashCode {
    public static void main(String[] args){
        UniqueIdentifier person1 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person2 = new UniqueIdentifier(100, "XYZ");
        UniqueIdentifier person3 = person1;
        UniqueIdentifier person4 = new UniqueIdentifier(105, "XYZ");
        UniqueIdentifier person5 = new UniqueIdentifier(105, "XYZ");
        
        System.out.println("Person1 HashCode:" + person1.hashCode());
        System.out.println("Person2 HashCode:" + person2.hashCode());
        System.out.println("Person3 HashCode:" + person3.hashCode());
        System.out.println("Person4 HashCode:" + person4.hashCode());
        System.out.println("Person5 HashCode:" + person5.hashCode());
        
        Set<UniqueIdentifier> personSet = new HashSet<UniqueIdentifier>();
        personSet.add(person1);
        personSet.add(person2);
        personSet.add(person3);
        personSet.add(person4);
        personSet.add(person5);
        personSet.add(person1);
        
        System.out.println("Size:" + personSet.size());
    }
}

Here in this code, i have implemented hashCode, equals and toString (Not necessary) methods. Now if you look at the results you will see the results that is expected while writing the class and adding them into hashSet.


Here is the result of the updated class.

Person1 HashCode:87517
Person2 HashCode:87517
Person3 HashCode:87517
Person4 HashCode:87522
Person5 HashCode:87522
Size:2

Miracle, this is what i expected. So it is quite clear if we want to support equal in a class it makes sense to implement hashCode to achieve the desired results. 

Though it is possible to define a constant hashCode, as it meets with the contract between hashCode and equals, but defining constant hashCode will abolish the purpose of hashing and worsen the performance of the hashing based algorithm. So we need to take care of the equals, hashCode contract and provide the best possible hashing algorithm to make the hash based algorithms more efficient and fast.


Disclaimer: All the material here is the result of research and understanding of other material. If there is some content which is either incorrect or inappropriate to the context of the article I am always open for discussion.