Friday, January 17, 2014

Serialization: All about Persisting the Object

I was thinking to improve my java fundamentals, so I planned to make few notes for myself that will assist me in future (Obviously at the time of Interviews). Following questions and answers have been taken from stackoverflow.com or such other sites. This post is a sticky note post for me. Please do not use the answers for reference purpose.

I have heard lot of Serialization, but do I really know the real purpose of Serialization? When someone tells me that serialization is required to save data in DBs and files, then it confuses my small brain. Did I ever serialize a class explicitly? Let’s find some answers to our question.

What do you mean by Serialization?
Serialization is the conversion of an object to a series of bytes, so that the object can be easily saved to persistent storage or streamed across a communication link. The byte stream can then be deserialised - converted into a replica of the original object.

Objects to be stored and retrieved frequently refer to other objects. Those other objects must be stored and retrieved at the same time to maintain the relationships between the objects. When an object is stored, all of the objects that are reachable from that object are stored as well.

The goals for serializing JavaTM objects are to:
  •  Have a simple yet extensible mechanism.
  • Maintain the JavaTM object type and safety properties in the serialized form.
  • Be extensible to support marshaling and unmarshaling as needed for remote objects.
  •  Be extensible to support simple persistence of JavaTM objects.
  • Require per class implementation only for customization.
  • Allow the object to define its external format.
Understanding Serialization with an analogy
After hard work of many years, Earth's scientist developed a robot who can help them in daily work. But this robot was less featured than the robots which were developed by the scientist of Mars planet.
After a meeting between both planet's scientist, it is decided that mars will send their robots to earth. But a problem occurred. The cost of sending 100 robots to earth was $100 millions. And it takes around 60 days for traveling.
Finally, Mar's scientist decided to share their secret with Earth's scientists. This secret was about the structure of class/robot. Earth's scientists developed the same structure on earth itself. Mar's scientistsserialized the data of each robot and send it to earth. Earth's scientists deserialized the data and fed it into each robot accordingly.

This process saved the time in communicating mass amount of data.
Some of the robots were being used in some defensive work on Mars. So their scientists marked some crucial properties of those robots as transient before sending their data to Earth. Note that transient property is set to null(in case of reference) or to default value(in case of primitive type) when the object gets deserialized.

One more point which was noticed by Earth's scientists is that Mars's scientists ask them to create some static variables to keep environmental detail. This detail is used by some robots. But Mars's scientists dint share this detail because the environment on earth was different from Mars.

Understand Serialization with Real life example
  • Banking example: When the account holder tries to withdraw money from the server through ATM, the account holder information along with the withdrawl details will be serialized (marshalled/flattened to bytes) and sent to server where the details are deserialized (unmarshalled/rebuilt the bytes)and used to perform operations. This will reduce the network calls as we are serializing the whole object and sending to server and further request for information from client is not needed by the server.
  • Stock example: Lets say an user wants the stock updates immediately when he request for it. To achieve this, everytime we have an update, we can serialize it and save it in a file. When user requests the information, deserialize it from file and provide the information. This way we dont need to make the user wait for the information until we hit the database, perform computations and get the result.

Why should Serialization be used?
·         To persist data for future use. It is possible to persist data in DB server, file system or other mean which can hold it forever.
·         To send data to a remote computer using such client/server Java technologies as RMI or socket programming.
  • To "flatten" an object into array of bytes in memory.
  • To exchange data between applets and servlets.
  • To store user session in Web applications.
  • To activate/passivate enterprise java beans.
  • To send objects between the servers in a cluster.

Terms to understand serialization

Static fields
If you are a java programmer, static fields does not require any explanation here. You must remember that static fields are associated with the class instead the object, so while serializing the state of the object, these fields are not serialized with object data.

The serialization runtime associates with each serializable class a version number, called a serialVersionUID, which is used during deserialization to verify that the sender and receiver of a serialized object have loaded classes for that object that are compatible with respect to serialization. If the receiver has loaded a class for the object that has a different serialVersionUID than that of the corresponding sender's class, then deserialization will result in an InvalidClassException. A serializable class can declare its own serialVersionUID explicitly by declaring a field named "serialVersionUID" that must be static, final, and of type long:

ANY-ACCESS-MODIFIER static final long serialVersionUID = 42L;

If a serializable class does not explicitly declare a serialVersionUID, then the serialization runtime will calculate a default serialVersionUID value for that class based on various aspects of the class, as described in the Java(TM) Object Serialization Specification. However, it is strongly recommended that all serializable classes explicitly declare serialVersionUID values, since the default serialVersionUID computation is highly sensitive to class details that may vary depending on compiler implementations, and can thus result in unexpected InvalidClassExceptions during deserialization. Therefore, to guarantee a consistent serialVersionUID value across different java compiler implementations, a serializable class must declare an explicit serialVersionUID value. It is also strongly advised that explicit serialVersionUID declarations use the private modifier where possible, since such declarations apply only to the immediately declaring class--serialVersionUID fields are not useful as inherited members.


So serialVersionUID is used for following 2 scenarios
  • SerialVersionUID acts as meta-data for Serialization and Deserialization process.
  • SerialVersionUID is changed to reject unused serialized objects in future releases.


Variables may be marked transient to indicate that they are not part of the persistent state of an object.

The basic mechanism of Java serialization is simple to use, but there are some more things to know. As mentioned before, only objects marked Serializable can be persisted. The java.lang.Object class does not implement that interface. Therefore, not all the objects in Java can be persisted automatically. The good news is that most of them -- like AWT and Swing GUI components, strings, and arrays -- are serializable.

On the other hand, certain system-level classes such as Thread, OutputStream and its subclasses, and Socket are not serializable. Indeed, it would not make any sense if they were. For example, thread running in my JVM would be using my system's memory. Persisting it and trying to run it in your JVM would make no sense at all. Another important point about java.lang.Object not implementing the Serializable interface is that any class you create that extends only Object (and no other serializable classes) is not serializable unless you implement the interface yourself (as done with the previous example).

That situation presents a problem: what if we have a class that contains an instance of Thread? In that case, can we ever persist objects of that type? The answer is yes, as long as we tell the serialization mechanism our intentions by marking our class's Threadobject as transient.

Another reason to use transient is when your class does some kind of internal caching. If, for example, your class can do calculations and for performance reasons it caches the result of each calculation, then saving that cache might not be desirable (because recalculating it might be faster than restoring it, or because it's unlikely that old cached values are of any use). In this case you'd mark the caching fields as transient.

Let’s serialize a class
Java has provided 2 ways to serialize a class. Developer is allowed to use either default mechanism provided by java by implementing java.io.Serializable interface or custom mechanism by implementing java.io. java.io.Externalizable interface. Serializable is a marker interface/tag interface which defines a contract of serializability and suggests VM to serialize class whereas Externalizable contains two methods which should be implemented to serialize the object.

Let’s serialize the class with default implementation.

Create a class which implements java.io.Serializable interface.
List 1
import java.io.Serializable;
import java.util.Date;
public class Person implements Serializable {
      private static final long serialVersionUID = -8708626244864641161L;
      String name;
      Date dob;
      public String getName() {
            return name;
      }
      public void setName(String name) {
            this.name = name;
      }
      public Date getDob() {
            return dob;
      }
      public void setDob(Date dob) {
            this.dob = dob;
      }
}


Create OutputObjectStream by passing applicable OutputStream. E.g. to create File, one of the application output stream is FileOutputStream.

List 2
FileOutputStream fos = new FileOutputStream(new File("PersonDetail.ser"));
ObjectOutputStream oos = new ObjectOutputStream(fos);
Call writeObject method of ObjectOutputStream with the instance of the object you want to serialize.
List 3
outStream.writeObject(personInstance);

Let's club last 2 steps into one program.
List 4
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Date;
public class DemoSerialization {
      public static void main(String[] args) throws IOException {
            Person per = new Person();
            per.setName("Vishal Jain");
            per.setDob(new Date(1971, 1, 1));//Let's forget that it is deprecated method.
            FileOutputStream fos = new FileOutputStream(new File("Person.ser"));
            ObjectOutputStream oos = new ObjectOutputStream(fos);
            oos.writeObject(per);
      }
}

On running this program, it will create a file Person.ser in current folder which contains content of the Object Person.

Let’s de-serialize the class with default implementation.
Create InputObjectStream by passing applicable InputStream. E.g. to create File, one of the application input stream is FileInputStream.
List 5
FileInputStream fis = new FileInputStream(new File("PersonDetail.ser"));
ObjectInputStream ois = new ObjectInputStream(fos);
Call readObject method of ObjectInputStream with the instance of the object you want to serialize.
List 6
(CAST_TO_PERSON)outStream.readObject();
Let's club last 2 steps into one program.
List 7
public class DemoSerialization { 
      public static void main(String[] args) throws IOException, ClassNotFoundException {
            FileInputStream fis = new FileInputStream(new File("Person.ser"));
            ObjectInputStream ois = new ObjectInputStream(fis);
            Person per = (Person)ois.readObject();
            System.out.println("Name:"  + per.getName() + " DOB:" + per.getDob());
      }
}

Output for the following program as follows:
List 8
Name:Vishal Jain DOB:Wed Feb 01 00:00:00 IST 3871

Let’s check the importance of serialVersionUID
Our application has person class with serialVersionUID, and client contains a persisted object. Another developer did not know the importance of serialVersionUID, so he changes the version UID as following.
List 9
import java.io.Serializable;
import java.util.Date;
public class Person implements Serializable {
      //Changing -8708626244864641161L to 8708626244864641161L.
      private static final long serialVersionUID = 8708626244864641161L;
      String name;
      Date dob;
      public String getName() {
            return name;
      }
      public void setName(String name) {
            this.name = name;
      }
      public Date getDob() {
            return dob;
      }
      public void setDob(Date dob) {
            this.dob = dob;
      }
}


On re-running the list 6 again with new Person (as per the list 7), following exception is encountered.
List 10
Exception in thread "main" java.io.InvalidClassException: Person; local class incompatible: stream classdesc serialVersionUID = -8708626244864641161, local class serialVersionUID = 8708626244864641161
      at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:562)
      at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1582)
      at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495)
      at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
      at DemoSerialization.main(DemoSerialization.java:12)


Till now it looks that the serialization is quite simple but is it that simple as demonstrated till now? So let’s understand it with various cases.

When an object has a reference to other objects
In general all the classes are not simple to have only primitives. Most of the classes contains object of another Java class. So what happens when application tries to serialize such classes?
Let’s introduce Person’s address. To add Person’s address, application need to have Address class in the classpath.
List 11
package com.serial;
public class Address {
 int houseNo;
 String city;
 int pincode;
 public int getHouseNo() {
  return houseNo;
 }
 public void setHouseNo(int houseNo) {
  this.houseNo = houseNo;
 }
 public String getCity() {
  return city;
 }
 public void setCity(String city) {
  this.city = city;
 }
 public int getPincode() {
  return pincode;
 }
 public void setPincode(int pincode) {
  this.pincode = pincode;
 }
 public Address(int houseNo, String city, int pincode) {
  this.houseNo = houseNo;
  this.city = city;
  this.pincode = pincode;
 }
}
Let’s change the definition of Person class
List 12
package com.serial;
import java.io.Serializable;
public class Person implements Serializable {
 private static final long serialVersionUID = 100L;
 String name;
 Address office;
 Address home;
 public String getName() {
  return name;
 }
 public void setName(String name) {
  this.name = name;
 }
 public Address getOffice() {
  return office;
 }
 public void setOffice(Address office) {
  this.office = office;
 }
 public Address getHome() {
  return home;
 }
 public void setHome(Address home) {
  this.home = home;
 }
}
Okay, stage is set to serialize the new version of Person class. Let’s serialize.
List 13
public class DemoSerialization {
      public static void main(String[] args) throws IOException {
            Person per = new Person();
            per.setName("Vishal Jain");
            Address home = new Address(100, "Noida", 201300);
            Address office = new Address(100, "Delhi", 101300);
            per.setHome(home);
            per.setOffice(office);
            FileOutputStream fos = new FileOutputStream(new File("Person.ser"));
            ObjectOutputStream oos = new ObjectOutputStream(fos);
            oos.writeObject(oos);
      }
}
After running the program, application threw exception mentioned below.
List 14
Exception in thread "main" java.io.NotSerializableException: com.serial.Address
 at java.io.ObjectOutputStream.writeObject0(Unknown Source)
 at java.io.ObjectOutputStream.defaultWriteFields(Unknown Source)
 at java.io.ObjectOutputStream.writeSerialData(Unknown Source)
 at java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
 at java.io.ObjectOutputStream.wr iteObject0(Unknown Source)
 at java.io.ObjectOutputStream.writeObject(Unknown Source)
 at com.serial.DemoSerialization.main(DemoSerialization.java:16)
Oops, it was not expected. Is not it? As java says, to serialize a class it is required to implement java.io.Serializable interface, but address class do not implement this interface.
In the first version of Person class, String, still a java class, was used. Why did not application throw any exception? The answer is clear. String class implements java.io.Serializable, Comparable<String>, CharSequence interfaces, so it is possible to serialize classes having String fields.

Let’s implement the interface in Address class and rerun the program.
List 15
package com.serial;
import java.io.Serializable;
public class Address implements Serializable {
 private static final long serialVersionUID = 1000L;
 int houseNo;
 String city;
 int pincode;
 public int getHouseNo() {
  return houseNo;
 }
 public void setHouseNo(int houseNo) {
  this.houseNo = houseNo;
 }
 public String getCity() {
  return city;
 }
 public void setCity(String city) {
  this.city = city;
 }
 public int getPincode() {
  return pincode;
 }
 public void setPincode(int pincode) {
  this.pincode = pincode;
 }
 public Address(int houseNo, String city, int pincode) {
  this.houseNo = houseNo;
  this.city = city;
  this.pincode = pincode;
 }
}
After running the program, Person.ser file is created with serialized object.

Now let’s try to deserialize the content with the following code
List 16
public static void main(String[] args) throws IOException, ClassNotFoundException {
            FileInputStream fis = new FileInputStream(new File("Person.ser"));
            ObjectInputStream ois = new ObjectInputStream(fis);
            Person per = (Person)ois.readObject();
            System.out.println("Name:"  + per.getName() + " Home City:" + per.getHome().getCity()
              + " Office City:" + per.getOffice().getCity());
      }
  
}
Result is as followed:
List 17
Name:Vishal Jain Home City:Noida Office City:Delhi
Impact of Inheritance on Serialization
When a super class is implementing the Serializable interface, then all the classes extending the specific super class is serializable.

Subclass is serializable but superclass is not
Let’s change person class such that it does not implement Serializable interface anymore, and let’s define a constructor of the class, so that default construction is not allowed anymore.
List 18
public class Person{
.
.
.
 public Person(String name, Address office, Address home) {
  super();
  this.name = name;
  this.office = office;
  this.home = home;
 }
}
Create a subclass player that extends Person class.
List 19
package com.serial;
import java.io.Serializable;
public class Player extends Person implements Serializable {
 private static final long serialVersionUID = -154L;
 String academy;
 public Player(String name, Address office, Address home, String aced) {
  super(name, office, home);
  academy = aced;
 }
 public String getAcademy() {
  return academy;
 }
 public void setAcademy(String academy) {
  this.academy = academy;
 }
}
Serialize the Player class as following
List 20
public class DemoSerialization {
 public static void main(String[] args) throws IOException {
  Address home = new Address(100, "Noida", 201300);
  Address office = new Address(100, "Delhi", 101300);
  Player per = new Player("Vishal Jain", home, office, "SRT Aceds");
  FileOutputStream fos = new FileOutputStream(new File("Player.ser"));
  ObjectOutputStream oos = new ObjectOutputStream(fos);
  oos.writeObject(per);
 }
}
It creates a file called Player.ser which contains information about the primitives and fields that are part of Player class.
Let’s try to deserialize the object with the following code.
List 21
public class DemoDeSerialization {
      public static void main(String[] args) throws IOException, ClassNotFoundException {
            FileInputStream fis = new FileInputStream(new File("Player.ser"));
            ObjectInputStream ois = new ObjectInputStream(fis);
            Player per = (Player)ois.readObject();
            System.out.println("Academy:"  + per.getAcademy() );
      } 
}
On executing the application following exception is encountered.
List 22
Exception in thread "main" java.io.InvalidClassException: com.serial.Player; com.serial.Player; no valid constructor
 at java.io.ObjectStreamClass.checkDeserialize(Unknown Source)
 at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
 at java.io.ObjectInputStream.readObject0(Unknown Source)
 at java.io.ObjectInputStream.readObject(Unknown Source)
 at com.serial.DemoDeSerialization.main(DemoDeSerialization.java:12)
Does not that mean that we are missing something? Yes we are. So add a default constructor is Person class and this program will work fine without any issues.
List 23
public class Person{
 .
 .
 .
 public Person() {
 }
}
It is quite clear that when a superclass is not serialized then application do not serialize the information about the class and at the time of Deserilizing the object is tries to create the object with default constructor. If constructor is not found, it throws the exception else creates default instance of super class.
Superclass is serializable but subclass should not be serializable
When superclass is serializable, but application does not want to store classes extending the superclass.
In this case subclass must implement readObject and writeObject methods and these methods must throw java.io.NotSerializableException.
List 24
public class Player extends Person {
 .
 .
 .
 private void writeObject(ObjectOutputStream os) throws IOException, ClassNotFoundException{
  throw new NotSerializableException();
 }
  private void readObject(ObjectInputStream is) throws IOException, ClassNotFoundException{
  throw new NotSerializableException();
 }
}
On running the program listed above, application throws following exception.
List 25
Exception in thread "main" java.io.NotSerializableException
 at com.serial.Player.writeObject(Player.java:23)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at java.io.ObjectStreamClass.invokeWriteObject(Unknown Source)
 at java.io.ObjectOutputStream.writeSerialData(Unknown Source)
 at java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
 at java.io.ObjectOutputStream.writeObject0(Unknown Source)
 at java.io.ObjectOutputStream.writeObject(Unknown Source)
 at com.serial.DemoSerialization.main(DemoSerialization.java:13)
Serialize when source code is not available
There are instances when third party class/subclass must be serialized but vendor has not provided the source code. Then what options do the coder has?

Don’t Serialize the class
In case of composition, you can mark the variable/field transient, which will suggest compiler not to serial the object.
List 26
transient Address office;
Okay, Application cannot live without the object, and it is required to serialize
In this case, it is required to implement readObject and writeObject according to the requirement.
List 27
 private void writeObject(ObjectOutputStream oos) throws IOException, ClassNotFoundException {
  try {
   oos.defaultWriteObject();
   oos.writeInt(home.getHouseNo());
  }
  catch (Exception e) { 
   e.printStackTrace(); 
  }
 }
 private void readObject(ObjectInputStream ois) throws IOException, ClassNotFoundException {
  try {
   ois.defaultReadObject();
   home = new Address(ois.readInt(), "", 0);
  } catch (Exception e) { 
   e.printStackTrace(); 
  }
 }
Please maintain the same order of reading and writing, else serialized object will differ from the object de-serialized.


Enum constants are serialized differently than ordinary serializable or externalizable objects. The serialized form of an enum constant consists solely of its name; field values of the constant are not present in the form. To serialize an enum constant, ObjectOutputStream writes the value returned by the enum constant's name method. To deserialize an enum constant, ObjectInputStream reads the constant name from the stream; the deserialized constant is then obtained by calling the java.lang.Enum.valueOf method, passing the constant's enum type along with the received constant name as arguments. Like other serializable or externalizable objects, enum constants can function as the targets of back references appearing subsequently in the serialization stream.

The process by which enum constants are serialized cannot be customized: any class-specific writeObject, readObject, readObjectNoData, writeReplace, and readResolve methods defined by enum types are ignored during serialization and deserialization. Similarly, any serialPersistentFields or serialVersionUID field declarations are also ignored--all enum types have a fixed serialVersionUID of 0L. Documenting serializable fields and data for enum types is unnecessary, since there is no variation in the type of data sent.


When java has provided default and robust implementation of Serialization, then why it mandated developers to implement marker interface. Java could have provided serialization as default behavior of class.

Best explanation for java’s such decision is following

Serialization is fraught with pitfalls. Automatic serialization support of this form makes the class internals part of the public API (which is why javadoc gives you the persisted forms of classes).

For long-term persistence, the class must be able to decode this form, which restricts the changes you can make to class design. This breaks encapsulation.
Serialization can also lead to security problems. By being able to serialize any object it has a reference to, a class can access data it would not normally be able to (by parsing the resultant byte data).

There are other issues, such as the serialized form of inner classes not being well defined.
Making all classes serializable would exacerbate these problems. Check out Effective Java Second Edition, in particular Item 74: Implement Serializable judiciously.

Another reason, but it has it’s counter argument, for not providing default serialization can be following

Not everything is genuinely serializable. Take a network socket connection, for example. You could serialize the data/state of your socket object, but the essence of an active connection would be lost.

Default serialization is somewhat verbose, and assumes the widest possible usage scenario of the serialized object, and accordingly the default format (Serializable) annotates the resultant stream with information about the class of the serialized object.

Externalization give the producer of the object stream complete control over the precise class meta-data (if any) beyond the minimal required identification of the class (e.g. its name). This is clearly desirable in certain situations, such as closed environments, where producer of the object stream and its consumer (which reifies the object from the stream) are matched, and additional metadata about the class serves no purpose and degrades performance.

Additionally (as Uri point out) externalization also provides for complete control over the encoding of the data in the stream corresponding to Java types. For (a contrived) example, you may wish to record boolean true as 'Y' and false as 'N'. Externalization allows you to do that.

If you want to read more about serialization, you can browse following articles.


There are few more topics to cover on serialization, that we can take up sometime later.
Few of them are as following:
  • Content of Serialized Objects.
  • Performance of Serialized Objects.

No comments:

Post a Comment