I was reading this post from Eric Sink. It got me thinking about object equality. I work in the .NET world, but I am pretty sure what I am saying is also basically true for Java.

In C#, or indeed any .NET language, every class and every value type (struct or enum) derives, directly or indirectly, from System.Object. Object has four public instance methods: Equals(), GetHashCode(), ToString() and GetType(). The first three are virtual and can be overridden in any derived class. GetType() returns the System.Type of the object; it is not virtual, so it cannot be overridden. So you cannot create two classes, foo and bar, and fool an instance of bar into thinking it is a foo.
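
A minimal sketch of that last point, using two hypothetical classes Foo and Bar. Bar can override ToString() to claim whatever it likes, but GetType() still reports the real type:

```csharp
using System;

public class Foo { }

public class Bar
{
    // ToString() is virtual, so Bar may pretend to be a Foo here.
    public override string ToString()
    {
        return "Foo";
    }

    // GetType() is not virtual; uncommenting the next line is a compile error.
    // public override Type GetType() { return typeof(Foo); }
}

public static class GetTypeDemo
{
    public static void Main()
    {
        Bar bar = new Bar();
        Console.WriteLine(bar.ToString());                 // prints "Foo"
        Console.WriteLine(bar.GetType() == typeof(Bar));   // prints "True"
        Console.WriteLine(bar.GetType() == typeof(Foo));   // prints "False"
    }
}
```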

Suppose we have a Customer class and we produce two customers:

Customer a = new Customer();
Customer b = new Customer();
if (a.Equals(b))
{
   // We cannot get here
}

Customers a and b are two separate objects. Unless a class overrides it, object1.Equals(object2) returns true for a reference type only if object1 and object2 refer to the same object. In the example above, a and b are two different objects, so the test fails.
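
To make the default behaviour concrete, here is a small sketch. The empty Customer class is a stand-in of my own; all that matters is that it does not override Equals:

```csharp
using System;

// Hypothetical Customer with no Equals override, so it inherits Object.Equals.
public class Customer { }

public static class ReferenceEqualityDemo
{
    public static void Main()
    {
        Customer a = new Customer();
        Customer b = new Customer();
        Customer c = a; // c refers to the same object as a

        Console.WriteLine(a.Equals(b));                  // False: two objects
        Console.WriteLine(a.Equals(c));                  // True: one object, two references
        Console.WriteLine(object.ReferenceEquals(a, c)); // True
    }
}
```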

So far so good, and all standard stuff.

The problem is that a customer object will often be created by pulling data from a database. Within the database each customer is identified by a unique identifier, or primary key, and two customers are considered equal if they have the same primary key.

To get the data into a customer object we have a method somewhere, maybe in a factory class, a repository, an O/R mapper, or even in the Customer Entity object itself (I know, the OO purists will complain about this).

Let’s suppose we have a CustomerManager class with a method that fetches a customer from the database using the primary key. So our code above now looks like this:

Customer a = CustomerManager.FetchCustomer(1);
Customer b = CustomerManager.FetchCustomer(1);

We now have two customers, a and b, which hold identical data. As far as the database is concerned they are the same entity. But a.Equals(b) will still return false because a and b are two different objects. At a lower level, when we create a and b the CLR allocates two objects on the managed heap, and the local variables a and b each hold a reference to one of them. The default Object.Equals() method compares the two references. In this case the references are different because we have created two objects.
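
A rough sketch of the situation follows. The CustomerManager here is only a stand-in: it builds the object in memory instead of querying a database, and I am assuming that each fetch creates a new object.

```csharp
using System;

public class Customer
{
    private readonly int customerId;
    public int CustomerId { get { return customerId; } }
    public Customer(int id) { customerId = id; }
}

// Stand-in for the data access code; a real one would query the database.
public static class CustomerManager
{
    public static Customer FetchCustomer(int id)
    {
        return new Customer(id); // assumption: a new object on every call
    }
}

public static class FetchDemo
{
    public static void Main()
    {
        Customer a = CustomerManager.FetchCustomer(1);
        Customer b = CustomerManager.FetchCustomer(1);

        Console.WriteLine(a.CustomerId == b.CustomerId); // True: same primary key
        Console.WriteLine(a.Equals(b));                  // False: different references
    }
}
```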

What do we do if we want a.Equals(b) to return true? The simple answer, indeed the only answer, is to override the Equals method in the Customer class. So Customer now looks like this:

public class Customer
{
   private int customerId;

   public int CustomerId { get { return customerId; } }

   public Customer(int id)
   {
      customerId = id;
   }

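   // Object.Equals takes an object, so the override must take one too.
   public override bool Equals(object obj)
   {
      Customer cust = obj as Customer;
      if (cust == null)
         return false;

      return cust.CustomerId == customerId;
   }

   // Objects that are equal must return the same hash code.
   public override int GetHashCode()
   {
      return customerId;
   }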
}
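
To show what key-based equality buys us, here is a self-contained sketch. It assumes the override takes an object parameter, as Object.Equals does, and that GetHashCode is overridden alongside it so that hash-based collections behave consistently:

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    private readonly int customerId;
    public int CustomerId { get { return customerId; } }
    public Customer(int id) { customerId = id; }

    public override bool Equals(object obj)
    {
        Customer other = obj as Customer;
        return other != null && other.customerId == customerId;
    }

    public override int GetHashCode()
    {
        return customerId;
    }
}

public static class KeyEqualityDemo
{
    public static void Main()
    {
        Customer a = new Customer(1);
        Customer b = new Customer(1);

        Console.WriteLine(a.Equals(b));                  // True: same key
        Console.WriteLine(object.ReferenceEquals(a, b)); // False: still two objects

        HashSet<Customer> seen = new HashSet<Customer>();
        seen.Add(a);
        Console.WriteLine(seen.Contains(b));             // True: found via GetHashCode and Equals
    }
}
```

Note that object.ReferenceEquals still tells the two objects apart, so identity is not lost by overriding Equals.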

But I am not sure that I like this solution. The O/R mapper that I use, LLBLGen Pro, does it this way. I am a fan of LLBLGen, and I think it is reasonable for it to use this approach, because there is a direct mapping from database rows to entity objects. If I compare two entities which are mapped from the database, they should be equal if the underlying database rows are the same. But if I create my own classes, distinct from the O/R mapper entity classes, then the situation is not quite so clear-cut.

Jeffrey Richter, in his excellent CLR via C#, suggests that Microsoft should have implemented the Object.Equals() method differently and gives advice on how to implement the Equals method:

  1. If the obj argument is null, return false because the current object identified by this is obviously not null when the nonstatic Equals method is called.
  2. If the this and obj arguments refer to objects of different types, return false. Obviously, checking if a String object is equal to FileStream object should result in a false result.
  3. For each instance field defined by the type, compare the value in the this object with the value in the obj object. If any fields are not equal, return false.
  4. Call the base class’s Equals method so it can compare any fields defined by it. If the base class’s Equals method returns false, return false; otherwise, return true.
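
The four steps above can be sketched roughly as follows. Person and Employee are hypothetical classes of my own, not Richter's example. Person derives directly from Object, so I skip step 4 there: Object.Equals compares references and would undo the field comparison.

```csharp
using System;

public class Person
{
    private readonly string name;
    public Person(string name) { this.name = name; }

    public override bool Equals(object obj)
    {
        if (obj == null) return false;                        // step 1
        if (GetType() != obj.GetType()) return false;         // step 2
        Person other = (Person)obj;
        // Step 4 is skipped: the base class is Object itself.
        return name == other.name;                            // step 3
    }

    public override int GetHashCode()
    {
        return name == null ? 0 : name.GetHashCode();
    }
}

public class Employee : Person
{
    private readonly int departmentId;

    public Employee(string name, int departmentId) : base(name)
    {
        this.departmentId = departmentId;
    }

    public override bool Equals(object obj)
    {
        if (obj == null) return false;                        // step 1
        if (GetType() != obj.GetType()) return false;         // step 2
        Employee other = (Employee)obj;
        if (departmentId != other.departmentId) return false; // step 3
        return base.Equals(obj);                              // step 4
    }

    public override int GetHashCode()
    {
        return base.GetHashCode() ^ departmentId;
    }
}

public static class FieldEqualityDemo
{
    public static void Main()
    {
        Person p = new Person("Ann");
        Employee e1 = new Employee("Ann", 7);
        Employee e2 = new Employee("Ann", 7);

        Console.WriteLine(e1.Equals(e2));   // True: same type, same field values
        Console.WriteLine(p.Equals(e1));    // False: different types
        Console.WriteLine(e1.Equals(null)); // False
    }
}
```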

This sounds good, but there are problems. The entity object obtained from the database is a case in point. Suppose we fetch a customer entity and then edit it. Prior to saving the entity we may fetch it again, perhaps within a collection. Using Richter’s definition of equality the two objects would not be equal, because their field values now differ. Using the method in LLBLGen’s entity classes the two objects would be equal, but the edited object would have a property IsDirty set to true.
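
Here is a small sketch of the two answers side by side. The class and its method names (HasSameKeyAs, HasSameFieldsAs) are made up for illustration and are not LLBLGen's API; the second object is built by hand to stand in for a second fetch.

```csharp
using System;

public class Customer
{
    private readonly int customerId;
    private string name;
    private bool isDirty;

    public Customer(int id, string name)
    {
        customerId = id;
        this.name = name;
    }

    public bool IsDirty { get { return isDirty; } }

    public string Name
    {
        get { return name; }
        set { name = value; isDirty = true; } // any edit marks the object as unsaved
    }

    // Identity: both objects map to the same database row.
    public bool HasSameKeyAs(Customer other)
    {
        return other != null && customerId == other.customerId;
    }

    // State: same row and the same field values.
    public bool HasSameFieldsAs(Customer other)
    {
        return HasSameKeyAs(other) && name == other.name;
    }
}

public static class DirtyEntityDemo
{
    public static void Main()
    {
        Customer edited = new Customer(1, "Acme");
        Customer refetched = new Customer(1, "Acme"); // stands in for a second fetch

        edited.Name = "Acme Ltd";

        Console.WriteLine(edited.HasSameKeyAs(refetched));    // True: same row
        Console.WriteLine(edited.HasSameFieldsAs(refetched)); // False: the name differs
        Console.WriteLine(edited.IsDirty);                    // True
    }
}
```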

I really don’t think that there is one solution to fit all possible cases. I suppose this comes down to the object/relational mismatch, the problem which O/R mappers have tried to address, in many cases quite successfully.

The whole point is that equality and identity are not as straightforward as they might, at first, appear.