Lecture Notes: Equality, Hashing, and Cloning

Related Reading: Ch. 7.3 - 7.4

Equality Testing

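Last time, we introduced the basic issue of equality testing, namely the distinction for reference types between identity (two references to the same object, as tested by ==) and equivalence (two objects with matching state, as tested by equals).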

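Requirements for a consistent equals method

As stated in the documentation of Object.equals, for non-null references x, y and z:

  1. Reflexive: x.equals(x) is true.
  2. Symmetric: x.equals(y) is true if and only if y.equals(x) is true.
  3. Transitive: if x.equals(y) and y.equals(z) are true, then x.equals(z) is true.
  4. Consistent: repeated calls to x.equals(y) give the same answer, provided the state used in the comparison has not changed.
  5. x.equals(null) is false.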

These requirements are important for any type that might be stored in Java's standard container classes, which depend heavily on these properties for their algorithmic integrity. As natural as they seem, it is actually quite difficult to implement such an equals method properly, especially in an object-oriented inheritance hierarchy.

Our book gives a worthy discussion of the issue based on the example of Employee and Manager classes. However, it is a bit difficult to motivate the need for equivalence testing for employees, as there would not likely be two distinct instances representing the same person, nor would there be much need to consider two different people "equal" when they happen to share the identical name and salary. For variety, we will examine another classic case study from the literature.


Case Study: Point classes

We will use an example that appears to originate in Joshua Bloch's book, Effective Java Programming Language Guide (Addison-Wesley, 2001), based on a simple Point class with x and y attributes and a ColorPoint subclass that adds a color attribute. We also define a RobustPoint subclass of Point that adds no new state information, only some extra functionality. Our ideas are based upon

For our own experiments, we have a test harness that defines the following variables:

We then consider what outcome we expect from each of the following tests:

Now, we will consider various attempts at implementing equals for these classes.
You may download all variants of code we will consider, or find them at turing:/Public/goldwasser/290


Version 1 (flawed)

In the Point class we define:

    public boolean equals(Point other) {
        return (x == other.x && y == other.y);
    }
and in the ColorPoint class we define:
    public boolean equals(ColorPoint other) {
        return c == other.c;
    }

There are many flaws. As a warmup, we note the following unexpected result. g1.equals(g2) returns false. Why?


Version 2 (flawed)

Same Point class:

    public boolean equals(Point other) {
        return (x == other.x && y == other.y);
    }
Slight modification to ColorPoint:
    public boolean equals(ColorPoint other) {
        return c.equals(other.c);
    }
There are still many flaws. Why does g1.equals(g4) return true?


Version 3 (flawed)

Same Point class:

    public boolean equals(Point other) {
        return (x == other.x && y == other.y);
    }
Another modification to ColorPoint:
    public boolean equals(ColorPoint other) {
        return (super.equals(other) && c.equals(other.c));
    }
Unfortunately, this is still flawed. In particular, g1.equals(p1) returns true, as does p1.equals(g1). At least it's symmetric! But transitivity fails: b1.equals(p1) is true and p1.equals(g1) is true, yet b1.equals(g1) is false.
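
A further hazard shared by Versions 1 through 3 is that equals(Point) is an overload of Object.equals(Object), not an override. The following minimal sketch illustrates the consequence. The field types, the constructor, and the OverloadDemo class are assumptions made for the illustration and are not taken from the course code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Version 3 Point; field types are assumed.
class Point {
    final double x, y;

    Point(double x, double y) { this.x = x; this.y = y; }

    // Parameter type is Point, so this overloads rather than overrides
    // Object.equals(Object).
    public boolean equals(Point other) {
        return x == other.x && y == other.y;
    }
}

class OverloadDemo {
    // Static type of the argument is Point, so equals(Point) is chosen.
    static boolean directCall() {
        return new Point(1, 2).equals(new Point(1, 2));
    }

    // Static type of the argument is Object, so the inherited
    // Object.equals(Object) is chosen, which compares identity.
    static boolean viaObjectReference() {
        Object o = new Point(1, 2);
        return new Point(1, 2).equals(o);
    }

    // Containers call equals(Object), so they never see equals(Point).
    static boolean viaContainer() {
        List<Point> list = new ArrayList<>();
        list.add(new Point(1, 2));
        return list.contains(new Point(1, 2));
    }

    public static void main(String[] args) {
        System.out.println(directCall());          // true
        System.out.println(viaObjectReference());  // false
        System.out.println(viaContainer());        // false
    }
}
```

Since the standard containers only ever call equals(Object), an overloaded equals is effectively invisible to them.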


Version 4 (flawed)

In this version, we ensure that all signatures rely on Object as the parameter type, and then cast back to the target type.

    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;  // this covers null case as well
        Point p = (Point) other;
        return (x == p.x && y == p.y);
    }
Another modification to ColorPoint:
    public boolean equals(Object other) {
        if (!(other instanceof ColorPoint)) return false;  // this covers null case as well
        ColorPoint cp = (ColorPoint) other;
        return (super.equals(cp) && c.equals(cp.c));
    }

Some interesting cases:


Version 5 (flawed)

The problem of asymmetry stems from the fact that a ColorPoint qualifies as an instance of Point, but a Point does not qualify as an instance of ColorPoint. Should we ever consider instances of those two types to be equivalent? If so, we need to somehow have p.equals(cp) match cp.equals(p).
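
The asymmetry of the instanceof-based Version 4 can be reproduced with a small sketch. The Color enum is a hypothetical stand-in for whatever color type the course code uses, and the constructors, field types, and the AsymmetryDemo class are likewise assumptions.

```java
// Hypothetical stand-in for the color type used in the notes.
enum Color { BLACK, GREEN }

class Point {
    final double x, y;

    Point(double x, double y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;  // also covers null
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() { return Double.hashCode(x) ^ Double.hashCode(y); }
}

class ColorPoint extends Point {
    final Color c;

    ColorPoint(double x, double y, Color c) { super(x, y); this.c = c; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ColorPoint)) return false;  // also covers null
        ColorPoint cp = (ColorPoint) other;
        return super.equals(cp) && c.equals(cp.c);
    }
}

class AsymmetryDemo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        ColorPoint cp = new ColorPoint(1, 2, Color.GREEN);
        System.out.println(p.equals(cp));  // true: a ColorPoint is a Point
        System.out.println(cp.equals(p));  // false: a Point is not a ColorPoint
    }
}
```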

Resolving a consistent semantics for equivalence across levels of an inheritance hierarchy is difficult to do. A typical way to ensure an equivalence relation is simply to say that instances of two different types are never evaluated as equivalent. But still, how should we implement this? Let's consider the following. For the Point class, we might define:

    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null) return false;
        if (other.getClass() != Point.class) return false;
        Point p = (Point) other;
        return (x == p.x && y == p.y);
    }
and for ColorPoint as:
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null) return false;
        if (other.getClass() != ColorPoint.class) return false;
        ColorPoint cp = (ColorPoint) other;
        return super.equals(cp) && c.equals(cp.c);
    }
Still, there are problems. Of particular note, we find that b1.equals(b2) is false. How can that be???

Version 6 (good)

So the problem in the above was that the Point.equals method was written assuming that everything had to be a true Point to succeed, but when two ColorPoint instances are compared, the ColorPoint.equals relied upon super.equals for the check of the inherited state. Instead, we can ensure that any two instances being compared belong to the same class, but without assuming that we know what that class is. That is, the Point.equals method can be coded as

    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null) return false;
        if (getClass() != other.getClass()) return false;
        Point p = (Point) other;
        return (x == p.x && y == p.y);
    }

We could choose to use a similar approach for ColorPoint, but knowing that the base class already performs these checks, we can have a much simpler implementation for the ColorPoint subclass as:

    public boolean equals(Object other) {
        if (this == other) return true;
        if (!super.equals(other)) return false;  
        ColorPoint cp = (ColorPoint) other;
        return c.equals(cp.c);
    }
Note that the call to super ensures matching classes, non-null values, and equivalent inherited states.
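
As a quick check of Version 6, the following sketch assembles the two methods into complete classes and exercises a few comparisons. The Color enum, constructors, field types, the empty RobustPoint body, and the Version6Demo class are assumptions made for the illustration.

```java
// Hypothetical stand-in for the color type used in the notes.
enum Color { BLACK, GREEN }

class Point {
    final double x, y;

    Point(double x, double y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null) return false;
        if (getClass() != other.getClass()) return false;  // exact class match
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() { return Double.hashCode(x) ^ Double.hashCode(y); }
}

// Adds no state; extra functionality omitted here.
class RobustPoint extends Point {
    RobustPoint(double x, double y) { super(x, y); }
}

class ColorPoint extends Point {
    final Color c;

    ColorPoint(double x, double y, Color c) { super(x, y); this.c = c; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!super.equals(other)) return false;  // class, null and x/y checks
        ColorPoint cp = (ColorPoint) other;      // safe after the class check
        return c.equals(cp.c);
    }
}

class Version6Demo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        ColorPoint b1 = new ColorPoint(1, 2, Color.BLACK);
        ColorPoint b2 = new ColorPoint(1, 2, Color.BLACK);
        System.out.println(b1.equals(b2));  // true
        System.out.println(p.equals(b1));   // false
        System.out.println(b1.equals(p));   // false
        System.out.println(new RobustPoint(1, 2).equals(p));  // false
    }
}
```

The last comparison shows the cost of exact class matching: a RobustPoint never equals a Point, even with identical coordinates.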


Version 7 (great)

An unfortunate consequence of the decision to enforce matching classes is that a RobustPoint can never be equivalent to a Point, even if they have the same state. A more advanced approach that allows this form of equivalence is not to enforce matching of exact classes, but instead to let each class define a canonical class with which it wants to be considered equivalent, and then to compare the canonical classes of the two instances. It might work as follows. For the Point class, we define the following:

    protected Class<?> getEquivalenceClass() {
        return Point.class;
    }

    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;   // including when other is null
        Point p = (Point) other;
        return (getEquivalenceClass() == p.getEquivalenceClass()
                && x == p.x && y == p.y);
    }
We do not do anything special for RobustPoint, and therefore it inherits both of these methods. Notice that this means that a call to getEquivalenceClass() on a RobustPoint instance still results in Point.class, and therefore it could match a Point.

In contrast, if we do not want ColorPoint instances to be allowed to be equivalent to a Point (or RobustPoint), we do the following:

    protected Class<?> getEquivalenceClass() {
        return ColorPoint.class;
    }

    public boolean equals(Object other) {
        if (other instanceof ColorPoint) {
            ColorPoint cp = (ColorPoint) other;
            if (!c.equals(cp.c)) return false;    // definite mismatch
        }
        return super.equals(other);
    }
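
Putting the Version 7 pieces together, a sketch of the three classes might look as follows. The Color enum, constructors, field types, and the distanceToOrigin method on RobustPoint are hypothetical additions for the illustration, and Class<?> is used in place of the raw Class type.

```java
// Hypothetical stand-in for the color type used in the notes.
enum Color { BLACK, GREEN }

class Point {
    final double x, y;

    Point(double x, double y) { this.x = x; this.y = y; }

    // The canonical class against which instances are compared.
    protected Class<?> getEquivalenceClass() { return Point.class; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;  // also covers null
        Point p = (Point) other;
        return getEquivalenceClass() == p.getEquivalenceClass()
                && x == p.x && y == p.y;
    }

    @Override
    public int hashCode() { return Double.hashCode(x) ^ Double.hashCode(y); }
}

// No new state and no overrides: inherits Point.class as canonical class.
class RobustPoint extends Point {
    RobustPoint(double x, double y) { super(x, y); }

    double distanceToOrigin() { return Math.hypot(x, y); }  // hypothetical extra
}

class ColorPoint extends Point {
    final Color c;

    ColorPoint(double x, double y, Color c) { super(x, y); this.c = c; }

    @Override
    protected Class<?> getEquivalenceClass() { return ColorPoint.class; }

    @Override
    public boolean equals(Object other) {
        if (other instanceof ColorPoint) {
            ColorPoint cp = (ColorPoint) other;
            if (!c.equals(cp.c)) return false;  // definite mismatch
        }
        return super.equals(other);
    }
}

class Version7Demo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        RobustPoint r = new RobustPoint(1, 2);
        ColorPoint g = new ColorPoint(1, 2, Color.GREEN);
        System.out.println(p.equals(r) && r.equals(p));  // true
        System.out.println(p.equals(g) || g.equals(p));  // false
    }
}
```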


Version 8 (outstanding)

We borrow one final clever example from the blog post by Tal Cohen, attributed to Rene Smit. Consider an interpretation where the designer of ColorPoint wishes to implicitly assume that the color of each standard Point is black, even though there is no explicit color field. We can guarantee all the desired equivalence properties, while allowing black ColorPoint instances to agree with all Point instances that share the same x and y values. Given our running test suite, we could define the ColorPoint class in a way such that b1.equals(p1) and p1.equals(b1) both evaluate to true. This is particularly tricky given that the syntax p1.equals(b1) is based upon the method from the Point class.

However, in this framework, we have a way to define the ColorPoint class to affect the semantics of Point.equals, in particular because of the role of getEquivalenceClass in that logic. The strategy is to define ColorPoint.getEquivalenceClass in such a way that all color points that happen to be black get Point.class as their representative class, while all non-black points get ColorPoint.class. Of course, this means that we cannot have a black color point and a non-black color point as equivalent, but we would never want that to be the case. The only difference from the previous version to this one is to define ColorPoint.getEquivalenceClass() as follows:

    protected Class<?> getEquivalenceClass() {
        return c.equals(Color.BLACK) ? Point.class : ColorPoint.class;
    }
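
The effect of this one-method change can be checked with a sketch along the following lines. As before, the Color enum, constructors, and field types are assumptions; only getEquivalenceClass in ColorPoint differs from a Version 7 style implementation.

```java
// Hypothetical stand-in for the color type used in the notes.
enum Color { BLACK, GREEN }

class Point {
    final double x, y;

    Point(double x, double y) { this.x = x; this.y = y; }

    protected Class<?> getEquivalenceClass() { return Point.class; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;  // also covers null
        Point p = (Point) other;
        return getEquivalenceClass() == p.getEquivalenceClass()
                && x == p.x && y == p.y;
    }

    @Override
    public int hashCode() { return Double.hashCode(x) ^ Double.hashCode(y); }
}

class ColorPoint extends Point {
    final Color c;

    ColorPoint(double x, double y, Color c) { super(x, y); this.c = c; }

    // Black color points present themselves as plain points.
    @Override
    protected Class<?> getEquivalenceClass() {
        return c.equals(Color.BLACK) ? Point.class : ColorPoint.class;
    }

    @Override
    public boolean equals(Object other) {
        if (other instanceof ColorPoint) {
            ColorPoint cp = (ColorPoint) other;
            if (!c.equals(cp.c)) return false;  // definite mismatch
        }
        return super.equals(other);
    }
}

class Version8Demo {
    public static void main(String[] args) {
        Point p1 = new Point(1, 2);
        ColorPoint b1 = new ColorPoint(1, 2, Color.BLACK);
        ColorPoint g1 = new ColorPoint(1, 2, Color.GREEN);
        System.out.println(b1.equals(p1) && p1.equals(b1));  // true
        System.out.println(g1.equals(p1) || p1.equals(g1));  // false
        System.out.println(b1.equals(g1) || g1.equals(b1));  // false
    }
}
```

Because the inherited hashCode depends only on x and y, a black ColorPoint and an equivalent Point also receive equal hash codes.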


Hashing

Object.hashCode() can be used to define an appropriate hash value for using an object in a hash-based container.

If objects from a class might ever be used in a hash-based container, it is imperative that if x.equals(y), then x.hashCode() == y.hashCode(). Note that the converse need not hold, since the hash codes of non-equivalent objects may collide by coincidence. Therefore, whenever there is reason to override the equals method for a class, the designer should define hashCode accordingly.

A hash code should depend upon the attributes that are used in defining equivalence, and it should be the case that those attributes are immutable; otherwise, use of such objects in a hash-based container will be unreliable.

For example, our Point class defines the following hash code, based upon the bitwise exclusive-or (XOR) of the hash codes of the two components:

    public int hashCode() {
        // Double.hashCode(double) is a static helper (Java 8+); it avoids the deprecated Double constructor
        return Double.hashCode(x) ^ Double.hashCode(y);
    }
For the ColorPoint class, we might consider defining another hash code that takes into consideration the color choice as well, but we could also choose to simply rely on the inherited hash code if we believe it provides sufficient spread of the domain. Note that the inherited one will still guarantee that equivalent ColorPoint instances get equal hash codes.
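
To illustrate the earlier remark that the attributes behind a hash code should be immutable, here is a minimal sketch using a deliberately mutable class. MutablePoint and MutableKeyDemo are hypothetical names invented for this example and are not part of the course code.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class whose equality-defining fields can change.
class MutablePoint {
    int x, y;

    MutablePoint(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        MutablePoint p = (MutablePoint) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() { return Objects.hash(x, y); }
}

class MutableKeyDemo {
    // Returns whether the set can still find p after p has been mutated.
    static boolean foundAfterMutation() {
        Set<MutablePoint> set = new HashSet<>();
        MutablePoint p = new MutablePoint(1, 2);
        set.add(p);       // stored under the hash code for (1, 2)
        p.x = 5;          // hash code changes while p sits in the set
        return set.contains(p);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation());  // false
    }
}
```

The object is still in the set, but the lookup uses its new hash code and therefore does not find it.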


Shallow and Deep Copy

The Object.clone() method is intentionally protected so that there is no assumption that objects can be cloned by a user. However, if you wish to provide a definition for making a copy of your object, you should

  1. Override the clone method with public access.
  2. Tag your class by having it implement the Cloneable interface; part of the functionality of the parent Object.clone is to throw a CloneNotSupportedException if the object is not an instance of a Cloneable subtype.
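
The two steps above can be sketched as follows for a class that holds a reference to mutable state. The Polyline class and its methods are hypothetical and serve only to contrast a shallow copy with a deep one.

```java
// Hypothetical class with one mutable, array-valued field.
class Polyline implements Cloneable {           // step 2: tag as Cloneable
    private double[] xs;

    Polyline(double[] xs) { this.xs = xs.clone(); }

    @Override
    public Polyline clone() {                   // step 1: public override
        try {
            // Object.clone copies fields one by one, so at this point
            // copy.xs refers to the same array as this.xs (shallow copy).
            Polyline copy = (Polyline) super.clone();
            // Give the copy its own array to make the copy deep.
            copy.xs = xs.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            // Not expected, since this class implements Cloneable.
            throw new AssertionError(e);
        }
    }

    void set(int i, double v) { xs[i] = v; }

    double get(int i) { return xs[i]; }
}

class CloneDemo {
    public static void main(String[] args) {
        Polyline a = new Polyline(new double[] {1.0, 2.0});
        Polyline b = a.clone();
        b.set(0, 9.0);
        System.out.println(a.get(0));  // 1.0, the original is unaffected
        System.out.println(b.get(0));  // 9.0
    }
}
```

Without the line that clones the array, both objects would share a single array and the change made through b would be visible through a.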

Expected requirements for the semantics of clone() are that:


Last modified: Tuesday, 11 October 2011