Java defines the equality operators == and != for testing reference equality but uses the equals() method defined in Object and its subclasses for testing abstract object equality. Naïve programmers often confuse the intent of the == operation with that of the Object.equals() method. This confusion is frequently evident in the context of processing String objects.

As a general rule, use the Object.equals() method to check whether two objects have equivalent contents and use the equality operators == and != to test whether two references specifically refer to the same object. This latter test is referred to as referential equality. For classes that require overriding the default equals() implementation, care must be taken to also override the hashCode() method (see MET09-J. Classes that define an equals() method must also define a hashCode() method).
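The following is a minimal sketch of overriding equals() and hashCode() together, as the paragraph above recommends. The Point class and its fields are hypothetical and exist only for illustration; they do not appear elsewhere in this guideline.

```java
import java.util.Objects;

// Hypothetical value class used only for illustration.
final class Point {
  private final int x;
  private final int y;

  Point(int x, int y) {
    this.x = x;
    this.y = y;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true; // Same object, so trivially equal
    }
    if (!(o instanceof Point)) {
      return false; // Also rejects null
    }
    Point other = (Point) o;
    return x == other.x && y == other.y;
  }

  @Override
  public int hashCode() {
    return Objects.hash(x, y); // Uses the same fields as equals()
  }
}

class PointDemo {
  public static void main(String[] args) {
    Point p1 = new Point(1, 2);
    Point p2 = new Point(1, 2);
    System.out.println(p1 == p2);      // Prints "false": distinct objects
    System.out.println(p1.equals(p2)); // Prints "true": equivalent contents
    System.out.println(p1.hashCode() == p2.hashCode()); // Prints "true"
  }
}
```

Because both methods use the same fields, objects that compare equal also produce equal hash codes, which hash-based collections such as HashMap rely on.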

Numeric boxed types (for example, Byte, Character, Short, Integer, Long, Float, and Double) should also be compared using Object.equals() rather than the == operator. Although reference equality may appear to work for Integer values in the range −128 to 127, it may fail if either operand of the comparison is outside that range. Numeric relational operators other than equality (such as <, <=, >, and >=) can be safely used to compare boxed primitive types (see EXP03-J. Do not use the equality operators when comparing values of boxed primitives for more information).
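The behavior described above can be sketched as follows. The class and method names are hypothetical. The result of == on the larger values is deliberately left without an expected output because it depends on the JVM's boxing cache.

```java
// Hypothetical demonstration class; names are illustrative only.
class BoxedComparison {
  // Compares two boxed values by content rather than by reference.
  static boolean sameValue(Integer a, Integer b) {
    return a.equals(b);
  }

  public static void main(String[] args) {
    Integer small1 = 100;  // Autoboxed; inside the range -128 to 127
    Integer small2 = 100;
    Integer large1 = 1000; // Autoboxed; outside that range
    Integer large2 = 1000;

    System.out.println(small1 == small2);          // Prints "true"
    System.out.println(large1 == large2);          // Unspecified; do not rely on it
    System.out.println(sameValue(large1, large2)); // Prints "true"
    System.out.println(large1 <= large2);          // Prints "true": operands are unboxed
  }
}
```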

Noncompliant Code Example

This noncompliant code example declares two distinct String objects that contain the same value:

public class StringComparison {
  public static void main(String[] args) {
    String str1 = new String("one");
    String str2 = new String("one");
    System.out.println(str1 == str2); // Prints "false"
  }
}

The reference equality operator == evaluates to true only when the values it compares refer to the same underlying object. The references in this example are unequal because they refer to distinct objects.

Compliant Solution (Object.equals())

This compliant solution uses the Object.equals() method when comparing string values:

public class StringComparison {
  public static void main(String[] args) {
    String str1 = new String("one");
    String str2 = new String("one");
    System.out.println(str1.equals(str2)); // Prints "true"
  }
}

Compliant Solution (String.intern())

Reference equality behaves like abstract object equality when it is used to compare two strings that are both results of the String.intern() method. This compliant solution uses String.intern() and can perform fast string comparisons when only one copy of the string "one" is required in memory.

public class StringComparison {
  public static void main(String[] args) {
    String str1 = new String("one");
    String str2 = new String("one");

    str1 = str1.intern();
    str2 = str2.intern();

    System.out.println(str1 == str2); // Prints "true"
  }
}

Use of String.intern() should be reserved for cases in which the tokenization of strings either yields an important performance enhancement or dramatically simplifies code. Examples include programs engaged in natural language processing and compiler-like tools that tokenize program input. For most other programs, performance and readability are often improved by the use of code that applies the Object.equals() approach and that lacks any dependence on reference equality.

The Java Language Specification (JLS) [JLS 2013] provides very limited guarantees about the implementation of String.intern(). For example:

  • The cost of String.intern() grows as the number of interned strings grows. Performance should be no worse than O(n log n), but the JLS lacks a specific performance guarantee.
  • In early Java Virtual Machine (JVM) implementations, interned strings became immortal: they were exempt from garbage collection. This can be problematic when large numbers of strings are interned. More recent implementations can garbage-collect the storage occupied by interned strings that are no longer referenced. However, the JLS lacks any specification of this behavior.
  • In JVM implementations prior to Java 1.7, interned strings are allocated in the permgen storage region, which is typically much smaller than the rest of the heap. Consequently, interning large numbers of strings can lead to an out-of-memory condition. In many Java 1.7 implementations, interned strings are allocated on the heap, relieving this restriction. Once again, the details of allocation are unspecified by the JLS; consequently, implementations may vary.

String interning may also be used in programs that accept repetitively occurring strings. Its use boosts the performance of comparisons and minimizes memory consumption.

When canonicalization of objects is required, it may be wiser to use a custom canonicalizer built on top of ConcurrentHashMap; see Joshua Bloch's Effective Java, second edition, Item 69 [Bloch 2008], for details.
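A minimal sketch of such a canonicalizer for strings, assuming a putIfAbsent-based approach, might look like the following. The StringCanonicalizer class and its canonicalize() method are hypothetical names, and this sketch is not a definitive implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper class; the name and API are illustrative only.
final class StringCanonicalizer {
  private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

  // Returns the canonical instance whose contents equal s (s must be non-null).
  String canonicalize(String s) {
    String existing = pool.get(s);       // Fast path: already in the pool
    if (existing == null) {
      existing = pool.putIfAbsent(s, s); // Atomic insert; returns prior value or null
      if (existing == null) {
        existing = s;                    // s itself became the canonical instance
      }
    }
    return existing;
  }
}

class CanonicalizerDemo {
  public static void main(String[] args) {
    StringCanonicalizer c = new StringCanonicalizer();
    String str1 = c.canonicalize(new String("one"));
    String str2 = c.canonicalize(new String("one"));
    System.out.println(str1 == str2); // Prints "true"
  }
}
```

Note that this sketch never removes entries, so the pool grows for as long as the canonicalizer is reachable. A production version would need an eviction or weak-reference strategy suited to the application.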

Applicability

Confusing reference equality and object equality can lead to unexpected results.

Using reference equality in place of object equality is permitted only when the defining classes guarantee the existence of at most one object instance for each possible object value. The use of static factory methods, rather than public constructors, facilitates instance control; this is a key enabling technique. Another technique is to use an enum type.
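Both techniques can be sketched as follows. CountryCode and Direction are hypothetical types introduced only for illustration. The static factory of() returns at most one instance per distinct code, so == is a valid comparison for values obtained through it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical instance-controlled class: no public constructor.
final class CountryCode {
  private static final ConcurrentMap<String, CountryCode> INSTANCES =
      new ConcurrentHashMap<>();

  private final String code;

  private CountryCode(String code) {
    this.code = code;
  }

  // Static factory: atomically creates the instance on first request.
  static CountryCode of(String code) {
    return INSTANCES.computeIfAbsent(code, CountryCode::new);
  }

  String code() {
    return code;
  }
}

// Hypothetical enum: the language guarantees one instance per constant.
enum Direction { NORTH, SOUTH }

class InstanceControlDemo {
  public static void main(String[] args) {
    CountryCode c1 = CountryCode.of("US");
    CountryCode c2 = CountryCode.of(new String("US"));
    System.out.println(c1 == c2); // Prints "true": same controlled instance

    Direction d = Direction.valueOf("NORTH");
    System.out.println(d == Direction.NORTH); // Prints "true"
  }
}
```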

Use reference equality to determine whether two references point to the same object.

Automated Detection

Tool                     Version   Checker             Description
The Checker Framework    2.1.3     Interning Checker   Check for errors in equality testing and interning (see Chapter 5)
SonarQube                6.7       S1698

Bibliography

[Bloch 2008] Item 69, "Prefer Concurrency Utilities to wait and notify"

[FindBugs 2008] ES, "Comparison of String Objects Using == or !="

[JLS 2013] §3.10.5, "String Literals"; §5.6.2, "Binary Numeric Promotion"


10 Comments

  1. Do we have a guideline along the lines of "do not assume that an object implements a meaningful equals method (i.e. one that is not just reference equality under the covers)"? If not, should we add one (or, extend this one)? I've certainly been frequently bitten by this mistake.

  2. I have strong reservations about including String.intern in this guideline. For one thing, this rule isn't specific to a class, but that CS is specific to a single class. Primarily, however, the correct uses of String.intern are few and far between. In fact, I would recommend never using String.intern, and if that sort of functionality were needed, implementing your own interning solution. It's really designed for internal JVM usage, and it's unclear why it was ever exposed in the public API at all.

    1. Well, we do have to include the 'intern' CS because it is available and works as advertised. If we omitted it, people would ask why.

      I'd always assumed that JVMs had some latitude in when they interned Strings (much like they have latitude in how much they memoize integers). It's possible that there exists a JVM where our NCCE actually prints true. Except for string literals, the JLS is silent about interning strings.

      Do you have a citation for your recommendation against using String.intern()?

      1. Couple of relevant links:

        http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html

        http://books.google.com/books?id=Ft8t0S4VjmwC&pg=PA274&lpg=PA274&dq=josh+bloch+java+string+intern&source=bl&ots=D3y7Y4tIzk&sig=oaRTOlTvmFfE6ExwyOlVwlEsfVA&hl=en&sa=X&ei=wQybUNCwGImo0AGLsoGoCA&ved=0CC4Q6AEwAA#v=onepage&q=josh%20bloch%20java%20string%20intern&f=false

        tl;dr:

        • String.intern and '==' is faster than ".equals", but difference isn't as large as you'd think (unsure if the benchmark also included the cost of initial intern call)
        • String.intern saves heap memory at the expense of permgen (which is typically much smaller)
        • Custom String intern is faster than builtin impl
        • String.intern Strings are garbage collected (they are not immortal as stated above after the CS)

        All this to say, there is a minefield of "gotchas" around using String.intern. I see your point that not including it could be seen as a hole, but I think there should be a fairly strong recommendation about using it in practice. (Also, as I mentioned above, it is a CS that is specific to the String class and not a general solution for the rule.)

        As a side note, i don't think you could have a compliant JVM where the NCCE prints true since you are explicitly invoking the String constructor.  The "magic" number interning is only used for auto-boxing or the static "valueOf" method (neither of which explicitly invokes a constructor, e.g. "new Integer(3)").

        1. I've edited the String.intern CS to reflect the various gotchas.  I note that the gotchas are present because the JLS leaves most of the characteristics of String.intern unspecified (which is probably a good thing!); consequently implementations have varied quite a bit over the years. The CS now gives natural language processing and compiler-like tools as examples where String.intern() might be a good choice, notes the varying behavior and lack of specification, and refers to Bloch 2008 for a more general (and possibly higher performance) example of canonicalizing objects.

          Note: Still need to fix the link to Bloch 2008 in the References section. 

          1. Yes, i think the caveats are much clearer now.

  3. Does the NCE isEqual() implementation really need to be that complex?  i realize it is "paralleling" the CS, but the code really boils down to return str1 == str2;.

    1. I simplified the code examples.

    • The following quote stands out a little too much

    When canonicalization of objects is required, it may be wiser to use a custom canonicalizer built on top of ConcurrentHashMap; see Joshua Bloch's Effective Java, second edition, Item 69 [Bloch 2008], for details.

    • Either we should add a CS to explain it, or add some text for it to make sense or let a black hole grab it.
    1. I think that it's good enough for now.  We have run out of time to do more.