Many classes allow inclusion of escape sequences in character and string literals; examples include java.util.regex.Pattern as well as classes that support XML- and SQL-based actions by passing string arguments to methods. According to the Java Language Specification (JLS), §3.10.6, "Escape Sequences for Character and String Literals" [JLS 2013],

The character and string escape sequences allow for the representation of some nongraphic characters as well as the single quote, double quote, and backslash characters in character literals (§3.10.4) and string literals (§3.10.5).

Correct use of escape sequences in string literals requires understanding how the escape sequences are interpreted by the Java compiler as well as how they are interpreted by any subsequent processor, such as a SQL engine. SQL statements may require escape sequences (for example, \t, \n, and \r) in certain cases, such as when storing raw text in a database. When representing SQL statements in Java string literals, each escape sequence intended for the SQL engine must be preceded by an extra backslash so that the Java compiler passes it through unchanged instead of interpreting it.
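The following is a minimal sketch of the SQL case, assuming an engine that honors backslash escapes inside string literals (as MySQL does by default). The table name notes and the column name body are hypothetical. It only inspects the Java strings and does not contact a database.

```java
public class SqlEscapeDemo {
    // Hypothetical table and column names, for illustration only.
    // Java turns "\\n" into the two characters '\' and 'n', so the
    // SQL engine receives the escape sequence itself.
    public static final String ESCAPED =
        "INSERT INTO notes (body) VALUES ('first line\\nsecond line')";

    // Here the Java compiler inserts a line feed character, so the
    // SQL engine receives a raw newline and no escape sequence.
    public static final String UNESCAPED =
        "INSERT INTO notes (body) VALUES ('first line\nsecond line')";

    public static void main(String[] args) {
        // Does each string contain a backslash followed by 'n'?
        System.out.println(ESCAPED.contains("\\n"));
        System.out.println(UNESCAPED.contains("\\n"));
        // The escaped form is exactly one character longer.
        System.out.println(ESCAPED.length() - UNESCAPED.length());
    }
}
```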

As another example, consider the Pattern class used in performing regular expression-related tasks. A string literal used for pattern matching is compiled into an instance of the Pattern type. When the pattern to be matched contains a sequence of characters identical to one of the Java escape sequences—"\" and "n", for example—the Java compiler treats that portion of the string as a Java escape sequence and transforms the sequence into an actual newline character. To insert the newline escape sequence, rather than a literal newline character, the programmer must precede the "\n" sequence with an additional backslash to prevent the Java compiler from replacing it with a newline character. The string constructed from the resulting sequence,

\\n

consequently contains the correct two-character sequence \n and correctly denotes the escape sequence for newline in the pattern.

In general, for a particular escape sequence of the form \X that must reach the subsequent processor intact, the equivalent Java representation is

\\X
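The doubling rule can be checked directly. This is a small sketch, with class and constant names chosen only for illustration, that compares the lengths of the resulting strings and passes them to Pattern.

```java
import java.util.regex.Pattern;

public class DoubleBackslashDemo {
    // One character: the Java compiler replaces \n with a line feed.
    public static final String LINE_FEED = "\n";
    // Two characters, '\' and 'n': the regex escape for a newline.
    public static final String REGEX_NEWLINE = "\\n";
    // Two characters, '\' and 'd': the regex class for a digit.
    // A literal "\d" with one backslash is a compile-time error in Java.
    public static final String REGEX_DIGIT = "\\d";

    public static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(LINE_FEED.length());           // prints 1
        System.out.println(REGEX_NEWLINE.length());       // prints 2
        System.out.println(matches(REGEX_NEWLINE, "\n")); // prints true
        System.out.println(matches(REGEX_DIGIT, "7"));    // prints true
    }
}
```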

Noncompliant Code Example (String Literal)

This noncompliant code example defines a method, splitWords(), that splits the input sequence around matches of the regular expression stored in the string literal WORDS. WORDS is intended to hold the regular expression for a word boundary. However, the Java compiler treats the "\b" literal as a Java escape sequence, and the string WORDS silently compiles to a regular expression that matches a single backspace character.

import java.util.regex.Pattern;

public class Splitter {
  // Interpreted by the Java compiler as a single backspace character
  // Fails to split on word boundaries
  private final String WORDS = "\b";

  public String[] splitWords(String input) {
    // The compiled pattern matches only a literal backspace
    Pattern pattern = Pattern.compile(WORDS);
    String[] input_array = pattern.split(input);
    return input_array;
  }
}
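A short sketch of the resulting behavior, using a hypothetical demo class rather than the Splitter class above: ordinary text is returned unsplit, and a split occurs only where the input contains an actual backspace character.

```java
import java.util.regex.Pattern;

public class BackspaceSplitDemo {
    // Same mistake as the noncompliant example: "\b" is a backspace.
    public static String[] splitOnBackspaceLiteral(String input) {
        return Pattern.compile("\b").split(input);
    }

    public static void main(String[] args) {
        // No backspace in the input, so the whole string comes back.
        System.out.println(splitOnBackspaceLiteral("hello world").length);  // prints 1
        // A real backspace character between the words does split them.
        System.out.println(splitOnBackspaceLiteral("hello\bworld").length); // prints 2
    }
}
```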

Compliant Solution (String Literal)

This compliant solution shows the correctly escaped value of the string literal WORDS that results in a regular expression designed to split on word boundaries:

import java.util.regex.Pattern;

public class Splitter {
  // Interpreted as two chars, '\' and 'b'
  // Correctly splits on word boundaries
  private final String WORDS = "\\b";

  public String[] splitWords(String input) {
    Pattern pattern = Pattern.compile(WORDS);
    String[] input_array = pattern.split(input);
    return input_array;
  }
}
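As a rough check of the corrected literal, the sketch below (a hypothetical demo class) splits a sample sentence on word boundaries. Because a word boundary is a zero-width match, the resulting pieces include the separators between words as well as the words themselves.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class WordBoundarySplitDemo {
    // Two characters, '\' and 'b': the regex word-boundary matcher.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\b");

    public static String[] splitWords(String input) {
        return WORD_BOUNDARY.split(input);
    }

    public static void main(String[] args) {
        // The words appear as separate elements of the result.
        System.out.println(Arrays.toString(splitWords("hello world")));
    }
}
```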

Noncompliant Code Example (String Property)

This noncompliant code example uses the same method, splitWords(). This time the WORDS string is loaded from an external properties file.

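import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.regex.Pattern;

public class Splitter {
  private final String WORDS;

  public Splitter() throws IOException {
    Properties properties = new Properties();
    // Close the stream once the properties have been read
    try (InputStream in = new FileInputStream("splitter.properties")) {
      properties.load(in);
    }
    // The value depends on how the file escapes the backslash
    WORDS = properties.getProperty("WORDS");
  }

  public String[] splitWords(String input) {
    Pattern pattern = Pattern.compile(WORDS);
    String[] input_array = pattern.split(input);
    return input_array;
  }
}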

In the properties file, the WORDS property is once again incorrectly specified as \b:

WORDS=\b

The Properties.load() method silently drops a backslash that precedes a character it does not recognize as an escape, so this value is read as the single character b, and the compiled pattern splits strings around the letter b. Although the string is interpreted differently than if it were a string literal, as in the previous noncompliant code example, the interpretation is still incorrect.

Compliant Solution (String Property)

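This compliant solution shows the correctly escaped value of the WORDS property. Properties.load() reads the doubled backslash as a single backslash, so the property value is the two-character sequence \b: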

WORDS=\\b
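The two property values can be compared without touching the file system. This sketch assumes in-memory text standing in for splitter.properties; the class and method names are hypothetical. Note that each backslash in the file text is doubled again here because the text is written as a Java string literal.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertyEscapeDemo {
    // Loads the WORDS entry from text that stands in for the file.
    public static String loadWords(String fileContents) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties.getProperty("WORDS");
    }

    public static void main(String[] args) {
        // File text: WORDS=\b  (the backslash is dropped on load)
        System.out.println(loadWords("WORDS=\\b").equals("b"));      // prints true
        // File text: WORDS=\\b (loads as backslash followed by 'b')
        System.out.println(loadWords("WORDS=\\\\b").equals("\\b"));  // prints true
    }
}
```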

Applicability

Incorrect use of escape sequences in string inputs can result in misinterpretation and potential corruption of data.

Automated Detection

Tool                     Version   Checker            Description
The Checker Framework    2.1.3     Tainting Checker   Trust and security errors (see Chapter 8)

Bibliography

 


6 Comments

  1. The intro example involving \n is wrong. "\n" is not a backreference, it is '\1', '\2', etc. The Pattern class has no character that means one thing after a \ and another after two
    \\s. The closest it has to ambiguity is wrt case: \a and \A have completely different meanings. So we need a different example in the intro.

    The NCCE uses \b, which is, in fact, the only Java escape sequence that isn't supported by the Pattern class. You could, however, confuse backreferences with octal codes (eg is \000 a backreference or octal code?)

    1. Hopefully fixed.

       

      I think that the NCCE is OK - \b in a string will be interpreted as a backspace and, if that is used in a Pattern, it will not be correctly interpreted as a word boundary.

  2. 1) SQL statements written in Java, for example, sometimes require certain escape characters or sequences (e.g., sequences containing \t\n\r). When representing SQL queries in Java string form, all escape sequences must be preceded by an extra backslash for correct interpretation.


    SQL statements are actually not written out that way. The usual method is to read individual queries from a properties file (key-value pair).

    2) To avoid inserting a newline character, the programmer must precede the "\n" sequence with an additional backslash to prevent the Java compiler from treating it as an escape sequence. The string constructed from the resulting sequence

    \\n

     consequently contains the correct two-character sequence \n and correctly denotes a newline character in the pattern.


    On first read the conclusion appears to contradict the introductory statement. Perhaps it would help to reword to "and correctly denotes a newline character in the pattern for the purpose of matching with an input string"

    3) The guideline is actually saying "add an extra slash when comparing escape sequences". Isn't that easier to understand?

    1. SQL statements are actually not written out that way. The usual method is to read individual queries from a properties file (key-value pair).

      They most certainly are. Here is the MySQL documentation regarding string literals: http://dev.mysql.com/doc/refman/5.0/en/string-literals.html

      On first read the conclusion appears to contradict the introductory statement. Perhaps it would help to reword to "and correctly denotes a newline character in the pattern for the purpose of matching with an input string"

      I agree. I wordsmithed that text.

      3) The guideline is actually saying "add an extra slash when comparing escape sequences". Isn't that easier to understand?

      That is the specific advice on fixing the NCCE. The guideline is trying to be more general and focus on why you need two backslashes. Can we have another NCCE that isn't solved by adding a second backslash? (Perhaps just a description of such an example is enough.)

      1. I am just going to answer #1 for now: I think that the SQL query sentence is a bit unclear which led to the confusion (I haven't seen any escape sequence in any SQL queries in two years). The example link you provided exemplifies how escape characters can be used from the MySql command line to format text for the command line.

        A better example I can think of is - if an application wants to save formatted user input into the DB (such as to preserve newlines) then it may expect escape sequences. We should cite such an example otherwise the sentence sounds a bit confusing and open to reader interpretation.

        1. Made a bunch of changes to this rule, please review.  Dhruv, I wordsmithed the SQL text.