
...

When implementations keep strings in a normalized form, they can be assured that equivalent strings have a unique binary representation.
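As an illustration of that claim, the following minimal sketch compares two canonically equivalent spellings of "é" before and after normalization. The class name EquivalentStrings and the helper sameBytesAfterNfc are hypothetical names chosen for this example, and the choice of NFC and UTF-8 is an assumption made only for the demonstration.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Arrays;

// Hypothetical demo class; not part of the original guideline.
class EquivalentStrings {
  // Returns true when both strings have identical UTF-8 bytes after NFC normalization.
  static boolean sameBytesAfterNfc(String a, String b) {
    byte[] x = Normalizer.normalize(a, Normalizer.Form.NFC).getBytes(StandardCharsets.UTF_8);
    byte[] y = Normalizer.normalize(b, Normalizer.Form.NFC).getBytes(StandardCharsets.UTF_8);
    return Arrays.equals(x, y);
  }

  public static void main(String[] args) {
    String precomposed = "\u00E9"; // e with acute accent as a single code point
    String decomposed = "e\u0301"; // e followed by a combining acute accent

    // Raw comparison: the two spellings differ code point by code point.
    System.out.println(precomposed.equals(decomposed));
    // Normalized comparison: both spellings reduce to the same byte sequence.
    System.out.println(sameBytesAfterNfc(precomposed, decomposed));
  }
}
```

The raw comparison reports the strings as different even though they render identically; only the normalized comparison treats them as the same text.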

Noncompliant Code Example

The Normalizer.normalize() method transforms Unicode text into the standard normalization forms described in Unicode Standard Annex #15, Unicode Normalization Forms. Frequently, the most suitable normalization form for performing input validation on arbitrarily encoded strings is KC (NFKC).
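To make the effect of NFKC concrete, here is a small sketch that applies Normalizer.normalize() to a string containing the small-form angle brackets U+FE64 and U+FE65. The class name NfkcDemo and the helper nfkc are hypothetical and exist only for this illustration.

```java
import java.text.Normalizer;

// Hypothetical demo class; not part of the original guideline.
class NfkcDemo {
  // Thin wrapper around Normalizer.normalize() using the NFKC form.
  static String nfkc(String input) {
    return Normalizer.normalize(input, Normalizer.Form.NFKC);
  }

  public static void main(String[] args) {
    // U+FE64 and U+FE65 are compatibility variants of the ASCII angle brackets.
    String s = "\uFE64" + "script" + "\uFE65";

    // The raw string is not yet in NFKC form.
    System.out.println(Normalizer.isNormalized(s, Normalizer.Form.NFKC)); // prints "false"
    // After NFKC, the compatibility variants become plain < and >.
    System.out.println(nfkc(s)); // prints "<script>"
  }
}
```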

...

// String s may be user controllable
// \uFE64 is normalized to < and \uFE65 is normalized to > using the NFKC normalization form
String s = "\uFE64" + "script" + "\uFE65";

// Validate
Pattern pattern = Pattern.compile("[<>]"); // Check for angle brackets
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  // Found blacklisted tag
  throw new IllegalStateException();
} else {
  // ...
}

// Normalize (too late: the validation above already ran on the unnormalized string)
s = Normalizer.normalize(s, Form.NFKC);

The validation logic fails to detect the <script> tag because the input has not yet been normalized when the check runs: the pattern sees \uFE64 and \uFE65 rather than < and >. Consequently, the system accepts the invalid input.

Compliant Solution

This compliant solution normalizes the string before validating it. Alternative representations of the string are normalized to the canonical angle brackets. Consequently, input validation correctly detects the malicious input and throws an IllegalStateException.
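The normalize-then-validate ordering described here can be sketched as follows. This is a minimal illustration under stated assumptions rather than the guideline's own listing: the class name NormalizeThenValidate and the method normalizeAndValidate are hypothetical, and the angle-bracket pattern is carried over from the noncompliant example.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are assumptions made for this sketch.
class NormalizeThenValidate {
  // Same check as the noncompliant example: reject any angle bracket.
  private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

  // Normalizes the input to NFKC first, then validates the normalized result.
  static String normalizeAndValidate(String s) {
    // Normalize
    s = Normalizer.normalize(s, Form.NFKC);

    // Validate
    Matcher matcher = ANGLE_BRACKETS.matcher(s);
    if (matcher.find()) {
      // Found blacklisted tag
      throw new IllegalStateException();
    }
    return s;
  }

  public static void main(String[] args) {
    try {
      // String may be user controllable; \uFE64 and \uFE65 normalize to < and >
      normalizeAndValidate("\uFE64" + "script" + "\uFE65");
      System.out.println("accepted");
    } catch (IllegalStateException e) {
      System.out.println("rejected");
    }
  }
}
```

Returning the normalized string, rather than the original, keeps later processing on the same form that was validated.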

...