IDS01-J. Sanitize data passed across a trust boundary

Many programs accept data from untrusted sources, and then pass the (modified or unmodified) data across a trust boundary to a component in a separate trust domain.

Input must be sanitized, both because an application may be unprepared to handle the malformed input, and also because unsanitized input may include an injection attack.

As a result, it is necessary to sanitize all string data passed to complex subsystems so that the resulting string is innocuous in the context in which it will be interpreted.

Noncompliant Code Example (Blacklisting)

Blacklisting is the process of examining input data, looking for components that are known to be invalid. One advantage of this approach is that detection of known invalid input is often straightforward. A disadvantage is that the set of all possible invalid inputs may be unknown, or too large to enumerate fully.

Depending on the language and subsystem in question, certain characters and character sequences are frequently considered to be invalid input when encountered in strings. A common set of such characters includes:

Character	Name
LF \n	Line Feed
CR \r	Carriage Return
CRLF \r\n	Carriage Return + Line Feed
" and '	Quotes
, and ; and white space	Comma, semicolon, white space
/ and \	Forward and back slash
< and >	Angle brackets
&	Ampersand
%00	NUL (percent-encoded null byte)
( and )	Parentheses
%	Percent

A blacklist of invalid inputs would forbid the appearance of any of these characters in their raw form. Note that determining what constitutes invalid input can be difficult. For example, validating textual data with a blacklisting approach requires enumerating not only the invalid characters shown above but also the alternate Unicode representations of these characters in differing locales.
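As a rough illustration of the alternate-representation problem, the following sketch shows a blacklist for raw angle brackets missing their fullwidth Unicode forms (U+FF1C and U+FF1E), which NFKC normalization later maps to < and >. The class and method names are hypothetical and are not part of the guideline; the normalization step stands in for whatever a downstream component might do.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical demo class; all names are illustrative only.
final class UnicodeBlacklistDemo {
  private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

  // Returns true when the blacklist finds a raw angle bracket.
  static boolean blacklisted(String input) {
    return ANGLE_BRACKETS.matcher(input).find();
  }

  // Applies NFKC normalization, as a later component might.
  static String normalize(String input) {
    return Normalizer.normalize(input, Normalizer.Form.NFKC);
  }

  public static void main(String[] args) {
    String tainted = "\uFF1Cscript\uFF1E"; // fullwidth forms of < and >
    System.out.println(blacklisted(tainted));            // the filter misses the fullwidth forms
    System.out.println(blacklisted(normalize(tainted))); // the brackets appear after normalization
  }
}
```

The filter only becomes effective once the input has been reduced to a single canonical form, which is why a blacklist over raw characters alone is fragile.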

This noncompliant code example builds a URI from untrusted input and attempts to sanitize the input by checking for angle brackets. However, a URI may contain percent-encoded (hexadecimal) character sequences. Because the filter does not forbid the % character that introduces each encoded sequence, it cannot achieve its purpose. For example, an attacker can bypass the filter by supplying the sequence <script> in its percent-encoded form, %3C%73%63%72%69%70%74%3E.

String tainted = "%3C%73%63%72%69%70%74%3E"; // Hex encoded equivalent form of <script>

Pattern pattern = Pattern.compile("[<>]");
if (pattern.matcher(tainted).find()) {
  throw new ValidationException("Invalid Input");
}
URI uri = new URI("http://vulnerable.com/" + tainted);
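The bypass can be reproduced with a minimal sketch like the one below. It assumes that some downstream component percent-decodes the string; URLDecoder is used here only as a stand-in for that component, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

// Hypothetical demo class; all names are illustrative only.
final class PercentEncodingBypassDemo {
  private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

  // Returns true when the blacklist finds nothing to object to.
  static boolean passesFilter(String input) {
    return !ANGLE_BRACKETS.matcher(input).find();
  }

  // Stand-in for a downstream component that percent-decodes its input.
  static String decode(String input) {
    try {
      return URLDecoder.decode(input, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always available
    }
  }

  public static void main(String[] args) {
    String tainted = "%3C%73%63%72%69%70%74%3E";
    System.out.println(passesFilter(tainted)); // the filter sees no angle brackets
    System.out.println(decode(tainted));       // the decoded form is <script>
  }
}
```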

Noncompliant Code Example

This noncompliant code example attempts to check for the hex-encoded form in addition to the canonical representation of the angle brackets. Note, however, that the program remains vulnerable when an alternative encoding, such as a modified Base64 URL encoding, is used farther along the chain.

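// URL-safe Base64 (java.util.Base64, Java 8 and later) of the hex-encoded form of <script>
String tainted = Base64.getUrlEncoder().encodeToString(
    "%3C%73%63%72%69%70%74%3E".getBytes(StandardCharsets.UTF_8));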

Pattern pattern = Pattern.compile("(%3C|<)(.*)(%3E|>)");
if (pattern.matcher(tainted).find()) {
  throw new ValidationException("Invalid Input");
}
URI uri = new URI("http://vulnerable.com/" + tainted);

This approach also fails to prevent other forms of injection attacks that do not rely on angle brackets. Further, the infeasibility of exhaustive enumeration of all forms of blacklisted characters renders the use of methods such as String.replaceAll() ineffective for sanitizing untrusted user input.
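To make the layered-encoding problem concrete, the sketch below runs the extended blacklist over both the percent-encoded payload and its URL-safe Base64 form. It assumes java.util.Base64 (Java 8 and later) and a downstream component that Base64-decodes the value; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Pattern;

// Hypothetical demo class; all names are illustrative only.
final class Base64BypassDemo {
  private static final Pattern FILTER = Pattern.compile("(%3C|<)(.*)(%3E|>)");

  // Returns true when the blacklist finds nothing to object to.
  static boolean passesFilter(String input) {
    return !FILTER.matcher(input).find();
  }

  static String encode(String payload) {
    return Base64.getUrlEncoder()
        .encodeToString(payload.getBytes(StandardCharsets.UTF_8));
  }

  // Stand-in for a downstream component that Base64-decodes its input.
  static String decode(String encoded) {
    return new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String payload = "%3C%73%63%72%69%70%74%3E";
    String tainted = encode(payload);
    System.out.println(passesFilter(payload)); // the percent-encoded form is caught
    System.out.println(passesFilter(tainted)); // the Base64 form slips through
    System.out.println(decode(tainted).equals(payload)); // decoding restores the payload
  }
}
```

The Base64 alphabet contains none of the characters the pattern looks for, so each additional encoding layer would need its own blacklist entry.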

Compliant Solution (Whitelisting)

The whitelisting approach to input validation consists of building a list of valid input elements (such as characters) and ensuring that untrusted input elements appear on that list. Whitelisting is preferable to blacklisting when valid input elements are easier to enumerate than invalid ones. This advantage disappears when the set of valid input elements is difficult or impossible to enumerate and no workable subset of valid input elements can be defined.

This compliant solution validates the input against a whitelist. The pattern matches any character that is not a word character (alphanumeric or underscore), white space, or a period; input containing such a character is treated as invalid and rejected. Because % is not on the whitelist, the hex-encoded form of <script> is rejected as well.

String tainted = "%3C%73%63%72%69%70%74%3E"; // Hex encoded equivalent form of <script>

Pattern pattern = Pattern.compile("[\\W&&[^\\s\\.]]");
if (pattern.matcher(tainted).find()) {
  throw new ValidationException("Invalid Input");
}
}
URI uri = new URI("http://vulnerable.com/" + tainted);
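The effect of a whitelist of this kind can be checked with a short sketch. It assumes the pattern is intended to reject every character other than word characters, white space, and the period; the class name, method name, and sample inputs are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; all names are illustrative only.
final class WhitelistDemo {
  // Matches any non-word character other than white space and the period.
  private static final Pattern INVALID = Pattern.compile("[\\W&&[^\\s\\.]]");

  // Returns true only when every character is on the whitelist.
  static boolean isAllowed(String input) {
    return !INVALID.matcher(input).find();
  }

  public static void main(String[] args) {
    System.out.println(isAllowed("index.html"));               // only word characters and a period
    System.out.println(isAllowed("%3C%73%63%72%69%70%74%3E")); // % is not on the whitelist
    System.out.println(isAllowed("<script>"));                 // angle brackets are not on the whitelist
  }
}
```

Because the check is phrased in terms of what is allowed, the percent-encoded payload is rejected without the filter needing to know anything about percent-encoding.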

Risk Assessment

Failure to sanitize user input before processing or storing it can lead to injection attacks.

Guideline	Severity	Likelihood	Remediation Cost	Priority	Level
IDS01-J	high	probable	medium	P12	L1

Related Vulnerabilities

CVE-2008-2370 describes a vulnerability in Apache Tomcat 4.1.0 through 4.1.37, 5.5.0 through 5.5.26, and 6.0.0 through 6.0.16. When a RequestDispatcher is used, Tomcat performs path normalization before removing the query string from the URI, which allows remote attackers to conduct directory traversal attacks and read arbitrary files via a .. (dot dot) in a request parameter.

Search for other vulnerabilities resulting from the violation of this guideline on the CERT website.

