Proper input sanitization can prevent insertion of malicious data into a subsystem such as a database. However, different subsystems require different types of sanitization. Fortunately, it is usually obvious which subsystems will eventually receive which inputs, and consequently what type of sanitization is required.

Several subsystems exist for the purpose of outputting data. An HTML renderer is one common subsystem for displaying output. Data sent to an output subsystem may appear to originate from a trusted source. However, it is dangerous to assume that output sanitization is unnecessary because such data may indirectly originate from an untrusted source and may include malicious content. Failure to properly sanitize data passed to an output subsystem can allow several types of attacks. For example, HTML renderers are prone to HTML injection and cross-site scripting (XSS) attacks [OWASP 2011]. Output sanitization to prevent such attacks is as vital as input sanitization.

As with input validation, data should be normalized before it is sanitized for malicious characters. Properly encode all output characters other than those known to be safe to avoid vulnerabilities caused by data that bypasses validation. For more information, see the rule "IDS01-J. Normalize strings before validating them."
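The following is a minimal sketch of why normalization has to come first. The class and method names (NormalizeFirst, containsAngleBracket, containsAngleBracketAfterNormalizing) are hypothetical and chosen only for illustration. The sketch assumes an input written with the fullwidth characters U+FF1C and U+FF1E, which a naive check for angle brackets misses but which NFKC normalization maps to the ASCII '<' and '>'.

```java
import java.text.Normalizer;

// Hypothetical helper, for illustration only
class NormalizeFirst {

  // Naive check: looks for literal ASCII angle brackets
  static boolean containsAngleBracket(String s) {
    return s.indexOf('<') >= 0 || s.indexOf('>') >= 0;
  }

  // Normalizes to NFKC first, then applies the same check
  static boolean containsAngleBracketAfterNormalizing(String s) {
    String canonical = Normalizer.normalize(s, Normalizer.Form.NFKC);
    return containsAngleBracket(canonical);
  }
}
```

For the string "\uFF1Cscript\uFF1E", the naive check finds nothing, while the check performed after normalization detects the angle brackets.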

Noncompliant Code Example

This noncompliant code example uses the model-view-controller (MVC) concept of the Java EE–based Spring Framework to display data to the user without encoding or escaping it. Because the data is sent to a web browser, the code is subject to both HTML injection and XSS attacks.

@RequestMapping("/getnotifications.htm")
public ModelAndView getNotifications(
  HttpServletRequest request, HttpServletResponse response) {
  ModelAndView mv = new ModelAndView();
  try {
    UserInfo userDetails = getUserInfo();
    List<Map<String,Object>> list = new ArrayList<Map<String, Object>>();
    List<Notification> notificationList = 
        NotificationService.getNotificationsForUserId(userDetails.getPersonId());
           
    for (Notification notification: notificationList) {
      Map<String,Object> map = new HashMap<String, Object>();
      map.put("id", notification.getId());
      map.put("message", notification.getMessage());
      list.add(map);
    }
            
    mv.addObject("Notifications", list);
  } catch (Throwable t) {
    // Log to file and handle
  }
 
  return mv;
}

Compliant Solution

This compliant solution defines a ValidateOutput class that normalizes the output to a known character set, performs output sanitization using a whitelist, and encodes any unspecified data values to enforce a double-checking mechanism. Note that the required whitelisting patterns can vary according to the specific needs of different fields [OWASP 2013].

import java.text.Normalizer;
import java.util.regex.Pattern;

public class ValidateOutput {
  // Allows only alphanumeric characters and spaces
  private static final Pattern pattern = Pattern.compile("^[a-zA-Z0-9\\s]{0,20}$");

  // Validates and encodes the input field based on a whitelist
  public String validate(String name, String input) throws ValidationException {
    String canonical = normalize(input);

    if (!pattern.matcher(canonical).matches()) {
      throw new ValidationException("Improper format in " + name + " field");
    }
    
    // Performs output encoding for nonvalid characters 
    canonical = HTMLEntityEncode(canonical);
    return canonical;
  }

  // Normalizes to a known canonical form (NFKC)
  private String normalize(String input) {
    return Normalizer.normalize(input, Normalizer.Form.NFKC);
  }

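  // Encodes nonvalid data as numeric character references
  private static String HTMLEntityEncode(String input) {
    StringBuilder sb = new StringBuilder();

    // Iterate by code point so that a supplementary character
    // yields one reference rather than two surrogate halves
    for (int i = 0; i < input.length(); ) {
      int cp = input.codePointAt(i);
      if (Character.isLetterOrDigit(cp) || Character.isWhitespace(cp)) {
        sb.appendCodePoint(cp);
      } else {
        sb.append("&#").append(cp).append(';');
      }
      i += Character.charCount(cp);
    }
    return sb.toString();
  }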
}
 
// ...
 
@RequestMapping("/getnotifications.htm")
public ModelAndView getNotifications(HttpServletRequest request, HttpServletResponse response) {
  ValidateOutput vo = new ValidateOutput();

  ModelAndView mv = new ModelAndView();
  try {
    UserInfo userDetails = getUserInfo();
    List<Map<String,Object>> list = new ArrayList<Map<String,Object>>();
    List<Notification> notificationList = 
        NotificationService.getNotificationsForUserId(userDetails.getPersonId());
           
    for (Notification notification: notificationList) {
      Map<String,Object> map = new HashMap<String,Object>();
      map.put("id", vo.validate("id", notification.getId()));
      map.put("message", vo.validate("message", notification.getMessage()));
      list.add(map);
    }
            
    mv.addObject("Notifications", list);
  } catch (Throwable t) {
    // Log to file and handle
  }
 
  return mv;
}

Output encoding and escaping are mandatory when dangerous characters such as double quotes and angle brackets must be accepted. Even when input is whitelisted to disallow such characters, output escaping is recommended because it provides a second level of defense. Note that the exact escape sequence can vary depending on where the output is embedded. For example, untrusted output may occur in an HTML value attribute, CSS, URL, or script; the output encoding routine differs in each case. In some contexts, it is impossible to use untrusted data securely. Consult the OWASP XSS (Cross-Site Scripting) Prevention Cheat Sheet for more information on preventing XSS attacks.
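When dangerous characters have to be accepted (for example, a message one user sends to another), escaping at output time is the primary defense rather than a backup. The following is a minimal sketch of escaping for one context only: HTML element content and quoted attribute values. The HtmlEscaper class and escapeHtml method are hypothetical names introduced for illustration; the sketch does not cover script, CSS, or URL contexts, and a maintained encoder or a framework facility is generally a better choice in production code.

```java
// Hypothetical helper, for illustration only
class HtmlEscaper {

  // Escapes the five characters that are significant in HTML
  // element content and in quoted attribute values
  static String escapeHtml(String input) {
    StringBuilder sb = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); i++) {
      char ch = input.charAt(i);
      switch (ch) {
        case '&':  sb.append("&amp;");  break;
        case '<':  sb.append("&lt;");   break;
        case '>':  sb.append("&gt;");   break;
        case '"':  sb.append("&quot;"); break;
        case '\'': sb.append("&#x27;"); break;
        default:   sb.append(ch);
      }
    }
    return sb.toString();
  }
}
```

With this sketch, a message such as <b>"Tom" & 'Jerry'</b> is stored unchanged but rendered as inert text, because every markup-significant character is replaced by an entity before it reaches the browser.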

Noncompliant Code Example

This noncompliant code example takes a user-supplied query string and builds a URL from it. Because the query string is not encoded, the returned URL may be invalid if the input contains characters that are not URL-safe as defined by RFC 1738.

String buildUrl(String q) {
  String url = "https://example.com?query=" + q;
 
  return url;
}

For example, if the user supplies the input string "<#catgifs>", the URL returned is "https://example.com?query=<#catgifs>", which is not a valid URL.
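One way to observe the problem is to hand the resulting string to a URI parser. The following sketch uses java.net.URI, whose constructor throws URISyntaxException for strings that violate its grammar. The UrlCheck class and isWellFormed method are hypothetical names introduced for illustration, and java.net.URI follows the later generic URI syntax (RFC 2396 with amendments) rather than RFC 1738 itself, so treat this as an approximation of the validity check.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only
class UrlCheck {

  // Returns true if java.net.URI can parse the string
  static boolean isWellFormed(String url) {
    try {
      new URI(url);
      return true;
    } catch (URISyntaxException e) {
      // The string contains a character or structure the URI grammar forbids
      return false;
    }
  }
}
```

Under these assumptions, the string built from the input "<#catgifs>" is rejected because '<' is not a legal character in a query component.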

Compliant Solution (Java 8)

This compliant solution uses java.util.Base64, introduced in Java 8, to encode data that is transferred over media that allow only printable characters, such as URLs, filenames, and MIME. The URL-safe encoder uses '-' and '_' in place of '+' and '/'.

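// Requires java.nio.charset.StandardCharsets and java.util.Base64
String buildEncodedUrl(String q) {
  // Use an explicit charset rather than the platform default
  String encodedUrl = "https://example.com?query="
      + Base64.getUrlEncoder().encodeToString(q.getBytes(StandardCharsets.UTF_8));

  return encodedUrl;
}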

If the user supplies the input string "<#catgifs>", the URL returned is "https://example.com?query=PCNjYXRnaWZzPg==", which is a valid URL. The receiver must Base64-decode the query value to recover the original string.
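Percent-encoding is another common way to make a query value URL-safe, and it keeps the value readable to standard query-string parsers. The following sketch uses java.net.URLEncoder; the QueryEncoder class and buildPercentEncodedUrl method are hypothetical names introduced for illustration. Note that URLEncoder implements application/x-www-form-urlencoded encoding, so a space becomes '+' rather than "%20", which is appropriate for query values but not for every part of a URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, for illustration only
class QueryEncoder {

  // Builds the URL with a percent-encoded query value
  static String buildPercentEncodedUrl(String q) {
    try {
      return "https://example.com?query=" + URLEncoder.encode(q, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 support is required of every Java platform implementation
      throw new IllegalStateException(e);
    }
  }
}
```

For the input "<#catgifs>", this sketch produces "https://example.com?query=%3C%23catgifs%3E".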

Applicability

Failure to encode or escape output before it is displayed or passed across a trust boundary can result in the execution of arbitrary code.

Automated Detection

Tool | Version | Checker | Description
The Checker Framework | 2.1.3 | Tainting Checker | Trust and security errors (see Chapter 8)
Parasoft Jtest | 2023.1 | CERT.IDS51.TDRESP | Protect against HTTP response splitting
Parasoft Jtest | 2023.1 | CERT.IDS51.TDXSS | Protect against XSS vulnerabilities

Related Vulnerabilities

The Apache GERONIMO-1474 vulnerability, reported in January 2006, allowed attackers to submit URLs containing JavaScript. The Web Access Log Viewer failed to sanitize the data it forwarded to the administrator console, thereby enabling a classic XSS attack.

Bibliography



10 Comments

    • The real point of escaping and/or encoding is to not just have a second level of defense but to aid situations where dangerous characters must be permitted in the input. For example a user must be allowed to send a message to another user with any selection of unsafe characters. We need to highlight that in the guideline.
    • Some examples of cases where output encoding is necessary especially in the context of Java based web applications will particularly be of value. A line that summarizes [OWASP 2011] Cross-site Scripting (XSS) examples would be nice.
    • The NCE looks rather empty. Here is some code that you could add. It uses the MVC concept of the Java EE based Spring Framework to display some data to the user without encoding or escaping it.
    @RequestMapping("/getnotifications.htm")
    public ModelAndView getNotifications(HttpServletRequest request, HttpServletResponse response){
            ModelAndView mv = new ModelAndView();
            try{
               UserInfo userDetails = getUserInfo();
               List<Map<String,Object>> list = new ArrayList<Map<String,Object>>();
               List<Notification> notificationList = notificationService.getNotificationsForUserId( userDetails.getPersonId());
              
    		   for(Notification notification: notificationList) {
                   Map<String,Object>map = new HashMap<String,Object>();
                   map.put("id",notification.getId());
                   map.put( "message", notification.getMessage());
                   list.add( map);
               }
               
               mv.addObject("Notifications",list);
            }
            catch(Throwable t){
                // log to file and handle
            }
    
           return mv;
    }

     

     

  1. Also, we traditionally describe the exploitability of code in text after the code, not in a comment.

    • Dhruv's code example is a good start at a NCCE...it also does not indicate why not sanitizing its input is bad, but that can be fixed.
    1. For your first point, the rules IDS00-J. Sanitize untrusted data passed across a trust boundary and IDS01-J. Normalize strings before validating them are talking about sanitizing input.  This guideline emphasizes that it can also be important to sanitize output.

      (The other things can be fixed.)

  2. Revised to try to address all the points raised by the comments above.

  3. In the intro section, we talk about output sanitization and not output filtering. I'm guessing these two concepts are the same and we should use the term "sanitization" for consistency.

  4. If we are going to include "GERONIMO-1474" as a related vulnerability, this is certainly insufficient for a print book.  We'll need to include some description of why this particular vulnerability is a violation of this guideline.

  5. I get lost sometimes in the exact definition of Cross Site Scripting (XSS), but if I'm putting JavaScript in a database as an insider and then viewing the results, I don't think there is anything cross site about that.

  6. I've attempted to address the three comments above.

    • We should make it clear that the exact form of output encoding depends on the context in which the untrusted input is used. For example, in an HTML div, content, value, attribute, script, url or CSS. The method we have given only helps with the content of tags.
    • Programmers should use built in routines that various frameworks provide such as <fn:escapeXML>, <c:out> and so on and only use custom methods when JSTL is not available.
    • See the OWASP prevention cheat sheet. I have sample code for all scenarios readily available. In fact, that link does not clarify that for a URL, you must first output-escape and then URL-encode a link that must be formed using untrusted data.
    • I would have liked to include some JSTL code and advice here to make it more complete and applicable. Comments?