Perl provides a feature called taint mode, which is a simple model for detecting data flow vulnerabilities such as SQL injection. When taint mode is active, every scalar value carries a taint flag, so each value is either "tainted" or "untainted." Taint can propagate from variables to other variables in an expression, because taint is associated with a value, not a particular variable. The Perl interpreter issues a fatal error if any tainted value is used in certain operations, such as invoking the system() function. Finally, there are a few ways you can sanitize tainted data, thereby removing the taint.
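
As a minimal sketch of how the taint flag follows values through an expression, the snippet below uses tainted() from the core Scalar::Util module. The helper name taint_label and the choice of $ENV{PATH} as the untrusted source are illustrative assumptions. Run without the -T switch, every value reports as untainted.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: report the taint flag of a value as a word.
sub taint_label {
    my ($value) = @_;
    return tainted($value) ? 'tainted' : 'untainted';
}

my $from_env = $ENV{PATH} // '';          # environment data is tainted under -T
my $derived  = "PATH is: " . $from_env;   # an expression using tainted data is tainted
my $literal  = "hello";                   # literals in the program are never tainted

print "from_env: ", taint_label($from_env), "\n";
print "derived:  ", taint_label($derived),  "\n";
print "literal:  ", taint_label($literal),  "\n";
```

Under perl -T the first two lines should report tainted and the third untainted, which is the sense in which taint belongs to the value rather than to the variable holding it.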

The details of how taint mode works are extensively documented in the perlsec manpage, which states:

Taint checking is most useful when although you trust yourself not to have written a program to give away the farm, you don't necessarily trust those who end up using it not to try to trick it into doing something bad.

If taint mode detects tainted data being used in a manner it deems insecure, it aborts the program. This is an improvement in security, as it is better for a program to crash than to reveal sensitive information or allow arbitrary code execution. However, many programs, such as web servers, may run in environments where a crash is unacceptable. Taint mode can be configured to emit a warning rather than a fatal error, but this practice is not recommended because it can permit behavior worse than a crash. Consequently, taint mode is best viewed as a testing tool for verifying code before putting it into production use.
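
A program can check at startup which of these configurations it is running under. The sketch below reads the built-in ${^TAINT} variable, which perlvar documents as 1 when taint checks are fatal (the -T switch), -1 when they only warn (the -t switch), and 0 when taint mode is off. The helper name taint_mode_status and its return strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: translate the value of ${^TAINT} into a readable status.
#    1 => taint violations are fatal (-T)
#   -1 => taint violations only warn (-t)
#    0 => taint mode is off
sub taint_mode_status {
    my ($flag) = @_;
    return $flag > 0 ? 'enforcing'
         : $flag < 0 ? 'warning-only'
         :             'off';
}

my $status = taint_mode_status(${^TAINT});
print "Taint mode: $status\n";
print "Taint violations will not stop this program\n" if $status ne 'enforcing';
```

A test harness could use a check like this to refuse to run a suite unless the status is enforcing.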

Taint mode is an example of a dynamic analysis tool, which provides useful information about a program while it runs. Taint mode has the usual advantages and disadvantages of dynamic analysis tools. It does not produce false positives; it emits errors only when tainted data is used in a manner it considers insecure, and therefore its messages warrant attention and demand a fix. It imposes a minor performance penalty from doing taint checks. Finally, it checks only code that actually runs. If a program is run in taint mode and happens to never execute one particular file, then that file may still contain dataflow vulnerabilities.

Taint mode has a very simple model of what data is tainted, when tainted data becomes untainted, and what operations may not be performed on tainted data. This model is sufficient for some programs but not for others. Taint mode forbids tainted data from being passed to a command interpreter (such as system()) or a file opened for writing (via open() or rename()). However, taint mode does not prevent tainted data from being used in certain other contexts, some of which are forbidden by various CERT rules:

Taint mode also provides a handful of mechanisms to produce untainted data from tainted data. The preferred means of sanitizing tainted data is to use a regex:

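my $tainted_data = $ARGV[0] // '';       # example source of tainted data
my $regex        = qr/\A[\w.-]+\z/;      # example only: data is sanitary if it satisfies this
my $sanitized_data;
if ($tainted_data =~ m{($regex)}) {
  $sanitized_data = $1;                  # text captured by the match is untainted
} else {
  die "Input failed validation\n";       # never use $1 from a failed match
}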

In this case, the sanitized data may have the same value as the tainted data, but data harvested from a regex match is always considered to be untainted. It is up to the programmer to ensure that the regex will match only sanitary data.
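
The following sketch illustrates why the choice of regex matters. Both patterns remove taint from whatever they capture, but only the second one actually validates anything. The helper untaint_with, the two patterns, and the sample input are all hypothetical, and the strict pattern is only one possible allowlist.

```perl
use strict;
use warnings;

# Hypothetical helper: return the captured (and therefore untainted) text
# when $input matches $pattern, or undef when it does not match.
sub untaint_with {
    my ($input, $pattern) = @_;
    return $input =~ m{($pattern)} ? $1 : undef;
}

my $permissive = qr/.*/s;                 # matches anything: removes taint, checks nothing
my $strict     = qr/\A[A-Za-z0-9_]+\z/;   # example allowlist: one identifier-like word

my $hostile = 'report; rm -rf /';
print "permissive: ",
    (defined(untaint_with($hostile, $permissive)) ? "accepted" : "rejected"), "\n";
print "strict:     ",
    (defined(untaint_with($hostile, $strict)) ? "accepted" : "rejected"), "\n";
```

The permissive pattern hands the hostile string back untainted, so taint mode would no longer object to passing it to system(). The strict pattern rejects it.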

There are other ways to sanitize tainted data. For instance, hash keys cannot be tainted, so using tainted data as the key to a hash will sanitize it. Perl will also not stop tainted data from being used as a method name or as a symbolic subroutine reference, so tainted input can select which code runs without triggering a taint error.
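
The sketch below shows both behaviors under stated assumptions. The first part passes a value through a hash key, which discards its taint flag without validating it. The second part shows one way to keep untrusted input from choosing arbitrary code: look the name up in a fixed dispatch table instead of calling it directly. The names %ACTIONS and run_action and the two sample actions are hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hash keys never carry taint, so a value read back through keys() is
# untainted even though nothing has validated it.
my $input = $ENV{PATH} // '';             # tainted when run under -T
my %seen  = ($input => 1);
my ($laundered) = keys %seen;
print "key is ", (tainted($laundered) ? 'tainted' : 'untainted'), "\n";

# Hypothetical dispatch table: only these names may be invoked, so
# untrusted input cannot select arbitrary code such as POSIX::system.
my %ACTIONS = (
    list => sub { return "listing" },
    help => sub { return "usage: list | help" },
);

sub run_action {
    my ($name) = @_;
    my $handler = $ACTIONS{$name}
        or return;                        # unknown name: refuse instead of calling it
    return $handler->();
}

print run_action('list'), "\n";
print defined(run_action('POSIX::system')) ? "called\n" : "refused\n";
```

Because taint mode does not flag either pattern, a lookup of this kind has to be enforced by the programmer.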




The specific issue of what data is tainted depends on the execution environment. For example, data read from a database may or may not be considered tainted. Perl's DBI module provides an optional TaintOut attribute. If set, then any data retrieved from a database will be considered tainted.

Likewise, the specific set of actions that should not be performed on tainted data depends on the execution environment. For instance, a CGI script will print out data to be displayed on a web page. Such data requires sanitization to prevent various web-based vulnerabilities, but taint mode does not prevent tainted data from being printed.

Consequently, taint mode may be used for certain Perl scripts and is required for some: all scripts with the setuid or setgid bit set run in taint mode. Taint mode should be used during testing and quality assurance, but it will not detect all potential dataflow vulnerabilities, so it is critical to know when taint mode can be relied on and when it cannot. The following is an example of a vulnerable program that cannot rely on taint mode.

Noncompliant Code Example (CGI)

Taint mode assumes a simple model: all data is either tainted or untainted. Although this assumption is useful for some applications, it is not sufficient for web-based security. Consider this noncompliant code example, from IDS33-PL. Sanitize untrusted data passed across a trust boundary:

use CGI qw(:standard);
print header;
print start_html('A Simple Example'),
  h1('A Simple Example'),
  start_form,                                  # Line A
  "What's your name? ",textfield('name'),      # Line B
  submit,
  end_form;
if (param()) {
  print "Your name is: ",em(param('name'));    # Line C
}
print end_html;

This example contains a CGI form that prompts the user for a name and, when given one, displays the name on the page. Like all web forms, it also specifies a URL to visit when the user clicks Submit. Lines A, B, and C all involve tainted data being printed to standard output, from which it is used to render a web page.

  • Line A contains the URL to visit. This URL will include all arguments, including the name. The CGI::start_form() method sanitizes the URL in a suitable manner so that it may be visited and will appear in the Address bar of a web browser.
  • Line B contains the user's name. The CGI::textfield() method escapes text in a manner suitable for displaying in a text field.
  • Line C again contains the user's name but with no sanitization. This permits an XSS vulnerability, as described in IDS33-PL. It is recommended that this text be sanitized using the CGI::escapeHTML() method in order to be safely displayed in a web page.

All three lines provide different contexts for their unsanitized data, so each line requires a different type of sanitization. Applying one sanitization method to the wrong line is likely to leave the data improperly sanitized and subject to a potential injection attack.
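
To make the difference between contexts concrete, the sketch below encodes the same input for HTML text and for a URL query component. Both helpers are hypothetical stand-ins written so that the example needs no modules: escape_html approximates what CGI::escapeHTML() is recommended for above, and escape_uri_component assumes its input is a byte string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CGI::escapeHTML(): encode the characters that
# are special in HTML text and attribute values.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                  '"' => '&quot;', "'" => '&#39;');
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Hypothetical helper: percent-encode every byte except the RFC 3986
# unreserved characters, for use inside a URL query component.
# Assumes $text holds bytes (for example, UTF-8 encoded text).
sub escape_uri_component {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

my $name = q{<script>alert(1)</script> & co};
print escape_html($name), "\n";            # suitable for Line C
print escape_uri_component($name), "\n";   # suitable for a query string value
```

Neither substitution changes the taint flag of the string, so taint mode cannot tell which encoding was applied, or whether any was.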

Because taint mode does not distinguish between different contexts, it cannot discern that text sanitized for a URL should not be provided to a text field, and vice versa. Therefore, we do not recommend using taint mode for scripts that interact with the web.

Risk Assessment



