Perl provides a feature called 'taint mode', a simple model for detecting data flow vulnerabilities such as SQL injection. When it is active, every scalar value carries a taint flag, so each value is either 'tainted' or 'untainted'. Taint propagates through expressions: a value computed from tainted data is itself tainted, because taint is associated with a value, not with a particular variable. The Perl interpreter issues a fatal error if a tainted value is used in certain operations, such as invoking the system() function. Finally, there are a few ways to sanitize tainted data, thereby removing the taint.
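As a minimal sketch of this propagation, the following script reports the taint flag of a few values with tainted() from the core Scalar::Util module. The helper name taint_status is hypothetical, and the script is assumed to be started with the -T switch (for example, perl -T script.pl some-argument). Without -T, every value is reported as untainted.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: report whether a value carries the taint flag.
sub taint_status {
    my ($value) = @_;
    return tainted($value) ? 'tainted' : 'untainted';
}

my $literal  = 'hello';                                  # from the program text
my $argument = @ARGV ? $ARGV[0] : 'no argument given';   # command-line data
my $combined = "$literal, $argument";                    # expression mixing both

printf "taint mode: %s\n", ${^TAINT} ? 'on' : 'off';
printf "%-9s %s\n", 'literal:',  taint_status($literal);
printf "%-9s %s\n", 'argument:', taint_status($argument);
printf "%-9s %s\n", 'combined:', taint_status($combined);
```

When the script runs under -T with an argument, the literal should remain untainted while the argument and the combined string are reported as tainted, because the combined string was computed from tainted data.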

The details of how taint mode works are extensively documented in the perlsec manpage, which states:

Taint checking is most useful when although you trust yourself not to have written a program to give away the farm, you don't necessarily trust those who end up using it not to try to trick it into doing something bad.

If taint mode detects tainted data being used in a manner it deems insecure, it aborts the program. This is an improvement in security, because it is better for a program to crash than to reveal sensitive information or allow arbitrary code execution. However, many programs, such as web servers, run in environments where a crash is unacceptable. Taint mode can be configured to emit a warning rather than a fatal error (the -t switch instead of -T), but this is not recommended, because it can permit behavior worse than a crash. Consequently, taint mode is best viewed as a testing tool for verifying code before it is put into production use.
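The abort is raised as an ordinary Perl exception, so it can be observed with eval. The sketch below is illustrative only: the helper name run_echo is hypothetical, the echo command is assumed to live in /bin or /usr/bin, and the check is skipped when the script is not running in taint mode.

```perl
use strict;
use warnings;

# Hypothetical helper: try to pass a value to an external command and
# report whether taint mode allowed it.
sub run_echo {
    my ($value) = @_;
    return 'skipped: taint mode is off' unless ${^TAINT};

    # Taint mode also rejects an untrusted environment, so use a fixed one.
    # Assumption: echo is found in /bin or /usr/bin.
    local %ENV = (PATH => '/bin:/usr/bin');

    # A taint violation dies inside the eval instead of ending the program.
    my $ok = eval { system('echo', $value); 1 };
    return $ok ? 'allowed' : "blocked: $@";
}

print run_echo('a literal string'), "\n";
print run_echo(@ARGV ? $ARGV[0] : 'no argument given'), "\n";
```

Under perl -T with an argument, the first call should be allowed and the second should be blocked with an "Insecure dependency" message. Catching the error this way is useful for demonstration, but a production program still has to decide what to do after the operation was refused.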

Taint mode is an example of a dynamic analysis tool, that is, a tool that provides useful information about a program while it runs. Taint mode has the usual advantages and disadvantages of dynamic analysis tools. It does not produce false positives; it only emits errors when tainted data is used in a manner it considers insecure, and therefore its messages warrant attention and demand a fix. It imposes a minor performance penalty from doing taint checks. Finally, it only checks code that actually runs. If a program is run in taint mode and happens never to execute one particular file, then that file may still contain data flow vulnerabilities.

Taint mode has a very simple model of what data is tainted, when tainted data becomes untainted, and what operations may not be performed on tainted data. This model is sufficient for some programs, but not for others. Taint mode forbids tainted data from being passed to a command interpreter (for example, through system()) or from being used to name a file that is opened for writing or otherwise modified (through open() or rename()). However, taint mode does not prevent tainted data from being used in certain other contexts, some of which are forbidden by various CERT rules:

| Data | Rule |
|---|---|
| Filenames that are open only for reading | FIO01-PL. Do not operate on files that can be modified by untrusted users |
| Numbers that are used as an array index | IDS32-PL. Validate any integer that is used as an array index |
| Strings printed to standard output | IDS33-PL. Sanitize untrusted data passed across a trust boundary |
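Because taint mode does not check these contexts, the program has to validate such data itself. For the array-index case listed above, a minimal sketch might look like the following. The helper name element_at and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the element at an untrusted index, or undef
# if the index is not a non-negative integer inside the array bounds.
sub element_at {
    my ($list, $raw_index) = @_;
    return unless defined $raw_index && $raw_index =~ /\A([0-9]+)\z/;
    my $index = $1;
    return unless $index < @$list;
    return $list->[$index];
}

my @colors = qw(red green blue);
for my $raw (qw(1 7 -1 2x)) {
    my $color = element_at(\@colors, $raw);
    printf "%-3s => %s\n", $raw, defined $color ? $color : '(rejected)';
}
```

Rejecting negative numbers matters in Perl, since a negative index silently counts from the end of the array instead of raising an error.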

Taint mode also provides a handful of mechanisms to produce untainted data from tainted data. The preferred means of sanitizing tainted data is to use a regex:

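my $tainted_data = $ARGV[0];             # example source: command-line data is tainted
my $regex        = qr/[A-Za-z0-9_]+/;    # example policy: data is sanitary if it satisfies this
my $sanitized_data;
if ($tainted_data =~ m{\A($regex)\z}) {  # read $1 only after a successful match
  $sanitized_data = $1;
}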

In this case, the sanitized data may have the same value as the tainted data, but data captured by a regex subpattern is considered to be untainted (unless the use re 'taint' pragma is in effect). It is up to the programmer to ensure that the regex matches only sanitary data, and to use the capture only when the match succeeded.
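One way to package this idiom is a small validation function that returns the captured value on success and nothing on failure. The sketch below is an illustration under stated assumptions: the function name untaint_username and the user-name policy (1 to 32 ASCII letters, digits, underscores, or hyphens) are hypothetical and would need to be replaced by whatever the application actually considers sanitary.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical policy: a user name is 1 to 32 ASCII letters, digits,
# underscores or hyphens. Returns the captured name, or undef on mismatch.
sub untaint_username {
    my ($input) = @_;
    return unless defined $input;
    if ($input =~ /\A([A-Za-z0-9_-]{1,32})\z/) {
        return $1;    # captured text does not carry the taint flag
    }
    return;           # never fall back to a stale $1
}

my $raw  = @ARGV ? $ARGV[0] : 'alice';
my $name = untaint_username($raw);
printf "raw value tainted:   %s\n", tainted($raw)  ? 'yes' : 'no';
printf "clean value:         %s\n", defined $name ? $name : '(rejected)';
printf "clean value tainted: %s\n", tainted($name) ? 'yes' : 'no';
```

Anchoring the pattern with \A and \z makes the whole input subject to the policy, rather than accepting any input that merely contains an acceptable substring.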

There are other ways to remove or bypass taint. For instance, hash keys cannot be tainted, so using tainted data as the key to a hash will sanitize it. Perl also performs no taint check when a subroutine or method is invoked through a variable, as in:

$obj->$method(@args);

or

$foo->(@args);
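The hash-key behavior mentioned above can be observed with a short experiment. In this sketch the helper name launder_via_hash_key is hypothetical, and the taint flags are only meaningful when the script is run with -T. Note that the flag is cleared without any check of the content, so this route provides no validation at all.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: round-trip a value through a hash key.
# The returned string equals the input but does not carry the taint flag.
sub launder_via_hash_key {
    my ($value) = @_;
    my %scratch = ($value => 1);
    my ($key) = keys %scratch;
    return $key;
}

my $raw = @ARGV ? $ARGV[0] : 'no argument given';
my $key = launder_via_hash_key($raw);
printf "raw tainted: %s\n", tainted($raw) ? 'yes' : 'no';
printf "key tainted: %s\n", tainted($key) ? 'yes' : 'no';
```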

The specific issue of what data is tainted depends on the execution environment. For example, data read from a database may or may not be considered tainted. Perl's DBI module provides an optional TaintOut attribute. If set, then any data retrieved from a database will be considered tainted.
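A hedged sketch of how the TaintOut attribute might be passed to DBI->connect follows. The helper name connect_with_tainted_results and the data source name in the comment are hypothetical placeholders. According to the DBI documentation, the attribute only has an effect when Perl is running in taint mode.

```perl
use strict;
use warnings;

# Connection attributes. TaintOut asks DBI to mark fetched data as tainted.
my %connect_attrs = (
    RaiseError => 1,
    TaintOut   => 1,
);

# Hypothetical helper. DBI is not a core module, so it is loaded lazily
# here to let the sketch compile on systems where DBI is not installed.
sub connect_with_tainted_results {
    my ($dsn, $user, $password) = @_;
    require DBI;
    return DBI->connect($dsn, $user, $password, { %connect_attrs });
}

# Example call with placeholder values (not real credentials):
# my $dbh = connect_with_tainted_results('dbi:SQLite:dbname=example.db', '', '');
```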

Likewise, the specific set of actions that should not be performed on tainted data depends on the execution environment. For instance, a CGI script will print out data to be displayed on a web page. Such data requires sanitization to prevent various web-based vulnerabilities, but taint mode does not prevent tainted data from being printed.

Consequently, taint mode may be used for certain Perl scripts, and is required for some: Perl enables taint mode automatically for scripts that run with the setuid or setgid bit set. It should be used during testing and quality assurance. It will not detect all potential data flow vulnerabilities, so it is critical to know when taint mode can be relied upon and when it cannot. The following is an example of a vulnerable program that cannot rely on taint mode.

Noncompliant Code Example (CGI)

Taint mode assumes a simple model: all data is either tainted or untainted. While this is useful for some applications, it is not sufficient for web-based security. Consider this noncompliant code example, from IDS33-PL. Sanitize untrusted data passed across a trust boundary:

use CGI qw(:standard);
 
print header;
print start_html('A Simple Example'),
  h1('A Simple Example'),
  start_form,                                  # Line A
  "What's your name? ",textfield('name'),      # Line B
  submit,
  end_form,
  hr;
 
if (param()) {
  print "Your name is: ",em(param('name')),    # Line C
    hr;
}
print end_html;

This example contains a CGI form that prompts the user for a name, and when given, displays the name on the page. Like all web forms, it also takes a URL argument indicating what link to visit when the user clicks Submit. Lines A, B and C all involve tainted data being printed to standard output, from which it is used to render a webpage.

All three lines provide different contexts for their unsanitized data, so each line requires a different type of sanitization. Applying a sanitization method to the wrong line is likely to leave the data improperly sanitized and subject to a potential injection attack.
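To illustrate why the context matters, the sketch below encodes the same string for two contexts: HTML text and a URL component. Both helper names (escape_html and escape_uri_component) are hypothetical and written out by hand only to keep the example self-contained. A real application would more likely rely on a maintained library.

```perl
use strict;
use warnings;

# Hypothetical helper: encode text for use inside HTML element content
# or a quoted attribute value.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',  '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Hypothetical helper: percent-encode text for use as one URL component.
sub escape_uri_component {
    my ($text) = @_;
    utf8::encode($text);    # work on UTF-8 bytes, not characters
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

my $name = q{<script>alert("x")</script> & co};
print escape_html($name), "\n";            # suitable for HTML text
print escape_uri_component($name), "\n";   # suitable for a URL component
```

The two results differ, and neither is safe in the other's context. Taint mode has no way to record which encoding, if any, a value has received.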

Because taint mode does not distinguish between different contexts, it cannot discern that text sanitized for a URL should not be provided to a text field and vice versa. Therefore, we do not recommend using taint mode for scripts that interact with the web.

Bibliography

Birznieks, Gunther, "CGI/Perl Taint Mode FAQ Version 1.0", June 3, 1998
[CPAN] Bunce, Tim, DBI
[CPAN] Stosberg, Mark, CGI
Lester, Andy, "Perl's taint mode to the rescue", O'Reilly ONLamp.com, Friday November 17, 2006 1:51PM
Schwartz, Randal L, "Taint checking made simple", Unix Review Column 33 (Aug 2000), Stonehenge, the Perl Review
[Wall 2011] perlsec
StackOverflow, "Is Perl's taint mode useful?", Feb 9 2010 10:56