IDS04-J. Safely extract files from ZipInputStream

Be careful when extracting entries from java.util.zip.ZipInputStream. Two particular issues to avoid are entry file names that canonicalize to a path outside of the target directory of the extraction and entries that cause consumption of excessive system resources. In the former case, an attacker can write arbitrary data from the zip file into any directories accessible to the user. In the latter case, denial of service can occur when resource usage is disproportionately large in comparison to the input data that causes the resource usage. The nature of the zip algorithm permits the existence of zip bombs in which a small file, such as ZIPs, GIFs, and gzip-encoded HTTP content, consumes excessive resources when uncompressed because of extreme compression.

The zip algorithm can produce very large compression ratios [Mahmoud 2002]. For example, a file consisting of alternating lines of a characters and b characters can achieve a compression ratio of more than 200 to 1. Even higher compression ratios can be easily obtained using input data that is targeted to the compression algorithm, or using more input data (that is untargeted), or using other compression methods.

Any entry targeting a file not within the directory intended by the client program (after file name canonicalization, as per IDS02-J. Canonicalize path names before validating them), must not be extracted or must be extracted to a safe location. Any entry in a zip file whose uncompressed file size is beyond a certain limit must not be uncompressed. The actual limit is dependent on the capabilities of the platform.

Noncompliant Code Example

This noncompliant code fails to validate the name of the file that is being unzipped. It passes the name directly to the constructor of FileOutputStream. It also fails to check the resource consumption of the file that is being unzipped. It permits the operation to run to completion or until local resources are exhausted.

static final int BUFFER = 512;
// ...

public final void unzip(String filename) throws java.io.IOException{
  FileInputStream fis = new FileInputStream(filename);
  ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
  ZipEntry entry;
  try {
    while ((entry = zis.getNextEntry()) != null) {
      System.out.println("Extracting: " + entry);
      int count;
      byte data[] = new byte[BUFFER];
      // Write the files to the disk
      FileOutputStream fos = new FileOutputStream(entry.getName());
      BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER);
      while ((count = zis.read(data, 0, BUFFER)) != -1) {
        dest.write(data, 0, count);
      }
      dest.flush();
      dest.close();
      zis.closeEntry();
    }
  } finally {
    zis.close();
  }
}

Compliant Solution

In this compliant solution, the code validates the name of each entry before extracting the entry. If the name is invalid, the entire extraction is aborted. However, a compliant solution could also skip only that entry and continue the extraction process, or it could even extract the entry to some safe location.

Furthermore, the code inside the while loop tracks the uncompressed file size of each entry in a zip archive while extracting the entry. It throws an exception if the entry being extracted is too large—about 100MB in this case. We do not use the ZipEntry.getSize() method because the value it reports is not reliable. Finally, the code also counts the number of file entries in the archive, and throws an exception if there are more than 1024 entries.

static final int BUFFER = 512;
static final int TOOBIG = 0x6400000; // max size of unzipped data, 100MB
static final int TOOMANY = 1024;     // max number of files
// ...

private String validateFilename(String filename, String intendedDir) {
  File f = new File(filename);
  String canonicalPath = f.getCanonicalPath(); 

  File iD = new File(intendedDir);
  String canonicalID = iD.getCanonicalPath();
  
  if (canonicalPath.startsWith(canonicalID)) {
    return canonicalPath;
  } else {
    throw new IllegalStateException("File is outside extraction target directory.");
  }
}

public final void unzip(String filename) throws java.io.IOException{
  FileInputStream fis = new FileInputStream(filename);
  ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
  ZipEntry entry;
  int entries = 0;
  int total = 0;
  try {
    while ((entry = zis.getNextEntry()) != null) {
      System.out.println("Extracting: " + entry);
      int count;
      byte data[] = new byte[BUFFER];
      // Write the files to the disk, but ensure that the filename is valid,
      // and that the file is not insanely big
      String name = validateFilename(entry.getName(), ".");
      FileOutputStream fos = new FileOutputStream(name);
      BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER);
      while (total <= TOOBIG && (count = zis.read(data, 0, BUFFER)) != -1) {
        dest.write(data, 0, count);
        total += count;
      }
      dest.flush();
      dest.close();
      zis.closeEntry();
      entries++;
      if (entries > TOOMANY) {
        throw new IllegalStateException("Too many files to unzip.");
      }
      if (total > TOOBIG) {
        throw new IllegalStateException("File being unzipped is too big.");
      }
    }
  } finally {
    zis.close();
  }
}

Risk Assessment

Rule	Severity	Likelihood	Remediation Cost	Priority	Level
IDS04-J	low	probable	high	P2	L3

Related Guidelines

MITRE CWE	CWE-409, Improper handling of highly compressed data (data amplification)
Secure Coding Guidelines for the Java Programming Language, Version 3.0	Guideline 2-5, Check that inputs do not cause excessive resource consumption

Android Implementation Details

Although not directly a violation of this rule, the Android Master Key vulnerability (insecure use of ZipEntry) is related to this rule. Another attack vector found by a Chinese researcher is also related to this rule.

Bibliography

[Mahmoud 2002]

Compressing and Decompressing Data Using Java APIs

IDS05-J. Use a subset of ASCII for file and path names

Space shortcuts

Page tree