Be careful when extracting entries from java.util.zip.ZipInputStream. Two issues in particular must be avoided: entry file names that canonicalize to a path outside the target directory of the extraction, and entries that consume excessive system resources. In the former case, an attacker can write arbitrary data from the zip file into any directory accessible to the user. In the latter case, denial of service can occur when resource usage is disproportionately large compared to the input data that causes it. The nature of the zip algorithm permits zip bombs: small files, such as ZIPs, GIFs, and gzip-encoded HTTP content, that consume excessive resources when uncompressed because of extreme compression.
The zip algorithm can produce very large compression ratios [Mahmoud 2002]. For example, a file consisting of alternating lines of 'a' characters and 'b' characters can achieve a compression ratio of more than 200 to 1. Even higher ratios are easily obtained by using input data targeted at the compression algorithm, by using more (untargeted) input data, or by using other compression methods.
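To get a feel for these ratios, the following sketch compresses alternating lines of 'a' and 'b' characters with java.util.zip.Deflater, which implements the same DEFLATE algorithm that zip archives use for compressed entries. The class name, line length, and line count are illustrative assumptions; the exact ratio depends on those parameters and on the compression level.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Measures how well DEFLATE compresses alternating lines of 'a' and 'b' characters. */
public final class CompressionRatioDemo {

    /** Builds lineCount lines that alternate between a run of 'a' and a run of 'b'. */
    static byte[] alternatingLines(int lineCount, int lineLength) {
        StringBuilder sb = new StringBuilder(lineCount * (lineLength + 1));
        for (int i = 0; i < lineCount; i++) {
            char c = (i % 2 == 0) ? 'a' : 'b';
            for (int j = 0; j < lineLength; j++) {
                sb.append(c);
            }
            sb.append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns the number of bytes DEFLATE produces for input at the best compression level. */
    static long deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[8192];
        long total = 0;
        // Drain the compressor; the buffer contents are discarded, only the count matters.
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Uncompressed size divided by compressed size. */
    static double compressionRatio(byte[] input) {
        return (double) input.length / deflatedSize(input);
    }
}
```

For instance, `compressionRatio(alternatingLines(200_000, 15))` compresses 3.2 MB of input. Short repeat periods tend to compress especially well, because DEFLATE can encode each 258-byte back-reference to the previous period in only a few bits.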
Any entry whose target file is not within the directory intended by the client program (after file name canonicalization, as per IDS02-J. Canonicalize path names before validating them) must not be extracted, or must be extracted to a safe location. Any entry in a zip file whose uncompressed size exceeds a certain limit must not be uncompressed. The actual limit depends on the capabilities of the platform.
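A minimal sketch of the name check, assuming each entry name is resolved against the extraction directory and canonicalized with File.getCanonicalFile() before comparison. The names EntryNameValidator and validateEntryName are hypothetical. The comparison uses Path.startsWith, which matches whole path elements; a plain String.startsWith would wrongly accept a sibling directory such as /tmp/out-evil as being inside /tmp/out.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper: rejects zip entry names that escape the extraction directory. */
public final class EntryNameValidator {

    /**
     * Returns the canonical file that entryName would be written to inside intendedDir,
     * or throws IllegalStateException if that file lies outside intendedDir.
     */
    static File validateEntryName(String entryName, File intendedDir) throws IOException {
        File canonicalDir = intendedDir.getCanonicalFile();
        // Resolve against the target directory, then canonicalize: this removes "." and
        // ".." segments and resolves symbolic links for the path components that exist.
        File canonicalTarget = new File(canonicalDir, entryName).getCanonicalFile();
        // Path.startsWith compares name elements, not characters, so "/tmp/out-evil"
        // is not considered to be inside "/tmp/out".
        if (!canonicalTarget.toPath().startsWith(canonicalDir.toPath())
                || canonicalTarget.equals(canonicalDir)) {
            throw new IllegalStateException("Entry is outside the extraction directory: " + entryName);
        }
        return canonicalTarget;
    }
}
```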
Noncompliant Code Example
This noncompliant code fails to validate the name of the file being unzipped and passes it directly to the FileOutputStream constructor. It also fails to check the resource consumption of the file being unzipped, allowing the operation to run to completion or until local resources are exhausted.
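The following is a hypothetical sketch of the pattern just described, not the original example code. It differs in one assumption: entry names are resolved against a destDir parameter instead of the current working directory, which makes the path traversal easy to observe. Both flaws are marked in comments; directory entries and missing parent directories are deliberately not handled.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** NONCOMPLIANT sketch: illustrates the two flaws described above. Do not use. */
public final class NaiveUnzip {
    static final int BUFFER = 512;

    static void unzip(InputStream in, File destDir) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(in)) {
            byte[] data = new byte[BUFFER];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Flaw 1: the entry name is used unchecked, so a name such as
                // "../escaped.txt" places the file outside destDir.
                File target = new File(destDir, entry.getName());
                try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
                    int count;
                    // Flaw 2: no limit on the bytes written; a zip bomb runs until it
                    // finishes or local resources (such as disk space) are exhausted.
                    while ((count = zis.read(data, 0, BUFFER)) != -1) {
                        out.write(data, 0, count);
                    }
                }
                zis.closeEntry();
            }
        }
    }
}
```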
Compliant Solution
In this compliant solution, the code validates the name of each entry before extracting it. If a name is invalid, the entire extraction is aborted. Alternatively, a compliant solution could skip only that entry and continue extracting, or it could extract the entry to some safe location.
Furthermore, the code inside the while loop tracks the uncompressed size of each entry while extracting it, and throws an exception if the entry being extracted is too large (about 100 MB in this case). It does not use the ZipEntry.getSize() method, because the value it reports is not reliable: it comes from the archive's own metadata, which an attacker controls, and it is -1 when the size is not known.
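A compliant sketch combining both checks might look like the following. The names SafeUnzip and validate and the maxEntryBytes parameter are assumptions made for this illustration; TOOBIG is set to 100 MiB to match the limit mentioned above. As the text describes, the first invalid entry aborts the whole extraction, and the size check counts the bytes actually decompressed rather than trusting getSize().

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Sketch: extracts a zip stream into destDir, rejecting path traversal and oversized entries. */
public final class SafeUnzip {
    static final int BUFFER = 512;
    /** Hypothetical per-entry limit on uncompressed bytes: 100 MiB. */
    static final long TOOBIG = 100L * 1024 * 1024;

    static void unzip(InputStream in, File destDir, long maxEntryBytes) throws IOException {
        File canonicalDir = destDir.getCanonicalFile();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            byte[] data = new byte[BUFFER];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Throws IllegalStateException, which aborts the entire extraction.
                File target = validate(entry.getName(), canonicalDir);
                if (entry.isDirectory()) {
                    target.mkdirs();
                } else {
                    target.getParentFile().mkdirs();
                    long total = 0; // bytes actually decompressed for this entry
                    try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
                        int count;
                        // entry.getSize() is not consulted: it comes from the archive's
                        // metadata and is -1 when the size is not known in advance.
                        while ((count = zis.read(data, 0, BUFFER)) != -1) {
                            total += count;
                            if (total > maxEntryBytes) {
                                throw new IllegalStateException("Entry is too large: " + entry.getName());
                            }
                            out.write(data, 0, count);
                        }
                    }
                }
                zis.closeEntry();
            }
        }
    }

    /** Returns the canonical target of name, or throws if it lies outside canonicalDir. */
    private static File validate(String name, File canonicalDir) throws IOException {
        File target = new File(canonicalDir, name).getCanonicalFile();
        if (!target.toPath().startsWith(canonicalDir.toPath()) || target.equals(canonicalDir)) {
            throw new IllegalStateException("Entry is outside the extraction directory: " + name);
        }
        return target;
    }
}
```

Calling `SafeUnzip.unzip(in, dir, SafeUnzip.TOOBIG)` applies the 100 MiB limit; a smaller maxEntryBytes is convenient for testing. When the size limit is hit, the partially written file is left on disk; a production version might delete it.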
Related Guidelines
CWE-409, Improper handling of highly compressed data (data amplification)
Guideline 2-5, Check that inputs do not cause excessive resource consumption