ZIP archives store compressed files including their metadata (filesize, date/time, ...). When a contained file is password protected, the compressed data is encrypted, but the metadata is not.
As an example, take this ZIP file that I created. It contains a single file (mimikatz.exe), and that file is protected with a password (infected):
Although the file is password protected, only the compressed file content is encrypted (see screenshot: Encrypted +). The filename, the file size, the file date, ...: none of that metadata is encrypted. It can all be read without knowing the password.
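As a minimal sketch of this, Python's standard zipfile module lists the metadata without ever asking for a password. The helper name list_metadata and the demo archive are my own illustration. The demo archive is not encrypted, because the standard library cannot write encrypted ZIP files, but the same fields are readable for an encrypted member (the "encrypted" value is then True).

```python
import io
import zipfile


def list_metadata(zip_source):
    """Return one dict of metadata per archive member; no password is needed."""
    rows = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            rows.append({
                "filename": info.filename,
                "size": info.file_size,
                "compressed_size": info.compress_size,
                "date_time": info.date_time,
                "crc32": f"{info.CRC:08x}",
                # Bit 0 of the general purpose flags marks an encrypted member.
                "encrypted": bool(info.flag_bits & 0x1),
            })
    return rows


# Hypothetical demo archive, built in memory and NOT encrypted.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.txt", b"hello zip metadata")

for row in list_metadata(io.BytesIO(demo.getvalue())):
    print(row)
```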
I was involved in a forum discussion where the OP shared a password-protected ZIP archive of a file that the OP considered suspicious. For whatever reason, the OP wanted our opinion about the file before giving us the opportunity to look at it (the OP would share the password with us later). Using the CRC32 checksum, I could make an educated guess about the file content.
Let me explain.
My tool zipdump.py can be used to analyze ZIP files using Python modules zipfile and pyzipper. But it can also parse the binary structure of a ZIP file, and extract all the relevant metadata in its raw form. I do this with option -f l (find list):
First we see a PKZIP file record (named PK0304 by zipdump), then a PKZIP directory entry record (PK0102) and finally, a PKZIP end-of-directory record (PK0506).
All the metadata is in cleartext.
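To give an idea of what parsing the binary structure means, here is a rough sketch that looks for the three record signatures and decodes the fixed part of a file record. This is not how zipdump.py is implemented; the function names and the demo archive are my own assumptions, and a naive signature scan like this one can report false positives when the same bytes happen to occur inside compressed data.

```python
import io
import struct
import zipfile

# Record signatures, labeled with the names zipdump uses.
SIGNATURES = {
    b"PK\x03\x04": "PK0304",  # file record
    b"PK\x01\x02": "PK0102",  # directory entry record
    b"PK\x05\x06": "PK0506",  # end-of-directory record
}


def find_records(data):
    """Return sorted (offset, label) pairs for every signature found."""
    found = []
    for signature, label in SIGNATURES.items():
        pos = data.find(signature)
        while pos != -1:
            found.append((pos, label))
            pos = data.find(signature, pos + 1)
    return sorted(found)


def parse_file_record(data, offset):
    """Decode the fixed 30-byte header of a PK0304 record and its filename."""
    (_sig, _version, flags, method, _mtime, _mdate, crc,
     compressed_size, size, name_len, _extra_len) = struct.unpack_from(
        "<4sHHHHHIIIHH", data, offset)
    name = data[offset + 30:offset + 30 + name_len].decode("cp437")
    return {
        "filename": name,
        "encrypted": bool(flags & 0x1),
        "method": method,
        "crc32": f"{crc:08x}",
        "compressed_size": compressed_size,
        "size": size,
    }


# Hypothetical demo archive: one stored (uncompressed) member, fixed timestamp.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(zipfile.ZipInfo("sample.bin", date_time=(2024, 1, 1, 0, 0, 0)),
                b"example payload")
demo_bytes = buffer.getvalue()

for offset, label in find_records(demo_bytes):
    print(f"{offset:#06x} {label}")
print(parse_file_record(demo_bytes, 0))
```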
With the filename and the CRC32 checksum, I can make an educated guess about the file content. I download mimikatz.exe from GitHub, and I calculate its CRC32 checksum with hash.py:
The CRC32 checksum of the file inside the archive and that of the file I downloaded are the same. This is a weak indication that the files are the same.
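The comparison itself can be sketched in a few lines of Python. This is a stand-in for illustration, not hash.py; the function names are mine, and the demo uses a small in-memory archive instead of the real files.

```python
import io
import zipfile
import zlib


def crc32_of_stream(stream, chunk_size=65536):
    """Compute the CRC32 of a binary stream without loading it all at once."""
    crc = 0
    while chunk := stream.read(chunk_size):
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def matches_archive_entry(zip_source, member_name, candidate_stream):
    """Compare the CRC32 stored in the ZIP metadata with a candidate file."""
    with zipfile.ZipFile(zip_source) as zf:
        stored_crc = zf.getinfo(member_name).CRC  # readable without password
    return stored_crc == crc32_of_stream(candidate_stream)


# Hypothetical demo: the archive member and the candidate have the same content.
content = b"pretend this is the downloaded executable"
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("tool.exe", content)

print(matches_archive_entry(io.BytesIO(archive.getvalue()), "tool.exe",
                            io.BytesIO(content)))
```

A match only tells you that the checksums agree, which is the weak indication described above.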
CRC32 is an error detection checksum; it is not a cryptographic hash. It's only 32 bits long, and it is easy to craft a file that produces a desired CRC32 checksum. A matching checksum is certainly not strong evidence.
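To illustrate how easy crafting is, here is one possible approach (a sketch of my own, not a reference to any particular tool): the CRC32 of "data + 4 extra bytes" depends on those 4 bytes in a linear way over GF(2), so the right 4 bytes for any desired checksum can be found by solving a small system of XOR equations. The function name forge_crc32 is hypothetical.

```python
import zlib


def forge_crc32(data, target):
    """Append 4 bytes to data so that the CRC32 of the result equals target."""
    base = zlib.crc32(data + b"\x00" * 4)

    # Effect on the CRC32 of flipping each single bit of the 4-byte suffix.
    pivots = {}  # highest set bit -> (crc difference, suffix bits that cause it)
    for bit in range(32):
        suffix = (1 << bit).to_bytes(4, "little")
        vec = zlib.crc32(data + suffix) ^ base
        combo = 1 << bit
        # Gaussian elimination over GF(2): reduce by the pivots found so far.
        while vec:
            high = vec.bit_length() - 1
            if high not in pivots:
                pivots[high] = (vec, combo)
                break
            pivot_vec, pivot_combo = pivots[high]
            vec ^= pivot_vec
            combo ^= pivot_combo

    # Express the needed CRC difference as a combination of suffix bits.
    need = target ^ base
    solution = 0
    while need:
        pivot_vec, pivot_combo = pivots[need.bit_length() - 1]
        need ^= pivot_vec
        solution ^= pivot_combo

    return data + solution.to_bytes(4, "little")


forged = forge_crc32(b"completely different content", 0xDEADBEEF)
print(f"{zlib.crc32(forged):08x}")
```

The forged data is 4 bytes longer than the original input, so the file size in the metadata would still have to be plausible for the forgery to go unnoticed.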
The OP was surprised that metadata was not encrypted, so I was pretty sure that the crc32 had not been tampered with.
My trick worked because I had a good idea of what file was inside the archive. Without that information, it would have been impossible, because there are countless files with that CRC32 checksum.
I think that this CRC32 checksum is also used by Gmail to detect malicious files inside password-protected ZIP files.
If you need to create archive files where the metadata is also encrypted, you need to use another format, like 7z (7-Zip) with header encryption enabled, for example. Or double-ZIP your files.
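The nesting step of double-zipping can be sketched as follows. Note the big assumption: Python's standard zipfile module cannot write encrypted archives, so this sketch only builds the inner and outer archives and leaves out the password protection of the outer one, which you would apply with the tool of your choice. The function name double_zip and the name inner.zip are my own.

```python
import io
import zipfile


def double_zip(files, inner_name="inner.zip"):
    """Put files (name -> bytes) in an inner ZIP, then wrap it in an outer ZIP."""
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)

    # NOTE: the outer archive is NOT encrypted here (stdlib limitation).
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(inner_name, inner.getvalue())
    return outer.getvalue()


outer_bytes = double_zip({"report.txt": b"confidential", "notes.txt": b"more"})
with zipfile.ZipFile(io.BytesIO(outer_bytes)) as zf:
    print(zf.namelist())  # only the inner archive is listed
```

Once the outer archive is password protected, its cleartext metadata only describes the inner archive (its name, size and CRC32), while the names and checksums of the real files are part of the encrypted content.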
Didier Stevens
Senior handler
Microsoft MVP
blog.DidierStevens.com