MalDocA is a library to parse and extract features from Microsoft Office documents. It supports both OLE and OOXML documents.
The project's goal is to analyze potentially malicious documents to improve user safety and security.
Some testdata files contain malicious code! As a safety measure, we XOR-encode some of these files (key = 0x42). These files are additionally prefixed with "MALICIOUS_" and suffixed with "_xor_0x42_encoded". In general, be very careful when opening or processing test files!
For convenience, we provide a Python script ("testdata_encode.py") to encode and decode those files. The script writes its output next to the input file, with "_xored" appended to the file name. Keep in mind that XOR encoding is its own inverse: encoding a file twice decodes it again, i.e. restores the original file.
Example usage: python testdata_encode.py maldoca/service/testdata/c98661bcd5bd2e5df06d3432890e7a2e8d6a3edcb5f89f6aaa2e5c79d4619f3d.docx
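The XOR encoding described above can be illustrated with a minimal sketch. This is not the code of "testdata_encode.py"; the helper name `xor_bytes` and the sample bytes are assumptions made purely for illustration. Only the key 0x42 comes from this README. The sketch works on in-memory bytes and shows why applying the encoding twice restores the original data.

```python
# Illustrative sketch only; not the actual testdata_encode.py implementation.
XOR_KEY = 0x42  # single-byte key stated in this README


def xor_bytes(data: bytes, key: int = XOR_KEY) -> bytes:
    """Return a copy of data with every byte XORed with a single-byte key."""
    return bytes(b ^ key for b in data)


# Hypothetical sample input standing in for a file's contents.
sample = b"PK\x03\x04 example payload"

encoded = xor_bytes(sample)   # no longer recognizable as the original bytes
decoded = xor_bytes(encoded)  # XOR with the same key undoes the encoding

assert encoded != sample
assert decoded == sample
```

Because (b ^ key) ^ key == b for every byte, one function serves as both encoder and decoder, which is why a single script can handle both directions.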
- Bazel has some Windows-related problems, e.g. maximum path length limitations. Make sure to read the best practices to avoid them.
- Enable symlink support (how-to) as it is required by Bazel.
git clone --recurse-submodules https://github.com/google/maldoca.git
cd maldoca
Linux: bazel build --config=linux //maldoca/...
Windows: bazel build --config=windows //maldoca/...
Linux: bazel test --config=linux //maldoca/...
Windows: bazel test --config=windows //maldoca/...
We provide a Dockerfile in "docker/Dockerfile". This is the reference platform we use for continuous integration. Using it for development is optional but arguably recommended. Please check the documentation in "docker/Dockerfile" on how to build it and use it for development.
This is not an official Google product.