Inspired by VXCage and VirusTotal, MalwareDB is a malware knowledge management system which handles the bookkeeping for malware and goodware samples: hashes, origination, similarity, file types, and more. It is intended to help malware and cybersecurity researchers, forensic investigators, and others who need to handle malware or other files of potentially unknown origin. The project is very much a work in progress and alpha-quality at present.
- Store malware, goodware, or unknown file samples.
- Categorize samples by:
  - Labels, a hierarchical taxonomy (not yet implemented)
  - Origin, the source of the sample
- Permissions by group: access to files is based on users' group membership
- Fetch samples by hash
- Search based on file similarity (requires the Postgres plugins mentioned below)
- Parse the files for features which may be useful for machine learning models
- Works on any modern operating system
- Allow encrypting the files on disk so the server does not cause problems with endpoint security or anti-virus software
- Supports the CaRT format using the default key.
- Postgres database server
- Rust to compile
- libmagic, the library behind the `file` command. Install `libmagic-dev` on Linux, or `brew install libmagic` on macOS with Homebrew.
  - On Windows: `cargo install cargo-vcpkg; vcpkg install libmagic; vcpkg integrate install`
  - The `MAGIC` environment variable may be used to specify the paths for the libmagic database.
- Similarity hash extensions for Postgres:
- Alternatively, use docker which provides a container with the Postgres extensions already installed (though they still have to be activated, see the readme).
This project is in active development and not yet stable, nor are all the features implemented.
Install from source. Check out the repository and build (recommended), or build from crates.io:
- `cargo install malwaredb-client`
- `cargo install malwaredb --features=admin,admin-gui,sqlite,vt` (activates all the features, requires some external dependencies)
Server Features (which are all opt-in):
- `admin`: command-line administrative functionality.
- `admin-gui`: Slint-powered GUI, tested and working on macOS, Linux, and Windows; it might work elsewhere.
- `sqlite`: Allow the use of SQLite as a database backend. It should only be used for testing and evaluation, as it lacks the similarity optimisations we have for Postgres.
- `vt`: Allow the VirusTotal functionality (caching AV data for contained samples). Compiling the feature in does not turn it on; it must still be enabled.
- Planned features:
  - Web interface as a separate application
  - GUI applications
  - Support for Confidential Computing
    - Initially for Enarx: Website, Code
    - Learn more at the Confidential Computing Consortium website.
  - Encrypting samples, if stored, so that anti-virus software on the host system doesn't trigger alerts and samples cannot cause accidental infection.
  - Train ML models based on features of the malicious and benign files
- Potential features:
  - File storage backends for HDFS, S3, others?
- Something missing? Get in touch: file an issue or start a discussion!
- Compile from source, ideally with `--features=admin,sqlite`.
- Create your configuration file. Compile with the `sqlite` feature to use SQLite; this is intended for testing and evaluation rather than for a real environment. See the example file in the root of the repository.
  - If the storage section is empty (it is optional), MalwareDB stores only the metadata about the files and not the samples themselves, so the original files cannot be retrieved.
  - Place the config file in `/etc/mdb_server/mdb_config.toml` on Linux, or `/usr/local/etc/mdb_server/mdb_config.toml` on FreeBSD, for automatic config file detection. Otherwise, run with `mdb_server run load /path/to/file`, or use `mdb_server run config` to specify arguments on the command line. Run with `--help` to see details.
- Since you compiled with the `admin` feature above, you can run `mdb_server admin --help` to see administrative options. Admin options require `-c /path/to/config.toml` to prevent accidental changes. Note: the `admin` command interacts with the database directly, so the server does not need to be running.
- List users with `mdb_server admin -c /path/to/config.toml list users`. There is a default admin user, but no password is set, so let's set one.
- Reset the admin user's password with `mdb_server admin -c /path/to/config.toml reset-password --uname admin`. You'll be prompted for the password, and it won't echo. The admin user doesn't do anything special at the moment, but that will change.
- Files are organized by sources, and groups have access to sources, so groups and sources must be added and linked before files can be added.
  - Create a source; look at the command-line options: `mdb_server admin -c /path/to/config.toml create source --help`
  - Create a group; look at the command-line options: `mdb_server admin -c /path/to/config.toml create group --help`
  - Add the group to the source; look at the command-line options: `mdb_server admin -c /path/to/config.toml add-group-to-source --help`
  - Add the user to the group; look at the command-line options: `mdb_server admin -c /path/to/config.toml add-user-to-group --help`
- Now, while `mdb_server` is running, use the client to log in: `mdb_client login http://localhost:8080 admin`, replacing the URL with the actual IP and port you chose in the server configuration file.
- Test that the client works with `mdb_client whoami`; it should show the user information and the available groups and sources.
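The automatic config file detection mentioned in the setup steps can be pictured with a minimal Rust sketch. This is not MalwareDB's code: the function names `default_config_path` and `resolve_config` are hypothetical, the rule that an explicitly given path takes precedence is an assumption, and only the two default locations named above (Linux and FreeBSD) are covered.

```rust
use std::path::{Path, PathBuf};

/// Default config location for an OS name such as the value of
/// `std::env::consts::OS`. Only the two paths named in this README are
/// known; every other OS yields `None` in this sketch.
pub fn default_config_path(os: &str) -> Option<PathBuf> {
    match os {
        "linux" => Some(PathBuf::from("/etc/mdb_server/mdb_config.toml")),
        "freebsd" => Some(PathBuf::from("/usr/local/etc/mdb_server/mdb_config.toml")),
        _ => None,
    }
}

/// Choose the config file to load.
/// Assumption: an explicitly given path wins; otherwise the OS default is
/// used, but only when that file actually exists on disk.
pub fn resolve_config(explicit: Option<&Path>, os: &str) -> Option<PathBuf> {
    if let Some(path) = explicit {
        return Some(path.to_path_buf());
    }
    default_config_path(os).filter(|path| path.exists())
}
```

Calling `resolve_config(None, std::env::consts::OS)` would return the default path only when the file is present, which mirrors the idea of falling back to `mdb_server run load /path/to/file` when nothing is found.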
- Files may be uploaded using the client: `mdb_client submit-samples -s SOURCE_ID /path/to/files_or_dirs`. Paths may point to files or directories, and more than one path may be specified. All items are uploaded to the same source (specified by the ID). If a file is a Zip archive, it is decompressed in memory and each contained file is submitted individually, unless it is a known document type (like MS Office .docx, .xlsx, etc.).
- Files may also be uploaded using the admin command from the server: `mdb_server admin -c /path/to/config.toml -s SOURCE_ID -u USER_ID /path/to/files_or_dirs`. With the server admin function, a user ID must also be provided. Otherwise, this works the same way as the client: directories and files may be provided, they are associated with the same source, and Zip files are decompressed in memory and their contents submitted individually if they are not a known MS Office format.
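The Zip handling rule described above can be illustrated with a small Rust sketch. It is only an approximation under stated assumptions: MalwareDB may well identify document types from file contents rather than names, whereas this sketch checks the file extension, and the list `DOCUMENT_EXTENSIONS` and the function names `looks_like_zip` and `should_unpack` are hypothetical.

```rust
/// Extensions treated as Zip-based documents in this sketch.
/// Assumption: the real list of known document types may differ.
const DOCUMENT_EXTENSIONS: [&str; 3] = ["docx", "xlsx", "pptx"];

/// True when the data starts with the Zip local file header signature.
pub fn looks_like_zip(bytes: &[u8]) -> bool {
    bytes.starts_with(b"PK\x03\x04")
}

/// Decide whether an upload should be unpacked and its members submitted
/// individually: it must be a Zip archive and must not look like a
/// Zip-based office document.
pub fn should_unpack(file_name: &str, bytes: &[u8]) -> bool {
    if !looks_like_zip(bytes) {
        return false;
    }
    // Compare the extension case-insensitively; no extension means "unpack".
    match file_name.rsplit_once('.') {
        Some((_, ext)) => {
            let ext = ext.to_ascii_lowercase();
            !DOCUMENT_EXTENSIONS.iter().any(|doc| *doc == ext.as_str())
        }
        None => true,
    }
}
```

With these definitions, a file named `samples.zip` that starts with the Zip signature would be unpacked, while `report.docx` would be submitted as a single sample.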
- Using the client, a sample may be retrieved by its hash: `mdb_client retrieve-sample SPECIFY_HASH_HERE`. Hash types are detected by length; the supported hashes are MD5, SHA1, SHA256, SHA384, and SHA512. Each request takes one hash, and the sample is downloaded if it exists and the user has access to the group and source to which the sample is linked.
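Detecting the hash type by length works because each supported algorithm produces a digest of a different size. The following Rust sketch shows the idea for hex-encoded input; `HashKind` and `detect_hash_kind` are hypothetical names for illustration and are not taken from MalwareDB's source.

```rust
/// Hash algorithms listed as supported for sample retrieval.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HashKind {
    Md5,
    Sha1,
    Sha256,
    Sha384,
    Sha512,
}

/// Guess the algorithm from the length of a hex-encoded digest.
/// Returns `None` for non-hex input or an unrecognised length.
pub fn detect_hash_kind(hex: &str) -> Option<HashKind> {
    let hex = hex.trim();
    if hex.is_empty() || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // Every digest byte is written as two hex characters.
    match hex.len() {
        32 => Some(HashKind::Md5),     // 128-bit digest
        40 => Some(HashKind::Sha1),    // 160-bit digest
        64 => Some(HashKind::Sha256),  // 256-bit digest
        96 => Some(HashKind::Sha384),  // 384-bit digest
        128 => Some(HashKind::Sha512), // 512-bit digest
        _ => None,
    }
}
```

For example, the 32-character string `d41d8cd98f00b204e9800998ecf8427e` is classified as MD5, while a string of any other length outside the table is rejected.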
- Using the client, similarity hashes are calculated locally and submitted to the server: `mdb_client find-similar /path/to/file.bin`. The sample itself is not sent to the server, only the hashes. The same restriction as for downloading applies: the user must have access to the group and source to which a potentially similar file is linked. The output lists the hashes of the similar files and the similarity algorithm by which each one matched.
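The access restriction that applies to both retrieval and similarity search (users belong to groups, and groups are linked to sources) can be modelled in a few lines of Rust. This is a simplified in-memory sketch, not MalwareDB's database schema; the `AccessModel` type, its methods, and the use of plain integer IDs are all assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory model of users, groups, and sources.
#[derive(Default)]
pub struct AccessModel {
    /// user ID -> groups the user belongs to
    user_groups: HashMap<u32, HashSet<u32>>,
    /// source ID -> groups that have access to the source
    source_groups: HashMap<u32, HashSet<u32>>,
}

impl AccessModel {
    /// Record that a user is a member of a group.
    pub fn add_user_to_group(&mut self, user: u32, group: u32) {
        self.user_groups.entry(user).or_default().insert(group);
    }

    /// Record that a group has access to a source.
    pub fn add_group_to_source(&mut self, group: u32, source: u32) {
        self.source_groups.entry(source).or_default().insert(group);
    }

    /// A user may see samples from a source when at least one of the
    /// user's groups is linked to that source.
    pub fn user_can_access(&self, user: u32, source: u32) -> bool {
        match (self.user_groups.get(&user), self.source_groups.get(&source)) {
            (Some(groups), Some(allowed)) => !groups.is_disjoint(allowed),
            _ => false,
        }
    }
}
```

In this model, a user with no group in common with a source gets `false` from `user_can_access`, which corresponds to a sample or similarity result not being returned to that user.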
- `mdb_client server-info` displays some statistics about the server, including version numbers, database type, and total number of files.
- `mdb_client server-types` displays a list of supported file types and their magic numbers.
Some overall goals and design:
- MalwareDB shall be easy to use.
- MalwareDB shall be a place to store your data and use a simple database schema so that other applications may interact with the data directly.
- MalwareDB shall collect and enrich malicious and benign files so that some features may be used for machine learning models.
- MalwareDB should provide reusable components which may benefit other projects, even if not directly related.