Architecture
- Introduction
- CLI
- Hiding Technique Wrapper
- Hiding Technique
- Filesystem Detector
- Metadata Handling
- Generating Test Images
- Filesystem Info
One of the goals during development of fishy was keeping a modular structure. Separate layers enable the encapsulation of functions. The logical procedure of using fishy is as follows: the CLI (Command Line Interface) evaluates the parameters entered by the user and calls the appropriate hiding technique wrapper, which detects the filesystem type and calls the appropriate hiding technique. When a hiding technique's write method is called, a JSON file containing the metadata needed to restore the hidden data is written to disk.
The command line argument parsing is implemented in cli.py, while the hiding technique wrappers are located in the root module. Together they convert input data into streams, cast, read and write the technique-specific metadata, and call the appropriate methods of the chosen hiding technique.
The filesystem type of a given image is detected by the hiding technique wrapper, which calls the filesystem_detector. The filesystem_detector uses the filesystem detection methods implemented in the filesystem modules. The filesystem-specific hiding techniques provide the actual functionality and offer at least read, write and clear methods to read, hide or delete data. Additionally, the hiding techniques use pytsk3 or custom filesystem parsers (located in the particular filesystem package) to gather information about the given filesystem.
The CLI (Command Line Interface) is the user interface for this toolkit. Each hiding technique is accessible through a dedicated subcommand, each of which defines further options. The CLI can read the data a user wants to hide either from a file or from stdin. Similarly, if a user wants to read hidden data, that data can be returned via stdout or written to a file.
The main objective of the CLI is to parse command line arguments and call the appropriate hiding technique wrapper. If data is read from a file, the CLI is also tasked with turning the input into a buffered stream, on which the hiding technique operates.
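For illustration, a minimal sketch of how such a buffered stream could be produced is shown below; the helper name is hypothetical and not part of fishy's actual code:

```python
import io
import sys


def get_input_stream(filename=None):
    """Hypothetical helper: return a buffered byte stream holding the data to hide.

    Reads from the given file if a name is provided, otherwise from stdin,
    and wraps the bytes in a stream on which a hiding technique can operate.
    """
    if filename is not None:
        with open(filename, "rb") as src:
            return io.BytesIO(src.read())
    return io.BytesIO(sys.stdin.buffer.read())
```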
Each type of hiding technique has its own wrapper, which gets called by the CLI and in turn calls the filesystem-specific implementation of the hiding technique, based on the detected filesystem type. The filesystem type is determined by calling the filesystem detector function.
The read and clear methods require the metadata which is recorded during the write operation. Thus, the wrapper is also responsible for writing and reading metadata files and providing hiding technique specific metadata objects for the aforementioned methods. Similarly, if the user wants to read hidden data into a file instead of stdout, the wrapper is also responsible for opening and writing this file.
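The following sketch outlines this dispatch and metadata handling; all names are hypothetical and the real wrapper code in fishy differs in detail:

```python
import json


def write_wrapper(image_path, input_stream, metadata_path, techniques, detect_filesystem):
    """Hypothetical wrapper: detect the filesystem type, pick the matching
    hiding technique implementation and persist the returned metadata.

    `techniques` maps a filesystem type string (e.g. "FAT", "NTFS") to a
    technique class; `detect_filesystem` is the filesystem detector function.
    """
    fs_type = detect_filesystem(image_path)        # e.g. "FAT", "NTFS", "EXT4"
    technique = techniques[fs_type](image_path)    # filesystem-specific implementation
    metadata = technique.write(input_stream)       # hide data, get JSON-serializable restore metadata
    with open(metadata_path, "w") as meta_file:
        json.dump(metadata, meta_file)             # serialize metadata for later read/clear calls
```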
The hiding technique implementations are the core functionalities of this toolkit. Each technique offers a minimum of three methods - read, write and clear. They operate on streams to guarantee high reusability.
Every hiding technique is called by the wrapper after the toolkit receives a valid input via the CLI. To get information about a filesystem, the hiding techniques use filesystem parsers (either pytsk3 or a custom parser implementation within the appropriate filesystem package).
Should a hiding technique require metadata to restore hidden data, it must implement a hiding technique specific metadata class. This specific class determines and writes all needed metadata during the hiding process. The write method returns this data so the hiding technique wrapper can serialize and pass it to the appropriate read and clear methods.
If a write method fails, all previously written data has to be cleared before exiting.
The clear method overwrites all hidden data with zeros and leaves the filesystem in a consistent state. Warning: The clear method does not perform erasure of data in terms of any regulatory compliance, nor does it ensure that all possible traces of the hidden data are removed from the filesystem. Do not rely on this method to securely wipe hidden data from the disk.
Besides read, write and clear it is possible to implement additional methods for hiding techniques.
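A minimal sketch of such a technique, together with its technique-specific metadata class, might look as follows; the names are made up and the placement logic is a placeholder, it only illustrates that write returns the metadata needed by read and clear:

```python
class ExampleMetadata:
    """Hypothetical technique-specific metadata: records where data was hidden."""

    def __init__(self, locations=None):
        # list of (offset, length) tuples describing the hidden data
        self.locations = locations or []


class ExampleHidingTechnique:
    """Hypothetical hiding technique exposing the minimal read/write/clear API."""

    def __init__(self, image_stream):
        self.image = image_stream

    def write(self, input_stream):
        """Hide the stream's content and return the metadata needed to restore it."""
        data = input_stream.read()
        offset = 4096                       # placeholder: a real technique derives this from filesystem structures
        self.image.seek(offset)
        self.image.write(data)
        return ExampleMetadata([(offset, len(data))])

    def read(self, output_stream, metadata):
        """Restore the hidden data described by the metadata into the output stream."""
        for offset, length in metadata.locations:
            self.image.seek(offset)
            output_stream.write(self.image.read(length))

    def clear(self, metadata):
        """Overwrite the hidden data with zeros."""
        for offset, length in metadata.locations:
            self.image.seek(offset)
            self.image.write(b"\x00" * length)
```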
To get more information about each currently implemented hiding technique, visit the dedicated page.
The filesystem detector is a simple wrapper function to unify calls to the filesystem detection functions which are located within the corresponding filesystem packages.
Most hiding techniques require additional information to restore hidden data. This information will be stored in a .json metadata file, produced during a hiding technique's write method. The class fishy.metadata provides both functions to read and write metadata as well as automatic encryption and decryption of the metadata if a password is provided.
The purpose of this class is to ensure that all metadata objects have a similar data structure. This way the program can detect early on when the wrong hiding technique is used to restore hidden data.
Every new hiding technique must implement its own metadata class specific to that technique. This way the hiding technique can decide which additional information is needed to restore the data later on. When the technique's write method is called, this information is passed to the wrapper, where it is serialized and stored.
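A possible shape of this mechanism is sketched below; the exact keys and envelope used by fishy.metadata may differ, the point is the shared structure that allows mismatches to be detected early:

```python
import json


def write_metadata(path, module_identifier, technique_metadata):
    """Wrap technique-specific metadata in a common envelope and write it as JSON."""
    envelope = {
        "module": module_identifier,     # which hiding technique produced this file
        "metadata": technique_metadata,  # technique-specific restore information
    }
    with open(path, "w") as meta_file:
        json.dump(envelope, meta_file)


def read_metadata(path, expected_module):
    """Read a metadata file and fail early if it belongs to another technique."""
    with open(path) as meta_file:
        envelope = json.load(meta_file)
    if envelope["module"] != expected_module:
        raise ValueError("metadata was written by a different hiding technique")
    return envelope["metadata"]
```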
With the help of the script create_testfs.sh you can create prepared filesystem images. These already include files copied from utils/fs-files/. The resulting filesystems are intended for developing new hiding techniques and running unit tests.
The script has a few requirements:
- sudo - to get mount permission while creating images
- mount and umount executables
- mkfs.vfat, mkfs.ntfs, mkfs.ext4
- dd - to create an empty image file
./create_testfs.sh
This will create a set of test images.
The script is capable of handling "branches" to generate multiple images with a different file structure. This is especially useful when writing unit tests that expect a certain file structure on the tested filesystem.
In case you want to use existing test images while running unit tests, create a file called .create_testfs.conf under utils. Here you can define the variable copyfrom to provide a directory where your existing test images are located. Example:
copyfrom="/my/images/folder"
To build all images that might be necessary for unit testing, run
./create_testfs.sh -t all
These files are currently included in the stable 1 branch:
.
├── another                                                        <- regular file
├── areallylongfilenamethatiwanttoreadcorrectly.txt                <- parse long filenames in FAT parser
├── long_file.txt                                                  <- parse files greater than one cluster
├── no_free_slack.txt                                              <- test failure of writes into empty file slack
├── onedirectory                                                   <- regular directory
│   ├── afileinadirectory.txt                                      <- test reading files in sub directories
│   ├── areallylongfilenamethatialsowanttoreadassoonaspossible.txt <- parse long filenames in directories
│   └── nested_directory                                           <- test parsing files in nested sub dirs
│       └── royce.txt                                              <- test parsing files in nested sub dirs
└── testfile.txt                                                   <- test if recursive directory parsing works
As of this moment, you still have to generate the ext4 image manually.
dd if=/dev/zero of=file.img bs=4M count=250
mkfs.ext4 -F file.img
sudo mkdir -p /tmp/mount_tmp/ && sudo mount -o loop,rw,sync file.img /tmp/mount_tmp
sudo chmod -R ug+rw /tmp/mount_tmp
sudo mv <files> /tmp/mount_tmp/
As of right now, you have to create an APFS image manually using a macOS machine. This can be achieved through multiple means, though it might be most comfortable to use an external tool like AutoDMG. An official guide to creating .dmg images can be found here. Once you have acquired a .dmg image file, you need to convert it to a .dd raw image. This can be achieved following these steps:
- Use sleuthkit's mmls command to find the starting point of the container.
- Follow up by using sleuthkit's mmcat command. An example would be:
mmcat apfs_image.dmg 4 > apfs_volume.dd
In this example, "apfs_image.dmg" is the name of the source .dmg image, "4" is the starting point found through mmls and "apfs_volume.dd" is the name of the extracted raw image.
This chapter gives a brief overview of the most important data structures of FAT, NTFS, ext4 and APFS.
Despite its age, the FAT filesystem is still heavily used, most often found on small devices such as USB sticks and SD cards. Its simple structure makes it easy to understand and allows lightweight implementations. During its evolution the original FAT specification was extended to fit the growing sizes of hard drives, resulting in various variants. Nowadays, the most common variants are FAT12, FAT16 and FAT32. Besides some details, which we will cover later, the main difference between these FAT types is the address size used for addressing clusters in the data area. All three FAT types share a similar structure, consisting of
- reserved sectors, including the bootsector
- file allocation table
- root directory region
- data region
Important units in a FAT filesystem are sectors, the smallest logical unit of the area preceding the data region, and clusters, the smallest logical unit in the data region. The size of a cluster is defined by cluster size = sector size * sectors per cluster. For example, with 512-byte sectors and 8 sectors per cluster, the cluster size is 4096 bytes.
Reserved sectors are always located at the beginning of a FAT filesystem. The most important ones are the bootsector, which starts at offset 0, and the FS Information Sector, which is used by FAT32.
The bootsector contains important metadata about the filesystem. It consists of a core section, which is the same for every FAT type, and an extended region, which differs across the different types. Among others, the core region of the bootsector includes the sector size, the count of sectors per cluster and the count of root directory entries. The extended region stores the filesystem type value. FAT32 filesystems also store the address of the root directory cluster and the start value of the FS information sector here.
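For illustration, the core fields mentioned above sit at fixed, well-known offsets in the bootsector; the following sketch (not fishy's own parser) reads a few of them:

```python
import struct


def parse_fat_bootsector(image_path):
    """Read a few core bootsector (BPB) fields at their standard FAT offsets."""
    with open(image_path, "rb") as image:
        boot = image.read(512)
    sector_size = struct.unpack_from("<H", boot, 11)[0]      # bytes per sector
    sectors_per_cluster = boot[13]                            # sectors per cluster
    reserved_sectors = struct.unpack_from("<H", boot, 14)[0]  # reserved sector count
    fat_count = boot[16]                                      # number of FATs
    root_entries = struct.unpack_from("<H", boot, 17)[0]      # root dir entries (0 on FAT32)
    return {
        "sector_size": sector_size,
        "cluster_size": sector_size * sectors_per_cluster,
        "reserved_sectors": reserved_sectors,
        "fat_count": fat_count,
        "root_dir_entries": root_entries,
    }
```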
FAT32 includes a Filesystem information sector in the reserved sectors. It stores some additional metadata about the filesystem, which can be used to increase read and write performance. Interesting values are the count of free sectors and the id of the last written cluster.
The File Allocation Table records the status of the data clusters. Depending on the FAT type, the size of a record varies between 12 and 32 bits. There are four main status values:
- Free cluster
- Bad cluster
- Last cluster
- Next cluster in a cluster chain
The free cluster status marks a cluster as free, meaning that during the write process this cluster can be used to store data. If data was written to a cluster, the corresponding FAT entry is set to the last cluster value. If the written file is greater than the cluster size, multiple clusters are allocated. The first cluster then points to the id of the next cluster, creating a chain of used clusters. The last cluster in this chain is terminated with the 'Last cluster' value. The bad cluster status indicates a faulty sector in this cluster. Once this status is set, the filesystem will never use this cluster again.
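Reconstructing a file's cluster chain therefore only requires following FAT entries until an end-of-chain marker appears; a simplified FAT32 sketch:

```python
import struct


def fat32_cluster_chain(fat_bytes, start_cluster):
    """Follow a FAT32 cluster chain starting at `start_cluster`.

    `fat_bytes` is the raw file allocation table; each entry is 32 bits wide,
    of which only the lower 28 bits are used. Values >= 0x0FFFFFF8 mark the
    last cluster of a chain, 0x0FFFFFF7 marks a bad cluster and 0 a free one.
    """
    chain = []
    cluster = start_cluster
    while True:
        chain.append(cluster)
        entry = struct.unpack_from("<I", fat_bytes, cluster * 4)[0] & 0x0FFFFFFF
        if entry >= 0x0FFFFFF8:                  # last cluster in the chain
            break
        if entry == 0x0FFFFFF7 or entry == 0:    # bad or free cluster: chain is broken
            raise ValueError("unexpected FAT entry in cluster chain")
        cluster = entry                          # next cluster in the chain
    return chain
```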
The root directory holds the start directory of the filesystem ("/"). For FAT12 and FAT16 it starts directly behind the file allocation table. In FAT32 filesystems its location is determined by the root directory address field of the bootsector. It holds a series of directory entries. A directory entry stores information about a file or subdirectory:
- Name
- Extension
- Attributes (subdirectory, hidden, readonly, . . . )
- Start cluster
- Size
Subdirectory entries use the start cluster field to point to a cluster that then again holds a series of directory entries.
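A classic 8.3 directory entry is 32 bytes long; the sketch below decodes the fields listed above (long file name entries and timestamps are ignored here):

```python
import struct


def parse_dir_entry(entry):
    """Decode a single 32-byte FAT 8.3 directory entry."""
    name = entry[0:8].decode("ascii", "replace").rstrip()
    extension = entry[8:11].decode("ascii", "replace").rstrip()
    attributes = entry[11]                                   # e.g. 0x10 = subdirectory
    cluster_high = struct.unpack_from("<H", entry, 20)[0]    # high word of start cluster (FAT32 only)
    cluster_low = struct.unpack_from("<H", entry, 26)[0]     # low word of start cluster
    size = struct.unpack_from("<I", entry, 28)[0]            # file size in bytes
    return {
        "name": name,
        "extension": extension,
        "is_directory": bool(attributes & 0x10),
        "start_cluster": (cluster_high << 16) | cluster_low,
        "size": size,
    }
```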
The New Technologies File System (NTFS) was designed by Microsoft and is the standard file system of Microsoft operating systems starting from Windows NT/2000. NTFS is a significantly more complex file system than FAT with many features to enhance reliability, security and scalability. Unfortunately, there is no official published specification for NTFS from Microsoft and low-level details can only be found in unofficial descriptions.
“Everything is a File” is the most important concept in the design of NTFS. Each byte of an NTFS file system belongs to a file and the entire file system is considered a data area. Even system data and meta data, which are usually hidden by other file systems, are allocated to files and could be located anywhere in the volume. Consequently, NTFS file systems do not have a specific layout apart from the first sectors of the volume containing the boot sector and initial program loader.
The Master File Table (MFT) contains an entry for every file and directory stored in an NTFS partition. Each entry contains the necessary metadata such as the file name, file size and the location of the stored data. MFT entries have a fixed size, usually 1024 bytes, but only the first 42 bytes have a defined purpose (the MFT entry header). The remaining bytes store attributes, which contain the metadata of a file (e.g. $STANDARD_INFORMATION, $FILE_NAME, $DATA).
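For illustration, the fixed part of an MFT entry can be decoded as follows; the layout corresponds to Carrier's description of the 42-byte header, the sketch is not fishy's own NTFS parser:

```python
import struct


def parse_mft_entry_header(entry):
    """Decode the most relevant fields of the fixed 42-byte MFT entry header."""
    signature = entry[0:4]                                      # b"FILE" for a valid entry
    lsn = struct.unpack_from("<Q", entry, 8)[0]                 # $LogFile sequence number
    sequence, link_count = struct.unpack_from("<HH", entry, 16)
    attr_offset, flags = struct.unpack_from("<HH", entry, 20)   # flags: 0x01 in use, 0x02 directory
    used_size, alloc_size = struct.unpack_from("<II", entry, 24)
    return {
        "signature": signature,
        "lsn": lsn,
        "sequence": sequence,
        "link_count": link_count,
        "first_attribute_offset": attr_offset,
        "in_use": bool(flags & 0x01),
        "is_directory": bool(flags & 0x02),
        "used_size": used_size,
        "allocated_size": alloc_size,
    }
```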
NTFS stores its administrative data in metadata files, which contain central information about the NTFS file system. Their names start with the dollar character $ and their first letter is capitalized (except for '.'). Microsoft reserves the first 16 MFT entries for file system metadata files. The following table lists the standard NTFS file system metadata files (Brian Carrier, p. 202):
Entry | File Name | Description |
---|---|---|
0 | $MFT | Entry for MFT itself |
1 | $MFTMirr | Backup of the first entries in MFT |
2 | $LogFile | Journal that records metadata transactions |
3 | $Volume | Volume information (label, identifier, version) |
4 | $AttrDef | Attribute information (identifier values, name, sizes) |
5 | . | Root directory of filesystem |
6 | $Bitmap | Allocation status of each cluster in filesystem |
7 | $Boot | Boot sector & boot code for filesystem |
8 | $BadClus | Clusters that have bad sectors |
9 | $Secure | Information about security and access control for files |
10 | $Upcase | Uppercase version of every Unicode character |
11 | $Extend | Directory that contains files for optional file extensions. Microsoft does not typically place the files in this directory into the reserved MFT entries. |
The fourth extended filesystem (ext4) is ext3's successor among Linux's journaling filesystems, first published in 2006 by Andrew Morton. It remains compatible with ext3, but uses 48 bits for block numbers instead of 32, allowing bigger partitions of up to 1 EiB. Furthermore it is now possible to use extents, which unite several contiguous blocks, improving the handling of large files and performance. Moreover, ext4 introduces finer timestamps on a nanosecond basis, checksums for its journal and metadata, online defragmentation, flex groups and other improvements. The standard block size for ext4 is 4096 bytes, but 1024 and 2048 are possible too; the block size affects the 'superblock slack' hiding technique shown later. The filesystem itself consists of a bootsector and flex groups, holding block groups.
The superblock contains general information about the filesystem, such as block counts, sizes, states, versions, timestamps and more. It is located at byte 1024 of the filesystem and occupies 1024 bytes of its block, creating a superblock slack (depending on the block size). Redundant copies of the superblock are stored in block groups whose group number is 0 or a power of 3, 5 or 7, unless the sparse_super feature flag is not set, in which case redundant copies are stored in every block group. Entries are, amongst other information:
- total block and inode count
- blocks per block group
- unused block count
- first unused inode
- reserved GDT block count
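Since the superblock always starts at byte 1024, these fields can be read at fixed, well-known offsets; a minimal sketch (not fishy's ext4 parser):

```python
import struct


def parse_ext4_superblock(image_path):
    """Read a few basic fields of the ext4 superblock located at byte offset 1024."""
    with open(image_path, "rb") as image:
        image.seek(1024)
        sb = image.read(1024)
    magic = struct.unpack_from("<H", sb, 0x38)[0]              # s_magic, must be 0xEF53
    if magic != 0xEF53:
        raise ValueError("not an ext2/3/4 filesystem")
    inode_count = struct.unpack_from("<I", sb, 0x00)[0]        # s_inodes_count
    block_count = struct.unpack_from("<I", sb, 0x04)[0]        # s_blocks_count_lo
    free_blocks = struct.unpack_from("<I", sb, 0x0C)[0]        # s_free_blocks_count_lo
    log_block_size = struct.unpack_from("<I", sb, 0x18)[0]     # s_log_block_size
    blocks_per_group = struct.unpack_from("<I", sb, 0x20)[0]   # s_blocks_per_group
    return {
        "inode_count": inode_count,
        "block_count": block_count,
        "free_blocks": free_blocks,
        "blocks_per_group": blocks_per_group,
        "block_size": 1024 << log_block_size,
    }
```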
The group descriptor table (GDT) is located directly behind the superblock, and its redundant copies are stored together with the superblock copies. It holds a group descriptor entry for each block group, containing:
- address of block bitmap
- address of inode bitmap
- address of inode table
- unused block, inode and directory count
- flags
- checksums
An inode stores metadata of a file, such as:
- timestamps
- user/group permissions
- data references
The size varies; the default is 256 bytes. An inode table holds a list of all inodes of its block group.
Extents replace ext3's indirect addressing and reduce data fragmentation. An inode can store four extents; further extents can be stored in a tree structure, each mapping up to 128 MiB of contiguous blocks.
The reserved GDT blocks are set aside for future expansion of the filesystem, which would require a larger group descriptor table. They are therefore usable for data hiding as long as the filesystem does not get expanded.
The journal guarantees successful write operations. A committed data transaction is first written to the journal, a 128 MiB section of the disk. From there it is written to its final destination and can be restored in case of a power outage or data corruption during the write operation.
APFS is Apple's new proprietary file system, introduced in 2017 with macOS version 10.13. Despite allowing data migration, it bears minimal resemblance to its predecessor HFS+. Rather, it is comparable to other modern file systems such as ZFS, BTRFS and newer iterations of XFS.
APFS can be described as a double layered file system. The outer layer is the Container layer. A Container is equivalent to one implementation of APFS. It acts as a managing instance of the file system, supervising higher level functions and information like block allocation (using the Space Manager structure) and the checkpoint functionality. The most crucial information is stored in the Container Superblock. There are multiple instances of this structure present with a copy of the newest version always in block 0 of the Container.
The inner layer is the Volume layer and usually consists of multiple Volumes. Volumes are somewhat comparable to traditional partitions, as they manage user data and operating systems. What separates them from traditional partitions is their lack of fixed size, as they all share the free space made available by the container. Like the container, the Volume has its most crucial information stored in a Volume Superblock.
All file system structures (the only exception being the allocation bitmap) are stored as objects and are assigned 32-byte headers containing general information about the object, such as its type, subtype and version. More importantly, the first 8 bytes contain the calculated checksum of the object. The checksum is calculated using a version of Fletcher's checksum optimized for 64 bit, with the entire object (minus the first 8 bytes) serving as the function's input.
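A simplified sketch of such a 64-bit Fletcher checksum over little-endian 32-bit words is shown below; fishy itself uses the implementation taken from afro (see below), which may differ in detail:

```python
import struct


def fletcher64(data):
    """Fletcher-style 64-bit checksum over little-endian 32-bit words.

    Simplified sketch: `data` is the object body without its first 8 bytes
    (the stored checksum) and must be a multiple of 4 bytes long.
    """
    words = struct.unpack("<%dI" % (len(data) // 4), data)
    sum1 = sum2 = 0
    for word in words:
        sum1 = (sum1 + word) % 0xFFFFFFFF
        sum2 = (sum2 + sum1) % 0xFFFFFFFF
    # derive the two check words that are stored in the object header
    check_low = 0xFFFFFFFF - ((sum1 + sum2) % 0xFFFFFFFF)
    check_high = 0xFFFFFFFF - ((sum1 + check_low) % 0xFFFFFFFF)
    return (check_high << 32) | check_low
```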
All implemented hiding techniques recalculate and overwrite the checksum whenever an object is modified by a write or clear function. The checksum calculation is taken from Jonas Plum's tool afro (copyright Jonas Plum), which is licensed under GPL-3.0.
The Container can generally be split into 3 distinct parts. The first part contains the Container metadata, which consists of the Checkpoint Areas managing past states of the Container. A major part of the Container metadata is the Container Superblock. The Container Superblock contains a signature of 4 magic bytes (NXSB) as well as important elementary information such as:
- block count
- block size
- Volume Superblock IDs
- feature compatibilities
They also manage information needed for further traversal of the file system such as pointers to the object map and information about size and location of the Checkpoint Areas.
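For illustration, the NXSB signature and the basic geometry fields directly follow the 32-byte object header in block 0; a minimal sketch (standard APFS field offsets, not fishy's parser):

```python
import struct


def parse_container_superblock(image_path):
    """Read the magic bytes, block size and block count of the Container Superblock."""
    with open(image_path, "rb") as image:
        # block 0 holds a copy of the newest Container Superblock; the fields
        # read here all lie within the first 48 bytes of that block
        block0 = image.read(4096)
    # the 32-byte object header: checksum, object id, transaction id, type, subtype
    checksum = struct.unpack_from("<Q", block0, 0)[0]
    magic = block0[32:36]                                   # must be b"NXSB"
    if magic != b"NXSB":
        raise ValueError("not an APFS container")
    block_size = struct.unpack_from("<I", block0, 36)[0]
    block_count = struct.unpack_from("<Q", block0, 40)[0]
    return {"checksum": checksum, "block_size": block_size, "block_count": block_count}
```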
The second part contains the Volume metadata. The most important structures contained are the Volume Superblocks. The Volume Superblocks begin with their own magic byte signature (APSB) and manage information about their respective volumes, such as:
- Reserved Blocks
- Block Quota
- Allocated Blocks
- Feature compatibilities
- Numbers of files and directories
- Volume name
Like the Container Superblock, the Volume Superblock also manages information needed for further traversal, such as pointers to the object map as well as information about the extent tree.
The third and last part is the Volume content. It is usually the largest area and contains all non-filesystem data such as user data and (potentially multiple) operating system(s). Despite optional lower and upper size restrictions, the Volumes filling this section have variable sizes.
Nodes have multiple important tasks within the file system. Which specific tasks they fulfill is dependent on their type as well as on their Entries. Generally Nodes are part of a B-Tree and can therefore have one of two (major) types, Leaf or Root. While Root Nodes are generally only used to structure the B-Tree by pointing to other Nodes and managing general B-Tree information, Leaf Nodes contain the actual data in form of Entries. Entries are split into Keys and Values.
The Keys determine the type of Entry and can contain additional information. There are 14 potential types used in APFS. Following is a list of all types:
- 0 - This type is officially called "Any", but is used in very specific instances, as it indicates an Object ID to Block address mapping.
- 1 - This type manages Snapshot Metadata.
- 2 - This type indicates a physical Extent record.
- 3 - This type indicates an Inode entry. Inodes contain metadata of files. A unique attribute of Inodes (and Directory Records) is the addition of Extended Fields, which are implemented in multiples of 8 byte.
- 4 - This type manages the Extended Attributes of objects.
- 5 - Type 5 is called Sibling Link - Here, an Inode is mapped to corresponding hard links.
- 6 - This type manages information needed to read a Data Stream.
- 7 - Type 7 manages the Crypto State.
- 8 - This type indicates a File Extent.
- 9 - This type represents a Directory Record. Like Inodes, they have an additional Extended Fields part after the common Value section.
- 10 - This entry type manages Directory Stats like its name or the number of elements inside the directory.
- 11 - Here, a Snapshot Name is saved.
- 12 - This type is called Sibling Map. Here, a hard link is mapped to corresponding inodes (the opposite of type 5).
- 15 - This type indicates an invalid entry.
The Value part of the entry usually contains all or most of the Entry's information. Its structure depends solely on the previously mentioned type.
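For quick reference, the type codes above can be collected in a simple mapping; the names mirror the list above and are not the official constants:

```python
# APFS key/entry types as listed above (simplified names)
APFS_ENTRY_TYPES = {
    0: "any (object id to block address mapping)",
    1: "snapshot metadata",
    2: "physical extent record",
    3: "inode",
    4: "extended attribute",
    5: "sibling link (inode to hard links)",
    6: "data stream",
    7: "crypto state",
    8: "file extent",
    9: "directory record",
    10: "directory stats",
    11: "snapshot name",
    12: "sibling map (hard link to inode)",
    15: "invalid",
}
```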
The Checkpoints are split into 2 different areas, the Checkpoint Descriptor Area and the Checkpoint Data Area.
In the Checkpoint Descriptor Area, previous (and the current) Container Superblocks and blocks of Checkpoint Metadata can be found. Container Superblocks have unique object maps, but not every checkpoint has a unique set of Volume Superblocks.
The Checkpoint Data Area manages structures such as the Space Manager and Reaper as well as data that was in-memory when the Checkpoint was written to disk. Both areas are implemented as ring buffers and have a fixed size based on the Container's size.