-
CDM Session Streams:
-
session stream ID: A monotonically increasing session ID will be added to every
TCCDM_DATUM
record. A new session will begin at everyHOST_RECORD
, and a session will end when aHOST_RECORD
with a new session number occurs. In almost all cases, a new session will begin when Theia's target system is rebooted. -
Sessions and UUID Uniqueness: A UUID can appear in multiple sessions, and if this is the case, it means Theia is providing a version of this object based on its state in the current session. If a topic contains two records with the same UUID, it means these two records represent the same object, but from different sessions. Records with larger sequence stream IDs obviously represent a more recent description of the object's state. For example, if a file's permissions have been modified, the file record with the highest session number will represent the most-current state of this record.
uuid filename session 0100d00f-3d07-4600-0000-00003f56565b /home/joey/ripe-prov-tests/Linux/bash_commands.sh 0 0100d00f-3d07-4600-0000-00003f56565b /home/joey/ripe-prov-tests/Linux/bash_commands.sh 1 -
per-session def-use provenance data : Theia will define all records in a session before they are used, which is necessary for intra-session analysis. Aditionally, for inter-session analysis, the provenance graph will have all nodes defined before they are used.
-
-
Kafka Topics:
- single-host multi-session Kafka topics: Each unique host will have a unique kafka topic. All sessions from the same host will be posted to the same topic. If two records with the same UUID are found in the same topic, it means Theia observed this record was used in two unique sessions, and Theia is providing an updated record that provides a snapshot of the records state during its first use in a session.
-
Standarization:
- Partially Constructed Records: In some cases, the Linux kernel does not provide Theia with every possible field in a CDM record. This is due to the kernel relying on lazy data allocation for performance reasons. In this situation, we designed Theia to fallback on partial record construction, which allows Theia to report the meaningful fields that are actually extractable. For engagement 4, all meaningful fields extracted will have the appropriate values, while any remaining fields will maintain the standard default value. This will allow TA2 teams to focus on the most meaingingful fields in a record, and avoiding unnecessary parsing exceptions.
-
Netflow Record:
-
Addresses and Ports: localAddress and remoteAddress fields will no longer contain "NA" values. Instead, when Theia uses partial event reconstruction, the default standardized None value will be used. This allows us to still provide meanginful information, without adding unique values into the CDM. Additionally, netflow records' localPort and remotePort fields will be set to their respective default values when the localAddress or remoteAddress values were not provided by the kernel respectively (Issue 115).
-
Unix Domain Sockets (in progress): Netflow Records will no longer be used to model Unix Domain sockets, which are used for intra-host inter-process communication. Instead, the new IPCObject Record type (Issue 112) will be used instead. Please see
Unix Domain Sockets
section below.
-
-
File Record:
- The
inode
anddevice
value in the property map are represented by hex strings (Record['properties']['inode'] = "0x1234"
). - In rare cases, Theia reported negative inode and device identifiers during engagement 3. This issue has been fixed.
- The ids (uid,gid) for a file are always reported in a file object's
property map. The format is
Record['properties']['ids'] = "<uid>/<gid"
. The ids represent the ids for a file based on their values when the file was first logged by Theia.
- The
-
Memory Record:
- The flags used during a
mmap
syscall will be provided in the property map in human-readable form"flags": "MAP_PRIVATE|MAP_ANONYMOUS|MAP_GROWSDOWN"
. - The protection properties of a memory object will now be provided in the
property map in human-readable form:
"prot": "PROT_READ|PROT_EXEC"
.
- The flags used during a
-
Events:
- During engagement 3,
EVENT_READ_SOCKET_PARAMS
andEVENT_WRITE_SOCKET_PARAMS
were used to model read and writes from sockets respectively. For engagement 4, we will model read and writes from sockets usingEVENT_READ
andEVENT_WRITE
instead. When a read or write event from a socket is reported, the event'spredicateObject1
field will be set to the netflow object or IPC object's UUID. EVENT_OPEN
will now provide the flags arguments in human-readable form in the property mapflags ":"O_WRONLY|O_CREAT|O_TRUNC"
.- New Events Supported:
EVENT_CORRELATE
,EVENT_EXIT
- During engagement 3,
-
UI Events:
- We provide sample CDM data that include UI events in
cdm-19/theia-ui.bin.gz
. - Current support for providing UI information in CDM data is best effort, and focus soley on mouse click (no keyboard support). It has been tested on both Firefox and Nautilus (Linux's file manager).
- UI information will be provide in properties map of the receive event record which corresponds to the receiving of the X11 button release event.
- Currently we provide the following information of the UI element that is clicked (using the corresponding keys in the properties map):
- app.name: name of the application which contains the UI clicked
- name: descriptive name of the clicked UI, for links in browser, that's the anchor text, for buttons on web form, that's usually the words on the button. Other example include the name of the file/folder icon clicked in Nautilus
- role: the role of the clicked UI (according to Linux accessibility standards). Examples include links, push buttons, icons
- url: this applies to links in a browser only, and it's the target URL of the clicked link
- window: this is the name of the window that contains the clicked UI, hopefully can provide some more context for the click
- src_url: this also applies a browser only, and it's the URL displayed in the address bar when a link is clicked (i.e. the URL of the page in which the click happens)
- time: this is for internal use, signifies the time (in ms) when the UI event is RECEIVED (but not processed), which helps us sync up events
- We provide sample CDM data that include UI events in
-
Misc:
TCCDM_DATUM
records will now explicitly state the record's type.
Info | Provided |
---|---|
uuid | V |
type | PRINCIPAL_LOCAL |
userId | V |
groupIDs | V |
properties | uid, gid |
Info | Provided |
---|---|
uuid | V |
type | PROCESS |
cid | V |
parentSubject | V |
localPrincipal | V |
startTimeStamp | V |
cmdLine | V |
properties | path, uid, euid, suid, fsuid, gid, egid, sgid, fsgid |
- Note that Linux kernel does not clearly distinguish threads from processes.
- THEIA reports the pid of a task_struct as cid, not tgid.
- Theia creates new subject records when a clone system call occurs. From the perspective of the kernel, the new subject will have the cmdLine of its parent. The new process will then possibly run an exec family system call which replaces the parent process's image with the new image, which in turns changes the cmdLine. In many cases, the new child process will not run an exec syscall and the cmdLine found in the subject record will be accurate. However, if it does uses an exec system call, then TA2 teams should use the cmdLine property provided in an EVENT_EXECUTE record
Info | Provided |
---|---|
uuid | V |
properties | filename, dev, inode, uid, gid |
type | FILE_OBJECT_FILE |
localPrincipal | V |
- inode and dev (device) are provided as hex strings.
Info | Provided |
---|---|
uuid | V |
localAddress | V |
localPort | V |
remoteAddress | V |
remotePort | V |
- Remote connection (Internet connection)
- Established connection: All localAddress, localPort, remoteAddress, remotePort are filled with values.
- Non-established connection (a socket has been created and bound to a pair of IP and port, but not yet connected to a remote client or server): localAddress and localPort have values, but remoteAddress and remotePort do not.
- localAddress == 0.0.0.0 would mean no address has been assigned to this socket yet (https://tools.ietf.org/html/rfc1700)
Info | Provided |
---|---|
uuid | V |
type | V |
properties | type, path |
- The path property will always be base64 encoded.
- Unix Domain Sockets: Theia uses the new IPC Record to model Unix domain sockets,
which are used for communication between two processes on the same host, and they do not cross
the host boundary. Additionally, there are three flavors of domain sockets: pathname, abstract,
and unnamed. Theia models all three using the IPCRecord, and uses the type field to specify the difference.
- For domain sockets of type pathname, the socket is bound to a file, and Theia provides the path of this file in the property map.
- abstract domain sockets leverage a unique string for identification, and Theia will provided this unique string in the path field.
- For unnamed sockets, we do not provide the path field, and it should be ignored if this is the case.
- Netlink Sockets: Netlink sockets are used for communication between
the kernel and a user process. Theia uses IPCObjects to model communication
through netlink sockets. Unfortunately, NETLINK is not a type that provided
in by the
IPCRecordType
enum. To identify when a netlink socket is used, we provide an additional key in the property maptype
, which will always be set to NETLINK.
Info | Provided |
---|---|
uuid | V |
memoryAddress | V |
properties | flags, prot |
size | V |
- In some cases, Theia cannot report the flags or prot values. If this is the case, the keys will not be placed in the property map.
parameters | value |
---|---|
queryId | uuid |
type | BACKWARD (will be supported), FORWARD (partially supported), POINT_TO_POINT |
sourceId | uuid |
sinkId | uuid |
startTimestamp | start time of suspicious events to filter out old events |
endTimestamp | end time of suspicious events to filter out future events |
parameters | value |
---|---|
queryId | uuid |
tagIds | an array of tag uuids |
- THEIA's query system supports instruction-level dynamic information flow tracking (DIFT) by replaying the execution of certain processes.
- Currently, we need to explicitly specify source and sink UUIDs to perform DIFT.
- If tagIds is null, it implies that there was no information flow between source and sink UUIDs.
parameters | value |
---|---|
tagId | uuid |
flowObject | uuid |
subject | uuid |
systemCall | syscall name |
programPoint | X |
prevTagId | X |
opcode | opcode |
tagIds | uuids |
itag | X |
ctag | X |
properties | X |
- Not available
- Currently, THEIA represents pipe as an event
- Not available
- It is for Windows, so THEIA has no plan to support it
- Not available
- Not available
- Not available
- Currently, THEIA generates all events in order such that TimeMarker would be not needed.
- THEIA is an event (syscall)-based system such that it may not be able to provide detailed information of certain subjects and objects when they were created before THEIA starts to monitor the system execution.
syscall | Event type | Predicate Objs | Predicate Obj2 | size | Properties |
---|---|---|---|---|---|
clone | CLONE | child process | |||
execve | EXECUTE | file | cmdLine | ||
exit | EXIT | X | |||
setuid | CHANGE_PRINCIPAL | newuid, rc | |||
mmap | MMAP | memory, file | file | length | flags, prot |
munmap | OTHER | memory | length | ||
mremap | CORRELATE | new memory object | old memory object | ||
mprotect | MPROTECT | memory | address, length, prot | ||
open | OPEN | file | flags | ||
unlink | UNLINK | file | |||
read | READ | file or netflow, ipc | bytesRead | ||
write | WRITE | file or netflow, ipc | bytesWritten | ||
pipe | CREATE_OBJECT | pipe object | |||
accept | ACCEPT | netflow, ipc | rc | ||
connect | CONNECT | netflow, ipc | rc | ||
send | SENDTO | netflow, ipc | |||
sendto | SENDTO | netflow, ipc | |||
sendmsg | SENDMSG | netflow, ipc | |||
recv | RECVFROM | netflow, ipc | |||
recvfrom | RECVFROM | netflow, ipc | |||
recvmsg | RECVMSG | netflow, ipc | |||
ioctl | FCNTL | file | command | ||
mount | MOUNT | devname, dirname, type, flags, rc | |||
shmat | SHM | file,memory | shmid, shmaddr, shmflg, rc, raddr | ||
chown | MODIFY_FILE_ATTRIBUTES | file | uid, gid | ||
fchown | MODIFY_FILE_ATTRIBUTES | file | uid, gid | ||
fchownat | MODIFY_FILE_ATTRIBUTES | file | uid, gid | ||
lchown | MODIFY_FILE_ATTRIBUTES | file | uid, gid | ||
chmod | MODIFY_FILE_ATTRIBUTES | file | mode | ||
fchmod | MODIFY_FILE_ATTRIBUTES | file | mode | ||
fchmodat | MODIFY_FILE_ATTRIBUTES | file | mode |
- We provide uuid, type, threadId, sequenceId, subject, timestamp, and name (syscall) of events.
- We do not provide predicateObjectPath, predicateObject2Path, parameters, and programPoint. **