add origin_referrer_url, origin_url and zone_identifier to the file attribute #1430

Open
wants to merge 23 commits into base: main

@AsuNa-jp AsuNa-jp commented Sep 25, 2024

Changes

This PR adds the following attributes.

  • file.origin_referrer_url
  • file.origin_url
  • file.zone_identifier

(Thanks @trisch-me for all the advice you gave me in creating this PR!)

Background: What are these fields for? (Updated 2024/Oct/21)

When a file is downloaded from the internet (or a network) using a web browser (such as Chrome or Edge) or certain other applications, information about where the file came from is generally attached to the file. This behavior can occur on all operating systems, and its primary purpose is to enhance security by providing context about the file’s source, allowing the system to assess potential risks and enforce appropriate security measures.

The details are explained below.

Windows

In Windows, this is known as the Mark of the Web (ref1, ref2), and it is added to the file's NTFS alternate data stream (ADS).

For example, when you download an image file (image17.webp) from this webpage using a web browser, the download source URL is automatically added to the file's alternate data stream as follows.

image
  • Inside image17.webp:Zone.Identifier:$DATA
image

This PR adds fields to store the file origin information that is saved in the NTFS alternate data stream (ADS).

  • ZoneId is intended to be stored in the zone_identifier field.
  • ReferrerUrl is intended to be stored in the origin_referrer_url field.
  • HostUrl is intended to be stored in the origin_url field.
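
The mapping above can be sketched roughly as follows. This is only an illustration, assuming the Zone.Identifier stream uses the usual `[ZoneTransfer]` section with `key=value` lines. The function name `parse_zone_identifier` and the sample URLs are hypothetical, and keeping ZoneId as a string is an assumption rather than something this PR defines.

```python
# Hypothetical sample of what a Zone.Identifier stream may contain.
# On Windows/NTFS the stream can usually be read by opening the path with
# ":Zone.Identifier" appended; that step is omitted so the sketch runs anywhere.
SAMPLE_STREAM = (
    "[ZoneTransfer]\r\n"
    "ZoneId=3\r\n"
    "ReferrerUrl=https://example.com/article1.html\r\n"
    "HostUrl=https://example.com/image17.webp\r\n"
)

# Mark of the Web key -> attribute proposed in this PR.
KEY_TO_ATTRIBUTE = {
    "ZoneId": "file.zone_identifier",
    "ReferrerUrl": "file.origin_referrer_url",
    "HostUrl": "file.origin_url",
}


def parse_zone_identifier(text: str) -> dict:
    """Extract the three mapped keys from Zone.Identifier text."""
    attributes = {}
    for line in text.splitlines():
        # Split on the first "=" only, since URLs may contain "=" themselves.
        key, separator, value = line.strip().partition("=")
        if separator and key in KEY_TO_ATTRIBUTE:
            attributes[KEY_TO_ATTRIBUTE[key]] = value
    return attributes


if __name__ == "__main__":
    for name, value in parse_zone_identifier(SAMPLE_STREAM).items():
        print(f"{name} = {value}")
```

Lines without a mapped key (such as the `[ZoneTransfer]` header or HostIpAddress) are simply ignored in this sketch.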

Note - In the case of Windows, MotW can be used not only with NTFS but also with ReFS (8.1/2012 R2 or later)

Linux

In Linux, some applications may store file origin metadata in extended attributes (xattr) or in the GNOME virtual filesystem (GVfs) to track the source of a file.

For example, when you download an image file (image17.webp) from this webpage using a web browser, the download source URL is automatically added to GVfs.

Example of a file downloaded using Firefox:
image

Additionally, when using curl or Wget, the referrer URL (user.xdg.referrer.url) and origin URL (user.xdg.origin.url) can be attached to the file's extended attributes. (Google Chrome used to add user.xdg.referrer.url and user.xdg.origin.url as well, but this feature is currently turned off.)

Example of a file downloaded using curl:
image

  • user.xdg.referrer.url is intended to be stored in the origin_referrer_url field.
  • user.xdg.origin.url is intended to be stored in the origin_url field.
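
As a rough sketch of the mapping above, the extended attributes could be read and translated as shown here. The helper names `map_xattrs` and `read_origin_attributes` are hypothetical, and decoding the values as UTF-8 is an assumption. `os.listxattr` and `os.getxattr` are only available on Linux, so the mapping step is kept separate from the file access.

```python
import os

# Extended attribute name -> attribute proposed in this PR.
XATTR_TO_ATTRIBUTE = {
    "user.xdg.referrer.url": "file.origin_referrer_url",
    "user.xdg.origin.url": "file.origin_url",
}


def map_xattrs(xattrs: dict) -> dict:
    """Translate raw xattr name/bytes pairs into the proposed attributes."""
    return {
        attribute: xattrs[name].decode("utf-8", errors="replace")
        for name, attribute in XATTR_TO_ATTRIBUTE.items()
        if name in xattrs
    }


def read_origin_attributes(path: str) -> dict:
    """Read all xattrs of a file and map the relevant ones (Linux only)."""
    xattrs = {name: os.getxattr(path, name) for name in os.listxattr(path)}
    return map_xattrs(xattrs)


if __name__ == "__main__":
    # Hypothetical values, similar to what a download tool might have written.
    sample = {
        "user.xdg.origin.url": b"https://example.com/image17.webp",
        "user.xdg.referrer.url": b"https://example.com/article1.html",
    }
    print(map_xattrs(sample))
```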

Note - As written in this web page, all major Linux file systems including Ext4, Btrfs, ZFS, and XFS support extended attributes.

macOS

(Since I don't have a Mac device, this part of my investigation is based on information found online.)

On macOS, some applications may store file origin metadata in extended attributes to track the source of a file, as shown below. It seems that both the referrer and origin URLs are saved.

image

The image source is as follows:
https://stackoverflow.com/questions/70444996/obtaining-metadata-where-from-of-a-file-on-mac

The same thing is mentioned on another website as well. (https://exiftool.org/forum/index.php?topic=14991.0)

Usually if we save a file from browser, the file will have 2 strings in the 'Where from' attribute:
image
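
A rough sketch of how the two 'Where from' strings might be mapped is shown below. It rests on several assumptions that were not verified on a Mac: that the value is stored in the `com.apple.metadata:kMDItemWhereFroms` extended attribute as a binary property list holding a list of strings, and that the first string is the download URL and the second is the referring page. The function name `parse_where_froms` is hypothetical.

```python
import plistlib


def parse_where_froms(raw: bytes) -> dict:
    """Map a 'Where from' property list onto the proposed attributes.

    Assumption: element 0 is the origin URL, element 1 the referrer URL.
    """
    urls = plistlib.loads(raw)
    attributes = {}
    if len(urls) > 0:
        attributes["file.origin_url"] = urls[0]
    if len(urls) > 1:
        attributes["file.origin_referrer_url"] = urls[1]
    return attributes


if __name__ == "__main__":
    # Build a stand-in for the raw xattr value, since no Mac is assumed here.
    raw_value = plistlib.dumps(
        ["https://example.com/image17.webp", "https://example.com/article1.html"],
        fmt=plistlib.FMT_BINARY,
    )
    print(parse_where_froms(raw_value))
```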

Background: the use cases. (Updated 2024/Oct/21)

  • (A). For example, in Elastic Security (Elastic Defend), a file open event may be generated when a file is opened. By including the file's origin information, such as the Origin URL and Referrer URL, the system can assess whether the file might be malware downloaded from a malicious website based on those URLs.

  • (B). Another example would be adding file origin information (such as the Origin URL and Referrer URL) to the file creation event when a file is downloaded from the internet. This would make it possible to detect if the file was downloaded from a website on a blocklist and take actions such as deleting the file.
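
Use case (B) could look roughly like the sketch below. It is only an illustration: the function name `is_from_blocklisted_host`, the blocklist contents, and the event attribute dictionary are hypothetical, and matching on the exact hostname is a simplifying assumption.

```python
from urllib.parse import urlsplit


def is_from_blocklisted_host(attributes: dict, blocked_hosts: set) -> bool:
    """Return True if the origin or referrer URL points at a blocked host."""
    for key in ("file.origin_url", "file.origin_referrer_url"):
        url = attributes.get(key)
        if not url:
            continue
        # hostname is None for URLs without a network location.
        host = (urlsplit(url).hostname or "").lower()
        if host in blocked_hosts:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical attributes attached to a file creation event.
    event_attributes = {
        "file.name": "payload.exe",
        "file.origin_url": "https://malware.example/payload.exe",
        "file.origin_referrer_url": "https://forum.example/thread/1",
    }
    print(is_from_blocklisted_host(event_attributes, {"malware.example"}))
```

A real detection rule would likely also need to handle subdomains and URL normalization, which this sketch leaves out.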

Merge requirement checklist

linux-foundation-easycla bot commented Sep 25, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

model/file/registry.yaml: three outdated review threads (resolved)
@AsuNa-jp (Author)

Hi @trisch-me
Thank you for the prompt feedback on this PR. All of your points are absolutely valid. I have updated the PR based on your suggestions. (160b7ee, 37c9710)
If there is anything else, please feel free to let me know!

@trisch-me (Contributor) left a comment

LGTM

@@ -25,11 +25,13 @@ Describes file attributes.
| `file.mode` | string | Mode of the file in octal representation. | `0640` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `file.modified` | string | Time when the file content was last modified, in ISO 8601 format. | `2021-01-01T12:00:00Z` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `file.name` | string | Name of the file including the extension, without the directory. | `example.png` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `file.origin_referrer_url` | string | The URL of the webpage that linked to the file. [7] | `http://example.com/article1.html` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Member

Would we ever think about adding more attributes on "original"?

Author

Hi @joaopgrassi

The following are the values that may be included in the Mark of the Web.

Typically, the most common values are ZoneId, ReferrerUrl, and HostUrl. In this PR, I added ReferrerUrl (origin_referrer_url) and HostUrl (origin_url) for the file, but if it seems necessary to include ZoneId as well, I can add it.

.PARAMETER ZoneId
Specifies the ZoneId value (default: 3):
0: Local machine (URLZONE_LOCAL_MACHINE)
1: Local intranet (URLZONE_INTRANET)
2: Trusted sites (URLZONE_TRUSTED)
3: Internet (URLZONE_INTERNET)
4: Untrusted sites (URLZONE_UNTRUSTED)
This parameter is always set unless AppZoneId is specified.

.PARAMETER ReferrerUrl
Specifies the string for ReferrerUrl value of MOTW (default: undefined). Google Chrome, Microsoft Edge (Blink-based), and Mozilla Firefox set this value.

.PARAMETER HostUrl
Specifies the string for the HostUrl value of MOTW (default = undefined). Google Chrome, Microsoft Edge (Blink-based), and Mozilla Firefox set this value.

.PARAMETER HostIpAddress
Specifies the string for HostIpAddress of MOTW (default: undefined). Legacy Microsoft Edge (EdgeHTML-based) sets this value.

.PARAMETER LastWriterPackageFamilyName
Specifies the string for LastWriterPackageFamilyName of MOTW (default: undefined). Legacy Microsoft Edge (EdgeHTML-based) sets this value.

.PARAMETER AppZoneId
Specifies AppZoneId of MOTW (default: undefined). AppDefinedZoneId and ZoneId cannot be used if this parameter is specified. Old versions of SmartScreen set "AppZoneId=4" and remove ZoneId for an executable file when execution permission is given by clicking the "Run anyway" button. Recent versions of SmartScreen seem to just remove Zone.Identifier alternate data stream instead of setting "AppZoneId=4".

.PARAMETER AppDefinedZoneId
Specifies AppDefinedZoneId of MOTW (default: undefined). The purpose of AppDefinedZoneId is unknown and it is only mentioned in the "Zone.Identifier alternate data stream format" section of the document of IZoneIdentifier2 interface (https://docs.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/platform-apis/mt243886(v=vs.85)#zoneidentifier-alternate-data-stream-format).
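
As a small illustration of the ZoneId values listed above, a consumer of a zone identifier value could translate the number into a zone name. This is only a sketch: the `UrlZone` enum and the `describe_zone` helper are hypothetical names, not part of this PR or of any Windows API, and the input is assumed to arrive as a string.

```python
from enum import IntEnum


class UrlZone(IntEnum):
    """ZoneId values as listed in the parameter description above."""

    LOCAL_MACHINE = 0
    INTRANET = 1
    TRUSTED = 2
    INTERNET = 3
    UNTRUSTED = 4


def describe_zone(zone_id: str) -> str:
    """Return the zone name for a ZoneId string, or UNKNOWN if unrecognized."""
    try:
        return UrlZone(int(zone_id)).name
    except ValueError:
        # Raised both for non-numeric text and for numbers outside 0-4.
        return "UNKNOWN"


if __name__ == "__main__":
    for raw in ("0", "3", "4", "9", "abc"):
        print(raw, describe_zone(raw))
```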

Author

By the way, the term origin used in the field is intended to mean origin rather than "original."

@lmolkova (Contributor) left a comment

There could be alternative solutions, such as:

  • reusing existing url attributes on the same telemetry item that describes a file
  • defining specific security events which list all applicable metadata as event fields (not attributes)

The current solution seems to be very specific to a certain use case (I download a file and capture its metadata, which will not be accessible later). These attributes will not be available or applicable to any other case.
It won't make sense to reuse those attributes in other conventions.

So please consider alternative solutions and please provide a use-case for these attributes.

@AsuNa-jp changed the title from "add file.origin_referrer_url and file.origin_url attribute" to "add origin_referrer_url, origin_url and zone_identifier to file attribute" Oct 10, 2024
@AsuNa-jp changed the title from "add origin_referrer_url, origin_url and zone_identifier to file attribute" to "add origin_referrer_url, origin_url and zone_identifier to the file attribute" Oct 10, 2024
@AsuNa-jp (Author) commented Oct 10, 2024

Thank you all for your comments. Based on feedback from various sources, I have added file.zone_identifier to this PR.

However, since concerns have also been raised about whether the fields we plan to add are even necessary, we are considering having @trisch-me (and @magermark) lead a more in-depth discussion during the upcoming OTel Semantic Conventions meeting.

@AsuNa-jp (Author)

Hi @trisch-me @lmolkova @joaopgrassi
Based on last week’s discussion at the Semantic Conventions meeting, I have added additional explanations to this PR. If you need further explanations before approving this PR, please don't hesitate to let me know.

@trisch-me (Contributor)

@AsuNa-jp could you please fix conflicts? thanks

@jsuereth (Contributor) commented Nov 5, 2024

My only remaining concern with this PR (and based on @lmolkova's comments) is whether the things you're defining should be event fields vs. attributes.

The two use cases you mention both involve an event. Are these attributes you're defining things we'd want to include in Spans and Metrics?

I think it'd be reasonable to define a file.open Event that has these fields within it, but I'm not positive how you'd possibly have that "turn into a metric" or otherwise interact with spans.

Labels
None yet
Projects
Status: Needs More Approval
6 participants