feat(nodestore): A file system-based node storage backend, with added support for S3 and GCS storage #76250

klboke · 2024-08-15T03:14:36Z

A file system-based node storage backend, with added support for S3 and GCS storage.

My motivation is as follows:

By default, node data is stored in PostgreSQL, where the table can grow rapidly: because of how PostgreSQL cleans up deleted rows, disk space is not reclaimed automatically. For example: https://develop.sentry.dev/self-hosted/troubleshooting/#postgres

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.


PMExtra · 2024-08-16T09:00:45Z

Great job! But I worry a little about performance when reading multiple objects as a batch. (I haven't dug into it; this is just an intuitive guess.)

PMExtra · 2024-08-16T09:17:19Z

I'm not sure how frequently the node storage is accessed or how large the stored files are.

I don't think object storage services (such as S3 or GCS) are designed for frequent reads and writes of very small files.

This may lead to performance issues or higher costs.

klboke · 2024-08-16T09:57:51Z

@PMExtra

> I'm not sure how frequently the node storage is accessed or how large the stored files are.

Based on our recorded data, each node is roughly 15 to 76 KB in size, with the majority around 15 KB.

> This may lead to performance issues or higher costs.

I investigated the places where nodestore is written to and read from and found that there are only two scenarios that trigger these actions:

1. Writing to nodestore occurs in the Sentry worker when processing Kafka event messages.
2. Reading from nodestore happens in sentry-web when someone views detailed information.

Writing is essentially offline stream processing, so slower writes are acceptable. Read QPS (queries per second) is very low, so reads are not a significant issue either. Additionally, OSS (object storage service) costs are definitely the lowest, which is why many projects in recent years (such as Paimon and OpenObserve) support separating storage (S3 and other object stores) from compute.
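To make the two access paths above concrete, here is a minimal sketch of a file-backed node store. It is not the code from this PR: the class name `FileNodeStore`, the method names, the two-character directory sharding, and the assumption that node IDs are filesystem-safe strings are all hypothetical and chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


class FileNodeStore:
    """Hypothetical file-backed node store: one JSON file per node ID."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def _path(self, node_id: str) -> Path:
        # Assumption: node IDs are filesystem-safe (for example hex strings).
        # Shard by the first two characters so that a single directory
        # does not accumulate every node.
        return self.root / node_id[:2] / f"{node_id}.json"

    def set(self, node_id: str, data: dict[str, Any]) -> None:
        # Write path: in the thread this happens in the worker.
        path = self._path(node_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file and rename it, so a reader never
        # sees a half-written node.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(data), encoding="utf-8")
        tmp.replace(path)

    def get(self, node_id: str) -> Optional[dict[str, Any]]:
        # Read path: in the thread this happens when someone views an event.
        try:
            return json.loads(self._path(node_id).read_text(encoding="utf-8"))
        except FileNotFoundError:
            return None

    def delete(self, node_id: str) -> None:
        self._path(node_id).unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as root:
    store = FileNodeStore(root)
    store.set("ab12cd", {"message": "boom"})
    loaded = store.get("ab12cd")
    missing = store.get("ffffff")
```

An object-store variant would presumably keep the same interface and swap the file operations for object put/get calls, but that part is not shown here.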

PMExtra · 2024-08-16T10:10:39Z

@klboke Awesome. I was concerned that reading a large number of tiny files from object storage might cost more than reading them from a table store service. However, according to your analysis, the files are not that small and the access frequency is not that high, so my concern may be unnecessary.

PMExtra · 2024-08-16T10:27:16Z

@klboke

Another point worth noting is that Sentry stores a large amount of highly repetitive ASCII text, and table store services can achieve high compression rates for this, thereby reducing storage costs.

However, object storage services usually do not support transparent compression. While each piece of data can be compressed separately, this incurs additional computational overhead, and the compression ratio may struggle to match that of table store services.

Anyway, having one more option is always good. I’m just pointing out some potential drawbacks for discussion, not to undermine your work. Thank you for your contribution.
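The per-object compression trade-off described above can be measured with a short sketch. The payload below is a made-up stand-in for node data, not a real Sentry event, and `zlib` at level 6 is just one possible choice; the resulting ratio depends entirely on the data.

```python
import json
import zlib

# Hypothetical, highly repetitive payload; real node data will differ.
event = {
    "frames": [
        {"filename": "app/views.py", "function": "handle", "lineno": i}
        for i in range(200)
    ]
}

raw = json.dumps(event).encode("utf-8")
compressed = zlib.compress(raw, level=6)

# Round-trip to confirm that nothing is lost.
restored = json.loads(zlib.decompress(compressed))

ratio = len(compressed) / len(raw)
print(f"{len(raw)} bytes -> {len(compressed)} bytes ({ratio:.0%} of original)")
```

Because every object is compressed on its own, repetition shared between different events cannot be exploited, which is one reason the ratio may fall short of a store that compresses many rows together.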

klboke · 2024-08-16T12:01:21Z

@PMExtra

> However, object storage services usually do not support transparent compression. While each piece of data can be compressed separately, this incurs additional computational overhead, and the compression ratio may struggle to match that of table store services.

Thank you very much for your reminder and suggestions. If we switch to S3, cost won't be an issue at all, even for our 3TB nodestore (which is actually more than most people have). I would prefer to store plain JSON files directly, as that makes it more convenient to inspect and debug issues.
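For a sense of scale, the figures quoted in this thread (a 3TB nodestore, nodes of roughly 15 to 76 KB) give rough bounds on the number of objects. This is back-of-the-envelope arithmetic only: it assumes decimal units and that the 3TB figure is close to the raw payload size, neither of which is confirmed in the thread.

```python
# Figures from the thread; decimal units are an assumption.
TOTAL_BYTES = 3 * 10**12    # 3 TB
SMALL_NODE = 15 * 10**3     # 15 KB
LARGE_NODE = 76 * 10**3     # 76 KB

# Upper bound: every node is at the small end of the range.
max_objects = TOTAL_BYTES // SMALL_NODE
# Lower bound: every node is at the large end of the range.
min_objects = TOTAL_BYTES // LARGE_NODE

print(f"between {min_objects:,} and {max_objects:,} objects")
```

Since most nodes are reported to be around 15 KB, the real count would likely sit closer to the upper bound.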

aldy505 · 2024-08-29T03:33:16Z

src/sentry/nodestore/filesystem/backend.py

+        if not settings.DEBUG and options_store.get("filestore.backend") == "filesystem":
+            raise ValueError("Local fileSystem should only be used in development!")


This would break existing self-hosted users who don't use (or don't have access to) S3-compatible services. This should just log a warning instead of raising a ValueError.

PMExtra · 2024-08-29

I don't think so. The default NodeStore is sentry.nodestore.django.DjangoNodeStorage rather than this one, so it won't break any existing users. If you want to configure this FileSystemNodeStorage in a non-debug environment, you must also configure FileStore.
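The two behaviors under discussion (raise, as in the diff, versus log a warning, as suggested in the review) can be compared side by side in a small sketch. The function `check_filestore_backend` and its `strict` flag are hypothetical and take plain arguments instead of reading Sentry's settings or options store.

```python
import logging

logger = logging.getLogger("nodestore.sketch")


def check_filestore_backend(debug: bool, backend: str, strict: bool = True) -> None:
    """Hypothetical guard for a local filesystem backend outside development.

    strict=True mirrors the diff (raise); strict=False mirrors the
    reviewer's suggestion (warn and continue).
    """
    if debug or backend != "filesystem":
        return
    message = "Local filesystem should only be used in development!"
    if strict:
        raise ValueError(message)
    logger.warning(message)


# Development or a non-local backend: no complaint either way.
check_filestore_backend(debug=True, backend="filesystem")
check_filestore_backend(debug=False, backend="s3")

# Non-debug with the local filesystem: warn instead of failing.
check_filestore_backend(debug=False, backend="filesystem", strict=False)
```

With `strict=True` and the same arguments as the last call, the function raises `ValueError`, which only affects deployments that opt in to this backend.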

getsantry · 2024-09-19T07:00:32Z

This pull request has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you add the label WIP, I will leave it alone unless WIP is removed ... forever!

"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

feat(nodestore): file system-based nodestore backend, with added supp…

e22aea8

…ort for S3 and GCS storage.

github-actions bot added the Scope: Backend label Aug 15, 2024

Merge branch 'master' into kl_node

dc0cc78

This was referenced Aug 16, 2024

Cleaning nodestore_node table getsentry/self-hosted#1808

Open

Sentry Nodestore Storage Issue PMExtra/sentry-tablestore#1

Open
