feat: support for new logs schema #348

Merged
merged 25 commits into from
Aug 23, 2024

@nityanandagohain nityanandagohain commented Jul 10, 2024

New logs schema

fixes SigNoz/signoz#5555

Testing / running

  1. Make sure the macros are enabled for running migrations:
    <macros>
        <shard>01</shard>
        <replica>example01-01-1</replica>
    </macros>
  2. Run the migrations: go run cmd/signozschemamigrator/migrate.go --dsn http://localhost:9000 --replication true

exporter/clickhouselogsexporter/exporter.go (4 outdated, resolved review threads)
exporter/clickhouselogsexporter/logsv2/fingerprint.go (3 outdated, resolved review threads)
@srikanthccv

Can we export to v2 in parallel? Also, we need to update the instrumentation to record the time for each table.

[Screenshot, 2024-08-14 8:35:47 PM]

@srikanthccv

Two more things I wanted to point out.

  1. I noticed that on customer instances the table engine is not replicated for the v2 table only (AFAIK at least two customers have it deployed).

  2. I was playing around and looked at the queries. All of them use GLOBAL IN for resource fingerprints and other LIMIT queries. I want to throw out an idea: if you can devise a fingerprinting and sharding mechanism that distributes data evenly and always sends the same fingerprint to the same shard, you can drop GLOBAL IN, which is less efficient than a local IN. In its current shape I don't think the data distribution will be even, since it is based entirely on resource attributes and one set of resources can send disproportionately more data than the others. So we probably don't have an alternative, but I wanted to bring it up anyway in case you have ideas.

https://clickhouse.com/docs/en/sql-reference/operators/in

When using GLOBAL IN / GLOBAL JOIN, first all the subqueries are run for GLOBAL IN / GLOBAL JOIN, and the results are collected in temporary tables. Then the temporary tables are sent to each remote server, where the queries are run using this temporary data.

This will work correctly and optimally if you are prepared for this case and have spread data across the cluster servers such that the data for a single UserID resides entirely on a single server. In this case, all the necessary data will be available locally on each server. Otherwise, the result will be inaccurate. We refer to this variation of the query as “local IN”.
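
To make the "same fingerprint goes to the same shard" idea concrete, here is a minimal Go sketch. It is not the exporter's fingerprint code: `resourceFingerprint` and `shardFor` are hypothetical names, FNV-1a is an arbitrary choice of hash, and in a real cluster the routing would more likely come from the Distributed table's sharding key than from application code. It only illustrates the property that a local IN relies on: a deterministic fingerprint mapped to a fixed shard.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// resourceFingerprint hashes resource attributes in sorted key order, so the
// same attribute set always yields the same value regardless of map order.
// Hypothetical helper, not the fingerprint function used in this PR.
func resourceFingerprint(attrs map[string]string) uint64 {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // separator between key and value
		h.Write([]byte(attrs[k]))
		h.Write([]byte{0}) // separator between pairs
	}
	return h.Sum64()
}

// shardFor maps a fingerprint to one of numShards shards (assumed > 0).
func shardFor(fp, numShards uint64) uint64 {
	return fp % numShards
}

func main() {
	a := map[string]string{"service.name": "checkout", "k8s.pod.name": "checkout-0"}
	b := map[string]string{"k8s.pod.name": "checkout-0", "service.name": "checkout"}

	// Same attributes in a different insertion order land on the same shard.
	fmt.Println(shardFor(resourceFingerprint(a), 3) == shardFor(resourceFingerprint(b), 3))
}
```

The sketch does nothing about the skew concern raised above: a single resource that emits far more logs than the others still sends all of its rows to one shard.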

@nityanandagohain
nityanandagohain commented Aug 21, 2024

Major changes made

  • Renamed logs_v2_resource_bucket to logs_v2_resource
  • Changed span_attributes from array to string
  • Write to all three tables (logs, logs_v2, logs_v2_resource) in parallel
  • Separate duration metrics for each of the three tables above

raj-k-singh previously approved these changes Aug 21, 2024
@raj-k-singh raj-k-singh left a comment

LGTM 👍

srikanthccv previously approved these changes Aug 22, 2024
@srikanthccv srikanthccv left a comment

Non-blocking point

exporter/clickhouselogsexporter/exporter.go (outdated, resolved review thread)
@nityanandagohain nityanandagohain merged commit 6d6ba57 into main Aug 23, 2024
5 checks passed
@nityanandagohain nityanandagohain deleted the feat/logs_new_schema branch August 23, 2024 06:16
Development

Successfully merging this pull request may close these issues.

[EPIC]: Logs new schema
4 participants