Commit

refactor: per review
discord9 committed Sep 2, 2024
1 parent bf48b6b commit 5f728cb
Showing 4 changed files with 11 additions and 134 deletions.
7 changes: 4 additions & 3 deletions docs/user-guide/continuous-aggregation/overview.md
@@ -64,7 +64,7 @@ SELECT
min(size) as min_size,
max(size) as max_size,
avg(size) as avg_size,
sum(case when `size` > 550::double then 1::double else 0::double end) as high_size_count,
sum(case when `size` > 550 then 1 else 0 end) as high_size_count,
date_bin(INTERVAL '1 minutes', access_time) as time_window,
FROM ngx_access_log
GROUP BY
@@ -133,14 +133,15 @@ Here is the explanation of the columns in the `ngx_statistics` table:
- `time_window`: The time window of the aggregation.
- `update_at`: The time when the aggregation is updated.

NOTE: if you don't manually create sink table, the Flow engine will automatically create it for you based on the query(i.e. using columns in `GROUP BY` as primary tags and time index), however, sometimes you may want to create the sink table manually to have more control over the schema.
<!-- TODO(discord9): improve auto create table then add back this feature explain, i.e. for now everything in group by is put to primary key, and time index is always a placeholder -->
<!-- if you don't manually create sink table, the Flow engine will automatically create it for you based on the query(i.e. using columns in `GROUP BY` as primary tags and time index), however, sometimes you may want to create the sink table manually to have more control over the schema. -->

## Next Steps

Congratulations, you now have a preliminary understanding of the continuous aggregation feature.
Please refer to the following sections to learn more:

- [Usecase Example](./usecase-example.md) provides more examples of how to use continuous aggregation in real-time analytics, monitoring, and dashboards.
- [Manage Flows](./manage-flow.md) describes how to create, update, and delete a flow. Each continuous aggregation query is a flow.
- [Write a Query](./query.md) describes how to write a continuous aggregation query.
- [Define Time Window](./define-time-window.md) describes how to define the time window for the continuous aggregation. Time window is an important attribute of your continuous aggregation query. It defines the time interval for the aggregation.
- [Expression](./expression.md) is a reference of available expressions in the continuous aggregation query.
105 changes: 0 additions & 105 deletions docs/user-guide/continuous-aggregation/query.md

This file was deleted.

31 changes: 6 additions & 25 deletions docs/user-guide/continuous-aggregation/usecase-example.md
@@ -1,4 +1,4 @@
# Usecase Example
# Usecase Examples
Following are three major usecase examples for continuous aggregation:

1. **Real-time Analytics**: A real-time analytics platform that continuously aggregates data from a stream of events, delivering immediate insights while optionally downsampling the data to a lower resolution. For instance, this system can compile data from a high-frequency stream of log events (e.g., occurring every millisecond) to provide up-to-the-minute insights such as the number of requests per minute, average response times, and error rates per minute.
@@ -9,30 +9,11 @@ Following are three major usecase examples for continuous aggregation:

In all these usecases, the continuous aggregation system continuously aggregates data from a stream of events and provides real-time insights and alerts based on the aggregated data. The system can also downsample the data to a lower resolution to reduce the amount of data stored and processed. This allows the system to provide real-time insights and alerts while keeping the data storage and processing costs low.

## Real-time Analytics Example
## Real-time analytics example

Consider a usecase where you have a stream of log events from a web server that you want to analyze in real-time. The log events contain information such as the status of the request, the size of the response, the client IP address, and the timestamp of the request. You want to continuously aggregate this data to provide real-time analytics on the number of requests per minute, the min/max/average packet size, and the error rate per minute. Then the query for continuous aggregation would be:
See [Overview](overview.md) for an example of real-time analytics, which calculates the total number of logs, the minimum size, the maximum size, the average size, and the number of packets with a size greater than 550 for each status code in a 1-minute fixed window over the access logs.

```sql
CREATE FLOW ngx_aggregation
SINK TO ngx_statistics
AS
SELECT
status,
count(client) AS total_logs,
sum(case when status >= 400 then 1 end) as error_logs,
min(size) as min_size,
max(size) as max_size,
avg(size) as avg_size
FROM ngx_access_log
GROUP BY
status,
date_bin(INTERVAL '1 minutes', access_time, '2024-01-01 00:00:00'::Timestamp);
```

The above query continuously aggregates the data from the `ngx_access_log` table into the `ngx_statistics` table. It calculates the total number of logs, the number of error logs, the min/max/average packet size, and the error rate per minute. The `date_bin` function is used to group the data into one-minute intervals. The `ngx_statistics` table will be continuously updated with the aggregated data, providing real-time insights into the web server's performance.

## Real-time Monitoring Example
## Real-time monitoring example

Consider a usecase where you have a stream of sensor events from a network of temperature sensors that you want to monitor in real-time. The sensor events contain information such as the sensor ID, the temperature reading, the timestamp of the reading, and the location of the sensor. You want to continuously aggregate this data to provide real-time alerts when the temperature exceeds a certain threshold. Then the query for continuous aggregation would be:

@@ -61,7 +42,7 @@ HAVING max_temp > 100;
The above query continuously aggregates the data from the `temp_sensor_data` table into the `temp_alerts` table. It calculates the maximum temperature reading for each sensor and location and keeps only the rows where the maximum temperature exceeds 100 degrees. The `temp_alerts` table will be continuously updated with the aggregated data, providing real-time alerts (each alert is a new row in the `temp_alerts` table) when the temperature exceeds the threshold.
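
To make the grouping and the `HAVING max_temp > 100` step concrete, here is a minimal Python sketch of the same logic. It is an illustration only, not GreptimeDB code: the tuple layout `(sensor_id, loc, temperature)`, the sample readings, and the `temp_alerts` helper are all hypothetical, and any time window is ignored for brevity.

```python
from collections import defaultdict

# Hypothetical sample readings: (sensor_id, loc, temperature).
readings = [
    (1, "room1", 98.5),
    (1, "room1", 101.2),
    (2, "room2", 99.0),
    (3, "room3", 120.0),
]


def temp_alerts(rows, threshold=100.0):
    """Return {(sensor_id, loc): max_temp} for groups whose maximum exceeds threshold."""
    max_temp = defaultdict(lambda: float("-inf"))
    for sensor_id, loc, temperature in rows:
        key = (sensor_id, loc)  # plays the role of GROUP BY sensor and location
        max_temp[key] = max(max_temp[key], temperature)
    # Plays the role of HAVING max_temp > threshold: keep only the groups above it.
    return {key: value for key, value in max_temp.items() if value > threshold}


alerts = temp_alerts(readings)
```

In this sketch only the groups for sensors 1 and 3 survive the filter, which mirrors how only over-threshold groups would produce rows in the sink table.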


## Real-time Dashboard
## Real-time dashboard

Consider a usecase in which you need a bar graph that shows the distribution of packet sizes for each status code to monitor the health of the system. The query for continuous aggregation would be:

@@ -83,6 +64,6 @@ GROUP BY

The above query aggregates the data from the `ngx_access_log` table into the `ngx_distribution` table. It calculates the total number of logs for each status code and packet size bucket in each time window. The bucket size is 10 here because `trunc`'s second argument is -1. The `date_bin` function is used to group the data into one-minute intervals. The `ngx_distribution` table will be continuously updated with the aggregated data, providing real-time insights into the distribution of packet sizes for each status code.
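
The bucketing described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the Flow engine's implementation: `size_bucket` models `trunc(size, -1)` as truncation toward zero to a multiple of 10, `minute_window` models a one-minute `date_bin`, and the sample log rows `(status, size, access_time)` are made up.

```python
import math
from collections import Counter
from datetime import datetime


def size_bucket(size):
    """Truncate toward zero to a multiple of 10, modelling trunc(size, -1)."""
    return math.trunc(size / 10) * 10


def minute_window(ts):
    """Return the start of the one-minute window that contains ts."""
    return ts.replace(second=0, microsecond=0)


# Hypothetical access-log rows: (status, size, access_time).
logs = [
    (200, 1003.0, datetime(2024, 9, 2, 10, 0, 5)),
    (200, 1008.0, datetime(2024, 9, 2, 10, 0, 40)),
    (200, 1012.0, datetime(2024, 9, 2, 10, 0, 59)),
    (404, 557.0, datetime(2024, 9, 2, 10, 1, 1)),
]

# Count logs per (status, size bucket, time window), like the GROUP BY in the query.
distribution = Counter(
    (status, size_bucket(size), minute_window(ts)) for status, size, ts in logs
)
```

Each key of `distribution` corresponds to one bar in the graph: sizes 1003 and 1008 fall into the same bucket, while 1012 starts a new one.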

# Conclusion
## Conclusion

Continuous aggregation is a powerful tool for real-time analytics, monitoring, and dashboarding. It allows you to continuously aggregate data from a stream of events and provide real-time insights and alerts based on the aggregated data. By downsampling the data to a lower resolution, you can reduce the amount of data stored and processed, making it easier to provide real-time insights and alerts while keeping the data storage and processing costs low. Continuous aggregation is a key component of any real-time data processing system and can be used in a wide range of usecases to provide real-time insights and alerts based on streaming data.
2 changes: 1 addition & 1 deletion sidebars.ts
@@ -118,8 +118,8 @@ const sidebars: SidebarsConfig = {
label: 'Continuous Aggregation',
items: [
'user-guide/continuous-aggregation/overview',
'user-guide/continuous-aggregation/usecase-example',
'user-guide/continuous-aggregation/manage-flow',
'user-guide/continuous-aggregation/query',
'user-guide/continuous-aggregation/define-time-window',
'user-guide/continuous-aggregation/expression',
],