Restructures & Improvements for Milk #8

Draft
wants to merge 47 commits into
base: main
b7dd0c4
feat(milk): properly name the initialize methods and do not require s…
ShindouMihou Oct 19, 2023
486de90
feat(milk): more consistent file names and simpler interfaces
ShindouMihou Oct 19, 2023
0cf9691
feat(milk): add trust proxy support
ShindouMihou Oct 19, 2023
52fa3dc
feat(milk): improve log messages
ShindouMihou Oct 19, 2023
cf7bdf0
feat(milk): separate types from the kafka client
ShindouMihou Oct 19, 2023
d8df498
feat(milk): add more logging to raid management kafka client and simp…
ShindouMihou Oct 19, 2023
3980fc7
feat(milk): inline cache creation
ShindouMihou Oct 19, 2023
7e68899
feat(milk): add support for JSON content type.
ShindouMihou Oct 19, 2023
7bb7a41
feat(milk): minimize date util and fix potential index out of bounds
ShindouMihou Oct 19, 2023
d960687
feat(milk): minimize utils such as string and numbers
ShindouMihou Oct 19, 2023
455099f
feat(milk): fix formatting for user ids
ShindouMihou Oct 19, 2023
6aab24b
feat(milk): removed non-existent route in logging
ShindouMihou Oct 19, 2023
c317c2a
feat(milk): disable request logging
ShindouMihou Oct 19, 2023
02cce1a
feat(milk): support both `.json` path or `Accept` header
ShindouMihou Oct 29, 2023
9c12178
feat(milk): minor readability changes
ShindouMihou Oct 29, 2023
7fb8941
feat(milk): use new `/raid` route over `/antispam`
ShindouMihou Oct 30, 2023
7d649d9
feat(milk): hide `internal_raid_id` from json output
ShindouMihou Oct 30, 2023
dde88d8
feat(milk): consistency with camelCase for non-auto-generated fields
ShindouMihou Oct 30, 2023
31b45d4
feat(milk): maintain camel-case consistency with `.json` output
ShindouMihou Oct 30, 2023
14d9b7d
feat(milk): use string instead of `bigint` for public json
ShindouMihou Oct 30, 2023
0a883a6
feat(milk): add kafka and openapi documentation
ShindouMihou Oct 30, 2023
99821b1
feat(milk): use `async` for `retriable`
ShindouMihou Oct 31, 2023
de3e460
feat(milk): simplify the code for `run` (formerly `retry`)
ShindouMihou Oct 31, 2023
6294bfe
feat(milk): separate raid conclusion from `batch-insert` in `RaidMana…
ShindouMihou Oct 31, 2023
0f89341
feat(milk): document `conclude-raid` key
ShindouMihou Oct 31, 2023
6e01207
feat(milk): changes to general error handling
ShindouMihou Oct 31, 2023
8708363
feat(milk): fix somehow `infinity` cache
ShindouMihou Oct 31, 2023
6514b34
feat(milk): remove `/raid` empty route and some minor simplifications
ShindouMihou Oct 31, 2023
8cabb9e
feat(milk): move constants to dedicated directory
ShindouMihou Oct 31, 2023
c9772bf
feat(milk): fix cache misses on json route
ShindouMihou Oct 31, 2023
e5db2dc
feat(milk): add cache headers
ShindouMihou Oct 31, 2023
9b03a2d
feat(milk): document cache headers
ShindouMihou Oct 31, 2023
8423333
feat(milk): separate `/raid/:id` and `/raid/:id.json` into their own …
ShindouMihou Nov 1, 2023
53d8201
feat(milk): separate database queries into their own files
ShindouMihou Nov 1, 2023
ab9aac9
feat(milk): fix handling of dates in Kafka client
ShindouMihou Nov 1, 2023
9de2f1b
feat(milk): fix prisma schema
ShindouMihou Nov 1, 2023
af3473e
feat(milk): update prisma to latest version
ShindouMihou Nov 1, 2023
1d4d2ae
feat(milk): use relationships, rename fields and update caching logic
ShindouMihou Nov 1, 2023
085b6cc
feat(milk): fix improper renaming of fields and fix missing `public_id`
ShindouMihou Nov 1, 2023
dea97f2
feat(milk): fix formatting of text route and send `Content-Type` headers
ShindouMihou Nov 1, 2023
729fccb
feat(milk): properly format numbers in text format
ShindouMihou Nov 1, 2023
b000f0b
feat(milk): resource-efficiency related changes, commit details for m…
ShindouMihou Nov 2, 2023
b76ae78
feat(milk): different wording for beyond 2,000 accounts warning
ShindouMihou Nov 19, 2023
7bf581a
feat(milk): init kafka clients before starting
ShindouMihou Nov 20, 2023
056559e
feat(milk): use epoch millis over date string when passing from kafka
ShindouMihou Nov 25, 2023
a879814
feat(milk): fix foreign key not available issue
ShindouMihou Dec 1, 2023
1ea5800
feat(milk): add `create_raid` key and do not query for raid during `b…
ShindouMihou Dec 1, 2023
3 changes: 2 additions & 1 deletion milk/.env.example
@@ -35,4 +35,5 @@ KAFKA_HOST=
#
# IMPORTANT: To configure the server port for Docker Swarm, PLEASE LOOK INTO THE DOCKER COMPOSE FILE!
# ----------------------
# SERVER_PORT=7732
# SERVER_PORT=7732
# TRUST_PROXY=true
155 changes: 155 additions & 0 deletions milk/docs/[beemo-devs]/kafka.md
@@ -0,0 +1,155 @@
# Developer Documentation for the Kafka Client

This documentation is primarily for developers of Beemo. The Kafka client is not accessible to
third parties, so it is unlikely to be of use to them.

### Key Points
- [`Client Specifications`](#client-specifications) describes the different parts of the client.
    - [`overview`](#overview) summarizes some key points of the client.
- [`keys`](#keys)
    - [`create-raid`](#create-raid) is used to create a raid.
    - [`batch-insert-raid-users`](#batch-insert-raid-users) is used to insert one or more detected bots; creates the raid if it doesn't exist.
    - [`conclude-raid`](#conclude-raid) is used to conclude an existing raid; if no date is provided, the current time is used.
- [`schemas`](#schemas)
    - [`RaidManagementData`](#raid-management-data) is the primary type transported between clients.
    - [`RaidManagementRequest`](#raid-management-request) is used by a requesting client to add more raid users,
      start a raid or conclude a raid. Primarily created by clients such as Tea.
    - [`RaidManagementResponse`](#raid-management-response) is used by a responding client after processing a request.
      This is primarily used by clients such as Milk.
    - [`RaidManagementUser`](#raid-management-user) is used to hold information about a detected bot.

## Client Specifications

This section describes the specifications of the Kafka client to explain how
the Kafka client of Milk processes requests.

### Overview
- Topic: `raid-management`
- Keys:
    - `create-raid`
    - `batch-insert-raid-users`
    - `conclude-raid`
- Transport Type:
    - [`RaidManagementData`](#raid-management-data)

## Keys

### Batch Insert Raid Users

```yaml
key: batch-insert-raid-users
```

This is a specific key, or endpoint, in the Kafka client where clients can insert
detected bots, start a raid or conclude a raid. It is expected to create a new raid
when the provided `raidId` does not already exist. In addition, if a `concludedAt` property is provided
and the raid has not been concluded, it will conclude the raid. If the raid has been
concluded before but a newer date is provided, it will also conclude the raid.

This endpoint expects to receive a [`RaidManagementData`](#raid-management-data) with the `request` property
following the [`RaidManagementRequest`](#raid-management-request) schema.

After processing the request, this endpoint should respond with a similar [`RaidManagementData`](#raid-management-data) but
with the `response` property following the [`RaidManagementResponse`](#raid-management-response) schema.
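
As a rough illustration of the payload described above, the sketch below wraps detected bots into a
`RaidManagementData` envelope. The `buildBatchInsert` helper is hypothetical and not part of Milk or Tea, and the
non-empty check on `users` is an assumption inferred from the `conclude-raid` section, not a confirmed rule.

```typescript
// Types mirror the schemas documented in this file.
interface RaidManagementUser {
  idString: string;
  name: string;
  avatarHash: string | null;
  createdAt: string;
  joinedAt: string;
}

interface RaidManagementRequest {
  raidId: string;
  guildIdString: string;
  users: RaidManagementUser[];
  concludedAt: string | null;
}

interface RaidManagementData {
  request: RaidManagementRequest | null;
  response: { publicId: string | null; acknowledged: boolean } | null;
}

// Hypothetical helper: builds the envelope a requesting client would send
// under the `batch-insert-raid-users` key.
function buildBatchInsert(
  raidId: string,
  guildIdString: string,
  users: RaidManagementUser[],
): RaidManagementData {
  // Assumption: this key expects at least one user.
  if (users.length === 0) {
    throw new Error("batch-insert-raid-users expects at least one user");
  }
  return { request: { raidId, guildIdString, users, concludedAt: null }, response: null };
}
```

The `response` side is left as `null` because only the responding client is expected to fill it in.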

### Conclude Raid

```yaml
key: conclude-raid
```

This is a specific key, or endpoint, in the Kafka client where clients can declare an existing raid as concluded.
The `concludedAt` property is optional; the current time is used when it is not provided.
The `concludedAt` of an existing raid cannot be modified: if a raid is already concluded, the request is skipped.

This endpoint expects to receive a [`RaidManagementData`](#raid-management-data) with the `request` property
following the [`RaidManagementRequest`](#raid-management-request) schema. Unlike [`batch-insert-raid-users`](#batch-insert-raid-users),
this endpoint does not require the `users` property to be non-empty, and `concludedAt` can be a zero value or even
null, as long as the `raidId` and `guildIdString` are not null.

After processing the request, this endpoint should respond with a similar [`RaidManagementData`](#raid-management-data) but
with the `response` property following the [`RaidManagementResponse`](#raid-management-response) schema.
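
The rules above can be sketched as a small decision function. This is only an illustration, not Milk's actual code:
`StoredRaid` and `resolveConclusion` are hypothetical names, and treating an unparseable or non-positive timestamp
as a "zero value" is an assumption.

```typescript
// Hypothetical shape of a raid as stored by the responding client.
interface StoredRaid {
  concludedAt: Date | null;
}

// Returns the date to store as `concludedAt`, or null when the request
// should be skipped because the raid is already concluded.
function resolveConclusion(
  raid: StoredRaid,
  requested: string | null,
  now: Date = new Date(),
): Date | null {
  if (raid.concludedAt !== null) {
    return null; // already concluded: the existing date is not modified
  }
  if (requested === null) {
    return now; // no date provided: fall back to the current time
  }
  const millis = Date.parse(requested);
  // Assumption: NaN or a non-positive timestamp counts as a zero value.
  return millis > 0 ? new Date(millis) : now;
}
```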

### Create Raid

```yaml
key: create-raid
```

This is a specific key, or endpoint, in the Kafka client where clients can create a new raid in the database. This should be
done at the start, before any users are added, and should be awaited; otherwise it will lead to a foreign key issue.

This endpoint expects to receive a [`RaidManagementData`](#raid-management-data) with the `request` property
following the [`RaidManagementRequest`](#raid-management-request) schema. Unlike [`batch-insert-raid-users`](#batch-insert-raid-users),
this endpoint does not require the `users` property to be non-empty, as long as the `raidId` and `guildIdString` are not null.

After processing the request, this endpoint should respond with a similar [`RaidManagementData`](#raid-management-data) but
with the `response` property following the [`RaidManagementResponse`](#raid-management-response) schema.
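
To illustrate the ordering requirement, the sketch below awaits `create-raid` before sending any users. The `Send`
function type and `reportRaid` are hypothetical stand-ins for whatever a requesting client uses to publish a message
and wait for the response; they are not the API of any particular Kafka library.

```typescript
// Hypothetical transport: publishes a payload under a key and resolves once
// the responding client has answered.
type Send = (key: string, payload: unknown) => Promise<void>;

async function reportRaid(
  send: Send,
  raidId: string,
  guildIdString: string,
  users: unknown[],
): Promise<void> {
  const base = { raidId, guildIdString, concludedAt: null };
  // Await creation first so the raid exists before any user references it.
  await send("create-raid", { request: { ...base, users: [] }, response: null });
  await send("batch-insert-raid-users", { request: { ...base, users }, response: null });
}
```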

## Schemas

### Raid Management Data

```json
{
  "request": "nullable(RaidManagementRequest)",
  "response": "nullable(RaidManagementResponse)"
}
```

Used by the clients to transport either a request or a response without the need to perform additional identification.

- `request` is a nullable property containing the request details, used by the requesting client. It should
  always be present in a request; otherwise there is a bug in that client.
- `response` is a nullable property containing the response details, used by the responding client. It should
  always be present in a response; otherwise there is a bug in that client.
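
A receiving client could enforce these guarantees with small helpers like the ones sketched below. The names
`requireRequest` and `requireResponse` are hypothetical, and throwing an error is just one possible way to surface
the bug.

```typescript
// Minimal envelope shape; the payload types are left generic on purpose.
interface Envelope<Req, Res> {
  request: Req | null;
  response: Res | null;
}

// Returns the request, or throws if the sending client omitted it.
function requireRequest<Req, Res>(data: Envelope<Req, Res>): Req {
  if (data.request === null || data.request === undefined) {
    throw new Error("RaidManagementData is missing `request`");
  }
  return data.request;
}

// Returns the response, or throws if the responding client omitted it.
function requireResponse<Req, Res>(data: Envelope<Req, Res>): Res {
  if (data.response === null || data.response === undefined) {
    throw new Error("RaidManagementData is missing `response`");
  }
  return data.response;
}
```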


### Raid Management Request
```json
{
  "raidId": "string",
  "guildIdString": "string",
  "users": "array(RaidManagementUser)",
  "concludedAt": "nullable(date as string)"
}
```

Used by a requesting client to start a raid, insert bots detected or conclude a raid.

- `raidId` refers to the internal raid id of the raid. Clients shouldn't use the external raid id, as that is created
  and used only by Milk itself.
- `guildIdString` refers to the id of the guild that this raid belongs to. It must be of `string` type because
  JavaScript's `number` type cannot represent every `int64` (`long`) value exactly.
- `users` refers to the bots detected in the raid. This can be an empty array when simply concluding a raid.
- `concludedAt` refers to the date when the raid should be declared as concluded.
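
The reason ids travel as strings can be checked directly. The sketch below uses a hypothetical helper,
`fitsInNumber`, to test whether an id would survive a round trip through a JavaScript `number`; the guild id is the
sample one used elsewhere in these docs.

```typescript
// Returns true only when the id can be stored in a `number` without losing digits.
function fitsInNumber(id: string): boolean {
  const asNumber = Number(id);
  return Number.isSafeInteger(asNumber) && String(asNumber) === id;
}

const guildIdString = "697474023914733575";

// Snowflake-sized ids exceed Number.MAX_SAFE_INTEGER (2^53 - 1), so this is false.
const safe = fitsInNumber(guildIdString);
console.log(`${guildIdString} fits in a number: ${safe}`);
```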

### Raid Management Response
```json
{
  "publicId": "nullable(string)",
  "acknowledged": true
}
```

Used by a responding client to notify that the request was processed and that a publicly accessible id is now
available to be shared in the log channels.

- `publicId` refers to the publicly accessible id that can be used in `/raid/:id`

### Raid Management User
```json
{
  "idString": "string",
  "name": "string",
  "avatarHash": "nullable(string)",
  "createdAt": "date as string",
  "joinedAt": "date as string"
}
```

Contains information about a bot that was detected in a raid.

- `idString` refers to the id of the bot's account.
- `name` refers to the name of the bot during detection.
- `avatarHash` refers to the hash of the bot's avatar during detection.
- `createdAt` refers to the creation time of the bot.
- `joinedAt` refers to when the bot joined the server.
201 changes: 201 additions & 0 deletions milk/docs/json_api.md
@@ -0,0 +1,201 @@
# Documentation for Third-Party Developers

This documentation primarily discusses the JSON API (Application Programming Interface) available to everyone.

## Limitations of Text API

```text
logs.beemo.gg/raid/:id
```

Recent testing showed that raid logs reaching over 500,000 users took **tens of megabytes**
to load, which is not ideal for general users or for our systems. After careful consideration, we have
decided to limit the Text API (`/raid/:id`) to display at most 2,000 users, which is about as many as we expect
the public to skim without using a tool to aggregate the data.

For people who aggregate the data using specialized tools, a paginated JSON API (`/raid/:id.json`) is
available. You can find details about it in the following:
1. [`OpenAPI Schema`](openapi.yaml)
2. [`JSON API`](json_api.md)

## JSON API

```text
logs.beemo.gg/raid/:id.json
```

This section describes the specifications of the JSON API and explains how you can use it
to collect data about your raid.

> **Warning**
>
> Please note that this documentation is written for people who have some understanding of general
> data types and JSON as this is intended for people who are usually developing their own in-house tools.

### Specifications

Our JSON API differs from the Text API: the data is split into pages of 200 users each,
and it contains more detailed information about each user, such as their avatar hash and when the account was created.
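
As a rough guide to how many requests a full export takes, the page count can be estimated from the `size` of the
raid. This is a sketch: `PAGE_SIZE` is taken from the text above, `estimatePages` is a hypothetical helper, and the
result is only an estimate because paging is driven by the `next` cursor, not by a page number.

```typescript
// Users per page, as stated in this section.
const PAGE_SIZE = 200;

// Estimates how many pages a raid of the given size spans.
function estimatePages(raidSize: number): number {
  if (!Number.isInteger(raidSize) || raidSize < 0) {
    throw new Error("raid size must be a non-negative integer");
  }
  return Math.ceil(raidSize / PAGE_SIZE);
}

// A raid with 5000 users spans 5000 / 200 = 25 pages.
console.log(estimatePages(5000));
```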

### Date Format

Dates are formatted according to the [`ISO 8601`](https://en.wikipedia.org/wiki/ISO_8601) specification[^1], as follows:
```text
YYYY-MM-DDTHH:mm:ss.sssZ
```
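
For tools written in JavaScript or TypeScript, a date in this format can be checked and parsed as sketched below.
The `parseApiDate` helper and its regular expression are illustrative assumptions and are not part of the API.

```typescript
// Matches the exact shape YYYY-MM-DDTHH:mm:ss.sssZ.
const API_DATE_PATTERN = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$/;

// Parses a date string from the API, rejecting anything in another shape.
function parseApiDate(value: string): Date {
  if (!API_DATE_PATTERN.test(value)) {
    throw new Error(`not in the expected date format: ${value}`);
  }
  const parsed = new Date(value);
  if (Number.isNaN(parsed.getTime())) {
    throw new Error(`not a valid date: ${value}`);
  }
  return parsed;
}

// `Date.prototype.toISOString` produces the same format, so values round-trip.
const roundTripped = parseApiDate("2023-11-02T01:02:36.978Z").toISOString();
console.log(roundTripped);
```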

[^1]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/toISOString

### Response Schema
```json
{
  "raid": {
    "startedAt": "date",
    "concludedAt": "nullable(date)",
    "guild": "int64 as string",
    "size": "int32"
  },
  "users": {
    "next": "nullable(date)",
    "size": "int16",
    "data": "array of User"
  }
}
```

### User Schema
```json
{
  "id": "int64 as string",
  "name": "string",
  "joinedAt": "date",
  "createdAt": "date",
  "avatarHash": "nullable(string)"
}
```
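
For in-house tools written in TypeScript, the two schemas above could be mirrored as types, as in the sketch below.
The type names and the `parseRaidResponse` helper are hypothetical, and the runtime check is deliberately shallow.

```typescript
interface User {
  id: string; // int64 as string
  name: string;
  joinedAt: string;
  createdAt: string;
  avatarHash: string | null;
}

interface RaidResponse {
  raid: { startedAt: string; concludedAt: string | null; guild: string; size: number };
  users: { next: string | null; size: number; data: User[] };
}

// Parses a response body and verifies only the top-level structure.
function parseRaidResponse(body: string): RaidResponse {
  const candidate = JSON.parse(body) as Partial<RaidResponse> | null;
  if (!candidate || !candidate.raid || !candidate.users || !Array.isArray(candidate.users.data)) {
    throw new Error("response does not match the documented schema");
  }
  return candidate as RaidResponse;
}
```

Ids and the guild are kept as strings on purpose, matching the `int64 as string` fields in the schema.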

### Query Parameters
- `cursor`
    - **description**: an ISO-8601 date that indicates which set of data to get next.
    - **type**: ISO-8601 date or `null`.
    - **example**: `?cursor=2023-11-02T01:02:36.978Z`
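
When building the link in code, the cursor can be appended as sketched below. The `raidJsonUrl` helper is
hypothetical, and percent-encoding the cursor is a precaution taken here, not a documented requirement of the API.

```typescript
// Base address as used in the examples of this document.
const BASE_URL = "http://logs.beemo.gg/raid";

// Builds the JSON API link for a raid, optionally with a `cursor`.
function raidJsonUrl(id: string, cursor: string | null = null): string {
  const url = `${BASE_URL}/${encodeURIComponent(id)}.json`;
  if (cursor === null) {
    return url;
  }
  // encodeURIComponent turns ":" into "%3A"; the server decodes it back.
  return `${url}?cursor=${encodeURIComponent(cursor)}`;
}

console.log(raidJsonUrl("Ht2Erx76nj13", "2023-11-02T01:02:36.978Z"));
```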

### Querying data

To query the initial data, send a request to `logs.beemo.gg/raid/:id.json`, where `:id` is the
raid identifier provided by Beemo. It looks like a random string, such as `Ht2Erx76nj13`. In this example, we'll
use `Ht2Erx76nj13` as our `:id`.

> **Note**
>
> `Ht2Erx76nj13` is a sample raid from our tests. It may or may not exist on the actual `logs.beemo.gg` domain as
> of writing, but it could exist at some point. We recommend using your own raid identifier to follow along with
> our example.

Depending on your tooling, this may be different, but in this example, we'll be using `curl` in our command-line
to demonstrate.
```shell
curl http://logs.beemo.gg/raid/Ht2Erx76nj13.json
```

Running the above command gives us a very long JSON response, which follows our [schema](#response-schema) containing
information about the raid and the users.
```json
{
  "raid": {
    "startedAt": "2023-11-02T01:25:36.970Z",
    "concludedAt": null,
    "guild": "697474023914733575",
    "size": 5000
  },
  "users": {
    "next": "2023-11-02T01:02:36.978Z",
    "size": 200,
    "data": [
      {
        "id": "972441010812461952",
        "name": "9hsr6JA5jk7vVA35",
        "joinedAt": "2023-11-02T01:01:36.975Z",
        "createdAt": "2023-11-14T01:25:36.975Z",
        "avatarHash": null
      },
      {
        "id": "495325603979310784",
        "name": "TW871dd7YTb7sZv9",
        "joinedAt": "2023-11-02T01:01:36.975Z",
        "createdAt": "2023-11-03T01:25:36.975Z",
        "avatarHash": null
      },
      ...,
      {
        "id": "732275137613676288",
        "name": "EbWp0ORBYH33BOX5",
        "joinedAt": "2023-11-02T01:02:36.978Z",
        "createdAt": "2023-11-21T01:25:36.978Z",
        "avatarHash": null
      }
    ]
  }
}
```

From the above output, we can see that the `next` property is filled, which means we can use it to query
the next set of data, if there is any. The `next` property is also the same as the `joinedAt` time of the
**last user in the array**, in this case the user with the id `732275137613676288` and the name `EbWp0ORBYH33BOX5`.

To query the next set of data, add `?cursor=<next property's value>` to the link. In our
example, `<next property's value>` is `2023-11-02T01:02:36.978Z`, so we add `?cursor=2023-11-02T01:02:36.978Z` to our link.

```shell
curl "http://logs.beemo.gg/raid/Ht2Erx76nj13.json?cursor=2023-11-02T01:02:36.978Z"
```

Running that command gives output similar to the above, but with a different set
of users and a different `next` value, which again matches the `joinedAt` of the **last user in the array**.

```json
{
  "raid": {
    "startedAt": "2023-11-02T01:25:36.970Z",
    "concludedAt": null,
    "guild": "697474023914733575",
    "size": 5000
  },
  "users": {
    "next": "2023-11-02T01:03:36.982Z",
    "size": 200,
    "data": [
      {
        "id": "819796273096448256",
        "name": "676kq1AUwXh8Ygwf",
        "joinedAt": "2023-11-02T01:02:36.979Z",
        "createdAt": "2023-11-09T01:25:36.979Z",
        "avatarHash": null
      },
      {
        "id": "679522196993485440",
        "name": "93j0HK2MvEud9n1Y",
        "joinedAt": "2023-11-02T01:02:36.979Z",
        "createdAt": "2023-11-26T01:25:36.979Z",
        "avatarHash": null
      },
      ...,
      {
        "id": "151281220654798656",
        "name": "W04Qy8a4p1bZxSMu",
        "joinedAt": "2023-11-02T01:03:36.982Z",
        "createdAt": "2023-11-25T01:25:36.982Z",
        "avatarHash": null
      }
    ]
  }
}
```

As before, you can request the next set of data by replacing the value of `cursor` in your
link with the value of `next`, repeating until you have all the data that you want.
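
The whole loop can be sketched as follows. This is an illustration under assumptions, not an official client:
`collectAllUsers` and the injected `fetchPage` function are hypothetical, and stopping when `next` is `null`, when
`next` repeats, or when a page is empty is an assumption about how the last page looks.

```typescript
// Only the fields needed for paging are typed here.
interface RaidPage {
  users: { next: string | null; size: number; data: { id: string }[] };
}

// Fetches one page; `cursor` is null for the first request.
type FetchPage = (cursor: string | null) => Promise<RaidPage>;

async function collectAllUsers(fetchPage: FetchPage, maxPages = 1000): Promise<{ id: string }[]> {
  const all: { id: string }[] = [];
  let cursor: string | null = null;
  for (let page = 0; page < maxPages; page++) {
    const result = await fetchPage(cursor);
    all.push(...result.users.data);
    const next = result.users.next;
    // Assumed end conditions: no cursor, a repeated cursor, or an empty page.
    if (next === null || next === cursor || result.users.data.length === 0) {
      break;
    }
    cursor = next;
  }
  return all;
}
```

In a real tool, `fetchPage` would request `/raid/:id.json` with the given `cursor` and parse the JSON body. The
`maxPages` cap is there only to keep a misbehaving cursor from looping forever.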

## Have more questions?

If you have more questions, feel free to come over to our [Discord Server](https://beemo.gg/discord) and ask there;
we'll happily answer to the best of our ability!