v5 plans #597

Open
5 of 14 tasks
zoriya opened this issue Aug 14, 2024 · 12 comments

@zoriya
Owner

zoriya commented Aug 14, 2024

I've talked a bit about it on Discord and Twitter, but it's time to put everything in writing:

I'm planning a complete rewrite of the backend & a restructuring of the database. This is needed to support #87 and #282. It will also make #463 (big thanks to @Arthi-chaud for the brainstorming there) and #549 possible. This is also a great occasion to tackle tech debt and bad decisions I made during the 5 years I've been working on this codebase.

I plan on writing diagrams and asking for feedback on the Discord at the major turning points, so please give your feedback if you're interested!

To give a concrete vision, I plan to:

I'm going to explain each point and why I want to design Kyoo this way. As always, feel free to give your opinion or ideas.

Why a separate auth service

Kyoo has multiple HTTP services; for now we have the API & the transcoder. To ensure users have the correct level of permissions, all requests hit the API, which does permission validation and then proxies to the transcoder. This is bad for performance, scaling and DX. The idea with a centralized auth service is to have the reverse-proxy/ingress/gateway call the auth service and trade an opaque auth token for a JWT (see #573's phantom token part). The short-lived JWT will be used by downstream services (API, transcoder, scanner...) to check for permissions.
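
To make the flow concrete, here's a minimal sketch of the token exchange and of a downstream permission check. The endpoint, port and jose-based JWKS verification are assumptions for the example, not the actual spec:

```ts
import { createRemoteJWKSet, jwtVerify } from "jose";

// Gateway side: trade the opaque session token for a short-lived JWT.
// The "/auth/jwt" endpoint and port are placeholders, not the final spec.
async function exchangeToken(opaqueToken: string): Promise<string> {
	const res = await fetch("http://auth:4568/auth/jwt", {
		headers: { Authorization: `Bearer ${opaqueToken}` },
	});
	if (!res.ok) throw new Error(`auth service refused the token: ${res.status}`);
	const { token } = await res.json();
	return token;
}

// Downstream service side: verify the JWT locally (no round-trip to the auth service).
const jwks = createRemoteJWKSet(new URL("http://auth:4568/.well-known/jwks.json"));

async function requirePermission(jwt: string, permission: string) {
	const { payload } = await jwtVerify(jwt, jwks);
	const permissions = (payload.permissions as string[] | undefined) ?? [];
	if (!permissions.includes(permission))
		throw new Error(`missing permission: ${permission}`);
	return payload;
}
```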

Having this service stand alone also makes it possible/simple to have #346.

The auth service could also be used by other applications (as long as they are compliant w/ the license).

Why rewrite the API from scratch

The current backend is written in C#, which lacks sum types. Kyoo's logic often works on types like Movie | Serie, Episode | Movie, Episode | Special and so on. The lack of sum types in C# makes this hard to work with: we have multiple interfaces and custom logic scattered everywhere to handle it well.
This is why JavaScript was chosen as the replacement (we could have used a more functional language like Elixir, OCaml or even Gleam, but the core value of Kyoo is not its API, so I think trading some performance for velocity here is worth it. I also think Gleam is too early in development to write everything in it).
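
For illustration, here's roughly what that looks like with TypeScript's discriminated unions (field names are made up for the example):

```ts
// Hypothetical shapes; the real fields will differ.
type Movie = { kind: "movie"; name: string; airDate: string | null };
type Episode = {
	kind: "episode";
	serieName: string;
	seasonNumber: number;
	episodeNumber: number;
};

type WatchItem = Movie | Episode;

function displayName(item: WatchItem): string {
	// The compiler narrows the union in each branch, no interface gymnastics needed.
	switch (item.kind) {
		case "movie":
			return item.name;
		case "episode":
			return `${item.serieName} S${item.seasonNumber}E${item.episodeNumber}`;
	}
}
```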

For #87, we need to rewrite basically every single type of Kyoo (Series, Episodes, Movies... all need their translatable fields moved to another type, and so on). Fixing types one by one & their SQL interactions would probably take more time than just rewriting everything (and is way more boring).

What's up with episodes/movies/special/recaps/extra

Right now, Kyoo takes the simplest approach: a library item is either a Movie or a Serie containing seasons that contain episodes. In reality, this is a bit more complicated. A serie can have movies that should be watched between seasons.

Most online databases (TVDB, TheMovieDB) use "Season 0" as a special season, and we've used that until now, but it feels more like a workaround than a proper feature. Some specials are:

  • critical to the watching experience and need to be watched between seasons/episodes.
  • simple recaps that rehash one/multiple episodes and can be skipped (but still need to be shown at their proper place in the timeline of the app)
  • extra content like short episodes (2/3min long)

Note that specials can also be movies.

To give an example:

Made in Abyss is an anime with 2 seasons & 3 movies (at the time of writing). The first 2 movies recap the first season and the 3rd movie must be watched before the 2nd season.
This means the watch order is 1st season -> 3rd movie -> 2nd season. The 1st/2nd movies should be shown close to the 1st season but be greyed out since they're recaps.
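
One way to model this (just a sketch, not the final schema) is to flatten everything a serie contains into ordered entries, each with a kind that tells the client how to display it:

```ts
// Illustrative only; names, fields and the ordering scheme are not final.
type SerieEntry =
	| { kind: "episode"; seasonNumber: number; episodeNumber: number; order: number }
	| { kind: "movie"; name: string; order: number }
	| { kind: "recap"; name: string; order: number; skippable: true };

// Made in Abyss, using a single `order` value to express the watch order:
const madeInAbyss: SerieEntry[] = [
	{ kind: "episode", seasonNumber: 1, episodeNumber: 1, order: 1 },
	// ...the rest of season 1 (orders 2 to 13)...
	{ kind: "recap", name: "Movie 1 (season 1 recap)", order: 13.1, skippable: true },
	{ kind: "recap", name: "Movie 2 (season 1 recap)", order: 13.2, skippable: true },
	{ kind: "movie", name: "Movie 3 (must be watched before season 2)", order: 14 },
	{ kind: "episode", seasonNumber: 2, episodeNumber: 1, order: 15 },
	// ...the rest of season 2...
];
```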

Websockets

I've wanted to add websockets to Kyoo for a long time (for features like #341, #297 or #342). This would also make invalidating the caches for "Continue watching", "Next up" and "Watch status" easier in the various apps.

I never really got around to writing it, since I was not happy with the options I had. C#'s built-in websocket solution uses a weird format that can only be used w/ their own lib, so it felt wrong, & writing a service specifically for that was counterproductive since it would need lots of logic shared with the API (I still did a PoC in the feat/ws-rabit branch).

Elysia has good websocket handling & the format is easily readable by any client, so I'm happy about this. We would just need a message queue to handle replication.
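
For illustration, a minimal sketch of what an Elysia websocket endpoint could look like (event names and payloads are made up, and this assumes Bun's pub/sub underneath):

```ts
import { Elysia, t } from "elysia";

// Hypothetical event channel; the real message shapes are not decided yet.
const app = new Elysia()
	.ws("/events", {
		// Incoming messages are validated by Elysia's schema support.
		body: t.Object({ subscribe: t.String() }),
		open(ws) {
			console.log("client connected");
		},
		message(ws, { subscribe }) {
			// e.g. subscribe to "watch-status" invalidations
			ws.subscribe(subscribe);
		},
	})
	.listen(3000);

// Elsewhere in the API, after a watch-status change (Bun's pub/sub):
// app.server?.publish("watch-status", JSON.stringify({ type: "invalidate", resource: "nextup" }));
```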

On the scanning API

Right now, the matcher (the part of the scanner that fetches metadata & pushes it to Kyoo) uses a REST API to register new videos. When there are a lot of new videos to register, this kind of DDoSes the API. It is also inadequate for data that may or may not already exist: for example, when we register an episode, the associated season/series can already be registered in Kyoo or not.

Migrating to a queue-based system, with the matcher producing items to register & the API consuming them, seems like the way to go. When the API encounters an episode missing season/series data, it could push a request to another queue.
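
A rough sketch of what that could look like (assuming RabbitMQ via amqplib; queue names are made up):

```ts
import amqp from "amqplib";

// Matcher side: publish a video to register instead of calling the REST API.
export async function publishVideo(video: { path: string; title: string }) {
	const conn = await amqp.connect("amqp://rabbitmq");
	const ch = await conn.createChannel();
	await ch.assertQueue("scanner.videos", { durable: true });
	ch.sendToQueue("scanner.videos", Buffer.from(JSON.stringify(video)), { persistent: true });
	await ch.close();
	await conn.close();
}

// API side: consume at its own pace, so a big scan can't overwhelm it.
export async function consumeVideos() {
	const conn = await amqp.connect("amqp://rabbitmq");
	const ch = await conn.createChannel();
	await ch.assertQueue("scanner.videos", { durable: true });
	await ch.prefetch(10);
	await ch.consume("scanner.videos", async (msg) => {
		if (!msg) return;
		const video = JSON.parse(msg.content.toString());
		// ...upsert the episode here; if season/serie data is missing, push a
		// request to another (hypothetical) "scanner.missing-metadata" queue...
		ch.ack(msg);
	});
}
```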

Why merge autosync

For those unaware, autosync is the service responsible for marking episodes watched on external services (SIMKL and, in the future, Trakt, MyAnimeList, AniList & so on).

Making this a separate service was an error: some services need to hook into different points of the playback (for example, Trakt wants to be notified when playback starts, is paused/resumed and finishes).
The current design also makes it impossible to report errors to the client. Integrating it directly into the backend would make this much easier.
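
To illustrate, once merged, a tracker integration could just be an object with hooks called at each point of playback, and errors could be returned in the same HTTP response. A rough sketch, not the real interface:

```ts
// Hypothetical shape for an in-process tracker integration.
interface Tracker {
	name: string;
	onStart?(userId: string, entryId: string): Promise<void>;
	onPause?(userId: string, entryId: string, progress: number): Promise<void>;
	onResume?(userId: string, entryId: string, progress: number): Promise<void>;
	onFinish?(userId: string, entryId: string): Promise<void>;
}

// Errors can be surfaced in the same HTTP response that updated the watch status.
async function reportFinish(trackers: Tracker[], userId: string, entryId: string) {
	const errors: string[] = [];
	for (const tracker of trackers) {
		try {
			await tracker.onFinish?.(userId, entryId);
		} catch (e) {
			errors.push(`${tracker.name}: ${(e as Error).message}`);
		}
	}
	return errors;
}
```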

Open questions

I'm still undecided about some things:

Should we keep Meilisearch as a search backend, or can postgres do that for us?

This was discussed in #420 and I think Meilisearch is a great way to solve search, but I'm open to reconsidering this if we can get similar results w/ postgres only.
Side note, but one of the most highly rated "under consideration" features on their roadmap is a recommendation system.
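
For reference, here's roughly what the postgres-only option could look like (a sketch using the pg driver and a precomputed tsvector column; table and column names are made up):

```ts
import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.POSTGRES_URL });

// Assumes something like:
//   ALTER TABLE shows ADD COLUMN search tsvector GENERATED ALWAYS AS
//     (to_tsvector('simple', name || ' ' || coalesce(overview, ''))) STORED;
//   CREATE INDEX shows_search_idx ON shows USING gin (search);
export async function searchShows(query: string) {
	const { rows } = await db.query(
		`SELECT id, name, ts_rank(search, websearch_to_tsquery('simple', $1)) AS rank
		 FROM shows
		 WHERE search @@ websearch_to_tsquery('simple', $1)
		 ORDER BY rank DESC
		 LIMIT 20`,
		[query],
	);
	return rows;
}
```

This covers basic matching and ranking, but typo tolerance (one of Meilisearch's strong points) would need something extra like pg_trgm on the postgres side.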

Should we use both RabbitMQ & Redis?

I plan on adding Redis (probably via Valkey) soon for #579, to distribute the transcoder's lock and the scanner's cache. I know Redis can be used as a message queue; should we simply use Redis for everything?
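
For the transcoder lock specifically, here's a minimal sketch with ioredis and a plain SET NX EX lock (key names are illustrative):

```ts
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://valkey:6379");

// Try to take the lock for one transcode job; it expires on its own if the holder dies.
export async function tryLockTranscode(path: string, ownerId: string): Promise<boolean> {
	const res = await redis.set(`transcode:lock:${path}`, ownerId, "EX", 60, "NX");
	return res === "OK";
}

// Only the owner may release the lock (check-and-delete atomically via a Lua script).
export async function unlockTranscode(path: string, ownerId: string): Promise<void> {
	await redis.eval(
		`if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end`,
		1,
		`transcode:lock:${path}`,
		ownerId,
	);
}
```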

zoriya added this to the v5.0.0 milestone Aug 14, 2024
zoriya pinned this issue Aug 14, 2024
@zoriya
Owner Author

zoriya commented Aug 14, 2024

Here is a draft of the new database schema:

[image: draft of the new database schema]

I'll open a PR with it once I get some more work in it.

zoriya self-assigned this Aug 14, 2024
@Arthi-chaud
Collaborator

Would be happy to help you with all this!

@acelinkio
Contributor

Going into the next major version, it would be worth considering moving Kyoo to a GitHub organization and moving each of the microservices into a project of its own. Outside of just organizing differently, the next priority would be ensuring a sane development experience.

@zoriya
Owner Author

zoriya commented Aug 14, 2024

I think for a small team a multi-repo setup is worse for DX.

Having a single repo means a single issue tracker, which is a definite +.

It's also possible to do PRs impacting multiple services instead of having two/three and jumping between repos/PRs to get the whole context.

@zoriya
Owner Author

zoriya commented Aug 28, 2024

To give a small update: I've started working on the auth service. I decided to do it in Go instead of Gleam; Gleam feels too early for that yet. (The branch is feat/auth.)

I'll continue working on it and make a PR with the API's spec in the next week.

@thinkbig1979

With regard to the DB, I suggest you take a look at EdgeDB. Their demo dataset is actually a movie dataset 😄
I'm a big fan of graph representation of data, and am currently developing an application using EdgeDB. I'm not a developer myself, but my dev team has enjoyed it so far.

@K3UL

K3UL commented Sep 29, 2024

Wouldn't OneOf be the answer to your complaint about union types, rather than completely rewriting the backend just for this?
Then they could easily be replaced in the (far) future when official unions are implemented.

@zoriya
Owner Author

zoriya commented Sep 30, 2024

I looked at OneOf before using interfaces for types like Episode | Movie, but it lacked the tooling and support to make it worth using.

The database and most types of the backend need to be rewritten either way, to support #87, #282, #463 or #549.
Elysia, which I plan to use for v5, also has great websocket support.

@felipemarinho97
Contributor

I don't know if you intend to add JWT validation to each microservice, but if so it might be worth using a proxy like Ory Oathkeeper (the docker images are very light). It can work with websockets too. -> https://www.ory.sh/docs/oathkeeper

Their ecosystem is very robust: https://www.ory.sh/docs/ecosystem/projects

@zoriya
Owner Author

zoriya commented Nov 11, 2024

The auth spec/service is already available on master here.

@felipemarinho97
Contributor

> Making this a separate service was an error: some services need to hook into different points of the playback (for example, Trakt wants to be notified when playback starts, is paused/resumed and finishes).

I don't see it as a mistake; it seems much more elegant to have it as a separate service.
Why not emit the playback events on RabbitMQ and listen for them in autosync? The same events could be used to generate statistics or to do anything else in other microservices.

> The current design also makes it impossible to report errors to the client. Integrating it directly into the backend would make this much easier.

You can use a dead-letter queue for errors.

@zoriya
Owner Author

zoriya commented Nov 15, 2024

We want to be able to report errors synchronously to clients (if they use an HTTP call to update the playback status, we need to answer in that HTTP response that syncing to an external service failed). Using a queue & waiting for the separate service is just wasteful.

Playback events would be sent to a queue & over websockets for listeners anyway, so we would not lose the ability to use an external service if needed.

Overall, I think the autosync service is a lot of abstraction & communications for what should be a super simple thing.
