Skip to content
This repository has been archived by the owner on Aug 25, 2021. It is now read-only.

Graceful shutdown #30

Merged
merged 113 commits into from
Jul 18, 2019
Merged

Graceful shutdown #30

merged 113 commits into from
Jul 18, 2019

Conversation

nWacky
Copy link
Contributor

@nWacky nWacky commented Jun 18, 2019

Graceful Shutdown

Resolves #29

Synopsis

Add shutdown all connections and remove all data from db when a program receives SIGHUP, SIGINT, SIGTERM, SIGQUIT.

Checklist

  • Created PR:
    • In draft mode
    • Name contains WIP: prefix
    • Name contains issue reference
    • Has k:: labels applied
    • Has assignee
  • Documentation is updated (if required)
  • Tests are updated (if required)
  • Changes conform code style
  • CHANGELOG entry is added (if required)
  • FCM (final commit message) is posted
    • and approved
  • Review is completed and changes are approved
  • Before merge:
    • Milestone is set
    • PR's name and description are correct and up-to-date
    • WIP: prefix is removed
    • All temporary labels are removed

@nWacky nWacky added feature New feature or request k::api Related to API (application interface) labels Jun 18, 2019
@nWacky nWacky added this to the 0.2.0 milestone Jun 18, 2019
@nWacky nWacky self-assigned this Jun 18, 2019
@nWacky nWacky requested a review from alexlapa June 18, 2019 16:39
Copy link
Collaborator

@alexlapa alexlapa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nWacky ,

Тесты не компилируются.

@nWacky
Copy link
Contributor Author

nWacky commented Jun 19, 2019

@alexlapa,

Пересмотрел некоторые вещи еще раз, сервер (actix_web::HttpServer) выключается graceful только при SIGTERM

@nWacky nWacky requested a review from alexlapa June 19, 2019 14:33
@nWacky
Copy link
Contributor Author

nWacky commented Jun 20, 2019

@alexlapa,
Отписался о проблеме закрытия приложения в issue #29

@nWacky
Copy link
Contributor Author

nWacky commented Jun 21, 2019

Обсудили, решили сделать следующее:

Описание

Класс GracefulShutdown, который обрабатывает signal::Signal и отправляет всем подписавшимся сообщение о shutdown.

Детали

Адреса подписавшихся хранятся в BTreeMap<u8, Vec<Recipient<...>>.

Чем меньше число приоритета, тем выше приоритет. 0 - самый высокий приоритет.

При получении signal::Signal (кроме SIGCHILD) класс начинает отправлять всем сигналы:

  1. Из Recipient каждой итерации BTreeMap создаем большую future (из адресов для отправки), с помощью join.
  2. После этого (and_then) выполняем все для следующей итерации.

После того, как создали эту большую future, select её с timeout_shutdown (или другим образом даём ей timeout_shutdown времени на завершение) и выключаем систему

Например,
send_0_0.join(send_0_1.join(send_0_2))
.and_then(send_1_0.join(send_1_1))
.and_then(send_2_0)
.select(shutdown_timer)

Для выполнения futures внутри Actor рекомендуется использовать асинхронный контекст Actor.
Использовать Actor message с адресом и приоритетом для подписки.

Copy link
Collaborator

@alexlapa alexlapa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Тыц. Но, там не совсем все верно, требуется обсудить повторно.

@nWacky
Copy link
Contributor Author

nWacky commented Jun 25, 2019

@alexlapa,

Есть еще одна проблема: логгер в actix web паникует пока выключается actix_web:

No logger set. Use `slog_scope::set_global_logger` or `slog_scope::scope`

Из-за этого вся система actix зависает, невозможно никаким образом что-либо сделать (в том числе и таймаут зависает в текущем или если его вынести в другой actor).
Основная цель закрытия - закрыть соединения с кодом 1000 и выключить систему - выполняется успешно, если отключить graceful shutdown для actix_web.

@nWacky
Copy link
Contributor Author

nWacky commented Jul 12, 2019

Обсудили, решили следующее:

  • Убрать RoomSetStopped и изменять состояние с помощью ActorFuture контекста (пример в session.rs)
  • Вынести CloseRoom в отдельную функцию, чтобы не отправлять сообщения самому себе

@nWacky
Copy link
Contributor Author

nWacky commented Jul 15, 2019

@alexlapa, @tyranron,

Убрать RoomSetStopped и изменять состояние с помощью ActorFuture контекста (пример в session.rs)

Если использовать контекст Actora в Futures, то тогда надо посылать и обрабатывать ActorFuture вместо Future -> невозможно использовать join_all().

Я не нашел как сделать из ActorFuture любое dyn Future<Item = (), Error = ()> + Send

@tyranron
Copy link
Member

tyranron commented Jul 15, 2019

@nWacky why? Recipient::send() returns Future, not ActorFuture.

@nWacky
Copy link
Contributor Author

nWacky commented Jul 15, 2019

@tyranron,

I tried using it this way:

fn get_drop_fut(
        &mut self,
        ctx: &mut Context<Room>,
    ) -> Box<dyn Future<Item = (), Error = ()> + Send> {
        info!("Closing Room [id = {:?}]", self.id);
        self.state = RoomState::Stopping;
        let drop_fut =
            wrap_future(
                self.participants.drop_connections(ctx)
            ).then(move |_, room: &mut Room, _| {
                room.state = RoomState::Stopped;
                futures::future::ok(())
            });
        Box::new(drop_fut)
    }

get_drop_fut is called whenever we need to get future to close Room (we need this future twice). After drop_connections is called I need to set self.state to Stopped. If I am using example above, medea does not compile due to drop_fut being ActorFuture, not Future. And I don't know how to convert it to ActorFuture

For now the state is being changed with recipient.send(RoomSetStopped {})

EDIT: I tried using Arc and Weak with regular future, but they don't want to be sent around because of in let room_arc = Arc::new(self); self is a non-static reference

@nWacky
Copy link
Contributor Author

nWacky commented Jul 15, 2019

Обсудили, решили следующее:

  • В shutdown'е join_all применять к recipient.send(..)
  • В Handlerах Actorов использовать асинхронный контекст (если необходимо)
  • ShutdownMessageResult = Result<(),()>

@nWacky nWacky requested a review from tyranron July 15, 2019 14:01
Copy link
Member

@tyranron tyranron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation is too "dirty". A lot of unnecessary changes were introduced (like Send bounds, unremoved CloseRoom message, etc). The documentation was mostly missing.

You should put much more effort to the quality of your implementations in future.

At the moment, I've refactored it. Consider the changes carefully, and do the final testing on the implementation.

FCM:

Implement graceful shutdown (#30, #29)

- impl GracefulShutdown service which listens to OS signals and shutdowns each component gracefully
- impl graceful shutdown for Room and HTTP server
- add [shutdown] config section

Additionally:
- provide explicit states for Room

Be careful with merging and specifying FCM in the Github form:

  • First line of FCM should go as a title.
  • Everything other as a body.

Do not forget to remove WIP: prefix.

Cargo.toml Outdated
actix = "0.7"
actix-web = "0.7"
actix = "0.8"
actix-rt = "0.2.3"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you specify the patch version when all the other deps specify minor version only?


#[derive(Clone, Debug, Deserialize, Serialize, SmartDefault)]
#[serde(default)]
pub struct ShutdownConfiguration {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like was said earlier: just Shutdown is enough here. No sense to specify Configuration term as you are in conf module already. You had the similar declarations nearby, don't the difference confuse you?

#[serde(default)]
pub struct ShutdownConfiguration {
#[default(5000)]
pub timeout: u64,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation is mandatory.


use crate::log::prelude::*;

#[derive(Ord, PartialOrd, PartialEq, Eq, Copy, Clone)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alphabetical order by default should be applied.

info!("Exit signal received, exiting");

if self.recipients.is_empty() {
error!("GracefulShutdown: No subscribers registered");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, why is this an error? It's a totally normal situation.

pub fn drop_connections(
&mut self,
ctx: &mut Context<Room>,
) -> impl Future<Item = (), Error = ()> {
) -> Box<impl Future<Item = (), Error = ()> + Send> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either you use impl Future, or Box<dyn Future>.

Box<impl Future> is unlikely what you want.

#[derive(Message)]
#[rtype(result = "()")]
#[allow(clippy::module_name_repetitions)]
struct RoomSetStopped {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this Message for? I didn't found any usages of it.

Copy link
Contributor Author

@nWacky nWacky Jul 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tyranron,

It was used to set Room's state to Stopped back when I was returning futures from ShutdownGracefully because I couldn't modify self from these futures

There is no use for it now

@tyranron tyranron requested a review from alexlapa July 17, 2019 17:14
API HTTP server: {}",
e
)
})
Copy link
Collaborator

@alexlapa alexlapa Jul 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tyranron,

Напрягает толстый main. main, в моем представлении, должен просто собирать контекст. Предлагаю инжектить
graceful_shutdown в new всех интересующих нас компонентов и там уже подписываться.

Тут, теоретическим минусом может быть то, что тогда Room::new будет рурму не только собирать, но и стартовать. Но я не думаю что это будет проблемой.

Если не так, то тогда хотя бы какой-нибудь helper запилю, дабы кода в main было чуть меньше.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexlapa let's refactor it later. Things will be much simple with async/.await usage. And the subscribing itself will be moved to Actor initialization for each kind of long-living thing. But for now just left it as is...

alexlapa added 2 commits July 18, 2019 13:39
# Conflicts:
#	Cargo.lock
#	config.toml
#	src/api/client/server.rs
#	src/conf/mod.rs
@alexlapa alexlapa marked this pull request as ready for review July 18, 2019 16:01
@alexlapa alexlapa changed the title WIP: Graceful shutdown Graceful shutdown Jul 18, 2019
@alexlapa alexlapa modified the milestones: 0.2.0, 0.1.0 Jul 18, 2019
@alexlapa alexlapa merged commit 9fa4b5b into master Jul 18, 2019
@alexlapa alexlapa mentioned this pull request Jul 18, 2019
@tyranron tyranron deleted the 29-graceful-shutdown branch July 18, 2019 16:16
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature New feature or request k::deploy Related to deployment capabilities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Graceful shutdown
4 participants