Migration

bri25yu edited this page Nov 29, 2022 · 1 revision

Background

See issue: hkn-rails issue #186

When the OCF upgraded its machines from Debian 8 (jessie) to 9 (stretch), it had a transition period for users on the old apphost (werewolves.ocf.berkeley.edu) to migrate their apps to the new apphost (vampires.ocf.berkeley.edu).

The idea was that we would get hkn-rails running simultaneously on both werewolves (jessie) and vampires (stretch), so when the OCF re-routed web traffic from werewolves to vampires, there would be no downtime.

When @jvperrin and I (@jameslzhu) migrated hkn-rails, we created a separate Capistrano stage, migrate, which deploys to the new apphost (vampires) into a separate deploy folder, ~/hkn-rails/migrate/. (The previous deploy folder was ~/hkn-rails/prod/.)

OCF setup

We implemented several workarounds in response to various issues arising from our specific setup on the OCF:

  • NFS (Network File System) sharing between werewolves and vampires, causing both to share the same files
  • (Not really related, but useful) Unix socket file binding, where traffic to hkn.eecs.berkeley.edu is routed to the program bound to the socket file /srv/apps/hkn/hkn.sock (see the apphosting docs).
  • Starting and restarting the service with systemd user units, whose unit files are also shared between both hosts over NFS
  • RVM (Ruby Version Manager), which compiles and installs Ruby on the apphost in our user directory (~hkn)
  • Our use of Solr, a Java search indexing engine that runs as a separate process alongside hkn-rails. We write its PID to a file, which hkn-rails reads to tell whether Solr is running and which process it is.
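As a rough illustration of the PID-file convention in the last item, the Ruby sketch below checks whether the process named in a PID file is still alive. The helper name solr_running? and the file layout are hypothetical and not taken from hkn-rails; the real Solr integration may perform this check differently.

```ruby
require "tmpdir"

# Hypothetical helper (not taken from hkn-rails): returns true if the PID
# stored in +pid_path+ belongs to a live process.
def solr_running?(pid_path)
  return false unless File.exist?(pid_path)

  content = File.read(pid_path).strip
  return false unless content.match?(/\A\d+\z/)

  pid = content.to_i
  return false if pid.zero? # PID 0 would address the whole process group

  # Signal 0 only checks that the process exists; nothing is delivered.
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process: the PID file is stale
rescue Errno::EPERM
  true # the process exists but is owned by another user
end

if __FILE__ == $PROGRAM_NAME
  Dir.mktmpdir do |dir|
    pid_file = File.join(dir, "solr.pid")
    File.write(pid_file, Process.pid.to_s)
    puts solr_running?(pid_file) # the current Ruby process is alive, so true
  end
end
```

A stale PID file (left behind after Solr exits) is the usual failure mode of this scheme, which is why the sketch treats ESRCH as "not running" rather than trusting the file's existence.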

Past issues / workarounds

NFS, by itself, caused several issues:

  • Incompatible Ruby binaries
    • The same Ruby binaries were present on werewolves and vampires. Because Debian stretch upgraded various system libraries, the Ruby compiled on werewolves (2.5.0) was linked against shared libraries that were not present on vampires.
    • Solution: we created a Git branch, migrate, in which we changed the Gemfile Ruby version from ruby '2.5.0' to ruby '2.5.1'. We installed Ruby 2.5.1 on vampires with RVM, and added the RVM Ruby version setting to the Capfile so that Capistrano uses that version when deploying.
  • Systemd unit file changes
    • The systemd unit file, which starts the hkn-rails script at startup, only runs when the host is werewolves: ConditionHost=werewolves
    • Solution: in the migrate branch, the unit file's ConditionHost is changed to vampires. On the apphost, the service file in ~/.config/systemd/user was renamed to hkn-rails-migrate.service (to avoid an NFS collision with hkn-rails.service). hkn-rails.service was enabled on werewolves, and hkn-rails-migrate.service was enabled on vampires.
  • Solr detection failure
    • Workaround not recorded; @jvperrin, do you remember how we got around this?
  • Shared folder inconsistency
    • The deploy uses ~/hkn-rails/prod/shared to share files between releases, e.g. resumes, PID files, and configuration. We don't want to lose access to these in the new deploy.
    • Solution: symlink the new shared folder to the old: ~/hkn-rails/migrate/shared -> ~/hkn-rails/prod/shared.
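The symlink workaround above can be sketched with Ruby's standard library. This is a minimal, idempotent sketch under the paths given above; the helper name link_shared is hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: make +link+ a symlink pointing at +target+.
# Safe to run more than once: an existing correct link is left alone, a link
# pointing elsewhere is replaced, and a real directory at +link+ is refused
# rather than silently deleted.
def link_shared(link, target)
  if File.symlink?(link)
    return :already_linked if File.readlink(link) == target

    File.unlink(link) # stale link pointing somewhere else
  elsif File.exist?(link)
    raise "#{link} exists and is not a symlink; refusing to replace it"
  end

  FileUtils.mkdir_p(File.dirname(link))
  File.symlink(target, link)
  :linked
end
```

On the apphost this would be called as link_shared(File.expand_path("~/hkn-rails/migrate/shared"), File.expand_path("~/hkn-rails/prod/shared")). Refusing to overwrite a real directory is deliberate: if ~/hkn-rails/migrate/shared already holds files, they should be reconciled by hand rather than lost.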

Current tasks

Deploying to production currently involves checking out the migrate Git branch, then deploying to the migrate stage with:

bundle exec cap migrate deploy

We would like to return to deploying the master branch to the prod stage; this reduces confusion for new contributors and removes redundancy from our config. This requires merging all of the changes on migrate into master, as well as updating the server-side configuration over ssh:

  • systemd unit renamings (hkn-rails-migrate -> hkn-rails)
  • Double-checking shared/ folder consistency
  • Making sure Solr connections still work
  • Minimizing downtime (some is unavoidable, since the two services cannot be bound to the socket file at the same time)
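The socket constraint in the last item can be seen with Ruby's standard socket library. In this sketch, which uses a temporary path rather than /srv/apps/hkn/hkn.sock, a second bind to a path that is already bound fails with Errno::EADDRINUSE, and a closed server still leaves its socket file behind. That is why the old service has to be stopped (and its socket file removed, unless the app server cleans it up itself) before the new one can bind. The helper names are hypothetical.

```ruby
require "socket"
require "tmpdir"

# Try to bind a Unix domain socket at +path+. Returns the server, or nil if
# the path is already in use (by a live server or by a leftover socket file).
def try_bind(path)
  UNIXServer.new(path)
rescue Errno::EADDRINUSE
  nil
end

# Simulates two services competing for one socket path inside +dir+.
def socket_handover_demo(dir)
  sock = File.join(dir, "hkn.sock") # stand-in for /srv/apps/hkn/hkn.sock

  old_service = try_bind(sock)
  new_service = try_bind(sock) # fails while the old service holds the path

  # Stopping the old service: closing the server does NOT delete the socket
  # file, so it has to be removed before the path can be bound again.
  old_service&.close
  File.unlink(sock) if File.socket?(sock)

  rebound = try_bind(sock)
  rebound&.close
  {
    old_bound: !old_service.nil?,
    new_bound_while_old_running: !new_service.nil?,
    rebound_after_stop: !rebound.nil?,
  }
end

if __FILE__ == $PROGRAM_NAME
  Dir.mktmpdir { |dir| p socket_handover_demo(dir) }
end
```

The demo reports that the first bind succeeds, the overlapping bind fails, and the rebind succeeds only after the stale socket file is removed.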

Checklist

  • Check that the shared folder will stay consistent
  • Edit deploy.rb to restart hkn-rails.service, instead of hkn-rails-migrate.service
  • Edit the logrotate systemd files on the apphost to restart hkn-rails.service, instead of hkn-rails-migrate.service
  • Run systemctl --user daemon-reload
  • Merge into master
  • Stop hkn-rails-migrate.service
  • Delete the old Ruby 2.5.0 Bundler gems (~/hkn-rails/prod/shared/bundle)
  • Deploy prod with Capistrano
  • Start hkn-rails.service
  • Check that the site is working
  • Disable hkn-rails-migrate.service
  • Enable hkn-rails.service
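The command-line steps of this checklist could be scripted roughly as below. This is a hypothetical Ruby sketch, assuming it runs as the hkn user on the apphost; the function names and the grouping into phases are illustrative. The manual steps (editing deploy.rb and the logrotate units, merging into master, running the Capistrano deploy, and checking the site) are left as comments between phases. Commands are only listed unless dry_run: false is passed.

```ruby
# Hypothetical grouping of the checklist's shell steps into ordered phases.
# +home+ is the hkn user's home directory.
def cutover_phases(home = Dir.home)
  {
    before_deploy: [
      %w[systemctl --user daemon-reload],
      %w[systemctl --user stop hkn-rails-migrate.service],
      ["rm", "-rf", File.join(home, "hkn-rails/prod/shared/bundle")],
    ],
    # Between these phases: merge into master, then run
    # `bundle exec cap prod deploy` wherever Capistrano is normally run.
    start: [
      %w[systemctl --user start hkn-rails.service],
    ],
    # Between these phases: check that the site is working.
    finalize: [
      %w[systemctl --user disable hkn-rails-migrate.service],
      %w[systemctl --user enable hkn-rails.service],
    ],
  }
end

# Runs each command in order, stopping at the first failure. With the
# default dry_run: true it only returns the command lines.
def run_phase(steps, dry_run: true)
  steps.map do |argv|
    line = argv.join(" ")
    raise "step failed: #{line}" if !dry_run && !system(*argv)

    line
  end
end
```

For example, run_phase(cutover_phases[:before_deploy]) prints nothing and returns the three command lines for review; the same call with dry_run: false executes them. Splitting the work into phases keeps the downtime window (from stopping hkn-rails-migrate.service to starting hkn-rails.service) explicit.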