ANKLET

Inspired by our customer requirements, Anklet is a solution created to meet the specific needs of our users.

At a glance

  • Veertu's Anklet is a service that runs custom and specific plugins to communicate with your CI platform or other tools, and the Anka CLI.
  • It can run multiple plugins at once, in parallel, on the same host.
  • Depending on the plugins, it can run on both linux containers/instances and macOS hosts.

Why Anklet?

Here are a few needs our customers expressed so you can understand the motivation for Anklet:

  1. Each team and repository should not have knowledge of the Anka Build Cloud Controller URL, potential auth methods, Anka Node Groups, etc. These are all things that had to be set in the job yaml for the existing solution for github actions. This should be abstracted away for security and simplicity of use.
  2. Their CI workflow files cannot have multiple stages (start -> the actual job that runs in the VM -> a cleanup step) just to run a single Anka VM... that's just too much overhead to ask developers to manage. Instead, something should spin up the VM behind the scenes, register the runner, and then execute the job inside the VM directly.
  3. They don't want the job to be responsible for cleaning up the VM + registered runner either. Something should watch the status of the job and clean up the VM when it's complete.

While these reasons are specific to Github Actions, they apply to many other CI platforms too.

Anklet will have a configuration and run custom plugins (written by us and/or the community) which handle all of the logic necessary to watch/listen for jobs in the specific CI platform. The plugins determine what logic happens host-side to prepare a macOS VM and optionally register it to the CI platform for use. We'll talk more about that below. At the time of writing this, plugins are not independent, but will eventually be separated.

How does it really work?

  1. Anklet loads the configuration from the ~/.config/anklet/config.yml file on the same host. The configuration defines the plugins that will be started. Example below.
    • Each item in the plugins: list specifies a plugin to load, its database (optional), and any other configuration specific to that plugin.
  2. Plugins run in parallel, but have separate internal context to avoid collisions.
  3. It supports loading in a database (currently redis) to manage state across all of your hosts.
    • The github plugin, and likely others, rely on this to prevent race conditions with picking up jobs.
    • The database is disabled by default (disabled: true) to keep Anklet lightweight.
  4. Logs are in JSON format and are written to ./anklet.log (unless otherwise specified). Here is an example of the log structure:
    {
      "time": "2024-04-03T17:10:08.726639-04:00",
      "level": "INFO",
      "msg": "handling anka workflow run job",
      "ankletVersion": "dev",
      "pluginName": "RUNNER1",
      "plugin": "github",
      "repo": "anklet",
      "owner": "veertuinc",
      "workflowName": "t1-without-tag",
      "workflowRunName": "t1-without-tag",
      "workflowRunId": 8544945071,
      "workflowJobId": 23414111000,
      "workflowJobName": "testJob",
      "ankaTemplate": "d792c6f6-198c-470f-9526-9c998efe7ab4",
      "ankaTemplateTag": "(using latest)",
      "jobURL": "https://github.com/veertuinc/anklet/actions/runs/8544945071/job/23414111000",
      "uniqueRunKey": "8544945071:1"
    }
    • All critical errors your Ops team needs to watch for are level ERROR.

How does it manage VM Templates on the host?

Anklet handles VM Templates/Tags the best it can using the Anka CLI.

  • If the VM Template or Tag does not exist, Anklet will pull it from the Registry using the default configured registry under anka registry list-repos. You can also set the registry_url in the config.yml to use a different registry.
    • Two pulls cannot run at the same time on the same host or else the data may become corrupt. If a second job is picked up that requires a pull while one is in progress, Anklet sends it back to the queue so another host can handle it.
  • If the Template AND Tag already exist, it does not issue a pull from the Registry (which therefore doesn't require maintaining a Registry at all; useful for users who use anka export/import). Important: You must define the tag, or else it will attempt to use "latest" and forcefully issue a pull.
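The pull rules above can be summarized as a small decision function. This is a sketch of the described behavior only; pullDecision and describe are hypothetical names and do not come from the Anklet source.

```go
package main

import "fmt"

// pullDecision reports whether a registry pull is needed and which tag is
// used. requestedTag is "" when the job does not define a tag.
func pullDecision(templateExists, tagExists bool, requestedTag string) (pull bool, tag string) {
	if requestedTag == "" {
		// No tag defined: "latest" is used and a pull is always issued.
		return true, "latest"
	}
	if templateExists && tagExists {
		// Template AND tag are already on the host: no registry needed.
		return false, requestedTag
	}
	// Template or tag is missing on the host: pull from the registry.
	return true, requestedTag
}

// describe renders a decision as text for the demo in main.
func describe(pull bool, tag string) string {
	if pull {
		return fmt.Sprintf("pull tag %q from the registry", tag)
	}
	return fmt.Sprintf("use local tag %q, no pull", tag)
}

func main() {
	fmt.Println(describe(pullDecision(true, true, "v1")))
	fmt.Println(describe(pullDecision(true, false, "v1")))
	fmt.Println(describe(pullDecision(true, true, "")))
}
```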

Setup Guide

We're going to use Github Actions as an example, but the process is similar for other plugins.

With the Github Actions plugin, there is a Receiver Plugin and a Handler Plugin.

  • The Github Actions Receiver Plugin is a web server that listens for webhooks Github sends and then places the events in the database/queue. It can run on mac and linux.
  • The Github Actions Handler Plugin is responsible for pulling a job from the database/queue, preparing a macOS VM, and registering it to the repo's action runners so it can execute the job inside. It can only run on mac, as it needs access to the Anka CLI.

Anklet Setup

  1. Download the binary from the releases page.
  2. Use the Plugin Setup and Usage Guides to set up the plugin(s) you want to use.
  3. Create a ~/.config/anklet/config.yml file with the following contents and modify any necessary values. We'll use a config for github:
    ---
    work_dir: /tmp/
    pid_file_dir: /tmp/
    # plugins_path: ~/.config/anklet/plugins/
    global_database_url: localhost
    global_database_port: 6379
    global_database_user: ""
    global_database_password: ""
    global_database_database: 0
    global_private_key: /Users/nathanpierce/veertuinc-anklet.2024-07-19.private-key.pem
    log:
        # if file_dir is not set, it will be set to current directory you execute anklet in
        file_dir: /Users/myUser/Library/Logs/
    plugins:
      # GITHUB RECEIVER
      - name: GITHUB_WEBHOOK_RECEIVER
        plugin: github_receiver
        hook_id: 489747753
        port: 54321 # port that's open to the internet so github can post to it
        secret: 00000000
        #private_key: /Users/nathanpierce/veertuinc-anklet.2024-07-19.private-key.pem
        app_id: 949431
        installation_id: 52970581
        repo: anklet
        owner: veertuinc
        #database:
          #url: localhost
          #port: 6379
          #user: ""
          #password: ""
          #database: 0
      # GITHUB HANDLERS
      - name: RUNNER1
        plugin: github
        app_id: 949431
        installation_id: 52970581
        repo: anklet
        owner: veertuinc
        registry_url: http://anka.registry:8089
        sleep_interval: 10 # sleep 10 seconds between checks for new jobs
      - name: RUNNER2
        plugin: github
        token: github_pat_1XXXXX
        repo: anklet
        owner: veertuinc
        registry_url: http://anka.registry:8089
    

    Note: You can only ever run two VMs per host per the Apple macOS SLA. While you can specify more than two plugins, only two will ever be running a VM at one time. sleep_interval can be used to control the frequency/priority of a plugin and increase the odds that a job will be picked up.

  4. Run the binary on the host that has the Anka CLI installed (Anka is not needed if just running an Anklet Receiver).
    • tail -fF /Users/myUser/Library/Logs/anklet.log to see the logs. You can run anklet with LOG_LEVEL=DEBUG to see more verbose output.
  5. To stop, send an interrupt or ctrl+c. It will attempt a graceful shut down of plugins, sending unfinished jobs back to the queue or waiting until the job is done to prevent orphans.
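The two-VM limit mentioned in the note above behaves like a counting semaphore shared by all handler plugins on a host. How Anklet enforces the limit internally is not described here, so the following is only an illustrative sketch; vmSlots and its methods are hypothetical names.

```go
package main

import "fmt"

// maxVMsPerHost mirrors the two-VM limit from the note above.
const maxVMsPerHost = 2

// vmSlots is a counting semaphore built on a buffered channel.
type vmSlots chan struct{}

func newVMSlots() vmSlots { return make(vmSlots, maxVMsPerHost) }

// tryAcquire reserves a VM slot without blocking. It returns false when
// both slots are already in use.
func (s vmSlots) tryAcquire() bool {
	select {
	case s <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot previously reserved with tryAcquire.
func (s vmSlots) release() { <-s }

func main() {
	slots := newVMSlots()
	for _, name := range []string{"RUNNER1", "RUNNER2", "RUNNER3"} {
		if slots.tryAcquire() {
			fmt.Println(name, "starts a VM")
		} else {
			fmt.Println(name, "has to wait for a free slot")
		}
	}
	slots.release() // one VM finished
	fmt.Println("slot available again:", slots.tryAcquire())
}
```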

It is also possible to use ENVs for several of the items in the config. They override anything set in the yml. Here is a list of ENVs that you can use:

| ENV | Description |
| --- | --- |
| ANKLET_WORK_DIR | Absolute path to work directory for anklet (ex: /tmp/) (defaults to ./) |
| ANKLET_PID_FILE_DIR | Absolute path to pid file directory for anklet (ex: /tmp/) (defaults to ./) |
| ANKLET_LOG_FILE_DIR | Absolute path to log file directory for anklet (ex: /Users/myUser/Library/Logs/) (defaults to ./) |

For error handling, see the github plugin README.

Database Setup

At the moment we support redis 7.x for the database. For testing, it can be installed on macOS using homebrew. We recommend choosing one of your Anklet hosts to run the database on and pointing all other hosts to it in their config.

brew install redis
sudo sysctl kern.ipc.somaxconn=511 # you can also add to /etc/sysctl.conf and reboot
brew services start redis # use sudo on ec2
tail -fF /opt/homebrew/var/log/redis.log

For production, we recommend running a redis cluster on infrastructure that is separate from your Anklet hosts and has guaranteed uptime.

Your config.yml file must define the database in one of the following ways:

  • Using the database variables (under each plugin).
  • Using the global_database_* variables (applies to and overrides the database variables under each plugin).
  • Using the ENVs: ANKLET_GLOBAL_DATABASE_URL, ANKLET_GLOBAL_DATABASE_PORT, ANKLET_GLOBAL_DATABASE_USER, ANKLET_GLOBAL_DATABASE_PASSWORD, ANKLET_GLOBAL_DATABASE_DATABASE.
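As a rough illustration of how the per-plugin and global settings relate, the sketch below picks the effective database for a plugin: the global_database_* values win when they are set, otherwise the plugin's own database block is used. The dbConfig type and effectiveDB function are assumptions made for this example and are not taken from the Anklet source; ENV handling is left out.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// dbConfig mirrors the database keys shown in config.yml.
type dbConfig struct {
	URL      string
	Port     int
	User     string
	Password string
	Database int
}

// addr returns the host:port pair a redis client would connect to.
func (c dbConfig) addr() string {
	return net.JoinHostPort(c.URL, strconv.Itoa(c.Port))
}

// effectiveDB returns the global config when its URL is set, otherwise the
// plugin-level config. The bool is false when neither defines a database.
func effectiveDB(plugin, global dbConfig) (dbConfig, bool) {
	if global.URL != "" {
		return global, true
	}
	if plugin.URL != "" {
		return plugin, true
	}
	return dbConfig{}, false
}

func main() {
	plugin := dbConfig{URL: "redis.internal", Port: 6379} // made-up host
	global := dbConfig{URL: "localhost", Port: 6379}

	if db, ok := effectiveDB(plugin, global); ok {
		fmt.Println("using database at", db.addr())
	}
	if db, ok := effectiveDB(plugin, dbConfig{}); ok {
		fmt.Println("using database at", db.addr())
	}
}
```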

Plugin Setup and Usage Guides

You can control the location plugins are stored on the host by setting the plugins_path in the config.yml file. If not set, it will default to ~/.config/anklet/plugins/.

NOTE: Plugin names MUST be unique across all hosts.

Github Actions

Docker / Containers

Docker images are available at veertu/anklet. You can find the example docker-compose file in the docker directory.

MacOS Daemon

You can find how to automate the installation of Anklet and run a PLIST here.


Metrics

Metrics for monitoring are available at http://127.0.0.1:8080/metrics?format=prometheus. This applies to both handler and receiver plugins, but receivers can differ slightly in what metrics are available. Be sure to check the specific plugin documentation for more information and examples.

Note: If port 8080 is already in use, Anklet will automatically increment the port by 1 until it finds an open port.

  • You can change the port in the config.yml under metrics, like so:

    metrics:
      port: 8080
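The port fallback described in the note above can be pictured with the sketch below: try the configured port and step up by one until a listener binds. This is an illustration of the described behavior, not Anklet's code; listenFrom and its maxTries bound are assumptions.

```go
package main

import (
	"fmt"
	"net"
)

// listenFrom tries startPort, then startPort+1, and so on for at most
// maxTries ports. It returns the first listener that binds and its port.
func listenFrom(host string, startPort, maxTries int) (net.Listener, int, error) {
	var lastErr error
	for i := 0; i < maxTries; i++ {
		port := startPort + i
		ln, err := net.Listen("tcp", fmt.Sprintf("%s:%d", host, port))
		if err == nil {
			return ln, port, nil
		}
		lastErr = err
	}
	return nil, 0, fmt.Errorf("no open port found starting at %d: %v", startPort, lastErr)
}

func main() {
	ln, port, err := listenFrom("127.0.0.1", 8080, 10)
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	fmt.Println("metrics would be served on port", port)
}
```

If you scrape several Anklets on one host, pinning metrics.port per instance avoids having to discover which port each one ended up on.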

Key Names and Descriptions

| Key | Description |
| --- | --- |
| total_running_vms | Total number of running VMs |
| total_successful_runs_since_start | Total number of successful runs since start |
| total_failed_runs_since_start | Total number of failed runs since start |
| total_canceled_runs_since_start | Total number of canceled runs since start |
| plugin_name | Name of the plugin |
| plugin_plugin_name | Name of the plugin |
| plugin_owner_name | Name of the owner |
| plugin_repo_name | Name of the repo |
| plugin_status | Status of the plugin (idle, running, limit_paused, stopped) |
| plugin_last_successful_run_job_url | Last successful run job url of the plugin |
| plugin_last_failed_run_job_url | Last failed run job url of the plugin |
| plugin_last_successful_run | Timestamp of last successful run of the plugin (RFC3339) |
| plugin_last_failed_run | Timestamp of last failed run of the plugin (RFC3339) |
| plugin_status_since | Timestamp of when the plugin was last started (RFC3339) |
| plugin_total_ran_vms | Total number of VMs ran by the plugin |
| plugin_total_successful_runs_since_start | Total number of successful runs since start |
| plugin_total_failed_runs_since_start | Total number of failed runs since start |
| plugin_total_canceled_runs_since_start | Total number of canceled runs since start |
| host_cpu_count | Total CPU count of the host |
| host_cpu_used_count | Total in use CPU count of the host |
| host_cpu_usage_percentage | CPU usage percentage of the host |
| host_memory_total_bytes | Total memory of the host (bytes) |
| host_memory_used_bytes | Used memory of the host (bytes) |
| host_memory_available_bytes | Available memory of the host (bytes) |
| host_memory_usage_percentage | Memory usage percentage of the host |
| host_disk_total_bytes | Total disk space of the host (bytes) |
| host_disk_used_bytes | Used disk space of the host (bytes) |
| host_disk_available_bytes | Available disk space of the host (bytes) |
| host_disk_usage_percentage | Disk usage percentage of the host |

Prometheus

  • If repo is not set, the metrics will not show repo=.
total_running_vms 0
total_successful_runs_since_start 0
total_failed_runs_since_start 0
total_canceled_runs_since_start 1
plugin_status{name=RUNNER1,plugin=github,owner=veertuinc} idle
plugin_last_successful_run{name=RUNNER1,plugin=github,owner=veertuinc,job_url=} 0001-01-01T00:00:00Z
plugin_last_failed_run{name=RUNNER1,plugin=github,owner=veertuinc,job_url=} 0001-01-01T00:00:00Z
plugin_last_canceled_run{name=RUNNER1,plugin=github,owner=veertuinc,job_url=https://github.com/veertuinc/anklet/actions/runs/12325604197/job/34405145636} 0001-01-01T00:00:00Z
plugin_status_since{name=RUNNER1,plugin=github,owner=veertuinc} 2024-12-13T19:24:47-06:00
plugin_total_ran_vms{name=RUNNER1,plugin=github,owner=veertuinc} 0
plugin_total_successful_runs_since_start{name=RUNNER1,plugin=github,owner=veertuinc} 0
plugin_total_failed_runs_since_start{name=RUNNER1,plugin=github,owner=veertuinc} 0
plugin_total_canceled_runs_since_start{name=RUNNER1,plugin=github,owner=veertuinc} 1
plugin_status{name=RUNNER2,plugin=github,owner=veertuinc} idle
plugin_last_successful_run{name=RUNNER2,plugin=github,owner=veertuinc,job_url=} 0001-01-01T00:00:00Z
plugin_last_failed_run{name=RUNNER2,plugin=github,owner=veertuinc,job_url=} 0001-01-01T00:00:00Z
plugin_last_canceled_run{name=RUNNER2,plugin=github,owner=veertuinc,job_url=} 0001-01-01T00:00:00Z
plugin_status_since{name=RUNNER2,plugin=github,owner=veertuinc} 2024-12-13T19:24:47-06:00
plugin_total_ran_vms{name=RUNNER2,plugin=github,owner=veertuinc} 0
plugin_total_successful_runs_since_start{name=RUNNER2,plugin=github,owner=veertuinc} 0
plugin_total_failed_runs_since_start{name=RUNNER2,plugin=github,owner=veertuinc} 0
plugin_total_canceled_runs_since_start{name=RUNNER2,plugin=github,owner=veertuinc} 0
host_cpu_count 12
host_cpu_used_count 1
host_cpu_usage_percentage 12.371134
host_memory_total_bytes 38654705664
host_memory_used_bytes 19959037952
host_memory_available_bytes 18695667712
host_memory_usage_percentage 51.634174
host_disk_total_bytes 994662584320
host_disk_used_bytes 783564165120
host_disk_available_bytes 211098419200
host_disk_usage_percentage 78.776881
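The sample output above uses unquoted label values and non-numeric values such as idle, so a small hand-written parser can be handy when consuming it in your own tooling. The sketch below only handles the line shapes shown in the sample; sample and parseLine are names invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// sample is one parsed metrics line.
type sample struct {
	Name   string
	Labels map[string]string
	Value  string
}

// parseLine parses "name value" and "name{k=v,k=v} value" lines as they
// appear in the example output. Values are kept as strings because some
// are statuses or timestamps rather than numbers.
func parseLine(line string) (sample, error) {
	line = strings.TrimSpace(line)
	s := sample{Labels: map[string]string{}}

	open := strings.Index(line, "{")
	if open == -1 {
		name, value, ok := strings.Cut(line, " ")
		if !ok {
			return s, fmt.Errorf("no value in %q", line)
		}
		s.Name, s.Value = name, strings.TrimSpace(value)
		return s, nil
	}

	end := strings.Index(line, "}")
	if end < open {
		return s, fmt.Errorf("unbalanced braces in %q", line)
	}
	s.Name = line[:open]
	for _, pair := range strings.Split(line[open+1:end], ",") {
		// Split on the first "=" only; the label value may be empty.
		if k, v, ok := strings.Cut(pair, "="); ok {
			s.Labels[k] = v
		}
	}
	s.Value = strings.TrimSpace(line[end+1:])
	return s, nil
}

func main() {
	for _, line := range []string{
		"total_running_vms 0",
		"plugin_status{name=RUNNER1,plugin=github,owner=veertuinc} idle",
	} {
		s, err := parseLine(line)
		if err != nil {
			panic(err)
		}
		fmt.Println(s.Name, s.Labels, s.Value)
	}
}
```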

Metrics Aggregator

In most cases, having each Anklet serve its own metrics is enough for your monitoring needs. However, there are situations where you may need to consume them from a single source instead. The Anklet Aggregator service is designed to do just that.

To enable the aggregator, run an Anklet with the aggregator flag set to true, separate from any other plugins. This starts an Anklet Aggregator service that collects the metrics of all Anklets stored in the database and makes them available at http://{aggregator_url}:{port}/metrics?format=json or http://{aggregator_url}:{port}/metrics?format=prometheus. Here is an example config:

---
work_dir: /Users/nathanpierce/anklet/
pid_file_dir: /tmp/
log:
  # if file_dir is not set, it will be set to current directory you execute anklet in
  file_dir: /Users/nathanpierce/Library/Logs/
global_database_url: localhost
global_database_port: 6379
global_database_user: ""
global_database_password: ""
global_database_database: 0
metrics:
  aggregator: true
  port: 8081 # port to serve aggregator on
  sleep_interval: 10 # how often to fetch metrics from each Anklet defined

You can see that this requires a database to be running. The aggregator will store the metrics in Redis so that it can serve them up without delay.

It's possible to use ENVs instead of the yml file. This is useful if you want to run the Anklet metrics aggregator in Kubernetes. Here is a list of ENVs that you can use:

| ENV | Description |
| --- | --- |
| ANKLET_METRICS_AGGREGATOR | Whether to enable the aggregator (ex: true) |
| ANKLET_METRICS_PORT | Port to serve aggregator on (ex: 8081) |
| ANKLET_METRICS_SLEEP_INTERVAL | How many seconds between fetching metrics from each Anklet url defined |
| ANKLET_METRICS_DATABASE_ENABLED | Whether to enable the database (ex: true) |
| ANKLET_METRICS_DATABASE_URL | URL of the database (ex: localhost) |
| ANKLET_METRICS_DATABASE_PORT | Port of the database (ex: 6379) |
| ANKLET_METRICS_DATABASE_DATABASE | Database to use (ex: 0) |
| ANKLET_METRICS_DATABASE_USER | User to use (ex: "") |
| ANKLET_METRICS_DATABASE_PASSWORD | Password to use (ex: "") |

Finally, here are the example responses of each format:

Prometheus

total_running_vms 0
total_successful_runs_since_start 0
total_failed_runs_since_start 0
total_canceled_runs_since_start 1
plugin_status{name=RUNNER1,plugin=github,owner=veertuinc} idle
plugin_last_successful_run{name=RUNNER1,plugin=github,owner=veertuinc,job_url=} 0001-01-01T00:00:00Z
plugin_last_failed_run{name=RUNNER1,plugin=github,owner=veertuinc,job_url=} 0001-01-01T00:00:00Z
plugin_last_canceled_run{name=RUNNER1,plugin=github,owner=veertuinc,job_url=https://github.com/veertuinc/anklet/actions/runs/12325604197/job/34405145636} 0001-01-01T00:00:00Z
plugin_status_since{name=RUNNER1,plugin=github,owner=veertuinc} 2024-12-13T19:24:47-06:00
plugin_total_ran_vms{name=RUNNER1,plugin=github,owner=veertuinc} 0
plugin_total_successful_runs_since_start{name=RUNNER1,plugin=github,owner=veertuinc} 0
plugin_total_failed_runs_since_start{name=RUNNER1,plugin=github,owner=veertuinc} 0
plugin_total_canceled_runs_since_start{name=RUNNER1,plugin=github,owner=veertuinc} 1
plugin_status{name=RUNNER2,plugin=github,owner=veertuinc} idle
plugin_last_successful_run{name=RUNNER2,plugin=github,owner=veertuinc,job_url=} 0001-01-01T00:00:00Z
plugin_last_failed_run{name=RUNNER2,plugin=github,owner=veertuinc,job_url=} 0001-01-01T00:00:00Z
plugin_last_canceled_run{name=RUNNER2,plugin=github,owner=veertuinc,job_url=} 0001-01-01T00:00:00Z
plugin_status_since{name=RUNNER2,plugin=github,owner=veertuinc} 2024-12-13T19:24:47-06:00
plugin_total_ran_vms{name=RUNNER2,plugin=github,owner=veertuinc} 0
plugin_total_successful_runs_since_start{name=RUNNER2,plugin=github,owner=veertuinc} 0
plugin_total_failed_runs_since_start{name=RUNNER2,plugin=github,owner=veertuinc} 0
plugin_total_canceled_runs_since_start{name=RUNNER2,plugin=github,owner=veertuinc} 0
host_cpu_count 12
host_cpu_used_count 1
host_cpu_usage_percentage 13.112463
host_memory_total_bytes 38654705664
host_memory_used_bytes 20108099584
host_memory_available_bytes 18546606080
host_memory_usage_percentage 52.019797
host_disk_total_bytes 994662584320
host_disk_used_bytes 783569383424
host_disk_available_bytes 211093200896
host_disk_usage_percentage 78.777406

Development

Prepare your environment for development:

brew install go
cd ${REPO_ROOT}
go mod tidy
ln -s ~/.config/anklet/org-config.yml org-config.yml
ln -s ~/.config/anklet/repo-receiver-config.yml repo-receiver-config.yml
LOG_LEVEL=dev go run main.go -c org-receiver-config.yml # run the receiver
LOG_LEVEL=dev go run main.go -c org-config.yml # run the handler
  • NOTE: You'll need to change the webhook URL so it points to the public IP of the server running the receiver (for me, that's my ISP's public IP + open port forwarding to my local machine).

The dev LOG_LEVEL has colored output with text + pretty printed JSON for easier debugging. Here is an example:

[20:45:21.814] INFO: job still in progress {
  "ankaTemplate": "d792c6f6-198c-470f-9526-9c998efe7ab4",
  "ankaTemplateTag": "vanilla+port-forward-22+brew-git",
  "ankletVersion": "dev",
  "jobURL": "https://github.com/veertuinc/anklet/actions/runs/8608565514/job/23591139958",
  "job_id": 23591139958,
  "owner": "veertuinc",
  "plugin": "github",
  "repo": "anklet",
  "serviceName": "RUNNER1",
  "source": {
    "file": "/Users/nathanpierce/anklet/plugins/handlers/github/github.go",
    "function": "github.com/veertuinc/anklet/plugins/handlers/github.Run",
    "line": 408
  },
  "uniqueRunKey": "8608565514:1",
  "vmName": "anklet-vm-83685657-9bda-4b32-84db-6c50ee712268",
  "workflowJobId": 23591139958,
  "workflowJobName": "testJob",
  "workflowName": "t1-with-tag-1",
  "workflowRunId": 8608565514,
  "workflowRunName": "t1-with-tag-1"
}
  • LOG_LEVEL=ERROR go run main.go to see only errors
  • Run each service only once with LOG_LEVEL=dev go run -ldflags "-X main.runOnce=true" main.go

Plugins

Plugins are, currently, stored in the plugins/ directory. They will be moved into external binaries at some point in the future.

Guidelines

Important: Avoid honoring context cancellation in code that must complete before the runner exits. Any VM deletion or database cleanup must be done using functions that are not subject to context cancellation, so that they are allowed to complete.

If your plugin has any required files stored on disk, you should keep them in ~/.config/anklet/plugins/{plugin-name}/. For example, github requires three bash files to prepare the github actions runner in the VMs. They are stored on each host:

❯ ll ~/.config/anklet/plugins/handlers/github
total 0
lrwxr-xr-x  1 nathanpierce  staff    61B Apr  4 16:02 install-runner.bash
lrwxr-xr-x  1 nathanpierce  staff    62B Apr  4 16:02 register-runner.bash
lrwxr-xr-x  1 nathanpierce  staff    59B Apr  4 16:02 start-runner.bash

Each plugin must have a {name}.go file with a Run function that takes in context.Context, logger *slog.Logger, etc. See the github plugin for an example.

The Run function should be designed to run multiple times in parallel. It should not rely on any state from the previous runs.

  • Always return out of Run so the sleep interval and main.go can handle the next run properly with new context. Never loop inside of the plugin code.
  • Should never panic but instead throw an ERROR and return. The github plugin has a go routine that loops and watches for cancellation, which then performs cleanup before exiting in all situations except for crashes.
  • It's critical that you check for context cancellation after/before important logic that could orphan resources.

Handling Metrics

Plugins run within a worker context. Each plugin also has a separate plugin context storing its Name, etc. The metrics for the Anklet instance are stored in the worker context so they can be accessed by any of the plugins. Plugins should update their own metrics at the various phases.

For example, the github plugin will update the metrics for the plugin it is running in to be running, pulling, and idle when it is done or has yet to pick up a new job. To do this, it uses metrics.UpdatePlugin with the worker and plugin context. See github plugin for an example.

metrics.UpdateService can also update things like LastSuccess and LastFailure. See metrics.UpdateService for more information.

FAQs

  • Can I guarantee that the logs for Anklet will contain the anklet (and all plugins) shut down message?
    • No. An error that is not thrown from inside of a plugin is not guaranteed to trigger a graceful shutdown.

Copyright

All rights reserved, Veertu Inc.