
Status

Current status (build, version, etc.) of the container on Docker Hub.

Requirements

google/cadvisor

Usage

Using this container does require some configuration. Collectd, by its very nature, has fairly target-specific configurations; generally speaking, it is not possible to apply a single configuration to an entire infrastructure.

Initially, a mounted volume is used to provide the configurations, which reside outside the container. This keeps the container generic, but it does require that the configurations themselves be distributed as part of whatever configuration management or orchestration methodology applies (manual, Ansible, Puppet, Chef, Salt, etc.).

The mounted volume requirement will become optional as configuration support via etcd and Consul is added.

The fundamental steps in using the container are as follows (a rough shell sketch of steps 1-3 appears after the list):

  1. Clone the repository
  2. Perform site-wide configuration changes
    1. Renaming the examples to be used throughout the infrastructure
    2. Integrating or templating configuration files
  3. Deploy the etc-collectd directory to target hosts
  4. Perform host-specific configuration updates to Collectd, the CAdvisor plugin, and optionally the Mesos plugin or any other Collectd plugins to be used.
  5. Start the cadvisor and cadvisor-collectd containers.
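
The following sketch illustrates steps 1 through 3. The repository URL, example file names, and target host are assumptions and should be adjusted for your environment.

# clone the repository (URL assumed)
git clone https://github.com/maier/cadvisor-collectd.git
cd cadvisor-collectd

# rename/adjust the example configurations used site-wide
# (file name is illustrative; use the examples shipped in etc-collectd)
cp etc-collectd/cadvisor.yaml.example etc-collectd/cadvisor.yaml

# deploy the etc-collectd directory to a target host (host and path assumed)
scp -r etc-collectd vagrant@target-host:/home/vagrant/etc-collectd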

Configuration

  1. Collectd (required)
  2. CAdvisor collector (required)
  3. Mesos collector (optional)
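
As a sketch of what the deployed directory contains (cadvisor.yaml is referenced in the troubleshooting section below; the other file names are assumptions and may differ from the repository's examples):

# hypothetical contents of the deployed etc-collectd directory
ls /home/vagrant/etc-collectd
# collectd.conf    core Collectd configuration (required)
# cadvisor.yaml    CAdvisor collector configuration (required)
# mesos.yaml       Mesos collector configuration (optional)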

Starting

Command line

# if 'etc-collectd' were deployed to /home/vagrant/etc-collectd

sudo docker run --name=cadvisor \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:rw \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  -d google/cadvisor:latest

sudo docker run --name=collectd \
  -v /home/vagrant/etc-collectd:/etc/collectd \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -d maier/cadvisor-collectd:latest
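
A quick sanity check after starting both containers (standard Docker commands, nothing specific to this project):

# confirm both containers are running and check their logs for errors
sudo docker ps
sudo docker logs cadvisor
sudo docker logs collectd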

Systemd units

The collectd.service and cadvisor.service unit files from this repository can be used as a starting point. Note: modify the collectd unit file so that the path for etc-collectd points to where the configuration files are actually located (the default is /conf/etc-collectd).
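
A minimal sketch of adjusting and installing the units, assuming the collectd unit references the default /conf/etc-collectd path and the configurations were actually deployed to /home/vagrant/etc-collectd:

# point the collectd unit at the actual configuration path
sed -i 's|/conf/etc-collectd|/home/vagrant/etc-collectd|' collectd.service

# install, enable, and start both units
sudo cp cadvisor.service collectd.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable cadvisor.service collectd.service
sudo systemctl start cadvisor.service collectd.service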

Troubleshooting

  1. shell access
    • CAdvisor: docker exec -it cadvisor /bin/sh (BusyBox-based; use opkg-install to add additional packages)
    • Collectd: docker exec -it collectd /bin/sh (Alpine-based; use apk add to add additional packages)
  2. verify docker socket in collectd container
    • docker exec -it collectd /bin/sh
    • apk update && apk add socat
    • echo -e "GET /containers/json HTTP/1.1\r\n" | socat unix-connect:/var/run/docker.sock -
  3. verify cadvisor (from host)
    • curl -s "$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' cadvisor):8080/api/v2.0/machine" | python -m json.tool
  4. list the subcontainers of cadvisor's /system.slice (from the host); useful when editing the system_services: list in cadvisor.yaml
    • curl -s "$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' cadvisor):8080/api/v1.3/containers/system.slice" | python -c 'import json,sys,pprint;obj=json.load(sys.stdin);pprint.pprint(obj["subcontainers"]);'

Metric Continuity

Goal: maintain continuity of metrics for a given abstracted context across its physical manifestations.

As our operating environments become more and more dynamic, maintaining metric and monitoring integrity for the infrastructure becomes increasingly complex and difficult. The advent of microservices, containers, IaaS, PaaS, infrastructure virtualization, etc. makes a host-centric view of the operating environment dysfunctional in a metrics context. Any given instance of a service may start or stop, migrate to a different host, migrate to a different DC, and so on. Tying metrics to the running state or to the physical assets on which the instance is or was running disrupts the ability to maintain a continuous stream of metric information about the service instance itself.

Consider serviceA instance1 (a1) and serviceB instance1 (b1) running across host1, host2, and host3. In the current world, most tools simply assume that any metrics from a1 or b1 belong to whichever host the instance is physically running on. It is further assumed that if the instance stops, that is the definitive end of its life. If a1 moves to host2, the metrics for a1 from host1 simply stop, and the metrics from a1 on host2 start and appear as new. Maintaining metric continuity for a1 has just become more complex, and the complexity only grows from this point: more hosts, more services, more instances, backends treating the metric names differently, and so on.

To help work around these issues, there are fledgling namespace capabilities within the cadvisor plugin configuration which allow manipulating the host and plugin portions of the Collectd metric name. Collectd also provides a capability called chains, which can apply transformations to metric names. Leveraging Collectd's chains may become an option in a later release of cadvisor-collectd; chains are a complex feature and difficult to get working correctly.

The purpose of the namespacing, which can be configured in the cadvisor and mesos plugins, is to help maintain the continuity of the metrics stream from the perspective of the abstraction (e.g. an instance of a service) regardless of where the stream originates at any point in time. This is a relatively naive implementation, striving to provide the most minimal solution for one specific use case.
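
To make the idea concrete: a Collectd metric identifier has the form host/plugin-plugin_instance/type-type_instance. The sketch below is purely illustrative (the exact names emitted by the cadvisor plugin are not documented here); it shows how overriding the host portion keeps the identifier stable when a1 moves between hosts:

# default, host-centric naming: the stream breaks when the instance moves
#   host1/cadvisor-a1/cpu-user  ->  host2/cadvisor-a1/cpu-user
# with the host portion namespaced to the service instance, the identifier
# stays the same regardless of which host is currently reporting
#   serviceA.a1/cadvisor/cpu-user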