Skip to content

quortex/influxdb-athena-crawler

Repository files navigation

influxdb-athena-crawler

An AWS Athena crawler for InfluxDB.

Overview

This project is a utility designed to get AWS Athena results (CSV objects stored in AWS S3), parse them and write InfluxDB points.

Prerequisites

AWS

To be used with AWS and interact with the s3 bucket, an AWS account with the following permissions on s3 is required (note that s3:DeleteObject is only required if clean-objects is set):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "<BUCKET_NAME>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListObjects", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "<BUCKET_NAME>/*"
    }
  ]
}

Installation

Helm (Kubernetes install)

Follow influxdb-athena-crawler documentation for Helm deployment here.

Configuration

Optional args

influxdb-athena-crawler takes as argument the parameters below.

Key Description Default
region The AWS region. ""
bucket The AWS bucket to watch. ""
prefix The bucket prefix. ""
suffix Filename suffix to restrict files processed on the bucket. ""
clean-objects Whether to delete S3 objects after processing them. false
max-object-age How long to wait since last modification before file cleaning. 10m
timeout The global timeout. "30s"
influx-server The InfluxDB server address. ""
influx-token The InfluxDB token. ""
influx-org The InfluxDB org to write to. ""
influx-bucket The InfluxDB bucket write to. ""
measurement A measurement acts as a container for tags, fields, and timestamps. Use a measurement name that describes your data. ""
timestamp-row The timestamp row in CSV. "timestamp"
timestamp-layout The layout to parse timestamp. "2006-01-02T15:04:05.000Z"
tag Tags to add to InfluxDB point. Could be of the form --tag=foo if tag name matches CSV row or --tag='foo={row:bar}' to specify row. ""
field Fields to add to InfluxDB point. Could be of the form --field='foo={type:int,row:bar}', if not specified, CSV row matches field name. Type can be float, int, string or bool. ""
max-routines The max number of concurrent object processing routines. 100

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

Versioning

We use SemVer for versioning.

Help

Got a question? File a GitHub issue.