Vector Remap Language: Guiding Design Principles

This document describes the high-level goals and directions of the Vector Remap Language (VRL). It is intended to help guide discussions and decisions made during the development of VRL itself. The document is less relevant to users of VRL; the language documentation exists for that audience.

The Zen of VRL

  • Safety and performance over ease of use. VRL programs facilitate data processing for mission critical production systems. They are run millions of times per day for large Vector users and usually written once. Therefore, safety and performance are prioritized over developer ease of use.

  • The best VRL program is the one that most clearly expresses its output. VRL is an expression-oriented DSL designed to express data transformations. It is not a programming language. Users should not sacrifice the clarity of their data transformation for things like performance. The best VRL program is the one that most clearly describes the intended output. There should be no "tricks" to making VRL fast that take away from readability.

Why VRL

VRL exists to meet the need for a safe and performant domain-specific language (DSL) to remap observability data.

Its purpose is to hit a sweet spot between Turing-complete languages such as Lua, and static transforms such as Vector's old rename_fields. It should be flexible enough to cover most remapping use-cases, without taking on the downsides that come with the flexibility of a full programming language, such as poor readability and performance.

See the introduction blog post for more details on the why.

Target Audience

VRL has the specialized purpose of solving observability data remapping needs. Because of this purpose, the language is mostly used by people who manage infrastructure at their organizations. The role of this group is usually referred to as "operator", "devops" (Development and Operations) or "SRE" (Site Reliability Engineer).

One common generalization of this group is that they are focused on maintaining infrastructure within an organization, and often write and maintain their own software to achieve this goal. They are adept enough at programming to achieve their goals, but have no need or desire to be as skilled in programming as dedicated software engineers, because their time is best spent elsewhere.

As with everything, there are exceptions to the rule, and many people in this group are highly skilled software engineers, but VRL must capture the largest possible segment of this group, and should therefore be limited in complexity.

Therefore, when extending the feature set of VRL, design the feature for the intended target audience, which will likely mean choosing different trade-offs than you'd make if you were to design the feature for your personal needs.

Language Limits

There are a number of features that we've so far declined to implement in the language. This might change in the future, but there should be a good reason to do so.

  • modules (see: #5507)

    So far, we've had no need to split up functions over multiple modules. The function naming rules make it so that most functions are already grouped by their logical usage patterns.

  • classes

    Given that VRL is a simple DSL, and that any indirection in a program's source can lead to confusion, we've decided against introducing the concept of classes, and instead focused on the usage of function calls to solve operator needs.

  • user-defined functions

    User-defined functions again produce indirection. While they might be useful for some extremely large programs, in most cases it is clearer, in the context within which VRL is used, to let people read a program from top to bottom without having to jump around.

  • network calls (see #4517 and #8717)

    In order to avoid performance footguns, we want to ensure each function is as performant as it can be, and that there's no way to use functions such that the performance of a VRL program tanks. We might introduce network calls at some point, if we find a good caching solution that addresses most of our concerns, but so far we've avoided any network calls inside our stdlib.

  • assignable closures (see #9001)

    While we do support closures, they are tied to function calls, and cannot be used elsewhere. This also means closures cannot be assigned to variables and re-used between function calls. This decision was made because assignable closures can lead to poorly performing code (to the extent of introducing infinite loops), and make code harder to reason about. While this is undoubtedly a powerful feature to have, the pros do not outweigh the cons.
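To illustrate the closure restriction above, here is a minimal sketch of a closure attached directly to a function call. It assumes an event with a `.tags` object field, which is a hypothetical name chosen for this example.

```vrl
# The closure is part of the `map_values` call itself. It cannot be
# assigned to a variable or passed to another function afterwards.
# `.tags` is a hypothetical field used only for illustration.
.tags = map_values(object!(.tags)) -> |value| {
  if is_string(value) { upcase!(value) } else { value }
}
```

Because the closure only exists for the duration of the call, the reader never has to look elsewhere in the program to find out what it does.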

Conventions

Performance

The performance of VRL is a cornerstone of the language. However, given the target audience, and the goal to make the language as simple and straightforward as possible, there are always trade-offs to be made when considering the performance implications of a feature or design decision.

Copying Data

For example, VRL is an expression-oriented language, and we favor returning a new copy of a piece of manipulated data over mutating the underlying data.

That is, you write this:

# explicitly assign the parsed JSON to `.message`
.message = parse_json(.message)

Instead of this:

# mutate `.message` in place
parse_json(.message)

While this costs some performance, it makes VRL programs easier to reason about, which in this particular case weighs more heavily than the performance implications.

Program Optimizations

We plan on introducing an "optimization step" to the compiler in the future, which would allow us to rewrite parts of a program into a more optimized variant, without the operator having to worry about these transformations.

This means that if there's a reasonable path forward towards optimizing certain language constructs internally, we should not burden the operator with applying these optimizations manually.

For example, we favor composition over single-purpose functions, and while each individual function call adds extra performance overhead, the thought is that in the future, we can optimize multiple function calls into a single call (inlining), without having to expose this optimization technique to operators.

Diagnostics

Diagnostic messages shown by the compiler during compilation are one of the biggest tools we expose to operators to help them write correct programs.

Adding diagnostic messages to new and existing features in VRL should never be an afterthought.

Syntax

Keep VRL as syntax-light as possible. The fewer symbols, the more readable a VRL program is.

Use functions whenever possible, and only introduce new syntax if a common pattern warrants a more convenient syntax-based solution, or functions are too limited in their capability to solve the required need.

Fallibility

A VRL program must be infallible by default once the compiler generates a program from the provided source. This means that any expression that can result in an error (adding a string and an object, dividing by zero, calling a fallible function) must be explicitly handled before the compiler accepts the input source.

The fallibility system is an important part of the language and its goals, and is also the part that most often trips people up. This chapter tries to shed some light on its inner workings.

Fallible Expressions

When the VRL compiler compiles a source to a valid program, it asks each expression whether it can fail at runtime. If it can, the compiler refuses to compile the program until the operator handles the failure case.

For example:

. = parse_json(.message)

The above program can fail at runtime, because there's no guarantee the message field contains a JSON-encoded string.

The operator needs to handle the failure case using one of the available failure-handling features in VRL.
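As a sketch of what handling the failure case can look like, the snippet below shows three common approaches: error assignment, error coalescing, and the `!` variant of the call. They are alternatives, and a real program would normally pick one. The `.parse_error` field is a hypothetical name chosen for this example.

```vrl
# 1. Error assignment: capture the error and branch on it.
parsed, err = parse_json(.message)
if err != null {
  .parse_error = err
} else {
  .message = parsed
}

# 2. Error coalescing: fall back to a default value on failure.
.message = parse_json(.message) ?? {}

# 3. Abort the program at runtime if the call fails.
.message = parse_json!(.message)
```

The first two keep the program infallible. The third explicitly opts in to a runtime failure.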

Type Checking

In addition to expressions being fallible, the type checker also considers a program fallible if the type expected by an expression cannot be guaranteed at compile-time.

For example:

.message = "log message: " + .log

In this case, the log field type cannot be determined at compile-time, and thus concatenating the field value with a string might fail (e.g. if it's an array, or any other type that cannot be combined with a string).

This too needs to be handled at compile-time by the operator.
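One way to satisfy the type checker here, sketched under the assumption that an empty string is an acceptable fallback, is to establish the type of `.log` before concatenating:

```vrl
# `string` fails when its argument is not a string, so the result of the
# coalescing expression is guaranteed to be a string.
log_line = string(.log) ?? ""

# Both operands are now known to be strings, so `+` cannot fail.
.message = "log message: " + log_line
```

If a missing or non-string `.log` should stop processing instead, `string!(.log)` asserts the type and aborts at runtime when the assertion does not hold.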

Progressive Type Checking

A quirk in the type checker is that arguments passed to functions are not type checked individually. Instead, the function call itself can be marked fallible if any of its arguments do not adhere to the expected type.

For example:

upcase(.message)

Upcasing a string is an infallible operation. However, because we can't guarantee that the message field will actually be a string, the program is still rejected: the function call as a whole is marked as fallible.

This decision was made for ergonomics purposes.
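A minimal sketch of how an operator can resolve this, using a hypothetical `.host` field for the second variant:

```vrl
# Guarantee the argument type first. `upcase` then receives a known
# string and the call itself is infallible.
.message = upcase(string!(.message))

# Or treat the whole call as fallible and provide a fallback value.
# `.host` is a hypothetical field used only for illustration.
.host = upcase(.host) ?? "unknown"
```

In both cases the operator handles one failure point per call, rather than one per argument.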

Errors

  • All errors are caught at compile-time by the compiler (see "fallibility" chapter).

  • The only exception to this rule is if the operator explicitly allows a function to fail the program at runtime (e.g. safe_call() vs unsafe_call!()).

  • A function should be marked as "fallible" if its internal implementation can fail.

  • A function should not be marked as fallible if it receives the wrong argument type. This is handled by the compiler.

  • Errors should contain explicit messages detailing what went wrong, and how the operator can solve the problem.

Functions

Composition

  • Favor function composition over single-purpose functions.

  • If a problem needs to be solved in multiple steps, consider adding single-purpose functions for each individual step.

  • If usability or readability is hurt by composition, favor single-purpose functions.
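As an illustration of the composition guideline above, the sketch below decodes and parses a Base64-encoded JSON payload by combining two existing functions, rather than relying on a dedicated single-purpose function for that combination. The `.payload` field is a hypothetical name.

```vrl
# Compose `decode_base64` and `parse_json` instead of adding a
# single-purpose function that does both.
.payload = parse_json!(decode_base64!(.payload))
```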

Naming

  • Function names are lower-cased (e.g. round).

  • Multi-name functions use underscore (_) as a separator (e.g. encode_key_value).

  • Favor explicit verbose names over terse ones (e.g. parse_aws_cloudwatch_log_subscription_message over parse_aws_cwl_msg).

  • Functions should be prefixed with their function category for organization and discovery.

    • Use parse_* for functions that decode a string to another type of data (e.g. parse_json and parse_grok).

    • Use decode_* for string to string decoding functions (e.g. decode_base64).

    • Use encode_* for string encoding functions (e.g. encode_base64).

    • Use to_* to convert from one type to another (e.g. to_bool).

    • Use is_* to determine if the provided value is of a given type (e.g. is_string or is_json).

    • Use format_* for string formatting functions (e.g. format_timestamp and format_number).

    • Use get_* for functions that return a single result, or an error if zero or more than one result is found.

    • Use find_* when multiple possible results are returned in an array.

Return Types

  • Return boolean from is_* functions (e.g. is_string).

  • Return a string from encode_* functions (e.g. encode_base64).

  • Return an error whenever the function can fail at runtime.

Mutability

As a general rule, functions never mutate values.

Favor this:

# explicitly assign the parsed JSON to `.message`
.message = parse_json(.message)

Over this:

# mutate `.message` in place
parse_json(.message)

There are exceptions to this rule (such as the del function), but they are limited, and additional exceptions should be well reasoned.
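For example, `del` removes a path from the event and returns the removed value, which makes it useful for renaming a field. The field names below are hypothetical.

```vrl
# `del` mutates the event by removing `.username` and returns its value,
# which is then assigned to `.user`.
.user = del(.username)
```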

Fallibility

  • A function must be marked as fallible if it can fail in any way.

  • A function should be designed with the goal of making it infallible.

  • A function must not hide fallibility for the sake of pursuing the previous rule.

  • A function implementation may assume it receives the argument type it has defined.

  • parse_* functions should almost always error when used incorrectly.

  • get_* functions should fail when they can't find a single result to return.

  • to_* functions must never fail.

Signatures

  • Functions can have zero or more parameters (e.g. now() and assert(true)).

  • Argument naming follows the same conventions as function naming.

  • For one or more parameters, the first parameter must be the "value" the function is acting on (e.g. parse_regex(value: <string>, pattern: <regex>)).

  • The first parameter must therefore almost always be named value.

  • The exception to this is when you're dealing with an actual VRL path (e.g. del(path)) or in special cases such as assert.
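A short sketch of these signature conventions in use. The regular expression and the target field names are assumptions made for the example.

```vrl
# Positional form: the first argument is the value being acted on.
.status = parse_regex!(.message, r'^(?P<status>\d{3}) ')

# Named form: the same call with the parameter names spelled out.
.parsed = parse_regex!(value: .message, pattern: r'^(?P<status>\d{3}) ')
```

Keeping the acted-on value in the first position lets most calls be read the same way, regardless of which function is being used.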

Patterns

The following is a list of patterns we've experienced while writing and using VRL. This section of the document is intended to be updated frequently with new insights.

These insights are meant to guide future design decisions, and shape our thinking of the language as it matures and we learn more about its strengths and weaknesses from our users.

Error Chaining

Observability data can be structured in unexpected ways. A common pattern is to try to decode the data in one way, only to try a different decoder if the first one failed.

This pattern uses function calls and error coalescing to achieve its goal:

data = parse_json(.message) ??
       parse_nginx_log(.message) ??
       parse_apache_log(.message) ??
       { "error": "invalid data format" }