From 41503b4531f052357fd16fabf7003af4f74ab87b Mon Sep 17 00:00:00 2001 From: Robert Fekete Date: Sat, 12 Oct 2024 11:13:30 +0200 Subject: [PATCH] Parser-related review fixes, part 1 --- content/filterx/filterx-parsing/csv/_index.md | 10 +++++----- .../csv/reference-parsers-csv/_index.md | 18 +++++++++--------- .../filterx-parsing/key-value-parser/_index.md | 6 +++--- .../kv-parser-options/_index.md | 2 +- .../chunk/csv-parser-multiple-delimiters.md | 4 ++-- .../headless/wnt/n-kv-parser-repeated-keys.md | 2 +- 6 files changed, 21 insertions(+), 21 deletions(-) diff --git a/content/filterx/filterx-parsing/csv/_index.md b/content/filterx/filterx-parsing/csv/_index.md index 9a5c4be7..a1a0ea58 100644 --- a/content/filterx/filterx-parsing/csv/_index.md +++ b/content/filterx/filterx-parsing/csv/_index.md @@ -7,11 +7,11 @@ weight: 400 {{< include-headless "chunk/filterx-experimental-banner.md" >}} -The `parse_csv` FilterX function can separate parts of log messages (for example, the contents of the `${MESSAGE}` macro) at delimiter characters or strings to named fields (columns). +The `parse_csv` FilterX function can separate parts of log messages (that is, the contents of the `${MESSAGE}` macro) along delimiter characters or strings into lists, or key-value pairs within dictionaries, using the CSV (comma-separated values) parser. Usage: `parse_csv(, columns=json_array, delimiter=string, string_delimiters=json_array, dialect=string, strip_whitespace=boolean, greedy=boolean)` -Only the input string is mandatory. +Only the input parameter is mandatory. If the `columns` option is set, `parse_csv` returns a [dictionary]({{< relref "/filterx/_index.md#json" >}}) with the column names (as keys) and the parsed values. If the [`columns`]({{< relref "/filterx/filterx-parsing/csv/reference-parsers-csv/_index.md#columns" >}}) option isn't set, `parse_csv` returns a list.
@@ -53,7 +53,7 @@ Here is a sample message: 192.168.1.1 - - [31/Dec/2007:00:17:10 +0100] "GET /cgi-bin/example.cgi HTTP/1.1" 200 2708 "-" "curl/7.15.5 (i4 86-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5" 2 example.mycompany ``` -To parse such logs, the delimiter character is set to a single whitespace (`delimiter=" "`). Whitespaces are stripped. +To parse such logs, the delimiter character is set to a single whitespace (`delimiter=" "`). Excess leading and trailing whitespace characters are stripped. ```shell block filterx p_apache() { @@ -71,7 +71,7 @@ block filterx p_apache() { }; ``` -The results can be used for example, to separate log messages into different files based on the APACHE.USER_NAME field. If the field is empty, the `nouser` name is assigned. +The results can be used, for example, to separate log messages into different files based on the APACHE.USER_NAME field. If the field is empty, the `nouser` string is assigned as the default. ```shell log { @@ -86,7 +86,7 @@ destination d_file { ## Segment a part of a message {#example-parser-multiple} -You can use multiple parsers to split a part of an already parsed message into further segments. The following example splits the timestamp of a parsed Apache log message into separate fields. Note that the [scoping of FilterX variables]({{< relref "/filterx/_index.md#scoping" >}}) is important: +You can use multiple parsers in a layered manner to split parts of an already parsed message into further segments. The following example splits the timestamp of a parsed Apache log message into separate fields. Note that the [scoping of FilterX variables]({{< relref "/filterx/_index.md#scoping" >}}) is important: - If you add the new parser to the FilterX block used in the [previous example](#example-parser-apache), every variable is available.
- If you use a separate FilterX block, only global variables and name-value pairs (variables with names starting with the `$` character) are accessible from the block. diff --git a/content/filterx/filterx-parsing/csv/reference-parsers-csv/_index.md b/content/filterx/filterx-parsing/csv/reference-parsers-csv/_index.md index 7dc3883a..ee29b87b 100644 --- a/content/filterx/filterx-parsing/csv/reference-parsers-csv/_index.md +++ b/content/filterx/filterx-parsing/csv/reference-parsers-csv/_index.md @@ -13,7 +13,7 @@ The `parse_csv` FilterX function has the following options. | Synopsis: | `columns=["1st","2nd","3rd"]` | | Default value: | N/A | -*Description:* Specifies the names of the columns in a JSON array. +*Description:* Specifies the names of the columns in a JSON array. These names become the keys of the resulting dictionary. - If the `columns` option is set, `parse_csv` returns a dictionary with the column names (as keys) and the parsed values. - If the [`columns`]({{< relref "/filterx/filterx-parsing/csv/reference-parsers-csv/_index.md#columns" >}}) option isn't set, `parse_csv` returns a list. @@ -25,10 +25,10 @@ The `parse_csv` FilterX function has the following options. | Synopsis: | `delimiter=""` | | Default value: | `,` | -*Description:* The delimiter is the character that separates the columns in the input string. If you specify multiple characters, every character will be treated as a delimiter. Note that the delimiters aren't included in the column values. For example: +*Description:* The delimiter parameter contains the characters that separate the columns in the input string. If you specify multiple characters, every character will be treated as a delimiter. Note that the delimiters aren't included in the column values. For example: - To separate the text at every hyphen (-) and colon (:) character, use `delimiter="-:"`. -- To separate the columns at the tabulator (tab character), specify `delimiter="\\t"`.
+- To separate the columns along the tabulator (tab character), specify `delimiter="\\t"`. - To use strings instead of characters as delimiters, see [`string_delimiters`](#string-delimiters). {{< include-headless "chunk/csv-parser-multiple-delimiters.md" >}} @@ -40,7 +40,7 @@ The `parse_csv` FilterX function has the following options. | Synopsis: | `dialect=""` | | Default value: | `escape-none` | -*Description:* Specifies how to handle escaping in the parsed strings. +*Description:* Specifies how to handle escaping in the input strings. The following values are available. @@ -53,7 +53,7 @@ The following values are available. | Synopsis: | `greedy=true` | | Default value: | `false` | -If the `greedy` option is enabled, {{% param "product.name" %}} adds the not-yet-parsed part of the message to the last column, ignoring any delimiters that may appear in this part of the message. You can use this option to process messages where the number of columns varies. +If the `greedy` option is enabled, {{% param "product.name" %}} adds the remaining part of the message to the last column, ignoring any delimiters that may appear in this part of the message. You can use this option to process messages where the number of columns varies from message to message. For example, you receive the following comma-separated message: `example 1, example2, example3`, and you segment it with the following parser: @@ -61,7 +61,7 @@ For example, you receive the following comma-separated message: `example 1, exam my-parsed-values = parse_csv(${MESSAGE}, columns=["COLUMN1", "COLUMN2", "COLUMN3"], delimiter=","); ``` -The `COLUMN1`, `COLUMN2`, and `COLUMN3` variables will contain the strings `example1`, `example2`, and `example3`, respectively. 
If the message looks like `example 1, example2, example3, some more information`, then any text appearing after the third comma (that is, `some more information`) is not parsed, and possibly lost if you use only the parsed columns to reconstruct the message (for example, if you send the columns to different columns of an database). +The `COLUMN1`, `COLUMN2`, and `COLUMN3` variables will contain the strings `example1`, `example2`, and `example3`, respectively. If the message looks like `example 1, example2, example3, some more information`, then any text appearing after the third comma (that is, `some more information`) is not parsed, and thus possibly lost if you use only the parsed columns to reconstruct the message (for example, if you send the columns to different columns of a database table). Using the `greedy=true` flag will assign the remainder of the message to the last column, so that the `COLUMN1`, `COLUMN2`, and `COLUMN3` variables will contain the strings `example1`, `example2`, and `example3, some more information`. @@ -84,10 +84,10 @@ my-parsed-values = parse_csv(${MESSAGE}, columns=["COLUMN1", "COLUMN2", "COLUMN3 | --------- | ------------------------------------------------ | | Synopsis: | `string_delimiters=json_array(["first-string","2nd-string"])` | -*Description:* If you have to use a string as a delimiter, list your string delimiters as a JSON array in the `string_delimiters=["", "", ...]` option. +*Description:* In case you have to use a string as a delimiter, list your string delimiters as a JSON array in the `string_delimiters=["", "", ...]` option. -By default, the `parse_csv` FilterX function uses the comma as a delimiter. If you want to use only strings as delimiters, you have to disable the space delimiter, for example: `delimiter="", string_delimiters=[""])` +By default, the `parse_csv` FilterX function uses the comma as a delimiter. 
If you want to use only strings as delimiters, you have to disable the default comma delimiter, for example: `delimiter="", string_delimiters=[""]` -Otherwise, {{% param "product.abbrev" %}} will use the string delimiters in addition to the default character delimiter, so `string_delimiters=["=="]` actually equals `delimiters=",", string_delimiters=["=="]`, and not `delimiters="", string_delimiters=["=="]` +Otherwise, {{% param "product.abbrev" %}} will use the string delimiters in addition to the default character delimiter, so for example, `string_delimiters=["=="]` is actually equivalent to `delimiter=",", string_delimiters=["=="]`, and not `delimiter="", string_delimiters=["=="]`. {{< include-headless "chunk/csv-parser-multiple-delimiters.md" >}} diff --git a/content/filterx/filterx-parsing/key-value-parser/_index.md b/content/filterx/filterx-parsing/key-value-parser/_index.md index a6ae3a85..79f7a006 100644 --- a/content/filterx/filterx-parsing/key-value-parser/_index.md +++ b/content/filterx/filterx-parsing/key-value-parser/_index.md @@ -6,7 +6,7 @@ weight: 1100 {{< include-headless "chunk/filterx-experimental-banner.md" >}} -The `parse_kv` FilterX function can separate a string consisting of whitespace or comma-separated `key=value` pairs (for example, Postfix log messages). You can also specify other separator character instead of the equal sign, for example, colon (`:`) to parse MySQL log messages. The {{% param "product.abbrev" %}} application automatically trims any leading or trailing whitespace characters from the keys and values, and also parses values that contain unquoted whitespace. +The `parse_kv` FilterX function can split a string consisting of whitespace or comma-separated `key=value` pairs (for example, Postfix log messages). You can also specify other value separator characters instead of the equal sign, for example, colon (`:`) to parse MySQL log messages.
The {{% param "product.abbrev" %}} application automatically trims any leading or trailing whitespace characters from the keys and values, and also parses values that contain unquoted whitespace. {{< include-headless "wnt/n-kv-parser-repeated-keys.md" >}} @@ -23,7 +23,7 @@ The names of the keys can contain only the following characters: numbers (0-9), Usage: `parse_kv(, value_separator="=", pair_separator=",", stray_words_key="stray_words")` -The `value_separator` must be a single-character string. The `pair_separator` must be a string. +The `value_separator` must be a single-character string. The `pair_separator` can be a regular string. ## Example @@ -39,7 +39,7 @@ filterx { }; ``` -You can set the separator character between the key and the value to parse for example, `key:value` pairs, like MySQL logs: +You can set the value separator character (the character between the key and the value) to parse for example, `key:value` pairs, like MySQL logs: ```shell Mar 7 12:39:25 myhost MysqlClient[20824]: SYSTEM_USER:'oscar', MYSQL_USER:'my_oscar', CONNECTION_ID:23, DB_SERVER:'127.0.0.1', DB:'--', QUERY:'USE test;' diff --git a/content/filterx/filterx-parsing/key-value-parser/kv-parser-options/_index.md b/content/filterx/filterx-parsing/key-value-parser/kv-parser-options/_index.md index 8a20d577..ec772c79 100644 --- a/content/filterx/filterx-parsing/key-value-parser/kv-parser-options/_index.md +++ b/content/filterx/filterx-parsing/key-value-parser/kv-parser-options/_index.md @@ -42,5 +42,5 @@ Specifies the character that separates the keys from the values. 
Default value: For example, to parse `key:value` pairs, use: ```shell -${MESSAGE} = parse_kv("key1:value1;key2:value2", value_separator=":"); +${MESSAGE} = parse_kv("key1:value1,key2:value2", value_separator=":"); ``` diff --git a/content/headless/chunk/csv-parser-multiple-delimiters.md b/content/headless/chunk/csv-parser-multiple-delimiters.md index c1e1da63..1a1a2353 100644 --- a/content/headless/chunk/csv-parser-multiple-delimiters.md +++ b/content/headless/chunk/csv-parser-multiple-delimiters.md @@ -8,5 +8,5 @@ If you use more than one delimiter, note the following points: - {{% param "product.abbrev" %}} will split the message at the nearest possible delimiter. The order of the delimiters in the configuration file does not matter. - You can use both string delimiters and character delimiters in a parser. -- The string delimiters can include characters that are also used as character delimiters. -- If a string delimiter and a character delimiter both match at the same position of the message, {{% param "product.abbrev" %}} uses the string delimiter. +- The string delimiters may include characters that are also used as character delimiters. +- If a string delimiter and a character delimiter both match at the same position of the input, {{% param "product.abbrev" %}} uses the string delimiter. diff --git a/content/headless/wnt/n-kv-parser-repeated-keys.md b/content/headless/wnt/n-kv-parser-repeated-keys.md index 12046e3c..b5c5fd7a 100644 --- a/content/headless/wnt/n-kv-parser-repeated-keys.md +++ b/content/headless/wnt/n-kv-parser-repeated-keys.md @@ -3,6 +3,6 @@ {{% alert title="Note" color="info" %}} -If a log message contains the same key multiple times (for example, `key1=value1, key2=value2, key1=value3, key3=value4, key1=value5`), then {{% param "product.abbrev" %}} stores only the last (rightmost) value for the key. Using the previous example, {{% param "product.abbrev" %}} will store the following pairs: `key1=value5, key2=value2, key3=value4`. 
+If a log message contains the same key multiple times (for example, `key1=value1, key2=value2, key1=value3, key3=value4, key1=value5`), then {{% param "product.abbrev" %}} only stores the last (rightmost) value for the key. Using the previous example, {{% param "product.abbrev" %}} will store the following pairs: `key1=value5, key2=value2, key3=value4`. {{% /alert %}}