string_handling.md
Documentation for the functions in string_handling.sh. A general overview is given in the project documentation.

Function documentation

If the pipes are not documented, the default is:

  • stdin: piped input ignored
  • stdout: empty

Parameters enclosed in brackets [ ] are optional.

escape()

Takes the piped input and escapes the specified character(s) with backslashes.

Special care is taken to disable bash globbing to make sure that affected characters, typically *, can be escaped properly. At the end, the original globbing configuration is restored.

Example:

echo "path/to/file" | escape "/"

prints path\/to\/file

Param.   $1         character to escape
         [$2...n]   additional character(s) to escape
Pipes    stdin      read completely
         stdout     stdin where $1...n were escaped
Status   0
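
The escaping behaviour described above can be sketched as follows. This is only an illustration, not the code in string_handling.sh: the name sketch_escape is hypothetical, and instead of toggling the globbing configuration it quotes the pattern so that bash treats characters like * literally.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for escape(); not the library's implementation.
sketch_escape() {
    local input char
    input=$(cat)                          # read stdin completely
    for char in "$@"; do
        # The quoted "$char" is matched literally, so * or ? are not
        # interpreted as pattern characters.
        input=${input//"$char"/\\"$char"}
    done
    printf '%s\n' "$input"
}
```

Usage: `echo "path/to/file" | sketch_escape "/"` prints path\/to\/file.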

sanitize_variable_quotes()

If a string contains a value enclosed in quotes (the quotes are part of the string), this function removes them. It checks for single and double quotes.

Examples:

  • Input as parameter: sanitize_variable_quotes "'quoted value'"
  • Piped input: echo "'quoted value'" | sanitize_variable_quotes

both print quoted value

Param.   [$1]     string to sanitize, if omitted or empty stdin is read
Pipes    stdin    if $1 is undefined or empty, read completely
         stdout   sanitized $1 or stdin
Status   0
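
A minimal sketch of this behaviour, assuming that only one matching pair of quotes around the whole string is removed. The function name sketch_strip_quotes is hypothetical and the real function may handle more cases.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for sanitize_variable_quotes().
sketch_strip_quotes() {
    local value=${1:-}
    # Fall back to stdin when no (or an empty) parameter is given.
    [ -z "$value" ] && value=$(cat)
    case $value in
        \"*\") value=${value#\"}; value=${value%\"} ;;   # "double quoted"
        \'*\') value=${value#\'}; value=${value%\'} ;;   # 'single quoted'
    esac
    printf '%s\n' "$value"
}
```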

trim()

Cuts leading and trailing whitespace from either the provided parameter or the piped stdin.

Examples:

  • Input as parameter: trimmed_string=$(trim "$string_to_trim")
  • Piped input: trimmed_string=$(echo "$string_to_trim" | trim)

Param.   [$1]     string to trim, if omitted or empty stdin is read
Pipes    stdin    if $1 is undefined or empty, read completely
         stdout   trimmed $1 or stdin
Status   0
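
One common way to trim in pure bash uses nested parameter expansions. The sketch below illustrates that idiom under the hypothetical name sketch_trim; it is not necessarily how trim() itself is written.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for trim().
sketch_trim() {
    local s=${1:-}
    [ -z "$s" ] && s=$(cat)
    # ${s%%[![:space:]]*} is the leading whitespace run; strip it as a prefix.
    s="${s#"${s%%[![:space:]]*}"}"
    # ${s##*[![:space:]]} is the trailing whitespace run; strip it as a suffix.
    s="${s%"${s##*[![:space:]]}"}"
    printf '%s\n' "$s"
}
```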

find_substring()

Finds the position of the first instance of $2 in $1. If $3 is omitted the search begins at the beginning of $1, otherwise it begins after $3 characters.

Inspired by this StackOverflow thread

Param.   $1       string to search in
         $2       character/string to find - exact matching is used (bash's special matching characters are disabled)
         [$3]     search start position inside $1 - if it's omitted, the search starts at the beginning
Pipes    stdout   • the position of the first character of the first occurrence of $2 in the considered part of $1
                  • -1 if $2 is not found in $1
Status   0        success, search executed and result written to stdout
         1        $1 undefined or empty
         2        $2 undefined or empty
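
The search can be sketched with a prefix-stripping expansion. Two points are assumptions not stated above: positions are counted from zero, and the result is relative to the start of $1 rather than to the start position $3. The name sketch_find_substring is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for find_substring(); zero-based positions assumed.
sketch_find_substring() {
    local haystack=${1:-} needle=${2:-} start=${3:-0}
    [ -z "$haystack" ] && return 1
    [ -z "$needle" ] && return 2
    local rest=${haystack:start}          # the considered part of $1
    local prefix=${rest%%"$needle"*}      # text before the first match
    if [ "$prefix" = "$rest" ]; then
        echo -1                           # nothing was stripped: no match
    else
        echo $(( start + ${#prefix} ))
    fi
}
```

Quoting "$needle" inside the expansion is what gives exact matching: characters such as * are compared literally.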

get_absolute_path()

Transforms $1 into an absolute filepath if it's relative. Uses $2 as the base directory if it's defined, the current working directory otherwise.

The path $1 and the directory $2 don't have to exist.

Param.   $1       path to "absolutify" if necessary
         [$2]     root path - if omitted, the current working directory is used
Pipes    stdout   the computed absolute path
Status   0
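
Because neither path has to exist, the transformation can be done on the strings alone. The following is a rough sketch under the hypothetical name sketch_absolute_path; it assumes that no normalization of . or .. segments is needed, which the real function may or may not do.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for get_absolute_path(); no filesystem access.
sketch_absolute_path() {
    local path=${1:-} base=${2:-$PWD}
    case $path in
        /*) printf '%s\n' "$path" ;;                     # already absolute
        *)  printf '%s/%s\n' "${base%/}" "$path" ;;      # prepend the base
    esac
}
```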

is_string_a()

Checks if string $1 is of a certain type $2:

Type                Test condition
absolute_filepath   the first non-whitespace character of $1 is a /
integer             $1 contains only digits

The test may be inverted by prepending a ! to the type, e.g. !absolute_filepath.

Example:

is_string_a "$potential_int" "integer" && echo "This is an integer: $potential_int"

Param.   $1   string to check
         $2   test type, see table above; can be inverted with a leading !, e.g. !integer
Status   0    the test was executed and succeeded
         1    the test was executed, but failed
         2    $1 is empty
         3    $2 is empty
         4    $2 is unknown
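
The status codes and the inversion rule can be illustrated with a small sketch. The name sketch_is_string_a and the two regular expressions are assumptions made for this example; in particular, "integer" is taken to mean one or more digits without a sign.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for is_string_a().
sketch_is_string_a() {
    local str=${1:-} type=${2:-} invert=0 result=1
    local re_abs='^[[:space:]]*/' re_int='^[0-9]+$'
    [ -z "$str" ] && return 2
    [ -z "$type" ] && return 3
    if [[ $type == '!'* ]]; then
        invert=1
        type=${type:1}                    # drop the leading !
    fi
    case $type in
        absolute_filepath) [[ $str =~ $re_abs ]] && result=0 ;;
        integer)           [[ $str =~ $re_int ]] && result=0 ;;
        *)                 return 4 ;;
    esac
    if (( invert )); then
        result=$(( 1 - result ))          # swap success and failure
    fi
    return "$result"
}
```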

get_string_bytelength()

Gives the byte length of $1. ASCII characters are encoded on 1 byte, so for strings which contain only ASCII characters the byte length equals the string length. Other characters, e.g. é, à or å, require 2 or more bytes and lead to a byte length higher than the string length. Uses the C locale internally.

Param.   $1       string to get the bytelength of
Pipes    stdout   the bytelength of $1
Status   0
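
Switching to the C locale makes bash count bytes instead of characters in ${#var}. A minimal sketch of that idiom, under the hypothetical name sketch_bytelength:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for get_string_bytelength().
sketch_bytelength() {
    # The locale change is local to this function call.
    local LANG=C LC_ALL=C
    printf '%s\n' "${#1}"
}
```

For a UTF-8 encoded é (bytes \303\251) this reports 2, while an ASCII-only string reports its character count.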

get_string_bytes()

Computes the byte representation of a string. Non-ASCII characters like à, é, å or ê are transformed to the octal escape codes of their bytes; é, for example, becomes \303\251. Uses the C locale internally.

Param.   $1       string to get the byte representation of
Pipes    stdout   the byte representation of $1
Status   0

get_sed_extract_expression()

Generates a sed string extraction expression. Example:

    echo "first|second|third" | sed -e "$(get_sed_extract_expression "|" "before" "first")"

prints first (the sed expression is s/|.*//g).

Param.   $1       marker
         $2       part to extract: can be before or after, with regard to the occurrence of $1 selected with $3
         $3       occurrence: can be first or last
Pipes    stdout   the sed extraction expression, empty in case of error
Status   0        success, expression was computed and written to stdout
         1        the function was unable to find a suitable sed operation separator character
         2        the value of $2 and/or $3 is unknown

get_sed_replace_expression()

Generates a sed string replacement expression.

Example:

echo "some string" | sed -e "$(get_sed_replace_expression "some" "awesome")"

prints awesome string (the sed expression is s/some/awesome/g).

Param.   $1       sed match regex/string
         $2       sed replace string
         [$3]     occurrence selection:
                  • if omitted or empty, replace every occurrence (aka global)
                  • first to replace only the first occurrence
                  • last to replace only the last occurrence
Pipes    stdout   the sed replace expression, empty in case of error
Status   0        success, the expression was computed and written to stdout
         1        the function was unable to find a suitable separator character
         2        the occurrence selection $3 is unknown
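
To make the three occurrence selections concrete, here is a simplified sketch. It is not the library's code: the name sketch_replace_expression is hypothetical, the separator is fixed to / (so neither argument may contain a slash), and the "last" form assumes that $1 contains no capture groups of its own.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for get_sed_replace_expression(), fixed separator.
sketch_replace_expression() {
    local match=${1:-} replace=${2:-} selection=${3:-}
    case $selection in
        # global: s/match/replace/g
        '')    printf 's/%s/%s/g\n' "$match" "$replace" ;;
        # first occurrence on each line: s/match/replace/
        first) printf 's/%s/%s/\n' "$match" "$replace" ;;
        # last occurrence: a greedy group swallows everything before it
        last)  printf 's/\\(.*\\)%s/\\1%s/\n' "$match" "$replace" ;;
        *)     return 2 ;;
    esac
}
```

Usage: `echo "some some" | sed -e "$(sketch_replace_expression some awesome last)"` prints some awesome.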

find_sed_operation_separator()

Provides a sed separator character which doesn't occur in $1 and $2.

Param.   $1       sed match regex/string
         [$2]     2nd sed argument
Pipes    stdout   the sed operation separator character, empty in case of error
Status   0        found a suitable separator character, written to stdout
         1        none of the 23 characters available is suited
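
The separator search amounts to trying candidates in order until one is found that appears in neither argument. In the sketch below, both the name sketch_find_separator and the candidate list are assumptions; the list is shorter than the 23 characters mentioned above and is not taken from the library.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for find_sed_operation_separator().
sketch_find_separator() {
    local candidates='/|#,;:_%@!~='       # illustrative list only
    local sep i
    for (( i = 0; i < ${#candidates}; i++ )); do
        sep=${candidates:i:1}
        # Accept the first candidate that occurs in neither argument.
        if [[ ${1:-} != *"$sep"* && ${2:-} != *"$sep"* ]]; then
            printf '%s\n' "$sep"
            return 0
        fi
    done
    return 1                              # every candidate is taken
}
```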

escape_sed_special_characters()

Adds a backslash in front of every occurrence of a character which has a special meaning in sed expressions:

. + ? * [ ] ^ $

Param.   $1       string to escape
Pipes    stdout   escaped $1
Status   0
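
A possible way to do this in pure bash is to loop over the listed characters and replace each one with its escaped form. The name sketch_escape_sed_specials is hypothetical and the approach is only a sketch; it covers exactly the eight characters listed above and nothing else.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for escape_sed_special_characters().
sketch_escape_sed_specials() {
    local s=${1:-} c
    # The characters are quoted so the shell never expands them as globs.
    for c in '.' '+' '?' '*' '[' ']' '^' '$'; do
        s=${s//"$c"/\\"$c"}
    done
    printf '%s\n' "$s"
}
```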

get_random_string()

Provides a string composed of $1 alphanumeric characters taken from /dev/urandom.

Important: it's not suited for security-critical applications like cryptography. However, it's useful to get unique strings for non-critical use cases, e.g. "run IDs" which may be used to distinguish interleaved log entries from several instances of the same script running in parallel.

Param.   [$1]     length of the random string, defaults to 16 if omitted
Pipes    stdout   the random string
Status   0        success, the random string is on stdout
         1        /dev/urandom doesn't exist
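
One way to build such a string is to read chunks from /dev/urandom, keep only the alphanumeric bytes, and repeat until enough characters are collected. The sketch below uses the hypothetical name sketch_random_string and assumes that head -c and tr are available; like the documented function, it is not meant for security-critical use.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for get_random_string().
sketch_random_string() {
    local length=${1:-16} out=''
    [ -e /dev/urandom ] || return 1
    while (( ${#out} < length )); do
        # Keep only alphanumeric bytes from a 64-byte random chunk.
        out+=$(head -c 64 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9')
    done
    printf '%s\n' "${out:0:length}"
}
```

Usage: `run_id=$(sketch_random_string 8)` yields an 8-character identifier for tagging log lines.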