Your friendly >> chevron >> based syntax for piping data through multiple transformations.
A Julia package with all the good ideas from Chain.jl and Pipe.jl, but with nicer syntax and REPL integration.
Here is a simple example:
julia> using Chevy, DataFrames, TidierData
julia> Chevy.enable_repl() # magic to enable Chevy syntax in the REPL
julia> df = DataFrame(name=["John", "Sally", "Roger"], age=[54, 34, 79], children=[0, 2, 4])
3×3 DataFrame
 Row │ name    age    children
     │ String  Int64  Int64
─────┼─────────────────────────
   1 │ John       54         0
   2 │ Sally      34         2
   3 │ Roger      79         4
julia> df >> @filter(age > 40) >> @select(num_children=children, age)
2×2 DataFrame
 Row │ num_children  age
     │ Int64         Int64
─────┼─────────────────────
   1 │            0     54
   2 │            4     79
Quick comparison with similar packages:
Feature | Chevy.jl | Chain.jl | Pipe.jl |
---|---|---|---|
Piping syntax | ✔️ (>>) | ✔️ (@chain) | ✔️ (\|>) |
Side effects | ✔️ (>>>) | ✔️ (@aside) | ❌ |
Pipe backwards | ✔️ (<<) | ❌ | ❌ |
Recursive syntax | ✔️ | ❌ | ❌ |
REPL integration | ✔️ | ❌ | ❌ |
Line numbers on errors | ❌ | ✔️ | ❌ |
Press ] to enter the Pkg REPL, then run:
pkg> add Chevy
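If you prefer not to switch REPL modes (for example in a setup script), the same installation can be sketched with the functional Pkg API. This assumes the package is registered under the name Chevy, as the pkg> command above implies.

```julia
# Functional equivalent of `pkg> add Chevy`, using the Pkg standard library.
using Pkg
Pkg.add("Chevy")
```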
Chevy exports a macro @chevy which transforms expressions like x >> f(y, z) into f(x, y, z). These can be chained together, so that
@chevy Int[] >> push!(5, 2, 4, 3, 1) >> sort!()
is equivalent to
sort!(push!(Int[], 5, 2, 4, 3, 1))
In fact, we can see exactly what it is transformed to with @macroexpand. The result is equivalent code, but with intermediate results saved for clarity.
julia> @macroexpand @chevy Int[] >> push!(5, 2, 4, 3, 1) >> sort!()
quote
    var"##chevy#241" = Int[]
    var"##chevy#242" = push!(var"##chevy#241", 5, 2, 4, 3, 1)
    sort!(var"##chevy#242")
end
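The equivalence described above can be checked directly. The following is a minimal sketch that relies only on the behaviour stated in this README; the variable names piped and nested are illustrative.

```julia
using Chevy

# Pipeline form: each step receives the previous result as its first argument.
piped = @chevy Int[] >> push!(5, 2, 4, 3, 1) >> sort!()

# Hand-written nested form that the pipeline is described as equivalent to.
nested = sort!(push!(Int[], 5, 2, 4, 3, 1))

@assert piped == nested
```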
If you are using the Julia REPL, you can activate Chevy's REPL integration like so:
julia> Chevy.enable_repl()
This allows you to use this syntax from the Julia REPL without typing @chevy every time. Use Chevy.enable_repl(false) to disable it again. The rest of the examples here will be from the REPL.
See also the startup.jl tip later in this README for enabling the REPL integration automatically.
Expressions like x >> f(y, z) are transformed to insert x as an extra first argument in the function call, like:
julia> [5,2,4,3,1] >> sort!() >> println()
[1, 2, 3, 4, 5]
If you want the argument to appear elsewhere, you can indicate where with _:
julia> [5,2,4,3,1] >> filter!(isodd, _) >> println()
[5, 3, 1]
In fact, you can use any expression involving _:
julia> [5,2,4,3,1] >> filter!(isodd, _ .+ 10) >> println()
[15, 13, 11]
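Outside the REPL, the placeholder rules can be compared against ordinary function calls. This is a small sketch based on the two examples above; the names placed and shifted are illustrative.

```julia
using Chevy

# `_` marks where the piped value goes; here it becomes the second argument.
placed = @chevy [5, 2, 4, 3, 1] >> filter!(isodd, _)
@assert placed == filter!(isodd, [5, 2, 4, 3, 1])

# `_` may sit inside a larger expression; this filters the shifted vector.
shifted = @chevy [5, 2, 4, 3, 1] >> filter!(isodd, _ .+ 10)
@assert shifted == filter!(isodd, [5, 2, 4, 3, 1] .+ 10)
```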
Sometimes you want to do something with an intermediate value in the pipeline, but then continue with the previous value. For this, you can use x >>> f(), which is transformed to tmp = x; f(tmp); tmp. It is very similar to Chain.jl's @aside syntax.
One use for this is to log intermediate values for debugging:
julia> [5,2,4,3,1] >> filter!(isodd, _) >>> println("x = ", _) >> sum()
x = [5, 3, 1]
9
You can assign values, and even use them in later steps:
julia> 10 >> (_ * 2) >>> (x = _) >> (x^2 - _)
380
julia> x
20
It is also useful for functions which mutate the argument but do not return it:
julia> [5,2,4,3,1] >> popat!(4)
3
julia> [5,2,4,3,1] >>> popat!(4) >> println()
[5, 2, 4, 1]
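To make the tmp = x; f(tmp); tmp rule concrete, here is the last pipeline expanded by hand in plain Julia, without Chevy. The names tmp and removed are illustrative; the macro itself generates its own temporary names.

```julia
# Hand expansion of: [5,2,4,3,1] >>> popat!(4) >> println()
tmp = [5, 2, 4, 3, 1]
removed = popat!(tmp, 4)   # mutates tmp and returns the removed element
println(tmp)               # the pipeline continues with tmp, not `removed`
```

The return value of popat! is discarded by >>>, which is why the vector rather than the removed element reaches println.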
You can use << to pipe backwards: f(y) << x is transformed to f(x, y).
This can be useful as a sort of "inline do-notation":
julia> write("hello.txt", "ignore this line\nkeep this line!");
julia> (
           "hello.txt"
           >> open()
           << (io -> io >>> readline() >> read(String))
           >> uppercase()
       )
"KEEP THIS LINE!"
You can instead just use regular do-notation:
julia> (
           "hello.txt"
           >> open() do io
               io >>> readline() >> read(String)
           end
           >> uppercase()
       )
"KEEP THIS LINE!"
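For comparison, here is a sketch of the same computation in plain Julia without Chevy, following the transformation rules described above. The temporary directory and the names path, contents and result are assumptions made for this illustration so that no file in the working directory is overwritten.

```julia
# Write the sample file into a fresh temporary directory.
path = joinpath(mktempdir(), "hello.txt")
write(path, "ignore this line\nkeep this line!")

# The piped function ends up as the first argument of `open`.
contents = open(path) do io
    readline(io)        # side effect only: skip the first line
    read(io, String)    # value of the block: the rest of the file
end

result = uppercase(contents)
```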
The @chevy macro works recursively, meaning you can wrap an entire module (or script or function or any code block) and all >>/>>>/<< expressions will be converted.
For example, here is the first example in this README converted to a script:
using Chevy, DataFrames, TidierData
@chevy begin
    df = DataFrame(name=["John", "Sally", "Roger"], age=[54, 34, 79], children=[0, 2, 4])
    df2 = df >> @filter(age > 40) >> @select(num_children=children, age)
    df2 >> println("data:", _)
    df2 >> size >> println("size:", _)
end
Or the data manipulation step can be encapsulated as a function like so:
@chevy munge(df) = df >> @filter(age > 40) >> @select(num_children=children, age)
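The same idea works without any data-frame packages. The following is a minimal sketch of wrapping a whole function definition; odd_total is a hypothetical name used only for this illustration.

```julia
using Chevy

# Every pipeline inside the wrapped function body is converted.
@chevy function odd_total(xs)
    odds = xs >> filter(isodd, _)   # becomes filter(isodd, xs)
    odds >> sum()                   # becomes sum(odds)
end

total = odd_total([5, 2, 4, 3, 1])
```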
If you surround your pipelines with parentheses then you can place each transformation on a separate line for clarity. This also allows you to easily comment out individual transformations.
@chevy (
    df
    # >> @filter(age > 40)
    >> @select(nchildren=children, age)
)
Or you can use >>(x, y, z) syntax instead of x >> y >> z like so:
@chevy >>(
    df,
    # @filter(age > 40),
    @select(nchildren=children, age),
)
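As a dependency-free illustration of the call-style form, the sketch below pipes a plain vector instead of a data frame. It assumes that >>(x, y, z) behaves exactly like x >> y >> z, as stated above; the name total is illustrative.

```julia
using Chevy

# One step per line; any step can be commented out independently.
total = @chevy >>(
    [5, 2, 4, 3, 1],
    filter!(isodd, _),
    sum(),
)
```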
You can add the following lines to your startup.jl file (usually at ~/.julia/config/startup.jl) to enable Chevy's REPL integration automatically:
if isinteractive()
    try
        using Chevy
    catch
        @warn "Chevy not available"
    end
    if @isdefined Chevy
        Chevy.enable_repl()
    end
end
Chevy has no dependencies, so it is safe to add to your global environment - then it will always be available at the REPL.
See the docstrings for more help:
- @chevy ...: Transform and execute the given code.
- chevy(expr): Transform the given expression.
- Chevy.enable_repl(on=true): Enable/disable the REPL integration.