Or, for short, GDL: Golang Dataframe Library
Gandalff is a library for data wrangling in Go. The goal is to provide a simple and efficient API for data manipulation in Go, similar to pandas or Polars in Python and dplyr in R. It supports nullable types, and null data is optimized for memory usage.
Gandalff is a work in progress, and the API is not stable yet. However, it already supports the following formats:
- CSV
- XPT (SAS)
- XLSX
- HTML
- Markdown
package main

import (
	"strings"

	gandalff "github.com/caerbannogwhite/gandalff"
)

func main() {
	data1 := `
name,age,weight,junior,department,salary band
Alice C,29,75.0,F,HR,4
John Doe,30,80.5,true,IT,2
Bob,31,85.0,F,IT,4
Jane H,25,60.0,false,IT,4
Mary,28,70.0,false,IT,3
Oliver,32,90.0,true,HR,1
Ursula,27,65.0,f,Business,4
Charlie,33,60.0,t,Business,2
Megan,26,55.0,F,IT,3
`

	gandalff.NewBaseDataFrame(gandalff.NewContext()).
		FromCsv().
		SetReader(strings.NewReader(data1)).
		Read().
		Select("department", "age", "weight", "junior").
		GroupBy("department").
		Agg(gandalff.Min("age"), gandalff.Max("weight"), gandalff.Mean("junior"), gandalff.Count()).
		Run().
		PrettyPrint(
			gandalff.NewPrettyPrintParams().
				SetUseLipGloss(true))
}
// Output:
// ╭────────────┬─────────┬─────────┬─────────┬───────╮
// │ department │ age     │ weight  │ junior  │ n     │
// ├────────────┼─────────┼─────────┼─────────┼───────┤
// │ String     │ Float64 │ Float64 │ Float64 │ Int64 │
// ├────────────┼─────────┼─────────┼─────────┼───────┤
// │ HR         │ 29.00   │ 90.00   │ 0.5000  │ 2.000 │
// │ IT         │ 25.00   │ 85.00   │ 0.5000  │ 4.000 │
// │ Business   │ 27.00   │ 65.00   │ 0.5000  │ 2.000 │
// ╰────────────┴─────────┴─────────┴─────────┴───────╯
You can join the Gandalff community on Discord.
The data types not checked are not yet supported, but might be in the future.
- Bool
- Bool (memory optimized, not fully implemented yet)
- Int16
- Int
- Int64
- Float32
- Float64
- Complex64
- Complex128
- String
- Time
- Duration
- Filter
  - filter by bool slice
  - filter by int slice
  - filter by bool series
  - filter by int series
- Group
  - Group (with nulls)
  - SubGroup (with nulls)
- Map
- Sort
  - Sort (with nulls)
  - SortRev (with nulls)
- Take
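
The two slice-based filters listed above (by bool slice and by int slice) can be pictured with a small standard-library sketch. This is not Gandalff's implementation or API: `filterByMask` and `filterByIndices` are hypothetical helper names, and the behavior on mismatched lengths is an assumption made for the example.

```go
package main

import "fmt"

// filterByMask keeps the elements whose mask entry is true.
// Hypothetical helper, not part of Gandalff. It panics when the
// lengths differ, since the mask must cover every element.
func filterByMask[T any](data []T, mask []bool) []T {
	if len(data) != len(mask) {
		panic("filterByMask: data and mask lengths differ")
	}
	out := make([]T, 0, len(data))
	for i, keep := range mask {
		if keep {
			out = append(out, data[i])
		}
	}
	return out
}

// filterByIndices picks elements by position, in the order given.
// Hypothetical helper, not part of Gandalff.
func filterByIndices[T any](data []T, indices []int) []T {
	out := make([]T, len(indices))
	for i, idx := range indices {
		out[i] = data[idx]
	}
	return out
}

func main() {
	ages := []int{29, 30, 31, 25}
	fmt.Println(filterByMask(ages, []bool{true, false, false, true})) // [29 25]
	fmt.Println(filterByIndices(ages, []int{3, 0}))                  // [25 29]
}
```

A mask preserves the original order and can only drop elements, while an index list can also reorder or repeat them.
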
- Agg
- Filter
- GroupBy
- Join
  - Inner
  - Left
  - Right
  - Outer
  - Inner with nulls
  - Left with nulls
  - Right with nulls
  - Outer with nulls
- Map
- OrderBy
- Select
- Take
- Pivot
- Stack/Append
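
To make the four join kinds above concrete, here is a minimal standard-library sketch of a hash join over rows whose key may be null. It is an illustration only, not Gandalff's code: `Row`, `Pair`, `Join`, and `str` are hypothetical names, and the rule that a null key never matches anything (as in SQL) is an assumption; Gandalff's null semantics may differ.

```go
package main

import "fmt"

// Row is a hypothetical record; a nil Key models a null value.
type Row struct {
	Key   *string
	Value int
}

// Pair is one output row; a nil side means "no match on that side".
type Pair struct {
	Left, Right *Row
}

// Join matches rows on Key. keepLeft/keepRight select the join kind:
// inner (false,false), left (true,false), right (false,true), outer (true,true).
// Assumption: null keys never match, not even each other.
func Join(left, right []Row, keepLeft, keepRight bool) []Pair {
	// Index the right side by non-null key.
	index := make(map[string][]int)
	for j, r := range right {
		if r.Key != nil {
			index[*r.Key] = append(index[*r.Key], j)
		}
	}
	matchedRight := make([]bool, len(right))
	var out []Pair
	for i := range left {
		var hits []int
		if left[i].Key != nil {
			hits = index[*left[i].Key]
		}
		for _, j := range hits {
			matchedRight[j] = true
			out = append(out, Pair{Left: &left[i], Right: &right[j]})
		}
		if len(hits) == 0 && keepLeft {
			out = append(out, Pair{Left: &left[i]})
		}
	}
	if keepRight {
		for j := range right {
			if !matchedRight[j] {
				out = append(out, Pair{Right: &right[j]})
			}
		}
	}
	return out
}

// str returns a pointer to s, for building non-null keys.
func str(s string) *string { return &s }

func main() {
	l := []Row{{str("HR"), 1}, {str("IT"), 2}, {nil, 3}}
	r := []Row{{str("IT"), 10}, {str("Ops"), 20}, {nil, 30}}
	fmt.Println(
		len(Join(l, r, false, false)), // inner
		len(Join(l, r, true, false)),  // left
		len(Join(l, r, false, true)),  // right
		len(Join(l, r, true, true)),   // outer
	) // 1 3 3 5
}
```
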
- Count
- Sum
- Mean
- Median
- Min
- Max
- StdDev
- Variance
- Quantile
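
For reference, a few of the statistics listed above can be sketched with the standard library alone. The definitions chosen here are assumptions, not a statement of what Gandalff computes: `Quantile` uses linear interpolation between the closest ranks, `Median` is the 0.5 quantile, and `Variance` is the sample variance (divisor n-1). All function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Quantile returns the q-th quantile (0 <= q <= 1) of xs, interpolating
// linearly between the two closest ranks. It returns NaN for empty input
// or an out-of-range q. The input slice is not modified.
func Quantile(xs []float64, q float64) float64 {
	if len(xs) == 0 || q < 0 || q > 1 {
		return math.NaN()
	}
	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)
	pos := q * float64(len(sorted)-1)
	lo := int(math.Floor(pos))
	hi := int(math.Ceil(pos))
	frac := pos - float64(lo)
	return sorted[lo] + frac*(sorted[hi]-sorted[lo])
}

// Median is the 0.5 quantile.
func Median(xs []float64) float64 { return Quantile(xs, 0.5) }

// Variance returns the sample variance (divisor n-1), or NaN when
// fewer than two values are given.
func Variance(xs []float64) float64 {
	if len(xs) < 2 {
		return math.NaN()
	}
	mean := 0.0
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	sum := 0.0
	for _, x := range xs {
		sum += (x - mean) * (x - mean)
	}
	return sum / float64(len(xs)-1)
}

func main() {
	xs := []float64{4, 1, 3, 2}
	fmt.Println(Median(xs))         // 2.5
	fmt.Println(Quantile(xs, 0.25)) // 1.75
	fmt.Println(Variance(xs))
}
```
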
- Improve filtering interface.
- Improve dataframe PrettyPrint: add parameters, optimize data display, use lipgloss.
- Implement string factors.
- SeriesTime: set time format.
- Implement `Set(i []int, v []any) Series`.
- Add `Slice(i []int) Series` (using filter?).
- Implement memory optimized Bool series with uint64.
- Use uint64 for null mask.
- Optimize XPT reader/writer with float32.
- Add url resolver to each reader.
- Add format option to each writer.
- JSON reader by records.
- Implement chunked series.
- Implement OpenAI interface.
- Implement Parquet reader and writer.
- Implement SPSS reader and writer.
- Implement SAS7BDAT reader and writer (https://cran.r-project.org/web/packages/sas7bdat/vignettes/sas7bdat.pdf)
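
The to-do items about uint64-backed Bool series and null masks refer to bit packing: one bit per element, stored in 64-bit words. The following is a rough sketch of that idea using only the standard library. `NullMask` and its methods are hypothetical names chosen for the example and do not describe Gandalff's current or planned internals.

```go
package main

import (
	"fmt"
	"math/bits"
)

// NullMask packs one "is null" bit per element into uint64 words,
// so n elements need only (n+63)/64 words. Hypothetical type.
type NullMask struct {
	words []uint64
	n     int
}

// NewNullMask returns a mask for n elements, all initially non-null.
func NewNullMask(n int) *NullMask {
	return &NullMask{words: make([]uint64, (n+63)/64), n: n}
}

// SetNull marks element i as null (true) or non-null (false).
func (m *NullMask) SetNull(i int, null bool) {
	word, bit := i/64, uint(i%64)
	if null {
		m.words[word] |= 1 << bit
	} else {
		m.words[word] &^= 1 << bit
	}
}

// IsNull reports whether element i is null.
func (m *NullMask) IsNull(i int) bool {
	return m.words[i/64]&(1<<uint(i%64)) != 0
}

// NullCount counts null elements one word at a time.
func (m *NullMask) NullCount() int {
	count := 0
	for _, w := range m.words {
		count += bits.OnesCount64(w)
	}
	return count
}

func main() {
	mask := NewNullMask(100)
	mask.SetNull(3, true)
	mask.SetNull(64, true)
	mask.SetNull(99, true)
	fmt.Println(mask.IsNull(3), mask.IsNull(4), mask.NullCount(), len(mask.words))
	// true false 3 2
}
```

Compared with a `[]bool` mask, which uses one byte per element, this layout uses one bit per element and lets operations such as counting nulls process 64 elements per word.
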