GANDALFF: Golang, ANother DatAframe Library For Fun 🧙‍♂️

Or, for short, GDL: Golang Dataframe Library

What is it?

Gandalff is a library for data wrangling in Go. It aims to provide a simple and efficient API for data manipulation, similar to pandas or Polars in Python and dplyr in R. It supports nullable types, and null data is stored in a memory-efficient way.
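To make the idea of memory-efficient nullable types concrete, here is a minimal sketch that uses only the standard library. It is not Gandalff's implementation: the type NullableInt64 and its methods are hypothetical, and the layout (one null bit per element, packed into uint64 words) is an assumption chosen for illustration.

```go
package main

import "fmt"

// NullableInt64 is a hypothetical nullable column (not a Gandalff type).
// Values live in a plain slice; one bit per element records nullness.
type NullableInt64 struct {
	data []int64
	mask []uint64 // bit i set => element i is null
}

// NewNullableInt64 allocates a column of n non-null zero values.
func NewNullableInt64(n int) *NullableInt64 {
	return &NullableInt64{
		data: make([]int64, n),
		mask: make([]uint64, (n+63)/64), // one word per 64 elements
	}
}

// Set stores v at index i and clears the null bit.
func (s *NullableInt64) Set(i int, v int64) {
	s.data[i] = v
	s.mask[i/64] &^= uint64(1) << (uint(i) % 64)
}

// SetNull marks index i as null.
func (s *NullableInt64) SetNull(i int) {
	s.mask[i/64] |= uint64(1) << (uint(i) % 64)
}

// IsNull reports whether index i is null.
func (s *NullableInt64) IsNull(i int) bool {
	return s.mask[i/64]&(uint64(1)<<(uint(i)%64)) != 0
}

// Get returns the value at index i, or false if the element is null.
func (s *NullableInt64) Get(i int) (int64, bool) {
	if s.IsNull(i) {
		return 0, false
	}
	return s.data[i], true
}

func main() {
	s := NewNullableInt64(3)
	s.Set(0, 29)
	s.SetNull(1)
	s.Set(2, 31)
	for i := 0; i < 3; i++ {
		if v, ok := s.Get(i); ok {
			fmt.Println(i, v)
		} else {
			fmt.Println(i, "null")
		}
	}
}
```

With this layout the null information costs one bit per element, instead of a pointer or a separate bool per value.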

Gandalff is a work in progress, and the API is not stable yet. However, it already supports the following formats:

  • CSV
  • XPT (SAS)
  • XLSX
  • HTML
  • Markdown

Examples

package main

import (
	"strings"

	gandalff "github.com/caerbannogwhite/gandalff"
)

func main() {
	data1 := `
name,age,weight,junior,department,salary band
Alice C,29,75.0,F,HR,4
John Doe,30,80.5,true,IT,2
Bob,31,85.0,F,IT,4
Jane H,25,60.0,false,IT,4
Mary,28,70.0,false,IT,3
Oliver,32,90.0,true,HR,1
Ursula,27,65.0,f,Business,4
Charlie,33,60.0,t,Business,2
Megan,26,55.0,F,IT,3
`

	gandalff.NewBaseDataFrame(gandalff.NewContext()).
		FromCsv().
		SetReader(strings.NewReader(data1)).
		Read().
		Select("department", "age", "weight", "junior").
		GroupBy("department").
		Agg(gandalff.Min("age"), gandalff.Max("weight"), gandalff.Mean("junior"), gandalff.Count()).
		Run().
		PrettyPrint(
			gandalff.NewPrettyPrintParams().
				SetUseLipGloss(true))
}

// Output:
// ╭────────────┬─────────┬─────────┬─────────┬───────╮
// │ department │ age     │ weight  │ junior  │ n     │
// ├────────────┼─────────┼─────────┼─────────┼───────┤
// │ String     │ Float64 │ Float64 │ Float64 │ Int64 │
// ├────────────┼─────────┼─────────┼─────────┼───────┤
// │ HR         │   29.00 │   90.00 │  0.5000 │ 2.000 │
// │ IT         │   25.00 │   85.00 │  0.5000 │ 4.000 │
// │ Business   │   27.00 │   65.00 │  0.5000 │ 2.000 │
// ╰────────────┴─────────┴─────────┴─────────┴───────╯
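
For readers new to group-by pipelines, the sketch below shows in plain Go what a GroupBy followed by Min, Max and Count aggregations conceptually computes. It does not use Gandalff and says nothing about how the library works internally; the names row, stats and groupStats are hypothetical, and the input is three rows taken from the example data.

```go
package main

import "fmt"

// row is a hypothetical record holding the columns used below.
type row struct {
	department string
	age        float64
	weight     float64
}

// stats holds the per-group aggregates: min age, max weight, row count.
type stats struct {
	minAge    float64
	maxWeight float64
	n         int
}

// groupStats groups rows by department. It returns the aggregates and the
// group keys in order of first appearance.
func groupStats(rows []row) (map[string]*stats, []string) {
	groups := make(map[string]*stats)
	var order []string
	for _, r := range rows {
		g, ok := groups[r.department]
		if !ok {
			// First row of a new group seeds min and max.
			g = &stats{minAge: r.age, maxWeight: r.weight}
			groups[r.department] = g
			order = append(order, r.department)
		}
		if r.age < g.minAge {
			g.minAge = r.age
		}
		if r.weight > g.maxWeight {
			g.maxWeight = r.weight
		}
		g.n++
	}
	return groups, order
}

func main() {
	rows := []row{
		{"HR", 29, 75.0},
		{"IT", 30, 80.5},
		{"HR", 32, 90.0},
	}
	groups, order := groupStats(rows)
	for _, dept := range order {
		g := groups[dept]
		fmt.Printf("%s min(age)=%.0f max(weight)=%.1f n=%d\n",
			dept, g.minAge, g.maxWeight, g.n)
	}
}
```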

Community

You can join the Gandalff community on Discord.

Supported data types

Data types that are not checked are not supported yet, but may be added in the future.

  • Bool
  • Bool (memory optimized, not fully implemented yet)
  • Int16
  • Int
  • Int64
  • Float32
  • Float64
  • Complex64
  • Complex128
  • String
  • Time
  • Duration

Supported operations for Series

  • Filter
    • filter by bool slice
    • filter by int slice
    • filter by bool series
    • filter by int series
  • Group
    • Group (with nulls)
    • SubGroup (with nulls)
  • Map
  • Sort
    • Sort (with nulls)
    • SortRev (with nulls)
  • Take
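
As an illustration of the two selection styles listed above (by bool slice and by int slice), here is a small sketch on plain Go slices. It is not Gandalff's Series API: filterByMask and takeByIndex are hypothetical helpers, and reading "filter by int slice" as selection by position is an assumption.

```go
package main

import "fmt"

// filterByMask keeps the elements whose mask entry is true.
// The mask must have the same length as values.
func filterByMask[T any](values []T, mask []bool) ([]T, error) {
	if len(values) != len(mask) {
		return nil, fmt.Errorf("mask length %d does not match series length %d",
			len(mask), len(values))
	}
	out := make([]T, 0, len(values))
	for i, keep := range mask {
		if keep {
			out = append(out, values[i])
		}
	}
	return out, nil
}

// takeByIndex returns the elements at the given positions, in that order.
func takeByIndex[T any](values []T, indices []int) ([]T, error) {
	out := make([]T, 0, len(indices))
	for _, i := range indices {
		if i < 0 || i >= len(values) {
			return nil, fmt.Errorf("index %d out of range [0, %d)", i, len(values))
		}
		out = append(out, values[i])
	}
	return out, nil
}

func main() {
	names := []string{"Alice", "John", "Bob", "Jane"}

	kept, err := filterByMask(names, []bool{true, false, true, false})
	if err != nil {
		panic(err)
	}
	fmt.Println(kept) // [Alice Bob]

	taken, err := takeByIndex(names, []int{3, 0})
	if err != nil {
		panic(err)
	}
	fmt.Println(taken) // [Jane Alice]
}
```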

Supported operations for DataFrame

  • Agg
  • Filter
  • GroupBy
  • Join
    • Inner
    • Left
    • Right
    • Outer
    • Inner with nulls
    • Left with nulls
    • Right with nulls
    • Outer with nulls
  • Map
  • OrderBy
  • Select
  • Take
  • Pivot
  • Stack/Append
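
To show how an inner join differs from a left join, and why a join can produce nulls, here is a plain-Go sketch. It does not use Gandalff's Join API, whose signature is not shown in this README. The types employee and joined, the join function and the budget figures are all hypothetical, and the right-hand table is assumed to have unique keys.

```go
package main

import "fmt"

// employee is a hypothetical left-hand row.
type employee struct {
	name string
	dept string
}

// joined is one output row; budget is nil (null) when a left join
// finds no matching department on the right-hand side.
type joined struct {
	name   string
	dept   string
	budget *int
}

// join matches employees to department budgets on dept.
// keepUnmatched=false behaves like an inner join, true like a left join.
// The map assumes each department appears at most once on the right.
func join(left []employee, budgets map[string]int, keepUnmatched bool) []joined {
	var out []joined
	for _, e := range left {
		b, ok := budgets[e.dept]
		switch {
		case ok:
			out = append(out, joined{e.name, e.dept, &b})
		case keepUnmatched:
			out = append(out, joined{e.name, e.dept, nil})
		}
	}
	return out
}

func main() {
	left := []employee{{"Alice", "HR"}, {"John", "IT"}, {"Ursula", "Business"}}
	budgets := map[string]int{"HR": 100, "IT": 250} // made-up values

	fmt.Println(len(join(left, budgets, false))) // 2: Ursula is dropped
	fmt.Println(len(join(left, budgets, true)))  // 3: Ursula kept, budget is null
}
```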

Supported stats functions

  • Count
  • Sum
  • Mean
  • Median
  • Min
  • Max
  • StdDev
  • Variance
  • Quantile
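
For reference, the sketch below spells out common definitions of a few of these statistics using only the standard library. It is not Gandalff's code. In particular, the choice of sample variance (n-1 denominator) and the NaN result for empty input are assumptions; the README does not state which conventions the library follows.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// mean returns the arithmetic mean, or NaN for empty input.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return math.NaN()
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// variance returns the sample variance (n-1 denominator),
// or NaN when fewer than two values are given.
func variance(xs []float64) float64 {
	if len(xs) < 2 {
		return math.NaN()
	}
	m := mean(xs)
	ss := 0.0
	for _, x := range xs {
		d := x - m
		ss += d * d
	}
	return ss / float64(len(xs)-1)
}

// stdDev returns the square root of the sample variance.
func stdDev(xs []float64) float64 {
	return math.Sqrt(variance(xs))
}

// median sorts a copy of xs and returns the middle value, averaging
// the two middle values when the length is even. NaN for empty input.
func median(xs []float64) float64 {
	if len(xs) == 0 {
		return math.NaN()
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

func main() {
	weights := []float64{75.0, 80.5, 85.0, 60.0}
	fmt.Println(mean(weights), median(weights), stdDev(weights))
}
```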

Dependencies

Built with:

TODO

  • Improve filtering interface.
  • Improve dataframe PrettyPrint: add parameters, optimize data display, use lipgloss.
  • Implement string factors.
  • SeriesTime: set time format.
  • Implement Set(i []int, v []any) Series.
  • Add Slice(i []int) Series (using filter?).
  • Implement memory optimized Bool series with uint64.
  • Use uint64 for null mask.
  • Optimize XPT reader/writer with float32.
  • Add a URL resolver to each reader.
  • Add format option to each writer.
  • JSON reader by records.
  • Implement chunked series.
  • Implement OpenAI interface.
  • Implement Parquet reader and writer.
  • Implement SPSS reader and writer.
  • Implement SAS7BDAT reader and writer (https://cran.r-project.org/web/packages/sas7bdat/vignettes/sas7bdat.pdf)