altworx/ecsv

Erlang NIF CSV parser and writer

Copyright (c) 2016 Altworx

Version: 0.2.0

Authors: Hynek Vychodil (hynek.vychodil@altworx.com).

See also: ecsv.

ecsv is a fast NIF-based CSV parser and writer built on libcsv.

The main purpose of the module is fast parsing of CSV data in gigabyte volumes. This requirement makes a stream-oriented API necessary (see ecsv:parse_stream/4).

The application requires the development files for libcsv version 3.0.x (tested with 3.0.3). For example, on Debian you need to run

apt-get install libcsv3 libcsv-dev

To build, you need rebar3 installed.

rebar3 compile

Run the tests using

rebar3 eunit

For an interactive Erlang shell, use

rebar3 shell

As a starter, try

1> ecsv:parse(<<"Hello,World">>).
[{<<"Hello">>,<<"World">>}]

See ecsv for API description.

ecsv:write/1 and ecsv:write_lines/1 do not perform as expected. For an unknown reason, they are slower than a pure Erlang implementation compiled with HiPE when used in a real application. On the other hand, ecsv:parse_stream/5 met expectations and performs at around 100 MB/s on commodity hardware (i7 2.6 GHz).

The current implementation uses enif_make_new_binary() for parsed fields. In our experience, this call allocates small binaries on the process heap, whereas enif_alloc_binary() always allocates on the binary heap, which is slightly slower. Fields from the line currently being parsed are kept between NIF calls in a separate environment, which can lead to bad behavior when parse_raw/3 is called with very short binaries and there is a long row with many fields. In the extreme case, one long line with short or, at worst, empty fields leads to quadratic behavior if fed one byte at a time. If you need to parse CSV files with rows longer than 20 kB containing thousands of fields, you should probably use another parser or fix the issue.
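Because the bad case is triggered by feeding very short binaries, one possible mitigation on the caller's side is to merge tiny chunks into larger blocks before handing them to the parser. The following is a minimal sketch in pure Erlang. The module name rechunk and the function coalesce/2 are hypothetical and not part of ecsv, and the sketch does not call ecsv at all. It reduces the number of parser calls but does not fix the underlying issue for very long rows.

```erlang
-module(rechunk).
-export([coalesce/2]).

%% Hypothetical helper, not part of ecsv.
%% coalesce(Chunks, MinSize) -> [binary()]
%% Merges a list of (possibly tiny) binaries into blocks of at least
%% MinSize bytes. Only the final block may be shorter than MinSize.
coalesce(Chunks, MinSize) when is_integer(MinSize), MinSize > 0 ->
    coalesce(Chunks, MinSize, [], 0, []).

%% No input left and nothing pending: return the blocks in order.
coalesce([], _MinSize, [], _Size, Out) ->
    lists:reverse(Out);
%% No input left: emit whatever is pending as the last (short) block.
coalesce([], _MinSize, Pending, _Size, Out) ->
    lists:reverse([flush(Pending) | Out]);
coalesce([Chunk | Rest], MinSize, Pending, Size, Out) ->
    NewSize = Size + byte_size(Chunk),
    case NewSize >= MinSize of
        true ->
            %% Enough bytes collected: emit one block and start over.
            coalesce(Rest, MinSize, [], 0, [flush([Chunk | Pending]) | Out]);
        false ->
            %% Still too small: keep accumulating (in reverse order).
            coalesce(Rest, MinSize, [Chunk | Pending], NewSize, Out)
    end.

%% Pending chunks are stored newest-first; restore order and join them.
flush(Pending) ->
    iolist_to_binary(lists:reverse(Pending)).
```

The resulting blocks can then be passed to the parser one by one in place of the original small chunks. A block size in the range of tens of kilobytes is a plausible starting point, but the right value is an assumption to be checked by measurement.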

Modules

ecsv