CSVInspector - A graphical interactive tool to inspect and process CSV files.
Copyright (C) 2020 J. Férard https://github.com/jferard
License: GPLv3
CSVInspector needs JRE8 (with embedded JavaFX) or JRE11 and Python 3.8:
$ mvn clean install
$ PYTHONPATH=lang/python:$PYTHONPATH /path/to/jre8/java -jar target/csv_inspector-0.0.1-SNAPSHOT.jar
Code samples are in the menu Help > Snippets.
CSVInspector's main goal is to help users perform repetitive one-shot tasks on small data sets. Those tasks may be data aggregation, table joins, column selection or creation, etc.
CSVInspector provides the show method, which displays the current state of a data set.
(If you work on stable data or if the data sets are big, you should consider using SQL.)
A typical use case is:
- load one or two light CSV files (< 10k lines);
- aggregate some columns in both tables;
- add some columns;
- join files;
- save the result.
CSVInspector is a very basic client/server application:
- The server is a Python module that wraps some features of Pandas to handle CSV data.
- The client is a Kotlin/JavaFX GUI that sends Python scripts to the server and displays the results.
Python (won't work for now in a virtual env):
$ pushd lang/python
$ pip install --user -r requirements.txt
$ popd
Kotlin:
$ mvn clean install
$ /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -jar target/csv_inspector-1.0-SNAPSHOT.jar
#!/usr/bin/env python3.8
from csv_inspector import *
data = read_csv("fixtures/datasets-2020-02-22-12-33.csv") # this will open a MetaCSV panel to create/save the .mcsv file
data.show() # show the data
# do whatever you want here
data.show()
data.save_as("fixtures/datasets-2020-02-22-12-33-new.csv")
The wrapper provides the following instructions:
path.csv is the path to a CSV file.
If the MetaCSV file path.mcsv exists, returns a Data object. Otherwise, detects the encoding, CSV format and column types of path.csv and generates a sample MetaCSV file that may be edited and saved. (Will return a Data object on the next call.)
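The two branches described above can be sketched with the standard library alone. This is not CSVInspector's code: `inspect_csv` is a hypothetical helper, the UTF-8 encoding is an assumption (the real tool detects the encoding), and only the delimiter is guessed here, not the column types.

```python
import csv
from pathlib import Path


def inspect_csv(csv_path):
    """Hypothetical helper mimicking the two branches described above."""
    csv_path = Path(csv_path)
    mcsv_path = csv_path.with_suffix(".mcsv")  # path.csv -> path.mcsv
    if mcsv_path.exists():
        # Metadata already saved: the data could be loaded directly.
        return {"metadata": "found", "mcsv": mcsv_path.name}
    # No metadata yet: guess the CSV format from a sample of the file.
    # Assumption: the file is UTF-8 encoded.
    with csv_path.open(newline="", encoding="utf-8") as source:
        sample = source.read(4096)
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    return {"metadata": "missing", "delimiter": dialect.delimiter}
```

Calling the helper twice on the same file, once before and once after the .mcsv file is written, goes through both branches.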
Shows the Data object in a window.
Shows the stats of the Data object in a window.
Returns a copy of the Data object.
path.csv is the path to a CSV file.
Saves the Data object to a file.
Note the square brackets.
Create a new column.
- x is an index, slice or tuple of slices/indices of columns;
- func is the function to apply to the x values;
- col_name is the name of the new column;
- col_type is the type of the new column;
- index is the index of the new column.
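To make the parameters concrete, here is a plain-Python sketch of what creating a column amounts to on a list of rows. `create_column` is a hypothetical helper, not the CSVInspector method; passing the x values to func as positional arguments is an assumption, and col_type is left out.

```python
def create_column(header, rows, x, func, col_name, index):
    """Insert a column computed from the columns listed in x (hypothetical)."""
    new_header = header[:index] + [col_name] + header[index:]
    new_rows = []
    for row in rows:
        # Assumption: func receives the x values as positional arguments.
        value = func(*(row[i] for i in x))
        new_rows.append(row[:index] + [value] + row[index:])
    return new_header, new_rows


header = ["price", "quantity"]
rows = [[2.5, 4], [10.0, 3]]
new_header, new_rows = create_column(
    header, rows, [0, 1], lambda price, quantity: price * quantity, "total", 2
)
```

The source columns are left in place; only the new column is added at the requested index.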
Drop the indices of the handle and select the other indices.
- x is an index, slice or tuple of slices/indices.
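The phrase "index, slice or tuple of slices/indices" recurs throughout this list. The sketch below shows one way such a value can be resolved into a flat list of column indices, and how dropping and selecting are complementary. `resolve`, `select` and `drop` are hypothetical helpers written for illustration; CSVInspector's own resolution rules may differ.

```python
def resolve(x, n_cols):
    """Turn an index, slice or tuple of slices/indices into column indices."""
    if isinstance(x, int):
        return [range(n_cols)[x]]  # also normalizes negative indices
    if isinstance(x, slice):
        return list(range(n_cols)[x])
    indices = []
    for part in x:  # tuple of slices/indices
        indices.extend(resolve(part, n_cols))
    return indices


def select(header, x):
    """Keep only the columns designated by x."""
    return [header[i] for i in resolve(x, len(header))]


def drop(header, x):
    """Keep every column except those designated by x."""
    dropped = set(resolve(x, len(header)))
    return [name for i, name in enumerate(header) if i not in dropped]


header = ["id", "name", "city", "zip", "phone"]
selected = select(header, (0, slice(2, 4)))
remaining = drop(header, (0, slice(2, 4)))
```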
Filter data on a function.
- x is an index, slice or tuple of slices/indices;
- func is a function that takes the x values and returns a boolean.
Create a grouper on some rows.
g = data[x].grouper()
g[y].agg(func)
g.group()
- x is the index, slice or tuple of slices/indices of the rows;
- y is the index, slice or tuple of slices/indices of the aggregate columns;
- func is the aggregate function.
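The grouper follows the usual group-by pattern: rows sharing the same x values are collected together, then func reduces each y column of the group to a single value. A minimal plain-Python sketch of that pattern follows; `group_rows` is a hypothetical name, and the assumption that func receives the list of values of one column is mine.

```python
from collections import defaultdict


def group_rows(rows, x, y, func):
    """Group rows on the x columns and aggregate the y columns with func."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[i] for i in x)].append(row)
    # One output row per group: the key values, then one aggregate per y column.
    return [
        list(key) + [func([member[j] for member in members]) for j in y]
        for key, members in groups.items()
    ]


rows = [["fr", "Paris", 10], ["fr", "Lyon", 5], ["de", "Berlin", 7]]
grouped = group_rows(rows, [0], [2], sum)
```

Columns that are neither keys nor aggregates (the city name here) do not appear in the result of this sketch.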
Make an inner join between two data sets.
- x is the index, slice or tuple of slices/indices of the key;
- data2 is another Data object;
- y is the index, slice or tuple of slices/indices of the other key;
- func is the function to compare the x and y values.
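An inner join keeps only the pairs of rows whose keys match according to func. The sketch below uses a naive nested loop on lists of rows with single-column keys; `inner_join` is a hypothetical helper for illustration, and defaulting func to equality is an assumption.

```python
import operator


def inner_join(rows1, x, rows2, y, func=operator.eq):
    """Pair every row of rows1 with the rows of rows2 whose key matches."""
    return [r1 + r2 for r1 in rows1 for r2 in rows2 if func(r1[x], r2[y])]


people = [["alice", "FR"], ["bob", "de"], ["carol", "us"]]
countries = [["fr", "France"], ["de", "Germany"]]
# A custom comparison function makes the match case-insensitive.
joined = inner_join(
    people, 1, countries, 0, lambda a, b: a.lower() == b.lower()
)
```

The row for carol has no matching country and is therefore absent from the result.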
Make a left join between two data sets.
- x is the index, slice or tuple of slices/indices of the key;
- data2 is another Data object;
- y is the index, slice or tuple of slices/indices of the other key;
- func is the function to compare the x and y values.
Create a new column by merging some columns. Those columns are consumed during the process.
- x is an index, slice or tuple of slices/indices of columns;
- func is the function to apply to the x values;
- col_name is the name of the new column;
- col_type is the type of the new column.
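Unlike plain column creation, merging removes the source columns. The sketch below illustrates this on a list of rows. `merge_columns` is a hypothetical helper; placing the merged column where the first source column stood is an assumption, as is passing the values to func positionally.

```python
def merge_columns(header, rows, x, func, col_name):
    """Replace the columns listed in x by a single computed column."""
    first = min(x)  # assumption: the new column takes the first source slot

    def rebuild(cells, new_cell):
        out = []
        for i, cell in enumerate(cells):
            if i == first:
                out.append(new_cell)
            if i not in x:
                out.append(cell)  # source columns are consumed
        return out

    new_header = rebuild(header, col_name)
    new_rows = [rebuild(row, func(*(row[i] for i in x))) for row in rows]
    return new_header, new_rows


header = ["first", "last", "age"]
rows = [["Ada", "Lovelace", 36], ["Alan", "Turing", 41]]
new_header, new_rows = merge_columns(
    header, rows, [0, 1], lambda first, last: f"{first} {last}", "name"
)
```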
Move some columns after a given index.
- x is an index, slice or tuple of slices/indices of columns;
- idx is the destination index.
Move some columns before a given index.
- x is an index, slice or tuple of slices/indices of columns;
- idx is the destination index.
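The difference between the two moves is only the side of the destination column on which the moved columns land. A small sketch on a list of column names, using hypothetical helpers `move_after` and `move_before` and assuming idx refers to a position in the original column order:

```python
def move_after(cols, x, idx):
    """Move the columns listed in x right after the column at position idx."""
    moved = [cols[i] for i in x]
    out = []
    for i, col in enumerate(cols):
        if i not in x:
            out.append(col)
        if i == idx:
            out.extend(moved)
    return out


def move_before(cols, x, idx):
    """Move the columns listed in x right before the column at position idx."""
    moved = [cols[i] for i in x]
    out = []
    for i, col in enumerate(cols):
        if i == idx:
            out.extend(moved)
        if i not in x:
            out.append(col)
    return out


cols = ["a", "b", "c", "d"]
after = move_after(cols, [0], 2)
before = move_before(cols, [3], 1)
```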
Make an outer join between two data sets.
- x is the index, slice or tuple of slices/indices of the key;
- data2 is another Data object;
- y is the index, slice or tuple of slices/indices of the other key;
- func is the function to compare the x and y values.
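An outer join keeps the unmatched rows of both sides in addition to the matching pairs. The sketch below shows this on lists of rows with single-column keys and plain equality. `outer_join` is a hypothetical helper, and filling the missing side with None is an assumption about how gaps are represented.

```python
def outer_join(left, x, right, y):
    """Join on equal keys and keep the unmatched rows of both sides."""
    left_width = len(left[0]) if left else 0
    right_width = len(right[0]) if right else 0
    matched_right = set()
    out = []
    for left_row in left:
        hits = [j for j, row in enumerate(right) if row[y] == left_row[x]]
        if hits:
            for j in hits:
                matched_right.add(j)
                out.append(left_row + right[j])
        else:
            # Assumption: missing values are represented by None.
            out.append(left_row + [None] * right_width)
    for j, right_row in enumerate(right):
        if j not in matched_right:
            out.append([None] * left_width + right_row)
    return out


left = [[1, "a"], [2, "b"]]
right = [[2, "x"], [3, "y"]]
joined = outer_join(left, 0, right, 0)
```

Dropping the final loop would give a left join; dropping the `else` branch instead would give a right join.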
Rename one or more columns.
- x is the index, slice or tuple of slices/indices of the key;
- names is a list of new names.
Make a right join between two data sets.
- x is the index, slice or tuple of slices/indices of the key;
- data2 is another Data object;
- y is the index, slice or tuple of slices/indices of the other key;
- func is the function to compare the x and y values.
Sort the rows in reverse order.
- x is the index, slice or tuple of slices/indices of the key;
- func is the key function.
Select the indices of the handle and drop the other indices.
- x is an index, slice or tuple of slices/indices.
Show the first rows of this DataHandle. Expected format: CSV with comma
sort(self, func=None, reverse=False)
Sort the rows.
- x is the index, slice or tuple of slices/indices of the key;
- func is the key function.
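The key function plays the same role as the key argument of Python's built-in sorted. A plain-Python sketch on a list of rows follows; `sort_rows` is a hypothetical helper, and passing the key values to func positionally is an assumption.

```python
def sort_rows(rows, x, func=None, reverse=False):
    """Sort rows on the columns listed in x, optionally through func."""

    def key(row):
        values = tuple(row[i] for i in x)
        # Assumption: func receives the key values as positional arguments.
        return func(*values) if func else values

    return sorted(rows, key=key, reverse=reverse)


rows = [["b", 2], ["a", 10], ["c", 1]]
by_number = sort_rows(rows, [1])
by_number_desc = sort_rows(rows, [1], reverse=True)
by_text = sort_rows(rows, [1], func=str)  # compares "1", "10", "2" as text
```

The last call shows why a key function matters: sorted as text, 10 comes before 2.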
Show stats on the data
swap(self, other_handle: 'DataHandle')
Swap two handles. Those handles may be backed by the same data or not.
- x and y are indices, slices or tuples of slices/indices.
Update some column using a function.
- x is an index;
- func is a function of data[x] (use numeric indices).