CSVInspector - A graphical interactive tool to inspect and process CSV files.

License: GPLv3

Quick start

CSVInspector needs either a JRE 8 (with embedded JavaFX) or a JRE 11, plus Python 3.8:

$ mvn clean install
$ PYTHONPATH=lang/python:$PYTHONPATH /path/to/jre8/java -jar target/csv_inspector-0.0.1-SNAPSHOT.jar

Code samples are in the menu Help > Snippets.

What it looks like

Use case

CSVInspector's main goal is to help users perform repetitive one-shot tasks on small data sets. Those tasks may be data aggregation, table joins, column selection or creation, etc. CSVInspector provides the show method, which displays the current state of a data set.

(If you work on stable data or if the data sets are big, you should consider using SQL.)

A typical use case is:

load one or two small CSV files (fewer than 10k lines);
aggregate some columns in both tables;
add some columns;
join the files;
save the result.

Overview

CSVInspector is a very basic client/server application:

The server is a Python module that wraps some features of Pandas to handle CSV data.
The client is a Kotlin/JavaFX GUI that sends Python scripts to the server and displays the results.

Install & run

Python (does not work in a virtual environment for now):

$ pushd lang/python
$ pip install --user -r requirements.txt
$ popd

Kotlin:

$ mvn clean install
$ /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -jar target/csv_inspector-1.0-SNAPSHOT.jar

Usage

Code sample

#!/usr/bin/env python3.8

from csv_inspector import *

data = read_csv("fixtures/datasets-2020-02-22-12-33.csv") # this will open a MetaCSV panel to create/save the .mcsv file
data.show() # show the data
# do whatever you want here
data.show()
data.save_as("fixtures/datasets-2020-02-22-12-33-new.csv")

Main commands

The wrapper provides the following instructions:

`read_csv(path.csv)`

path.csv is the path to a csv file.

If the MetaCSV file path.mcsv exists, returns a Data object. Otherwise, detects the encoding, CSV format and column types of path.csv and generates a sample MetaCSV file that may be edited and saved. (A Data object will be returned on the next call.)
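The detection step can be illustrated with the standard library. The sketch below is not CSVInspector's implementation: it assumes the sample is already decoded text (encoding detection is left out), and sniff_csv and guess_type are hypothetical helpers. It uses csv.Sniffer to find the delimiter, then tries int and float on each column to guess a type.

```python
import csv

def guess_type(values):
    """Return a naive type name for a column of string values (hypothetical helper)."""
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for value in values:
                caster(value)
        except ValueError:
            continue  # this caster failed on some value, try the next one
        return name
    return "text"

def sniff_csv(sample):
    """Guess the delimiter and one type per column from a decoded CSV sample (hypothetical helper)."""
    # restrict the candidate delimiters to keep the guess stable
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    rows = list(csv.reader(sample.splitlines(), dialect))
    header, body = rows[0], rows[1:]
    columns = zip(*body)  # transpose the data rows into columns
    types = {name: guess_type(col) for name, col in zip(header, columns)}
    return dialect.delimiter, types
```

For instance, sniff_csv("id;name;score\n1;Alice;12.5\n2;Bob;9\n") detects ";" and guesses integer, text and float for the three columns.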

`data.show()`

Shows the Data object in a window.

`data.stats()`

Shows the stats of the Data object in a window.

`data.copy()`

Returns a copy of the Data object.

`data.save_as(path.csv)`

path.csv is the path to a csv file.

Saves the Data object to a file.

Other commands

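Note the square brackets: data[x] returns a handle (a DataHandle) on a selection, and the commands below are called on that handle.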

`data[x].create(func, col_name, [col_type, [index]])`

Create a new column.

x is an index, slice or tuple of slices/indices of columns
func is the function to apply to the x values
col_name is the name of the new column
col_type is the type of the new column (optional)
index is the index of the new column (optional)
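To make the semantics of create concrete, here is a minimal sketch on plain Python lists rather than on a Data object. It assumes that func receives the values of the selected columns as positional arguments for each row; create_column is a hypothetical name, and col_type handling is omitted.

```python
from typing import Callable, List, Optional

def create_column(header: List[str], rows: List[list], cols: List[int],
                  func: Callable, col_name: str, index: Optional[int] = None):
    """Insert a column computed by func from the selected columns (hypothetical helper)."""
    if index is None:
        index = len(header)  # default: append the new column at the end
    new_header = header[:index] + [col_name] + header[index:]
    new_rows = []
    for row in rows:
        value = func(*(row[c] for c in cols))  # assumption: one argument per selected column
        new_rows.append(row[:index] + [value] + row[index:])
    return new_header, new_rows
```

With header ["first", "last"] and func lambda f, l: f + " " + l on columns [0, 1] at index 1, the header becomes ["first", "full", "last"] if col_name is "full".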

`data[x].drop()`

Drop the indices of the handle and select the other indices.

x is an index, slice or tuple of slices/indices
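The selector x accepted inside the square brackets can be normalized into a list of column indices. The following sketch assumes that negative indices count from the end, as in Python; to_indices and dropped are hypothetical helpers. select keeps the indices returned by to_indices, while drop keeps the complement.

```python
from typing import List, Tuple, Union

# Hypothetical type for what may appear inside data[...]
Selector = Union[int, slice, Tuple[Union[int, slice], ...]]

def to_indices(selector: Selector, width: int) -> List[int]:
    """Flatten an index, a slice or a tuple of indices/slices into sorted column indices."""
    parts = selector if isinstance(selector, tuple) else (selector,)
    indices = set()
    for part in parts:
        if isinstance(part, slice):
            indices.update(range(*part.indices(width)))
        else:
            indices.add(part % width)  # assumption: negative indices count from the end
    return sorted(indices)

def dropped(selector: Selector, width: int) -> List[int]:
    """Column indices that remain after dropping the selected ones."""
    selected = set(to_indices(selector, width))
    return [i for i in range(width) if i not in selected]
```

On a 6-column table, the selector (0, slice(2, 4)) designates columns 0, 2 and 3, so dropping it leaves columns 1, 4 and 5.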

`data[x].filter(func)`

Filter the rows using a function.

x is an index, slice or tuple of slices/indices.
func is a function that takes the x values and returns a boolean
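A minimal sketch of row filtering on plain lists, assuming that func receives the selected values as positional arguments and that a row is kept when func returns a true value; filter_rows is a hypothetical name.

```python
def filter_rows(rows, cols, func):
    """Keep the rows for which func(selected values) is truthy (hypothetical helper)."""
    return [row for row in rows if func(*(row[c] for c in cols))]
```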

`g = data[x].grouper()`

Create a grouper on some rows, define the aggregations, then group:

g = data[x].grouper()
g[y].agg(func)
g.group()

x is the index, slice or tuple of slices/indices of the rows
y is the index, slice or tuple of slices/indices of the aggregate columns
func is the aggregate function
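A group-and-aggregate step can be sketched with collections.defaultdict. This assumes that x designates the grouping key columns, that y designates the aggregated columns, and that func receives the list of values of one column within one group (as sum or max would); group_rows is a hypothetical helper, not the grouper API.

```python
from collections import defaultdict

def group_rows(rows, key_cols, agg_cols, func):
    """Group rows on key_cols, then apply func to each aggregated column (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        groups[key].append(row)
    result = []
    for key, members in groups.items():  # dicts keep first-seen key order
        aggregated = [func([m[c] for m in members]) for c in agg_cols]
        result.append(list(key) + aggregated)
    return result
```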

`data1[x].ijoin(data2[y], func)`

Make an inner join between two data sets.

x is the index, slice or tuple of slices/indices of the key
data2 is another Data object
y is the index, slice or tuple of slices/indices of the other key
func is the function to compare the x and y values
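Since func is an arbitrary comparison rather than an equality on a key, the simplest way to picture the join is a nested loop that tests every pair of rows. In this sketch, inner_join is hypothetical and func is assumed to take the left key tuple and the right key tuple and return a boolean.

```python
def inner_join(left, right, left_cols, right_cols, func):
    """Nested-loop inner join: keep every (left, right) pair whose keys satisfy func."""
    result = []
    for lrow in left:
        lkey = tuple(lrow[c] for c in left_cols)
        for rrow in right:
            rkey = tuple(rrow[c] for c in right_cols)
            if func(lkey, rkey):
                result.append(lrow + rrow)  # concatenate the matching rows
    return result
```

The nested loop costs one comparison per pair of rows, which stays acceptable for the small data sets CSVInspector targets.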

`data1[x].ljoin(data2[y], func)`

Make a left join between two data sets.

x is the index, slice or tuple of slices/indices of the key
data2 is another Data object
y is the index, slice or tuple of slices/indices of the other key
func is the function to compare the x and y values

`data[x].merge(func, col_name, [col_type])`

Create a new column by merging some columns. The merged columns are consumed (removed) in the process.

x is an index, slice or tuple of slices/indices of columns
func is the function to apply to x values
col_name is the name of the new column
col_type is the type of the new column
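A sketch of merge on plain lists. It assumes that the new column takes the position of the first merged column (the position is not specified above) and that func receives the merged values as positional arguments; merge_columns is a hypothetical helper.

```python
def merge_columns(header, rows, cols, func, col_name):
    """Replace the selected columns with one column computed by func (hypothetical helper)."""
    first = min(cols)  # assumption: the new column replaces the first merged one
    removed = set(cols)

    def rebuild(items, new_item):
        out = []
        for i, item in enumerate(items):
            if i == first:
                out.append(new_item)
            elif i not in removed:  # the other merged columns are consumed
                out.append(item)
        return out

    new_header = rebuild(header, col_name)
    new_rows = [rebuild(row, func(*(row[c] for c in cols))) for row in rows]
    return new_header, new_rows
```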

`data[x].move_after(idx)`

Move the selected columns after a given index.

x is an index, slice or tuple of slices/indices of columns
idx is the destination index

`data[x].move_before(idx)`

Move the selected columns before a given index.

x is an index, slice or tuple of slices/indices of columns
idx is the destination index
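Both move commands only reorder columns. The sketch below works on a list of column indices and assumes that idx designates a column that is not being moved; move_columns is hypothetical, with after=False standing for move_before.

```python
def move_columns(order, cols, idx, after=True):
    """Return a new column order with cols placed just after (or before) column idx."""
    if idx in cols:
        raise ValueError("the destination must not be one of the moved columns")
    moved = [c for c in order if c in cols]  # keep the relative order of moved columns
    rest = [c for c in order if c not in cols]
    pos = rest.index(idx) + (1 if after else 0)
    return rest[:pos] + moved + rest[pos:]
```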

`data1[x].ojoin(data2[y], func)`

Make an outer join between two data sets.

x is the index, slice or tuple of slices/indices of the key
data2 is another Data object
y is the index, slice or tuple of slices/indices of the other key
func is the function to compare the x and y values
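The nested-loop picture extends to the left, right and outer joins described in this section by padding unmatched rows. This sketch assumes that missing values are represented by None and that func compares two key tuples; join and its how parameter are hypothetical, not CSVInspector's signature.

```python
def join(left, right, left_cols, right_cols, func, how="outer"):
    """Nested-loop join; how is "inner", "left", "right" or "outer" (hypothetical helper)."""
    left_width = len(left[0]) if left else 0
    right_width = len(right[0]) if right else 0
    result = []
    matched_right = set()
    for lrow in left:
        lkey = tuple(lrow[c] for c in left_cols)
        found = False
        for j, rrow in enumerate(right):
            if func(lkey, tuple(rrow[c] for c in right_cols)):
                result.append(lrow + rrow)
                matched_right.add(j)
                found = True
        if not found and how in ("left", "outer"):
            result.append(lrow + [None] * right_width)  # keep the unmatched left row
    if how in ("right", "outer"):
        for j, rrow in enumerate(right):
            if j not in matched_right:
                result.append([None] * left_width + rrow)  # keep the unmatched right row
    return result
```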

`data[x].rename(names)`

Rename one or more columns.

x is the index, slice or tuple of slices/indices of the columns to rename
names is a list of new names

`data1[x].rjoin(data2[y], func)`

Make a right join between two data sets.

x is the index, slice or tuple of slices/indices of the key
data2 is another Data object
y is the index, slice or tuple of slices/indices of the other key
func is the function to compare the x and y values

`data[x].rsort(func)`

Sort the rows in reverse order.

x is the index, slice or tuple of slices/indices of the key
func is the key function
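Reverse sorting maps directly onto Python's sorted with reverse=True. In this hypothetical rsort_rows, the rows are sorted on the tuple of selected values, or on func applied to those values when a key function is given.

```python
def rsort_rows(rows, cols, func=None):
    """Sort rows in descending order of the selected columns (hypothetical helper)."""
    def key(row):
        values = tuple(row[c] for c in cols)
        # with a key function, sort on func(values...), else on the raw values
        return func(*values) if func is not None else values
    return sorted(rows, key=key, reverse=True)
```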

`data[x].select()`

Select the indices of the handle and drop the other indices.

x is an index, slice or tuple of slices/indices

`data[x].sort(func)`

Sort the rows. Signature: sort(self, func=None, reverse=False).

x is the index, slice or tuple of slices/indices of the key
func is the key function

`data1[x].swap(data2[y])`

Swap two handles. Those handles may be backed by the same data or not.

x and y are indices, slices or tuples of slices/indices

`data[x].update(func)`

Update a column using a function.

x is a column index
func is a function of data[x] (use numeric indices)
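Reading "use numeric indices" as "func receives the row and reads it by position", an update can be sketched as follows. This interpretation is an assumption, and update_column is a hypothetical helper.

```python
def update_column(rows, col, func):
    """Replace column col with func(row); func reads the row by numeric index (assumption)."""
    return [row[:col] + [func(row)] + row[col + 1:] for row in rows]
```

For example, update_column(rows, 2, lambda r: r[0] * r[1]) stores the product of the first two columns in the third one.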

jferard/CSVInspector
