Skip to content

Latest commit

 

History

History
160 lines (117 loc) · 8.68 KB

README.md

File metadata and controls

160 lines (117 loc) · 8.68 KB

Goodreads

Books

Problemsolver

I'd call those books "Problemsolver". You might not read them from front to back but as a reference for specific problems.

Non-IT

These are non IT specific books, but touch essential things in our business. Either the way we work together or address some common misconceptions, for example sleep being an optional aspect to live:

Work in general

Essays and other articles

Best practices

Architecture

Concepts

Philosophy

My library

all.csv contains an incomplete list of books in my library. The CSV file has 6 columns separated by ,.

Name Description
Author One or more authors, last name, first name separated by &
Title Title of the book
Type R, S, C (Roman (Fiction), Sachbuch (Non-Fiction), Comic)
State R, U (Read, Unread)
Last read on Last time I read the book
Emoji rating My very subjective rating

Interacting with the CSV file

Using SQLite to query the database

sqlite3 :memory: \
 '.mode csv' \
 '.separator ,' \
 '.import "|curl -s https://raw.githubusercontent.com/michael-simons/goodreads/master/all.csv" books' \
 "SELECT title FROM books WHERE author like '%King%' ORDER by title"

Using DuckDB

DuckDB is an incredible versatile, in-process SQL OLAP database management system and while most likely total overkill for the small dataset, it's fun. Install and start DuckDB:

-- Required to directly import the csv file from Github
INSTALL httpfs;
-- Just query the dataset
SELECT DISTINCT author FROM read_csv('https://raw.githubusercontent.com/michael-simons/goodreads/master/all.csv', header=true, auto_detect=true);
-- Create a table named books
CREATE TABLE books AS SELECT * FROM read_csv('https://raw.githubusercontent.com/michael-simons/goodreads/master/all.csv', header=true, auto_detect=true);
-- Query and manipulate as needed
-- Save the result (overwriting all.csv and sorting it on the way)
COPY (SELECT * FROM books ORDER BY author COLLATE de ASC, title COLLATE de ASC) TO 'all.csv' WITH (header true);

Of course, a one shot query like the one above printing all books by Stephen King, is possible too:

duckdb --noheader --list -s "
SELECT title FROM read_csv('https://raw.githubusercontent.com/michael-simons/goodreads/master/all.csv', header=true, auto_detect=true)
WHERE author like '%King%' ORDER by title"

Tip

Shameless self-advertising: I wrote a book about DuckDB with a couple of friends, called DuckDB in Action and it's available at Manning or on Amazon. If you like some nice SQL, Python and Java, have a look.

Using Neo4j

I used to run a browseable, interactive list of all books on Heroku using a free Neo4j AuraDB instance, but Heroku stopped offering a free service a while ago. The repository containing the application code (based on Quarkus) is still available: neo4j-aura-quarkus-graphql project. Follow the instruction in the README.

The essential query to import the CSV into Neo4j looks like this

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/michael-simons/goodreads/master/all.csv' AS row FIELDTERMINATOR ',' 
MERGE (b:Book {
  title: trim(row.Title)
})
SET b.type = row.Type, b.state = row.State
WITH b, row
UNWIND split(row.Author, '&') AS author
WITH b, split(author, ',') AS author
WITH b, ((trim(coalesce(author[1], '')) + ' ') + trim(author[0])) AS author
MERGE (a:Person {
  name: trim(author)
})
MERGE (a)-[r:WROTE]->(b)
WITH b, a
WITH b, collect(a) AS authors
RETURN id(b) AS id, b.title, b.state, authors

Using xsv

xsv is a powerful tool for manipulating CSV. Here's an example how to get a list of unique authors

curl -s https://raw.githubusercontent.com/michael-simons/goodreads/master/all.csv | \
  xsv select -d "," Author |\
  uniq

Using the webui

If you have JBang installed you can start an admin "UI" like this:

jbang admin.java

Access the page at localhost:8080.