
Data

An Open Data Portal for Cannabis and tools for Converting and Importing said data.

This tool is designed to import data from BioTrack, GrowFlow, LeafData or METRC into a common schema. Then you can run reports on this dataset using a front-end tool like Metabase or Apache Superset.

Converting XLS

If the source data is in XLS, first convert it into CSV. Then those files can be processed into JSON.

./bin/cli.php convert \
	--source-type XLS \
	--source-file ./path-to-some-file.xls \
	--output-path ./path-to-directory/

If the XLS file's worksheet headers do not conform to the OpenTHC/CSV specification, edit them before conversion. It may also be useful to remove columns that are not significant.

Other data may also need to be cleaned up, such as setting the proper product weights and removing duplicate rows (eg plants that appear both in an Inventory worksheet and a Plants worksheet).
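
If you need to produce the intermediate CSV yourself, a headless LibreOffice run is one option. A minimal sketch, assuming LibreOffice is installed; this tool is not part of this project, and it exports only the first worksheet of each file:

libreoffice --headless --convert-to csv ./path-to-some-file.xls --outdir ./path-to-directory/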

Converting CSV

OpenTHC has defined some "standard" CSV file headers to use for data importing. Typically, one can take an existing software export, update its header column names (see the renaming sketch after the header list), and then convert/import. The names are not case-sensitive.

Section_GUID
Section_Name
Section_Type
Variety_GUID
Variety_Name
Variety_Type
Product_GUID
Product_Name
Product_Type
Package_Unit_Weight
Package_Unit_Volume
Inventory_GUID
Inventory_Name
Inventory_Unit_Count_Current | Inventory_Unit_Count | Unit_Count
Inventory_Unit_Count_Initial | Unit_Count_Initial
Crop_GUID
Crop_Name
Crop_Unit_Count_Current | Crop_Unit_Count | Unit_Count
Crop_Unit_Count_Initial | Unit_Count_Initial
Plant_GUID
Plant_Name

Crop and Plant are the same underlying thing; both names exist to help think about their usage distinctly. A Crop is an identifier for one or more plants (or a whole field), while Plant explicitly means a single plant.
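
To update an export's header names to match the spec, a sed pass over the first row is usually enough. A minimal sketch, assuming hypothetical source column names ("Strain", "Room") that need renaming:

# Rewrite only line 1 (the header row); the left-hand names are hypothetical examples
sed -i '1s/Strain/Variety_Name/; 1s/Room/Section_Name/' ./source-data/Inventory.csv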

Then just convert the objects out of the source document:

./bin/cli.php convert \
	--output-path output-data/convert-example/ \
	--source-file=./source-data/Inventory.csv --source-type=product-csv

./bin/cli.php convert \
	--output-path output-data/convert-example/ \
	--source-file=./source-data/Inventory.csv --source-type=section-csv

./bin/cli.php convert \
	--output-path output-data/convert-example/ \
	--source-file=./source-data/Inventory.csv --source-type=variety-csv

./bin/cli.php convert \
	--output-path output-data/convert-example/ \
	--source-file=./source-data/Inventory.csv --source-type=inventory-csv

Loading BioTrack

We can load BioTrack data from the API, from CSV exports, or from SQL dumps.

./bin/import.php \
	--source-type=BioTrack-API \
	--source=$SOURCE_URI \
	--object=crop \
	--output=OUTPUT_PATH
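
Loading from a CSV export should look similar. A sketch only: the BioTrack-CSV value for --source-type is an assumption modeled on the GrowFlow-CSV and Cultivera-XLSX patterns below, and the file name is illustrative:

./bin/import.php \
	--source-type=BioTrack-CSV \
	--source=./source-data/biotrack-inventory.csv \
	--object=inventory \
	--output=OUTPUT_PATH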

Loading Akerna/LeafData/MJFreeway/MJPlatform Data

Get the ZIP files or table dumps from LeafData. Put the LeafData CSV/ZIP files in an origin directory (./source-data/leafdata) and extract and prepare them as necessary. Symlink those files to ./source-data/$NAME.csv, using the name required by the import script. Then remove the origin file, leaving an orphan symlink to track what has been completed, and clean up when fully done.
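
A minimal sketch of that manual flow, with illustrative file names (the automated version is in ./bin/leafdata/extract.sh):

# Stage the dump and expose it under the name the import script expects
unzip ./source-data/leafdata/Inventories.zip -d ./source-data/leafdata/
ln -s ./leafdata/Inventories.csv ./source-data/inventory.csv
# After importing, remove the origin file; the orphan symlink records what is done
rm ./source-data/leafdata/Inventories.csv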

See ./bin/leafdata/extract.sh for an automated process.

Loading CCRS

Get the ZIP file dumps from the LCB and store them in a dated directory such as ./source-data-YYYY-MM-DD; the box-download.php helper can fetch them. Then, from that source-data directory, run ../bin/ccrs/import.php [OBJECT] > OUTPUT-[OBJECT].txt
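
To run the whole set, a small loop works. A sketch only; the dated directory and the object names are illustrative, not a definitive list:

cd ./source-data-2024-01-15/
for OBJ in section variety product inventory; do
	../bin/ccrs/import.php $OBJ > OUTPUT-$OBJ.txt
done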

Loading GrowFlow

The scripts in lib/Import/GrowFlow can import from the CSV exports or from collected HAR files.

./bin/import.php \
	--source=source-data/More_Vegging_Plants_export.csv  \
	--source-type=GrowFlow-CSV \
	--object=crop \
	--output=OUTPUT_PATH

Loading Cultivera

The scripts in lib/Import/Cultivera can import from the HAR or XLSX exports.

./bin/import.php \
	--source=source-data/Inventory.xlsx  \
	--source-type=Cultivera-XLSX \
	--object=inventory \
	--output=OUTPUT_PATH

BOM, UTF-16

Source files sometimes arrive with a byte-order mark (BOM) or encoded as UTF-16LE. Our pre-processing scripts will normalise them to UTF-8.

iconv

  • Use iconv -f UTF-16LE -t UTF-8 -o OUTPUT INPUT

Extracting

The files may be delivered as ZIP archives; internally they are labeled as CSV files, but the delimiter may be a comma or a TAB.
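
A normalisation pass along those lines might look like the following. A sketch with illustrative file names; the commented TAB-to-comma step is naive and assumes no embedded tabs or commas in field values:

unzip ./source-data/dump.zip -d ./source-data/
for F in ./source-data/*.csv; do
	# Re-encode UTF-16LE files to UTF-8
	if file "$F" | grep -q 'UTF-16'; then
		iconv -f UTF-16LE -t UTF-8 -o "$F.tmp" "$F" && mv "$F.tmp" "$F"
	fi
	# Strip a UTF-8 BOM from the first line, if present
	sed -i '1s/^\xEF\xBB\xBF//' "$F"
	# If TAB-delimited: tr '\t' ',' < "$F" > "$F.tmp" && mv "$F.tmp" "$F"
done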


Data Information

Product Categories: all the standard numbers, plus Joints and BHO

Group by sizes, but find the common sizes/size groups first!

  • Facet: License
  • Facet: Product
  • Facet: Variety

Retail Detail

Data!

The data scripts make TSV files from the SQL. Each TSV answers one question, built around these "base facets":

  • Category: Plants, Wholesale, Retail, QA
  • An Inventory Type (5, 13, 28, etc)
  • A Date Delta - Daily, Weekly, Monthly, Quarterly, Yearly

The answers are "business questions", like price per gram or volume. Details are in the ./data-*.php scripts; the question is selected with the 'a' parameter.

Each data category is answered by a PHP script, ./data-*.php. Those scripts are called by the wrapper ./data.php, like this:

./data.php wholesale d=w t=28 a=ppg-avg

That will in turn execute the code in ./data-wholesale.php and pass the parameters in as $_GET values.

Those $_GET values are used to control logic that assembles the SQL in the form of:

SELECT (value-expression)
FROM (necessary-tables)
	JOIN (may-need-one-or-more-of-these)
WHERE (desired-constructed-filter)
	AND (additional-conditions-we-discover-to-remove-crap-data)

The script then iterates the Date Delta, writing the results out as a TSV file. (In practice the PHP scripts write to STDOUT, which is redirected to capture the output.)
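
For example (the output path is illustrative, following the pattern used by the shell wrappers below):

./data.php wholesale d=w t=28 a=ppg-avg > ./webroot/pub/data/wholesale-ppg-avg-w-28.tsv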

Shell Wrappers

The shell wrappers (data-*.sh) are designed to iterate over the "base facets":

# Facet values here are illustrative; substitute the data category for [category]
for DATE in d w m; do
	for TYPE in 5 13 28; do
		for Q in ppg-avg; do
			./data.php [category] d=$DATE t=$TYPE a=$Q \
				> ./webroot/pub/data/[category]-$Q-$DATE-$TYPE.tsv
		done
	done
done

We know these to be the base queries for the data that people want, so we can pre-build them.

Some example questions:

  • Time to First Harvest
  • Time from Harvest to Cure
  • Store-level summary, by inventory type, of what a store paid for X and what it charged for X, where X is the number of units or grams sold during the time period in question; carry a level of detail that allows drill-down to a specific processor or producer (this should enable a number of further analyses, to be added to the document later)
  • Markup by County
  • Chocolate vs Cookies; Regions; Strains on Sales

Business Intelligence Tools