Docs: Loading Data
You can load data into Keshif from:
- Google Sheets
- Text files
  - On Google Drive
  - On Dropbox
  - On your web server
Keshif can be used with the following data file types:
- CSV / TSV
- JSON
- XML
- Any other file type that you can load and parse in JavaScript. See Custom Data Loading
Hint: The dataset explorer on the front page indexes demos by file type and data source. Filter by data source to find example source code for a specific file loading approach.
Note: You cannot currently use multiple data sources or file types in one browser unless you use a custom data loading function.
To load data into Keshif, you can:
- Use the API to create a keshif browser, or
- Use the experimental browser authoring page.
Each browser specification object must have a "source" key that describes the data for the keshif browser.
// Using Google Sheets, single table (sheet)
source: {
gdocId: '0Ai6LdDWgaqgNdFlZRk83NmpDLVc2cllCRjhpdkNYOWc',
tables: "Demos"
}
// Using Google Sheets, single table (sheet), with a custom link source URL
source: {
url: "http://www.bloomberg.com/infographics/2014-08-21/top-data-breaches.html",
gdocId: '14vd0RHPy-JyetjppxJ4R5UywaeszV0HR599MX91KkjI',
tables: "Breaches"
}
// Using Google Sheets, multiple tables, focusing on Publications
source: {
gdocId: '0Ai6LdDWgaqgNdEp1aHBzSTg0T0RJVURqWVNGOGNkNXc',
sheets: [ "Publications", "Venues", "Authors", "Keywords", "VenueTypes", "AuthorTypes" ]
}
// Using custom callback
source: { callback: function(browser){ ... } },
// Using Google Sheets, table with customized id column
source: {
gdocId: '1zmtJuAfh2foJD1Ha4Ppiuisq4Wx5DDUt61zwiCjf500',
tables: {name:"Companies", id:'Stock'}
}
// Using locally hosted JSON file, with custom URL for attribution on interface
source: {
url: "http://www.chromestatus.com/features",
dirPath: "./data/",
fileType: 'json',
tables: "chromefeatures"
}
// Using Google Drive hosted csv file
source: {
url: 'http://www.consumerfinance.gov/complaintdatabase',
dirPath: 'https://ca480fa8cd553f048c65766cc0d0f07f93f6fe2f.googledrive.com/host/0By6LdDWgaqgNfmpDajZMdHMtU3FWTEkzZW9LTndWdFg0Qk9MNzd0ZW9mcjA4aUJlV0p1Zk0/',
fileType: 'csv',
tables: "Consumer_Complaints_2015"
}
// Using Dropbox hosted json file
source: {
url: "http://ccnmtl.columbia.edu/portfolio/exhibit_view.html",
dirPath: 'https://dl.dropboxusercontent.com/u/1951639/',
fileType: 'json',
tables: "cnmtl_portfolio"
}
Data tables are described using the tables key in your source object. Its value can be a single table description or an array of descriptions.
Note: Earlier, the key name was sheets instead of tables. Both keys are currently supported, but support for the sheets key may be removed in the future.
If you specify only a string as the table description, it is used as the table name parameter.
If you use an array of descriptions, the first table description is expected to hold the primary entity (the records shown and filtered in the list). The remaining tables can be used to look up or link to information in other tables.
# name
String - Table name. For Google Sheets, it is the sheet name shown at the bottom. For text files, it is the file name without the extension.
# id
String - The column name that holds the unique id for each record in the table. Default is 'id'. If your table has a unique descriptor under a different column name, specify the column name here.
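The rules above for the tables value (a bare string is a table name, id defaults to 'id', the first entry of an array is the primary table) can be sketched as a small normalization step. The helper name normalizeTables is hypothetical and is not part of the keshif API; it only illustrates how the accepted forms relate to each other.

```javascript
// Hypothetical helper, not part of keshif: expands the accepted forms of
// the `tables` value into a uniform array of {name, id} descriptions.
function normalizeTables(tables) {
  // A single description is treated as a one-element array.
  const list = Array.isArray(tables) ? tables : [tables];
  return list.map(function (t) {
    // A bare string is used as the table name.
    const desc = (typeof t === 'string') ? { name: t } : t;
    // The id column defaults to 'id' when none is given.
    return { name: desc.name, id: desc.id || 'id' };
  });
}

const single = normalizeTables('Demos');
const custom = normalizeTables({ name: 'Companies', id: 'Stock' });
const many = normalizeTables(['Publications', 'Venues', 'Authors']);
// many[0] describes the primary table; the rest are lookup tables.
```

Under this reading, 'Demos' and {name: 'Demos', id: 'id'} describe the same table.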
If you are using a Google Sheet as the data source, include the following script in your HTML page:
<script type="text/javascript" src="http://www.google.com/jsapi"></script>
API Parameters:
# gdocId
String - ID of your Google document.
# query
String - A Google Sheets query, as documented for the Google Sheets query language. Example: "select A,B,D".
Access control: Set the sharing settings of your document to grant read access to whoever should be able to see your data on the web page. You can make your spreadsheet public, or share it only with a specific group of people.
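A minimal sketch of a source object that combines gdocId with query. It assumes that query sits next to gdocId in the source object, as the parameter list suggests; the source does not show a complete example, so treat the placement as an assumption. The gdocId and table name are reused from the first example on this page.

```javascript
// Sketch only: `query` is assumed to live alongside `gdocId`.
const source = {
  gdocId: '0Ai6LdDWgaqgNdFlZRk83NmpDLVc2cllCRjhpdkNYOWc',
  tables: 'Demos',
  // Restricts the loaded columns to A, B and D of the sheet.
  query: 'select A,B,D'
};
```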
Each data table must be in a separate file. The file URL for each table is generated using dirPath + table name + "." + fileType.
# dirPath
String - The directory path that stores the table files.
# fileType
String - File extension/type. Currently supported data types are 'csv' (comma-separated file), 'tsv' (tab-separated file), and 'json'.
The first row (line) must contain only column names / headers.
Include the PapaParse JavaScript library in your HTML page. For example:
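The URL rule above (dirPath + table name + "." + fileType) can be illustrated with a short sketch. The helper name tableFileUrl is hypothetical and only mirrors the stated rule; the values are taken from the locally hosted JSON example on this page.

```javascript
// Hypothetical helper mirroring the rule:
// dirPath + table name + "." + fileType
function tableFileUrl(source, tableName) {
  return source.dirPath + tableName + '.' + source.fileType;
}

const source = {
  dirPath: './data/',
  fileType: 'json',
  tables: 'chromefeatures'
};

console.log(tableFileUrl(source, source.tables));
// prints "./data/chromefeatures.json"
```

Because the parts are joined by plain concatenation in this reading of the rule, dirPath should end with a slash, as it does in every example on this page.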
<script type="text/javascript" src="../js/papaparse.min.js" charset="utf-8"></script>
PapaParse is included under the keshif/js/ directory.
If you want to use automated JSON loading, the top level of the source file must be an array of objects, each describing one record in your dataset. For example:
[ {"name": "Joe", "age": 23}, {"name": "Mary", "age": 25}, {"name": "Nick", "age": 27}, (...) ]
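A minimal sketch of a check you could run on a file before pointing keshif at it. It assumes the file is parsed as strict JSON, which requires double quotes around keys and strings. The helper name isRecordArray is hypothetical and not part of keshif.

```javascript
// Hypothetical helper: returns true when the text is valid JSON whose
// top level is an array containing only plain objects (records).
function isRecordArray(jsonText) {
  let parsed;
  try {
    parsed = JSON.parse(jsonText);
  } catch (e) {
    return false; // not valid JSON at all
  }
  return Array.isArray(parsed) && parsed.every(function (record) {
    return record !== null &&
      typeof record === 'object' &&
      !Array.isArray(record);
  });
}

const ok = isRecordArray('[{"name": "Joe", "age": 23}, {"name": "Mary", "age": 25}]');
const wrapped = isRecordArray('{"results": [{"name": "Joe"}]}');
```

Here ok is true, while wrapped is false because the top level is an object rather than an array. A file shaped like the second case would need a custom loading function.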
Loading a file from the cloud only requires setting the dirPath parameter to the URL that the cloud file service makes available.
For Google Drive, ... (more info)...
For Dropbox, you need to copy your Public folder URL. See here for more info.
To define your own function to load and parse your data, use the callback key.
# callback
Function - Callback function for the data source. The first parameter is a reference to the browser object.
You can use ajax to load data files and parse them into keshif tables.
Through this function, you can load custom JSON / XML files. For examples, see the existing demos.
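A minimal sketch of a custom callback for a JSON file whose records are wrapped in an object, a shape that the automated loader would not accept. The file path ./data/wrapped.json, the results key, and the helper extractRecords are all assumptions made for illustration. The step that hands the records to keshif is deliberately left out, because this page does not document those calls; the existing demos show them.

```javascript
// Hypothetical helper: pulls the record array out of a wrapped JSON
// document such as {"results": [ {...}, {...} ]}.
function extractRecords(jsonText, key) {
  const parsed = JSON.parse(jsonText);
  const records = parsed[key];
  if (!Array.isArray(records)) {
    throw new Error('Expected an array under "' + key + '"');
  }
  return records;
}

const source = {
  // The first parameter is the browser object, as described above.
  callback: function (browser) {
    // Assumed file location; fetch is one way to make the ajax request.
    fetch('./data/wrapped.json')
      .then(function (response) { return response.text(); })
      .then(function (text) {
        const records = extractRecords(text, 'results');
        // Registering `records` as a keshif table is omitted here;
        // see the existing demos for the calls keshif expects.
        console.log('Loaded ' + records.length + ' records');
      });
  }
};
```

Keeping the parsing in a separate function makes it possible to test it without a browser or a network request.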
Mehmet Adil Yalcin - HCIL - University of Maryland, College Park