-
Notifications
You must be signed in to change notification settings - Fork 6
Second Tutorial Part 9: JSON and CSV Export
Previous | Main | Next |
---|---|---|
<-- Part 8 | Main | Part 10 --> |
An important feature of Tilda is its built-in support for various output formats. Output maps are the way to automate that process and currently supports 3 forms (CSV, JSON and NVP) and even includes a sync protocol implementation.
As defined in Output maps, the Tilda JSON definition is straightforward: you give a name to your mapping (which could be ""), you specify the columns you want included, and then the types of mappings you want. All of this results in additional APIs and capabilities both in the DATA and FACTORY classes. Let's go through an example and create a new view to capture all the tests taken, with their questions and answers:
- The view joins
Form
,User
,TestAnswer
andTestQuestionAnswer
. - The table
TestQuestionAnswer
really provides meta-data for questions and answers and doesn't have a direct foreign key to, so we have to explicitly declare the join. - The where-clause excludes forms or answers that have been deleted.
- We create two queries to lookup forms by their primary key or look up everything
{ "name": "TestAnswer_View"
,"description": "A pivoted view of SAT_01 forms"
,"columns":[
{ "sameAs": "Form.refnum" , "name":"formRefnum" }
,{ "sameAs": "Form.type" , "name":"formType" }
,{ "sameAs": "Form.fillDate", "name":"formFillDate" }
,{ "sameAs": "User.refnum" , "name":"userRefnum" }
,{ "sameAs": "User.id" , "name":"userId" }
,{ "sameAs": "TestAnswer.timeMillis" }
,{ "sameAs": "TestAnswer.correct" }
,{ "sameAs": "TestQuestionAnswer.questionLabel"}
,{ "sameAs": "TestQuestionAnswer.questionSeq"}
,{ "sameAs": "TestQuestionAnswer.answerLabel"}
]
,"joins": [
{ "object": "TestQuestionAnswer"
, "joinType": "LEFT"
, "on": [ { "db":"*", "clause":"TestQuestionAnswer.\"questionId\" = TestAnswer.\"questionId\" and TestQuestionAnswer.\"answerId\" = TestAnswer.\"answerId\"" }
]
}
]
,"subWhereX":{
"clause":[" Form.deleted is null"
,"and TestAnswer.deleted is null"
]
,"description":["Look at all forms and answers that have not been deleted."
]
}
,"queries": [
{ "name" :"FormByQuestionSeq"
,"description": "blah blah"
,"from" : []
,"wheres": [
{ "db": "*", "clause": "formRefnum=?()" }
]
,"orderBy": ["questionSeq asc" ]
}
,{ "name" :"AllByFilldateDesc"
,"description": "blah blah"
,"from" : []
,"wheres": [
{ "db": "*", "clause": "1=1" }
]
,"orderBy": ["formFillDate desc", "formRefnum" ]
}
]
}
This will generate the Java objects to support that view, including the Data and Factory classes. That view will have a lookupFormByQuestionSeq
method that will take a formRefnum
as its parameter and a lookupAllByFilldateDesc
method with no specific parameter. Such a view would be useful for example for a service allowing a UI to display the details about a test. You can imagine how this view returns one row per question in the test and its associated answer, along with some meta-data about the test itself (the "Form"), and the user.
Now, how would you communicate that data back to a browser for example? You'd need to output the results from that view to JSON. How would you create a dump of that view to be viewed in Excel for example? You'd need to output the results to a CSV file. Let's declare the output maps:
,"outputMaps":[
{ "name": "", "columns": ["*"], "outTypes":["CSV","JSON"] }
]
What are the APIs generated as a result of this definition for the Factory class?
public static String getCSVHeader();
public static void toCSV(Writer out,TestAnswer_View_Data obj)
throws IOException;
public static void toCSV(Writer Out, List<TestAnswer_View_Data> L, boolean includeHeader)
throws IOException;
public static void toJSON(Writer Out, List<TestAnswer_View_Data> L, String lead ,boolean fullList)
throws IOException;
public static void toJSON(Writer Out, TestAnswer_View_Data obj, boolean fullObject)
throws IOException;
In addition, the Data class now implements tilda.interfaces.JSONable
to allow for dynamic behaviors will cover below.
We are not going to go into servlet territory for this example and will simply extend our command line test program with a way to export the data as CSV or JSON. We'll also provide a simple piece of code to generate sample dummy data for this to work. Let's first look at what the main code looks like if we really want to do things the right way:
- We will use an
ObjectProcessor
calledMyOutputProcessor
because we want to be able to process large data sets. Let's say we have over 10 million test answers! ObjectProcessors provide a streaming-based interface to handle one record at a time. - We even create a simple enum called
OutputMode
to select JSON vs. CSV output in a typed compile-time fashion. - We'll hack together something very simple here to output something in JSON, and then in CSV and use the generated APIs. Do note that Tilda's utility library does provide some simple helpers for JSON output and you are welcome to use them, but there are more mature and richer open source libraries out there like GSON (which Tilda uses internally and is packaged as a dependency), or Jackson.
private static void Test4a(Connection C)
throws Exception
{
PrintWriter out = new PrintWriter(System.out);
MyOutputProcessor P = new MyOutputProcessor(out, OutputMode.JSON);
out.println("{ \"data\":[");
TestAnswer_View_Factory.lookupWhereAllByFilldateDesc(C, P, 0, -1);
out.println("]}");
}
private static void Test4b(Connection C)
throws Exception
{
PrintWriter out = new PrintWriter(System.out);
MyOutputProcessor P = new MyOutputProcessor(out, OutputMode.CSV);
out.println(TestAnswer_View_Factory.getCSVHeader());
TestAnswer_View_Factory.lookupWhereAllByFilldateDesc(C, P, 0, -1);
}
The ObjectProcessor
class should then be familiar structurally, and here, we usetoCSV
or toJSON
.
private enum OutputMode { JSON, CSV };
private static class MyOutputProcessor implements ObjectProcessor<TestAnswer_View_Data>
{
public MyOutputProcessor(PrintWriter out, OutputMode mode)
{
_Out = out;
_Mode = mode;
}
protected PrintWriter _Out;
protected OutputMode _Mode;
@Override
public boolean process(int count, TestAnswer_View_Data obj)
throws Exception
{
if (_Mode == OutputMode.JSON)
{
if (count != 0)
_Out.print(" ,");
TestAnswer_View_Factory.toJSON(_Out, obj, true);
_Out.println();
}
else if (_Mode == OutputMode.CSV)
TestAnswer_View_Factory.toCSV(_Out, obj);
else
throw new Exception("Output mode was specified as "+_Mode.name()+" which the code is not handling.");
return true; // or false if failure and need to abort.
}
};
Now, if you are in a rush, or dealing with a simpler data set, you can take a shortcut:
- use the basic version of the lookupWhere method which returns a list, i.e., all objects processed will be help in memory, so this is truly not for general large data set cases, but can suffice in many web application scenarios where for example, a limited list of results are returned in a search (i.e., top 100 results etc... with pagination).
- use the dynamic features of
JSONAble
interface throughJSONUtil
. That allows you to call any toJSON method dynamically by just specifying the name of the output map (e.g., "" here).
private static void Test4c(Connection C)
throws Exception
{
PrintWriter out = new PrintWriter(System.out);
ListResults<TestAnswer_View_Data> L = TestAnswer_View_Factory.lookupWhereAllByFilldateDesc(C, 0, -1);
out.println("{");
// output an array element called "data" which is the first element in the JSON data we are exporting, so no comma.
// invoke dynamically the toJSON method
// provide a header string for indenting
JSONUtil.print(out, "data", "", true, L, " ");
out.println("}");
}
🎈 NOTE: If we had defined an output map called Xyz
and wanted to use it, we'd either call the toJSONXyz
method or invoke JSONUtil.print
with the map name of "Xyz".
,"outputMaps":[
{ "name": "", "columns": ["*"], "outTypes":["CSV","JSON"] }
,{ "name": "Xyz", "columns": ["formRefnum", "fillDate", "id"], "outTypes":["JSON"] }
]
TestAnswer_View_Factory.toJSONXyz(_Out, obj, true);
JSONUtil.print(out, "data", "Xyz", true, L, " ");
🎈 NOTE: This example highlights an important technique. Imagine that you were building a web UI and wanted to have a search producing a list, or maybe an auto-complete widget. You'd want to send back a summary to populate the search UI or the widget, not the full data. Creating alternate versions of your output maps allow you to customize the data load to fit your needs: a summary set of fields just for a search list, vs the full details of a record. Of course, the astute reader will realize that we would be sending back duplicate records in that case because the 3 columns selected are not part of the identity of the record. This is just shown here for illustration purpose and Tilda will not flag if an object identity is not represent in full in the list of columns selected for an output map.
Another important use case comes up when dealing with a mobile app with off-line storage, or a data warehouse project where we needed to send incremental CSV files etc... How can we implement an interface that handles something akin to a sync protocol so that a client may understand which records are new, been updated or deleted etc...? Tilda implements direct support for such use cases by adding the attribute "sync":true
to the output map definition.
,{ "name": "Sync", "columns": ["*"], "outTypes":["JSON","CSV"], "sync":true }
When we Gen though, we'll get an error:
1 - OBJECT 'tilda_tutorial.data.TILDATUTORIAL.TestAnswer_View' is defining a 'sync' Output map 'Sync' but the parent object is not OCC.
In order for Sync to work, your object/view must be OCC (Optimistic Concurrency Control). It is not the purpose of this tutorial to get into those details and you can read more on Wikipedia for example. In short, a common way to implement OCC is to use lifecycle timestamps such as created/lastUpdated/deleted to double check that an object you got once hasn't been touched since. For Tilda, what this means is simply the existence of the created
, lastUpdated
and deleted
timestamp columns that Tilda generates automatically for most objects. For views however, you need to specify that.
For our scenario at the present, let's make things simple and assume that the Form's timestamps are good enough. In practice, that may not be really the case as a form answer may get edited without touching the form's anchor record. In practice though, one would write code such that if a question/answer is added/changed, the timestamp for the form, acting as a container of sorts for all its questions, would get updated too. So let's assume that's what we are doing here and update the view.
,"columns":[
{ "sameAs": "Form.refnum" , "name":"formRefnum" }
,{ "sameAs": "Form.type" , "name":"formType" }
,{ "sameAs": "Form.fillDate", "name":"formFillDate" }
,{ "sameAs": "Form.created" }
,{ "sameAs": "Form.lastUpdated" }
,{ "sameAs": "Form.deleted" }
,{ "sameAs": "User.refnum" , "name":"userRefnum" }
,{ "sameAs": "User.id" , "name":"userId" }
,{ "sameAs": "TestAnswer.timeMillis" }
,{ "sameAs": "TestAnswer.correct" }
,{ "sameAs": "TestQuestionAnswer.questionLabel"}
,{ "sameAs": "TestQuestionAnswer.questionSeq"}
,{ "sameAs": "TestQuestionAnswer.answerLabel"}
]
Now, the view is OCC and the output map definition works as expected. The APIs generated however will be slightly different from the previous cases:
// Methods on the Factory class as expected from the previous sections above
public static void toJSONSync(Writer out, List<TestAnswer_View_Data> L, String lead
, boolean fullList) throws IOException
public static void toJSONSync(Writer Out, TestAnswer_View_Data obj
, boolean fullObject) throws IOException
public static void toCSVSync(Writer out, List<TestAnswer_View_Data> L
, boolean includeHeader) throws IOException
public static void toCSVSync(Writer out, TestAnswer_View_Data obj) throws IOException
// Additional methods on the Factory class with Sync support
public static void toJSONSync(Writer out, List<TestAnswer_View_Data> L, String lead
, boolean fullList, ZonedDateTime lastSync) throws IOException
public static void toJSONSync(Writer Out, TestAnswer_View_Data obj, boolean fullObject
, ZonedDateTime lastSync) throws IOException
public static void toCSVSync(Writer out, List<TestAnswer_View_Data> L, boolean includeHeader
, ZonedDateTime lastSync) throws IOException
public static void toCSVSync(Writer out, TestAnswer_View_Data obj
, ZonedDateTime lastSync) throws IOException
As you can see, the additional methods simply add an extra parameter to capture the date-time value of the last time the data was synched. What the method will then do is add a new attribute __sync
to the output with values as follows:
- If
created
>lastSync
anddeleted
>lastSync
then the object will be marked asG
for "ghost" (i.e., for a client, it's like the object never existed). - If
created
>lastSync
then the object will be marked asN
for "new". - If
deleted
>lastSync
then the object will be marked asD
for "deleted". - If
lastUpdated
>lastSync
then the object will be marked asU
for "updated". - Else, the value is
O
for "old" (i.e., the object hasn't changed)
🎈 NOTE: The output methods do not do any filtering of the data... If you don't need to show "deleted" records, then filter them out in your where-clause. It's the responsibility of the developer to define the filtering logic as part of the initial lookup query, and the sync-enabled toCSV/toJSON to mark each record appropriately and output them.
This is a more advanced feature which is used in scenarios where you need some generic representation of an object, for example, to feed to an analytics engine, or manipulate using meta-driven logic. In this case, it's not somuch about output to some external format, but output to a programmatic representation that is more generic.
What the NVP output map does is return a Map<T1,T2> based representation of the object. There are 2 configurations:
-
"nvpSrc":"COLUMNS"
simply returns a Map<String, T> of all the fields in the object, where the key is the column name, and T is a compatible base type for ALL the columns listed in the output map. Most of the time, that'sSTRING
and is specified for example as"nvpValueType": "STRING"
. -
"nvpSrc":"ROWS"
assumes that the data represents some key/value structure. In that case, you can only specify two columns where the first column is the key, and the second column is the value. The Map<T1,T2> created will match then the types of the first and second columns respectively.
,{ "name":"Abc1", "columns":["*"], "outTypes":["NVP"], "nvpSrc":"COLUMNS", "nvpValueType": "STRING" }
,{ "name":"Abc2", "columns":["questionLabel", "correct"], "outTypes":["NVP"], "nvpSrc":"ROWS" }
The APIs generated will be:
public static Map<String, String> toNVPAbc1(TestAnswer_View_Data D) throws Exception
public static Map<String, Boolean> toNVPAbc2(List<data.TestAnswer_View_Data> L) throws Exception
The first method will return a Map of all the columns (since we used "columns":["*"]
) with values converted to a String. A single object instance is converted as such. Looking at the generated code should make this self-evident:
public static Map<String, String> toNVPAbc1(tilda_tutorial.data.TestAnswer_View_Data D) throws Exception
{
Map<String, String> M = new HashMap<String, String>();
M.put("formRefnum", String.valueOf(D.getFormRefnum()));
M.put("formType", D.getFormType());
M.put("formFillDateTZ", D.getFormFillDateTZ());
M.put("formFillDate", String.valueOf(D.getFormFillDate()));
M.put("created", String.valueOf(D.getCreated()));
M.put("lastUpdated", String.valueOf(D.getLastUpdated()));
M.put("deleted", String.valueOf(D.getDeleted()));
M.put("userRefnum", String.valueOf(D.getUserRefnum()));
M.put("userId", D.getUserId());
M.put("timeMillis", String.valueOf(D.getTimeMillis()));
M.put("correct", String.valueOf(D.getCorrect()));
M.put("questionLabel", D.getQuestionLabel());
M.put("questionSeq", String.valueOf(D.getQuestionSeq()));
M.put("answerLabel", D.getAnswerLabel());
return M;
}
For the second method, it's conceptually a bit more complicated as a certain type of data structure really makes sense here. In our case, for a given Form, the "data" lives across multiple rows: for a given form, each question can have either a correct or incorrect answer. This method takes in a list of records and returns them as a single Map with the key and value as expected, in our case here, the key is the question label, a String, which the value is whether the answer was correct, a Boolean.
public static Map<String, Boolean> toNVPAbc2(List<tilda_tutorial.data.TestAnswer_View_Data> L) throws Exception
{
Map<String, Boolean> M = new HashMap<String, Boolean>();
for (tilda_tutorial.data.TestAnswer_View_Data D : L)
{
Boolean val = M.get(D.getQuestionLabel());
if(val != null)
throw new Exception("The key " + D.getQuestionLabel() + " with value " + String.valueOf(val) + " already exists in the Map. Key values must be unique.");
if(TextUtil.isNullOrEmpty(D.getQuestionLabel()) == false)
M.put(D.getQuestionLabel(), D.getCorrect());
}
return M;
}
You can imagine how the first case could be used to send the object to a generic piece of code that wouldn't be able to use the class name explicitly at compile time, but may know about the names of the columns. You would do interesting things that way by creating a list of multiple objects of multiple types, but with common column names. It's also an easy way to create a web UI for example that would allow some manipulation of objects in a more generic fashion. For the second scenario, you can imagine how it can be used where the information is expressed as rows. That can be configuration data, generic data (for forms, questionnaires etc...), meta-data and so on.
A scenario where we have used these features in combination involved logistic regression algorithms. A table would contain a list of factors and their weights, while another table would contain the actual data where columns are the factors. The algorithm uses the factor/weight meta-data to write a generic algorithm to compute a score for each business data object that contains the factor values.
factor | weight |
---|---|
height | 1.57 |
age | 0.31 |
gender | 0.35 |
id | height | age | gender |
---|---|---|---|
1111 | 73 | 50 | 1 |
2222 | 67 | 25 | 0 |
3333 | 78 | 19 | 1 |
A simplistic algorithm reads the first data into a Map<String,Double>
containing the factors and their weights, while the second data is read as a Map<String,Integer> for each person with the keys height
, age
, gender
with their associated values. Then it's a double loop to go through all factors for all objects and doing some calculations involving the column value and the matching factor's weight.
Previous | Main | Next |
---|---|---|
<-- Part 8 | Main | Part 10 --> |