Re-visiting the architecture of DC #554
While working on the issue, I took the approach of generating fixed_value and timed_value tables at runtime for every datasource. Benefits:
Disadvantages:
Another approach we could take is using a NoSQL database, which would allow the same database structure DC currently has and would also address the issues listed above. Because it is a schema-less approach, we can reduce the number of operations: one row could have 10 columns while another could have 100, and there would be no need to generate tables at runtime. However, it would require rewriting the backend from scratch, though it would be a more robust approach.
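The runtime-table-generation approach above could be sketched roughly as follows. The table names `fixed_value` and `timed_value` come from this issue; everything else (the per-datasource naming scheme, the column layout, SQLite as the target) is a hypothetical assumption for illustration only, not DC's actual implementation.

```python
# Sketch: build one wide table per datasource at runtime, with one
# column per attribute, instead of one (subject, attribute) row per value.
# Table name suffixing and column types are assumptions for illustration.

def build_runtime_ddl(datasource_id, attributes):
    """Return CREATE TABLE statements for one datasource's
    fixed_value and timed_value tables."""
    cols = ", ".join(f'"{a}" TEXT' for a in attributes)
    fixed = (f'CREATE TABLE IF NOT EXISTS "fixed_value_{datasource_id}" '
             f'(subject_id INTEGER PRIMARY KEY, {cols})')
    timed = (f'CREATE TABLE IF NOT EXISTS "timed_value_{datasource_id}" '
             f'(subject_id INTEGER, observed_at TEXT, {cols})')
    return fixed, timed
```

With such tables, saving a record is a single row insert regardless of how many attributes the datasource has, which is the benefit the comment is pointing at; the cost is that the schema must be generated and migrated per datasource.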
Description
Consider an example of a table of size 6×8, with 8 attributes and 6 subjects. In the current implementation we save a combination of every subject with every attribute, which for a table this small performs 48 operations instead of just 6, since the subject is common to all 8 attributes of a single record. This significantly slows down saving records to the database when the dataset is large, e.g. an Excel sheet with 100,000 rows and 50 attributes: to save this dataset, DC runs 5 million operations instead of just 100,000.
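The arithmetic behind those figures can be checked with a one-line sketch: the current scheme writes one operation per (subject, attribute) value, while a row-per-record scheme writes one operation per subject.

```python
def write_ops(rows, attrs, per_value=True):
    """Insert operations needed to persist a rows x attrs table:
    one per (subject, attribute) value in the current scheme,
    or one per row when a record's attributes are saved together."""
    return rows * attrs if per_value else rows

# Figures from the description above:
print(write_ops(6, 8))                      # 48 for the 6x8 example
print(write_ops(6, 8, per_value=False))     # 6
print(write_ops(100_000, 50))               # 5,000,000
print(write_ops(100_000, 50, per_value=False))  # 100,000
```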
Error log
NA