
Build a Corpus Inventory #100

Open
PonteIneptique opened this issue Jun 20, 2017 · 3 comments

Comments

@PonteIneptique
Member

To speed up parsing, it might be interesting to build a generic file that merges all metadata at build time. That would reduce loading time by replacing many small metadata reads with a single file access.

I can also see a point in grouping them by a maximum of X entries (say 1,000?):

```
data
   |-- phi1294
   |-- phi1295
   |-- phi1296
   |-- ...
   |-- phi3300
   |-- __capitains_fastload_0__.xml
   |-- __capitains_fastload_1__.xml
```
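The build step above could be sketched roughly as follows. This is a minimal, hypothetical sketch: the function name `build_fastload`, the `__cts__.xml` glob pattern, and the plain text concatenation are assumptions for illustration; a real implementation would merge parsed XML trees into a valid inventory rather than concatenating file contents.

```python
# Hypothetical sketch: merge per-text metadata files into chunked
# "__capitains_fastload_N__.xml" inventories of at most chunk_size entries.
import glob
import os

def build_fastload(data_dir, pattern="*/__cts__.xml", chunk_size=1000):
    """Group metadata files into chunks and write one merged file per chunk."""
    files = sorted(glob.glob(os.path.join(data_dir, pattern)))
    outputs = []
    for index in range(0, len(files), chunk_size):
        chunk = files[index:index + chunk_size]
        out_path = os.path.join(
            data_dir,
            "__capitains_fastload_{}__.xml".format(index // chunk_size),
        )
        with open(out_path, "w", encoding="utf-8") as out:
            # Naive concatenation under a single root; a real build would
            # merge the parsed XML trees instead of raw text.
            out.write("<inventory>\n")
            for path in chunk:
                with open(path, encoding="utf-8") as src:
                    out.write(src.read())
                    out.write("\n")
            out.write("</inventory>\n")
        outputs.append(out_path)
    return outputs
```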

So, from there, if the Nautilus resolver detects such files, it could default to parsing these instead of using glob. It would be a production trick, let's say...
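The resolver-side detection could look something like this. Again a sketch under assumptions: the helper name `metadata_sources` and both glob patterns are illustrative, not part of the Nautilus API.

```python
# Hypothetical resolver-side check: prefer merged fastload inventories
# when present, otherwise fall back to globbing individual metadata files.
import glob
import os

def metadata_sources(data_dir):
    fastload = sorted(
        glob.glob(os.path.join(data_dir, "__capitains_fastload_*__.xml"))
    )
    if fastload:
        # Production trick: a handful of merged files instead of thousands.
        return fastload
    return sorted(glob.glob(os.path.join(data_dir, "*/__cts__.xml")))
```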

What do you think @balmas @sonofmun

@balmas
Contributor

balmas commented Jun 20, 2017

It seems reasonable, although I wonder if we are starting to go down the path of reinventing a wheel that already exists. Would the addition of an indexing solution, such as Elasticsearch, be another approach?

@PonteIneptique
Member Author

It's not about reinventing the wheel. Even though there are solutions like ES and Solr, it's still more efficient to load all the information from one file rather than from 1,000. :)
I am looking at situations like the Pompei Corpus, where it would be painful to parse 12k metadata files every time the rest is updated...

@balmas
Contributor

balmas commented Jun 20, 2017

OK, just wanted to be sure we weren't overlooking something.
