
the core ontology tree browser is slow #77

Open
bradfordcondon opened this issue Jan 29, 2019 · 6 comments

bradfordcondon commented Jan 29, 2019

This is really a core issue; we'll get there eventually.

I was reviewing a PR that used the old Tripal 2 tripal_cv ontology browser: tripal/tripal_analysis_go#12

It was fast. Really fast.

Our module is slow, but it's slow because the new Tripal 3 core tree browser is slow.

Is there any way we can speed it up?

almasaeed2010 commented Jan 30, 2019

I've been inspecting the code, and here are my observations so far on what could be slowing us down:

  • Use of chado_generate_var('cvterm', $match); for every term in tripal_get_vocabulary_root_terms
  • In tripal_cv_xray_lookup_entities_for_terms_count, we get the counts per term by running the query once per term, when we could use a GROUP BY and run the query only once for all terms
  • After the term counts are obtained as outlined above, we then call tripal_chado_vocab_get_term_children, which again hits the DB twice for every term.

@almasaeed2010

OK, after testing each function individually, it looks like tripal_cv_xray_lookup_entities_for_terms_count takes the longest to return results.

@almasaeed2010

A few trials changing how the query is structured:

Original

This is run once for every accession. In this case, multiply by 3 since there are three root GO terms. Effectively we are looking at 9,379.218 ms for only 3 terms!

SELECT COUNT(TCEL.entity_id)
FROM tripal_cvterm_entity_linker TCEL
INNER JOIN chado_bio_data_7 CB ON CB.entity_id = TCEL.entity_id
INNER JOIN chado.feature CF ON CF.feature_id = CB.record_id
WHERE CF.organism_id = 46 AND TCEL.database = 'GO' AND accession = '0003674';
 count 
-------
 15156
(1 row)

Time: 3126.406 ms
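
As an aside (not something tried in this thread), one way to see where the time goes inside a query like this is PostgreSQL's EXPLAIN ANALYZE, which executes the query and annotates each plan node with actual timings:

```sql
-- Runs the query for real and reports actual row counts and time per plan
-- node; useful for spotting missing indexes on the join/filter columns.
EXPLAIN ANALYZE
SELECT COUNT(TCEL.entity_id)
FROM tripal_cvterm_entity_linker TCEL
INNER JOIN chado_bio_data_7 CB ON CB.entity_id = TCEL.entity_id
INNER JOIN chado.feature CF ON CF.feature_id = CB.record_id
WHERE CF.organism_id = 46 AND TCEL.database = 'GO' AND accession = '0003674';
```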

The fastest

This is run once for all given accessions so the time shown below is final.

SELECT TCEL.database, TCEL.accession, COUNT(TCEL.entity_id)
FROM tripal_cvterm_entity_linker TCEL
INNER JOIN chado_bio_data_7 CB ON CB.entity_id = TCEL.entity_id
INNER JOIN chado.feature CF ON CF.feature_id = CB.record_id
WHERE CF.organism_id = 46 AND TCEL.database = 'GO'
      AND accession IN ('0008150', '0003674', '0005575')
GROUP BY TCEL.database, TCEL.accession;
 database | accession | count 
----------+-----------+-------
 GO       | 0003674   | 15156
 GO       | 0008150   | 13532
 GO       | 0005575   |  4337
(3 rows)

Time: 2969.810 ms

This means we manage to be about 3 times faster if we switch to eager loading.

The Drawback

In-memory usage increases heavily with this approach, so we need to be careful about how many accessions we process at any given time. That said, we are already on the safe side since we never process over 25 accessions at a time.

@almasaeed2010

I am going to implement this change and see how that affects our dev site.

@almasaeed2010

Dev Stats

Stats are obtained for Fraxinus excelsior

Pre-eager loading

16.46 s


Post-eager loading

12.97 s


Well, that's disappointing 😞 Not nearly enough speedup, but it's some progress.

almasaeed2010 commented Jan 30, 2019

OK, I think the answer lies in an mview (materialized view) that acts as a cache of counts for each entity:

The mview should look something like this

 entity_id | database | accession | count 
-----------+----------+-----------+-------
        46 | GO       | 0003674   | 15156
        46 | GO       | 0008150   | 13532
        46 | GO       | 0005575   |  4337

If we do implement this mview, we need to populate it at the end of every indexing job, which should be easy enough to do programmatically.
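
A rough sketch of what that mview could look like in PostgreSQL DDL (the view name is hypothetical; the table names come from the queries earlier in the thread, and chado_bio_data_7 is a single bundle table, so a real implementation would need to cover every bundle; the 46 in the example rows above matches organism_id = 46 in the earlier queries, so this sketch groups by organism):

```sql
-- Sketch only: tripal_cv_xray_term_counts is an assumed name, and a real
-- implementation would need to union in every chado_bio_data_N bundle table.
CREATE MATERIALIZED VIEW tripal_cv_xray_term_counts AS
SELECT CF.organism_id,
       TCEL.database,
       TCEL.accession,
       COUNT(TCEL.entity_id) AS count
FROM tripal_cvterm_entity_linker TCEL
INNER JOIN chado_bio_data_7 CB ON CB.entity_id = TCEL.entity_id
INNER JOIN chado.feature CF ON CF.feature_id = CB.record_id
GROUP BY CF.organism_id, TCEL.database, TCEL.accession;

-- Repopulate at the end of every indexing job:
REFRESH MATERIALIZED VIEW tripal_cv_xray_term_counts;
```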
