Use T5 as a highlighter #32

rodrigonogueira4 · 2020-04-13T17:43:27Z

Now that we are using Hugging Face's T5 reranker, we can try to replace BioBERT's highlighter with T5's context vectors. We would then run inference on only one model, which would decrease our latency and free up one GPU.

Note: we will need to evaluate this T5-based highlighter on BioASQ to see if it is actually better than BioBERT.
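To make the idea of highlighting from context vectors concrete, here is a minimal sketch. The thread does not specify the scoring rule, so this assumes one plausible option: score each document token by its maximum cosine similarity to any query token vector and highlight tokens above a threshold. The function names, the threshold, and the toy 2-d vectors are all hypothetical; in practice the vectors would come from the model's encoder output.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def highlight_scores(query_vecs: Sequence[Sequence[float]],
                     doc_vecs: Sequence[Sequence[float]]) -> List[float]:
    """Assumed scoring rule: best match of each document token against the query.

    Expects at least one query vector.
    """
    return [max(cosine(d, q) for q in query_vecs) for d in doc_vecs]


def highlight(tokens: Sequence[str], scores: Sequence[float],
              threshold: float = 0.8) -> List[str]:
    """Keep the tokens whose score reaches the (hypothetical) threshold."""
    return [t for t, s in zip(tokens, scores) if s >= threshold]


# Toy stand-ins for context vectors (not real model outputs).
query_vecs = [[1.0, 0.0]]
doc_tokens = ["masks", "license", "help"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

scores = highlight_scores(query_vecs, doc_vecs)
highlighted = highlight(doc_tokens, scores)
```

With a single shared model, the same forward pass that produces the reranking score could also supply these vectors, which is where the latency saving would come from.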

daemon · 2020-04-14T15:23:14Z

Writing down some notes:

Eyeballing the output, the T5-based highlighter as implemented seems to yield worse results (e.g., more seemingly random highlights of completely unrelated material, such as extraneous CC-BY attribution text).
We tried a dynamic query representation as well as a fixed query representation, i.e., f'Query: {query} Document: {document} Relevant:' vs. '{query}' and '{document}' encoded separately.
If we want to take advantage of reranker caching, the T5-based highlighter is limited to a maximum sequence length of 256, whereas BioBERT can use 512.
The T5-based highlighter is 0-25% faster with reranker caching and 10-20% slower without.
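For reference, the two input formats from the notes above could be built roughly like this. This is only a sketch: the helper names are hypothetical, and it assumes "dynamic" means the query is encoded together with the document in the reranker's prompt, while "fixed" means the query and document are encoded as separate sequences.

```python
from typing import Tuple


def dynamic_input(query: str, document: str) -> str:
    """One joint sequence in the prompt format quoted in the notes.

    The query's representation then depends on the document it is paired with.
    """
    return f'Query: {query} Document: {document} Relevant:'


def fixed_inputs(query: str, document: str) -> Tuple[str, str]:
    """Two separate sequences, so the query representation is the same
    regardless of the document (assumed reading of "fixed")."""
    return query, document


joint = dynamic_input("mask effectiveness", "Masks reduce transmission.")
separate = fixed_inputs("mask effectiveness", "Masks reduce transmission.")
```

Only the joint format matches what the reranker already encodes, which is presumably why reusing its cached output ties the highlighter to the reranker's sequence length.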

rodrigonogueira4 · 2020-04-14T15:37:32Z

I think the main problem is that we are using 256 tokens for the reranker. Could you please try increasing it to 512 tokens? There might be only a small increase in latency, because we were underutilizing the GPU when feeding it 256 tokens.

Also, since we will then have a spare GPU, we can use it to cut the inference time in half (but we can leave that for another PR).

daemon · 2020-04-14T15:41:39Z

Sure, but the results won't be the same as the TensorFlow implementation. Is that okay?

daemon · 2020-04-14T15:43:14Z

I guess I can evaluate it on R04.

rodrigonogueira4 · 2020-04-14T16:01:12Z

Yeah, evaluating on R04 is an even better idea.

santhoshkolloju · 2020-06-13T15:47:32Z

Can you shed some light on what you mean by highlighting?
Can I get a link to this BioBERT highlighter?
