fvh adaptor to highlight the top boost phrase only #799
The <em>margarita pizza</em> and the <em>marinara pizza</em> in this pizzeria are yummy and inexpensive.
delicious: 4
margarita pizza: 3
marinara pizza: 3
yummy: 2
...a/com/yelp/nrtsearch/server/luceneserver/highlights/TopBoostOnlyFragmentsBuilderAdaptor.java
Can you please update the highlighting doc, and also send the PR to the main branch first before backporting to v0? This would ensure that the proto field numbers are correct.
buffer, index, values, s, fragInfo.getEndOffset(), modifiedStartOffset);
int srcIndex = 0;
double topBoostValue =
    fragInfo.getSubInfos().stream().map(SubInfo::getBoost).max(Float::compare).orElse(0f);
Please use a for loop instead of a stream to avoid extra object creation. This code is called per document, so it can run thousands of times per second and we don't want it to be slow.
You can also store the subInfos with the highest boost along with the topBoostValue; then you won't need to iterate over all of the subInfos again.
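The suggestion above could look roughly like the following single-pass sketch. It is only an illustration, not the code in this PR: `Phrase` is a hypothetical stand-in for Lucene's `SubInfo` that keeps just a text and a boost, and `topBoosted` is a made-up helper name.

```java
import java.util.ArrayList;
import java.util.List;

final class TopBoostSelection {
  // Hypothetical stand-in for Lucene's SubInfo; only the fields needed here.
  static final class Phrase {
    final String text;
    final float boost;

    Phrase(String text, float boost) {
      this.text = text;
      this.boost = boost;
    }
  }

  // Single pass with a plain loop: track the highest boost seen so far
  // and collect the phrases that carry it, so no second iteration is needed.
  static List<Phrase> topBoosted(List<Phrase> phrases) {
    List<Phrase> top = new ArrayList<>();
    float topBoost = Float.NEGATIVE_INFINITY;
    for (int i = 0; i < phrases.size(); i++) {
      Phrase p = phrases.get(i);
      if (p.boost > topBoost) {
        // A strictly higher boost invalidates everything collected so far.
        topBoost = p.boost;
        top.clear();
        top.add(p);
      } else if (p.boost == topBoost) {
        top.add(p);
      }
    }
    return top;
  }
}
```

With the phrases found in the example fragment (margarita pizza: 3, marinara pizza: 3, yummy: 2), this keeps the two pizza phrases and drops "yummy".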
This is called once for each fragment to be created. We have to calculate the topBoostValue for each fragment, as not all fragments contain all the desired phrases.
if (subInfo.getBoost() < topBoostValue) {
  continue;
}
Does the modifiedStartOffset still track the correct offset even if some of the subInfos are skipped?
Yes.
This method uses a pointer to concatenate the highlighted phrases with the text between them. Skipping some of the phrases still produces a complete fragment.
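To illustrate why skipping phrases does not break the fragment, here is a minimal, self-contained sketch of the pointer-based concatenation described above. It is an assumption-laden simplification, not the adaptor's actual code: `Match` and `highlight` are hypothetical names, matches are assumed to be sorted by start offset and non-overlapping, and offsets are relative to the fragment text.

```java
import java.util.List;

final class TopBoostHighlightSketch {
  // Hypothetical stand-in for a matched phrase inside one fragment.
  static final class Match {
    final int start;
    final int end;
    final float boost;

    Match(int start, int end, float boost) {
      this.start = start;
      this.end = end;
      this.boost = boost;
    }
  }

  // Wraps only the matches with the highest boost in pre/post tags.
  // Assumes matches are sorted by start offset and do not overlap.
  static String highlight(String text, List<Match> matches, String pre, String post) {
    float topBoost = Float.NEGATIVE_INFINITY;
    for (Match m : matches) {
      if (m.boost > topBoost) {
        topBoost = m.boost;
      }
    }
    StringBuilder buf = new StringBuilder();
    int srcIndex = 0; // pointer to the first character not yet copied
    for (Match m : matches) {
      if (m.boost < topBoost) {
        continue; // skipped text is copied later as plain text
      }
      buf.append(text, srcIndex, m.start); // plain text before the match
      buf.append(pre).append(text, m.start, m.end).append(post);
      srcIndex = m.end;
    }
    buf.append(text, srcIndex, text.length()); // remaining tail
    return buf.toString();
  }
}
```

Because the pointer only advances when a match is emitted, any skipped phrase simply falls into the next plain-text copy, so the output always contains the full fragment text.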
RP-12281
Add a top_boost_only toggle to highlight only the highest-boosted phrase found. This is achieved by adding an adaptor to the FragmentBuilder. The test case is self-explanatory: only the occurrences of the best match are highlighted, including multiple occurrences.