grandexchange fuzzy: handle mid-term matches better
Fixes a partial regression introduced by runelite#16973.

Searches that did not share a prefix with the item name, such as "splinters" for "Sunfire splinters", were not returning the expected results: the heavy bias toward shared prefixes penalized the query for not matching "Sunfire". This is unexpected; the fuzzy search scorer should also handle queries that match somewhere other than the start of the item name.
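To make the penalty concrete, here is a minimal sketch of the prefix term on its own, copied from the lines this commit removes in the diff below. The class name `PrefixBiasSketch` and the `main` method are illustrative only, and the sketch assumes a non-empty query.

```java
final class PrefixBiasSketch
{
	// Mirrors the prefix term removed by this commit; assumes a non-empty query.
	static double prefixScore(String query, String itemName)
	{
		query = query.toLowerCase().replace('-', ' ');
		itemName = itemName.toLowerCase().replace('-', ' ');

		// count how many leading characters the two strings share
		int prefixLen = 0;
		int maxLen = Math.min(query.length(), itemName.length());
		while (prefixLen < maxLen && query.charAt(prefixLen) == itemName.charAt(prefixLen))
		{
			prefixLen++;
		}
		return ((double) prefixLen) / query.length() - 0.25;
	}

	public static void main(String[] args)
	{
		// Only the leading 's' is shared, so the term is 1/9 - 0.25, which is negative.
		System.out.println(prefixScore("splinters", "Sunfire splinters"));
		// The whole query is a prefix of the item name, so the term is 7/7 - 0.25 = 0.75.
		System.out.println(prefixScore("sunfire", "Sunfire splinters"));
	}
}
```

The query "splinters" matches a whole word of the item name, yet the prefix term scores it far below "sunfire" simply because the match does not start at the first character.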

This PR replaces the prefix bias with a longest-common-subsequence (LCS) bias to better handle those mid-string terms. It takes the maximum LCS length over all pairs of a search word and an item word, normalized by the search word's length, and still defers to Jaro-Winkler to break ties between close scores.
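The word-pair LCS bias can be sketched without any dependencies as follows. This is not the committed code: the commit calls commons-text's `LongestCommonSubsequence`, whereas the hypothetical `lcsLength` helper below is a plain dynamic-programming stand-in, and skipping empty words is an addition of this sketch to avoid dividing by zero. Note that an LCS is a subsequence, so the matched characters need not be contiguous.

```java
final class LcsBiasSketch
{
	// Classic dynamic-programming LCS length (hypothetical stand-in for the commons-text class).
	static int lcsLength(String a, String b)
	{
		int[][] dp = new int[a.length() + 1][b.length() + 1];
		for (int i = 1; i <= a.length(); i++)
		{
			for (int j = 1; j <= b.length(); j++)
			{
				dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
					? dp[i - 1][j - 1] + 1
					: Math.max(dp[i - 1][j], dp[i][j - 1]);
			}
		}
		return dp[a.length()][b.length()];
	}

	// Best LCS ratio over every (query word, item word) pair, within [0,1].
	static double lcsScore(String query, String itemName)
	{
		String[] queryWords = query.toLowerCase().replace('-', ' ').split(" ");
		String[] itemWords = itemName.toLowerCase().replace('-', ' ').split(" ");
		double best = 0.0;
		for (String queryWord : queryWords)
		{
			if (queryWord.isEmpty())
			{
				continue; // sketch-only guard against 0/0
			}
			for (String itemWord : itemWords)
			{
				best = Math.max(best, ((double) lcsLength(queryWord, itemWord)) / queryWord.length());
			}
		}
		return best;
	}

	public static void main(String[] args)
	{
		// "splinters" matches the second item word in full, so the ratio is 9/9.
		System.out.println(lcsScore("splinters", "Sunfire splinters")); // prints 1.0
	}
}
```

Because the score is the maximum over word pairs, a query that matches any single word of the item name in full reaches 1.0 regardless of where that word sits in the name.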

This keeps some of the preferred behaviours of the original change (runelite#16973), such as ranking short-named items like "Pot" and "Egg" appropriately: they still appear as the first search result when searching for their names.
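One way to see why an exact match on a short name still ranks first is to evaluate the final expression from the diff below with both terms at their best. The sketch assumes that `baseAlgorithm.apply` returns a similarity in (0, 1] where 1.0 means identical strings; the source does not show how `baseAlgorithm` is constructed, so treat that as an assumption. The class and method names are illustrative.

```java
final class CombinedScoreSketch
{
	// similarity: assumed Jaro-Winkler-style value in (0, 1], 1.0 meaning identical strings.
	static double proximityScore(double similarity)
	{
		// log10(10 * s) equals 1 + log10(s), so s = 1.0 maps to 0.5 and s = 0.1 maps to -0.5
		return Math.log10(10 * similarity) - 0.5;
	}

	// Same combination as the committed return statement: lcsScore + proximityScore - 1.0.
	static double finalScore(double lcsScore, double similarity)
	{
		return lcsScore + proximityScore(similarity) - 1.0;
	}

	public static void main(String[] args)
	{
		// Exact match such as "pot" against "Pot": LCS ratio 1.0 and, by assumption, similarity 1.0.
		System.out.println(finalScore(1.0, 1.0)); // prints 0.5, the highest possible score
		// A weaker candidate: full LCS ratio but only moderate overall similarity.
		System.out.println(finalScore(1.0, 0.5));
	}
}
```

Under that assumption the proximity term stays within [-0.5, 0.5] only while the similarity is at least 0.1. Smaller values fall below -0.5, and a similarity of exactly 0 yields negative infinity, since `Math.log10(0)` is `-Infinity` in Java.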
LlemonDuck authored and Adam- committed Aug 6, 2024
1 parent 7ac9d7b commit d7cbfc9
Showing 1 changed file with 16 additions and 9 deletions.
@@ -29,6 +29,7 @@
 import javax.inject.Singleton;
 import net.runelite.api.ItemComposition;
 import org.apache.commons.text.similarity.JaroWinklerDistance;
+import org.apache.commons.text.similarity.LongestCommonSubsequence;
 import org.apache.commons.text.similarity.SimilarityScore;
 
 @Singleton
@@ -44,18 +45,24 @@ public Double score(String query, String itemName)
 		query = query.toLowerCase().replace('-', ' ');
 		itemName = itemName.toLowerCase().replace('-', ' ');
 
-		// we raise the score for items that share a prefix with the query
-		int prefixLen = 0;
-		int maxLen = Math.min(query.length(), itemName.length());
-		while (prefixLen < maxLen && query.charAt(prefixLen) == itemName.charAt(prefixLen))
+		// we raise the score for longest substring of a word, scoring within [0,1]
+		String[] queryWords = query.split(" ");
+		String[] itemWords = itemName.split(" ");
+		double lcsScore = 0.0;
+		for (String queryWord : queryWords)
 		{
-			prefixLen++;
+			for (String itemWord : itemWords)
+			{
+				int lcsLen = new LongestCommonSubsequence().longestCommonSubsequence(queryWord, itemWord).length();
+				lcsScore = Math.max(lcsScore, ((double) lcsLen) / queryWord.length());
+			}
 		}
-		double prefixScore = ((double) prefixLen) / query.length() - 0.25;
 
-		// and also raise the score for string "closeness"
-		double proximityScore = baseAlgorithm.apply(query, itemName) - 0.25;
-		return prefixScore + proximityScore;
+		// and also raise the score for string "closeness", but strongly prefer high closeness, scoring within [-0.5,0.5]
+		double proximityScore = Math.log10(10 * baseAlgorithm.apply(query, itemName)) - 0.5;
+
+		// subtract 1.0 to filter out low-scoring results
+		return lcsScore + proximityScore - 1.0;
 	}
 
 	public ToDoubleFunction<ItemComposition> comparator(String query)