grandexchange fuzzy: handle mid-term matches better

Partial regression introduced by runelite#16973. Searches that do not share a prefix with the item name, such as searching "splinters" for "Sunfire splinters", were not returning the expected results, because the heavy bias toward shared prefixes penalized the query for not matching "Sunfire". This is unexpected: the fuzzy search scorer should handle searches that do not begin at the start of the item name just as well.

This PR replaces the prefix bias with a longest-common-subsequence (LCS) bias to better handle these mid-string terms. It takes the maximum LCS between any pair of query word and item word, normalized by the query word's length, and still defers to Jaro-Winkler to break ties between close scores. This keeps some of the preferred behaviours of the original change (runelite#16973), such as ranking short-named items like "Pot" and "Egg" appropriately: they still appear as the first search result when searching their names.
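A minimal sketch of the change described above, written in plain Java so it runs without commons-text. The class and method names (`LcsBiasSketch`, `lcsLength`, `prefixRatio`, `lcsScore`, `combine`) are hypothetical, not RuneLite API. A textbook dynamic-programming table stands in for commons-text's `LongestCommonSubsequence`, and the Jaro-Winkler similarity is passed in rather than reimplemented. Unlike the diff, `lcsScore` skips empty words, which `split(" ")` produces for leading or repeated spaces, to avoid dividing by a zero length; that guard is an assumption of this sketch.

```java
import java.util.Locale;

// Hypothetical sketch, not RuneLite code: compares the removed prefix bias
// with the word-level LCS bias. A plain DP table stands in for commons-text's
// LongestCommonSubsequence, and Jaro-Winkler is supplied as an input.
class LcsBiasSketch
{
	// Length of the longest common subsequence of a and b (classic O(n*m) DP).
	static int lcsLength(String a, String b)
	{
		int[][] dp = new int[a.length() + 1][b.length() + 1];
		for (int i = 1; i <= a.length(); i++)
		{
			for (int j = 1; j <= b.length(); j++)
			{
				dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
					? dp[i - 1][j - 1] + 1
					: Math.max(dp[i - 1][j], dp[i][j - 1]);
			}
		}
		return dp[a.length()][b.length()];
	}

	// Same normalization as score(): lower case, hyphens become spaces.
	static String normalize(String s)
	{
		return s.toLowerCase(Locale.ROOT).replace('-', ' ');
	}

	// Removed bias: shared-prefix length as a fraction of the query length
	// (the old code then subtracted 0.25).
	static double prefixRatio(String query, String itemName)
	{
		query = normalize(query);
		itemName = normalize(itemName);
		int len = 0;
		int max = Math.min(query.length(), itemName.length());
		while (len < max && query.charAt(len) == itemName.charAt(len))
		{
			len++;
		}
		return (double) len / query.length();
	}

	// New bias: best LCS over all (query word, item word) pairs, divided by
	// the query word's length, so the result lies in [0, 1].
	static double lcsScore(String query, String itemName)
	{
		String[] itemWords = normalize(itemName).split(" ");
		double best = 0.0;
		for (String queryWord : normalize(query).split(" "))
		{
			if (queryWord.isEmpty())
			{
				continue; // assumption: skip empty words from repeated spaces
			}
			for (String itemWord : itemWords)
			{
				best = Math.max(best, (double) lcsLength(queryWord, itemWord) / queryWord.length());
			}
		}
		return best;
	}

	// Final arithmetic of the new score(), given a Jaro-Winkler similarity
	// jw in (0, 1] computed elsewhere.
	static double combine(double lcsScore, double jw)
	{
		double proximityScore = Math.log10(10 * jw) - 0.5;
		return lcsScore + proximityScore - 1.0;
	}

	public static void main(String[] args)
	{
		System.out.println(prefixRatio("splinters", "Sunfire splinters"));
		System.out.println(lcsScore("splinters", "Sunfire splinters"));
	}
}
```

Running `main` prints about 0.111 for the old prefix ratio, because only the leading "s" of "sunfire" matches, and 1.0 for the LCS score. Since LCS measures a subsequence rather than a contiguous substring, an abbreviated query such as "spltrs" also reaches 1.0 against "splinters".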
hex-agon · Aug 6, 2024 · commit d7cbfc9 · parent 7ac9d7b
1 changed file with 16 additions and 9 deletions.
diff --git a/...ite-client/src/main/java/net/runelite/client/plugins/grandexchange/FuzzySearchScorer.java b/...ite-client/src/main/java/net/runelite/client/plugins/grandexchange/FuzzySearchScorer.java
@@ -29,6 +29,7 @@
 import javax.inject.Singleton;
 import net.runelite.api.ItemComposition;
 import org.apache.commons.text.similarity.JaroWinklerDistance;
+import org.apache.commons.text.similarity.LongestCommonSubsequence;
 import org.apache.commons.text.similarity.SimilarityScore;
 
 @Singleton
@@ -44,18 +45,24 @@ public Double score(String query, String itemName)
  query = query.toLowerCase().replace('-', ' ');
  itemName = itemName.toLowerCase().replace('-', ' ');
 
- // we raise the score for items that share a prefix with the query
- int prefixLen = 0;
- int maxLen = Math.min(query.length(), itemName.length());
- while (prefixLen < maxLen && query.charAt(prefixLen) == itemName.charAt(prefixLen))
+ // we raise the score for the longest common subsequence between any query word and item word, scoring within [0,1]
+ String[] queryWords = query.split(" ");
+ String[] itemWords = itemName.split(" ");
+ double lcsScore = 0.0;
+ for (String queryWord : queryWords)
  {
- prefixLen++;
+ for (String itemWord : itemWords)
+ {
+ int lcsLen = new LongestCommonSubsequence().longestCommonSubsequence(queryWord, itemWord).length();
+ lcsScore = Math.max(lcsScore, ((double) lcsLen) / queryWord.length());
+ }
  }
- double prefixScore = ((double) prefixLen) / query.length() - 0.25;
 
- // and also raise the score for string "closeness"
- double proximityScore = baseAlgorithm.apply(query, itemName) - 0.25;
- return prefixScore + proximityScore;
+ // and also raise the score for string "closeness", but strongly prefer high closeness, scoring within [-0.5,0.5] for closeness >= 0.1
+ double proximityScore = Math.log10(10 * baseAlgorithm.apply(query, itemName)) - 0.5;
+
+ // subtract 1.0 to filter out low-scoring results
+ return lcsScore + proximityScore - 1.0;
  }
 
  public ToDoubleFunction<ItemComposition> comparator(String query)