An out-of-the-box, corpus-agnostic query expansion tool for lexical retrieval systems.

jdlflr/sense_aware_query_expansion

Table of Contents

  1. About SAQE (Sense-Aware Query Expansion)
  2. Getting Started
  3. Code Examples

About SAQE (Sense-Aware Query Expansion)

saqe is an out-of-the-box, corpus-agnostic query expansion tool for lexical retrieval systems. It uses WordNet as its knowledge base. For word-sense disambiguation, it computes the semantic similarity between query embeddings and WordNet term embeddings. The embeddings are produced by a user-supplied language model (SimCSE by default). Finally, it uses NLTK, spaCy, and TextBlob to optimize query tokenization, splitting the query into the smallest number of non-overlapping terms that are meaningful in WordNet.
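
The disambiguation idea described above can be illustrated with a minimal sketch. This is not saqe's implementation: the `embed` function below is a toy bag-of-words stand-in for a sentence encoder such as SimCSE, and the sense names and glosses are hand-written, hypothetical examples rather than WordNet data. The sketch only shows the general pattern of picking the sense whose description is most similar to the query.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a real sentence encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_sense(query, candidate_senses):
    """Return the name of the sense whose gloss is most similar to the query."""
    query_vec = embed(query)
    return max(
        candidate_senses,
        key=lambda name: cosine(query_vec, embed(candidate_senses[name])),
    )


# Hypothetical, hand-written glosses for two senses of "bank" (assumption).
senses = {
    "bank.financial": "a financial institution that accepts deposits and lends money",
    "bank.river": "sloping land beside a body of water such as a river",
}

print(pick_sense("deposit money at the bank", senses))
```

With a real encoder, `embed` would return dense vectors and the same argmax over cosine similarity would apply.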

Getting Started

From the repository root, in a Python 3.8 virtual environment, install the saqe package and download its required artifacts.

python setup.py install
python -m textblob.download_corpora
python -m spacy download en_core_web_lg

Code Examples

Use Case #1: Expand original query with synonyms

import json

from saqe import SAQE


original_query = "the foreign policy of the United States"
query_expander = SAQE()

print('ORIGINAL QUERY:')
print(original_query)

expansion_terms = query_expander.expand(original_query)
print('ORIGINAL QUERY EXPANDED WITH SYNONYMS')
print(f"{original_query} {expansion_terms['as_a_string']}")
print('SYNONYMS ORGANIZED BY QUERY TERMS')
print(json.dumps(expansion_terms['by_term'], indent=4))

Output

ORIGINAL QUERY:
the foreign policy of the United States
ORIGINAL QUERY EXPANDED WITH SYNONYMS
the foreign policy of the United States U.S. U.S. government US Government United States government
SYNONYMS ORGANIZED BY QUERY TERMS
{
    "United States": {
        "synonyms": [
            "U.S.",
            "U.S. government",
            "US Government",
            "United States government"
        ]
    }
}
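
The output above is keyed by the multi-word term "United States" rather than by single words, which reflects the tokenization goal stated in the About section (the smallest number of non-overlapping, WordNet-meaningful terms). The sketch below shows one way such a segmentation could be computed with dynamic programming. It is an illustration under stated assumptions, not saqe's code: the `segment` function, its `max_len` parameter, and the tiny `vocabulary` set are hypothetical, and a real system would check candidate spans against WordNet instead.

```python
def segment(tokens, vocabulary, max_len=4):
    """Split tokens into the fewest non-overlapping spans.

    Multi-word spans are allowed only if they appear in `vocabulary`;
    single tokens are always allowed so every query can be segmented.
    """
    n = len(tokens)
    # best[i] holds (span count, spans) for the best cover of tokens[:i].
    best = [(0, [])] + [None] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            span = " ".join(tokens[start:end])
            if end - start > 1 and span.lower() not in vocabulary:
                continue
            candidate = (best[start][0] + 1, best[start][1] + [span])
            if best[end] is None or candidate[0] < best[end][0]:
                best[end] = candidate
    return best[n][1]


# Hypothetical stand-in for a WordNet lookup (assumption).
vocabulary = {"foreign policy", "united states", "policy", "foreign"}

terms = segment("the foreign policy of the United States".split(), vocabulary)
print(terms)
```

Because longer spans are tried first and ties keep the earlier candidate, "foreign policy" is preferred over "foreign" followed by "policy".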

Use Case #2: Expand original query with synonyms and hyponyms

import json

from saqe import SAQE


original_query = "the foreign policy of the United States"
query_expander = SAQE(enable_hyponyms=True)

print('ORIGINAL QUERY:')
print(original_query)

expansion_terms = query_expander.expand(original_query)
print('ORIGINAL QUERY EXPANDED WITH SYNONYMS AND HYPONYMS')
print(f"{original_query} {expansion_terms['as_a_string']}")
print('SYNONYMS AND HYPONYMS ORGANIZED BY QUERY TERMS')
print(json.dumps(expansion_terms['by_term'], indent=4))

Output

ORIGINAL QUERY:
the foreign policy of the United States
ORIGINAL QUERY EXPANDED WITH SYNONYMS AND HYPONYMS
the foreign policy of the United States U.S. U.S. government US Government United States government brinkmanship imperialism intervention isolationism monroe doctrine neutralism nonaggression nonintervention regionalism trade policy truman doctrine
SYNONYMS AND HYPONYMS ORGANIZED BY QUERY TERMS
{
    "United States": {
        "synonyms": [
            "U.S.",
            "U.S. government",
            "US Government",
            "United States government"
        ]
    },
    "foreign policy": {
        "hyponyms": [
            "brinkmanship",
            "imperialism",
            "intervention",
            "isolationism",
            "monroe doctrine",
            "neutralism",
            "nonaggression",
            "nonintervention",
            "regionalism",
            "trade policy",
            "truman doctrine"
        ]
    }
}

Use Case #3: Expand original query with synonyms, hyponyms, and noun phrases from query term definitions

import json

from saqe import SAQE


original_query = "the foreign policy of the United States"
query_expander = SAQE(enable_hyponyms=True, enable_noun_phrases_from_definition=True)

print('ORIGINAL QUERY:')
print(original_query)

expansion_terms = query_expander.expand(original_query)
print('ORIGINAL QUERY EXPANDED WITH SYNONYMS, HYPONYMS, AND NOUN PHRASES FROM TERM DEFINITIONS')
print(f"{original_query} {expansion_terms['as_a_string']}")
print('SYNONYMS, HYPONYMS, AND NOUN PHRASES FROM TERM DEFINITIONS ORGANIZED BY QUERY TERMS')
print(json.dumps(expansion_terms['by_term'], indent=4))

Output

ORIGINAL QUERY:
the foreign policy of the United States
ORIGINAL QUERY EXPANDED WITH SYNONYMS, HYPONYMS, AND NOUN PHRASES FROM TERM DEFINITIONS
the foreign policy of the United States U.S. U.S. government US Government United States United States government branches brinkmanship executive federal government imperialism international relations intervention isolationism judicial branches monroe doctrine neutralism nonaggression nonintervention policy regionalism trade policy truman doctrine
SYNONYMS, HYPONYMS, AND NOUN PHRASES FROM TERM DEFINITIONS ORGANIZED BY QUERY TERMS
{
    "United States": {
        "synonyms": [
            "U.S.",
            "U.S. government",
            "US Government",
            "United States government"
        ],
        "noun_phrases_from_definition": [
            "United States",
            "branches",
            "executive",
            "federal government",
            "judicial branches"
        ]
    },
    "foreign policy": {
        "noun_phrases_from_definition": [
            "international relations",
            "policy"
        ],
        "hyponyms": [
            "brinkmanship",
            "imperialism",
            "intervention",
            "isolationism",
            "monroe doctrine",
            "neutralism",
            "nonaggression",
            "nonintervention",
            "regionalism",
            "trade policy",
            "truman doctrine"
        ]
    }
}
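
The `by_term` structure shown in the outputs above can be post-processed before the terms are sent to a retrieval system. The following sketch is not part of saqe; `collect_expansions` and `quote_phrases` are hypothetical helpers that assume only the dictionary layout printed above (query term, then expansion kind, then a list of phrases). It keeps selected expansion kinds, drops case-insensitive duplicates, and quotes multi-word phrases, which many lexical engines treat as phrase queries.

```python
def collect_expansions(by_term, kinds=("synonyms", "hyponyms")):
    """Flatten selected expansion kinds into one de-duplicated, ordered list."""
    seen = set()
    ordered = []
    for groups in by_term.values():
        for kind in kinds:
            for phrase in groups.get(kind, []):
                key = phrase.lower()
                if key not in seen:
                    seen.add(key)
                    ordered.append(phrase)
    return ordered


def quote_phrases(phrases):
    """Wrap multi-word phrases in double quotes; leave single words as-is."""
    return " ".join(f'"{p}"' if " " in p else p for p in phrases)


# Shortened copy of the structure printed above, used here as sample input.
by_term = {
    "United States": {"synonyms": ["U.S.", "U.S. government", "US Government"]},
    "foreign policy": {"hyponyms": ["isolationism", "trade policy"]},
}

phrases = collect_expansions(by_term)
print(quote_phrases(phrases))
```

Passing `kinds=("synonyms",)` restricts the result to synonyms, which may be useful when hyponyms broaden the query too much for a given corpus.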
