Skip to content

mark-antal-csizmadia/finding-similar-items-textually-similar-documents

Repository files navigation

finding-similar-items-textually-similar-documents

Setup

Make a conda virtual environment from the environment.yml file as discussed here. Make the virtual environment available in Jupyter Notebooks as discussed here. Start Jupyter Notebooks and select the environment. Run the main.ipynb notebook.

Reproducibility

The Python PYTHONHASHSEED environment variable is fixed so that the built-in Python hash() function yields consistent results. Pass the seed variable to the MinHashing class constructor as minhashing uses the numpy.random.randint() function.

Results