Scalable search engine with multiple versions and basic mistype correction.
- Searching through strings.
- Ignoring low-coverage results.
- Sorting results by accuracy.
- Support for dictionaries.
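As a rough illustration of how these features fit together, here is a minimal Python sketch. It is not Fuzzle's own algorithm, which has its own documentation. It swaps in a coverage score built on the standard library's `difflib.SequenceMatcher`: the share of query characters found in matching blocks of a candidate. The names `coverage`, `search` and `min_coverage` are hypothetical, as is the 0.6 threshold.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical input shapes: a plain list of strings or a dict of {key: tags}.
Entries = Union[Sequence[str], Dict[str, List[str]]]


def coverage(query: str, text: str) -> float:
    """Share of the query's characters that appear in matching blocks of text.

    Matching is case-insensitive. Unmatched characters only lower the score
    instead of rejecting the entry, so small mistypes are tolerated.
    """
    q, t = query.lower(), text.lower()
    if not q:
        return 0.0
    matcher = SequenceMatcher(None, q, t, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(q)


def search(query: str, entries: Entries,
           min_coverage: float = 0.6) -> List[Tuple[str, float]]:
    """Return (key, score) pairs at or above min_coverage, best first."""
    if isinstance(entries, dict):
        # Dictionary support: a key is scored by its best field (name or any tag).
        candidates = {key: [key, *tags] for key, tags in entries.items()}
    else:
        candidates = {key: [key] for key in entries}
    results = []
    for key, fields in candidates.items():
        score = max(coverage(query, field) for field in fields)
        if score >= min_coverage:  # ignore low-coverage results
            results.append((key, score))
    # Sort by accuracy; the sort is stable, so ties keep their input order.
    return sorted(results, key=lambda item: item[1], reverse=True)
```

For example, `search("matirx", ["The Matrix", "Titanic"])` keeps "The Matrix" with a score of 5/6 despite the swapped letters, while "Titanic" falls below the threshold and is dropped.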
Concise documentation of the algorithm can be found here.
Currently Fuzzle exists in three different languages:
To test the search engine, data was gathered from multiple sources to ensure that features such as tags, mistype correction and coverage work as expected. All of the data was restructured into JSON lists or dictionaries for cross-platform compatibility, and it is expanded as new data is added to the sources.
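Because the datasets come either as JSON lists or as JSON dictionaries, a loader can normalize both into a single `{key: tags}` mapping. The following Python sketch assumes that dictionary values are tag lists (or a single tag stored as a bare string); the function name `load_dataset` is hypothetical and not part of Fuzzle.

```python
import json
from typing import Dict, List


def load_dataset(text: str) -> Dict[str, List[str]]:
    """Normalize a JSON list or dictionary into a {key: tags} mapping."""
    data = json.loads(text)
    if isinstance(data, list):
        # A plain list: every entry is a key without tags.
        # Duplicate entries collapse into a single key.
        return {str(item): [] for item in data}
    if isinstance(data, dict):
        normalized: Dict[str, List[str]] = {}
        for key, value in data.items():
            if value is None:
                tags: List[str] = []
            elif isinstance(value, str):
                tags = [value]  # a single tag stored as a bare string
            else:
                tags = [str(tag) for tag in value]
            normalized[str(key)] = tags
        return normalized
    raise ValueError("expected a JSON array or object at the top level")
```

Treating lists as dictionaries with empty tag lists lets the same search code handle both shapes.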
A list of 28 795 movies released from 1990 to the present day, sourced from a JSON file found here. The file was restructured so that the movies' names became keys and their cast became tags, which allows you to search for a movie not only by name but also by actor.
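The restructuring step can be sketched in Python as follows. The sketch assumes each source record has a `title` string and a `cast` list. These field names, the function `movies_to_tags` and the choice to merge records that share a title are assumptions, not a description of the actual conversion script.

```python
from typing import Dict, Iterable, List


def movies_to_tags(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Turn movie records into {title: cast} for a key/tags search index."""
    # Assumed record shape: {"title": str, "cast": [str, ...], ...}
    index: Dict[str, Dict[str, None]] = {}
    for record in records:
        title = record.get("title")
        if not title:
            continue  # skip records without a usable name
        # A dict acts as an insertion-ordered set, so records sharing a
        # title merge their casts without duplicate actors.
        cast = index.setdefault(title, {})
        for actor in record.get("cast", []):
            cast[actor] = None
    return {title: list(cast) for title, cast in index.items()}
```

Searching by actor then amounts to matching the query against each title's tags as well as the title itself.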
The set of games was scraped from the Steam API. It currently contains around 27 thousand games, which were loaded into the games.json file with each game's name as the key and the following as tags:
- Categories: The Steam API has (so far) returned 27 unique categories, including captions available, multi-player, online multi-player, includes source SDK, includes level editor, in-app purchases, shared/split screen, full controller support, MMO, online co-op, cross-platform multiplayer, partial controller support, steam achievements, local co-op, steam leaderboards, stats, commentary available, steam turn notifications, steam workshop, steam cloud, single-player, steam trading cards, co-op and local multi-player.
- Genres: The data has so far contained 30 unique genres, including action, utilities, gore, strategy, animation & modeling, photo editing, education, sports, simulation, web publishing, documentary, sexual content, software training, tutorial, indie, rpg, massively multiplayer, design & illustration, game development, video production, nudity, audio production, casual, free to play, racing, adventure, violent, early access and accounting.
- Developer(s) and publisher(s).
- Platform(s): Steam currently stores these values as booleans; the three available options are windows, linux and macos.
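One way to flatten a scraped game into this key/tags form is sketched below in Python. It assumes the input resembles the Steam store's app-details data: `categories` and `genres` as lists of objects with a `description`, `developers` and `publishers` as string lists, and `platforms` as booleans keyed `windows`, `mac` and `linux`. These field names are assumptions and should be checked against the archived scraper. The helpers `game_tags` and `build_games` are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumed platform keys in the scraped data, mapped to the tag names used above.
PLATFORM_TAGS = {"windows": "windows", "linux": "linux", "mac": "macos"}


def game_tags(details: dict) -> List[str]:
    """Collect categories, genres, developers, publishers and platforms as tags."""
    tags: List[str] = []
    for group in ("categories", "genres"):
        for entry in details.get(group, []):
            # Lowercase so "Steam Achievements" becomes "steam achievements".
            tags.append(entry["description"].lower())
    tags.extend(details.get("developers", []))
    tags.extend(details.get("publishers", []))
    for key, supported in details.get("platforms", {}).items():
        if supported and key in PLATFORM_TAGS:  # platforms are stored as booleans
            tags.append(PLATFORM_TAGS[key])
    # Drop duplicates (developer and publisher are often identical), keeping order.
    return list(dict.fromkeys(tags))


def build_games(apps: Iterable[dict]) -> Dict[str, List[str]]:
    """Build the {game name: tags} mapping, skipping entries without a name."""
    return {app["name"]: game_tags(app) for app in apps if app.get("name")}
```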
Since this dataset is quite large and may prove useful in your own projects, the current state of the data, as well as the scraper and its dependencies, has been archived in a ZIP file that you can download and use freely!
A list of 5002 companies with their respective industry, state and city as tags. This allows searches such as "california" or "food" to return brands that do not contain the keyword in their name but are based in that state or city, or are active in that industry.
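The effect described above can be sketched in Python with plain case-insensitive substring matching. A fuzzy scorer could replace the `in` checks. The record fields `name`, `industry`, `state` and `city`, the helpers `companies_to_tags` and `find`, and the sample companies in the usage note are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record fields that become tags.
TAG_FIELDS = ("industry", "state", "city")


def companies_to_tags(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Build {company name: [industry, state, city]} from flat records."""
    return {
        rec["name"]: [rec[field] for field in TAG_FIELDS if rec.get(field)]
        for rec in records
        if rec.get("name")
    }


def find(query: str, index: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (name, matched_on) pairs whose name or one of whose tags contains query."""
    needle = query.lower()
    hits: List[Tuple[str, str]] = []
    for name, tags in index.items():
        if needle in name.lower():
            hits.append((name, "name"))
            continue
        # Fall back to the tags: the keyword need not appear in the name.
        tag = next((t for t in tags if needle in t.lower()), None)
        if tag is not None:
            hits.append((name, tag))
    return hits
```

With a fictional record such as `{"name": "Acme Snacks", "industry": "Food & Beverage", "state": "California", "city": "San Diego"}`, both `find("california", ...)` and `find("food", ...)` return "Acme Snacks", together with the tag that matched.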
A list of countries (presumably containing duplicate values) with most of their major cities added as tags, so that a country can be found by searching for one of its cities.
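Attaching cities to countries is a join on the country code. The following Python sketch also collapses duplicate countries and repeated cities. It assumes country records with `name` and `code` fields and city records with `name` and `country` (the code) fields. These field names and the function `countries_with_cities` are hypothetical.

```python
from typing import Dict, Iterable, List


def countries_with_cities(countries: Iterable[dict],
                          cities: Iterable[dict]) -> Dict[str, List[str]]:
    """Attach city names as tags to countries, joined on a country code."""
    code_to_name: Dict[str, str] = {}
    tags: Dict[str, Dict[str, None]] = {}
    for country in countries:
        # Duplicate country entries collapse into a single key here.
        tags.setdefault(country["name"], {})
        code_to_name[country["code"].upper()] = country["name"]
    for city in cities:
        name = code_to_name.get(city["country"].upper())
        if name is None:
            continue  # skip cities whose code matches no known country
        # A dict used as an ordered set keeps each city only once.
        tags[name][city["name"]] = None
    return {country: list(city_set) for country, city_set in tags.items()}
```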
- SteamAPI: The source of the list of games.
- Awesome JSON Datasets: Source of the lists of movies and countries (without the cities).
- Cities of the World: A list of cities with their corresponding country codes, which was migrated into countries.json.
- Inc 5000: A ranking of the 5000 fastest-growing privately held companies in America.
The logo for Fuzzle was created by @lydocia, who has her own GitHub profile as well as a website.
- Removing irrelevant results.
- Supporting tags.
- Mistype correction.
- Prioritizing fields.
- Support for custom objects.
- Returning 100% matches first, followed by the remaining results.
- Models for different search types.
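Two of the items above, prioritizing fields and returning 100% matches first, can be expressed as a single sort key. The Python sketch below shows one possible design and is not taken from Fuzzle. Perfect matches come first, then the remaining hits are ordered by field priority (key before tags) and then by score. The `Hit` type and the `FIELD_PRIORITY` weights are hypothetical. Multiplying scores by per-field weights would be an alternative design.

```python
from typing import List, NamedTuple

# Hypothetical field priorities: a lower number ranks higher.
FIELD_PRIORITY = {"key": 0, "tags": 1}


class Hit(NamedTuple):
    key: str      # the matched entry
    field: str    # which field produced the match: "key" or "tags"
    score: float  # accuracy in [0, 1]


def rank(hits: List[Hit]) -> List[Hit]:
    """Order hits: 100% matches first, then by field priority, then by score."""
    unknown = len(FIELD_PRIORITY)  # unlisted fields rank after all known ones
    return sorted(
        hits,
        key=lambda h: (h.score < 1.0, FIELD_PRIORITY.get(h.field, unknown), -h.score),
    )
```

Here a perfect tag match ranks above a 90% key match, while among imperfect hits a key match always outranks a tag match.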