
theashwin/smart-recall


Smart Notes Recall using Browser Extension

Installation

If you need to change drives in Anaconda Prompt, type the drive letter followed by a colon. For example, to move to the D:\ drive, type the following, then cd into the downloaded/cloned path.

D:

Django Backend

You'll need Anaconda / Miniconda installed on the system to install the Python dependencies that this project needs. If you do not have Anaconda installed, you can install it from the link below.

Install Anaconda, if not already installed

Create a new environment using Anaconda Prompt to avoid conflicts with packages in the current environment:

conda create -n django python=3.10
conda activate django

Change directory to /downloaded_path/src/recall. This is where we can run the Django backend.

Install all the requirements for the project using the following command:

pip install -U -r requirements.txt

Ideally, this command installs all of the requirements. If a package fails to install, please install it manually.

Now, run the Django backend using the following command:

python manage.py runserver

Keep this server running. We'll now install the browser extension to use the application intuitively.
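Before moving on, it can help to confirm that the server is reachable. The snippet below is a minimal sketch, not part of the project: the function name `backend_is_up` is hypothetical, and the URL assumes the Django development server's default address, `http://127.0.0.1:8000/`. Any HTTP response, including an error status such as 404, is treated as proof that the server is running.

```python
import urllib.error
import urllib.request


def backend_is_up(url="http://127.0.0.1:8000/", timeout=3.0):
    """Return True if an HTTP server answers at `url`, False otherwise.

    Assumption: the backend listens on Django's default development
    address. Adjust `url` if runserver was started on another port.
    """
    # Bypass any system proxy settings, since the server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: nothing is listening.
        return False
```

Calling `backend_is_up()` from a second Anaconda Prompt should return `True` while `python manage.py runserver` is active, under the assumptions above.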

Firefox Extension

If you don't have Mozilla Firefox installed, please install it from Mozilla Firefox Download.

  • Go to the following URL:
about:debugging#/runtime/this-firefox
  • Click on Load Temporary Add-On
  • Select manifest.json at /downloaded_path/src/firefox/manifest.json
  • You'll see a triangle sign beside the menu icon in the top right corner.
  • You've successfully installed the Firefox extension.

Usage

  • Make sure the Django backend is up and running and the Firefox extension is installed.
  • Triple-click on any paragraph. This automatically selects the whole paragraph and sends a search request to the backend.

The first call to the backend downloads the BERT Base model, which is at least half a gigabyte, and installs it. This happens only once, as part of setting up the environment, so in practice the user will not experience it again. Every subsequent call is almost instantaneous, depending on the processing power of the computer.

If the application seems stuck after selecting the text, make sure you are triple-clicking.
Check the Anaconda Prompt for any processing that may still be in progress.
Check that the Django backend is running.
  • The backend returns the most relevant notes if any exist; otherwise it returns an empty list.
    • Similarity is determined by the threshold, which is saved in config.json.
    • The lower the threshold, the lower the relevance of the returned notes.
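The effect of the threshold can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: it assumes each candidate note comes with a similarity score where higher means more relevant, and that notes scoring at or above the threshold are returned. The function name `filter_by_threshold`, the note names, and the scores are all hypothetical.

```python
def filter_by_threshold(scored_notes, threshold=0.5):
    """Keep notes whose score is at least `threshold`, best match first.

    `scored_notes` is a list of (note_name, similarity_score) pairs.
    Assumption: a higher score means a more relevant note.
    """
    kept = [(name, score) for name, score in scored_notes if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores, for illustration only.
scored = [("k_means.md", 0.64), ("naive_bayes.md", 0.82), ("django.md", 0.31)]

print(filter_by_threshold(scored, threshold=0.5))  # two notes pass
print(filter_by_threshold(scored, threshold=0.7))  # only the best note passes
print(filter_by_threshold(scored, threshold=0.9))  # nothing passes: empty list
```

A lower threshold lets weaker matches through, which is why the returned notes become less relevant as the threshold drops.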

Debugging

If you change the hyperparameters in the config.json file, you need to delete the hidden .faiss folder created in the root directory of the notes location. In the default case it is at src/recall/markdown/.faiss

After this, the model will retrain with the new hyperparameters.
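Deleting the folder can also be scripted. The helper below is a small sketch: the function name `clear_faiss_cache` is hypothetical, and the default path is the one given above, relative to the repository root.

```python
import shutil
from pathlib import Path


def clear_faiss_cache(notes_root="src/recall/markdown"):
    """Delete the hidden .faiss folder under `notes_root`.

    Returns True if a folder was removed, False if none existed.
    """
    cache_dir = Path(notes_root) / ".faiss"
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False
```

Run `clear_faiss_cache()` from the repository root after editing config.json, then trigger a new search so the folder is recreated.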

Examples

Triple click on these paragraphs to test out the extension. 

Make sure that the Django backend is running and the extension is installed 
on the browser as instructed in the Installation section.

In statistics, naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong independence assumptions between the features. They are among the simplest Bayesian network models, but coupled with kernel density estimation, they can achieve high accuracy levels.

k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster.

Please note that the results posted below are for a threshold value of 0.7. By default, the application ships with a threshold of 0.5.

Screenshots

Screenshots for the above examples:

Example #1: Top Three Results

Example #2: Top Four Results

Proposal

Intelligent Notes Recall using Browser Extension. As time passes, the percentage of information retained from any notes or resource decreases exponentially, so frequent recall and revision are essential to combat the forgetting curve. The project uses three main topics taught during the class: Language Models, Relevant Search and Relevance Feedback.

Idea

The idea is to have an intelligent browser extension that reads the current browsing page dynamically and suggests the most related notes from your note-taking application databases. For instance, suppose you're reading an article on "Transformer Neural Networks" then the browser extension will dynamically read the webpage and show the most relevant notes to improve recall from the already saved notes.

Motivation

I personally take a lot of notes, but taking notes is not enough. We should be able to re-read the notes once in a while to improve memorization or understanding. Without this, the value of taking notes decreases drastically.

This is the main problem that students face when notes are not reviewed frequently and in an intuitive manner: the information retained decreases as time elapses.

Fig.1 - The Forgetting Curve

To combat this phenomenon, the main solution is to keep reviewing the notes frequently so that the percentage of information retained is maximized.

Fig.2 - Combating Forgetting Curve using Intelligent Recall

This extension will help with information retention as well as intuitive note-taking.

Languages

  • Python for Django Backend
  • JavaScript for Firefox Extension
  • Markdown for Notes

Techniques / Tools

  • Facebook AI Similarity Search (FAISS)
    • For finding embedding vectors that lie near the query vector.
    • It is scalable and efficient and can return related vectors in milliseconds even with billions of dense vectors.
  • Large Multilingual Language Model
    • I used BERT Base to create embedding vectors from the notes as well as the query vector from the selected text that we get from the browser.
  • Testing - Demonstrating the application works as expected.
    • Testing was done manually and with the help of users who used the application daily for 3 days, finding loopholes and adjusting the algorithms to fine-tune relevance and effectiveness.

References

Team Members

Ashwin Patil anpatil2@illinois.edu SOLO PROJECT

Time Report [38 Hours]

Firefox Extension [14 Hours]

  • 2 Hours Learning extension development
  • 7 Hours Developing extension backend
  • 3 Hours Extension frontend design
  • 2 Hours Connecting extension with Django backend using REST APIs

Django Backend [18 Hours]

  • 2 Hours Developing microservice architecture for backend
  • 2 Hours Exposing REST APIs for extension use, serialization to transfer data.
  • 2 Hours Creating embeddings from Notes using BERT Base
  • 4 Hours Facebook AI Similarity Search (FAISS) for efficient similarity search.
    • 2 Hours Learning FAISS
    • 2 Hours Implementing FAISS and exploring hyperparameters.
  • 1 Hour Converting Markdown to HTML for displaying on frontend

Miscellaneous [6 Hours]

  • 4 Hours Documentation
  • 2 Hours Presentation
