Project Proposal
Project Name: Screen Reader for the Visually Impaired
Project Description:
The project is a screen reader that aids visually impaired users in
navigating websites. The aid takes the form of text-to-speech: whenever the
user hovers over a section of the webpage and presses X, that section is
read aloud.
Competitive Analysis:
Existing screen readers include those built into PC operating systems,
such as VoiceOver on Macs and ChromeVox on Chromebooks. These screen
readers help the visually impaired navigate the whole computer, such as
accessing files and using various applications. My project will consist of
a similar screen reader on a much smaller scale, focused on reading
webpages.
Another type of screen reader acts as an extension to a web browser, such
as Chrome's Screen Reader. In my project, I will be using some features
that I have found helpful in this extension. The most prominent is the
overall usability of the screen reader: in the Google extension, when the
user clicks certain parts of a webpage, that specific section is read back
to them. Implementing this feature will be my main goal for this project.
Structural Plan:
The code will feature different functions that together achieve the goal
of the program (a sketch of this structure follows the list):
1. The main function will handle how the program interacts with the
   user; it will therefore be responsible for how the user starts and
   ends the program. It will also call the different helper functions,
   and the GUI for the user will live here.
2. A helper function will split the webpage into different sections by
   coordinates, distinguishing the different paragraphs on the webpage
   and recording the coordinates of each paragraph. If scrolling is
   detected, this function will run again. It will produce a dictionary
   mapping coordinates to strings.
3. A helper function will detect whether mouse clicks fall within any of
   those coordinates and perform text-to-speech on the strings associated
   with them.
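The following is a minimal sketch of this structure, assuming the gtts,
playsound, and pynput modules from the module list below; locate_paragraphs
is a hypothetical stand-in for helper function 2, sketched more fully under
the Algorithmic Plan:

    from gtts import gTTS
    from playsound import playsound
    from pynput import mouse

    sections = {}  # helper 2's output: (left, top, right, bottom) -> text

    def speak(text):
        # helper 3: text-to-speech on the matched string
        gTTS(text=text, lang="en").save("section.mp3")
        playsound("section.mp3")

    def on_click(x, y, button, pressed):
        # helper 3: check whether the click falls inside a section's box
        if pressed:
            for (left, top, right, bottom), text in sections.items():
                if left <= x <= right and top <= y <= bottom:
                    speak(text)
                    break

    def main():
        # the main function: build the sections, then listen for mouse
        # clicks until the listener is stopped, which ends the program
        global sections
        sections = locate_paragraphs()  # hypothetical helper 2
        with mouse.Listener(on_click=on_click) as listener:
            listener.join()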
Algorithmic Plan:
The trickiest part of the project is to "identify different sections of
the webpage and the coordinates of each section". I will use Tesseract
OCR's text localization abilities to detect the different sections on the
webpage. The following are the steps that I will follow (sketched in code
after this list):
1. take a screenshot of the page
2. run OCR on the screenshot
3. find new paragraphs based on spacing in the recognized text
4. find the locations of each paragraph's first word and of the word
   farthest to the left
5. extract the string for each paragraph and associate it with a
   coordinate through a dictionary
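The following is a minimal sketch of these steps, assuming the pyautogui
and pytesseract modules and an installed Tesseract binary; grouping words
by pytesseract's block and paragraph numbers stands in for the space-based
paragraph detection of step 3:

    import pyautogui
    import pytesseract
    from pytesseract import Output

    def locate_paragraphs():
        # step 1: take a screenshot of the page
        screenshot = pyautogui.screenshot()
        # step 2: run OCR, asking for per-word bounding boxes
        data = pytesseract.image_to_data(screenshot, output_type=Output.DICT)
        # steps 3-4: group words into paragraphs and grow each one's box
        boxes = {}
        for i, word in enumerate(data["text"]):
            if not word.strip():
                continue
            key = (data["block_num"][i], data["par_num"][i])
            left, top = data["left"][i], data["top"][i]
            right, bottom = left + data["width"][i], top + data["height"][i]
            if key not in boxes:
                boxes[key] = [left, top, right, bottom, []]
            box = boxes[key]
            box[0], box[1] = min(box[0], left), min(box[1], top)
            box[2], box[3] = max(box[2], right), max(box[3], bottom)
            box[4].append(word)
        # step 5: map each paragraph's coordinates to its string
        return {tuple(b[:4]): " ".join(b[4]) for b in boxes.values()}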
Timeline Plan:
The following is my timeline plan for the project:
1. Thursday, November 18th - Complete "identify different sections of
the webpage and the coordinates of each section"
2. Saturday, November 20th - Complete OCR and mouse click/TTS on sections
3. Monday, November 22nd - Complete user interaction and finalize project
for MVP
4. Wednesday, November 24th - Brainstorm and add extra features
5. Friday, November 26th - Finalize project for final submission
Version Control Plan:
To back up my code I will use GitHub, updating my private repository
whenever I make new changes to my project files.
https://drive.google.com/file/d/1IhIhBU5jQv6dX8Z_ahm99EvWOvxu5BNt/view?usp=sharing
Module List:
The following are the modules I will be using for my project:
gtts
playsound
pyautogui
pynput
openCV
pytesseract
os
TP2 Update:
The following are the updates I have made to my code for TP2:
1. The screenshot is converted to black and white to increase OCR accuracy
   (see the sketch after this list)
2. The smaller sections detected by the OCR are manually merged into
   bigger sections to increase accessibility
3. The OCR detects changes to the page, such as scrolling or following a
   hyperlink
4. The user interface has been updated to be sound-only. Through it, the
   accent of the text-to-speech voice can be changed and an Arabic version
   of the OCR can be accessed.
5. An Arabic version of the program has been created, in which the
   instructions are in Arabic and the OCR reads Arabic text
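The following is a minimal sketch of update 1, assuming the openCV (cv2)
module from the module list along with numpy, which it depends on:

    import cv2
    import numpy as np

    def to_black_and_white(pil_screenshot):
        # convert the PIL screenshot into an openCV image
        img = cv2.cvtColor(np.array(pil_screenshot), cv2.COLOR_RGB2BGR)
        # grayscale, then Otsu thresholding to pure black and white
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, bw = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return bw  # pass this to the OCR instead of the raw screenshot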
TP3 Update:
The following are the updates I have made to the code for TP3:
1. Rather than clicking, the user now hovers over the section they want
   read and presses X (see the sketch after this list)
2. The program checks whether a hyperlink has been activated by reading
   the URL of the page
3. The program attempts to check whether the user is hovering over a
   hyperlink (this feature has a low accuracy rate)
4. The program performs image detection, locating images on the webpage
   and reading what is in them
5. A new feature lets the user translate sections of the webpage into
   their desired language
6. The program can now better detect scrolling
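The following is a minimal sketch of update 1, assuming the pynput module;
section_at and speak are hypothetical helpers along the lines of the
Structural Plan sketch:

    from pynput import keyboard, mouse

    def on_press(key):
        # read the hovered section aloud when the user presses X
        if getattr(key, "char", None) == "x":
            x, y = mouse.Controller().position
            text = section_at(x, y)  # hypothetical lookup in helper 2's dict
            if text:
                speak(text)  # hypothetical TTS helper
        elif key == keyboard.Key.esc:
            return False  # stop listening and end the program

    with keyboard.Listener(on_press=on_press) as listener:
        listener.join()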
More modules used:
1. Image AI
2. Google Translation API