Project Proposal
Project Name: Screen Reader for the Visually Impaired
Project Description:
The project is a screen reader that aids visually impaired users in
navigating websites. The aid takes the form of text-to-speech: whenever the
user hovers over a section of the webpage and presses X, that section is
read aloud.
Competitive Analysis:
Existing screen readers include those built into PC operating systems,
such as VoiceOver on Macs and ChromeVox on Chromebooks. These screen
readers help the visually impaired navigate the whole computer, such as
accessing files and using various applications. My project will consist of
a similar screen reader on a much smaller scale, focused on reading
webpages.
Another type of screen reader acts as an extension to a web browser, such
as Chrome's Screen Reader. In my project, I will be using some features
that I have found helpful in this extension. The most prominent is the
overall usability of the screen reader: in the Google extension, when the
user clicks certain parts of a webpage, that specific section is read back
to them. Implementing this feature will be my main goal for this project.
Structural Plan:
The code will feature different functions that together achieve the goal
of the program (a sketch of this structure follows the list):
1. The main function will handle how the program interacts with the
   user; it will therefore be responsible for how the user starts and
   ends the program. It will also call the different helper functions,
   and the GUI for the user will live here.
2. A helper function will split the webpage into different sections by
   coordinates, distinguishing the different paragraphs on the webpage
   and recording the coordinates of each paragraph. If scrolling is
   detected, this function will run again. It will produce a dictionary
   mapping coordinates to strings.
3. A helper function will detect whether mouse clicks fall within any of
   those coordinates and perform text-to-speech on the strings associated
   with them.
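The following is a minimal sketch of this structure, assuming the gtts,
playsound, and pynput modules from the module list below; locate_paragraphs
is a hypothetical stand-in for helper function 2, sketched more fully under
the Algorithmic Plan:

    from gtts import gTTS
    from playsound import playsound
    from pynput import mouse

    sections = {}  # helper 2's output: (left, top, right, bottom) -> text

    def speak(text):
        # helper 3: text-to-speech on the matched string
        gTTS(text=text, lang="en").save("section.mp3")
        playsound("section.mp3")

    def on_click(x, y, button, pressed):
        # helper 3: check whether the click falls inside a section's box
        if pressed:
            for (left, top, right, bottom), text in sections.items():
                if left <= x <= right and top <= y <= bottom:
                    speak(text)
                    break

    def main():
        # the main function: build the sections, then listen for mouse
        # clicks until the listener is stopped, which ends the program
        global sections
        sections = locate_paragraphs()  # hypothetical helper 2
        with mouse.Listener(on_click=on_click) as listener:
            listener.join()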
Algorithmic Plan:
The trickiest part of the project is to "identify different sections of
the webpage and the coordinates of each section". I will use Tesseract
OCR's text localization abilities to detect the different sections on the
webpage. The following are the steps that I will follow (sketched in code
after this list):
1. take a screenshot of the page
2. run OCR on the screenshot
3. find new paragraphs based on spacing in the recognized text
4. find the locations of each paragraph's first word and of the word
   farthest to the left
5. extract the string for each paragraph and associate it with a
   coordinate through a dictionary
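The following is a minimal sketch of these steps, assuming the pyautogui
and pytesseract modules and an installed Tesseract binary; grouping words
by pytesseract's block and paragraph numbers stands in for the space-based
paragraph detection of step 3:

    import pyautogui
    import pytesseract
    from pytesseract import Output

    def locate_paragraphs():
        # step 1: take a screenshot of the page
        screenshot = pyautogui.screenshot()
        # step 2: run OCR, asking for per-word bounding boxes
        data = pytesseract.image_to_data(screenshot, output_type=Output.DICT)
        # steps 3-4: group words into paragraphs and grow each one's box
        boxes = {}
        for i, word in enumerate(data["text"]):
            if not word.strip():
                continue
            key = (data["block_num"][i], data["par_num"][i])
            left, top = data["left"][i], data["top"][i]
            right, bottom = left + data["width"][i], top + data["height"][i]
            if key not in boxes:
                boxes[key] = [left, top, right, bottom, []]
            box = boxes[key]
            box[0], box[1] = min(box[0], left), min(box[1], top)
            box[2], box[3] = max(box[2], right), max(box[3], bottom)
            box[4].append(word)
        # step 5: map each paragraph's coordinates to its string
        return {tuple(b[:4]): " ".join(b[4]) for b in boxes.values()}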
Timeline Plan:
The following is my timeline plan for the project:
1. Thursday, November 18th - Complete "identify different sections of
the webpage and the coordinates of each section"
2. Saturday, November 20th - Complete OCR and mouse click/TTS on sections
3. Monday, November 22nd - Complete user interaction and finalize project
for MVP
4. Wednesday, November 24th - Brainstorm and add extra features
5. Friday, November 26th - Finalize project for final submission
Version Control Plan:
To back up my code I will use GitHub, updating my private repository
whenever I make new changes to my project files.
https://drive.google.com/file/d/1IhIhBU5jQv6dX8Z_ahm99EvWOvxu5BNt/view?usp=sharing
Module List:
The following are the modules I will be using for my project:
gtts
playsound
pyautogui
pynput
openCV
pytesseract
os
TP2 Update:
The following are the updates I have made to my code for TP2:
1. The screenshot is converted to black and white to increase OCR accuracy
   (see the sketch after this list)
2. The smaller sections detected by the OCR are manually merged into
   bigger sections to increase accessibility
3. The OCR detects changes to the page, such as scrolling or following a
   hyperlink
4. The user interface has been updated to be sound-only. Through it, the
   accent of the text-to-speech voice can be changed and an Arabic version
   of the OCR can be accessed.
5. An Arabic version of the program has been created, in which the
   instructions are in Arabic and the OCR reads Arabic text
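The following is a minimal sketch of update 1, assuming the openCV (cv2)
module from the module list along with numpy, which it depends on:

    import cv2
    import numpy as np

    def to_black_and_white(pil_screenshot):
        # convert the PIL screenshot into an openCV image
        img = cv2.cvtColor(np.array(pil_screenshot), cv2.COLOR_RGB2BGR)
        # grayscale, then Otsu thresholding to pure black and white
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, bw = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return bw  # pass this to the OCR instead of the raw screenshot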
TP3 Update:
The following are the updates I have made to the code for TP3:
1. Rather than clicking, the user now hovers over the section they want
   read and presses X (see the sketch after this list)
2. The program checks whether a hyperlink has been activated by reading
   the URL of the page
3. The program attempts to check whether the user is hovering over a
   hyperlink (this feature has a low accuracy rate)
4. The program performs image detection, locating images on the webpage
   and reading what is in them
5. A new feature lets the user translate sections of the webpage into
   their desired language
6. The program can now better detect scrolling
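The following is a minimal sketch of update 1, assuming the pynput module;
section_at and speak are hypothetical helpers along the lines of the
Structural Plan sketch:

    from pynput import keyboard, mouse

    def on_press(key):
        # read the hovered section aloud when the user presses X
        if getattr(key, "char", None) == "x":
            x, y = mouse.Controller().position
            text = section_at(x, y)  # hypothetical lookup in helper 2's dict
            if text:
                speak(text)  # hypothetical TTS helper
        elif key == keyboard.Key.esc:
            return False  # stop listening and end the program

    with keyboard.Listener(on_press=on_press) as listener:
        listener.join()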
More modules used:
1. Image AI
2. Google Translation API