# -*- coding: utf-8 -*-
"""
Created on Fri Jan 28 16:37:31 2022

main.py runs "Puglia Sostenibile" from the command line with its default settings:

    - ngram = 1 (unigram): the similarity is computed from a TF-IDF matrix whose
      keyphrases consist of a single token. Cosine similarity is then computed
      on that matrix between the law and every SDG.
    - sim_target = False: the similarity is computed between the law and each SDG,
      where an SDG = Goal description + the description of each of its Targets.
      Set the value to True to rank the most relevant Targets instead
      (Target description only).

The result is the n (= 3) SDGs most relevant to the law.

The first time the program is run (or when it detects that some libraries are
missing or need updating), it asks the user whether to install the packages.
AN INTERNET CONNECTION AND PYTHON 3.9 ARE REQUIRED ON THE MACHINE.

    - YES: the program installs the packages listed in requirements.txt, then
      asks for the path of the law file so that Puglia Sostenibile can run;
    - NO: the program stops.

Python 3.9 needed! (current package versions are NOT compatible with Python 3.10)

@author: ClaudiaLorusso
"""
import sys
from os import path

# Python 3.9 needed! (current package versions are NOT compatible with Python 3.10)


# ---------------------------- UTILS -------------------------------------

def __get_path__(relative_path):
    """
    Converts a relative path into an absolute path.

    :param relative_path: string, relative path of the file
    :return: string, absolute path (base path + relative path)
    """
    try:
        # NOTE:
        # PyInstaller creates a temp folder and stores its path in sys._MEIPASS.
        # The attribute only exists at runtime inside a bundled app, so the
        # inline warning that linters show on this line can be ignored.
        base_path = sys._MEIPASS
    except AttributeError:
        # Not running from a PyInstaller bundle: use the current directory.
        base_path = path.abspath(".")
    return path.join(base_path, relative_path)
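

# ---------------- ILLUSTRATIVE SKETCH (not used by the program) ----------------
# The module docstring describes the ranking idea: build TF-IDF vectors for the
# law and for each SDG, compute the cosine similarity between them, and keep the
# n best SDGs. The helper below is only a minimal, dependency-free sketch of
# that idea. Assumptions: the name __sketch_rank__, the whitespace tokenisation
# and the smoothed IDF formula are hypothetical choices made for illustration;
# the actual logic lives in Compute_Similarity.get_relevant and may differ.
def __sketch_rank__(law_text, sdg_texts, n=3):
    """
    :param law_text: string, plain text of the law
    :param sdg_texts: dict, SDG name -> SDG description
    :param n: int, number of SDGs to return
    :return: list of (SDG name, cosine similarity), most similar first
    """
    from collections import Counter
    from math import log, sqrt

    names = list(sdg_texts)
    # document 0 is the law, the others are the SDG descriptions (unigrams only)
    docs = [law_text.lower().split()]
    docs += [sdg_texts[name].lower().split() for name in names]
    # document frequency: in how many documents each token appears
    df = Counter(token for doc in docs for token in set(doc))

    def tfidf(doc):
        # raw term count weighted by a smoothed inverse document frequency
        return {tok: cnt * (log((1 + len(docs)) / (1 + df[tok])) + 1.0)
                for tok, cnt in Counter(doc).items()}

    def cosine(a, b):
        dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
        norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    law_vec = tfidf(docs[0])
    scores = {name: cosine(law_vec, tfidf(doc)) for name, doc in zip(names, docs[1:])}
    # sort from most to least similar and keep the first n entries
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]
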


# ---------------------------- MAIN -------------------------------------

if __name__ == '__main__':
    """
    Checks whether all of the required packages are installed:
        - yes: imports the packages and runs the program
        - no: asks the user whether to install all of the packages
              listed in the requirements.txt file:
                  yes: installs and imports all of the packages and
                       runs the program
                  no: closes the program

    The program asks for the path of the file containing the law
    (.pdf, .txt, .docx ONLY).

    By default it computes the similarity between the law and all of the SDGs
    (because sim_target = False). To compute the similarity between the law and
    all of the Targets instead, set sim_target to True.

    The similarity is obtained by computing the TF-IDF of each SDG and of the law
    and then applying cosine similarity. The n (= 3) most relevant SDGs are returned.

    Each keyphrase used to build the vocabulary and the TF-IDF consists of
    1 token (unigram). To change this, set the ngram parameter passed to the
    get_relevant function (Compute_Similarity module) to whatever integer you want.

    The bigram vocabulary (keyphrases composed of one or two tokens) is already
    available in the VOCAB folder as vocabulary.xlsx. The unigram vocabulary is
    available in the VOCAB\\ngram folder as vocabulary_1.xlsx.
    If a vocabulary is not already in the folder, the program generates it
    automatically as VOCAB\\ngram\\vocabulary_#number_of_the_gram.xlsx

    SUGGESTION: don't go beyond bigrams.
    """
    try:
        from Compute_Similarity import get_relevant
        from FileHandler import ask_path

        # ask for the path of the law file
        dest = __get_path__(ask_path())
        # print the three most relevant SDGs (sim_target=False)
        print(get_relevant(path_law=dest, ngram=1, sim_target=False))
    except ModuleNotFoundError:
        print("Welcome to 'Puglia Sostenibile'!\n"
              "Since this is the first time you run the program, make sure you are connected to the internet.\n"
              "To use 'Puglia Sostenibile' correctly, the following libraries must be installed:\n")
        # the with statement closes the file, so no explicit close() is needed
        with open("requirements.txt", 'r') as f:
            lines = f.readlines()
            print(lines)
        choice = input("\n\nDo you want to proceed with the installation?\n"
                       "Press 'Y' to download and install;\n"
                       "otherwise, press any other key to close the program:\t").lower()
        if choice == 'y':
            from subprocess import run

            # Run pip as a subprocess.
            # The arguments are passed as a list, so shell=True is not used:
            # with a list, a POSIX shell would receive only "pip3".
            # scikit-learn is installed first because of its dependencies.
            run(["pip3", "install", "scikit-learn==1.0.2"], capture_output=True)
            run(["pip3", "install", "-r", "requirements.txt"], capture_output=True)
            print("\nInstallation completed successfully!\n")
            from Compute_Similarity import get_relevant
            from FileHandler import ask_path

            # ask for the path of the law file
            dest = __get_path__(ask_path())
            # print the three most relevant SDGs (sim_target=False)
            try:
                print(get_relevant(path_law=dest, ngram=1, sim_target=False))
                input("\nPress any key to close the program.")
            except ValueError:
                print("WARNING: The selected file may be password-protected.\n"
                      "Please select another file.")
            except OSError:
                # IOError is an alias of OSError in Python 3, so a separate
                # IOError handler would never run: one handler covers both.
                print("WARNING: The file cannot be processed for one of the following reasons:"
                      "\n-\tthe file is empty;"
                      "\n-\tthe file contains only images;"
                      "\n-\tthe file is corrupted."
                      "\n\nPlease select another file.")
        else:
            print("\nBye! ", "\U0001F984")
            input("\nPress any key to exit.")
    except ValueError:
        print("WARNING: The selected file may be password-protected.\n"
              "Please select another file.")
    except OSError:
        # IOError is an alias of OSError in Python 3, so a separate
        # IOError handler would never run: one handler covers both.
        print("WARNING: The file cannot be processed for one of the following reasons:"
              "\n-\tthe file is empty;"
              "\n-\tthe file contains only images;"
              "\n-\tthe file is corrupted."
              "\n\nPlease select another file.")