A tool that scrapes Frequently Asked Questions (FAQs) from websites and presents them in a portal where users can suggest possible paraphrases of the questions. The purpose of the tool is to assist in the data-gathering phase of Natural Language Processing (NLP) projects.
This tool was created as part of a final year project at Middlesex University, under the supervision of Prof. Franco Raimondi (@fraimondi) and in conjunction with Kare Knowledgeware (formerly Gluru).
- Create a Google Cloud Account (if you don't already have one: here)
- Create a new project (and take note of the project id)
- Create a new database instance here (make sure to use Cloud Firestore in Native Mode)
- Create a 'Service account key' [here](https://console.cloud.google.com/apis/credentials) with at least the role 'Cloud Datastore User', and store the resulting JSON file in a secure location
- Set the required environment variables:

  ```sh
  export GOOGLE_APPLICATION_CREDENTIALS=<path_to_service_account_json_file>
  export GOOGLE_PROJECT_ID=<project_id>
  ```
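Before starting either component, it can help to confirm that the two environment variables point at a valid service-account key file. The following sketch is an illustration, not part of the repo; the variable names come from the steps above, while the helper name and checks are assumptions:

```python
import json
import os


def check_gcp_env():
    """Verify the two required environment variables and the key file they reference.

    Returns (ok, message). This is a local sanity check only; it does not
    contact Google Cloud.
    """
    creds_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    project_id = os.environ.get("GOOGLE_PROJECT_ID")
    if not creds_path or not project_id:
        return False, "GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_PROJECT_ID must both be set"
    if not os.path.isfile(creds_path):
        return False, "no file found at {}".format(creds_path)
    with open(creds_path) as f:
        key = json.load(f)  # raises ValueError if the file is not valid JSON
    if key.get("type") != "service_account":
        return False, "key file does not look like a service-account key"
    return True, "credentials OK for project {}".format(project_id)
```

Running this once before setting up the server or the cron job can save a round of debugging opaque authentication errors later.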
- Install GoLang here (it was developed using `go1.9.1`)
- Navigate to the `/go-server` directory within this repo
- Run `go get` to install the application dependencies
- Run `go run main.go` to run the server (runs on port 9090 by default)
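Once `go run main.go` is running, a quick way to confirm the server is listening is a plain TCP probe. This helper is an illustration, not part of the repo; the default host and port are assumptions based on the port mentioned above:

```python
import socket


def is_server_up(host="localhost", port=9090, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If the probe fails, check that nothing else is bound to port 9090 and that the server process did not exit with an error.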
- Install Python 3 here (it was developed using `Python 3.5.0`)
- (optional) Create a virtual environment to isolate modules from the global instance
- Navigate to the `/fypScraper` directory within this repo
- Run `pip install -r requirements.txt` to install the application dependencies
- Configure a cron job to run the `/fypScraper/fypScraper/spiders/spiderLauncher.py` script at a given interval (at each execution, the script will find and process any unscraped websites found in the datastore). For example:

  ```
  * * * * * <python3_path> <path_to_repo>/fypScraper/fypScraper/spiders/spiderLauncher.py
  ```

  will run the scraper every minute.
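The five cron fields above are minute, hour, day-of-month, month, and day-of-week; `* * * * *` makes every field a wildcard, so the launcher fires once a minute. As an illustration of how a schedule maps to a timestamp (not part of the repo, and supporting only `*` and plain numbers, not ranges or steps):

```python
from datetime import datetime


def cron_matches(expr, when):
    """Check a five-field cron expression against a datetime.

    Supports only '*' and plain numeric fields. Day-of-week uses cron's
    0=Sunday convention, so Python's Monday=0 weekday is shifted.
    """
    fields = expr.split()
    values = [when.minute, when.hour, when.day, when.month,
              (when.weekday() + 1) % 7]  # Python Monday=0 -> cron Sunday=0
    return all(f == "*" or int(f) == v for f, v in zip(fields, values))
```

For instance, `cron_matches("30 12 * * *", now)` is true only at 12:30, whichever interval you pick, and quieter schedules (e.g. `0 * * * *` for hourly) reduce load on the sites being scraped.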