This repository contains work conducted with the Data Science Institute and Disability Research Network at the University of Technology, Sydney.

roupenminassian/UTS-DSI-x-Disability-Research-Network

Data Science Institute x Disability Research Network: A UTS HASS-DSI Research Project

Introduction

This repository contains work conducted in collaboration with the Data Science Institute (DSI) and Disability Research Network (DRN) at the University of Technology, Sydney.

The project involves preprocessing textual data from two Royal Commissions, the Royal Commission into Aged Care Quality and Safety and the Royal Commission into Violence, Abuse, Neglect and Exploitation of People with Disability, and applying natural language processing (NLP) techniques to improve document search. Our initial work focused on a document-fetching algorithm designed to minimise the time a user spends searching for relevant information.

Our research spans several NLP techniques applied to this data, including common deep-learning models such as BERT and GPT-3. Most of our work is showcased in this repository so that you can browse it and understand both the advantages and the drawbacks of applying such models to this particular use case.

We hope that, with further research and development, these automated tools will help legal professionals and the general public access legal information more efficiently. A warm thank you to Adam Berry and Linda Steel, who co-supervised this research and kindly gave permission to make these findings publicly available.

Feel free to test the current version of this experiment (built with Streamlit). We recommend uploading a data file that we have already processed so that our code can read it successfully. You can also adjust the temperature of the GPT-3 response, which controls how much randomness is in the output. Note that we are not responsible for the output of the GPT-3 model; there have been reports of the model generating inappropriate content.
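To give a rough intuition for the temperature setting, the sketch below shows how dividing a model's raw scores by a temperature changes the resulting probabilities. It is an illustration only, not code from this repository or from GPT-3 itself: the function name `softmax_with_temperature` and the example scores are hypothetical, and the sketch assumes a strictly positive temperature.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after scaling by a temperature.

    Hypothetical helper for illustration; assumes temperature > 0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

low = softmax_with_temperature(logits, temperature=0.2)
high = softmax_with_temperature(logits, temperature=2.0)

print("temperature 0.2:", [round(p, 3) for p in low])
print("temperature 2.0:", [round(p, 3) for p in high])
```

A low temperature concentrates almost all of the probability on the highest-scoring option, so responses become more repeatable. A higher temperature spreads probability across the alternatives, so responses vary more between runs.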

Contents

  1. Data Preprocessing

  2. Exploratory Data Analysis (EDA)

  3. BM25 (Retrieval Function)

  4. Deep Learning Implementation

  5. Importance of Data Preprocessing
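
For readers unfamiliar with item 3, BM25 is a ranking function that scores each document against a query using term frequency, inverse document frequency, and document-length normalisation. The sketch below is a minimal, self-contained version for illustration only; it is not the implementation used in this repository. The function name `bm25_scores`, the whitespace tokenisation, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the sample documents are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with a basic Okapi BM25 formula.

    Hypothetical helper: tokenises by lowercasing and splitting on whitespace.
    """
    if not documents:
        return []
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    avg_len = sum(len(tokens) for tokens in tokenised) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenised:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1
            )
            tf = term_freq[term]
            # Length normalisation penalises documents longer than average.
            denom = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


# Made-up documents for illustration only.
documents = [
    "aged care quality and safety hearing",
    "disability support services report",
    "transport infrastructure budget",
]
scores = bm25_scores("aged care safety", documents)
best = max(range(len(documents)), key=lambda i: scores[i])
print(documents[best])
```

Documents that share no terms with the query score zero, which is one reason the preprocessing step matters: inconsistent casing, punctuation, or spelling can stop otherwise relevant documents from matching.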
