Modern Problems Require Modern Solutions
In the race toward big-data solutions and data-driven analytics, it is important to preserve the privacy of the information source as data propagates across the Internet.
IBMocha is a hack built on the IBM Watson Natural Language Understanding (NLU) tools. It uses machine-learning cloud infrastructure to detect and redact sensitive information on the Internet.
If you are a web administrator, you can use this code to look for potential exposure of private data on your pages, which can help you screen your website for possible GDPR violations.
IBMocha is also modelled to target the recent leaks of Aadhaar card data that surfaced through search engine crawlers.
IBMocha looks for the following kinds of personal information:
- Individual names
- Locations
- Email addresses
- Phone numbers
- Aadhaar numbers (primitive detection, XXXX-XXXX-XXXX format)
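Names and locations generally need the language understanding that Watson NLU supplies, but fixed-format items such as Aadhaar numbers can be caught with plain patterns. The following is a minimal sketch, assuming regex-based matching, of how Aadhaar numbers in the XXXX-XXXX-XXXX format and email addresses could be flagged. The findMatches function and both regexes are hypothetical illustrations, not IBMocha's actual code. Phone numbers are left out because their formats vary too much for one short pattern.

```javascript
// Hypothetical pattern-based detector for fixed-format categories.
// Both regexes are illustrative assumptions, not IBMocha's actual patterns.
const PATTERNS = {
  // Aadhaar number written as XXXX-XXXX-XXXX (three groups of four digits)
  aadhaar: /\b\d{4}-\d{4}-\d{4}\b/g,
  // Simplified email matcher; it does not accept every valid address
  email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
};

// Return every match for each category, e.g. { aadhaar: [...], email: [...] }.
function findMatches(text) {
  const found = {};
  for (const [kind, pattern] of Object.entries(PATTERNS)) {
    // With the g flag, String.prototype.match returns all matches or null.
    found[kind] = text.match(pattern) || [];
  }
  return found;
}
```

For example, findMatches("Mail a@b.com, ID 1234-5678-9012") returns { aadhaar: ["1234-5678-9012"], email: ["a@b.com"] }. The word boundaries keep a longer digit run such as 12345-5678-9012 from being reported.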
Setting up the IBM Watson NLU credentials:
- Go to IBM Cloud Console -> Login/Register -> Visit Dashboard
- Visit Catalog -> AI -> Natural Language Understanding, or go directly to the Natural Language Understanding service page
- Create a Watson NLU service
- Go to Dashboard
- Select your newly created Natural Language Understanding service
- Go to the Service Credentials tab
- Create new credentials if none are listed
- Click View credentials
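- Create config.json in the root directory of the repository (clone the repository first, as described in the steps that follow)
- Paste the credentials into config.json in JSON format
- Add config.json to .gitignore so your credentials are not committed and misused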
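Since the credentials live in config.json, the server has to read them at startup. The following is a minimal Node.js sketch of that step. loadConfig is a hypothetical helper, and the key names it checks (a url plus either an apikey or a username/password pair) are assumptions about the credential JSON, which may differ depending on how the service credentials were issued.

```javascript
const fs = require("fs");

// Hypothetical helper: read the Watson NLU service credentials pasted into
// config.json. ASSUMPTION: the credentials contain "url" plus either
// "apikey" or "username"/"password"; adjust to match your actual file.
function loadConfig(path = "config.json") {
  // Throws if the file is missing or is not valid JSON.
  const config = JSON.parse(fs.readFileSync(path, "utf8"));
  const hasApiKey = typeof config.apikey === "string";
  const hasUserPass =
    typeof config.username === "string" && typeof config.password === "string";
  if (typeof config.url !== "string" || !(hasApiKey || hasUserPass)) {
    throw new Error(`${path} does not look like Watson NLU service credentials`);
  }
  return config;
}
```

Calling loadConfig() from the repository root returns the parsed object, or throws if the file is missing, is not valid JSON, or lacks the expected keys, so a misconfigured setup fails at startup rather than on the first analysis request.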
Running IBMocha:
- Clone the repo: git clone https://github.com/ajwad-shaikh/IBMocha.git
- cd IBMocha
- Install dependencies: npm install
- Install nodemon globally: npm install nodemon -g
- Start the server: npm run serve (or npm run win-serve on a Windows machine)
- Open localhost:8008 in your browser
There are two input modes: Text and URL.
- URL Mode: enter a URL and click Submit to analyse the website for personal information exposure using the IBM Watson NLU service.
- Text Mode: enter text and click Submit to analyse the text for personal information exposure using the IBM Watson NLU service.
- Text Mode also renders a redacted preview that masks personal information.
To do:
- Include PDF file input.
- Include a redacted website preview.
- Include PDF output.
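The redacted preview in Text Mode could be produced roughly as follows. This is a minimal sketch, assuming the analysis result has the shape of a Watson NLU response with the entities feature enabled ({ entities: [{ type, text, ... }] }). The redact, mask and escapeRegExp helpers and the set of entity types treated as sensitive are hypothetical, not IBMocha's actual implementation, and the NLU request itself is not shown.

```javascript
// Hypothetical sketch of a Text Mode-style redacted preview.
// ASSUMPTION: nluResult looks like { entities: [{ type, text, ... }] }, and
// the entity types below are the ones worth masking.
const SENSITIVE_TYPES = new Set(["Person", "Location", "EmailAddress"]);
const AADHAAR = /\b\d{4}-\d{4}-\d{4}\b/g;

// Escape regex metacharacters so entity text is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace a matched string with block characters of the same length.
function mask(match) {
  return "█".repeat(match.length);
}

function redact(text, nluResult) {
  const targets = (nluResult.entities || [])
    .filter((e) => SENSITIVE_TYPES.has(e.type) && e.text)
    .map((e) => e.text)
    // Longest first, so "Jane Doe" is masked before a shorter "Jane".
    .sort((a, b) => b.length - a.length);

  let preview = text;
  for (const target of targets) {
    preview = preview.replace(new RegExp(escapeRegExp(target), "g"), mask);
  }
  // Aadhaar numbers are masked by pattern rather than by NLU entity type.
  return preview.replace(AADHAAR, mask);
}
```

Masking with same-length blocks keeps the layout of the preview intact; replacing each match with a fixed label such as [REDACTED] would work just as well if hiding the length of the original value matters more.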