Browser extension to add Multilingual Wikipedia Popup hints to any page

areyasouka/bakuga

Bakuga - Multilingual Wikipedia Popup Definitions

(warning: alpha version, prototype)

Follow @arex Postmeta.com

Features

  • Show multilingual Wikipedia popup hints on any webpage (via a web extension / browser plugin)
  • Great for language acquisition and reading complex journal articles

Screenshot showing the Wikipedia popup

Build

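# unpack the bundled dictionary into the extension assets
gunzip -c ./data/wikidata_enja_enja.tsv.gz > ./src/assets/wikidata_enja_enja.tsv
# install dependencies
npm i
# build the extension into ./dist
npm run build
npm run dev
# load the unpacked extension from the ./dist folder via the browser's extensions page in developer mode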

Generate Dictionary TSV (optional)

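pip3 install ujson qwikidata
# edit qwikidata's json_dump.py (path below is for macOS with Python 3.8; adjust to your site-packages)
vi ~/Library/Python/3.8/lib/python/site-packages/qwikidata/json_dump.py
# and change its json import to: import ujson as json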

# ~5.5 hrs to download ~80 GB (compressed)
wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.gz -P ./data

# ~3.5 hrs to filter the JSON and write a TSV of mapping data
python3 ./utils/extract.py
python3 ./utils/sort.py

mv ./data/wikidata_enja_enja_sorted.tsv ./src/assets/wikidata_enja_enja.tsv

# rebuild the extension (see Build)
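
For orientation, the filtering step could look roughly like the sketch below. It is not the contents of ./utils/extract.py. It only assumes the documented Wikidata JSON entity layout (labels keyed by language, sitelinks keyed by wiki) and the dump's one-entity-per-line shape. The function name entity_to_row, the four-column row layout, and the sample line are hypothetical.

```python
import json
from typing import Optional


# Hypothetical helper for illustration; not taken from ./utils/extract.py.
def entity_to_row(entity: dict, src: str = "en", dst: str = "ja") -> Optional[str]:
    """Return a TSV row for an entity labelled in both languages, else None."""
    labels = entity.get("labels", {})
    if src not in labels or dst not in labels:
        return None
    sitelinks = entity.get("sitelinks", {})
    # Assumed column layout: source label, target label, source article, target article.
    fields = [
        labels[src]["value"],
        labels[dst]["value"],
        sitelinks.get(f"{src}wiki", {}).get("title", ""),
        sitelinks.get(f"{dst}wiki", {}).get("title", ""),
    ]
    # Tabs or newlines inside a field would break the TSV, so collapse whitespace.
    return "\t".join(" ".join(f.split()) for f in fields)


# Illustrative sample line: in the dump, each entity sits on its own line
# and is followed by a comma.
line = (
    '{"id": "Q42", "labels": {"en": {"language": "en", "value": "Douglas Adams"}, '
    '"ja": {"language": "ja", "value": "ダグラス・アダムズ"}}, '
    '"sitelinks": {"enwiki": {"site": "enwiki", "title": "Douglas Adams"}}},'
)
row = entity_to_row(json.loads(line.rstrip(",\n")))
print(row)
```

Entities missing either label are skipped, which is presumably where most of the size reduction from the full dump would come from.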

TODO

  • improve DOM parsing: backtrack to the first real break in the DOM, then parse the sentence left to right, doing longest normalized string matching against the index data; return the matched entities and their positions in the DOM, and identify the closest (or under-cursor) entity for highlight and popup
  • support Chinese, Japanese, etc. by parsing left to right for the longest string matches
    • exact match first
    • later try stemming, removing suffixes or Japanese conjugated endings
  • highlight matched text: wrap it in a tag and apply a highlight style
  • remove the jQuery dependency
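
The left-to-right longest-match idea from the first two items can be sketched as follows. This is a minimal illustration in Python rather than the extension's own code, assuming an in-memory dictionary with lowercased keys. The function name longest_matches, the max_len cutoff, and the sample index entries are hypothetical.

```python
from typing import Dict, List, Tuple


def longest_matches(
    text: str, index: Dict[str, str], max_len: int = 20
) -> List[Tuple[int, int, str]]:
    """Scan left to right; at each position keep the longest key found in the index.

    Returns (start, end, value) tuples, with end exclusive.
    """
    matches = []
    i = 0
    while i < len(text):
        found = None
        # Try the longest candidate first so a longer entry beats its own prefix.
        for j in range(min(len(text), i + max_len), i, -1):
            key = text[i:j].lower()  # simple normalization; exact match only
            if key in index:
                found = (i, j, index[key])
                break
        if found:
            matches.append(found)
            i = found[1]  # resume scanning right after the match
        else:
            i += 1
    return matches


# Illustrative index; keys must already be lowercased.
index = {"東京": "Tokyo", "東京都": "Tokyo Metropolis", "大学": "university"}
result = longest_matches("東京都の大学", index)
print(result)
```

Because the scan works per character, it needs no word boundaries, which suits Chinese and Japanese. For space-delimited languages it would also match inside longer words, so a real implementation would likely restrict match starts and ends to word boundaries.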