Releases: sagorbrur/bnlp

BNLP 4.0.3

20 Aug 11:37
  • Remove the pinned NLTK version from the requirements so the dependency stays up to date.

BNLP 4.0.2

12 Aug 16:22
  • Update NLTK from 3.8.1 to 3.8.2, because NLTK 3.8.1 has security vulnerabilities.
    Reference issue: #46

BNLP 4.0.1 Patch Release

04 May 18:51
ce9fa36

  • Minor change: added a version specifier to the requirements to fix an installation problem with scipy.

BNLP 4.0.0-dev3

14 Aug 06:53

The internal build version of bnlp 4.0.0

BNLP 4.0.0-dev4

14 Aug 11:31

Fixed the build problem in dev version 3.

BNLP 4.0.0

14 Aug 14:34

BNLP 4.0.0: A redesign of BNLP version 3 with proper OOP methods for model reuse, a separate training module, and more.

Highlights

BNLP v4.0.0 is a redesign built on proper object-oriented programming. In earlier versions, the pre-trained model was loaded every time we tokenized or embedded a text. In this version, the model loads only once and is reused for tokenization, embedding, and other tasks. We also added automatic model downloading: if no pre-trained model path is passed, a pre-trained model is loaded automatically from the hub. In earlier versions, the training code was embedded in the same module as prediction, which made it hard to add separate functionality for training and prediction. So we separated the training module for every task, such as tokenization and embeddings. The Corpus module is now a class, which makes it easier to reuse and extend.

API Changes

Model loading changes: Previously, the model was loaded every time it generated a result.

  • The model is now loaded when a class is initialized.
  • If no model path is passed, a pre-trained model is loaded automatically from the hub.

3.3.2:

from bnlp import BengaliWord2Vec

bwv = BengaliWord2Vec()
model_path = "bengali_word2vec.model"
word = 'গ্রাম'
similar = bwv.most_similar(model_path, word, topn=10)
print(similar)

4.0.0:

from bnlp import BengaliWord2Vec

model_path = "path/mymodel.model"
bwv = BengaliWord2Vec(model_path=model_path)

word = 'গ্রাম'
vector = bwv.get_word_vector(word)
print(vector.shape)
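
The load-once behaviour can be illustrated without bnlp itself. The following is a minimal sketch, not bnlp's implementation: `ReloadPerCall`, `LoadOnce`, `fake_load`, and `DEFAULT_MODEL` are hypothetical names, and the "model" is just a dictionary standing in for an expensive load.

```python
DEFAULT_MODEL = "default.model"  # hypothetical stand-in for a hub-downloaded model
load_calls = []  # records every simulated model load


def fake_load(model_path):
    """Pretend to load a model; a real load would read a large file from disk."""
    load_calls.append(model_path)
    return {"path": model_path}


class ReloadPerCall:
    """3.x-style sketch: the path is passed to each method, so every call reloads."""

    def lookup(self, model_path, word):
        model = fake_load(model_path)
        return (model["path"], word)


class LoadOnce:
    """4.0.0-style sketch: the model is loaded in __init__ and reused afterwards."""

    def __init__(self, model_path=None):
        # Fall back to a default model when no path is given (assumed behaviour).
        self.model = fake_load(model_path or DEFAULT_MODEL)

    def lookup(self, word):
        return (self.model["path"], word)


words = ["গ্রাম", "শহর", "নদী"]

old = ReloadPerCall()
for word in words:
    old.lookup("my.model", word)
loads_old = len(load_calls)

load_calls.clear()
new = LoadOnce()
for word in words:
    new.lookup(word)
loads_new = len(load_calls)

print(loads_old, loads_new)  # 3 1
```

With three lookups, the per-call design loads the model three times, while the load-once design loads it a single time at construction.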

Training module changes

The training module is now separate from the main module, and relevant features have been added to it.

3.3.2:

from bnlp import BengaliWord2Vec

bwv = BengaliWord2Vec()
data_file = "raw_text.txt"
model_name = "test_model.model"
vector_name = "test_vector.vector"
bwv.train(data_file, model_name, vector_name, epochs=5)

4.0.0:

from bnlp import Word2VecTraining

trainer = Word2VecTraining()

data_file = "raw_text.txt"
model_name = "test_model.model"
vector_name = "test_vector.vector"
trainer.train(data_file, model_name, vector_name, epochs=5)
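
The same split between training and prediction can be shown with a toy model. This is only a sketch under stated assumptions: `ToyVocabTrainer` and `ToyVocabModel` are hypothetical classes, and the "model" is a token-count table saved as JSON, not a word2vec model.

```python
import json
import os
import tempfile
from collections import Counter


class ToyVocabTrainer:
    """Hypothetical trainer: it only knows how to build and save a model."""

    def train(self, sentences, model_path):
        counts = Counter(token for s in sentences for token in s.split())
        with open(model_path, "w", encoding="utf-8") as f:
            json.dump(counts, f, ensure_ascii=False)


class ToyVocabModel:
    """Hypothetical predictor: loads the saved model once, has no training code."""

    def __init__(self, model_path):
        with open(model_path, encoding="utf-8") as f:
            self.counts = json.load(f)

    def frequency(self, token):
        return self.counts.get(token, 0)


model_path = os.path.join(tempfile.mkdtemp(), "toy_vocab.json")
ToyVocabTrainer().train(["আমার সোনার বাংলা", "আমার দেশ"], model_path)

model = ToyVocabModel(model_path)
print(model.frequency("আমার"))  # 2
```

Keeping the two classes apart means training-only options can be added to the trainer without touching the prediction interface.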

Corpus is now a class

3.3.2:

from bnlp.corpus import stopwords, punctuations, letters, digits

print(stopwords)
print(punctuations)
print(letters)
print(digits)

4.0.0:

from bnlp import BengaliCorpus as corpus

print(corpus.stopwords)
print(corpus.punctuations)
print(corpus.letters)
print(corpus.digits)
print(corpus.vowels)
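
As a rough illustration of why a class is convenient here, the sketch below groups a few character sets as class attributes that can be read without creating an instance. `ToyCorpus` and `count_digits` are hypothetical and do not reflect the contents of bnlp's `BengaliCorpus`; the digits are simply the Unicode code points U+09E6 to U+09EF.

```python
class ToyCorpus:
    """Hypothetical stand-in for a corpus class; not bnlp's actual data."""

    # Bengali digits ০ to ৯ occupy U+09E6..U+09EF in Unicode.
    digits = [chr(code) for code in range(0x09E6, 0x09F0)]
    # A small illustrative punctuation list, including the Bengali danda.
    punctuations = ["।", ",", ";", "?", "!"]


def count_digits(text):
    """Count how many characters of `text` are Bengali digits."""
    return sum(ch in ToyCorpus.digits for ch in text)


print(ToyCorpus.digits)
print(count_digits("২০২৩ সাল"))  # 4
```

New attributes or helper methods can later be added to the class without changing how existing attributes are imported.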

Contributors

  • Ibrahim (automatic model downloading, fixing glove vector loading)

BNLP 4.0.0-dev2

12 Aug 17:33
v4.0.0dev2

Added the 4.0.0 dev2 version for building.

BNLP 3.3.2

10 Jul 05:50

Bug fix

  • Fixed the dummy-token replacement bug in the NLTK sentence tokenizer. It was not tokenizing the period (.) as the algorithm intended.

Incompatibility warning

  • The upcoming bnlp version 4.0.0 (dev release available) will be completely incompatible with the present and past versions. We added a deprecation warning: every time this version is imported, it tells users to pin the exact version if they do not want to upgrade to the newer one.
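
A warning of this kind can be sketched with Python's standard `warnings` module. This is not bnlp's actual code: `warn_breaking_release` and the message wording are hypothetical.

```python
import warnings


def warn_breaking_release(current_version, upcoming_version):
    """Emit a warning about an incompatible upcoming release (hypothetical helper)."""
    warnings.warn(
        f"Version {upcoming_version} will be incompatible with {current_version}. "
        f"Pin the exact version (=={current_version}) if you do not want to upgrade.",
        DeprecationWarning,
        stacklevel=2,
    )


# Capture the warning so the demo works regardless of the active warning filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_breaking_release("3.3.2", "4.0.0")

message = str(caught[0].message)
print(message)
```

In a package, a call like this would typically run at import time. Depending on the active warning filters, Python may hide `DeprecationWarning`, so a more visible category such as `FutureWarning` may be preferable for messages aimed at end users.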

v3.3.1: Patch release

29 Apr 03:40

Fixed a version incompatibility between Gensim and Python 3.10

  • Removed the pinned Gensim version and replaced it with the latest Gensim version to fix the build problem in Python 3.10 (#29)

BNLP 3.3.0

07 Mar 11:41

Bug Fix

  • Removed wasabi text formatting to fix build problems with the updated version on different operating systems and Python versions.

New Feature

Text Cleaning

We adopted text-cleaning formulas and code from clean-text and modified them for Bangla. You can now normalize and clean your text as follows.

from bnlp import CleanText

clean_text = CleanText(
   fix_unicode=True,
   unicode_norm=True,
   unicode_norm_form="NFKC",
   remove_url=False,
   remove_email=False,
   remove_emoji=False,
   remove_number=False,
   remove_digits=False,
   remove_punct=False,
   replace_with_url="<URL>",
   replace_with_email="<EMAIL>",
   replace_with_number="<NUMBER>",
   replace_with_digit="<DIGIT>",
   replace_with_punct="<PUNC>"
)

input_text = "আমার সোনার বাংলা।"
# Use a separate name for the result so the CleanText instance stays callable.
cleaned_text = clean_text(input_text)
print(cleaned_text)
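
To give a feel for what options such as `unicode_norm`, `remove_url`, and `remove_email` might do, here is a minimal standard-library sketch. It is not bnlp's `CleanText` implementation: `toy_clean` is a hypothetical function, the regular expressions are deliberately simplified, and the assumption that a "removed" URL or email is replaced by its placeholder follows the parameter names only.

```python
import re
import unicodedata

# Simplified patterns for illustration; real-world URL/email matching is harder.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def toy_clean(text, unicode_norm=True, unicode_norm_form="NFKC",
              remove_url=False, remove_email=False,
              replace_with_url="<URL>", replace_with_email="<EMAIL>"):
    """Hypothetical cleaner: normalize Unicode, then swap URLs/emails for placeholders."""
    if unicode_norm:
        text = unicodedata.normalize(unicode_norm_form, text)
    if remove_email:
        text = EMAIL_RE.sub(replace_with_email, text)
    if remove_url:
        text = URL_RE.sub(replace_with_url, text)
    return text


# NFKC folds compatibility characters: the "fi" ligature and full-width digits.
print(toy_clean("\ufb01ne \uff11\uff12"))  # fine 12
print(toy_clean("see https://example.com/a?b=1 ok", remove_url=True))  # see <URL> ok
print(toy_clean("write to me@example.com please", remove_email=True))  # write to <EMAIL> please
```

Each step is switched on by its own flag, so callers can enable only the cleaning they need.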