v0.10.0

ausgerechnet released this 21 Nov 19:18

d44057b

dropped cwb-python dependency

should close #35
cqp.py was already included in cwb-ccc
cl.pyx now included too
this means that C-code has to be compiled during installation

improved tests

closes #31
included new UCS reference counts on dedicated test corpus
included EmpiriST counts for testing keyword functionality
re-wrote most of the tests on dedicated test corpus
re-wrote most of the tests so they really assert instead of print

included github actions

build & test
dist & publish on PyPI (WIP)
helps close #35

improved data_path / cache

addresses #30
data_path for each __version__
data_path for each library: this invalidates the cache when wordlists and macros are updated
library files must now end in ".txt"
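
A minimal sketch of how a cache path can depend on both the version and the state of the library files, so that editing a wordlist or macro leads to a fresh cache directory. The names library_fingerprint and cache_dir are hypothetical and are not taken from cwb-ccc; the actual implementation may differ.

```python
import hashlib
from pathlib import Path


def library_fingerprint(lib_dir):
    """Hash names and contents of all ".txt" files below lib_dir (hypothetical helper)."""
    digest = hashlib.sha256()
    for path in sorted(Path(lib_dir).rglob("*.txt")):
        # include the relative file name so that renames also change the hash
        digest.update(path.relative_to(lib_dir).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()[:12]


def cache_dir(data_root, version, lib_dir):
    """Build a cache directory that depends on version and library state (hypothetical helper)."""
    return Path(data_root) / f"v{version}" / library_fingerprint(lib_dir)
```

Under these assumptions, any change to a library file yields a different directory, so stale cache entries are simply never looked up again.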

introduced FreqFrames

FreqFrames are DataFrames with frequency information returned by
- Counts.cpos()
- Counts.dump()
- Counts.matches()
- Counts.mwus()
- Corpus.marginals()
- Corpus.marginals_complex()
new consistent behaviour:
- format = [(" ".join(p_att))] freq, p_att[0], p_att[1], ...
- indexed by a single string index named item
- frequency column named freq
- additional columns with all separate p-attributes
cf. old behaviour: (p_att[0], p_att[1], ...) some_column
- MultiIndex
- inconsistently named frequency column
the heuristic for MWUs remains unchanged, i.e. they are " "-joined in the index
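
The layout described above can be illustrated with plain pandas. This is only a sketch of the resulting shape, not the code used in cwb-ccc; to_freq_frame and the example counts are hypothetical.

```python
import pandas as pd


def to_freq_frame(counts, p_atts):
    """Convert {(value, ...): frequency} into a FreqFrame-like DataFrame (hypothetical helper)."""
    rows = [
        # item is the " "-joined combination of all p-attribute values
        {"item": " ".join(values), "freq": freq, **dict(zip(p_atts, values))}
        for values, freq in counts.items()
    ]
    df = pd.DataFrame(rows, columns=["item", "freq", *p_atts])
    return df.set_index("item")


# old behaviour, roughly: tuples of p-attribute values mapped to a count
old_style = {("go", "VERB"): 3, ("go", "NOUN"): 1}
freq_frame = to_freq_frame(old_style, ["lemma", "pos"])
```

Here freq_frame has a single index named item, a freq column, and one additional column per p-attribute (lemma and pos in this made-up example).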

re-factored Discourseme Constellations

now with tests!
two types of constellations (inner vs textual constellations)
create_constellation() wrapper

improved Collocates

consistent AM scoring (ScoreFrame) for keywords and collocates
collocation retrieval considerably faster (count once for max window size)
upgrade of association-measures module gives more (and more stable) AMs
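
One possible reading of "count once for max window size", sketched on a plain token list: context items are counted per distance from the node in a single pass, and counts for any smaller window are derived by summing. The functions window_counts and cooccurrences are hypothetical and do not reflect the actual cwb-ccc code.

```python
from collections import Counter


def window_counts(tokens, node_index, max_window):
    """Count context items per distance from the node, up to max_window."""
    by_distance = {d: Counter() for d in range(1, max_window + 1)}
    for offset in range(-max_window, max_window + 1):
        position = node_index + offset
        # skip the node itself and positions outside the token list
        if offset == 0 or not 0 <= position < len(tokens):
            continue
        by_distance[abs(offset)][tokens[position]] += 1
    return by_distance


def cooccurrences(by_distance, window):
    """Sum the per-distance counts for any window <= max_window."""
    total = Counter()
    for distance in range(1, window + 1):
        total.update(by_distance[distance])
    return total


tokens = "the cat sat on the mat".split()
by_distance = window_counts(tokens, node_index=2, max_window=3)
narrow = cooccurrences(by_distance, window=1)
wide = cooccurrences(by_distance, window=3)
```

The corpus is scanned only once; changing the window size afterwards only requires re-summing the stored counts.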

further enhancements

p-att selection in dump.breakdown()
corpus.marginals() can now be called without items (yielding marginal freq of all items)
cqpy_dump(), cqpy_load(), cqpy_dumps(), cqpy_loads()

miscellaneous

changed anchor correction behaviour: use context/contextend instead of NA when out of bounds
replaced some for loops with list comprehensions
included some more __str__ and __repr__
sphinx documentation (WIP, addresses #7)
Lint (WIP)
Docker
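
The changed anchor correction behaviour listed above can be pictured as clamping: a corrected anchor that would fall outside the context is set to the nearest context boundary instead of NA. A minimal sketch, assuming integer corpus positions; correct_anchor and its parameters are hypothetical names, not the cwb-ccc API.

```python
def correct_anchor(anchor, offset, context, contextend):
    """Shift an anchor by offset and clamp it to [context, contextend]."""
    position = anchor + offset
    # out of bounds: fall back to the context boundary instead of NA
    return min(max(position, context), contextend)
```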
