
v0.10.0

@ausgerechnet ausgerechnet released this 21 Nov 19:18
d44057b

dropped cwb-python dependency

  • should close #35
  • cqp.py was already included in cwb-ccc
  • cl.pyx now included too
  • this means that C-code has to be compiled during installation

improved tests

  • closes #31
  • included new UCS reference counts on dedicated test corpus
  • included EmpiriST counts for testing keyword functionality
  • re-wrote most of the tests on dedicated test corpus
  • re-wrote most of the tests so they actually assert instead of printing

included github actions

  • build & test
  • dist & publish on PyPI (WIP)
  • helps close #35

improved data_path / cache

  • addresses #30
  • data_path for each __version__
  • data_path for each library: this invalidates the cache when wordlists and macros are updated
  • library files must now end in ".txt"
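
One way to picture a data path that depends on both the version and the library is the sketch below. It is not cwb-ccc's actual implementation: the helper names `library_fingerprint` and `versioned_data_path`, the directory naming scheme, and the use of SHA-256 are all assumptions made for illustration. Only two details come from the notes above: the path depends on `__version__` and on the library, and library files end in ".txt".

```python
import hashlib
import tempfile
from pathlib import Path


def library_fingerprint(lib_dir):
    """Hash names and contents of all *.txt files below lib_dir (hypothetical helper)."""
    digest = hashlib.sha256()
    for path in sorted(Path(lib_dir).rglob("*.txt")):
        digest.update(path.relative_to(lib_dir).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()[:12]


def versioned_data_path(base_dir, version, lib_dir):
    """Build a cache directory that changes with the version or the library content."""
    return Path(base_dir) / f"ccc-{version}" / f"lib-{library_fingerprint(lib_dir)}"


# Usage: editing a wordlist yields a new path, so stale cache entries are never read.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "library"
    (lib / "wordlists").mkdir(parents=True)
    wordlist = lib / "wordlists" / "colours.txt"
    wordlist.write_text("red\ngreen\n", encoding="utf-8")
    before = versioned_data_path(tmp, "0.10.0", lib)
    wordlist.write_text("red\ngreen\nblue\n", encoding="utf-8")
    after = versioned_data_path(tmp, "0.10.0", lib)
    print(before != after)
```

Deriving the path from the content means nothing has to be deleted explicitly: an updated wordlist or macro file simply maps to a fresh directory.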

introduced FreqFrames

  • FreqFrames are DataFrames with frequency information returned by
    • Counts.cpos()
    • Counts.dump()
    • Counts.matches()
    • Counts.mwus()
    • Corpus.marginals()
    • Corpus.marginals_complex()
  • new consistent behaviour:
    • format = [(" ".join(p_att))] freq, p_att[0], p_att[1], ...
    • indexed by a single string column named item (the p-attributes joined by " ")
    • frequency column named freq
    • additional columns with all separate p-attributes
  • cf. old behaviour: (p_att[0], p_att[1], ...) some_column
    • MultiIndex
    • inconsistently named frequency column
  • heuristics for MWUs remain unchanged, i.e. they are " "-joined in the index
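
The difference between the old and the new layout can be illustrated with plain pandas. This is a sketch of the frame shapes only; it does not call cwb-ccc. The counts and the p-attribute names `word` and `pos` are invented, and `some_column` stands in for the inconsistently named frequency column of the old layout.

```python
import pandas as pd

# Toy counts over two p-attributes; the values are invented for illustration.
p_atts = ["word", "pos"]
counts = {("the", "DET"): 5, ("house", "NOUN"): 2}

# Old layout: MultiIndex over the p-attributes, frequency column name varied.
old = pd.DataFrame(
    {"some_column": list(counts.values())},
    index=pd.MultiIndex.from_tuples(list(counts.keys()), names=p_atts),
)

# New layout: flatten the MultiIndex, add the " "-joined item, index by it.
new = old.rename(columns={"some_column": "freq"}).reset_index()
new["item"] = new[p_atts].agg(" ".join, axis=1)
new = new.set_index("item")[["freq"] + p_atts]
print(new)
```

With a single string index, a row can be looked up by its joined item (for example `new.loc["the DET"]`), while the separate p-attribute columns remain available for filtering.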

re-factored Discourseme Constellations

  • now with tests!
  • two types of constellations (inner vs textual constellations)
  • create_constellation() wrapper

improved Collocates

  • consistent AM scoring (ScoreFrame) for keywords and collocates
  • collocation retrieval considerably faster (count once for max window size)
  • upgrade of association-measures module gives more (and more stable) AMs
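
The idea of counting once for the maximum window size can be sketched as follows. This is a simplified illustration, not the library's code: the function names are hypothetical, the corpus is a toy token list, and tokens in overlapping contexts are counted once per node.

```python
from collections import Counter


def count_by_distance(tokens, node_positions, max_window):
    """Single pass: count (item, distance) pairs up to max_window around each node."""
    counts = Counter()
    for node in node_positions:
        start = max(0, node - max_window)
        end = min(len(tokens), node + max_window + 1)
        for pos in range(start, end):
            if pos != node:
                counts[(tokens[pos], abs(pos - node))] += 1
    return counts


def window_counts(counts, window):
    """Derive co-occurrence counts for any window <= max_window without recounting."""
    result = Counter()
    for (item, distance), freq in counts.items():
        if distance <= window:
            result[item] += freq
    return result


tokens = "the old house by the sea is an old house".split()
nodes = [i for i, tok in enumerate(tokens) if tok == "house"]
by_distance = count_by_distance(tokens, nodes, max_window=3)
print(window_counts(by_distance, 1))
print(window_counts(by_distance, 3))
```

The corpus is scanned only in `count_by_distance`; every smaller window is then a cheap filter over the stored distances.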

further enhancements

  • p-att selection in dump.breakdown()
  • corpus.marginals() can now be called without items (yielding marginal freq of all items)
  • cqpy_dump(), cqpy_load(), cqpy_dumps(), cqpy_loads()

miscellaneous

  • changed anchor correction behaviour: use context/contextend instead of NA when out of bounds
  • replaced some for loops with list comprehensions
  • included some more __str__ and __repr__
  • sphinx documentation (WIP, addresses #7)
  • Lint (WIP)
  • Docker
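
The changed anchor correction listed above can be read as a clamp to the context boundaries. The function below is a minimal sketch under that reading; `correct_anchor` and its signature are hypothetical and not taken from cwb-ccc.

```python
def correct_anchor(position, offset, context, contextend):
    """Shift an anchor by offset and clamp it to [context, contextend].

    Assumed reading of the release note: a corrected anchor that would fall
    outside the context is replaced by the nearest boundary instead of NA.
    """
    return min(max(position + offset, context), contextend)


print(correct_anchor(10, -5, context=8, contextend=15))  # 8: clamped to context
print(correct_anchor(10, 2, context=8, contextend=15))   # 12: already in bounds
print(correct_anchor(10, 9, context=8, contextend=15))   # 15: clamped to contextend
```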