
Releases: ausgerechnet/cwb-ccc

v0.11.2

25 Nov 12:47
b5f67f7

faster concordancing

  • introduce Corpus().quick_query() and Corpus().quick_conc() for extracting concordance lines without having to retrieve all relevant dumps as DataFrames
  • used in mmda-toolkit >= v0.3.1
  • also addresses #30
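A concordance (KWIC) line only needs the match boundaries and a window of tokens around them, which is why it can in principle be assembled without first materialising a full dump as a DataFrame. The following is a minimal pure-Python sketch of that idea. It assumes a plain list of word forms indexed by corpus position, and `kwic_line` is a hypothetical helper. It does not reproduce the signature or implementation of `quick_conc()`.

```python
def kwic_line(tokens, match, matchend, context=3):
    """Hypothetical helper (not cwb-ccc API): return (left, node, right) for one match.

    tokens: list of word forms indexed by corpus position (cpos)
    match, matchend: first and last cpos of the match, inclusive as in CQP
    context: number of tokens shown on each side of the match
    """
    # clip the context window to the corpus boundaries
    start = max(0, match - context)
    end = min(len(tokens) - 1, matchend + context)
    left = " ".join(tokens[start:match])
    node = " ".join(tokens[match:matchend + 1])
    right = " ".join(tokens[matchend + 1:end + 1])
    return left, node, right


tokens = "the quick brown fox jumps over the lazy dog".split()
print(kwic_line(tokens, 3, 4, context=2))
```

Only the tokens inside the window are touched per match, so the cost grows with the number of matches and the context size rather than with the size of a full dump.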

misc

  • added citation file

v0.11.1

18 Oct 11:08
80d758e

downwards compatibility

  • during installation, deterministically link against -lcl -lm -lpcre -lglib-2.0 if the installed CWB version is < 3.4.37
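The version check above has to compare versions numerically, not as strings: as strings, "3.4.9" sorts after "3.4.37". The following is a minimal sketch of such a check, assuming purely numeric dotted version strings. `legacy_link_flags` is a hypothetical name, and the sketch does not claim to mirror cwb-ccc's actual setup script.

```python
# Fixed set of link flags named in the release notes for older CWB versions.
LEGACY_LIBS = ["-lcl", "-lm", "-lpcre", "-lglib-2.0"]


def parse_version(version):
    """Turn a dotted version such as '3.4.33' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def legacy_link_flags(cwb_version):
    """Hypothetical helper: return the fixed flags for CWB < 3.4.37, else []."""
    if parse_version(cwb_version) < (3, 4, 37):
        return list(LEGACY_LIBS)  # copy, so callers cannot mutate the constant
    return []
```

Tuple comparison gives (3, 4, 9) < (3, 4, 37), which is the ordering a version gate needs.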

v0.11.0

16 Oct 23:23
ba89495

usability and functionality:

  • implement breakdown() for discourseme constellations
  • make Corpus.query() the central entry point (also for s-attribute queries)
  • remove in-place subcorpus creation; return a new Corpus instead
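The switch from in-place modification to returning a new object can be illustrated with a frozen dataclass. `ToyCorpus` and `subcorpus_from` are hypothetical stand-ins chosen for this sketch. They are not cwb-ccc's `Corpus` class or its subcorpus method.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyCorpus:
    """Hypothetical corpus handle; frozen, so attributes cannot be reassigned."""
    name: str
    subcorpus: Optional[str] = None
    spans: Tuple[Tuple[int, int], ...] = ()

    def subcorpus_from(self, subcorpus_name, spans):
        # build and return a new object; the original handle stays untouched
        return replace(self, subcorpus=subcorpus_name, spans=tuple(spans))


corpus = ToyCorpus("EXAMPLE")
sub = corpus.subcorpus_from("SUB", [(0, 10), (20, 25)])
```

Because the original handle is never mutated, code that still holds `corpus` keeps querying the full corpus, which avoids surprises when several subcorpora are derived from the same object.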

stability and compatibility:

  • make pandas operations future-proof
  • catch decode errors and missing s-attributes

misc:

  • simplify README and introduce Vignette
  • update GitHub workflows (include Python 3.10; simplify publishing)
  • work towards PEP compliance

v0.10.3

30 Aug 22:51
ff9434c

What's Changed

Full Changelog: v0.10.2...v0.10.3

v0.10.2

06 Mar 02:57
f6296ce
  • upgrade to association-measures v0.2.0
  • simplify freq list scoring
  • handle invalid queries
  • initialize macros defining wordlists
  • show available wordlists
  • handle display settings in cqpy queries
  • breakdown with flags
  • improve discourseme constellation handling, esp. concordancing
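The association measures used for scoring are computed from a frequency signature: co-occurrence frequency f, marginals f1 and f2, and sample size N. The following is a plain-Python sketch of two textbook measures, pointwise mutual information and log-likelihood (G²), shown to make the contingency-table arithmetic explicit. It is an illustration only and does not reflect the API of the association-measures package.

```python
import math


def contingency(f, f1, f2, N):
    """Observed 2x2 table (O11, O12, O21, O22) from a frequency signature."""
    return f, f1 - f, f2 - f, N - f1 - f2 + f


def expected(f1, f2, N):
    """Expected cell frequencies under independence of the two marginals."""
    return (f1 * f2 / N,
            f1 * (N - f2) / N,
            (N - f1) * f2 / N,
            (N - f1) * (N - f2) / N)


def mutual_information(f, f1, f2, N):
    """Pointwise MI: log2(O11 / E11)."""
    return math.log2(f / (f1 * f2 / N))


def log_likelihood(f, f1, f2, N):
    """G2 = 2 * sum(O * ln(O / E)); cells with O == 0 contribute 0."""
    pairs = zip(contingency(f, f1, f2, N), expected(f1, f2, N))
    return 2 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
```

When the observed co-occurrence equals its expectation (for example f=10, f1=f2=100, N=1000), both measures are 0. Doubling f to 20 gives an MI of exactly 1 bit.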

v0.10.1

01 Dec 20:07
626caeb
  • bugfix in anchor corrections for missing anchors
  • ensure anchors are integers (especially useful for CQPY files)
  • sort import statements to comply with PEP
  • cosmetic improvements to README

v0.10.0

21 Nov 19:18
d44057b

dropped cwb-python dependency

  • should close #35
  • cqp.py was already included in cwb-ccc
  • cl.pyx now included too
  • this means that C code has to be compiled during installation

improved tests

  • closes #31
  • included new UCS reference counts on dedicated test corpus
  • included EmpiriST counts for testing keyword functionality
  • rewrote most of the tests to run on the dedicated test corpus
  • rewrote most of the tests so that they assert results instead of printing them

included github actions

  • build & test
  • dist & publish on PyPI (WIP)
  • helps close #35

improved data_path / cache

  • address #30
  • data_path for each __version__
  • data_path for each library: this invalidates the cache when wordlists and macros are updated
  • library files must now end in ".txt"
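One way to obtain a cache location that changes whenever the package version or the library contents change is to nest a content hash of the library files under a per-version directory. The release notes do not say how cwb-ccc derives its data_path, so hashing is an assumption here, and `library_data_path` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def library_data_path(base, version, lib_dir):
    """Hypothetical: cache dir = base / version / hash of the library's *.txt files."""
    lib_dir = Path(lib_dir)
    digest = hashlib.sha256()
    # only *.txt files count as library files; sort for a deterministic hash
    for path in sorted(lib_dir.rglob("*.txt")):
        digest.update(path.relative_to(lib_dir).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return Path(base) / version / digest.hexdigest()[:12]
```

Editing a wordlist or macro file yields a different directory, so stale cached results are simply never looked up again instead of having to be deleted explicitly.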

introduced FreqFrames

  • FreqFrames are DataFrames with frequency information returned by
    • Counts.cpos()
    • Counts.dump()
    • Counts.matches()
    • Counts.mwus()
    • Corpus.marginals()
    • Corpus.marginals_complex()
  • new consistent behaviour:
    • layout: item | freq | p_att[0] | p_att[1] | ...
    • indexed by a single string-valued index named item (the " "-joined p-attribute values)
    • frequency column named freq
    • additional columns with all separate p-attributes
  • cf. old behaviour: (p_att[0], p_att[1], ...) | some_column
    • MultiIndex
    • inconsistently named frequency column
  • heuristics for MWUs remain unchanged, i.e. they are " "-joined in the index
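The difference between the old MultiIndex tables and the FreqFrame layout described in these notes can be shown with pandas. The counts below are invented for illustration, and `to_freqframe` is a hypothetical conversion helper, not a cwb-ccc function.

```python
import pandas as pd

# Old-style table (invented counts): MultiIndex over two p-attributes,
# frequency column with an arbitrary name ("f").
old = pd.DataFrame(
    {"f": [3, 1]},
    index=pd.MultiIndex.from_tuples(
        [("went", "go"), ("goes", "go")], names=["word", "lemma"]
    ),
)


def to_freqframe(df, freq_col):
    """Hypothetical helper: convert a MultiIndex table into the FreqFrame layout."""
    p_atts = list(df.index.names)
    flat = df.reset_index().rename(columns={freq_col: "freq"})
    # single string index: the " "-joined p-attribute values of each row
    flat["item"] = flat[p_atts].apply(" ".join, axis=1)
    return flat.set_index("item")[["freq"] + p_atts]


ff = to_freqframe(old, freq_col="f")
print(ff)
```

Keeping the p-attributes as ordinary columns next to the joined `item` index means a lookup needs only one string key, while the separate values stay available for filtering.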

re-factored Discourseme Constellations

  • now with tests!
  • two types of constellations (inner vs textual constellations)
  • create_constellation() wrapper

improved Collocates

  • consistent AM scoring (ScoreFrame) for keywords and collocates
  • collocate retrieval is considerably faster (counts are computed once for the maximum window size)
  • upgrading the association-measures module provides more (and more stable) AMs
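Counting once for the maximum window works if each co-occurrence is stored together with its offset from the node. Counts for any smaller window can then be derived by filtering on the offset, without revisiting the corpus. The following is a minimal sketch with hypothetical function names. It deliberately ignores the de-duplication of overlapping contexts that a real collocation implementation has to handle.

```python
from collections import Counter


def count_max_window(tokens, node_positions, max_window):
    """Count (offset, token) pairs once, for the largest window of interest."""
    counts = Counter()
    for pos in node_positions:
        for offset in range(-max_window, max_window + 1):
            cpos = pos + offset
            # skip the node itself and positions outside the corpus
            if offset != 0 and 0 <= cpos < len(tokens):
                counts[(offset, tokens[cpos])] += 1
    return counts


def collocates_for_window(counts, window):
    """Derive token counts for a window <= max_window by filtering on |offset|."""
    result = Counter()
    for (offset, token), n in counts.items():
        if abs(offset) <= window:
            result[token] += n
    return result


tokens = "a b c a d".split()
counts = count_max_window(tokens, [0, 3], max_window=2)
```

The expensive corpus pass happens once, and each additional window size costs only a pass over the already aggregated counts.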

further enhancements

  • p-att selection in dump.breakdown()
  • corpus.marginals() can now be called without items (yielding marginal freq of all items)
  • cqpy_dump(), cqpy_load(), cqpy_dumps(), cqpy_loads()
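A breakdown counts how often each match string occurs, and selecting a p-attribute changes which annotation layer the strings are built from. The following is a minimal sketch of that idea on a toy corpus. `toy_breakdown` is a hypothetical helper, not the signature of `dump.breakdown()`, and the annotation is invented for illustration.

```python
from collections import Counter

# Invented annotation: one dict of p-attributes per corpus position.
corpus = [
    {"word": "went", "lemma": "go"},
    {"word": "home", "lemma": "home"},
    {"word": "goes", "lemma": "go"},
    {"word": "home", "lemma": "home"},
]


def toy_breakdown(corpus, matches, p_att="word"):
    """Frequency of match strings on the chosen p-attribute, most frequent first."""
    counts = Counter(
        " ".join(corpus[cpos][p_att] for cpos in range(match, matchend + 1))
        for match, matchend in matches  # (match, matchend) are inclusive cpos
    )
    return counts.most_common()


matches = [(0, 1), (2, 3)]
```

On the word layer the two matches stay distinct ("went home", "goes home"). On the lemma layer they collapse into a single type, "go home", with frequency 2.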

miscellaneous

  • changed anchor correction behaviour: use context/contextend instead of NA when out of bounds
  • replaced some for loops with list comprehensions
  • added more __str__ and __repr__ methods
  • sphinx documentation (WIP, address #7)
  • linting (WIP)
  • Docker
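The new anchor correction behaviour amounts to clamping a shifted anchor to the context boundaries instead of discarding it. The following is a minimal sketch, with `correct_anchor` as a hypothetical helper. Passing missing anchors through unchanged is an assumption of this sketch, since the notes only describe the out-of-bounds case.

```python
def correct_anchor(anchor, offset, context, contextend):
    """Hypothetical helper: shift an anchor, clamping it to [context, contextend]."""
    if anchor is None:
        return None  # assumption: a missing anchor stays missing
    corrected = anchor + offset
    # out of bounds: fall back to the nearest context boundary instead of NA
    return min(max(corrected, context), contextend)
```

For a context spanning positions 8 to 20, an anchor at 10 shifted by -5 yields 8 (context) rather than NA, and shifted by +15 it yields 20 (contextend).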

v0.9.15

14 Apr 08:26
5ade791
Pre-release
  • functional discourseme layer (for MMDA toolkit)
  • reasonable dependencies

v0.9.14

06 Apr 11:23
b43c415
Pre-release
  • refactor concordances
  • fix cached query results (NQRs)
  • fix s-attribute selection
  • collocates and keywords on p-att combinations
  • update demos & README

v0.9.13

17 Feb 11:18
3de0ea6
Pre-release
  • some bugfixes
  • improve usability