Releases: ausgerechnet/cwb-ccc
v0.11.2
v0.11.1
backward compatibility
- deterministically link against `-lcl -lm -lpcre -lglib-2.0` when CWB version < 3.4.37 during installation
v0.11.0
usability and functionality:
- implement `breakdown()` for discourseme constellations
- make `Corpus.query()` the central entrypoint (also for s-attribute queries)
- remove inplace subcorpus creation; return new `Corpus` instead
stability and compatibility:
- make pandas operations future-proof
- catch decode errors and missing s-attributes
misc:
- simplify README and introduce Vignette
- update GitHub workflows (include Python 3.10; simplify publishing)
- work towards PEP compliance
v0.10.3
v0.10.2
- upgrade to association-measures v0.2.0
- simplify freq list scoring
- handle invalid queries
- initialize macros defining wordlists
- show available wordlists
- handle display settings in cqpy queries
- breakdown with flags
- improve discourseme constellation handling, esp. concordancing
v0.10.1
- bugfix in anchor corrections for missing anchors
- ensure anchors are integers (especially useful for CQPY files)
- sort import statements to comply with PEP
- cosmetic improvements to README
v0.10.0
dropped cwb-python dependency
- should close #35
- `cqp.py` was already included in cwb-ccc
- `cl.pyx` now included too
- this means that C-code has to be compiled during installation
improved tests
- closes #31
- included new UCS reference counts on dedicated test corpus
- included EmpiriST counts for testing keyword functionality
- re-wrote most of the tests on dedicated test corpus
- re-wrote most of the tests so they really assert instead of print
included GitHub Actions
- build & test
- dist & publish on PyPI (WIP)
- helps closing #35
improved data_path / cache
- address #30
- data_path for each `__version__`
- data_path for each library: this invalidates the cache when wordlists and macros are updated
- library files must now end on ".txt"
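One way a per-library data path can invalidate the cache is to derive the directory name from the contents of the library files. The following is a minimal sketch of that idea only; the function name `library_data_path`, the use of SHA-256, and the 16-character directory name are assumptions for illustration and are not taken from cwb-ccc.

```python
import hashlib
from pathlib import Path


def library_data_path(data_root, lib_dir):
    """Return a cache directory whose name depends on the library's contents.

    Hypothetical helper: hashes every ".txt" file below `lib_dir` (wordlists
    and macros), so editing any of them yields a different directory and
    results cached under the old one are no longer picked up.
    """
    digest = hashlib.sha256()
    for path in sorted(Path(lib_dir).rglob("*.txt")):
        digest.update(path.name.encode("utf-8"))  # file name is part of the key
        digest.update(path.read_bytes())          # and so is the file content
    return Path(data_root) / digest.hexdigest()[:16]
```

Because only files ending in ".txt" are hashed, files with other extensions would not affect the path in this sketch.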
introduced FreqFrames
- FreqFrames are DataFrames with frequency information returned by
  - `Counts.cpos()`
  - `Counts.dump()`
  - `Counts.matches()`
  - `Counts.mwus()`
  - `Corpus.marginals()`
  - `Corpus.marginals_complex()`
- new consistent behaviour:
  - format = `[(" ".join(p_att))] freq, p_att[0], p_att[1], ...`
  - indexed by a single string column named `item`
  - frequency column named `freq`
  - additional columns with all separate p-attributes
- cf. old behaviour:
  - format = `(p_att[0], p_att[1], ...) some_column`
  - MultiIndex
  - inconsistently named frequency column
- heuristics for MWUs remain unchanged, i.e. they are " "-joined in index
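The layout described above can be illustrated with plain pandas. This is a sketch of the format only, not the library's implementation: the helper `freq_frame` and the toy token list are made up for the example, and real FreqFrames come from the `Counts` and `Corpus` methods listed above.

```python
from collections import Counter

import pandas as pd


def freq_frame(tokens, p_atts):
    """Count p-attribute tuples and lay them out in the FreqFrame format.

    `tokens` is a list of tuples with one value per p-attribute in `p_atts`.
    Hypothetical helper, for illustration only.
    """
    counts = Counter(tokens)
    rows = [
        {"item": " ".join(values), "freq": freq, **dict(zip(p_atts, values))}
        for values, freq in counts.items()
    ]
    df = pd.DataFrame(rows, columns=["item", "freq", *p_atts])
    # single index named "item", frequency column "freq", one column per p-attribute
    return df.set_index("item").sort_values("freq", ascending=False)


tokens = [("went", "go"), ("goes", "go"), ("went", "go")]
ff = freq_frame(tokens, ["word", "lemma"])
print(ff)
```

Compared with a MultiIndex over `(word, lemma)`, the single `item` index keeps lookups such as `ff.loc["went go"]` simple while the separate p-attribute columns stay available for filtering.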
re-factored Discourseme Constellations
- now with tests!
- two types of constellations (inner vs textual constellations)
- `create_constellation()` wrapper
improved Collocates
- consistent AM scoring (ScoreFrame) for keywords and collocates
- collocation retrieval considerably faster (count once for max window size)
- upgrade of association-measures module gives more (and more stable) AMs
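The speed-up from counting once for the maximum window size can be sketched in pure Python: record, for every context position, its distance to the nearest node, then derive counts for any smaller window by filtering on that distance. The function names and the toy corpus below are assumptions for illustration; this is not how cwb-ccc implements it internally.

```python
from collections import Counter


def count_by_distance(tokens, node_positions, max_window):
    """One pass over the contexts: count (item, distance-to-nearest-node) pairs."""
    nodes = set(node_positions)
    nearest = {}  # corpus position -> distance to the closest node
    for pos in node_positions:
        start = max(0, pos - max_window)
        end = min(len(tokens), pos + max_window + 1)
        for cpos in range(start, end):
            if cpos in nodes:
                continue  # node positions are not part of the context
            distance = abs(cpos - pos)
            if distance < nearest.get(cpos, max_window + 1):
                nearest[cpos] = distance
    return Counter((tokens[cpos], dist) for cpos, dist in nearest.items())


def window_counts(counts, window):
    """Derive co-occurrence counts for a window <= max_window without recounting."""
    result = Counter()
    for (item, distance), freq in counts.items():
        if distance <= window:
            result[item] += freq
    return result


tokens = "the cat sat on the mat near the cat".split()
counts = count_by_distance(tokens, node_positions=[1, 8], max_window=3)
print(window_counts(counts, 1))
print(window_counts(counts, 3))
```

Each corpus position is counted at most once, so overlapping contexts of neighbouring nodes do not inflate the frequencies in this sketch.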
further enhancements
- p-att selection in `dump.breakdown()`
- `corpus.marginals()` can now be called without items (yielding marginal freq of all items)
- `cqpy_dump()`, `cqpy_load()`, `cqpy_dumps()`, `cqpy_loads()`
miscellaneous
- changed anchor correction behaviour: use context/contextend instead of NA when out of bounds
- replaced some for loops with list comprehensions
- included some more `__str__` and `__repr__`
- sphinx documentation (WIP, address #7)
- Lint (WIP)
- Docker
v0.9.15
- functional discourseme layer (for MMDA toolkit)
- reasonable dependencies
v0.9.14
- re-factor concordances
- fix cached query results (NQRs)
- fix s-attribute selection
- collocates and keywords on p-att combinations
- update demos & README
v0.9.13
- some bugfixes
- improve usability