CLDF dataset derived from Chén's "Miao and Yao Language" from 2012

How to cite

If you use these data please cite

the original source

Chén, Qíguāng 陳其光 (2012): Miáoyáo yǔwén 苗瑤语文 [Miao and Yao language]. Zhōngyāng Mínzú Dàxué 中央民族大学 [China Minzu University Press].
the derived dataset using the DOI of the particular released version you were using

Description

This dataset is licensed under a CC-BY-4.0 license

Available online at https://en.wiktionary.org/wiki/Appendix:Hmong-Mien_comparative_vocabulary_list

Conceptlists in Concepticon:

Chen-2012-888

Notes

This dataset comprises 25 Hmong-Mien varieties, which were originally digitized from the source by Doug Cooper and later shared publicly on Wiktionary. We list the data in segmented form, adding also morpheme boundaries.

We have added a couple of custom commands that allow you to follow a specific workflow for computer-assisted language comparison. In order to do so, install the package and its dependencies, and then test the following commands:

$ cldfbench chenhmongmien.check_structure 
$ cldfbench chenhmongmien.wf_select
$ cldfbench chenhmongmien.wf_partial
$ cldfbench chenhmongmien.wf_alignment
$ cldfbench chenhmongmien.wf_crosssemantic
$ cldfbench chenhmongmien.wf_correspondence

For more details, compare our detailed tutorial at lingpy/workflow-paper. This tutorial has been accepted for publication with the Journal of Open Humanities Data. When using the processed data or the code to process data in your research, please cite this study as:

Wu, M.-S.; Schweikhard, N. E.; Bodt, T. A.; Hill, N. W. & List, J.-M. (forthcoming): "Computer-Assisted Language Comparison. State of the Art. Journal of Open Humanities Data.

The corresponding BibTeX format is:

@Article{Wu2020,
  author     = {Wu, Mei-Shin and Schweikhard, Nathanael E. and Bodt, Timotheus A. and Hill, Nathan W. and List, Johann-Mattis},
  title      = {Computer-Assisted Language Comparison. State of the Art},
  journal    = {Journal of Open Humanities Data},
  year       = {forthcoming},
  howpublished = {Accepted for publication in 2020}
}

Statistics

Varieties: 25 (linked to 22 different Glottocodes)
Concepts: 883 (linked to 799 different Concepticon concept sets)
Lexemes: 22,011
Sources: 1
Synonymy: 1.03
Invalid lexemes: 0
Tokens: 116,296
Segments: 259 (0 BIPA errors, 0 CLTS sound class errors, 254 CLTS modified)
Inventory size (avg): 72.04

Contributors

Name	GitHub user	Description	Role
Chen, Qiguang			Author
Johann-Mattis List	@LinguList	dataset patron	Editor
Mei-Shin Wu	@macyl	orthography profile, concept mapping	Other
Doug Cooper	@restinplace	digitized the data	DataCurator, Distributor

CLDF Datasets

The following CLDF datasets are available in cldf:

CLDF Wordlist at cldf/cldf-metadata.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

CLDF dataset derived from Chén's "Miao and Yao Language" from 2012

How to cite

Description

Notes

Statistics

Contributors

CLDF Datasets

Files

README.md

Latest commit

History

README.md

File metadata and controls

CLDF dataset derived from Chén's "Miao and Yao Language" from 2012

How to cite

Description

Notes

Statistics

Contributors

CLDF Datasets