
An open source Go parser for the CC-CEDICT Chinese Dictionary


pzelchenko/cedict


CEDict Parser in Go

Package cedict provides a parser / tokenizer for reading entries from the CC-CEDict Chinese dictionary project.
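The CC-CEDICT file is plain text with one entry per line, in the form Traditional Simplified [pin1 yin1] /definition/definition/. To illustrate roughly what a parser for this format has to do, here is a minimal standard-library-only sketch that splits one line into its fields. The Entry type and parseLine function are hypothetical names used only for this sketch. They are not the package's API, and the sketch assumes well-formed input rather than handling every edge case in the real dictionary.

```go
package main

import (
	"errors"
	"strings"
)

// Entry is a hypothetical struct for this sketch only. It borrows the
// Simplified and Definitions field names from the usage example, but it
// is not the cedict package's own type.
type Entry struct {
	Traditional string
	Simplified  string
	Pinyin      string
	Definitions []string
}

// parseLine splits one CC-CEDICT line of the form
//
//	Traditional Simplified [pin1 yin1] /definition 1/definition 2/
//
// into its fields and returns an error if a part is missing.
func parseLine(line string) (Entry, error) {
	var e Entry

	// The traditional and simplified headwords come first, separated by spaces.
	parts := strings.SplitN(line, " ", 3)
	if len(parts) < 3 {
		return e, errors.New("missing headwords")
	}
	e.Traditional, e.Simplified = parts[0], parts[1]
	rest := parts[2]

	// The pinyin follows directly, enclosed in square brackets.
	open := strings.Index(rest, "[")
	end := strings.Index(rest, "]")
	if open != 0 || end < 0 {
		return e, errors.New("missing pinyin")
	}
	e.Pinyin = rest[1:end]

	// The definitions follow the pinyin and are delimited by slashes.
	defs := strings.TrimSpace(rest[end+1:])
	if len(defs) < 2 || defs[0] != '/' || defs[len(defs)-1] != '/' {
		return e, errors.New("missing definitions")
	}
	e.Definitions = strings.Split(defs[1:len(defs)-1], "/")
	return e, nil
}
```

For example, parsing 中國 中国 [Zhong1 guo2] /China/Middle Kingdom/ yields the simplified headword 中国, the pinyin Zhong1 guo2, and the two definitions China and Middle Kingdom.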

Installation

Assuming you have Go installed, installation is as easy as running:

    go get github.com/hermanschaaf/cedict

You will need a copy of the CEDict dictionary text file, which you can download from MDBG.net. Extract the file to a location of your choice, then follow the usage instructions below.

Usage

Tokenizing is done by creating a CEDict for an io.Reader r. It is the caller's responsibility to ensure that r provides a CEDict-formatted dictionary.

    import "github.com/hermanschaaf/cedict"

    ...

    c := cedict.New(r) // r is an io.Reader to the cedict file

Given a CEDict called c, the dictionary is tokenized by repeatedly calling c.NextEntry(), which advances to the next entry or returns an error if no more entries are found:

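    for {
        err := c.NextEntry()
        if err != nil {
            // stop at the first error, which includes running out of entries
            break
        }
        entry := c.Entry()
        // guard against an entry without definitions before indexing
        if len(entry.Definitions) > 0 {
            fmt.Println(entry.Simplified, entry.Definitions[0])
        }
    }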

To retrieve the current entry, call the Entry method. A lower-level API based on the bufio.Scanner Scan method is also available. It is the recommended way to read comments from the CEDict file, should you need them.

Documentation

Full documentation can be found at https://godoc.org/github.com/hermanschaaf/cedict
