104 languages, 10 tasks, dual backends, HanLPv2.1
hankcs committed Jan 1, 2021
1 parent 27f5ded commit bce4a4e
Showing 570 changed files with 45,601 additions and 5,646 deletions.
9 changes: 7 additions & 2 deletions .gitignore
@@ -277,12 +277,17 @@ crashlytics.properties
crashlytics-build.properties
fabric.properties

# Editor-based Rest Client
# Editor-based Rest HanLPClient
.idea/httpRequests

# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser
.idea
*.iml
data
.vscode/settings.json
.vscode
*.pkl
*.pdf
_static/
_build/
_templates/
21 changes: 21 additions & 0 deletions .travis.yml
@@ -0,0 +1,21 @@
language: python
cache: pip
python:
- '3.6'
install:
- pip install .
deploy:
  provider: pypi
  username: __token__
  password:
    secure: KU0S/z54UMdS3rJT0fNndVnvhKB48YBzpwBZQZAOUJafFyqw1Nm366cpn9OdyWPQ54LolQNEKyQZc7xDpV89j1ukKQ1aGgZ5rXD8zrAqivcWEzEWzRpO8uPGbGT0TSJDfd3zX8vHO5UznmW2nNuRJfHFkEmB/27TlZAs2ph/SrEGvuBQOFgQZMShzFWGRKL+kEXX946qlw1EdLe2XvpK7jkWQpG9c8S5mNhbqBMAofVAXyNoHqX3FrPdEvN9MY9iRx3FxusHBqHeRLwrPHK2aQLVUE5D0WE1NzKwNZ4UxbY4PfiESYDueqGR8O/awpuLwg+6itk6FbtExAIAZyDLvGS4o88AGks6VJlJKwdT0LZ6cR1+WOGXyewSjHiJmjdBnFCtvyjn/O6sDEIDmku4FINuNIcmXy2bYwns9D3lNzb2EYpSTu5A9Q4EAAWZ4t0DsWBSRJmuauv6VNTHOENPRXR3fA9honp6GWiEh+4b/yfIaT9p0VnkR7D3KoN27eNmouU4s68hAfnFVPnB/OWU/DNoWs2PbLo4ztficmGOcOyDbS4BjrLjxuyU3aAHYIeXAff6A3I/a1tz+QknYCOJz/ZnQ3e4FC+2lm/cCGzPTfi+IVQ7QJryAY8hbblDX48PHCzVLa0PPer+v2NZVrnfddMZoLd1ox65hM2gHuy6NkQ=
  on:
    python: 3.6
    branch: master
env:
  global:
  - HANLP_VERBOSE=0
script:
- python -m unittest discover ./tests
notifications:
  email: false
349 changes: 67 additions & 282 deletions README.md

Large diffs are not rendered by default.

20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
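Every target other than `help` (for example `make html` or `make clean`) falls through to the catch-all `%: Makefile` rule. That rule passes the target name to Sphinx's make mode (`-M`). The sketch below reproduces the expansion in Python, assuming the Makefile's default variable values. `sphinx_make_mode_argv` is a hypothetical helper that only shows which arguments `sphinx-build` receives.

```python
import shlex
from typing import List, Optional


def sphinx_make_mode_argv(
    target: str,
    sphinxbuild: str = "sphinx-build",
    sourcedir: str = ".",
    builddir: str = "_build",
    sphinxopts: str = "",
    o: Optional[str] = None,
) -> List[str]:
    """Return the argv that `make <target>` runs via the `%: Makefile` rule.

    Hypothetical helper mirroring:
        $(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
    """
    # SPHINXBUILD, SPHINXOPTS and O are unquoted in the recipe, so the shell
    # word-splits them; SOURCEDIR and BUILDDIR are quoted and stay single words.
    argv = shlex.split(sphinxbuild)
    argv += ["-M", target, sourcedir, builddir]
    argv += shlex.split(sphinxopts)
    argv += shlex.split(o or "")
    return argv
```

If `sphinx-build` is on the `PATH`, `subprocess.run(sphinx_make_mode_argv("html"), cwd="docs", check=True)` should behave like running `make html` inside `docs/`.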
54 changes: 54 additions & 0 deletions docs/annotations/con/ctb.md
@@ -0,0 +1,54 @@
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# Chinese Treebank

See also [The Bracketing Guidelines for the Penn Chinese Treebank (3.0)](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1040&context=ircs_reports).

| Tag  | Definition | Examples |
|------|------------|----------|
| ADJP | adjective phrase, headed by an adjective | 不完全、大型 |
| ADVP | adverbial phrase headed by AD (adverb) | 非常、很 |
| CLP  | classifier phrase, built from a classifier (measure word) | 系列、大批 |
| CP   | clause headed by C (complementizer), such as 的 or 吗 | 张三喜欢李四吗? |
| DNP  | phrase formed by "XP + DEG (的)", where XP can be ADJP, DP, QP, PP, etc.; modifies a noun phrase | 大型的、前几年的、五年的、在上海的 |
| DP   | determiner phrase, usually a determiner plus a numeral-classifier | 这三个、任何 |
| DVP  | phrase formed by "XP + DEV (地)"; modifies a verb phrase (VP) | 心情失落地、大批地 |
| FRAG | fragment | (完) |
| INTJ | interjection, exclamation | 哈哈、切 |
| IP   | simple clause headed by I (INFL); a simple clause or sentence, usually without a complementizer such as 的 or 吗 | 张三喜欢李四。 |
| LCP  | phrase formed by "XP + LC" (localizer), typically expressing a location | 生活中、田野上 |
| LST  | list marker, including punctuation | 一. |
| MSP  | some particles (other particles) | 所、而、来、去 |
| NN   | common noun | HanLP、技术 |
| NP   | noun phrase, usually headed by a noun | 美好生活、经济水平 |
| PP   | preposition phrase, usually headed by a preposition | 在北京、据报道 |
| PRN  | parenthetical | ,(张三说), |
| QP   | quantifier phrase | 三个、五百辆 |
| ROOT | root node | (root node) |
| UCP  | unidentical coordination phrase: the conjuncts on either side of the conjunction are of different phrase types | (养老、医疗)保险 |
| VCD  | coordinated verb compound | 出版发行 |
| VCP  | verb compounds formed by VV + VC | 看作是 |
| VNV  | verb compounds formed by A-not-A or A-one-A (V不V) | 能不能、信不信 |
| VP   | verb phrase, usually headed by a verb | 完成任务、努力工作 |
| VPT  | potential form V-de-R or V-bu-R (V得R, V不R) | 打不赢、打得过 |
| VRD  | verb resultative compound (verb-complement structure) | 研制成功、降下来 |
| VSB  | verb compounds formed by a modifier + a head | 拿来支付、仰头望去 |
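Constituency trees that use these tags are usually written in bracketed form. The sketch below pulls the phrase-level labels out of such a tree and looks them up in the table above. It assumes a Penn-style bracketed string in which every node carries a label. `CTB_PHRASE_TAGS` is a hypothetical subset of the table, and `phrase_labels` is a hypothetical helper, not HanLP's API. The leaf tags in the example (NR, VV, PU) come from CTB's separate part-of-speech tag set, so they are skipped.

```python
import re
from typing import List, Tuple, Union

# Hypothetical subset of the CTB table above, keyed by phrase tag.
CTB_PHRASE_TAGS = {
    "ROOT": "root node",
    "IP": "simple clause headed by I (INFL)",
    "NP": "noun phrase, usually headed by a noun",
    "VP": "verb phrase, usually headed by a verb",
}

Tree = Tuple[str, List[Union["Tree", str]]]
_TOKEN = re.compile(r"\(|\)|[^\s()]+")


def _parse(tokens: List[str], i: int) -> Tuple[Tree, int]:
    """Parse the subtree opening at tokens[i] == '('; return it and the next index."""
    label = tokens[i + 1]  # assumes every node carries a label
    i += 2
    children: List[Union[Tree, str]] = []
    while tokens[i] != ")":
        if tokens[i] == "(":
            child, i = _parse(tokens, i)
            children.append(child)
        else:
            children.append(tokens[i])  # a word
            i += 1
    return (label, children), i + 1


def phrase_labels(bracketed: str) -> List[str]:
    """Labels of phrasal nodes in pre-order; preterminals (POS tags) are skipped."""
    tokens = _TOKEN.findall(bracketed)
    tree, end = _parse(tokens, 0)
    if end != len(tokens):
        raise ValueError("unexpected tokens after the tree")
    labels: List[str] = []

    def visit(node: Tree) -> None:
        label, children = node
        subtrees = [c for c in children if isinstance(c, tuple)]
        if subtrees:  # has a subtree child, so it is a phrase, not a POS tag
            labels.append(label)
        for child in subtrees:
            visit(child)

    visit(tree)
    return labels


tree = "(ROOT (IP (NP (NR 张三)) (VP (VV 喜欢) (NP (NR 李四))) (PU 。)))"
for tag in phrase_labels(tree):
    print(tag, "->", CTB_PHRASE_TAGS.get(tag, "(not in this subset)"))
```

The example is the IP sentence from the table. It yields the phrase tags ROOT, IP, NP, VP, NP, each printed with its definition.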
7 changes: 7 additions & 0 deletions docs/annotations/con/index.md
@@ -0,0 +1,7 @@
# Constituency Parsing

```{toctree}
ctb
ptb
```

53 changes: 53 additions & 0 deletions docs/annotations/con/ptb.md
@@ -0,0 +1,53 @@
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# Penn Treebank

| Tag    | Description |
|--------|-------------|
| ADJP   | Adjective Phrase. |
| ADVP   | Adverb Phrase. |
| CONJP  | Conjunction Phrase. |
| FRAG   | Fragment. |
| INTJ   | Interjection. Corresponds approximately to the part-of-speech tag UH. |
| LST    | List marker. Includes surrounding punctuation. |
| NAC    | Not a Constituent; used to show the scope of certain prenominal modifiers within an NP. |
| NP     | Noun Phrase. |
| NX     | Used within certain complex NPs to mark the head of the NP. Corresponds very roughly to N-bar level but used quite differently. |
| PP     | Prepositional Phrase. |
| PRN    | Parenthetical. |
| PRT    | Particle. Category for words that should be tagged RP. |
| QP     | Quantifier Phrase (i.e. complex measure/amount phrase); used within NP. |
| ROOT   | Root node of the tree. |
| RRC    | Reduced Relative Clause. |
| S      | Simple declarative clause, i.e. one that is not introduced by a (possibly empty) subordinating conjunction or a wh-word and that does not exhibit subject-verb inversion. |
| SBAR   | Clause introduced by a (possibly empty) subordinating conjunction. |
| SBARQ  | Direct question introduced by a wh-word or a wh-phrase. Indirect questions and relative clauses should be bracketed as SBAR, not SBARQ. |
| SINV   | Inverted declarative sentence, i.e. one in which the subject follows the tensed verb or modal. |
| SQ     | Inverted yes/no question, or main clause of a wh-question, following the wh-phrase in SBARQ. |
| UCP    | Unlike Coordinated Phrase. |
| VP     | Verb Phrase. |
| WHADJP | Wh-adjective Phrase. Adjectival phrase containing a wh-adverb, as in *how hot*. |
| WHADVP | Wh-adverb Phrase. Introduces a clause with an NP gap. May be null (containing the 0 complementizer) or lexical, containing a wh-adverb such as *how* or *why*. |
| WHNP   | Wh-noun Phrase. Introduces a clause with an NP gap. May be null (containing the 0 complementizer) or lexical, containing some wh-word, e.g. *who*, *which book*, *whose daughter*, *none of which*, or *how many leopards*. |
| WHPP   | Wh-prepositional Phrase. Prepositional phrase containing a wh-noun phrase (such as *of which* or *by whose authority*) that either introduces a PP gap or is contained by a WHNP. |
| X      | Unknown, uncertain, or unbracketable. X is often used for bracketing typos and in bracketing the…the-constructions. |
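A constituency parser assigns the phrase categories in this table to spans of words. The sketch below lists every phrasal node of a bracketed tree as a `(label, start, end)` span. It assumes 0-based word indices with an exclusive end, and it gives the unlabeled outer bracket used in raw PTB files an empty label. `labeled_spans` is a hypothetical helper, not HanLP's API.

```python
import re
from typing import List, Tuple

Span = Tuple[str, int, int]  # (label, start, end); word indices, end exclusive


def labeled_spans(bracketed: str) -> List[Span]:
    """Return (label, start, end) for every phrasal node of a bracketed tree.

    Spans come out in post-order (children before parents). Preterminals
    such as (DT The) are skipped, as are the words themselves.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    stack: List[list] = []  # open nodes: [label, start, has_subtree_child]
    spans: List[Span] = []
    n_words = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            if stack:
                stack[-1][2] = True  # the parent now has a subtree child
            # Raw PTB files wrap each tree in an unlabeled "( ... )"; use "".
            has_label = i + 1 < len(tokens) and tokens[i + 1] not in ("(", ")")
            stack.append([tokens[i + 1] if has_label else "", n_words, False])
            i += 2 if has_label else 1
            continue
        if tok == ")":
            label, start, phrasal = stack.pop()
            if phrasal:
                spans.append((label, start, n_words))
        else:
            n_words += 1  # any other token is a word
        i += 1
    return spans


print(labeled_spans("(ROOT (S (NP (DT The) (NN cat)) (VP (VBD sat))))"))
# → [('NP', 0, 2), ('VP', 2, 3), ('S', 0, 3), ('ROOT', 0, 3)]
```

Skipping preterminals is a deliberate choice. POS tags such as DT and NN belong to a separate tag set, and the table above lists only phrase-level labels.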

7 changes: 7 additions & 0 deletions docs/annotations/dep/index.md
@@ -0,0 +1,7 @@
# Dependency Parsing

```{toctree}
sd
ud
```
