Update README to new API
codertimo committed Oct 20, 2018
1 parent ae094c8 commit b8f27e3
Showing 1 changed file with 13 additions and 9 deletions.
22 changes: 13 additions & 9 deletions README.md
@@ -39,24 +39,28 @@ pip install bert-pytorch
 ## Quickstart

 **NOTICE : Your corpus should be prepared with two sentences in one line with tab(\t) separator**
+
+### 0. Prepare your corpus
 ```
-Welcome to the \t the jungle \n
-I can stay \t here all night \n
+Welcome to the \t the jungle\n
+I can stay \t here all night\n
 ```

-### 1. Building vocab based on your corpus
-```shell
-bert-vocab -c data/corpus.small -o data/corpus.small.vocab
+or tokenized corpus (tokenization is not in package)
 ```
+Wel_ _come _to _the \t _the _jungle\n
+_I _can _stay \t _here _all _night\n
+```
+

-### 2. Building BERT train dataset with your corpus
+### 1. Building vocab based on your corpus
 ```shell
-bert-dataset -d data/corpus.small -v data/corpus.small.vocab -o data/dataset.small
+bert-vocab -c data/corpus.small -o data/vocab.small
 ```

-### 3. Train your own BERT model
+### 2. Train your own BERT model
 ```shell
-bert -d data/dataset.small -v data/corpus.small.vocab -o output/bert.model
+bert -c data/dataset.small -v data/vocab.small -o output/bert.model
 ```

 ## Language Model Pre-training
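
The corpus format described in the NOTICE (two sentences per line, separated by a tab) can be produced with a short script. Below is a minimal Python sketch, assuming the separator is a single literal tab character and that the spaces around `\t` in the README sample are there only for readability. `write_corpus` and `read_corpus` are hypothetical helper names for illustration and are not part of bert-pytorch.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def write_corpus(pairs: Iterable[Tuple[str, str]], path: Path) -> int:
    """Write one sentence pair per line, separated by a single tab.

    Returns the number of lines written.
    """
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for first, second in pairs:
            # A tab or newline inside a sentence would corrupt the line format.
            if any(ch in s for s in (first, second) for ch in "\t\n"):
                raise ValueError(f"sentence contains tab or newline: {first!r}, {second!r}")
            f.write(f"{first.strip()}\t{second.strip()}\n")
            count += 1
    return count


def read_corpus(path: Path) -> List[Tuple[str, str]]:
    """Read the file back, checking that every line holds exactly two fields."""
    pairs = []
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 2:
                raise ValueError(f"line {line_no}: expected 2 tab-separated sentences, got {len(parts)}")
            pairs.append((parts[0], parts[1]))
    return pairs
```

Calling `write_corpus([("Welcome to the", "the jungle"), ("I can stay", "here all night")], Path("data/corpus.small"))` would write the two sample lines from the README, provided the `data/` directory already exists. Running `read_corpus` on a file before passing it to `bert-vocab` is one way to catch malformed lines early.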