Before redoing the tokenizer, we first need to list exactly which new tokens version 0.2 adds.

The new operators are:
- `%`
- `==`
- `!=`
- `>`
- `>=`
- `<`
- `<=`

The new special symbols are:
- `【`
- `】`

And the new keywords are:
- `術` (function)
- `若` (if)
- `或若` (else if)
- `不然` (else)
- `歸` (return)

Grouped not by function but by the intrinsic properties of the tokens themselves, they fall into two classes:

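None of the characters used in operators and special symbols may appear in identifiers (variable names and function names), so these two kinds of tokens can be treated as one class. Within this class, a 1-character token may be a prefix of a 2-character token (as `>` is a prefix of `>=`), and this case needs special handling. All other 1-character tokens are handled the same way as `+`, `-`, `*`, and `/`.
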
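A minimal Python sketch of the prefix rule. It assumes the 1-character symbols are `+`, `-`, `*`, `/` from version 0.1 plus the new `%`, `>`, `<`, `【`, and `】`, and that `=` and `!` never stand alone as tokens; the names `scan_symbol`, `ONE_CHAR`, `TWO_CHAR`, and `SPECIAL_CHARS` are hypothetical. The scanner looks one character ahead and tries the 2-character form first, so `>=` is never split into `>` followed by `=`.

```python
# Hypothetical sketch: operators and special symbols, scanned with one
# character of lookahead so that a 1-char token never steals a 2-char one.
# Assumption: the v0.1 arithmetic operators are + - * /, and "=" and "!"
# are not tokens on their own.
ONE_CHAR = {"+", "-", "*", "/", "%", ">", "<", "【", "】"}
TWO_CHAR = {"==", "!=", ">=", "<="}

# Any character used by these tokens is barred from identifiers.
SPECIAL_CHARS = ONE_CHAR | {c for tok in TWO_CHAR for c in tok}


def scan_symbol(text: str, i: int) -> tuple[str, int]:
    """Return (token, next_index) for the symbol that starts at text[i]."""
    pair = text[i:i + 2]
    if pair in TWO_CHAR:      # longest match first: ">=" beats ">"
        return pair, i + 2
    ch = text[i]
    if ch in ONE_CHAR:        # ">" alone, "%", "【", ...
        return ch, i + 1
    # e.g. a lone "=" or "!": a special character that starts no token
    raise ValueError(f"unexpected character {ch!r} at index {i}")
```

For example, on `a>=b` the scanner returns `>=` starting at index 1, while on `a>b` it returns `>`.
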
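Among the keywords, 1-character keywords are handled the same way as `元` in version 0.1. 2-character keywords are handled similarly, except that they need one additional state.
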
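One possible reading of that extra state, sketched in Python: after `不` the machine sits in an intermediate state and reaches the keyword state only on `然`; any other non-special character falls through to the identifier state. This sketch assumes that a keyword counts only when the whole run of non-special characters equals it, and that whitespace also ends a run. Both assumptions, the illustrative `SPECIAL_CHARS` set, and the names `scan_word`, `EDGES`, and `KEYWORD_STATES` are hypothetical, not taken from the project.

```python
# Hypothetical sketch of the keyword states for 元 (1 char) and 不然 (2 chars).
# Assumptions (not from the original notes):
#   * whitespace also ends a run of non-special characters;
#   * a keyword is recognized only when the whole run equals it,
#     otherwise the run is an identifier (識別子).
SPECIAL_CHARS = set("+-*/%=!<>【】") | {" ", "\t", "\n"}

# Explicit edges: state -> {char: next_state}. Any non-special character
# without an explicit edge goes to the identifier state (catch-all edge).
EDGES = {
    "START": {"元": "元", "不": "不"},
    "元": {},              # 1-char keyword, handled like 元 in v0.1
    "不": {"然": "不然"},   # the extra state a 2-char keyword needs
    "不然": {},
    "IDENT": {},
}
# Accepting states that yield a keyword; stopping in "不" yields an identifier.
KEYWORD_STATES = {"元", "不然"}


def scan_word(text: str, i: int) -> tuple[str, str, int]:
    """Scan a run of non-special characters starting at text[i].

    Returns (kind, lexeme, next_index), where kind is "KEYWORD" or "IDENT".
    """
    state, j = "START", i
    while j < len(text) and text[j] not in SPECIAL_CHARS:
        state = EDGES[state].get(text[j], "IDENT")
        j += 1
    if j == i:
        raise ValueError(f"no word starts at index {i}")
    kind = "KEYWORD" if state in KEYWORD_STATES else "IDENT"
    return kind, text[i:j], j
```

Under the same assumptions, the other keywords follow the same pattern: `若`, `術`, and `歸` each get a single accepting state like `元`, and `或若` gets an intermediate state for `或` like `不`.
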
Recall the tokenizer state machine from version 0.1:
![Version 0.1 tokenizer state machine](../image/零・一版分詞狀態機.png)

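Building on it, I drew the tokenizer state machine for version 0.2. To keep the diagram readable, I omit the 1-character symbols, let `元` stand for all 1-character keywords, use `不然` as the example of a 2-character keyword, and use `>=` and `>` as the example of the prefix problem.
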
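In addition, function names, like variable names, may be any combination of non-special characters, so from now on both are collectively called `識別子` (identifiers).
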
![Version 0.2 tokenizer state machine](../image/零・二版分詞狀態機.png)

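In the new diagram, `x` now means "every non-special character other than those on the state's other outgoing edges."

TODO: update the version 0.1 tokenizer state diagram.
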
I won't include tokenizer code this time; just extend the version 0.1 tokenizer by following the state machine.