Add lexer.rbs #33

Open
wants to merge 3 commits into
base: master
1 change: 1 addition & 0 deletions Steepfile
@@ -4,6 +4,7 @@ target :lib do
signature "sig"

check "lib/lrama/bitmap.rb"
check "lib/lrama/lexer.rb"
check "lib/lrama/report.rb"
check "lib/lrama/warning.rb"
end
27 changes: 14 additions & 13 deletions lib/lrama/lexer.rb
@@ -7,8 +7,10 @@ class Lexer
include Lrama::Report::Duration

# s_value is semantic value
Token = Struct.new(:type, :s_value, keyword_init: true) do
Type = Struct.new(:id, :name, keyword_init: true)
Token = _ = Struct.new(:type, :s_value, keyword_init: true) do
Collaborator

[note] Inserting _ = between the constant and Struct.new is a common workaround when Struct is used to define a new class.
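
A minimal sketch of this workaround, using a hypothetical Point struct that is not part of lrama. As far as I understand it, assigning through `_` makes Steep treat the right-hand side as untyped, so the constant takes its type from the RBS declaration instead of from the Struct.new call. At runtime the extra assignment only sets a throwaway local variable.

```ruby
# Hypothetical example; Point is not part of lrama.
# `_ =` has no runtime effect beyond assigning a throwaway local variable.
Point = _ = Struct.new(:x, :y, keyword_init: true) do
  def to_s
    "(#{x}, #{y})"
  end
end

point = Point.new(x: 1, y: 2)
puts point.to_s
```

The struct behaves exactly as it would without the `_ =`, so the workaround should be safe to apply mechanically.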

# @implements Token[SValue]
Collaborator

[note] This annotation is defined by Steep, not by RBS.


Type = _ = Struct.new(:id, :name, keyword_init: true)

attr_accessor :line, :column, :referred
# For User_code
@@ -18,8 +20,8 @@ def to_s
"#{super} line: #{line}, column: #{column}"
end

@i = 0
@types = []
instance_variable_set :@i, 0
Collaborator

[note] Steep cannot detect the types of these instance variables in the original code, even though the RBS file declares them. It might be a limitation of Struct with a block (?).

lib/lrama/lexer.rb:23:6: [error] Cannot find the declaration of instance variable: `@i`
│ Diagnostic ID: Ruby::UnknownInstanceVariable
│
└       @i = 0
        ~~

lib/lrama/lexer.rb:24:6: [error] Cannot find the declaration of instance variable: `@types`
│ Diagnostic ID: Ruby::UnknownInstanceVariable
│
└       @types = []
        ~~~~~~

Detected 2 problems from 1 file
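
A minimal sketch of why the two spellings are interchangeable at runtime, using a hypothetical Registry struct that is not part of lrama. Inside a Struct.new block, self is the newly created class, so `instance_variable_set :@i, 0` sets the same class-level instance variable that `@i = 0` would. The only difference is that Steep does not see an instance variable assignment it cannot resolve.

```ruby
# Hypothetical example; Registry is not part of lrama.
Registry = Struct.new(:name, keyword_init: true) do
  # self is the new Struct class here, so this sets a class-level
  # instance variable, just as `@i = 0` would.
  instance_variable_set :@i, 0

  # Returns the current counter value and then increments it.
  def self.next_id
    id = @i
    @i += 1
    id
  end
end

first_id = Registry.next_id
second_id = Registry.next_id
```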

instance_variable_set :@types, []

def self.define_type(name)
type = Type.new(id: @i, name: name.to_s)
@@ -62,8 +64,7 @@ def self.define_type(name)
GrammarRules = 3
Epilogue = 4

# Token types

# @dynamic prologue, bison_declarations, grammar_rules, epilogue, bison_declarations_tokens, grammar_rules_tokens
Collaborator

In my environment, bundle exec steep check passes without this annotation. Is there any benefit to these annotations?

attr_reader :prologue, :bison_declarations, :grammar_rules, :epilogue,
:bison_declarations_tokens, :grammar_rules_tokens

@@ -147,7 +148,7 @@ def lex_text
# * https://www.gnu.org/software/bison/manual/html_node/Symbol-Decls.html
# * https://www.gnu.org/software/bison/manual/html_node/Empty-Rules.html
def lex_common(lines, tokens)
line = lines.first[1]
line = lines.fetch(0)[1]
column = 0
ss = StringScanner.new(lines.map(&:first).join)

@@ -217,9 +218,9 @@ def lex_common(lines, tokens)
when ss.scan(/%empty/)
# skip
else
l = line - lines.first[1]
l = line - lines.fetch(0)[1]
split = ss.string.split("\n")
col = ss.pos - split[0...l].join("\n").length
col = ss.pos - split.take(l).join("\n").length
raise "Parse error (unknown token): #{split[l]} \"#{ss.string[ss.pos]}\" (#{line}: #{col})"
end
end
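
The PR does not say why `lines.first` became `lines.fetch(0)` and `split[0...l]` became `split.take(l)`. A plausible reason, stated here as an assumption, is that `first` and range indexing are typed as possibly nil, while `fetch(0)` and `take` are not. The sketch below, with hypothetical sample data, shows the runtime difference.

```ruby
# Hypothetical [text, line number] pairs in the shape of line_data.
lines = [["%token NUM\n", 10], ["%%\n", 11]]
first_line_number = lines.fetch(0)[1]

empty = []
first_of_empty = empty.first # nil, so callers must handle a missing value

# fetch(0) raises instead of returning nil when the array is empty.
fetch_error =
  begin
    empty.fetch(0)
    nil
  rescue IndexError => e
    e.class
  end

# For an in-range prefix, range indexing and take give the same result.
split = "a\nb\nc".split("\n")
same_prefix = split[0...2] == split.take(2)
```

On an empty `lines` array the new code raises IndexError, where the old code would have raised NoMethodError on nil.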
@@ -237,7 +238,7 @@ def lex_user_code(ss, line, column, lines)
str = "{"
# Array of [type, $n, tag, first column, last column]
# TODO: Is it better to keep string, like "$$", and use gsub?
references = []
references = [] #: Array[reference]

while !ss.eos? do
case
@@ -291,7 +292,7 @@ def lex_user_code(ss, line, column, lines)
end

# Reach to end of input but brace does not match
l = line - lines.first[1]
l = line - lines.fetch(0)[1]
raise "Parse error (brace mismatch): #{ss.string.split("\n")[l]} \"#{ss.string[ss.pos]}\" (#{line}: #{ss.pos})"
end

@@ -315,7 +316,7 @@ def lex_string(ss, terminator, line, lines)
end

# Reach to end of input but quote does not match
l = line - lines.first[1]
l = line - lines.fetch(0)[1]
raise "Parse error (quote mismatch): #{ss.string.split("\n")[l]} \"#{ss.string[ss.pos]}\" (#{line}: #{ss.pos})"
end

@@ -336,7 +337,7 @@ def lex_comment(ss, line, lines, str)
end

# Reach to end of input but quote does not match
l = line - lines.first[1]
l = line - lines.fetch(0)[1]
raise "Parse error (comment mismatch): #{ss.string.split("\n")[l]} \"#{ss.string[ss.pos]}\" (#{line}: #{ss.pos})"
end

84 changes: 84 additions & 0 deletions sig/lrama/lexer.rbs
@@ -0,0 +1,84 @@
module Lrama
class Lexer
type line_data = [String, Integer]
type reference = [::Symbol, Integer | String, Token[untyped]?, Integer, Integer]

class Type[SValue] < Struct[untyped]
Collaborator

Do we need the SValue type variable? Type#id is always an Integer and Type#name is always a String.

>> Lrama::Lexer::Token::P_expect
=> #<struct Lrama::Lexer::Type id=0, name="P_expect">
>> Lrama::Lexer::Token::Number
=> #<struct Lrama::Lexer::Type id=15, name="Number">

On the other hand, Token needs a type variable because the type of s_value depends on the value of Type.
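
A minimal sketch of the point above, using plain structs in place of the lrama classes. The ids and names come from the console output in this comment; the s_value contents are hypothetical. A Type always holds an Integer id and a String name, while the class of a Token's s_value varies with the Type it carries.

```ruby
# Plain stand-ins for Lrama::Lexer::Type and Lrama::Lexer::Token.
Type = Struct.new(:id, :name, keyword_init: true)
Token = Struct.new(:type, :s_value, keyword_init: true)

p_expect = Type.new(id: 0, name: "P_expect")
number = Type.new(id: 15, name: "Number")

# The s_value contents below are hypothetical illustrations.
expect_token = Token.new(type: p_expect, s_value: "%expect")
number_token = Token.new(type: number, s_value: 42)

[expect_token, number_token].each do |token|
  puts "#{token.type.name}: #{token.s_value.class}"
end
```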

attr_accessor id(): Integer
attr_accessor name(): String

def self.new: (id: Integer, name: String) -> Type[untyped]
end

class Token[SValue] < Struct[untyped]
P_expect: Type[String]
P_define: Type[String]
P_printer: Type[String]
P_lex_param: Type[String]
P_parse_param: Type[String]
P_initial_action: Type[String]
P_union: Type[String]
P_token: Type[String]
P_type: Type[String]
P_nonassoc: Type[String]
P_left: Type[String]
P_right: Type[String]
P_prec: Type[String]
User_code: Type[String]
Tag: Type[String]
Number: Type[Integer]
Ident_Colon: Type[String]
Ident: Type[String]
Semicolon: Type[String]
Bar: Type[String]
String: Type[String]
Char: Type[String]

attr_reader type(): Type[SValue]
attr_reader s_value(): SValue
attr_accessor line: Integer
attr_accessor column: Integer
attr_accessor referred: bool?
attr_accessor references: Array[reference]

self.@i: Integer
self.@types: Array[Type[untyped]]

def self.new: [SValue] (type: Type[SValue], s_value: SValue) -> Token[SValue]
def self.define_type: (::Symbol name) -> void
end

include Report::Duration

Initial: Integer
Prologue: Integer
BisonDeclarations: Integer
GrammarRules: Integer
Epilogue: Integer

attr_reader prologue: Array[line_data]
attr_reader bison_declarations: Array[line_data]
attr_reader grammar_rules: Array[line_data]
attr_reader epilogue: Array[line_data]
attr_reader bison_declarations_tokens: Array[Token[untyped]]
attr_reader grammar_rules_tokens: Array[Token[untyped]]

@text: String
@state: Integer
@debug: boolish

def initialize: (String text) -> void

private
def create_token: [SValue] (Type[SValue] type, SValue s_value, Integer line, Integer column) -> Token[SValue]
def lex_text: -> void
def lex_common: (Array[line_data] lines, Array[Token[untyped]] tokens) -> void
def lex_bison_declarations_tokens: -> void
def lex_user_code: (StringScanner, Integer line, Integer column, Array[line_data] lines) -> [Token[String], Integer]
def lex_string: (StringScanner, String terminator, Integer line, Array[line_data] lines) -> line_data
Collaborator

The type of the returned value is certainly [String, Integer]. But this string is only part of a line, so I'm wondering whether it is better to use the line_data type or [String, Integer] here. Do you have an opinion?

def lex_comment: (StringScanner, Integer line, Array[line_data] lines, String str) -> Integer
def lex_line_comment: (StringScanner, Integer line, String str) -> Integer
def lex_grammar_rules_tokens: -> void
def debug: (String msg) -> void
end
end
5 changes: 5 additions & 0 deletions sig/patch.rbs
@@ -0,0 +1,5 @@
class StringScanner
def []: (Integer) -> String | ...

def getch: -> String | ...
end