Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lexer.rbs #33

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Add lexer.rbs #33

wants to merge 3 commits into from

Conversation

akouryy
Copy link
Contributor

@akouryy akouryy commented May 27, 2023

Added lexer.rbs.

Discussions:

  • Type and Token have type parameters named SValue. I expect them to work well in parser.rb.
    • They are invariant. I could not find a good way to make them covariant while keeping create_token(Token::Number, str, ...) untypable.
  • I added patch.rbs, which patches the types of StringScanner#[] and StringScanner#getch.
    • Another approach would be to define the non-nil versions of the methods, e.g., StringScanner#fetch and StringScanner#getch!, as soutaro remarked.
    • Yet another approach would be to replace every ss[n] with (ss[n] || raise) and so on, but that would be hard to read.
  • On the other hand, I avoided patching Array#[] and Array#first because Array is used widely, and such patches may affect other files. Instead, I replaced array[0...l] with array.take(l) and array.first with array.fetch(0).

@akouryy akouryy changed the title Lexer rbs Add lexer.rbs May 27, 2023
@@ -18,8 +20,8 @@ def to_s
"#{super} line: #{line}, column: #{column}"
end

@i = 0
@types = []
instance_variable_set :@i, 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note] steep can not detect a type of instance variable for original codes, even so rbs file has declarations for them. It might the limitation for Struct with block (?).

lib/lrama/lexer.rb:23:6: [error] Cannot find the declaration of instance variable: `@i`
│ Diagnostic ID: Ruby::UnknownInstanceVariable
│
└       @i = 0
        ~~

lib/lrama/lexer.rb:24:6: [error] Cannot find the declaration of instance variable: `@types`
│ Diagnostic ID: Ruby::UnknownInstanceVariable
│
└       @types = []
        ~~~~~~

Detected 2 problems from 1 file

@@ -62,8 +64,7 @@ def self.define_type(name)
GrammarRules = 3
Epilogue = 4

# Token types

# @dynamic prologue, bison_declarations, grammar_rules, epilogue, bison_declarations_tokens, grammar_rules_tokens
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my environment bundle exec steep check passes without this annotation. Is there any benefit for these annotation?

Token = Struct.new(:type, :s_value, keyword_init: true) do
Type = Struct.new(:id, :name, keyword_init: true)
Token = _ = Struct.new(:type, :s_value, keyword_init: true) do
# @implements Token[SValue]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note] This is an annotation defined in steep not rbs.

@@ -7,8 +7,10 @@ class Lexer
include Lrama::Report::Duration

# s_value is semantic value
Token = Struct.new(:type, :s_value, keyword_init: true) do
Type = Struct.new(:id, :name, keyword_init: true)
Token = _ = Struct.new(:type, :s_value, keyword_init: true) do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note] Inserting _ between a constant and Struct.new is a common workaround when we use Struct to define new class.

type line_data = [String, Integer]
type reference = [::Symbol, Integer | String, Token[untyped]?, Integer, Integer]

class Type[SValue] < Struct[untyped]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need SValue type variable? Always Type#id is Interger and Type#name is String.

>> Lrama::Lexer::Token::P_expect
=> #<struct Lrama::Lexer::Type id=0, name="P_expect">
>> Lrama::Lexer::Token::Number
=> #<struct Lrama::Lexer::Type id=15, name="Number">

On the other hand. Token needs type variable because type of s_value depends on the value of Type.

@yui-knk
Copy link
Collaborator

yui-knk commented May 28, 2023

  • type variance: I'm not sure the point. Can I ask you to share the code which does not work well?
  • patch.rbs: I agree to the current approach. There are 3 options for it as you write. However rewriting the method call like (ss[n] || raise) make codes un-readable. I don't have strong preference between defining StringScanner#fetch and patching rbs definitions. But patching rbs is bit better as a first step because StringScanner is not used widely in this project.
  • Array: Both replacement seems reasonable for me.

def lex_common: (Array[line_data] lines, Array[Token[untyped]] tokens) -> void
def lex_bison_declarations_tokens: -> void
def lex_user_code: (StringScanner, Integer line, Integer column, Array[line_data] lines) -> [Token[String], Integer]
def lex_string: (StringScanner, String terminator, Integer line, Array[line_data] lines) -> line_data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely, the type of returned value is [String, Integer]. But this string is a part of line so I'm wondering which is better to use line_data type or [String, Integer]. Do you have any opinion?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants