Update parser as per latest changes in nom #132

subhojit777 · 2018-09-21T04:10:48Z

No description provided.

Hywan · 2018-09-21T07:04:06Z

Thanks for the PR!

Do you have an idea why tests are failing?

subhojit777 · 2018-09-21T09:54:09Z

I will take a look. I ran cargo build and found that InputTake and InputTakeAtPosition are not implemented. I will look into the tests.

subhojit777 · 2018-09-26T06:54:32Z

source/tokens.rs

+  {
+    match (0..self.slice.len()).find(|b| predicate(self.slice[*b])) {
+      Some(0) => Err(Err::Error(Context::Code(*self, e))),
+      Some(i) => Ok((Span {


@Hywan So I found one of the many unknown reasons why the tests are failing.

One problem is that the new Span does not carry the correct offset, line, and column values after the split. I am looking for examples where, after the slice is converted into UTF-8, the values are computed by a function or something similar. There are tabs and newlines in the input, so it should return the correct offset and line after the conversion. Do you know of any such example?

For instance:

    fn case_first_with_whitespace() {
        named!(hello<Span, Span>, tag!(b"hello"));
        named!(test1<Span, Span>, first!(tag!(b"hello")));
        named!(test2<Span, Span>, first!(hello));

        let input = Span::new(b" \nhello\t\r");
        let output = Ok((Span::new_at(b"\t\r", 8, 2, 6), Span::new_at(b"hello", 3, 2, 1)));

        assert_eq!(test1(input), output);
        assert_eq!(test2(input), output);
    }

It fails with this output:

Diff < left / right > :
 Ok(
     (
         Span {
<            offset: 0,
<            line: 1,
<            column: 1,
>            offset: 8,
>            line: 2,
>            column: 6,
             slice: [ 9, 13 ]
         },
         Span {
<            offset: 0,
<            line: 1,
>            offset: 3,
>            line: 2,
             column: 1,
             slice: [ 104, 101, 108, 108, 111 ]
         }
     )
 )

Hywan · 2018-09-26

When computing the new Span, you should update the offset to include i (from find). The same applies to line and column.

You might want to check

parser/source/tokens.rs

Lines 1180 to 1222 in 077aa7a

    fn slice(&self, range: $range) -> Self {
      let next_slice = &self.slice[range];

      if next_slice == self.slice {
        return *self;
      }

      let next_offset = self.slice.offset(next_slice);

      if next_offset == 0 {
        return Span {
          offset: self.offset,
          line  : self.line,
          column: self.column,
          slice : next_slice
        };
      }

      let consumed           = &self.slice[..next_offset];
      let number_of_newlines = bytecount::count(consumed, b'\n') as u32;

      let next_column =
        if number_of_newlines == 0 {
          self.column + next_offset as u32
        } else {
          match memchr::memrchr(b'\n', consumed) {
            Some(last_newline_position) => {
              (next_offset - last_newline_position) as u32
            },

            None => {
              unreachable!();
            }
          }
        };

      Span {
        offset: self.offset + next_offset,
        line  : self.line + number_of_newlines,
        column: next_column,
        slice : next_slice
      }
    }

to see an example of that logic. One way to do that would be (I imagine):

Ok((self.slice(i..), self.slice(..i)))

Because slice will return a Span with the appropriate values set.
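
To make the suggestion above concrete, here is a minimal, self-contained sketch of a split that keeps offset, line, and column in sync. It is not the parser's actual code: this `Span` is a simplified stand-in, `sub` is a hypothetical helper playing the role of `slice`, and the `(rest, taken)` return order of `take_split` is assumed to follow nom's `InputTake` convention.

```rust
/// Simplified stand-in for the parser's `Span` (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span<'a> {
    offset: usize,
    line: u32,
    column: u32,
    slice: &'a [u8],
}

impl<'a> Span<'a> {
    fn new(slice: &'a [u8]) -> Self {
        Span { offset: 0, line: 1, column: 1, slice }
    }

    /// Hypothetical helper: the sub-span that starts `start` bytes into
    /// this span and is `len` bytes long, with its position recomputed
    /// from the bytes that were skipped.
    fn sub(&self, start: usize, len: usize) -> Self {
        let consumed = &self.slice[..start];
        let newlines = consumed.iter().filter(|&&b| b == b'\n').count() as u32;

        // After a newline the column restarts at 1; otherwise it advances
        // by the number of skipped bytes.
        let column = match consumed.iter().rposition(|&b| b == b'\n') {
            Some(last_newline) => (start - last_newline) as u32,
            None => self.column + start as u32,
        };

        Span {
            offset: self.offset + start,
            line: self.line + newlines,
            column,
            slice: &self.slice[start..start + len],
        }
    }

    /// Split at byte `i`, returning `(rest, taken)`.
    fn take_split(&self, i: usize) -> (Self, Self) {
        (self.sub(i, self.slice.len() - i), self.sub(0, i))
    }
}
```

With an illustrative input of `b"  \nhello\t\r"` (two spaces, then a newline), `Span::new(input).take_split(3)` yields a rest span at offset 3, line 2, column 1, and splitting that rest at 5 yields a tail at offset 8, line 2, column 6.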

subhojit777 · 2018-10-09T03:23:05Z

I looked into the remaining failing tests. One of them is case_exclude_empty_set(). I found that it is failing because it uses the InputTakeAtPosition and FindToken implementations for &[u8], which are provided by the nom framework.

The test output:

Diff < left / right > :
 
<Err(
<    Incomplete(
<        Size(
<            1
<        )
>Ok(
>    (
>        [],
>        [
>            102,
>            101,
>            100,
>            97,
>            98,
>            99
>        ]
     )
 )

Do you think that they need to be implemented differently inside the parser library? @Hywan

subhojit777 · 2018-10-09T03:32:13Z

    fn case_exclude_empty_set() {
        named!(
            test,
            exclude!(
                is_a!("abcdef"),
                alt!(
                    tag!("abc")
                  | tag!("ace")
                )
            )
        );

        assert_eq!(test(&b"fedabc"[..]), Ok((&b""[..], &b"fedabc"[..])));
    }

is_a!("abcdef") uses find_token(), which returns a bool. But split_at_position1() pattern-matches on Some(). This is weird. I think I am missing something here. Any idea?

Hywan · 2018-10-09T08:36:46Z

Please give me a few days to look at this. I don't have time right now, but I hope to get some free time in a few days :-). Thanks for your patience!

subhojit777 · 2018-10-10T03:58:05Z

Meanwhile I can do more investigation.

#132 (comment) - I was wrong in this comment. It is alright there. The find() inside split_at_position1() expects the Fn to return a bool.

But I am confused here, because it is true in every iteration and yet the None arm is still executed.
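
One possible explanation, sketched as a self-contained model rather than nom's real code: if the predicate handed to `split_at_position1` is the negation of the set-membership test (an assumption about how `is_a!` builds it), then an input made only of accepted bytes never satisfies the predicate, `find` returns `None`, and a streaming parser reports that it needs more input. The `Outcome` enum and both functions below are hypothetical names for illustration.

```rust
/// Hypothetical result type standing in for nom's `IResult`.
#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    Split { rest: &'a [u8], taken: &'a [u8] },
    Error,
    Incomplete,
}

/// Model of a streaming `split_at_position1`: split at the first byte for
/// which `predicate` holds.
fn split_at_position1<'a, P>(input: &'a [u8], predicate: P) -> Outcome<'a>
where
    P: Fn(u8) -> bool,
{
    match (0..input.len()).find(|&i| predicate(input[i])) {
        // The very first byte already stops the scan: nothing was taken.
        Some(0) => Outcome::Error,
        Some(i) => Outcome::Split { rest: &input[i..], taken: &input[..i] },
        // No byte stopped the scan, so more matching input could follow.
        None => Outcome::Incomplete,
    }
}

/// Model of `is_a!`: stop at the first byte that is NOT in `set`.
fn is_a<'a>(input: &'a [u8], set: &[u8]) -> Outcome<'a> {
    split_at_position1(input, |b| !set.contains(&b))
}
```

In this model, `is_a(b"fedabc", b"abcdef")` returns `Outcome::Incomplete` because every byte is in the set, while `is_a(b"fed;", b"abcdef")` splits before the `;`. That would be consistent with the `Incomplete(Size(1))` seen in the test output, although the real cause should be confirmed against the nom sources.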

Hywan · 2018-11-14T13:34:00Z

Working on this right now.

subhojit777 · 2018-12-05T03:01:31Z

@Hywan Did you check this?

Hywan · 2018-12-05T09:27:12Z

Yes, and I have a similar issue. I've paused my patch for some weeks because of other projects. I'm planning to switch back to Tagua VM very soon.

Implement InputTake for Span

8e84f42

Hywan self-assigned this Sep 21, 2018

Hywan added enhancement in progress component-internal labels Sep 21, 2018


subhojit777 force-pushed the update branch from 821f857 to da49be1 Compare September 27, 2018 06:40

Implement InputTakeAtPosition for Span

0da60f8

subhojit777 force-pushed the update branch from da49be1 to 0da60f8 Compare September 27, 2018 06:42
