Parsing syntax like Rust's raw string literals #441
First of all, thank you very much for creating chumsky.

    // I can use double quotes here without terminating the string literal
    r#"{"example": "json"}"#
    // If I want a literal `"#` inside my string I just use one additional hash sign:
    r##"this is the "coding"#trending page"##

I want to parse this syntax with chumsky.

    fn raw_string_literal() -> impl Parser<char, String, Error = Simple<char>> + Clone {
        let start = just("r")
            .ignore_then(just("#").repeated())
            .then_ignore(just("\""));
        let end = just("\"").then(just("#").repeated().exactly(?));
        end
            .not()
            .repeated()
            .delimited_by(start, end)
            .map(|chars| {
                chars.into_iter().fold(String::new(), |mut s, c| {
                    s.push(c);
                    s
                })
            })
    }

The problem is that I can't know at parser build time how many hash signs have been used to start the string literal and thus how many are needed to terminate it again.
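To make the requirement concrete, here is a minimal hand-written sketch in plain Rust, without chumsky, of the behaviour being asked for. The function name `parse_raw_str` and its return shape are made up for illustration. The point is only that the closing delimiter can be computed once the opening hashes have been counted, not before.

```rust
/// Hypothetical helper (not part of chumsky): parses a Rust-style raw string
/// literal at the start of `input` and returns (contents, remaining input).
fn parse_raw_str(input: &str) -> Option<(&str, &str)> {
    // Stage 1: `r`, then count the opening hashes, then the opening quote.
    let after_r = input.strip_prefix('r')?;
    let hashes = after_r.chars().take_while(|&c| c == '#').count();
    // `#` is a single byte, so the char count is also a valid byte offset.
    let body = after_r[hashes..].strip_prefix('"')?;

    // Stage 2: only now is the closing delimiter known:
    // a quote followed by exactly `hashes` hash signs.
    let closing = format!("\"{}", "#".repeat(hashes));
    let end = body.find(&closing)?;
    Some((&body[..end], &body[end + closing.len()..]))
}
```

Calling `parse_raw_str` on the second example from the post should yield the text between the delimiters, with the embedded `"#` left intact because it is one hash short of the closing delimiter.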
Replies: 1 comment 5 replies
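What you are looking for is the `Parser::then_with` method, which allows you to define a second parser based on the result of the first. In this case, it allows us to get the number of `'#'`s that the start parser was able to parse. Here is what I came up with based on what you described:

    use chumsky::prelude::*;

    fn raw_str_lit() -> impl Parser<char, String, Error = Simple<char>> + Clone {
        just("r")
            .ignore_then(
                // This `Parser::map` saves us any allocations because `Vec` doesn't allocate for
                // ZSTs, and we only need the length anyhow.
                // This isn't necessary, and premature optimization is the root of all evil, so sue me for it ;D
                just('#').map(|_| ()).repeated().collect::<Vec<_>>(),
            )
            .then_ignore(just('"'))
            .then_with(|start| {
                // The closing delimiter is a quote followed by as many hashes as the opener had.
                let end = just('"').ignore_then(just('#').repeated().exactly(start.len()));
                // Take everything up to the closing delimiter, then consume the delimiter itself
                // so that the parser composes with whatever follows the literal.
                end.clone().not().repeated().collect::<String>().then_ignore(end)
            })
    }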

    #[cfg(test)]
    mod tests {
        use super::raw_str_lit;
        use chumsky::Parser;

        #[test]
        fn empty_raw() {
            let empty_raw = r###"r##""##"###;
            assert_eq!(raw_str_lit().parse(empty_raw), Ok("".into()));
        }

        #[test]
        fn non_empty_raw() {
            let non_empty_raw = r##"r#"hi""you"#""##;
            assert_eq!(raw_str_lit().parse(non_empty_raw), Ok("hi\"\"you".into()));
        }

        #[test]
        fn nested_raw() {
            let non_empty = r#####"r###"r##"hello there world"##"###"#####;
            assert_eq!(
                raw_str_lit().parse(non_empty),
                Ok("r##\"hello there world\"##".into())
            );
        }

        #[test]
        fn json_raw() {
            let json = r##"r#"{"example": "json"}"#"##;
            assert_eq!(
                raw_str_lit().parse(json),
                Ok("{\"example\": \"json\"}".into())
            );
        }

        #[test]
        fn coding_trend_raw() {
            let json = r###"r##"this is the "coding"#trending page"##"###;
            assert_eq!(
                raw_str_lit().parse(json),
                Ok("this is the \"coding\"#trending page".into())
            );
        }
    }

Does this seem to be what you are after?
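For anyone wondering what `then_with` buys you conceptually, the following is a rough plain-Rust sketch of the same idea using closures. It is not chumsky's implementation or API: `PResult`, `then_with_sketch`, `opening`, `body_until_close` and `raw_str` are all hypothetical names, and error reporting is reduced to `Option`.

```rust
// Plain-Rust sketch of the idea behind `then_with`; every name here is
// hypothetical and none of it is chumsky's actual implementation.

/// A parse result: the parsed value plus the remaining input, or `None` on failure.
type PResult<'a, T> = Option<(T, &'a str)>;

/// Runs `first`, hands its output to `make_second` to build a new parser,
/// then runs that freshly built parser on the remaining input.
fn then_with_sketch<'a, A, B, P>(
    first: impl Fn(&'a str) -> PResult<'a, A>,
    make_second: impl Fn(A) -> P,
) -> impl Fn(&'a str) -> PResult<'a, B>
where
    P: Fn(&'a str) -> PResult<'a, B>,
{
    move |input| {
        let (a, rest) = first(input)?;
        make_second(a)(rest)
    }
}

/// First stage: `r`, any number of `#`, then `"`. Outputs the hash count.
fn opening(input: &str) -> PResult<'_, usize> {
    let after_r = input.strip_prefix('r')?;
    let n = after_r.chars().take_while(|&c| c == '#').count();
    let rest = after_r[n..].strip_prefix('"')?;
    Some((n, rest))
}

/// Second stage, built only once the hash count `n` is known:
/// reads up to the closing delimiter and consumes that delimiter.
fn body_until_close<'a>(n: usize) -> impl Fn(&'a str) -> PResult<'a, String> {
    let closing = format!("\"{}", "#".repeat(n));
    move |input| {
        let end = input.find(&closing)?;
        Some((input[..end].to_string(), &input[end + closing.len()..]))
    }
}

/// The two stages glued together, mirroring the shape of the chumsky parser.
fn raw_str(input: &str) -> PResult<'_, String> {
    then_with_sketch(opening, body_until_close)(input)
}
```

The design point is that `body_until_close` cannot be constructed when `raw_str` is defined, only after `opening` has run on real input. A plain sequencing combinator has no way to express that, which is why a closure that builds the second parser is needed.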