Strategy for erroring immediately upon hitting invalid path? #225

bbqsrc · 2020-05-03T22:48:45Z

Say I have a simple rule along the lines of "(foo" _ <some other stuff>, where I know that if any of the following parameters fail, that there's no other possible rule that could match, and wish to return a descriptive error for this invalid use of (foo, what should I do?

The text was updated successfully, but these errors were encountered:

kevinmehall · 2020-05-09T19:50:18Z

I'm assuming what you mean is that you want to report an error at the position of the (, rather than the error inside the "some other stuff" you'd get by default if it failed to match. You can do this with quiet!{ "(foo" _ <some other stuff> } / expected!("valid foo construct") -- quiet to inhibit the default error and prevent bumping the error position forward, and expected to report your own error at that position if the first arm fails.

bbqsrc · 2020-05-09T19:52:41Z

Mm, not quite, because expected!() doesn't short circuit. I'd like for the parsing to stop immediately upon hitting that expected!(). Essentially, I am asking how one should go about forcing a single expected error once parsing has entered a rule, and not allow backtracking.

kevinmehall · 2020-05-09T20:29:45Z

Ah, you're describing the cut operator, which doesn't exist in rust-peg. Its primary purpose in preventing backtracking is to allow a packrat parser to discard cached state, but rust-peg is not a packrat parser by default (you can cache individual rules with #[cache]).

What change in the reported error are you hoping to get from this? Assuming you have no further rules that can match the "(foo", the error position won't be bumped forward by anything else and the reported error will be the one inside the failing <some other stuff>. If you do have a potentially matching alternative like "(" ident() later on, you could always inhibit it with negative lookahead (!"foo").

Improved error reporting features are definitely something I'm interested in pursuing. Another potentially relevant idea is labeled failures.

bbqsrc · 2020-05-09T20:33:51Z

I thought I might have been talking about a cut operator but I wasn't certain. 😄

Hmm, negative lookahead actually should be suitable for my use case, if I make a rule of those idents that should never be in that position. I'm basically implementing a Scheme with peg, and honestly it has been an absolute delight. So much so that I quickly got to the part where I wanted nicer errors, haha.

I plan to use some environment horrors to track spans to assist with error handling, so I'll let you know how my experiments pan out.

jprochazk · 2022-12-24T13:36:12Z

I'm not sure if I should open a new issue for this, but I think it's related to this one. My problem is that in case I use expected! or a conditional action which results in a "terminal error", the expected token set of the resulting ParseError contains the terminal error, but also all of the other unmatched tokens, which results in a pretty gnarly error.

For example, consider the following snippet from a parser for an indentation-sensitive language:

/// Assert that the next token has no indentation
rule _()
  = ![_] // ignore eof
  / p:position!() &[_] {?
      if state.get_indentation(p).is_some() {
        // terminal error - no other rule should match after this
        return Err("invalid indentation")
      }
      Ok(())
    }

rule expr() -> Expr
    = precedence! {
        // arithmetic ops, comparison ops, etc. omitted for brevity
        ---
        // field access expression (e.g. `a.b`)
        left:(@) _ [Op_Dot] _ key:ident() { ast::Expr::GetField(left, key) }
        ---
        // <snip>
    }

Some code like

a
.b

will emit the following parse error (in the full parser):

error: expected one of EOF, [Brk_CurlyL], [Brk_ParenL], [Brk_SquareL], [Kw_Break], [Kw_Class], [Kw_Continue], [Kw_Fn], [Kw_For], [Kw_If], [Kw_Loop], [Kw_Return], [Kw_Use], [Kw_While], [Kw_Yield], [Lit_Bool], [Lit_Ident], [Lit_Null], [Lit_Number], [Lit_String], [Op_Bang], [Op_Minus], invalid indentation

When the relevant error is only invalid indentation.

I currently hack around this by prefixing the error with a string i know will not appear anywhere else, and only emitting expected.tokens().find(|t| t.starts_with(error_prefix)). If it can't be found, it still emits the full expected set like above. This works well enough, but I can't help but feel like it's a lot of performance wasted on backtracking when all you really want to do is fail fast.

kevinmehall added the question label May 9, 2020

kevinmehall closed this as completed May 9, 2020

zsol mentioned this issue Jul 30, 2024

test_parse_error for native parser Instagram/LibCST#1179

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strategy for erroring immediately upon hitting invalid path? #225

Strategy for erroring immediately upon hitting invalid path? #225

bbqsrc commented May 3, 2020

kevinmehall commented May 9, 2020

bbqsrc commented May 9, 2020

kevinmehall commented May 9, 2020

bbqsrc commented May 9, 2020

jprochazk commented Dec 24, 2022

Strategy for erroring immediately upon hitting invalid path? #225

Strategy for erroring immediately upon hitting invalid path? #225

Comments

bbqsrc commented May 3, 2020

kevinmehall commented May 9, 2020

bbqsrc commented May 9, 2020

kevinmehall commented May 9, 2020

bbqsrc commented May 9, 2020

jprochazk commented Dec 24, 2022