Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strategy for erroring immediately upon hitting invalid path? #225

Closed
bbqsrc opened this issue May 3, 2020 · 5 comments
Closed

Strategy for erroring immediately upon hitting invalid path? #225

bbqsrc opened this issue May 3, 2020 · 5 comments
Labels

Comments

@bbqsrc
Copy link

bbqsrc commented May 3, 2020

Say I have a simple rule along the lines of "(foo" _ <some other stuff>, where I know that if any of the following parameters fail, that there's no other possible rule that could match, and wish to return a descriptive error for this invalid use of (foo, what should I do?

@kevinmehall
Copy link
Owner

I'm assuming what you mean is that you want to report an error at the position of the (, rather than the error inside the "some other stuff" you'd get by default if it failed to match. You can do this with quiet!{ "(foo" _ <some other stuff> } / expected!("valid foo construct") -- quiet to inhibit the default error and prevent bumping the error position forward, and expected to report your own error at that position if the first arm fails.

@bbqsrc
Copy link
Author

bbqsrc commented May 9, 2020

Mm, not quite, because expected!() doesn't short circuit. I'd like for the parsing to stop immediately upon hitting that expected!(). Essentially, I am asking how one should go about forcing a single expected error once parsing has entered a rule, and not allow backtracking.

@kevinmehall
Copy link
Owner

Ah, you're describing the cut operator, which doesn't exist in rust-peg. Its primary purpose in preventing backtracking is to allow a packrat parser to discard cached state, but rust-peg is not a packrat parser by default (you can cache individual rules with #[cache]).

What change in the reported error are you hoping to get from this? Assuming you have no further rules that can match the "(foo", the error position won't be bumped forward by anything else and the reported error will be the one inside the failing <some other stuff>. If you do have a potentially matching alternative like "(" ident() later on, you could always inhibit it with negative lookahead (!"foo").

Improved error reporting features are definitely something I'm interested in pursuing. Another potentially relevant idea is labeled failures.

@bbqsrc
Copy link
Author

bbqsrc commented May 9, 2020

I thought I might have been talking about a cut operator but I wasn't certain. 😄

Hmm, negative lookahead actually should be suitable for my use case, if I make a rule of those idents that should never be in that position. I'm basically implementing a Scheme with peg, and honestly it has been an absolute delight. So much so that I quickly got to the part where I wanted nicer errors, haha.

I plan to use some environment horrors to track spans to assist with error handling, so I'll let you know how my experiments pan out.

@jprochazk
Copy link

I'm not sure if I should open a new issue for this, but I think it's related to this one. My problem is that in case I use expected! or a conditional action which results in a "terminal error", the expected token set of the resulting ParseError contains the terminal error, but also all of the other unmatched tokens, which results in a pretty gnarly error.

For example, consider the following snippet from a parser for an indentation-sensitive language:

/// Assert that the next token has no indentation
rule _()
  = ![_] // ignore eof
  / p:position!() &[_] {?
      if state.get_indentation(p).is_some() {
        // terminal error - no other rule should match after this
        return Err("invalid indentation")
      }
      Ok(())
    }

rule expr() -> Expr
    = precedence! {
        // arithmetic ops, comparison ops, etc. omitted for brevity
        ---
        // field access expression (e.g. `a.b`)
        left:(@) _ [Op_Dot] _ key:ident() { ast::Expr::GetField(left, key) }
        ---
        // <snip>
    }

Some code like

a
.b

will emit the following parse error (in the full parser):

error: expected one of EOF, [Brk_CurlyL], [Brk_ParenL], [Brk_SquareL], [Kw_Break], [Kw_Class], [Kw_Continue], [Kw_Fn], [Kw_For], [Kw_If], [Kw_Loop], [Kw_Return], [Kw_Use], [Kw_While], [Kw_Yield], [Lit_Bool], [Lit_Ident], [Lit_Null], [Lit_Number], [Lit_String], [Op_Bang], [Op_Minus], invalid indentation

When the relevant error is only invalid indentation.

I currently hack around this by prefixing the error with a string i know will not appear anywhere else, and only emitting expected.tokens().find(|t| t.starts_with(error_prefix)). If it can't be found, it still emits the full expected set like above. This works well enough, but I can't help but feel like it's a lot of performance wasted on backtracking when all you really want to do is fail fast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants