Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix format_links errors on wrapped punctuation #3074

Closed
wants to merge 1 commit into from

Conversation

hughrun
Copy link
Contributor

@hughrun hughrun commented Nov 1, 2023

format_links checks for wrapping and end-punctuation in a different order to how it puts the formatted text back together:

  1. check whether it is "wrapped"
  2. if so, remove final wrapper character
  3. check whether it ends in punctuation
  4. if so remove end punctuation
  5. check is valid URL
  6. if wrapped, add closing wrapper
  7. if end punctuation, add punctuation at end.

This creates a problem if there is end-punctuation inside the wrapper. Examples:

[..] => [... => [.. => [..] => [..].
(?) => (? => ( => () => ()?

This commit makes the following changes to resolve this:

  1. Check whether the string ends in punctuation before checking for wrappers
  2. Check separately whether strings inside wrappers end in punctuation
  3. Add wrapped end-punctuation before adding closing wrapper
  4. Add closing wrapper before adding end-punctuation

fixes #2993
fixes #3049

format_links checks for wrapping and end-punctuation in a different order to how it puts the formatted text back together:

1. check whether it is "wrapped"
2. if so, remove final wrapper character
3. check whether it ends in punctuation
4. if so remove end punctuation
5. check is valid URL
6. if wrapped, add closing wrapper
7. if end punctuation, add punctuation at end.

This creates a problem if there is end-punctuation _inside the wrapper_. Examples:

[..] => [... => [.. => [..] => [..].
(?) => (? => ( => () => ()?

This commit makes the following changes:

1. Check whether the string ends in punctuation before checking for wrappers
2. Check separately whether strings inside wrappers end in punctuation
3. Add wrapped end-punctuation before adding closing wrapper
4. Add closing wrapper before adding end-punctuation
@hughrun
Copy link
Contributor Author

hughrun commented Nov 1, 2023

@bookwyrm-social/code-review this should hopefully be a simple one to test. I think I fixed it but I might have missed something.

@dato
Copy link
Contributor

dato commented Nov 2, 2023

A couple weeks ago I prepared a small fix for this function in #3027 (plus a refactor proper in dato#15), that related to this (though I don't think it fixed everything, some of the issues were not even filed).

I don't mind if this gets merged, since it seems a good fix and I can rebase + update my code later.

But if you like the idea of an eventual refactor of this function, it would be good to know in advance if you'd accept something like the refactor in the second PR, above. I could update it with an extra check for nested punct (should be possibly done in a loop; feasible with the prefix/suffix thing I introduced).

Thanks equally for this fix, and in advance!

@dato
Copy link
Contributor

dato commented Nov 2, 2023

(And goes without saying, if you happen to find any bit of code in those PRs useful, please pick from them freely! I wouldn't be able to do any coding until the weekend, at least.)

@hughrun
Copy link
Contributor Author

hughrun commented Nov 2, 2023

A couple weeks ago I prepared a small fix for this function in #3027 (plus a refactor proper in dato#15), that related to this (though I don't think it fixed everything, some of the issues were not even filed).

OMG I totally missed this! Sorry for the double up.

I'll take a look at your fixes and give my opinion on what we should do. I'm certainly not attached to my fix, it was just a bug that was annoying me: happy for whatever is the best fix.

@hughrun
Copy link
Contributor Author

hughrun commented Nov 2, 2023

@dato I think dato#15 is the preferred approach and we should go with that instead of #3027 or #3074

I had a quick look - it looks like a nice solution, but from a quick scan it seems maybe the regex needs to be tightened up a little to ensure we're only touching trailing punctuation.

@hughrun hughrun marked this pull request as draft November 2, 2023 07:41
@dato
Copy link
Contributor

dato commented Nov 3, 2023

@dato I think dato#15 is the preferred approach

Great! Thank you for taking a look.

I added tests now for the issues your PR addresses, and I added an extra fix (for trailing punctuation inside the brackets).

I had a quick look - it looks like a nice solution, but from a quick scan it seems maybe the regex needs to be tightened up a little to ensure we're only touching trailing punctuation.

Hmm. I see I used \Z as anchor instead of $, but I can't remember that there was a reason (they're the same here). I should change it, for readability.

Is that what you meant, the missing $, or is there something else?

and we should go with that instead of #3027 or #3074

Do you think it's fine if I just update #3027 with the contents of dato#15?

Thanks again!

@hughrun
Copy link
Contributor Author

hughrun commented Nov 3, 2023

Yes that's what I meant: I was looking for a $.

Makes sense to update the PR you already have here, then we can do a formal review and hopefully pull in.

@hughrun hughrun closed this Nov 3, 2023
@hughrun hughrun deleted the format_links branch January 14, 2024 01:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect rendering of (?) Misparsing of [...] elisions
2 participants