Dynamic Regex Badge #10925

Open
TrianguloY opened this issue Mar 6, 2025 · 5 comments
Labels
service-badge New or updated service badge


@TrianguloY

📋 Description

I'm opening this issue as the tutorial suggests, since I already have a prototype and plan to open a PR in the next few days. (Great tutorial, by the way; it was really helpful!)


This new badge extracts data from any file using a regular expression. Regexes can become very complex, but they are also very powerful. They may not be as structured as XML or JSON, but that is precisely their strength: you can extract data from practically anywhere, be it a Gradle file, a proprietary format, or even a README. The only constraint is that the file is treated as one big string.

As an example, if you have a file like this sample, and you use a regex of ^version: '(.*?)' with a replacement of $1, you get a badge with the value 0.0.1.
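A minimal sketch of that search/replace step, assuming the pattern is compiled with the multiline flag (so `^` matches at the start of any line, not just the start of the file) and that the replacement template is expanded against the first match only. `extractWithRegex` is a hypothetical name, not code from the prototype.

```typescript
// Hypothetical helper: find the first match of a user-supplied pattern and
// expand a replacement template such as "$1" against that match only.
function extractWithRegex(
  content: string,
  pattern: string,
  replacement: string,
): string | null {
  // "m" (assumed) lets ^ and $ match at line boundaries inside the big string.
  const regex = new RegExp(pattern, "m");
  const match = regex.exec(content);
  if (match === null) return null;

  // A non-global replace() rewrites the same first match that exec() found,
  // so the expanded template sits between the untouched prefix and suffix.
  const replaced = content.replace(regex, replacement);
  const suffixLength = content.length - (match.index + match[0].length);
  return replaced.slice(match.index, replaced.length - suffixLength);
}
```

Everything outside the match is discarded, so with `^version: '(.*?)'` and `$1` the result is just the version string, and a template like `v$1` would yield `v0.0.1`.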

The main concern with this badge is that a regex can be used for a denial-of-service attack (ReDoS): some patterns, given specific inputs, can take years to evaluate. There are existing Node.js solutions that mitigate this by adding timeouts to regex operations, and I plan to review them and integrate one. I understand that this is a prerequisite for the badge to be feasible.
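To illustrate why some patterns "take years", here is a toy model (not how V8's regex engine is actually implemented) of a backtracking matcher trying the classic ReDoS pattern `^(a+)+$` against a run of `a`s followed by `!`. It counts every way the matcher tries to split the run between the inner and outer quantifier before giving up; `countBacktrackingSteps` is a hypothetical name.

```typescript
// Toy backtracking matcher for ^(a+)+$ that counts attempted split points.
function countBacktrackingSteps(input: string): number {
  let steps = 0;

  // Try one iteration of (a+) starting at position i.
  const iterate = (i: number): boolean => {
    let end = i;
    while (end < input.length && input[end] === "a") end++;
    // Greedy: take the longest run first, then give characters back one by one.
    for (let j = end; j > i; j--) {
      steps++;
      if (j === input.length) return true; // $ succeeds at the end of input
      if (iterate(j)) return true; // outer + tries another iteration from j
    }
    return false; // every split failed: backtrack to the caller
  };

  iterate(0); // ^ anchors the attempt at position 0
  return steps;
}
```

For ten `a`s followed by `!` this reports 1023 steps (2^10 - 1), and each additional `a` doubles the count, so an input of about 50 characters already means roughly 10^15 steps.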

🔗 Data

Like the existing dynamic badges, this badge will fetch a raw file from a URL. The regex search/replace is then done locally with the user-provided pattern and replacement. No external public API is needed.

🎤 Motivation

I came up with this badge idea after trying (and failing) to implement a Gradle badge (already requested), because I couldn't import gradle-to-js, apparently the only JavaScript Gradle parser. I'm not a Node.js developer, and there are probably ways to make that work, but I came up with this alternative, which I like even more.

I looked for existing alternatives but couldn't find anything I could use for my own project. Since I know how to program and this repository has clear contribution guidelines, I decided to implement it myself. As explained above, I plan to open a PR in the next few days; I created this issue to gather preliminary feedback first.

TrianguloY added the service-badge label Mar 6, 2025
@chris48s
Member

chris48s commented Mar 7, 2025

As you've noted, the big issue with accepting user-supplied regex is vulnerability to ReDoS attacks. That is why we decided not to accept user-supplied regex in #9173.

Before you submit a PR, I'd like to see a plan here for how you intend to mitigate that attack vector, since that is really what the whole thing hinges on.

@TrianguloY
Author

Thanks for the concern! Totally understandable.

My first idea was to use an existing package that provides that functionality. I found https://www.npmjs.com/package/time-limited-regular-expressions and https://github.com/sindresorhus/super-regex, but neither of them supports the regex method I need.

Later I found this post suggesting Node's vm core module, and with a bit of help from Stack Overflow I got it working.

The code creates a vm context and runs the regex inside it with a time limit. If evaluation takes more than 1 second, it is aborted and an error is returned. (Users won't be able to change the timeout, but I can adjust the constant in the code if needed.)
I'm aware that running inside a vm adds some overhead, but since badges are cached automatically, it should hopefully not be much of an issue.
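The vm approach can be sketched roughly as follows, assuming Node's built-in `vm` module and the documented `timeout` option of `vm.runInNewContext`. `firstGroupWithTimeout` and its result type are hypothetical; this is not the code from the author's branch. Because `runInNewContext` is synchronous, the event loop stays blocked until the script finishes or the timeout fires.

```typescript
import * as vm from "node:vm";

// Hypothetical result type: either the extracted value or a timeout marker.
type VmResult =
  | { ok: true; value: string | null }
  | { ok: false; reason: "timeout" };

// Runs the user's pattern inside a fresh V8 context and aborts it after timeoutMs.
function firstGroupWithTimeout(
  content: string,
  pattern: string,
  timeoutMs = 1000, // fixed 1-second cap; not exposed to badge users
): VmResult {
  // Evaluated inside the sandbox: compile the pattern (multiline flag assumed)
  // and return the first capture group, or the whole match if there is none.
  const code =
    "const m = new RegExp(pattern, 'm').exec(content);" +
    "m === null ? null : (m[1] ?? m[0]);";
  try {
    const value = vm.runInNewContext(
      code,
      { pattern, content }, // the only globals visible to the sandboxed code
      { timeout: timeoutMs },
    ) as string | null;
    return { ok: true, value };
  } catch (err) {
    // Node reports an expired timeout with this documented error code.
    if ((err as { code?: string }).code === "ERR_SCRIPT_EXECUTION_TIMEOUT") {
      return { ok: false, reason: "timeout" };
    }
    throw err; // e.g. a SyntaxError for an invalid pattern
  }
}
```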

From my testing, I can confirm that the timeout works as expected and the ReDoS attack is not possible. If that's enough, my next step is to write several tests, clean up the code, and open a PR.
If you want to review the code, I can either open a draft PR or paste the file here. If you still think such a badge is too dangerous and will probably not be merged, I'll understand and close this issue (though I'll keep the code on a fork just in case).

@chris48s
Member

There are a few ways in which a ReDoS attack can deny service. One is taking a long time, and running with a timeout puts a cap on that.
However, a ReDoS attack can also consume a lot of CPU, consume a lot of memory, and block the event loop.
Capping execution time helps, but an attacker can still block the event loop for a second at a time, and has a full second to consume as much memory as possible.

Having done a bit of reading, it seems the safest way to do this in Node would actually be RE2:
https://github.com/uhop/node-re2
https://github.com/google/re2
which implements a subset of regex that does not include certain features:
https://github.com/uhop/node-re2?tab=readme-ov-file#limitations-things-re2-does-not-support
I'm reasonably convinced by RE2's credentials, and it would let us provide a useful enough subset of regex to cover most legitimate use cases.
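RE2 avoids catastrophic backtracking by guaranteeing matching time linear in the input size. If the extraction code depends only on a RegExp-like constructor, swapping engines becomes a one-argument change. A minimal sketch, assuming node-re2's `RE2` class is constructed as `new RE2(pattern, flags)`, accepts the `m` flag, and exposes `exec()` like `RegExp` (check its typings and README before relying on this); `RegexEngine` and `firstGroup` are hypothetical names. The default engine here is the built-in `RegExp`, so the sketch runs without the native dependency.

```typescript
// Anything constructible like RegExp that exposes exec() will do.
type RegexEngine = new (pattern: string, flags?: string) => {
  exec(input: string): RegExpExecArray | null;
};

// Hypothetical helper: returns the first capture group, or the whole match.
// In production one would pass RE2 here (assumed: `import RE2 from "re2"`).
function firstGroup(
  content: string,
  pattern: string,
  Engine: RegexEngine = RegExp,
): string | null {
  let regex: { exec(input: string): RegExpExecArray | null };
  try {
    regex = new Engine(pattern, "m");
  } catch (err) {
    // Invalid patterns throw at construction time. Assumption: RE2 also throws
    // here for features it does not implement, such as backreferences.
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`Invalid or unsupported pattern: ${reason}`);
  }
  const match = regex.exec(content);
  if (match === null) return null;
  return match[1] ?? match[0];
}
```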

On this specific issue, I'd like to get some input from another maintainer, since in the past we have explicitly said this is something we don't want to do.
@PyvesB @calebcartwright, do either of you have any thoughts on this one?

@TrianguloY
Author

RE2 looks like a good candidate indeed. I'll try to use it on my branch and play with it a bit. Thanks!

@TrianguloY
Author

I've tested with RE2 and it works as expected. It covers my use case, and unless you have a long and/or complex regex, you don't really notice any difference from normal regex.

I can open a draft (or ready-to-review!) PR if needed; the code is here.
