filter out near duplicate responses #1951

Merged: 8 commits, Oct 20, 2024

dogancanbakir
Member

Closes #758

$ go run . -u https://httpbin.org -path xml,json,html,html -fd -v

    __    __  __       _  __
   / /_  / /_/ /_____ | |/ /
  / __ \/ __/ __/ __ \|   /
 / / / / /_/ /_/ /_/ /   |
/_/ /_/\__/\__/ .___/_/|_|
             /_/

                projectdiscovery.io

[INF] Current httpx version v1.6.8 (latest)
[WRN] UI Dashboard is disabled, Use -dashboard option to enable
https://httpbin.org/html
[WRN] Skipping duplicate response with simhash 9850149872654300798
https://httpbin.org/xml
https://httpbin.org/json
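
For readers unfamiliar with the technique named in the warning above, here is a minimal sketch of SimHash-based near-duplicate filtering in which the first response wins. It is an illustration under stated assumptions, not the code merged in this PR: the whitespace tokenizer, the FNV-1a token hash, the Hamming-distance threshold, and the names `simhash64`, `nearDuplicate`, and `filterFirstWins` are all hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
	"strings"
)

// simhash64 computes a 64-bit SimHash fingerprint over whitespace-separated
// tokens. Tokenizer and token hash (FNV-1a) are assumptions for this sketch.
func simhash64(text string) uint64 {
	var weights [64]int
	for _, tok := range strings.Fields(text) {
		h := fnv.New64a()
		h.Write([]byte(tok))
		sum := h.Sum64()
		// Each token votes +1 or -1 on every bit position.
		for i := 0; i < 64; i++ {
			if sum&(uint64(1)<<i) != 0 {
				weights[i]++
			} else {
				weights[i]--
			}
		}
	}
	var out uint64
	for i, w := range weights {
		if w > 0 {
			out |= uint64(1) << i
		}
	}
	return out
}

// nearDuplicate reports whether two fingerprints differ in at most maxBits bits.
func nearDuplicate(a, b uint64, maxBits int) bool {
	return bits.OnesCount64(a^b) <= maxBits
}

// filterFirstWins returns the indexes of the bodies that are kept: the first
// response of each near-duplicate group survives, later ones are skipped.
func filterFirstWins(bodies []string, maxBits int) []int {
	var seen []uint64
	var kept []int
	for i, body := range bodies {
		fp := simhash64(body)
		dup := false
		for _, s := range seen {
			if nearDuplicate(fp, s, maxBits) {
				dup = true
				break
			}
		}
		if dup {
			continue
		}
		seen = append(seen, fp)
		kept = append(kept, i)
	}
	return kept
}

func main() {
	bodies := []string{
		"<html> hello world </html>",
		`{ "slideshow": "sample" }`,
		"<html> hello world </html>", // exact repeat of the first body
	}
	fmt.Println(filterFirstWins(bodies, 3))
}
```

Identical bodies always produce identical fingerprints, so the repeated body is dropped whatever threshold is chosen. How close two different bodies must be before they count as duplicates depends on the threshold, which is a tuning choice in this sketch.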

@dogancanbakir dogancanbakir self-assigned this Oct 14, 2024
Member

@Mzack9999 Mzack9999 left a comment

  • Shall we use something like gcache with a maximum number of items, to avoid unbounded memory growth in case of huge input lists?
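
A rough sketch of what a size-capped "seen" set could look like, using only the standard library rather than gcache itself. The type `boundedSeen`, its `SeenBefore` method, and the LRU eviction policy are hypothetical choices for illustration, and this version matches fingerprints exactly rather than by distance.

```go
package main

import (
	"container/list"
	"fmt"
)

// boundedSeen remembers at most capacity fingerprints and evicts the least
// recently used one when full, so memory stays bounded on huge input lists.
type boundedSeen struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[uint64]*list.Element // fingerprint -> node in order
}

func newBoundedSeen(capacity int) *boundedSeen {
	return &boundedSeen{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[uint64]*list.Element),
	}
}

// SeenBefore records fp and reports whether it was already remembered.
func (b *boundedSeen) SeenBefore(fp uint64) bool {
	if el, ok := b.items[fp]; ok {
		b.order.MoveToFront(el)
		return true
	}
	b.items[fp] = b.order.PushFront(fp)
	if b.order.Len() > b.capacity {
		oldest := b.order.Back()
		b.order.Remove(oldest)
		delete(b.items, oldest.Value.(uint64))
	}
	return false
}

func main() {
	seen := newBoundedSeen(2)
	for _, fp := range []uint64{1, 2, 1, 3, 2} {
		fmt.Println(fp, seen.SeenBefore(fp))
	}
	// Prints:
	// 1 false
	// 2 false
	// 1 true
	// 3 false
	// 2 false  (2 was evicted when 3 arrived, so it looks new again)
}
```

The trade-off of any cap is visible in the last line: once a fingerprint is evicted, a later duplicate of it is no longer filtered.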

@Mzack9999 Mzack9999 linked an issue Oct 16, 2024 that may be closed by this pull request
Member

@Mzack9999 Mzack9999 left a comment

implementation: lgtm
follow up: update the docs to make it clear that only the first response is considered valid, while the others are filtered out

@ehsandeep ehsandeep merged commit 2f16a47 into dev Oct 20, 2024
11 checks passed
@ehsandeep ehsandeep deleted the add_near_duplicate_filter branch October 20, 2024 21:59
@wgetnz

wgetnz commented Dec 10, 2024

Should default pages be ignored, such as 404 and 500 error pages, index.html...?

Development

Successfully merging this pull request may close these issues.

Add feature to exclude the same http response
4 participants