dugo is a fast, efficient command-line tool written in Go that finds and removes duplicate files in a directory. It processes files concurrently for better performance.
- Fast Duplicate Detection: Uses file size and MD5 hashing to quickly identify potential duplicates.
- Accurate Comparison: Performs byte-by-byte comparison to confirm duplicates, avoiding false positives due to hash collisions.
- Concurrency Support: Leverages Go's goroutines to process files in parallel, speeding up the deduplication process.
- Interactive Deletion: Optionally prompts you to choose which duplicate files to delete.
- Flexible Ignore Options: Allows ignoring files or directories by name or regex pattern.
- Customizable Workers: Lets you control the number of concurrent workers for optimal performance.
- Go (for building from source).
- Clone the repository:
- Build the tool:
  go build -o dugo
- Move the binary to a directory in your PATH (optional):
  sudo mv dugo /usr/local/bin/
To find duplicates in a directory:
./dugo /path/to/directory
To interactively delete duplicates:
./dugo -it /path/to/directory
- Ignore specific files or directories by name:
./dugo -ignore-names=".git,temp,backup" /path/to/directory
- Ignore files or directories using a regex pattern:
./dugo -ignore-regex=".*\.tmp$" /path/to/directory
Set the number of concurrent workers (default: 4):
./dugo -workers=8 /path/to/directory
Find duplicates, ignore .tmp files, enable interactive deletion, and use 8 workers:
./dugo -ignore-regex=".*\.tmp$" -it -workers=8 /path/to/directory
| Flag | Description |
|---|---|
| -ignore-names | Comma-separated list of file/directory names to ignore (exact match). |
| -ignore-regex | Regex pattern to ignore files/directories by path. |
| -workers | Number of concurrent workers (default: 4). |
| -it | Enable interactive deletion of duplicate files. |
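The flags in the table map naturally onto Go's standard `flag` package, which accepts the single-dash `-name=value` and bare boolean `-it` forms used in the examples. This is a sketch, not dugo's source: `options`, `parseArgs`, and the check that rejects a worker count below 1 are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

type options struct {
	ignoreNames string
	ignoreRegex string
	workers     int
	interactive bool
	dir         string
}

// parseArgs declares the documented flags on a dedicated FlagSet so parsing
// returns errors instead of exiting, which keeps it easy to test.
func parseArgs(args []string) (*options, error) {
	fs := flag.NewFlagSet("dugo", flag.ContinueOnError)
	o := &options{}
	fs.StringVar(&o.ignoreNames, "ignore-names", "", "Comma-separated list of file/directory names to ignore (exact match)")
	fs.StringVar(&o.ignoreRegex, "ignore-regex", "", "Regex pattern to ignore files/directories by path")
	fs.IntVar(&o.workers, "workers", 4, "Number of concurrent workers")
	fs.BoolVar(&o.interactive, "it", false, "Enable interactive deletion of duplicate files")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if fs.NArg() != 1 {
		return nil, fmt.Errorf("usage: dugo [flags] <directory>")
	}
	if o.workers < 1 {
		return nil, fmt.Errorf("-workers must be at least 1, got %d", o.workers)
	}
	o.dir = fs.Arg(0)
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", *o)
}
```

The `flag` package stops parsing at the first non-flag argument, so with this approach the flags must come before the directory, as they do in the examples above.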
- Scan Directory: The tool scans the specified directory and groups files by size.
- Hash Files: Files that share a size are hashed using MD5; a file with a unique size cannot have a duplicate and needs no further work.
- Compare Files: Files with the same hash are compared byte-by-byte to confirm duplicates.
- Report or Delete: Duplicates are either reported to the user or deleted interactively.
Contributions are welcome! Here’s how you can help:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Submit a pull request with a detailed description of your changes.
This project is licensed under the MIT License. See the LICENSE file for details.
- Inspired by the need for a fast and accurate file deduplication tool.
- Built with Go’s powerful concurrency model for high performance.