tplink-grab

Downloads all GPL tarballs (and zips and rars!) from TP-Link by parsing https://www.tp-link.com/au/choose-your-location/, then scraping each country-specific support/gpl-code/ page to build lists of tarballs. The pages are structured so that they either contain direct links to tar.gz files (or similar archives), or have Javascript that generates a link to a page like https://www.tp-link.com/phppage/gpl-res-list.html?model=Deco%20M5&appPath=kz for each model and country code.
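
As a rough orientation, the flow can be sketched like this. The CSS selector and the country-code extraction logic are assumptions for illustration, not the code in this repo:

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote

LOCATIONS_URL = "https://www.tp-link.com/au/choose-your-location/"

def country_codes():
    """Scrape choose-your-location for per-country path prefixes (assumed layout)."""
    soup = BeautifulSoup(requests.get(LOCATIONS_URL).text, "html.parser")
    codes = set()
    for a in soup.select("a[href]"):
        parts = a["href"].strip("/").split("/")
        if parts and len(parts[0]) == 2:  # region links look like /au/, /de/, /kz/
            codes.add(parts[0])
    return sorted(codes)

def gpl_page(code):
    """Country-specific GPL code page listing models or direct tarball links."""
    return f"https://www.tp-link.com/{code}/support/gpl-code/"

def model_page(model, code):
    """The JS-generated per-model page described above."""
    return (f"https://www.tp-link.com/phppage/gpl-res-list.html"
            f"?model={quote(model)}&appPath={code}")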

first_pass.py

Gets the list of countries and creates the initial lists of URLs (a rough sketch of this pass follows the output listing below).

Output

  • links/{country code}.json - Cache of productTree JSON from each GPL code page
  • links/{country code}.model.csv - Links to model pages, which need further parsing to get the tarballs

A CSV that looks like this:

link,original_url,model_name,appPath
https://www.tp-link.com/phppage/gpl-res-list.html?model=Deco%20X60&appPath=au,https://www.tp-link.com/au/support/gpl-code/,Deco X60,au
https://www.tp-link.com/phppage/gpl-res-list.html?model=Deco%20X20&appPath=au,https://www.tp-link.com/au/support/gpl-code/,Deco X20,au

Used by second_pass.py.

  • links/{country code}.tars.csv - Direct links to tarballs

A CSV that looks like this:

link,original_url,model_name,appPath
https://static.tp-link.com/resources/gpl/GPL_X90_1.tar.gz,https://www.tp-link.com/au/support/gpl-code/,Deco X90,au
https://static.tp-link.com/resources/gpl/GPL_X68_1.tar.gz,https://www.tp-link.com/au/support/gpl-code/,Deco X68,au

Used by second_pass.py.
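
A minimal sketch of what this pass produces, assuming the productTree JSON is embedded in the page as a Javascript array and that its field names are modelName and gplUrl (both assumptions, not verified against the live pages):

import csv
import json
import re
import requests
from urllib.parse import quote

FIELDS = ["link", "original_url", "model_name", "appPath"]

def first_pass(code):
    page_url = f"https://www.tp-link.com/{code}/support/gpl-code/"
    html = requests.get(page_url).text
    match = re.search(r"productTree\s*=\s*(\[.*?\]);", html, re.S)  # assumed embed
    tree = json.loads(match.group(1)) if match else []
    with open(f"links/{code}.json", "w") as f:  # cache the raw productTree
        json.dump(tree, f)

    with open(f"links/{code}.model.csv", "w", newline="") as mf, \
         open(f"links/{code}.tars.csv", "w", newline="") as tf:
        models, tars = csv.writer(mf), csv.writer(tf)
        models.writerow(FIELDS)
        tars.writerow(FIELDS)
        for entry in tree:
            name = entry.get("modelName", "")  # assumed key
            direct = entry.get("gplUrl", "")   # assumed key
            if direct.endswith((".tar.gz", ".zip", ".rar")):
                tars.writerow([direct, page_url, name, code])
            else:
                link = ("https://www.tp-link.com/phppage/gpl-res-list.html"
                        f"?model={quote(name)}&appPath={code}")
                models.writerow([link, page_url, name, code])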

cached_downloader.py

Uses warcio's ability to wrap requests to set up a nice little cache layer: the download cache is generated with warcio and requests and stored in uncompressed WARC 1.1 format (a sketch of the idea follows the output listing below).

There are future plans to dump this into SQLite and compress it with https://github.com/phiresky/sqlite-zstd, but do we really need to?

Output

  • output/{sha256sum of url} - WARC file used as cache
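
The cache idea can be sketched with warcio's documented capture_http helper; this illustrates the approach, not the actual contents of cached_downloader.py:

import hashlib
import os
from warcio.capture_http import capture_http
from warcio.archiveiterator import ArchiveIterator
import requests  # warcio requires importing requests after capture_http

def cache_path(url):
    return os.path.join("output", hashlib.sha256(url.encode()).hexdigest())

def get(url):
    path = cache_path(url)
    if not os.path.exists(path):
        # no .gz extension, so warcio writes uncompressed WARC records
        with capture_http(path):
            requests.get(url)
    with open(path, "rb") as f:  # replay the cached response body
        for record in ArchiveIterator(f):
            if record.rec_type == "response":
                return record.content_stream().read()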

second_pass.py

Parses *.model.csv, downloads the additional model pages, and parses them for more links to archives, which are then appended to the corresponding *.tars.csv file.
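
A hedged sketch of that loop; the archive-link selector is an assumption about how the gpl-res-list pages mark up their download links:

import csv
import requests
from bs4 import BeautifulSoup

ARCHIVE_EXTS = (".tar.gz", ".tgz", ".zip", ".rar")

def second_pass(code):
    with open(f"links/{code}.model.csv", newline="") as f:
        rows = list(csv.DictReader(f))
    with open(f"links/{code}.tars.csv", "a", newline="") as f:
        out = csv.writer(f)
        for row in rows:
            html = requests.get(row["link"]).text  # ideally via the WARC cache
            soup = BeautifulSoup(html, "html.parser")
            for a in soup.select("a[href]"):
                if a["href"].endswith(ARCHIVE_EXTS):
                    out.writerow([a["href"], row["link"],
                                  row["model_name"], row["appPath"]])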

scripts/extract_exists.sh

Run bash extract_exists.sh path/to/archives from the directory you want to extract into.

TODO

  • Reduce the amount of log spam - use https://gist.github.com/bdarnell/3118509 or similar

  • Rename output/ to cache/

  • second_pass: grab all links to tarballs, deduplicate, write metadata to sqlite (HEAD requests?), compare with already-downloaded tarballs (?); see the sketch after this list

  • Document the third pass
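
For the second_pass TODO item above, the dedupe-and-HEAD-metadata idea could look roughly like this; the sqlite schema and column choices are invented for illustration, and none of this exists in the repo yet:

import csv
import glob
import sqlite3
import requests

def unique_links():
    links = set()
    for path in glob.glob("links/*.tars.csv"):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                links.add(row["link"])
    return links

def record_metadata(db_path="tarballs.sqlite"):
    db = sqlite3.connect(db_path)
    db.execute("""CREATE TABLE IF NOT EXISTS tarballs
                  (url TEXT PRIMARY KEY, size INTEGER, etag TEXT)""")
    for url in unique_links():
        head = requests.head(url, allow_redirects=True)
        db.execute("INSERT OR REPLACE INTO tarballs VALUES (?, ?, ?)",
                   (url,
                    int(head.headers.get("Content-Length", 0)),
                    head.headers.get("ETag")))
    db.commit()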
