Having fun with ChatGPT4 to build an archive of IRS PDF documents. Curious about XetHub's default deduplication over PDFs; 15+% deduplication feels pretty good!

Fixes to avoid redownloading, still super basic scraping

main
Rajat Arya 12 months ago
parent 87a66fd759
commit ed6cb29018
1 changed file (1.0 KiB → 1.2 KiB)
  1. code/scraper.py (12)

code/scraper.py (1.0 KiB → 1.2 KiB)
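
The commit message mentions a fix to avoid redownloading. As a rough illustration of that idea only, and not the actual contents of code/scraper.py, a scraper could skip any URL whose target file already exists on disk. Everything named here (`fetch_bytes`, `download_if_missing`, the `.part` suffix, the injectable `fetch` parameter) is a hypothetical choice made for this sketch.

```python
import os
import urllib.request
from urllib.parse import urlparse


def fetch_bytes(url):
    """Download a URL and return its body as bytes (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_if_missing(url, dest_dir, fetch=fetch_bytes):
    """Save url into dest_dir unless a non-empty file of that name exists.

    Returns (path, downloaded); downloaded is False when the file was skipped.
    This is an assumed approach, not the repository's real implementation.
    """
    os.makedirs(dest_dir, exist_ok=True)
    filename = os.path.basename(urlparse(url).path)
    if not filename:
        raise ValueError(f"cannot derive a file name from {url!r}")
    path = os.path.join(dest_dir, filename)

    # Skip the download when a previous run already stored this file.
    if os.path.exists(path) and os.path.getsize(path) > 0:
        return path, False

    data = fetch(url)
    # Write to a temporary name first so an interrupted run never leaves
    # a partial file that a later run would mistake for a finished download.
    tmp_path = path + ".part"
    with open(tmp_path, "wb") as handle:
        handle.write(data)
    os.replace(tmp_path, path)
    return path, True
```

Calling `download_if_missing(url, "pdfs")` twice with the same URL would hit the network only on the first call. Checking by file name is the simplest option; it will not notice when a document is updated upstream under the same name.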
