
Having fun using ChatGPT4 to build an archive of IRS PDF documents. Curious how well XetHub's default deduplication works on PDFs. 15+% deduplication feels pretty good!

initial commit

main
Rajat Arya 11 months ago
parent 0c4ca8efd0
commit 87a66fd759
4 changed files (0 B → 5.0 KiB)
  1. .gitignore (28 lines, 0 B → 287 B)
  2. README.md (105 lines, 0 B → 3.6 KiB)
  3. code/requirements.txt (7 lines, 0 B → 126 B)
  4. code/scraper.py (35 lines, 0 B → 1.0 KiB)
