Imported from https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample

Take a drop from HuggingFace

main
Zach Nation 4 months ago
parent d68c812bdd
commit 25d063e495
13 changed files (0 B → 5.0 GiB)
  1. README.md (129)
  2. RedPajama-Data-1T-Sample.py (107)
  3. arxiv_sample.jsonl (3)
  4. book_sample.jsonl (3)
  5. c4_sample.jsonl (3)
  6. cc_2019-30_sample.jsonl (3)
  7. cc_2020-05_sample.jsonl (3)
  8. cc_2021-04_sample.jsonl (3)
  9. cc_2022-05_sample.jsonl (3)
  10. cc_2023-06_sample.jsonl (3)
  11. github_sample.jsonl (3)
  12. stackexchange_sample.jsonl (3)
  13. wikipedia_sample.jsonl (3)

README.md (0 B → 3.5 KiB)

RedPajama-Data-1T-Sample.py (0 B → 3.3 KiB)

arxiv_sample.jsonl (0 B → 89 MiB)

book_sample.jsonl (0 B → 105 MiB)

c4_sample.jsonl (0 B → 826 MiB)

cc_2019-30_sample.jsonl (0 B → 657 MiB)

cc_2020-05_sample.jsonl (0 B → 797 MiB)

cc_2021-04_sample.jsonl (0 B → 765 MiB)

cc_2022-05_sample.jsonl (0 B → 703 MiB)

cc_2023-06_sample.jsonl (0 B → 817 MiB)

github_sample.jsonl (0 B → 212 MiB)

stackexchange_sample.jsonl (0 B → 77 MiB)

wikipedia_sample.jsonl (0 B → 113 MiB)
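
As a quick sanity check, the per-file sizes above can be summed and compared with the 5.0 GiB total reported for the commit. The sketch below is illustrative only: the variable names are hypothetical, the sizes are copied from this listing at the rounding shown, and it assumes the page uses binary units (1 GiB = 1024 MiB).

```python
# Per-file sizes after the commit, in MiB, copied from the listing above.
sizes_mib = {
    "arxiv_sample.jsonl": 89,
    "book_sample.jsonl": 105,
    "c4_sample.jsonl": 826,
    "cc_2019-30_sample.jsonl": 657,
    "cc_2020-05_sample.jsonl": 797,
    "cc_2021-04_sample.jsonl": 765,
    "cc_2022-05_sample.jsonl": 703,
    "cc_2023-06_sample.jsonl": 817,
    "github_sample.jsonl": 212,
    "stackexchange_sample.jsonl": 77,
    "wikipedia_sample.jsonl": 113,
}

# README.md (3.5 KiB) and the loader script (3.3 KiB), converted to MiB.
small_files_mib = (3.5 + 3.3) / 1024

# Assumption: binary units, so 1 GiB = 1024 MiB.
total_mib = sum(sizes_mib.values()) + small_files_mib
total_gib = total_mib / 1024

print(f"{total_mib:.0f} MiB = {total_gib:.1f} GiB")
```

Under these assumptions the eleven `.jsonl` files account for essentially the whole total; the README and the loader script together add well under 0.01 MiB.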
