Imported from https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample
parent d68c812bdd
commit 25d063e495
13 changed files (0 B → 5.0 GiB)
README.md (0 B → 3.5 KiB)
RedPajama-Data-1T-Sample.py (0 B → 3.3 KiB)
arxiv_sample.jsonl (0 B → 89 MiB)
book_sample.jsonl (0 B → 105 MiB)
c4_sample.jsonl (0 B → 826 MiB)
cc_2019-30_sample.jsonl (0 B → 657 MiB)
cc_2020-05_sample.jsonl (0 B → 797 MiB)
cc_2021-04_sample.jsonl (0 B → 765 MiB)
cc_2022-05_sample.jsonl (0 B → 703 MiB)
cc_2023-06_sample.jsonl (0 B → 817 MiB)
github_sample.jsonl (0 B → 212 MiB)
stackexchange_sample.jsonl (0 B → 77 MiB)
wikipedia_sample.jsonl (0 B → 113 MiB)