🚧"raw" pretrained smol_llama checkpoints - WIP 🚧
BEEspoke Data
community
AI & ML interests
'an LLM is only as good as the dataset it was trained on' - Sun Tzu
Organization Card
🐝📊💁
Collections: 7
spaces: 1
models: 48
BEE-spoke-data/tFINE-900m-e16-d32-flan-infinity-instruct-7m-T2T_en-1024-infinity-instruct-7m-T2T_en-1024-v2 • Text2Text Generation
BEE-spoke-data/tFINE-900m-e16-d32-instruct • Text2Text Generation • 56
BEE-spoke-data/tFINE-900m-e16-d32-flan • Text2Text Generation • 62
BEE-spoke-data/slimpajama_tok-48128-BPE-forT5
BEE-spoke-data/claude-tokenizer-forT5
BEE-spoke-data/Meta-Llama-3-8Bee • Text Generation • 36
BEE-spoke-data/MiniTokenizer-20480
BEE-spoke-data/BeeTokenizer • 1
BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu • Text Generation • 31 • 1
BEE-spoke-data/Mistral-7B-v0.3-stepbasin-books-20k • Text Generation • 5
datasets: 60
BEE-spoke-data/smollm-corpus-python • 12.4M • 149
BEE-spoke-data/flan-v2-hf • 819M • 9
BEE-spoke-data/the-stack-smol-xs-all • 8.7k • 4
BEE-spoke-data/the-stack-smol-xs-scored-and-annotated-python • 100 • 2
BEE-spoke-data/upvoteweb-posts • 45.9M • 8
BEE-spoke-data/napierone-pdf-raw • 18.5k • 5
BEE-spoke-data/fineweb-1000_64k • 2k • 15 • 2
BEE-spoke-data/govdocs1-image • 199k • 12
BEE-spoke-data/sarcasm-scrolls • 8.76k • 2 • 1
BEE-spoke-data/fineweb-edu-10BT-mincols • 9.67M • 4 • 1