Hugging Face dropped SmolLM 🤏
> Beats MobileLLM, Qwen 0.5B, Phi 1.5B, and more!
> 135M, 360M, and 1.7B parameter checkpoints
> Trained on 600B high-quality synthetic + FineWeb-Edu tokens
> Architecture: Llama + GQA + 2048 context length
> Ripe for fine-tuning and on-device deployment
> Works out of the box with Transformers!
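The architecture line above mentions grouped-query attention (GQA), where several query heads share a single key/value head to shrink the KV cache. A minimal NumPy sketch of the idea, with illustrative dimensions that are not SmolLM's actual config:

```python
import numpy as np

def gqa(x, Wq, Wk, Wv, n_q_heads, n_kv_heads):
    """Grouped-query attention: each group of query heads attends
    using one shared K/V head (dims here are illustrative only)."""
    seq, d_model = x.shape
    d_head = d_model // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads per shared K/V head

    q = (x @ Wq).reshape(seq, n_q_heads, d_head)
    k = (x @ Wk).reshape(seq, n_kv_heads, d_head)  # fewer K heads than Q heads
    v = (x @ Wv).reshape(seq, n_kv_heads, d_head)

    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group  # map query head -> its shared K/V head
        scores = q[:, h] @ k[:, kv].T / np.sqrt(d_head)
        scores -= scores.max(axis=-1, keepdims=True)  # stable softmax
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        out[:, h] = w @ v[:, kv]
    return out.reshape(seq, d_model)

rng = np.random.default_rng(0)
seq, d_model, n_q, n_kv = 8, 64, 8, 2  # 4 query heads share each K/V head
d_head = d_model // n_q
x = rng.standard_normal((seq, d_model))
Wq = rng.standard_normal((d_model, d_model))
Wk = rng.standard_normal((d_model, n_kv * d_head))  # smaller K/V projections
Wv = rng.standard_normal((d_model, n_kv * d_head))
y = gqa(x, Wq, Wk, Wv, n_q, n_kv)
print(y.shape)  # (8, 64)
```

With `n_kv = 2` instead of 8, the K/V projections (and the KV cache at inference time) are 4x smaller, which is what makes GQA attractive for small on-device models.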
Mistral released Mathstral 7B ∑
> 56.6% on MATH and 63.47% on MMLU
> Same architecture as Mistral 7B
> Works out of the box with Transformers & llama.cpp
> Released under the Apache 2.0 license
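"Works out of the box with Transformers" means the standard causal-LM loading path applies. A minimal sketch, assuming the Hub id `mistralai/Mathstral-7B-v0.1` (check the model card for the exact id) and enough GPU/CPU memory for a 7B model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub id assumed from the release announcement -- verify on the model card.
MODEL_ID = "mistralai/Mathstral-7B-v0.1"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load Mathstral and answer a single math prompt.
    A 7B model needs a GPU or roughly 15 GB of RAM in fp16."""
    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("What is the derivative of x^3 + 2x?"))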