Collection • DataGemma Release: A series of pioneering open models that help ground LLMs in real-world data through Data Commons • 2 items • 53 upvotes
Instruction Pre-Training: Language Models are Supervised Multitask Learners • Paper • arXiv:2406.14491 • Published Jun 20, 2024 • 85 upvotes
To Code, or Not To Code? Exploring Impact of Code in Pre-training • Paper • arXiv:2408.10914 • Published Aug 2024 • 40 upvotes
Article • Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models • Mar 20, 2024 • 58 upvotes
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism • Paper • arXiv:2407.10457 • Published Jul 15, 2024 • 22 upvotes
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated • Paper • arXiv:2407.10969 • Published Jul 15, 2024 • 20 upvotes
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs • Paper • arXiv:2407.10058 • Published Jul 14, 2024 • 29 upvotes
Better & Faster Large Language Models via Multi-token Prediction • Paper • arXiv:2404.19737 • Published Apr 30, 2024 • 73 upvotes
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems • Paper • arXiv:2407.01370 • Published Jul 1, 2024 • 84 upvotes
Article • Training and Finetuning Embedding Models with Sentence Transformers v3 • May 28, 2024 • 146 upvotes
Article • Expanding Model Context and Creating Chat Models with a Single Click • By maywell • Apr 28, 2024 • 37 upvotes
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • Paper • arXiv:2404.07143 • Published Apr 10, 2024 • 103 upvotes
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • Paper • arXiv:2402.17764 • Published Feb 27, 2024 • 590 upvotes
LLM in a flash: Efficient Large Language Model Inference with Limited Memory • Paper • arXiv:2312.11514 • Published Dec 12, 2023 • 256 upvotes
Memory Augmented Language Models through Mixture of Word Experts • Paper • arXiv:2311.10768 • Published Nov 15, 2023 • 16 upvotes
System 2 Attention (is something you might need too) • Paper • arXiv:2311.11829 • Published Nov 20, 2023 • 39 upvotes
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning • Paper • arXiv:2311.11501 • Published Nov 20, 2023 • 33 upvotes
Orca 2: Teaching Small Language Models How to Reason • Paper • arXiv:2311.11045 • Published Nov 18, 2023 • 70 upvotes
BitNet: Scaling 1-bit Transformers for Large Language Models • Paper • arXiv:2310.11453 • Published Oct 17, 2023 • 96 upvotes
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates • Paper • arXiv:2307.05695 • Published Jul 11, 2023 • 22 upvotes