Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources Paper • 2409.08239 • Published 8 days ago • 15
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data Paper • 2409.03810 • Published 15 days ago • 29
Improving Hugging Face Training Efficiency Through Packing with Flash Attention Article • about 1 month ago • 19
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA Paper • 2409.02897 • Published 16 days ago • 42
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Paper • 2408.16725 • Published 22 days ago • 49
CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation Paper • 2408.14572 • Published 24 days ago • 7
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding Paper • 2408.15545 • Published 23 days ago • 32
In-Context Imitation Learning via Next-Token Prediction Paper • 2408.15980 • Published 23 days ago • 9
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published 24 days ago • 137
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published 30 days ago • 53
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 96
Heavy Labels Out! Dataset Distillation with Label Space Lightening Paper • 2408.08201 • Published Aug 15 • 17
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search Paper • 2408.08152 • Published Aug 15 • 51
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm Paper • 2408.08072 • Published Aug 15 • 31
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 114
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 Paper • 2408.05147 • Published Aug 9 • 36
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Paper • 2408.02718 • Published Aug 5 • 60
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31 • 73
ShieldGemma: Generative AI Content Moderation Based on Gemma Paper • 2407.21772 • Published Jul 31 • 13
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge Paper • 2407.19594 • Published Jul 28 • 19
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28 • 60
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? Paper • 2407.16607 • Published Jul 23 • 21
Adapting Large Language Models via Reading Comprehension Paper • 2309.09530 • Published Sep 18, 2023 • 75
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders Paper • 2407.14435 • Published Jul 19 • 6
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference Paper • 2407.14057 • Published Jul 19 • 41
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore Paper • 2407.12854 • Published Jul 9 • 29
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models Paper • 2407.01906 • Published Jul 2 • 34
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Paper • 2407.01370 • Published Jul 1 • 84
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Paper • 2407.00468 • Published Jun 29 • 35
Direct Preference Knowledge Distillation for Large Language Models Paper • 2406.19774 • Published Jun 28 • 21
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 93
Aligning Teacher with Student Preferences for Tailored Training Data Generation Paper • 2406.19227 • Published Jun 27 • 24
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 84
Efficient Continual Pre-training by Mitigating the Stability Gap Paper • 2406.14833 • Published Jun 21 • 19
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22 • 45
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges Paper • 2406.12624 • Published Jun 18 • 36
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published Jun 17 • 34
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published Jun 18 • 31
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 17 • 56
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation Paper • 2406.10996 • Published Jun 16 • 32
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Paper • 2406.11833 • Published Jun 17 • 61
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning Paper • 2406.08973 • Published Jun 13 • 85
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522 • Published Jun 11 • 36