chanmuzi (Chan Kim)

upvoted a paper 6 days ago

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

Paper • 2409.08239 • Published 8 days ago • 15

upvoted a collection 8 days ago

Solar Pro

Collection

The most intelligent LLM on a single GPU • 3 items • Updated 7 days ago • 8

upvoted a paper 10 days ago

How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data

Paper • 2409.03810 • Published 15 days ago • 29

upvoted an article 11 days ago

Article

Improving Hugging Face Training Efficiency Through Packing with Flash Attention

about 1 month ago

• 19

upvoted a paper 13 days ago

Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published 14 days ago • 83

upvoted a paper 14 days ago

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA

Paper • 2409.02897 • Published 16 days ago • 42

upvoted a paper 16 days ago

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Paper • 2408.16725 • Published 22 days ago • 49

upvoted 2 papers 17 days ago

CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation

Paper • 2408.14572 • Published 24 days ago • 7

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published 23 days ago • 32

upvoted a paper 21 days ago

In-Context Imitation Learning via Next-Token Prediction

Paper • 2408.15980 • Published 23 days ago • 9

upvoted a paper 22 days ago

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published 24 days ago • 137

upvoted a paper 26 days ago

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

Paper • 2408.12570 • Published 29 days ago • 29

upvoted a paper 28 days ago

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published 30 days ago • 53

upvoted 11 papers about 1 month ago

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 96

Heavy Labels Out! Dataset Distillation with Label Space Lightening

Paper • 2408.08201 • Published Aug 15 • 17

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Paper • 2408.08152 • Published Aug 15 • 51

I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

Paper • 2408.08072 • Published Aug 15 • 31

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5 • 60

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5 • 37

Self-Taught Evaluators

Paper • 2408.02666 • Published Aug 5 • 22

upvoted 9 papers about 2 months ago

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 103

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31 • 73

ShieldGemma: Generative AI Content Moderation Based on Gemma

Paper • 2407.21772 • Published Jul 31 • 13

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 102

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Paper • 2407.19594 • Published Jul 28 • 19

SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain

Paper • 2407.19584 • Published Jul 28 • 60

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

Paper • 2407.16607 • Published Jul 23 • 21

Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 75

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

Paper • 2407.14435 • Published Jul 19 • 6

upvoted 2 papers 2 months ago

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

Paper • 2407.14057 • Published Jul 19 • 41

Scaling Retrieval-Based Language Models with a Trillion-Token Datastore

Paper • 2407.12854 • Published Jul 9 • 29

upvoted an article 2 months ago

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16

• 242

upvoted 3 papers 2 months ago

Qwen2 Technical Report

Paper • 2407.10671 • Published Jul 15 • 153

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 64

Vision language models are blind

Paper • 2407.06581 • Published Jul 9 • 80

upvoted 21 papers 3 months ago

Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

Paper • 2407.01906 • Published Jul 2 • 34

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Paper • 2407.01370 • Published Jul 1 • 84

MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

Paper • 2407.00468 • Published Jun 29 • 35

Direct Preference Knowledge Distillation for Large Language Models

Paper • 2406.19774 • Published Jun 28 • 21

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28 • 93

Aligning Teacher with Student Preferences for Tailored Training Data Generation

Paper • 2406.19227 • Published Jun 27 • 24

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25 • 84

Unlocking Continual Learning Abilities in Language Models

Paper • 2406.17245 • Published Jun 25 • 28

Efficient Continual Pre-training by Mitigating the Stability Gap

Paper • 2406.14833 • Published Jun 21 • 19

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Paper • 2406.15877 • Published Jun 22 • 45

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

Paper • 2406.12624 • Published Jun 18 • 36

Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models

Paper • 2406.11230 • Published Jun 17 • 34

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

Paper • 2406.12793 • Published Jun 18 • 31

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17 • 56

THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

Paper • 2406.10996 • Published Jun 16 • 32

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Paper • 2406.11833 • Published Jun 17 • 61

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Paper • 2406.08973 • Published Jun 13 • 85

Needle In A Multimodal Haystack

Paper • 2406.07230 • Published Jun 11 • 52

Transformers meet Neural Algorithmic Reasoners

Paper • 2406.09308 • Published Jun 13 • 43

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

Paper • 2406.07522 • Published Jun 11 • 36

Are We Done with MMLU?

Paper • 2406.04127 • Published Jun 6 • 37

Chan Kim

AI & ML interests

Organizations

chanmuzi's activity

Improving Hugging Face Training Efficiency Through Packing with Flash Attention

SmolLM - blazingly fast and remarkably powerful