Collection • DataGemma Release: A series of pioneering open models that help ground LLMs in real-world data through Data Commons • 2 items • 53 upvotes
Instruction Pre-Training: Language Models are Supervised Multitask Learners • Paper • arXiv:2406.14491 • Published Jun 20, 2024 • 85 upvotes
To Code, or Not To Code? Exploring Impact of Code in Pre-training • Paper • arXiv:2408.10914 • Published Aug 2024 • 40 upvotes
Article • Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models • Mar 20, 2024 • 58 upvotes
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism • Paper • arXiv:2407.10457 • Published Jul 15, 2024 • 22 upvotes
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated • Paper • arXiv:2407.10969 • Published Jul 15, 2024 • 20 upvotes
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs • Paper • arXiv:2407.10058 • Published Jul 14, 2024 • 29 upvotes
Better & Faster Large Language Models via Multi-token Prediction • Paper • arXiv:2404.19737 • Published Apr 30, 2024 • 73 upvotes
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems • Paper • arXiv:2407.01370 • Published Jul 1, 2024 • 84 upvotes
Article • Training and Finetuning Embedding Models with Sentence Transformers v3 • May 28, 2024 • 146 upvotes
Article • Expanding Model Context and Creating Chat Models with a Single Click • By maywell • Apr 28, 2024 • 37 upvotes
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • Paper • arXiv:2404.07143 • Published Apr 10, 2024 • 103 upvotes
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • Paper • arXiv:2402.17764 • Published Feb 27, 2024 • 590 upvotes
LLM in a flash: Efficient Large Language Model Inference with Limited Memory • Paper • arXiv:2312.11514 • Published Dec 12, 2023 • 256 upvotes
Memory Augmented Language Models through Mixture of Word Experts • Paper • arXiv:2311.10768 • Published Nov 15, 2023 • 16 upvotes
System 2 Attention (is something you might need too) • Paper • arXiv:2311.11829 • Published Nov 20, 2023 • 39 upvotes
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning • Paper • arXiv:2311.11501 • Published Nov 20, 2023 • 33 upvotes
Orca 2: Teaching Small Language Models How to Reason • Paper • arXiv:2311.11045 • Published Nov 18, 2023 • 70 upvotes
BitNet: Scaling 1-bit Transformers for Large Language Models • Paper • arXiv:2310.11453 • Published Oct 17, 2023 • 96 upvotes
Stack More Layers Differently: High-Rank Training Through Low-Rank Updates • Paper • arXiv:2307.05695 • Published Jul 11, 2023 • 22 upvotes