Aya Datasets Collection The Aya Collection is a massive multilingual collection for over 100 languages consisting of 513 million instances of prompts and completions. • 5 items • Updated Jun 28 • 12
The first neural machine translation system for the Erzya language Paper • 2209.09368 • Published Sep 19, 2022 • 1
Seamless: Multilingual Expressive and Streaming Speech Translation Paper • 2312.05187 • Published Dec 8, 2023 • 10
WebInstruct 🌐 Embeddings 🧱 Models Collection A collection of SoTA embedding models fine-tuned on the WebInstruct dataset to learn to pair instructions with their responses • 3 items • Updated 15 days ago • 11
Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer Paper • 2404.04042 • Published Apr 5 • 1
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 96
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12 • 61
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20 • 85
GoLLIE Collection We present GoLLIE, a Large Language Model trained to follow annotation guidelines that outperforms previous approaches on zero-shot IE. • 4 items • Updated Mar 11 • 17
GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer Paper • 2311.08526 • Published Nov 14, 2023 • 9
xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection Paper • 2310.10482 • Published Oct 16, 2023 • 1
AXOLOTL'24 Shared Task on Multilingual Explainable Semantic Change Modeling Paper • 2407.04079 • Published Jul 4 • 1
BM25S: Orders of magnitude faster lexical search via eager sparse scoring Paper • 2407.03618 • Published Jul 4 • 10
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs Paper • 2407.02552 • Published Jul 2 • 4