Aleksei Dorkin PRO

adorkin · slowwavesleep

AI & ML interests

Computational Linguistics

Organizations

adorkin's activity

upvoted a collection 1 day ago

Aya Datasets

The Aya Collection is a massive multilingual collection for over 100 languages consisting of 513 million instances of prompts and completions. • 5 items • Updated Jun 28 • 12

upvoted 2 papers 9 days ago

The first neural machine translation system for the Erzya language

Paper • 2209.09368 • Published Sep 19, 2022 • 1

Seamless: Multilingual Expressive and Streaming Speech Translation

Paper • 2312.05187 • Published Dec 8, 2023 • 10

upvoted 2 collections 10 days ago

WebInstruct 🌐 Embeddings 🧱 Models

A collection of SoTA embedding models fine-tuned on the WebInstruct dataset to learn to pair instructions with their responses • 3 items • Updated 15 days ago • 11

Zero-shot Segmentation

6 items • Updated 11 days ago • 2

upvoted a paper 13 days ago

Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer

Paper • 2404.04042 • Published Apr 5 • 1

upvoted a paper about 1 month ago

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 96

upvoted a paper about 2 months ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12 • 61

upvoted a collection about 2 months ago

SAM2

All the models and demos for SAM2 • 8 items • Updated Aug 2 • 11

upvoted a paper about 2 months ago

Instruction Pre-Training: Language Models are Supervised Multitask Learners

Paper • 2406.14491 • Published Jun 20 • 85

upvoted a collection about 2 months ago

GoLLIE

We present GoLLIE, a Large Language Model trained to follow annotation guidelines that outperforms previous approaches on zero-shot IE. • 4 items • Updated Mar 11 • 17

upvoted a paper about 2 months ago

GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer

Paper • 2311.08526 • Published Nov 14, 2023 • 9

upvoted a collection about 2 months ago

DCLM

DCLM Models + Datasets • 6 items • Updated Jul 18 • 23

upvoted an article 2 months ago

The Rise of Agentic Data Generation

Article • Published Jul 15 • 74

upvoted 4 papers 2 months ago

xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection

Paper • 2310.10482 • Published Oct 16, 2023 • 1

AXOLOTL'24 Shared Task on Multilingual Explainable Semantic Change Modeling

Paper • 2407.04079 • Published Jul 4 • 1

BM25S: Orders of magnitude faster lexical search via eager sparse scoring

Paper • 2407.03618 • Published Jul 4 • 10

RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

Paper • 2407.02552 • Published Jul 2 • 4

upvoted an article 5 months ago

How to Finetune phi-3 on MacBook Pro

Article • Published Apr 24 • 62