Welcome FalconMamba: The first strong attention-free 7B model
Exciting release!
Hi @Jenish-23,
For running AWQ models with HF transformers, please refer to this documentation section: https://huggingface.co./docs/transformers/quantization#awq
For AWQ + LoRA, you just need to load an AWQ base model with HF transformers and apply LoRA as usual, with no code changes. Make sure to install transformers from source for that.
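In HF terms this is just `from_pretrained` on the AWQ checkpoint followed by `peft`'s `get_peft_model`. As a library-free sketch of what the LoRA adapter adds on top of the frozen (quantized) base weight (toy sizes, illustrative values):

```python
import random

# LoRA replaces a frozen base projection W x with W x + (alpha / r) * B A x,
# where A is (r x d_in, small random init) and B is (d_out x r, zero init),
# so training starts exactly from the base model's behavior.
random.seed(0)
d_in, d_out, r, alpha = 4, 3, 2, 8

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]           # zero init: adapter starts as a no-op

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(x):
    base = matvec(W, x)                          # frozen (quantized) base path
    delta = matvec(B, matvec(A, x))              # trainable low-rank update path
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

x = [1.0, 2.0, 3.0, 4.0]
out = lora_forward(x)
```

Since B is zero-initialized, the adapted forward pass matches the base model until training updates B.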
Hmm, interesting. Can you try generating some text with sampling methods?
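With transformers this would just be `model.generate(..., do_sample=True, top_p=..., temperature=...)`. As a library-free sketch of what temperature plus nucleus (top-p) sampling does to a logit vector (the logit values below are made up):

```python
import math
import random

def top_p_sample(logits, temperature=0.7, top_p=0.9, rng=random.Random(0)):
    """Sample a token index: temperature-scaled softmax, truncated to the
    smallest set of tokens whose cumulative probability reaches top_p."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]     # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]

    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:                              # nucleus truncation
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    mass = sum(probs[i] for i in kept)
    weights = [probs[i] / mass for i in kept]    # renormalize inside the nucleus
    return rng.choices(kept, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.2, -1.0, -3.0]             # toy vocabulary of 5 tokens
token = top_p_sample(logits)
```

Lower temperature sharpens the distribution; a smaller top-p keeps fewer candidate tokens.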
Hi!
I think NEFTune should be supported out of the box, as you just need to pass the correct argument, neftune_noise_alpha, in TrainingArguments, right?
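For reference, here is a library-free sketch of what that knob controls: NEFTune adds uniform noise in [-1, 1] to the input embeddings, scaled by alpha / sqrt(L * d) for sequence length L and embedding dimension d (the toy shapes below are illustrative):

```python
import math
import random

def neftune_noise(embeddings, alpha, rng=random.Random(0)):
    """Add NEFTune-style noise to a [seq_len][dim] embedding matrix:
    uniform noise in [-1, 1] scaled by alpha / sqrt(seq_len * dim)."""
    seq_len, dim = len(embeddings), len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[e + scale * rng.uniform(-1.0, 1.0) for e in row] for row in embeddings]

emb = [[0.0] * 16 for _ in range(8)]       # toy 8-token, 16-dim embeddings
noisy = neftune_noise(emb, alpha=5.0)      # alpha plays the role of neftune_noise_alpha
bound = 5.0 / math.sqrt(8 * 16)            # every noise entry stays within this bound
```

The longer the sequence and the wider the embeddings, the smaller the per-coordinate noise, which keeps the total perturbation magnitude roughly constant.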
Great work!
Very nice demo!!