Leaderboards & Arenas
Open Chinese LLM Leaderboard
Note By BAAI. The Open Chinese LLM Leaderboard tracks, ranks, and evaluates open Chinese large language models (LLMs). It is powered by the FlagEval platform, which provides the computational resources and runtime environment. The evaluation dataset consists entirely of Chinese data, so it assesses Chinese language proficiency.
Open VLM Leaderboard
VLMEvalKit Evaluation Results Collection
Note By OpenMMLab. The OpenVLM Leaderboard evaluates and ranks 62 Vision-Language Models (VLMs) across 23 multi-modal benchmarks using VLMEvalKit, featuring only open-source models or models with publicly available APIs.
OpenCompass LLM Leaderboard
Note By Shanghai AI Lab. An LLM leaderboard for Chinese models that covers many metric axes; very comprehensive.
EvalCrafter
Note By Tencent AI. Text-to-video generation leaderboard.
GenAI Arena
Realtime Image/Video Gen AI Arena
Note By Tiger Lab. An arena for image generation!
SeaExam Leaderboard
Note By Alibaba - DAMO. Southeast Asian (SEA) languages leaderboard.
AIR-Bench Leaderboard
Note By Jina AI and BAAI. A new benchmark that focuses on fair out-of-domain evaluation for RAG and neural IR.
Science Leaderboard
Leaderboard for LLM for Science Reasoning
Note By Tiger Lab. Leaderboard for science reasoning.
VBench Leaderboard
Note By Shanghai AI Lab. Leaderboard for video generative models.
- CompassArena
- FLARE
- Vision Arena (Testing VLMs side-by-side)
- ChronoMagic Bench
  A Benchmark for Metamorphic Evaluation of T2V Generation
- TempCompass
- MJ Bench Leaderboard
- MM-Vet v2 Evaluator
- K-Sort Arena
  Efficient Image/Video K-Sort Arena
- Salad Bench Leaderboard
- MLLMGuardLeaderboard