xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 96
TraDiffusion: Trajectory-Based Training-Free Image Generation Paper • 2408.09739 • Published Aug 19 • 7
Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges Paper • 2408.08946 • Published Aug 16 • 9
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data Paper • 2408.10119 • Published Aug 19 • 15
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views Paper • 2408.10195 • Published Aug 19 • 12
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model Paper • 2408.10198 • Published Aug 19 • 32
MambaEVT: Event Stream based Visual Object Tracking using State Space Model Paper • 2408.10487 • Published Aug 20 • 5
Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model Paper • 2408.10764 • Published about 1 month ago • 7
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published about 1 month ago • 10
NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency Paper • 2408.11054 • Published about 1 month ago • 10
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning Paper • 2408.11001 • Published about 1 month ago • 11
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published about 1 month ago • 54
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering Paper • 2408.09174 • Published Aug 17 • 51
Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification Paper • 2408.11237 • Published about 1 month ago • 4
Backward-Compatible Aligned Representations via an Orthogonal Transformation Layer Paper • 2408.08793 • Published Aug 16 • 4
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting Paper • 2408.11706 • Published 30 days ago • 5
TrackGo: A Flexible and Efficient Method for Controllable Video Generation Paper • 2408.11475 • Published 30 days ago • 16
GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models Paper • 2408.11817 • Published 29 days ago • 7
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published 30 days ago • 53
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models Paper • 2408.11318 • Published about 1 month ago • 54
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese Paper • 2408.12480 • Published 29 days ago • 13
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design Paper • 2408.12503 • Published 29 days ago • 20
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications Paper • 2408.11878 • Published about 1 month ago • 48
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments Paper • 2408.10945 • Published about 1 month ago • 6
Memory-Efficient LLM Training with Online Subspace Descent Paper • 2408.12857 • Published 28 days ago • 10
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time Paper • 2408.13233 • Published 28 days ago • 20
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper • 2408.13257 • Published 27 days ago • 25
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published 29 days ago • 109
NanoFlow: Towards Optimal Large Language Model Serving Throughput Paper • 2408.12757 • Published 28 days ago • 15
TVG: A Training-free Transition Video Generation Method with Diffusion Models Paper • 2408.13413 • Published 27 days ago • 13
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published 27 days ago • 21
Training-free Long Video Generation with Chain of Diffusion Model Experts Paper • 2408.13423 • Published 27 days ago • 19
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences Paper • 2408.14468 • Published 24 days ago • 33
LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs Paper • 2408.13467 • Published 27 days ago • 23
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher Paper • 2408.14176 • Published 25 days ago • 58
DSTI at LLMs4OL 2024 Task A: Intrinsic versus extrinsic knowledge for type classification Paper • 2408.14236 • Published 25 days ago • 3
Text2SQL is Not Enough: Unifying AI and Databases with TAG Paper • 2408.14717 • Published 24 days ago • 23
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation Paper • 2408.15239 • Published 23 days ago • 27
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published 23 days ago • 36
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts Paper • 2408.15664 • Published 23 days ago • 11
Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature Paper • 2408.15836 • Published 23 days ago • 11
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Paper • 2408.15915 • Published 23 days ago • 19
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published 23 days ago • 20
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models Paper • 2408.15518 • Published 23 days ago • 41
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline Paper • 2408.15079 • Published 24 days ago • 51
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published 22 days ago • 81
StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements Paper • 2408.15666 • Published 23 days ago • 9
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs Paper • 2407.02485 • Published Jul 2 • 5
Life Science, Health and Medical Datasets for ML Collection • A collection of datasets for the medical domain • 4 items • Updated Jun 24 • 1
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20 • 85
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 250