Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution • arXiv 2409.12961
LVCD: Reference-based Lineart Video Colorization with Diffusion Models • arXiv 2409.12960
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion • arXiv 2409.12957
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt • arXiv 2409.12892
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation • arXiv 2409.12576
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning • arXiv 2409.12568
FlexiTex: Enhancing Texture Generation with Visual Guidance • arXiv 2409.12431
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation • arXiv 2409.12532
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning • arXiv 2409.12183
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution • arXiv 2409.12191
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models • arXiv 2409.12139
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think • arXiv 2409.11355
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer • arXiv 2409.10819
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion • arXiv 2409.11406
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction • arXiv 2409.11211
Agile Continuous Jumping in Discontinuous Terrains • arXiv 2409.10923
OSV: One Step is Enough for High-Quality Image to Video Generation • arXiv 2409.11367
jina-embeddings-v3: Multilingual Embeddings With Task LoRA • arXiv 2409.10173
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation • arXiv 2409.09214
A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis • arXiv 2409.08947
InstantDrag: Improving Interactivity in Drag-based Image Editing • arXiv 2409.08857
DrawingSpinUp: 3D Animation from Single Character Drawings • arXiv 2409.08615
Apollo: Band-sequence Modeling for High-Quality Audio Restoration • arXiv 2409.08514
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection • arXiv 2409.08513
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos • arXiv 2409.08353
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors • arXiv 2409.08278
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale • arXiv 2409.08264
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources • arXiv 2409.08239
TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder • arXiv 2409.08248
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos • arXiv 2409.07450
Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering • arXiv 2409.07441
MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis • arXiv 2409.07129
gsplat: An Open-Source Library for Gaussian Splatting • arXiv 2409.06765
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models • arXiv 2409.07452
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis • arXiv 2409.06135
SongCreator: Lyrics-based Universal Song Generation • arXiv 2409.06029
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation • arXiv 2409.06633
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation • arXiv 2409.04410
Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task • arXiv 2409.04005
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation • arXiv 2409.02245
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining • arXiv 2409.02326
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark • arXiv 2409.02813
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency • arXiv 2409.02634
Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation • arXiv 2409.01055
ContextCite: Attributing Model Generation to Context • arXiv 2409.00729
Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization • arXiv 2409.00492
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos • arXiv 2409.02095
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model • arXiv 2409.01199
Compositional 3D-aware Video Generation with LLM Director • arXiv 2409.00558
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers • arXiv 2408.17131