AK

akhaliq

_akhaliq

akhaliq's activity

commented 3 papers about 1 hour ago

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Paper • 2409.12961 • Published about 10 hours ago • 5 •

LVCD: Reference-based Lineart Video Colorization with Diffusion Models

Paper • 2409.12960 • Published about 10 hours ago •

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

Paper • 2409.12957 • Published about 10 hours ago • 4 •

commented 5 papers about 2 hours ago

3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt

Paper • 2409.12892 • Published about 11 hours ago •

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

Paper • 2409.12576 • Published about 19 hours ago • 2 •

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12568 • Published about 19 hours ago • 9 •

FlexiTex: Enhancing Texture Generation with Visual Guidance

Paper • 2409.12431 • Published 1 day ago •

Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation

Paper • 2409.12532 • Published about 20 hours ago •

commented 4 papers 1 day ago

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Paper • 2409.12183 • Published 1 day ago • 18 •

LLMs + Persona-Plug = Personalized LLMs

Paper • 2409.11901 • Published 1 day ago • 20 •

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published 1 day ago • 45 •

Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models

Paper • 2409.12139 • Published 1 day ago • 9 •

commented 10 papers 2 days ago

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published 2 days ago • 24 •

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

Paper • 2409.10819 • Published 3 days ago • 11 •

Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published 2 days ago • 19 •

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published 2 days ago • 47 •

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published 2 days ago • 55 •

SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction

Paper • 2409.11211 • Published 3 days ago • 6 •

Agile Continuous Jumping in Discontinuous Terrains

Paper • 2409.10923 • Published 3 days ago • 10 •

On the limits of agency in agent-based models

Paper • 2409.10568 • Published 6 days ago • 11 •

OSV: One Step is Enough for High-Quality Image to Video Generation

Paper • 2409.11367 • Published 2 days ago • 11 •

Kolmogorov-Arnold Transformer

Paper • 2409.10594 • Published 3 days ago • 23 •

commented 3 papers 3 days ago

jina-embeddings-v3: Multilingual Embeddings With Task LoRA

Paper • 2409.10173 • Published 4 days ago • 15 •

Breaking reCAPTCHAv2

Paper • 2409.08831 • Published 7 days ago • 1 •

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Paper • 2409.09214 • Published 6 days ago • 38 •

commented 6 papers 4 days ago

A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis

Paper • 2409.08947 • Published 6 days ago • 11 •

InstantDrag: Improving Interactivity in Drag-based Image Editing

Paper • 2409.08857 • Published 7 days ago • 24 •

DrawingSpinUp: 3D Animation from Single Character Drawings

Paper • 2409.08615 • Published 7 days ago • 10 •

Apollo: Band-sequence Modeling for High-Quality Audio Restoration

Paper • 2409.08514 • Published 7 days ago • 5 •

Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection

Paper • 2409.08513 • Published 7 days ago • 8 •

Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

Paper • 2409.08353 • Published 7 days ago • 9 •

commented 4 papers 7 days ago

DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors

Paper • 2409.08278 • Published 7 days ago • 10 •

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published 7 days ago • 39 •

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

Paper • 2409.08239 • Published 7 days ago • 15 •

TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder

Paper • 2409.08248 • Published 7 days ago • 12 •

commented 7 papers 8 days ago

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

Paper • 2409.07450 • Published 8 days ago • 10 •

Generative Hierarchical Materials Search

Paper • 2409.06762 • Published 9 days ago • 6 •

Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering

Paper • 2409.07441 • Published 8 days ago • 8 •

MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis

Paper • 2409.07129 • Published 9 days ago • 7 •

gsplat: An Open-Source Library for Gaussian Splatting

Paper • 2409.06765 • Published 9 days ago • 11 •

Agent Workflow Memory

Paper • 2409.07429 • Published 8 days ago • 25 •

Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models

Paper • 2409.07452 • Published 8 days ago • 18 •

commented 3 papers 9 days ago

Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

Paper • 2409.06135 • Published 10 days ago • 14 •

SongCreator: Lyrics-based Universal Song Generation

Paper • 2409.06029 • Published 10 days ago • 19 •

SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

Paper • 2409.06633 • Published 9 days ago • 14 •

commented 2 papers 11 days ago

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

Paper • 2409.04410 • Published 13 days ago • 23 •

Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task

Paper • 2409.04005 • Published 14 days ago • 16 •

commented 4 papers 15 days ago

FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation

Paper • 2409.02245 • Published 16 days ago • 9 •

Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining

Paper • 2409.02326 • Published 16 days ago • 16 •

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Paper • 2409.02813 • Published 16 days ago • 27 •

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published 16 days ago • 84 •

commented 8 papers 16 days ago

Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

Paper • 2409.01055 • Published 18 days ago • 6 •

ContextCite: Attributing Model Generation to Context

Paper • 2409.00729 • Published 19 days ago • 13 •

Diffusion Policy Policy Optimization

Paper • 2409.00588 • Published 19 days ago • 19 •

FLUX that Plays Music

Paper • 2409.00587 • Published 19 days ago • 31 •

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

Paper • 2409.00492 • Published 19 days ago • 11 •

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper • 2409.02095 • Published 16 days ago • 32 •

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Paper • 2409.01199 • Published 18 days ago • 10 •

Compositional 3D-aware Video Generation with LLM Director

Paper • 2409.00558 • Published 19 days ago • 14 •

commented a paper 18 days ago

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Paper • 2408.17131 • Published 21 days ago • 10 •