Sylvain Filoni
fffiloni
AI & ML interests
ML for Animation • Alumni Arts Déco Paris
Articles
Organizations
fffiloni's activity
upvoted
an
article
6 days ago
Article
"Diffusers Image Fill" guide
•
19
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Paper
•
2409.06666
•
Published
•
51
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
Paper
•
2409.07452
•
Published
•
18
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos
Paper
•
2409.07450
•
Published
•
10
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper
•
2409.02097
•
Published
•
31
FLUX that Plays Music
Paper
•
2409.00587
•
Published
•
31
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges
Paper
•
2409.01071
•
Published
•
26
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation
Paper
•
2409.02245
•
Published
•
9
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
Paper
•
2409.03718
•
Published
•
24
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing
Paper
•
2409.01322
•
Published
•
94
upvoted
a
paper
15 days ago
upvoted
a
paper
16 days ago
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion
Paper
•
2408.00458
•
Published
•
10
TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models
Paper
•
2408.00735
•
Published
•
15
SAM 2: Segment Anything in Images and Videos
Paper
•
2408.00714
•
Published
•
103
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
Paper
•
2408.01337
•
Published
•
10
TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling
Paper
•
2408.01291
•
Published
•
11
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
Paper
•
2408.03284
•
Published
•
9
Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation
Paper
•
2408.03588
•
Published
•
6
Fast Sprite Decomposition from Animated Graphics
Paper
•
2408.03923
•
Published
•
7
Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches
Paper
•
2408.04567
•
Published
•
23
upvoted
an
article
about 1 month ago
Article
A Complete Guide to Audio Datasets
•
16
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Paper
•
2403.14610
•
Published
•
3
Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
Paper
•
2407.11398
•
Published
•
8
Kinetic Typography Diffusion Model
Paper
•
2407.10476
•
Published
•
1
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
Paper
•
2407.19548
•
Published
•
22
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models
Paper
•
2407.19474
•
Published
•
22
Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
Paper
•
2407.19593
•
Published
•
12
Artist: Aesthetically Controllable Text-Driven Stylization without Training
Paper
•
2407.15842
•
Published
•
13
AccDiffusion: An Accurate Method for Higher-Resolution Image Generation
Paper
•
2407.10738
•
Published
•
3
DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors
Paper
•
2407.16260
•
Published
•
1
SHIC: Shape-Image Correspondences with no Keypoint Supervision
Paper
•
2407.18907
•
Published
•
38
Text2Place: Affordance-aware Text Guided Human Placement
Paper
•
2407.15446
•
Published
•
2
BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation
Paper
•
2407.17952
•
Published
•
27
Floating No More: Object-Ground Reconstruction from a Single Image
Paper
•
2407.18914
•
Published
•
18
EVLM: An Efficient Vision-Language Model for Visual Understanding
Paper
•
2407.14177
•
Published
•
42
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Paper
•
2407.01494
•
Published
•
13
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Paper
•
2407.02869
•
Published
•
18
Video-to-Audio Generation with Hidden Alignment
Paper
•
2407.07464
•
Published
•
16
Still-Moving: Customized Video Generation without Customized Video Data
Paper
•
2407.08674
•
Published
•
11
Video Diffusion Alignment via Reward Gradients
Paper
•
2407.08737
•
Published
•
47
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Paper
•
2407.10387
•
Published
•
6
IMAGDressing-v1: Customizable Virtual Dressing
Paper
•
2407.12705
•
Published
•
12
The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation
Paper
•
2407.12579
•
Published
•
1
Shape of Motion: 4D Reconstruction from a Single Video
Paper
•
2407.13764
•
Published
•
19
Efficient Audio Captioning with Encoder-Level Knowledge Distillation
Paper
•
2407.14329
•
Published
•
4
Stable Audio Open
Paper
•
2407.14358
•
Published
•
22
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding
Paper
•
2407.15754
•
Published
•
19
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
Paper
•
2407.15642
•
Published
•
10
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation
Paper
•
2407.15060
•
Published
•
9
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
Paper
•
2407.16655
•
Published
•
28
OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person
Paper
•
2407.16224
•
Published
•
23
Article
Image-based search engine
•
22
Article
How I train a LoRA: m3lt style training overview
•
45
upvoted
a
paper
3 months ago
upvoted
an
article
3 months ago
Article
Thoughts on LoRA Training #1
•
31