fffiloni (Sylvain Filoni)

upvoted an article 6 days ago

Article

"Diffusers Image Fill" guide

7 days ago

• 19

upvoted 3 papers 7 days ago

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published 9 days ago • 51

Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models

Paper • 2409.07452 • Published 8 days ago • 18

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

Paper • 2409.07450 • Published 8 days ago • 10

upvoted 6 papers 11 days ago

FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation

Paper • 2409.02245 • Published 16 days ago • 9

Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation

Paper • 2409.03718 • Published 14 days ago • 24

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

Paper • 2409.01322 • Published 17 days ago • 94

upvoted a paper 15 days ago

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published 16 days ago • 84

upvoted a paper 16 days ago

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper • 2409.02095 • Published 16 days ago • 32

upvoted 2 papers 21 days ago

Kalman-Inspired Feature Propagation for Video Face Super-Resolution

Paper • 2408.05205 • Published Aug 9 • 8

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Paper • 2408.15239 • Published 23 days ago • 27

upvoted 2 papers 22 days ago

MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

Paper • 2408.14211 • Published 25 days ago • 8

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published 24 days ago • 119

upvoted 9 papers about 1 month ago

Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion

Paper • 2408.00458 • Published Aug 1 • 10

TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models

Paper • 2408.00735 • Published Aug 1 • 15

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 103

MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

Paper • 2408.01337 • Published Aug 2 • 10

TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

Paper • 2408.01291 • Published Aug 2 • 11

ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

Paper • 2408.03284 • Published Aug 6 • 9

Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation

Paper • 2408.03588 • Published Aug 7 • 6

Fast Sprite Decomposition from Animated Graphics

Paper • 2408.03923 • Published Aug 7 • 7

Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches

Paper • 2408.04567 • Published Aug 8 • 23

upvoted an article about 1 month ago

Article

A Complete Guide to Audio Datasets

Dec 15, 2022

• 16

upvoted 30 papers about 2 months ago

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Paper • 2403.14610 • Published Mar 21 • 3

Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

Paper • 2407.11398 • Published Jul 16 • 8

Kinetic Typography Diffusion Model

Paper • 2407.10476 • Published Jul 15 • 1

Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

Paper • 2407.19548 • Published Jul 28 • 22

Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models

Paper • 2407.19474 • Published Jul 28 • 22

Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture

Paper • 2407.19593 • Published Jul 28 • 12

Artist: Aesthetically Controllable Text-Driven Stylization without Training

Paper • 2407.15842 • Published Jul 22 • 13

AccDiffusion: An Accurate Method for Higher-Resolution Image Generation

Paper • 2407.10738 • Published Jul 15 • 3

DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors

Paper • 2407.16260 • Published Jul 23 • 1

SHIC: Shape-Image Correspondences with no Keypoint Supervision

Paper • 2407.18907 • Published Jul 26 • 38

Text2Place: Affordance-aware Text Guided Human Placement

Paper • 2407.15446 • Published Jul 22 • 2

BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

Paper • 2407.17952 • Published Jul 25 • 27

Floating No More: Object-Ground Reconstruction from a Single Image

Paper • 2407.18914 • Published Jul 26 • 18

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19 • 42

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

Paper • 2407.01494 • Published Jul 1 • 13

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published Jul 3 • 18

Video-to-Audio Generation with Hidden Alignment

Paper • 2407.07464 • Published Jul 10 • 16

Still-Moving: Customized Video Generation without Customized Video Data

Paper • 2407.08674 • Published Jul 11 • 11

Video Diffusion Alignment via Reward Gradients

Paper • 2407.08737 • Published Jul 11 • 47

Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity

Paper • 2407.10387 • Published Jul 15 • 6

IMAGDressing-v1: Customizable Virtual Dressing

Paper • 2407.12705 • Published Jul 17 • 12

The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation

Paper • 2407.12579 • Published Jul 17 • 1

Shape of Motion: 4D Reconstruction from a Single Video

Paper • 2407.13764 • Published Jul 18 • 19

Efficient Audio Captioning with Encoder-Level Knowledge Distillation

Paper • 2407.14329 • Published Jul 19 • 4

Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 22

LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding

Paper • 2407.15754 • Published Jul 22 • 19

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

Paper • 2407.15642 • Published Jul 22 • 10

MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

Paper • 2407.15060 • Published Jul 21 • 9

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

Paper • 2407.16655 • Published Jul 23 • 28

OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

Paper • 2407.16224 • Published Jul 23 • 23

upvoted 2 articles 3 months ago

Article

Image-based search engine

Jul 4

• 22

Article

How I train a LoRA: m3lt style training overview

Jul 1

• 45

upvoted a paper 3 months ago

Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer

Paper • 2403.13570 • Published Mar 20 • 3

upvoted an article 3 months ago

Article

Thoughts on LoRA Training #1

Jun 18

• 31

Sylvain Filoni

AI & ML interests

Articles

Breaking Barriers: The Critical Role of Art and Design in Advancing AI Capabilities

Organizations
