Articles by GaggiX
55

FLUX.2 [Klein]: Towards Interactive Visual Intelligence (bfl.ai)

1

OmniPSD: Layered PSD Generation with Diffusion Transformer (showlab.github.io)

2

NaTex: Seamless Texture Generation as Latent Color Diffusion (natex-ldm.github.io)

1

Back to Basics: Let Denoising Generative Models Denoise (arxiv.org)

2

Veo 3.1 and new creative capabilities in the Gemini API (googleblog.com)

59

Linus Learns Analog Circuits (github.com/torvalds)

2

UnifoLM-WMA-0: A World-Model-Action (WMA) Framework Under UnifoLM Family (unigen-x.github.io)

73

GLM-4.5: Reasoning, Coding, and Agentic Abilities (z.ai)

3

A retro gaming YouTuber faces possible jail time for reviewing gaming handhelds (androidauthority.com)

2

From Bytes to Ideas: Language Modeling with Autoregressive U-Nets (arxiv.org)

2

Sparse Representation and Construction for High-Resolution 3D Shapes Modeling (lizhihao6.github.io)

81

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation (lllyasviel.github.io)

2

Fiction.LiveBench: The First Long Context Benchmark for Writers (fiction.live)

1

ReCamMaster: Camera-Controlled Generative Rendering from a Single Video (jianhongbai.github.io)

1

Charting and Navigating Hugging Face's Model Atlas (horwitz.ai)

45

Block Diffusion: Interpolating between autoregressive and diffusion models (arxiv.org)

3

iGenius Releases Colosseum 355B (igenius.ai)

1

Framer: Interactive Frame Interpolation (aim-uofa.github.io)

1

VidPanos: Generative Panoramic Videos from Casual Panning Videos (vidpanos.github.io)

4

New OpenAI Whisper model: "turbo" (github.com/openai)

1

Kolmogorov-Arnold Transformer (arxiv.org)

2

Breaking ReCAPTCHAv2 (arxiv.org)

4

Driverless semis could be months away (arstechnica.com)

3

WavTokenizer: An Efficient Acoustic Discrete Codec Tokenizer for Audio Language (arxiv.org)

1

The Llama 3 Herd of Models (arxiv.org)

25

Xbox console sales continue to crater with 42% revenue drop (arstechnica.com)

3

Midjourney v6.1 (midjourney.com)

4

Azure Llama 3.1 Benchmarks (reddit.com)

2

Solid State Batteries Are Here: Yoshino Power Station (undecidedmf.com)

1

PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings (lllyasviel.github.io)

1

UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks (jingjingrenabc.github.io)

4

Adversarial Perturbations Cannot Reliably Protect Artists from Generative AI (arxiv.org)

2

Gen-3 Alpha (runwayml.com)

1

Massively Multimodal Masked Modeling (epfl.ch)

1

An Image Is Worth 32 Tokens for Reconstruction and Generation (yucornetto.github.io)

1

Human4DiT: Free-View Human Video Generation with 4D Diffusion Transformer (human4dit.github.io)

3

What If We Recaption Billions of Web Images with LLaMA-3? (haqtu.me)

2

Hacking Hundreds of Wii Us at Once (reversing.live)

2

SuperGaussian: Repurposing Video Models for 3D Super Resolution (supergaussian.github.io)

1

iOS 18 AI boost could be called 'Apple Intelligence' (appleinsider.com)

3

ToonCrafter: Generative Cartoon Interpolation (doubiiu.github.io)

1

Aya 23: Open Weight Releases to Further Multilingual Progress (cohere.com)

1

CAT3D: Create Anything in 3D with Multi-View Diffusion Models (cat3d.github.io)

26

ElevenLabs Music (twitter.com/elevenlabsio)

1

A Careful Examination of LLM Performance on Grade School Arithmetic (arxiv.org)

4

Rabbit R1 source code analysis by Retr0id (github.com/rabbitscam)

0

MeshLRM: Large Reconstruction Model for High-Quality Meshes (sarahweiii.github.io)

17

Dynamic Typography: Bringing Text to Life via Video Diffusion Prior (animate-your-word.github.io)

2

Bringing generative AI to video editing workflows in Adobe Premiere Pro (adobe.com)

1

Open model Command R+ beats GPT-4 in the LMSYS Chatbot Arena (reddit.com)

3

Mixture-of-Depths: Dynamically allocating compute in transformer language models (arxiv.org)

31

Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters (qwenlm.github.io)

38

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild (jasonppy.github.io)

1

Claude 3 Haiku is ranked #6 on LLM arena (huggingface.co)

0

A new way to search and connect. Only on Android (ai.android)

1

Possible Mistral Medium model leak? (twitter.com/qtnx_)

1

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models (github.com/pku-yuangroup)

2

SUPIR: Revolutionizing image restoration with cutting-edge large-scale AI (xpixel.group)

1

AutoRT: Foundation Models for Large Scale Orchestration of Robotic Agents (auto-rt.github.io)

3

Midjourney V6 photorealistic images collection (reddit.com)

2

Gemini vs GPT-4V: A Comparison and Combination of VLMs Through Qualitative Cases (arxiv.org)

2

CoSeR: Bridging Image and Language for Cognitive Super-Resolution (coser-main.github.io)

1

VecFusion: Vector Font Generation with Diffusion (arxiv.org)

49

ReconFusion: 3D Reconstruction with Diffusion Priors (reconfusion.github.io)

24

Music ControlNet: Multiple Time-Varying Controls for Music Generation (musiccontrolnet.github.io)

2

Instant3D: Fast Text-to-3D with Sparse-View Generation (jiahao.ai)

1

LRM: Large Reconstruction Model for Single Image to 3D (yiconghong.me)

1

OpenAI Consistency Decoder (github.com/openai)

50

Playing Pokemon Red with Reinforcement Learning (github.com/pwhiddy)

10

I ask DALLE-3 to generate a Pepe but each time I tell it to make it “more rare.” (twitter.com/willdepue)

25

Q-Transformer: Scalable Reinforcement Learning via Autoregressive Q-Functions (q-transformer.github.io)

4

Stable Diffusion XL Inpainting model released (huggingface.co)

2

YouTube uses AI to summarize videos in latest test (theverge.com)

2

AvatarVerse: High-Quality and Stable 3D Avatar Creation from Text and Pose (avatarverse3d.github.io)

3

JEN-1: Text-Guided Music Generation with Omnidirectional Diffusion Models (futureverse.com)

55

Magic123: One Image to High-Quality 3D Object Generation (guochengqian.github.io)

1

[PDF] Scaling TransNormer to 175B Parameters (arxiv.org)

8

Announcing SDXL 1.0 (stability.ai)

1

AUTOMATIC1111 webui updated to v1.5 (github.com/automatic1111)

1

Brain2Music: Reconstructing Music from Human Brain Activity (google-research.github.io)

2

Video2dataset: A simple tool for large video dataset curation (laion.ai)

60

Stable Diffusion XL technical report [pdf] (github.com/stability-ai)

1

DragDiffusion: Diffusion Models for Interactive Point-Based Image Editing (arxiv.org)

3

Yuzu: Progress Report May 2023 (yuzu-emu.org)

2

Rerender a Video: Zero-Shot Text-Guided Video-to-Video Translation (anonymous-31415926.github.io)

1

Boot: Data-Free Distillation of Denoising Diffusion Models with Bootstrapping (jiataogu.me)

2

VideoComposer: Compositional Video Synthesis with Motion Controllability (videocomposer.github.io)

1

Video Adapter: Efficient Adaption of Text-to-Video Foundation Models (video-adapter.github.io)

5

ControlNet for QR Code (reddit.com)

1

No positional encoding outperforms all positional encoding variants in decoders (twitter.com/a_kazemnejad)

1

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments (arxiv.org)

2

Training Diffusion Models with Reinforcement Learning (rl-diffusion.github.io)

1

Key-Locked Rank One Editing for Text-to-Image Personalization (nvidia.com)

3

In-Context Learning Unlocked for Diffusion Models (zhendong-wang.github.io)

12

Text-to-Audio Generation Using Instruction Tuned LLM and Latent Diffusion Model (tango-web.github.io)

3

Training Stable Diffusion from Scratch for <$50k with MosaicML (mosaicml.com)

288

MiniGPT-4 (minigpt-4.github.io)

2

A new Paella: simple and efficient text-to-image generation (laion.ai)

2

ControlNet v1.1 (github.com/lllyasviel)

1

GeNVS: Generative Novel View Synthesis with 3D-Aware Diffusion Models (nvlabs.github.io)