55
1
OmniPSD: Layered PSD Generation with Diffusion Transformer (showlab.github.io)
2
NaTex: Seamless Texture Generation as Latent Color Diffusion (natex-ldm.github.io)
1
Back to Basics: Let Denoising Generative Models Denoise (arxiv.org)
2
Veo 3.1 and new creative capabilities in the Gemini API (googleblog.com)
59
Linus Learns Analog Circuits (github.com/torvalds)
2
UnifoLM-WMA-0: A World-Model-Action (WMA) Framework Under UnifoLM Family (unigen-x.github.io)
73
GLM-4.5: Reasoning, Coding, and Agentic Abilities (z.ai)
3
A retro gaming YouTuber faces possible jail time for reviewing gaming handhelds (androidauthority.com)
2
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets (arxiv.org)
2
Sparse Representation and Construction for High-Resolution 3D Shapes Modeling (lizhihao6.github.io)
81
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation (lllyasviel.github.io)
2
Fiction.LiveBench: The First Long Context Benchmark for Writers (fiction.live)
1
ReCamMaster: Camera-Controlled Generative Rendering from a Single Video (jianhongbai.github.io)
1
Charting and Navigating Hugging Face's Model Atlas (horwitz.ai)
45
Block Diffusion: Interpolating between autoregressive and diffusion models (arxiv.org)
3
iGenius Releases Colosseum 355B (igenius.ai)
1
Framer: Interactive Frame Interpolation (aim-uofa.github.io)
1
VidPanos: Generative Panoramic Videos from Casual Panning Videos (vidpanos.github.io)
4
New OpenAI Whisper model: "turbo" (github.com/openai)
1
Kolmogorov-Arnold Transformer (arxiv.org)
2
Breaking ReCAPTCHAv2 (arxiv.org)
4
Driverless semis could be months away (arstechnica.com)
3
WavTokenizer: An Efficient Acoustic Discrete Codec Tokenizer for Audio Language (arxiv.org)
1
The Llama 3 Herd of Models (arxiv.org)
25
Xbox console sales continue to crater with 42% revenue drop (arstechnica.com)
3
Midjourney v6.1 (midjourney.com)
4
Azure Llama 3.1 Benchmarks (reddit.com)
2
Solid State Batteries Are Here: Yoshino Power Station (undecidedmf.com)
1
PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings (lllyasviel.github.io)
1
UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks (jingjingrenabc.github.io)
4
Adversarial Perturbations Cannot Reliably Protect Artists from Generative AI (arxiv.org)
2
Gen-3 Alpha (runwayml.com)
1
Massively Multimodal Masked Modeling (epfl.ch)
1
An Image Is Worth 32 Tokens for Reconstruction and Generation (yucornetto.github.io)
1
Human4DiT: Free-View Human Video Generation with 4D Diffusion Transformer (human4dit.github.io)
3
What If We Recaption Billions of Web Images with LLaMA-3? (haqtu.me)
2
Hacking Hundreds of Wii Us at Once (reversing.live)
2
SuperGaussian: Repurposing Video Models for 3D Super Resolution (supergaussian.github.io)
1
iOS 18 AI boost could be called 'Apple Intelligence' (appleinsider.com)
3
ToonCrafter: Generative Cartoon Interpolation (doubiiu.github.io)
1
Aya 23: Open Weight Releases to Further Multilingual Progress (cohere.com)
1
CAT3D: Create Anything in 3D with Multi-View Diffusion Models (cat3d.github.io)
26
ElevenLabs Music (twitter.com/elevenlabsio)
1
A Careful Examination of LLM Performance on Grade School Arithmetic (arxiv.org)
4
Rabbit R1 source code analysis by Retr0id (github.com/rabbitscam)
0
MeshLRM: Large Reconstruction Model for High-Quality Meshes (sarahweiii.github.io)
17
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior (animate-your-word.github.io)
2
Bringing generative AI to video editing workflows in Adobe Premiere Pro (adobe.com)
1
Open model Command R+ beats GPT-4 in the LMSYS Chatbot Arena (reddit.com)
3
Mixture-of-Depths: Dynamically allocating compute in transformer language models (arxiv.org)
31
Qwen1.5-Moe: Matching 7B Model Performance with 1/3 Activated Parameters (qwenlm.github.io)
38
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild (jasonppy.github.io)
1
Claude 3 Haiku is ranked #6 on LLM arena (huggingface.co)
0
A new way to search and connect. Only on Android (ai.android)
1
Possible Mistral Medium model leak? (twitter.com/qtnx_)
1
Moe-LLaVA: Mixture of Experts for Large Vision-Language Models (github.com/pku-yuangroup)
2
SupIR: Revolutionizing image restoration with cutting-edge large-scale AI (xpixel.group)
1
AutoRT: Foundation Models for Large Scale Orchestration of Robotic Agents (auto-rt.github.io)
3
Midjourney V6 photorealistic images collection (reddit.com)
2
Gemini vs GPT-4V: A Comparison and Combination of VLMs Through Qualitative Cases (arxiv.org)
2
CoSeR: Bridging Image and Language for Cognitive Super-Resolution (coser-main.github.io)
1
VecFusion: Vector Font Generation with Diffusion (arxiv.org)
49
ReconFusion: 3D Reconstruction with Diffusion Priors (reconfusion.github.io)
24
Music ControlNet: Multiple Time-Varying Controls for Music Generation (musiccontrolnet.github.io)
2
Instant3D: Fast Text-to-3D with Sparse-View Generation (jiahao.ai)
1
LRM: Large Reconstruction Model for Single Image to 3D (yiconghong.me)
1
OpenAI Consistency Decoder (github.com/openai)
50
Playing Pokemon Red with Reinforcement Learning (github.com/pwhiddy)
10
I ask DALLE-3 to generate a Pepe but each time I tell it to make it “more rare.” (twitter.com/willdepue)
25
Q-Transformer: Scalable Reinforcement Learning via Autoregressive Q-Functions (q-transformer.github.io)
4
Stable Diffusion XL Inpainting model released (huggingface.co)
2
YouTube uses AI to summarize videos in latest test (theverge.com)
2
AvatarVerse: High-Quality and Stable 3D Avatar Creation from Text and Pose (avatarverse3d.github.io)
3
JEN-1: Text-Guided Music Generation with Omnidirectional Diffusion Models (futureverse.com)
55
Magic123: One Image to High-Quality 3D Object Generation (guochengqian.github.io)
1
[PDF] Scaling TransNormer to 175B Parameters (arxiv.org)
8
Announcing SDXL 1.0 (stability.ai)
1
AUTOMATIC1111 webui updated to v1.5 (github.com/automatic1111)
1
Brain2Music: Reconstructing Music from Human Brain Activity (google-research.github.io)
2
Video2dataset: A simple tool for large video dataset curation (laion.ai)
60
Stable Diffusion XL technical report [pdf] (github.com/stability-ai)
1
DragDiffusion: Diffusion Models for Interactive Point-Based Image Editing (arxiv.org)
3
Yuzu: Progress Report May 2023 (yuzu-emu.org)
2
Rerender a Video: Zero-Shot Text-Guided Video-to-Video Translation (anonymous-31415926.github.io)
1
Boot: Data-Free Distillation of Denoising Diffusion Models with Bootstrapping (jiataogu.me)
2
VideoComposer: Compositional Video Synthesis with Motion Controllability (videocomposer.github.io)
1
Video Adapter: Efficient Adaption of Text-to-Video Foundation Models (video-adapter.github.io)
5
ControlNet for QR Code (reddit.com)
1
No positional encoding outperforms all positional encoding variants in decoders (twitter.com/a_kazemnejad)
1
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments (arxiv.org)
2
Training Diffusion Models with Reinforcement Learning (rl-diffusion.github.io)
1
Key-Locked Rank One Editing for Text-to-Image Personalization (nvidia.com)
3
In-Context Learning Unlocked for Diffusion Models (zhendong-wang.github.io)
12
Text-to-Audio Generation Using Instruction Tuned LLM and Latent Diffusion Model (tango-web.github.io)
3
Training Stable Diffusion from Scratch for <$50k with MosaicML (mosaicml.com)
288
MiniGPT-4 (minigpt-4.github.io)
2
A new Paella: simple and efficient text-to-image generation (laion.ai)
2
ControlNet v1.1 (github.com/lllyasviel)
1