Automated Anime Video Processing
Learn how AI agents can automate video processing, cut hallucinations, and maintain consistency for polished, creative anime production.
The price of AI video models continues to fall, but handling hallucinations and cutting together a polished video is still a lot of work! Explore how AI agents can help you produce better videos and keep you in creative flow.
You will learn how to:
- Automatically cut hallucinations
- Compare cut video against the prompt
- Evaluate consistency against the rest of the video
- Google Gemini: Gemini is Google's most capable, multimodal AI model: it seamlessly reasons across text, code, audio, image, and video. Gemini is Google's foundational, multimodal AI model, engineered to natively understand and combine text, code, image, audio, and video inputs. The technology is optimized across three sizes: Ultra (for highly complex tasks), Pro (for broad task scaling), and Nano (for efficient on-device performance). Gemini Ultra, for example, achieved a 90.0% score on the MMLU benchmark, surpassing human experts. It functions as a powerful AI assistant, integrated across Google services like Gmail and Maps, and features advanced tools like Deep Research and custom AI experts (Gems). Its Pro version offers a long context window, handling up to 1,500 pages or 30k lines of code simultaneously.
- AutoWeeb: AutoWeeb delivers production-ready AI tools: generate consistent anime characters, convert photos to popular art styles (e.g., Demon Slayer), and build 360-degree cinematic scenes. This is the next-gen AI engine for anime creation: AutoWeeb eliminates the inconsistency issues plaguing other models. Our core technology focuses on two critical areas: character consistency and spatial coherence. Users upload a single photo and convert it directly to a chosen style (Bunny Girl Senpai, Cyberpunk, etc.), maintaining key features across all outputs. Furthermore, the platform utilizes 360-degree panoramic scene builders, giving creators a virtual camera to frame shots within a consistent 3D environment. This allows for professional cinematic storytelling and reliable asset generation for any scale project.
- ByteDance: ByteDance is the $300 billion global technology giant: the engine behind TikTok and Douyin, driving personalized content discovery with proprietary AI algorithms. ByteDance, founded in 2012 by Zhang Yiming, is a world-leading internet technology company specializing in content platforms. The core technology is a sophisticated AI and machine learning system that powers personalized content feeds: this is the engine behind its flagship short-video apps, TikTok and Douyin. The company’s ecosystem, which also includes Toutiao and CapCut, serves over 3.5 billion monthly active users globally. As of late 2024, ByteDance is valued at approximately $300 billion, reporting a 2024 revenue of $155 billion: a clear indicator of its dominance in mobile entertainment and digital content distribution.
- Seedance: Seedance is ByteDance’s multimodal AI video model that generates cinematic, multi-shot narrative sequences with synchronized stereo audio. Developed by the team behind TikTok and CapCut, Seedance 2.0 moves beyond single-clip generation to produce cohesive, edited scenes up to 15 seconds long. The model employs a quad-modal input system (text, image, audio, and video) that allows creators to lock in specific character faces, motion styles, and soundscapes through an all-around reference architecture. By integrating dual-channel stereo technology, it ensures that sound effects like the strike of a match or the rustle of fabric are perfectly synced with the visual frame. This technology effectively shifts the AI workflow from asset generation to digital directing, providing professional-grade 1080p output with physical accuracy that rivals traditional production.
- Alibaba WAN: Alibaba's Wan AI is a leading, open-source video generation model, transforming text or images into high-quality, temporally consistent video clips with native audio support. Wan AI is Alibaba's advanced generative AI model, specializing in high-fidelity video creation from text or image prompts. The latest iteration, Wan 2.6, delivers multi-shot narrative capabilities and native audio synchronization, a key differentiator. Wan supports up to 15 seconds of 1080p HD video, excelling at maintaining subject consistency and complex scene composition. We’re talking about a powerful, open-source architecture that provides a robust API for commercial-grade visual generation.
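The three steps listed under "You will learn how to" could be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the `Segment` type, the `keep_ranges` function, the two score fields, and the 0.7 thresholds are all hypothetical. It assumes some multimodal model has already scored each short segment of a generated clip for prompt adherence and for consistency with the rest of the video; the sketch only shows how those scores might turn into a cut list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One short slice of a generated clip (hypothetical structure)."""
    start: float              # seconds
    end: float                # seconds
    prompt_score: float       # 0..1, assumed score: how well the slice matches the prompt
    consistency_score: float  # 0..1, assumed score: agreement with the rest of the video


def keep_ranges(
    segments: List[Segment],
    min_prompt: float = 0.7,       # illustrative threshold, not from the talk
    min_consistency: float = 0.7,  # illustrative threshold, not from the talk
) -> List[Tuple[float, float]]:
    """Return merged (start, end) ranges of segments that pass both checks.

    Segments are assumed to be sorted by start time and non-overlapping.
    """
    ranges: List[Tuple[float, float]] = []
    for seg in segments:
        if seg.prompt_score < min_prompt or seg.consistency_score < min_consistency:
            continue  # treat the slice as a hallucination and cut it
        if ranges and abs(ranges[-1][1] - seg.start) < 1e-9:
            # Contiguous with the previous kept range: extend it.
            ranges[-1] = (ranges[-1][0], seg.end)
        else:
            ranges.append((seg.start, seg.end))
    return ranges
```

The returned ranges could then be handed to any video editing tool to trim and join the kept portions; regenerating the rejected slices is left out of this sketch.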
Related projects
Agent Written Daily Newsletter (No Human in the Loop)
Los Angeles
This talk details an agentic workflow autonomously producing a daily newsletter, replacing 3-5 hours of human work with…
Claude Code as an api
Los Angeles
Learn to use an open-source Claude Code model with Ollama for generating runnable code through agentic orchestration with…
Anote
New York City
This talk explores how the Anote platform uses human feedback to improve AI models like GPT-4 for specific…
AI for Feels, Not Just Tasks
Los Angeles
Learn how an AI‑driven emotional‑clarity tool uses user‑typed input and generated images to reveal patterns, then offers reflections,…
Training/Generating Absurd Cat Standup Videos
Los Angeles
The talk explains how to scrape and clean Seinfeld scripts, train an 8‑billion‑parameter Llama‑3 model for monologues, and…
Building a 10¢ Research Paper TTS Pipeline: Kokoro, Claude, and the Hidden Costs of Audio
Los Angeles
Building a 10¢ TTS pipeline using Claude and on-device Kokoro for research paper audio summaries, detailing cost trade-offs…