Members-Only
Recent Talks & Demos are for members only
You must be an AI Tinkerers active member to view these talks and demos.
Qwen-3-VL Sovereign Document Analytics
Learn document analytics in under 100 lines using Qwen-3-VL-4B-Instruct for PDF processing, OCR, structured output, and Text-to-SQL.
get started with Document analytics in < 100 lines of code.
PDF to Image -> VLM for OCR with Image -> Structured JSON Response -> Text to SQL with Tool Call -> Uses Qwen-3-VL-4B-Instruct
Agentic Tax Analytics system utilizes TypeScript/Python and dwani.ai for discovery.
dwani.ai's uberTax uses AI/RegTech to track global e-invoicing and EU regulatory mandates.
- Qwen3-VL-4B-InstructThis 4-billion-parameter model is the instruction-tuned variant of the Qwen3-VL series, engineered by Alibaba Cloud for rapid, practical deploymentThis 4-billion-parameter model is the instruction-tuned variant of the Qwen3-VL series, engineered by Alibaba Cloud for rapid, practical deployment. It excels at multimodal reasoning across image, video, and text inputs, with a core focus on following user prompts directly and efficiently. Key features include advanced Visual Agent capabilities (operating PC/mobile GUIs), expanded OCR supporting 32 languages, and a native 256K context window for long-horizon video understanding. The 'Instruct' tuning makes it the go-to choice for low-latency tasks: visual chatbots, document analysis, and quick visual Q&A.
- vLLMvLLM is the high-throughput, memory-efficient LLM inference engine: it leverages PagedAttention to maximize GPU utilization and cut serving costs.This is the engine for scaling LLM inference: vLLM (Virtual Large Language Model) is an open-source library engineered for high-throughput and low-latency serving. Its core innovation is PagedAttention, a memory management technique inspired by OS virtual memory, which efficiently handles the Key-Value (KV) cache. This optimization drastically reduces memory overhead—up to 90% in some reported cases—and allows for continuous batching of requests. The result: significantly higher request capacity on the same hardware, lower GPU usage, and a production-ready, cost-effective serving system that supports popular models like Llama and Mistral, complete with an OpenAI-compatible API server.
- llamaMeta's open-weights LLM family optimized for high-performance local deployment and custom fine-tuning across 8B to 405B parameter scales.Llama 3.1 delivers state-of-the-art performance through a flagship 405B parameter model trained on 15 trillion tokens. It supports a 128k context window: ideal for analyzing massive datasets or long-form documentation. Developers utilize Llama for diverse tasks (multilingual translation, Python code generation, and complex reasoning) while maintaining data sovereignty via local hosting. The ecosystem includes the Llama Stack for agentic workflows and optimized weights for 8B and 70B models, ensuring high throughput on consumer hardware or enterprise clusters.
- GradioGradio is the open-source Python library for rapidly building and sharing interactive web UIs for any machine learning model or Python function.Gradio is the essential tool for data scientists and ML engineers: it turns any Python function (including TensorFlow, PyTorch, and Hugging Face models) into a live, interactive web application with just a few lines of code. This open-source library eliminates the need for complex frontend development, handling all HTML, CSS, and JavaScript automatically. Developers define the function and specify inputs (e.g., 'text', 'image', 'slider') and outputs, then launch the interface locally, embed it in a notebook, or instantly generate a shareable public link. Gradio is widely adopted for quick prototyping, model demonstration, and deployment on platforms like Hugging Face Spaces, making complex models accessible to non-technical users for testing and feedback.
- FastAPIFastAPI is a modern, high-performance Python web framework for building APIs with automatic OpenAPI documentation.FastAPI is a robust, high-speed Python web framework: it is built on Starlette (for async capabilities) and Pydantic (for data validation and serialization). Leveraging standard Python 3.8+ type hints, the framework automatically generates interactive API documentation (Swagger UI/ReDoc) and enforces data validation, effectively reducing developer-induced errors by an estimated 40%. This architecture delivers performance on par with Node.js and Go, significantly increasing feature development speed (up to 300% faster). It is production-ready, fully supporting OpenAPI and JSON Schema standards for all API specifications.
Related projects
DocRouter.AI
Boston
We'll demonstrate how DocRouter.AI extracts full schemas from unstructured documents using a drag‑and‑drop UI, and how the Cursor…
Image-Aware Content Extraction - Building custom models for specific use cases
Berlin
This talk covers building custom layout detection models to extract and preserve relevant images from documents, enabling combined…
AI-Driven Workflow Validation for Invoices using AWS Bedrock
Singapore
This talk covers using AWS Bedrock to automate invoice validation and workflow assignment, improving accuracy and efficiency in…
A smart file browser
Bengaluru
This talk covers a desktop file browser with smart folder triggers, AI file creation, and offline TTS/STT using…
Generating "stories" from your digital trace data
Paris
The talk explores generating visual stories from Google Takeout data using open-weight transformer models, highlighting data pipelines, ML…
Docling: Get your Documents Ready for Gen AI
Zürich
A live coding demo showing Docling's AI-driven layout, table, and formula extraction to convert complex PDF documents into…