Ollama Groq Local Inference
Explore deploying and optimizing LLMs locally using Ollama and Groq. Learn quantization, memory optimization, and batching for efficient local inference with real benchmarks.
In this demo we will explore how to deploy and optimize large language models (LLMs) on local hardware, focusing on practical inference strategies for maximum efficiency. We will demonstrate running models with Ollama, Groq, and other frameworks, including quantization, memory optimization, and batching techniques to achieve low latency and high throughput on local systems. Real benchmarks and performance comparisons will be included.
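As a rough illustration of why quantization matters for local inference, the sketch below estimates the memory needed just to hold a model's weights at different precisions. It is not taken from the talk: the function name `estimate_weight_memory_gib` is hypothetical, and the estimate ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def estimate_weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: every parameter is stored at the same precision.
    KV cache, activations, and framework overhead are not included.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


if __name__ == "__main__":
    # A 7B-parameter model at 16-bit, 8-bit, and 4-bit precision.
    for bits in (16, 8, 4):
        gib = estimate_weight_memory_gib(7e9, bits)
        print(f"7B parameters at {bits}-bit: ~{gib:.1f} GiB")
```

Halving the bits per weight halves the weight footprint, which is often what decides whether a model fits in the RAM or VRAM of a consumer machine.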
- Llama-2: Llama 2 is Meta AI's openly accessible family of large language models (LLMs), released free of charge for research and commercial use. The collection includes both pre-trained foundation models and instruction-tuned 'Chat' variants, scaling from 7 billion (7B) up to 70 billion (70B) parameters. Key technical upgrades over Llama 1 include training on 2 trillion tokens (40% more data) and doubling the context length to 4096 tokens. The Llama-2-chat models were aligned using Reinforcement Learning from Human Feedback (RLHF), making them a strong openly available option for developers building generative AI applications.
- Mistral: Frontier AI models (LLMs) from Paris, focused on performance and efficiency through open-source innovation and optimized architecture. Mistral AI is a Paris-based AI startup founded in April 2023 by former Google DeepMind and Meta researchers (Arthur Mensch, Guillaume Lample, Timothée Lacroix). Its stated mission is to democratize advanced models, with a focus on open source, efficiency, and performance. Its technology includes the 123B-parameter Mistral Large 2 and sparse Mixture of Experts (MoE) architectures, aimed at strong results at lower cost. The company also offers enterprise products (Mistral AI Studio, Le Chat) for custom deployment, fine-tuning, and full data control, and has been valued at $14 billion.
- Transformers: The deep learning architecture that revolutionized sequence modeling (NLP, vision) by replacing recurrent units with a parallelizable multi-head self-attention mechanism. The Transformer was introduced in the landmark 2017 paper, "Attention Is All You Need." It eliminated the sequential processing bottleneck of prior Recurrent Neural Networks (RNNs) by relying solely on self-attention, enabling massive parallelization and significantly faster training (up to 10x faster) on modern hardware. This efficiency allowed for the creation of large-scale pre-trained models such as BERT (encoder-only) and the generative GPT series (decoder-only). The architecture now underlies nearly all modern Large Language Models (LLMs).
- Ollama: Deploy and run open-source Large Language Models (LLMs) like Llama 3 and Mistral locally on your machine through a simple command-line interface. Ollama is often described as the Docker for AI models: it packages models and their dependencies into a single, easy-to-use application for macOS, Linux, and Windows. Models such as Gemma 2 and DeepSeek-R1 are available via a straightforward CLI or REST API. Because inference runs locally, prompts and outputs stay on your machine, which helps with privacy and removes cloud dependency and per-request API costs. Ollama also runs quantized models, which makes execution practical on consumer hardware, including standard desktops.
- Docker: Docker is the open-source platform that packages applications and dependencies into standardized, portable containers for consistent execution across environments. It uses the Docker Engine (the core runtime) to create lightweight, isolated environments called containers, which bundle an application’s code, libraries, and configuration. This self-contained approach reduces the 'it works on my machine' problem across development, testing, and production environments (local workstations, cloud, or on-premises). Docker debuted in 2013 and now serves over 20 million developers monthly, simplifying workflows like CI/CD and microservices architecture with tools such as Docker Hub for image sharing and Docker Compose for multi-container applications.
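The Ollama entry above mentions a REST API, and the talk description mentions throughput benchmarks. A minimal sketch of how the two could be combined follows. It is an assumption, not the presenter's benchmark code. It relies on Ollama's `/api/generate` endpoint at the default local address and on the `eval_count` and `eval_duration` (nanoseconds) fields in its non-streaming response. The helper names are hypothetical.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed to be unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> bytes:
    """Encode a non-streaming generate request as a JSON body."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def tokens_per_second(response: dict) -> float:
    """Generation throughput from the response's timing fields.

    eval_count is the number of generated tokens and eval_duration
    is the generation time in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one request to a running Ollama server and return the parsed JSON."""
    request = urllib.request.Request(
        url,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())
```

With a server running and a model already pulled (the name `llama2` here is only an example), `tokens_per_second(generate("llama2", "Why is the sky blue?"))` gives a single-request throughput figure. A real benchmark would average several runs and report prompt processing time separately.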
Related projects
IDEAR
Bogotá
The talk explains a web app delivering interactive 3D product models, its analytics pipeline, infrastructure choices, and compares…
Plataformas de IA agéntica de código abierto
Manizales
The talk covers the installation, use, and analysis of open-source agentic AI platforms such as OpenManus,…
cli_engineer
Manizales
Explores the architecture, principles, and operation of a fully automated agency-style software development system, showing how to build,…
De la pereza a la automatización
Manizales
Explores practical automation methods learned over years, showing how simple tools and AI can streamline everyday projects and…
Análisis de comentarios
Bogotá
This talk explores using LangChain and LLMs to analyze company comments cost-effectively, providing adaptable insights based on real…
Manizales 1900 - 1930: Un Viaje al Pasado Restaurado con IA
Manizales
The talk demonstrates how generative AI restores and animates 1900‑1930 Manizales photographs, detailing the technical workflow, challenges, historical…