No Data for the Witch’s Cauldron: Using Safe, Open-Source AI Chats systems | Montreal .

Members-Only

Recent Talks & Demos are for members only

Exclusive feed

You must be an AI Tinkerers active member to view these talks and demos.

October 21, 2025 · Montreal

OpenWeb UI LiteLLM Safe Chat

Learn how to assemble an open‑source chat system with OpenWeb UI, LiteLLM, Azure/Bedrock models, internal RAG, voice, image, and code integration.

Overview
Links
Tech stack
  • AWS Bedrock
    AWS Bedrock is the fully managed, serverless platform providing a unified API gateway to diverse, high-performing foundation models (FMs) from providers like Anthropic, AI21 Labs, and Amazon Titan.
    Amazon Bedrock is your fully managed, serverless platform for building and scaling generative AI applications. It offers a single API to access a curated selection of industry-leading foundation models (FMs) from partners (e.g., Anthropic's Claude, Meta's Llama 3.1) and Amazon (Titan family). Developers leverage core features: use Knowledge Bases for Retrieval Augmented Generation (RAG) with proprietary data, deploy Agents for complex task automation, and implement Guardrails for responsible AI policies. This streamlined approach ensures enterprise-grade security and simplifies model customization (fine-tuning) without managing underlying infrastructure.
  • LiteLLM
    LiteLLM is the unified LLM gateway: call 100+ models (OpenAI, Anthropic, Azure, etc.) using a single, standardized OpenAI-compatible API.
    LiteLLM acts as your production-grade LLM gateway, simplifying complex multi-model deployments. It unifies over 100 LLM providers—including OpenAI, Anthropic, and VertexAI—under a single, consistent API call structure (the OpenAI format). This standardization eliminates SDK friction. Key features include the LiteLLM Router for automatic retry and fallback logic across deployments, ensuring high reliability. Additionally, the Proxy Server centralizes cost tracking, allows granular budget setting per virtual key, and provides load balancing, making it essential for ML Platform teams managing scalable, cost-optimized Gen AI applications.
  • Qdrant
    Qdrant is an open-source, Rust-powered vector database and search engine: it delivers high-performance, scalable similarity search for AI applications.
    Qdrant functions as a production-ready vector database, purpose-built in Rust for unmatched speed and reliability, even when processing billions of high-dimensional vectors. It provides a convenient API to store, search, and manage vector embeddings (points) along with optional metadata (payloads). Key features include advanced filtering on those payloads, support for multiple distance metrics (Cosine, Dot Product, Euclidean), and cloud-native scalability. Developers leverage Qdrant for critical AI workloads like Retrieval-Augmented Generation (RAG) systems and large-scale recommendation engines, deploying via Docker, self-hosting, or the managed Qdrant Cloud service.
  • MinIO
    MinIO: High-performance, S3-compatible object storage, optimized for AI/ML and cloud-native infrastructure.
    MinIO is a high-performance, cloud-native object storage server, fully compatible with the Amazon S3 API. Built on a lightweight, single-binary, shared-nothing architecture, it delivers industry-leading throughput for demanding workloads: think AI/ML, big data analytics, and containerized applications on Kubernetes. It is 100% open-source, licensed under GNU AGPLv3, and designed for global-scale deployment from the private cloud to the edge.
  • Kubernetes
    Kubernetes (K8s): Production-grade container orchestration: automate deployment, scaling, and management across your cluster.
    Kubernetes (K8s) is your control plane for planet-scale container orchestration: it automates the deployment, scaling, and management of containerized applications across your cluster. Built on 15 years of Google's production experience (Borg), K8s ensures your *desired state* is always maintained. Core resources like Pods, Deployments, and Services manage auto-scaling, load balancing, and self-healing for you. You interact directly with the API server using `kubectl` (the command-line tool) to execute zero-downtime rollouts and rapid rollbacks. As a CNCF project, it provides vendor-neutral flexibility for any infrastructure: cloud, on-premises, or hybrid.

Related projects