AI call centre on WhatsApp | Dhaka

November 01, 2025 · Dhaka

AI Call Centre via WhatsApp

Learn how an AI voice agent enables live WhatsApp calls, handling authentication, sales, and engagement, allowing SMBs to launch a 100‑channel call centre in minutes.

Tech stack
  • RAG
    RAG (Retrieval-Augmented Generation) is a GenAI framework that grounds LLMs (like GPT-4) in external, verified data, reducing model hallucinations and making answers traceable to sources.
    RAG is a widely used GenAI architecture: it mitigates the LLM 'hallucination' problem by inserting a retrieval step before generation. A user query is vectorized, then used to query an external knowledge base (e.g., a Pinecone vector database) for relevant document chunks (for example, 512-token segments). These retrieved passages augment the original prompt, giving the LLM (e.g., Gemini or Llama 3) the specific, current, or proprietary context it needs. This helps keep the final response accurate and grounded in domain-specific data, while avoiding the high cost and latency of full model retraining.
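The retrieve-then-augment flow described for RAG can be sketched in a few lines. This is a minimal illustration and not the speaker's implementation: the in-memory `CHUNKS` list stands in for a vector database, the bag-of-words `vectorize` function stands in for a learned embedding model, and all names and sample texts are hypothetical. The LLM call itself is left out; the sketch stops at the augmented prompt.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base standing in for a vector database.
CHUNKS = [
    "Refunds are processed within 5 business days.",
    "Our support line is open from 9am to 6pm on weekdays.",
    "Premium plans include 100 concurrent call channels.",
]

def vectorize(text):
    """Toy bag-of-words vector; a real system would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks, k=1):
    """Augment the user query with the retrieved context."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

question = "How many call channels do premium plans include?"
prompt = build_prompt(question, CHUNKS)
print(prompt)
```

The resulting prompt string is what would be sent to the LLM, so the model answers from the retrieved passage instead of from its training data alone.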
  • spaCy
    spaCy is the industrial-strength, open-source Python library for advanced Natural Language Processing (NLP).
    spaCy delivers industrial-strength Natural Language Processing in Python: it is a free, open-source library built by Explosion, designed specifically for production-grade applications. The library is written in Python and Cython and focuses on speed for large-scale text processing. Core capabilities include Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and dependency parsing, with support for 70+ languages and pre-trained pipelines for a subset of them. spaCy’s architecture, built on the Thinc machine learning library, integrates with deep learning frameworks like PyTorch and TensorFlow.
  • LLM Fine-Tuning
    LLM Fine-Tuning: The process of adapting a pre-trained foundation model (e.g., Llama, Mistral) to a specific, high-value task using techniques like LoRA to boost accuracy and reduce compute costs.
    LLM Fine-Tuning is a post-training step: it shifts a generalist Large Language Model into a domain-specific specialist. A base model (like Llama 3.1 8B) is trained on a small, high-quality labeled dataset, a process known as Supervised Fine-Tuning (SFT). Critical to efficiency are Parameter-Efficient Fine-Tuning (PEFT) methods: LoRA and QLoRA drastically reduce the number of trainable parameters, enabling customization of massive models (70B+) on far less hardware. The result is more consistent performance on target applications, such as generating on-brand customer support responses or classifying legal documents.
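To make the parameter reduction concrete, here is a small worked calculation. LoRA freezes a weight matrix of shape d_out x d_in and trains two low-rank factors of shapes d_out x r and r x d_in instead. The layer size and rank below are illustrative assumptions, not figures from the talk.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Trainable parameters for the two LoRA factors: (d_out x r) + (r x d_in)."""
    return rank * (d_out + d_in)

# Illustrative values only: one 4096 x 4096 projection matrix with rank 8.
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}):    {lora:,} trainable parameters")
print(f"fraction trained: {lora / full:.2%}")
```

For this single matrix the trainable share drops below half a percent. The saving shrinks as the rank grows, which is why the rank is usually treated as a tuning knob.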
  • WhatsApp
    WhatsApp is the secure, end-to-end encrypted instant messaging and VoIP platform connecting over 3 billion monthly active users globally.
    WhatsApp Messenger, a Meta Platforms property, operates as a cross-platform instant messaging (IM) and Voice over IP (VoIP) service. Registration requires only a mobile phone number, after which users can send text, voice, and video messages and make voice and video calls on desktop and mobile. Launched in 2009, the service scaled rapidly: it reported more than 3 billion monthly active users as of May 2025, making it the world's most-used messenger app. Facebook (now Meta) acquired WhatsApp Inc. in February 2014 for approximately US$19.3 billion, integrating its global reach into the Meta ecosystem.
  • Vectorization
    Vectorization is a CPU optimization technique that converts scalar operations to operate on a set of data (a vector) simultaneously, leveraging Single Instruction, Multiple Data (SIMD) hardware.
    Vectorization is central to high-performance computing: it exploits data-level parallelism by applying one instruction to multiple data elements concurrently (SIMD). Modern CPUs use wide vector registers, such as Intel’s 512-bit Advanced Vector Extensions (AVX-512) or ARM’s 128-bit NEON, to pack and process multiple data points, such as 16 single-precision floats in a 512-bit register, with a single instruction. This parallel execution reduces the instruction count and can deliver speedups of roughly 4x to 16x over scalar processing, depending on register width, element size, and memory bandwidth. Compilers often auto-vectorize simple loops, but maximum gains for compute-intensive workloads (e.g., matrix multiplication) often require manual vectorization using hardware intrinsics or optimized libraries (like Intel MKL).
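The 4x to 16x range follows from simple lane arithmetic: the number of elements that fit in one register sets an upper bound on how many scalar instructions a single vector instruction can replace. The sketch below works that out. It is an idealized count that ignores memory bandwidth, loop overhead, and alignment, and the function names are hypothetical.

```python
import math

def simd_lanes(register_bits, element_bits):
    """How many elements of a given width fit in one vector register."""
    return register_bits // element_bits

def instruction_counts(n_elements, register_bits, element_bits):
    """Idealized arithmetic-instruction counts for scalar and vector loops."""
    lanes = simd_lanes(register_bits, element_bits)
    scalar = n_elements                      # one instruction per element
    vector = math.ceil(n_elements / lanes)   # one instruction per full or partial register
    return scalar, vector

# 32-bit floats in a 512-bit register (AVX-512) and a 128-bit register (NEON).
print("AVX-512 lanes:", simd_lanes(512, 32))
print("NEON lanes:   ", simd_lanes(128, 32))

scalar, vector = instruction_counts(1000, 512, 32)
print(f"1000 floats: {scalar} scalar instructions, {vector} vector instructions")
```

Measured speedups are usually lower than the lane count, because real loops also spend time loading and storing data.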
  • Stanford NER
    A Java-based implementation of Conditional Random Field sequence models for labeling entities like persons, locations, and organizations.
    Stanford NER (Named Entity Recognition) provides high-performance tools for identifying rigid designators in text using Linear Chain CRF models. It ships with pre-trained classifiers for English (3-class, 4-class, and 7-class models), Chinese, and German, while allowing users to train custom models on their own labeled data. Developed by the Stanford Natural Language Processing Group, this software integrates seamlessly into the broader Stanford CoreNLP pipeline and remains a foundational standard for extracting structured data from unstructured strings.
  • Apache OpenNLP
    A machine learning toolkit for processing natural language text, specializing in tokenization, sentence segmentation, and entity recognition.
    Apache OpenNLP provides a robust Java library for common NLP tasks using Maximum Entropy and Perceptron-based machine learning. Developers use it to build production pipelines for named entity recognition (NER), part-of-speech tagging, and chunking. It includes pre-trained models for multiple languages (English, Spanish, German) and command-line tools for training custom models on domain-specific datasets.
  • NLTK
    NLTK (Natural Language Toolkit) is the leading Python platform for building programs to process human language data, providing core tools for NLP tasks.
    NLTK is the foundational, open-source Python library for Natural Language Processing (NLP): it’s the go-to platform for research, education, and prototyping. The toolkit delivers a comprehensive suite of modules for essential text processing, including tokenization, stemming, Part-of-Speech (POS) tagging, and classification. It provides easy-to-use interfaces to over 50 corpora and lexical resources (e.g., WordNet), enabling users to quickly implement core algorithms. Since its initial release in 2001, NLTK has remained a critical resource for applying statistical and symbolic NLP techniques across various academic and commercial projects.
  • Flair
    Flair is an AI-powered design tool that generates high-fidelity branded content and product photography in seconds.
    Flair streamlines the visual production pipeline for e-commerce brands by converting simple product photos into professional-grade marketing assets. Users place a product bottle or package on a digital canvas, describe the surrounding environment (e.g., 'on a marble countertop with soft morning sunlight'), and the engine renders a complete scene with realistic lighting and shadows. The platform handles complex tasks like background removal and upscaling, allowing a single operator to produce dozens of high-conversion social media ads and website banners without a physical studio or expensive photography crew.
  • Stanza
    A Pythonic NLP pipeline by Stanford providing high-accuracy linguistic analysis for 70+ human languages.
    Stanza delivers a suite of high-performance NLP tools built on PyTorch, including the official Python interface to the Stanford CoreNLP software. It handles everything from tokenization and multi-word token expansion to dependency parsing and named entity recognition (NER). By leveraging pre-trained neural models, Stanza achieves state-of-the-art results on the Universal Dependencies benchmarks across 70+ languages. It is the definitive choice for researchers requiring deep linguistic accuracy and developers needing a robust, multilingual pipeline for production-grade text processing.
  • Hugging Face Transformers
    The Hugging Face Transformers library is a leading open-source Python toolkit, providing a unified API for more than 1M pre-trained models (like BERT, GPT-2, T5) across NLP, vision, and audio tasks.
    Hugging Face Transformers is an open-source Python library for democratizing state-of-the-art machine learning. It delivers a unified API across frameworks (PyTorch, TensorFlow) for accessing and using more than 1M pre-trained model checkpoints, including industry standards like BERT, GPT-2, and T5. Developers use the high-level `Pipeline` class for rapid inference (e.g., text generation, sentiment analysis) and the `Trainer` class for fine-tuning and distributed training. The library connects the ML community to the Hugging Face Hub, accelerating the deployment of models across text, vision, and audio modalities with minimal code.
  • BERT
    BERT (Bidirectional Encoder Representations from Transformers) is a foundational, pre-trained NLP model that uses a Transformer encoder to process text bidirectionally, capturing full word context for superior language understanding.
    BERT is a language representation model introduced by Google AI Language in 2018. It is built on the Transformer architecture and distinguishes itself by being deeply bidirectional: it conditions on the entire sequence of words (left and right context) simultaneously, unlike previous unidirectional models. This capability is achieved through a Masked Language Model (MLM) pre-training objective, in which some input tokens are hidden and the model learns to predict them from the surrounding context. The model, released in sizes like BERT-Base (110 million parameters) and BERT-Large (340 million parameters), improved the state of the art across 11 Natural Language Processing tasks, including question answering (SQuAD) and sentiment analysis, establishing a new baseline for the field.
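The Masked Language Model objective mentioned for BERT can be illustrated with a short data-preparation sketch. Following the scheme described in the BERT paper, about 15% of tokens are selected as prediction targets; of those, 80% are replaced by `[MASK]`, 10% by a random token, and 10% are left unchanged. The function name, the whitespace tokenizer, and the sample sentence are assumptions made for illustration. Real BERT uses WordPiece tokens and a full vocabulary.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels); labels are None where nothing is predicted."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                    # model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                masked.append("[MASK]")           # 80%: replace with the mask symbol
            elif roll < 0.9:
                masked.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                masked.append(tok)                # 10%: keep the token unchanged
        else:
            masked.append(tok)
            labels.append(None)                   # not a prediction target
    return masked, labels

tokens = "the agent answers the call on whatsapp".split()
# mask_prob is raised to 0.5 here only so a short sentence shows some masking.
masked, labels = mask_tokens(tokens, vocab=tokens, mask_prob=0.5, seed=1)
print(masked)
print(labels)
```

Because a target position sees context on both sides of the hidden token, the model learns bidirectional representations instead of predicting strictly left to right.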

Related projects