RAG authorization to protect sensitive data | Tokyo

January 15, 2026 · Tokyo

RAG Authorization for Sensitive Data

Learn how to protect sensitive data in RAG pipelines using user permissions and relationship-based access control for granular security.

Overview
Tech stack
  • Chroma
    Chroma is an open-source vector database built for AI applications: it simplifies storing and retrieving vector embeddings for large language models (LLMs).
    Chroma serves as a memory layer for generative AI applications, most commonly Retrieval-Augmented Generation (RAG). It stores vector embeddings (numerical representations of unstructured data such as text or images) alongside their associated metadata, which enables low-latency similarity search using metrics such as cosine distance. Developers can run it locally or use the managed Chroma Cloud, with Python and JavaScript/TypeScript SDKs for both prototyping and production-scale LLM context retrieval.
  • LangChain
    An open-source framework for building and deploying data-aware Large Language Model (LLM) applications.
    LangChain is a framework for engineering LLM-powered applications: it simplifies connecting models (like GPT-4 or Claude) to external data, computation, and APIs. It provides a modular set of components (Chains, Agents, Tools, and Memory) that developers combine into workflows such as Retrieval-Augmented Generation (RAG) pipelines and conversational agents. Its Python and JavaScript libraries, together with LangChain Expression Language (LCEL), offer a standardized interface for prototyping and for moving applications to production.
  • OpenFGA
    OpenFGA is a high-performance, open-source Fine-Grained Authorization (FGA) engine inspired by Google's Zanzibar.
    OpenFGA implements a Relationship-Based Access Control (ReBAC) system inspired by Google's Zanzibar paper: access is derived from stored relationships between users and objects (for example, a user is a member of a group that is a viewer of a document). It provides an expressive modeling language for defining access policies and offers low-latency authorization checks via HTTP and gRPC APIs. It is a CNCF Incubating Project with multiple SDKs (Go, .NET, JavaScript, Python), which lets developers centralize complex authorization logic outside application code.
  • RAG
    RAG (Retrieval-Augmented Generation) is a GenAI technique that grounds LLMs (like GPT-4) in external data, reducing model hallucinations and making answers traceable to their sources.
    RAG mitigates the LLM 'hallucination' problem by inserting a retrieval step before generation. A user query is vectorized, then used to query an external knowledge base (e.g., a Pinecone vector database) for relevant document chunks (often a few hundred tokens each). The retrieved chunks are added to the original prompt, giving the LLM (e.g., Gemini or Llama 3) the specific, current, or proprietary context it needs. This makes the final response more likely to be accurate and grounded in domain-specific data, while avoiding the cost and latency of retraining the model.
  • Vector database
    A vector database is a specialized system that stores, indexes, and queries high-dimensional embeddings for large-scale semantic similarity search.
    This technology is purpose-built to manage unstructured data (text, images, audio) by converting it into numerical arrays called vector embeddings (often 100 to 1,000+ dimensions). Unlike traditional databases, a vector database uses algorithms like HNSW (Hierarchical Navigable Small World) to index these vectors, enabling fast Approximate Nearest Neighbor (ANN) searches based on similarity or distance measures (e.g., cosine similarity). ANN search trades a small amount of recall for speed. This capability underpins Retrieval-Augmented Generation (RAG), where it supplies contextual memory for Large Language Models (LLMs), as well as semantic search engines and real-time, personalized recommendations.
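
To show how the pieces in this stack can fit together, here is a minimal, dependency-free sketch of permission-aware retrieval. It is not the talk's implementation and does not use the OpenFGA, Chroma, or LangChain APIs: the relationship tuples, the `check` and `retrieve` functions, the document IDs, and the hand-made three-dimensional embeddings are all hypothetical stand-ins. The sketch assumes a simplified Zanzibar-style model with one level of group indirection and a brute-force cosine similarity search in place of a real vector index.

```python
import math

# Hypothetical relationship tuples in (subject, relation, object) form,
# loosely modeled on Zanzibar-style ReBAC. This is NOT the OpenFGA API.
TUPLES = {
    ("user:alice", "member", "group:finance"),
    ("group:finance#member", "viewer", "doc:salary-report"),
    ("user:alice", "viewer", "doc:handbook"),
    ("user:bob", "viewer", "doc:handbook"),
}


def check(user, relation, obj):
    """Return True if `user` has `relation` on `obj`, directly or via a group."""
    if (user, relation, obj) in TUPLES:
        return True
    # Follow one level of indirection: members of a group inherit its relation.
    for subject, rel, target in TUPLES:
        if rel == relation and target == obj and subject.endswith("#member"):
            group = subject[: -len("#member")]
            if (user, "member", group) in TUPLES:
                return True
    return False


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy stand-in for a vector store: (document id, embedding, chunk text).
CHUNKS = [
    ("doc:salary-report", [0.9, 0.1, 0.0], "Q4 salary bands by level."),
    ("doc:handbook", [0.7, 0.3, 0.1], "Compensation is reviewed annually."),
    ("doc:handbook", [0.0, 0.2, 0.9], "Office hours are 9 to 6."),
]


def retrieve(user, query_vec, k=2):
    """Rank chunks by similarity, then keep only those the user may view."""
    ranked = sorted(
        CHUNKS,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    allowed = [chunk for chunk in ranked if check(user, "viewer", chunk[0])]
    return [text for _, _, text in allowed[:k]]


if __name__ == "__main__":
    query = [1.0, 0.2, 0.0]  # pretend embedding of a compensation question
    print(retrieve("user:alice", query))
    print(retrieve("user:bob", query))
```

In this sketch the authorization check runs after ranking and before the chunks reach the prompt, so a user without access to the salary report never has its text placed in the LLM context. A real deployment might instead filter inside the vector store query, or batch the checks against an authorization service; which is preferable depends on corpus size and how often permissions change.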

Related projects