CityPulseNYC: Multi-Modal RAG

Demonstrates building CityPulseNYC, a borough‑based video platform using Whisper transcription, LLaVA vision, 384‑dim embeddings, PGVector search, two‑stage retrieval, activity scoring, and async processing.

Overview

I’ll demonstarate CityPulseNYC, a hyperlocal video platform for NYC residents and tourists that lests users discover whats happening in their borough through semantic search and video feeds.

Technical Walkthrough will show:

THE SOFTWARE:
Borough based video feed(Manhattan, Brooklyn, Queens, Bronx, Staten Island) with activity based ranking

“Ask NYC” a natural language search across all boroughs and all kinds of video content(“What festivals are happening in Brooklyn this week?”)

3 Tier processing making videos playable in 5 seconds or less, fully searchable in 90 seconds

THE IMPLEMENTATION:
Multimodal RAG pipeline: Whisper transcription+ LLaVa Vision analysis -> 384 dim embeddings -> PG Vector Search

Two stage result retrieval: Strict semantic search falling back to expanded time window

Activity Scoring Algorithm prioritizing results related to user search

ARCHITECTURE:
DB schema, Software Patterns used

Live Demo will include actual NYC video content, SQL Queries showing vector operations

Tech stack