Edge AI Latency-Accuracy Trade-offs
An empirical study of partitioning, quantization, and early‑exit operators on mobile, edge, and cloud, showing latency‑accuracy trade‑offs and optimal hybrid configurations.
This presentation explores the findings from our paper “On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance.” The study empirically evaluates how different Edge AI deployment strategies, specifically combinations of the Partitioning, Quantization, and Early Exit operators, affect inference latency and accuracy across Mobile, Edge, and Cloud environments. Using ONNX-based models deployed in a containerized environment, the research examines both single-tier and multi-tier configurations (e.g., Mobile-Edge, Edge-Cloud) under varying network bandwidths. Results show that hybrid strategies, particularly Quantization + Early Exit on Edge, offer the best latency-accuracy trade-offs for many real-world conditions, while Quantization alone is best when accuracy preservation is critical. These findings provide actionable insights for MLOps engineers seeking efficient, privacy-aware deployment strategies in heterogeneous Edge AI systems.
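To make the role of network bandwidth concrete, here is a minimal back-of-the-envelope sketch of how partitioning and early exit might enter a latency estimate. It is not the paper's measurement method. The function names, the additive latency model (compute on the first tier, plus transfer of the intermediate tensor, plus compute on the second tier), and all numbers in the usage lines are illustrative assumptions.

```python
def transfer_ms(payload_bytes: float, bandwidth_mbps: float) -> float:
    """Time to push a payload over a link, ignoring propagation delay (assumption)."""
    bits = payload_bytes * 8
    return bits / (bandwidth_mbps * 1_000_000) * 1000.0


def partitioned_latency_ms(head_ms: float, tail_ms: float,
                           intermediate_bytes: float, bandwidth_mbps: float) -> float:
    """Hypothetical two-tier model: run the head on one tier, ship the
    intermediate tensor, then run the tail on the next tier."""
    return head_ms + transfer_ms(intermediate_bytes, bandwidth_mbps) + tail_ms


def early_exit_latency_ms(exit_ms: float, full_ms: float, exit_rate: float) -> float:
    """Expected latency when a fraction `exit_rate` of inputs leaves at the early exit."""
    return exit_rate * exit_ms + (1.0 - exit_rate) * full_ms


# Made-up numbers: a 1.25 MB intermediate tensor over a 10 Mbps link.
slow_link = partitioned_latency_ms(20.0, 5.0, 1_250_000, 10.0)
fast_link = partitioned_latency_ms(20.0, 5.0, 1_250_000, 100.0)
print(slow_link, fast_link)
```

Under this simplified model, the transfer term dominates at low bandwidth, which is one reason a partitioned configuration can lose to a single-tier one as the link degrades.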
Benchmarks black-box edge operators using Dockerized Mobile/Edge/Cloud tiers and JSON data.
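As a rough illustration of what benchmarking a black-box operator on JSON inputs could look like, the sketch below times an arbitrary inference callable over JSON-encoded payloads. The `benchmark` helper, the `fake_model` stand-in, and the payload shape are hypothetical and are not taken from the study's tooling. A real run would call the deployed model endpoint in place of `fake_model`.

```python
import json
import statistics
import time


def benchmark(fn, json_payloads, warmup=2):
    """Time `fn` on each decoded JSON payload and summarize latency in ms."""
    # Warm-up calls are excluded from the statistics.
    for payload in json_payloads[:warmup]:
        fn(json.loads(payload))

    samples = []
    for payload in json_payloads:
        start = time.perf_counter()
        fn(json.loads(payload))  # decoding is counted as part of the request
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "n": len(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


def fake_model(sample):
    """Stand-in for an inference call; simply sums the input values."""
    return sum(sample["pixels"])


payloads = [json.dumps({"pixels": list(range(1000))}) for _ in range(3)]
result = benchmark(fake_model, payloads)
print(result["n"], "requests timed")
```

Reporting the median alongside the maximum keeps a single slow request from dominating the summary.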