SimCLR for Data-Starved Perception
Compare traditional texture feature engineering with SimCLR self-supervised learning for monochrome cork defect classification, showing learned embeddings outperform standard CNNs on data-scarce tasks.
Like many others who did AI/ML research in the 2000s and 2010s, before deep learning dominated vision tasks, I approached image classification with white-box pipelines grounded in feature engineering and psychophysics. The task at hand, monochrome cork defect classification, relied on the perceptual cues humans use, such as contrast, granularity, and directionality, and on engineered texture descriptors that capture them (Laws’ Texture Energy Measures, Gabor wavelets, second-order grayscale metrics, model-based textural analysis, and 3D texture analysis). Eye-tracking provided saliency information that informed the textural analysis work. The result was an interpretable system in which most of the “intelligence” lived in the feature extraction itself; the classifier, here a simple back-propagation network with C-fuzzy activation rules, served only as the final decision layer. The system worked because the features were informed by how humans actually perceive texture, and the whole workflow stayed transparent and explainable.
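To make the idea of an engineered texture descriptor concrete, here is a minimal sketch of one Laws-style texture energy feature in plain Python. It is an illustration only, not the pipeline used in this work: the function names are hypothetical, the energy is averaged over the whole patch rather than over a sliding local window, and the usual local-mean removal step is omitted.

```python
# Laws' 5-tap 1D kernels: Level, Edge, Spot, Ripple.
KERNELS = {
    "L5": [1, 4, 6, 4, 1],
    "E5": [-1, -2, 0, 2, 1],
    "S5": [-1, 0, 2, 0, -1],
    "R5": [1, -4, 6, -4, 1],
}


def outer(col, row):
    """Build a 2D mask as the outer product of two 1D kernels."""
    return [[c * r for r in row] for c in col]


def filter_valid(image, mask):
    """Correlate a grayscale image (list of rows) with a square mask.

    Only positions where the mask fits entirely inside the image are kept.
    """
    height, width = len(image), len(image[0])
    k = len(mask)
    response = []
    for y in range(height - k + 1):
        row = []
        for x in range(width - k + 1):
            acc = 0
            for dy in range(k):
                for dx in range(k):
                    acc += image[y + dy][x + dx] * mask[dy][dx]
            row.append(acc)
        response.append(row)
    return response


def texture_energy(image, vertical, horizontal):
    """Mean absolute filter response for one pair of Laws kernels.

    Simplification (assumption): energy is pooled over the whole patch.
    """
    mask = outer(KERNELS[vertical], KERNELS[horizontal])
    response = filter_valid(image, mask)
    values = [abs(v) for row in response for v in row]
    return sum(values) / len(values)


# Synthetic 8x8 patch with vertical stripes two pixels wide.
stripes = [[255 if (x // 2) % 2 == 0 else 0 for x in range(8)] for _ in range(8)]

# Smoothing vertically and edge-detecting horizontally responds to the stripes;
# swapping the two kernels does not, which is how directionality shows up.
across = texture_energy(stripes, "L5", "E5")
along = texture_energy(stripes, "E5", "L5")
```

Each kernel pair yields one number per patch, so a handful of pairs gives a short, human-readable feature vector that a small classifier can consume.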
Fast-forward to today’s deep learning era. My first attempt to test how well deep learning sees textures used a standard ResNet classifier with transfer learning. It failed catastrophically: ResNet classifiers used for transfer learning are typically pretrained on large, labeled, natural-image datasets with color and semantic structure, not on small, grayscale, high-frequency texture datasets like cork. The network overfit quickly and struggled to learn meaningful texture embeddings from limited supervised labels. This led me to evaluate SimCLR, a self-supervised contrastive method developed at Google Research. SimCLR does not require labels to learn representations; instead, it learns invariances by pulling together differently augmented views of the same image and pushing apart views of different images, which makes it a better fit for texture-rich, monochrome data with few labels. Because cork images differ mostly in subtle local patterns, contrastive learning can discover structure that the supervised ResNet failed to capture. This is still a work in progress with much to explore, but SimCLR seems to automate what I once had to engineer by hand: learning robust perceptual features directly from the data.
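The contrastive objective behind SimCLR, the normalized temperature-scaled cross-entropy (NT-Xent) loss, can be sketched in a few lines. The version below is written in plain Python for readability rather than in PyTorch, and it is not the training code from this demo: the function names are hypothetical, the temperature of 0.5 is an arbitrary illustrative value, and the embeddings are assumed to be non-zero vectors produced by some encoder from two augmented views of each image.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(views_a, views_b, temperature=0.5):
    """NT-Xent loss over a batch of paired embeddings.

    views_a[i] and views_b[i] are embeddings of two augmentations of image i.
    For every embedding, its partner is the positive and all other
    embeddings in the batch (from both views) act as negatives.
    """
    z = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        logits = {
            k: cosine(z[i], z[k]) / temperature
            for k in range(2 * n)
            if k != i  # an embedding is never compared with itself
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)


# Toy batch of two images with 2D embeddings.
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]      # each view stays close to its anchor
mismatched = [[0.1, 0.9], [0.9, 0.1]]   # each view lands near the wrong anchor

loss_matched = nt_xent(anchors, matched)
loss_mismatched = nt_xent(anchors, mismatched)
```

The loss is lower when the two views of each image sit closer together than views of different images, which is the signal that shapes the encoder without any labels.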
In this demo, I will compare the two paradigms and show my progress with SimCLR, and why learned embeddings outperform standard deep CNNs for this type of feature-poor, data-starved perception task. The goal is not only to report results, but to reflect on what each method teaches us about perception, representation, and how machines “learn to see.”
- Python: The high-level, general-purpose language built for readability, powering everything from web backends to advanced machine learning models. Python prioritizes clear, readable syntax (via significant indentation), which supports rapid development. Its ecosystem is massive: use it for web development with frameworks like Django and Flask, or for data science with libraries such as Pandas and NumPy. The Python Package Index (PyPI) provides thousands of community-contributed modules, offering ready-made solutions for tasks from network programming to GUI creation. The language is actively maintained by the Python Software Foundation (PSF), with the stable release currently at Python 3.14.0 (as of November 2025).
- PyTorch: An open-source machine learning framework that provides a Python-first tensor library with strong GPU acceleration and a dynamic computation graph for building deep neural networks. PyTorch, developed by Meta AI, is a premier open-source deep learning framework favored in both research and production environments. Its core is a powerful tensor library (like NumPy) optimized for GPU acceleration, delivering 50x or greater speedups for complex computations. The key differentiator is its “Pythonic” design and dynamic computation graph (eager execution), which allows for rapid prototyping and simplified debugging compared to static-graph frameworks. Leveraging its Autograd system for automatic differentiation, practitioners build and train models for computer vision and NLP; major companies like Tesla (Autopilot) and Microsoft use PyTorch for critical AI applications.
- scikit-learn: The essential Python library for efficient, production-ready machine learning, built on NumPy and SciPy. Scikit-learn (sklearn) provides a unified API for efficient predictive data analysis. It delivers robust, open-source implementations of core machine learning algorithms: classification (e.g., Support Vector Machines, Random Forests), regression (e.g., Linear Regression), and clustering (K-Means, DBSCAN). Built on the foundational scientific stack (NumPy, SciPy), its consistent Estimator API simplifies complex data science workflows. Developers use it to move quickly from data preprocessing (StandardScaler) to model evaluation (accuracy_score) and pipeline construction in production environments.
- Pillow: The essential Python library for resizing, padding, and normalizing image data to meet strict VLM architectural constraints. Pillow manages the transformation layer between raw visual files and model-ready tensors. It standardizes disparate inputs into the fixed resolutions (often 224x224 or 336x336) required by architectures like CLIP or LLaVA. By using high-quality resampling filters (such as Lanczos) and precise canvas padding, the library prevents the aspect ratio distortion that degrades zero-shot performance, so that inputs match the spatial expectations of the Vision Transformer (ViT) backbone.