Rediscovering Perception for Data-Starved, Feature-Poor Data | AI Tinkerers Raleigh

December 10, 2025 · Raleigh

SimCLR for Data-Starved Perception

Compares traditional texture feature engineering with SimCLR self-supervised learning for monochrome cork defect classification, and shows that learned embeddings outperform standard CNNs on data-scarce tasks.
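SimCLR learns embeddings by pulling together two augmented views of the same image and pushing apart views of different images, using the NT-Xent (normalized temperature-scaled cross-entropy) loss. The sketch below writes that loss in plain NumPy for readability. In a PyTorch training loop, the same operations would run on tensors so that gradients reach the encoder. It assumes a batch of N images produces 2N projected embeddings, with rows i and i + N holding the two views of image i. The function name `nt_xent_loss` and the temperature of 0.5 are illustrative choices, not details from the talk.

```python
import numpy as np


def nt_xent_loss(z: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent (normalized temperature-scaled cross-entropy) loss, as in SimCLR.

    Assumes `z` has shape (2N, d) and rows i and i + N are the two
    augmented views of the same image i (an illustrative layout).
    """
    n2 = z.shape[0]
    if n2 < 2 or n2 % 2 != 0:
        raise ValueError("expected an even number (>= 2) of rows: two views per image")
    n = n2 // 2

    # L2-normalize so dot products become cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    # A view must not count itself as a candidate in the softmax denominator.
    np.fill_diagonal(sim, -np.inf)

    # Index of each row's positive partner: i <-> i + N.
    pos = np.concatenate([np.arange(n, n2), np.arange(0, n)])

    # Numerically stable row-wise log-softmax, evaluated at the positive pair.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(n2), pos] - log_denom
    return float(-log_prob_pos.mean())


if __name__ == "__main__":
    # Toy batch: 4 "images" whose two views are identical one-hot vectors,
    # so positives have cosine 1 and every negative has cosine 0.
    views = np.eye(4)
    z = np.vstack([views, views])
    print(round(nt_xent_loss(z), 3))  # log(1 + 6 * exp(-2)) ≈ 0.594
```

Each anchor is scored against 2N - 2 negatives. Small batches therefore give a weaker contrastive signal, which is worth keeping in mind when the dataset itself is small.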

Tech stack
  • Python
    Python: The high-level, general-purpose language built for readability, powering everything from web backends to advanced machine learning models.
    Python is a high-level, general-purpose language that prioritizes clear, readable syntax (via significant indentation), which supports rapid development. Its ecosystem is massive: use it for web development with frameworks like Django and Flask, or for data science with libraries such as Pandas and NumPy. The Python Package Index (PyPI) hosts hundreds of thousands of community-contributed packages, covering tasks from network programming to GUI creation. The language is maintained by the Python Software Foundation (PSF), and the current stable release is Python 3.14.0 (as of November 2025).
  • PyTorch
    PyTorch is an open-source machine learning framework: it provides a Python-first tensor library with strong GPU acceleration and a dynamic computation graph for building deep neural networks.
    PyTorch, originally developed by Meta AI and now governed by the PyTorch Foundation, is a leading open-source deep learning framework used in both research and production. Its core is a NumPy-like tensor library optimized for GPU acceleration, which can deliver large speedups over CPU execution for heavy numerical workloads. Its key differentiator is a 'Pythonic' design with a dynamic computation graph (eager execution), which enables rapid prototyping and simpler debugging than static-graph frameworks. Using its Autograd system for automatic differentiation, practitioners build and train models for computer vision and NLP. Companies such as Tesla (Autopilot) and Microsoft use PyTorch for production AI applications.
  • scikit-learn
    Scikit-learn (sklearn) is the essential Python library for efficient, production-ready machine learning, built on NumPy and SciPy.
    Scikit-learn (sklearn) is the industry-standard Python library, providing a unified API for efficient predictive data analysis. It delivers robust, open-source implementations of core machine learning algorithms: classification (e.g., Support Vector Machines, Random Forests), regression (e.g., Linear Regression), and clustering (K-Means, DBSCAN). Built on the foundational scientific stack (NumPy, SciPy), its consistent Estimator API simplifies complex data science workflows. Developers use it to quickly move from data preprocessing (StandardScaler) to model evaluation (accuracy_score) and pipeline construction in production environments.
  • Pillow
    The standard Python imaging library for resizing, padding, and standardizing image data to meet the fixed input sizes that VLM architectures require.
    Pillow handles the transformation layer between raw image files and model-ready arrays or tensors. It standardizes disparate inputs into the fixed resolutions (often 224x224 or 336x336) required by architectures like CLIP or LLaVA. By combining high-quality resampling filters (such as Lanczos) with canvas padding, it avoids the aspect-ratio distortion that can degrade zero-shot performance. As a result, each input matches the fixed patch grid that the Vision Transformer (ViT) backbone expects.