Rediscovering Perception for Data-Starved, Feature-Poor Data

Compare traditional texture feature engineering with SimCLR self-supervised learning for monochrome cork defect classification, showing learned embeddings outperform standard CNNs on data-scarce tasks.

Overview

Like many who did early AI/ML research in the 2000s and 2010s, long before deep learning dominated vision tasks, I approached image classification with white-box pipelines grounded in feature engineering and psychophysics. The task at hand, monochrome cork defect classification, relied on the perceptual cues humans use, such as contrast, granularity, and directionality, and on engineered texture descriptors that capture them: Laws’ Textural Energy Measures, Gabor wavelets, second-order grayscale metrics, model-based textural analysis, and 3D texture analysis. Eye-tracking supplied saliency information that informed the textural analysis. The result was an interpretable system in which most of the “intelligence” lived in the feature extraction itself; the classifier, here a simple back-propagation network with C-fuzzy activation rules, served only as the final decision layer. The system worked because the features were informed by how humans actually perceive texture, and the whole workflow stayed transparent and explainable.
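
To make the idea of an engineered texture descriptor concrete, here is a minimal sketch of Laws-style texture energy features. It is an illustration, not the code from the original pipeline. The 1-D kernels are the standard length-5 Laws vectors, but the function names (`filter2d_valid`, `laws_energy_features`), the choice of four kernels, the global mean removal, and the single whole-image energy value per mask are simplifying assumptions; Laws’ original method removes a local mean and measures energy in local windows.

```python
import numpy as np

# Standard 1-D Laws kernels of length 5: Level, Edge, Spot, Ripple.
KERNELS = {
    "L5": np.array([1, 4, 6, 4, 1], dtype=float),
    "E5": np.array([-1, -2, 0, 2, 1], dtype=float),
    "S5": np.array([-1, 0, 2, 0, -1], dtype=float),
    "R5": np.array([1, -4, 6, -4, 1], dtype=float),
}


def filter2d_valid(image, mask):
    """Cross-correlate a 2-D image with a small mask, keeping only the
    positions where the mask fits entirely inside the image."""
    windows = np.lib.stride_tricks.sliding_window_view(image, mask.shape)
    return np.einsum("ijkl,kl->ij", windows, mask)


def laws_energy_features(image):
    """Return {mask name: mean absolute filter response} for one grayscale image.

    The mask name is vertical kernel + horizontal kernel, e.g. "L5E5"
    smooths along the rows' direction of stacking (vertically) and
    detects edges horizontally.
    """
    image = np.asarray(image, dtype=float)
    # Assumption: subtract the global mean so the features respond to
    # texture rather than overall brightness.
    image = image - image.mean()

    features = {}
    for v_name, v_kernel in KERNELS.items():
        for h_name, h_kernel in KERNELS.items():
            if v_name == "L5" and h_name == "L5":
                continue  # L5L5 only measures average brightness
            mask = np.outer(v_kernel, h_kernel)  # 5x5 separable mask
            response = filter2d_valid(image, mask)
            features[v_name + h_name] = float(np.abs(response).mean())
    return features


if __name__ == "__main__":
    # Synthetic stand-in for a cork patch: vertical stripes, 16x16 pixels.
    stripes = np.tile(np.array([0.0, 0.0, 1.0, 1.0] * 4), (16, 1))
    for name, energy in sorted(laws_energy_features(stripes).items()):
        print(f"{name}: {energy:.3f}")
```

Each entry in the returned dictionary is one hand-designed feature. A vector of such values, computed per image or per patch, is the kind of input a small classifier would receive in a white-box pipeline like the one described above.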

Fast-forward to today’s deep learning era. My first attempt to test how well deep learning sees textures used a standard ResNet classifier with transfer learning. It failed catastrophically. ResNet backbones used for transfer learning are typically pretrained on large, labeled, natural-image datasets with color and semantic structure, not on small, grayscale, high-frequency texture datasets like cork. The network overfit quickly and struggled to learn meaningful texture embeddings from the limited supervised labels. This led me to evaluate SimCLR, a self-supervised contrastive method developed at Google Research. SimCLR does not require labels to learn representations. Instead, it learns invariances through augmentations: the encoder is trained to map two augmented views of the same image to nearby embeddings and views of different images to distant ones. That makes it far better suited to texture-rich, monochrome data. Because cork images differ mostly in subtle local patterns, contrastive learning can discover structure that the supervised ResNet failed to capture. This is still a work in progress with much to explore, but SimCLR seems to automate what I once had to engineer by hand: learning robust perceptual features directly from the data.
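
The contrastive objective at the heart of SimCLR is the normalized temperature-scaled cross-entropy (NT-Xent) loss. The sketch below computes it with NumPy on precomputed embeddings, purely to show the arithmetic. It is not this project's training code: the function name `nt_xent_loss` and the temperature of 0.5 are assumptions for illustration, and a real run would use an autodiff framework with an encoder and an augmentation pipeline producing `z1` and `z2`.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for a batch of paired embeddings.

    z1[i] and z2[i] are embeddings of two augmented views of image i.
    Every other embedding in the batch acts as a negative for view i.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0).astype(float)

    # L2-normalize so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    # A view is never compared with itself.
    np.fill_diagonal(sim, -np.inf)

    # Row i (first view) has its positive at i + n, and vice versa.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Cross-entropy with the positive as the "correct class".
    return float(-log_prob[np.arange(2 * n), positives].mean())


if __name__ == "__main__":
    views_a = np.eye(4)        # toy embeddings for 4 images
    views_b = np.eye(4)        # second views agree with the first
    shuffled = views_b[::-1]   # second views paired with the wrong images

    print("matched pairs:   ", nt_xent_loss(views_a, views_b))
    print("mismatched pairs:", nt_xent_loss(views_a, shuffled))
```

The loss is lower when the two views of each image agree and higher when they are paired with the wrong images. Minimizing it therefore pushes the encoder toward features that survive the augmentations, which is the label-free signal described above.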

In this demo, I compare the two paradigms, show my progress with SimCLR, and explain why learned embeddings outperform standard deep CNNs on this type of feature-poor, data-starved perception task. The goal is not only to report results but also to reflect on what each method teaches us about perception, representation, and how machines “learn to see.”

Tech stack