Rapid progress in computer vision and machine learning has caused a major disruption in the retail industry. Alongside the rise of the web and online shopping, traditional markets are also quickly embracing AI-related technology at the physical store level. The introduction of computer vision to retail has brought a new set of challenges, such as detecting products in crowded store displays, fine-grained classification of many visually similar classes, and dynamically adapting to changes in the data, both variation in class appearance over time and new classes that may appear in the images before they are labeled in the dataset. The scene complexity, scale, class imbalance, lack of reliable supervised samples, and dynamic nature of the data encourage solutions such as context-based detection and classification, few-shot learning, uncertainty modeling, and open-set recognition.
This workshop aims to present and advance the revolution that is already occurring in the world of retail and welcomes any work on relevant computer vision challenges, including but not limited to:
- Detection in densely packed scenes
- Class imbalance and lack of labeled data. New classes introduced over time
- Ultrafine-grained object classification: Classes are often virtually indistinguishable by visual appearance
- Hierarchical classification: products fall into product, brand, and sub-brand hierarchies
- Context modeling of geometric structures
- Multi-person tracking
- Recognition of actions such as taking/returning/examining products
We invite submissions of papers of up to 8 pages in the CVPR format. Reviews are double-blind, and papers will be selected based on relevance, significance, and clarity.
The workshop includes two challenges representing the difficulties of product detection and recognition "in the wild". Challenge competitors are invited to submit a paper presenting their approach and results. Please note that you do not have to submit a paper to participate in the challenges.
The world of retail takes the detection scenario to unexplored territories with millions of possible facets and hundreds of heavily crowded objects per image. This detection challenge is based on Trax’s data of supermarket shelves and pushes the limits of detection systems.
The AliProducts dataset consists of ∼3M images of ∼50K different products. The dataset covers many categories of daily commodities, including cosmetics, beverages, snacks, etc. The products are categorized at the fine-grained SKU (Stock Keeping Unit) level, and some of them may be difficult to tell apart. Handling class imbalance and noisy training data are also highlights of this challenge.