Retail Vision Workshop

Overview

Previous workshops: CVPR 2024, CVPR 2023, CVPR 2022, CVPR 2021, CVPR 2020

The rapid development in computer vision and machine learning has caused a major disruption in the retail industry in recent years. In addition to the rise of online shopping, traditional markets also quickly embraced AI-related technology solutions at the physical store level. Following the introduction of computer vision to the world of retail, a new set of challenges emerged. These challenges were further expanded with the introduction of image and video generation capabilities.

The physical domain poses challenges such as detecting shopper-product interactions, fine-grained recognition of visually similar products, and handling new products that are introduced daily. The online domain poses similar challenges, each with its own twist. Product search and recognition are performed over more than 100,000 classes, each including images, textual captions, and text entered by users during search. Beyond discriminative machine learning, image generation is now also used to create product images and to power virtual try-on.

All of these challenges are shared by companies across the field and are also central to the computer vision community. This workshop aims to present progress on these challenges and to encourage the formation of a retail computer vision community.

Accepted Papers

Best paper - "Recommendation By Generation: Generation Augmented Complementary Fashion Item Retrieval Using Incomplete Outfit". Gaurab Bhattacharya, Vivek B S, P. Rajith Bhargav, Jayavardhana Gubbi, Bagyalakshmi V and Arpan Pal.
Best paper - "SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation". Sarthak Srivastava and Kathy Wu.
"HyperVLM: Hyperbolic Space Guided Vision Language Modeling for Hierarchical Multi-Modal Understanding". Sarthak Srivastava and Kathy Wu.
"Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents". Janika Deborah Gajo, Gerarld Paul Merales, Jerome E. Escarcha, Brenden Ashley Molina, Gian Nartea, Emmanuel Maminta, Juan Carlos Roldan and Rowel Atienza.
"RetailAction: Dataset for Multi-View Spatio-Temporal Localization of Human-Object Interactions in Retail". Davide Mazzini, Alberto Raimondi, Bruno Abbate, Daniel Fischetti and David M Woollard.
"BARE: Body-Aware Refinement Extension for Human Mesh Recovery". Eli Alshan, Aaron Olender, Lior Fritz and Ianir Ideses.
"Relative Pose Regression with Pose Auto-Encoders: Enhanced Accuracy and Data Efficiency for Retail Applications". Yoli Shavit and Yosi Keller.
"DualFit: A Two-Stage Virtual Try-On via Warping and Synthesis". Minh Tran, Johnmark Clements, Annie Prasanna Manoharan, Tri Nguyen and Ngan Le.

Relevant Extras

"Kaputt: A Large-Scale Dataset for Visual Defect Detection". ICCV 2025. Sebastian Höfer, Dorian Henning, Artemij Amiranashvili, Douglas Morrison, Mariliza Tzes, Ingmar Posner, Marc Matvienko, Alessandro Rennola and Anton Milan.

The GroceryVision Challenge

This year's challenge focuses on shopping scenarios in physical stores and is based on data collected by Amazon. We offer two distinct but complementary tracks: one for action recognition and one for product verification. In the action recognition track, a model detects shopper actions such as putting a product in a basket. The verification track is about recognizing the product's identity. This is a unique opportunity to explore this domain.

GroceryVision Challenge Website

Invited Speakers

Hadar Averbuch-Elor - Keynote

Yair Adato

Shun Takeuchi

Ianir Ideses

Cedric Beliard - Keynote

Walmart Global Tech

Organizers

For questions about the workshop, please contact Ehud Barnea (ehud.barnea at gmail dot com). For questions about the challenge, please see the challenge pages.