BigBasket SuperMart — AI Technology

Santosh Waddi

Published in

Bigbasket

7 min read · Feb 3, 2023

Today most supermarts and physical stores in India provide manual billing at the billing counter. This has two issues:

It requires additional manpower, stickers, and repeated training for the in-store operational team as the stores scale.
In most stores, the billing counter is separate from the weighing counters, which adds friction to the customer's purchase journey. How many times have we lost the sticker at the billing counter and run back to the weighing counters?

To tackle these problems in its supermarts, Bigbasket uses recent AI technologies that combine the billing and weighing counters and reduce the cost of the manpower needed to weigh loose items.

For customers, we enabled a seamless self-billing and self-checkout experience. At the time of billing, deep learning-based computer vision detects packed, unpacked, and loose items across the fruits, vegetables, and FMCG categories.

This allows both the item and its weight to be added to the shopping cart seamlessly.

The following diagram shows how a customer uses the self-billing counter.

The following video shows how billing happens at the self-checkout counter.

self-checkout_demo_video.mp4

Our inference engine is a cloud-based service that identifies the SKU in a live image. Store terminals invoke it by sending the live image in a compressed format, and it returns the SKU id as the response. Internal services then use this SKU id to load SKU details such as name, description, and pricing.
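The request and response flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the JSON payload shape, the function names, the catalogue entry, and the use of zlib (standing in for whatever image compression the terminals actually use) are all hypothetical and not taken from the production system.

```python
import json
import zlib

# Hypothetical in-memory catalogue standing in for the internal SKU services.
SKU_CATALOGUE = {
    "sku-1001": {"name": "Example item", "description": "Made-up entry", "price": 40.0},
}


def build_request(image_bytes: bytes, terminal_id: str) -> bytes:
    """Terminal side: compress the live image and wrap it in a request."""
    compressed = zlib.compress(image_bytes)  # assumption: stand-in for image compression
    payload = {"terminal_id": terminal_id, "image_hex": compressed.hex()}
    return json.dumps(payload).encode("utf-8")


def handle_request(request: bytes, classify) -> bytes:
    """Cloud side: decompress the image, run the model, return only the SKU id."""
    payload = json.loads(request)
    image_bytes = zlib.decompress(bytes.fromhex(payload["image_hex"]))
    return json.dumps({"sku_id": classify(image_bytes)}).encode("utf-8")


def lookup_sku(response: bytes) -> dict:
    """Internal services: expand the SKU id into name, description, pricing."""
    sku_id = json.loads(response)["sku_id"]
    return {"sku_id": sku_id, **SKU_CATALOGUE[sku_id]}
```

Returning only the SKU id keeps the response small; the details are resolved afterwards by a separate lookup, as the paragraph above describes.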

Insights

The current AI models support more than 5,000 SKUs across FnV (fruits and vegetables) and FMCG items.
More than 20 Bigbasket stores across 3 cities use the AI-based checkout technology.
More than 30K detections happen per day, an average of more than 2K per hour, rising to more than 3.3K during peak hours.
All detections happen in the cloud with an SLA of under 300 ms.
About 7 million images were used to train the models.

Deep Learning models

We use three deep learning models in our pipeline. Each is summarized below.

FMCG Model

We use image-based classification for FMCG product detection.
The network consists of a ResNet-152 [2] architecture followed by fully connected layers sized to the total number of classes.
Of the 66 million parameters, 23 million are trainable and the rest are frozen. This let us use fewer images at training time.
The model is built with the PyTorch framework. Initial parameters are obtained through transfer learning [3] from an ImageNet model, which gave us faster convergence and reduced the total training time.
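The idea of freezing part of a pretrained network and training only the rest can be sketched in PyTorch as below. This is a minimal illustration, not the actual model: the tiny convolutional layers stand in for the ImageNet-pretrained ResNet-152, and the layer sizes, class count, and helper names are assumptions chosen so the example runs quickly.

```python
import torch
from torch import nn


def freeze(module: nn.Module) -> None:
    """Mark every parameter in the module as non-trainable."""
    for param in module.parameters():
        param.requires_grad = False


def count_parameters(model: nn.Module) -> tuple:
    """Return (total, trainable) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable


# Stand-in backbone (assumption): in practice these would be pretrained layers.
early_layers = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
late_layers = nn.Sequential(
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
# New fully connected head sized to a hypothetical number of classes (5).
head = nn.Linear(16, 5)

freeze(early_layers)  # keep the early features fixed
model = nn.Sequential(early_layers, late_layers, head)
total, trainable = count_parameters(model)

# Only parameters that still require gradients are handed to the optimiser.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01
)
```

With fewer trainable parameters there is less to fit, which is consistent with the observation above that fewer training images were needed.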

FnV and FnV packed Models

We use object detection for FnV and FnV-packed products, since many of these SKUs require a piece count.
The network uses the YOLOv4 architecture with the Darknet framework, which gives us a faster inference time.
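One way a piece count could be derived from a detector's output is to count the boxes per label that clear a confidence threshold. The sketch below assumes the detections have already been de-duplicated (for example by non-maximum suppression); the `Detection` type, the labels, and the 0.5 threshold are hypothetical and not taken from the actual pipeline.

```python
from collections import Counter
from typing import NamedTuple


class Detection(NamedTuple):
    label: str         # predicted SKU or class label
    confidence: float  # detector score in [0, 1]
    box: tuple         # (x, y, width, height) in pixels


def piece_counts(detections, min_confidence: float = 0.5) -> Counter:
    """Count confident detections per label; each box is treated as one piece."""
    return Counter(d.label for d in detections if d.confidence >= min_confidence)


detections = [
    Detection("apple", 0.92, (10, 10, 40, 40)),
    Detection("apple", 0.88, (60, 12, 42, 41)),
    Detection("apple", 0.30, (5, 80, 20, 20)),  # below threshold, ignored
    Detection("lemon", 0.75, (100, 50, 30, 30)),
]
counts = piece_counts(detections)
```

A classifier alone would only say which product is present; a detector returns one box per object, which is what makes the count possible.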

Challenges in Product Detection

Given the wide variety of products and combinations, we had to solve a fair number of challenges to improve our detection mechanisms. To handle them, we developed a proprietary algorithm that links similar items in a “parent-child” relationship. Along with classifying each image, the algorithm uses the weight information and the text on the product label to resolve the correct product. The challenges are listed below; the following sections explain each one with examples and describe how our design handles it.

Products have almost all sides very similar except for some text on the product
Products that look similar on most surfaces except the front surface
Products look visually very similar and have a weighted variant
Products look almost identical on all the surfaces

Products have almost all sides very similar except for some text on the product

Some products from the same brand come in different flavours with almost identical packaging, where only a small piece of text on the pack differs. A few examples are listed below.

Examples

During training

To handle this scenario, we create a new parent class that combines the images of the similar SKUs.

In the above example, all the child SKU ids are mapped to the parent class id Nivea Men, and the images of all the child SKUs are used to train the Nivea Men class.
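The relabelling step can be sketched as a simple mapping applied to the training labels. The child SKU ids and file paths below are invented for illustration; only the parent class name Nivea Men comes from the example above.

```python
# Hypothetical child SKU ids mapped to the parent class they are trained as.
CHILD_TO_PARENT = {
    "sku-child-1": "Nivea Men",
    "sku-child-2": "Nivea Men",
}


def to_training_label(sku_id: str) -> str:
    """Return the parent class for a child SKU, or the SKU id itself otherwise."""
    return CHILD_TO_PARENT.get(sku_id, sku_id)


def relabel(samples):
    """Rewrite (image_path, sku_id) pairs so that children share one label."""
    return [(path, to_training_label(sku_id)) for path, sku_id in samples]


samples = [
    ("img/0001.jpg", "sku-child-1"),
    ("img/0002.jpg", "sku-child-2"),
    ("img/0003.jpg", "sku-other"),
]
training_set = relabel(samples)
```

SKUs without a parent keep their own id, so the rest of the training set is unaffected.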

During inference

At the time of inference, we use a text detection algorithm to map the predicted parent class to the correct child SKU. We use the Amazon Rekognition API [1] to extract the text from the image.
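A minimal sketch of how extracted text might pick the child SKU is shown below. It assumes the text lines have already been returned by a text detection service, and it uses a simple keyword overlap score; the keywords, SKU ids, and scoring rule are hypothetical and are not the proprietary algorithm mentioned earlier.

```python
# Hypothetical keywords that distinguish each child SKU under a parent class.
CHILD_KEYWORDS = {
    "Nivea Men": {
        "sku-child-1": {"fresh", "active"},
        "sku-child-2": {"sensitive"},
    },
}


def resolve_child_by_text(parent_class: str, detected_lines):
    """Pick the child whose keywords overlap most with the detected text.

    Returns None when no keyword matches or when the best score is tied,
    so the caller can fall back to another signal.
    """
    words = set()
    for line in detected_lines:
        words.update(line.lower().split())

    children = CHILD_KEYWORDS.get(parent_class, {})
    scores = {sku: len(keywords & words) for sku, keywords in children.items()}
    if not scores:
        return None

    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] == 0 or (len(ranked) > 1 and ranked[0] == ranked[1]):
        return None
    return max(scores, key=scores.get)
```

Returning None on a tie or a miss avoids billing the wrong variant when the distinguishing text is not visible.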

Products that look similar on most surfaces except the front surface

A major challenge in this scenario is that we can’t always expect the user to place a product with its front side showing. When the user places the product with its back side showing, we need to send the customer a feedback message: “Apologies, kindly turn the item to the front side!”. A few examples are listed below.

Examples

During training

To handle this scenario, we create a new class and train the model on images of the problematic surfaces.

For the above SKUs, we created an issue class with the id Dairy milk — back side, and all the back-side images are used to train the model on that class.

During inference

When the model detects this issue class, we send an appropriate feedback message to the customer.
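The handling of an issue class at inference time can be sketched as a lookup that turns the prediction into either a billable class or a retry message. The class key `dairy_milk_back_side`, the response fields, and the function name are hypothetical; the feedback text is the one quoted earlier in this section.

```python
# Hypothetical issue classes mapped to the feedback shown to the customer.
ISSUE_FEEDBACK = {
    "dairy_milk_back_side": "Apologies, kindly turn the item to the front side!",
}


def interpret_prediction(predicted_class: str) -> dict:
    """Turn a model prediction into a billing result or a retry prompt."""
    message = ISSUE_FEEDBACK.get(predicted_class)
    if message is not None:
        # The model recognised an unusable view, so ask the customer to retry.
        return {"status": "retry", "message": message}
    return {"status": "ok", "class_id": predicted_class}
```

Treating the back side as its own class lets the model say "I cannot tell from this view" instead of guessing among look-alike SKUs.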

Products look visually very similar and have a weighted variant

Some products of the same brand and variety come in different quantity variants that look visually very similar, with only a slight difference in package size. Even the text is similar. Because the camera height varies slightly across counters, identifying the actual SKU id from the size difference alone is very challenging. A few examples are listed below.

Examples

During training

To handle this scenario, we create a parent class for those child SKUs and train on the images of all the child SKU ids as that single parent class.

During inference

When the model detects the parent class, we use the weight information to map it to the correct child SKU id.
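A rough sketch of weight-based resolution is shown below: the measured weight is compared against each child's expected weight and the closest one within a tolerance is chosen. The parent class name, SKU ids, expected weights, and the 10% tolerance are assumptions made for illustration; expected weights here are meant as gross weights including packaging.

```python
# Hypothetical expected gross weights (grams) for the children of a parent class.
CHILD_WEIGHTS_G = {
    "biscuit-parent": {"sku-100g": 100.0, "sku-200g": 200.0},
}


def resolve_child_by_weight(parent_class: str, measured_grams: float,
                            tolerance: float = 0.10):
    """Return the child SKU whose expected weight is closest to the scale reading.

    A child is only considered when the relative error is within `tolerance`;
    None is returned when no child qualifies.
    """
    best_sku, best_error = None, None
    for sku, expected in CHILD_WEIGHTS_G.get(parent_class, {}).items():
        error = abs(measured_grams - expected) / expected
        if error <= tolerance and (best_error is None or error < best_error):
            best_sku, best_error = sku, error
    return best_sku
```

For example, a reading of 104 g would resolve to the 100 g variant, while a reading of 150 g matches neither variant and returns None.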

Products look almost identical on all the surfaces

Some products look almost identical on all surfaces and may also have similar weights and shapes. In those scenarios vision-based detection is not possible, and we might need external devices to bill those SKUs. A few examples are listed below.

Examples