Improving automatic discrimination of logos with similar texts


Logo recognition is the task of identifying a specific logo and its location in images or videos. It helps create a safe and trustworthy shopping experience, for instance by recognizing images containing offensive symbols or corporate trademarks.

Logo recognition poses challenges that other image classification problems, such as recognizing cat or dog species, don’t, since the number of logo classes is typically an order of magnitude larger. Additionally, new logos, trademarks, and symbols are constantly being created.

In a paper my colleague Mark Hubenthal and I are presenting at the 2023 Winter Conference on Applications of Computer Vision (WACV), which starts next month, we address the problem of zero-shot logo recognition, where we do not have access to all the possible types of logos during model training.

The standard solution to this problem has two stages: (i) detecting all the possible image regions that might contain a logo and (ii) matching the detected regions against an ever-evolving set of logo prototypes. The matching process is challenging, especially for logos that are very similar to other logos or that contain a lot of text.

The pipeline for standard zero-shot logo recognition. The first step identifies all the possible logo regions, which are converted to dense vectors (embeddings) by an embedder. These dense vectors are compared to an ever-evolving reference dataset of logo classes. The output regions are assigned logo classes if the distances between their embeddings and the reference logo classes are lower than a threshold.
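The matching stage of this pipeline reduces to a nearest-neighbor search over embeddings with a rejection threshold. Here is a minimal sketch in NumPy; the function and parameter names (and the threshold value) are illustrative, not taken from our production system:

```python
import numpy as np

def l2_normalize(vectors):
    """Scale each row vector to unit length so dot products equal cosine similarity."""
    return vectors / np.linalg.norm(vectors, axis=-1, keepdims=True)

def match_regions(region_embeddings, prototype_embeddings, prototype_labels, threshold=0.7):
    """Assign each detected region the label of its nearest logo prototype,
    or None if even the best cosine similarity falls below the threshold."""
    regions = l2_normalize(np.asarray(region_embeddings, dtype=float))
    protos = l2_normalize(np.asarray(prototype_embeddings, dtype=float))
    sims = regions @ protos.T  # cosine similarities, shape (n_regions, n_prototypes)
    best = sims.argmax(axis=1)
    assignments = []
    for row, idx in enumerate(best):
        if sims[row, idx] >= threshold:
            assignments.append(prototype_labels[idx])
        else:
            assignments.append(None)  # no prototype close enough: unknown logo
    return assignments
```

Because the reference set is just an array of prototype embeddings, supporting a new logo class means appending a row and a label; the embedder itself is never retrained.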

Our paper makes two major contributions. First, we demonstrate that leveraging image-text contrastive pretraining, which involves aligning the representation of an image with its text description, significantly alleviates the challenges of text-heavy logo matching. Second, we propose a metric-learning loss function — that is, a loss function that learns from the data how to measure similarity — that better separates highly related logo classes.

In experiments on standard open-source logo recognition datasets, we compared our approach to the existing state of the art. We measured performance according to recall, the fraction of ground-truth logo instances that the model correctly identifies. Our method achieves a new state of the art on five public logo datasets, with zero-shot recall improvements of 3.5% on the LogoDet-3K test set, 4% on OpenLogo, 6.5% on FlickrLogos-47, 6.2% on Logos In The Wild, and 0.6% on BelgaLogo.

Contrastive learning

Traditionally, logo recognition is treated as a specific instance of the general object detection problem. However, most commercial object detection systems assume a constant set of classes or categories during both training and inference. That assumption is often violated in logo recognition, due to new design patents and trademarks being registered or new offensive symbols being created in online forums.

Zero-shot logo recognition relies heavily on an embedding model for matching query regions against a constantly evolving set of cropped logo images. In previous work, Amazon researchers discovered that traditional pretrained computer vision models did a poor job representing text-heavy logo classes. They proposed using a separate text pipeline to extract the text in the image via optical character recognition (OCR) and using the text to augment a vision-based embedding.

In a number of recent works, researchers have discovered that image-text contrastive training — a type of metric learning — can help visual embedders implicitly recognize text in images. In contrastive training, a model is fed pairs of training examples; each pair contains either two positive examples or one positive example and one negative. The model learns to not only cluster positive examples together but to push positive examples away from negative examples.
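The clustering behavior described above can be made concrete with an InfoNCE-style contrastive loss, shown here for a single anchor example. This is a generic sketch of contrastive training, not our exact training objective:

```python
import numpy as np

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor: pull the positive
    embedding close while pushing the negatives away. All inputs are
    assumed to be unit-normalized embedding vectors."""
    pos_sim = np.dot(anchor, positive) / temperature
    neg_sims = np.array([np.dot(anchor, neg) / temperature for neg in negatives])
    logits = np.concatenate([[pos_sim], neg_sims])
    # Softmax cross-entropy with the positive playing the role of the
    # correct "class": the loss is low only when the anchor is much closer
    # to the positive than to any negative.
    log_prob_pos = pos_sim - np.log(np.exp(logits).sum())
    return -log_prob_pos
```

Minimizing this loss simultaneously raises the anchor-positive similarity and lowers the anchor-negative similarities, which is what clusters same-class examples and separates different-class ones.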

In contrastive training, negative examples are typically chosen at random. But we further improve the separability of very similar logos by mining the training data for hard-negative examples — logos whose associated texts are similar to those associated with logos of different classes. For instance, “Heinz” is a hard negative for “Heineken”, since they share the same four starting letters.
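One inexpensive way to mine such text-based hard negatives is to score the pairwise similarity of class names and keep, for each class, the most similar other classes. The sketch below uses Python's `difflib` matching ratio as a stand-in for the text-similarity signal; the function name and thresholds are illustrative assumptions:

```python
import difflib

def mine_hard_negatives(class_names, top_k=1, min_ratio=0.5):
    """For each logo class, return the other classes whose names are most
    textually similar -- candidate hard negatives to pair against during
    training. difflib's sequence-matching ratio (0 to 1) serves as a
    cheap proxy for text similarity here."""
    hard_negatives = {}
    for name in class_names:
        scored = [
            (difflib.SequenceMatcher(None, name.lower(), other.lower()).ratio(), other)
            for other in class_names
            if other != name
        ]
        scored.sort(reverse=True)  # most similar names first
        hard_negatives[name] = [other for ratio, other in scored[:top_k] if ratio >= min_ratio]
    return hard_negatives
```

On the example above, "Heinz" and "Heineken" score highly against each other because of their shared prefix, while an unrelated name like "Adidas" yields no hard negatives.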

During training, we explicitly pair positive examples with their hard negatives, to encourage the model to distinguish logos with similar texts. The combination of contrastive training and hard-negative example pairing is what enabled our model to establish new benchmarks in logo recognition.

Filled-in shapes represent learned proxy vectors (representing specific classes of logos), and unfilled shapes represent image embedding vectors. An image with logo class “Heinz” is attracted to its own class’s proxy (green arrow), pushed away from other classes’ proxies (red arrows), and also pushed away from individual image embeddings belonging to the assigned hard-negative classes “Heinz_baked_beans” and “Heineken” (pink arrows).
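These attract-and-repel dynamics can be sketched as a single per-image loss term. The version below is a hypothetical simplification (hinge-style pushes with a fixed margin), not the exact loss function from the paper:

```python
import numpy as np

def proxy_hard_negative_loss(embedding, class_idx, proxies, hard_neg_embeddings, margin=0.1):
    """Sketch of a proxy-based metric-learning loss with explicit hard
    negatives: the image embedding is pulled toward its own class proxy,
    pushed away from the other class proxies, and additionally pushed away
    from individual embeddings of its assigned hard-negative classes.
    All vectors are assumed unit-normalized; similarities are cosine."""
    sims_to_proxies = proxies @ embedding
    pull = 1.0 - sims_to_proxies[class_idx]  # attract the own-class proxy
    other = np.delete(sims_to_proxies, class_idx)
    push_proxies = np.maximum(0.0, other - margin).sum()  # repel other proxies
    push_hard = 0.0
    if len(hard_neg_embeddings) > 0:
        hn_sims = np.asarray(hard_neg_embeddings) @ embedding
        push_hard = np.maximum(0.0, hn_sims - margin).sum()  # repel hard-negative images
    return pull + push_proxies + push_hard
```

Averaged over a batch, the gradients of such a loss move each embedding toward its class proxy and away from competing proxies and hard-negative images.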

Separately, we have used this approach to train a logo embedder on a much larger set of logo images. A currently deployed system using this embedding model surfaces Climate Pledge Friendly-eligible products for human review by recognizing sustainability-related logos in product images. The same system is also used to identify images containing certain prohibited content or offensive symbols. Notably, our system can act on new offensive symbols as they are identified, without the need for any updates to our architecture.
