Online retailers have access to detailed web and mobile analytics for their ecommerce stores. This data helps businesses predict trends, forecast demand, optimize pricing, and identify customers.1 The digital shopping experience, however, is only one side of the equation. What if brick-and-mortar retailers could gather in-store analytics to deliver data-informed, personalized shopping experiences for their customers?
Here’s our solution.
Face Lab is an Android application that provides business owners with real-time, in-store analytics about a customer’s gender, age range, and emotional state. Retail owners can pair this layer of dynamic data with a customer’s purchase history, loyalty history, and POS system to better understand their business and create a tailor-made experience for in-store shoppers.
The original inspiration for Face Lab came from an article about the release of the Google Cloud Vision API. When news about the machine learning API made its way around the office, the team was hooked and wanted to experiment with possible applications right away. Powered by machine learning models, the Cloud Vision API could analyze and understand the content of an image, including text, objects, brand logos, landmarks, and emotional attributes of faces.
After some initial testing and ideation, however, our team kept wishing that Google’s emotion and facial analyses could do more. We ended up starting over with the Microsoft Cognitive Services APIs, which delivered the results we needed. The Emotion API allowed us to detect anger, happiness, sadness, surprise, neutrality, fear, disgust, and contempt – all of which are understood to be cross-cultural and communicated through universal facial expressions. The Face API provided machine learning-based predictions of facial features, allowing us to detect age, gender, and facial hair in one or multiple faces within an image.
We started by brainstorming a few use cases. Our main one was the idea of bringing real-time audience emotion analysis to events such as trade shows and presentations using a live feed from multiple cameras or even 360° cameras. We wanted folks to gather this feedback live and to be able to review it after the fact.
Once we started building, however, we realized that in addition to a live group analysis (multiple faces), we also wanted a simple single-capture feature (one face). The final product contains both capture modes – group and single – along with analyzed results for each.
Initially, we weren’t committed to a platform, so the original design directions had concepts flexible enough to fit either platform’s native standards. We did, however, prioritize efficiency and usability above all else. We made sure that both the onboarding process and the overall flow felt familiar: from the start, running the face and emotion analysis took only 1-2 taps, and every action or feature in the app can be reached in a maximum of 2 taps. The flow mimics Android’s native image and video capture. We also designed the results to be simple and easy to digest – choosing visual emojis instead of words – and bringing only active results to the foreground.
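To make that last design choice concrete, here is a minimal sketch – in Python for readability, since the app itself is native Android – of how per-emotion scores might be reduced to a row of emojis, surfacing only the "active" results. The emoji mapping and the 0.1 threshold are illustrative assumptions, not values taken from the app.

```python
# Hypothetical mapping from the eight Emotion API categories to display emojis.
EMOJI = {"anger": "😠", "contempt": "😒", "disgust": "🤢", "fear": "😨",
         "happiness": "😄", "neutral": "😐", "sadness": "😢", "surprise": "😲"}

def active_results(scores, threshold=0.1):
    """Keep only emotions scoring above the threshold, strongest first."""
    active = [(emotion, score) for emotion, score in scores.items()
              if score >= threshold]
    active.sort(key=lambda pair: pair[1], reverse=True)
    return [EMOJI[emotion] for emotion, _ in active]

print(active_results({"happiness": 0.7, "surprise": 0.2, "anger": 0.02,
                      "neutral": 0.05, "sadness": 0.03, "fear": 0.0,
                      "disgust": 0.0, "contempt": 0.0}))
# ['😄', '😲']
```

Everything below the threshold stays in the background, so a mostly-happy face shows one or two emojis rather than a wall of eight numbers.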
Currently, facial recognition software is mostly used for security purposes, e.g. identifying shoplifters. As the technology evolves and retailers integrate more data into their business processes, facial recognition software has the potential to grow dramatically in use. According to the research firm MarketsandMarkets, the retail analytics market as a whole is currently valued at $1.8B and could almost triple to $4.5B by 2019.2 A McKinsey study indicated that retailers making full use of big data analytics could increase their operating margins by more than 60%.3
In retail, facial recognition software (FRS) can be used to track dwell times (how long someone spends in a store), enhance loyalty programs, and enrich POS records (for example, to find the best placement for products). All of these features could be aggregated to create individual customer profiles. FRS also has the potential to integrate with other IoT technology, such as mobile marketing, and can hold advertising more accountable for audience profile, engagement, and insights.
Some important questions to consider in the future:
How will facial recognition software deal with online retail?
How will it integrate with other touch points in the customer journey of “unified commerce” (online reviews, social media, web and mobile purchases, etc.)?
Lastly, how will retailers manage privacy concerns for their customers?
1. Marr, Bernard. “Big Data: A Game Changer in the Retail Sector.” Forbes.
2. Allan, Joshua Whitney. “Retail Analytics Market Poised for Growth, but Challenges Remain.” Data Informed.
3. Manyika, James, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. “Big Data: The Next Frontier for Innovation, Competition, and Productivity.” McKinsey & Company.
Microsoft Cognitive Services APIs vs. Google Cloud Vision API
As mentioned during our design process, we originally considered using the Google Cloud Vision API before deciding on Microsoft Cognitive Services APIs. We put together a full technical analysis below comparing the main differences between the two options.
How It Works
- Microsoft Face API and Emotion API: First, the Microsoft Face API analyzes a picture and returns rectangle coordinates for each face it finds, along with facial properties – age, gender, pose, smile, and facial hair. Then, each face is sent to the Microsoft Emotion API, which returns a confidence percentage for each emotion (anger, contempt, disgust, fear, happiness, neutrality, sadness, and surprise).
- Google Cloud Vision API: The “Face Detection” feature analyzes a picture in a manner similar to the Microsoft Face API, returning rectangle coordinates for each face and the likelihood of certain emotions being present (anger, joy, sorrow, and surprise).
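To make the two-step Microsoft flow concrete, here is a small Python sketch (illustrative only – the app itself uses the Microsoft SDK for Android) that pairs the two responses by face rectangle. The sample payloads are hypothetical values shaped like the documented JSON, not real API output.

```python
import json

# Hypothetical sample responses, shaped like the two APIs' documented payloads.
face_response = json.loads("""
[{"faceRectangle": {"left": 68, "top": 97, "width": 64, "height": 64},
  "faceAttributes": {"age": 31.0, "gender": "female",
                     "facialHair": {"moustache": 0.0, "beard": 0.0}}}]
""")

emotion_response = json.loads("""
[{"faceRectangle": {"left": 68, "top": 97, "width": 64, "height": 64},
  "scores": {"anger": 0.01, "contempt": 0.0, "disgust": 0.0, "fear": 0.0,
             "happiness": 0.92, "neutral": 0.05, "sadness": 0.01,
             "surprise": 0.01}}]
""")

def merge_by_rectangle(faces, emotions):
    """Pair each Face API result with the Emotion API scores for the same
    face rectangle, and keep only the fields Face Lab displays."""
    by_rect = {tuple(sorted(e["faceRectangle"].items())): e["scores"]
               for e in emotions}
    merged = []
    for face in faces:
        key = tuple(sorted(face["faceRectangle"].items()))
        scores = by_rect[key]
        merged.append({
            "age": face["faceAttributes"]["age"],
            "gender": face["faceAttributes"]["gender"],
            "dominant_emotion": max(scores, key=scores.get),
        })
    return merged

print(merge_by_rectangle(face_response, emotion_response))
# → one merged profile; "happiness" is the dominant emotion for the face
```

With multiple faces in the frame, the same rectangle-matching approach links each Emotion API score set back to the right Face API attributes.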
Advantages of Microsoft Cognitive Services APIs
- Going beyond the previously mentioned features, the Face API also provides additional facial recognition capabilities including facial verification, identification of similar faces, grouping of faces, and recognition of faces as individuals.
- Microsoft provides a wider range of emotions.
- The emotion detection seemed more accurate during our tests.
- The Microsoft SDK for Android allows for better manipulation of the data returned by the APIs: all of the information is structured within objects with easily accessible properties. Google returns a JSON object that the developer has to process in order to get the information.
- Microsoft offers a free trial. Microsoft allows for 30k requests per month (limited to 20 per minute) while Google only allows for 1k requests per month for free.
- While Microsoft returns a confidence percentage for each emotion (0%-100%), Google returns only a likelihood (very likely, likely, unlikely, and very unlikely). Percentages are easier to use and allow for a better interpretation of the data than the likelihood strings.
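The difference is easy to see in code. In the Python sketch below, exact confidences always produce an unambiguous ranking, while likelihood buckets can tie. The numeric midpoints are our own assumed mapping, not part of either API, and Google's likelihood enum also includes a middle POSSIBLE value, shown here for completeness.

```python
# Assumed numeric midpoints for Google's likelihood buckets (illustrative
# only – the API itself returns just the strings, each covering a wide band).
LIKELIHOOD_MIDPOINT = {
    "VERY_UNLIKELY": 0.05,
    "UNLIKELY": 0.25,
    "POSSIBLE": 0.5,
    "LIKELY": 0.75,
    "VERY_LIKELY": 0.95,
}

def dominant_from_scores(scores):
    """Microsoft-style: exact confidences make the ranking unambiguous."""
    return max(scores, key=scores.get)

def dominant_from_likelihoods(likelihoods):
    """Google-style: emotions in the same bucket tie, so the result may be
    a list of candidates rather than a single winner."""
    numeric = {k: LIKELIHOOD_MIDPOINT[v] for k, v in likelihoods.items()}
    best = max(numeric.values())
    return sorted(k for k, v in numeric.items() if v == best)

print(dominant_from_scores({"happiness": 0.61, "surprise": 0.34,
                            "anger": 0.05}))
# happiness
print(dominant_from_likelihoods({"joy": "LIKELY", "surprise": "LIKELY",
                                 "anger": "VERY_UNLIKELY"}))
# ['joy', 'surprise']
```

The second call cannot decide between joy and surprise – exactly the kind of ambiguity that percentage confidences avoid.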
Advantages of Google Vision API
- There is no per-minute request limit on the free trial. This is especially advantageous when several users are sending requests to the API at the same time.
- You can access Google Vision API and all of its features through a single API key. For Microsoft, each API has its own set of keys which can become harder to manage when working with multiple APIs.
- The emotion and face detection work with a single call without the need to call two different APIs.
- Microsoft Face API does not work on rotated images. The developer has to make sure that the data sent to the API is generated from an image with the face in portrait orientation.
- Although Android’s Camera class is deprecated, most developers still use it because the newer camera2 API is only available on Android 5.0 (API level 21) and later.
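One way to work around the rotated-image limitation mentioned above is to read the photo's EXIF orientation, rotate the bitmap upright before calling the Face API, and then map the returned rectangles back onto the original frame. The coordinate math is sketched below in Python for clarity (the app would do this on Android, so this is purely illustrative); only the 90° clockwise case is shown.

```python
# EXIF orientation values → clockwise rotation (degrees) needed to make the
# image upright. Mirrored orientations (2, 4, 5, 7) are omitted for brevity.
UPRIGHT_ROTATION = {1: 0, 3: 180, 6: 90, 8: 270}

def rect_to_original(rect, original_height):
    """Map a face rectangle detected on an image that was rotated 90°
    clockwise back onto the original image (origin top-left, y pointing down).

    Rotating 90° CW sends an original point (x, y) to (H - y, x), where H is
    the original height; this applies the inverse to the rectangle's corners.
    """
    return {
        "left": rect["top"],
        "top": original_height - rect["left"] - rect["width"],
        "width": rect["height"],
        "height": rect["width"],
    }

# Original frame: 640x480 landscape, rotated 90° CW to 480x640 for detection.
detected = {"left": 100, "top": 200, "width": 50, "height": 60}
print(rect_to_original(detected, original_height=480))
# {'left': 200, 'top': 330, 'width': 60, 'height': 50}
```

With the rectangles mapped back, overlays can be drawn on the camera preview in its native orientation even though detection ran on the rotated copy.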