How to Perform Face Recognition with OpenIO and OpenCV

How to recognise patterns in pictures at scale? By “scale,” we mean when you are storing billions of files.
Guillaume Delaporte
VP Sales at OpenIO

It’s time for the third article in our series about GridForApps, the event-driven framework that is part of OpenIO Object Storage. If you missed the previous ones, the first is A Technical Introduction to GridForApps, and the second is Simple Metadata indexing through GridForApps.

In this article, we will tackle a use case often raised by our customers: how can we recognise patterns in pictures at scale? By “scale,” we mean when you are storing billions of files. We have prepared three possible scenarios, using the most common open-source tools. One has been showcased during our webinar with TensorFlow. The following week, we will do the same using the deep learning framework Caffe.

But let's start with OpenCV. We will use it to detect faces in pictures and locate them. Our event-driven processing framework will analyse each object as it is uploaded, using a pre-trained cascade classifier to detect faces in images. As usual, we will enrich the object with new metadata: the number of faces detected, and the position and size of each one. There are many use cases for this sort of procedure: face blurring, face detection for CCTV, or producing a picture like the cover image of this article.

As a side note, using Elasticsearch, we will demonstrate how you can build more complex workflows with OpenIO GridForApps: store, enrich, index, and search.

Let’s do it!

As in our previous articles, we will use our Docker image to easily spawn an OpenIO SDS environment. Retrieve the OpenIO SDS Docker image:

# docker pull openio/sds

Start your new OpenIO SDS environment:

# docker run -ti --tty openio/sds

You should now be at the prompt with an OpenIO SDS instance up and running.

Next, we will configure the trigger so that each time you add a new object, an event describing it is pushed to the oio-process beanstalk tube. Add the following content to the file /etc/oio/sds/OPENIO/oio-event-agent-0/oio-event-handlers.conf:

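[handler:storage.content.new]
pipeline = process

[handler:storage.content.deleted]
pipeline = content_cleaner

[handler:storage.container.new]
pipeline = account_update

[handler:storage.container.deleted]
pipeline = account_update

[handler:storage.container.state]
pipeline = account_update

[handler:storage.chunk.new]
pipeline = volume_index

[handler:storage.chunk.deleted]
pipeline = volume_index

[filter:content_cleaner]
use = egg:oio#content_cleaner

[filter:account_update]
use = egg:oio#account_update

[filter:volume_index]
use = egg:oio#volume_index

[filter:process]
use = egg:oio#notify
tube = oio-process
queue_url = beanstalk://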

If you want to learn more about this configuration file, please refer to our previous blog post.
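Before restarting the agent, it can help to check that every pipeline named by a handler has a matching filter section. The snippet below is a minimal sketch using Python 3's standard configparser module. It assumes the usual `[handler:...]` / `[filter:...]` section layout; the sample configuration and the helper name `undefined_filters` are illustrative and not part of the OpenIO tooling.

```python
import configparser

# Shortened, hypothetical sample in the assumed handler/filter layout.
SAMPLE_CONF = """
[handler:storage.content.new]
pipeline = process

[handler:storage.content.deleted]
pipeline = content_cleaner

[filter:content_cleaner]
use = egg:oio#content_cleaner

[filter:process]
use = egg:oio#notify
tube = oio-process
"""


def undefined_filters(conf_text):
    """Map each handler section to the pipeline names lacking a filter section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Names declared as [filter:NAME]
    filters = {
        section.split(":", 1)[1]
        for section in parser.sections()
        if section.startswith("filter:")
    }
    missing = {}
    for section in parser.sections():
        if not section.startswith("handler:"):
            continue
        # A pipeline may chain several filters, separated by whitespace.
        names = parser.get(section, "pipeline", fallback="").split()
        unknown = [name for name in names if name not in filters]
        if unknown:
            missing[section] = unknown
    return missing


if __name__ == "__main__":
    print(undefined_filters(SAMPLE_CONF))
```

An empty result means every handler refers only to filters that are defined in the same file.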

Then, restart the OpenIO event agent to apply the change:

# gridinit_cmd restart @oio-event-agent

Your event-driven system is now up and running. The next step is to write the script that will analyse objects using OpenCV. To do so, we first need to install the OpenCV Python module:

# yum install opencv-python

Download the pre-trained face cascade classifier:

# curl -o /etc/oio/sds/OPENIO/haarcascade_frontalface_alt.xml

Then write the script; let’s call it

#!/usr/bin/env python
import cv2
import json
import numpy as np
from oio.api import object_storage
from oio.event.beanstalk import Beanstalk, ResponseError


def faceclassifier(image):
    # Specify the trained cascade classifier
    face_cascade_name = "/etc/oio/sds/OPENIO/haarcascade_frontalface_alt.xml"

    # Create a cascade classifier
    face_cascade = cv2.CascadeClassifier()

    # Load the specified classifier
    face_cascade.load(face_cascade_name)

    # Run the classifier
    # Arguments: scale factor, minimum neighbours, flags, minimum face size
    faces = face_cascade.detectMultiScale(image, 1.1, 5, 0, (30, 30))

    return faces


# Connect to beanstalkd and listen on the tube fed by the event agent
b = Beanstalk.from_url("beanstalk://")
b.watch("oio-process")

while True:
    try:
        event_id, data = b.reserve()
    except ResponseError:
        continue

    # Retrieve the information from the event (namespace, bucket, object name ...)
    meta = json.loads(data)
    url = meta["url"]

    s = object_storage.ObjectStorageAPI(url["ns"], "")
    meta, stream = s.object_fetch(url["account"], url["user"], url["path"])

    # Decode the object content into an image and look for faces
    image = cv2.imdecode(np.frombuffer(b"".join(stream), np.uint8), 1)
    faces = faceclassifier(image)

    # Update the object with new metadata: number of faces + positions
    # (building a plain list also covers the case where no face is found)
    positions = [[int(v) for v in face] for face in faces]
    s.object_update(url["account"], url["user"], url["path"], {"face_number": str(len(positions)), "position": json.dumps(positions)})

    # Remove the event from the tube once it has been processed
    b.delete(event_id)
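The metadata update at the end of the script can be factored into a small helper, which makes it easy to check without a running OpenIO instance. The sketch below is written for Python 3. The helper name `faces_to_metadata` is hypothetical. It also covers the case where no face is found, in which detectMultiScale commonly returns an empty tuple rather than a NumPy array.

```python
import json


def faces_to_metadata(faces):
    """Build the string-valued metadata dict from (x, y, w, h) detections."""
    # Accepts a NumPy array or any plain (possibly empty) sequence of boxes;
    # int() turns NumPy integers into values that json can serialise.
    boxes = [[int(value) for value in box] for box in faces]
    return {
        "face_number": str(len(boxes)),
        "position": json.dumps(boxes),
    }


if __name__ == "__main__":
    print(faces_to_metadata([(370, 285, 46, 46), (472, 289, 44, 44)]))
    print(faces_to_metadata(()))
```

Both values are strings, so the dictionary can be passed as the properties of an object as is.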

Finally, launch it in the background:

# python

Please note that the script is written in Python, but you can write it in any other language.
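Whatever the language, the worker only has to decode the JSON event and read the fields under its url key. Here is a minimal sketch of that step in Python 3. The function name `object_location` and the sample event are assumptions for illustration, reduced to the fields the script reads; a real event may carry more.

```python
import json

REQUIRED_URL_KEYS = ("ns", "account", "user", "path")


def object_location(event_body):
    """Return (namespace, account, container, object_name) from a JSON event."""
    event = json.loads(event_body)
    url = event.get("url") or {}
    missing = [key for key in REQUIRED_URL_KEYS if key not in url]
    if missing:
        raise ValueError("event is missing url field(s): " + ", ".join(missing))
    # In the script above, url["user"] is passed as the container name.
    return url["ns"], url["account"], url["user"], url["path"]


# Hypothetical event body, limited to the fields used here.
SAMPLE_EVENT = json.dumps({
    "event": "storage.content.new",
    "url": {
        "ns": "OPENIO",
        "account": "myaccount",
        "user": "mycontainer",
        "path": "family.jpg",
    },
})

if __name__ == "__main__":
    print(object_location(SAMPLE_EVENT))
```

Failing early on an incomplete event keeps a malformed message from surfacing later as a confusing storage error.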

How does it work?

It’s time to add a new picture to see if the process works. First, download a picture to upload it to OpenIO. Let's use the family picture we took during our last hackathon:

# curl -o /tmp/family.jpg

Using the OpenIO CLI, let’s upload this new object family.jpg to the container mycontainer in the account myaccount:

# openio --oio-ns OPENIO --oio-account myaccount object create mycontainer /tmp/family.jpg

Well done! You’ve just uploaded the picture, and OpenCV was launched as a background task to process it. To conclude, let’s check the metadata added to the object:

# openio --oio-ns OPENIO --oio-account myaccount object show mycontainer family.jpg

We obtain the following result:

| Field            | Value                                                                                                                                                                                                   |
| account          | myaccount                                                                                                                                                                                               |
| container        | mycontainer                                                                                                                                                                                             |
| ctime            | 1495148185                                                                                                                                                                                              |
| hash             | B2F24793EF43A0837F13B74090B4B0F1                                                                                                                                                                        |
| id               | 0A117650D44F050027DFB3C35803A77B                                                                                                                                                                        |
| meta.face_number | 10                                                                                                                                                                                                      |
| meta.position    | [[370, 285, 46, 46], [472, 289, 44, 44], [844, 296, 48, 48], [758, 319, 43, 43], [358, 344, 48, 48], [642, 348, 44, 44], [192, 305, 52, 52], [270, 330, 41, 41], [570, 306, 45, 45], [50, 223, 66, 66]] |
| mime-type        | application/octet-stream                                                                                                                                                                                |
| object           | family.jpg                                                                                                                                                                                     |
| policy           | SINGLE                                                                                                                                                                                                  |
| size             | 92434                                                                                                                                                                                                   |
| version          | 1495148185129235                                                                                                                                                                                        |

As you can see, OpenCV detected 10 faces (meta.face_number = 10). The position and size of each face are stored as (x, y, w, h) in meta.position.
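Once stored, the position metadata is easy to reuse, for example to compute the rectangle to blur for each face. The sketch below, in Python 3, decodes the meta.position value shown above. The helper names `to_corners` and `largest_face` are hypothetical, and the code assumes the usual OpenCV convention that (x, y) is the top-left corner of each box.

```python
import json

# Value of meta.position as shown in the listing above.
POSITION = (
    "[[370, 285, 46, 46], [472, 289, 44, 44], [844, 296, 48, 48], "
    "[758, 319, 43, 43], [358, 344, 48, 48], [642, 348, 44, 44], "
    "[192, 305, 52, 52], [270, 330, 41, 41], [570, 306, 45, 45], "
    "[50, 223, 66, 66]]"
)


def to_corners(position_json):
    """Convert (x, y, w, h) boxes to (left, top, right, bottom) corners."""
    return [(x, y, x + w, y + h) for x, y, w, h in json.loads(position_json)]


def largest_face(position_json):
    """Return the [x, y, w, h] box with the largest area, or None if empty."""
    boxes = json.loads(position_json)
    return max(boxes, key=lambda box: box[2] * box[3]) if boxes else None


if __name__ == "__main__":
    print(to_corners(POSITION)[0])
    print(largest_face(POSITION))
```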

Want to know more about OpenIO and GridForApps?

OpenIO SDS is available for testing in several flavors, including Linux packages, the Docker image, and Raspberry Pi.

Stay in touch with us and our community through Twitter and our Slack community channel to receive the latest info, get support, and chat with other users.

Guillaume has extensive experience in building and running large storage platforms, which he gained as system engineer and project leader at Atos Worldline, before co-founding OpenIO in 2015.