Understanding event-driven storage: getting started with Grid for Apps

This is the first in a series of articles about the event-driven framework that is part of OpenIO Object Storage. This framework allows users to process data at scale; we call it Grid for Apps.
Guillaume Delaporte
VP Sales at OpenIO

Today, this technology is used for metadata enrichment, image recognition and manipulation, pattern recognition in images and data files, real-time video transcoding and watermarking, and more. Looking ahead, given the quantity of data we expect to produce, the number of use cases is even bigger, with applications in fields like industrial IoT, artificial intelligence, and big data; the only limit is your imagination.

I recommend that you first read this article to understand what we describe below: Run Applications Directly on the Storage Infrastructure.

Let’s give it a try

Let’s start with a very simple use case: adding a new metadata field to an object right after its upload. We will tackle more complex use cases in the coming weeks.

To deploy an OpenIO SDS cluster, we will use the Docker container that we provide as a quick and easy way to use the software. But you can use the same steps to implement OpenIO Grid for Apps and use it on a very large platform with hundreds of nodes and billions of objects.

Retrieve the OpenIO SDS Docker container:

# docker pull openio/sds

Start your new OpenIO SDS environment:

# docker run -ti openio/sds

You should now be at the prompt with an OpenIO SDS instance up and running.

Next, we will configure the trigger, so that every time you add a new object, the data is processed and a new metadata field is added.

Add the following content to the file /etc/oio/sds/OPENIO/oio-event-agent-0/oio-event-handlers.conf:

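[handler:storage.content.new]
pipeline = process

[handler:storage.content.deleted]
pipeline = content_cleaner

[handler:storage.container.new]
pipeline = account_update

[handler:storage.container.deleted]
pipeline = account_update

[handler:storage.container.state]
pipeline = account_update

[handler:storage.chunk.new]
pipeline = volume_index

[handler:storage.chunk.deleted]
pipeline = volume_index

[filter:content_cleaner]
use = egg:oio#content_cleaner

[filter:account_update]
use = egg:oio#account_update

[filter:volume_index]
use = egg:oio#volume_index

[filter:process]
use = egg:oio#notify
tube = oio-process
queue_url = beanstalk://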

As you can see in the configuration file, there are many events that can be triggered (such as storage.container.new, storage.content.deleted, etc.), but for this tutorial we will just focus on the storage.content.new event.

According to the configuration file, each time we put new content in the object store (the [handler:storage.content.new] section), the event is handled by the pipeline “process” (pipeline = process).

The pipeline “process” will then take the event and put it in the tube oio-process in the local beanstalk instance, as described at the end of the configuration file:

[filter:process]
use = egg:oio#notify
tube = oio-process
queue_url = beanstalk://
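
The routing described above can be sketched in a few lines of Python. This is only an illustration of how an event type maps to a pipeline and then to a tube; it is not how the event agent itself loads its configuration. It assumes the INI section names `[handler:<event>]` and `[filter:<name>]`, uses a trimmed sample of the file, and `resolve_pipeline` is a hypothetical helper, not part of the OpenIO API. It also assumes a pipeline names a single filter, which holds for the entries shown here.

```python
import configparser

# Trimmed sample of the handler configuration (section names are assumed).
SAMPLE_CONF = """
[handler:storage.content.new]
pipeline = process

[handler:storage.content.deleted]
pipeline = content_cleaner

[filter:process]
use = egg:oio#notify
tube = oio-process
queue_url = beanstalk://
"""


def resolve_pipeline(conf_text, event_type):
    """Return (filter name, filter options) for an event type, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    handler = "handler:" + event_type
    if not parser.has_section(handler):
        return None
    name = parser.get(handler, "pipeline").strip()
    section = "filter:" + name
    # A pipeline may reference a filter that is not defined in this sample.
    options = dict(parser.items(section)) if parser.has_section(section) else {}
    return name, options


name, options = resolve_pipeline(SAMPLE_CONF, "storage.content.new")
print(name, options.get("tube"))
```

With this sample, a storage.content.new event resolves to the “process” pipeline, whose notify filter targets the oio-process tube.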

Then, restart the OpenIO event agent to apply the change:

# gridinit_cmd restart @oio-event-agent

Your event-driven system is now up and running. The next step is to write a small script that takes the events stored in beanstalk and processes the corresponding objects.

Let's create a script called add-metadata.py with the following content:

#!/usr/bin/env python
import json

from oio.api import object_storage
from oio.event.beanstalk import Beanstalk, ResponseError

# Initiate a connection to beanstalk to fetch the events from the tube oio-process
b = Beanstalk.from_url("beanstalk://")
b.watch("oio-process")

# Waiting for events
while True:
    try:
        # Reserve the event when it appears
        event_id, data = b.reserve()
    except ResponseError:
        # Or continue waiting for the next one
        continue
    # Retrieve the information from the event (namespace, bucket, object name ...)
    meta = json.loads(data)
    url = meta["url"]
    # Initiate a connection with the OpenIO cluster
    s = object_storage.ObjectStorageAPI(url["ns"], "")
    # Add the metadata to the object
    s.object_update(url["account"], url["user"], url["path"], {"uploaded": "true"})
    # Delete the event so that it is not delivered again
    b.delete(event_id)

Finally, launch it in the background:

# python add-metadata.py &

Please note that the script is written in Python, but you can write it in any other language.
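
To try the event-handling logic without a running cluster, the parsing step can be isolated in a small function. The sketch below is a hedged illustration: `metadata_update_from_event` is a hypothetical helper, and the sample payload is invented for the example. Only the "url" keys that the script reads (ns, account, user, path) come from the script itself; a real event may carry additional fields.

```python
import json


def metadata_update_from_event(raw_event):
    """Return (namespace, object_update arguments) for a raw event body."""
    meta = json.loads(raw_event)
    url = meta["url"]
    # Same arguments the script passes to object_update().
    return url["ns"], (url["account"], url["user"], url["path"], {"uploaded": "true"})


# Hypothetical payload, for illustration only.
SAMPLE_EVENT = json.dumps({
    "event": "storage.content.new",
    "url": {
        "ns": "OPENIO",
        "account": "myaccount",
        "user": "mycontainer",
        "path": "fstab",
    },
})

namespace, args = metadata_update_from_event(SAMPLE_EVENT)
print(namespace, args)
```

Keeping this step separate from the beanstalk loop makes it possible to check the event handling with plain JSON strings before connecting it to a live tube.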

How does it work?

It's time to add a new object to see if it works. Using the OpenIO CLI, let's upload the new object /etc/fstab to the container mycontainer in the account myaccount:

# openio --oio-ns OPENIO --oio-account myaccount object create mycontainer /etc/fstab

And check that the new metadata was properly set:

# openio --oio-ns OPENIO --oio-account myaccount object show mycontainer fstab

With the following result:

+---------------+----------------------------------+
| Field         | Value                            |
+---------------+----------------------------------+
| account       | myaccount                        |
| container     | mycontainer                      |
| ctime         | 1493721260                       |
| hash          | FB2B5EC6E6BC56CF7D02BE2B3D4AA5BA |
| id            | 64A81915884E0500529252884202F1CA |
| meta.uploaded | true                             |
| mime-type     | application/octet-stream         |
| object        | fstab                            |
| policy        | SINGLE                           |
| size          | 313                              |
| version       | 1493721260075114                 |
+---------------+----------------------------------+

You can see that the metadata was added to the object: the meta.uploaded field is set to true.
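
If you want to automate this check, one option is to parse the table printed by the CLI. The following is a minimal sketch under the assumption that the output keeps the two-column "| Field | Value |" layout shown above; `parse_cli_table` is a hypothetical helper, and the sample is a shortened copy of that output.

```python
def parse_cli_table(text):
    """Turn a two-column '| Field | Value |' table into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and border lines such as +-----+-----+.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row and anything that is not a key/value pair.
        if len(cells) != 2 or cells[0] == "Field":
            continue
        fields[cells[0]] = cells[1]
    return fields


# Shortened copy of the output shown above.
SAMPLE_OUTPUT = """
| Field         | Value                            |
| account       | myaccount                        |
| meta.uploaded | true                             |
| object        | fstab                            |
"""

fields = parse_cli_table(SAMPLE_OUTPUT)
print(fields.get("meta.uploaded") == "true")
```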

Learn more

As I mentioned above, this is the first of a series of articles that will demonstrate our Grid for Apps technology with some interesting use cases (image recognition and manipulation, pattern recognition, content indexation, and more).

Want to know more about OpenIO?

OpenIO SDS is available for testing in several flavors, including Linux packages, the Docker image, and Raspberry Pi.

Stay in touch with us and our community through Twitter and our Slack community channel to receive the latest info, get support, and chat with other users.

Guillaume Delaporte
VP Sales at OpenIO
Guillaume has extensive experience in building and running large storage platforms, which he gained as system engineer and project leader at Atos Worldline, before co-founding OpenIO in 2015.