How We Increased Rebuild Speed in Cases of Disk or Node Failure

Most object storage solutions handle data protection themselves, either by replicating data or with erasure coding, and then geo-distributing different copies to secure it. In addition, a good storage system must be able to restore volumes as quickly as possible to restore data security in case of disk or node failure.
Aymeric Ducroquetz
Aymeric Ducroquetz
Dev at OpenIO

We have achieved significant improvements in the speed of rebuilding data when volumes are damaged in the latest version of OpenIO. The key to this is the distribution of rebuilding jobs across the whole platform.

In this article, I’d like to explain:

  • why it is necessary to do this as fast as possible
  • how we managed to make this process three times as fast, what kind of tools we used
  • what the future holds for data rebuilding

Rebuilding Missing Chunks

OpenIO is a software defined object storage system that doesn’t rely on static placement algorithms such as chords, rings, crushmaps, etc. Instead, the solution relies on its ConsciousGrid technology to optimize data placement in the most flexible manner possible, and on a heavily distributed directory to persist the locations polled by ConsciousGrid.

The data itself is chunked and stored on services called rawx. When losing a drive hosting a rawx, all the chunks of that rawx are lost. Thanks to the data protection mechanisms, it is possible to rebuild all the data of the failed drive, rebuilding the missing chunks to guarantee the redundancy expected by the owner of the data. The less time we wait to rebuild the data, the lower is the risk of losing another disk, which could compound the problem. Therefore, we need a tool to rebuild missing chunks as quickly as possible.

OpenIO 18.04

OpenIO 18.04 provided a tool to rebuild all the chunks of a damaged rawx. But as drive capacities increase, we store more data chunks on rawx services, and more data is impacted if a disk or node fails. After a profiling session, we discovered the bottleneck: the tool could only be started on one machine, as it was written in Python in a mono-process/multi-coroutine fashion. It did not distribute its load, not even on local CPU cores.

When rebuilding millions of chunks, we were limited due to the CPU and/or network capabilities of the machine. We spent a lot of effort in the 18.10 to speed up the rebuilding mechanism.

OpenIO 18.10

Our goal for OpenIO SDS 18.10 was to no longer be limited by the performance of a single CPU core, nor even by a single machine. The only limit we would accept was the total resources of all the machines hosting rawx services.

With a test platform, we managed to increase the rebuild speed by a factor N, where N is the number of workers. We achieved a 3x improvement in performance on the same platform:

Rebuild Speed

How does it work?

Distributed rebuilding is based on the controller/worker communication pattern.

Distributed Rebuilding
  • Each worker is connected to a queue where it can retrieve the chunks to rebuild (3).

  • The single controller receives all the chunks to rebuild (1) and sends them to workers via their queues (2).

  • When a worker rebuilds a chunk, it notifies the controller via a queue (4).

By distributing the chunks to rebuild to the different workers, we distribute the load of this maintenance operation across all the machines having a worker. So, with a high enough number of workers, the rebuild speed is limited only by the rawx's ability to read the chunks present and write the new chunks.

What’s next?

We never had such a quick rebuilding tool. To do this, we distributed the rebuilding jobs across the whole platform, and this is essential to make sure environments are resilient and run smoothly with no downtime or data loss.

In the future, we plan to integrate the rebuilding process into GridForApps to simplify the distribution of chunks to rebuild by automatically distributing them over the entire platform.

Learn more

The documentation is available here

OpenIO SDS is available for testing in four different flavors: Linux packages, the Docker image, and Raspberry Pi.

Stay in touch with us and our community through Twitter and our Slack community channel, to receive the latest info, support, and to chat with other users.