13.11.2019
Infrastructure

Storage access for data processing

Authors: Efstratios Politis, Giorgos Kalaentzis, Giorgos Saloustros, Christos Kozanitis, and Angelos Bilas
Institute of Computer Science (ICS) Foundation for Research and Technology - Hellas (FORTH)

H3 is an object store on top of the High-Performance Storage Infrastructure of the Evolve platform. It provides a productivity-oriented cloud-like shared storage interface, and it guarantees data security and integrity.

The Service
The H3 service is a High-speed, High-volume, and High-availability object store backed by a high-performance key-value store. H3 provides a cloud-friendly API similar to that of the Amazon Web Services (AWS) Simple Storage Service (S3 [1]); such an API enables Evolve pilots that currently use the cloud to migrate to the Evolve platform with minimal changes. The service features a flat organization scheme in which each object is linked to a globally unique identifier called a bucket. Buckets can be thought of as file folders, though they may only "contain" objects.
The service implements access control by associating each bucket with a single user group. Every access to the store is made with respect to a bucket, so this association transitively acts as coarse-grained access control for the objects linked to it.
The API supports typical bucket operations, such as creating, listing, and deleting buckets. The object management operations include uploading objects to and downloading them from H3, as well as copying, renaming, listing, and deleting uploaded objects. Multipart upload is also supported: an object is uploaded in parts, which are coalesced at the end.
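The bucket and object operations described above can be pictured with a small in-memory model. This is only an illustrative sketch, not the H3 API or any of its SDKs: the class `ToyObjectStore`, its method names, and the way a group is passed with each call are all hypothetical. It shows a per-bucket group check applied on every access, and a multipart upload whose parts are coalesced on completion.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    group: str                                   # the single user group allowed to use this bucket
    objects: dict = field(default_factory=dict)  # object name -> bytes
    uploads: dict = field(default_factory=dict)  # upload id -> {part number: bytes}


class ToyObjectStore:
    """In-memory stand-in for a bucket/object store (hypothetical, not H3 itself)."""

    def __init__(self):
        self.buckets = {}
        self._next_upload = 0

    def _bucket(self, name, group):
        # Every access goes through a bucket, so one group check covers all its objects.
        bucket = self.buckets[name]
        if bucket.group != group:
            raise PermissionError(f"group {group!r} may not access bucket {name!r}")
        return bucket

    def create_bucket(self, name, group):
        if name in self.buckets:
            raise KeyError(f"bucket {name!r} already exists")
        self.buckets[name] = Bucket(group)

    def put(self, bucket, key, data, group):
        self._bucket(bucket, group).objects[key] = bytes(data)

    def get(self, bucket, key, group):
        return self._bucket(bucket, group).objects[key]

    def list_objects(self, bucket, group):
        return sorted(self._bucket(bucket, group).objects)

    def begin_multipart(self, bucket, group):
        b = self._bucket(bucket, group)
        self._next_upload += 1
        b.uploads[self._next_upload] = {}
        return self._next_upload

    def upload_part(self, bucket, upload_id, part_number, data, group):
        self._bucket(bucket, group).uploads[upload_id][part_number] = bytes(data)

    def complete_multipart(self, bucket, upload_id, key, group):
        b = self._bucket(bucket, group)
        parts = b.uploads.pop(upload_id)
        # Parts may arrive in any order; coalesce them by part number.
        b.objects[key] = b"".join(parts[n] for n in sorted(parts))
```

In this model a caller from a different group is rejected before any object is touched, which is the coarse-grained behaviour described above.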
Registered users access the service through either a command-line utility (CLI) or a set of SDKs that we provide (in Python, C++, and Java). In the first implementation of H3, data are stored in a Redis [2] distributed key-value store, and a PostgreSQL [3] server manages all metadata.

Roadmap
The coming releases will introduce a number of new features and improvements, including an Apache Spark plugin, a TensorFlow plugin, Access Control Lists (ACLs), integration with a high-performance key-value store such as Kreon [4], and improved token-based authentication.
Kreon, which will replace the Redis-based key-value store in upcoming releases, is a persistent, write-optimized key-value store designed for flash storage. It reduces CPU overhead and I/O amplification at the cost of increased I/O randomness, a trade-off that flash devices tolerate well. To achieve this, Kreon uses a multi-level indexing data structure with levels of increasing size, which enables batched data transfers to lower levels and thereby amortizes insert costs. Furthermore, it uses memory-mapped I/O to reduce the CPU overhead associated with the I/O cache. Kreon will also support custom metadata per key-value pair.
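To make the idea of levels of increasing size with batched transfers more concrete, here is a toy sketch. It is not Kreon's design or code: it keeps everything in Python dictionaries, ignores persistence and memory-mapped I/O, and the class name and the `base`, `growth`, and `num_levels` parameters are made up for illustration. It only shows how an insert touches the small top level and how a full level is moved down in one batch.

```python
class ToyMultiLevelIndex:
    """Toy multi-level index: level i holds at most base * growth**i keys."""

    def __init__(self, base=4, growth=4, num_levels=3):
        self.capacity = [base * growth**i for i in range(num_levels)]
        self.levels = [{} for _ in range(num_levels)]
        self.spills = 0  # number of batched transfers performed so far

    def put(self, key, value):
        # Inserts always go to the smallest (top) level.
        self.levels[0][key] = value
        self._spill_if_full(0)

    def _spill_if_full(self, i):
        last = len(self.levels) - 1
        if i == last or len(self.levels[i]) <= self.capacity[i]:
            return  # in this toy the last level may grow without bound
        # Move the whole level down in one batch; newer values overwrite older ones.
        self.levels[i + 1].update(self.levels[i])
        self.levels[i].clear()
        self.spills += 1
        self._spill_if_full(i + 1)

    def get(self, key, default=None):
        # Upper levels hold the most recent writes, so search them first.
        for level in self.levels:
            if key in level:
                return level[key]
        return default
```

Because each transfer moves many keys at once, the number of transfers grows much more slowly than the number of inserts, which is the amortization effect the paragraph above refers to.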
The planned H3-Spark plugin will integrate Spark applications (or any other framework that supports Hadoop) with the H3 service by simply replacing the file-system prefix of HDFS-specific file paths with the H3 one. The plugin will implement the Hadoop FileSystem API by mapping methods such as create, open, and delete to H3 operations through the Java SDK.
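The prefix replacement mentioned above could look roughly like the following helper. This is a sketch under stated assumptions: the article does not name the H3 prefix, so the `h3` scheme is hypothetical, as is the choice to treat the first path component as the bucket and the rest as the object name. The function `hdfs_to_h3` is not part of any H3 SDK.

```python
from urllib.parse import urlsplit

H3_SCHEME = "h3"  # hypothetical marker; the article does not name the actual prefix


def hdfs_to_h3(path):
    """Rewrite hdfs://namenode:port/bucket/key into h3://bucket/key (assumed layout)."""
    parts = urlsplit(path)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an HDFS path: {path!r}")
    # Assumption: the top-level directory plays the role of the bucket.
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not bucket:
        raise ValueError(f"path has no top-level directory to use as a bucket: {path!r}")
    return f"{H3_SCHEME}://{bucket}/{key}" if key else f"{H3_SCHEME}://{bucket}"
```

An application would then pass the rewritten path to its usual read or write calls, leaving the rest of the job unchanged.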

Figure 1 - Software stack with "fast-path" enabled for all SDKs

The planned H3-TensorFlow plugin will implement a file-system extension that is exposed to Python applications through the tf.io.gfile [5] namespace and its associated APIs. A user will be able to access an H3 object simply by passing the API the object name prefixed with the H3 marker. The tf.io.gfile APIs support typical directory operations and expose a native file-stream API, so applications can access data without concern for their provider’s particulars.
The planned token scheme will be extended to carry run-time information. Together with the new per-object and per-bucket ACLs, this will offer finer-grained access control and reduce the number of metadata lookups performed for authentication.
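One way a token that carries run-time information can avoid repeated metadata lookups is to embed the relevant permissions in the token and sign it. The sketch below illustrates that general technique with an HMAC-signed payload. It is an assumption about how such a scheme might work, not the H3 design: the token format, the `SECRET` key, and the functions `issue_token` and `authorize` are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # hypothetical server-side signing key


def issue_token(user, buckets, ttl_seconds=300, now=None):
    """Pack the user's per-bucket permissions and an expiry time into a signed token."""
    now = time.time() if now is None else now
    claims = {"user": user, "buckets": buckets, "exp": now + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def authorize(token, bucket, operation, now=None):
    """Check a request using only the token, with no metadata-store lookup."""
    now = time.time() if now is None else now
    payload, _, signature = token.partition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False  # tampered or forged token
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if now >= claims["exp"]:
        return False  # expired
    return operation in claims["buckets"].get(bucket, [])
```

The metadata store is consulted once, when the token is issued; each later request is checked against the signed claims alone until the token expires.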

 

[1] https://aws.amazon.com/s3/
[2] https://redis.io/
[3] https://www.postgresql.org/
[4] Papagiannis, A., Saloustros, G., González-Férez, P. and Bilas, A., 2018, October. An efficient memory-mapped key-value store for flash storage. In Proceedings of the ACM Symposium on Cloud Computing (pp. 490-502). ACM.
[5] https://www.tensorflow.org/api_docs/python/tf/io/gfile
