In-Memory Power

In-Memory Enables Operational Intelligence

With its scalable performance and ability to analyze petabytes of data, Hadoop has quickly become the standard for business intelligence (BI) on huge data sets. However, its batch processing and disk-based data storage make it unsuitable for operational intelligence (OI): analyzing live, fast-changing data in production environments to extract important patterns and trends and provide immediate feedback. Until now.

ScaleOut hServer’s in-memory data storage and integrated compute engine unlock the power of Hadoop for operational intelligence. Instead of storing live, fast-changing data on disk within HDFS, ScaleOut hServer uses a fast, scalable in-memory data grid (IMDG) that lets live data be continuously saved, updated, and analyzed by ScaleOut hServer’s Hadoop MapReduce compute engine. Standard MapReduce applications can now analyze live, fast-changing data with extremely low latency.

Analyze Live Data

ScaleOut hServer’s in-memory data grid stores key/value pairs across an elastic set of networked servers using a standard, object-oriented model. It provides fast access to and updates of live data, with linear scalability and high availability.
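To make the storage model concrete, here is a toy, single-process Python sketch of a partitioned key/value grid in which every key is stored on a primary server and one replica, so a read still succeeds after the primary fails. The class `ToyDataGrid`, its methods, and the hash-plus-next-server placement scheme are hypothetical illustrations, not ScaleOut hServer’s API or its actual partitioning algorithm.

```python
import hashlib


class ToyDataGrid:
    """Hypothetical in-memory key/value grid: each key lives on a primary and a replica server."""

    def __init__(self, server_names):
        self.servers = {name: {} for name in server_names}  # per-server in-memory store
        self.alive = set(server_names)

    def _owners(self, key):
        # Hash the key to pick a primary; the next server in sorted order holds the replica.
        names = sorted(self.servers)
        digest = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)
        primary = digest % len(names)
        replica = (primary + 1) % len(names)
        return names[primary], names[replica]

    def put(self, key, value):
        # Write to both owners so that one server failure does not lose the value.
        for name in self._owners(key):
            self.servers[name][key] = value

    def get(self, key):
        # Read from the first owner that is still alive.
        for name in self._owners(key):
            if name in self.alive and key in self.servers[name]:
                return self.servers[name][key]
        raise KeyError(key)

    def fail(self, name):
        # Simulate a server crash by marking it unavailable.
        self.alive.discard(name)


grid = ToyDataGrid(["server-a", "server-b", "server-c"])
grid.put("IBM", 187.5)
primary, _ = grid._owners("IBM")
grid.fail(primary)
print(grid.get("IBM"))  # prints 187.5, served by the replica
```

A real grid would also re-create lost replicas and spread partitions more evenly; the sketch only shows why keeping a second copy on another server preserves availability.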

Run MapReduce Unchanged

MapReduce applications use standard Hadoop API libraries; no code changes are needed to run with ScaleOut hServer. Access in-memory data sets or even HDFS files using standard Hadoop input/output formats.

ScaleOut hServer Integrates an In-Memory Data Grid with a MapReduce Compute Engine

Blazingly Fast Execution Times

ScaleOut hServer’s in-memory MapReduce compute engine executes Hadoop MapReduce programs in seconds (or less) by using several techniques not available in standard Hadoop. By avoiding Hadoop’s batch scheduling, it can start jobs in milliseconds instead of tens of seconds. In-memory data storage dramatically reduces access times by eliminating data movement from disk and across the network. Highly optimized, memory-based storage, combining, data shuffling, and optional sorting further reduce overhead. All of this is accomplished without changing a line of MapReduce code.
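The stages named above (map, combine, shuffle, optional sort, and reduce, all held in memory) can be illustrated with a small single-process Python sketch. The function name and parameters are hypothetical; the sketch shows the data flow in general terms and says nothing about how ScaleOut hServer actually schedules or distributes work. For simplicity, the combiner here returns one pre-aggregated value per key, whereas a Hadoop combiner emits key/value pairs.

```python
from collections import defaultdict


def run_in_memory_mapreduce(splits, mapper, reducer, combiner=None,
                            num_partitions=2, sort_keys=True):
    """Toy single-process MapReduce whose intermediate data never leaves memory."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for split in splits:
        # Map phase: collect this split's intermediate pairs in a local buffer.
        local = defaultdict(list)
        for record in split:
            for key, value in mapper(record):
                local[key].append(value)
        # Combine phase (optional): pre-aggregate per split to shrink the shuffle.
        for key, values in local.items():
            if combiner is not None:
                values = [combiner(key, values)]
            # Shuffle phase: route each key to a partition by hash.
            partitions[hash(key) % num_partitions][key].extend(values)
    # Reduce phase: sorting keys is optional, mirroring the ability to turn it off.
    results = {}
    for partition in partitions:
        keys = sorted(partition) if sort_keys else list(partition)
        for key in keys:
            results[key] = reducer(key, partition[key])
    return results


def count_words(line):
    return [(word, 1) for word in line.split()]


def total(_key, values):
    return sum(values)


result = run_in_memory_mapreduce([["a b a"], ["b c"]], count_words, total, combiner=total)
print(sorted(result.items()))  # prints [('a', 2), ('b', 2), ('c', 1)]
```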

Isn’t This Just Spark?

ScaleOut hServer is designed specifically for operational intelligence (OI). Unlike Spark, it uses an in-memory data grid to host fast-changing data, which ensures that memory-based data objects are both individually accessible and highly available. Because Spark was developed to accelerate batch processing, it uses a different in-memory data storage model that does not meet the needs of OI.

Benchmarks Demonstrate 40X Speedup

In benchmark testing of a real-world financial services application, ScaleOut hServer completed continuous MapReduce runs in 350 milliseconds each, compared with more than 15 seconds per run for Apache Hadoop, a speedup of more than 40X (15 s ÷ 0.35 s ≈ 43). This application tracks market price changes for a hedge fund and generates alerts when portfolio rebalancing is needed.

Fast Deployment

Standalone Platform Simplifies Development

Forget the complexity of installing and configuring Hadoop distributions. Because ScaleOut hServer uses its own MapReduce compute engine, it does not require a Hadoop software stack to be installed. Instead, it ships with all the necessary Hadoop Java API libraries as well as its own open source libraries and server components; everything you need is included. In fact, you can install it and start running MapReduce applications on a laptop or developer workstation in under thirty minutes.

ScaleOut hServer runs in both Linux and Windows environments and can be deployed on premises or in public clouds; it is currently available on the Amazon Web Services and Windows Azure cloud environments. Unlike a standard Hadoop distribution, ScaleOut hServer can be deployed in the cloud quickly and easily.

To help simplify application development, ScaleOut hServer incorporates several features to automate parameter setting and optimize performance. When using memory-based data sets, it automatically sets key MapReduce parameters, such as splits, partitions, and slots. It allows sorting to be turned off when not needed. It also automatically detects and optimizes applications that produce a single, combined result instead of generating results from multiple reducers. Memory-based data sets use a highly efficient, pipelined storage architecture to maximize performance for large populations of small key/value pairs.
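The “single, combined result” pattern mentioned above can be sketched as a global reduction: each split is folded into one partial value, and the partials are then merged into a single answer rather than being spread across many reducers. This is a generic Python illustration with a hypothetical function name, not a description of how ScaleOut hServer detects or optimizes such jobs; it assumes the merge function is associative and that `identity` is its neutral element.

```python
from functools import reduce


def single_result_job(splits, map_fn, merge_fn, identity):
    """Fold each split to a partial result, then merge the partials into one value."""
    # Per-split partial results (in a grid these would be computed in parallel).
    partials = [reduce(merge_fn, map(map_fn, split), identity) for split in splits]
    # Final merge produces a single combined result with no reducer fan-out.
    return reduce(merge_fn, partials, identity)


# Example: sum of squares across three splits, one of them empty.
print(single_result_job([[1, 2, 3], [4, 5], []], lambda x: x * x,
                        lambda a, b: a + b, 0))  # prints 55
```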

Automatic Code Shipping

ScaleOut hServer automatically ships application code to grid servers for execution. Optionally, you can also use ScaleOut StateServer Pro’s “invocation grid” feature to prestage code once and reuse it across multiple MapReduce runs.

Built-In Scalability and High Availability

As your application workload grows, ScaleOut hServer seamlessly scales its execution throughput as you add servers to the in-memory data grid. This increases storage capacity, access throughput, and execution capacity. The grid automatically redistributes the workload across all servers and keeps execution times fast. ScaleOut hServer’s fully peer-to-peer architecture eliminates bottlenecks to scaling.
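One simple way to picture automatic redistribution is a fixed set of data partitions that are moved, one at a time, from the busiest server to the least busy one until loads differ by at most one partition. The greedy `rebalance` function below is a hypothetical Python sketch of that idea, not ScaleOut hServer’s actual load-balancing algorithm; it assumes every current owner appears in the new server list (that is, servers are only added).

```python
from collections import Counter


def rebalance(assignment, servers):
    """Greedily move partitions from the busiest to the idlest server.

    assignment maps partition id -> server name; returns (new_assignment, moves).
    """
    assignment = dict(assignment)
    owned = {server: [] for server in servers}
    for partition, server in sorted(assignment.items()):
        owned[server].append(partition)
    moves = 0
    while True:
        busiest = max(servers, key=lambda s: len(owned[s]))
        idlest = min(servers, key=lambda s: len(owned[s]))
        if len(owned[busiest]) - len(owned[idlest]) <= 1:
            break  # balanced to within one partition
        partition = owned[busiest].pop()
        owned[idlest].append(partition)
        assignment[partition] = idlest
        moves += 1
    return assignment, moves


# Twelve partitions on three servers; a fourth server joins the grid.
old = {p: ["s1", "s2", "s3"][p % 3] for p in range(12)}
new, moves = rebalance(old, ["s1", "s2", "s3", "s4"])
print(moves, sorted(Counter(new.values()).items()))
# prints 3 [('s1', 3), ('s2', 3), ('s3', 3), ('s4', 3)]
```

Only three of the twelve partitions move in this example; the other nine stay where they are, so redistribution traffic stays proportional to the new server’s share of the data.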

Easy Integration

Integrates with Popular Hadoop Platforms

ScaleOut hServer is compatible with the latest versions of most popular Hadoop platforms, including Apache, Cloudera, Hortonworks, and IBM, and ScaleOut Software is a certified Cloudera and Hortonworks partner. This means that MapReduce applications written for any of these platforms can run unchanged on ScaleOut hServer’s in-memory compute engine.

ScaleOut hServer can also be installed on Hadoop clusters to extend their reach into operational intelligence. Now a single Hadoop deployment can provide both batch processing for offline, static data and real-time analytics for fast-changing, live data hosted within ScaleOut hServer’s in-memory data grid. Its in-memory compute engine can be configured as an optional MapReduce framework for YARN so that MapReduce jobs can be seamlessly directed to ScaleOut hServer.

Seamlessly Access HDFS

When installed on a Hadoop cluster, MapReduce applications running under ScaleOut hServer can access HDFS files using standard input/output formats. They also can combine access to HDFS files and in-memory data sets within a single application.

Perform Hive Queries

Hive can compile queries into MapReduce jobs, and because ScaleOut hServer runs MapReduce unchanged, it can execute Hive queries under YARN at in-memory speeds. It also enables Hive to query in-memory data sets based on their object-oriented properties.

Integrating ScaleOut hServer into a Hadoop Platform

Built-In HDFS Cache

ScaleOut hServer incorporates an HDFS cache which accelerates access times by serving key/value pairs to mappers from memory instead of disk. To cache input data, the application simply wraps its HDFS input format with a special “dataset input format” that saves key/value pairs in memory as they flow to the mappers during program execution. On subsequent runs, the cache serves these key/value pairs from the in-memory data grid (IMDG) instead of from HDFS. It also automatically handles cache invalidation if the HDFS file changes. For best performance, HDFS data sets should fit within the memory of the IMDG.
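The wrapping idea can be sketched in a few lines of Python: a caching wrapper around an underlying record reader fills an in-memory cache on the first read and serves later reads from memory until the file changes. The class `CachingReader` is hypothetical, and detecting changes by comparing file size and modification time is an assumption made for illustration; it is not ScaleOut hServer’s “dataset input format” or its actual invalidation mechanism, and it reads a local file rather than HDFS.

```python
import os
import tempfile


class CachingReader:
    """Hypothetical wrapper: cache records in memory, re-read only when the file changes."""

    def __init__(self, read_records):
        self.read_records = read_records  # underlying reader: path -> iterable of records
        self.cache = {}                   # path -> (file signature, cached records)
        self.disk_reads = 0

    @staticmethod
    def _signature(path):
        # Size plus nanosecond modification time serves as a cheap change detector.
        st = os.stat(path)
        return st.st_size, st.st_mtime_ns

    def records(self, path):
        signature = self._signature(path)
        cached = self.cache.get(path)
        if cached is not None and cached[0] == signature:
            return cached[1]  # cache hit: served from memory
        # Cache miss or stale entry: read from storage and refresh the cache.
        self.disk_reads += 1
        records = list(self.read_records(path))
        self.cache[path] = (signature, records)
        return records


def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "prices.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("IBM,187.5\n")
    reader = CachingReader(read_lines)
    reader.records(path)                    # first run reads the file
    reader.records(path)                    # second run hits the cache
    with open(path, "w", encoding="utf-8") as f:
        f.write("IBM,187.5\nMSFT,410.2\n")  # file changes, so the entry is stale
    print(reader.records(path), reader.disk_reads)
    # prints ['IBM,187.5', 'MSFT,410.2'] 2
```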

HDFS Cache Accelerates Access to HDFS Files

Integrated Hadoop Execution Platform

When deployed alongside a standard Hadoop platform on the same compute cluster, ScaleOut hServer adds new capabilities for operational intelligence to a BI platform. Its combination of support for live data with object-oriented access, in-memory execution speed, and high availability delivers real-time capabilities previously unavailable with Hadoop. Now Hadoop developers can use the same skill set to build applications for both business and operational intelligence.

Use ScaleOut hServer for ETL

ScaleOut hServer can run MapReduce jobs that extract and transform data and load it into HDFS in real time. This approach leverages existing Hadoop skill sets and eliminates ad hoc ETL mechanisms.


Add OI to a Hadoop BI Platform

In-memory MapReduce can provide immediate feedback to a production system while simultaneously forwarding real-time events to a Hadoop data warehouse for strategic analysis, adding significant value to a Hadoop-based BI platform.
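A minimal Python sketch of this pattern, using the hedge fund scenario from the benchmark above: each price update is applied to in-memory state, checked immediately against target portfolio weights to produce alerts, and also appended to an archive list that stands in for events forwarded to a Hadoop data warehouse. The function name, the tickers, the drift threshold, and the list-based archive are hypothetical simplifications, not part of ScaleOut hServer.

```python
def process_price_update(prices, holdings, targets, symbol, price, threshold, archive):
    """Apply one price tick; return symbols whose portfolio weight drifted past threshold."""
    prices[symbol] = price                 # update live, in-memory state
    archive.append((symbol, price))        # forward the raw event for later batch analysis
    values = {s: holdings[s] * prices[s] for s in holdings}
    total = sum(values.values())
    # Immediate feedback: flag holdings whose weight is too far from its target.
    return [s for s in holdings if abs(values[s] / total - targets[s]) > threshold]


prices = {"AAA": 10.0, "BBB": 10.0}
holdings = {"AAA": 10, "BBB": 10}
targets = {"AAA": 0.5, "BBB": 0.5}
archive = []
alerts = process_price_update(prices, holdings, targets, "AAA", 20.0, 0.05, archive)
print(alerts)   # prints ['AAA', 'BBB']: AAA is now about 67% of the portfolio, BBB about 33%
print(archive)  # prints [('AAA', 20.0)]
```

Keeping the alert check and the archive append in the same step means the production system gets feedback on every tick, while the warehouse still receives the complete event stream for strategic analysis.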

ScaleOut hServer®

The World’s First In-Memory MapReduce Execution Engine for Hadoop
