ScaleOut Analytics Server®
ScaleOut Analytics Server® combines a scalable, in-memory data grid with a powerful analytics engine to deliver near real-time analytics performance for business intelligence, financial modeling, and other time-sensitive analytics needs. Now you can run continuous in-memory analytics across a fast-changing dataset to keep your analysis up to date. Advanced capabilities for rapidly searching grid-based data and for quickly developing in-memory map/reduce applications make ScaleOut Analytics Server perfect for fast, scalable analysis of both enterprise data and public data sets.
ScaleOut Analytics Server introduces breakthrough capabilities for automatically shipping all needed executable code and libraries from the developer’s workstation to grid servers for parallel data analysis. Developers can define Java or C# invocation grids to pre-stage the in-memory data grid’s execution environment. Analysis code can now be deployed quickly and easily to enterprise clusters or cloud-based grids without manual intervention.
ScaleOut Analytics Server leverages everything you know about object-oriented programming. Its in-memory data grid stores data as serialized objects, enabling the use of intuitive, property-based queries to select objects for analysis. Analysis code can be developed using standard, object-oriented techniques without the need to understand parallel programming or tune complex data mapping and reduction models. To further accelerate development, ScaleOut Analytics Server includes the ScaleOut Management Pack™, which provides comprehensive tools for observing, managing, and archiving grid-based data.
Product Note: ScaleOut Analytics Server is the successor product to ScaleOut Grid Computing Edition™.
Scalable Performance and Low Latency
Through its integration with an industry-leading in-memory data grid, ScaleOut Analytics Server’s computation engine delivers breakthroughs in real-time performance and ease of use. ScaleOut Analytics Server automatically and efficiently schedules analysis computations across its elastic cluster of grid servers to scale performance and handle large, memory-based datasets. It also minimizes latency by analyzing data already staged in memory and by limiting data motion during the computation. In contrast to file-based analysis platforms such as Hadoop, ScaleOut Analytics Server eliminates both the overhead of batch scheduling and the bottlenecks of file I/O for staging data and processing results. The result is blazingly fast, scalable analysis and near real-time responsiveness.
Powerful Parallel Query
Parallel query is a cornerstone of ScaleOut Analytics Server’s ease of use and fast in-memory analytics performance. Applications can perform parallel queries to rapidly select objects within the in-memory data grid for map/reduce analyses. ScaleOut Analytics Server enables C# or Java objects to be easily queried based on their class properties. Fully parallel query across all grid servers and optimized storage and lookup of property data ensure fast selection of objects from large collections. ScaleOut Analytics Server provides a full implementation of .NET's Language Integrated Query (LINQ), which enables C# applications to structure queries using SQL-like semantics. Java applications can structure property-based queries using composable filter methods with logical and comparison operators. To minimize lookup times, ScaleOut Analytics Server’s client libraries automatically extract selected properties, and grid servers store them as indexed, deserialized data during object updates.
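As a rough illustration of composing property-based filters with logical and comparison operators, the sketch below uses only the Java standard library. It is not the ScaleOut client API: the `Stock` class, the `equalTo` and `atLeast` helpers, and the `select` method are hypothetical names, and a parallel stream over an in-process list stands in for a parallel query across grid servers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

public class FilterSketch {
    // Hypothetical stored object; the property names are illustrative only.
    static class Stock {
        private final String symbol;
        private final String sector;
        private final double marketCap; // assumed unit: billions

        Stock(String symbol, String sector, double marketCap) {
            this.symbol = symbol;
            this.sector = sector;
            this.marketCap = marketCap;
        }

        String symbol() { return symbol; }
        String sector() { return sector; }
        double marketCap() { return marketCap; }
    }

    // Comparison operator: the property equals the given value.
    static <T, V> Predicate<T> equalTo(Function<T, V> property, V value) {
        return obj -> value.equals(property.apply(obj));
    }

    // Comparison operator: the numeric property is at least the given minimum.
    static <T> Predicate<T> atLeast(ToDoubleFunction<T> property, double minimum) {
        return obj -> property.applyAsDouble(obj) >= minimum;
    }

    // Stand-in for a parallel query: returns the symbols of matching objects, sorted.
    static List<String> select(List<Stock> stocks, Predicate<Stock> filter) {
        return stocks.parallelStream()
                .filter(filter)
                .map(Stock::symbol)
                .sorted()
                .collect(Collectors.toList());
    }

    static List<Stock> sampleStocks() {
        return Arrays.asList(
                new Stock("AAA", "Technology", 120.0),
                new Stock("BBB", "Technology", 8.0),
                new Stock("CCC", "Energy", 300.0),
                new Stock("DDD", "Technology", 50.0));
    }

    // Composes two filters with a logical AND and runs the query.
    static List<String> largeTechnologyStocks() {
        Predicate<Stock> inSector = equalTo(Stock::sector, "Technology");
        Predicate<Stock> largeCap = atLeast(Stock::marketCap, 50.0);
        return select(sampleStocks(), inSector.and(largeCap));
    }

    public static void main(String[] args) {
        System.out.println(largeTechnologyStocks()); // prints [AAA, DDD]
    }
}
```

The filters are ordinary `Predicate` values, so they can be combined with `and`, `or`, and `negate` before the query runs; a real grid would evaluate the equivalent filter against indexed property data on each server.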
ScaleOut Analytics Server’s seamless and straightforward use of Java and C# shortens design time and eliminates unnecessary tuning of analysis computations. Its simplified map/reduce computation engine executes user-defined evaluation (“map”) methods in parallel on selected sets of objects stored in the data grid and then combines the results using user-defined merge (“reduce”) methods. Objects to be analyzed typically are organized in the data grid by a language-defined type and selected for analysis by querying their class properties. Called Parallel Method Invocation (PMI), this technique dramatically simplifies application design by using an in-memory execution model that is fully integrated with well-understood language mechanisms. ScaleOut Analytics Server automatically handles all of the details of parallel scheduling. Developers proficient in Java and C# can quickly and easily build analysis algorithms without knowledge of parallel programming techniques.
ScaleOut Analytics Server’s innovative invocation grid feature eliminates the headaches of deploying code to the grid. Required executable programs and libraries are automatically shipped to the grid servers to pre-stage the execution environment and enable parallel method invocations to start quickly. This eliminates the need to manually deploy analysis code and libraries on a large pool of servers, and it ensures that all servers are properly configured.
PMI Example: Consider a financial application that “back tests” stock trading strategies by examining a large collection of stock histories and merging the results into a single report. There is one stock history for each stock symbol, and it contains the price history of that stock. The stock histories are stored as objects in the in-memory data grid and are queried by properties such as sector, market cap, or other criteria. The developer only needs to write two methods: an analysis method that examines the trading strategy over a single stock history and generates a report, and a second method that merges two in-memory reports. ScaleOut Analytics Server automatically sends all code and libraries to the grid servers for execution. The developer selects the stocks to be analyzed using a parallel query of target properties, for example, all stocks in a specified sector with a minimum market cap. Once the user invokes the analysis from a workstation, ScaleOut Analytics Server executes the parallel query on all grid servers and passes the selected objects to the local computation engine on each server. The computation engine executes the analysis and merge methods and then combines the results for all grid servers using a fast, inter-node merging algorithm. The final merged result is then passed back to the application.
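The shape of the back-testing example can be sketched in plain Java. This is a minimal, single-process illustration and not the ScaleOut PMI API: the `StockHistory` and `Report` classes, the `evaluate`, `merge`, and `invoke` methods, and the trivial buy-and-hold "strategy" are all assumptions made for the sketch, with a Java parallel stream standing in for the grid's query, scheduling, and inter-node merge.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;

public class PmiSketch {
    // Hypothetical grid object: one price history per stock symbol.
    static class StockHistory {
        final String symbol;
        final String sector;
        final double marketCap; // assumed unit: billions
        final double[] prices;

        StockHistory(String symbol, String sector, double marketCap, double... prices) {
            this.symbol = symbol;
            this.sector = sector;
            this.marketCap = marketCap;
            this.prices = prices;
        }
    }

    // Hypothetical report: how many stocks were tested and their combined gain.
    static class Report {
        final int stocksTested;
        final double totalGain;

        Report(int stocksTested, double totalGain) {
            this.stocksTested = stocksTested;
            this.totalGain = totalGain;
        }
    }

    // Evaluation ("map") method: test a placeholder buy-and-hold strategy on one history.
    static Report evaluate(StockHistory h) {
        double gain = h.prices.length < 2 ? 0.0 : h.prices[h.prices.length - 1] - h.prices[0];
        return new Report(1, gain);
    }

    // Merge ("reduce") method: combine two reports into one.
    static Report merge(Report a, Report b) {
        return new Report(a.stocksTested + b.stocksTested, a.totalGain + b.totalGain);
    }

    // Stand-in for the engine: select by property, evaluate in parallel, merge the results.
    static <T, R> R invoke(List<T> objects, Predicate<T> query,
                           Function<T, R> eval, BinaryOperator<R> merge, R empty) {
        return objects.parallelStream().filter(query).map(eval).reduce(empty, merge);
    }

    static List<StockHistory> sampleHistories() {
        return Arrays.asList(
                new StockHistory("AAA", "Technology", 120.0, 10.0, 12.0, 15.0),
                new StockHistory("BBB", "Technology", 8.0, 3.0, 2.0),
                new StockHistory("CCC", "Energy", 300.0, 50.0, 40.0),
                new StockHistory("DDD", "Technology", 60.0, 20.0, 22.0));
    }

    // Back-tests all stocks in a sector with at least the given market cap.
    static Report backTest(List<StockHistory> histories, String sector, double minCap) {
        return invoke(histories,
                h -> h.sector.equals(sector) && h.marketCap >= minCap,
                PmiSketch::evaluate, PmiSketch::merge, new Report(0, 0.0));
    }

    public static void main(String[] args) {
        Report r = backTest(sampleHistories(), "Technology", 50.0);
        System.out.println(r.stocksTested + " stocks, total gain " + r.totalGain);
    }
}
```

The developer-written parts are only `evaluate` and `merge`. The merge must be associative and the empty report must act as its identity, since partial results may be combined in any grouping.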
ScaleOut Analytics Server’s parallel method invocation requires no tuning and always delivers maximum performance. PMI’s straightforward data-reduction model matches the needs of many analysis computations and avoids the complexities of file-based analysis platforms, such as Hadoop, which require the developer to create record readers, construct key spaces, optimize the execution of parallel reducers, and sequentially combine the reducers’ final results.
To illustrate the power of ScaleOut Analytics Server’s grid-based map/reduce engine, compare its performance to that of Hadoop’s popular file-based map/reduce platform in a real-world financial analysis application, such as the back testing of stock trading strategies described above. In ScaleOut Analytics Server, the stock histories are stored in its in-memory data grid; in Hadoop, they are stored in the Hadoop Distributed File System (HDFS). To measure each platform’s ability to handle increasing workloads with linear scalability, the number of stock histories analyzed is kept proportional to the number of grid servers.
As the graph below illustrates, both platforms exhibit linear scaling. However, ScaleOut Analytics Server delivers significantly higher throughput than Hadoop version 0.20.2 (the two are shown as the red and blue lines in the graph below) because of the overhead introduced by Hadoop’s file I/O and batch scheduling. ScaleOut Analytics Server’s memory-based map/reduce engine and highly efficient scheduling, together with the elimination of file I/O and unnecessary data motion, give it a significant performance advantage. Just storing data in ScaleOut Analytics Server’s in-memory data grid instead of HDFS boosts Hadoop’s throughput by about 6X (shown as the green line), although file I/O between the map and reduce phases still limits its performance.
To complement parallel method invocation, ScaleOut Analytics Server includes the capability to analyze targeted sets of large, column-oriented data objects. Called Single Method Invocation (SMI), this mechanism lets C# and Java applications invoke methods on specified objects, supply parameters to the invocation, and efficiently receive the method's analysis results. Because its highly optimized implementation executes analysis methods where data is stored and avoids all unnecessary network copies, SMI efficiently analyzes a specified set of stored objects as an alternative to the map/reduce model provided by ScaleOut Analytics Server’s parallel method invocation. In addition, applications can use SMI to efficiently update stored objects without replacing their full contents in a manner similar to the use of stored procedures in database systems.
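The idea behind single method invocation can be sketched as follows. This is an in-process illustration under stated assumptions, not the ScaleOut SMI API: a `ConcurrentHashMap` stands in for the data grid, and the `PriceColumn` class and the `invoke`, `averageOfLast`, and `append` methods are hypothetical names. The point is that the caller sends a method and a parameter to the stored object and gets back only the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

public class SmiSketch {
    // Hypothetical large, column-oriented object held in the store.
    static class PriceColumn {
        final List<Double> prices = new ArrayList<>();
    }

    // Stand-in for the in-memory data grid: key -> stored object.
    private final Map<String, PriceColumn> store = new ConcurrentHashMap<>();

    void put(String key, double... prices) {
        PriceColumn column = new PriceColumn();
        for (double p : prices) {
            column.prices.add(p);
        }
        store.put(key, column);
    }

    // Runs the method where the object lives; only the parameter and result cross the boundary.
    <P, R> R invoke(String key, P parameter, BiFunction<PriceColumn, P, R> method) {
        PriceColumn target = store.get(key);
        if (target == null) {
            throw new IllegalArgumentException("No object stored under key " + key);
        }
        synchronized (target) { // serialize invocations on the same object
            return method.apply(target, parameter);
        }
    }

    // Analysis method: average of the most recent n prices.
    static Double averageOfLast(PriceColumn column, Integer n) {
        int count = Math.min(n, column.prices.size());
        if (count <= 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = column.prices.size() - count; i < column.prices.size(); i++) {
            sum += column.prices.get(i);
        }
        return sum / count;
    }

    // Update method: appends one price in place and returns the new length.
    static Integer append(PriceColumn column, Double price) {
        column.prices.add(price);
        return column.prices.size();
    }

    public static void main(String[] args) {
        SmiSketch grid = new SmiSketch();
        grid.put("ACME", 10.0, 20.0, 30.0);
        System.out.println(grid.invoke("ACME", 2, SmiSketch::averageOfLast));
        System.out.println(grid.invoke("ACME", 40.0, SmiSketch::append));
    }
}
```

The `append` call mirrors the stored-procedure analogy: the object is modified where it is stored, and the caller never reads or rewrites its full contents.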
Fast, Scalable Data Access
Data analytics has rapidly evolved to require fast, scalable data access for analyzing large, fast-changing, and increasingly complex data. Financial services and other data-intensive industries routinely demand near real-time processing to maintain a competitive edge in their business operations. By storing fast-changing data within an in-memory data grid, ScaleOut Analytics Server dramatically reduces access latencies, avoids bottlenecks, and delivers linearly scalable access throughput. When used in compute clusters, ScaleOut Analytics Server’s in-memory data grid lets applications immediately share data across compute nodes without the need for message passing; this simplifies program structure and shortens design cycles. ScaleOut Analytics Server’s in-memory data grid provides a powerful data access platform for both data analytics and a wide range of high-performance computing (HPC) applications.
Included with ScaleOut Analytics Server, the ScaleOut Management Pack™ adds important capabilities that extend your ability to manage, analyze, and protect data stored in ScaleOut Analytics Server’s in-memory data grid. All management tools are designed to automatically scale their performance so that they can efficiently handle very large data sets. The Management Pack’s object browser lets developers visually browse and manage grid-based objects. The object browser’s graphical display of grid data shortens development cycles by enabling developers to easily verify their applications’ use of the in-memory data grid.
The ScaleOut Management Pack also includes a parallel backup and restore feature for quickly archiving grid data in the file system. This tool takes full advantage of all grid servers to accelerate backup and restore operations. Its fast, scalable performance also enables the efficient creation of grid “snapshots” which can later be restored and analyzed using in-memory parallel method invocations.