Distributed Data Grids
The Next Generation in Distributed Caching
Distributed data grids combine distributed caching with powerful analysis and management tools to give you a complete solution for managing fast-changing data in a server farm or compute grid. Distributed caching has evolved over the last decade to dramatically improve the performance and scalability of applications running on server farms and compute grids. By caching data in memory across many servers instead of within a single database server, distributed caching eliminates performance bottlenecks and minimizes access times. As distributed caching has become deeply embedded within enterprise applications, its functionality has evolved into a distributed data grid (also called a distributed data fabric) to meet increasingly sophisticated needs for reliable data storage, real-time analysis, and comprehensive management. Let's take a closer look at how distributed data grids have evolved to become an essential component of scalable applications.
What is a Distributed Data Grid?
At its core, a data grid is a distributed cache that automatically scales its capacity and handles network or server failures. It also provides APIs for coordinating data access and asynchronously notifying clients. On top of this foundation, a data grid adds powerful tools, such as parallel query and "map/reduce," that give applications fast, in-depth analysis of their data. Lastly, enterprise-level data grids incorporate comprehensive management tools for observing, managing, and backing up mission-critical data.
First Generation Distributed Caches
The first generation of distributed caches, such as the open source memcached, offers a simple mechanism for scaling access to data retrieved from a database server or other data source. By spreading access requests across a set of memory-based caching servers using a simple hashing mechanism, these caches avoid repeated access to the database server. Although this type of cache is simple to deploy, its use is restricted to data that can be reloaded from a backing store if a caching server fails. It also cannot migrate data among the caching servers as servers are added to scale capacity and performance. Moreover, clients cannot use distributed locking to coordinate access to shared data. These limitations make first generation distributed caches unsuitable for mission-critical business logic state, session data, and other computational data managed by enterprise applications.
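To make the hashing limitation concrete, the following is a minimal sketch of key distribution that assumes a plain "hash modulo server count" scheme. The server names, key format, and helper functions are hypothetical and do not reflect any particular product's client library; real clients may use other schemes. It illustrates why a cache that cannot migrate data loses most of its contents when a server is added: the same key now hashes to a different server, where the data is not present.

```python
import hashlib

def server_for(key: str, servers: list[str]) -> str:
    """Pick a caching server by hashing the key (assumed modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

def fraction_remapped(keys: list[str], before: list[str], after: list[str]) -> float:
    """Fraction of keys whose assigned server changes between two server lists."""
    moved = sum(1 for k in keys if server_for(k, before) != server_for(k, after))
    return moved / len(keys)

# Hypothetical three-server cache that grows to four servers.
servers = ["cache-a", "cache-b", "cache-c"]
grown = servers + ["cache-d"]
keys = [f"session:{i}" for i in range(10_000)]

remapped = fraction_remapped(keys, servers, grown)
print(f"Keys assigned to a different server after growth: {remapped:.0%}")
```

With a uniform hash, going from three to four servers reassigns about three quarters of the keys in expectation. Without data migration, each of those keys becomes a cache miss that must be served from the backing store.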
Second Generation Distributed Caches
Commercial distributed caches, such as ScaleOut StateServer®, have introduced key capabilities targeted at the needs of enterprise applications. This "second generation" in distributed caching adds dynamic load-balancing to let the cache transparently scale by adding servers without affecting running applications or losing stored data. Data replication between caching servers enables the cache to survive server failures, and distributed locking provides a robust mechanism for coordinating access to cached data. These key features, plus querying, asynchronous events, object dependencies, transparent backing store access, and other advanced capabilities, represent a huge evolution from the first generation of distributed caching.
ScaleOut Software's distributed caching solution integrates these second generation features with a particular emphasis on delivering maximum scalability and ease of use. ScaleOut StateServer's fully self-organizing, self-healing, peer-to-peer architecture, combined with its patented, quorum-based data replication, sets it apart from the competition as the most scalable, most reliable, and easiest to deploy and manage product in its class.
Distributed Data Grids
As enterprise applications have deepened their reliance on distributed caching, the need to incorporate powerful mechanisms for querying, analyzing, and managing data has emerged. To meet this need, distributed caches have evolved into full-fledged distributed data grids for hosting mission-critical data that is shared by numerous clients. These clients need fully parallel query to quickly locate groups of logically related data, and they need to easily analyze or "mine" this data to find important patterns or make real-time decisions. For example, a financial services application might analyze a large cache holding stock histories to evaluate new trading strategies that respond to fast-changing market conditions. Users additionally need intuitive graphical tools to observe or manage groups of cached data or even individual objects. Both users and IT managers need to be able to quickly "snapshot" the grid's contents for later analysis or backup. By meeting these key needs for in-depth data analysis and management, distributed data grids have created a new generation of data management that lets enterprise applications scale their performance using capabilities that were previously unavailable. Whether hosted on a server farm, compute grid, or cloud, distributed data grids have become an essential means for driving scalability and deriving insights from application data.
By incorporating several unique capabilities, ScaleOut StateServer Grid Computing Edition (GCE) offers an industry-leading distributed data grid solution for enterprise applications. Its APIs enable fully parallel query across all cache servers for the fastest possible performance. Applications can easily execute user-defined "map/reduce" methods in parallel on a selected set of cached objects and combine the results. This capability provides a fast and powerful execution layer for parallel mining and analysis of cached data. Developers simply write two intuitive methods as if programming on a single machine, and ScaleOut StateServer GCE transparently combines the power of multiple machines, processors, and cores. A graphical object browser lets users and IT managers visually browse grid-based data and perform management operations. Lastly, a unique parallel backup and restore utility makes it easy to snapshot all objects, or selected groups of interest, within the grid to a file system. Combined with ScaleOut StateServer's scalability and ease of use and ScaleOut Software's world-class support, ScaleOut StateServer Grid Computing Edition provides a data grid solution designed to meet the needs of the most demanding enterprise applications.
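The "two methods" programming model for map/reduce can be illustrated with a small, generic sketch. This is not the ScaleOut StateServer API: the function names, the in-memory partitions standing in for cache servers, and the sample price histories are all assumptions made for illustration. The developer supplies one method that analyzes a single cached object and one that merges two results; the hypothetical runner applies them to every partition in parallel and combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def evaluate(history: list[float]) -> float:
    """'Map' method: largest single-step gain within one price history.
    Assumes each history holds at least two prices."""
    return max(later - earlier for earlier, later in zip(history, history[1:]))

def combine(left: float, right: float) -> float:
    """'Reduce' method: merge two partial results into one."""
    return max(left, right)

def run_map_reduce(partitions, map_fn, reduce_fn):
    """Hypothetical runner: each non-empty partition stands in for the
    objects held by one server and is processed on its own worker thread."""
    def process(partition):
        return reduce(reduce_fn, (map_fn(obj) for obj in partition))

    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(process, partitions))
    return reduce(reduce_fn, partials)

# Made-up stock histories spread across three simulated servers.
partitions = [
    [[10.0, 10.5, 10.25], [20.0, 21.5, 21.0]],
    [[5.0, 5.25, 7.25]],
    [[30.0, 29.0, 29.5]],
]

best_gain = run_map_reduce(partitions, evaluate, combine)
print(best_gain)
```

Because the merge step is associative and commutative, the order in which partial results are combined does not affect the outcome, which is what allows the work to be split across servers in the first place.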










