Developers

"How can a data grid help me build applications that deliver high performance on server farms and HPC compute grids?" You may be asking yourself this question as you develop your next distributed application. Let's take a quick dive into data grids from a developer's point of view and see what they are all about.

Eliminating the Scalability Bottleneck

You already know that server farms and HPC compute grids can deliver big performance gains, and do so cost-effectively by using industry-standard servers. For example, say you are running a Web application on one server, and that server begins to max out. If you add a second server running another copy of your application, you can ideally double the throughput, handling twice the load without increasing response times for Web users. As you keep adding servers, throughput should keep scaling.

The trick, of course, is to make sure that adding servers does not also create bottlenecks that keep throughput from scaling linearly. The problem is that the servers may have to share a resource, such as a database server, and this shared resource can become overloaded. Web and grid computing applications often use a database server to store application state so that it can be accessed across the grid. They may also repeatedly retrieve popular data, such as product descriptions, or create application-wide data, such as "top 10" lists, schedules, or interim stock trading results, all of which must be accessible to every server. It is critical to avoid creating a storage bottleneck when storing and accessing this data.
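To make the bottleneck concrete, here is a back-of-the-envelope model; the symbols and numbers are illustrative assumptions, not measurements. Suppose each of N servers can handle r requests per second, every request needs one access to the shared database, and the database can serve at most D accesses per second. Throughput X(N) is capped by the smaller of the two, so with r = 500 and D = 2000 requests per second:

```latex
\begin{aligned}
X(N) &= \min(N r,\; D) \\
X(2) &= \min(1000,\; 2000) = 1000 \\
X(4) &= \min(2000,\; 2000) = 2000 \\
X(8) &= \min(4000,\; 2000) = 2000
\end{aligned}
```

Beyond four servers, the extra capacity sits idle behind the database. If the store instead adds capacity with every server, say d accesses per second per server, the cap becomes min(N·r, N·d) = N·min(r, d), which grows linearly with N.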

This is where data grids can step in and help you avoid storage bottlenecks. By storing data in a scalable, in-memory data grid that spans the whole server farm, you can be sure that access throughput scales with the farm or HPC compute grid. Since the data grid makes data uniformly available to all servers, you don't have to implement a mechanism for passing data between servers. And the data grid automatically creates replicas and handles server failures to keep data safe.
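Here is a minimal sketch of the two ideas in this paragraph, under stated assumptions: objects are spread across servers by hashing their keys, and each object also gets a replica on a second server, so a read can fail over when the primary copy's server goes down. The placement rule (SHA-256 of the key, modulo the server count, replica on the next server) and every name below are hypothetical; this is not ScaleOut StateServer's actual algorithm.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical placement scheme, NOT ScaleOut StateServer's actual algorithm.
const int Servers = 4;
var memory = new List<Dictionary<string, string>>();   // each server's in-memory store
for (int i = 0; i < Servers; i++) memory.Add(new Dictionary<string, string>());
var alive = new bool[Servers];
Array.Fill(alive, true);

// Primary = SHA-256 of the key modulo the server count; replica = the next server.
static (int Primary, int Replica) Placement(string key, int servers)
{
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(key));   // 256-bit hash of the key
    int primary = (int)(BitConverter.ToUInt32(digest, 0) % (uint)servers);
    return (primary, (primary + 1) % servers);
}

// Write both copies so the object survives the loss of either server.
void Put(string key, string value)
{
    var (p, r) = Placement(key, Servers);
    memory[p][key] = value;
    memory[r][key] = value;
}

// Read from the primary, failing over to the replica if the primary is down.
string? Get(string key)
{
    var (p, r) = Placement(key, Servers);
    if (alive[p] && memory[p].TryGetValue(key, out var v)) return v;
    if (alive[r] && memory[r].TryGetValue(key, out var rv)) return rv;
    return null;
}

Put("mylist", "top-10 v1");
alive[Placement("mylist", Servers).Primary] = false;   // simulate a crash of the primary server
Console.WriteLine(Get("mylist"));                      // prints "top-10 v1", served by the replica
```

Because every server computes placement from the key alone, no central directory is needed to find an object. Simple modulo placement was chosen for brevity; it would move most objects whenever the number of servers changes, which a production data grid would want to avoid.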

Easy to Use APIs

Let's see just how easy it is to store data in ScaleOut StateServer's data grid. The product's APIs give your application a simple view of the data grid that looks the same on all servers. Objects are stored in binary, serialized form and are identified by a 256-bit key that you supply; the named cache used below also lets you use a simple string as the key. The access methods are essentially Add, Retrieve, Update, and Remove. These methods also take care of synchronization between servers by locking a grid object when you retrieve it and unlocking it after it is updated.

Here's a simple example. Let's say you want to store a "top 10" list. You just create a built-in ScaleOut helper object called a named cache to set up the object's storage area within the grid, and then you add the object to the data grid, as the following code snippet illustrates:

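// TopTenList is your own class; it must be serializable because the grid stores serialized objects
TopTenList myList = new TopTenList();
// (populate myList)
SossCache mycache = CacheFactory.GetCache("myCache");  // obtain the named cache "myCache"
mycache["mylist"] = myList;                             // serialize myList and store it under the key "mylist"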

This operation serializes the object myList and then stores it in the data grid using a simple string key called "mylist" for access. Now you can retrieve the list, modify it, and store it back, as follows:

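myList = mycache["mylist"] as TopTenList;  // 'as' yields null instead of throwing if the result is not a TopTenList
// (modify the local copy of the list)
mycache["mylist"] = myList;                // store the updated list back in the grid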
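The retrieve-modify-update cycle above is safe across servers because of the locking described earlier: a grid object is locked when you retrieve it and unlocked after it is updated. The following single-process model sketches that rule. The function names (RetrieveForUpdate and so on) are hypothetical and are not ScaleOut's API; a real grid enforces the lock across all servers.

```csharp
using System;
using System.Collections.Generic;

// Conceptual model of lock-on-retrieve / unlock-on-update. Names are hypothetical, NOT ScaleOut's API.
var objects = new Dictionary<string, string>();   // key -> stored object (a string stands in for serialized bytes)
var lockedBy = new Dictionary<string, string>();  // key -> server currently holding the lock

bool Add(string key, string value) => objects.TryAdd(key, value);

// Retrieve with lock: refused while another server holds the lock.
bool RetrieveForUpdate(string key, string server, out string? value)
{
    value = null;
    if (!objects.TryGetValue(key, out var stored)) return false;
    if (lockedBy.TryGetValue(key, out var holder) && holder != server) return false;
    lockedBy[key] = server;
    value = stored;
    return true;
}

// Update: only the lock holder may write, and writing releases the lock.
bool Update(string key, string server, string value)
{
    if (!lockedBy.TryGetValue(key, out var holder) || holder != server) return false;
    objects[key] = value;
    lockedBy.Remove(key);
    return true;
}

bool Remove(string key)
{
    lockedBy.Remove(key);
    return objects.Remove(key);
}

Add("mylist", "v1");
bool gotA = RetrieveForUpdate("mylist", "server-A", out var copyA);   // A locks the object
bool gotB = RetrieveForUpdate("mylist", "server-B", out _);           // refused: A holds the lock
Update("mylist", "server-A", copyA + "+edit");                        // A writes back and unlocks
bool gotB2 = RetrieveForUpdate("mylist", "server-B", out var copyB);  // now B sees A's update
Console.WriteLine($"{gotA} {gotB} {gotB2} {copyB}");                  // prints "True False True v1+edit"
```

In this model a refused caller simply gets false; a real application would typically wait or retry.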

As you can see, the APIs make access to the data grid easy and hide all of the machinery needed to implement global accessibility, scalability, and data replication for high availability. Here is another example of the data grid's hidden power. Suppose a database change invalidates the cached top 10 list, so you need to remove it. With one line of code, you can instantly signal all servers in the farm that the stored data has been invalidated:

mycache["mylist"] = null;  // assigning null removes "mylist" from the data grid

Imagine how much work it would be to develop code that removes the top 10 list from all servers in a synchronized manner, while handling possible server failures. ScaleOut Software's data grid efficiently and reliably takes care of this for you.
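To see why this matters, compare two hypothetical designs for three Web servers sharing a top 10 list. In the first, each server keeps a private copy, so invalidation has to reach every server, and one lost message leaves stale data behind. In the second, all servers read the single copy held by the data grid, so one removal is seen everywhere. This is a deliberately simplified model, not ScaleOut code; the grid's own replication and failure handling are not simulated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparison, not ScaleOut code. Three Web servers, one "top 10" list.

// Design 1: every server keeps a private copy, so invalidation must reach each one.
var localCopies = Enumerable.Range(0, 3)
    .Select(_ => new Dictionary<string, string> { ["mylist"] = "top-10 v1" })
    .ToList();
// Suppose the invalidation message to server 2 is lost (for example, it was restarting).
foreach (var copy in localCopies.Take(2)) copy.Remove("mylist");
int staleServers = localCopies.Count(c => c.ContainsKey("mylist"));

// Design 2: all servers read one shared copy in the grid (the grid is responsible
// for its own replicas), so a single removal is seen by every server.
var grid = new Dictionary<string, string> { ["mylist"] = "top-10 v1" };
grid.Remove("mylist");                                  // analogous to mycache["mylist"] = null
int staleGridReads = Enumerable.Range(0, 3).Count(_ => grid.ContainsKey("mylist"));

Console.WriteLine($"stale servers: {staleServers}, stale grid reads: {staleGridReads}");
// prints "stale servers: 1, stale grid reads: 0"
```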

Scalable Storage Capabilities

As you begin to explore the possibilities for data grids, you will find they are a powerful tool for storing application data that must be quickly accessed and shared across a server farm. Data grids can be used in a wide variety of applications to help boost performance and avoid bottlenecks to scalable throughput.

To see an example of how ScaleOut Software's data grid can easily scale your application, consider the problem of handling object expiration after a timeout, such as a session timeout for a Web session-state object. You can assign a timeout value to an object when it is added to the data grid, and if a timeout occurs, the data grid signals an asynchronous event to notify your application. Because the data grid has spread objects across the servers within the farm, it can simultaneously signal timeout events on different servers to scale the event handling load. The data grid also takes care of server failures by re-signaling events as necessary to ensure that they are reliably delivered. So your application automatically benefits from the data grid's scalable, highly available event handling without any code development on your part.
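A minimal sketch of this idea, assuming hypothetical names and a simple character-sum hash for placement: each server scans only the session objects it holds and raises timeout events for those that have expired, so the event-handling work is divided among servers. The loop over servers stands in for servers working simultaneously, and the re-signaling after server failures that the grid provides is not modeled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of distributed expiration; NOT ScaleOut's implementation.
const int Servers = 4;

// key -> (server that holds the object, absolute expiration time in seconds)
var sessions = new Dictionary<string, (int Owner, double ExpiresAt)>();
for (int i = 0; i < 20; i++)
{
    string key = $"session-{i}";
    int owner = key.Sum(c => (int)c) % Servers;   // simple deterministic placement, for illustration
    sessions[key] = (owner, 30.0 + i);            // timeouts staggered from 30 s to 49 s
}

// Each server handles only the objects it holds, so no single server sees every event.
List<string> RaiseTimeouts(int server, double currentTime)
{
    var expired = sessions
        .Where(kv => kv.Value.Owner == server && kv.Value.ExpiresAt <= currentTime)
        .Select(kv => kv.Key)
        .ToList();
    foreach (var key in expired)
    {
        sessions.Remove(key);
        Console.WriteLine($"server {server}: timeout event for {key}");   // app handler would run here
    }
    return expired;
}

int[] eventsPerServer = Enumerable.Range(0, Servers)
    .Select(s => RaiseTimeouts(s, currentTime: 40).Count)
    .ToArray();
Console.WriteLine($"events per server at t = 40 s: {string.Join(", ", eventsPerServer)}");
```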

Because objects are automatically load-balanced across the farm or compute grid, the data grid also provides an excellent means for distributing your application's workload, as we just saw with the event handling example. ScaleOut Software will continue to add API capabilities to simplify the design of server farm and grid computing applications so that you can take full advantage of the scalability and high availability provided by the data grid.
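The load-balancing effect is easy to check with a sketch. Assume objects are placed by hashing their keys (SHA-256 here is an illustrative choice, not a statement about how ScaleOut derives its 256-bit keys) and count how many of 10,000 objects land on each of four servers. If each server then processes the objects it holds, the workload is spread the same way.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical placement: hash the key, then take it modulo the server count.
static int OwnerOf(string key, int servers)
{
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(key));
    return (int)(BitConverter.ToUInt32(digest, 0) % (uint)servers);
}

const int Servers = 4;
var objectsPerServer = new int[Servers];
for (int i = 0; i < 10_000; i++)
    objectsPerServer[OwnerOf($"order-{i}", Servers)]++;

// Work that runs where the data lives inherits this same balance.
Console.WriteLine(string.Join(", ", objectsPerServer));   // each count lands near 2,500
```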

Going Beyond Storage: Data Analysis, Management, and the Cloud

Data grids have the ability to do more than provide fast scalable storage for data. As a developer, you may need to provide analysis capabilities for your applications. Using the parallel query and parallel method invocation features of ScaleOut Analytics Server, you can implement powerful map/reduce-style analysis on large data sets stored in the data grid. This straightforward, data-parallel programming model minimizes development time while delivering fast results when analyzing large data sets hosted in the grid.
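Here is a minimal sketch of the map/reduce pattern over data partitioned across servers. It does not use ScaleOut Analytics Server's API; each inner array stands in for the objects held by one server, the map step computes partial totals for every partition in parallel, and the reduce step merges those partial results. The sales records are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Each inner array stands in for the objects held by one server (illustrative data).
(string Product, decimal Amount)[][] partitions =
{
    new[] { ("widget", 10m), ("gadget", 5m), ("widget", 2.5m) },
    new[] { ("gadget", 7m), ("gizmo", 1m) },
    new[] { ("widget", 4m), ("gizmo", 3m), ("gizmo", 2m) },
};

// Map: every partition computes its own partial totals, all partitions in parallel.
Dictionary<string, decimal>[] partials = await Task.WhenAll(
    partitions.Select(part => Task.Run(() =>
        part.GroupBy(sale => sale.Product)
            .ToDictionary(g => g.Key, g => g.Sum(sale => sale.Amount)))));

// Reduce: merge the small partial results into the final answer.
var totals = new SortedDictionary<string, decimal>();
foreach (var partial in partials)
    foreach (var (product, subtotal) in partial)
        totals[product] = totals.GetValueOrDefault(product) + subtotal;

foreach (var (product, total) in totals)
    Console.WriteLine($"{product}: {total}");
```

Only the small per-partition summaries travel to the reduce step, which is what lets this style of analysis scale with the number of servers holding data.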

Developers need powerful management tools to make data grids practical for real-world applications. To meet these needs, ScaleOut Management Pack™ includes an object browser for visually browsing and managing objects stored in the data grid and a parallel backup/restore feature for archiving its contents in the file system. Parallel backup further extends the opportunity for data analysis since snapshots of the data grid can be quickly captured in real-time and later analyzed for trends.
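A parallel backup can be sketched the same way: each partition is written to its own file concurrently, and restore reads the files back in parallel, so the work is split across partitions rather than funneled through one writer. The file names and tab-separated format are assumptions for illustration, not the Management Pack's actual archive format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Each dictionary stands in for the objects held by one server (illustrative data).
var partitions = new[]
{
    new Dictionary<string, string> { ["session-1"] = "cart:3", ["mylist"] = "top-10 v1" },
    new Dictionary<string, string> { ["session-2"] = "cart:0" },
    new Dictionary<string, string> { ["session-3"] = "cart:7", ["quotes"] = "interim" },
};
string dir = Path.Combine(Path.GetTempPath(), "grid-backup-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);

// Backup: one writer per partition, all running concurrently.
Parallel.For(0, partitions.Length, i =>
    File.WriteAllLines(Path.Combine(dir, $"partition-{i}.txt"),
                       partitions[i].Select(kv => $"{kv.Key}\t{kv.Value}")));

// Restore: read the partition files in parallel and rebuild a single view of the data.
Dictionary<string, string> restored = Directory.GetFiles(dir, "partition-*.txt")
    .AsParallel()
    .SelectMany(path => File.ReadLines(path))
    .Select(line => line.Split('\t'))
    .ToDictionary(fields => fields[0], fields => fields[1]);

Directory.Delete(dir, recursive: true);   // clean up the temporary archive
Console.WriteLine($"restored {restored.Count} objects from {partitions.Length} files");
// prints "restored 5 objects from 3 files"
```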

When you target applications for deployment in the cloud, ScaleOut StateServer's data grid can simplify data migration and ensure that you take full advantage of the cloud's natural elasticity. Read about our cloud computing solutions to learn more.
