Memory Usage

Sufficient physical memory should be provisioned for the IMDG to store the data set objects and their replicas. If the IMDG runs out of memory during the recording phase, the affected splits will not be recorded and instead will be read from HDFS using the underlying record reader on the next run. Although this fallback increases access time, it is still recommended that the IMDG's maximum memory usage be limited by setting the max_memory parameter in the soss_params.txt file on every server in the cluster. Capping the IMDG's memory ensures that adequate physical memory remains available for the Hadoop infrastructure and MapReduce jobs.
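
For example, the max_memory setting might appear in soss_params.txt as shown below. This is an illustrative sketch only: it assumes a simple name/value line format and a value expressed in megabytes, both of which are assumptions; consult the ScaleOut hServer documentation for your release to confirm the exact syntax and units.

    # soss_params.txt (excerpt) -- illustrative only; verify the syntax,
    # units, and an appropriate value for your cluster before applying.
    max_memory 8192    # cap on IMDG memory usage (assumed here to be in MB)

Apply the same setting on every server in the cluster so that memory usage is limited uniformly across the IMDG.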

If additional servers running ScaleOut hServer are added to the cluster, the cached data sets will be automatically rebalanced across all servers to take advantage of the larger IMDG. This moves some of the cached splits to the new servers, where they can be accessed by additional Hadoop mappers, increasing the overall throughput of the MapReduce run.