I have been getting many questions about how to tune GridGain, so I decided to write a brief manual that covers the most important tuning properties.
1. GridGain is multi-threaded - Use It
If cache updates seem slow, ask yourself whether you are using the full computing power (all the cores) of your machine. GridGain is multi-threaded internally, but if you issue operations sequentially from a single thread, you are not taking advantage of that. As a rule of thumb, use about 2 to 3 times as many threads as you have cores when populating the grid. All GridGain APIs are thread-safe, so you don't have to worry about concurrency issues when populating data.
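To make the threading advice concrete, here is a minimal sketch that uses only the JDK. The ConcurrentHashMap is a stand-in for a thread-safe cache and the class name ParallelPopulate is made up for this example; it is not GridGain code. Each worker takes a disjoint slice of the keys, and the pool size follows the "2x the number of cores" rule of thumb.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ParallelPopulate {
    /** Fills 'cache' with keys 0..total-1 using the given number of worker threads. */
    static void populate(Map<Integer, String> cache, int total, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            pool.submit(() -> {
                // Each worker takes every 'threads'-th key, so the slices never overlap.
                for (int key = offset; key < total; key += threads) {
                    cache.put(key, "value-" + key); // stand-in for a cache update
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("population did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while populating", e);
        }
    }

    public static void main(String[] args) {
        // Rule of thumb from the text: about 2 to 3 threads per core.
        int threads = 2 * Runtime.getRuntime().availableProcessors();
        Map<Integer, String> cache = new ConcurrentHashMap<>();
        populate(cache, 100_000, threads);
        System.out.println("Loaded " + cache.size() + " entries using " + threads + " threads");
    }
}
```

The same structure applies when the map is replaced by a real cache instance, since the cache API is thread-safe.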
2. Use Collocated Computations
GridGain enables you to execute MapReduce computations in memory. However, most computations work on data that is cached on remote grid nodes. Loading that data from remote nodes is usually expensive, and it is much cheaper to send the computation to the node where the data resides. The easiest way to do this is to use the GridProjection.affinityRun(...) method; GridGain also has plenty of "mapKeysToNodes(...)" methods to help users figure out data ownership within the grid.
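The idea behind collocation can be sketched without any grid at all. The example below is purely illustrative: the ownership rule (hash modulo node count) and the names CollocationSketch, ownerOf and groupByOwner are assumptions made for this sketch, not GridGain's affinity function or API. It groups keys by their owning node so that one job per node can work on local data instead of pulling every entry over the network.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CollocationSketch {
    /** Hypothetical ownership rule: a key belongs to node (hash mod nodeCount). */
    static int ownerOf(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    /** Groups keys by owning node, so each node can receive one job for its local keys. */
    static Map<Integer, List<String>> groupByOwner(List<String> keys, int nodeCount) {
        Map<Integer, List<String>> byNode = new TreeMap<>();
        for (String key : keys) {
            byNode.computeIfAbsent(ownerOf(key, nodeCount), n -> new ArrayList<>()).add(key);
        }
        return byNode;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("order-1", "order-2", "order-3", "order-4", "order-5");
        // One job per node instead of one remote read per key.
        groupByOwner(keys, 3).forEach((node, localKeys) ->
            System.out.println("send job to node " + node + " for keys " + localKeys));
    }
}
```

In GridGain the ownership lookup and the routing are done for you; the sketch only shows why routing by key ownership removes remote reads.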
3. Use Data Loader
If you need to upload lots of data into the cache, use org.gridgain.grid.GridDataLoader. The data loader batches updates before sending them to remote nodes and controls the number of parallel operations taking place on each node to avoid thrashing. Generally it is about 10x faster than doing a bunch of single-threaded updates.
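The batching part of this is easy to picture with a small stand-alone sketch. BatchingLoader below is a hypothetical class written for this example, not GridDataLoader itself, and the sender callback stands in for one network send. Items are buffered and handed off in groups, so 10 updates with a batch size of 4 cost 3 sends rather than 10.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchingLoader<T> {
    private final int batchSize;
    private final Consumer<List<T>> sender; // stands in for one send to a remote node
    private final List<T> buffer = new ArrayList<>();

    BatchingLoader(int batchSize, Consumer<List<T>> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sender = sender;
    }

    /** Buffers one item and sends the buffer once it reaches the batch size. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; call once at the end to push the last partial batch. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(buffer)); // copy, so the receiver owns its batch
        buffer.clear();
    }

    public static void main(String[] args) {
        BatchingLoader<Integer> loader =
            new BatchingLoader<>(4, batch -> System.out.println("sending batch " + batch));
        for (int i = 0; i < 10; i++) {
            loader.add(i);
        }
        loader.flush();
    }
}
```

This sketch is single-threaded and ignores the per-node throttling that the real data loader also performs.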
4. Tune Initial Cache Size
To avoid internal resizing of cache maps, you should always provide a proper cache start size. Not doing so can significantly hurt performance, because CPU cycles will be spent on resizing internal cache maps instead of on application logic. You can configure the cache start size via the GridCacheConfiguration.getStartSize() configuration property.
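The same effect is familiar from the JDK's own HashMap, which serves here only as an analogy; how GridGain sizes its internal maps is not shown in this post. The helper capacityFor is a made-up name for this sketch. It computes an initial capacity large enough that the expected number of entries never crosses the resize threshold.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizedMap {
    /** Smallest initial capacity that holds 'expected' entries without a resize at this load factor. */
    static int capacityFor(int expected, float loadFactor) {
        return (int) Math.ceil(expected / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 750_000;
        float loadFactor = 0.75f; // HashMap's default load factor

        // Presized: the table is allocated once and never rehashed while loading.
        Map<Integer, Integer> presized = new HashMap<>(capacityFor(expected, loadFactor), loadFactor);
        // Default: starts small and has to rehash repeatedly as it grows.
        Map<Integer, Integer> growing = new HashMap<>();

        for (int i = 0; i < expected; i++) {
            presized.put(i, i);
            growing.put(i, i);
        }
        System.out.println("both maps hold " + presized.size() + " entries; only one of them resized");
    }
}
```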
5. Tune Near Cache
When using a Partitioned cache, GridGain fronts it with a local Near cache, so that an entry that does not belong to local partitions is still cached in a smaller local cache for better performance on the next access.
However, most usages of GridGain happen from collocated computations, i.e. computations submitted to the grid are usually routed automatically to the nodes where the data resides. In cases like this, the Near cache is redundant, as all data access happens from local memory anyway. To avoid the overhead, you can disable the Near cache by setting the GridCacheConfiguration.isNearEnabled() configuration property to false.
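To see what a Near cache buys you when access is not collocated, consider this small sketch. NearCacheSketch and its remote lookup function are hypothetical stand-ins, not GridGain classes: the first read of a key goes to the "remote" owner, and repeated reads are served from a local map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class NearCacheSketch {
    private final Function<String, String> remote; // stands in for a read from the owning node
    private final Map<String, String> near = new HashMap<>();
    int remoteReads = 0; // counts how often we had to go remote

    NearCacheSketch(Function<String, String> remote) {
        this.remote = remote;
    }

    /** Returns the local copy if present; otherwise reads remotely once and keeps the result. */
    String get(String key) {
        return near.computeIfAbsent(key, k -> {
            remoteReads++;
            return remote.apply(k);
        });
    }

    public static void main(String[] args) {
        NearCacheSketch cache = new NearCacheSketch(k -> "value-of-" + k);
        cache.get("a");
        cache.get("a");
        cache.get("b");
        System.out.println("remote reads: " + cache.remoteReads);
    }
}
```

When every computation already runs on the node that owns its data, there are no remote reads to save, and the extra local copy only costs memory and bookkeeping.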
6. Tune Off-Heap Memory
If you plan to allocate large amounts of memory to your JVM for data caching (usually more than 10GB), your application will most likely suffer from prolonged stop-the-world GC pauses, which can significantly hurt latencies. To avoid GC pauses, use off-heap memory to cache data - essentially your data is still cached in memory, but the JVM does not know about it and GC is not affected.
The only configuration property you need to set to enable off-heap memory is GridCacheConfiguration.getMaxOffHeapMemory(), which tells GridGain how much off-heap memory to make available to your application. By default, off-heap memory is disabled.
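For readers who have not worked with off-heap memory before, the JDK's direct buffers show the basic idea. This is only an illustration of memory that lives outside the garbage-collected heap; it says nothing about how GridGain implements its off-heap storage, and OffHeapLongStore is a made-up class for this sketch.

```java
import java.nio.ByteBuffer;

final class OffHeapLongStore {
    private final ByteBuffer buffer;

    /** Reserves room for 'slots' long values outside the Java heap. */
    OffHeapLongStore(int slots) {
        // A direct buffer's contents are allocated outside the garbage-collected heap.
        buffer = ByteBuffer.allocateDirect(Math.multiplyExact(slots, Long.BYTES));
    }

    void put(int slot, long value) {
        buffer.putLong(slot * Long.BYTES, value);
    }

    long get(int slot) {
        return buffer.getLong(slot * Long.BYTES);
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }

    public static void main(String[] args) {
        OffHeapLongStore store = new OffHeapLongStore(1_000_000);
        store.put(42, 123_456_789L);
        System.out.println("slot 42 = " + store.get(42) + ", off-heap = " + store.isOffHeap());
    }
}
```

The garbage collector only tracks the small buffer object itself, not the megabytes behind it, which is why large off-heap caches do not lengthen GC pauses.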
7. Tune Swap Storage
First of all, if you don't plan to use swap storage (i.e. disk overflow storage), you should not change any default swap settings, since swap storage is disabled by default. If you do need swap storage, enable it via the GridCacheConfiguration.isSwapEnabled() configuration property.
8. Tune Query Indexing
There are several configuration properties that you should watch out for here. First and most importantly, if you don't plan to use cache queries at all, you should disable indexing altogether via the GridCacheConfiguration.isQueryIndexEnabled() configuration property.
If you do plan to use cache queries, you should properly enable or disable indexing of primitive keys and values on GridH2IndexingSpi. Enable indexing for primitive keys by setting setDefaultIndexPrimitiveKey() to true on the SPI only if you plan to use primitive cache keys in your cache queries. The same goes for indexing of primitive values, which is controlled by the setDefaultIndexPrimitiveValue(...) property.
Also, if each value class is always stored under a single key class (i.e. you don't plan to use different key classes for the same value class), set setDefaultIndexFixedTyping(...) on the SPI to true. This way GridGain will store keys as the corresponding SQL types instead of in binary form, which makes key lookups faster.
9. Tune Eviction Policy
Again, if you don't plan to over-populate your cache, i.e. if you don't need any eviction policy at all, then you should disable the eviction policy altogether via the GridCacheConfiguration.isEvictionEnabled() configuration property.
If you do need GridGain to make sure that data in the cache does not grow beyond allowed memory limits, you should carefully choose the eviction policy you need. Most likely you will need either the FIFO or the LRU eviction policy shipped with GridGain; however, depending on your application, you may need to configure LIRS or plug in your own custom eviction policy. Regardless of which eviction policy you use, you should carefully choose the maximum number of entries the eviction policy allows in the cache - once the cache size exceeds this limit, evictions will start occurring. Usually the max size is controlled by the setMaxSize(...) configuration property on the eviction policy instance.
You should also almost always set the "setAllowEmptyEntries(...)" configuration property to false. By default GridGain will keep entries with null values in the cache to preserve some other properties of the entry, such as time-to-live. However, if you don't use time-to-live, then most likely you should discard the entry once it expires or is invalidated.
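If LRU eviction with a maximum size is new to you, the JDK's LinkedHashMap can demonstrate the behavior in a few lines. This is a generic sketch of the LRU idea, not GridGain's eviction policy implementation, and LruCache is a name chosen for this example. Once the map holds more than maxSize entries, the least recently accessed one is dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCache(int maxSize) {
        super(16, 0.75f, true); // true = iterate in access order, which is what LRU needs
        this.maxSize = maxSize;
    }

    /** Called by LinkedHashMap after each insert; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a" so that "b" becomes the least recently used entry
        cache.put("c", 3); // exceeds the max size of 2 and triggers one eviction
        System.out.println("keys after eviction: " + cache.keySet());
    }
}
```

Choosing maxSize is the same trade-off as in the text above: too small and hot entries get evicted, too large and the cache outgrows the memory you budgeted for it.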
10. Use Write-Behind Caching
If you can afford for your persistent store to lag behind your in-memory cache, then use write-behind caching. When write-behind is enabled, GridGain accumulates cache updates and flushes them to the database in batches in the background, which can often provide significant performance benefits. You can enable write-behind caching via the GridCacheConfiguration.isWriteBehindEnabled() configuration property.
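The mechanics of write-behind can be sketched as follows. WriteBehindSketch is a hypothetical class for this example, the store map stands in for a database, and, unlike a real write-behind implementation that flushes from a background thread, this simplified version flushes synchronously once enough updates are pending so that its behavior is easy to follow. Repeated updates to the same key are coalesced, so the store sees only the latest value.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class WriteBehindSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> pending = new LinkedHashMap<>(); // coalesces updates per key
    private final Map<String, String> store; // stands in for the database
    private final int flushSize;
    int storeBatches = 0; // number of batch writes that reached the store

    WriteBehindSketch(Map<String, String> store, int flushSize) {
        this.store = store;
        this.flushSize = flushSize;
    }

    /** Updates the cache immediately and defers the store write. */
    void put(String key, String value) {
        cache.put(key, value);
        pending.put(key, value);
        if (pending.size() >= flushSize) {
            flush();
        }
    }

    String get(String key) {
        return cache.get(key);
    }

    /** Writes all pending updates to the store as one batch. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        store.putAll(pending);
        storeBatches++;
        pending.clear();
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        WriteBehindSketch cache = new WriteBehindSketch(db, 3);
        cache.put("a", "1");
        cache.put("a", "2"); // overwrites the pending update for "a"
        cache.put("b", "1");
        System.out.println("store before flush: " + db);
        cache.flush();
        System.out.println("store after flush: " + db + ", batches: " + cache.storeBatches);
    }
}
```

The price of this speed-up is visible in the sketch: until a flush happens, the store is stale, so updates that are still pending are lost if the process dies.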