Accelerate HBase bucket cache with flash
Apache HBase™ is a scale-out, distributed database that delivers the fault tolerance necessary for uninterrupted, real-time access to large quantities of unstructured big data. HBase's scalability enables it to support massive tables containing billions of rows and millions of columns. Built on the Hadoop® Distributed File System (HDFS), HBase combines the high-bandwidth streaming performance of HDFS with low-latency random access.
HBase delivers exceptionally fast insert and update performance by processing HDFS data in parallel across server clusters. However, this distributed data can impose a significant performance penalty on HBase random read operations. The challenge, then, is to optimize for both low-latency random access and high batch throughput. A significant barrier to performance optimization is the limited read and write responsiveness inherent in the underlying hard disk drive (HDD) architecture.
Solid-state PCIe® flash storage removes bottlenecks to HBase performance, delivering up to 200x more I/O operations per second (IOPS) than HDDs. HBase now takes advantage of flash memory's high performance, integrating flash storage into its architecture through the bucket cache feature. The bucket cache supplements DRAM, serving as a second-tier store for data evicted from the DRAM least recently used (LRU) cache. Flash-based caching can multiply the effective cache size available to HBase RegionServers, enabling a larger dataset to be processed faster than is possible with DRAM and HDDs alone.
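The eviction flow described above can be sketched as a two-tier cache: blocks evicted from a small in-memory LRU tier are demoted to a larger, notionally flash-backed tier rather than discarded, and are promoted back to DRAM on a hit. The class and method names below are illustrative only, not HBase's actual implementation.

```python
from collections import OrderedDict

class TwoTierCache:
    """Illustrative sketch: l1 models the DRAM LRU cache, l2 models a
    larger flash-backed bucket cache. Blocks evicted from l1 are
    demoted to l2 instead of being dropped."""

    def __init__(self, l1_capacity, l2_capacity):
        self.l1 = OrderedDict()  # small, fast tier (DRAM)
        self.l2 = OrderedDict()  # large, slower tier (flash)
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity

    def get(self, key):
        if key in self.l1:
            self.l1.move_to_end(key)  # refresh LRU position
            return self.l1[key]
        if key in self.l2:
            value = self.l2.pop(key)  # hit in flash tier:
            self.put(key, value)      # promote back into DRAM
            return value
        return None                   # full miss: caller reads from HDFS

    def put(self, key, value):
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_capacity:
            old_key, old_value = self.l1.popitem(last=False)  # evict LRU block
            self.l2[old_key] = old_value                      # demote to flash tier
            if len(self.l2) > self.l2_capacity:
                self.l2.popitem(last=False)                   # age out of cache entirely

cache = TwoTierCache(l1_capacity=2, l2_capacity=4)
cache.put("block-a", b"...")
cache.put("block-b", b"...")
cache.put("block-c", b"...")  # pushes block-a out of DRAM into the flash tier
assert "block-a" not in cache.l1 and "block-a" in cache.l2
assert cache.get("block-a") is not None  # served from flash, promoted back to DRAM
```

The key design point mirrored here is that an L1 eviction is a demotion, not a loss: subsequent reads of that block are served from flash at far lower latency than a disk seek.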
In internal LSI testing of random reads, flash caching proved a valuable and cost-effective addition to the HBase memory hierarchy. Flash caching can boost HBase system performance and increase the dataset size that can be served efficiently in scale-out deployments.
The tests benchmarked an HBase bucket cache implemented on a Nytro™ MegaRAID® flash accelerator card, which increased operations per second by up to 7x and reduced average latency by up to 12x under a uniform key distribution. The Nytro MegaRAID acceleration card improved performance even more under the Zipfian distribution – a standard built-in distribution of the Yahoo! Cloud Serving Benchmark – reducing 95th and 99th percentile latency by 21x and 12x, respectively.
The table below summarizes the performance improvements during internal testing in operations per second, average latency, and latency in the 95th and 99th percentiles.
| | Improvement with flash caching, Uniform Distribution | Improvement with flash caching, Zipfian Distribution |
|---|---|---|
| Operations Per Second | up to 7x | |
| Average Latency | up to 12x | |
| 95th Percentile Latency | | 21x |
| 99th Percentile Latency | | 12x |
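In an HBase deployment, a file-backed bucket cache on a flash device is enabled through `hbase-site.xml`. The two properties below are HBase's standard bucket cache settings; the mount path and size here are placeholder values, not the configuration used in the tests above.

```xml
<!-- hbase-site.xml: enable a file-backed bucket cache on a flash device -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <!-- "file:<path>" backs the cache with a file on the given device;
       "offheap" instead keeps the bucket cache in off-heap RAM -->
  <value>file:/mnt/flash/bucketcache.data</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <!-- bucket cache capacity in MB (a value below 1.0 is instead
       read as a fraction of the heap); 65536 MB = 64 GB of flash -->
  <value>65536</value>
</property>
```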
To learn more, read the HBase Performance white paper.