HBase Interview Questions and Answers

Q1: What are the different commands used in HBase operations?

A: There are five basic commands that carry out different operations in HBase:

Get, Put, Delete, Scan, and Increment.
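
These can all be run from the HBase shell; the table, row, and column names below are illustrative:

```
hbase> put 'mytable', 'row1', 'cf:name', 'alice'
hbase> get 'mytable', 'row1'
hbase> scan 'mytable'
hbase> incr 'mytable', 'row1', 'cf:hits', 1
hbase> delete 'mytable', 'row1', 'cf:name'
```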

Q2: How do you connect to HBase?

A: A connection to HBase can be established interactively through the HBase shell, or programmatically through the client API (for example, the Java API).

Q3: What is the role of ZooKeeper in HBase?

A: ZooKeeper maintains configuration information, provides distributed synchronization, and tracks the state of the cluster so that clients can find the region servers they need to communicate with.

Q4: What are the different types of filters used in HBase?

A: Filters are used to get specific data from an HBase table rather than all the records.

They are of the following types:

  • Column value filters
  • Column value comparators
  • KeyValue metadata filters
  • RowKey filters
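
For example, from the HBase shell (table name and values are illustrative):

```
hbase> scan 'mytable', {FILTER => "ValueFilter(=, 'binary:alice')"}
hbase> scan 'mytable', {FILTER => "PrefixFilter('row1')"}
```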

Q5: Name three disadvantages HBase has compared to an RDBMS.

A:

  • HBase does not have a built-in authentication/permission mechanism.
  • Indexes can be created only on the row key, whereas in an RDBMS they can be created on any column.
  • With one HMaster node, there is a single point of failure.

Q6: Is HBase a scale-out or scale-up process?

A: HBase runs on top of Hadoop, which is a distributed system. Hadoop scales out as and when required by adding more machines on the fly. So HBase is a scale-out process.

Q7: What is compaction in HBase?

A: As more and more data is written to HBase, many HFiles get created. Compaction is the process of merging these HFiles into one file; after the merged file is created successfully, the old files are discarded.

Q8: What are the different compaction types in HBase?

A: There are two types of compaction: major and minor. In minor compaction, adjacent small HFiles are merged to create a single larger HFile; deleted cells are not removed. The files to be merged are chosen by a selection algorithm based on their size and number.

In major compaction, all the HFiles of a store (one column family of a region) are merged into a single HFile. The deleted cells are discarded, and major compaction is generally triggered manually.
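
The merge step can be sketched in a few lines. This is a rough analogy, not HBase internals; the file layout and names here are invented:

```python
import heapq

# Each "HFile" is modeled as a list of (row_key, timestamp, value)
# tuples sorted by row key; newer timestamps win for the same key.
def compact(hfiles):
    """Merge several sorted files into one, keeping the newest
    version of each row key (a rough analogy to major compaction)."""
    # Sort by key ascending, then timestamp descending, so the newest
    # version of each key comes first in the merged stream.
    merged = heapq.merge(*hfiles, key=lambda kv: (kv[0], -kv[1]))
    result, seen = [], set()
    for row_key, ts, value in merged:
        if row_key not in seen:      # keep only the newest version
            seen.add(row_key)
            result.append((row_key, ts, value))
    return result

hfile1 = [("row1", 1, "a"), ("row3", 1, "c")]
hfile2 = [("row1", 2, "a2"), ("row2", 1, "b")]
print(compact([hfile1, hfile2]))
# [('row1', 2, 'a2'), ('row2', 1, 'b'), ('row3', 1, 'c')]
```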

Q9: What is a cell in HBase?

A: A cell in HBase is the smallest unit of an HBase table; it holds a piece of data in the form of the tuple {row, column, version}.
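
A toy model of this addressing scheme, with all names and values invented for illustration:

```python
# A cell is addressed by {row, column, version}; here a toy table is
# a dict keyed by (row_key, column, timestamp).
table = {}

def put(row, column, value, ts):
    table[(row, column, ts)] = value

def get_latest(row, column):
    """Return the value with the highest timestamp for (row, column)."""
    versions = {ts: v for (r, c, ts), v in table.items()
                if r == row and c == column}
    return versions[max(versions)] if versions else None

put("row1", "cf:name", "alice", ts=1)
put("row1", "cf:name", "alicia", ts=2)
print(get_latest("row1", "cf:name"))  # alicia
```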

Q10: What is the role of the class HColumnDescriptor in HBase?

A: This class is used to store information about a column family such as the number of versions, compression settings, etc. It is used as input when creating a table or adding a column.

Q11: What is the lower bound of versions in HBase?

A: The lower bound of versions indicates the minimum number of versions to be stored in HBase for a column. For example, if the value is set to 3, then the three latest versions will always be kept, even if older versions have otherwise expired; it is used together with the time-to-live setting.
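
A minimal sketch of this retention rule, under the assumption that versions are (timestamp, value) pairs and expiry is driven by a TTL:

```python
def expire_versions(versions, ttl, now, min_versions=1):
    """Drop versions older than `ttl`, but always keep at least
    `min_versions` newest ones (the lower bound)."""
    newest_first = sorted(versions, reverse=True)
    live = [(ts, v) for ts, v in newest_first if now - ts <= ttl]
    # the lower bound: retain min_versions even if they have expired
    return newest_first[:max(len(live), min_versions)]

history = [(100, "a"), (200, "b"), (300, "c")]
print(expire_versions(history, ttl=50, now=400, min_versions=2))
# [(300, 'c'), (200, 'b')]
```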

Q12: What is a rowkey in HBase?

A: Each row in HBase is identified by a unique array of bytes called the row key.
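
Because row keys are byte arrays, rows are stored sorted lexicographically by those bytes. One illustrative consequence (key names invented here) is that numeric keys should be zero-padded:

```python
# Lexicographic byte ordering: "row10" sorts before "row2".
keys = [b"row2", b"row10", b"row1"]
print(sorted(keys))    # [b'row1', b'row10', b'row2']

# Zero-padding restores the intended numeric order.
padded = [b"row01", b"row10", b"row02"]
print(sorted(padded))  # [b'row01', b'row02', b'row10']
```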

Q13: What are the two ways in which you can access data from HBase?

A: The data in HBase can be accessed in two ways:

  • Using the row key (Get), or a table scan over a range of row key values.
  • Using MapReduce in a batch manner.

Q14: What are the two types of table design approach in HBase?

A: They are (i) short and wide, where a table has fewer rows with many columns, and (ii) tall and thin, where a table has many rows with fewer columns.

Q15: How does HBase support bulk data loading?

A: There are two main steps to bulk-load data into HBase.

  • Generate HBase data files (StoreFiles) from the data source using a custom MapReduce job. The StoreFiles are created in HBase's internal format, which can be loaded efficiently.
  • Import the prepared files into a running cluster using a tool such as completebulkload. Each file gets loaded into one specific region.
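
A typical invocation might look like the following, assuming a TSV source file; the paths, table name, and column mapping are illustrative:

```
# Step 1: generate StoreFiles from a TSV source
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1 \
  -Dimporttsv.bulk.output=/tmp/storefiles mytable /input/data.tsv

# Step 2: load the generated StoreFiles into the running cluster
# (LoadIncrementalHFiles is the tool behind completebulkload)
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /tmp/storefiles mytable
```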

Q16: How does HBase provide high availability?

A: HBase uses a feature called region replication. With this feature, each region of a table has multiple replicas that are opened in different RegionServers. The load balancer ensures that the region replicas are not co-hosted on the same region servers.

Q17: What is an HBase Store?

A: An HBase Store hosts a MemStore and zero or more StoreFiles (HFiles). A Store corresponds to one column family of a table for a given region.

Q18: What is hotspotting in HBase?

A: Hotspotting is a situation in which a large amount of client traffic is directed at one node, or only a few nodes, of a cluster. This traffic may represent reads, writes, or other operations. It overwhelms the single machine responsible for hosting the affected region, causing performance degradation and potentially leading to region unavailability.
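
One common remedy is salting the row key so that sequential keys spread across regions. A minimal sketch, where the bucket count and key format are assumptions:

```python
import hashlib

NUM_BUCKETS = 4  # assumed number of pre-split regions

def salted_key(row_key: str) -> str:
    """Prefix the row key with a hash-derived bucket number so that
    otherwise-sequential keys no longer share a common prefix."""
    bucket = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{bucket}-{row_key}"

# Sequential event keys get distributed instead of all landing
# on the region that hosts one key range.
for key in ["event-0001", "event-0002", "event-0003"]:
    print(salted_key(key))
```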

Q19: What is a namespace in HBase?

A: A namespace is a logical grouping of tables. It is similar to a database object in a relational database system.
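
From the HBase shell, tables are created inside a namespace with the `namespace:table` notation; the names below are illustrative:

```
hbase> create_namespace 'sales'
hbase> create 'sales:orders', 'cf'
hbase> list_namespace_tables 'sales'
```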

Q20: Explain the process of row deletion in HBase.

A: On issuing a delete command in HBase through the HBase client, data is not actually deleted from the cells but rather the cells are made invisible by setting a tombstone marker. The deleted cells are removed at regular intervals during compaction.
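
A minimal sketch of tombstone-based deletion, with the storage modeled as a plain dict (all names here are invented):

```python
# A delete writes a marker, reads skip marked cells, and compaction
# performs the actual removal.
cells = {("row1", "cf:a"): "v1", ("row2", "cf:a"): "v2"}
tombstones = set()

def delete(row, column):
    tombstones.add((row, column))      # mark, do not remove

def get(row, column):
    if (row, column) in tombstones:    # tombstoned cells are invisible
        return None
    return cells.get((row, column))

def compact():
    for key in tombstones:             # actual removal happens here
        cells.pop(key, None)
    tombstones.clear()

delete("row1", "cf:a")
print(get("row1", "cf:a"))          # None (hidden, but still stored)
print(("row1", "cf:a") in cells)    # True
compact()
print(("row1", "cf:a") in cells)    # False
```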

Q21: Explain HLog and WAL in HBase.

A: All edits in the HStore are stored in the HLog. Every region server has one HLog, which contains entries for the edits of all regions served by that region server. WAL stands for Write Ahead Log; all HLog edits are written to it immediately. With deferred log flush, WAL edits remain in memory until the flush period.

Q22: Define standalone mode in HBase.

A: It is the default mode of HBase. In standalone mode, HBase does not use HDFS (it uses the local filesystem instead), and it runs all HBase daemons and a local ZooKeeper in the same JVM process.
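
An illustrative hbase-site.xml for standalone mode; the data path below is an assumption:

```xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/hbase/data</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
</configuration>
```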

Q23: What is HRegionServer in HBase?

A: HRegionServer is the RegionServer implementation. It is responsible for serving and managing regions. In a distributed cluster, a RegionServer runs on a DataNode.

Q24: What are the different block caches in HBase?

A: HBase provides two different BlockCache implementations: the default on-heap LruBlockCache and the BucketCache, which is (usually) off-heap.

Q25: Explain what WAL and HLog are in HBase.

A: The WAL (Write Ahead Log) is similar to the MySQL binlog: it records all the changes that occur in the data. It is a standard Hadoop sequence file that stores HLogKeys. These keys consist of a sequence number as well as the actual data, and they are used to replay data that has not yet been persisted after a server crash. So, in case of server failure, the WAL works as a lifeline to recover the lost data.