Apache Cassandra Interview Questions and Answers

Q1 : What is a keyspace in Cassandra?
A : In Cassandra, a keyspace is a namespace that defines how data is replicated across nodes. A cluster can contain many keyspaces (typically one per application), and each keyspace groups a set of tables.

Q2 : How does Cassandra delete data?
A : SSTables are immutable, so Cassandra cannot remove a row from them in place. When a row needs to be deleted, Cassandra writes a special marker called a tombstone for it. When the data is read, anything covered by a tombstone is treated as deleted; the data is physically removed later, during compaction.
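
The tombstone idea can be illustrated with a minimal Python sketch. It is a toy model, not Cassandra's storage engine: the `TOMBSTONE` sentinel and the `read` helper are hypothetical names, and file order stands in for the write timestamps Cassandra actually compares.

```python
TOMBSTONE = object()  # hypothetical sentinel standing in for a deletion marker

def read(key, sstables):
    """Return the newest value for key, or None if it is absent or deleted.

    `sstables` is ordered oldest to newest; each table is never mutated.
    """
    for table in reversed(sstables):  # newest table wins
        if key in table:
            value = table[key]
            return None if value is TOMBSTONE else value
    return None

older = {"user:1": "alice", "user:2": "bob"}  # flushed earlier, left untouched
newer = {"user:1": TOMBSTONE}                 # the delete is itself a write
sstables = [older, newer]

print(read("user:1", sstables))  # None
print(read("user:2", sstables))  # bob
```

Note that the older table still holds "alice"; the row only looks deleted because the newer tombstone shadows it.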

Q3 : What are the features of Cassandra?
A : Cassandra is preferred for its availability and scalability. In addition, it offers the following features:
Linear scale performance
CQL – Cassandra Query Language
Data security
Tunable consistency
No single point of failure
Quick error detection and recovery

Q4 : What is a Bloom filter used for in Cassandra?
A : A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It can return false positives but never false negatives. Cassandra uses it to determine whether an SSTable might contain data for a particular row, which saves disk I/O when performing a key lookup.
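
A minimal Bloom filter sketch in Python shows why the check is cheap. This is an illustration only: the class name, the sizes, and the use of SHA-256 are assumptions made for the example, not Cassandra's implementation.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: answers "definitely not present" or "possibly present"."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))

sstable_filter = BloomFilter()
for row_key in ("row-1", "row-2"):
    sstable_filter.add(row_key)

print(sstable_filter.might_contain("row-1"))  # True
```

A lookup for a key that returns False can skip the SSTable entirely, which is where the I/O saving comes from.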

Q5 : How does Cassandra write data?
A : Cassandra writes data in three stages:

  • Commit log write
  • Memtable write
  • SSTable write

Q6 : What is mandatory while creating a table in Cassandra?
A : A primary key is mandatory when creating a table. It is made up of one or more columns of the table.

Q7 : Define replication factor.
A : The replication factor is the total number of copies of each piece of data stored across the cluster. Each copy is placed on a different node to ensure fault tolerance; for example, a replication factor of 3 means that three nodes each hold a copy of every row.

Q8 : What values are stored in a Cassandra column?
A : A Cassandra column basically holds three values:

  • Column name
  • Value
  • Timestamp
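
The timestamp matters because Cassandra uses it to decide which version of a column is current when two versions exist (last write wins). A small sketch, assuming a hypothetical `Cell` type and `reconcile` helper, and simplifying the tie-breaking that the real system performs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    """Hypothetical model of a column: name, value, and write timestamp."""
    name: str
    value: str
    timestamp: int  # an integer clock value; larger means newer

def reconcile(a, b):
    """Pick the winner between two versions of the same column.

    Simplification: on equal timestamps this sketch just keeps `a`.
    """
    return a if a.timestamp >= b.timestamp else b

old = Cell("email", "a@example.com", 100)
new = Cell("email", "b@example.com", 200)

print(reconcile(old, new).value)  # b@example.com
```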

Q9 : What is Apache Cassandra?
A : Apache Cassandra is an open source, distributed, decentralized storage system (database) for managing very large amounts of structured data spread across many servers. It provides a highly available service with no single point of failure. It was developed at Facebook for inbox search and was open-sourced by Facebook in July 2008.

Q10 : Define replication strategy.
A : A replication strategy defines how replicas are placed in a cluster. There are two main replication strategies:
SimpleStrategy (intended for a single datacenter)
NetworkTopologyStrategy (intended for multiple datacenters)
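
The placement idea behind SimpleStrategy (first replica on the node that owns the key's token, further replicas on the next nodes around the ring) can be sketched in Python. The `token` function and node names below are hypothetical; Cassandra uses its own partitioner, and this sketch ignores virtual nodes and datacenter or rack topology.

```python
import bisect
import hashlib

def token(text):
    """Hypothetical token function: a stable 64-bit hash of the text."""
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")

def replicas_for(key, nodes, replication_factor):
    """Walk the ring clockwise from the key's token, collecting distinct nodes."""
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

nodes = ["node-a", "node-b", "node-c", "node-d"]
print(replicas_for("user:42", nodes, replication_factor=3))
```

With four nodes and a replication factor of 3, every key lands on three distinct nodes, so losing any single node still leaves two copies.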

Q11 : What needs to be taken care of while adding a column?
A : While adding a column, make sure that:

  • The column name does not conflict with an existing column name
  • The table is not defined with the compact storage option

Q12 : List the benefits of using Cassandra.
A : Apache Cassandra delivers near real-time performance, which simplifies the work of developers, administrators, data analysts, and software engineers.

  • Instead of a master-slave architecture, Cassandra is built on a peer-to-peer architecture, so there is no single point of failure.
  • It is flexible: nodes can be added to any Cassandra cluster in any datacenter, and any client can send its request to any node.
  • Cassandra can be scaled up and down as requirements change. It offers high throughput for read and write operations and does not need to be restarted while scaling.
  • Cassandra replicates data across nodes, so data is stored in multiple locations and can be retrieved from another replica if one node fails. Users can configure how many replicas to keep.
  • It performs well on massive datasets, which makes it a common NoSQL choice for many organizations.
  • It uses a column-oriented structure, which speeds up and simplifies slicing. Data access and retrieval also become more efficient with a column-based data model.
  • Cassandra supports a schema-free/schema-optional data model, so you do not need to declare up front all the columns your application requires.

Q13 : How does Cassandra write?
A : Cassandra performs a write in two steps: it first appends the write to a commit log on disk and then applies it to an in-memory structure known as a memtable. Once both steps succeed, the write is acknowledged. Memtables are later flushed to disk as SSTables (sorted string tables). Because a write needs only a sequential append and a memory update, Cassandra offers fast write performance.
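
The write path described above can be sketched as a toy Python class. Everything here is an assumption made for illustration (the `Table` class, the `memtable_limit` threshold, clearing the log on flush); it models the order of operations, not Cassandra's actual code.

```python
class Table:
    """Toy write path: commit log first, then memtable, then flush to an SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only; stands in for the on-disk log
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable flushed snapshots, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record for durability
        self.memtable[key] = value            # 2. apply in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. Persist the memtable as a key-sorted, immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

table = Table(memtable_limit=2)
table.write("b", "2")
table.write("a", "1")   # reaches the limit and triggers a flush
print(table.sstables)   # [(('a', '1'), ('b', '2'))]
```

The flushed SSTable is sorted by key even though the writes arrived out of order, which is what makes later lookups and merges efficient.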

Q14 : When can you use ALTER KEYSPACE?
A : ALTER KEYSPACE can be used to change keyspace properties such as the replication settings (strategy and number of replicas) and durable_writes.

Q15 : What is Memtable in Cassandra?
A : 

  • Cassandra writes data to an in-memory structure known as a memtable
  • It is an in-memory cache with content stored as key/column pairs
  • Memtable data is sorted by key
  • There is a separate memtable for each ColumnFamily, and column data is retrieved from it by key

Q16 : What are Cassandra CQL collections?
A : CQL collections let you store multiple values in a single column. Cassandra provides the following collection types:

  • List: used when the order of the data needs to be maintained and a value may be stored multiple times (holds an ordered list of elements, duplicates allowed)
  • Set: used for a group of elements that are stored and returned in sorted order (holds unique elements)
  • Map: a data type used to store key-value pairs of elements

Q17 : Who developed Cassandra and in which language?
A : Avinash Lakshman and Prashant Malik developed Cassandra using Java. It later became an Apache project for further development.

Q18 : What is the main objective of creating Cassandra?
A : The main objective of Cassandra is to handle large amounts of data. It also aims to provide fault tolerance along with fast data transfer.

Q19 : Define SSTable.
A : SSTable stands for Sorted String Table. It is an immutable data file to which Cassandra periodically flushes memtables.

Q20 : Name the management tools in Cassandra.
A : These are the management tools used with Cassandra:

  • DataStax OpsCenter
  • SPM

Q21 : Give some advantages of Cassandra.
A : These are the advantages of Cassandra:
Since data can be replicated to several nodes, Cassandra is fault tolerant.
Cassandra can handle a large set of data.
Cassandra provides high scalability.

Q22 : What is the relationship between Apache Hadoop, HBase, Hive, and Cassandra?
A : Apache Hadoop: file storage and grid compute processing via MapReduce.
Apache Hive: SQL-like interface on top of Hadoop.
Apache HBase: column family storage built like BigTable.
Apache Cassandra: column family storage built like BigTable, with Dynamo-style topology and consistency.

Q23 : What happens to existing data in my cluster when I add new nodes?
A : When a new node joins a cluster, it automatically contacts the other nodes in the cluster and copies to itself the data it is now responsible for.

Q24 : Define composite key.
A : A composite key is made up of more than one component, such as a row key together with a column name. Composite keys are used to define a column family whose key is a concatenation of values of different types.

Q25 : Define consistency.
A : Consistency describes how up to date and synchronized a row of Cassandra data is across all of its replicas.
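
One common way to reason about consistency in Cassandra is quorum arithmetic: if the number of replicas that acknowledge a write plus the number consulted on a read exceeds the replication factor, every read overlaps at least one replica holding the latest write. The answer above does not mention consistency levels, so treat the following Python sketch and its helper names as illustrative assumptions.

```python
def quorum(replication_factor):
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1

def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when the read and write replica sets are guaranteed to overlap."""
    return read_replicas + write_replicas > replication_factor

rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))  # True
print(reads_see_latest_write(1, 1, rf))                    # False
```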