ETL Testing Interview Questions and Answers

Q1 : Where are these ETL systems exactly used?
A : ETL systems are used by businesses to integrate data from multiple sources. These software systems are key components in ensuring that your company is processing its data efficiently, allowing your business to run smoothly and without interruption.

Q2 : What process is exactly involved in ETL testing?
A : The ETL process allows a business/enterprise to collect important data from different source systems, validate/change it to fit its goals and models, and then store it in a data warehouse for analytics, forecasts and other kinds of daily reports. In a world of digital enterprise, it is a critical part of running an effective and efficient business.

Q3 : Why ETL testing is required?
A : To verify that the data being transferred from one system to the other follows the pattern/manner described by the business (requirements).

Q4 : What are the various tools used in ETL?
A :

  • Cognos Decision Stream
  • Oracle Warehouse Builder
  • Business Objects XI
  • SAS Business Warehouse
  • SAS Enterprise ETL Server

Q5 : Why is this ETL process used?
A : Data has become a critical part of all kinds of businesses and operations. Because data is so important to a successful business, poor performance or inaccurate procedure can cost time and money. Therefore, ETL testing is designed to ensure that the data processing is done in the expected way for the business/enterprise to get the benefit out of it.

Q6 : Why ETL testing is required?
A :

  • To keep a check on the Data which are being transferred from one system to the other.
  • To keep a track on the efficiency and speed of the process.
  • To be well acquainted with the ETL process before it gets implemented into your business and production.

Q7 : What is ETL Testing?
A : ETL stands for Extract-Transform-Load and it is a process of how data is loaded from the source system to the data warehouse. Data is extracted from an OLTP database, transformed to match the data warehouse schema and loaded into the data warehouse database.
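The extract, transform and load steps described above can be sketched as three small functions. This is a minimal illustration, not a real implementation; the table, column names and sample values are assumptions.

```python
# Minimal ETL sketch: extract rows from an OLTP-style source,
# transform them to match a warehouse schema, then load them.
# The column names and sample data here are illustrative assumptions.

def extract():
    # In practice this would query the OLTP database.
    return [
        {"order_id": 1, "amount": "19.99", "country": "us"},
        {"order_id": 2, "amount": "5.00",  "country": "de"},
    ]

def transform(rows):
    # Match the warehouse schema: typed amount, normalized country code.
    return [
        {"order_id": r["order_id"],
         "amount": float(r["amount"]),
         "country": r["country"].upper()}
        for r in rows
    ]

def load(rows, warehouse):
    # In practice this would insert into the warehouse database.
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse[0]["country"])  # normalized to "US"
```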

Q8 : What is ETL testing in comparison with database testing?
A : ETL testing:

  • Verifies whether data is moved as expected
  • Verifies whether counts in the source and target are matching
  • Verifies whether data is transformed as expected
  • Verifies that the foreign key–primary key relations are preserved during the ETL
  • Verifies for duplication in loaded data
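These checks can be sketched over small in-memory row lists. This is an illustrative assumption for brevity; real ETL tests would run SQL queries against the source and target databases.

```python
# Sketch of common ETL verification checks over in-memory rows.
# The sample data and the upper-casing transformation are assumptions.
source = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
target = [{"id": 1, "name": "ANN"}, {"id": 2, "name": "BOB"}]

# Count check: source and target row counts match.
assert len(source) == len(target)

# Transformation check: names were upper-cased as expected.
for s, t in zip(source, target):
    assert t["name"] == s["name"].upper()

# Duplicate check: primary keys in the target are unique.
ids = [r["id"] for r in target]
assert len(ids) == len(set(ids))
print("all checks passed")
```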

Q9 : What are ETL tester responsibilities?
A :
  • Requires in-depth knowledge of the ETL tools and processes.
  • Needs to write the SQL queries for the various given scenarios during the testing phase.
  • Should be able to carry out different types of tests, such as primary key and default checks, and keep a check on the other functionality of the ETL process.
  • Quality Check

Q10 : Define the ETL processing?
A : ETL Testing Process:
Although there are many ETL tools, a simple testing process is commonly used in ETL testing. It is as important as the implementation of the ETL tool into your business. Having a well-defined ETL testing strategy can make the testing process much easier, so this process needs to be followed before you start the data integration process with the selected ETL tool. In this ETL testing process, a group of experts comprising the programming and development team will start writing SQL statements; the development team may customize them according to the requirements.
ETL testing process is:
Analyzing the requirement – Understanding the business structure and their particular requirement.
Validation and Test Estimation – An estimation of time and expertise required to carry on with the procedure.
Test Planning and Designing the testing environment – Based on the inputs from the estimation, an ETL environment is planned and worked out.
Test Data preparation and Execution – Data for the test is prepared and executed as per the requirement.

Q11 : What are Cubes and OLAP Cubes?
A : Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multi-dimensional analysis.
OLAP stands for Online Analytical Processing, and an OLAP cube stores large data in multi-dimensional form for reporting purposes. It consists of facts, called measures, categorized by dimensions.

Q12 : What exactly do you mean by the Transformation? What are the types of the same?
A : A transformation is basically regarded as a repository object that can generate data, and can modify and pass it on in a reliable manner. The two commonly used types of transformations are Active and Passive.

Q13 : What is partitioning in ETL?
A : Dividing the transactions into smaller subsets for better performance is known as partitioning. It ensures that the server can access the sources directly through multiple connections.

Q14 : What is Factless fact table in ETL?
A : A factless fact table is a fact table without measures. It can be used to manage a number of events directly, for example recording events related to employees or management, and this task can be accomplished in a very reliable manner.

Q15 : Explain the use of Lookup Transformation?
A : The Lookup Transformation is useful for

  • Getting a related value from a table using a column value
  • Updating a slowly changing dimension table
  • Verifying whether records already exist in the table
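The uses above can be sketched with a simple dictionary-based lookup against an in-memory reference table. The table and column names are illustrative assumptions; in an ETL tool, the lookup source would be a database table or flat file.

```python
# Sketch of a lookup transformation: enrich incoming rows with a
# related value from a reference table, keyed on a column value.
customers = {101: "Gold", 102: "Silver"}  # lookup table: id -> tier

rows = [{"order_id": 1, "customer_id": 101},
        {"order_id": 2, "customer_id": 102},
        {"order_id": 3, "customer_id": 999}]  # no matching record

for row in rows:
    # Doubles as an existence check: a None tier means the
    # customer is missing from the reference table.
    row["tier"] = customers.get(row["customer_id"])

print(rows[2]["tier"])  # None: record does not exist in the lookup table
```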

Q16 : What are the types of partitioning? 
A : The types of partitions are
Round-Robin Partitioning:

  • Data is distributed evenly among all partitions by the Informatica server
  • This partitioning is applicable when the number of rows to process in each partition is approximately the same

Hash Partitioning:

  • The Informatica server applies a hash function to the partitioning keys to group data among partitions
  • It is used when you need to ensure that groups of rows with the same partitioning key are processed in the same partition
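The difference between the two schemes can be sketched over an in-memory row list. This is a toy illustration with assumed data, not Informatica's actual implementation.

```python
# Illustrative sketch of round-robin vs. hash partitioning
# over an in-memory row list, with 3 partitions.
rows = [{"key": k} for k in ["a", "b", "a", "c", "b", "a"]]
n = 3

# Round-robin: rows are dealt out evenly, regardless of content.
round_robin = [[] for _ in range(n)]
for i, row in enumerate(rows):
    round_robin[i % n].append(row)

# Hash: a hash of the partitioning key picks the partition, so all
# rows sharing the same key always land in the same partition.
hashed = [[] for _ in range(n)]
for row in rows:
    hashed[hash(row["key"]) % n].append(row)

print([len(p) for p in round_robin])  # [2, 2, 2]
```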

Q17 : What is the advantage of using DataReader Destination Adapter?
A : The advantage of using the DataReader Destination Adapter is that it populates an ADO recordset (consisting of records and columns) in memory and exposes the data from the DataFlow task by implementing the DataReader interface, so that other applications can consume the data.

Q18 : In what case do you use dynamic cache and static cache in connected and unconnected transformations?
A :

  • A dynamic cache is used when you have to update a master table and slowly changing dimensions (SCD) type 1
  • A static cache is used for flat files

Q19 : Can you define measures in a simple statement?
A : Measures are the numeric data, generally based on columns, that are present in a fact table by default.

Q20 : What do you understand by Data Purging?
A : There are situations when data needs to be deleted from the data warehouse, and deleting data in bulk is a daunting task. Purging is an approach that can delete multiple files/records at the same time, enabling users to maintain speed and efficiency while freeing a lot of extra space.
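One common way to purge bulk data without one huge transaction is to delete in batches. Below is a hedged sketch using an in-memory SQLite database; the table, column names and cutoff year are illustrative assumptions.

```python
import sqlite3

# Sketch of data purging: delete old rows in small batches so each
# transaction stays manageable. Names and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany("INSERT INTO sales (year) VALUES (?)",
                 [(2015,)] * 5 + [(2024,)] * 3)

BATCH = 2
while True:
    # Delete up to BATCH old rows per transaction until none remain.
    cur = conn.execute(
        "DELETE FROM sales WHERE id IN "
        "(SELECT id FROM sales WHERE year < 2020 LIMIT ?)", (BATCH,))
    conn.commit()
    if cur.rowcount == 0:
        break

remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(remaining)  # only the 3 recent rows remain
```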

Q21 : In case you have non-OLEDB (Object Linking and Embedding Database) source for the lookup what would you do?
A : If you have a non-OLEDB source for the lookup, you have to use a Cache to load the data and use it as the source.

Q22 : What is data source view?
A : A data source view allows you to define the relational schema that will be used in the Analysis Services databases. Dimensions and cubes are created from data source views rather than directly from data source objects.

Q23 : What is Grain of Fact?
A : The grain of a fact is the level at which the fact information is stored. It is also known as fact granularity.

Q24 : What is fact? What are the types of facts?
A : It is a central component of a multi-dimensional model which contains the measures to be analyzed. Facts are related to dimensions.
Types of facts are

  • Additive Facts
  • Semi-additive Facts
  • Non-additive Facts
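The three fact types can be illustrated with a tiny fact table. This is a sketch with assumed sample data: sales amounts are additive (summable across all dimensions), account balances are semi-additive (summable across accounts but not across time), and a ratio such as a unit price would be non-additive.

```python
# Sketch: additive vs. semi-additive measures in a small fact table.
# The rows and values below are illustrative assumptions.
facts = [
    {"day": 1, "account": "A", "balance": 100, "sales": 10},
    {"day": 1, "account": "B", "balance": 50,  "sales": 5},
    {"day": 2, "account": "A", "balance": 120, "sales": 7},
    {"day": 2, "account": "B", "balance": 60,  "sales": 8},
]

# Additive: total sales across both days and accounts is meaningful.
total_sales = sum(f["sales"] for f in facts)  # 30

# Semi-additive: balances add across accounts for a single day...
day2_balance = sum(f["balance"] for f in facts if f["day"] == 2)  # 180
# ...but summing balances across days would double-count and is meaningless.
print(total_sales, day2_balance)
```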

Q25 : What is ETL?
A : ETL is an important component of data warehousing architecture; it manages the data for any business process. ETL stands for Extract, Transform and Load. Extract reads data from a database, Transform converts the data into a format appropriate for reporting and analysis, and Load writes the data into the target database.