Data Stage Interview Questions and Answers

Q1 : What is the difference between Datastage 7.5 and 7.0?
A : In DataStage 7.5 many new stages were added for greater robustness and smoother performance, such as the Procedure stage, the Command stage, and report generation.

Q2 : How is DataStage different from Informatica?

A : DataStage and Informatica are both powerful ETL tools, but there are a few differences between the two. DataStage supports parallelism and partitioning through its node configuration, whereas Informatica does not offer the same node-based parallelism. DataStage is also simpler to use as compared to Informatica.

Q3 : What are the types of jobs available in Datastage?
A :

  1. Server Job
  2. Parallel Job
  3. Sequencer Job
  4. Container Job

Q4 : What is a Routine in Data Stage?

A : A Routine is a collection of functions defined within the DataStage Manager. There are basically three types of Routines in DataStage, namely Job Control Routines, Before/After Subroutines, and Transform Functions.

Q5 : What are the primary usages of Datastage tool?
A : Datastage is an ETL tool which is primarily used for extracting data from source systems, transforming that data and finally loading it to target systems.

Q6 : How is a DataStage source file filled?

A : We can fill a source file in DataStage either by developing an SQL query or by using a row generator extract tool.

Q7 : What are the different common services in Datastage?
A : Below is the list of common services in Datastage

  • Metadata services
  • Unified service deployment
  • Security services
  • Logging and reporting services

Q8 : In Datastage, how you can fix the truncated data error?
A : The truncated data error can be fixed by using the environment variable IMPORT_REJECT_STRING_FIELD_OVERRUN.

Q9 : What could be a data source system?
A : It could be a database table, a flat file, or even an external application like PeopleSoft.

Q10 : What are the different kinds of views available in a Datastage director?
A : There are 3 kinds of views available in Datastage director

  • Log view
  • Status view
  • Job view

Q11 : What are the main features of Datastage?
A : The main features of Datastage are highlighted below

  • It is the data integration component of IBM Infosphere information server.
  • It is a GUI-based tool. We just need to drag and drop the DataStage objects to design a job, which is then compiled into executable DataStage code.
  • It is used to perform the ETL operations (Extract, transform, load)
  • It provides connectivity to multiple sources & multiple targets at the same time
  • Provides partitioning and parallel processing techniques which enable the Datastage jobs to process a huge volume of data quite faster.
  • It has enterprise-level connectivity.

Q12 : What are DataStage sequences?
A : A DataStage sequence connects DataStage jobs in a logical flow.

Q13 : What are the various kinds of containers available in Datastage?
A : We have below 2 containers in Datastage

  • Local container
  • Shared container

Q14 : How is merging done in DataStage?
A : Merging combines two or more tables based on their primary key column. In the Merge stage, a sorted master input is combined with one or more sorted update inputs on the values of the key column.
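
As an illustrative sketch only (plain Python, not DataStage), the idea of combining a master input with an update input on a key column can be shown as follows; all names here are hypothetical:

```python
# Plain-Python sketch of key-based merging (hypothetical names, not DataStage):
# master rows are combined with matching update rows on a key column.
def merge_on_key(master, updates, key):
    updates_by_key = {row[key]: row for row in updates}
    merged = []
    for row in master:
        extra = updates_by_key.get(row[key], {})
        # non-key columns from the update row are appended to the master row
        merged.append({**row, **{k: v for k, v in extra.items() if k != key}})
    return merged

master = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
updates = [{"id": 1, "city": "Oslo"}]
print(merge_on_key(master, updates, "id"))
```

As in the Merge stage, master rows without a matching update row pass through unchanged.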

Q15 : What is the use of Datastage director?
A : Through the DataStage Director, we can schedule, validate, execute and monitor jobs.

Q16 : Define Data Stage?
A : DataStage is a tool used to design, develop and execute applications that fill multiple tables in a data warehouse or data mart. It is a program for Windows servers that extracts data from databases and loads it into data warehouses. It has become an essential part of the IBM WebSphere Data Integration suite.

Q17 : What is the method of removing duplicates, without the remove duplicate stage?
A : Duplicates can be removed by using the Sort stage with the Allow Duplicates option set to False.
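
The same idea can be sketched in plain Python (illustrative only, not DataStage): sort the rows on the key, then keep only the first row for each key value, which is the effect of Allow Duplicates = False:

```python
# Sketch of sort-based de-duplication: sort on the key, then keep the first
# row per key value (the effect of Allow Duplicates = False in the Sort stage).
def sort_dedupe(rows, key):
    seen, out = set(), []
    for row in sorted(rows, key=lambda r: r[key]):
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

rows = [{"id": 2, "v": "b"}, {"id": 1, "v": "a"}, {"id": 2, "v": "c"}]
print(sort_dedupe(rows, "id"))
```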

Q18 : How to manage date conversion in Datastage?
A : We can use the date conversion functions for this purpose, i.e. Oconv(Iconv(FieldName, "Existing Date Format"), "Another Date Format").
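
For comparison, the same parse-then-reformat pattern looks like this in plain Python (illustrative analogy only; the format strings are hypothetical examples):

```python
from datetime import datetime

# Analogy to Oconv(Iconv(...)): Iconv parses the string into an internal
# date value, and Oconv formats that value back out in the new format.
def convert_date(value, existing_fmt, new_fmt):
    return datetime.strptime(value, existing_fmt).strftime(new_fmt)

print(convert_date("25/12/2023", "%d/%m/%Y", "%Y-%m-%d"))  # 2023-12-25
```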

Q19 : Define Repository tables in Datastage?
A : In DataStage, the Repository stores the metadata for a project, such as job designs, table definitions and routines. It can be centralized as well as distributed.
Q20 : Explain the DataStage parallel Extender or Enterprise Edition (EE)?
A : Parallel extender in DataStage is the data extraction and transformation application for parallel processing.
Two types of parallel processing are available:

  1. Pipeline Parallelism
  2. Partition Parallelism
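
Partition parallelism can be sketched in plain Python (illustrative only; DataStage would run each partition on a separate processing node, and all names here are hypothetical): rows are split across partitions by a key, and each partition is then processed independently:

```python
# Sketch of partition parallelism: split rows across partitions by a key,
# then process each partition independently (here, a per-partition sum).
from concurrent.futures import ThreadPoolExecutor

def partition(rows, key, n):
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)   # modulo-style key partitioning
    return parts

def process(part):
    return sum(row["amount"] for row in part)

rows = [{"id": i, "amount": i * 10} for i in range(8)]
parts = partition(rows, "id", 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, parts))
print(results)  # one result per partition
```

Pipeline parallelism, by contrast, overlaps the extract, transform and load steps so that downstream stages start consuming rows before upstream stages have finished producing them.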

Q21 : Define APT_CONFIG in Datastage?
A : It is the environment variable used to identify the configuration (*.apt) file in DataStage, which stores the node information, disk storage information and scratch disk information.

Q22 : What is a conductor node in DataStage?
A : Every parallel job run has a conductor process, where execution starts; a section leader process for each processing node; and a player process for each set of combined operators, plus an individual player process for each uncombined operator.
To stop a job, the player processes are terminated first, then the section leader processes, and finally the conductor process.

Q23 : Name the different types of Lookups in Datastage?
A : There are two types of Lookups in DataStage, i.e. Normal lookup and Sparse lookup. In a Normal lookup, the reference data is loaded into memory first and the lookup is performed there. In a Sparse lookup, the lookup is sent directly to the database, one query per input row. A Sparse lookup is faster only when the number of input rows is much smaller than the size of the reference data; otherwise a Normal lookup usually performs better.
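
The difference can be sketched in plain Python (illustrative only; query_db stands in for a real database round trip, and all names are hypothetical):

```python
# Contrast between the two lookup styles. `query_db` stands in for a real
# database call (one round trip per invocation).
reference_table = {101: "Gold", 102: "Silver", 103: "Bronze"}

def query_db(key):
    return reference_table.get(key)

def normal_lookup(input_keys):
    cache = dict(reference_table)        # load reference data into memory once
    return [cache.get(k) for k in input_keys]

def sparse_lookup(input_keys):
    return [query_db(k) for k in input_keys]   # hit the database per input row

print(normal_lookup([101, 103]))  # ['Gold', 'Bronze']
```

With few input rows, the per-row queries of the sparse style cost less than loading the whole reference table; with many input rows, the in-memory cache of the normal style wins.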

Q24 : Explain Usage Analysis in Datastage?
A : In Datastage, Usage Analysis is performed within a few clicks. Launch Datastage Manager and right-click the job. Then, select Usage Analysis and that’s it.

Q25 : How do you run a DataStage job from the command line?
A : Using the "dsjob" command as follows.

dsjob -run -jobstatus projectname jobname

The -jobstatus option makes the command wait for the job to complete and return the job's status through its exit code.