Q1 : What is the difference between Datastage 7.5 and 7.0?
A : Datastage 7.5 adds several new stages for greater robustness and smoother performance, such as the Procedure Stage, Command Stage and Generate Report.
Q2 : How is DataStage different from Informatica?
A : Both are ETL tools. DataStage is IBM's data integration tool and natively supports both pipeline and partition parallelism, which suits very large data volumes; Informatica PowerCenter is a competing ETL suite with its own repository and client tools. In practice, the differences come down to parallel-processing support, platform integration and licensing.
Q3 : What are the types of jobs available in Datastage?
A :
- Server Job
- Parallel Job
- Sequencer Job
- Container Job
Q4 : What is a Routine in Data Stage?
A : A Routine in DataStage is a reusable piece of code stored in the repository that can be called from stages such as the Transformer. Server routines are written in DataStage BASIC, while parallel routines are external functions written in C/C++.
Q5 : What are the primary usages of Datastage tool?
A : Datastage is an ETL tool which is primarily used for extracting data from source systems, transforming that data and finally loading it to target systems.
Q6 : How is a DataStage source file filled?
Q7 : What are the different common services in Datastage?
A : Below is the list of common services in Datastage
- Metadata services
- Unified service deployment
- Security services
- Logging and reporting services
Q8 : In Datastage, how you can fix the truncated data error?
A : The truncated data error can be fixed by setting the environment variable 'IMPORT_REJECT_STRING_FIELD_OVERRUN'.
Q9 : What could be a data source system?
A : It could be a database table, a flat file, or even an external application like PeopleSoft.
Q10 : What are the different kinds of views available in a Datastage director?
A : There are three kinds of views available in the Datastage Director:
- Log view
- Status view
- Job view
Q11 : What are the main features of Datastage?
A : The main features of Datastage are highlighted below
- It is the data integration component of IBM Infosphere information server.
- It is a GUI-based tool. We just need to drag and drop the Datastage objects, and they are converted into Datastage code.
- It is used to perform the ETL operations (Extract, transform, load)
- It provides connectivity to multiple sources & multiple targets at the same time
- It provides partitioning and parallel-processing techniques that enable Datastage jobs to process huge volumes of data much faster.
- It has enterprise-level connectivity.
Q12 : What are DataStage sequences?
A : A Datastage sequence connects DataStage jobs in a logical flow so they run in a defined order.
Q13 : What are the various kinds of containers available in Datastage?
A : We have below 2 containers in Datastage
- Local container
- Shared container
Q14 : What can we do in the Datastage Director?
A : Through the Datastage Director, we can schedule a job, validate the job, execute the job and monitor the job.
Q15 : What is DataStage?
A : DataStage is a tool used to design, develop and execute applications that fill tables in a data warehouse or data mart. It is a program for Windows servers that extracts data from databases and loads it into data warehouses, and it has become an essential part of the IBM WebSphere Data Integration suite.
Q16 : How can duplicates be removed in Datastage?
A : Duplicates can be removed by using the Sort stage with the option Allow Duplicates set to false.
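The sort-then-deduplicate behaviour of the Sort stage can be illustrated outside Datastage. This is a plain-Python analogy (not Datastage code); the function and field names are made up for the example:

```python
def sort_dedupe(rows, key):
    """Sort rows on `key`, then keep only the first row for each key value,
    mimicking a Sort stage with Allow Duplicates = False."""
    out = []
    last_key = object()  # sentinel that never equals a real key
    for row in sorted(rows, key=lambda r: r[key]):
        if row[key] != last_key:
            out.append(row)
            last_key = row[key]
    return out

rows = [
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b2"},  # duplicate key, dropped after sorting
]
print(sort_dedupe(rows, "id"))
```

Because Python's sort is stable, the first occurrence of each duplicate key in sorted order is the one retained.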
Q17 : How do you convert one date format to another in Datastage?
A : We can use the date conversion function for this purpose, i.e. Oconv(Iconv(Fieldname, "Existing Date Format"), "Another Date Format").
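Oconv(Iconv(...)) first parses the value from its existing format (Iconv) and then re-emits it in the target format (Oconv). The same parse-then-format pattern, sketched in Python as an analogy (not DataStage BASIC):

```python
from datetime import datetime

def convert_date(value, existing_fmt, target_fmt):
    """Parse `value` with its existing format, then re-format it,
    mirroring Oconv(Iconv(field, existing), target)."""
    return datetime.strptime(value, existing_fmt).strftime(target_fmt)

print(convert_date("2023-12-31", "%Y-%m-%d", "%d/%m/%Y"))  # 31/12/2023
```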
A : In Datastage, the Repository is another name for a data warehouse. It can be centralized as well as distributed.
Q19 : What is the Parallel Extender in Datastage?
A : The Parallel Extender in DataStage is the data extraction and transformation application for parallel processing.
There are two types of parallel processing available:
- Pipeline Parallelism
- Partition Parallelism
Q21 : Define APT_CONFIG in Datastage?
A : It is the environment variable that is used to identify the *.apt file in Datastage. It is also used to store the node information, disk storage information and scratch information.
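A *.apt configuration file describes the processing nodes and their disk and scratch resources. A minimal single-node sketch is shown below; the hostname and paths are placeholders, not defaults:

```
{
	node "node1"
	{
		fastname "server1"
		pools ""
		resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
		resource scratchdisk "/tmp" {pools ""}
	}
}
```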
Q22 : What is a conductor node in DataStage?
A : Every parallel job run has a conductor process, where execution starts, plus a section leader process for each processing node, a player process for each set of combined operators, and an individual player process for each uncombined operator.
Whenever we want to kill a job, we should destroy the player processes first, then the section leader processes, and finally the conductor process.
Q23 : Name the different types of Lookups in Datastage?
A : There are two types of Lookups in Datastage, i.e. Normal lookup and Sparse lookup. In a Normal lookup, the reference data is loaded into memory first and then the lookup is performed. In a Sparse lookup, a query is fired directly against the database for each input row. Therefore, a Sparse lookup is faster than a Normal lookup when the input rows are few compared to a very large reference table.
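The trade-off between the two lookup types can be sketched in plain Python (an analogy only; the dict stands in for the reference table and the lambdas for database access):

```python
reference_db = {"K1": "apple", "K2": "banana"}  # stand-in for the reference table

def normal_lookup(stream_keys, fetch_all):
    """Normal lookup: load the whole reference table into memory once,
    then probe the in-memory copy for every input row."""
    in_memory = dict(fetch_all())
    return [in_memory.get(k) for k in stream_keys]

def sparse_lookup(stream_keys, fetch_one):
    """Sparse lookup: fire one query against the database per input row."""
    return [fetch_one(k) for k in stream_keys]

keys = ["K2", "K1", "K3"]  # K3 has no match
print(normal_lookup(keys, lambda: reference_db.items()))
print(sparse_lookup(keys, lambda k: reference_db.get(k)))
```

Both return the same values; the difference is one bulk fetch up front versus one fetch per row.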
Q24 : Explain Usage Analysis in Datastage?
A : In Datastage, Usage Analysis can be performed in a few clicks: launch the Datastage Manager, right-click the job, and select Usage Analysis.
Q25 : How do you run a Datastage job from the command line?
A : Using the "dsjob" command as follows:
dsjob -run -jobstatus projectname jobname