Talend for Hadoop Interview Questions and Answers

Q1 : How do you schedule a Job in Talend?
A : To schedule a Job in Talend, first export the Job as a standalone program. You can then schedule it with your operating system’s native scheduling tools (Windows Task Scheduler, cron on Linux, etc.).

Q2 : Which Talend component is used for data transform using built-in .NET classes?
A : tDotNETRow transforms data by using custom or built-in .NET classes.

Q3 : What is the use of Expression Editor in Talend?
A : The Expression Editor lets you view and edit all expressions, such as Input, Var, or Output expressions and constraint statements. It provides a dedicated view for writing any function or transformation. You can write the expressions needed for a data transformation directly in the Expression Editor, or open the Expression Builder dialog box and write them there.

Q4 : Can you define schema at runtime in Talend?
A : Schemas can’t be defined at runtime. Because schemas define the movement of data, they must be defined while configuring the components.

Q5 : Can you edit generated code directly?
A : This is not possible; you cannot directly edit the code generated for a Talend Job.

Q6 : What Is Talend?
A : Talend is an open source software integration platform/vendor.

  • It offers data integration and data management solutions.
  • The company provides integration software and services for big data, cloud storage, data integration, data management, master data management, data quality, data preparation, and enterprise applications.
  • Talend’s first product, Talend Open Studio for Data Integration, is the one most commonly referred to simply as Talend.

Q7 : What is the full name of Talend?
A : Talend Open Studio

Q8 : When was Talend Open Studio launched?
A : It was launched in October 2006.

Q9 : What is Talend Open Studio?
A : Talend Open Studio for Data Integration is an open source data integration product developed by Talend and designed to combine, convert and update data in various locations across a business.

Q10 : What is a project in Talend?
A : ‘Project’ is the highest physical structure which bundles up and stores all types of Business Models, Jobs, metadata, routines, context variables or any other technical resources.

Q11 : Explain the various types of connections available in Talend.
A : Connections in Talend define the data to be processed, the data output, or the logical sequence of a Job. Talend provides the following types of connections:

  1. Row: The Row connection deals with the actual data flow. Following are the types of Row connections supported by Talend:
    • Main
    • Lookup
    • Filter
    • Rejects
    • ErrorRejects
    • Output
    • Uniques/Duplicates
    • Multiple Input/Output
  2. Iterate: The Iterate connection is used to perform a loop on files contained in a directory, on rows contained in a file or on the database entries.
  3. Trigger: The Trigger connection is used to create a dependency between Jobs or Subjobs which are triggered one after the other according to the trigger’s nature. Trigger connections fall into two categories:
    1. Subjob Triggers
      • OnSubjobOK
      • OnSubjobError
      • Run if
    2. Component Triggers
      • OnComponentOK
      • OnComponentError
      • Run if
  4. Link: The Link connection is used to transfer the table schema information to the ELT mapper component.

Q12 : What is the use of tLoqateAddressRow component in Talend?
A : This component corrects mailing addresses associated with customer data to ensure a single customer view and better delivery of customer mailings.

Q13 : What is the most current version of Talend Open Studio?
A : Talend Open Studio 5.6.0

Q14 : What is the default pattern of a Date column in Talend?
A : By default, the date pattern for a column of type Date in a schema is “dd-MM-yyyy”.
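
To make the pattern concrete, here is a minimal standalone Java sketch that formats and parses a date with “dd-MM-yyyy” using java.text.SimpleDateFormat. The class name DatePatternDemo and the sample dates are assumptions made for illustration; inside a Job you would normally rely on Talend’s own date handling rather than code like this.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

class DatePatternDemo {
    // The default pattern quoted above for Date columns.
    static final String DEFAULT_PATTERN = "dd-MM-yyyy";

    // Renders a date as day-month-year, e.g. 7 March 2015 becomes 07-03-2015.
    static String format(Date date) {
        return new SimpleDateFormat(DEFAULT_PATTERN).format(date);
    }

    // Parses text that follows the default pattern; impossible dates are rejected.
    static Date parse(String text) {
        SimpleDateFormat fmt = new SimpleDateFormat(DEFAULT_PATTERN);
        fmt.setLenient(false);
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a dd-MM-yyyy date: " + text, e);
        }
    }

    public static void main(String[] args) {
        Date sample = new GregorianCalendar(2015, Calendar.MARCH, 7).getTime();
        System.out.println(format(sample));
        System.out.println(format(parse("25-12-2014")));
    }
}
```

Note that “MM” (month) is upper case while “dd” and “yyyy” are lower case; “mm” would mean minutes.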

Q15 : What is tMap?
A : tMap is an advanced component, which integrates itself as a plugin to Talend Studio. tMap transforms and routes data from single or multiple sources to single or multiple destinations. It allows you to define the tMap routing and transformation properties.

Q16 : What do you understand by MDM in Talend?
A : Master data management, through which an organization builds and manages a single, consistent, accurate view of key enterprise data, has demonstrated substantial business value including improvements to operational efficiency, marketing effectiveness, strategic planning, and regulatory compliance. To date, however, MDM has been the privilege of a relatively small number of large, resource-rich organizations. Thwarted by the prohibitive costs of proprietary MDM software and the great difficulty of building and maintaining an in-house MDM solution, most organizations have had to forego MDM despite its clear value.

Q17 : What’s new in v5.6?
A : This technical note highlights the important new features and capabilities of version 5.6 of Talend’s comprehensive suite of Platform, Enterprise, and Open Studio solutions.

Q18 : Describe a Job Design in Talend.
A : A Job is a basic executable unit of anything that is built using Talend. It is technically a single Java class which defines the working and scope of information available with the help of graphical representation. It implements the data fl

Q19 : Can you define a variable that is accessible from multiple Jobs?
A : Yes, you can declare a static variable in a routine, and add the setter/getter methods for this variable in the routine. The variable is then accessible from different Jobs.
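
A minimal sketch of such a routine is given here. The class name SharedState and the variable sharedValue are hypothetical; in Talend Studio the class would be created as a user routine rather than as a standalone file.

```java
// Hypothetical routine holding a value that several Jobs can read and write.
class SharedState {
    // Static, so there is one copy per JVM rather than one per Job instance.
    private static String sharedValue;

    public static synchronized void setSharedValue(String value) {
        sharedValue = value;
    }

    public static synchronized String getSharedValue() {
        return sharedValue;
    }

    public static void main(String[] args) {
        // One Job would call the setter ...
        SharedState.setSharedValue("batch-42");
        // ... and another Job running in the same JVM would call the getter.
        System.out.println(SharedState.getSharedValue());
    }
}
```

Because the variable is static, it is shared only between Jobs that run in the same JVM, for example a parent Job and its child Jobs. Jobs launched as separate processes each get their own copy.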

Q20 : What is the difference between Built-In and Repository?
A : Built-in: all information is stored locally in the Job. You can enter and edit all information manually.
Repository: all information is stored in the repository. You can import read-only information into the Job from the repository. If you want to modify the information, you must take one of the following actions:

  1. Convert the information from Repository to Built-in and then edit the built-in information.
  2. Modify the information in the Repository. Once you have made the changes, you are prompted to update the changes into the Job.

Q21 : How can you normalize delimited data in Talend Open Studio?
A : By using the tNormalize component
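
To show what normalizing delimited data means, the following plain Java sketch turns one row whose second field holds a delimited list into one row per item. It illustrates the effect only and is not the component’s actual implementation; the class name NormalizeDemo and the sample data are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class NormalizeDemo {
    // Each input row is {key, delimitedValues}. The result has one row per
    // item in delimitedValues, with the key repeated on every row.
    static List<String[]> normalize(List<String[]> rows, String separator) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            // Pattern.quote treats the separator literally, not as a regex.
            for (String item : row[1].split(Pattern.quote(separator))) {
                out.add(new String[] {row[0], item.trim()});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> input = new ArrayList<>();
        input.add(new String[] {"1", "red, green, blue"});
        for (String[] row : normalize(input, ",")) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```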

Q22 : What is the difference between OnSubjobOK and OnComponentOK?
A : OnSubjobOK and OnComponentOK are trigger links, which can link to another subjob.
The main difference between OnSubjobOK and OnComponentOK lies in the execution order of the linked subjob. With OnSubjobOK, the linked subjob starts only when the previous subjob completely finishes. With OnComponentOK, the linked subjob starts when the previous component finishes.
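
The difference in ordering can be sketched in plain Java. This is a simplified simulation, not Talend-generated code: it assumes a subjob A with two components, a subjob X linked to the first component with OnComponentOK, and a subjob Y linked to subjob A with OnSubjobOK. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TriggerOrderDemo {
    // Simulates subjob A and records the order in which events occur.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Runnable onComponentOk = () -> log.add("subjob X starts (OnComponentOK)");
        Runnable onSubjobOk = () -> log.add("subjob Y starts (OnSubjobOK)");

        log.add("component 1 finished");
        onComponentOk.run();   // fires as soon as component 1 is done
        log.add("component 2 finished");
        log.add("subjob A finished");
        onSubjobOk.run();      // fires only after the whole subjob is done
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this sketch subjob X starts before subjob A has finished, while subjob Y starts only afterwards.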

Q23 : What are Context Variables and why are they used in Talend?
A : Context variables are user-defined parameters that are passed into a Job at runtime. Their values may change as the Job is promoted from the Development environment to the Test and Production environments. Context variables can be defined in three ways:

  1. Embedded Context Variables
  2. Repository Context Variables
  3. External Context Variables
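
The idea of values that change per environment can be sketched in plain Java as a set of defaults plus runtime overrides. This is an illustration, not Talend’s generated context code: the class ContextDemo, the variable names db_host and db_port, and the host name are assumptions, and the sketch assumes overrides arrive as “--context_param name=value” argument pairs.

```java
import java.util.Properties;

class ContextDemo {
    // Starts from the default values, then applies any overrides passed as
    // "--context_param name=value" pairs in the argument list.
    static Properties resolve(Properties defaults, String[] args) {
        Properties context = new Properties();
        context.putAll(defaults);
        for (int i = 0; i + 1 < args.length; i++) {
            if ("--context_param".equals(args[i])) {
                String pair = args[i + 1];
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    context.setProperty(pair.substring(0, eq), pair.substring(eq + 1));
                }
                i++; // skip the name=value argument just consumed
            }
        }
        return context;
    }

    public static void main(String[] args) {
        Properties development = new Properties();
        development.setProperty("db_host", "localhost");
        development.setProperty("db_port", "3306");

        // Promoting the Job to production changes only the overridden value.
        Properties production = resolve(development,
                new String[] {"--context_param", "db_host=prod-db.example.com"});
        System.out.println(production.getProperty("db_host") + ":"
                + production.getProperty("db_port"));
    }
}
```

The Job logic stays the same across environments; only the context values differ.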

Q24 : Explain the error handling in Talend.
A : There are a few ways in which errors in Talend can be handled:

    • For simple Jobs, one can rely on the exception throwing process of Talend Open Studio, which is displayed in the Run View as a red stack trace.
    • Each Subjob and component returns a code that drives further processing. The Subjob Ok/Error and Component Ok/Error links can be used to direct the error towards an error handling routine.
    • The basic way of handling an error is to define an error handling Subjob which should execute whenever an error occurs.
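
The routing described above can be pictured with a small Java sketch in which a subjob either completes or throws, and the outcome decides which branch runs next. The interface Subjob and the class ErrorHandlingDemo are hypothetical names used only for illustration; this is not code produced by Talend.

```java
class ErrorHandlingDemo {
    // A unit of work that may fail with an exception.
    interface Subjob {
        void run() throws Exception;
    }

    // Runs the subjob and reports which trigger would fire afterwards.
    static String execute(Subjob subjob) {
        try {
            subjob.run();
            return "OnSubjobOK";
        } catch (Exception e) {
            // An error handling Subjob would be started from here.
            return "OnSubjobError: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(execute(() -> System.out.println("loading rows")));
        System.out.println(execute(() -> {
            throw new IllegalStateException("input file missing");
        }));
    }
}
```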

Q25 : Why is Talend called a Code Generator?
A : Talend provides a user-friendly GUI where you can simply drag and drop components to design a Job. When the Job is executed, Talend Studio automatically translates it into a Java class behind the scenes. The code for each component in a Job is divided into three parts (begin, main and end). This is why Talend Studio is called a code generator.
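
The begin/main/end split can be pictured with the following simplified Java sketch. It is not actual Talend output: the class GeneratedJobSketch and the upper-casing transformation are assumptions chosen to show where each part runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class GeneratedJobSketch {
    static List<String> run(List<String> inputRows) {
        List<String> output = new ArrayList<>();

        // begin: runs once, before any row, e.g. to initialise counters.
        int rowCount = 0;

        // main: runs once for every row that flows through the component.
        for (String row : inputRows) {
            output.add(row.toUpperCase(Locale.ROOT));
            rowCount++;
        }

        // end: runs once, after the last row, e.g. to publish statistics.
        output.add("rows processed: " + rowCount);
        return output;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        rows.add("alice");
        rows.add("bob");
        run(rows).forEach(System.out::println);
    }
}
```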