Pentaho Interview Questions and Answers

Question 1 : What is Pentaho Data Mining?
Answer : Pentaho Data Mining uses the Waikato Environment for Knowledge Analysis (Weka) to search data for patterns. It provides functions for data processing, regression analysis, classification methods, etc.

Learn Pentaho to Unleash a Modern Career

Pentaho Training

Question 2 : What is a Tuple?
Answer : A tuple is a finite ordered list of elements.

Question 3 : What are the benefits of Pentaho?
Answer :

  • Pentaho is open source.
  • It has a community that supports users.
  • It runs on multiple platforms (Windows, Linux, Macintosh, Solaris, Unix, etc.).
  • It offers a complete package covering reporting, ETL for data warehouse management, an OLAP server, data mining, and dashboards.

Question 4 : What is Pentaho Schema Workbench?
Answer : Pentaho Schema Workbench is the graphical interface for designing OLAP cubes for Pentaho Analysis.

Question 5 : What do you understand by the term ETL?
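Answer : ETL stands for Extract, Transform, Load. It is the process of extracting data from source systems, transforming it into the required shape, and loading it into a target such as a data warehouse.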
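
The three ETL stages can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not how Pentaho implements ETL: the inline CSV text, the `sales` table, and the function names are all hypothetical stand-ins for a real source system and warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a real source system.
SOURCE_CSV = "id,name,amount\n1, alice ,10.5\n2,BOB,20\n"


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalize names, convert ids and amounts to numbers."""
    return [
        (int(r["id"]), r["name"].strip().title(), float(r["amount"]))
        for r in rows
    ]


def load(rows, conn):
    """Load: write the cleaned rows into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
result = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

In PDI the same stages would typically be built graphically as steps in a transformation rather than written by hand.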

Question 6 : Define the term “Encrypting File System”?
Answer : The Encrypting File System (EFS) is a technology that transparently encrypts files to protect personal data from attackers who have physical access to the computer.

Question 7 : What is data staging?
Answer : Data staging is a set of procedures used to prepare source system data for loading into a data warehouse.

Question 8 : What is a Workflow?
Answer : A workflow is a set of instructions that tells the Informatica server how to execute tasks.

Question 9 : What is a three-tier data warehouse?
Answer : A data warehouse is said to be a three-tier system when a middle tier sits between the end-users on one side and the back-end data stores on the other. This middle tier securely provides usable data to the end-users.

Question 10 : What is the importance of metadata in Pentaho?
Answer : A metadata model in Pentaho maps the physical structure of your database onto a logical business model. These mappings are stored in a central repository and allow developers and administrators to build business-logical database tables that are cost-effective and optimized. The model also simplifies the work of business users, allowing them to create formatted reports and dashboards while keeping data access secure.
All in all, the metadata model provides an encapsulation around the physical definitions of your database and their logical representation, and defines the relationships between them.

Question 11 : What are the three major types of Data Integration Jobs?
Answer :

  • Transformation Jobs: Used for preparing data. Use them only when the data must not change until the transformation job has finished.
  • Provisioning Jobs: Used for transferring large volumes of data. Use them only when no change in data is allowed until the job has finished and the provisioning requirement is large.
  • Hybrid Jobs: Execute both transformation and provisioning jobs. There are no limitations on data changes; data can be updated regardless of success or failure. The transforming and provisioning requirements are not large in this case.

Question 12 : Explain Pentaho?
Answer : Pentaho addresses the barriers that block an organization’s ability to get value from all of its data. It is designed to ensure that each member of a team, from developers to business users, can easily convert data into value.

Question 13 : Is Pentaho a Trademark?
Answer : Yes, Pentaho is a trademark.

Question 14 : Mention major features of Pentaho?
Answer :

  • Direct Analytics on MongoDB: Enables business analysts and IT to access, analyze, and visualize MongoDB data.
  • Data Science Pack: Pentaho’s Data Science Pack operationalizes analytical modeling and machine learning while allowing data scientists and developers to offload the labor of data preparation to Pentaho Data Integration.
  • Full YARN Support for Hadoop: Pentaho’s YARN integration enables organizations to exploit the full computing power of Hadoop while leveraging existing skillsets and technology investments.

Question 15 : How to perform a database join with PDI (Pentaho Data Integration)?
Answer : To join two tables from the same database, PDI uses a single ‘Table Input’ step and performs the join in SQL.
To join two tables in different databases, users can use the ‘Database Join’ step. However, a database join executes a query on the target system for each input row from the main stream, so performance drops as the number of queries executed on the database increases.
To avoid this, another option is to join rows from two different ‘Table Input’ steps with the ‘Merge Join’ step, using SQL queries that have an ‘ORDER BY’ clause. Remember, the rows must be sorted on the join keys before the merge join is performed.
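
Why the inputs must be sorted becomes clearer from a sketch of the sort-merge technique itself. The Python below is a generic illustration of an inner merge join over two pre-sorted row lists, not PDI's actual implementation; the `merge_join` function and the sample `customers` and `orders` rows are hypothetical.

```python
def merge_join(left, right, key):
    """Inner-join two lists of dicts that are ALREADY sorted by `key`."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1  # left key is behind: advance the left stream
        elif lk > rk:
            j += 1  # right key is behind: advance the right stream
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result


# Both inputs are sorted by "id", as an ORDER BY clause would guarantee.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 4, "name": "Dee"}]
orders = [{"id": 1, "total": 30}, {"id": 1, "total": 12}, {"id": 3, "total": 7}, {"id": 4, "total": 5}]
joined = merge_join(customers, orders, "id")
```

Each stream is read once from start to end, which is why unsorted input would silently miss matches rather than raise an error.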

Question 16 : Are Data Integration and ETL Programming the same?
Answer : No. Data Integration refers to passing data from one type of system to another within the same application. In contrast, ETL is used to extract data from different sources and transform it into other objects and tables.

Question 17 : Define Pentaho BI Project?
Answer : Pentaho BI Project is an ongoing effort by the Open Source community to provide organizations with best-in-class solutions for their enterprise Business Intelligence (BI) needs.

Question 18 : What is the Pentaho Reporting Evaluation?
Answer : Pentaho Reporting Evaluation is a particular package of a subset of the Pentaho Reporting capabilities, designed for typical first-phase evaluation activities such as accessing sample data, creating and editing reports, and viewing and interacting with reports.

Question 19 : What major application areas does the Pentaho BI Project comprise?
Answer : The Pentaho BI Project encompasses the following major application areas:

  • Business Intelligence Platform
  • Data Mining
  • Reporting
  • Dashboards

Question 20 : Explain MDX?
Answer : Multidimensional Expressions (MDX) is a query language for OLAP databases, much like SQL is a query language for relational databases. It is also a calculation language, with syntax similar to spreadsheet formulas.

Question 21 : Explain Hierarchy Flattening.
Answer : Hierarchy flattening is the construction of parent-child relationships in a database. It uses both horizontal and vertical formats, which makes sub-elements easy to identify. It also helps users read and understand the main hierarchy of BI, and includes a Parent column, Child column, Parent attributes, and Child attributes.
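
As a rough illustration of the idea, the Python sketch below takes parent-child rows (the vertical format) and produces one row per node with a column per level (the horizontal format). It is a generic sketch, not the Pentaho feature itself; the `flatten_hierarchy` function and the sample geography data are hypothetical, and the input is assumed to contain no cycles.

```python
def flatten_hierarchy(edges):
    """Turn (child, parent) rows into one root-to-node path per node.

    Assumes every parent also appears as a child row (roots have parent
    None) and that the hierarchy contains no cycles.
    """
    parent_of = dict(edges)
    paths = {}
    for node in parent_of:
        path = [node]
        # Walk upward until we reach a root, whose parent is None.
        while parent_of.get(path[-1]) is not None:
            path.append(parent_of[path[-1]])
        paths[node] = list(reversed(path))
    return paths


# Vertical (parent-child) format: one row per child/parent pair.
edges = [("World", None), ("Europe", "World"), ("France", "Europe"), ("Asia", "World")]
flat = flatten_hierarchy(edges)

# Horizontal format: pad each path so every row has one column per level.
depth = max(len(p) for p in flat.values())
rows = [p + [None] * (depth - len(p)) for p in flat.values()]
```

Here `rows` holds one fixed-width record per node, with the columns playing the role of level 1, level 2, and level 3.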

Question 23 : Who benefits from the Pentaho BI Project?
Answer :

  • Java developers, who generally use project components to rapidly assemble custom BI solutions.
  • ISVs, who can improve the value and capability of their solutions by embedding BI functionality.
  • End-users, who can quickly deploy packaged BI solutions that are comparable or superior to traditional commercial offerings at a dramatically lower cost.

Question 24 : Explain Pentaho Report Designer (PRD).
Answer : PRD is a graphical tool for editing reports. It lets users create simple and advanced reports and export them as PDF, Excel, HTML, and CSV files. PRD is built on a Java-based report engine offering data integration, portability, and scalability. Thus, it can be embedded in Java web applications and in application servers such as the Pentaho BA Server.

Question 25 : Can field names in a row be duplicated in Pentaho?
Answer : No, Pentaho doesn’t allow duplicate field names in a row.

Question 26 : What are variables and arguments in transformations?
Answer : The transformation settings dialog box contains two different tables: one for arguments and one for variables. Arguments are the command-line arguments specified during batch processing, whereas PDI variables are objects set in a previous transformation/job or in the operating system.

Question 27 : What do you understand by Pentaho Metadata?
Answer : Pentaho Metadata is a piece of the Pentaho BI Platform designed to make it easier for users to access information in business terms.

Question 28 : How to configure JNDI for Pentaho DI Server?
Answer : Pentaho offers JNDI connection configuration for local DI so that an application server does not have to run continuously during the development and testing of transformations. Edit the properties in the jdbc.properties file located at …\data-integration-server\pentaho-solutions\system\simple-jndi.

Question 29 : Does a transformation allow field duplication?
Answer : Yes. In a “Select Values” step, select the original field and also select it a second time under a new name. The output row then contains a copy of the original field under the other name.

Question 30 : Explain how to sequentialize transformations?
Answer : PDI transformations execute all their steps in parallel, so the steps within a single transformation cannot be made to run sequentially. Making this possible would require changing the core architecture, which would result in slower processing. To run whole transformations one after another, place them as entries in a job, which executes its entries in sequence.
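
The parallel execution model can be pictured with a small Python sketch in which every step runs in its own thread and passes rows downstream through a queue. This is a conceptual analogy only, not PDI code; the step functions, the `DONE` marker, and the queue names are hypothetical.

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed along each hop


def generate_rows(out_q):
    """First step: emit rows downstream as soon as they are produced."""
    for value in ["a", "b", "c"]:
        out_q.put(value)
    out_q.put(DONE)


def uppercase(in_q, out_q):
    """Second step: process each row as it arrives, while step one still runs."""
    while True:
        row = in_q.get()
        if row is DONE:
            out_q.put(DONE)
            break
        out_q.put(row.upper())


# Queues play the role of the hops that connect steps.
hop1, hop2 = queue.Queue(), queue.Queue()
steps = [
    threading.Thread(target=generate_rows, args=(hop1,)),
    threading.Thread(target=uppercase, args=(hop1, hop2)),
]
for step in steps:
    step.start()  # all steps start together; none waits for another to finish

output = []
while True:
    row = hop2.get()
    if row is DONE:
        break
    output.append(row)

for step in steps:
    step.join()
```

Rows still arrive in order because each queue is first-in, first-out, but no step waits for the previous one to complete before it begins working.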

Question 31 : How to use database connections from a repository?
Answer : To use a database connection created in the repository, either create a new transformation/job or close and reopen the ones already loaded in Spoon.