Question 1 : Can a single installation of Ranger on HDP also be used with HDF?
Answer : Yes. You can use a single Ranger installed on HDP to manage a separately installed HDF as well. The Ranger that is included with HDP does not include the service definition for NiFi, so that service definition would need to be installed manually.
Learn Apache Nifi to Unleash a Modern Career
Question 2 : Does NiFi have connectors for RDBMS databases?
Answer : Yes. You can use different processors bundled in NiFi to interact with an RDBMS in different ways. For example, ExecuteSQL allows you to issue a SQL SELECT statement to a configured JDBC connection to retrieve rows from a database; QueryDatabaseTable allows you to incrementally fetch from a database table; and GenerateTableFetch allows you not only to fetch the records incrementally, but also to fetch against source table partitions.
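The incremental fetch described for QueryDatabaseTable can be pictured as remembering the largest value seen in a chosen column and selecting only rows beyond it on the next run. The following is a minimal Python sketch of that idea against an in-memory SQLite table. The table `events`, the column `id`, and the helper `fetch_new_rows` are hypothetical names chosen for illustration, and this is not how NiFi itself is implemented.

```python
import sqlite3

def fetch_new_rows(conn, last_max_id):
    """Return rows whose id is greater than last_max_id, plus the new maximum id."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_max_id,),
    ).fetchall()
    # If nothing new arrived, keep the previous high-water mark.
    new_max = rows[-1][0] if rows else last_max_id
    return rows, new_max

# Hypothetical source table with two rows already present.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (id, payload) VALUES (?, ?)",
    [(1, "a"), (2, "b")],
)

first_batch, state = fetch_new_rows(conn, 0)        # first run: every row
conn.execute("INSERT INTO events (id, payload) VALUES (3, 'c')")
second_batch, state = fetch_new_rows(conn, state)   # later run: only the new row
```

The only state carried between runs is the high-water mark, which is why this style of fetch needs a column whose values always increase.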
Question 3 : How do you execute a shell script in a NiFi dataflow?
Answer : To execute a shell script in a NiFi dataflow, you can use the ExecuteProcess processor.
Question 4 : What is the solution to avoid “Back-pressure deadlock”?
Answer : There are a few options, such as:
- An admin can temporarily increase the back-pressure threshold of the failed connection.
- Another useful approach in such a case is to have Reporting Tasks that monitor the flow for large queues.
Question 5 : If you want to consume a SOAP-based web service in an HDF dataflow and the WSDL is provided to you, which processor will help to consume this web service?
Answer : You can use the InvokeHTTP processor. With InvokeHTTP, you can add dynamic properties, which are sent in the request as headers. You can use dynamic properties to set values for the Content-Type and SOAPAction headers; just use the header names as the names of the dynamic properties. InvokeHTTP lets you control the HTTP method, so you can set that to POST. The remaining step is to get the content of request.xml to InvokeHTTP as a FlowFile. One way to do this is to use a GetFile processor to fetch request.xml from some location on the filesystem and connect the success relationship of GetFile to InvokeHTTP.
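To make the shape of that request concrete, here is a minimal Python sketch of the HTTP call such a configuration amounts to: a POST whose body is the SOAP envelope and whose headers include Content-Type and SOAPAction. The URL, the SOAPAction value, the envelope, and the helper `build_soap_request` are all hypothetical placeholders, and the request is only built, not sent.

```python
import urllib.request

def build_soap_request(url, soap_action, body_bytes):
    """Build (but do not send) a POST request carrying a SOAP envelope."""
    return urllib.request.Request(
        url,
        data=body_bytes,
        method="POST",
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": soap_action,
        },
    )

# Hypothetical stand-in for the contents of request.xml.
envelope = b"""<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body/>
</soapenv:Envelope>"""

req = build_soap_request(
    "http://example.com/service",   # hypothetical endpoint
    "urn:example:GetQuote",         # hypothetical SOAPAction value
    envelope,
)

# urllib stores header names capitalized ("Soapaction"); HTTP header names
# are case-insensitive, so this does not change the meaning of the request.
soap_action_header = req.get_header("Soapaction")
```

In the NiFi flow, the two headers correspond to the two dynamic properties on InvokeHTTP, and the body corresponds to the FlowFile content that GetFile produced.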
Question 6 : How would you distribute lookup data to be used in a dataflow processor?
Answer : You can use “PutDistributedMapCache” to share common static configuration across various parts of a NiFi flow.
Question 7 : Can NiFi be installed as a service?
Answer : Yes, this is currently supported on Linux and macOS only.
Question 8 : What is a Reporting Task?
Answer : A Reporting Task is a NiFi extension point that can report and analyze NiFi’s internal metrics, either to provide the information to external resources or to report status information as bulletins that appear directly in the NiFi user interface.
Question 9 : Does the processor commit or roll back the session?
Answer : Yes, the processor is the component that commits or rolls back through the session. If a Processor rolls back the session, the FlowFiles that were accessed during that session are all reverted to their previous states. If a Processor instead chooses to commit the session, the session is responsible for updating the FlowFile Repository and the Provenance Repository with the relevant information.
Question 10 : Can NiFi connect to external sources like Twitter?
Answer : Absolutely. NiFi has a very extensible framework, allowing any developers/users to add a data source connector quite easily. In the previous release, NiFi 1.0, we had 170+ processors bundled with the application by default, including the Twitter processor. Moving forward, new processors/extensions can definitely be expected in every release.
Question 11 : What is a Template in NiFi?
Answer : A template is a reusable workflow that you can export and import into the same or a different NiFi instance. It can save a lot of time compared with creating the same flow again and again. A template is stored as an XML file.
Question 12 : What is the NiFi Custom Properties Registry?
Answer : To load custom key-value pairs, you can use the custom properties registry, which is configured in the nifi.properties file as follows:
nifi.variable.registry.properties=/conf/nifi_registry
You can put key-value pairs in that file and then use those properties in your NiFi processors through expression language, e.g. ${OS}, if you have configured that property in the registry file.
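As a rough illustration of the idea, the Python sketch below loads key-value pairs from properties-style text and substitutes plain ${name} references. The sample keys (`OS`, `region`) and the helpers `load_properties` and `resolve` are hypothetical, and NiFi's actual expression language supports far more than this simple substitution.

```python
import re

def load_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def resolve(template, props):
    """Replace each ${name} with its value; unknown names are left untouched."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: props.get(m.group(1), m.group(0)),
        template,
    )

# Hypothetical contents of the registry file.
registry_text = "# sample registry\nOS=linux\nregion=emea\n"
props = load_properties(registry_text)
result = resolve("/data/${OS}/${region}/${missing}", props)
```

Leaving unknown references untouched, rather than replacing them with an empty string, is a choice made for this sketch so that a missing key is easy to spot.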
Question 13 : What is MiNiFi?
Answer : MiNiFi is a subproject of Apache NiFi designed as a complementary data collection approach that supplements the core tenets of NiFi, focusing on the collection of data at the source of its creation. MiNiFi is designed to run directly at the source, which is why special importance is given to a small footprint and low resource consumption. MiNiFi is available as Java and C++ agents, which are ~50MB and 3.2MB in size respectively.
Question 14 : What is Apache NiFi used for?
Answer :
- Reliable and secure transfer of data between different systems.
- Delivery of data from source to different destinations and platforms.
- Enrichment and preparation of data.
- Conversion between formats.
- Extraction/Parsing.
- Routing decisions.
Question 15 : How can we decide between NiFi vs Flume vs Sqoop?
Answer : NiFi supports all use cases that Flume supports and also has a Flume processor out of the box.
NiFi also supports some capabilities similar to Sqoop. For example, the GenerateTableFetch processor does incremental fetch and parallel fetch against source table partitions.
Ultimately, what we want to look at is whether we are solving a specific or singular use case. If so, then any one of the tools will work. NiFi’s benefits really shine when we consider multiple use cases being handled at once, along with critical flow management features like interactive, real-time command and control with full data provenance.
Question 16 : Which programming languages does Apache NiFi support?
Answer : NiFi is implemented in the Java programming language and allows extensions (processors, controller services, and reporting tasks) to be implemented in Java. In addition, NiFi supports processors that execute scripts written in Groovy, Jython, and several other popular scripting languages.
Question 17 : What is Apache Nifi?
Answer : NiFi is helpful in creating DataFlow. It means you can transfer data from one system to another system as well as process the data in between.
Question 18 : What is a flow file?
Answer : FlowFiles are the heart of NiFi and its data flows. A FlowFile is a data record that consists of a pointer to its content and the attributes that describe that content. The content is the actual data being handled, and the attributes are key-value pairs that act as metadata for the flow file. Some of the attributes of a flow file are filename, UUID, MIME Type etc.
Question 19 : What is Reporting Task?
Answer : A Reporting Task is a NiFi extension point that is capable of reporting and analyzing NiFi’s internal metrics in order to provide the information to external resources or report status information as bulletins that appear directly in the NiFi User Interface.
Question 20 : What is a NiFi FlowFile?
Answer : A FlowFile is a message, event data, or user data that is pushed into or created in NiFi. A FlowFile has mainly two things attached to it: its content (the actual payload, a stream of bytes) and its attributes. Attributes are key-value pairs attached to the content (you can think of them as metadata for the content).
Question 21 : What are the components of a flow file?
Answer : A FlowFile is made up of two parts:
- Content: The content is the stream of bytes being processed in the data flow and transported from source to destination. Keep in mind that the flow file itself does not contain the data; rather, it holds a pointer to the content data. The actual content lives in the Content Repository of NiFi.
- Attributes: The attributes are key-value pairs that are associated with the data and act as the metadata for the flow file. These attributes are generally used to store values that provide context to the data. Some examples of attributes are filename, UUID, MIME Type, flow file creation time etc.
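The split between content and attributes can be sketched as a small data structure: the attributes travel with the flow file as an in-memory map, while the content is reached through a pointer into a separate store. The Python below is a simplified illustration only. The classes `ContentClaim` and `FlowFile`, the `content_store` dictionary, and the `record.count` attribute are hypothetical and do not reflect NiFi's internal classes.

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class ContentClaim:
    """Hypothetical pointer into a content store: where the bytes live."""
    container: str
    offset: int
    length: int

@dataclass
class FlowFile:
    """A flow file holds a pointer to its content plus key-value attributes."""
    claim: ContentClaim
    attributes: dict = field(default_factory=dict)

# Stand-in for the Content Repository: the bytes live here, not in the FlowFile.
content_store = {"container-1": b"hello,world\n1,2\n"}

def read_content(flowfile, store):
    """Follow the pointer only when the bytes are actually needed."""
    c = flowfile.claim
    return store[c.container][c.offset:c.offset + c.length]

ff = FlowFile(
    claim=ContentClaim("container-1", 0, 12),
    attributes={
        "filename": "sample.csv",
        "uuid": str(uuid.uuid4()),
        "mime.type": "text/csv",
    },
)

# Attributes can be changed without touching the content at all.
ff.attributes["record.count"] = "1"
header = read_content(ff, content_store)
```

Routing or tagging decisions that rely only on `ff.attributes` never need to call `read_content`, which is the point of keeping metadata separate from the payload.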
Question 22 : What is a Bulletin and how does it help in NiFi?
Answer : If you want to know whether any problems occur in a data flow, you could check the logs for anything interesting, but it is much more convenient to have notifications pop up on the screen. If a Processor logs anything as a WARNING or ERROR, we will see a “Bulletin Indicator” show up in the top-right-hand corner of the Processor.
This indicator looks like a sticky note and will be shown for five minutes after the event occurs. Hovering over the bulletin provides information about what happened so that the user does not have to sift through log messages to find it. If in a cluster, the bulletin will also indicate which node in the cluster emitted the bulletin. We can also change the log level at which bulletins will occur in the Settings tab of the Configure dialog for a Processor.
Question 23 : What is the role of Apache NiFi in Big Data Ecosystem?
Answer : The main roles Apache NiFi is suitable for in BigData Ecosystem are:
- Data acquisition and delivery.
- Transformations of data.
- Routing data from different source to destination.
- Event processing.
- End to end provenance.
- Edge intelligence and bi-directional communication.
Question 24 : What is a processor?
Answer : NiFi processors are the building blocks and the most commonly used components in NiFi. Processors are the blocks that we drag and drop onto the canvas, and data flows are made up of multiple processors. A processor can be used for bringing data into the system, like GetHTTP, GetFile, ConsumeKafka etc., or for performing some kind of data transformation or enrichment, for instance, SplitJson, ConvertAvroToORC, ReplaceText, ExecuteScript etc.
Question 25 : How does NiFi support a huge volume of payload in a dataflow?
Answer : A huge volume of data can move through a dataflow. As data moves through NiFi, only a pointer to the data, referred to as a flow file, is passed around. The content of the flow file is accessed only as needed.
Question 26 : Do NiFi and Kafka overlap in functionality?
Answer : This is a very common question. Apache NiFi and Kafka are actually very complementary solutions. A Kafka broker provides very low latency, especially when a large number of consumers pull from the same topic. Kafka provides data pipelines with low latency; however, it is not designed to solve dataflow challenges such as data prioritization and enrichment. That is what Apache NiFi is designed for: it helps in designing dataflow pipelines that can perform data prioritization and other transformations when moving data from one system to another.
Furthermore, Kafka prefers smaller messages, in the KB to MB range, while NiFi handles messages of arbitrary size, which can go up to GB per file or even more.
Apache NiFi is complementary to Apache Kafka by solving all the data flow problems for Kafka.
Question 27 : What is a Relationship in a NiFi dataflow?
Answer : When a processor finishes processing a FlowFile, the result can be Failure, Success, or any other relationship. Based on this relationship, you can send the data to the downstream (next) processor or handle it accordingly.
Question 28 : While configuring a processor, what is the language syntax or formulas used?
Answer : NiFi has a concept called expression language which is supported on a per property basis, meaning the developer of the processor can choose whether a property supports expression language or not.
Question 30 : If no prioritizers are set in a processor, what prioritization scheme is used?
Answer : The default prioritization scheme is said to be undefined, and it may change from time to time. If no prioritizers are set, the processor will sort the data based on the FlowFile’s Content Claim. This way, it provides the most efficient reading of the data and the highest throughput. We have discussed changing the default setting to First In First Out, but right now it is based on what gives the best performance.
Question 31 : What happens to data if NiFi goes down?
Answer : NiFi stores the data in the repository as it is traversing through the system. There are 3 key repositories:
- The flow file repository.
- The content repository.
- The provenance repository.
As a processor writes data to a flow file, that data is streamed directly to the content repository. When the processor finishes, it commits the session. This triggers the provenance repository to be updated with the events that occurred for that processor, and then the flow file repository is updated to keep track of where in the flow the file is. Finally, the flow file can be moved to the next queue in the flow. This way, if NiFi goes down at any point, it will be able to resume where it left off. This, however, glosses over one detail: by default, when we update the repositories, we write the information to the repository, but the write is often cached by the OS. In case of a failure, this cached data might be lost if the OS also fails along with NiFi. If we really want to avoid this caching, we can configure the repositories in the nifi.properties file to always sync to disk. This, however, can be a significant hindrance to performance. If only NiFi goes down, this is not problematic for the data in any way, as the OS is still responsible for flushing the cached data to disk.
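The trade-off between letting the OS cache a write and forcing it to disk can be shown with a generic Python sketch. The helper `append_record` and its `always_sync` flag are hypothetical names used only to mirror the "always sync" idea; this is ordinary file I/O, not NiFi's repository code.

```python
import os
import tempfile

def append_record(path, record, always_sync):
    """Append a record; optionally force it through the OS cache to disk."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()                  # hand the bytes from the process to the OS
        if always_sync:
            os.fsync(f.fileno())   # ask the OS to write its cache to disk now

with tempfile.TemporaryDirectory() as tmp:
    journal = os.path.join(tmp, "journal.bin")
    # Cheaper: survives a crash of the process, but not necessarily of the OS.
    append_record(journal, b"event-1\n", always_sync=False)
    # Costlier: waits for the disk, so it also survives an OS failure.
    append_record(journal, b"event-2\n", always_sync=True)
    with open(journal, "rb") as f:
        contents = f.read()
```

Both records are readable afterwards either way; the difference only shows up if the machine fails before the OS has flushed its cache, which is why syncing on every write costs throughput.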
Question 32 : Does NiFi work as a master-slave architecture?
Answer : No. Starting with NiFi 1.0, a zero-master philosophy is used, and each node in the NiFi cluster is the same. The NiFi cluster is managed through Apache ZooKeeper, which elects a single node as the Cluster Coordinator and handles failover automatically. All cluster nodes report heartbeat and status information to the Cluster Coordinator. The Cluster Coordinator is responsible for disconnecting and connecting nodes. Additionally, every cluster has one Primary Node, also elected by ZooKeeper.
Question 33 : Can we schedule the flow to auto-run like one would with the coordinator?
Answer : By default, processors already run continuously, because Apache NiFi is designed around the principle of continuous streaming, unless we choose to run a processor only on an hourly or daily schedule, for example. By design, Apache NiFi is not job oriented: once we start a processor, it runs continuously.
Question 34 : Do the Attributes get added to content (actual Data) when data is pulled by Nifi?
Answer : You can certainly add attributes to your FlowFiles at any time, that’s the whole point of separating metadata from the actual data. Essentially, one FlowFile represents an object or a message moving through NiFi. Each FlowFile contains a piece of content, which is the actual bytes. You can then extract attributes from the content, and store them in memory. You can then operate against those attributes in memory, without touching your content. By doing so you can save a lot of IO overhead, making the whole flow management process extremely efficient.
Question 35 : What is back pressure in a NiFi system?
Answer : Sometimes the producer system is faster than the consumer system, so messages are consumed more slowly than they arrive. The messages (FlowFiles) that have not yet been processed remain in the connection buffer. You can limit the size of this backlog with a back-pressure threshold, based either on the number of FlowFiles or on the total data size. When the queue reaches the defined limit, the connection applies back pressure and the producer processor is no longer run, so no more FlowFiles are generated until the back pressure is reduced.
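A toy model may help make the mechanism concrete. The Python sketch below represents a connection as a queue with two thresholds, one on object count and one on total bytes, and refuses new items once either is reached. The `Connection` class, its method names, and the threshold values are hypothetical and chosen for illustration; NiFi's own scheduling is more involved.

```python
from collections import deque

class Connection:
    """Toy queue between a producer and a consumer with two thresholds."""

    def __init__(self, max_count, max_bytes):
        self.max_count = max_count
        self.max_bytes = max_bytes
        self.queue = deque()
        self.queued_bytes = 0

    def backpressure_applied(self):
        # Back pressure engages when either threshold is reached.
        return (len(self.queue) >= self.max_count
                or self.queued_bytes >= self.max_bytes)

    def offer(self, size):
        """Producer side: returns False when back pressure stops the producer."""
        if self.backpressure_applied():
            return False
        self.queue.append(size)
        self.queued_bytes += size
        return True

    def poll(self):
        """Consumer side: take one item, relieving pressure."""
        size = self.queue.popleft()
        self.queued_bytes -= size
        return size

conn = Connection(max_count=3, max_bytes=1000)
accepted = [conn.offer(100) for _ in range(5)]   # producer outruns the consumer
conn.poll()                                      # consumer catches up by one item
accepted_after_drain = conn.offer(100)           # producer may run again
```

Here the count threshold trips first; with larger items the byte threshold would stop the producer instead, which is why both limits are checked.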