Apache Nifi Tutorial PDF
Apache NiFi
In this tutorial, we will be explaining the basics of Apache NiFi and its features.
Audience
This tutorial is designed for software professionals who want to learn the basics of Apache
NiFi and its programming concepts in simple and easy steps. It describes the components
of Apache NiFi with suitable examples.
Prerequisites
You should have a basic understanding of Java, ETL, data ingestion, and transformation.
You should also be familiar with web servers, platform configuration, and regex patterns.
All the content and graphics published in this e-book are the property of Tutorials Point (I)
Pvt. Ltd. The user of this e-book is prohibited to reuse, retain, copy, distribute or republish
any contents or a part of contents of this e-book in any manner without written consent
of the publisher.
We strive to update the contents of our website and tutorials as timely and as precisely as
possible, however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt.
Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our
website or its contents including this tutorial. If you discover any errors on our website or
in this tutorial, please notify us at [email protected]
Table of Contents
About the Tutorial
Audience
Prerequisites
GetFile
PutFile
Core properties
zookeeper
DBCPConnectionPool
MonitorMemory
1. Apache NiFi - Introduction
Apache NiFi is a powerful, easy-to-use, and reliable system to process and distribute data
between disparate systems. It is based on the Niagara Files technology, which was developed
by the NSA and donated to the Apache Software Foundation after 8 years. It is distributed
under the Apache License, Version 2.0, January 2004. At the time of writing, the latest
version of Apache NiFi is 1.7.1.
Apache NiFi is a real-time data ingestion platform, which can transfer data and manage its
transfer between different source and destination systems. It supports a wide variety of
data formats like logs, geolocation data, social feeds, etc. It also supports many protocols
like SFTP, HDFS, and Kafka. This support for a wide variety of data sources and
protocols makes the platform popular in many IT organizations.
● It is highly configurable. This helps users with guaranteed delivery, low latency,
high throughput, dynamic prioritization, back pressure, and modifying flows at
runtime.
● It also provides a data provenance module to track and monitor data from the start
to the end of the flow.
● Developers can create their own custom processors and reporting tasks according
to their needs.
● NiFi also supports secure protocols like SSL, HTTPS, and SSH, as well as other
encryption mechanisms.
● It also supports user and role management and can be configured with LDAP
for authorization.
● Process Group: It is a group of NiFi flows, which helps a user manage flows and keep
them in a hierarchical manner.
● Processor: A processor is a Java module responsible for either fetching data from
a source system or storing it in a destination system. Other processors are used
to add attributes to a flowfile or change its content.
● Flowfile: It is the basic unit of data in NiFi and represents a single object picked up
from the source system. NiFi processors make changes to a flowfile
while it moves from the source processor to the destination. Different events like
CREATE, CLONE, RECEIVE, etc. are performed on a flowfile by the different processors in
a flow.
● Event: Events represent the changes made to a flowfile while it traverses a NiFi flow.
These events are tracked in data provenance.
● Data provenance: It is a repository. It also has a UI, which enables users to check
the information about a flowfile and helps in troubleshooting any issues that arise
during the processing of a flowfile.
● Apache NiFi supports clustering, so it can run on multiple nodes with the same flow
processing different data, which increases the performance of data processing.
● It also provides security policies at the user level, the process group level, and for
other modules too.
● Its UI can also run on HTTPS, which makes the interaction of users with NiFi secure.
● NiFi supports around 188 processors, and a user can also create custom plugins to
support a wide variety of data systems.
● Apache NiFi has a state persistence issue when the primary node switches, which
sometimes leaves processors unable to fetch data from source systems.
2. Apache NiFi - Basic Concepts
Apache NiFi consists of a web server, a flow controller, and a processor, all of which run on
the Java Virtual Machine. It also has three repositories: the Flowfile Repository, the
Content Repository, and the Provenance Repository, as shown in the figure below.
Flowfile Repository
This repository stores the current state and attributes of every flowfile that goes through
the data flows of Apache NiFi. The default location of this repository is in the root directory
of Apache NiFi. The location can be changed through the property
named "nifi.flowfile.repository.directory".
Content Repository
This repository contains the content of all the flowfiles in NiFi. Its default
directory is also in the root directory of NiFi, and it can be changed using the
"nifi.content.repository.directory.default" property. The default implementation class is
"org.apache.nifi.controller.repository.FileSystemRepository". This repository uses a
large amount of disk space, so it is advisable to have enough space on the installation disk.
Provenance Repository
The repository tracks and stores all the events of all the flowfiles that flow through NiFi.
There are two provenance repositories: the volatile provenance repository (all the
provenance data in this repository is lost after a restart) and the persistent provenance
repository. Its default directory is also in the root directory of NiFi. The implementation
is selected with the class names
"org.apache.nifi.provenance.PersistentProvenanceRepository" and
"org.apache.nifi.provenance.VolatileProvenanceRepository" for the respective
repositories, and the directory of the persistent repository can be changed using the
"nifi.provenance.repository.directory.default" property.
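The repository locations described above live in the nifi.properties file as plain key=value lines. The following is a minimal Python sketch of how a script might read them. The sample values and the helper names (parse_properties, repository_directories) are assumptions made for illustration, and the parser only handles simple key=value lines rather than the full Java properties syntax.

```python
# Sketch: read the repository locations from nifi.properties-style text.
# The sample values below are illustrative assumptions, not taken from a real install.

SAMPLE_PROPERTIES = """
# repository locations (illustrative values)
nifi.flowfile.repository.directory=./flowfile_repository
nifi.content.repository.directory.default=./content_repository
nifi.provenance.repository.directory.default=./provenance_repository
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def repository_directories(props):
    """Return the repository directories that are present in props."""
    keys = {
        "flowfile": "nifi.flowfile.repository.directory",
        "content": "nifi.content.repository.directory.default",
        "provenance": "nifi.provenance.repository.directory.default",
    }
    return {name: props[key] for name, key in keys.items() if key in props}
```

Calling repository_directories(parse_properties(SAMPLE_PROPERTIES)) returns a dictionary with one entry per repository, which a script could use, for example, to check free disk space in each location.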
3. Apache NiFi - Environment Setup
In this chapter, we will learn about the environment setup of Apache NiFi. The steps for
installation of Apache NiFi are as follows:
Step 1: Install the current version of Java on your computer and set the JAVA_HOME
environment variable on your machine. You can check its value as shown below:
$ echo $JAVA_HOME
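The same check can be scripted. The sketch below is a hypothetical helper (java_home_status is not part of NiFi) that reports whether JAVA_HOME is set and points to an existing directory. It accepts an optional mapping so that it can be tried without touching the real environment.

```python
import os
from pathlib import Path


def java_home_status(environ=None):
    """Describe the state of JAVA_HOME in the given environment mapping."""
    env = os.environ if environ is None else environ
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    if not Path(java_home).is_dir():
        return "JAVA_HOME points to a missing directory: " + java_home
    return "JAVA_HOME is set to " + java_home
```

Calling java_home_status() with no argument inspects the environment of the current process.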
Step 3: The installation process for Apache NiFi is very easy. The process differs with the
OS:
● Windows OS: Unzip the zip package and Apache NiFi is installed.
● UNIX OS: Extract the tar file in any location and Apache NiFi is installed.
Step 4: Open a command prompt and go to the bin directory of NiFi, for example
C:\nifi-1.7.1\bin, and execute the run-nifi.bat file.
C:\nifi-1.7.1\bin>run-nifi.bat
Step 5: It will take a few minutes for the NiFi UI to come up. A user can check nifi-app.log;
once the NiFi UI is up, the user can open http://localhost:8080/nifi/ to access the UI.
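Because the UI takes a while to start, a script can poll the URL from Step 5 instead of retrying by hand. The following is a minimal sketch using only the Python standard library. The function names, the number of attempts, and the delay are assumptions chosen for illustration.

```python
import time
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError and HTTPError are subclasses of OSError.
        return False


def wait_until_up(url, attempts=30, delay_seconds=10.0, probe=http_probe, sleep=time.sleep):
    """Poll url until probe(url) succeeds.

    Returns the attempt number that succeeded, or 0 if none did.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)
    return 0


# Example call (commented out because it waits on a live NiFi instance):
# wait_until_up("http://localhost:8080/nifi/")
```

The probe and sleep parameters are injectable so the polling logic can be exercised without a running NiFi instance.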
4. Apache NiFi - User Interface
Apache NiFi is a web-based platform that a user accesses through a web UI. The NiFi UI
is very interactive and provides a wide variety of information about NiFi. As shown in the
image below, a user can access information about the following attributes:
Active Threads
Total queued data
Transmitting Remote Process Groups
Not Transmitting Remote Process Groups
Running Components
Stopped Components
Invalid Components
Disabled Components
Up to date Versioned Process Groups
Locally modified Versioned Process Groups
Stale Versioned Process Groups
Locally modified and Stale Versioned Process Groups
Sync failure Versioned Process Groups
Processors
A user can drag the processor icon onto the canvas and select the desired processor for the
data flow in NiFi.
Processor Icon
Input port
The icon below is dragged to the canvas to add an input port to any data flow.
An input port is used to get data from a processor that is not present in that process
group.
After dragging this icon, NiFi asks for the name of the input port, and the port is then
added to the NiFi canvas.
Output port
The icon below is dragged to the canvas to add an output port to any data flow.
An output port is used to transfer data to a processor that is not present in that
process group.
After dragging this icon, NiFi asks for the name of the output port, and the port is then
added to the NiFi canvas.
Process Group
A user uses the icon below to add a process group to the NiFi canvas.
After dragging this icon, NiFi asks to enter the name of the Process Group and then it is
added to the NiFi canvas.
Funnel
A funnel is used to combine the output of several connections into a single connection. A
user can use the icon below to add a funnel to a NiFi data flow.
Funnel Icon
Template
This icon is used to add a data flow template to NiFi canvas. This helps to reuse the data
flow in the same or different NiFi instances.
Template Icon
After dragging, a user can select one of the templates already added to NiFi.
Label
Labels are used to add text on the NiFi canvas about any component present in NiFi. A label
can be given one of a range of colors to make the canvas easier to read.
Label Icon
5. Apache NiFi - Processors
Apache NiFi processors are the basic building blocks of a data flow. Every processor has
a different functionality, which contributes to the creation of the output flowfile. The data
flow shown in the image below fetches a file from one directory using the GetFile processor
and stores it in another directory using the PutFile processor.
GetFile
The GetFile processor is used to fetch files of a specific format from a specific directory. It
also provides other options that give the user more control over fetching. We will discuss
them in the properties section below.
GetFile Settings
Following are the different settings of GetFile processor:
Name
In the Name setting, a user can give the processor any name, either according to
the project or one that makes its purpose clearer.
Enable
A user can enable or disable the processor using this setting.
Penalty Duration
This setting lets a user set the penalty duration that applies when a flowfile fails.
Yield Duration
This setting is used to specify the yield time for the processor. During this time, the
processor is not scheduled again.
Bulletin Level
This setting is used to specify the log level of that processor.
GetFile Scheduling
The GetFile processor offers the following scheduling options:
Schedule Strategy
You can schedule the processor either on a time basis by selecting the timer driven option
or with a specified CRON string by selecting the CRON driven option.
Concurrent Tasks
This option is used to define the number of concurrent tasks scheduled for this processor.
Execution
A user can define whether to run the processor on all nodes or only on the primary node by
using this option.
Run Schedule
It is used to define the time interval for the timer driven strategy or the CRON expression
for the CRON driven strategy.
GetFile Properties
GetFile offers multiple properties, as shown in the image below, ranging from compulsory
properties like Input Directory and File Filter to optional properties like Path Filter and
Maximum File Size. A user can manage the file fetching process using these properties.
GetFile Comments
This section is used to specify any information about the processor.
PutFile
The PutFile processor is used to store the file from the data flow to a specific location.
PutFile Settings
The PutFile processor has the following settings:
Name
In the Name setting, a user can give the processor any name, either according to
the project or one that makes its purpose clearer.
Enable
A user can enable or disable the processor using this setting.
Penalty Duration
This setting lets a user set the penalty duration that applies when a flowfile fails.
Yield Duration
This setting is used to specify the yield time for the processor. During this time, the
processor is not scheduled again.
Bulletin Level
This setting is used to specify the log level of that processor.
PutFile Scheduling
The PutFile processor offers the following scheduling options:
Schedule Strategy
You can schedule the processor either on a time basis by selecting the timer driven option
or with a specified CRON string by selecting the CRON driven option. There is also an
experimental strategy, Event Driven, which triggers the processor on a specific event.
Concurrent Tasks
This option is used to define the number of concurrent tasks scheduled for this processor.
Execution
A user can define whether to run the processor on all nodes or only on the primary node by
using this option.
Run Schedule
It is used to define the time interval for the timer driven strategy or the CRON expression
for the CRON driven strategy.
PutFile Properties
The PutFile processor provides properties like Directory to specify the output directory for
the purpose of file transfer and others to manage the transfer as shown in the image
below.
PutFile Comments
This section is used to specify any information about the processor.
6. Apache NiFi - Processors Categorization
sending the data, these processors DROP the flowfile with success relationship. Some of
the processors that belong to this category are PutEmail, PutKafka, PutSFTP, PutFile,
PutFTP, etc.
HTTP Processors
These processors deal with the HTTP and HTTPS calls. Some of the processors that belong
to this category are InvokeHTTP, PostHTTP, ListenHTTP, etc.
AWS Processors
AWS processors are responsible for interacting with Amazon Web Services. Some
of the processors that belong to this category are GetSQS, PutSNS, PutS3Object,
FetchS3Object, etc.
7. Apache NiFi - Processors Relationship
In an Apache NiFi data flow, flowfiles move from one processor to another through a
connection, which is validated using a relationship between the processors. Whenever a
connection is created, a developer selects one or more relationships between those
processors.
As you can see in the above image, the check boxes in the black rectangle are relationships.
If a developer selects these check boxes, then the flowfile will terminate in that particular
processor when the relationship is success, failure, or both.
Success
When a processor successfully processes a flowfile, for example by storing or fetching data
from a data source without any connection, authentication, or other error, the
flowfile goes to the success relationship.
Failure
When a processor cannot process a flowfile because of an error such as an authentication
error or a connection problem, the flowfile goes to the failure relationship.
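The success and failure relationships can be pictured with a small model. The Python sketch below is a conceptual illustration only and does not use any NiFi API. The names route and store are hypothetical, and treating any raised exception as a failure is a simplifying assumption.

```python
def route(process, flowfile):
    """Run process(flowfile) and return (relationship, result).

    Any exception raised by the processing step is treated as a failure
    in this sketch; the error text is returned as the result.
    """
    try:
        return "success", process(flowfile)
    except Exception as error:
        return "failure", str(error)


def store(flowfile):
    """Hypothetical storing step that rejects flowfiles with empty content."""
    if not flowfile["content"]:
        raise ValueError("nothing to store")
    return len(flowfile["content"])
```

A caller would then pass the flowfile on to whichever connection is attached to the returned relationship name.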
A developer can also transfer the flowfiles to other processors using connections. The
developer can select a connection and also load balance it, but load balancing was only
released in version 1.8 and is not covered in this tutorial.
As you can see in the above image, the connection marked in red has the failure relationship,
which means all flowfiles with errors will go to the processor on the left, and all the
flowfiles without errors will be transferred to the connection marked in green.
comms.failure
A flowfile goes to this relationship when it could not be fetched from the remote server due
to a communications failure.
not.found
Any flowfile for which we receive a ‘Not Found’ message from the remote server will move
to the not.found relationship.
permission.denied
When NiFi is unable to fetch a flowfile from the remote server due to insufficient permission,
the flowfile will move through this relationship.
8. Apache NiFi - FlowFile
A flowfile is a basic processing entity in Apache NiFi. It contains data contents and
attributes, which are used by NiFi processors to process data. The file content normally
contains the data fetched from source systems. The most common attributes of an Apache
NiFi FlowFile are:
UUID
This stands for Universally Unique Identifier, which is a unique identity of a flowfile
generated by NiFi.
Filename
This attribute contains the filename of that flowfile and it should not contain any directory
structure.
File Size
It contains the size of an Apache NiFi FlowFile.
mime.type
It specifies the MIME Type of this FlowFile.
path
This attribute contains the relative path of a file to which a flowfile belongs and does not
contain the file name.
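To make the split between content and attributes concrete, here is a small Python sketch of a flowfile-like object. It is a conceptual model only: the class name FlowFileSketch and its defaults (using the UUID as the default filename and "./" as the default path) are assumptions for illustration, not NiFi's implementation.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FlowFileSketch:
    """A flowfile modelled as raw content plus a dictionary of attributes."""

    content: bytes
    attributes: dict = field(default_factory=dict)

    def __post_init__(self):
        # Fill in the common attributes when the caller did not supply them.
        self.attributes.setdefault("uuid", str(uuid.uuid4()))
        self.attributes.setdefault("filename", self.attributes["uuid"])
        self.attributes.setdefault("path", "./")

    @property
    def file_size(self):
        """Size of the content in bytes."""
        return len(self.content)
```

For example, FlowFileSketch(b"hello", {"filename": "a.txt", "mime.type": "text/plain"}) carries five bytes of content, the two given attributes, and a generated uuid and path.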
9. Apache NiFi - Queues
An Apache NiFi data flow connection has a queuing system to handle large amounts
of incoming data. These queues can hold a very large number of FlowFiles so that the
processor can process them serially.
The queue in the above image has 1 flowfile transferred through the success relationship. A
user can check the flowfile by selecting the List queue option in the drop-down list. In
case of any overload or error, a user can also clear the queue by selecting the empty
queue option, and then restart the flow to get those files into the data flow again.
The list of flowfiles in a queue consists of Position, UUID, Filename, File Size, Queue
Duration, and Lineage Duration. A user can see all the attributes and the content of a
flowfile by clicking the info icon in the first column of the flowfile list.
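The listing and emptying operations described above can be sketched with a simple in-memory queue. This is a conceptual Python model, not NiFi code. The class name ConnectionQueueSketch and its field names are assumptions, and lineage duration is left out for brevity.

```python
import time
from collections import deque


class ConnectionQueueSketch:
    """First-in, first-out queue of flowfile-like entries on a connection."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, uuid, filename, content, now=None):
        """Add an entry, remembering when it was queued."""
        queued_at = time.time() if now is None else now
        self._items.append((uuid, filename, content, queued_at))

    def listing(self, now=None):
        """Return one row per queued entry, similar to the List queue view."""
        current = time.time() if now is None else now
        return [
            {
                "position": position,
                "uuid": uuid,
                "filename": filename,
                "file_size": len(content),
                "queued_seconds": current - queued_at,
            }
            for position, (uuid, filename, content, queued_at)
            in enumerate(self._items, start=1)
        ]

    def empty(self):
        """Drop every queued entry and return how many were dropped."""
        dropped = len(self._items)
        self._items.clear()
        return dropped
```

The optional now argument exists only so that queue durations can be computed against a fixed clock when trying the sketch out.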
10. Apache NiFi - Process Groups
In Apache NiFi, a user can maintain different data flows in different process groups. These
groups can be based on the different projects or organizations that the Apache NiFi
instance supports.
The fourth symbol in the menu at the top of the NiFi UI, as shown in the above picture, is
used to add a process group to the NiFi canvas. The process group named
"Tutorialspoint.com_ProcessGroup" contains a data flow with four processors that are
currently stopped, as you can see in the above picture. Process groups can be created in a
hierarchical manner to give the data flows a better structure, which is easier to
understand.
In the footer of the NiFi UI, you can see the process groups and can go back to the top of
the process group you are currently in.
To see the full list of process groups present in NiFi, a user can go to the summary by
using the menu at the top left of the NiFi UI. In the summary, there is a process
groups tab where all the process groups are listed with parameters like Version State,
Transferred/Size, In/Size, Read/Write, Out/Size, etc., as shown in the picture below.
11. Apache NiFi - Labels
Apache NiFi offers labels to enable a developer to write information about the components
present on the NiFi canvas. The leftmost icon in the top menu of the NiFi UI is used to add
a label to the NiFi canvas.
A developer can change the color of the label and the size of the text by right-clicking
the label and choosing the appropriate option from the menu.
12. Apache NiFi - Configuration
Apache NiFi is a highly configurable platform. The nifi.properties file in the conf directory
contains most of the configuration.
Core properties
This section contains the properties, which are compulsory to run a NiFi instance.
State Management
These properties are used to store the state of the components, which helps them resume
processing where they left off, both after a restart and on the next scheduled run.
FlowFile Repository
Let us now look into the important details of the FlowFile repository:
then change to "org.apache.nifi.controller.repository.VolatileFlowFileRepository".
13. Apache NiFi - Administration
Apache NiFi offers support for multiple tools like Ambari and ZooKeeper for administration
purposes. NiFi also provides configuration in the nifi.properties file to set up HTTPS and
other things for administrators.
zookeeper
NiFi itself does not handle the voting process in a cluster. This means that when a cluster
is created, all the nodes are candidates for the primary node and coordinator roles. So,
ZooKeeper is configured to manage the election of the primary node and the coordinator.
The nifi.properties file contains some properties to set up ZooKeeper.
Enable HTTPS
To use NiFi over HTTPS, administrators have to generate a keystore and a truststore and set
some properties in the nifi.properties file. The TLS toolkit can be used to generate all the
necessary keys to enable HTTPS in Apache NiFi.
14. Apache NiFi - Creating Flows
Apache NiFi offers a large number of components to help developers create data flows
for any type of protocol or data source. To create a flow, a developer drags the
components from the menu bar to the canvas and connects them by clicking and dragging
the mouse from one component to another.
Generally, a NiFi flow has a listener component at the start of the flow, like GetFile, which
gets the data from the source system. At the other end, there is a transmitter component
like PutFile, and in between there are components that process the data.
For example, let us create a flow that takes an empty file from one directory, adds some
text to that file, and puts it in another directory.
● To begin with, drag the processor icon to the NiFi canvas and select the GetFile
processor from the list.
● Right-click on the processor and select configure. In the properties tab, add the Input
Directory (c:\inputdir), click apply, and go back to the canvas.
● Drag the processor icon to the canvas and select the ReplaceText processor from
the list.
● Right-click on the processor and select configure. In the properties tab, add some
text like "Hello tutorialspoint.com" in the textbox of Replacement Value and click
apply.
● Go to the settings tab, check the failure checkbox at the right-hand side, and then go
back to the canvas.
● Drag the processor icon to the canvas and select the PutFile processor from the
list.
● Right-click on the processor and select configure. In the properties tab, add the
Directory (c:\outputdir), click apply, and go back to the canvas.
● Go to the settings tab, check the failure and success checkboxes at the right-hand side,
and then go back to the canvas.
● Now start the flow and add an empty file to the input directory. You will see that it
moves to the output directory and the text is added to the file.
By following the above steps, developers can choose any processor and other NiFi
component to create suitable flow for their organisation or client.
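The effect of the example flow above can be imitated in plain Python, which may help when checking what the flow is expected to do. The sketch below does not use NiFi. The function name run_flow_once is hypothetical, and it assumes that the fetching step removes the file from the input directory and that the replacement step overwrites the whole content with the replacement text.

```python
from pathlib import Path


def run_flow_once(input_dir, output_dir, replacement_text):
    """Move every file from input_dir to output_dir, replacing its content.

    Returns the names of the files that were moved, in sorted order.
    """
    source = Path(input_dir)
    target = Path(output_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(source.iterdir()):
        if not path.is_file():
            continue
        # Fetch step: read the original content (unused here, since the
        # replacement step overwrites everything).
        path.read_bytes()
        # Replace step followed by the store step: write the new content
        # to a file with the same name in the output directory.
        (target / path.name).write_text(replacement_text, encoding="utf-8")
        # Remove the source only after the output has been written.
        path.unlink()
        moved.append(path.name)
    return moved
```

For instance, run_flow_once(r"c:\inputdir", r"c:\outputdir", "Hello tutorialspoint.com") would process whatever files are currently in the input directory once, whereas the NiFi flow keeps running on its schedule.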
15. Apache NiFi - Templates
Apache NiFi offers the concept of Templates, which makes it easier to reuse and distribute
the NiFi flows. The flows can be used by other developers or in other NiFi clusters. It also
helps NiFi developers to share their work in repositories like GitHub.
Create Template
Let us create a template for the flow that we created in chapter 14, "Apache NiFi -
Creating Flows".
Select all the components of the flow using the shift key and then click on the create
template icon at the left-hand side of the NiFi canvas. You can also see a tool box as shown
in the above image. Click on the create template icon, marked in blue in the above picture.
Enter the name for the template. A developer can also add a description, which is optional.
Download Template
Then go to the NiFi templates option in the menu at the top right-hand corner of the
NiFi UI, as shown in the picture below.
Now click the download icon (at the right-hand side in the list) of the template
you want to download. An XML file with the template name will be downloaded.
Upload Template
To use a template in NiFi, a developer has to upload its XML file to NiFi using the UI.
There is an Upload Template icon (marked in blue in the image below) beside the Create
Template icon. Click on it and browse to the XML file.
Add Template
In the top toolbar of NiFi UI, the template icon is before the label icon. The icon is marked
in blue as shown in the picture below.
Drag the template icon and choose the template from the drop down list and click add. It
will add the template to NiFi canvas.
16. Apache NiFi - API
NiFi offers a large number of APIs, which help developers make changes to NiFi and get
information about it from any other tool or custom developed application. In this tutorial,
we will use the Postman app in Google Chrome to explain some examples.
To add Postman to Google Chrome, go to the URL below and click the Add to
Chrome button. You will then see a new app added to your Google Chrome.
https://chrome.google.com/webstore/detail/postman/fhbjgbiflinjbdggehcddcbncdddomop?hl=en
The current version of the NiFi REST API is 1.8.0, and the documentation is available at the
URL below.
https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
Let us now consider an example and run it in Postman to get the details about the running
NiFi instance.
Request
GET http://localhost:8080/nifi-api/flow/about
Response
{
"about": {
"title": "NiFi",
"version": "1.7.1",
"uri": "http://localhost:8080/nifi-api/",
"contentViewerUrl": "../nifi-content-viewer/",
"timezone": "SGT",
"buildTag": "nifi-1.7.1-RC1",
"buildTimestamp": "07/12/2018 12:54:43 SGT"
}
}
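The same request can be made from a script instead of Postman. The following Python sketch uses only the standard library and assumes an unsecured NiFi instance on localhost:8080, as in the request above. The function names fetch_about and summarize_about are hypothetical.

```python
import json
import urllib.request


def fetch_about(base_url="http://localhost:8080/nifi-api"):
    """GET <base_url>/flow/about and return the decoded JSON payload."""
    with urllib.request.urlopen(base_url + "/flow/about", timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_about(payload):
    """Build a one-line summary from an /flow/about response payload."""
    about = payload["about"]
    return "{} {} ({})".format(about["title"], about["version"], about["buildTag"])
```

With a running instance, summarize_about(fetch_about()) would produce a short line built from the title, version, and buildTag fields shown in the response above.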
17. Apache NiFi - Data Provenance
Apache NiFi logs and stores all information about the events that occur on the ingested
data in the flow. The data provenance repository stores this information and provides a UI
to search it. Data provenance can be accessed at the level of the whole NiFi instance and
also at the processor level.
The NiFi Data Provenance event list has the following fields:
7 Show lineage The last column has the show lineage icon, which is used to see
the flowfile lineage as shown in the image below.
To get more information about the event, a user can click on the information icon present
in the first column of the NiFi Data Provenance UI.
There are some properties in nifi.properties file, which are used to manage NiFi Data
Provenance repository.
18. Apache NiFi - Monitoring
In Apache NiFi, there are multiple ways to monitor the different statistics of the system
like errors, memory usage, CPU usage, Data Flow statistics, etc. We will discuss the most
popular ones in this tutorial.
Inbuilt Monitoring
In this section, we will learn more about inbuilt monitoring in Apache NiFi.
Bulletin Board
The bulletin board shows the latest ERROR and WARNING messages generated by NiFi
processors in real time. To access the bulletin board, a user has to go to the right-hand
drop-down menu and select the Bulletin Board option. It refreshes automatically, and a
user can also disable the refresh. A user can navigate to the actual processor by
double-clicking the error. A user can also filter the bulletins by the following:
by message
by name
by id
by group id
Data provenance UI
To monitor the events occurring on any specific processor or throughout NiFi, a user can
access Data provenance from the same menu as the bulletin board. A user can also
filter the events in the data provenance repository by the following fields:
by component name
by component type
by type
NiFi Summary UI
The Apache NiFi summary can also be accessed from the same menu as the bulletin board.
This UI contains information about all the components of that particular NiFi instance or
cluster. They can be filtered by name, by type, or by URI. There are different tabs for
different component types. The following components can be monitored in the
NiFi summary UI:
Processors
Input ports
Output ports
Remote process groups
Connections
Process groups
In this UI, there is a link at the bottom right-hand side named system diagnostics to check
the JVM statistics.
Reporting Tasks
Apache NiFi provides multiple reporting tasks to support external monitoring systems like
Ambari, Grafana, etc. A developer can create a custom reporting task or configure the
inbuilt ones to send the metrics of NiFi to external monitoring systems. The following
table lists the reporting tasks offered by NiFi 1.7.1.
NiFi API
There is an API named system diagnostics, which can be used to monitor NiFi statistics from
any custom developed application. Let us check the API in Postman.
Request
GET http://localhost:8080/nifi-api/system-diagnostics
Response
{
"systemDiagnostics": {
"aggregateSnapshot": {
"totalNonHeap": "183.89 MB",
"totalNonHeapBytes": 192819200,
"usedNonHeap": "173.47 MB",
"usedNonHeapBytes": 181894560,
"freeNonHeap": "10.42 MB",
"freeNonHeapBytes": 10924640,
"maxNonHeap": "-1 bytes",
"maxNonHeapBytes": -1,
"totalHeap": "512 MB",
"totalHeapBytes": 536870912,
"usedHeap": "273.37 MB",
"usedHeapBytes": 286652264,
"freeHeap": "238.63 MB",
"freeHeapBytes": 250218648,
"maxHeap": "512 MB",
"maxHeapBytes": 536870912,
"heapUtilization": "53.0%",
"availableProcessors": 4,
"processorLoadAverage": -1,
"totalThreads": 71,
"daemonThreads": 31,
"uptime": "17:30:35.277",
"flowFileRepositoryStorageUsage": {
"freeSpace": "286.93 GB",
"totalSpace": "464.78 GB",
"usedSpace": "177.85 GB",
"freeSpaceBytes": 308090789888,
"totalSpaceBytes": 499057160192,
"usedSpaceBytes": 190966370304,
"utilization": "38.0%"
},
"contentRepositoryStorageUsage": [
{
"identifier": "default",
"freeSpace": "286.93 GB",
"totalSpace": "464.78 GB",
"usedSpace": "177.85 GB",
"freeSpaceBytes": 308090789888,
"totalSpaceBytes": 499057160192,
"usedSpaceBytes": 190966370304,
"utilization": "38.0%"
}
],
"provenanceRepositoryStorageUsage": [
{
"identifier": "default",
"freeSpace": "286.93 GB",
"totalSpace": "464.78 GB",
"usedSpace": "177.85 GB",
"freeSpaceBytes": 308090789888,
"totalSpaceBytes": 499057160192,
"usedSpaceBytes": 190966370304,
"utilization": "38.0%"
}
],
"garbageCollection": [
{
"name": "G1 Young Generation",
"collectionCount": 344,
"collectionTime": "00:00:06.239",
"collectionMillis": 6239
},
{
"name": "G1 Old Generation",
"collectionCount": 0,
"collectionTime": "00:00:00.000",
"collectionMillis": 0
}
],
"statsLastRefreshed": "09:30:20 SGT",
"versionInfo": {
"niFiVersion": "1.7.1",
"javaVendor": "Oracle Corporation",
"javaVersion": "1.8.0_151",
"osName": "Windows 7",
"osVersion": "6.1",
"osArchitecture": "amd64",
"buildTag": "nifi-1.7.1-RC1",
"buildTimestamp": "07/12/2018 12:54:43 SGT"
}
}
}
}
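A custom monitoring script would typically pull a few numbers out of this response rather than read it whole. The Python sketch below works on an already decoded response dictionary and relies only on field names that appear in the response above. The function names are hypothetical, and the percentages are computed from the byte counts rather than taken from the formatted strings.

```python
def heap_utilization_percent(diagnostics):
    """Used heap as a percentage of the maximum heap."""
    snapshot = diagnostics["systemDiagnostics"]["aggregateSnapshot"]
    return 100.0 * snapshot["usedHeapBytes"] / snapshot["maxHeapBytes"]


def storage_utilization(diagnostics):
    """Map 'repository:identifier' to the used disk space in percent."""
    snapshot = diagnostics["systemDiagnostics"]["aggregateSnapshot"]
    usages = {
        # The flowfile repository is a single object; the others are lists.
        "flowfile": [snapshot["flowFileRepositoryStorageUsage"]],
        "content": snapshot["contentRepositoryStorageUsage"],
        "provenance": snapshot["provenanceRepositoryStorageUsage"],
    }
    result = {}
    for name, entries in usages.items():
        for entry in entries:
            label = name + ":" + entry.get("identifier", "default")
            result[label] = 100.0 * entry["usedSpaceBytes"] / entry["totalSpaceBytes"]
    return result
```

A script could compare these values against thresholds of its own choosing and raise an alert when a repository disk or the heap is close to full.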
19. Apache NiFi - Upgrade
Before starting the upgrade of Apache NiFi, read the release notes to learn about the
changes and additions. A user needs to evaluate the impact of these additions and changes
on the current NiFi installation. Below is the link to the release notes for new
releases of Apache NiFi.
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes
In a cluster setup, a user needs to upgrade NiFi installation of every Node in a cluster.
Follow the steps given below to upgrade the Apache NiFi.
• Back up all the custom NARs present in the lib folder, or any other folder, of your
  current NiFi installation.
• Download the new version of Apache NiFi. The source and binaries of the latest NiFi
  version are available at the link below.
  https://nifi.apache.org/download.html
• Create a new directory in the same directory that holds the current NiFi installation
  and extract the new version of Apache NiFi into it.
• Stop NiFi gracefully. First stop all the processors and let all the FlowFiles present
  in the flow get processed. Once no FlowFiles remain, stop NiFi.
• Copy the configuration in authorizers.xml from the current NiFi installation to the new
  version.
• Add any custom logging configuration from logback.xml to the new NiFi installation.
• Update the properties in nifi.properties of the new NiFi installation with the values
  from the current version.
• Make sure that the group and user of the new version are the same as those of the
  current version, to avoid permission-denied errors.
• Copy the contents of the following from the current NiFi installation to the same
  locations in the new version.
  o ./conf/flow.xml.gz
  o For the provenance and content repositories, change the values in the nifi.properties
    file to point to the current repositories.
• Recheck all the changes performed and check whether they are affected by any changes
  introduced in the new NiFi version. If there is any impact, look for a solution.
• Start all the NiFi nodes and verify that all the flows are working correctly, the
  repositories are storing data, and the UI is retrieving it without any errors.
• Monitor bulletins for some time to check for any new errors.
• If the new version is working correctly, the current version can be archived
  and deleted from the directories.
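The step of carrying property values over to the new installation can be supported by a small comparison script. The following is a minimal sketch, assuming both nifi.properties files have been read into strings. The function names are illustrative, the sample values are made up, and nifi.example.new.setting is a hypothetical key standing in for a property that exists only in the new version.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def diff_properties(current, new):
    """Return changed values, keys only in current, and keys only in new."""
    shared = current.keys() & new.keys()
    changed = {k: (current[k], new[k]) for k in shared if current[k] != new[k]}
    only_current = sorted(current.keys() - new.keys())
    only_new = sorted(new.keys() - current.keys())
    return changed, only_current, only_new


# Made-up sample contents; real files are much longer.
CURRENT_PROPS = """
# current installation
nifi.web.http.port=8080
nifi.content.repository.directory.default=/data/nifi/content_repository
nifi.provenance.repository.directory.default=/data/nifi/provenance_repository
"""

NEW_PROPS = """
# freshly extracted installation (defaults)
nifi.web.http.port=8080
nifi.content.repository.directory.default=./content_repository
nifi.provenance.repository.directory.default=./provenance_repository
nifi.example.new.setting=true
"""

changed, only_current, only_new = diff_properties(
    parse_properties(CURRENT_PROPS), parse_properties(NEW_PROPS))
for key in sorted(changed):
    print(f"{key}: current={changed[key][0]} new={changed[key][1]}")
print("only in current:", only_current)
print("only in new:", only_new)
```

Keys reported as changed are candidates to carry over from the current version, while keys that exist only in the new file should be reviewed against the release notes rather than overwritten.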
20. Apache NiFi — Remote Process Group
The Apache NiFi Remote Process Group, or RPG, enables a flow to direct its FlowFiles to
different NiFi instances using the Site-to-Site protocol. As of version 1.7.1, NiFi does not
offer balanced relationships, so an RPG is used for load balancing in a NiFi data flow.
A developer can add an RPG from the top toolbar of the NiFi UI by dragging the icon, as shown
in the picture above, onto the canvas. To configure an RPG, a developer has to fill in the
following fields:
No.  Field Name                  Description
4    HTTP Proxy Server Hostname  To specify the proxy server’s hostname for the purpose of transport in RPG.
5    HTTP Proxy Server Port      To specify the proxy server’s port for the purpose of transport in RPG.
6    HTTP Proxy User             It is an optional field to specify the username for HTTP proxy.
A developer needs to enable the RPG before using it, just as processors are started before they are used.
21. Apache NiFi — Controller Settings
Apache NiFi offers shared services, called controller settings, which can be used by
processors and reporting tasks. An example is a database connection pool, which can be
shared by processors accessing the same database.
To access the controller settings, use the drop-down menu at the top right corner of the NiFi
UI, as shown in the image below.
Apache NiFi offers many controller settings. We will discuss a commonly used one and how
to set it up in NiFi.
DBCPConnectionPool
Click the plus sign on the NiFi Settings page after choosing the Controller Settings option.
Then select DBCPConnectionPool from the list of controller settings.
DBCPConnectionPool is added to the main NiFi settings page, as shown in the image
below.
Type
Bundle
State
Scope
Configure and delete icon
Click the configure icon and fill in the required fields. The fields are listed in the
table below:
No.  Field Name     Default Value  Description
3    Max Wait Time  500 millis     To specify the maximum time the pool waits for a database connection to become available.
To stop or configure a controller setting, all the NiFi components attached to it must be
stopped first. NiFi also assigns a scope to each controller setting to manage its configuration.
Therefore, a change affects only the components that share that controller setting, and
components that do not use it are not impacted.
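To illustrate what a wait limit such as Max Wait Time means for processors that share one pool, the following is a toy sketch and not NiFi's or DBCP's implementation. It assumes the setting bounds how long a caller waits for a free connection. The class SimplePool and its placeholder "connections" are invented for this example.

```python
import queue


class SimplePool:
    """Toy fixed-size pool; the 'connections' are plain placeholder strings."""

    def __init__(self, size, max_wait_seconds):
        self._idle = queue.Queue()
        self._max_wait = max_wait_seconds
        for i in range(size):
            self._idle.put(f"connection-{i}")

    def acquire(self):
        # Block until a connection is returned, or give up after the max wait.
        try:
            return self._idle.get(timeout=self._max_wait)
        except queue.Empty:
            raise TimeoutError("no connection available within max wait time") from None

    def release(self, conn):
        self._idle.put(conn)


pool = SimplePool(size=1, max_wait_seconds=0.5)  # 0.5 s mirrors "500 millis"
first = pool.acquire()
try:
    pool.acquire()          # the pool is empty, so this waits and then fails
    timed_out = False
except TimeoutError:
    timed_out = True
pool.release(first)
second = pool.acquire()     # succeeds immediately after the release
print(timed_out, second)
```

The second caller fails only because the single connection was still in use. Once it is released, the next request is served without waiting.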
22. Apache NiFi — Reporting Task
Apache NiFi reporting tasks are similar to controller services: they run in the
background and send or log the statistics of a NiFi instance. Reporting tasks can be
accessed from the same page as the controller settings, but in a different tab.
To add a reporting task, a developer needs to click the plus button at the top
right of the reporting tasks page. Reporting tasks are mainly used for
monitoring the activities of a NiFi instance, through either the bulletins or the provenance.
Most of these reporting tasks use Site-to-Site to transport the NiFi statistics data to another
node or an external system.
MonitorMemory
This reporting task generates bulletins when the usage of a memory pool crosses a specified
percentage. Follow these steps to configure the MonitorMemory reporting task:
• Click the plus sign and search for MonitorMemory in the list.
• Once it is added to the main reporting tasks page, click the configure icon.
• In the Properties tab, select the memory pool that you want to monitor.
• Select the percentage after which you want bulletins to alert the users.
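The threshold check behind such a bulletin can be sketched as follows. This is an illustration of the idea only, not the code of the MonitorMemory task. The function name memory_bulletin, the byte counts and the message wording are assumptions, and "G1 Old Gen" is used merely as an example pool name.

```python
def memory_bulletin(pool_name, used_bytes, max_bytes, threshold_percent):
    """Return a warning message if usage exceeds the threshold, else None."""
    percent_used = 100.0 * used_bytes / max_bytes
    if percent_used > threshold_percent:
        return (f"Memory pool '{pool_name}' is at {percent_used:.1f}% "
                f"(threshold {threshold_percent}%)")
    return None


# Made-up figures: 450 of 500 bytes used is 90% of the pool.
alert = memory_bulletin("G1 Old Gen", 450, 500, 65)
quiet = memory_bulletin("G1 Old Gen", 450, 500, 95)
print(alert)
print(quiet)
```

A lower threshold produces alerts earlier, so the percentage chosen in the last step decides how much headroom remains when users are notified.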
23. Apache NiFi — Custom Processor
Apache NiFi is an open source platform and lets developers add their own
custom processors to the NiFi library. Follow these steps to create a custom processor.
https://maven.apache.org/download.cgi
• Add an environment variable named M2_HOME and set its value to the installation
  directory of Maven.
https://www.eclipse.org/downloads/download.php
o nifi-<artifactBaseName>-processors
o nifi-<artifactBaseName>-nar
• Then select “Existing Projects into workspace” and add the project from the
  nifi-<artifactBaseName>-processors directory in Eclipse.
• Then package the code into a NAR file by running the command mentioned below.
• Copy the NAR file to the lib folder of Apache NiFi and restart NiFi.
• After NiFi restarts successfully, check the processor list for the new custom
  processor.
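The copy step above can be scripted. The following is a minimal sketch, assuming the build places each NAR file in a target directory under its module (the usual Maven layout). The function deploy_nars, the nifi-demo-nar module name and the directory paths are hypothetical, and the demonstration runs against throwaway directories rather than a real NiFi installation.

```python
import shutil
import tempfile
from pathlib import Path


def deploy_nars(project_dir, nifi_lib_dir):
    """Copy every *.nar found under any target/ directory into NiFi's lib folder."""
    lib = Path(nifi_lib_dir)
    copied = []
    for nar in sorted(Path(project_dir).glob("**/target/*.nar")):
        shutil.copy2(nar, lib / nar.name)
        copied.append(nar.name)
    return copied


# Demonstration with temporary directories and a placeholder NAR file.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    lib = Path(tmp) / "nifi" / "lib"
    (project / "nifi-demo-nar" / "target").mkdir(parents=True)
    lib.mkdir(parents=True)
    (project / "nifi-demo-nar" / "target" / "nifi-demo-nar-1.0.nar").write_bytes(b"placeholder")
    copied = deploy_nars(project, lib)
    deployed = sorted(p.name for p in lib.iterdir())
    print(copied, deployed)
```

NiFi still has to be restarted after the copy, as described in the steps.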
24. Apache NiFi — Custom Controllers Service
Apache NiFi is an open source platform and lets developers add their own
custom controller services to Apache NiFi. The steps and tools are almost the same as
those used to create a custom processor.
o nifi-<artifactBaseName>
o nifi-<artifactBaseName>-nar
o nifi-<artifactBaseName>-api
o nifi-<artifactBaseName>-api-nar
• Then select “Existing Projects into workspace” and add the projects from the
  nifi-<artifactBaseName> and nifi-<artifactBaseName>-api directories in Eclipse.
• Then package the code into NAR files by running the command mentioned below.
• Copy these NAR files to the lib folder of Apache NiFi and restart NiFi.
• After NiFi restarts successfully, check the controller services list for the new custom
  controller service.
25. Apache NiFi — Logging
Apache NiFi uses the Logback library to handle its logging. The logback.xml file in the
conf directory of NiFi is used to configure logging. The logs are
generated in the logs folder of NiFi, and the log files are described below.
nifi-app.log
This is the main log file of NiFi. It logs all the activities of the Apache NiFi application,
ranging from the loading of NAR files to the runtime errors or bulletins encountered by NiFi
components. Below is the default appender in the logback.xml file for the nifi-app.log file.
<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
        <maxFileSize>100MB</maxFileSize>
        <maxHistory>30</maxHistory>
    </rollingPolicy>
    <immediateFlush>true</immediateFlush>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>
The appender name is APP_FILE, and the class is RollingFileAppender, which means the
logger rolls the log file over according to the configured rolling policy. By default, the
maximum file size is 100 MB and can be changed to the required size. The maximum retention
(maxHistory) for APP_FILE is 30 rollover periods, which are hours with this file name
pattern, and can be changed as per the user requirement.
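The rolling settings described above can be read back from an appender definition with a short script, for example to review them before changing the values. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name rolling_settings is illustrative, and the XML string is a trimmed copy of the APP_FILE appender rather than a full logback.xml file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the APP_FILE appender shown above.
APPENDER_XML = """
<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
        <maxFileSize>100MB</maxFileSize>
        <maxHistory>30</maxHistory>
    </rollingPolicy>
</appender>
"""


def rolling_settings(appender_xml):
    """Extract the appender name and its rolling policy settings."""
    appender = ET.fromstring(appender_xml)
    policy = appender.find("rollingPolicy")
    return {
        "name": appender.get("name"),
        # Keep only the simple class name of the rolling policy.
        "policy": policy.get("class").rsplit(".", 1)[-1],
        # maxFileSize is absent (None) for purely time-based policies.
        "maxFileSize": policy.findtext("maxFileSize"),
        "maxHistory": int(policy.findtext("maxHistory")),
    }


settings = rolling_settings(APPENDER_XML)
print(settings)
```

The same function can be applied to the other appenders in logback.xml, where a missing maxFileSize element is reported as None.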
nifi-user.log
This log contains user events such as web security, web API configuration, user authorization, etc.
Below is the appender for nifi-user.log in the logback.xml file.
<appender name="USER_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-user.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-user_%d.log</fileNamePattern>
        <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>
The appender name is USER_FILE. It follows a time-based rollover policy. The maximum retention
for USER_FILE is 30 log files. Below are the default loggers for the USER_FILE appender,
present in the logback.xml file.
nifi-bootstrap.log
This log contains the bootstrap logs, Apache NiFi’s standard output (everything written to
System.out in the code, mainly for debugging), and standard error (everything written to
System.err in the code). Below is the default appender for nifi-bootstrap.log in logback.xml.
<appender name="BOOTSTRAP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-bootstrap.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-bootstrap_%d.log</fileNamePattern>
        <maxHistory>5</maxHistory>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>