Apache Flink Introduction
By: Ahmed Nader
Agenda
• What’s Apache Flink?
• Deeper into Flink
• Quick Start and Configuration
• Get your hands dirty
• Tips and some useful links
• References
What’s Apache Flink?
• Open source platform for distributed stream and batch processing.
• Large-scale data processing engine.
• A true streaming engine: it does not cut the stream into batches.
• Flink has two APIs: DataStream and DataSet.
DataStream API
• Represents a continuous stream of data of a certain type.
• Operations are applied to each element of the stream or to windows of elements.
Source -> DataStream -> Operation -> DataStream -> Sink
• Example: live stock feed.
Incoming events: Apple 235, Google 516, Microsoft 124
Operations applied to the stream:
  • Alert if Microsoft > 120.
  • Write each event to a database.
  • Sum every 10 seconds, then alert if the sum > 10000.
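The two alert rules in the stock feed example can be sketched in plain Java. This is only an illustration of the logic, not the Flink API: the `Tick` class, the method names, and the timestamps are all assumptions made for this sketch, and the 10-second window is modelled as a simple tumbling bucket keyed by its start time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the stock feed slide. This is NOT Flink code.
final class StockFeedSketch {

    // Hypothetical event type: symbol, price, and arrival time in milliseconds.
    static final class Tick {
        final String symbol;
        final int price;
        final long timestampMs;

        Tick(String symbol, int price, long timestampMs) {
            this.symbol = symbol;
            this.price = price;
            this.timestampMs = timestampMs;
        }
    }

    static final long WINDOW_MS = 10_000L; // "sum every 10 seconds"

    // Per-element rule: alert if Microsoft > 120.
    static boolean isMicrosoftAlert(Tick t) {
        return t.symbol.equals("Microsoft") && t.price > 120;
    }

    // Tumbling windows: every tick lands in exactly one 10-second bucket.
    // Returns a map from window start time to the sum of prices in that window.
    static Map<Long, Integer> sumPerWindow(List<Tick> ticks) {
        Map<Long, Integer> sums = new TreeMap<>();
        for (Tick t : ticks) {
            long windowStart = (t.timestampMs / WINDOW_MS) * WINDOW_MS;
            sums.merge(windowStart, t.price, Integer::sum);
        }
        return sums;
    }

    // Window rule: collect the start times of windows whose sum exceeds the threshold.
    static List<Long> windowsOverThreshold(Map<Long, Integer> sums, int threshold) {
        List<Long> alerts = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : sums.entrySet()) {
            if (e.getValue() > threshold) {
                alerts.add(e.getKey());
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        // Prices come from the slide; the timestamps are made up for the sketch.
        List<Tick> ticks = Arrays.asList(
            new Tick("Apple", 235, 1_000L),
            new Tick("Google", 516, 4_000L),
            new Tick("Microsoft", 124, 12_000L));

        for (Tick t : ticks) {
            if (isMicrosoftAlert(t)) {
                System.out.println("ALERT: " + t.symbol + " at " + t.price);
            }
        }
        Map<Long, Integer> sums = sumPerWindow(ticks);
        System.out.println("Sum per window: " + sums);
        System.out.println("Windows over 10000: " + windowsOverThreshold(sums, 10_000));
    }
}
```

In a real Flink job the same rules would be expressed as operations on a DataStream; the sketch only shows what each rule computes.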
DataSet API
• Used for batch processing.
• Batch is treated as a special case of stream processing: finite data sources are just streams that happen to end.
• Offers a dedicated API with machine learning and graph processing libraries.
Source -> DataSet -> Operation -> DataSet -> Sink
• Example: the Map/Reduce paradigm.
[Figure: a Map step followed by a Reduce step]
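As a rough illustration of the Map/Reduce paradigm, here is a word count written in plain Java. It is not the Flink DataSet API, and the class and method names (`MapReduceSketch`, `mapStep`, `reduceStep`, `wordCount`) are assumptions made for this sketch. The map step emits a (word, 1) pair per word, and the reduce step adds up the pairs that share a word.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of Map/Reduce word counting. This is NOT Flink code.
final class MapReduceSketch {

    // Map step: emit one (word, 1) pair for every word in the line.
    static List<Map.Entry<String, Integer>> mapStep(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: group the pairs by word and sum their counts.
    static Map<String, Integer> reduceStep(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Full pipeline: map every line, then reduce all emitted pairs together.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(mapStep(line));
        }
        return reduceStep(allPairs);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not", "to be");
        System.out.println(wordCount(lines));
    }
}
```

A distributed engine would run the map step on many partitions in parallel and shuffle pairs by key before reducing; the sketch runs both steps in a single thread.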
Flink Stack
Analyzing the Flink stack
• A streaming dataflow runtime that interprets every program as a dataflow graph.
• Libraries built on top of the DataStream and DataSet APIs include:
  • Table: enables SQL-like queries.
  • Gelly: graph processing to transform and traverse graphs in a distributed fashion.
  • ML: offers a few machine learning algorithms, but is still basic.
  • CEP: detects complex events in a data stream, which helps you get hold of what’s really important in your data.
Deeper into Flink
Data Sources
• From an input file
• From a socket
• From a collection
Data Sinks
• Write to a CSV file
• Write to a socket
• Print on the terminal
• Data transformations (for the DataStream API):
  • Map: takes one element and produces one element.
  • FlatMap: takes one element and produces zero or more elements.
  • Filter: evaluates a boolean function for each element and retains those for which it returns true.
  • KeyBy: partitions a stream into disjoint partitions, each containing the elements that share the same key.
  • Window: groups stream events according to some characteristic, e.g. data that arrived in the last 5 seconds.
  • Union, Join, Split, Select…
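The semantics of the first four transformations can be tried out with the standard `java.util.stream` library. This is an analogy on finite lists, not the Flink DataStream API: the helper names below are assumptions for this sketch, and `keyByFirstLetter` assumes every word is non-empty.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java analogy for map, flatMap, filter and keyBy. This is NOT Flink code.
final class TransformationsSketch {

    // Map: one element in, exactly one element out (word -> its length).
    static List<Integer> lengths(List<String> words) {
        return words.stream().map(String::length).collect(Collectors.toList());
    }

    // FlatMap: one element in, zero or more elements out (line -> its words).
    // A blank line contributes zero words.
    static List<String> splitWords(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.trim().split("\\s+"))
                                   .filter(word -> !word.isEmpty()))
            .collect(Collectors.toList());
    }

    // Filter: keep only the elements for which the predicate returns true.
    static List<String> longerThan(List<String> words, int minLength) {
        return words.stream()
            .filter(word -> word.length() > minLength)
            .collect(Collectors.toList());
    }

    // KeyBy analogy: split the elements into disjoint groups that share a key
    // (here the first letter). Assumes no word is empty.
    static Map<Character, List<String>> keyByFirstLetter(List<String> words) {
        return words.stream()
            .collect(Collectors.groupingBy(word -> word.charAt(0),
                                           TreeMap::new,
                                           Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("apple avocado", "", "banana fig");
        List<String> words = splitWords(lines);
        System.out.println("words:   " + words);
        System.out.println("lengths: " + lengths(words));
        System.out.println("long:    " + longerThan(words, 5));
        System.out.println("keyed:   " + keyByFirstLetter(words));
    }
}
```

The main difference from a real stream is that these lists are finite, so grouping can wait for all the data; on an unbounded stream, keyed results are usually combined with a window.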
• Interesting use cases:
  • Processing a Twitter feed; one good application is collecting statistics on that feed.
    see: http://blog.brakmic.com/stream-processing-with-apache-flink/
  • Identifying popular locations where people arrive by taxi, by applying filter and map functions to a DataStream of taxi ride records and then finding the most popular places over, for example, the last 15 minutes.
    see: https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink
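The taxi use case can be sketched as a filter, a map, and a count over a 15-minute window. The code below is plain Java and not the Flink API; the `Ride` class, its fields, and the area names are hypothetical and chosen only to make the steps concrete.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration of the taxi use case. This is NOT Flink code.
final class TaxiSketch {

    // Hypothetical ride record: the area, whether the event is an arrival, and its time.
    static final class Ride {
        final String area;
        final boolean isArrival;
        final long timestampMs;

        Ride(String area, boolean isArrival, long timestampMs) {
            this.area = area;
            this.isArrival = isArrival;
            this.timestampMs = timestampMs;
        }
    }

    static final long WINDOW_MS = 15 * 60 * 1000L; // "the last 15 minutes"

    // Returns the area with the most arrivals in the window ending at nowMs,
    // or an empty Optional if no arrival falls inside the window.
    static Optional<String> mostPopularArea(List<Ride> rides, long nowMs) {
        Map<String, Long> arrivalsPerArea = rides.stream()
            // Filter: keep arrival events only.
            .filter(ride -> ride.isArrival)
            // Filter: keep events inside the last 15 minutes.
            .filter(ride -> ride.timestampMs > nowMs - WINDOW_MS && ride.timestampMs <= nowMs)
            // Map: reduce each ride to the area where it ended.
            .map(ride -> ride.area)
            // Count the arrivals per area.
            .collect(Collectors.groupingBy(area -> area, TreeMap::new, Collectors.counting()));

        return arrivalsPerArea.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        long now = 3_600_000L; // arbitrary "current time" for the sketch
        List<Ride> rides = Arrays.asList(
            new Ride("Airport", true, now - 60_000L),
            new Ride("Airport", true, now - 120_000L),
            new Ride("Downtown", true, now - 180_000L),
            new Ride("Downtown", false, now - 30_000L),   // a ride start, ignored
            new Ride("Station", true, now - 1_200_000L)); // older than 15 minutes
        System.out.println(mostPopularArea(rides, now).orElse("no arrivals"));
    }
}
```

A streaming job would recompute this continuously as new rides arrive rather than scanning a finished list.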
Setup
• Prerequisites:
  • Java 7.x or higher.
  • Maven 3.0.4 or higher.
• Start a new Flink project using Maven by running the following command in the terminal:
mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.0.1
OR
• Add Flink to an existing project:
see: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html
Get your hands dirty:
Execution
• Local / debugging
• Cluster
• Command Line Interface
• Web interface
See: https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.htm
Tips and some useful links:
• Subscribe to the mailing list by sending an empty email to user-subscribe@flink.apache.org.
• Clone the Flink project on GitHub for more examples.
• There’s a free course by DataArtisans.
  see: http://dataartisans.github.io/flink-training/index.html
• Here are some other useful links:
  • http://www.slideshare.net/sbaltagi/stepbystep-introduction-to-apache-flink
  • https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html
  • https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html
References
• http://blog.brakmic.com/stream-processing-with-apache-flink/
• http://www.slideshare.net/sbaltagi/stepbystep-introduction-to-apache-flink
• https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink
• https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html
• http://dataartisans.github.io/flink-training/index.html
• https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html
Thanks!
Any questions?
