Do you need a personalized offer for your team? Contact us at academy@esolutions.ro or call 0753.029.187

Streams Processing in Big Data Architectures

COURSE FEATURES
Duration: 3 Days
Skill level: Beginner
Language: RO / EN
Type: In-class / Online-live
Assessments: Yes
Price for custom training: 1800 EUR / Day
Price for open class course: 850 EUR / Participant

Stream processing is the real-time or near-real-time processing of data in motion: continuous computation over unbounded streams of records (usually small) that many data sources generate continuously. Examples of such streams include sensor events, website user activity and credit card transactions.

Stream processing frameworks are distributed systems that process continuous streams of data as they arrive. They manage the state of their operations and can recover when one of the machines in the cluster fails. This course covers two stream processing solutions in depth: Apache Kafka (KSQL and the Kafka Streams API) and Apache Spark Structured Streaming. It also covers the fundamentals of streams using Apache Kafka, including partitioning and how data is consumed from a stream.

In this course you will learn about:

  • Stream concepts
  • Stream processing vs. batch processing
  • Relevant use cases and the architectures used for them
  • The most important components and features of a stream processing solution or architecture
  • The best-known stream processing solutions and how they differ
  • Kafka streaming (KSQL and the Kafka Streams API)
  • Spark Structured Streaming
  • How to build a solution based on KSQL and Spark Structured Streaming

This course is designed for solution architects, data engineers, machine learning engineers, software engineers and data scientists with basic knowledge of scalable data processing technologies such as Hadoop and MapReduce.

This is an introductory-level course.

Prerequisites: a good understanding of distributed IT systems, familiarity with Hadoop concepts (mainly HDFS) and basic Scala. Knowledge of Apache Spark is a plus.

Day 1: 12 topics
Fundamentals of streams with Apache Kafka & KSQL
Learn the fundamentals of data streams, including how to work with the Apache Kafka ecosystem, data schemas, Apache Avro, Kafka Connect, the REST Proxy and KSQL
How streams are stored in distributed systems: storing a stream in a distributed manner lets the computations that filter and transform it scale out across a cluster
Partitioning strategies for streaming topologies
The concepts of order and time in streams
How data is consumed from a stream, and what parallel processing of stream data means
Stream processing using KSQL
Working with streams & tables
Understanding the concept of time in stream processing
Processing late events
Windows/aggregates
State handling – how the KSQL architecture handles state and failures
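The Day 1 topics on partitioning, ordering and parallel consumption rest on one idea. A record's key decides its partition, so all records with the same key land in the same partition and keep their relative order there. Different partitions can then be consumed in parallel. The Python sketch below illustrates this under stated assumptions. It uses `zlib.crc32` modulo the partition count as a stand-in hash, whereas Kafka's default partitioner hashes the serialized key with murmur2. The helper names `partition_for` and `route` are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition number.

    Assumption: CRC32 is a stand-in hash; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(records, num_partitions):
    """Append each (key, value) record to its partition's log in arrival order."""
    logs = defaultdict(list)
    for key, value in records:
        logs[partition_for(key, num_partitions)].append((key, value))
    return logs


# Hypothetical card-transaction events: (card id, amount), in arrival order.
events = [("card-1", 10), ("card-2", 5), ("card-1", 7), ("card-3", 2), ("card-2", 9)]
logs = route(events, num_partitions=3)
for partition in sorted(logs):
    print(partition, logs[partition])
```

Every key maps to exactly one partition. As a result, the number of partitions also caps how many consumers in a single consumer group can read the topic in parallel.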
Day 2: 11 topics
Streaming architectures/solutions overview (with use cases) – Kafka Streams, Flink, Spark Streaming, Storm, Samza
Spark Structured Streaming processing (using Kafka)
Introduction to Apache Spark: architecture and concepts (RDDs vs. DataFrames/Datasets)
Concepts of Structured Streaming
Streaming DataFrames
Queries on streams
Triggers
Output modes and sinks (console, file, memory, Kafka)
Checkpointing: how Spark saves the state of streaming queries and recovers from failures
Window aggregations
Handling of late data – watermarking
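The Day 2 topics on window aggregations and watermarking can be illustrated with a small, framework-free sketch. It counts events per key in tumbling windows. A watermark (the maximum event time seen so far minus an allowed lateness) decides when a window is closed. Late events for a window that is still open are counted, while events for a closed window are dropped. This is a simplification for illustration only. Spark Structured Streaming configures this with `withWatermark` and advances the watermark between micro-batches rather than after every event. The function name `windowed_counts` is hypothetical.

```python
from collections import defaultdict


def windowed_counts(events, window_size, lateness):
    """Count events per (window start, key) in tumbling windows with a watermark.

    events: (event_time, key) pairs in arrival order, event_time in seconds.
    The watermark is the max event time seen so far minus `lateness`; a window
    [start, start + window_size) is treated as closed once the watermark reaches its end.
    Simplification: the watermark advances per event, not per micro-batch.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None
    for event_time, key in events:
        window_start = event_time - event_time % window_size
        window_end = window_start + window_size
        if max_event_time is not None and window_end <= max_event_time - lateness:
            # The event's window is already closed: too late, drop it.
            dropped.append((event_time, key))
            continue
        counts[(window_start, key)] += 1
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
    return dict(counts), dropped


# 10-second windows, 5 seconds of allowed lateness; several events arrive out of order.
events = [(1, "a"), (4, "b"), (12, "a"), (9, "b"), (17, "a"), (3, "b"), (25, "a"), (8, "a")]
counts, dropped = windowed_counts(events, window_size=10, lateness=5)
print(counts)   # {(0, 'a'): 1, (0, 'b'): 2, (10, 'a'): 2, (20, 'a'): 1}
print(dropped)  # [(3, 'b'), (8, 'a')]
```

Event (9, "b") is late, but its window [0, 10) is still open (the watermark is 7), so it is counted. Events (3, "b") and (8, "a") arrive after the watermark has passed 10, so they are dropped.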
Day 3: 1 topic
End-to-end use case: processing Meetup.com RSVPs with Kafka & Spark

Contact Us

Feel free to leave us your thoughts so we can discover the solution together!

EMAIL

academy@esolutions.ro

Get in touch

0753.029.187

Our address

20 Constantin Budisteanu Street, 1st District, Bucharest

Related Courses

Apache NiFi
Big Data

This course covers the main concepts in Apache NiFi and its potential in automated data flow between systems, providing efficient data ingestion, transformation, and routing.

DURATION
3 Days
Apache Storm
Big Data

This course covers the main concepts in Apache Storm and its potential in use cases such as real-time analytics, continuous computation, and event-driven apps.

DURATION
3 Days
Big Data
Big Data

This course covers the usage and applicability of big data technologies and concepts necessary for architecting and building a big data architecture.

DURATION
4 Days
Cassandra
Big Data

This introductory course covers common use cases for Cassandra, its key features, storage architecture, and more.

DURATION
3 Days