Stream processing in Big Data architectures
Stream processing refers to the real-time processing of data in motion: near-real-time or real-time computation over unbounded streams of records (usually small in size), dynamic data continuously generated by multiple sources. Examples of continuous data streams include sensor events, website user activity, and credit card transactions.
Stream processing frameworks are distributed architectures that can process continuous streams of data as they arrive, managing the state of operations and recovering when a machine in the cluster fails. This course covers two existing stream processing solutions in detail, Apache Kafka (KSQL and the Kafka Streams API) and Apache Spark Structured Streaming, as well as the basics of streams using Apache Kafka (partitioning, how data is consumed from a stream, and so on).
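The partitioning idea mentioned above can be sketched without any Kafka installation: a record's key is hashed to pick a partition, so all records with the same key land in the same partition and keep their production order. This is a simplified, hypothetical illustration (real Kafka producers use murmur2 hashing internally; `partition_for` and the sample events are invented for this sketch):

```python
# Illustrative sketch of Kafka-style key partitioning (not Kafka's actual
# hashing algorithm; real producers use murmur2 on the serialized key).
import zlib

NUM_PARTITIONS = 3

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash: every record with the same key maps to the
    # same partition, which preserves per-key ordering.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Simulate a topic with three partitions and a few produced events.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
events = [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]
for key, value in events:
    partitions[partition_for(key)].append((key, value))

# All events for "user-1" sit in one partition, in production order.
p = partition_for("user-1")
print(partitions[p])  # includes ('user-1', 'login') before ('user-1', 'logout')
```

In a consumer group, each partition is assigned to exactly one consumer, which is why per-key ordering survives parallel consumption.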
In this course you will understand:
– Stream concepts
– Stream processing vs batch processing
– Relevant use cases and the architectures used
– The most important components/features of a stream processing solution
– The best-known stream processing solutions and how they work:
– Kafka Streaming (KSQL and Kafka Streams API)
– Spark Structured Streaming
– How to create a solution based on KSQL and Spark Structured Streaming
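To give a concrete flavour of what the topics above cover, here is a minimal, framework-free sketch of a stateful streaming operation: counting events per key over a tumbling window, the kind of computation KSQL and Spark Structured Streaming express declaratively. The window size, event stream, and `window_start` helper are invented for this illustration:

```python
# Framework-free sketch of a stateful streaming computation:
# per-key counts over tumbling windows, maintained one record at a time.
from collections import defaultdict

WINDOW_MS = 10_000  # 10-second tumbling windows (illustrative value)

def window_start(ts_ms: int) -> int:
    # Align a timestamp to the start of its tumbling window.
    return ts_ms - (ts_ms % WINDOW_MS)

# State: (window_start, key) -> running count. Unlike a batch job, this
# state is updated incrementally as each record arrives.
counts = defaultdict(int)

stream = [  # (timestamp_ms, key) records arriving one by one
    (1_000, "page-A"), (4_500, "page-B"), (9_999, "page-A"),
    (12_000, "page-A"),  # falls into the next window
]
for ts, key in stream:
    counts[(window_start(ts), key)] += 1

print(counts[(0, "page-A")])        # 2 events in the first window
print(counts[(10_000, "page-A")])   # 1 event in the second window
```

Real engines add exactly what this sketch omits: fault-tolerant state stores, late-data handling, and distribution of the state across a cluster.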
This course is designed for solution architects, data engineers, machine learning engineers, software engineers and data scientists with basic knowledge of scalable data processing techniques such as Hadoop and MapReduce.
This course is introductory level.
Prerequisites: a good understanding of distributed IT systems; familiarity with Hadoop concepts, mainly HDFS; Apache Spark knowledge is a plus; basic-level Scala.
- Chapters: 24
- Quizzes: 0
- Duration: 3 days
- Knowledge level: any level
- Language: Romanian
- Students: 12
- Assessments: yes