Contact
0753.029.187
academy@esolutions.ro
academy.esolutions.ro
Introduction in Apache Spark: Open course

  • Posted by Natalia Criclivii
  • Categories: Events, News
  • Date: April 30, 2020

This three-day course is for data engineers, analysts, architects, software engineers, IT operations staff and technical managers interested in the overall architecture and components of Apache Spark, and in understanding Spark through exercises, use cases, and its interactions with distributed storage systems (HDFS, NoSQL).

The course covers Apache Spark's main concepts: the core architecture, RDDs/DataFrames/Datasets, transformations and actions, and the DAG; the SQL engine; the streaming engine; and the machine learning libraries. It also highlights how Spark can be used in different use cases such as ETL, analytics and machine learning.

Day I – May 13: Spark Overview

  • A brief history of Spark
  • Where Spark fits in the big data landscape
  • Apache Spark vs. Apache MapReduce: An overall architecture comparison
  • Cluster Architecture: cluster manager, workers, executors; Spark Context; Cluster Manager Types; Deployment scenarios
  • How Spark schedules and executes jobs and tasks
  • Resilient Distributed Datasets: fundamentals and hands-on exercises
  • Ways to create an RDD: parallelizing a collection; reading from an external data source (local drive, HDFS, NoSQL); deriving from an existing RDD
  • Introduction to Transformations and Actions
  • Caching
  • RDD Types
  • How transformations lazily build up a Directed Acyclic Graph (DAG)
  • Shuffling
  • Hands-on: using Spark for ETL
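
Several Day I topics can be seen together in a small word count. The following is a minimal Scala sketch, assuming Spark 3.x with `spark-core` on the classpath and local mode; the object name and the input lines are hypothetical. It creates an RDD from a collection, chains lazy transformations (of which `reduceByKey` triggers a shuffle), caches the result, and only then launches jobs through two actions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RddBasicsSketch {
  // Word count over a parallelized collection; illustrates transformations vs. actions.
  def wordCounts(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    // Create an RDD by parallelizing a local collection (sc.textFile would instead
    // read from a local drive or HDFS).
    val linesRdd: RDD[String] = sc.parallelize(lines)

    // Transformations are lazy: each call only adds a step to the RDD lineage (the DAG).
    val counts: RDD[(String, Int)] = linesRdd
      .flatMap(_.split("\\s+"))           // narrow: one line becomes many words
      .filter(_.nonEmpty)
      .map(word => (word.toLowerCase, 1)) // narrow: stays within its partition
      .reduceByKey(_ + _)                 // wide: records with the same key are shuffled together

    // Cache because the RDD is reused by the two actions below.
    counts.cache()

    // Print the lineage; the shuffle introduced by reduceByKey marks a stage boundary.
    println(counts.toDebugString)

    // Actions trigger jobs, which the scheduler splits into stages and tasks.
    println(s"distinct words: ${counts.count()}")
    counts.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs driver and executor in a single JVM, using all local cores.
    val conf = new SparkConf().setAppName("rdd-basics-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(wordCounts(sc, Seq("Spark builds a DAG", "a DAG of stages")))
    } finally {
      sc.stop()
    }
  }
}
```

Nothing is computed until `count()` is called; thanks to `cache()`, the following `collect()` reuses the materialized partitions instead of recomputing the whole lineage.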

Day II – May 14: Spark SQL & DataFrames/Datasets: Fundamentals and Hands-on Exercises

  • What DataFrames/Datasets are, compared with RDDs
  • The DataFrames/Datasets API
  • Catalyst Optimizer
  • Spark SQL
  • Creating and running DataFrame operations
  • Reading from multiple data sources (hands-on exercises)
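
For the Day II topics, here is a minimal sketch assuming Spark 3.x with `spark-sql` on the classpath; the `Sale` record, the object name, the sample values and the file paths in the comments are hypothetical. It builds a typed Dataset, views it as a DataFrame, and expresses the same aggregation through the DataFrame API and through Spark SQL on a temporary view; `explain(true)` prints the plans the Catalyst optimizer produces.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

// Hypothetical record type for the typed Dataset view.
final case class Sale(region: String, product: String, amount: Double)

object DataFrameSqlSketch {
  // Computes total sales per region, once with the DataFrame API and once with Spark SQL.
  def totalsByRegion(spark: SparkSession, sales: Seq[Sale]): Map[String, Double] = {
    import spark.implicits._

    // A Dataset[Sale] is typed; a DataFrame is simply Dataset[Row].
    val ds: Dataset[Sale] = sales.toDS()
    val df: DataFrame = ds.toDF()

    // DataFrame API: a declarative plan that Catalyst rewrites before execution.
    val viaApi: DataFrame = df.groupBy(col("region")).agg(sum(col("amount")).as("total"))

    // The same query through Spark SQL on a temporary view.
    df.createOrReplaceTempView("sales")
    val viaSql: DataFrame =
      spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

    // Print the parsed, analyzed, optimized and physical plans for the API version.
    viaApi.explain(true)

    // Convert back to typed tuples; tuple fields are mapped to columns by position.
    viaSql.as[(String, Double)].collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dataframe-sql-sketch").master("local[*]").getOrCreate()
    try {
      val sales = Seq(Sale("EU", "laptop", 10.0), Sale("EU", "phone", 5.0), Sale("US", "laptop", 7.5))
      println(totalsByRegion(spark, sales))
      // Other sources use the same reader API, for example
      // spark.read.parquet("/data/sales.parquet") or
      // spark.read.option("header", "true").csv("/data/sales.csv") (hypothetical paths).
    } finally {
      spark.stop()
    }
  }
}
```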

Day III – May 14: Spark Streaming

  • When to use Spark Structured Streaming
  • Structured Streaming: building Spark streams from Kafka topics; windowing and aggregation; registering a streaming DataFrame in memory and querying it with Spark ML
  • Spark MLlib and spark.ml
  • Machine learning examples: collaborative filtering with Alternating Least Squares (ALS); classification and regression
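
The Structured Streaming items can be sketched as follows. This assumes Spark 3.x with the `spark-sql-kafka-0-10` connector and a reachable Kafka broker; the broker address, topic name, table name and object name are hypothetical, and each Kafka record value is simply treated as one word. Events are counted in one-minute tumbling windows, and the running result is registered as an in-memory table that can be queried with SQL.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, window}
import org.apache.spark.sql.streaming.StreamingQuery

object StreamingSketch {
  // Subscribes to a Kafka topic as an unbounded DataFrame.
  // Requires the spark-sql-kafka-0-10 package and a reachable broker.
  def kafkaEvents(spark: SparkSession, bootstrapServers: String, topic: String): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("subscribe", topic)
      .option("startingOffsets", "earliest")
      .load()
      // Kafka values arrive as binary; "timestamp" is one of the source's metadata columns.
      .selectExpr("CAST(value AS STRING) AS word", "timestamp")

  // Counts events per word in tumbling one-minute windows on the event timestamp.
  // The same code also runs on a static DataFrame, which makes it easy to test.
  def countsPerWindow(events: DataFrame): DataFrame =
    events
      .groupBy(window(col("timestamp"), "1 minute"), col("word"))
      .count()

  // Registers the continuously updated result as an in-memory table named tableName.
  def startInMemory(counts: DataFrame, tableName: String): StreamingQuery =
    counts.writeStream
      .outputMode("complete") // re-emit the full aggregate table on every trigger
      .format("memory")
      .queryName(tableName)
      .start()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streaming-sketch").master("local[*]").getOrCreate()
    try {
      // Hypothetical broker address, topic and table name.
      val events = kafkaEvents(spark, "localhost:9092", "clicks")
      val query = startInMemory(countsPerWindow(events), "word_counts")
      // Block until the data currently available in Kafka has been processed (handy for demos).
      query.processAllAvailable()
      spark.sql("SELECT * FROM word_counts").show(truncate = false)
      query.stop()
    } finally {
      spark.stop()
    }
  }
}
```

The memory sink keeps the whole result on the driver, so it suits exploration and exercises rather than production output.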

An end-to-end Spark example: we will build an end-to-end case covering data input, data cleaning, data storage and machine learning. We will work in a cloud environment and use Apache Zeppelin for all the Spark coding and exercises (in Scala).
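
As an illustration of the machine-learning step of such a case, here is a minimal collaborative-filtering sketch with ALS from `spark.ml`. It is written as a standalone Scala application rather than a Zeppelin paragraph and assumes Spark 3.x with `spark-mllib` on the classpath; the ratings, column names, hyperparameters and object name are hypothetical.

```scala
import org.apache.spark.ml.recommendation.{ALS, ALSModel}
import org.apache.spark.sql.{DataFrame, SparkSession}

object AlsSketch {
  // Fits a collaborative-filtering model with Alternating Least Squares.
  // Expects columns userId (Int), itemId (Int) and rating (numeric).
  def fitAls(ratings: DataFrame): ALSModel =
    new ALS()
      .setUserCol("userId")
      .setItemCol("itemId")
      .setRatingCol("rating")
      .setRank(5)                   // number of latent factors
      .setMaxIter(10)
      .setRegParam(0.1)
      .setColdStartStrategy("drop") // drop predictions for users/items unseen in training
      .setSeed(42L)
      .fit(ratings)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("als-sketch").master("local[*]").getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical, already-cleaned ratings; in a real pipeline they would come
      // from the earlier cleaning and storage steps.
      val ratings = Seq(
        (1, 10, 4.0f), (1, 11, 1.0f), (2, 10, 5.0f),
        (2, 12, 3.0f), (3, 11, 2.0f), (3, 12, 4.0f)
      ).toDF("userId", "itemId", "rating")

      val model = fitAls(ratings)
      // Top-2 item recommendations for every user seen during training.
      model.recommendForAllUsers(2).show(truncate = false)
    } finally {
      spark.stop()
    }
  }
}
```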

Requirements: please make sure you have an unrestricted Internet connection (with port 22 open) and Google Chrome available on your workstation. We also recommend having an SSH client installed on the workstation.

Prerequisites: All exercises will be done in Scala and SQL. Prior knowledge of Scala and SQL syntax could help you follow the exercises more easily, but please note that the main scope of the course is to understand the architecture, ways of working and usage of Spark in different use cases. Scala programming is not part of the course objectives.

Trainer

VALENTINA CRISAN
Consultant & Trainer Big Data Technologies

Consultant in the Big Data and Cloud domains (solutions architecture) and trainer for Big Data technologies and architectures (Apache Cassandra, Apache Kafka, the Hadoop ecosystem and Big Data architectures), with more than 14 years of experience in the telecom and IT domains, architecting telecom value-added service solutions and, for several years, leading technical presales teams. Passionate about cloud and data, Valentina teaches Cassandra, Kafka, Hadoop and big data architecture courses, works on consultancy projects in the Big Data domain, organizes the Bucharest Big Data meetup, and organizes and teaches at several Bigdata.ro events.

Investment: 750 euro

