Contact
0753.029.187
academy@esolutions.ro
eSolutions Academy


Introduction to Apache Spark: Open Course

  • Posted by Natalia Criclivii
  • Categories: Events, News
  • Date: April 30, 2020

This three-day course is for data engineers, analysts, architects, software engineers, IT operations staff, and technical managers interested in the overall architecture and components of Apache Spark. Participants learn Spark through exercises and use cases, including interactions with distributed storage systems (HDFS, NoSQL).

The course covers the main concepts of Apache Spark: the core architecture, RDDs/DataFrames/Datasets, transformations and actions, the DAG, the SQL engine, the streaming engine, and the machine learning libraries. It also highlights how Spark can be applied in different use cases such as ETL, analytics, and machine learning.

Day I – May 13: Spark Overview

  • A brief history of Spark
  • Where Spark fits in the big data landscape
  • Apache Spark vs. Apache Hadoop MapReduce: an overall architecture comparison
  • Cluster architecture: cluster manager, workers, executors; SparkContext; cluster manager types; deployment scenarios
  • How Spark schedules and executes jobs and tasks
  • Resilient Distributed Datasets (RDDs): fundamentals & hands-on exercises
  • Ways to create an RDD: parallelizing a collection; reading from an external data source (local drive, HDFS, NoSQL); deriving from an existing RDD
  • Introduction to Transformations and Actions
  • Caching
  • RDD Types
  • How transformations lazily build up a Directed Acyclic Graph (DAG)
  • Shuffling
  • Hands-on: using Spark for ETL
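
As a rough illustration of several Day I topics (creating an RDD, lazy transformations, caching, shuffling, and actions), here is a minimal sketch meant for a Zeppelin paragraph or `spark-shell`. The application name and the sample sentences are made up for the example and are not part of the course material.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation; in Zeppelin or spark-shell a session
// usually exists already and getOrCreate() simply returns it.
val spark = SparkSession.builder()
  .appName("rdd-sketch")        // hypothetical application name
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext

// Create an RDD by parallelizing a local collection (sample data is invented).
val lines = sc.parallelize(Seq("spark is fast", "spark is lazy", "hadoop is batch"))

// Transformations only describe the computation; nothing is executed yet.
val wordCounts = lines
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)           // wide transformation: introduces a shuffle
  .cache()                      // keep the result in memory once it is computed

// Actions trigger execution of the DAG built above.
val counts = wordCounts.collect().toMap
val distinctWords = wordCounts.count()   // second action, can reuse the cached data

// Print the lineage of the RDD, including the shuffle boundary.
println(wordCounts.toDebugString)
```

Until `collect()` is called, Spark has only recorded the lineage; the second action can then be served from the cached partitions instead of recomputing the shuffle.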

Day II – May 14: Spark SQL & DataFrames/Datasets: Fundamentals and hands-on exercises

  • DataFrames/Datasets vs. RDDs: what they are and how they differ
  • The DataFrames/Datasets API
  • Catalyst Optimizer
  • Spark SQL
  • Creating and running DataFrame operations
  • Reading from multiple data sources (hands-on exercises)
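
To give a feel for the Day II topics, the sketch below runs the same aggregation once through the DataFrame API and once through Spark SQL, then prints the physical plan produced by the Catalyst optimizer. The `trips` data, its column names, and the application name are invented for illustration; in practice the DataFrame would typically be read from a file, HDFS, or a NoSQL store.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

val spark = SparkSession.builder()
  .appName("dataframe-sketch")   // hypothetical application name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Invented in-memory data standing in for an external data source.
val trips = Seq(
  ("Bucharest", 12.5),
  ("Bucharest", 7.5),
  ("Cluj", 4.0)
).toDF("city", "distanceKm")

// Aggregation expressed with the DataFrame API.
val byCityApi = trips.groupBy("city").agg(avg(col("distanceKm")).as("avgKm"))

// The same aggregation expressed in SQL against a temporary view.
trips.createOrReplaceTempView("trips")
val byCitySql = spark.sql(
  "SELECT city, AVG(distanceKm) AS avgKm FROM trips GROUP BY city")

// Show the physical plan chosen for the SQL query.
byCitySql.explain()

// Collect both results to the driver as typed (city, average) pairs.
val apiResult = byCityApi.as[(String, Double)].collect().toMap
val sqlResult = byCitySql.as[(String, Double)].collect().toMap
```

Both forms go through the same optimizer, so the choice between them is mostly a matter of readability and of how the query is assembled.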

Day III – May 14: Spark Streaming

  • When to use Spark Structured Streaming
  • Structured Streaming: building Spark streams from Kafka topics; windowing & aggregation; registering a Spark DataFrame stream in memory and querying it with Spark ML
  • Spark MLlib and spark.ml
  • Machine learning examples: collaborative filtering with Alternating Least Squares; classification and regression
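
The windowing and in-memory registration steps listed above can be sketched as follows. To keep the example runnable without a Kafka cluster, it assumes Spark's built-in `rate` test source in place of a Kafka topic; with Kafka, the source would instead use `format("kafka")` with the `kafka.bootstrap.servers` and `subscribe` options and needs the Spark Kafka connector on the classpath. The query name, window size, and run time are arbitrary choices for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

val spark = SparkSession.builder()
  .appName("streaming-sketch")   // hypothetical application name
  .master("local[*]")
  .getOrCreate()

// Built-in test source emitting (timestamp, value) rows; stands in for a Kafka topic.
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 5)
  .load()

// Count events per 10-second tumbling window.
val perWindow = events
  .groupBy(window(col("timestamp"), "10 seconds"))
  .count()

// Register the running aggregate as an in-memory table that can be queried with SQL.
val query = perWindow.writeStream
  .format("memory")
  .queryName("events_per_window")   // hypothetical table name
  .outputMode("complete")
  .start()

query.awaitTermination(5000)        // let the stream run for about five seconds
spark.table("events_per_window").show(false)
query.stop()
```

The memory sink is intended for experiments and debugging on small data; a production pipeline would normally write to a durable sink.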

An end-to-end Spark example: we will build an end-to-end case covering data input, data cleaning, data storage, and machine learning. We will work in a cloud environment and use Apache Zeppelin for all Spark coding and exercises (Scala).

Requirements: please have an unrestricted Internet connection (port 22 open) and Google Chrome available on your workstation. We also recommend having an SSH client available on the workstation.

Prerequisites: all exercises will be done in Scala and SQL. Prior knowledge of Scala and SQL syntax could make the exercises easier to follow, but please note that the main goal of the course is to understand Spark's architecture, how it works, and how it is used in different use cases. Scala programming is not part of the course objectives.

Trainer

VALENTINA CRISAN
Consultant & Trainer Big Data Technologies

Consultant in the Big Data and Cloud domains (solutions architecture) and trainer for Big Data technologies and architectures (Apache Cassandra, Apache Kafka, the Hadoop ecosystem, and Big Data architectures), with more than 14 years of experience in the telecom and IT domains, architecting telecom value-added service solutions and leading technical presales teams for several years. Passionate about cloud and data, Valentina teaches Cassandra, Kafka, Hadoop, and Big Data architecture courses, works on consultancy projects in Big Data domains, organizes the Bucharest Big Data meetup, and organizes and teaches several Bigdata.ro events.

Investment: 750 euro

