Big Data Spark Development with Scala


This Scala training is paired with Apache Spark. Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs.

It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Scala combines object-oriented and functional programming in one concise, high-level language. Scala’s static types help avoid bugs in complex applications, and its JVM and JavaScript runtimes let you build high-performance systems with easy access to huge ecosystems of libraries.

 

TARGET AUDIENCE

  • Fresh Graduates
  • Data Engineers
  • Software developers
  • ETL developers

 

Scala Course Content

Scala Programming

Introduction to Scala

  • Introducing Scala
  • Deployment of Scala for Big Data applications
  • Apache Spark analytics.

Pattern Matching

  • The importance of Scala
  • The concept of REPL (Read Evaluate Print Loop)
  • Deep dive into Scala pattern matching
  • Type inference
  • Higher-order functions
  • Currying
  • Traits
  • Application space
  • Scala for data analysis.
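
The topics of higher-order functions, currying and type inference listed above can be illustrated with a minimal sketch. The object name FunctionBasics and its members are hypothetical names chosen for this example; they are not part of the course material.

```scala
object FunctionBasics {
  // Higher-order function: it takes another function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Curried function: the arguments come in separate parameter lists.
  def add(a: Int)(b: Int): Int = a + b

  // Supplying only the first parameter list yields a new function.
  val addFive: Int => Int = add(5)

  // Type inference: the compiler works out List[Int] with no annotation.
  val doubled = List(1, 2, 3).map(_ * 2)
}
```

Pasting the object into the REPL and evaluating FunctionBasics.addFive(10) is a quick way to try the curried function interactively.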

 

Executing the Scala code

  • Scala Interpreter
  • Static object timer in Scala
  • Testing String equality in Scala
  • Implicit classes in Scala
  • The concept of currying in Scala
  • Various classes in Scala.
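
As a rough illustration of string equality and implicit classes from the list above, consider the following sketch. The names StringExamples, RichText and shout are invented for the example.

```scala
object StringExamples {
  // Implicit class: adds a hypothetical `shout` method to every String in scope.
  implicit class RichText(s: String) {
    def shout: String = s.toUpperCase + "!"
  }

  val a = "spark"
  val b = new String("spark")

  // In Scala, == compares content (it delegates to equals).
  val sameContent: Boolean = a == b

  // eq compares references; b was created as a separate object.
  val sameReference: Boolean = a eq b

  // The implicit class makes the extension method available here.
  val shouted: String = "scala".shout
}
```

This differs from Java, where == on two String objects compares references rather than content.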

The Classes concept in Scala

  • Classes concept
  • Understanding constructor overloading
  • The various abstract classes
  • The hierarchy types in Scala
  • The concept of object equality
  • The val and var keywords in Scala.
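
A small sketch may help connect constructor overloading, abstract classes, object equality and val versus var. All names here (ClassExamples, Point, Animal, Dog, Counter) are hypothetical and serve only as an illustration.

```scala
object ClassExamples {
  class Point(val x: Int, val y: Int) {
    // Auxiliary constructor: Scala's form of constructor overloading.
    def this(x: Int) = this(x, 0)

    // Structural equality: two points are equal when their coordinates match.
    override def equals(other: Any): Boolean = other match {
      case p: Point => x == p.x && y == p.y
      case _        => false
    }

    // hashCode must stay consistent with equals.
    override def hashCode: Int = (x, y).hashCode
  }

  // An abstract class declares members without implementing them.
  abstract class Animal {
    def sound: String
  }

  class Dog extends Animal {
    val sound = "Woof" // a val may implement an abstract def
  }

  class Counter {
    val step = 1  // val: cannot be reassigned
    var total = 0 // var: can be reassigned
    def increment(): Unit = total += step
  }
}
```

Declaring Point as a case class would generate equals and hashCode automatically; they are written by hand here only to show what object equality involves.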

Case classes and pattern matching

  • Understanding Sealed traits
  • Wildcard pattern
  • Constructor pattern
  • Tuple pattern
  • Variable pattern
  • Constant pattern.
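
The pattern kinds listed above can be seen side by side in a short sketch. The Shape hierarchy and the function names are assumptions made up for this example.

```scala
object PatternExamples {
  // A sealed trait can only be extended in this file, so the compiler
  // can check that a match over Shape covers every case.
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Rectangle(width: Double, height: Double) extends Shape

  def area(shape: Shape): Double = shape match {
    case Circle(r)       => math.Pi * r * r // constructor pattern
    case Rectangle(w, h) => w * h           // constructor pattern
  }

  def isCircle(shape: Shape): Boolean = shape match {
    case Circle(_) => true  // wildcard inside a constructor pattern
    case _         => false // wildcard pattern matches anything
  }

  def describe(value: Any): String = value match {
    case 0      => "zero"                // constant pattern
    case (x, y) => s"pair of $x and $y"  // tuple pattern
    case other  => s"other: $other"      // variable pattern binds the value
  }
}
```

Removing one of the cases in area would typically produce a compiler warning about a non-exhaustive match, which is the main benefit of sealing the trait.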

Concepts of traits with an example

  • Traits in Scala
  • The advantages of traits
  • Linearization of traits
  • The Java equivalent
  • Avoiding boilerplate code.
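
Linearization of traits is easiest to see with stacked overrides. In the sketch below, the trait names Greeter, Loud and Polite are hypothetical; the point is that the trait mixed in last is applied last.

```scala
object TraitExamples {
  trait Greeter {
    def greet(name: String): String = s"Hello, $name"
  }

  trait Loud extends Greeter {
    // super refers to the next trait in the linearization order.
    override def greet(name: String): String = super.greet(name).toUpperCase
  }

  trait Polite extends Greeter {
    override def greet(name: String): String =
      super.greet(name) + ", pleased to meet you"
  }

  // Polite is mixed in last, so it wraps the result of Loud.
  class LoudThenPolite extends Greeter with Loud with Polite

  // Loud is mixed in last, so it upper-cases the result of Polite.
  class PoliteThenLoud extends Greeter with Polite with Loud
}
```

The two classes mix in the same traits and differ only in order, yet they produce different greetings, which is what linearization determines.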

Scala Java Interoperability

  • Implementation of traits in Scala and Java
  • Extending multiple traits.

Scala Training Collections

  • Introduction to Scala collections
  • Classification of collections
  • Difference between Iterator and Iterable in Scala
  • Example of list sequence in Scala.

Mutable collections vs. Immutable collections

  • The two types of collections in Scala
  • Mutable and Immutable collections
  • Understanding lists and arrays in Scala
  • The list buffer and array buffer
  • Queue in Scala
  • Double-ended queue (Deque)
  • Stacks
  • Sets
  • Maps
  • Tuples in Scala.
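
The difference between immutable and mutable collections can be sketched as follows. This assumes Scala 2.13 or later, where mutable.ArrayDeque is available; the object and value names are hypothetical.

```scala
import scala.collection.mutable

object CollectionExamples {
  // Immutable list: prepending builds a new list and leaves `base` untouched.
  val base = List(1, 2, 3)
  val extended = 0 :: base

  // Mutable buffer: += changes the buffer in place.
  val buffer = mutable.ListBuffer(1, 2, 3)
  buffer += 4

  // Queue: first in, first out.
  val queue = mutable.Queue(1, 2)
  queue.enqueue(3)
  val first = queue.dequeue()

  // Double-ended queue: elements can be added at both ends.
  val deque = mutable.ArrayDeque(2, 3)
  deque.prepend(1)
  deque.append(4)

  // Immutable set and map: + returns a new collection.
  val letters = Set("a", "b") + "a"
  val counts = Map("a" -> 1) + ("b" -> 2)

  // Tuple: a fixed-size group of values with possibly different types.
  val pair = ("spark", 3)
}
```

Immutable collections are the default in Scala; the mutable variants have to be imported explicitly, as done at the top of the sketch.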

Use Case bobsrockets package

  • Introduction to Scala packages and imports
  • The selective imports
  • The Scala test classes
  • Introduction to JUnit test class
  • JUnit interface via JUnit 3 suite for ScalaTest
  • The packaging of Scala applications in Directory Structure
  • Example of Spark Split and Spark Scala.

 

Apache Spark (Programming Language on Demand)

Writing Spark Applications using Scala

Spark framework comparing Scala

  • Detailed Apache Spark
  • Various features
  • Comparing with Hadoop
  • Various Spark components
  • Combining HDFS with Spark
  • Scalding

RDD in Spark using Scala

  • The RDD operation in Spark
  • Spark transformations, actions and data loading
  • Comparing with MapReduce
  • Key-value pairs.

Data Frames and Spark SQL using Scala

  • The detailed Spark SQL
  • The significance of SQL in Spark for working with structured data processing
  • Spark SQL JSON support
  • Working with XML data and Parquet files
  • Creating HiveContext
  • Writing Data Frame to Hive
  • Reading of JDBC files
  • The importance of Data Frames in Spark
  • Creating Data Frames
  • Specifying a schema manually versus inferring it
  • Working with CSV files
  • Reading of JDBC tables
  • Writing a Data Frame to a JDBC table
  • The user-defined functions in Spark SQL
  • Shared variables and accumulators
  • How to query and transform data in Data Frames
  • How Data Frame provides the benefits of both Spark RDD and Spark SQL
  • Deploying Hive on Spark as the execution engine.

Spark Streaming using Scala

  • Introduction to Spark Streaming
  • The architecture of Spark Streaming
  • Working with the Spark streaming program
  • Processing data using Spark streaming
  • Request count and DStream
  • Multi-batch and sliding window operations
  • Working with advanced data sources.

 

Scala Training Outcome

  • Participants will be able to perform big data analytics using Spark and Scala.
