Big Data Spark Development with Scala
This Scala training is built around Apache Spark, a fast and general-purpose cluster computing system. Spark provides high-level APIs in Java, Scala and Python, and an optimized engine that supports general execution graphs.
It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
Who Should Attend
- Fresh graduates
- Data engineers
- Software developers
- ETL developers
Scala Course Content
Introduction of Scala
- Introducing Scala
- Deployment of Scala for Big Data applications
- Apache Spark analytics
- The importance of Scala
- The concept of the REPL (Read-Evaluate-Print Loop)
- Deep dive into Scala pattern matching
- Type inference
- Higher-order functions
- Application space
- Scala for data analysis.
Executing the Scala code
- Scala Interpreter
- Static object timer in Scala
- Testing String equality in Scala
- Implicit classes in Scala
- The concept of currying in Scala
- Various classes in Scala.
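
As an illustration of three topics in this module (string equality, implicit classes and currying), here is a minimal sketch that uses only the Scala standard library. The names `ExecutingScalaSketch`, `scale`, `ShoutOps` and `shout` are hypothetical and chosen for this example only.

```scala
object ExecutingScalaSketch {
  // Curried method: two parameter lists, so the first argument can be fixed up front.
  def scale(factor: Int)(value: Int): Int = factor * value

  // Implicit class: adds a hypothetical `shout` method to every String in scope.
  implicit class ShoutOps(s: String) {
    def shout: String = s.toUpperCase + "!"
  }

  def main(args: Array[String]): Unit = {
    // Supplying only the first parameter list yields a function Int => Int.
    val triple: Int => Int = scale(3)
    println(triple(7))       // prints 21

    println("spark".shout)   // prints SPARK!

    // In Scala, == compares string contents; eq compares object references.
    val a = "scala"
    val b = new String("scala")
    println(a == b)          // prints true
    println(a eq b)          // prints false
  }
}
```

The same code can be pasted into the Scala interpreter line by line, which is how the module introduces it.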
The Classes concept in Scala
- Classes concept
- Understanding constructor overloading
- Abstract classes
- The hierarchy types in Scala
- The concept of object equality
- The val and var keywords in Scala.
Case classes and pattern matching
- Understanding Sealed traits
- Variable pattern
- Constant pattern.
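
The topics above can be sketched in a few lines. This is a minimal example under stated assumptions: the `Shape` hierarchy and the `area` and `describe` methods are hypothetical names invented for illustration.

```scala
object PatternMatchingSketch {
  // A sealed trait can only be extended in this file, so the compiler
  // can warn when a match does not cover every case.
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Rectangle(width: Double, height: Double) extends Shape

  // Case classes can be taken apart directly in a pattern.
  def area(shape: Shape): Double = shape match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }

  def describe(x: Any): String = x match {
    case 0       => "zero"                      // constant pattern
    case "spark" => "the string spark"          // constant pattern
    case other   => s"something else: $other"   // variable pattern, binds any value
  }

  def main(args: Array[String]): Unit = {
    println(area(Rectangle(2.0, 3.0)))  // prints 6.0
    println(describe(0))                // prints zero
    println(describe(42))               // prints something else: 42
  }
}
```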
Concepts of traits with an example
- Traits in Scala
- The advantages of traits
- Linearization of traits
- The Java equivalent
- Avoiding boilerplate code.
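
Linearization is easiest to see with a small example. The sketch below is hypothetical (the traits `Base`, `Doubling` and `Incrementing` are not part of any library) and shows that the order in which traits are mixed in decides what `super` refers to.

```scala
object TraitLinearizationSketch {
  trait Base { def value: Int = 10 }

  // Each trait modifies whatever comes before it in the linearization.
  trait Doubling extends Base {
    override def value: Int = super.value * 2
  }
  trait Incrementing extends Base {
    override def value: Int = super.value + 1
  }

  // The rightmost trait is applied last: Incrementing calls Doubling, which calls Base.
  class DoubleThenIncrement extends Doubling with Incrementing

  // Reversed order: Doubling calls Incrementing, which calls Base.
  class IncrementThenDouble extends Incrementing with Doubling

  def main(args: Array[String]): Unit = {
    println(new DoubleThenIncrement().value)  // prints 21, that is (10 * 2) + 1
    println(new IncrementThenDouble().value)  // prints 22, that is (10 + 1) * 2
  }
}
```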
Scala Java Interoperability
- Implementation of traits in Scala and Java
- Extending multiple traits.
Scala Collections
- Introduction to Scala collections
- Classification of collections
- Difference between Iterator and Iterable in Scala
- Example of list sequence in Scala.
Mutable collections vs. Immutable collections
- The two types of collections in Scala
- Mutable and Immutable collections
- Understanding lists and arrays in Scala
- The list buffer and array buffer
- Queue in Scala
- Double-ended queue
- Tuples in Scala.
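
The difference between the two kinds of collections can be sketched as follows. This is an illustrative example using only the standard library; the object name `CollectionsSketch` and its member names are hypothetical.

```scala
import scala.collection.mutable

object CollectionsSketch {
  // Immutable List: prepending returns a new list and leaves the original untouched.
  val original: List[Int] = List(1, 2, 3)
  val extended: List[Int] = 0 :: original

  // Mutable ListBuffer: elements are appended in place.
  def buildWithListBuffer(): List[Int] = {
    val buffer = mutable.ListBuffer(1, 2)
    buffer += 3
    buffer.toList
  }

  // Mutable ArrayBuffer: supports appending and in-place updates by index.
  def buildWithArrayBuffer(): String = {
    val buffer = mutable.ArrayBuffer("a")
    buffer += "b"
    buffer(0) = "z"
    buffer.mkString(",")
  }

  // Mutable Queue: first in, first out.
  def firstOutOfQueue(): Int = {
    val queue = mutable.Queue(1, 2)
    queue.enqueue(3)
    queue.dequeue()
  }

  // Tuple: a fixed-size group of values that may have different types.
  val pair: (String, Int) = ("spark", 3)

  def main(args: Array[String]): Unit = {
    println(original)                // prints List(1, 2, 3)
    println(extended)                // prints List(0, 1, 2, 3)
    println(buildWithListBuffer())   // prints List(1, 2, 3)
    println(buildWithArrayBuffer())  // prints z,b
    println(firstOutOfQueue())       // prints 1
    println(pair._1 + pair._2)       // prints spark3
  }
}
```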
Use Case: The bobsrockets Package
- Introduction to Scala packages and imports
- The selective imports
- The ScalaTest classes
- Introduction to JUnit test class
- Using JUnit from ScalaTest via the JUnit 3 suite
- The packaging of Scala applications in Directory Structure
- Example of Spark Split and Spark Scala.
Apache Spark (Programming Language on Demand)
Writing Spark Applications using Scala
Spark framework comparing Scala
- Apache Spark in detail
- Various features
- Comparing with Hadoop
- Various Spark components
- Combining HDFS with Spark
RDD in Spark using Scala
- The RDD operation in Spark
- Spark transformations, actions, and data loading
- Comparing with MapReduce
- Key Value Pair.
Data Frames and Spark SQL using Scala
- Spark SQL in detail
- The significance of SQL in Spark for structured data processing
- Spark SQL JSON support
- Working with XML data and Parquet files
- Creating HiveContext
- Writing Data Frames to Hive
- Reading data over JDBC
- The importance of Data Frames in Spark
- Creating Data Frames
- Specifying the schema manually versus inferring it
- Working with CSV files
- Reading JDBC tables
- Writing Data Frames to JDBC
- The user-defined functions in Spark SQL
- Shared variables and accumulators
- How to query and transform data in Data Frames
- How Data Frames provide the benefits of both Spark RDD and Spark SQL
- Deploying Hive on Spark as the execution engine.
Spark Streaming using Scala
- Introduction to Spark streaming
- The architecture of Spark Streaming
- Working with the Spark streaming program
- Processing data using Spark streaming
- Request count and DStream
- Multi-batch and sliding window operations
- Working with advanced data sources.
Scala Training Outcome
- Participants will be able to perform big data analytics using Spark and Scala.