

Spark Development with Python


Apache Spark is a general-purpose engine for large-scale data processing. It supports rapid application development for big data and allows code reuse across batch, interactive and streaming applications. The most popular use cases for Apache Spark include building data pipelines and developing machine learning models.

Python comes with a number of modules for interacting with the operating system, searching text with regular expressions, accessing the Internet, and just about anything else you could think of. Since it’s a dynamic, interpreted language, you don’t have to declare variables or deal with memory management bugs. For this reason, Python appeals to experienced programmers as well as beginners. Google uses it extensively.

 

TARGET AUDIENCE

  • Freshers
  • Data engineers
  • Software developers
  • ETL developers

Course Content:

  • Python Programming


  • What is Python Language and features
  • Why Python and why it is different from other languages
  • Installation of Python, Anaconda Python distribution for Windows, Mac, Linux.
  • Running a sample Python script, working with Python IDEs.
  • Running basic Python commands – data types, variables, keywords, etc.

 

Basic constructs of Python language

  • Indentation (tabs and spaces) and code comments (pound # character)
  • Variables and Names
  • Built-in Data Types in Python
  • Numeric: int, float, complex
  • Containers: list, tuple, set, dict
  • Text Sequence: str (string)
  • Others: Modules, Classes, Instances, Exceptions, Null Object, Ellipsis Object
  • Constants: False, True, None, NotImplemented, Ellipsis, __debug__
  • Basic Operators: Arithmetic, Comparison, Assignment, Logical, Bitwise, Membership, Identity
  • Slicing and The Slice Operator [n:m]
  • Control and Loop Statements: if, for, while, range(), break, continue, else
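
As a small illustration of the slice operator and the control statements listed above, here is a minimal sketch. The list `scores` and its values are made up for this example and are not part of the course material.

```python
# Sample data, made up for illustration.
scores = [72, 85, 90, 64, 58, 97]

# Slicing with [n:m] takes items from index n up to, but not including, m.
first_three = scores[0:3]
last_two = scores[-2:]
every_other = scores[::2]

# for / if / continue: collect only the scores of 60 or more.
passed = []
for s in scores:
    if s < 60:
        continue               # skip failing scores
    passed.append(s)

# The else clause of a loop runs only when the loop was not ended by break.
for s in scores:
    if s > 100:
        status = "invalid score found"
        break
else:
    status = "all scores are valid"

# Membership and identity operators.
has_ninety = 90 in scores
is_none = passed is None
```

Here `range()` is not needed because the loop iterates over the list directly; `for i in range(len(scores))` would be the index-based alternative.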

Writing Object-Oriented Programs in Python and Connecting to a Database

  • Classes – classes and objects, access modifiers, instance and class members
  • OOP paradigm – inheritance, polymorphism and encapsulation in Python
  • Functions: parameters and return types
  • Lambda expressions
  • Making a connection to a database for pulling data
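
The topics in this module can be sketched together in one short program. This is only an illustrative sketch: the class names `Employee` and `Manager`, the `staff` table and its rows are hypothetical, and the standard-library `sqlite3` module with an in-memory database stands in for whichever database the course actually uses.

```python
import sqlite3


class Employee:
    """Base class with a class member and instance members."""
    company = "Example Corp"          # class member, shared by all instances

    def __init__(self, name, salary):
        self.name = name              # instance member
        self._salary = salary         # leading underscore: internal by convention

    def annual_pay(self):
        return self._salary * 12


class Manager(Employee):
    """Subclass (inheritance) that overrides a method (polymorphism)."""

    def __init__(self, name, salary, bonus):
        super().__init__(name, salary)
        self.bonus = bonus

    def annual_pay(self):
        return super().annual_pay() + self.bonus


# Connect to a database and pull rows (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, salary INTEGER, bonus INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("Asha", 1000, None), ("Ravi", 2000, 5000)],
)
rows = conn.execute("SELECT name, salary, bonus FROM staff").fetchall()
conn.close()

# Build objects from the rows; rows without a bonus become plain employees.
staff = [
    Manager(n, s, b) if b is not None else Employee(n, s)
    for n, s, b in rows
]

# A lambda expression used as the sort key.
by_pay = sorted(staff, key=lambda e: e.annual_pay(), reverse=True)
names = [e.name for e in by_pay]
```

Python has no enforced access modifiers; the single leading underscore in `_salary` is a naming convention that signals an internal attribute.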

File Handling, Exception Handling in Python

  • Open a File, Read from a File, Write into a File
  • Resetting the current position in a File
  • The Pickle (Serialize and Deserialize Python Objects)
  • The Shelve (Overcome the limitation of Pickle)
  • What is an Exception
  • Raising an Exception
  • Catching an Exception
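
The file and exception topics above can be tried out with the standard library alone. The following is a minimal sketch; the file names, the `record` dictionary and the helper `read_modules` are hypothetical and exist only for this example.

```python
import os
import pickle
import shelve
import tempfile

workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "notes.txt")

# Open a file, write to it, then read it back.
# seek(0) resets the current position to the start of the file.
with open(text_path, "w+") as f:
    f.write("first line\nsecond line\n")
    f.seek(0)
    first = f.readline().strip()
    f.seek(0)
    all_lines = f.read().splitlines()

# Pickle: serialize one Python object to bytes and deserialize it again.
record = {"course": "Spark with Python", "level": "beginner"}
restored = pickle.loads(pickle.dumps(record))

# Shelve: a persistent, dict-like store of pickled objects addressed by key,
# so a single entry can be read without loading everything else.
shelf_path = os.path.join(workdir, "catalog")
with shelve.open(shelf_path) as db:
    db["python"] = record
with shelve.open(shelf_path) as db:
    from_shelf = db["python"]


def read_level(data):
    """Raise an exception when the expected key is missing."""
    if "level" not in data:
        raise KeyError("level")
    return data["level"]


# Catching an exception.
try:
    read_level({})
except KeyError as exc:
    caught = f"missing key: {exc}"
```

Pickle data should only be loaded from trusted sources, since unpickling can execute arbitrary code.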

 

Apache Spark (Programming Language on Demand)

 

Writing Spark Applications using Python

Spark framework and comparisons using Python

  • Apache Spark in detail: its features, comparison with Hadoop, the Spark components, combining HDFS with Spark, Scalding

RDD in Spark using Python

  • RDD operations in Spark: transformations, actions, data loading, comparison with MapReduce, key-value pairs.
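
The difference between transformations (lazy) and actions (which trigger computation) can be illustrated without a Spark installation. The sketch below is plain Python and not the Spark API: generator expressions stand in for transformations such as `flatMap` and `map`, and a loop that sums values per key stands in for `reduceByKey` followed by `collect`. The input lines are made up.

```python
from collections import defaultdict

# Made-up input, standing in for lines loaded from a file.
lines = ["spark makes big data simple", "python makes spark simple"]

# "Transformations": generator expressions are lazy, so nothing runs yet.
words = (w for line in lines for w in line.split())   # similar to flatMap
pairs = ((w, 1) for w in words)                       # similar to map to (key, value)

# "Action": consuming the pipeline triggers the work. Values are summed
# per key, similar in spirit to reduceByKey followed by collect.
counts = defaultdict(int)
for key, value in pairs:
    counts[key] += value

result = dict(counts)
```

In Spark the same shape of pipeline would run in parallel across partitions of the data; this single-process version only shows the key-value word count pattern.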

Data Frames and Spark SQL using Python

  • Spark SQL in detail
  • The significance of SQL in Spark for working with structured data processing
  • Spark SQL JSON support
  • Working with XML data and Parquet files
  • Creating HiveContext
  • Writing Data Frame to Hive
  • Reading of JDBC files
  • The importance of Data Frames in Spark
  • Creating Data Frames
  • Inferring the schema manually
  • Working with CSV files
  • Reading of JDBC tables
  • Converting from Data Frame to JDBC
  • User-defined functions in Spark SQL
  • Shared variables and accumulators
  • How to query and transform data in Data Frames
  • How Data Frame provides the benefits of both Spark RDD and Spark SQL
  • Deploying Hive on Spark as the execution engine.

Spark Streaming using Python

  • Introduction to Spark streaming
  • The architecture of Spark Streaming
  • Working with the Spark streaming program
  • Processing data using Spark streaming
  • Request count and DStream
  • Multi-batch and sliding window operations
  • Working with advanced data sources.
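
To make the idea of multi-batch, sliding window operations concrete, here is a minimal sketch in plain Python. It is not Spark Streaming code: each inner list stands for one micro-batch of requests, and the batch contents and the window length `WINDOW` are assumed values chosen for illustration.

```python
from collections import deque

# Each inner list stands for one micro-batch of requests (made-up data).
batches = [["/home", "/cart"], ["/home"], [], ["/pay", "/home", "/cart"]]

WINDOW = 3   # window length in batches (assumed value)

# A deque with maxlen keeps only the most recent WINDOW batch counts,
# so the oldest batch drops out as each new one arrives.
window = deque(maxlen=WINDOW)
windowed_counts = []
for batch in batches:
    window.append(len(batch))             # request count for this batch
    windowed_counts.append(sum(window))   # request count over the window
```

The window here slides by one batch at a time; a larger slide interval would compute the windowed count only on every second or third batch.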
