Azure Databricks & Spark For Data Engineers (PySpark / SQL)


Real World Project on Formula1 Racing using Azure Databricks, Delta Lake, Unity Catalog, Azure Data Factory [DP203]




Major updates to the course since the launch


May 2023 - New sections 25, 26 and 27 added to cover Unity Catalog. Unity Catalog is a recent addition to Databricks that offers a unified data governance solution for a Data Lakehouse. These sections cover all aspects of Unity Catalog and its implementation through a project.


March 2023 - New sections 6 and 7 added. Section 8 updated. These changes reflect the latest Databricks recommendations for accessing Azure Data Lake. They also provide a better way to complete the course project for students using an Azure Student Subscription or a Corporate Subscription with limited access to Azure Active Directory.


December 2022 - Sections 3, 4 & 5 updated to reflect recent UI changes to Azure Databricks. Also added lessons on functionality that Databricks recently introduced for Databricks clusters.




Welcome!


I am looking forward to helping you learn one of the most in-demand data engineering tools in the cloud, Azure Databricks! This course is taught by implementing a data engineering solution using Azure Databricks and Spark core for a real-world project that analyses and reports on Formula1 motor racing data.


This is like no other course on Udemy for Azure Databricks. Once you have completed the course, including all the assignments, I strongly believe that you will be proficient in Azure Databricks and in a position to start a real-world data engineering project on your own. I have also included lessons on Azure Data Lake Storage Gen2, Azure Data Factory and Power BI. The primary focus of the course is Azure Databricks and Spark core, but it also covers the relevant concepts and connectivity to the other technologies mentioned. Please note that the course doesn't cover other aspects of Spark such as Spark streaming and Spark ML. Also, the course is taught using PySpark and Spark SQL; it doesn't cover Scala or Java.


The course follows the logical progression of a real-world project implementation, with technical concepts being explained and the Databricks notebooks being built at the same time. Even though this course is not specifically designed to teach you the skills required for passing the Azure Data Engineer Associate Certification Exam DP203, it can help you gain most of the skills required for the exam.


I value your time as much as I do mine, so I have designed this course to be fast-paced and to the point. The course is taught in simple English, without jargon. I start from the basics, and by the end of the course you will be proficient in the technologies used.


Currently the course teaches you the following:


Azure Databricks


Building a solution architecture for a data engineering solution using Azure Databricks, Azure Data Lake Gen2, Azure Data Factory and Power BI


Creating and using Azure Databricks service and the architecture of Databricks within Azure


Working with Databricks notebooks as well as using Databricks utilities, magic commands etc.


Passing parameters between notebooks as well as creating notebook workflows


Creating, configuring and monitoring Databricks clusters, cluster pools and jobs


Mounting Azure Storage in Databricks using secrets stored in Azure Key Vault


Working with Databricks Tables, Databricks File System (DBFS) etc.


Using Delta Lake to implement a solution using Lakehouse architecture


Creating dashboards to visualise the outputs


Connecting to the Azure Databricks tables from Power BI


Spark (Only PySpark and SQL)


Spark architecture, Data Sources API and Dataframe API


PySpark - Ingestion of CSV, simple and complex JSON files into the data lake as Parquet files/tables


PySpark - Transformations such as Filter, Join, Simple Aggregations, GroupBy, Window functions etc.


PySpark - Creating local and temporary views


Spark SQL - Creating databases, tables and views


Spark SQL - Transformations such as Filter, Join, Simple Aggregations, GroupBy, Window functions etc.


Spark SQL - Creating local and temporary views


Implementing full refresh and incremental load patterns using partitions


Delta Lake


Emergence of the Data Lakehouse architecture and the role of Delta Lake


Read, Write, Update, Delete and Merge to Delta Lake using both PySpark and SQL


History, Time Travel and Vacuum


Converting Parquet files to Delta files


Implementing the incremental load pattern using Delta Lake


Unity Catalog


Overview of Data Governance and Unity Catalog


Creating a Unity Catalog Metastore and enabling a Databricks workspace with Unity Catalog


Overview of the three-level namespace and creating Unity Catalog objects


Configuring and accessing external data lakes via Unity Catalog


Developing a mini project using Unity Catalog and seeing the key data governance capabilities it offers, such as Data Discovery, Data Audit, Data Lineage and Data Access Control


Azure Data Factory


Creating pipelines to execute Databricks notebooks


Designing robust pipelines to deal with unexpected scenarios such as missing files


Creating dependencies between activities as well as pipelines


Scheduling the pipelines using Data Factory triggers to execute at regular intervals


Monitoring the triggers/pipelines to check for errors/outputs




Who this course is for:

  • University students looking for a career in Data Engineering
  • IT developers working in other disciplines who want to move into Data Engineering
  • Data Engineers/Data Warehouse Developers currently working with on-premises technologies, or other cloud platforms such as AWS or GCP, who want to learn Azure Data Technologies
  • Data Architects looking to gain an understanding of the Azure Data Engineering stack

