Data Engineering on Microsoft Azure


Business Training N.V.


Description


This training explains in detail the modern data warehouse approach to handling any volume of data, both cloud-based and on-premises. Students first see how to set up an Azure Data Lake and ingest data with Azure Data Factory. They then learn how to cleanse the data and prepare it for analysis with Azure Synapse Analytics and Azure Databricks.

Who should attend this course?

This course focuses on developers and administrators who are considering migrating existing data solutions to the Microsoft Azure cloud.

Prerequisites

Some familiarity with relational database systems such as SQL Server is helpful. Prior knowledge of Azure is not required.

The modern data warehouse

The cloud requires reconsidering some of the choices made for on-premises data handling. This module introduces the different Azure services that can be used for data processing and compares them to the traditional on-premises data stack. It also provides a brief introduction to Azure and the use of the Azure portal.

  • From traditional to modern data warehouse
  • Lambda architecture
  • Overview of Big Data related Azure services
  • Getting started with the Azure Portal
  • LAB: Navigating the Azure Portal

Storing data in Azure

This module discusses the different types of storage available in Azure Storage as well as Data Lake Storage. It also covers some of the tools used to load and manage files in Azure Storage and Data Lake Storage. A short Python upload sketch follows the topic list below.

  • Introduction Azure Blob Storage
  • Compare Azure Data Lake Storage Gen 2 with traditional blob storage
  • Tools for uploading data
  • Storage Explorer, AZCopy, PolyBase
  • LAB: Uploading data into Azure Storage
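
The course lists Storage Explorer, AzCopy and PolyBase as upload tools; purely as an illustration, the sketch below shows a comparable upload done with the azure-storage-blob Python SDK. The connection string, container and file names are placeholders.

  # Minimal sketch, assuming the azure-storage-blob package is installed and the
  # placeholders below are replaced with real values.
  from azure.storage.blob import BlobServiceClient

  service = BlobServiceClient.from_connection_string("<storage-connection-string>")
  container = service.get_container_client("raw-data")

  # Create the container if it does not exist yet, then upload a local CSV file.
  if not container.exists():
      container.create_container()

  with open("sales_2023.csv", "rb") as data:
      container.upload_blob(name="sales/sales_2023.csv", data=data, overwrite=True)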

Introducing Azure Data Factory

When data is stored and analyzed on-premises, you typically use ETL tools such as SQL Server Integration Services. But what if the data is stored in the Azure cloud? Then you can use Azure Data Factory, the cloud-based ETL service. First we need to get used to the terminology; then we can start creating the proper objects in the portal.

  • Data Factory V2 terminology
  • Setup a Data Factory with GIT support
  • Exploring the Data Factory portal
  • Creating Linked Services and Datasets
  • Copying data with the Data Factory wizard
  • LAB: Migrating data with Data Factory Wizard

Authoring pipelines in Azure Data Factory

This module dives into the process of building a Data Factory pipeline from scratch. The most common activities are illustrated. The module also focuses on how to work with variables and parameters to make the pipelines more dynamic.

  • Adding activities to the pipeline
  • Working with Expressions
  • Variables and Parameters
  • Debugging a pipeline
  • LAB: Authoring and debugging an ADF pipeline

Creating Data Flows in Data Factory

With Data Flows, data can be transformed without the need to learn another tool (such as Databricks or Spark). Both Mapping Data Flows and Wrangling Data Flows are covered.

  • From ELT to ETL
  • Creating Data Factory (Mapping) Data flows
  • Exploring Wrangling Data Flows
  • LAB: Transforming data with a Data flow

Data Factory Integration Runtimes

Data Factory needs integration runtimes to control where the code executes. This module walks you through the three types of integration runtime: the Azure, SSIS and self-hosted runtimes.

  • Integration runtime overview
  • Controlling the Azure Integration Runtime
  • Setup self-hosted Integration Runtimes
  • Lift and shift SSIS packages in Data Factory

Deploying and monitoring Data Factory pipelines

Once development has finished, the pipelines need to be deployed and scheduled for execution. Monitoring the deployed pipelines for failures, errors or simply performance is another crucial topic discussed in this module. A small monitoring sketch in Python follows the topic list below.

  • Adding triggers to pipelines
  • Deploying pipelines
  • Monitoring pipeline executions
  • Restarting failed pipelines
  • LAB: Monitoring pipeline runs
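
Purely as an illustration of what monitoring pipeline runs can look like outside the portal, the sketch below starts a pipeline run and polls its status with the azure-mgmt-datafactory SDK. The subscription, resource group, factory and pipeline names are placeholders; the course labs use the Data Factory monitoring UI.

  # Illustrative only: trigger a pipeline run and wait for it to finish.
  import time
  from azure.identity import DefaultAzureCredential
  from azure.mgmt.datafactory import DataFactoryManagementClient

  adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

  run = adf.pipelines.create_run(
      "my-rg", "my-factory", "CopySalesPipeline",
      parameters={"targetFolder": "curated/sales"})

  # Poll until the run leaves the queued/in-progress states.
  while True:
      status = adf.pipeline_runs.get("my-rg", "my-factory", run.run_id).status
      if status not in ("Queued", "InProgress"):
          break
      time.sleep(15)
  print("Pipeline finished with status:", status)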

Azure SQL Database

An easy way to create a business intelligence solution in the cloud is by taking SQL Server, familiar to many Microsoft BI developers, and running it in the cloud. Backup and high availability happen automatically, and we can use nearly all the skills and tools we used on a local SQL Server on this cloud-based solution as well. A short connection example follows the topic list below.

  • Provisioning an Azure SQL Database
  • Migrating an on-premises Data Warehouse to Azure SQL Database
  • Ingesting Azure Blob Storage data
  • Working with Columnstore Indexes
  • LAB: Using Azure SQL Databases
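
As a small sketch of working with columnstore indexes from client code, the snippet below connects to an Azure SQL Database with pyodbc and adds a clustered columnstore index to a fact table. The server, database, credentials and table name are placeholders.

  # Sketch: pyodbc connection plus a T-SQL columnstore statement.
  import pyodbc

  conn = pyodbc.connect(
      "DRIVER={ODBC Driver 18 for SQL Server};"
      "SERVER=tcp:myserver.database.windows.net,1433;"
      "DATABASE=mydwh;UID=myuser;PWD=<password>"
  )
  cursor = conn.cursor()

  # Columnstore storage is the typical choice for analytical fact tables.
  cursor.execute("CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales")
  conn.commit()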

Azure Synapse Analytics

Azure Synapse Analytics is a suite of services aimed at loading, storing and querying large volumes of data. It allows both Spark and SQL users to interact with the data. An example serverless query follows the topic list below.

  • Overview of Azure Synapse Analytics
  • Provisioning an Azure Synapse Analytics Workspace
  • Getting started with Azure Synapse Studio
  • Ingesting data
  • Working with on-demand SQL Pools
  • Using notebooks on Spark Pools
  • LAB: Setting up an Azure Synapse Analytics account
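
To give an idea of what querying with an on-demand (serverless) SQL pool looks like, the sketch below runs an OPENROWSET query over Parquet files in the data lake. The workspace endpoint, credentials and storage path are placeholders; the same query can also be typed directly in Synapse Studio.

  # Sketch: serverless SQL pool query over data lake files via pyodbc.
  import pyodbc

  conn = pyodbc.connect(
      "DRIVER={ODBC Driver 18 for SQL Server};"
      "SERVER=myworkspace-ondemand.sql.azuresynapse.net;"
      "DATABASE=master;UID=myuser;PWD=<password>"
  )
  query = """
      SELECT TOP 10 *
      FROM OPENROWSET(
          BULK 'https://mydatalake.dfs.core.windows.net/raw/sales/*.parquet',
          FORMAT = 'PARQUET'
      ) AS sales
  """
  for row in conn.execute(query):
      print(row)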

Azure Synapse Analytics Provisioned SQL Pools (Azure Data Warehouse)

Azure SQL Databases have their limitations in compute power since they run on a single machine, and their size is limited to the terabyte range. Provisioned SQL Pools in Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse) are a service aimed at analytical workloads on data volumes hundreds of times larger than what Azure SQL Databases can handle. Yet at the same time we can keep on using the familiar T-SQL query language, or we can connect traditional applications such as Excel and Management Studio to interact with this service. Storage and compute can be scaled independently. A CTAS example follows the topic list below.

  • Architecture of Provisioned SQL Pools
  • Loading data via PolyBase
  • CTAS and CETAS
  • Setting up table distributions
  • Indexing
  • Partitioning
  • Performance monitoring and tuning
  • LAB: Loading and querying data in Provisioned SQL Pools
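
The sketch below shows a typical CTAS statement for a provisioned SQL pool: the new table is hash-distributed on a join key and stored as a clustered columnstore index. Endpoint, credentials, table and column names are illustrative.

  # Sketch: CREATE TABLE AS SELECT (CTAS) with an explicit distribution.
  import pyodbc

  ctas = """
      CREATE TABLE dbo.FactSales
      WITH (
          DISTRIBUTION = HASH(CustomerKey),
          CLUSTERED COLUMNSTORE INDEX
      )
      AS SELECT * FROM staging.FactSales
  """
  conn = pyodbc.connect(
      "DRIVER={ODBC Driver 18 for SQL Server};"
      "SERVER=myworkspace.sql.azuresynapse.net;"
      "DATABASE=mysqlpool;UID=myuser;PWD=<password>"
  )
  conn.execute(ctas)
  conn.commit()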

Getting started with Azure Databricks

Azure Databricks allows us to use the power of Spark without the configuration hassle of Hadoop clusters. Using popular languages such as Python, SQL and R, data can be loaded, visualized, transformed and analyzed via interactive notebooks. A first notebook cell is sketched after the topic list below.

  • Introduction Azure Databricks
  • Cluster setup
  • Databricks Notebooks
  • Collaborative features in Databricks
  • LAB: Configuring an Azure Databricks account
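
As a taste of what a first notebook cell looks like once the cluster is running, the sketch below uses the spark and dbutils objects that Databricks notebooks expose automatically; the sample data is made up.

  # Sketch: a first Databricks notebook cell.
  # Build a tiny DataFrame in memory and render it as an interactive table.
  data = [("Alice", 34), ("Bob", 45), ("Carla", 29)]
  df = spark.createDataFrame(data, ["name", "age"])
  display(df)

  # dbutils gives access to the workspace file system, widgets, secrets, ...
  display(dbutils.fs.ls("/databricks-datasets"))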

Accessing data in Azure Databricks

There are many ways to access data in Azure Databricks, ranging from uploading small files via the portal and making ad-hoc connections to mounting Azure Blob Storage or data lakes. The files can also be treated as a table, providing easy access. Another point of attention in this module is dealing with malformed input data. A few of these techniques are sketched after the topic list below.

  • Uploading data
  • Connecting to Azure Storage and Data Warehouse
  • Mounting Azure Blob storage
  • Accessing data in an Azure Data Lake Gen 2
  • Dealing with malformed data
  • Processing Spark Dataframes in Python
  • Using Spark SQL
  • Working with Delta Lake
  • LAB: Processing data on an Azure Databricks cluster
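
The sketch below combines a few of these techniques in one notebook: mounting a blob container, reading CSV files while tolerating malformed rows, querying with Spark SQL and persisting the result as a Delta Lake table. Storage account, container, key, paths and column names are placeholders.

  # 1. Mount an Azure Blob Storage container under /mnt (the key is a placeholder).
  dbutils.fs.mount(
      source="wasbs://raw@mystorageaccount.blob.core.windows.net",
      mount_point="/mnt/raw",
      extra_configs={"fs.azure.account.key.mystorageaccount.blob.core.windows.net":
                     "<storage-account-key>"})

  # 2. Read CSV files, keeping malformed rows aside for later inspection.
  df = (spark.read
            .option("header", "true")
            .option("mode", "PERMISSIVE")
            .option("badRecordsPath", "/mnt/raw/badrecords")
            .csv("/mnt/raw/sales/*.csv"))

  # 3. Query with Spark SQL and persist the result as a Delta Lake table.
  df.createOrReplaceTempView("sales")
  cleaned = spark.sql("SELECT * FROM sales WHERE Amount IS NOT NULL")
  cleaned.write.format("delta").mode("overwrite").save("/mnt/raw/delta/sales")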

Deploying an Azure Databricks solution

Once the Databricks solution has been tested, it needs to be scheduled for execution. This can be done either with jobs in Azure Databricks or via Data Factory. In the latter case you need to be able to pass variables from Data Factory into Databricks; Azure Databricks widgets make this possible. A widget sketch follows the topic list below.

  • Azure Databricks jobs
  • Working with Databricks Widgets
  • Calling Databricks Notebooks from within Azure Data Factory pipelines
  • LAB: Widgets in Azure Databricks
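
A minimal sketch of a widget: the notebook cell below declares a text widget, which an Azure Data Factory Notebook activity can fill in through its base parameters. The widget, path and column names are illustrative.

  # Sketch: read a parameter passed in by Data Factory (or typed in manually).
  dbutils.widgets.text("processing_date", "2023-01-01")
  processing_date = dbutils.widgets.get("processing_date")

  df = spark.read.format("delta").load("/mnt/raw/delta/sales")
  df_day = df.filter(df.OrderDate == processing_date)
  print(f"Rows for {processing_date}: {df_day.count()}")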

Modeling data with Azure Analysis Services

Analysis Services is Microsoft's OLAP (cube) technology. The latest version, Analysis Services Tabular, can also run as a database-as-a-service. This is ideal for loading the cleaned, pre-processed data produced by other Azure services and caching it, which leads to faster reporting. The data can also be enriched with KPIs, translations, derived measures, etc.

  • Online Analytical Processing
  • Analysis Services Tabular
  • Creating a model on top of Azure Storage or Azure Data Warehouse
  • Model deployment
  • Processing
  • Model management
  • LAB: Deploying and querying Analysis Services models in the cloud

Azure Data Explorer

In between the large volumes of historical, long-lived data stored in a data lake and the streams of short-lived events processed with Azure Stream Analytics lies the challenge of working with large volumes of semi-structured telemetry and log data: the analysis can tolerate a longer latency than event processing, but requires more historical information than event processing technology can handle. For this kind of data processing, Azure Data Explorer is the ideal tool. A small Kusto query example follows the topic list below.

  • Data Explorer architecture
  • Ingesting data in Data Explorer
  • Querying and visualizing data with Kusto
  • Accessing Data Explorer from Data Factory and Power BI
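
As a small illustration of querying with Kusto from client code, the sketch below runs a KQL query against a Data Explorer cluster using the azure-kusto-data package. Cluster URL, database, table and column names are placeholders.

  # Sketch: run a Kusto (KQL) query from Python.
  from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

  kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
      "https://mycluster.westeurope.kusto.windows.net")
  client = KustoClient(kcsb)

  query = """
  Telemetry
  | where Timestamp > ago(1d)
  | summarize Errors = countif(Level == 'Error') by bin(Timestamp, 1h)
  | order by Timestamp asc
  """
  response = client.execute("logsdb", query)
  for row in response.primary_results[0]:
      print(row["Timestamp"], row["Errors"])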

Real-time event processing with Azure Stream Analytics

Processing real-time events is the main goal of Azure Stream Analytics. In this module, events are received from an Event Hub input, processed by a SQL query and sent to a destination. A sketch for publishing test events follows the topic list below.

  • Lambda architecture
  • Create Azure Stream Analytics jobs
  • Azure Event Hubs
  • Connecting inputs and outputs
  • Writing Stream Analytics queries
  • LAB: Processing live events with Stream Analytics
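
The Stream Analytics query itself is written in the portal, so as an illustration the sketch below only publishes a few test events to the Event Hub that would feed such a job, using the azure-eventhub package. The namespace connection string, hub name and event fields are placeholders.

  # Sketch: send a small batch of JSON events to an Event Hub.
  import json, time
  from azure.eventhub import EventHubProducerClient, EventData

  producer = EventHubProducerClient.from_connection_string(
      "<event-hub-namespace-connection-string>", eventhub_name="sensor-events")

  with producer:
      batch = producer.create_batch()
      for i in range(10):
          event = {"sensorId": i, "temperature": 20 + i, "ts": time.time()}
          batch.add(EventData(json.dumps(event)))
      producer.send_batch(batch)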
