Big Data Concept + Tools + Techniques 2022 - Apache Hadoop - Data Analysis - Cloud Computing - Amazon Web Services (AWS) - NoSQL - Data Warehouse
Description
Big Data Concept + Tools + Techniques.
In the modern world, data is being generated at an exponential rate. Business data generation is increasing at a similarly rapid rate. Only a small percentage of business data is structured data in rows and columns of databases. This data proliferation requires a rethinking of traditional techniques for capture, storage, and processing. Big data is a term that describes data sets so big they can’t be managed with traditional database systems. Big Data is also a collection of tools and techniques aimed at solving these problems.
Learning Kits are structured learning paths, mainly within the
Emerging Tech area. A Learning Kit keeps the student working toward
an overall goal, helping them achieve their career aspirations. Each
part takes the student step by step through a diverse set of topic
areas. Learning Kits are made up of required tracks, which contain
all of the available learning resources, such as Assessments (Final
Exams), Mentor, Practice Labs, and of course e-learning. All
resources are accessible for 365 days from first activation.
This Learning Kit with more than 25 hours of learning is divided into three tracks:
Course content
Big Data Infrastructures
This track focuses on big data concepts, non-relational data, and big data analytics.
Courses (7 hours +)
The Big Data Technology Wave
Big Data in Perspective
Course: 17 Minutes
- Course Introduction
- Introducing Big Data
- The Biggest Wave Yet
- Emerging Technologies
Global Data
Course: 14 Minutes
- Defining Big Data
- Key Terms for Data
- Sizing Big Data
The Key Contributors
Course: 10 Minutes
- The Original Key Contributors
- The Distro Companies
The Apache Software Foundation
Course: 10 Minutes
- Apache Software Foundation
- Apache Projects
- Other Apache Projects
- Other Open Source Projects
Big Data Stack
Course: 13 Minutes
- The Big Data Stack
- Big Data Components
- NoSQL Databases
Hadoop in Detail
Course: 31 Minutes
- Distributed Computing
- Design Principles of Hadoop
- Functional View of Hadoop
- HDFS in Action
- YARN in Action
- MapReduce in Action
- Spark in Action
Practice: Big Data elements and functions
Course: 15 Minutes
- Exercise: Working with Big Data Elements
Big Data Opportunities and Challenges
Big Data Teams
Course: 28 Minutes
- Course Introduction
- The Big Data Team
- Business Team Members
- Analytics Team Members
- Data Solutions Team Members
- Cluster Team Members
- Big Data Impacting IT
Big Data Projects
Course: 25 Minutes
- DIY Supercomputing
- Hadoop in the Clouds
- Big Data and Data Warehouses
- Business Case for Big Data
- Big Data and RDBMS
- Data Center Projects
Big Data Use Cases
Course: 20 Minutes
- Data Analytics
- Big Data Engines
- Common Analytics Use Cases
- Big Data Impacting the Globe
Opportunities and Challenges
Course: 32 Minutes
- Global Increasing Digital Volume
- The Big Companies
- Big Data Opportunity
- Big Data Challenges
- Challenges of Security and Privacy
- Planning for Big Data
- Big Data Impacting Business
Practice: Challenges and Opportunities of Big Data
- Exercise: Challenges and Opportunities of Big Data
Big Data Concepts: Getting to Know Big Data
Course: 43 Minutes
- Course Overview
- What Is Big Data?
- Sources of Big Data
- Characteristics of Big Data
- Structured and Unstructured Data
- Big Data Analytics
- Advantages of Big Data Analytics
- Big Data Analytics: Domain Use Cases
- Big Data Analytics: Netflix Use Case
- Big Data Analytics: Amazon Use Case
- Major Challenges in Big Data
- Course Summary
Big Data Concepts: Big Data Essentials
Course: 46 Minutes
- Course Overview
- Raw Data and Big Data
- Data Warehousing and Big Data
- Big Data Computing Systems
- Horizontal and Vertical Scaling
- Features, Benefits, and Use Cases of Hadoop
- Hadoop: Components
- Hadoop: Migration to the Cloud
- Hadoop and Cloud Computing
- Features of Big Data Storage Systems
- In-memory Storage Systems
- Course Summary
Non-relational Data: Non-relational Databases
Course: 52 Minutes
- Course Overview
- Non-relational Databases
- The NoSQL Approach
- Benefits of NoSQL
- Document Databases
- Key-value Data Stores
- Graph Databases
- Columnar Databases
- HBase Architecture
- Multi-model Databases
- Next Generation NewSQL Databases
- Course Summary
Big Data Analytics: Techniques for Big Data Analytics
Course: 39 Minutes
- Course Overview
- Big Data Analytics Challenges
- Big Data Analytics Stack Layers
- Big Data Ingestion
- The Data Processing Layer
- The Data Storage Layer
- Pillars of Big Data Architecture
- Batch Processing and Big Data
- Stream Processing and Big Data
- Lambda Architecture and Use Cases
- Kappa Architecture
- Course Summary
Big Data Analytics: Spark for High-speed Big Data Analytics
Course: 51 Minutes
- Course Overview
- The Core Characteristics of Apache Spark
- Components of the Apache Spark Architecture
- Apache Spark Use Case: Uber Using Spark
- Apache Spark Use Case: Alibaba Using Spark
- Apache Spark Use Case: The Healthcare Industry
- Apache Spark vs. Hadoop
- Top Apache Spark Use Cases
- Apache Spark's Main Features
- Apache Spark Performance Optimization Techniques
- Apache Spark Best Practices
- Course Summary
Harnessing Data Volume & Velocity: Big Data to Smart Data
Course: 39 Minutes
- Course Overview
- Comparing Big Data and Smart Data
- Smart Data and Edge Technologies
- Big Data to Smart Data Formation
- Smart Data and Smart Processes
- Smart Data Use Cases
- Smart Data Life Cycle
- Big Data to Smart Data Using k-NN
- Smart Data Frameworks
- Smart Data to Business
- Clustering Smart Data
- Smart Data Integration
- Exercise: Transform Big Data to Smart Data
Securing Big Data Streams
Course: 1 Hour, 3 Minutes
- Course Overview
- Big Data Security Concerns
- Streaming Data Security Concerns
- NoSQL Database Security Concerns
- Distributed Processing Security Risks
- Data Mining and Analytics Privacy Flaws
- End-Point Device Tampering Risks
- Secure Big Data
- Secure Data Streams
- Secure Data In Motion
- End-Point Input Validation and Filtering
- Secure Data at Rest with Symmetric Ciphers
- Exercise: Securing Big Data Streams
Assessment:
- Big Data Infrastructures
Emerging New Age Architectures
This track focuses on cloud data platforms, data lakes, and modern data warehouses.
Courses (5 hours +)
Cloud Data Platforms: Cloud Computing
Course: 52 Minutes
- Course Overview
- Cloud Computing and Its Characteristics
- Cloud Computing: Use Cases and Benefits
- Cloud Computing Services: Storage and Compute Power
- Types of Cloud Compute Power
- Types of Cloud Storage
- Cloud Computing Models: PaaS, IaaS, SaaS, and FaaS
- Cloud Computing Model Comparison
- Components of Cloud Computing Architectures
- Cloud Service Provider Comparison
- Cloud Elasticity and Scalability
- Course Summary
Cloud Data Platforms: Cloud-based Applications & Storage
Course: 53 Minutes
- Course Overview
- Deploying Applications on Cloud Platforms
- Characteristics of Cloud-ready Applications
- Types of Cloud Deployment Models
- Cloud Deployment Tools
- Considerations for Cloud Application Deployment
- CPU Virtualization, Memory, and I/O Devices
- Cloud Storage Platforms
- Cloud Storage Technologies
- HDFS and Amazon S
- Types of Data Centers
- Course Summary
Cloud Data Platforms: AWS, Azure, & GCP Comparison
Course: 56 Minutes
- Course Overview
- Cloud Data Platforms: Amazon Web Services
- Cloud Data Platforms: Microsoft Azure
- Cloud Data Platforms: Google Cloud Platform
- Cloud Analytics
- Popular Cloud Analytics Tools
- Cloud Computing Challenges: Security
- Cloud Computing Challenges: Compliance
- Cloud Computing Challenges: Cost Management
- Cloud Computing Challenges: Governance
- Future of Cloud Computing
- Course Summary
Data Lakes and Modern Data Warehouses: Data Lakes
Course: 1 Hour, 19 Minutes
- Course Overview
- Data Lake Evolution
- Modern Data Lake Architecture
- Data Lakes: Key Concepts
- Data Lake Maturity Stages
- Data Swamps
- Data Lake Platforms
- Governed Data Lakes
- Data Lakes: Risks and Challenges
- Data Lakes vs. Data Warehouses
- Course Summary
Data Lakes and Modern Data Warehouses: Modern Data Warehouses
Course: 1 Hour, 10 Minutes
- Course Overview
- Data Warehouses and Its Characteristics
- Modern Data Warehouses: Key Concepts and Stages
- Amazon Redshift
- Google BigQuery
- Modern Data Warehouses: Architecture and Processes
- Modern Data Warehouses: Techniques
- Data Warehouse Solutions: Batch Processing
- Data Warehouse Solutions: Real-time Processing
- Data Warehouse Solutions: Streaming Analytics
- Hybrid Modern Data Warehouse
- Course Summary
Data Lakes and Modern Data Warehouses: Azure Databricks & Data Pipelines
Course: 1 Hour, 2 Minutes
- Course Overview
- Azure Databricks: Features and Architecture
- Azure Databricks: Pros and Cons
- Snowflake Data Warehouses: Features and Architecture
- Snowflake Data Warehouses: Pros and Cons
- Data Pipelines
- Components of a Data Pipeline
- Advantages of a Data Pipeline
- Types of Data Pipeline Tools
- Comparing Data Pipeline Tools
- Building a Data Pipeline
- Course Summary
Assessment:
- Emerging New Age Architectures
Apache Spark
Explore the basics of Apache Spark, an analytics engine used for big data processing.
Courses
Accessing Data with Spark (3 hours+)
Accessing Data with Spark: An Introduction to Spark
Course: 1 Hour, 7 Minutes
- Course Overview
- Introduction to Spark and Hadoop
- Resilient Distributed Datasets (RDDs)
- RDD Operations
- Spark DataFrames
- Spark Architecture
- Spark Installation
- Working with RDDs
- Creating DataFrames from RDDs
- Contents of a DataFrame
- The SQLContext
- The map() Function of an RDD
- Accessing the Contents of a DataFrame
- DataFrames in Spark and Pandas
- Exercise: Working with Spark
Accessing Data with Spark: Data Analysis Using the Spark DataFrame API
Course: 1 Hour, 12 Minutes
- Course Overview
- Performance Improvements in Spark
- Broadcast Variables and Accumulators
- Loading Data into a DataFrame
- Sampling the Contents of a DataFrame
- Grouping and Aggregations
- Visualizing Data in a DataFrame
- Trimming and Cleaning Data
- User-Defined Functions and DataFrames
- Combining Filters, Aggregations, and Sorting
- Using Broadcast Variables
- Using Accumulators
- Exporting DataFrame Contents
- Custom Accumulators
- Join Operations
- Exercise: Data Analysis Using the DataFrame API
Accessing Data with Spark: Data Analysis using Spark SQL
Course: 55 Minutes
- Course Overview
- The Spark Catalyst Optimizer
- Introduction to Spark SQL
- Preparing Data for Analysis
- Running SQL Queries
- Inferred and Explicit Schemas
- Windowing in Spark
- Applying Window Functions
- Exercise: Data Analysis Using Spark SQL
Big Data Development with Apache Spark (5 hours+)
Introduction to Apache Spark
Course: 1 Hour, 2 Minutes
- Course Introduction
- Overview of Apache Spark
- Downloading and Installing Apache Spark
- Downloading and Installing Apache Spark on Mac OS
- Building Spark
- Working with Spark Shell
- Linking to Spark
- Spark Configuration
- Initializing Apache Spark
- Running Spark on Clusters
Apache Spark SQL
Course: 1 Hour, 10 Minutes
- Course Introduction
- Apache Spark SQL Overview
- SparkSession
- DataFrames
- Aggregations
- SQL Queries
- Temporary View
- Datasets
- JSON Datasets
- Load/Save Functions
- Specifying a Data Source
- Querying with SQL
- SaveMode
- Parquet Files
- Persistent Tables
- Partitioning
Structured Streaming
Course: 1 Hour, 13 Minutes
- Course Introduction
- Structured Streaming Overview
- Stream Input
- Stream Output
- Windowing
- Continuous Applications
- Deduplication
- File Sinks
- Streaming Query
- Streaming Query Manager
- Checkpointing
- Word Count
Spark Monitoring and Tuning
Course: 59 Minutes
Monitoring Spark Applications
Course: 17 Minutes
- Course Introduction
- Web UI
- Environment Configuration
- REST API
- Memory Allocation
Tuning Spark Applications
Course: 38 Minutes
- Speculation
- Serialization
- Memory Tuning
- Executor Memory
- Garbage Collection Tuning
- Parallelism
- Broadcast Functionality
- Explain Query Execution
- Data Compression
Practice: Monitoring Spark Applications
Course: 4 Minutes
- Exercise: Monitor Spark Applications
Spark Security
Course: 36 Minutes
- Course Introduction
- Spark UI
- Secure Event Logs
- SSL Settings
- Shared Secret
- YARN Deployments
- SASL Encryption
- Network Security
Practice: Configuring Spark Security
Course: 3 Minutes
- Exercise: Configure Spark Security
Practice Lab:
Developing with Apache Spark (5 hours)
Practice developing with Apache Spark by performing tasks with
Spark SQL, Spark Streaming, and GraphX. Then create a
classification system using MLlib and work with MLlib
regression.
Apache Hadoop
Apache Hadoop is an open-source framework for the storage and processing of big data.
Courses
Getting Started with Hadoop (5 hours+)
Working with Hadoop HDFS (3 hours+)
Hadoop HDFS: Introduction
Course: 1 Hour, 15 Minutes
- Course Overview
- Scaling Datasets
- Horizontal Scaling for Big Data
- Distributed Clusters and Horizontal Scaling
- Overview of HDFS
- HDFS Architectures
- MapReduce for HDFS
- YARN for HDFS
- The Mechanism of Resource Allocation in Hadoop
- Apache Zookeeper for HDFS
- The Hadoop Ecosystem
- Exercise: An Introduction to HDFS
Hadoop HDFS: Introduction to the Shell
Course: 53 Minutes
- Course Overview
- Creating a Hadoop Cluster on the Google Cloud
- Exploring Hadoop Clusters
- The YARN Cluster Manager UI
- The HDFS NameNode UIs
- Browsing the Packaged Hadoop Tools
- Configuring HDFS
- The HDFS Shells
- Exercise: Introduction to the HDFS Shell
Hadoop HDFS: Working with Files
Course: 48 Minutes
- Course Overview
- Basic Directory Commands in HDFS
- Using the copyFromLocal Command in HDFS
- Using the put Command in HDFS
- Using the copyToLocal Command in HDFS
- Retrieving Files from HDFS
- Append and Delete Operations in HDFS
- Exercise: Working with Files on HDFS
Hadoop HDFS: File Permissions
Course: 49 Minutes
- Course Overview
- The HDFS count and du Commands
- Viewing and Setting File Permissions in HDFS
- Applying Permissions Recursively in HDFS
- An Introduction to Bash Scripting
- Scripting HDFS Operations
- Exploring the HDFS NameNode UI
- Cleanup Operations in HDFS
Data Warehousing with Hadoop (4 hours+)
Data Warehousing with Hadoop: Managing Big Data Using HDInsight Hadoop
Course: 1 Hour, 6 Minutes
- Features of HDInsight
- Fundamentals and Types of Clusters in HDInsight
- Essential Open-source Components of HDInsight
- Setting Up Hadoop Clusters on Azure HDInsight
- HDInsight Clusters with Resource Manager Template
- HDInsight Services and Storage Types
- Azure Management Console
- Creating and Managing HDInsight Clusters
- Setting Up HDInsight Emulator
- Programming in HDInsight
- Developing and Executing MapReduce Program
- Exercise: Working with HDInsight and MapReduce
Data Warehousing with Hadoop: Microsoft Analytics Platform System and Hive
Course: 1 Hour, 29 Minutes
- Microsoft Analytics Platform System
- Understanding PolyBase
- Parallel Data Warehouse Architecture
- Data Exploration Architectures
- Hive Introduction
- Hive Architecture in HDInsight
- Setting up the Development Environment for Hive
- Connect and Submit Queries
- Hive QL
- Using Azure PowerShell and Beeline
- Creating a Database and Tables and Loading Data
- Partition Tables and Data Formats
- Hue Installation and Hive Query Management
- Using Microsoft BI and Hive
- Hive as ETL
- HBase and Hive
- Exercise: Creating and Loading Data into Hive Tables
Data Warehousing with Hadoop: HDInsight and Retail Sales Implementation Using Hive
Course: 46 Minutes
- Data Modeling
- Dimensional Design Process
- Dimensional Design Steps
- Retail Business Use Cases
- Dimension Tables
- Fact Tables
- Data Loading in Dimension and Fact Tables
- Essential Queries
- Creating and Executing Queries
- Hive and Power BI for Visualization
Data Warehousing with Hadoop: Spark, HDInsight and Cluster Management
Course: 56 Minutes
- Spark Introduction
- Data Representation in Spark
- Create Spark Clusters Using PowerShell
- Spark SQL and Hive
- Spark SQL Data Sources and DataFrames
- Customizing HDInsight Cluster
- Application Installation on HDInsight
- Ambari User Management
- HDInsight Management Using Azure CLI
- Troubleshooting HDInsight
- Monitoring HDInsight Hadoop
- Exercise: Working with Spark and Ambari
Specifications
Language: English
Instructor qualifications: Certified
Course format and length: Instructional videos with subtitles, interactive elements, assignments, and tests
Course duration: 25 hours
Assessments: The assessment tests your knowledge and applied skills on the topics in the learning path. It is available for 365 days after activation.
Online virtual labs: Receive 12 months of access to virtual labs that match the traditional course configuration. Active for 365 days after activation; availability varies per training.
Online mentor: You have 24/7 access to an online mentor for all your specific technical questions about the subject of study. The online mentor is available for 365 days after activation, depending on the chosen Learning Kit.
Progress tracking: Yes
Access to materials: 365 days
Technical requirements: Computer or mobile device, a stable internet connection, and a web browser such as Chrome, Firefox, Safari, or Edge.
Support: Helpdesk and online knowledge base, 24/7
Certification: Certificate of participation in PDF format
Price and costs: Course price with no additional costs
Cancellation policy and money-back guarantee: Assessed on a case-by-case basis
Award-winning e-learning: Yes
Tip! Make sure you have a quiet learning environment, time and motivation, audio equipment such as headphones or speakers, and account information such as login details for access to the e-learning platform.
Enrich Your Career with OEM's ICT Training Courses
Why choose OEM?
Experience: More than 20 years of expertise in ICT training.
Extensive selection: More than 1000 courses from 200 top brands.
High satisfaction: Rated 9.0 on Springest.
Quality guarantee: Certified instructors and award-winning e-learning.
Partnerships: Microsoft Partner, EC-Council Partner, Certiport, and Pearson VUE.