Contact us for pricing and additional information
In this course, participants learn about cloud-based big data solutions such as Amazon Elastic MapReduce (EMR), Amazon Redshift, Amazon Kinesis, and the rest of the AWS big data platform. Participants learn how to use Amazon EMR to process data with the broad ecosystem of Apache Hadoop tools, such as Hive and Hue. The course also teaches participants how to create big data environments; work with Amazon DynamoDB, Amazon Redshift, and Amazon Kinesis; and apply best practices to design those environments for security and cost-effectiveness. Participants should have experience with the AWS environment and a basic understanding of data warehousing.
Objectives:
Data ingestion, transfer, and compression;
AWS data storage options;
Using DynamoDB with Amazon EMR;
Using Kinesis for near real-time Big Data processing;
Understanding Apache Hadoop and Amazon EMR;
Using Amazon Elastic MapReduce;
The Hadoop Ecosystem;
Using Hive for advertising analytics;
Using Streaming for Life Sciences analytics;
Using Hue with Amazon EMR;
Running Pig Scripts with Hue on Amazon EMR;
Running Spark and Spark SQL interactively on Amazon EMR;
Using Spark and Spark SQL for in-memory analytics;
Managing Amazon EMR costs;
Securing your Amazon EMR deployments;
Data warehouses and columnar datastores;
Understanding Amazon Redshift;
Optimizing your Amazon Redshift environment;
The Big Data ecosystem on AWS;
Visualizing and orchestrating Big Data; and
Using TIBCO Spotfire to visualize Big Data.