- AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale
- As a data receiver, we can track and manage all of our data grants and AWS Marketplace data subscriptions in one place
- For data senders, AWS Data Exchange eliminates the need to build and maintain any data delivery and entitlement infrastructure
- AWS Data Pipeline is a web service that we can use to automate the movement and transformation of data
- We can define data-driven workflows, so that tasks can depend on the successful completion of previous tasks
- Components of a data pipeline:
- Pipeline definition: specifies the business logic of our data management
- Pipeline: schedules and runs tasks by creating Amazon EC2 instances to perform the defined work activities
- Task Runner: polls for tasks and then performs them. For example, Task Runner could copy log files to Amazon S3 and launch Amazon EMR clusters. Task Runner is installed and runs automatically on resources created by our pipeline definitions
- Use case examples:
- We can use AWS Data Pipeline to archive our web server's logs to Amazon Simple Storage Service (Amazon S3) each day and then run a weekly Amazon EMR cluster over those logs to generate traffic reports
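
To make the idea of dependent tasks concrete, here is a minimal sketch in plain Python. It does not call AWS. It assumes a simplified definition loosely modeled on the JSON `objects` layout of a pipeline definition file, where an activity points at its prerequisite through a `dependsOn` reference. The ids, the shell command, and the bucket name are hypothetical placeholders, and schedules and compute resources are left out for brevity.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical definition: ids, command, and bucket name are placeholders.
PIPELINE_DEFINITION = {
    "objects": [
        {
            "id": "ArchiveLogs",
            "type": "ShellCommandActivity",
            "command": "aws s3 cp /var/log/httpd s3://example-log-bucket/logs/ --recursive",
        },
        {
            "id": "TrafficReport",
            "type": "EmrActivity",
            # Should run only after ArchiveLogs completes successfully.
            "dependsOn": {"ref": "ArchiveLogs"},
        },
    ]
}


def run_order(definition):
    """Order activity ids so each one comes after the activity it depends on."""
    graph = {}
    for obj in definition["objects"]:
        dependency = obj.get("dependsOn")
        # Map each activity id to the set of ids it must wait for.
        graph[obj["id"]] = {dependency["ref"]} if dependency else set()
    return list(TopologicalSorter(graph).static_order())


print(run_order(PIPELINE_DEFINITION))  # ['ArchiveLogs', 'TrafficReport']
```

The service resolves this ordering itself; the topological sort here only illustrates why the report activity cannot start before the archive step has finished.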
- AWS Lake Formation is a service that makes it easy to set up a secure data lake in days
- Creating a data lake with Lake Formation is as simple as defining data sources and what data access and security policies we want to apply
- Lake Formation helps us collect and catalog data from databases and object storage, move the data into a new Amazon S3 data lake, clean and classify data using machine learning algorithms, and secure access to sensitive data
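
As a rough illustration of what "defining data access policies" can look like in practice, the sketch below builds the arguments for a table-level `SELECT` grant. The helper `build_select_grant` is hypothetical, and the role ARN, database name, and table name are made-up placeholders. The dictionary shape follows the `grant_permissions` operation of the boto3 `lakeformation` client as we understand it, so check it against the current API reference before relying on it.

```python
def build_select_grant(principal_arn, database, table):
    """Build keyword arguments for a Lake Formation table-level SELECT grant."""
    return {
        # Who receives the permission (an IAM user or role ARN).
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        # Which Data Catalog table the permission applies to.
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        # What the principal is allowed to do.
        "Permissions": ["SELECT"],
    }


# Placeholder values for illustration only.
grant = build_select_grant(
    "arn:aws:iam::123456789012:role/ExampleAnalystRole",
    "example_sales_db",
    "orders",
)

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("lakeformation").grant_permissions(**grant)
```

Keeping the request as plain data makes it easy to review or version the intended permissions before anything is applied to a real account.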