
AWS DataSync Terraform Module

This repository contains Terraform code that creates the resources required to run a DataSync task that syncs data within AWS, or between on-premises storage and AWS in either direction.

AWS DataSync

AWS DataSync supports a wide variety of file and object storage systems, both on premises and in AWS, to facilitate data transfer.

For on-premises storage transfers, DataSync works with the following on-premises storage systems:

  • Network File System (NFS)
  • Server Message Block (SMB)
  • Hadoop Distributed File Systems (HDFS)
  • Object storage

For AWS storage transfers, DataSync works with the following AWS storage services:

  • Amazon S3
  • Amazon EFS
  • Amazon FSx for Windows File Server
  • Amazon FSx for Lustre
  • Amazon FSx for OpenZFS
  • Amazon FSx for NetApp ONTAP

The module requires a source DataSync location and a destination DataSync location to be declared. The location types supported in the examples are S3 and EFS. More details about the DataSync S3 and EFS locations and their respective arguments can be found here and here.

Usage with DataSync Locations and Task Module

  • EFS-to-S3 same-account sync example for in-cloud transfers: efs-to-s3

S3 Location

module "s3_location" {
  source = "aws-ia/datasync/aws//modules/datasync-locations"
  s3_locations = [
    {
      name                     = "datasync-s3"
      s3_bucket_arn            = "arn:aws:s3:::terraform-s3-bucket-12345"
      subdirectory             = "/"
      create_role              = true
      # Replace with the ARN of the KMS key that encrypts the source bucket
      s3_source_bucket_kms_arn = "aws_kms_key_arn"

      tags = { project = "datasync-module" }
    }
  ]
}

Note that the DataSync S3 locations module can create a DataSync IAM role for you when you set create_role = true. This role has the S3 permissions that the DataSync service needs to access the bucket.
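When a suitable role already exists, the module can be pointed at it instead of creating one. The sketch below assumes a pre-existing role whose ARN is only a placeholder, and it uses only the create_role and s3_config_bucket_access_role_arn fields listed in the Inputs section; the module and location names are hypothetical.

```hcl
module "s3_location_existing_role" {
  source = "aws-ia/datasync/aws//modules/datasync-locations"
  s3_locations = [
    {
      name          = "datasync-s3-existing-role" # hypothetical location name
      s3_bucket_arn = "arn:aws:s3:::terraform-s3-bucket-12345"
      subdirectory  = "/"

      # Do not let the module create a role; reuse one managed elsewhere
      create_role = false

      # Placeholder ARN of a hypothetical pre-existing role that DataSync can assume
      s3_config_bucket_access_role_arn = "arn:aws:iam::123456789012:role/my-existing-datasync-role"

      tags = { project = "datasync-module" }
    }
  ]
}
```

This is the same pattern the cross-account example uses for its destination location, where the role has to live outside the module.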

EFS Location

module "efs_location" {
  source = "aws-ia/datasync/aws//modules/datasync-locations"
  efs_locations = [
    {
      name = "datasync-efs"
      # In this example a new EFS file system is created in efs.tf
      efs_file_system_arn            = "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-123456789"
      ec2_config_subnet_arn          = "arn:aws:ec2:us-east-1:123456789012:subnet/subnet-1234567890abcde"
      ec2_config_security_group_arns = ["arn:aws:ec2:us-east-1:123456789012:security-group/sg-1234567890abcde"]
      tags                           = { project = "datasync-module" }
    }
  ]

  # The mount target must exist before the EFS location is created
  depends_on = [aws_efs_mount_target.efs_subnet_mount_target]
}
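The efs_location module depends on aws_efs_mount_target.efs_subnet_mount_target, which the example defines in efs.tf. A minimal sketch of what such a file might contain, assuming you already have a subnet and a security group; the variables efs_subnet_id and efs_security_group_id and the resource name aws_efs_file_system.efs are hypothetical. The security group is assumed to allow NFS traffic (TCP 2049) from the security group used in ec2_config_security_group_arns.

```hcl
# Hypothetical input: subnet in which the EFS mount target is created
variable "efs_subnet_id" {
  type = string
}

# Hypothetical input: security group attached to the mount target
variable "efs_security_group_id" {
  type = string
}

# EFS file system used as the DataSync EFS location
resource "aws_efs_file_system" "efs" {
  encrypted = true
  tags      = { project = "datasync-module" }
}

# Mount target referenced by depends_on in the efs_location module
resource "aws_efs_mount_target" "efs_subnet_mount_target" {
  file_system_id  = aws_efs_file_system.efs.id
  subnet_id       = var.efs_subnet_id
  security_groups = [var.efs_security_group_id]
}
```

With these resources in place, efs_file_system_arn in the module above could reference aws_efs_file_system.efs.arn instead of a hard-coded ARN.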

The examples also include an aws_kms_key resource block that creates a KMS key whose key policy restricts use of the key according to same-account and cross-account access requirements. Refer to this link for more information.
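As a rough illustration of such a key policy, the sketch below keeps administrative control in the owning account and lets a single role from another account use the key for encrypt and decrypt operations. The resource name aws_kms_key.datasync and the variable cross_account_role_arn are hypothetical, and the action list is an assumption to adapt to your own access requirements.

```hcl
# Account that owns the key (the account Terraform is running against)
data "aws_caller_identity" "current" {}

# Hypothetical input: role in another account that needs to use the key
variable "cross_account_role_arn" {
  type = string
}

resource "aws_kms_key" "datasync" {
  description             = "KMS key for DataSync S3 bucket encryption"
  deletion_window_in_days = 7
  enable_key_rotation     = true

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Same-account access: the owning account keeps full control of the key
        Sid       = "EnableRootAccountPermissions"
        Effect    = "Allow"
        Principal = { AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root" }
        Action    = "kms:*"
        Resource  = "*"
      },
      {
        # Cross-account access: only the named role may use the key for data operations
        Sid       = "AllowCrossAccountUse"
        Effect    = "Allow"
        Principal = { AWS = var.cross_account_role_arn }
        Action    = ["kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*", "kms:DescribeKey"]
        Resource  = "*"
      }
    ]
  })
}
```

The resulting aws_kms_key.datasync.arn is the kind of value that could replace the "aws_kms_key_arn" placeholder passed as s3_source_bucket_kms_arn in the S3 location example.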

Two locations, one source and one destination, are required to configure a DataSync task. Once the locations are configured, pass their ARNs as the source location ARN and destination location ARN to the DataSync Task module. More details about the DataSync Task configuration and its arguments can be found here.

Example:

module "backup_tasks" {
  source = "aws-ia/datasync/aws//modules/datasync-task"
  datasync_tasks = [
    {
      name                     = "efs_to_s3"
      # EFS is the source and S3 is the destination, matching the task name
      source_location_arn      = module.efs_location.efs_locations["datasync-efs"].arn
      destination_location_arn = module.s3_location.s3_locations["datasync-s3"].arn
      options = {
        posix_permissions = "NONE"
        uid               = "NONE"
        gid               = "NONE"
      }
      schedule_expression = "cron(0 6 ? * MON-FRI *)" # Run at 6:00 am (UTC) every Monday through Friday
    }
  ]
}

Example with DataSync Locations and Task Module in a Cross-Account Use Case

AWS DataSync can transfer data between Amazon S3 buckets that belong to different AWS accounts. Here's what a cross-account transfer using DataSync can look like:

  • Source account: The AWS account for managing the S3 bucket that you need to transfer data from.
  • Destination account: The AWS account for managing the S3 bucket that you need to transfer data to.

Amazon S3 Object Ownership lets you use a bucket-level setting to disable access control lists (ACLs) and take ownership of every object in your bucket. With this feature, you no longer need to configure a cross-account DataSync task specifically so that the destination account owns the objects copied to its S3 bucket. S3 Object Ownership ensures that the destination account automatically owns all of those objects.

It's important that all the data that you transfer to the S3 bucket from another account belongs to your destination account. To ensure that this account owns the data, disable the bucket's access control lists (ACLs) prior to the data transfer.
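In Terraform, disabling ACLs on the destination bucket corresponds to the S3 Object Ownership setting BucketOwnerEnforced. A sketch, assuming the destination-account provider alias configured later in this README; the resource name and bucket name are hypothetical.

```hcl
# Destination account: enforce bucket-owner ownership, which disables ACLs
resource "aws_s3_bucket_ownership_controls" "dest" {
  provider = aws.destination-account
  bucket   = "terraform-s3-dest-bucket-12345" # hypothetical destination bucket

  rule {
    # BucketOwnerEnforced disables ACLs; the bucket owner owns every object
    object_ownership = "BucketOwnerEnforced"
  }
}
```

Applying this before the first transfer means objects written by the source account's DataSync role are owned by the destination account from the start.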

This example creates the necessary DataSync resources in the source AWS account, including the DataSync locations (source and destination), the task, and the associated IAM roles for S3 access. The resources related to the destination location (the target S3 bucket) are created in the destination AWS account. The example uses IAM policies and resource-based bucket policies to grant DataSync cross-account access.

The AWS provider is used to interact with the resources in both accounts. The AWS provider can source credentials and other settings from the shared configuration and credentials files. By default, these files are located at $HOME/.aws/config and $HOME/.aws/credentials on Linux and macOS, and %USERPROFILE%\.aws\config and %USERPROFILE%\.aws\credentials on Windows.

The provider profiles are supplied through Terraform input variables, as shown below, and the corresponding profiles are defined in ~/.aws/credentials. Here is a quick reference on how to configure a credentials file and use the AWS provider.

Example of a ~/.aws/credentials file:

[source-account]
aws_access_key_id = xxxxxxxxxx
aws_secret_access_key = xxxxxxxxxxxxx
[destination-account]
aws_access_key_id = xxxxxxxxxx
aws_secret_access_key = xxxxxxxxxxxxx

Input variables:

variable "source_account_profile" {
  description = "The AWS profile for the source account, where all the DataSync resources are created, i.e., DataSync locations, tasks, and executions"
  type        = string
  default     = "source-account"
}

variable "dest_account_profile" {
  description = "The AWS profile for the destination account, where the resources needed for the destination DataSync location configuration are created"
  type        = string
  default     = "destination-account"
}

Provider config block:

provider "aws" {
  alias   = "source-account"
  profile = var.source_account_profile
  region  = var.region
}

provider "aws" {
  alias   = "destination-account"
  profile = var.dest_account_profile
  region  = var.region
}
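Both provider blocks read var.region, which the snippets above do not declare. A minimal sketch of that declaration, assuming both accounts use the same Region (the us-east-1 default is only an example), together with a resource pinned to the destination account through its provider alias; the resource name aws_s3_bucket.dest_bucket is hypothetical.

```hcl
# Region shared by the source-account and destination-account providers
variable "region" {
  description = "AWS Region used by both the source and destination account providers"
  type        = string
  default     = "us-east-1" # example value only
}

# Destination bucket, created in the destination account via the provider alias
resource "aws_s3_bucket" "dest_bucket" {
  provider = aws.destination-account
  bucket   = "terraform-s3-dest-bucket-12345"
}
```

Because both provider configurations are aliased, module calls can be pinned the same way with the providers meta-argument, for example providers = { aws = aws.source-account }.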

To configure DataSync for transferring data between accounts, you need to set up permissions in both the source and destination AWS accounts. In the source account, create an IAM role that allows DataSync to transfer data to the destination account's bucket. In the destination account, update the S3 bucket policy to grant access to the IAM role created in the source account.
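A minimal sketch of these two pieces, assuming the source-account and destination-account provider aliases configured above. The resource names aws_iam_role.datasync_dest_s3_access_role and aws_s3_bucket_policy.allow_access_from_another_account match the references in the s3_dest_location module; the variable dest_bucket_name, the role and policy names, and the exact S3 action list are assumptions modeled on the AWS DataSync guidance for S3 locations, so check them against the current documentation. That guidance may also call for a bucket policy statement covering the identity that creates the location, which is not shown here.

```hcl
# Hypothetical input: destination bucket owned by the destination account
variable "dest_bucket_name" {
  type    = string
  default = "terraform-s3-dest-bucket-12345"
}

locals {
  dest_bucket_arn = "arn:aws:s3:::${var.dest_bucket_name}"

  # Bucket-level and object-level S3 actions DataSync is assumed to need
  datasync_bucket_actions = ["s3:GetBucketLocation", "s3:ListBucket", "s3:ListBucketMultipartUploads"]
  datasync_object_actions = [
    "s3:AbortMultipartUpload", "s3:DeleteObject", "s3:GetObject", "s3:GetObjectTagging",
    "s3:ListMultipartUploadParts", "s3:PutObject", "s3:PutObjectTagging",
  ]
}

# Source account: role that DataSync assumes to write into the destination bucket
resource "aws_iam_role" "datasync_dest_s3_access_role" {
  provider = aws.source-account
  name     = "datasync-dest-s3-access-role" # hypothetical name

  # Trust policy: only the DataSync service may assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "datasync.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Source account: identity-based permissions on the destination bucket
resource "aws_iam_role_policy" "datasync_dest_s3_access" {
  provider = aws.source-account
  name     = "datasync-dest-s3-access" # hypothetical name
  role     = aws_iam_role.datasync_dest_s3_access_role.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      { Effect = "Allow", Action = local.datasync_bucket_actions, Resource = local.dest_bucket_arn },
      { Effect = "Allow", Action = local.datasync_object_actions, Resource = "${local.dest_bucket_arn}/*" },
    ]
  })
}

# Destination account: bucket policy that trusts the source-account role
resource "aws_s3_bucket_policy" "allow_access_from_another_account" {
  provider = aws.destination-account
  bucket   = var.dest_bucket_name
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "DataSyncBucketAccess"
        Effect    = "Allow"
        Principal = { AWS = aws_iam_role.datasync_dest_s3_access_role.arn }
        Action    = local.datasync_bucket_actions
        Resource  = local.dest_bucket_arn
      },
      {
        Sid       = "DataSyncObjectAccess"
        Effect    = "Allow"
        Principal = { AWS = aws_iam_role.datasync_dest_s3_access_role.arn }
        Action    = local.datasync_object_actions
        Resource  = "${local.dest_bucket_arn}/*"
      },
    ]
  })
}
```

Granting the same actions on both sides reflects the cross-account rule that the caller's identity policy and the bucket's resource policy must each allow the request.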

The DataSync Location and Task modules are generic and have no cross-account provider configuration. Therefore, the IAM role that gives DataSync permission to transfer data to the destination account's bucket must be created outside the module and passed as a parameter to the destination location configuration.

module "s3_dest_location" {
  source = "aws-ia/datasync/aws//modules/datasync-locations"
  s3_locations = [
    {
      name                             = "dest-bucket"
      s3_bucket_arn                    = "arn:aws:s3:::terraform-s3-dest-bucket-12345"
      s3_config_bucket_access_role_arn = aws_iam_role.datasync_dest_s3_access_role.arn
      subdirectory                     = "/"
      create_role                      = false
      tags                             = { project = "datasync-module" }
    }
  ]

  # Create the location only after the destination bucket policy trusts the role
  depends_on = [aws_s3_bucket_policy.allow_access_from_another_account]
}

create_role is set to false for the destination location because the IAM role is created outside the DataSync Locations module.

The depends_on meta-argument ensures that Terraform creates the destination DataSync location only after the destination account's S3 bucket policy has been updated to allow the source account's IAM role to transfer data to the destination bucket.

Note: Task creation fails if the destination account's S3 bucket policy does not allow the source account's IAM role, because DataSync verifies read/write access to the source and destination S3 buckets before configuring the task.
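To complete the picture, the task itself is wired up in the source account from a source location and the s3_dest_location module shown above. A sketch, assuming a same-account source bucket whose location is created by a hypothetical s3_source_location module with create_role = true; the bucket name, module names, and task name are hypothetical, and the options and schedule simply mirror the earlier task example.

```hcl
# Source account: location for the bucket that data is read from
module "s3_source_location" {
  source    = "aws-ia/datasync/aws//modules/datasync-locations"
  providers = { aws = aws.source-account }

  s3_locations = [
    {
      name          = "source-bucket"
      s3_bucket_arn = "arn:aws:s3:::terraform-s3-source-bucket-12345" # hypothetical bucket
      subdirectory  = "/"
      create_role   = true # same-account bucket, so the module can create the role
      tags          = { project = "datasync-module" }
    }
  ]
}

# Source account: task that copies from the source bucket to the destination bucket
module "cross_account_task" {
  source    = "aws-ia/datasync/aws//modules/datasync-task"
  providers = { aws = aws.source-account }

  datasync_tasks = [
    {
      name                     = "s3_to_s3_cross_account" # hypothetical task name
      source_location_arn      = module.s3_source_location.s3_locations["source-bucket"].arn
      destination_location_arn = module.s3_dest_location.s3_locations["dest-bucket"].arn
      options = {
        posix_permissions = "NONE"
        uid               = "NONE"
        gid               = "NONE"
      }
      schedule_expression = "cron(0 6 ? * MON-FRI *)" # Run at 6:00 am (UTC) every Monday through Friday
    }
  ]
}
```

Because the task references the destination location's ARN, Terraform orders it after s3_dest_location, which in turn waits for the bucket policy through its depends_on.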

Other usage examples with DataSync Locations and Task module

  • S3-to-S3 same-account sync example for in-cloud transfers: s3-to-s3

Support & Feedback

The DataSync module for Terraform is maintained by AWS Solutions Architects. It is not part of an AWS service, and support is provided on a best-effort basis by the AWS Storage community.

To post feedback, submit feature ideas, or report bugs, please use the Issues section of this GitHub repo.

If you are interested in contributing to the DataSync module, see the Contribution guide.

Requirements

Name Version
terraform >= 1.0.7
aws >= 4.0.0, < 5.0.0
awscc >= 0.24.0

Providers

Name Version
aws >= 4.0.0, < 5.0.0

Modules

No modules.

Resources

Name Type
aws_datasync_location_efs.efs_location resource
aws_datasync_location_s3.s3_location resource
aws_iam_role.datasync_role_s3 resource

Inputs

efs_locations
  Description: A list of EFS locations and associated configuration
  Type:
    list(object({
      name                           = string
      access_point_arn               = optional(string)
      ec2_config_security_group_arns = list(string)
      ec2_config_subnet_arn          = string
      efs_file_system_arn            = string
      file_system_access_role_arn    = optional(string)
      in_transit_encryption          = optional(string)
      subdirectory                   = optional(string)
      tags                           = optional(map(string))
    }))
  Default: []
  Required: no

s3_locations
  Description: A list of S3 locations and associated configuration
  Type:
    list(object({
      name                             = string
      agent_arns                       = optional(list(string))
      s3_bucket_arn                    = string
      s3_config_bucket_access_role_arn = optional(string)
      s3_storage_class                 = optional(string)
      subdirectory                     = optional(string)
      tags                             = optional(map(string))
      create_role                      = optional(bool)
    }))
  Default: []
  Required: no

Outputs

Name Description
datasync_role_arn ARN of the DataSync IAM role created for S3 access
efs_locations DataSync EFS Location ARN
s3_locations DataSync S3 Location ARN