microsoft/Data-and-AI-Platform

1. Deploy the Azure Infrastructure and Data Pipeline Related Artifacts

  1. Create a Service Principal

  2. Assign the service principal rights on the subscription. There are two options:

    • Assign the service principal Owner RBAC rights at the subscription(s)
    • Pre-create all resource groups and assign the service principal Owner RBAC rights at each resource group
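    • For example, a minimal Azure CLI sketch; the subscription ID, resource group, and service principal names below are placeholders, not values from this repo:
# Create the service principal (the returned appId is the client ID)
az ad sp create-for-rbac --name "sp-data-ai-platform"

# Option 1: Owner rights at the subscription scope
az role assignment create --assignee "<appId>" --role "Owner" \
  --scope "/subscriptions/<subscription-id>"

# Option 2: pre-create each resource group and grant Owner on it
az group create --name "rg-dataai-dev" --location "eastus2"
az role assignment create --assignee "<appId>" --role "Owner" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/rg-dataai-dev"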
  3. Create a federated credential for the service principal

    • Please use an entity type of environment
    • You will need to create a new federated credential for each environment you're deploying.
      • The IP kit deploys up to 3 environments: development, test, and production
      • Your federated credential environment names must exactly match the environment names listed above (development, test, production)
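    • A minimal Azure CLI sketch for creating one of the federated credentials; <appId> and <ORG>/<REPO> are placeholders, and you would repeat this for test and production:
# One federated credential per GitHub environment (development shown)
az ad app federated-credential create --id "<appId>" --parameters '{
  "name": "github-development",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:<ORG>/<REPO>:environment:development",
  "audiences": ["api://AzureADTokenExchange"]
}'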
  4. Create an Azure Active Directory (AAD) group and add all project team members (or just yourself, if you will be the only one interacting with the deployed resources)
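    • A minimal sketch with the Azure CLI; the group name is a placeholder:
# Create the group, then add members by object ID
az ad group create --display-name "DataAIPlatformTeam" --mail-nickname "DataAIPlatformTeam"
az ad group member add --group "DataAIPlatformTeam" --member-id "<user-object-id>"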

  5. If you're using GitHub environments, create the environments below in your GitHub repo

    • development
    • test
    • production
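    • If you prefer the GitHub CLI over the web UI, a sketch like the following should create all three; <ORG>/<REPO> is a placeholder:
# PUT creates (or updates) a repository environment
for env in development test production; do
  gh api --method PUT "repos/<ORG>/<REPO>/environments/$env"
done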
  6. If you're using environments, add the secrets below to each environment you're deploying. If you're not using environments, add them as repository secrets with the same names

  7. If you're creating private endpoints, also create the following secret

    • DNS_ZONE_SUBSCRIPTION_ID
  8. If you're deploying VMs with Bastion, also create the following secrets

    • VM_USERNAME
    • VM_PASSWORD
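    • A sketch of setting these with the GitHub CLI; drop --env to create repository secrets instead of environment secrets:
gh secret set DNS_ZONE_SUBSCRIPTION_ID --env development --body "<subscription-id>"
gh secret set VM_USERNAME --env development --body "<vm-admin-username>"
gh secret set VM_PASSWORD --env development --body "<vm-admin-password>"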
  9. For each environment you're deploying, update the feature flag variable file to indicate which resources you're deploying and how they should behave

    • If you're deploying Role-Based Access Control (RBAC), please refer here for what RBAC is deployed
    • If you're deploying the pre-built data pipelines, you must enable Data Factory, Landing Storage, Data Lake, Azure SQL and either Synapse or Databricks
  10. For each environment you're deploying, update the general variable file with the resource names for the resources you indicated you are deploying based on the feature flag file. Also add required tags, Azure location, and resource group names.

    • All resources other than Logic App, Azure Machine Learning, and OpenAI resources will be deployed to the resource group specified in the PrimaryRg variable
    • The PrimaryRg variable is required. If you're only deploying Logic App/Azure Machine Learning/OpenAI resources, set PrimaryRg to the same name as one of the other resource groups
    • Note that most Azure resource names need to be globally unique, but keep the SQL Database name as "MetadataControl"
    • The following variable values can only contain letters and numbers and must be between 3 and 24 characters long
      • dataLakeName
      • landingStorageName
      • logicAppStorageName
      • mlStorageName
    • The following variable values must be between 3 and 24 characters long
      • keyVaultName
    • The following variable values can only contain letters and numbers
      • mlContainerRegistryName
      • fabricCapacityName
    • If Key Vault or Container Registry is deleted and needs to be redeployed, change the resource name
      • This is due to soft-delete policies
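      • Renaming is the approach documented here; if your organization instead allows purging, a sketch for Key Vault with the Azure CLI:
# List soft-deleted vaults still holding a name, then purge (irreversible)
az keyvault list-deleted --query "[].name" -o tsv
az keyvault purge --name "<key-vault-name>"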
  11. If you're deploying the resources securely with no public access and private endpoints, update the networking setup variable files and set the DeployWithCustomNetworking feature flag to true in the feature flag variable file

    • The best practice is to connect to an existing spoke virtual network (or networks) for private endpoints and VNet injection. Please refer here for an overview of the networking requirements
  12. Update the entra assignments variable files

    • Only the Entra_Group_Admin and Entra_Group_Shared_Service groups are required. If you created only one group in step 4 above, you can use the same information for both variables
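    • A sketch for looking up a group's object ID with the Azure CLI, in case the variable file expects IDs rather than display names (check the file's inline comments):
az ad group show --group "DataAIPlatformTeam" --query id -o tsv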
  13. Confirm the following resource providers are registered in your Azure Subscription. If not, register them

    • Microsoft.EventGrid
    • If you're deploying Purview: Microsoft.Purview, Microsoft.EventHub
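    • A sketch for checking and registering providers with the Azure CLI:
# Check registration state, then register any provider that is not yet registered
for rp in Microsoft.EventGrid Microsoft.Purview Microsoft.EventHub; do
  az provider show --namespace "$rp" --query registrationState -o tsv
done
az provider register --namespace Microsoft.EventGrid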
  14. Trigger the data-strategy-orchestrator GitHub Action. If you're unfamiliar with triggering a GitHub Action, follow these instructions.

    • Please do not use the "rerun" job functionality. Always execute the job using the method in the instructions above
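    • If you trigger workflows from the GitHub CLI, a sketch like the following should work; the exact workflow name is an assumption, so list workflows first to confirm:
gh workflow list
gh workflow run "data-strategy-orchestrator" --ref main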

2. Complete the Post Deployment Tasks

Azure SQL

  1. Execute the stored procedure below in the deployed Azure SQL Database(s)
    • Log in with AAD; SQL authentication is disabled.
EXEC [dbo].[AddManagedIdentitiesAsUsers]
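    • A minimal sqlcmd sketch; the server name is a placeholder, the database keeps the required "MetadataControl" name, and -G authenticates with AAD instead of SQL auth (the same approach works against the Synapse serverless endpoint in the next section):
sqlcmd -S "<server-name>.database.windows.net" -d "MetadataControl" -G \
  -Q "EXEC [dbo].[AddManagedIdentitiesAsUsers]"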

Synapse

  1. Execute the stored procedure below in the Synapse serverless database StoredProcDB
    • Log in with AAD; SQL authentication is disabled post-deployment.
EXEC [dbo].[AddManagedIdentitiesAsUsers]
  2. If you're deploying the logic app, run the precreated SQL script RunForLogicApp in the Synapse portal

Purview

  1. Add the ADF and Synapse managed identities as Data Curators in the Root Collection of Purview
    • This is required for lineage
  2. When lake databases are created, you will need to execute the commands below so Purview can scan them
CREATE LOGIN [PurviewAccountName] FROM EXTERNAL PROVIDER;
CREATE USER [PurviewAccountName] FOR LOGIN [PurviewAccountName];
ALTER ROLE db_datareader ADD MEMBER [PurviewAccountName]; 

If you're deploying all resources with no public access behind a virtual network and your service principal didn't have Owner RBAC rights on the subscription

  1. Have the subscription Owner grant the AAD group Contributor access to the Purview managed resource group

If you set the DeployPurviewIngestionPrivateEndpoints feature flag to true

  1. Within the Azure portal, navigate to Purview's managed Storage Account and Event Hub. For each resource, approve the pending private endpoint connections created by the GitHub Action.
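    • The pending connections can also be approved with the Azure CLI; resource names below are placeholders (the managed storage account is shown; repeat with --type Microsoft.EventHub/namespaces for the Event Hub):
# List pending connections, then approve each one by its resource ID
az network private-endpoint-connection list --name "<managed-storage-account>" \
  --resource-group "<purview-managed-rg>" --type Microsoft.Storage/storageAccounts
az network private-endpoint-connection approve --id "<connection-resource-id>" \
  --description "Approved for Purview ingestion"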

If you're deploying all resources with no public access behind a virtual network

  1. Set up a Managed VNET Integration Runtime to scan supported Azure data sources
  2. Set up a Self-Hosted Integration Runtime to scan data sources unsupported by the Managed VNET Integration Runtime

3. Start Ingesting Data

Process Overview

  1. Overview of Pre-Built Ingestion Patterns (diagram)
  2. Overview of Pre-Built Data Pipelines (diagram)
  3. Moving Data to Curated (diagram)

Create Control Table Records for Metadata Driven Ingestion

  1. Create control table records in the dbo.MetadataControl table in the Azure SQL DB, following the instructions here
    • Each time you ingest a new source entity (e.g. SQL table, CSV file, Excel tab), create three control table records: one for moving data from source to landing, one for landing to raw, and one for raw to staging.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.