These functions help you interact with Azure Data Lake more seamlessly.
The following parameters are used by all data lake functions.
Parameter | Type | Description |
---|---|---|
account | String | The name of the Azure Storage Account. |
container | String | The name of the Azure Data Lake Container. |
path | String | The path to the file/folder within the data lake. For the root path use '/'. |
The data lake functions support authenticating to the data lake using the following options.
- The Service Principal (SPN) of the deployed Azure Functions App.
- A user-specified Service Principal. The service principal must be in the same tenant as the Azure Functions App.
- A Shared Access Signature (SAS) token.
- The Azure Storage Account key.
For each function call, you must specify the parameters for only one of these authentication types; specifying more than one type returns an error. The parameters for each type are listed in the table below.
Note: You can optionally provide the keyVault parameter. This lets you pass the name of a secret stored in an Azure Key Vault instead of the secret value itself. For this to work, the deployed Azure Functions App service principal must be granted access to read secrets from the specified key vault.
Auth Type | Parameter | Required | Format | Description |
---|---|---|---|---|
Azure Functions SPN | N/A | N/A | N/A | To use the Azure Functions App Service Principal, you do not need to provide any authentication parameters. |
User SPN | spnClientId | Yes | GUID | The client ID of the SPN you want to use to authenticate calls to the data lake. Must be in the same tenant as the Azure Functions App. |
User SPN | spnClientSecret | Yes | String | The client secret for the SPN or, if the keyVault parameter is specified, the name of the secret in that Azure Key Vault. |
User SPN | keyVault | No | String | The name of the Azure Key Vault that contains the secret specified by the spnClientSecret parameter. |
SAS Token | sasToken | Yes | String | The SAS token for accessing the storage account or, if the keyVault parameter is specified, the name of the secret in that Azure Key Vault. |
SAS Token | keyVault | No | String | The name of the Azure Key Vault that contains the secret specified by the sasToken parameter. |
Storage Account key | accountKey | Yes | String | The storage account key or, if the keyVault parameter is specified, the name of the secret in that Azure Key Vault. |
Storage Account key | keyVault | No | String | The name of the Azure Key Vault that contains the secret specified by the accountKey parameter. |
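The one-authentication-type rule can be checked on the client before a request is sent. The sketch below is a hypothetical helper (build_auth_params is not part of the Functions app); it only assembles the parameter names from the table above and assumes they are sent as query-string values.

```python
from typing import Optional


def build_auth_params(
    spn_client_id: Optional[str] = None,
    spn_client_secret: Optional[str] = None,
    sas_token: Optional[str] = None,
    account_key: Optional[str] = None,
    key_vault: Optional[str] = None,
) -> dict:
    """Return the authentication query parameters for a single data lake call.

    Hypothetical client-side helper: it enforces that at most one
    authentication type is supplied, as the functions require.
    """
    candidates = []
    if spn_client_id is not None or spn_client_secret is not None:
        if spn_client_id is None or spn_client_secret is None:
            raise ValueError("User SPN auth needs both spnClientId and spnClientSecret")
        candidates.append({"spnClientId": spn_client_id, "spnClientSecret": spn_client_secret})
    if sas_token is not None:
        candidates.append({"sasToken": sas_token})
    if account_key is not None:
        candidates.append({"accountKey": account_key})

    if len(candidates) > 1:
        raise ValueError("Specify only one authentication type per call")
    if not candidates:
        if key_vault is not None:
            # Client-side choice: keyVault only makes sense alongside a secret name.
            raise ValueError("keyVault needs spnClientSecret, sasToken or accountKey")
        return {}  # No parameters: the Azure Functions App SPN is used.

    params = dict(candidates[0])
    if key_vault is not None:
        # The secret parameter is now interpreted as a Key Vault secret name.
        params["keyVault"] = key_vault
    return params
```

Returning an empty dictionary when nothing is supplied reflects the table: the Azure Functions App SPN needs no parameters.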
Successful calls return 200 (OK) and failed calls return 400 (Bad Request). In both cases, the response body is a JSON object containing the result set or the error details.
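Because both outcomes carry a JSON body, a caller can branch on the status code and surface the error field. A minimal sketch, assuming the caller already has the HTTP status code and response text from whatever client it uses; DataLakeFunctionError and parse_response are hypothetical names.

```python
import json


class DataLakeFunctionError(Exception):
    """Raised when a data lake function responds with 400 (Bad Request)."""

    def __init__(self, invocation_id: str, message: str):
        super().__init__(f"{message} (invocationId: {invocation_id})")
        self.invocation_id = invocation_id
        self.message = message


def parse_response(status: int, body: str) -> dict:
    """Return the JSON result of a successful call or raise with the error details."""
    if status not in (200, 400):
        raise RuntimeError(f"Unexpected HTTP status {status}")
    payload = json.loads(body)
    if status == 400:
        # Failed calls carry an "error" message plus the invocationId for log lookups.
        raise DataLakeFunctionError(payload.get("invocationId"), payload.get("error"))
    return payload
```

Keeping the invocationId on the exception makes it easier to find the matching execution in the Azure Functions logs.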
https://<YourAzureFunctionsAppUrl>/api/DataLake/CheckPathCase
Azure Data Lake paths are case sensitive, so in metadata-driven processing a path with the wrong case in the metadata can easily cause errors when accessing the lake. This function validates a path: if the path does not exist as given, it looks for paths that match it when case is ignored. If exactly one match is found, that path is returned; if no match or more than one match is found, an error is returned.
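The matching rule can be illustrated locally. This sketch works on an in-memory list of paths rather than on the data lake and is not the function's actual implementation; it only shows the exact match, single case-insensitive match, and none-or-many decision described above.

```python
from typing import Iterable


def resolve_path_case(path: str, existing_paths: Iterable[str]) -> str:
    """Illustrate the CheckPathCase rule against a known list of paths."""
    paths = list(existing_paths)
    if path in paths:
        return path  # exact (case-sensitive) match: nothing to correct
    matches = [p for p in paths if p.casefold() == path.casefold()]
    if len(matches) == 1:
        return matches[0]  # exactly one case-insensitive match
    if not matches:
        raise LookupError(f"No path matches '{path}'")
    raise LookupError(f"Multiple paths match '{path}': {matches}")
```

For example, resolve_path_case("TESTDATA", ["TestData", "Archive"]) returns "TestData", while a list containing both "TestData" and "testdata" raises an error because the match is ambiguous.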
The generic and authentication parameters above are mandatory, and are the only parameters required.
The returned values are JSON objects. Below is an example of a successful call:
```json
{
  "invocationId": "37c34c08-bb41-4176-b542-8adc3617f28f",
  "debugInfo": {
    "informationalVersion": "1.0.0"
  },
  "storageContainerUrl": "https://<YourAzureDataLake>.dfs.core.windows.net/myContainer",
  "authType": "FunctionsServicePrincipal",
  "parameters": {
    "Path": "TESTDATA"
  },
  "validatedPath": "TestData"
}
```
An example of a call with a mandatory parameter missing:
```json
{
  "invocationId": "e28970da-13f2-46be-848e-c25de54539a1",
  "error": "Mandatory parameter 'account' was not provided."
}
```
An example of a call where an internal error has occurred:
```json
{
  "invocationId": "e28970da-13f2-46be-848e-c25de54539a1",
  "error": "An error occurred, see the Azure Function logs for more details"
}
```
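A request to CheckPathCase can be assembled from the generic parameters alone. This sketch assumes the parameters are passed in the query string of a GET request; the host name myfunctions.azurewebsites.net and both helper names are hypothetical, and if your app's HTTP triggers require a function key, that must be added too (it is not shown here).

```python
from urllib.parse import quote, urlencode


def check_path_case_url(app_url: str, account: str, container: str, path: str, **auth: str) -> str:
    """Build a CheckPathCase request URL, passing parameters in the query string."""
    params = {"account": account, "container": container, "path": path, **auth}
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    query = urlencode(params, quote_via=quote)
    return f"{app_url.rstrip('/')}/api/DataLake/CheckPathCase?{query}"


def validated_path(result: dict) -> str:
    """Pick the corrected path out of a parsed CheckPathCase response."""
    if "error" in result:
        raise ValueError(f"CheckPathCase failed: {result['error']} (invocationId: {result.get('invocationId')})")
    return result["validatedPath"]
```

Authentication parameters are passed through as keyword arguments, for example sasToken="mySasSecret", keyVault="myVault".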
https://<YourAzureFunctionsAppUrl>/api/DataLake/GetItems
This function is intended as an improved version of the Azure Data Factory (ADF) Get Metadata activity. It can be called from ADF using the Azure Function activity.
The generic and authentication parameters above are mandatory. In addition the following optional parameters can be specified.
Parameter | Type | Description |
---|---|---|
ignoreDirectoryCase | Bool | Calls CheckPathCase on the path parameter before getting the items. If the path is incorrectly cased but exactly one path matches when case is ignored, the function still returns results. |
limit | String | The maximum number of results to return. |
filter[PropertyName] | String | Filters the results on a property of the returned items. The format is operator:value. The like operator supports full .NET-style regular expressions. See below for the valid operators and property names. |
orderBy | String | The property to order the result set by. Valid properties are those of the returned JSON for each item, e.g. LastModified. |
orderByDesc | Bool | Sorts the results in descending order if true. Defaults to false. Combined with orderBy=LastModified and a limit of 1, this returns the most recent file matching a filter. |
recursive | Bool | Searches folders recursively. |
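The optional parameters can be collected in a dictionary that is then merged with the generic and authentication parameters. A minimal sketch: get_items_params is a hypothetical helper, and sending booleans as the lowercase strings "true"/"false" is an assumption about what the function accepts.

```python
from typing import Optional


def get_items_params(
    account: str,
    container: str,
    path: str,
    *,
    ignore_directory_case: bool = False,
    limit: Optional[int] = None,
    filters: Optional[dict] = None,
    order_by: Optional[str] = None,
    order_by_desc: bool = False,
    recursive: bool = False,
) -> dict:
    """Collect GetItems parameters, omitting optional ones left at their defaults."""
    params = {"account": account, "container": container, "path": path}
    if ignore_directory_case:
        params["ignoreDirectoryCase"] = "true"
    if limit is not None:
        params["limit"] = str(limit)
    for prop, expr in (filters or {}).items():
        # One filter per property, each formatted as operator:value.
        params[f"filter[{prop}]"] = expr
    if order_by is not None:
        params["orderBy"] = order_by
    if order_by_desc:
        params["orderByDesc"] = "true"
    if recursive:
        params["recursive"] = "true"
    return params


# Newest parquet file under TestData: files only, newest first, one result.
newest_parquet = get_items_params(
    "mystorage", "myContainer", "TestData",
    filters={"IsDirectory": "eq:false", "Name": "like:*.parquet"},
    order_by="LastModified", order_by_desc=True, limit=1,
)
```

Using a dictionary keyed by property name also mirrors the current limit of one filter per property.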
When providing a filter parameter, there are a number of operators that can be used. The format is filter[PropertyName]=operator:value.
Operator | Description |
---|---|
eq | Property value is equal to the value provided |
ne | Property value is not equal to the value provided |
lt | Property value is less than the value provided |
gt | Property value is greater than the value provided |
le | Property value is less than or equal to the value provided |
ge | Property value is greater than or equal to the value provided |
like | Property value matches the pattern provided. You can use * as a wildcard; .NET-style regular expressions are also supported. |
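To make the comparison operators concrete, the following sketch evaluates a single operator:value expression locally against an already-typed property value. It is an illustration, not the function's implementation: the type coercion rules are assumptions, dates are compared as naive values, and like is omitted because how the function combines * wildcards with .NET regular expressions is not specified here.

```python
import operator
from datetime import datetime

# Comparison operators from the table above ("like" is deliberately left out).
OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "gt": operator.gt,
    "le": operator.le,
    "ge": operator.ge,
}


def parse_filter(expr: str) -> tuple:
    """Split 'operator:value' on the FIRST colon, so values such as
    '2021-09-01 14:00:00' keep their own colons."""
    op, sep, value = expr.partition(":")
    if not sep or op not in OPERATORS:
        raise ValueError(f"Unsupported filter expression: {expr!r}")
    return op, value


def coerce(value: str, sample):
    """Convert the filter value to the type of the property value it is compared with."""
    if isinstance(sample, bool):  # test bool first: bool is a subclass of int
        return value.strip().lower() == "true"
    if isinstance(sample, int):
        return int(value)
    if isinstance(sample, datetime):
        return datetime.fromisoformat(value)  # naive datetime, e.g. '2021-09-01 14:00:00'
    return value


def evaluate(actual, expr: str) -> bool:
    """Return True if a single property value satisfies 'operator:value'."""
    op, value = parse_filter(expr)
    return OPERATORS[op](actual, coerce(value, actual))
```

Splitting on the first colon only is the important detail: a date/time value contains colons of its own.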
Valid PropertyName options are any of the returned properties of a file or folder. These are:
Property | Type | Example | Description |
---|---|---|---|
Name | String | TestDoc1.txt | The name of the file or folder. |
Directory | String | TestData/TestFolder1 | The parent directory of the file or folder. |
FullPath | String | TestData/TestFolder1/TestDoc1.txt | The full path of the file or folder. |
Url | String | https://<YourAzureDataLake>.dfs.core.windows.net/test/TestData/TestFolder1/TestDoc1.txt | The full URL of the file or folder, excluding any auth tokens that might be required for access. |
IsDirectory | Boolean | true | Flag indicating whether the item is a folder (true) or a file (false). |
ContentLength | BigInt | 278432943 | Size of the file in bytes. |
LastModified | Date | 2021-09-15T09:04:58Z | Last modified date/time as an ISO 8601 string. |
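On the client, each returned item can be mapped to a typed record with these properties. A minimal sketch: DataLakeItem is a hypothetical class, and it assumes LastModified always uses the whole-second, trailing-Z (UTC) form shown in the examples.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DataLakeItem:
    """Typed view of one entry in the GetItems "files" array."""
    name: str
    directory: str
    full_path: str
    url: str
    is_directory: bool
    content_length: int
    last_modified: datetime

    @classmethod
    def from_json(cls, obj: dict) -> "DataLakeItem":
        return cls(
            name=obj["Name"],
            directory=obj["Directory"],
            full_path=obj["FullPath"],
            url=obj["Url"],
            is_directory=bool(obj["IsDirectory"]),
            content_length=int(obj["ContentLength"]),
            # Assumed format: ISO 8601 in UTC with a trailing 'Z', e.g. 2021-09-15T09:04:58Z.
            last_modified=datetime.strptime(obj["LastModified"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc),
        )
```

Parsing LastModified into a timezone-aware datetime allows correct date comparisons instead of string comparisons.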
Filter the results to return only files:
filter[IsDirectory]=eq:false
Filter the results to return only folders:
filter[IsDirectory]=eq:true
Filter the results to return files and folders modified since 2021-09-01 14:00:00:
filter[LastModified]=ge:2021-09-01 14:00:00
Filter the results to return only parquet files using a wildcard:
filter[Name]=like:*.parquet
Filter the results to return only files or folders whose names start with 'abc' or 'xyz', using a regular expression:
filter[Name]=like:(abc|xyz)*
Note: When using filters, you must URL encode any special characters when sending the request. This is especially important for regular expression filters.
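Python's standard urllib.parse module can do this encoding. In this sketch (encode_filter is a hypothetical helper), quote_via=quote encodes the space in a date/time value as %20 rather than +, and with urlencode's default safe='' it also escapes characters such as :, *, (, | and ).

```python
from urllib.parse import quote, urlencode


def encode_filter(prop: str, expr: str) -> str:
    """Percent-encode one filter[PropertyName]=operator:value query parameter."""
    return urlencode({f"filter[{prop}]": expr}, quote_via=quote)


print(encode_filter("LastModified", "ge:2021-09-01 14:00:00"))
# filter%5BLastModified%5D=ge%3A2021-09-01%2014%3A00%3A00
print(encode_filter("Name", "like:(abc|xyz)*"))
# filter%5BName%5D=like%3A%28abc%7Cxyz%29%2A
```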
Multiple filters can also be combined using &. For example, the following finds files (not folders) modified since September 2021; an orderBy parameter could also be added to process the files in a date range in order:
filter[IsDirectory]=eq:false&filter[LastModified]=ge:2021-09-01 00:00:00
Note: Currently, only one filter per property is supported. This means it is not possible to find files between two dates by specifying two filters on LastModified, one with greater than and the other with less than. Between filters will be added in a future release.
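Until between filters are available, one workaround (a client-side assumption, not a feature of the function) is to send the lower bound as a server-side filter and drop items past the upper bound after the response arrives. The sketch assumes the files array has the shape shown in the GetItems example, with LastModified in UTC and a trailing Z; files_before is a hypothetical helper.

```python
from datetime import datetime, timezone

# Server side: files only, modified on or after 1 September 2021, oldest first.
params = {
    "filter[IsDirectory]": "eq:false",
    "filter[LastModified]": "ge:2021-09-01 00:00:00",
    "orderBy": "LastModified",
}


def files_before(files: list, upper: datetime) -> list:
    """Client side: keep only items modified strictly before `upper` (timezone-aware)."""
    result = []
    for item in files:
        modified = datetime.strptime(item["LastModified"], "%Y-%m-%dT%H:%M:%SZ")
        if modified.replace(tzinfo=timezone.utc) < upper:
            result.append(item)
    return result
```

Because the server already orders the results by LastModified, the filtered list keeps that order.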
If parameters are used incorrectly, the returned JSON will contain the error details. All other errors return a simple, generic error message, but detailed logging for the execution is available in the Azure Functions app.
The returned values are JSON objects. Below is an example of a successful call:
```json
{
  "invocationId": "c21b69dc-9e76-42da-9953-ec63519f378a",
  "debugInfo": {
    "informationalVersion": "1.0.0"
  },
  "storageContainerUrl": "https://<YourAzureDataLake>.dfs.core.windows.net/myContainer",
  "clientId": "f4b9d6e7-2753-44c6-a579-0bd77caa287d",
  "authType": "UserServicePrincipal",
  "parameters": {
    "Path": "/",
    "IgnoreDirectoryCase": false,
    "Recursive": true,
    "OrderByColumn": null,
    "OrderByDescending": false,
    "Limit": 0,
    "Filters": [
      {
        "PropertyName": "IsDirectory",
        "Operator": "eq",
        "Value": "true",
        "ErrorMessage": null
      }
    ]
  },
  "fileCount": 3,
  "files": [
    {
      "Name": "TestData",
      "Directory": "",
      "FullPath": "TestData",
      "Url": "https://<YourAzureDataLake>.dfs.core.windows.net/myContainer/TestData",
      "IsDirectory": true,
      "ContentLength": 0,
      "LastModified": "2021-09-09T17:23:14Z"
    },
    {
      "Name": "TestFolder1",
      "Directory": "TestData",
      "FullPath": "TestData/TestFolder1",
      "Url": "https://<YourAzureDataLake>.dfs.core.windows.net/myContainer/TestData/TestFolder1",
      "IsDirectory": true,
      "ContentLength": 0,
      "LastModified": "2021-09-09T17:23:14Z"
    },
    {
      "Name": "TestFolder2",
      "Directory": "TestData",
      "FullPath": "TestData/TestFolder2",
      "Url": "https://<YourAzureDataLake>.dfs.core.windows.net/myContainer/TestData/TestFolder2",
      "IsDirectory": true,
      "ContentLength": 0,
      "LastModified": "2021-09-09T17:23:14Z"
    }
  ]
}
```
Note: If no files are found in the specified path with the specified parameters, fileCount will be 0 and the files property will be an empty array.
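An empty result is therefore a normal outcome rather than an error, which matters when a pipeline asks for "the newest matching file". A minimal sketch, assuming a GetItems call made with orderBy=LastModified, orderByDesc=true and a limit of 1; latest_file is a hypothetical helper.

```python
import json
from typing import Optional


def latest_file(response_body: str) -> Optional[str]:
    """Return the FullPath of the first item, or None when nothing matched.

    With orderBy=LastModified, orderByDesc=true and limit=1 the first
    item is the newest match.
    """
    payload = json.loads(response_body)
    if payload.get("fileCount", 0) == 0:
        return None  # no matches: fileCount is 0 and "files" is []
    return payload["files"][0]["FullPath"]
```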
An example of a call with a mandatory parameter missing:
```json
{
  "invocationId": "e28970da-13f2-46be-848e-c25de54539a1",
  "error": "Mandatory parameter 'account' was not provided."
}
```
An example of a call where an internal error has occurred:
```json
{
  "invocationId": "e28970da-13f2-46be-848e-c25de54539a1",
  "error": "An error occurred, see the Azure Function logs for more details"
}
```