Data Factory basics

Data Factory documentation

Overview of Data Factory

Pricing is here

Copy data from SFTP server using Azure Data Factory

Mapping data flows availability: 2019

Copy data tool is documented here

Overview of the Copy Data tool is here

SFTP linked service, limiting the number of files copied

Search for: SFTP linked service, limiting the number of files copied

How to ask a question in the Azure forums from the Azure portal

Here is my specific question about the ADF v2 SFTP connector

Posted question link: How do I control the number of files copied by ADF v2 SFTP connector?

A document on incremental data copies

Something to read and watch; not sure if this addresses the question.

I will need this later: Transform data in the cloud by using a Spark activity in Azure Data Factory

Here is how to understand templates in ADF

A template for moving files

Can azure data factory save state information between runs?

Search for: Can azure data factory save state information between runs?

Article: USING AZURE DATA FACTORY V2 ACTIVITIES & DYNAMIC CONTENT TO DIRECT YOUR FILES

Using Python to create a data factory

I have posted a question on debugging activities here

Outbound connections in Azure

not able to connect to sftp server from azure data factory v2

Search for: not able to connect to sftp server from azure data factory v2

What's new in ADF v2 (2017): a weblog

Looks like a good high-level overview of all the pieces of ADF v2

Posted a question on the Azure forum: Can I call an Azure Function from a Lookup activity to gather a dynamic set?

1. Go to the Azure portal

2. Go to Home

3. Click on the Data factories icon

4. Go to Author & Monitor

1. SFTP uses port 22 (see the linked service sketch below)

2. FTP uses port 21

3. Using the wrong connector (FTP for SFTP, or SFTP for FTP) can result in an error

4. When using FTP, you may also get an error if the FTP server is not enabled for SSL. In that case, disable SSL in the linked service, assuming that is safe for your needs; otherwise, debug the FTP server and fix the issue
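
Since only an FTP linked service JSON appears later in these notes, here is a minimal sketch of the SFTP equivalent on port 22, assuming basic authentication; the linked service name, host, user name, and password are placeholders:

{
    "name": "ls_sftp_server",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "type": "Sftp",
        "typeProperties": {
            "host": "sftp-hostname",
            "port": 22,
            "skipHostKeyValidation": true,
            "authenticationType": "Basic",
            "userName": "user-name",
            "password": {
                "type": "SecureString",
                "value": "your-password"
            }
        }
    }
}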

1. You have to specify the desired output fields from the metadata of a data source (the field list), as sketched below

2. If a particular output field is not supported by the data store, you get an error

3. Remove that field from the outputs
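
Here is a minimal sketch of a Get Metadata activity with an explicit field list; the activity name matches the one used later in these notes, while the dataset name ds_sftp_folder is a placeholder. An unsupported field would simply be removed from fieldList:

{
    "name": "Get Metadata2",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "ds_sftp_folder",
            "type": "DatasetReference"
        },
        "fieldList": [
            "itemName",
            "childItems"
        ]
    }
}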

Linked services are documented here

1. Annotations are tags

2. You can add any number of tags to any component, including a linked service

This video briefly touches on this aspect: tags and annotations


{
    "name": "your-linked-servicename",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "description": "your-ftp-server-description",
        "annotations": [
            "ingest",
            "another-annotation-name"
        ],
        "type": "FtpServer",
        "typeProperties": {
            "host": "ftp-hostname",
            "port": 21,
            "enableSsl": false,
            "enableServerCertificateValidation": false,
            "authenticationType": "Basic",
            "userName": "user-name",
            "encryptedCredential": "some-letters"
        }
    }
}

specify dynamic content in json format adf v2

Search for: specify dynamic content in json format adf v2

Interesting diversion: Channel 9: https://channel9.msdn.com

Azure Friday videos

Question on the Azure forum: dynamic content and linked services

There is a video here: Parameterize connections to your data stores in Azure Data Factory

Parameterizing linked services is documented here

First document I have seen on the expression language

These data stores support parameterizing the linked service natively in the authoring UI:
Azure SQL Database
Azure SQL Data Warehouse
SQL Server
Oracle
Cosmos DB
Amazon Redshift
MySQL
Azure Database for MySQL

For all other data stores, you can parameterize the linked service by selecting the Code icon on the Connections tab and using the JSON editor.


{
   "name": "AzureSqlDatabase",
   "properties": {
      "type": "AzureSqlDatabase",
      "typeProperties": {
         "connectionString": {
            "value": "Server=tcp:myserver.database.windows.net,1433;\ 
             Database=@{linkedService().DBName};\ 
             User ID=user;\ 
             Password=fake; \ 
             Trusted_Connection=False;\ 
             Encrypt=True;\ 
             Connection Timeout=30",
            "type": "SecureString"
         }
      },
      "connectVia": null,
      "parameters": {
         "DBName": {
            "type": "String"
         }
      }
   }
}
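
A dataset can then supply the DBName value when it references this parameterized linked service. A minimal sketch, assuming a hypothetical dataset ds_sql_table pointing at a table MyTable:

{
    "name": "ds_sql_table",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "AzureSqlDatabase",
            "type": "LinkedServiceReference",
            "parameters": {
                "DBName": "@dataset().DBName"
            }
        },
        "parameters": {
            "DBName": {
                "type": "String"
            }
        },
        "typeProperties": {
            "tableName": "MyTable"
        }
    }
}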

Passing parameters between activities and pipelines: A PDF


pl_  //pipeline
ds_  //data set
ac_  //activity
ls_  //linkedservice

Sadly, that is not a particularly good document.

Documentation: Visual authoring in Azure Data Factory

I have posted a question here about the Advanced tab

I have posted some questions on YouTube

I have posted some questions to the Azure LinkedIn group as well

How to use filter activity in adf v2

Search for: How to use filter activity in adf v2

Filter activity is documented here at MS

The FTP linked service (connector) is documented here

The path to the folder. If you want to use a wildcard to filter folders, skip this setting and specify it in the activity source settings.
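
For the activity-side setting mentioned above, here is a minimal sketch of the source portion of a Copy activity, assuming the newer copy-source model with storeSettings and a Binary-format dataset; the wildcard values are placeholders:

"source": {
    "type": "BinarySource",
    "storeSettings": {
        "type": "FtpReadSettings",
        "recursive": true,
        "wildcardFolderPath": "wrf_wind.*",
        "wildcardFileName": "*.txt"
    }
}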

Can you use wild cards for Get Metadata items?

Search for: Can you use wild cards for Get Metadata items?

Sample output from the Get Metadata activity, showing itemName and childItems:
{
    "itemName": "wrf_wind.2019092312",
    "childItems": [
        {
            "name": "output_stn01.txt",
            "type": "File"
        },
        {
            "name": "output_stn02.txt",
            "type": "File"
        },
        {
            "name": "output_stn03.txt",
            "type": "File"
        },
        {
            "name": "output_stn04.txt",
            "type": "File"
        },
        {
            "name": "output_stn05.txt",
            "type": "File"
        },
        {
            "name": "output_stn06.txt",
            "type": "File"
        },
        {
            "name": "output_stn07.txt",
            "type": "File"
        },
        {
            "name": "output_stn08.txt",
            "type": "File"
        },
        {
            "name": "output_stn09.txt",
            "type": "File"
        },
        {
            "name": "output_stn11.txt",
            "type": "File"
        },
        {
            "name": "output_stn16.txt",
            "type": "File"
        },
        {
            "name": "output_stn17.txt",
            "type": "File"
        },
        {
            "name": "output_stn18.txt",
            "type": "File"
        },
        {
            "name": "output_stn19.txt",
            "type": "File"
        },
        {
            "name": "readme.txt",
            "type": "File"
        },
        {
            "name": "zlevs_output_d02.nc",
            "type": "File"
        },
        {
            "name": "zlevs_output_d03.nc",
            "type": "File"
        },
        {
            "name": "zlevs_output_d04.nc",
            "type": "File"
        }
    ],
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (Central India)",
    "executionDuration": 0,
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    }
}

Appears to be a good article on the Filter activity

Sample output of the Filter activity run, keeping only the .txt files (ItemsCount 18, FilteredItemsCount 15):
{
    "ActivityRunId": "9c2a1a2c-cdf1-445b-acc4-186a984aafd8",
    "Status": "Succeeded",
    "Error": {
        "Message": "",
        "ErrorCode": ""
    },
    "Output": {
        "ItemsCount": 18,
        "FilteredItemsCount": 15,
        "Value": [
            {
                "name": "output_stn01.txt",
                "type": "File"
            },
            {
                "name": "output_stn02.txt",
                "type": "File"
            },
            {
                "name": "output_stn03.txt",
                "type": "File"
            },
            {
                "name": "output_stn04.txt",
                "type": "File"
            },
            {
                "name": "output_stn05.txt",
                "type": "File"
            },
            {
                "name": "output_stn06.txt",
                "type": "File"
            },
            {
                "name": "output_stn07.txt",
                "type": "File"
            },
            {
                "name": "output_stn08.txt",
                "type": "File"
            },
            {
                "name": "output_stn09.txt",
                "type": "File"
            },
            {
                "name": "output_stn11.txt",
                "type": "File"
            },
            {
                "name": "output_stn16.txt",
                "type": "File"
            },
            {
                "name": "output_stn17.txt",
                "type": "File"
            },
            {
                "name": "output_stn18.txt",
                "type": "File"
            },
            {
                "name": "output_stn19.txt",
                "type": "File"
            },
            {
                "name": "readme.txt",
                "type": "File"
            }
        ]
    }
}

//items to filter
items = 
  @activity('Get Metadata2').output.childItems

condition =
  @endswith(item().name, '.txt')

{
    "name": "TextFileFilter",
    "description": "Text file Filter",
    "type": "Filter",
    "dependsOn": [
        {
            "activity": "Get Metadata2",
            "dependencyConditions": [
                "Succeeded"
            ]
        }
    ],
    "userProperties": [],
    "typeProperties": {
        "items": {
            "value": "@activity('Get Metadata2').output.childItems",
            "type": "Expression"
        },
        "condition": {
            "value": "@endswith(item().name, '.txt')",
            "type": "Expression"
        }
    }
}
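
The filtered list can then drive a ForEach activity. A minimal sketch, assuming the per-file activities (for example, a parameterized Copy) would be added inside the activities array:

{
    "name": "ForEachTextFile",
    "type": "ForEach",
    "dependsOn": [
        {
            "activity": "TextFileFilter",
            "dependencyConditions": [
                "Succeeded"
            ]
        }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('TextFileFilter').output.value",
            "type": "Expression"
        },
        "activities": []
    }
}

Inside the loop, each filtered file is available as @item().name.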

Copy activity is documented here

Folder and file property for Copy Activity in ADF v2 azure data factory

Search for: Folder and file property for Copy Activity in ADF v2 azure data factory

Azure custom activity is documented here

Here is an Azure Function activity

Writing an Azure Python function

Azure Functions Core Tools is here
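
Since the Azure Function activity comes up a few times in these notes, here is a minimal sketch of one; the activity name, linked service name, function name, and body are placeholders:

{
    "name": "CallMyFunction",
    "type": "AzureFunctionActivity",
    "linkedServiceName": {
        "referenceName": "ls_azure_function",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "functionName": "my-function-name",
        "method": "POST",
        "body": {
            "folder": "@pipeline().parameters.FolderName"
        }
    }
}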

A complex example of file copy behavior

Parameter passing to datasets in adf v2

Search for: Parameter passing to datasets in adf v2

Parameter passing to datasets in adf v2
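
On parameter passing to datasets: a dataset can declare its own parameters, use them in its properties, and receive the values through the dataset reference in an activity. A minimal sketch, assuming a hypothetical delimited-text dataset ds_sftp_file over the SFTP linked service sketched earlier:

{
    "name": "ds_sftp_file",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "ls_sftp_server",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "FileName": {
                "type": "String"
            }
        },
        "typeProperties": {
            "location": {
                "type": "SftpLocation",
                "folderPath": "wrf_wind.2019092312",
                "fileName": {
                    "value": "@dataset().FileName",
                    "type": "Expression"
                }
            },
            "columnDelimiter": ","
        }
    }
}

An activity (for example, inside the ForEach sketched earlier) would then pass the value through the dataset reference:

"inputs": [
    {
        "referenceName": "ds_sftp_file",
        "type": "DatasetReference",
        "parameters": {
            "FileName": "@item().name"
        }
    }
]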