Objective: From Azure Data Lake Store, copy data in a .tsv file to a database table in Azure SQL DB.
My take on Azure Data Factory is that it is an ETL tool, but with less emphasis on the transformation; its strength is moving data from on-premises and cloud sources to Azure data-oriented services.
Read further about Azure Data Factory: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-introduction
In my Azure Data Lake Store, I have a 227 MB .tsv file, which will be the source data.
In my Azure SQL DB, I have the following table, which will be the target.
The approach I will take is to use the Copy data wizard. This is much easier and more convenient than the Author and deploy approach, which is based on editing JSON. Hopefully, in the near future there will be more UI-based tools.
Click Copy data
Click Next.
Select Azure Data Lake Store as the source
Enter connection information. For Authentication Type, I selected Service Principal over OAuth. The advantage is that the authentication does not expire as it does with OAuth, so for recurring schedules you are best off creating a Service Principal. For details on creating one, see https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-create-service-principal-portal#get-application-id-and-authentication-key
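For reference, this is roughly the linked service definition the wizard generates behind the scenes. A minimal sketch in the Author and deploy JSON, where the account, tenant, and service principal values are all placeholders, not values from this walkthrough:

```json
{
  "name": "AzureDataLakeStoreLinkedService",
  "properties": {
    "type": "AzureDataLakeStore",
    "typeProperties": {
      "dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
      "servicePrincipalId": "<application id>",
      "servicePrincipalKey": "<authentication key>",
      "tenant": "<tenant id or domain>"
    }
  }
}
```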
Choose the file or folder as the source data
Configure File Format settings. Most have been auto-detected. I chose to edit the schema to apply proper field names.
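Behind the wizard, the file, format, and schema choices become an input dataset definition. A sketch, where the folder path, file name, and field names (JobTitle, Company, PostedDate) are hypothetical stand-ins for my actual schema:

```json
{
  "name": "JobPostingsTsvInput",
  "properties": {
    "type": "AzureDataLakeStore",
    "linkedServiceName": "AzureDataLakeStoreLinkedService",
    "structure": [
      { "name": "JobTitle", "type": "String" },
      { "name": "Company", "type": "String" },
      { "name": "PostedDate", "type": "Datetime" }
    ],
    "typeProperties": {
      "folderPath": "jobdata/",
      "fileName": "JobPostings.tsv",
      "format": {
        "type": "TextFormat",
        "columnDelimiter": "\t",
        "firstRowAsHeader": true
      }
    },
    "external": true,
    "availability": { "frequency": "Day", "interval": 1 }
  }
}
```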
Select Azure SQL DB as the destination data store
Enter connection and authentication information
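The equivalent Azure SQL linked service is essentially just a connection string. A sketch with placeholder server, database, and credential values:

```json
{
  "name": "AzureSqlLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;User ID=<username>@<servername>;Password=<password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
    }
  }
}
```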
Select table
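The selected table becomes an output dataset pointing at the JobPostings table used in this walkthrough; the availability window below is an assumed daily schedule:

```json
{
  "name": "JobPostingsSqlOutput",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": "AzureSqlLinkedService",
    "typeProperties": {
      "tableName": "JobPostings"
    },
    "availability": { "frequency": "Day", "interval": 1 }
  }
}
```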
Configure column mapping. Note: I haven't set this up for repeatability, such as defining a date field to manage reruns of a time slice; a sketch of what that could look like follows.
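In JSON, the mapping becomes a TabularTranslator on the copy activity, and repeatability is typically handled with a sqlWriterCleanupScript on the sink that deletes a slice's rows before re-inserting them. A sketch of what that could look like, reusing the hypothetical PostedDate field from earlier:

```json
{
  "name": "CopyJobPostings",
  "type": "Copy",
  "inputs": [ { "name": "JobPostingsTsvInput" } ],
  "outputs": [ { "name": "JobPostingsSqlOutput" } ],
  "typeProperties": {
    "source": { "type": "AzureDataLakeStoreSource" },
    "sink": {
      "type": "SqlSink",
      "sqlWriterCleanupScript": "$$Text.Format('DELETE FROM JobPostings WHERE PostedDate >= \\'{0:yyyy-MM-dd}\\' AND PostedDate < \\'{1:yyyy-MM-dd}\\'', WindowStart, WindowEnd)"
    },
    "translator": {
      "type": "TabularTranslator",
      "columnMappings": "JobTitle: JobTitle, Company: Company, PostedDate: PostedDate"
    }
  }
}
```

With the cleanup script in place, rerunning a slice deletes and reloads only that slice's rows instead of duplicating them.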
Performance settings. I chose to leave these at the defaults.
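If you did want to tune performance, the settings map to properties on the copy activity's typeProperties, such as batch size, parallel copies, and cloud data movement units. A fragment with illustrative (not recommended) values:

```json
"typeProperties": {
  "source": { "type": "AzureDataLakeStoreSource" },
  "sink": { "type": "SqlSink", "writeBatchSize": 10000 },
  "parallelCopies": 4,
  "cloudDataMovementUnits": 2
}
```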
Confirm Summary
Validation and Deployment. This will create the linked services, datasets, and pipelines the same way the Author and deploy editor does.
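The deployed pipeline is what ties the datasets and copy activity together and adds an active period. A sketch with the activity trimmed to its essentials and hypothetical start and end dates:

```json
{
  "name": "CopyTsvToAzureSqlPipeline",
  "properties": {
    "description": "Copy JobPostings.tsv from Azure Data Lake Store to Azure SQL DB",
    "activities": [
      {
        "name": "CopyJobPostings",
        "type": "Copy",
        "inputs": [ { "name": "JobPostingsTsvInput" } ],
        "outputs": [ { "name": "JobPostingsSqlOutput" } ],
        "policy": { "timeout": "01:00:00", "retry": 3 },
        "typeProperties": {
          "source": { "type": "AzureDataLakeStoreSource" },
          "sink": { "type": "SqlSink" }
        }
      }
    ],
    "start": "2017-06-01T00:00:00Z",
    "end": "2017-06-02T00:00:00Z"
  }
}
```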
Monitor pipeline
Confirm data copy by querying the Azure SQL JobPostings table.
In Azure SQL DB, a quick look at Resource Utilization during this ADF pipeline activity shows that it has been a substantial load, although the database is sized at the lowest tier of 5 DTUs.
I find Azure Data Factory a neat tool with basic capabilities. I don't think it should be compared with SSIS, which has more sophisticated capabilities. For now, the Copy data wizard is good for general bulk data movement to get your data into the Azure cloud.