Blog Series:
- Creating Azure Data Lake
- PowerShell and Options to upload data to Azure Data Lake Store
- Using Azure Data Lake Store .NET SDK to Upload Files
- Creating Azure Data Lake Analytics
- Azure Data Lake Analytics: Database and Tables
- Azure Data Lake Analytics: Populating & Querying Tables
- Azure Data Lake Analytics: How To Extract JSON Files
- Azure Data Lake Analytics: U-SQL C# Programmability
- Azure Data Lake Analytics: Job Execution Time and Cost
Azure Data Lake Analytics is a pay-per-use big data analytics service where you write and submit jobs in U-SQL. As a platform-as-a-service offering, it requires little infrastructure management, so you can focus on building your applications.
Some key capabilities that I favored and found useful:
- U-SQL
Simple and powerful, leveraging my existing skills in SQL, C# and the Visual Studio IDE, which keeps me highly productive and able to deliver results.
- Cost-effective
I pay only for the processing time of my jobs. Overall I spend very little, since my job executions average only a few minutes and are not very frequent.
- Role-based access security
I can grant various levels of permissions and assign roles to different types of users, leveraging Azure Active Directory. For example, read-only users can report on the data with tools such as Power BI.
For a more detailed overview and the full list of key capabilities, read https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-overview
The following article nicely explains how to create Azure Data Lake Analytics, but this blog post walks through how I set things up and adds my own thoughts and considerations.
https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-get-started-portal
Go to the Azure Portal and add a new Azure Data Lake Analytics service.
Data Lake Store
I selected an existing data lake store within the same region/location.
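If you prefer scripting over the portal, the same deployment can be done from PowerShell. This is a minimal sketch assuming the Az.DataLakeAnalytics module and illustrative names (my-rg, mydatalakestore, myadlaaccount are placeholders, not the actual accounts used in this series):

```powershell
# Sign in and select the subscription (placeholder subscription ID)
Connect-AzAccount
Set-AzContext -Subscription "<subscription-id>"

# Create the Data Lake Analytics account, pointing it at an existing
# Data Lake Store in the same region as its default store
New-AzDataLakeAnalyticsAccount `
    -ResourceGroupName "my-rg" `
    -Name "myadlaaccount" `
    -Location "East US 2" `
    -DefaultDataLakeStore "mydatalakestore"
```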
Pricing
Pricing Details: https://azure.microsoft.com/en-us/pricing/details/data-lake-analytics/
Pay-as-you-go is $2 USD per hour per Analytics Unit (AU). For development purposes, working with data on the order of hundreds of thousands of rows, this pricing was more than acceptable to me. I am not running jobs against the data frequently throughout the day either, so I hardly noticed any significant costs. Monthly commitment plans provide at least a 50% discount.
An Analytics Unit is roughly 2 CPU cores and 6 GB of RAM. For more details, read Understanding the ADL Analytics Unit.
For a ballpark sizing: my simple U-SQL jobs usually run with 5 AUs and finish in about 1 minute.
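To put that in perspective, here is a rough back-of-the-envelope estimate of what such a job costs at the pay-as-you-go rate of $2 per AU-hour:

```powershell
# Ballpark cost of a single job: AUs x duration (hours) x rate per AU-hour
$aus        = 5      # degree of parallelism
$minutes    = 1      # approximate job duration
$ratePerAUH = 2.0    # USD per AU-hour (pay-as-you-go)

$cost = $aus * ($minutes / 60) * $ratePerAUH
"{0:C2} (approx.)" -f $cost   # roughly $0.17 per run
```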
After deployment, here are some of the key settings and functions.
I'd like to discuss just a few of these:
Add User Wizard
Allows for role-based access security over the Data Lake Analytics account and its files; a PowerShell equivalent is sketched after the list below.
- Select Azure AD user
- Role: Owner, Contributor, Reader, Data Lake Analytics Developer
- Catalog permissions for running and managing jobs
- Files and folder permissions
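The wizard effectively combines an Azure role assignment with Data Lake Store file permissions. A sketch of the same steps in PowerShell, assuming the Az.Resources and Az.DataLakeStore modules and placeholder account, folder and user names:

```powershell
# Grant a user the Data Lake Analytics Developer role on the ADLA account
New-AzRoleAssignment `
    -SignInName "analyst@contoso.com" `
    -RoleDefinitionName "Data Lake Analytics Developer" `
    -ResourceGroupName "my-rg" `
    -ResourceName "myadlaaccount" `
    -ResourceType "Microsoft.DataLakeAnalytics/accounts"

# Grant the same user read/execute on a folder in the default Data Lake Store
$userObjectId = (Get-AzADUser -UserPrincipalName "analyst@contoso.com").Id
Set-AzDataLakeStoreItemAclEntry `
    -Account "mydatalakestore" `
    -Path "/data" `
    -AceType User `
    -Id $userObjectId `
    -Permissions ReadExecute
```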
New Job
Lets you run U-SQL scripts in the browser, which is nice and convenient for simple scripts. Other options are Visual Studio with the Data Lake tools, or PowerShell (see the sketch below).
When you submit a job, you get really nifty execution and run-time monitoring.
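A minimal PowerShell sketch of submitting a job, assuming the Az.DataLakeAnalytics module and a local U-SQL script file (SearchLog.usql is just an illustrative name):

```powershell
# Submit a U-SQL script with 5 AUs and wait for it to finish
$job = Submit-AzDataLakeAnalyticsJob `
    -Account "myadlaaccount" `
    -Name "Sample U-SQL job" `
    -ScriptPath "C:\scripts\SearchLog.usql" `
    -DegreeOfParallelism 5

Wait-AzDataLakeAnalyticsJob -Account "myadlaaccount" -JobId $job.JobId
```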
Job Management
View the history and status of job submissions
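The same job history can also be queried from PowerShell, for example to list recent jobs and their state (a sketch assuming the Az.DataLakeAnalytics module):

```powershell
# List jobs submitted in the last 7 days with their state and result
Get-AzDataLakeAnalyticsJob -Account "myadlaaccount" `
    -SubmittedAfter (Get-Date).AddDays(-7) |
    Select-Object Name, State, Result, SubmitTime, EndTime
```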
Compute Hours
View the compute hours so that you can check them against your costs. In my case, this shows very little compute usage.
Data Explorer
View the folders and files of the Data Lake Store, as well as the databases and tables in Data Lake Analytics.
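For scripted browsing, the Data Lake Store file system and the U-SQL catalog can be listed from PowerShell as well (a sketch, assuming the Az.DataLakeStore and Az.DataLakeAnalytics modules and the placeholder names from earlier):

```powershell
# List folders and files at the root of the Data Lake Store
Get-AzDataLakeStoreChildItem -Account "mydatalakestore" -Path "/"

# List the databases in the Data Lake Analytics U-SQL catalog
Get-AzDataLakeAnalyticsCatalogItem -Account "myadlaaccount" -ItemType Database
```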
For Microsoft developers who want to explore developing big data solutions but don't aspire to be experts, I would recommend trying Azure Data Lake Analytics. It is easy to pick up with existing knowledge of SQL and C#, easy to set up in Azure and, most importantly, able to deliver an end-to-end analytics solution without too much grind. This is in contrast to the Hadoop platform. For those who aspire to be experts and want all the bells and whistles, Hadoop is the way to go.