Azure Batch for Internet Data Collection Part 1: Lab Design

Azure Batch Overview

Azure Batch is a high-performance computing platform for deploying and running your application across hundreds of virtual machines or more.

To read further, visit Azure Batch Technical Overview

Pricing is based on the number and size of the virtual machines you use, plus storage capacity. There are also lower-cost VMs, called low-priority VMs, that are drawn from surplus VM capacity in Azure.

What are some example use cases?

  • Financial risk modeling
  • Climate and hydrology data analysis
  • Image rendering, analysis, and processing
  • Media encoding and transcoding
  • Genetic sequence analysis
  • Engineering stress analysis
  • Software testing

Anatomy of Azure Batch

  • Account – The root container for configuring and running Batch applications.
    • Pool – A collection of nodes, which are the VMs that your application runs on.
      • Node – A virtual machine of a specified size.
  • Application Packages – Your compiled application, uploaded as a .zip so that it can be deployed to pools.
    • Job – A collection of one or more tasks that execute the application, along with the specified pool and configuration parameters.
      • Task – A unit of execution, in the form of a command line run against a single node.
      • Job Schedule – The ability to run a job on a recurring schedule.
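To make the hierarchy concrete, here is a minimal sketch that models pools, nodes, jobs, and tasks as plain data classes. It is written in Python purely for illustration and does not use the Azure Batch SDK. The class names, the `assign_round_robin` helper, and the `--source` argument are all assumptions made up for this example; the real service handles scheduling itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A single VM in a pool; vm_size is an illustrative label."""
    node_id: str
    vm_size: str


@dataclass
class Pool:
    """A collection of nodes that tasks run on."""
    pool_id: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Task:
    """One command-line execution, assigned to a single node."""
    task_id: str
    command_line: str


@dataclass
class Job:
    """A set of tasks bound to the pool that will execute them."""
    job_id: str
    pool_id: str
    tasks: List[Task] = field(default_factory=list)


def assign_round_robin(job: Job, pool: Pool) -> Dict[str, str]:
    """Toy scheduler: map each task id to a node id in turn.

    This only illustrates that every task lands on exactly one node;
    it is not how the Batch service schedules work.
    """
    if not pool.nodes:
        raise ValueError("pool has no nodes")
    return {
        task.task_id: pool.nodes[i % len(pool.nodes)].node_id
        for i, task in enumerate(job.tasks)
    }


pool = Pool("collector-pool", [Node("node-0", "small"), Node("node-1", "small")])
job = Job(
    "collect-job",
    pool.pool_id,
    # "--source" is a hypothetical argument, not taken from the real application.
    [Task(f"task-{i}", f"DataCollection.exe --source api-{i}") for i in range(3)],
)
assignment = assign_round_robin(job, pool)
print(assignment)
```

The point of the model is the relationships: a job points at one pool, a job owns many tasks, and each task is a single command line that runs on one node.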

My .NET console application

The application collects data from APIs published on the internet, running in parallel, and stores the results in Azure Data Lake. You may choose and register for an API at https://www.programmableweb.com/category/all/apis
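As a rough illustration of collecting from several endpoints in parallel, the sketch below fans requests out over a thread pool on a single machine. In the lab itself the parallelism comes from Batch tasks spread across nodes, so the thread pool is only a stand-in. The sketch is in Python rather than .NET, the `api.example.com` URLs are placeholders, and `fake_fetch` replaces a real HTTP call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP GET so the sketch runs offline."""
    return '{"source": "%s", "items": [1, 2, 3]}' % url


def collect_parallel(
    urls: List[str],
    fetch: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Fetch every URL concurrently and return {url: response body}."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order, so zip pairs each URL
        # with its own response.
        bodies = list(executor.map(fetch, urls))
    return dict(zip(urls, bodies))


# Placeholder endpoints; substitute the API you registered for.
endpoints = [
    "https://api.example.com/postings?page=1",
    "https://api.example.com/postings?page=2",
]
results = collect_parallel(endpoints, fake_fetch)
for url, body in results.items():
    print(url, len(body))
```

Passing the fetch function in as a parameter keeps the collection logic testable without network access; a real client would replace `fake_fetch` with an actual HTTP request.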

Functional design

  • Command line interface parser
  • Collect data from published HTTP endpoints
  • Cleanse data
  • Parse data
  • Process, transform and output into JSON file format
  • Azure Application Insights SDK for tracing and error logging
  • Azure Data Lake Store SDK for storing data
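The cleanse, parse, and transform steps in the list above can be sketched as a small pipeline of functions. This is a language-neutral illustration in Python, not the author's .NET code. The record fields `id` and `title` and the specific cleansing rules are assumptions chosen for the example, and the tracing and Data Lake steps are left out.

```python
import json
from typing import Any, Dict, List


def cleanse(raw: str) -> str:
    """Drop a leading byte order mark and surrounding whitespace."""
    return raw.lstrip("\ufeff").strip()


def parse(cleaned: str) -> List[Dict[str, Any]]:
    """Parse a JSON array of records; reject anything else."""
    data = json.loads(cleaned)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


def transform(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep records that have an id and normalise the title field."""
    return [
        {"id": r["id"], "title": str(r.get("title", "")).strip().lower()}
        for r in records
        if "id" in r
    ]


def to_json(records: List[Dict[str, Any]]) -> str:
    """Serialise as JSON text, ready to write to a file or a store."""
    return json.dumps(records, indent=2, sort_keys=True)


# Hypothetical response body with a BOM, padding, and one record missing an id.
raw_response = '\ufeff  [{"id": 1, "title": " Data Engineer "}, {"title": "no id"}]  '
output = to_json(transform(parse(cleanse(raw_response))))
print(output)
```

Keeping each stage as a separate function mirrors the functional design: each step can be unit tested on its own and swapped out when a different API returns a different shape of data.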

Component Design

  • DataCollection.exe
    • Main driver .NET console application
    • Parses command line arguments
  • DataCollectRESTClient
    • Calls public HTTP REST APIs and endpoints
    • Processes, cleanses, and transforms the data
  • RKBigData.Common
  • Postings.UnitTest
    • Unit tests against many component interfaces and their methods
  • AzureDataLakeStorageDataAccess
  • AzureStorageDataAccess

The next post will look at the configuration of the Azure Batch solution and the design choices behind it.

Next: Azure Batch for Internet Data Collection Part 2: Application Package and Pool
