Azure Batch for Internet Data Collection Part 4: Parallel Task Execution

Azure Batch shows its real value when tasks execute concurrently across many nodes in a pool, that is, when you scale horizontally. Doing this with on-premises servers would be very costly and labour intensive. For example, would you be able to ask IT to provision 20, 40 or 100+ VMs just to use them for brief moments at a time?

For an additional overview, read the Azure documentation article "Run tasks concurrently to maximize usage of Batch compute nodes".

As I showed in Part 3 of this series, I was able to execute a single task through a job. However, the Azure Portal does not yet support running parallel tasks; this can only be done through the Azure Batch SDK or APIs. To get a jump start, there is a .NET sample console application for parallel tasks at https://github.com/Azure/azure-batch-samples/tree/master/CSharp/ArticleProjects/ParallelTasks

I used this application and adjusted it for my needs.

Horizontal scaling vs vertical scaling

From an infrastructure point of view, the main considerations when scaling an application are CPU, memory, disk and network I/O. I could scale my console application vertically by choosing a larger VM size, but that only goes so far: vertical scaling hits an upper bound at the physical limits on cores and memory in a single machine. By scaling horizontally across commodity VMs, I can scale my application's execution much further and more cost-efficiently. Another benefit of horizontal scaling is increased availability and redundancy: when a VM fails to execute a task because of an infrastructure failure, Azure Batch is designed to requeue the task so that it can run on a healthy node.
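To put rough numbers on the horizontal case, here is a back-of-the-envelope sketch in C#. It assumes every task takes the same time and that each node runs one task at a time. The task count and the per-task duration are hypothetical figures chosen for illustration, not measurements from my runs, and the `ScalingEstimate` class is a name I made up for this example.

```csharp
using System;

public static class ScalingEstimate
{
    // Number of "waves" needed when each node runs one task at a time.
    public static int Waves(int taskCount, int nodeCount) =>
        (taskCount + nodeCount - 1) / nodeCount; // integer ceiling of taskCount / nodeCount

    // Estimated wall-clock minutes, assuming all tasks take the same time.
    public static double EstimateMinutes(int taskCount, int nodeCount, double minutesPerTask) =>
        Waves(taskCount, nodeCount) * minutesPerTask;

    public static void Main()
    {
        const int tasks = 500;             // hypothetical number of tasks
        const double minutesPerTask = 2;   // hypothetical duration of a single task

        foreach (int nodes in new[] { 1, 20, 50 })
        {
            double minutes = EstimateMinutes(tasks, nodes, minutesPerTask);
            Console.WriteLine($"{nodes,3} node(s): ~{minutes} minutes");
        }
    }
}
```

With these made-up inputs, the estimate drops from about 1000 minutes on one node to 50 minutes on 20 nodes and 20 minutes on 50 nodes. A real run will be slower than this ideal because of scheduling overhead, uneven task durations and possible preemption of low-priority nodes.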

Key configuration

  • Pool:
    • VM size: Standard A1 (1 core)
    • Operating system: Windows Server 2016
    • 50 nodes
      • 20 dedicated nodes
      • 30 low-priority nodes

I have adapted this sample application with my own logic:

  • Authentication: provide the Batch account URL, name and account key.
    BatchSharedKeyCredentials cred = new BatchSharedKeyCredentials(BatchAccountUrl, BatchAccountName, BatchAccountKey);
  • Open a BatchClient session.
    using (BatchClient batchClient = BatchClient.Open(cred)) { … }
  • Create a new job with reference to a pre-existing pool.
    CloudJob job = batchClient.JobOperations.CreateJob();
    job.Id = jobId;
    job.PoolInformation = new PoolInformation { PoolId = poolId };
    await job.CommitAsync();

    • Note: The sample code creates a new pool on each run, which can take several minutes or more. Referencing an existing pool avoids that wait.
  • Create a List<CloudTask> named cloudTasks, where each CloudTask has its own command line and arguments.
    // Set up the command line
    string taskCommandLine = String.Format("cmd /c %AZ_BATCH_APP_PACKAGE_DATACOLLECTOR%\\DataCollect.exe -args {0}", commandArgs);
    // Create a new task
    CloudTask task = new CloudTask(jobTaskId, taskCommandLine);
    // Reference the application package so the task knows which app to run
    task.ApplicationPackageReferences = new List<ApplicationPackageReference>
    {
        new ApplicationPackageReference
        {
            ApplicationId = "DataCollector",
            Version = "1.1"
        }
    };
    // Add the task to the list
    cloudTasks.Add(task);
  • Submit the tasks to the job in bulk.
    await batchClient.JobOperations.AddTaskAsync(jobId, cloudTasks);
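The fragment above handles a single task. As a sketch of how a surrounding loop might produce one task id and one command line per input, consider the following. The `TaskSpec` class, the `TaskListBuilder.Build` helper, the id format and the sample arguments are all hypothetical; `TaskSpec` stands in for the SDK's CloudTask so that the snippet runs without a Batch account.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for CloudTask: just an id and a command line.
public sealed class TaskSpec
{
    public TaskSpec(string id, string commandLine)
    {
        Id = id;
        CommandLine = commandLine;
    }

    public string Id { get; }
    public string CommandLine { get; }
}

public static class TaskListBuilder
{
    // Builds one task spec per argument string, with a zero-padded sequence number in the id.
    public static List<TaskSpec> Build(string jobId, IReadOnlyList<string> argsPerTask)
    {
        var specs = new List<TaskSpec>(argsPerTask.Count);
        for (int i = 0; i < argsPerTask.Count; i++)
        {
            string taskId = $"{jobId}_task{i:D3}"; // assumed naming scheme
            string commandLine = string.Format(
                "cmd /c %AZ_BATCH_APP_PACKAGE_DATACOLLECTOR%\\DataCollect.exe -args {0}",
                argsPerTask[i]);
            specs.Add(new TaskSpec(taskId, commandLine));
        }
        return specs;
    }

    public static void Main()
    {
        // Made-up arguments; the real values depend on what DataCollect.exe expects.
        List<TaskSpec> specs = Build("datacollect-job", new[] { "siteA", "siteB", "siteC" });
        foreach (TaskSpec spec in specs)
        {
            Console.WriteLine($"{spec.Id}: {spec.CommandLine}");
        }
    }
}
```

In the real project, each spec would become a `new CloudTask(spec.Id, spec.CommandLine)` with the application package reference attached, and the whole list would go to AddTaskAsync in a single call.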

Compile the parallel task project.

To monitor the run, you have two options:

  • Azure Portal > Batch Account > select your Batch Pool > Overview
    • A simple view, but ready to use.
  • GitHub project > Batch Explorer
    • Comprehensive and extensible, but you need to download, compile and configure it.
      Here's an example screenshot of Batch Explorer with its heat map functionality.
      [Screenshot: Batch Explorer showing the pool heat map]
      I like that the refresh interval is configurable, so you can refresh more rapidly than is possible through the Azure Portal.

In preparation for running my Parallel Tasks project, I scaled out my Batch pool to 20 dedicated and 30 low-priority nodes. The VM size is Standard A1 (1 core and 1.75 GB RAM).
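The same scale-out can also be scripted instead of done by hand. The sketch below assumes a recent Microsoft.Azure.Batch NuGet package, in which PoolOperations.ResizePoolAsync accepts separate dedicated and low-priority targets. The `PoolScaler` class, the account placeholders, the pool id and the 15-minute resize timeout are illustrative assumptions, so check the signature against the SDK version you have installed.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Auth;

public static class PoolScaler
{
    // Requests 20 dedicated and 30 low-priority nodes, then prints the pool's current state.
    public static async Task ScaleOutAsync(BatchClient batchClient, string poolId)
    {
        await batchClient.PoolOperations.ResizePoolAsync(
            poolId,
            targetDedicatedComputeNodes: 20,
            targetLowPriorityComputeNodes: 30,
            resizeTimeout: TimeSpan.FromMinutes(15)); // assumed timeout

        // The resize is asynchronous on the service side, so the counts may still be changing here.
        CloudPool pool = await batchClient.PoolOperations.GetPoolAsync(poolId);
        Console.WriteLine(
            $"Allocation state: {pool.AllocationState}, " +
            $"dedicated: {pool.CurrentDedicatedComputeNodes}, " +
            $"low priority: {pool.CurrentLowPriorityComputeNodes}");
    }

    public static async Task Main()
    {
        // Placeholders: substitute your own Batch account URL, name, key and pool id.
        var cred = new BatchSharedKeyCredentials("<batch-account-url>", "<batch-account-name>", "<batch-account-key>");
        using (BatchClient batchClient = BatchClient.Open(cred))
        {
            await ScaleOutAsync(batchClient, "<pool-id>");
        }
    }
}
```

The resize request is expected to fail if the pool has autoscale enabled or if another resize is still in progress, so a script like this would need to handle that case.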

Go to Visual Studio and run the Parallel Task project.

In the Azure Portal or in Batch Explorer, you can watch the respective heat maps as tasks execute my .NET console application on the various nodes.
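If you prefer a quick text summary over a heat map, the SDK can list the job's tasks and their states. This is only a sketch under assumptions: `TaskStateSummary` is a hypothetical helper, it expects an already opened BatchClient and an existing job id, and it relies on ListTasks, ODATADetailLevel and the ToListAsync extension as I understand them from the Microsoft.Azure.Batch package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Common;

public static class TaskStateSummary
{
    // Prints how many tasks of the given job are in each state (for example Active, Running, Completed).
    public static async Task PrintAsync(BatchClient batchClient, string jobId)
    {
        // Request only the id and state of each task to keep the query light.
        var detail = new ODATADetailLevel(selectClause: "id,state");
        List<CloudTask> tasks = await batchClient.JobOperations.ListTasks(jobId, detail).ToListAsync();

        foreach (IGrouping<TaskState?, CloudTask> group in tasks.GroupBy(t => t.State))
        {
            Console.WriteLine($"{group.Key}: {group.Count()}");
        }
    }
}
```

Calling `await TaskStateSummary.PrintAsync(batchClient, jobId);` in a loop with a short delay would give a rough, scriptable counterpart to the refresh interval in Batch Explorer.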

[Screenshots: heat maps of tasks executing across the pool nodes, in the Azure Portal and in Batch Explorer]

Note: This heat map does not show the low-priority nodes because the version of the code I ran did not yet support them. Check the GitHub samples for the latest version.

In conclusion, I have demonstrated parallel execution of my .NET console app across many nodes.

Next: Azure Batch for Internet Data Collection Part 5: Monitoring

 
