Azure Data Lake Analytics: Populating & Querying Tables

Blog Series:

Upon creating a table with a defined set of columns, let’s go through a basic example of populating it from a CSV file stored in Azure Data Lake.

The .csv file as the sample data to be imported into the table. I am using Data Lake Explorer in visual studio. The relative file path is \MyData\FL_insurance_sample.csv. This sample data file I got from a public website.
Azure Data Lake Analytics- Populating - Querying Tables-1

U-SQL application:

// Use insurance database;
USE Insurance;

// Extract from file
@csvData =
    EXTRACT policyID int,
        statecode string,
        county string,
        eq_site_limit float,
        hu_site_limit float,
        fl_site_limit float,
        fr_site_limit float,
        tiv_2011 float,
        tiv_2012 float,
        eq_site_deductible float,
        hu_site_deductible float,
        fl_site_deductible float,
        fr_site_deductible float,
        point_latitude float,
        point_longitude float,
        line string,
        construction string,
        point_granularity int?
    FROM "/MyData/FL_insurance_sample.csv"
    USING Extractors.Csv(skipFirstNRows:1);

//Insert it into a previously created table
INSERT INTO Policies
SELECT *
FROM @csvData;

Since the data file was in CSV format, I would use the Extractors.csv. Other extractors are TSV and Text file. Since the data file had a header row of column names as its first row, I had to indicate to skip the first row by stating skipFristNrows: 1. Otherwise, the script submission would result in an error because the header columns wouldn’t work with the field data types defined. There are other arguments to specify to offer more configurability and flexibility.

Submit Job
Azure Data Lake Analytics- Populating - Querying Tables-2

To query and see the results, they must be stored into an output file in the Azure Data Lake Store. Unfortunately, I don’t see a way to see them on a console like you do in SQL Management Studio.

// Use insurance database;
USE Insurance;

@queryPolicy = SELECT * FROM Policies
WHERE county == "NASSAU COUNTY";

// output query into a file
OUTPUT @queryPolicy
TO "/MyData/queryresults/policies-by-nassaucounty.csv"
USING Outputters.Csv();

Azure Data Lake Analytics- Populating - Querying Tables-4

You can preview the file in Visual Studio
Azure Data Lake Analytics- Populating - Querying Tables-5

@queryView = SELECT * FROM PoliciesByCounty;

// output query into a file
OUTPUT @queryView
TO "/MyData/queryresults/view-by-county.csv"
USING Outputters.Csv();     

Note the View definition is:
CREATE VIEW IF NOT EXISTS Insurance.dbo.PoliciesByCounty AS
    SELECT COUNT(policyID) AS NumOfPolicies, county
    FROM Insurance.dbo.Policies
    GROUP BY county;

The preview of the output file:
Azure Data Lake Analytics- Populating - Querying Tables-6