Working with Hive Tables in Zeppelin Notebook and HDInsight Spark Cluster

Zeppelin notebooks are a web-based editor in which data developers, analysts, and scientists can develop code (Scala, Python, SQL, ...) interactively and visualize the data. I will demonstrate simple notebook functionality: querying data in Hive tables, aggregating the data, and saving it to a new Hive table. For more details, read …

The Effects of Dropping Internal and External Hive Tables in HDInsight and ADLS

In my blog post Populating Data into Hive Tables in HDInsight, I demonstrated populating an internal and an external Hive table in HDInsight. The primary storage is configured with Azure Data Lake Store. To see the differences, I will drop both types of tables and observe the effects. This is for a beginner audience. To recap …
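As a rough illustration of the difference, here is a toy Python model (not Hive itself, and not any real API) of what dropping each table type does. It relies on the standard Hive behavior that dropping an internal (managed) table removes both the table definition and its data, while dropping an external table removes only the definition. The class name, table names, and paths are hypothetical, and the sketch ignores details such as trash settings.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWarehouse:
    """Toy stand-in for a Hive metastore plus the storage behind it (not a real API)."""

    metastore: dict = field(default_factory=dict)  # table name -> (location, is_external)
    storage: dict = field(default_factory=dict)    # location -> list of file names

    def create_table(self, name, location, external=False):
        self.metastore[name] = (location, external)
        self.storage.setdefault(location, [])

    def drop_table(self, name):
        location, external = self.metastore.pop(name)
        if not external:
            # Internal (managed) table: Hive owns the data, so it is removed too.
            self.storage.pop(location, None)
        # External table: only the metadata entry is removed.


wh = ToyWarehouse()
wh.create_table("crimes_internal", "/hypothetical/warehouse/crimes_internal")
wh.create_table("crimes_external", "/hypothetical/data/crimes", external=True)
for location in wh.storage:
    wh.storage[location].append("crimes.csv")  # pretend both tables were populated

wh.drop_table("crimes_internal")
wh.drop_table("crimes_external")

print(sorted(wh.metastore))  # both table definitions are gone
print(wh.storage)            # only the external location still holds its file
```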

Populating Data into Hive Tables in HDInsight

Objective: Load a CSV file into an internal and an external Hive table in HDInsight. See my blog post on creating Hive tables, Creating Internal and External Hive Tables in HDInsight. I obtained a 1.4 GB CSV file of US city crime data (https://catalog.data.gov/dataset/crimes-2001-to-present-398a4). My HDInsight cluster is configured to use Azure Data Lake Store …

Creating Internal and External Hive Tables in HDInsight

Objective: Create an internal and an external Hive table in HDInsight, based on the schema of a CSV file of US city crime data (https://catalog.data.gov/dataset/crimes-2001-to-present-398a4). Building Hive tables establishes a schema on the flat files that I have stored in Azure Data Lake Store. This will allow me to run SQL-like queries with HiveQL on that …
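To make the internal versus external distinction concrete, the following minimal Python sketch builds the two kinds of HiveQL DDL statement from a column list. It is only an illustration under stated assumptions: the helper name `build_create_table`, the table names, the three columns, and the storage path are all hypothetical and are not taken from the post, and the plain comma delimiter assumes no quoted commas inside field values.

```python
def build_create_table(table, columns, external=False, location=None):
    """Return a HiveQL CREATE TABLE statement for a comma-delimited text file.

    `columns` is a list of (name, hive_type) pairs. This is a hypothetical
    helper for illustration, not part of any Hive client library.
    """
    col_defs = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    keyword = "CREATE EXTERNAL TABLE" if external else "CREATE TABLE"
    ddl = (
        f"{keyword} IF NOT EXISTS {table} (\n  {col_defs}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE"
    )
    if location:
        # An external table usually points at files that already exist in storage.
        ddl += f"\nLOCATION '{location}'"
    return ddl + ";"


# Hypothetical subset of columns; the real CSV schema has more fields.
columns = [("id", "INT"), ("case_number", "STRING"), ("primary_type", "STRING")]

internal_ddl = build_create_table("crimes_internal", columns)
external_ddl = build_create_table(
    "crimes_external", columns, external=True, location="/hypothetical/data/crimes"
)

print(internal_ddl)
print(external_ddl)
```

The only differences between the two statements are the EXTERNAL keyword and the LOCATION clause, which is what makes Hive treat the underlying files as data it does not own.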

Create HDInsight Spark Cluster with Azure Data Lake Store

The Spark cluster is one of several cluster types offered through the HDInsight platform-as-a-service. Its distinguishing capability is in-memory processing, which gives it an overall performance benefit over the Hadoop cluster type and makes it well suited to building big data analytics applications. For a further overview, read https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-overview I will walk through and comment …

Power BI and Read Only Access to Azure Data Lake Store

Blog Series: Creating Azure Data Lake; PowerShell and Options to upload data to Azure Data Lake Store; Using Azure Data Lake Store .NET SDK to Upload Files; Creating Azure Data Analytics; Azure Data Lake Analytics: Database and Tables; Azure Data Lake Analytics: Populating & Querying Tables; Azure Data Lake Analytics: How To Extract JSON Files; …

Azure Data Lake Analytics: Finding Duplicates With U-SQL Windows Functions

Azure Data Lake Analytics: Job Execution Time and Cost

Azure Data Lake Analytics: U-SQL C# Programmability

Azure Data Lake Analytics: How To Extract JSON Files
