This post continues from my blog article Building a Spark Application for HDInsight using IntelliJ Part 1 of 2, which covers my experience installing IntelliJ, the other dependent SDKs, and creating an HDInsight project.
To add some code, right-click the src folder and create a new Scala class.


Project folders and MainApp

Scala code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object MainApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MainApp")
    val sc = new SparkContext(conf)

    // Read the CSV file from Azure Data Lake Store into an RDD
    val rdd = sc.textFile("adl://rkbigdata.azuredatalakestore.net/MyDatasets/Crimes_-_2001_to_present_pg2.csv")

    // Find the rows where the primary type field == THEFT
    val rdd1 = rdd.filter(s => s.split(",")(5) == "THEFT")

    // Create a SparkSession with Hive support for SQL queries
    val spark = SparkSession.builder().appName("Spark SQL basic").enableHiveSupport().getOrCreate()
    spark.sql("USE usdata")
    val crimesDF = spark.sql("SELECT * FROM CRIMES WHERE primarytype = 'NARCOTICS'")

    // Save the resulting data frame into an existing or new Hive table
    crimesDF.write.mode("overwrite").saveAsTable("crimebytype_NARCOTICS")
  }
}
Logic
- Read the CSV file in Azure Data Lake Store into an RDD
- Filter the RDD for rows where the primary type field is “THEFT”
- Set the Hive database to usdata (from the default database)
- Query the Hive table CRIMES for rows where the primary type field is “NARCOTICS”
- Save the resulting data frame into a new or existing Hive table crimebytype_NARCOTICS
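As a side note, splitting each line on "," breaks if any field contains an embedded comma inside quotes. Spark 2.0's built-in CSV reader is quote-aware; below is a minimal sketch of reading the same file through the DataFrame API instead of an RDD. The path is from the code above, while the header option and the "PrimaryType" column name are assumptions about the file's layout.

```scala
// Sketch: reading the CSV with Spark's DataFrame reader instead of
// splitting lines by hand. Assumes the file has a header row that names
// a "PrimaryType" column; adjust the option and column name to match.
import org.apache.spark.sql.SparkSession

object CsvReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvReadSketch")
      .enableHiveSupport()
      .getOrCreate()

    val crimes = spark.read
      .option("header", "true") // treat the first line as column names (assumption)
      .csv("adl://rkbigdata.azuredatalakestore.net/MyDatasets/Crimes_-_2001_to_present_pg2.csv")

    // Quote-aware equivalent of the RDD filter above
    val thefts = crimes.filter(crimes("PrimaryType") === "THEFT")
    thefts.show()
  }
}
```

This avoids the fragile positional index (5) as well, since columns are referenced by name.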
Setting up IntelliJ to submit the application to HDInsight
Before using Azure Explorer, I ran into an issue where signing in produced an error and kept prompting me for credentials. Unfortunately, I didn’t capture the error message. The workaround was to disable the Android Support plugin by unchecking it.

Click OK.
Sign into Azure via Azure Explorer

Select Interactive
Enter credentials

See the Azure resources display

Right click the Project and click on Submit Spark Application to HDInsight

Set Main class name to MainApp

HDInsight Spark Submission window
Confirm success

Go to the Ambari Hive View to query the Hive table created by the Spark application.

From a Jupyter notebook, I query the same Hive table.
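In a Spark Scala notebook cell (or a spark-shell session on the cluster), the table written by the application can be read back through the pre-created SparkSession. A minimal sketch, assuming the table landed in the usdata database as in the code above:

```scala
// Sketch: verifying the table written by saveAsTable.
// In a Spark Scala notebook or spark-shell, a SparkSession
// named `spark` is already available.
spark.sql("USE usdata")
val saved = spark.sql("SELECT COUNT(*) AS rows FROM crimebytype_NARCOTICS")
saved.show()
```

The same query works from the Ambari Hive View by fully qualifying the table as usdata.crimebytype_NARCOTICS.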

I have walked through setting up the development tooling, building a simple Spark application, and running it against an HDInsight Spark 2.0 cluster.