The Effects of Dropping Internal and External Hive Tables in HDInsight and ADLS

In my blog post Populating Data into Hive Tables in HDInsight, I have demonstrated populating an internal and an external hive table in HDInsight. The primary storage is configured with Azure Data Lake Store.

To see the differences, I will demonstrate dropping both types of tables and observe the effects. This for the beginner audience.

To recap the linked post, the internal hive table has its own copy of the csv file of crimes data. The external hive table crimes_ext has a pointer to the folder containing its data.The hive tables to drop as seen through Visual Studio

The Effects of Dropping Internal and External Hive Tables in HDInsight and ADLS-1
In Azure Data Lake Store, the tables are under \clusters\rkhdinsight\hive\warehouse\usdata.dbThe Effects of Dropping Internal and External Hive Tables in HDInsight and ADLS-2 copy
Clicking into the crimes folder, we see the data file.
The Effects of Dropping Internal and External Hive Tables in HDInsight and ADLS-3
Clicking into the crimes_ext folder, there is no data file.The Effects of Dropping Internal and External Hive Tables in HDInsight and ADLS-4
Since crimes_ext is an external table, its data is configured to be under \datsets\Crimes
The Effects of Dropping Internal and External Hive Tables in HDInsight and ADLS-5
Out of curiosity, I look in the external hive metastore in an Azure SQL Database. There is a table TBLS with a list of tables and its type. The hive metastore contains the metdata of the hive table, but no data.The Effects of Dropping Internal and External Hive Tables in HDInsight and ADLS-6Dropping the tables

Use usdata;
DROPTABLE crimes_ext;
DROPTABLE crimes;

In Server Explorer, after refreshing the usdata database, they are gone.The Effects of Dropping Internal and External Hive Tables in HDInsight and ADLS-7
In ADLS, the crimes folder is removed which was the internal table. However, the crimes_ext folder still exists for reasons I am not aware.The Effects of Dropping Internal and External Hive Tables in HDInsight and ADLS-8
To check the data of the external table, we see its data file not removed, which is expected by design.
The Effects of Dropping Internal and External Hive Tables in HDInsight and ADLS-9
Going back to the hive metastore, we see the two tables removed.The Effects of Dropping Internal and External Hive Tables in HDInsight and ADLS-10

Side note: The hive view crimesgroupbytype_view queries the dropped internal hive table and if you run the view, you get an error stating the dependency of the missing hive table.

The concepts of dropping internal and external tables are easy to understand, but I just want to show the underlying effects in Azure Data Lake Store and the hive metastore DB for the beginner.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s