Text Analytics of Movie Reviews using Azure Data Lake, Cognitive Services and Power BI (part 1 of 2)

Applicable Business Scenario

Marketing or data analysts need to review the sentiment and key phrases of a very large data set of consumer movie reviews.

Applied Technologies

  1. Azure Data Lake Store
  2. Azure Data Lake Analytics
    1. Cognitive Services
  3. Visual Studio with Azure Data Lake Analytics Tools
  4. Power BI Desktop & Power BI Service
  5. SharePoint Online site and preview of Power BI Web Part

Azure Data Lake Store

Upload a .csv file of 2,000 movie reviews to a folder in Azure Data Lake Store. The script below reads it from /TextAnalysis/moviereviews.csv.
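
The U-SQL script in this post extracts a single Text column with Extractors.Csv() and does not skip a header row, so the file is expected to hold one review per row with no header line. A minimal Python sketch of producing a file in that shape follows. The sample reviews and the function name are hypothetical, and collapsing line breaks inside a review is a precaution based on the assumption that the built-in extractor does not accept row delimiters inside quoted values.

```python
import csv
import io


def write_reviews_csv(reviews, stream):
    """Write one review per row as a single quoted column, with no header row."""
    writer = csv.writer(stream, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for review in reviews:
        # Collapse runs of whitespace (including line breaks) into single
        # spaces so that each review stays on one physical line.
        writer.writerow([" ".join(review.split())])


# Hypothetical sample data, only to show the file shape.
sample_reviews = [
    'A moving story, with "great" acting.',
    "Too long,\nand the plot goes nowhere.",
]

buffer = io.StringIO()
write_reviews_csv(sample_reviews, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, pass an open file object (for example open("moviereviews.csv", "w", newline="", encoding="utf-8")) instead of the in-memory buffer, then upload the result to the folder in Azure Data Lake Store.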


Azure Data Lake Analytics

Execute the following U-SQL script either in the Azure Portal (Azure Data Lake Analytics > Jobs > New Job) or in Visual Studio with the Azure Data Lake Analytics Tools.

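This script references the Cognitive Services assemblies, which are registered in the master database of the Azure Data Lake Analytics account. If they are not present, the U-SQL extensions may first need to be installed for the account.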


U-SQL Script

The following script reads the moviereviews.csv file in Azure Data Lake Store, analyzes each review for sentiment, and extracts its key phrases. It produces two .tsv files: one with the sentiment and key phrases for each movie review, and another that lists each individual key phrase with a foreign key ID (RowNumber) pointing to the parent movie review.

    // SentimentAnalyzer is provided by the TextSentiment assembly,
    // KeyPhraseExtractor by TextKeyPhrase.
    REFERENCE ASSEMBLY [TextSentiment];
    REFERENCE ASSEMBLY [TextKeyPhrase];

    // Read the reviews: a single Text column, one review per row.
    @comments =
        EXTRACT Text string
        FROM @"/TextAnalysis/moviereviews.csv"
        USING Extractors.Csv();

    // Score each review for sentiment.
    @sentiment =
        PROCESS @comments
        PRODUCE Text,
                Sentiment string,
                Conf double
        READONLY Text
        USING new Cognition.Text.SentimentAnalyzer(true);

    // Extract the key phrases of each review.
    @keyPhrases =
        PROCESS @sentiment
        PRODUCE Text,
                Sentiment,
                Conf,
                KeyPhrase string
        READONLY Text,
                 Sentiment,
                 Conf
        USING new Cognition.Text.KeyPhraseExtractor();

    // Number the reviews so the split key phrases can point back to them.
    @keyPhrases =
        SELECT *,
               ROW_NUMBER() OVER () AS RowNumber
        FROM @keyPhrases;

    OUTPUT @keyPhrases
    TO "/TextAnalysis/out/MovieReviews-keyPhrases.tsv"
    USING Outputters.Tsv();

    // Split the key phrases.
    @kpsplits =
        SELECT RowNumber,
               Sentiment,
               Conf,
               T.KeyPhrase
        FROM @keyPhrases
             CROSS APPLY
                 new Cognition.Text.Splitter("KeyPhrase") AS T(KeyPhrase);

    OUTPUT @kpsplits
    TO "/TextAnalysis/out/MovieReviews-kpsplits.tsv"
    USING Outputters.Tsv();
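
To make the relationship between the two output files concrete, here is a small Python sketch that mimics the last two steps of the script locally: numbering the reviews, then splitting each review's key phrases into one row per phrase. It only illustrates the data shape and is not how Cognitive Services works internally. The sample rows, the function names, and the ";" delimiter between key phrases are assumptions.

```python
def number_rows(rows):
    """Mimic ROW_NUMBER() OVER (): attach a 1-based RowNumber to each row."""
    return [dict(row, RowNumber=i) for i, row in enumerate(rows, start=1)]


def split_key_phrases(rows, delimiter=";"):
    """Mimic the CROSS APPLY split: emit one output row per key phrase."""
    split_rows = []
    for row in rows:
        for phrase in row["KeyPhrase"].split(delimiter):
            phrase = phrase.strip()
            if phrase:  # ignore empty fragments such as a trailing delimiter
                split_rows.append({
                    "RowNumber": row["RowNumber"],  # foreign key to the review
                    "Sentiment": row["Sentiment"],
                    "Conf": row["Conf"],
                    "KeyPhrase": phrase,
                })
    return split_rows


# Hypothetical rows, shaped like the output of the key phrase step.
reviews = [
    {"Text": "A moving story with great acting.", "Sentiment": "Positive",
     "Conf": 0.93, "KeyPhrase": "moving story;great acting"},
    {"Text": "Too long and the plot goes nowhere.", "Sentiment": "Negative",
     "Conf": 0.81, "KeyPhrase": "plot"},
]

numbered = number_rows(reviews)
splits = split_key_phrases(numbered)

for row in splits:
    print(row["RowNumber"], row["Sentiment"], row["Conf"], row["KeyPhrase"])
```

Because the script assigns RowNumber without an ORDER BY, the numbers are only meaningful within a single job run, so the two .tsv files should always be taken from the same run when they are related to each other.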

Azure Portal Option

In the Azure Portal, open the Azure Data Lake Analytics account, create a new job, paste the U-SQL script, and submit it.


Visual Studio Option

This option requires the Azure Data Lake Tools for Visual Studio. Create a U-SQL project, paste the script, and submit it to the Azure Data Lake Analytics account for execution. When the job succeeds, Visual Studio displays the job summary.


Continue to part 2 of 2 of this blog series.
