The series so far:
- Text Mining and Sentiment Analysis: Introduction
- Text Mining and Sentiment Analysis: Power BI Visualizations
- Text Mining and Sentiment Analysis: Analysis with R
- Text Mining and Sentiment Analysis: Oracle Text
- Text Mining and Sentiment Analysis: Data Visualization in Tableau
The previous articles of the Text Mining and Sentiment Analysis series focused on Power BI as the data visualization tool of choice. Tableau is another data visualization and analytics tool widely used in the industry. Tableau Desktop was first released in 2004, and this platform has been in use for much longer than Power BI, which was released for general availability in 2015. Current market research by Sintel shows Tableau Software has a 14.27 % market share, while Microsoft Power BI has a 10.47 % market share in the Business Intelligence (BI) category. Given the popularity and widespread use of Tableau, this article focuses on creating visualizations (workbooks) using Tableau Desktop (version 2020.2) to analyze the key-phrases and sentiment scores for gaining meaningful insights from text data.
Initial setup
This article uses an Oracle Database Virtual Box Appliance / Virtual Machine and Tableau Desktop installed on the host machine. The term host machine refers to your laptop, desktop, or any computer on which you install Oracle Virtual Box to host the virtual machine.
Set up section 1: Oracle Database Virtual Box Appliance / Virtual Machine
- Download and install VirtualBox by following instructions listed on the VirtualBox website.
- Download and set up the Oracle Database Virtual Box Appliance / Virtual Machine by following instructions listed on the Oracle website.
- Start up the Virtual Machine and verify SQL Developer is available on the Desktop screen. Please note that you may need to create a free account to download the Virtual Machine.
- In the Virtual Machine, launch the Oracle SQL Developer application. Under the Connections menu, click on system. When prompted for credentials, enter sys as sysdba in the username field and oracle in the password field. Ensure you can successfully connect to the Oracle Database before proceeding to the next step.
Figure 1. Oracle Database Developer Virtual Machine
Figure 2. Test new connection to database
Set up section 2: Configure Oracle Database on this Virtual machine for access from your Host system
- Navigate to the Oracle VM Virtual Box Manager and shut down this VM.
- Click the Settings gear icon from the toolbar. Select Network, click the Adapter 2 tab. Check the Enable Network Adapter checkbox.
- Select Host-only Adapter from the drop-down list options for the field Attached to.
- Select VirtualBox Host-Only Ethernet Adapter from drop-down list options for field Name.
- Click OK.
Figure 3. Configure Network setting for Oracle DB Developer VM
- Start the virtual machine.
- Click the Network icon on the top right side of the toolbar. There are two Ethernet network adapters (eth0 and eth1).
- If either one of the network adapters is in the Off state, click Connect on it. Both network adapters (eth0 and eth1) should be in the Connected state before proceeding further.
Figure 4. Connect Network Adapter inside the VM
- Open a Terminal inside the VM. Type
ifconfig -a
and hit enter/return key. Make a note of the IP address for network adapter eth1. This IP address will be used later during the demo to connect Tableau Desktop from your host machine to the Oracle database orcl running on this VM. In my demo, the IP address is 192.168.52.101.
Figure 5. Note IP address for network adapter eth1
Set up section 3: Install Tableau Desktop on your Host and connect to Oracle Database running on the VM
- Follow instructions on the Tableau website to download and install Tableau Desktop. This demo uses Tableau Desktop Version 2020.2. Please note that the download requires a business email, and the free trial expires in fourteen days. You may skip this step if you already have Tableau Desktop installed and running on your host machine
- If you have never used an Oracle client on your host machine before, you would also need to install Oracle JDBC driver. Please follow the download and install instructions from Tableau support page.
- Launch Tableau Desktop. Navigate under Connect Menu > Server and click on Oracle.
Figure 6. Launch Tableau Desktop on your Host machine
- On Connect to Oracle server screen
- Enter the IP address from the previous step in the Server field
- Enter orcl in the Service field
- Enter 1521 in the Port field
- Select Use a specified username and password radio button
- Enter value system in the Username field
- Enter value oracle in the Password field
- Click Sign In
Figure 7. Connect to Oracle database from Tableau Desktop
- Upon successful sign-in, Tableau Desktop will proceed to the Data Source screen
Figure 8. Tableau Desktop showing connected to Oracle Database in the Data Source screen
Set up section 4: Import the demo data set into the Oracle Database
This article uses a demo data set where Sentiment Score and Key Phrases are already generated from the text response to ensure the demo’s simplicity, clarity, and consistency. The data set is a CSV file with fields:
- Period (Year & Quarter number)
- Manager (Name)
- Team (Name)
- Response (free form text responses from the Survey, to the question – How do you feel about your team’s health in this recent quarter)
- SentimentScore (a numeric score ranging from 0 to 1, indicating the degree of positivity of sentiment found in the Response text. 1 indicates highest positivity and 0 indicates the lowest positivity)
- KeyPhrases (a comma-separated list of meaningful words and phrases extracted from the Response text)
- Download demo data file from my Github repository and save it in a convenient location on the Virtual Machine.
- In the Virtual Machine, launch the Oracle SQL Developer application. Under the Connections menu, double click on system. Enter credentials: system in the username field and oracle in the password field. Click OK.
Figure 9. Connect to the Oracle database using Oracle SQL Developer
- Run the following SQL to create a table for loading the demo data file into
-- Create a Table to load text data CREATE TABLE SYSTEM.DEMO_SENTIMENTANALYSIS_TEAMHEALTH ( ID NUMBER GENERATED ALWAYS AS IDENTITY ,PERIOD VARCHAR2( 500 ) NOT NULL ,MANAGER VARCHAR2( 500 ) NOT NULL ,TEAM VARCHAR2( 500 ) NOT NULL ,RESPONSE VARCHAR2( 4000 ) NOT NULL ,SENTIMENT_SCORE NUMERIC(4,2) ,KEY_PHRASES VARCHAR2( 4000 ) ,DW_CREATION_DATE DATE DEFAULT SYSDATE );
Figure 10. Create a table to load the demo data file
- Under the connection menu, find table DEMO_SENTIMENTANALYSIS_TEAMHEALTH. Right-click on it and select Import Data to launch Data Import Wizard.
Figure 11. Launch the Data Import Wizard
- In Step 1 of the Data Import Wizard, from the File field navigate to the location of your saved file DemoDataWith_sc_kps.csv. Leave all other fields to their default values as shown in Figure 11. Click Next. Please note in some versions of this environment, some users may face issues with loading a .csv where the Data Import Wizard tried to create a link ID field. If you experience this issue, please save the file as .xlsx and retry.
Figure 12. Data Import Wizard – Navigate to the demo data file to import
- Proceed to the next steps in Data Import Wizard, leaving all fields to their default values. In Step 3 of the wizard, in some cases, you may need to select only the six named columns from the CSV file and ignore the long list of empty excel columns with names like “column7, column8,….”. Click Next.
Figure 13. Step 3 of Data Import Wizard – select columns to import
- In step 4 of Data Import Wizard, map source file field SentimentScore to target table column SENTIMENT_SCORE and source file field KeyPhrases to target table column KEY_PHRASES. Click Next.
Figure 14. Step 4 of Data Import Wizard – map and source and target columns
- Upon completion of the Data Import Wizard, you will see a message indicating data import has successfully completed. Click OK to exit the wizard
Figure 15. Data Import wizard completed successfully
Visualization One – Word Cloud
A Word cloud is one of the most popular ways to analyze text data by visualizing frequent keywords/phrases. It’s an image composed of keywords/phrases found within a document, where the size of each word indicates its relative frequency in that document.
Unlike Power BI, Tableau does not have an out-of-the-box visualization to create a word cloud. For creating a word cloud visualization, Tableau expects you to format input data such that each row only has one word. However, data in the KEY_PHRASES column is a comma-separated list of keywords, which needs to be transformed before it can be used in Tableau.
Run the following SQL to create a view that applies the necessary transformation logic.
CREATE OR REPLACE VIEW SYSTEM.DEMO_TEAMHEALTH_WORDFREQUENCY AS WITH TEMP AS -- get necessary data , trim whitespace and convert to lower case SELECT ID,PERIOD,MANAGER,TEAM, RTRIM ( LTRIM (LOWER (KEY_PHRASES))) KEY_PHRASES FROM system.DEMO_SENTIMENTANALYSIS_TEAMHEALTH), TEMP1 AS -- split into rows on commas (SELECT DISTINCT T.ID,T.PERIOD,T.MANAGER,TEAM, TRIM (REGEXP_SUBSTR (T.KEY_PHRASES, '[^,]+', 1, LEVELS.COLUMN_VALUE)) AS KEY_PHRASES FROM TEMP T, TABLE ( CAST ( MULTISET ( SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= LENGTH ( REGEXP_REPLACE (T.KEY_PHRASES, '[^,]+')) + 1) AS SYS.ODCINUMBERLIST)) LEVELS), TEMP2 AS -- split into rows on space (SELECT DISTINCT T.ID,T.PERIOD,T.MANAGER,TEAM, TRIM (REGEXP_SUBSTR (T.KEY_PHRASES, '[^ ]+', 1, LEVELS.COLUMN_VALUE)) AS KEY_PHRASES FROM TEMP1 T, TABLE ( CAST ( MULTISET ( SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= LENGTH ( REGEXP_REPLACE (T.KEY_PHRASES, '[^ ]+')) + 1) AS SYS.ODCINUMBERLIST)) LEVELS) SELECT ID,PERIOD,MANAGER,TEAM, KEY_PHRASES AS WORDS FROM TEMP2 WHERE -- list of custom words to remove here KEY_PHRASES NOT IN ('team', 'of', 'health','i');
Figure 16. Create view to transform KEY_PHRASES
Run the following SELECT
statement to verify the view transforms data in the expected format of one keyword/phrase per row.
SELECT * FROM SYSTEM.DEMO_TEAMHEALTH_WORDFREQUENCY ORDER BY ID;
Figure 17. Results of view to transform key phrases into a frequency table
Launch Tableau Desktop. Reconnect to Oracle Database as shown in Figure 6. From the Data Source tab, set Schema = SYSTEM and Table = DEMO_TEAMHEALTH_WORDFREQUENCY. Drag and drop view DEMO_TEAMHEALTH_WORDFREQUENCY to the right-hand side window. Select the Extract radio button. Click the prompt at the bottom of the screen to Go to Worksheet.
Figure 18. Use Oracle database view in Tableau Data Source
The Extract option pulls data one time from the Oracle database and saves it as a Tableau Extract, whereas the Live option performs a fresh data pull from the source database each time any user interacts with the Tableau Dashboard. The Live option ensures that users get current data from the database when interacting with their Tableau Dashboards. This is useful in situations where the underlying data is fast-changing, and Tableau Dashboards must reflect the latest data. However, the potential drawback of this option is slower response time/performance of Dashboards, compared to the saved Extract.
Click the Go to Worksheet prompt at the bottom of the screen and save the extract file when prompted. Tableau Desktop will open an empty worksheet. On this empty worksheet, perform the following steps to create the basic word cloud visualization.
- Drag the WORDS dimension to Text on the Marks card.
- Drag the WORDS dimension to Size on the Marks card.
- Drag the WORDS dimension to Color on the Marks card, to add color.
- Right-click on the dimension on the Size card and select Measure > Count.
- Change the Mark type from Automatic to Text.
Figure 19. Create the basic Word Cloud visualization
Add the WORDS dimension to the Filter card, which brings up the filter menu as shown in Figure 20. Switch over to the Top tab. Select the By field radio button > Top, 50, by Words and Count. Click OK. This filters the word cloud to display only the 50 most frequent words, vastly improving the readability of this visualization.
Figure 20. Filter Word cloud by Top 50 most frequent words
Add dimensions Manager, Period, and Team to the Filters card. Right-click on each of these dimensions in Filters card and select Show Filter, which will display them on the right side for the user to slice and dice the Word Cloud for their desired combination of Manager, Period and Team. Right click on sheet1 at the bottom of Tableau Desktop screen to rename this sheet to WordCloud. Update the Title to Word Cloud. Save this as a Packaged Tableau workbook (.twbx file)
Rename
Figure 21. Word Cloud with Filters
This word cloud visualization is used to interpret the following themes.
- The overall top three most frequently recurring terms are good, work and projectx, probably indicating that “good work (was done in) project” is the most prominent theme in this body of text
- The next three most frequently occurring terms are support, improvement and great. This could be interpreted as “great improvements (were made with) support” is the second most prominent theme in this body of text
- With the filters on the right-hand side, the Word cloud can be sliced and diced for the desired combination of TEAM, PERIOD and MANAGER, to identify the prominent themes by these dimensions
The next few visualizations use the Sentiment Score field.
Visualization Two – Bar Chart of Average Sentiment Score Trend
In Tableau Desktop, return to the Data Source tab found at the bottom left. Search for Table DEMO_SENTIMENTANALYIS_TEAMHEALTH. Drag and drop it to the right. Make sure that both the table and view are in the work area on the right. In the Edit Relationship screen, select the ID field on both left and right sides to blend the two data sets, creating a relationship. This is Tableau’s way of expressing the SQL equivalent of an Inner Join.
Figure 22. Add the Sentiment Analysis Table Data Source
Close the Edit Relationship pop-up window. Create a new sheet by clicking on New Worksheet from the option on the bottom right.
On this new sheet under the Tables section, expand DEMO_SENTIMENTANALYSIS_TEAMHEA. Right-click on dimensions Id, Manager, Period and Team. rename them to remove any extra characters not needed for display (This renaming of dimension names is optional and only done for cosmetic reasons, with no impact on the actual analysis.)
Figure 23. Rename fields to remove extra characters
- Drag and drop the Period dimension to the Columns shelf.
- Drag and drop the Sentiment Score measure to the Rows shelf. Right-click on it and change the aggregation to Average.
- Rename the Sheet to SentimentScoreTrend.
- Edit the Title to Average Sentiment Score Quarterly Trend.
- Right-click in the Y-Axis of the resulting bar-chart. Click Edit Axis. Select the General. Choose the Fixed option under the Range radio button. Enter a value of 0 in the field Fixed Start and value of 1 in the field Fixed end. This sets the range for the Y-axis.
Figure 24. Set range for y-axis
- Drag and drop Manager and Team dimensions to the Filters card.
- Right-click on each dimension in the Filters card and select Show Filter, which displays them on the right side where users can interact with them.
Figure 25. Average Sentiment Score Quarterly Trend
This Bar Chart visualization can be used to make the following observations
- The Average Sentiment Score across all teams trended marginally upwards from period 2018-Q3 (0.67) to 2019-Q1 (0.70), then remained almost flat for the next period of 2019-Q2 (0.70).
- When filtered for Team 5, you can see that Team 5 experienced a significant uptick in their Average Sentiment score from 0.45 (45% positive) in 2018-Q3 to 0.79 (79% positive) in 2019-Q2.
- When filtered for other teams, note that some teams don’t have bars on the chart for certain periods. This could indicate missing data or potentially a re-structuring of teams/departments in the company.
Visualization Three – Bar Chart of Average Sentiment Score Comparison across Teams
- Create a new worksheet. Drag and drop the Teams dimension to the Columns shelf.
- Drag and drop the Sentiment Score measure to the Rows shelf. Right-click on it and change the aggregation to Average.
- Right-click in the Y-Axis of the resulting bar-chart and click Edit Axis. Select the General Tab. Choose the Fixed option under the Range radio button. Enter a value of 0 in the field Fixed Start and value of 1 in the field Fixed end. This sets the range for the Y-axis.
- Right-click on the Team dimension in the Columns shelf and select Sort. Then change the Sort By drop-down to Field and select Sentiment Score in the Field Name drop-down list. Select Ascending in the Sort Order radio button. Select Average in the Aggregation drop-down list. This will sort the bar chart by increasing the value of Average Sentiment Score.
Figure 26. Sort the Bar-chart
- Drag and drop the Period dimension to the Filters card and right-click it. Select Show filter to display period on the right side and allow users to see how the team’s rankings changes over time
- Rename this sheet to SentimentScore_Teams and change the Title to Sentiment Score by Team
Figure 27. Compare Sentiment Scores across teams
This bar chart indicates the following.
- Aggregated across all Periods, Team 5 has the lowest Average Sentiment Score, and Team 8 has the highest Average Sentiment Score
- Filtering Data for Period 2018-Q3 shows that Team 5 had the lowest Average Sentiment Score, and Team 2 has the highest Average Sentiment Score
- Filtering Data for Period 2019-Q2 shows that Team 1 has the lowest Average Sentiment Score, and Team 5 has the highest Average Sentiment Score
- An observation to make note of is that the aggregated data across all periods shows that Team 5 has the lowest Average Sentiment Score. However, analyzing the data one period at a time reveals that Team 5 has made tremendous improvements in their Average Sentiment Score over the course of four periods and has gone from having the lowest score to the highest.
Visualization Four – Histogram
A histogram is a representation of the distribution of numerical data. To construct a histogram, the first step is to bin (or bucket) the range of values into a series of intervals, then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent and are often (but not necessarily) of equal size. In this scenario, the sentiment score bin will form the x axis, and the frequency (count of responses) belonging to that bin will be on the y axis.
Bin the Sentiment Score measure before using it to create the histogram. This will simply categorize the sentiment scores into ten distinct buckets
- Create a New Worksheet. Right-click on the Sentiment Score measure > Create > Bins
- On the Edit Bins screen, enter a value of 0.1 in the field Size of bins. Click OK.
Figure 28. Create Sentiment Score Bins to use in the histogram
- Drag and drop the Sentiment Score(bin) dimension to the Columns shelf
- Drag and drop the Sentiment Score measure to the Rows shelf
- Right-click on the Sentiment Score measure in the Rows shelf and change the measure (aggregation) to Count
- Drag and drop Manager, Team and Period dimensions to the Filters card
- Right-click on each dimension in the Filters card and select Show Filter, which will display them on the right side for users to interact with
- Rename the worksheet to Histogram
- Edit the Title to Histogram
Figure 29. Histogram
This histogram shows an imbalanced bimodal data distribution, indicating some degree of polarization in terms of how team members feel about their team’s health. Many team members (200+ responses clustered towards the right) feel strongly positive about their team’s health, while some (about 100 responses clustered towards the left) feel strongly negative. Very few (only 17 responses in the middle) are in the neutral score range. Users can filter the Histogram by Period, Team, or Manager for further in-depth analysis.
Visualization Five – Focus on Targeted Responses
The first four visualizations helped to answer the following questions.
- What are the trending topics for each team, across various periods and managers?
- How does the sentiment score trend over time?
- How do the Sentiment Scores for teams compare against each other, and how has that evolved over time?
- How does the distribution of Sentiment Scores look like across various members of the same team? Are there any groupings/clusters in certain areas of the score range?
Next, you might want to read the text of responses that are of interest to you. A set of users may be interested in reading the most negative responses for a period. In contrast, the most positive responses from a particular team might interest other group users. You may want to share this Tableau Dashboard with the managers of these teams. A particular manager may be interested in reading the responses only for their own team(s). In this section, I will demonstrate a visualization for serving up this detailed information in an easy to search format.
- Create a new sheet and drag the Period, Manager, Team and Response dimensions to the Rows shelf
- Drag the Sentiment Score measure and drop it on Text within the Marks shelf
- Drag the Period, Manager, Team and Sentiment Score (bin) dimensions and drop them to the Filters card
- Right-click on each dimension in the Filters card and select Show filter
- Rename the worksheet to Response Details
- Edit the title to add the following text to it ‘Select 1 or more Sentiment Score Bins, to view all the responses that fall within the selected score range. Then filter by Team/Manager/Period as needed’
Figure 30. Response Details
The Tableau visualizations demonstrated so far have helped to analyze the team health data.
- The word cloud identifies popular topics/themes and allows drill down by Period, Team and Manager
- The Sentiment Score Trend bar chart shows how the average sentiment scores are trending over a period of time
- The Sentiment Score by Teams bar chart allows for quick and easy comparison of average sentiment scores across teams. It helps to identify which teams have the highest/lowest average scores and if they changed over time.
- The histogram visualizes distribution of Responses across the range of sentiment scores. It helps to identify clusters/groupings in the positive, neutral or negative ranges and gauge polarization.
- The Response details visualization allows users to focus on reading text of only a sub-set of responses, that are of interest of them.
Over the course of these five visualizations, you (or your users) can identify which teams are doing great, which ones may need some help to improve their team’s health, and which areas deserve further in-depth conversations with Team managers.
Conclusion
This article demonstrated how to connect with an Oracle Database from Tableau Desktop, to visualize the Team Health data. It also covered creating five different visualizations in Tableau Desktop to gain insights into themes, identify trends, extract business value, and narrate a meaningful story from the team health survey responses.
References:
- Tableau desktop – https://www.tableau.com/products/desktop
- Creating a word cloud in Tableau – https://kb.tableau.com/articles/howto/creating-a-word-cloud
- Bar charts – https://en.wikipedia.org/wiki/Bar_chart
- Histogram – https://en.wikipedia.org/wiki/Histogram
- Bimodal distribution – https://www.statisticshowto.com/what-is-a-bimodal-distribution/
The post Text Mining and Sentiment Analysis: Data Visualization in Tableau appeared first on Simple Talk.
from Simple Talk https://ift.tt/3lRR94D
via
No comments:
Post a Comment