Wednesday, December 25, 2024

Word Cloud of Negative Sentiment Summaries

This post walks through an exploratory data analysis (EDA) step: visualizing text that carries negative sentiment. The data is a collection of summaries, each labeled with a sentiment polarity score. We keep only the summaries with negative sentiment (polarity < 0) and generate a word cloud that shows the most frequent words in them.

### Code Explanation:

1. **Importing Required Libraries**:
    
    from mlAASentimentAnalysis import data
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud, STOPWORDS
    
   - `mlAASentimentAnalysis` is a custom module (presumably containing a dataset `data`).
   - `matplotlib.pyplot` is used to plot the word cloud.
   - `WordCloud` is used to generate a visual representation of frequent words in the dataset, and `STOPWORDS` provides a set of common words (like "the", "and") to exclude from the word cloud.

2. **Setting Stopwords**:
    
    stopwords = set(STOPWORDS)
    
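    - `STOPWORDS` is already a set, so `set(STOPWORDS)` simply makes a copy of it. Working on a copy means extra words can be added later without changing the library's default. The set is used to drop common, uninformative words (like "a", "the") from the word cloud.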

3. **Filtering Negative Sentences**:
    
    data_negative = data[data['polarity'] < 0]
    
    - Here, the dataset `data` is filtered to include only rows where the `polarity` is less than 0 (indicating negative sentiment). The filtered data is stored in `data_negative`.

4. **Concatenating Negative Sentences**:
    
    total_negative = (' '.join(data_negative['Summary']))
    
    - The summaries (or text content) of the negative sentences are concatenated into a single string `total_negative`. This is necessary to generate the word cloud.

5. **Data Cleaning**:
    
    import re
    total_negative = re.sub('[^a-zA-Z]', ' ', total_negative)
    total_negative = re.sub(' +', ' ', total_negative)
    
    - The first `re.sub()` replaces every non-alphabetic character (digits, punctuation, special symbols) with a space. Note that this also splits contractions: "don't" becomes "don t".
    - The second `re.sub()` collapses runs of consecutive spaces into a single space, giving cleaner text.

6. **Generating the Word Cloud**:
    
    wordcloud = WordCloud(width=1000, height=500, stopwords=stopwords).generate(total_negative)
    
    - A `WordCloud` object is created, where the width and height are specified (1000x500 pixels). The `stopwords` set is passed to ensure that common words are excluded from the cloud. The `generate()` method processes the text to build the word cloud.

7. **Plotting the Word Cloud**:
    
    plt.figure(figsize=(15, 5))
    plt.imshow(wordcloud)
    plt.axis('off')
    plt.show()
    
    - The figure size is set to 15x5 inches.
    - `plt.imshow(wordcloud)` displays the word cloud.
    - `plt.axis('off')` removes the axes for a cleaner visualization.
    - `plt.show()` renders the plot.
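The filtering, concatenation, and cleaning steps above can be tried without the custom module. The following is only a minimal sketch: the `rows` list is hypothetical sample data standing in for the `data` DataFrame, a plain list comprehension replaces the pandas filter, and `clean_text` is an illustrative helper name that wraps the two `re.sub()` calls from step 5.

```python
import re

# Hypothetical sample rows standing in for the `data` DataFrame (assumption)
rows = [
    {"Summary": "Great value, works well", "polarity": 0.8},
    {"Summary": "Don't buy!! 2 stars", "polarity": -0.6},
    {"Summary": "Broke after 3 days...", "polarity": -0.4},
    {"Summary": "It is okay", "polarity": 0.0},
]

def clean_text(text):
    """Apply the two substitutions from the data cleaning step."""
    # Replace every non-alphabetic character with a space
    text = re.sub('[^a-zA-Z]', ' ', text)
    # Collapse runs of spaces into a single space
    text = re.sub(' +', ' ', text)
    return text

# Keep only summaries with negative polarity (a polarity of 0.0 is excluded)
negative_summaries = [row["Summary"] for row in rows if row["polarity"] < 0]

# Join into one string and clean it, as in steps 4 and 5
total_negative = clean_text(' '.join(negative_summaries))
print(repr(total_negative))
```

Printing with `repr()` makes leftover whitespace visible: the trailing "..." turns into a single trailing space, and "Don't" is split into "Don" and "t". Neither matters much for a word cloud, but it is worth knowing what the cleaned string actually contains.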

### Plot Explanation:

The word cloud generated from this code will visually represent the most frequent words in the summaries that have a negative sentiment (polarity < 0). The size of each word in the word cloud corresponds to its frequency in the dataset—larger words appear more often, while smaller words appear less frequently.
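To see which words would end up largest, the frequencies can be approximated with a plain counter. This is a rough sketch, not what `WordCloud` does internally: by default the library also counts two-word collocations and merges plurals, so its numbers can differ. `SAMPLE_STOPWORDS`, `word_frequencies`, and the sample sentence are all hypothetical and only for illustration.

```python
import re
from collections import Counter

# Hypothetical mini stop word set; the post itself uses wordcloud.STOPWORDS
SAMPLE_STOPWORDS = {"the", "and", "of", "a", "is", "it", "was", "not", "t"}

def word_frequencies(text, stopwords):
    """Count lowercase words in `text`, skipping any word in `stopwords`."""
    cleaned = re.sub('[^a-zA-Z]', ' ', text)
    words = cleaned.lower().split()
    return Counter(word for word in words if word not in stopwords)

sample = "The battery is bad. Bad screen, bad battery and terrible support."
freqs = word_frequencies(sample, SAMPLE_STOPWORDS)
print(freqs.most_common(2))  # [('bad', 3), ('battery', 2)]
```

In a word cloud built from this sample, "bad" would be drawn largest and "battery" next, while "the", "is", and "and" would not appear at all.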

#### Key Observations:
- Words that are frequently used in negative summaries will dominate the word cloud.
- Common words that are irrelevant to sentiment analysis (like "the", "and", "of") are excluded due to the stopwords filtering.

### Solution:

The solution involves two main steps:
1. **Data Filtering**: By isolating the negative sentences using the `polarity < 0` condition, we focus only on the negative sentiment text.
2. **Text Visualization**: The word cloud shows at a glance which words are most common in the negative summaries. This helps identify trends, recurring themes, or specific words that appear frequently in negative summaries.

Overall, this approach helps in gaining insights into the language or phrases that are commonly used in negative contexts in the dataset.
