Wait, not “the Monica” entirely. But you can definitely relate to it, and anyone who has ever watched Friends can remember this:
Let’s extend it a little. It is not just about pleasing people (which ultimately is the end goal), but about understanding what people are saying and responding appropriately. So what we are going to talk about today is how to tell whether people are pleased, and why sometimes you just can't get it.
What is Sentiment Analysis?
The basic idea of Sentiment Analysis is the contextual mining, or opinion mining, of text to determine whether its tone and sentiment are positive, negative, or neutral. In practice, Sentiment Analysis applies Natural Language Processing (NLP) techniques to texts, social media data, or reviews to surface insights about either the customer or the product.
Examples of Sentiment Analysis: Why should it be done properly?
Did the definition sound simple? Want to know more about the types and how it is done? Wait a few more moments. First it is important to understand how it needs to be used, and whether there were cases where Sentiment Analysis results didn't go exactly as planned. Today let’s not start with an award-winning example, but with a word of caution.
Sony re-released Morbius in June 2022, but was it the right choice? Though there is a lot of debate around the re-release, it is important to understand why they did it. Twitter users sarcastically praised the film with made-up quotes such as “It’s Morbin’ Time”, and Sony misread them as genuine praise. OK, that doesn't mean we are asking you NOT to do sentiment analysis; it just means that it is not absolute (sarcasm can hurt more than just feelings, it can hurt ML too). What we mean is that the results need to be read cautiously, with more context.
For example, the Morbius team could have checked additional data like reactions and compared them with other releases, which BuzzSumo research summarizes beautifully below:
Source: BuzzSumo Research
This shows how sometimes data needs context and needs to be seen with respect to comparables. The above data shows the difference in reactions between Morbius and other Marvel movies.
Everyone knows that KPMG is an advisory and auditing firm, which makes textual analysis highly important to the organization. There are two instances where contextual mining is especially important for a firm like KPMG: sustainability reports and compliance risks.
Sustainability reports are generally available to the public and investors, and in them companies cover their economic, environmental, and societal impact. KPMG, as an auditing company, is qualified to assess whether such reports can be published, i.e. it checks whether the reports are balanced (covering not only the positive aspects but also the negative ones, to help people make the right assessment). To do so it had to overcome the hurdle of positively worded negative statements, which it did using Bidirectional Encoder Representations from Transformers (BERT) by Google. KPMG also uses text analytics to detect patterns and keywords that flag compliance risks.
The Infamous Johnny Depp - Amber Heard Case
First of all, we are not taking sides; we are just talking about an analysis that both sides presented about tweets containing a few hashtags, including #justiceforjohnnydepp. In this case, an Intellectual Property Consultant presented the “social sentiment” with:
- Quantity of the tweets
- Time of tweets
- Hypothesis testing
For a data enthusiast, the way he explained how the data was extracted through the Twitter APIs and then analyzed was very intriguing. It also gives an ordinary person the opportunity to dive into the details of the analysis. They tried to analyze correlations between the statements and time, as well as the sentiment of the overall tweets.
Irrespective of the outcome, it was great to see Data Science being represented and understood by everyone. There are drawbacks, like the need to match entire statements or exact words while searching. Typos, too, can not only create confusion but also evade the system. (For fans of “The Office”: like how Michael Scott couldn’t replace one Dwight with Samuel L. Chang.)
If you’re thinking this is the end of the use cases, you are mistaken. These are just a few real-life examples; we shall talk more about the use cases of Sentiment Analysis after we cover how sentiment analysis is actually done.
Building blocks of Sentiment Analysis
Natural Language Processing is the building block of Sentiment Analysis and of further analysis like Intent Analysis. Depending on the amount of data that needs to be analyzed, there are multiple ways to analyze it. They mostly fall under two categories: Rule-based and Automatic.
As the name suggests, the rule-based approach applies a set of hand-crafted rules to help identify the subject and the opinion on it. Some of the phases in a rule-based approach to tweets or texts, in no particular order, are:
- Transformation: turns the text into lowercase and removes HTML, accents, and URLs present in the data.
- Tokenization: the method of breaking the text down into smaller units (tokens). Tokenizers come in various forms, such as Word & Punctuation, Whitespace, Sentence, Regexp, and Tweet. We use the Regexp pattern here.
- Normalization: this includes the stemming and lemmatization of words. It also comes in multiple types, such as Porter Stemmer, Snowball Stemmer, WordNet Lemmatizer (which applies a network of cognitive synonyms to tokens based on a large lexical database of English), and UDPipe (which applies a pre-trained model for normalizing data).
- Filtering & PoS tagging: removing unwanted tokens such as stop words, and labeling each remaining token with its part of speech.
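To make these phases a little more concrete, here is a minimal sketch in plain Python. It is not how Orange or any particular library implements them: the stop-word list, the `naive_stem` helper, and the regular expressions are all simplified assumptions for illustration, and PoS tagging is left out because it needs a trained tagger.

```python
import html
import re
import unicodedata

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "and", "of", "was"}


def transform(text):
    """Transformation: strip HTML tags, URLs, and accents, then lowercase."""
    text = html.unescape(text)
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return text.lower()


def tokenize(text):
    """Regexp tokenization: keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text)


def naive_stem(token):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run transformation, tokenization, filtering, and normalization."""
    tokens = tokenize(transform(text))
    tokens = [t for t in tokens if t not in STOP_WORDS]   # filtering
    return [naive_stem(t) for t in tokens]                # normalization


print(preprocess("<p>Loved the café! Watching it again: https://example.com</p>"))
# ['lov', 'cafe', 'watch', 'again']
```

Notice that “loved” becomes “lov”: stemmers chop suffixes mechanically, which is exactly why lemmatizers that look words up in a lexical database exist.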
If you want to see all of them at once, here’s an example of the multiple phases of text preprocessing in Orange, shown in the picture below:
The disadvantages of using a rule-based system are that it is not advanced enough to process how words combine to form sentences, and that it requires fine-tuning, testing, and maintenance to keep working.
The automatic approach uses Machine Learning techniques to model the task as a classification problem, sorting texts into positive, negative, or neutral. Because the models are ML models, they first need training data, on which feature extraction and classifier training take place. Once the model is trained, the actual data set is fed in to produce the results.
The advantage of using a Machine Learning model is that it can be trained not only on the three tags (positive, negative, neutral) but also on more. For example, the Tweet Sentiment Visualization tool offers a wide range of emotions to map the outputs to.
Any kind of data requires some amount of preparation: the preprocessing we talked about in the rule-based approach, followed by text vectorization with techniques such as bag of words, bag of n-grams, etc.
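Here is a small sketch of why that first disadvantage bites. It scores a text by summing word weights from a tiny lexicon; the lexicon entries and weights are made up for illustration and are not taken from any published sentiment lexicon.

```python
import re

# Tiny hand-made lexicon; the words and weights are illustrative assumptions.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}


def rule_based_score(text):
    """Sum the lexicon weights of the words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)


def label(score):
    """Map a numeric score to a positive / negative / neutral tag."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label(rule_based_score("The movie was great")))     # positive
print(label(rule_based_score("The movie was not good")))  # positive (wrong!)
```

The second sentence is tagged positive because the rule only sees the word “good” and has no idea that “not” flips it. You can patch this with a negation rule, then another rule for “not only good but great”, and so on, which is where the fine-tuning and maintenance come in.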
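As a rough sketch of what vectorization means here, the snippet below builds bag-of-words count vectors and bigrams using only the standard library. The two example documents and the helper names (`ngrams`, `vectorize`) are our own assumptions; real projects would typically use a library vectorizer instead.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(tokens, vocabulary):
    """Bag of words: count how often each vocabulary word occurs."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


docs = [["not", "good", "at", "all"], ["good", "good", "movie"]]
vocabulary = sorted({token for doc in docs for token in doc})

print(vocabulary)                      # ['all', 'at', 'good', 'movie', 'not']
print(vectorize(docs[0], vocabulary))  # [1, 1, 1, 0, 1]
print(vectorize(docs[1], vocabulary))  # [0, 0, 2, 1, 0]
print(ngrams(docs[0], 2))              # [('not', 'good'), ('good', 'at'), ('at', 'all')]
```

A plain bag of words throws away word order, so “not good” and “good, not bad” look alike. Bigrams such as `('not', 'good')` keep a little of that order, at the cost of a much larger vocabulary.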
The classification algorithms that are used in this case can be:
- Multiple Regression: a very well-known algorithm in statistics used to predict the value of a dependent variable from multiple independent variables
- Naïve Bayes: uses Bayes’ Theorem to predict the category of a text (a probabilistic model)
- Support Vector Machines: a non-probabilistic model that supports both classification and regression. It plots points in a multi-dimensional space and finds the hyperplanes that separate the classes.
- Neural Networks: use Deep Learning and ANN techniques to represent words of similar meaning with similar vector values
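To show the train-then-classify flow for one of these, here is a from-scratch sketch of a multinomial Naïve Bayes classifier with add-one smoothing. The class name `NaiveBayesClassifier` and the six training sentences are hypothetical, chosen only to keep the example self-contained; a real model needs far more labeled data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_log_prob = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior, then add one log likelihood per word.
            log_prob = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                log_prob += math.log((count + 1) / (total_words + vocab_size))
            if log_prob > best_log_prob:
                best_label, best_log_prob = label, log_prob
        return best_label


# Toy training data (made up for this sketch).
train_docs = [
    "i loved this movie".split(),
    "great acting and a great story".split(),
    "i hated this movie".split(),
    "boring plot and terrible acting".split(),
    "the movie was released in june".split(),
    "the film runs for two hours".split(),
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = NaiveBayesClassifier().fit(train_docs, train_labels)
print(model.predict("great story".split()))            # positive
print(model.predict("terrible boring plot".split()))   # negative
print(model.predict("released in june".split()))       # neutral
```

Adding a fourth tag, say “sarcastic”, would only mean adding labeled examples for it, which is the flexibility the automatic approach buys you.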
In addition to these two approaches, you can also use a Hybrid Model that combines the automatic and rule-based approaches. We are not covering it specifically, because we have already described in detail the techniques and methods used in both. Now let’s talk about how organizations use Sentiment Analysis.
Use-Cases of Sentiment Analysis
Though we’ve spoken about a few examples above that cover Social Sentiment Analysis, Text Mining, etc., let’s talk a little bit about the use cases of Sentiment Analysis for companies with respect to their Marketing, Branding, Research, and Customer Services. Some of the applications of Sentiment Analysis are:
1. Intent Analysis
This is the next step after sentiment analysis, as it is about understanding the intent behind the comment, tweet, text, or query. For example, Intent Analysis is about finding out what kind of message it is: news, opinion, ad, suggestion, query, etc. This helps not only in finding the relevant tone but also in classifying the kinds of messages being received.
2. Brand Management
Brand image is important to both companies and people alike. Coming back to Monica’s quote (from Friends), you want everyone to like you, but how are you looking at it? Checking the rating on Google can now be considered old school, especially for companies that have a huge social media presence. Now it is all about analyzing what people are actually thinking, especially in the case of new product and service launches. All the information shared in stories, blogs, forums, etc. is data, and all of it can be turned into insights, at least to the extent of monitoring what the social sentiment is.
3. Voice of Customer
Why listen to the voice of the customer? One easy answer: to improve. For example, for businesses like Airbnb, hotels, or restaurants, the key data is customer feedback, and above all the negative reviews. Remember when Monica gave a negative review, then showed the chef how to make the dish and got the job? (In case you’re not a Friends fan, ignore it!) In short, it is about how you can take the negative reviews, analyze them, and turn them into something actionable where the change can have a significant impact on future orders.
4. Employee Feedback
Not everything is related to the top or the bottom line of the organization; it is also important to look at the health of the organization through its employees. Though companies can have an Employee Mood Index or an internal NPS to gauge employee mood, analyzing their feedback would give a) the areas that most of them are concerned about, b) the overall sentiment about work-life, and c) more ideas to keep them productive. This covers not only feedback from existing employees but also data from sources like Glassdoor, reviews, etc.
5. Market Research
When launching a new product or a service, you can study, analyze, and compare your product with an existing one, especially with the wealth of feedback and reviews available online for most products. You can also identify the demographics of the users interacting with the product, or assess market trends from formal journals and reports. By analyzing both qualitative and quantitative sources, you can tap into new sources of information.
6. Social Media Monitoring
We’ve already spoken about two different types of examples where social media has played a role. Keeping track of what is happening, what is being spoken about, and how people perceive it would give both companies and people a better understanding of how to act in the future. Why? Think about the times when people have shared videos about bad customer service, or the time when Game of Thrones unintentionally advertised Starbucks after a few people spotted the cups in a few scenes, or when airlines got a bad rep for overbooked flights.
These are just some of the best-known and most used examples of sentiment analysis. If you dig deeper you can find many more, like review analysis for eCommerce stores, or how McKinsey developed a sentiment analysis tool called City Voices for the urban-planning department of Brazil. You can always create a new use case too.
Challenges of Sentiment Analysis
We promise we are reaching the end of this discussion, but instead of going the ordinary route of talking about the advantages of sentiment analysis, we would like to take a few seconds of your time to talk about the challenges. (Obviously, there will be challenges when we try to make a machine understand the emotions behind tones and texts.)
Irony & Sarcasm: Morbius's re-release is a living example of this. Don’t think we need to come back to this again! We need a Chandler to train the data.
Context: For someone who has watched Friends, “How you doin’?” carries a different context than for someone who hasn’t. Therefore a bit of context might be required while processing or pre-processing the data. For example, “No!” sounds negative, but when the question is “Did you hate it?”, the answer is actually positive.
Comparison: “A is better than B” is easy to understand. But when a statement says something is only “better than nothing”, it might be difficult to classify it as negative.
Emojis: Welcome to the 21st century, where emojis speak louder than the text, especially in tweets. You might need to either remove them or assign their Unicode characters to sentiments separately. Questioning our statement? Look at the image below. What is the only difference between the two answers? An emoji. The second one clearly shows sarcasm to a human, but to a bot? Probably questionable, if it is not trained and tested well.
Definition of Neutral: This tag in the middle of positive and negative can be tricky to get right unless the model is trained well. It needs a clear definition that covers objective statements, irrelevant information, wishes, etc.
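A toy sketch of the “No!” example: instead of scoring the answer on its own, we read it relative to the question it replies to. The word lists and the function names are assumptions made up for this illustration, and real conversations need far more than a yes/no rule.

```python
import re

# Illustrative word lists (assumptions, not a real lexicon).
NEGATIVE_WORDS = {"hate", "dislike", "bad"}
POSITIVE_WORDS = {"love", "like", "enjoy", "good"}
YES_WORDS = {"yes", "yeah", "yep"}
NO_WORDS = {"no", "nope", "nah"}


def question_polarity(question):
    """Return -1, +1, or 0 depending on the sentiment words in the question."""
    tokens = re.findall(r"[a-z']+", question.lower())
    if any(t in NEGATIVE_WORDS for t in tokens):
        return -1
    if any(t in POSITIVE_WORDS for t in tokens):
        return 1
    return 0


def answer_sentiment(question, answer):
    """Interpret a yes/no answer relative to the question it replies to."""
    tokens = re.findall(r"[a-z']+", answer.lower())
    polarity = question_polarity(question)
    if any(t in NO_WORDS for t in tokens):
        polarity = -polarity   # "no" flips the question's polarity
    elif not any(t in YES_WORDS for t in tokens):
        polarity = 0           # not a yes/no answer: infer nothing
    return {1: "positive", -1: "negative", 0: "neutral"}[polarity]


print(answer_sentiment("Did you hate it?", "No!"))   # positive
print(answer_sentiment("Did you like it?", "No!"))   # negative
```

The same “No!” lands on opposite sides depending on the question, which is why the answer alone is not enough data.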
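Here is a minimal sketch of the “assign them to sentiments separately” option: emojis get their own small lexicon that is scored alongside the words. Every word, emoji, and weight below is an assumption for illustration. The sketch also only handles single-code-point emojis; sequences with skin tones or joiners would need a proper emoji library.

```python
import re

# Illustrative lexicons; every entry and weight here is an assumption.
WORD_SCORES = {"great": 2, "love": 2, "thanks": 1, "bad": -1, "terrible": -2}
EMOJI_SCORES = {"😍": 2, "😊": 1, "🙄": -3, "😡": -2}


def split_text_and_emojis(text):
    """Separate known single-code-point emojis from the rest of the text."""
    emojis = [ch for ch in text if ch in EMOJI_SCORES]
    plain = "".join(ch for ch in text if ch not in EMOJI_SCORES)
    return plain, emojis


def score(text, use_emojis=True):
    """Score the words, and optionally add the emoji scores on top."""
    plain, emojis = split_text_and_emojis(text)
    tokens = re.findall(r"[a-z']+", plain.lower())
    total = sum(WORD_SCORES.get(token, 0) for token in tokens)
    if use_emojis:
        total += sum(EMOJI_SCORES[emoji] for emoji in emojis)
    return total


print(score("Great service 🙄", use_emojis=False))  # 2
print(score("Great service 🙄"))                    # -1
```

With the emoji removed the reply looks like praise; with it scored separately, the eye-roll outweighs the word “great”. How heavily to weight each emoji is something you would have to tune and test on your own data.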
Thank you for making it to the end of the discussion; we hope you had fun reading this.
In conclusion, what we want to say is that Sentiment Analysis, Opinion Mining, or Contextual Analysis, call it whatever you want, is important, especially now, given the gold mine of data available for organizations to analyze and the APIs that make it easier to integrate and pull that data.
What is stopping you from doing this? Want some help? Contact us today!