Key Tasks Performed by Sentiment Analysis when Combined with Machine Learning

Photo by Steve Johnson from Pexels

Sentiment analysis of social media data deals with attitudes, assessments, and emotions, which together reflect how people think; this makes it an essential capability for any AI development company. Understanding a large collection of documents and classifying it into positive and negative categories is a difficult task. Social networks such as Twitter, Facebook, and Instagram provide a platform for gathering information about people’s sentiments and opinions. Because people spend hours on social media every day and share their opinions on a wide range of topics, there is plenty of material from which to analyze sentiment.

A lot of companies are now utilizing social media tools to provide various services as well as interact with customers.

Sentiment Analysis

Sentiment analysis (SA) classifies the polarity of published tweets as positive or negative in order to understand public sentiment. Word2Vec combined with Random Forest significantly improves the accuracy of sentiment analysis compared with traditional methods such as bag-of-words (BOW) and TF-IDF. Word2Vec improves the quality of features by considering the contextual semantics of words in a text, which in turn improves the accuracy of machine learning based sentiment analysis.
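To make the traditional baselines concrete, here is a minimal sketch of bag-of-words counts and TF-IDF weights in plain Python. It assumes one common textbook variant (term frequency as count divided by document length, inverse document frequency as log(N / df)); libraries differ in smoothing and normalization, and the three sample tweets are invented for illustration.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Raw term counts for one tokenized document."""
    return Counter(tokens)


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document.

    tf = count / document length, idf = log(N / df).
    This is one common variant, chosen here for simplicity.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = bag_of_words(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Invented sample tweets, already lowercased and split on whitespace.
tweets = [
    "great rally great speech".split(),
    "terrible speech".split(),
    "great turnout".split(),
]
weights = tf_idf(tweets)
```

Both representations treat every word as an independent column, so "great" and "terrible" are no more related to each other than any other pair of words. That missing notion of context is the gap Word2Vec features are meant to close.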

Introduction

The popularity of social media is increasing rapidly because it is easy to use: even users with little technical knowledge can create and share images and videos. Many web platforms are used to share non-textual content such as videos, images, and animations, and they allow users to comment on each item. YouTube is probably the most popular among them, with millions of videos uploaded by its users and billions of comments on those videos. Artificial Intelligence solutions provide deeper insights into these platforms.

On social media, and especially on YouTube, detecting sentiment polarity is a challenging task because of inherent limitations in current sentiment dictionaries: they do not assign proper sentiment to terms coined by the community. It is also reported that YouTube accounts for 20% of web traffic and 10% of total Internet traffic. YouTube offers several mechanisms for judging users’ opinions and views on a video, including voting, rating, favourites, and sharing. User comments are a further source of user data for many applications, such as comment filtering, personal recommendation, and user profiling. Different techniques are used for sentiment analysis of user comments, and a sentiment lexicon called SentiWordNet is used for this purpose.
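A lexicon-based approach can be sketched in a few lines. The example below is only an illustration: the toy lexicon and its scores are invented and are not SentiWordNet data, and the function name `comment_polarity` is hypothetical.

```python
# Hypothetical toy lexicon: term -> polarity score in [-1, 1].
# The values are invented for illustration and are NOT SentiWordNet data.
TOY_LEXICON = {"love": 0.8, "great": 0.6, "boring": -0.5, "awful": -0.9}


def comment_polarity(comment, lexicon=TOY_LEXICON):
    """Label a comment by summing the scores of its words found in the lexicon."""
    words = comment.lower().split()
    score = sum(lexicon.get(word, 0.0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A comment made up entirely of words the lexicon does not know, such as community slang, scores zero and comes back as "neutral". This is the dictionary limitation described above.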

With Internet access, many social networking sites provide enormous amounts of information on various topics in real time, from anywhere and at any time. On average, about 6,000 tweets are generated every second, which corresponds to roughly 350,000 tweets per minute and 500 million tweets per day. Twitter is therefore a huge source of real-time data on recent trends, people’s opinions, and sentiment, which can be used in data analytics and text analytics research to obtain valuable insights.

Identifying sentiments, opinions, and emotions in textual information is known as sentiment analysis, also called opinion mining. It is one of the major research areas in natural language processing. The main objective of sentiment analysis is to classify data as having positive or negative polarity in order to identify the sentiment of the public or of the given data. This analysis is applied in fields such as fraud detection, healthcare, finance, the stock market, and retail, and by many other business organizations to improve client reach, sales, brand building, and more. Real-time sentiment analysis can have a large impact in areas such as politics, government, elections, and business, because organizations can act on it quickly and benefit by taking the necessary actions or decisions.

Word2Vec is a family of unsupervised, shallow, two-layer neural network models that produce word embeddings. A word embedding is a numeric representation of a word in the form of a vector. Word2Vec considers the contextual semantics of words when producing embeddings: instead of focusing on a single word or on two or three words, it considers the context in which the word occurs. Words with the same or related contexts end up close together in the vector space, which preserves the semantic relationships between words. Word embeddings produced with Word2Vec can therefore be used to train machine learning classification algorithms and improve sentiment classification accuracy based on context and the semantic relations between words.
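The idea of words clustering in vector space can be illustrated without training a model. The sketch below uses hand-picked, hypothetical 3-dimensional vectors (real Word2Vec embeddings are learned from a corpus and typically have a hundred or more dimensions). Averaging word vectors into a document vector is one common way to turn embeddings into classifier features; it is an assumption here, since the article does not say how the features are built.

```python
import math

# Hypothetical embeddings, hand-picked for illustration only.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.7, 0.3, 0.1],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def document_vector(tokens, embeddings=EMBEDDINGS):
    """Average the vectors of known tokens into one fixed-length feature vector."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * len(next(iter(embeddings.values())))
    return [sum(column) / len(known) for column in zip(*known)]
```

With these toy vectors, `cosine_similarity(EMBEDDINGS["good"], EMBEDDINGS["great"])` is larger than the similarity between "good" and "bad", which is the kind of semantic closeness a trained model is expected to capture.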

Random Forest is a versatile machine learning algorithm capable of performing both classification and regression tasks. It is an ensemble learning model, in which several weak models combine to form a powerful one. In Random Forest, one grows multiple trees as opposed to a single decision tree.
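The ensemble idea can be sketched with bagged decision stumps and a majority vote. This is a heavily simplified stand-in, not a full Random Forest: a real implementation grows deeper trees and samples candidate features at every split. The data points, labels, and function names below are hypothetical.

```python
import random
from collections import Counter


def fit_stump(samples, feature):
    """Best single-threshold split on one feature, judged by training errors."""
    values = sorted({x[feature] for x, _ in samples})
    majority = Counter(y for _, y in samples).most_common(1)[0][0]
    # (errors, threshold, label left of threshold, label right of threshold)
    best = (len(samples) + 1, None, majority, majority)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in samples if x[feature] <= threshold]
        right = [y for x, y in samples if x[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return feature, best[1], best[2], best[3]


def stump_predict(stump, x):
    feature, threshold, left_label, right_label = stump
    if threshold is None:  # the sample had a single value for this feature
        return left_label
    return left_label if x[feature] <= threshold else right_label


def fit_forest(data, n_trees=15, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        feature = rng.randrange(n_features)
        forest.append(fit_stump(bootstrap, feature))
    return forest


def forest_predict(forest, x):
    """Majority vote over all stumps."""
    votes = Counter(stump_predict(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-dimensional document vectors with polarity labels.
data = [
    ((1.0, 1.0), "negative"), ((2.0, 1.0), "negative"),
    ((1.0, 2.0), "negative"), ((2.0, 2.0), "negative"),
    ((8.0, 8.0), "positive"), ((9.0, 8.0), "positive"),
    ((8.0, 9.0), "positive"), ((9.0, 9.0), "positive"),
]
forest = fit_forest(data)
```

Each stump sees a different resample of the data and a different feature, so individual stumps can be weak or even degenerate; the vote across many of them is what makes the combined prediction stable.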

Background

The explosion of social media data, which holds vast, valuable, rapidly emerging unstructured information, has created an opportunity to study public opinion and understand people’s sentiments. Capturing opinions about social events, company strategies, political movements, marketing campaigns, product preferences, and so on has attracted growing interest from the science and business communities as a way to improve their performance. This has resulted in the emerging field of opinion mining, or sentiment analysis. In an ever-changing world, adapting quickly to change is crucial for taking actions and making decisions faster.

The faster data becomes available, the faster a decision or action can be taken, and in some cases this is even necessary to save lives. For example, analysis of Twitter data has been used for real-time detection of events such as earthquakes in Japan, the attacks in Paris, and tsunami alerts. It also played a crucial role in the 2016 US presidential election, where it was used to gauge public opinion.

Trending topics on social media expose people’s interests, their intentions, and, most importantly, recent activities throughout the world. Interestingly, a topic that is trending now may not have been trending an hour ago and may no longer be trending an hour from now, so cloud computing services need to take this volatility into account.

Establishing a Connection

1. Getting Twitter API keys – First, the user needs to create a Twitter developer account to obtain the credentials required to access the Twitter API: the API key, API secret, access token, and access token secret. The steps involved are as follows:

Step 1: If the user does not already have a developer account, the user visits https://developer.twitter.com/ to create one.

Step 2: The user visits https://developer.twitter.com/en/apps and logs in with the Twitter developer credentials created in Step 1.

Step 3: The user creates an application by selecting the Create an App option, filling in the form with the required application details, and selecting the Create button.

Step 4: On the next page, the user clicks the Keys and Access Tokens tab. The API key and API secret are displayed under the Consumer API keys section, and the user can copy them for use in the program that accesses the application to collect tweets.

Step 5: In the same Keys and Access Tokens tab, the user goes to the Access token and Access token secret section and selects the Create button to generate the access token and access token secret. The user can copy them for accessing the Twitter application to collect tweets.
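Once copied, the four credentials are usually better kept out of the source code. A minimal sketch, assuming they are stored in environment variables: the variable names and the `TwitterCredentials` class below are hypothetical choices, not something Twitter prescribes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TwitterCredentials:
    api_key: str
    api_secret: str
    access_token: str
    access_token_secret: str


# Hypothetical environment variable names; any naming scheme works.
ENV_NAMES = {
    "api_key": "TWITTER_API_KEY",
    "api_secret": "TWITTER_API_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Read the four credentials from a mapping and fail early if any is missing."""
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return TwitterCredentials(
        **{field: env[name] for field, name in ENV_NAMES.items()}
    )
```

The resulting object can then be passed to whichever Twitter client library the program uses.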

Twitter Data Extraction 

To establish a connection successfully, the user needs the required credentials to access the Twitter API: the API key, API secret, access token, and access token secret.

a. Installing a Twitter library – Using the required libraries, the user can connect to the Twitter API and download tweets directly from Twitter. Multiple libraries are available, covering most programming languages.

b. Establishing a connection from the programming language – To extract tweets, a secure connection must be established between the program and Twitter. The user is directed to Twitter’s authorization screen, clicks Authorize App, and notes the generated PIN, then returns to the programming environment and enters the PIN. This only needs to be done once. After that, the program can access the Twitter API and extract tweets.

c. Extracting tweets – Use the Twitter search function to collect tweets in English, excluding retweets, on the term “Indian Elections”.
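The selection criteria in step c can be expressed as a small filter. In practice the filtering is typically done in the search request itself; the sketch below only shows the logic on already downloaded records. The record fields (`text`, `lang`, `is_retweet`) are hypothetical and do not mirror the Twitter API response format, and the sample records are invented.

```python
def is_wanted(tweet, term="Indian Elections"):
    """Keep English, non-retweet tweets that mention the search term."""
    return (
        tweet.get("lang") == "en"
        and not tweet.get("is_retweet", False)
        and term.lower() in tweet.get("text", "").lower()
    )


# Invented sample records using the hypothetical fields above.
sample = [
    {"text": "Turnout is high in the Indian elections", "lang": "en", "is_retweet": False},
    {"text": "RT: Indian elections update", "lang": "en", "is_retweet": True},
    {"text": "Indian elections", "lang": "hi", "is_retweet": False},
    {"text": "Nice weather today", "lang": "en", "is_retweet": False},
]
kept = [tweet for tweet in sample if is_wanted(tweet)]
```

Only the first record passes all three checks; the others are dropped for being a retweet, not being in English, or not mentioning the term.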

Conclusion

Classification of polarity is one of the key roles of sentiment analysis for any custom software development company. Opinion mining and sentiment analysis have attracted increasing attention in natural language processing and data science research in recent years. Most previous sentiment analysis approaches focused mainly on the subjective part of the text, considering the sentiment of individual words rather than the context in which each word appears. It is essential to classify polarity based on the context in which a word is present and on the semantic relationships between words.


Amit Agrawal is the Founder and COO of Cyber Infrastructure (P) Limited, a custom software development company that provides services such as custom application development, mobile application development, creative web design, Microsoft solutions, SAP solutions, open source development, Java development, Oracle development, big data solutions, digital experience solutions, CAD/CAM architectural services, testing automation, infrastructure automation and cloud, digital marketing, ITeS, etc.
