Sentiment Analysis Presented by Aditya Joshi 08305908 Guided by Prof. Pushpak Bhattacharyya IIT Bombay
What is SA & OM (Sentiment Analysis & Opinion Mining)? Identify the orientation of opinion in a piece of text. Examples: ‘The movie was fabulous!’ ‘The movie stars Mr. X.’ ‘The movie was horrible!’ Can be generalized to a wider set of emotions.
Motivation: Knowing sentiment is a very natural ability of a human being. Can a machine be trained to do it? SA aims at extracting sentiment-related knowledge, especially from the huge amount of information on the internet. It can be used more generally to understand opinion in a set of documents.
Tripod of Sentiment Analysis: Cognitive Science, Machine Learning, Natural Language Processing
Contents: Challenges, Lexical Resources, Subjectivity detection, SA Approaches, Applications
Challenges
– Contrasts with standard text-based categorization: Mere presence of words is indicative of the category in the case of text categorization. This is not the case with sentiment analysis.
– Domain dependent: The sentiment of a word is w.r.t. the domain. Example: ‘unpredictable’ (for the steering of a car versus for a movie review).
– Sarcasm: Sarcasm uses words of one polarity to represent another polarity. Example: The perfume is so amazing that I suggest you wear it with your windows shut.
– Thwarted expressions: The sentences/words that contradict the overall sentiment of the set are in the majority. Example: The actors are good, the music is brilliant and appealing. Yet, the movie fails to strike a chord.
SentiWordNet: A lexical resource for sentiment analysis. Built on top of WordNet synsets. Attaches sentiment-related information to synsets.
Quantifying sentiment: [Figure labels: Positive, Negative, Subjective, Objective, Polarity, Term sense position.] Each term has a Positive, Negative and Objective score. The scores sum to one.
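The sum-to-one constraint means only two of the three scores are free. A minimal sketch, assuming the objective score is stored implicitly as the remainder; the class name `SenseScores` and the numeric values are hypothetical and are not taken from SentiWordNet itself:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenseScores:
    """Positive and negative scores of one term sense (hypothetical container)."""
    positive: float
    negative: float

    def __post_init__(self):
        # Both scores must be non-negative and leave room for the objective score.
        if self.positive < 0 or self.negative < 0 or self.positive + self.negative > 1:
            raise ValueError("scores must be non-negative and sum to at most 1")

    @property
    def objective(self) -> float:
        # The three scores sum to one, so the objective score is the remainder.
        return 1.0 - self.positive - self.negative

    @property
    def polarity(self) -> float:
        # Signed position between the Negative and Positive corners.
        return self.positive - self.negative


example = SenseScores(positive=0.625, negative=0.125)  # made-up values
print(example.objective, example.polarity)
```

Storing only two numbers makes it impossible to build an entry whose scores do not sum to one.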
Building SentiWordNet: Ln, Lo and Lp are the three seed sets (negative, objective, positive). Iteratively expand the seed sets through K steps. Train the classifier on the expanded sets.
Expansion of seed sets: [Figure: Ln and Lp are expanded along WordNet relations labelled ‘also see’ and ‘antonymy’.] The sets at the end of the kth step are called Tr(k,p) and Tr(k,n). Tr(k,o) is the set of terms not present in Tr(k,p) or Tr(k,n).
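The iterative expansion can be sketched as follows. This is an illustration under stated assumptions, not the SentiWordNet implementation: the two relation tables are hypothetical toy data standing in for WordNet relations that preserve or reverse polarity, and the function names are invented for this example.

```python
def expand_seeds(pos_seeds, neg_seeds, same_polarity, opposite_polarity, k):
    """Expand the positive and negative seed sets for k steps.

    same_polarity / opposite_polarity map a term to related terms
    (toy stand-ins for polarity-preserving and polarity-reversing relations).
    """
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(k):
        new_pos, new_neg = set(pos), set(neg)
        for term in pos:
            new_pos.update(same_polarity.get(term, ()))
            new_neg.update(opposite_polarity.get(term, ()))
        for term in neg:
            new_neg.update(same_polarity.get(term, ()))
            new_pos.update(opposite_polarity.get(term, ()))
        pos, neg = new_pos, new_neg
    return pos, neg


def objective_set(vocabulary, pos, neg):
    """Terms that ended up in neither the positive nor the negative set."""
    return set(vocabulary) - pos - neg


# Hypothetical toy relations.
same = {"good": ["nice"], "bad": ["awful"], "nice": ["pleasant"]}
opposite = {"good": ["bad"], "nice": ["nasty"]}
vocabulary = ["good", "nice", "pleasant", "bad", "awful", "nasty", "wooden"]

tr_p, tr_n = expand_seeds({"good"}, {"bad"}, same, opposite, k=2)
tr_o = objective_set(vocabulary, tr_p, tr_n)
print(sorted(tr_p), sorted(tr_n), sorted(tr_o))
```

A larger k reaches more terms, at the risk of drifting further from the original seeds.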
Committee of classifiers Train a committee of classifiers of different types and different K-values for the given data Observations: – Low values of K give high precision and low recall – Accuracy in determining positivity or negativity, however, remains almost constant
WordNet-Affect: Similar to SentiWordNet (an earlier work). WordNet annotated with affective concepts in a hierarchical order. The hierarchy is called ‘affective domain labels’ – behaviour – personality – cognitive state
Subjectivity detection Aim: To extract subjective portions of text Algorithm used: Minimum cut algorithm
Constructing the graph
Why graphs? To model item-specific and pairwise information independently.
Nodes: Sentences of the document and source & sink. Source & sink represent the two classes of sentences.
Edges: Weighted with either of the two scores:
– Individual scores: Prediction whether the sentence is subjective or not, Ind_sub(si)
– Association scores: Prediction whether two sentences should have the same subjectivity level
T: Threshold, the maximum distance up to which sentences may be considered proximal. f: The decaying function. i, j: Position numbers.
Constructing the graph: Build an undirected graph G with vertices {v1, v2, ..., vn, s, t} (the sentences plus s and t). Add edges (s, vi), each with weight ind1(xi). Add edges (t, vi), each with weight ind2(xi). Add edges (vi, vk) with weight assoc(vi, vk). Partition cost: [equation not preserved in this transcript]
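To make the partition cost concrete, here is a small sketch. It assumes the usual minimum-cut formulation for this graph: a sentence pays the individual score of the class it was not assigned to, and each pair of associated sentences placed on opposite sides pays its association score. For brevity it enumerates all partitions by brute force rather than running a real minimum-cut algorithm, so it only suits a handful of sentences. The scores and function names are made up for illustration.

```python
from itertools import product


def partition_cost(subjective, ind_sub, ind_obj, assoc):
    """Cost of labelling the sentences in `subjective` as subjective
    and every other sentence as objective."""
    cost = 0.0
    for i in range(len(ind_sub)):
        # A sentence pays the individual score of the class it was NOT given.
        cost += ind_obj[i] if i in subjective else ind_sub[i]
    for (i, k), weight in assoc.items():
        # Separating two associated sentences pays their association score.
        if (i in subjective) != (k in subjective):
            cost += weight
    return cost


def best_partition(ind_sub, ind_obj, assoc):
    """Brute-force stand-in for minimum cut: try all 2^n labellings."""
    best = None
    for labels in product([True, False], repeat=len(ind_sub)):
        subjective = frozenset(i for i, is_subj in enumerate(labels) if is_subj)
        cost = partition_cost(subjective, ind_sub, ind_obj, assoc)
        if best is None or cost < best[1]:
            best = (subjective, cost)
    return best


# Made-up scores for three sentences.
ind_sub = [0.8, 0.5, 0.1]              # evidence that each sentence is subjective
ind_obj = [0.2, 0.5, 0.9]              # evidence that each sentence is objective
assoc = {(0, 1): 1.0, (1, 2): 0.1}     # pull between neighbouring sentences

subjective, cost = best_partition(ind_sub, ind_obj, assoc)
print(sorted(subjective), round(cost, 3))
```

Sentence 1 is undecided on its own (0.5 versus 0.5), and the strong association with sentence 0 pulls it to the subjective side, which is the effect the pairwise edges are meant to capture.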
Example: Sample cuts [figure not preserved]
Results (1/2): Naïve Bayes, no extraction: 82.8%. Naïve Bayes, subjective extraction: 86.4%. Naïve Bayes, ‘flipped experiment’: 71%. [Figure: Document, Subjectivity detector (Subjective / Objective), Subjective Document, Polarity Classifier]
Results (2/2)
Approach 1: Using adjectives. Many adjectives have high sentiment value – A ‘beautiful’ bag – A ‘wooden’ bench – An ‘embarrassing’ performance. An idea would be to augment the adjectives in WordNet with this polarity information.
Setup: Two anchor words (extremes of the polarity spectrum) were chosen: ‘excellent’ and ‘poor’. The PMI (pointwise mutual information) of each adjective with respect to these anchor words is calculated. Polarity Score(W) = PMI(W, excellent) – PMI(W, poor)
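A small worked example of the polarity score, assuming PMI is estimated from raw occurrence and co-occurrence counts with the standard definition log2(p(x, y) / (p(x) p(y))). The counts and the word ‘superb’ are invented for illustration, and the sketch assumes all counts are positive (a real corpus would need smoothing for zero co-occurrences).

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information log2(p(x, y) / (p(x) * p(y))) from counts."""
    return math.log2((count_xy * total) / (count_x * count_y))


def polarity_score(word, word_counts, cooccurrence, total):
    """PMI(word, 'excellent') - PMI(word, 'poor')."""
    with_excellent = pmi(cooccurrence[(word, "excellent")],
                         word_counts[word], word_counts["excellent"], total)
    with_poor = pmi(cooccurrence[(word, "poor")],
                    word_counts[word], word_counts["poor"], total)
    return with_excellent - with_poor


# Invented counts over a corpus of 1000 contexts.
total = 1000
word_counts = {"superb": 20, "excellent": 100, "poor": 100}
cooccurrence = {("superb", "excellent"): 8, ("superb", "poor"): 2}

score = polarity_score("superb", word_counts, cooccurrence, total)
print(score)
```

Here ‘superb’ co-occurs with ‘excellent’ four times more often than chance would predict (PMI = 2) and with ‘poor’ exactly as often as chance (PMI = 0), so its polarity score is positive.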
Experimentation K-means clustering algorithm used on the basis of polarity scores The clusters contain words with similar polarities These words can be linked using an ‘isopolarity link’ in WordNet
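A minimal sketch of K-means on one-dimensional polarity scores, written in plain Python so that it stays self-contained. The words, scores and the deterministic initialisation are assumptions made for this example; the slides do not say how the clustering was configured.

```python
def kmeans_1d(scores, k, iterations=100):
    """Cluster words by polarity score. `scores` maps word -> score."""
    values = sorted(scores.values())
    # Deterministic initialisation: spread the centroids over the sorted scores.
    centroids = [values[(i * (len(values) - 1)) // max(k - 1, 1)] for i in range(k)]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each word goes to its nearest centroid.
        new_assignment = {
            word: min(range(k), key=lambda c: abs(score - centroids[c]))
            for word, score in scores.items()
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [scores[w] for w, a in assignment.items() if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    clusters = [[] for _ in range(k)]
    for word, c in assignment.items():
        clusters[c].append(word)
    return clusters, centroids


# Invented polarity scores.
scores = {"superb": 2.1, "great": 1.8, "plain": 0.1,
          "usual": -0.2, "awful": -1.9, "dire": -2.3}
clusters, centroids = kmeans_1d(scores, k=3)
print(clusters)
```

Words that land in the same cluster are the candidates for an ‘isopolarity link’.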
Results: Three clusters were seen. Most words had negative polarity scores. Obscure words (the ones that are not very common) were removed by selecting adjectives with a familiarity count of 3.
Approach 2: Using Adverb-Adjective Combinations (AACs). Calculate the sentiment value based on the effect of adverbs on adjectives. Linguistic ideas: Adverbs of affirmation: certainly. Adverbs of doubt: possibly. Strong intensifying adverbs: extremely. Weak intensifying adverbs: scarcely. Negation and minimizers: never.
Moving towards computation: The score of the resulting AAC depends on the type of adverb. Example of an axiom: ‘extremely good’ is more positive than ‘good’.
AAC Scoring Algorithms 1. Variable Scoring Algorithm 2. Adjective Priority Scoring Algorithm 3. Adverb first scoring algorithm
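The slides name the algorithms without giving their formulas, so the following is only one plausible reading of the adjective-priority idea, not the published definition: the adjective score is the base, and the adverb shifts it by a fraction r of the adverb's own score. The function name, the score ranges, the example scores and the treatment of adverb types are all assumptions; negations and minimizers are left out.

```python
import math


def adjective_priority_score(adverb_kind, adverb_score, adjective_score, r):
    """Combine an adverb with an adjective, weighting the adverb by r.

    Assumed ranges: adjective_score in [-1, 1], adverb_score in [0, 1].
    """
    if adjective_score == 0:
        return 0.0  # a neutral adjective stays neutral in this sketch
    shift = r * adverb_score
    if adverb_kind in ("affirmation", "strong"):
        # Intensify: push the score away from zero, capped at magnitude 1.
        magnitude = min(1.0, abs(adjective_score) + shift)
    elif adverb_kind in ("doubt", "weak"):
        # Weaken: pull the score towards zero, never past it.
        magnitude = max(0.0, abs(adjective_score) - shift)
    else:
        raise ValueError(f"unsupported adverb kind: {adverb_kind}")
    return math.copysign(magnitude, adjective_score)


# Made-up scores: good = 0.6, bad = -0.6, extremely = 0.9, scarcely = 0.8.
extremely_good = adjective_priority_score("strong", 0.9, 0.6, r=0.35)
scarcely_good = adjective_priority_score("weak", 0.8, 0.6, r=0.35)
extremely_bad = adjective_priority_score("strong", 0.9, -0.6, r=0.35)
print(extremely_good, scarcely_good, extremely_bad)
```

With r below 1 the adjective dominates the result, which is the sense in which the adjective has priority.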
Scoring the sentiment on a topic: Rel(t): Sentences in d that refer to topic t. s: A sentence in Rel(t). Appl+(s): AACs with positive score in s. Appl-(s): AACs with negative score in s. Return strength.
Findings: APS^r with r = 0.35 worked best (better correlation with human subjects) – Adjectives are more important than adverbs in terms of sentiment. AACs give better precision and recall than adjectives alone.
Approach 3: Subject-based SA Examples: The horse bolted. The movie lacks a good story.
Lexicon
– b VB bolt subj: ‘subj’ is the argument that receives the sentiment (subj./obj.)
– b VB lack obj subj: ‘obj’ is the argument that sends the sentiment (subj./obj.); ‘subj’ is the argument that receives the sentiment (subj./obj.)
Lexicon: Also allows ‘\S’ characters, similar to regular expressions. E.g. ‘to put \S to risk’ – The favorability of the subject depends on the favorability of ‘\S’.
Example: The movie lacks a good story. The movie lacks \S.
Lexicon: G JJ good obj. B VB lack obj subj.
Steps:
1) Consider a context window of up to five words
2) Shallow parse the sentence
3) Calculate the sentiment value step by step based on the lexicon, adding ‘\S’ characters at each step
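The way sentiment travels from one argument to another can be illustrated with a toy sketch. It is a strong simplification and not the system described on the slides: there is no context window or shallow parser, the clause is assumed to arrive already split into subject, verb lemma and object words, and the mini-lexicon, the numeric polarities and the function names are hypothetical.

```python
# Hypothetical mini-lexicon.
ADJECTIVES = {"good": +1, "bad": -1}   # polarity carried by an adjective
DIRECT_VERBS = {"bolt": -1}            # verb puts its own polarity on the subject
TRANSFER_VERBS = {"lack": -1}          # verb passes the object's polarity to the
                                       # subject, multiplied by this sign


def phrase_polarity(words):
    """Sign (-1, 0 or +1) of the summed adjective polarities in a phrase."""
    total = sum(ADJECTIVES.get(word.lower(), 0) for word in words)
    return (total > 0) - (total < 0)


def subject_sentiment(subject, verb_lemma, object_words=()):
    """Return (subject, sentiment) for a pre-parsed subject-verb-object clause."""
    if verb_lemma in DIRECT_VERBS:
        return subject, DIRECT_VERBS[verb_lemma]
    if verb_lemma in TRANSFER_VERBS:
        # The object plays the role of '\S': its favorability decides the subject's.
        return subject, TRANSFER_VERBS[verb_lemma] * phrase_polarity(object_words)
    return subject, 0


print(subject_sentiment("movie", "lack", ["a", "good", "story"]))
print(subject_sentiment("horse", "bolt"))
```

In the first clause the object ‘a good story’ is favorable, and lacking something favorable is unfavorable for the subject ‘movie’.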
Results
Corpus             Description           Precision   Recall
Benchmark corpus   Mixed statements      94.3%       28%
Open Test corpus   Reviews of a camera   94%         24%
Applications Review-related analysis Developing ‘hate mail filters’ analogous to ‘spam mail filters’ Question-answering (Opinion-oriented questions may involve different treatment)
Conclusion & Future Work Lexical Resources have been developed to capture sentiment-related nature Subjective extracts provide a better accuracy of sentiment prediction Several approaches use algorithms like Naïve Bayes, clustering, etc. to perform sentiment analysis The cognitive angle to Sentiment Analysis can be explored in the future