Abstract:

In the fields of sentiment and emotion recognition, bag-of-words
modeling has lately become popular for the estimation of valence in text. A
typical application is the evaluation of reviews of, e.g., movies, music, or
games. In this respect, we suggest the use of back-off N-grams as the basis for a
vector space construction in order to combine the advantages of word-order modeling
with easy integration into potential acoustic feature vectors intended for
spoken document retrieval. For a fine-grained estimate, we consider data-driven
regression alongside classification based on Support Vector Machines.
Alternatively, the on-line knowledge sources ConceptNet, General Inquirer, and
WordNet serve not only to reduce out-of-vocabulary events, but also as the basis
for a purely linguistic analysis. As a special benefit, this approach does not
demand labeled training data. A large set of 100 k movie reviews spanning 20 years,
stemming from Metacritic, is used throughout an extensive parameter discussion
and a comparative evaluation that demonstrate the efficiency of the proposed
methods.
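
As a rough illustration of the vector space construction described above, the following sketch collects every n-gram from order 1 up to a maximum order N and maps a text to a binary presence vector. Reading "back-off" as "also include all lower-order n-grams" is an assumption made here, and the function names `backoff_ngrams`, `build_vocabulary`, and `presence_vector`, as well as the toy reviews, are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def backoff_ngrams(tokens: List[str], max_order: int) -> List[str]:
    """Return all n-grams of order 1..max_order (assumed reading of 'back-off')."""
    grams = []
    for order in range(1, max_order + 1):
        for start in range(len(tokens) - order + 1):
            grams.append(" ".join(tokens[start:start + order]))
    return grams


def build_vocabulary(texts: List[str], max_order: int) -> Dict[str, int]:
    """Assign a vector index to every n-gram seen in the training texts."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for gram in backoff_ngrams(text.lower().split(), max_order):
            vocab.setdefault(gram, len(vocab))
    return vocab


def presence_vector(text: str, vocab: Dict[str, int], max_order: int) -> List[int]:
    """Binary vector: 1 if the n-gram occurs in the text, else 0."""
    vector = [0] * len(vocab)
    for gram in backoff_ngrams(text.lower().split(), max_order):
        index = vocab.get(gram)
        if index is not None:  # n-grams unseen in training are skipped
            vector[index] = 1
    return vector


# Toy data, invented for illustration only.
reviews = ["not good at all", "very good movie"]
vocab = build_vocabulary(reviews, max_order=2)
print(presence_vector("not good", vocab, max_order=2))
```

Because the result is a plain fixed-length vector, it could be concatenated with other feature vectors before being passed to a classifier or regressor; the bigram "not good" gets its own dimension, which is how word order enters the representation in this sketch.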

Introduction

Emerging Internet technologies such as blogs or review websites encourage
users to post their own
views on products, news articles, or movies. While a lot of effort has been put
into estimating the valence of product reviews, movies have received less attention in
the past. This might be due to the fact that movie reviews are more difficult
to handle than, e.g., product reviews. Turney [9] observed a discrepancy between
the orientation of words that describe the elements and the style of a movie,
leading to only 66% accuracy for movies in contrast to up to 84% for automobile
reviews. In that work, pointwise mutual information is used to determine valence,
and the data set consists of 410 reviews from different domains. Pang et al. [5] compare
different machine learning techniques and word-level features for sentiment
classification of movie reviews on a corpus of 1,400 reviews. The best results are
achieved with Support Vector Machines (SVM) using word presence information as
features. Word frequency, N-grams, part-of-speech (POS), and word position
information do not improve performance in their case. A method based on
multiple knowledge sources and grammatical patterns is described in [12].
Features and opinion words are learned from training data, and the latter are
enhanced by means of WordNet. Feature-opinion pairs are then built using
grammatical patterns. Experiments are carried out on a corpus of 1,100 reviews.
In [1], context-dependent opinion words are utilized in addition to general
ones. A number of linguistic rules are used to associate detected opinions with
topic features. Liu et al. [4] introduced a novel affect sensing system based
solely on world knowledge about everyday situations.

The contributions of this
paper lie in two fields: First, to the knowledge of the authors, the largest
annotated corpus of movie reviews so far is presented, containing over 100 k
instances. Experiments with both machine-learning and linguistic methods are
carried out for the first time on a movie review database of that size. Second,
on-line knowledge sources are incorporated into both methods to improve
accuracy and to attempt to resolve known issues. Additionally, we show how a
regression approach can resolve more subtle differences than that between “The
Godfather”, the best-rated movie in the database, and
“Chaos”, at the lowest end.
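
The pointwise mutual information measure mentioned above in connection with Turney's work can be sketched as follows. PMI compares the joint probability of two events with the product of their marginal probabilities, and a phrase's orientation is then taken as the difference between its PMI with a positive and with a negative reference word. The function names, the count-based probability estimates, and the example counts are assumptions for illustration; how the counts are obtained and smoothed is not specified in this text.

```python
import math


def pmi(joint_count: int, count_a: int, count_b: int, total: int) -> float:
    """PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ), estimated from raw counts.

    All counts are assumed to be positive; zero counts would need smoothing.
    """
    p_joint = joint_count / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log2(p_joint / (p_a * p_b))


def semantic_orientation(
    phrase_with_pos: int,
    phrase_with_neg: int,
    phrase_count: int,
    pos_count: int,
    neg_count: int,
    total: int,
) -> float:
    """Orientation = PMI(phrase, positive word) - PMI(phrase, negative word)."""
    return pmi(phrase_with_pos, phrase_count, pos_count, total) - pmi(
        phrase_with_neg, phrase_count, neg_count, total
    )


# Hypothetical counts: the phrase co-occurs more often with the positive word.
score = semantic_orientation(
    phrase_with_pos=20,
    phrase_with_neg=5,
    phrase_count=100,
    pos_count=100,
    neg_count=100,
    total=1000,
)
print(score)
```

A positive score suggests a positive orientation and a negative score a negative one; averaging such scores over the phrases of a review gives one simple way to label the review as a whole.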

Post Author: admin
