Sentiment Analysis using Stanford NLP library

Today I will use the Stanford NLP library to analyze the sentiment of a review. I was working on a different project where customer feedback had to be sorted into positive and negative buckets. Since that project was in Java, I looked for Java libraries, and one that came up was the Stanford NLP library.

According to the Stanford NLP group, CoreNLP is a one-stop shop for natural language processing in Java. Although we are mostly interested in sentiment analysis here, I will also cover a few other features that I tried out.

Start a new project

I created a new Maven project in Eclipse with the following dependencies in pom.xml.

 <!-- Core NLP -->
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>4.4.0</version>
</dependency>
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>4.4.0</version>
  <classifier>models</classifier>
</dependency>

<!-- Logger -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>1.7.36</version>
</dependency>
<dependency>
  <groupId>ch.qos.logback</groupId>
  <artifactId>logback-classic</artifactId>
  <version>1.2.11</version>
  <scope>runtime</scope>
</dependency>

The two CoreNLP dependencies are the base library and its models, which are selected with the models classifier. I also added Logback for logging.

Annotators

CoreNLP processes text through a pipeline. You define the set of annotators to run, and CoreNLP executes them in order. Each annotator stores its results on the document, and we read them back with a typed key, much like looking up values in a map.
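To make the pipeline idea concrete, here is a toy model in plain Java. It is not CoreNLP's implementation and uses none of its classes; the names `ToyPipeline` and `example`, and the map keys, are made up for illustration. Each step reads what earlier steps stored in a shared map and adds its own result, which is roughly how annotators build on one another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy illustration only: hypothetical names, not the CoreNLP API.
class ToyPipeline {
  // Annotators run in the order they were registered.
  private final List<Consumer<Map<String, Object>>> annotators = new ArrayList<>();

  ToyPipeline add(Consumer<Map<String, Object>> annotator) {
    annotators.add(annotator);
    return this;
  }

  // Runs every annotator over a shared result map seeded with the raw text.
  Map<String, Object> annotate(String text) {
    Map<String, Object> doc = new LinkedHashMap<>();
    doc.put("text", text);
    for (Consumer<Map<String, Object>> annotator : annotators) {
      annotator.accept(doc);
    }
    return doc;
  }

  // Builds a two-step pipeline: a whitespace tokenizer, then a token counter
  // that depends on the tokenizer's result.
  static ToyPipeline example() {
    return new ToyPipeline()
        .add(doc -> doc.put("tokens",
            Arrays.asList(((String) doc.get("text")).trim().split("\\s+"))))
        .add(doc -> doc.put("tokenCount", ((List<?>) doc.get("tokens")).size()));
  }

  public static void main(String[] args) {
    Map<String, Object> doc = example().annotate("I will definitely buy it again.");
    System.out.println(doc.get("tokens"));
    System.out.println(doc.get("tokenCount"));
  }
}
```

Swapping the two steps would make the counter fail because the tokens are not there yet, which is why the order of the annotator list matters.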

Let’s first initialize the annotators.

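import java.util.Properties;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

private StanfordCoreNLP CORENLP;
// ...
final Properties props = new Properties();
// Comma-separated list of the annotators to run
props.setProperty("annotators", "tokenize, ssplit, parse, pos, lemma, ner, coref");
CORENLP = new StanfordCoreNLP(props);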

Let’s see what we are requesting from these annotators.

Annotation   Meaning
tokenize     Break the text into tokens
ssplit       Split the tokens into sentences
parse        Run the parser on the tokens
pos          Tag each token with its part of speech
lemma        Find the base form (lemma) of each word
ner          Extract all named entities
coref        Identify coreferences across sentences

CoreNLP Annotations used

With the pipeline set up, we can start extracting attributes from the text. I am sending the following two sentences for analysis.

This hair oil is the best in Paris and worked great on my hair. I will definitely buy it again.

I will first extract the cross references, that is, the expressions in the text that refer to the same entity.

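import edu.stanford.nlp.coref.CorefCoreAnnotations;
import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.pipeline.Annotation;

final Annotation doc = new Annotation(text);
CORENLP.annotate(doc);

// Each chain groups the mentions that refer to the same entity
for (final CorefChain cc : doc.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
  LOG.info(cc.toString());
}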

Here is the list of cross references identified.

CHAIN4-["my" in sentence 1, "I" in sentence 2]
CHAIN5-["This hair oil" in sentence 1, "it" in sentence 2]

Next, we will log each word's part of speech and whether it is a named entity.

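import java.util.Collection;
import java.util.List;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.util.CoreMap;

final List<CoreMap> sentences = doc.get(SentencesAnnotation.class);
for (final CoreMap sentence : sentences) {
  // Log every token with its POS tag and named-entity tag
  for (final CoreLabel word : sentence.get(TokensAnnotation.class)) {
    LOG.info("Word: {}, POS: {}, NER: {}",
             word.get(TextAnnotation.class),
             word.get(PartOfSpeechAnnotation.class),
             word.get(NamedEntityTagAnnotation.class));
  }

  final Tree tree = sentence.get(TreeAnnotation.class);
  LOG.info(tree.toString());

  // Log the typed dependencies of the sentence
  final SemanticGraph graph = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
  final Collection<TypedDependency> deps = graph.typedDependencies();
  for (final TypedDependency td : deps) {
    LOG.info(td.toString());
  }
}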

The inner loop logs each extracted word together with its part of speech and its named-entity tag. Here are a couple of examples.

Word: best, POS: JJS, NER: O
Word: Paris, POS: NNP, NER: CITY

The part-of-speech tags may look a little peculiar. They come from the Penn Treebank tag set, and the full list is available as a PDF. I am reproducing the relevant table here for convenience.

Penn Treebank POS target
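As a small aid for reading the tags in the log output, the sketch below maps a handful of common Penn Treebank tags to plain descriptions. It covers only a few tags, not the full set, and the class name `PosTagGlossary` is made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Partial glossary of Penn Treebank POS tags; the class name is hypothetical.
class PosTagGlossary {
  private static final Map<String, String> TAGS = new HashMap<>();
  static {
    TAGS.put("DT", "determiner");
    TAGS.put("IN", "preposition or subordinating conjunction");
    TAGS.put("JJ", "adjective");
    TAGS.put("JJR", "adjective, comparative");
    TAGS.put("JJS", "adjective, superlative");
    TAGS.put("MD", "modal");
    TAGS.put("NN", "noun, singular or mass");
    TAGS.put("NNS", "noun, plural");
    TAGS.put("NNP", "proper noun, singular");
    TAGS.put("PRP", "personal pronoun");
    TAGS.put("RB", "adverb");
    TAGS.put("VB", "verb, base form");
    TAGS.put("VBD", "verb, past tense");
  }

  // Returns a description, or the tag itself when it is not in this partial list.
  static String describe(String tag) {
    return TAGS.getOrDefault(tag, tag);
  }

  public static void main(String[] args) {
    System.out.println("best  -> " + describe("JJS"));
    System.out.println("Paris -> " + describe("NNP"));
  }
}
```

With this lookup, the two example lines read as a superlative adjective ("best") and a singular proper noun ("Paris").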

The rest of the sentence loop logs the parse tree and the typed dependencies from the dependency graph. I am not going through the output of this code here. You can run it and inspect the output yourself.

Sentiment Analysis

Next, I will move on to the sentiment analysis code. My idea was to capture the sentiment of each sentence in a long review, count the positive sentences against the negative ones and come up with a rating. So the final target for me was a count of positive, neutral and negative sentences.

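import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentAnnotatedTree;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

final Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
CORENLP = new StanfordCoreNLP(props);

int posSentimentVal = 0, ntrSentimentValue = 0, negSentimentValue = 0;

final Annotation doc = CORENLP.process(text);
for (final CoreMap sentence : doc.get(SentencesAnnotation.class)) {
  int sentimentVal;
  String sentimentName;
  LOG.info(sentence.toString());

  // Predicted class runs from 0 (very negative) to 4 (very positive)
  final Tree tree = sentence.get(SentimentAnnotatedTree.class);
  sentimentVal = RNNCoreAnnotations.getPredictedClass(tree);
  // Bucket: 0-1 negative, 2 neutral, 3-4 positive
  if (sentimentVal < 2) negSentimentValue += 1;
  else if (sentimentVal > 2) posSentimentVal += 1;
  else ntrSentimentValue += 1;

  sentimentName = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
  LOG.info("Sentiment: {}, value: {}", sentimentVal, sentimentName);
}

LOG.info("Positive: {}, Neutral: {}, Negative: {}", posSentimentVal, ntrSentimentValue, negSentimentValue);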

In this code, we add ‘sentiment’ as an annotator for CoreNLP. We then split the text into individual sentences and analyze each of them. The predicted sentiment class is always between 0 and 4. Here 0: Very Negative, 1: Negative, 2: Neutral, 3: Positive and 4: Very Positive. For ease, I have put both normal and extreme cases in the same bucket. Note that the log line prints the numeric class after "Sentiment:" and the class name after "value:". I tested with three different reviews of two sentences each. Here is the output for each.

--------------------------------
This hair oil is the best in Paris and worked great on my hair.
Sentiment: 3, value: Positive
I will definitely buy it again.
Sentiment: 3, value: Positive
Positive: 2, Neutral: 0, Negative: 0
--------------------------------
The movie sucked.
Sentiment: 1, value: Negative
One of the worst movies in my life.
Sentiment: 1, value: Negative
Positive: 0, Neutral: 0, Negative: 2
--------------------------------
This airline was great a year back.
Sentiment: 3, value: Positive
But now the service is deteriorating
Sentiment: 2, value: Neutral
Positive: 1, Neutral: 1, Negative: 0
--------------------------------
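The bucketing rule from the sentiment loop, plus one possible way to turn the three counts into a single rating, can be sketched in plain Java. The rating formula (the share of positive sentences among the non-neutral ones, scaled to a range of 1 to 5) is only an assumption for illustration, since a rating was the goal but no formula is given above. `SentimentTally` and its method names are hypothetical.

```java
// Hypothetical helper: buckets the 0-4 sentiment classes and derives a rating.
class SentimentTally {
  int positive;
  int neutral;
  int negative;

  // Same thresholds as the sentiment loop: 0-1 negative, 2 neutral, 3-4 positive.
  void add(int predictedClass) {
    if (predictedClass < 0 || predictedClass > 4) {
      throw new IllegalArgumentException("Expected a class from 0 to 4: " + predictedClass);
    }
    if (predictedClass < 2) {
      negative++;
    } else if (predictedClass > 2) {
      positive++;
    } else {
      neutral++;
    }
  }

  // Assumed formula: 1 + 4 * positive / (positive + negative).
  // A review with no positive or negative sentences gets the midpoint, 3.
  double rating() {
    int decided = positive + negative;
    if (decided == 0) {
      return 3.0;
    }
    return 1.0 + 4.0 * positive / decided;
  }

  public static void main(String[] args) {
    SentimentTally airline = new SentimentTally();
    airline.add(3); // "This airline was great a year back."
    airline.add(2); // "But now the service is deteriorating"
    System.out.println("Positive: " + airline.positive
        + ", Neutral: " + airline.neutral
        + ", Negative: " + airline.negative
        + ", rating: " + airline.rating());
  }
}
```

Because neutral sentences are ignored by this formula, the airline review ends up with the top rating, which shows how much the result depends on the per-sentence classification being right.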

Conclusion

So, with just a few lines of code we were able to build a sentiment analyzer for text reviews. It mostly worked well. There were instances, though, where CoreNLP reported neutral for sentences that were clearly negative, such as "But now the service is deteriorating" in the last example.

Hope you found this useful. Ciao for now!