In this article I’d like to give you a short introduction to a subset of Google’s machine learning capabilities: the Natural Language API. This API processes text snippets and can apply several analysis algorithms:
- analyze-entities: detects entities (proper nouns such as public places, art, etc.), and returns information about those entities.
- analyze-sentiment: identifies the prevailing emotional opinion within the text, especially to determine a writer’s attitude as positive, negative, or neutral.
- analyze-entity-sentiment: combines both entity analysis and sentiment analysis and attempts to determine the sentiment (positive and negative) expressed about the entities.
- analyze-syntax: extracts linguistic information, breaking up the given text into a series of sentences and tokens and providing further analysis on those tokens.
- classify-text: analyzes a document and returns a list of content categories that apply to the text found in the document.
Natural Language API
The easiest way to get started is the gcloud CLI from the Cloud SDK (I described how to set up the SDK in a previous article):
gcloud ml \
  language analyze-entities \
  --content "Sentence to be analyzed"
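If you want to analyze a longer text, you don’t have to paste it onto the command line. As far as I know, the command also accepts a --content-file flag instead of --content, so a rough sketch with a local file (text.txt is just a placeholder name) would be:

# Analyze the contents of a local file instead of an inline string
gcloud ml \
  language analyze-entities \
  --content-file text.txt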
You can also use the HTTP API directly (but you need to generate an API key for that):
curl -X POST https://language.googleapis.com/v1/documents:analyzeEntities?key=[YOUR_API_KEY] \
  -H "Content-Type: application/json" \
  -d @- << 'EOF'
{
  "encodingType": "UTF8",
  "document": {
    "content": "Sentence to be analyzed",
    "type": "PLAIN_TEXT"
  }
}
EOF
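The other analysis methods from the list above follow the same REST pattern; only the method name at the end of the URL changes. As a sketch, the entity sentiment analysis (which we will use later via gcloud) should be reachable like this, again with your own API key filled in:

# Same request body as above, different method name in the URL
curl -X POST https://language.googleapis.com/v1/documents:analyzeEntitySentiment?key=[YOUR_API_KEY] \
  -H "Content-Type: application/json" \
  -d @- << 'EOF'
{
  "encodingType": "UTF8",
  "document": {
    "content": "Sentence to be analyzed",
    "type": "PLAIN_TEXT"
  }
}
EOF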
For the following examples I’ll use the gcloud tool for the sake of brevity. I will walk you through two of the analysis methods.
Entity Analysis
We will start with this sentence:
gcloud ml \
  language analyze-entities \
  --content "The Louvre is the home of the beautiful Mona Lisa"
The result looks like this:
1{ 2 "entities": [ 3 { 4 "mentions": [ 5 { 6 "text": { 7 "beginOffset": 4, 8 "content": "Louvre" 9 }, 10 "type": "PROPER" 11 }, 12 { 13 "text": { 14 "beginOffset": 18, 15 "content": "home" 16 }, 17 "type": "COMMON" 18 } 19 ], 20 "metadata": { 21 "mid": "/m/04gdr", 22 "wikipedia_url": "https://en.wikipedia.org/wiki/Palais_du_Louvre" 23 }, 24 "name": "Louvre", 25 "salience": 0.9340278, 26 "type": "LOCATION" 27 }, 28 { 29 "mentions": [ 30 { 31 "text": { 32 "beginOffset": 40, 33 "content": "Mona Lisa" 34 }, 35 "type": "PROPER" 36 } 37 ], 38 "metadata": { 39 "mid": "/m/0jbg2", 40 "wikipedia_url": "https://en.wikipedia.org/wiki/Mona_Lisa" 41 }, 42 "name": "Mona Lisa", 43 "salience": 0.06597222, 44 "type": "PERSON" 45 } 46 ], 47 "language": "en" 48}
The language is identified as English. We also get a list of recognized entities, in our case these two: Louvre and Mona Lisa.
This example clearly shows that the entity detection is more than just scanning for nouns. The algorithm understood that the words “Louvre” and “home” refer to the same thing. Pretty smart, ain’t it?
For each entity, all of its mentions in the text are listed. We have two mentions for Louvre and one for Mona Lisa. The salience of an entity (a value in the range [0, 1]) denotes its importance within the sentence. So this sentence is mainly about the Louvre, since its salience is close to 1. Entities are also classified by their type: Louvre is a LOCATION, and Mona Lisa is a PERSON.
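If you only need a quick overview of the detected entities, you can post-process the JSON output. Here is a small sketch, assuming you have jq installed (the field names are the ones from the response above; --format=json just forces JSON output):

# Print one line per entity: name, type and salience (tab-separated)
gcloud ml \
  language analyze-entities \
  --content "The Louvre is the home of the beautiful Mona Lisa" \
  --format=json \
  | jq -r '.entities[] | "\(.name)\t\(.type)\t\(.salience)"'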
If available, the analysis also provides metadata about the entities. Currently this consists of the entity’s ID from the Google Knowledge Graph Search API (the mid field) and a Wikipedia link.
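If you want more details about an entity, the mid can be fed into the Knowledge Graph Search API. A minimal sketch, assuming that API is enabled in your project and reusing the API key from above:

# Look up the Louvre entity (/m/04gdr) in the Knowledge Graph
curl "https://kgsearch.googleapis.com/v1/entities:search?ids=/m/04gdr&limit=1&key=[YOUR_API_KEY]"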
Entity Sentiment Analysis
When running an entity sentiment analysis on the same sentence …
gcloud ml \
  language analyze-entity-sentiment \
  --content "The Louvre is the home of the beautiful Mona Lisa"
… each entity and all of its mentions have an additional sentiment score looking like this:
1"sentiment": { 2 "magnitude": 0.7, 3 "score": 0.3 4}
The score lies in the range [-1.0, 1.0] and goes from negative to positive sentiment. The magnitude ranges from 0.0 to infinity and denotes the strength of the emotion, regardless of whether it is negative or positive.
Now let’s play around a little bit to make things more understandable. I will list the overall sentiment values for the two entities for several variations of our sentence.
| # | Input | Louvre score / magnitude | Mona Lisa score / magnitude |
|---|-------|--------------------------|-----------------------------|
| 1 | The Louvre is the home of the beautiful Mona Lisa | 0.3 / 0.7 | 0.9 / 0.9 |
| 2 | The famous Louvre is the home of the beautiful Mona Lisa | 0.6 / 1.3 | 0.9 / 0.9 |
| 3 | The Louvre is the home of the Mona Lisa | 0.0 / 0.0 | 0.0 / 0.0 |
| 4 | The boring Louvre is the home of the ugly Mona Lisa | -0.8 / 1.6 | -0.9 / 0.9 |
Rows 1 and 2 are clearly positive statements; all scores are greater than 0. Please note that Louvre in the first row already has a positive score although there is no positive adjective.
The third row shows a neutral statement and row 4 has an overall negative sentiment.
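If you would like to reproduce the table, the values can be collected with a small shell loop. This is only a sketch (again assuming jq is installed), and the exact numbers may change over time as the underlying models evolve:

# Run the entity sentiment analysis for each sentence variation
# and print name, score and magnitude per entity
for sentence in \
  "The Louvre is the home of the beautiful Mona Lisa" \
  "The famous Louvre is the home of the beautiful Mona Lisa" \
  "The Louvre is the home of the Mona Lisa" \
  "The boring Louvre is the home of the ugly Mona Lisa"
do
  echo "$sentence"
  gcloud ml \
    language analyze-entity-sentiment \
    --content "$sentence" \
    --format=json \
    | jq -r '.entities[] | "  \(.name): \(.sentiment.score) / \(.sentiment.magnitude)"'
done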
Summary
You learned how to use the Natural Language API and how to interpret the results of the entity analysis and the entity sentiment analysis.
In one of my next articles I will show you how to access this API from a Google Cloud Function.
If you are interested in AI and machine learning, have a look at our codecentric.AI channel on YouTube.