Semantic Analysis

What Is Semantic Analysis?

Simply put, semantic analysis is the process of drawing meaning from text. It allows computers to understand and interpret sentences, paragraphs, or whole documents, by analyzing their grammatical structure, and identifying relationships between individual words in a particular context.

‘I just love the way you made me wait 30 minutes to then have your customer service hang up on me!’

For humans, making sense of text is simple: we recognize individual words and the context in which they appear. For a machine, however, identifying the sarcasm in the sentence above takes a lot of analysis. The word sarcasm comes from the Greek σαρκασμός (sarkasmós), which derives from σαρκάζειν (sarkázein), meaning “to tear flesh, bite the lip in rage, sneer”. Mind you, it took us a long time to develop this powerful rhetorical device. A related device, dramatic irony, was common in Greek tragedy: the full significance of a character’s words or actions is clear to the audience or reader although unknown to the character.

Applied Semantic Analysis

However, once trained, semantic analysis systems powered by machine learning algorithms and natural language processing can interpret the context of natural language, detect emotions and sarcasm, and extract valuable information from unstructured data, in some tasks approaching human-level accuracy.

Semantic analysis is an essential sub-task of Natural Language Processing (NLP) and the driving force behind machine learning tools like chatbots, search engines, and text analysis.

Semantic analysis-driven tools can help companies automatically extract meaningful information from unstructured data, such as emails, support tickets, and customer feedback.

Let’s take the word date. As a noun it has three meanings, and we can also use it as a verb.

  • a social or romantic appointment or engagement.
    “a college student on a date with someone he met in class”
  • the day of the month or year as specified by a number.
    “what’s the date today?”
  • a sweet, dark brown oval fruit containing a hard stone, usually eaten dried.

Date as a Verb

have its origin at a particular time; have existed since.

“the controversy dates back to 1986”

How Semantic Analysis Works

Here is how each part of semantic analysis works:

  • Lexical analysis is the process of reading a stream of characters, identifying the lexemes and converting them into tokens that machines can read.
  • Grammatical analysis correlates the sequence of lexemes (words) and applies formal grammar to them so part-of-speech tagging can occur.
  • Syntactical analysis analyzes or parses the syntax and applies grammar rules to provide context to meaning at the word and sentence level.
  • Semantic analysis uses all of the above to understand the meaning of words and interpret sentence structure so machines can understand language as humans do.
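
To make the first stage above more concrete, here is a minimal sketch of lexical analysis in Python. The token categories (NUMBER, WORD, PUNCT) and the regular expression are assumptions chosen for illustration; real NLP toolkits use far richer rules.

```python
import re

# Hypothetical token categories, chosen only for this illustration.
TOKEN_PATTERN = re.compile(
    r"(?P<NUMBER>\d+(?:\.\d+)?)"          # integers and decimals
    r"|(?P<WORD>[A-Za-z]+(?:'[A-Za-z]+)?)"  # words, optionally with an apostrophe
    r"|(?P<PUNCT>[^\w\s])"                # any single punctuation character
)


def tokenize(text):
    """Turn a stream of characters into (token_type, lexeme) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(text)]


for token in tokenize("You made me wait 30 minutes!"):
    print(token)
```

Each pair holds the kind of token and the lexeme it was built from. Later stages such as part-of-speech tagging and parsing would work on this token list rather than on raw characters.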

Lexical semantics plays an important role in semantic analysis, allowing machines to understand relationships between lexical items (words, phrasal verbs, etc.):

  • Hyponyms: specific lexical items of a generic lexical item (hypernym) e.g. orange is a hyponym of fruit (hypernym).
  • Meronomy: a logical arrangement of text and words that denotes a constituent part of or member of something, e.g., a segment of an orange.
  • Polysemy: a relationship between meanings of a word or phrase that, although slightly different, share a common core meaning, e.g., I read a paper, and I wrote a paper.
  • Synonyms: words that have the same sense or nearly the same meaning as another, e.g., happy, content, ecstatic, overjoyed
  • Antonyms: words that have close to opposite meanings e.g., happy, sad
  • Homonyms: two words that sound the same and are spelled alike but have different meanings, e.g., orange (color), orange (fruit).
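
One simple way a program might store relationships like these is a small lookup table. The sketch below is a toy lexicon, and every entry in it is an assumption made for illustration rather than data from a real lexical database.

```python
# Toy lexicon: each word maps to its more generic term (its hypernym).
# The entries are illustrative assumptions, not a real resource.
HYPERNYM = {"orange": "fruit", "date": "fruit", "fruit": "food"}
SYNONYMS = {"happy": {"content", "overjoyed"}}
ANTONYMS = {"happy": {"sad"}}


def hypernym_chain(word):
    """Follow hypernym links upward, e.g. orange -> fruit -> food."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym_of(word, candidate):
    """True if `candidate` is a (possibly indirect) hypernym of `word`."""
    return candidate in hypernym_chain(word)


print(hypernym_chain("orange"))
print(is_hyponym_of("orange", "fruit"))
```

The relation only runs one way: orange is a hyponym of fruit, but fruit is not a hyponym of orange.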

Semantic analysis also takes into account signs and symbols (semiotics) and collocations (words that often go together). 

Automated semantic analysis works with the help of machine learning algorithms. 

By feeding semantically enhanced machine learning algorithms with samples of text, you can train machines to make accurate predictions based on past observations. There are various sub-tasks in a semantic-based approach for machine learning, including word sense disambiguation and relationship extraction:

Word Sense Disambiguation

Word Sense Disambiguation is the automated process of identifying the sense in which a word is used, according to its context. Natural language is ambiguous and polysemic; the same word can have different meanings depending on how we use it.
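
As a rough illustration, the sketch below applies a simplified form of the Lesk algorithm to the word date: it picks the sense whose dictionary gloss shares the most words with the surrounding sentence. The three glosses are the noun definitions quoted earlier; the sense labels and the stopword list are assumptions made for this example.

```python
import re

# Glosses for three noun senses of "date"; the labels are hypothetical.
SENSES = {
    "appointment": "a social or romantic appointment or engagement",
    "calendar": "the day of the month or year as specified by a number",
    "fruit": "a sweet, dark brown oval fruit containing a hard stone, "
             "usually eaten dried",
}

# A deliberately tiny stopword list, assumed for illustration.
STOPWORDS = {"a", "an", "the", "of", "or", "as", "by", "on", "in", "is", "to"}


def content_words(text):
    """Lowercase the text and keep only words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(context, senses=SENSES):
    """Return the sense whose gloss overlaps most with the context.

    Ties go to the sense listed first.
    """
    words = content_words(context)
    return max(senses, key=lambda s: len(words & content_words(senses[s])))


print(disambiguate("She ate a sweet dried date, the fruit of the palm"))
print(disambiguate("Which day of the month is the release date?"))
```

Counting shared words is crude, and it fails whenever the context happens not to reuse any words from the gloss. That is one reason production systems usually rely on trained models instead.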

What is polysemy, and what are some examples?

When a symbol, word, or phrase means many different things, we call that polysemy. The verb “get” is a good example of polysemy — it can mean “procure,” “become,” or “understand.”

One example of polysemy is the word ‘sound’. This word has a very large number of meanings: 19 noun meanings, 12 adjective meanings, 12 verb meanings, 4 meanings in verb phrases, and 2 adverb meanings. Another example, ‘set’, has an even greater number of meanings.

Many everyday words also carry a specialized mathematical meaning: mean, power, even, volume, root.

subject       polyseme     meaning
linguistics   morphology   the study of the form of words
biology       morphology   the study of the form of living organisms

How are polysemes and homonyms different?

Homonymy becomes even more complicated when we compare polysemes with homonyms. This is because polysemy can be thought of as a type of homonymy. Indeed, like true homonyms, polysemous words must have identical pronunciation and different meanings. However, of the two sets of homonyms listed above, only the second set (‘bow’ /boʊ/ and ‘bow’ /boʊ/) is additionally polysemous. This is because the defining feature of polysemous words is that their meanings, though different, are related.

  • Noun: bow /boʊ/ (a decorative knot, as on the ribbon of a gift box)
  • Noun: bow /boʊ/ (a weapon for shooting arrows)
  • Verb: bow /baʊ/ (to bend the body forward)

Semantic Analysis Techniques

Depending on the type of information you’d like to obtain from data, you can use one of two semantic analysis techniques: a text classification model (which assigns predefined categories to text) or a text extractor (which pulls out specific information from the text). 

Semantic Classification Models 

  • Topic classification: sorting text into predefined categories based on its content. Customer service teams may want to classify support tickets as they drop into their help desk. Through semantic analysis, machine learning tools can recognize if a ticket should be classified as a “Payment issue” or a “Shipping problem.”
  • Sentiment analysis: detecting positive, negative, or neutral emotions in a text to denote urgency. For example, tagging Twitter mentions by sentiment to get a sense of how customers feel about your brand, and being able to identify disgruntled customers in real time. 
  • Intent classification: classifying text based on what customers want to do next. You can use this to tag sales emails as “Interested” and “Not Interested” to proactively reach out to those who may want to try your product.
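
To give a feel for how a classification model might sort support tickets, here is a minimal sketch of a multinomial naive Bayes classifier in plain Python. The six training tickets are invented for illustration, and naive Bayes is just one possible technique, not necessarily what a given tool uses.

```python
import math
import re
from collections import Counter, defaultdict

# Invented training tickets; the labels follow the examples in the text.
TRAINING = [
    ("I was charged twice on my credit card", "Payment issue"),
    ("my payment was declined at checkout", "Payment issue"),
    ("refund has not reached my card", "Payment issue"),
    ("my package has not arrived yet", "Shipping problem"),
    ("the delivery is late and tracking shows no update", "Shipping problem"),
    ("package was delivered to the wrong address", "Shipping problem"),
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label and how many tickets carry each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Pick the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                count = word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("my card was charged twice", model))
print(classify("where is my package, delivery is late", model))
```

The same structure would work for sentiment or intent classification by swapping the labels and training examples. Real systems need far more data than six tickets.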

Semantic Extraction Models

  • Keyword extraction: finding relevant words and expressions in a text. We use this technique alone or alongside one of the above methods to gain more granular insights. For instance, you could analyze the keywords in a bunch of tweets that have been categorized as “negative” and detect which words or topics are mentioned most often.
  • Entity extraction: identifying named entities in text, like names of people, companies, places, etc. A customer service team might find this useful to automatically extract names of products, shipping numbers, emails, and any other relevant data from customer support tickets.
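
Both extraction tasks can be sketched in a few lines of Python. The example below counts frequent words in some invented “negative” tweets and pulls email addresses and order numbers out of a ticket with regular expressions. The stopword list and the order-number format (two letters followed by six digits) are assumptions for illustration; real entity extractors usually rely on trained models rather than hand-written patterns.

```python
import re
from collections import Counter

# Tiny stopword list, assumed for illustration.
STOPWORDS = {"the", "was", "and", "again", "never", "still", "at", "me"}

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
ORDER_ID = re.compile(r"\b[A-Z]{2}\d{6}\b")  # hypothetical order-number format


def top_keywords(texts, n=3):
    """Return the n most frequent non-stopword words across all texts."""
    counts = Counter(
        word
        for text in texts
        for word in re.findall(r"[a-z]+", text.lower())
        if word not in STOPWORDS
    )
    return counts.most_common(n)


def extract_entities(text):
    """Pull email addresses and order numbers out of a support ticket."""
    return {"emails": EMAIL.findall(text), "order_ids": ORDER_ID.findall(text)}


negative_tweets = [
    "The delivery was late again",
    "Late delivery and rude support",
    "Support never answered, delivery still late",
]
print(top_keywords(negative_tweets, n=2))
print(extract_entities("Order AB123456 never arrived, reach me at jane.doe@example.com."))
```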

Automatically classifying tickets using semantic analysis tools relieves agents of repetitive tasks and allows them to focus on tasks that provide more value, while improving the whole customer experience.

Tickets can be instantly routed to the right hands, and urgent issues can be easily prioritized, shortening response times, and keeping satisfaction levels high.

Insights derived from data also help teams detect areas of improvement and make better decisions. For example, you might decide to create a strong knowledge base by identifying the most common customer inquiries.

Latent Semantic Analysis (LSA)

Latent Semantic Analysis is a natural language processing method that analyzes relationships between a set of documents and the terms contained within. It uses singular value decomposition, a mathematical technique, to scan unstructured data to find hidden relationships between terms and concepts.

Latent Semantic Analysis is an information retrieval technique patented in 1988, although its origin dates back to the 1960s.

LSA is primarily used for concept searching and automated document categorization. It is also used in software engineering (to understand source code), publishing (text summarization), search engine optimization, and other applications.

There are a number of drawbacks to Latent Semantic Analysis, the major one being its inability to capture polysemy (multiple meanings of a word). The vector representation of a word ends up as an average of all the word’s meanings in the corpus, which makes it challenging to compare documents.

Latent Semantic Indexing

In the world of search engine optimization, Latent Semantic Indexing (LSI) is a term often used in place of Latent Semantic Analysis. Some marketers believe using LSI can improve on-page SEO.  However, given that there are more recent and elegant approaches to natural language processing, the effectiveness of LSI in optimizing content for search is in doubt.

LSA and LSI are mostly used synonymously, with the information retrieval community usually referring to the technique as LSI. LSA/LSI uses singular value decomposition (SVD) to decompose the term-document matrix A into a term-concept matrix U, a diagonal matrix of singular values S, and a concept-document matrix V’ (the transpose of V), in the form: A = USV’.

If your page’s primary keyword is ‘credit cards,’ then LSI keywords would be things like “money,” “credit score,” “credit limit,” or “interest rate.”

What is the difference between LSA and SVD?

Usually, when comparing documents, we do so using the fundamental unit of the text: the actual terms themselves. LSA gives a way of comparing documents at a higher level than the terms by introducing a concept called the feature. The singular value decomposition (SVD) is the way LSA extracts these features from documents.
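
The difference between comparing terms and comparing features can be seen on a toy corpus. The sketch below is a dependency-free approximation, not a production method: instead of calling a library SVD routine, it runs power iteration on the matrix A’A, whose eigenvectors are the right singular vectors of A and whose eigenvalues are the squared singular values. The five two-word documents and all function names are assumptions made for illustration.

```python
import math

docs = [
    "card interest",  # d0
    "card limit",     # d1
    "limit score",    # d2
    "fruit juice",    # d3
    "juice orange",   # d4
]
vocab = sorted({w for d in docs for w in d.split()})
# Term-document matrix A: one row per term, one column per document.
A = [[d.split().count(t) for d in docs] for t in vocab]


def gram(A):
    """Return A'A, the document-by-document matrix of shared term counts."""
    n = len(A[0])
    return [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]


def top_eigenpair(G, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of G."""
    n = len(G)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Gv = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, Gv)), v


def lsa_doc_vectors(A, k):
    """Coordinates of each document in the top-k concept (feature) space."""
    G = gram(A)
    n = len(G)
    concepts = []
    for _ in range(k):
        lam, v = top_eigenpair(G)
        concepts.append((math.sqrt(lam), v))  # singular value, singular vector
        # Deflate: remove the concept just found so the next one can surface.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return [[sigma * v[j] for sigma, v in concepts] for j in range(n)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


raw = [[row[j] for row in A] for j in range(len(docs))]
reduced = lsa_doc_vectors(A, k=2)
print("d0 vs d2, raw terms:", cosine(raw[0], raw[2]))
print("d0 vs d2, 2 concepts:", round(cosine(reduced[0], reduced[2]), 3))
print("d0 vs d3, 2 concepts:", round(cosine(reduced[0], reduced[3]), 3))
```

Documents d0 and d2 share no terms, so their raw cosine similarity is zero. In the two-concept space they should come out close to 1, because both are linked to d1 through shared vocabulary, while d0 and d3 should stay close to 0. For real corpora a tested SVD implementation from a numerical library would be the sensible choice.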
