Computational Linguistics and Natural Language Processing

Stephen DeAngelis

November 15, 2018

The deeper we journey into the Information Age, the more likely we are to find ourselves conversing with smart machines. For the technically challenged among us, communicating with a computer can sound daunting. To ease such concerns, computer scientists and linguists are developing sophisticated Natural Language Processing (NLP) solutions. NLP makes so much sense in an increasingly technical world that you would think companies would be quick to jump on the bandwagon. However, Bob Violino (@BobViolino) reports, “By most accounts, the top technology trend for this year is artificial intelligence, but one facet of AI has been slow to take off is natural language processing. Natural language processing is a component of artificial intelligence that enables computers to understand, interpret and manipulate human language, [NLP] continues to advance as organizations look for ways to leverage human-to-machine communications for a variety of applications.”[1]

NLP begins with the study of linguistics

Teaching computers to speak in languages familiar to users sounds easy enough; after all, almost every advanced science fiction show depicts human/computer interchanges. Linguistics, however, is not as straightforward as most people think. Linguist and software developer Pat Gaston explains, “Linguistics is the scientific study of language.”[2] He goes on to explain, “At linguistics core, we have 6 core areas.” They are:

  • Syntax – the study of how sentences are formed
  • Morphology – the study of how words are formed
  • Phonetics – the study of speech sounds and how they are articulated
  • Phonology – the study of speech sounds and how/why some sounds won’t work together in a sequence (hint: try to say ‘dogs’ and keep the ‘s’ from sounding like a ‘z’)
  • Semantics – the study of the meaning of words (‘bitch’ vs. ‘notebook’ – why is one derogatory? debate time?)
  • Pragmatics – the study of how language is used

Gaston concludes, “Every linguist studies and is well versed in these areas, although most choose favorite areas and go on to choose a smaller set of linguistics to focus their energy on.” Gaston’s particular field is computational linguistics. He explains, “Computational linguistics is the interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective. … Computational Linguistics is the intersection of computer science and linguistics.” Before moving on, I would like to throw in one other important term: ontology.

An ontology is a graphical way of representing knowledge in a particular domain. Why is this important? Say, for example, a computer comes across the word “tank” during a user query. An ontology can help the computer determine whether the query refers to a water tank, a fuel tank, a fish tank, a military tank, or some other kind of tank. An ontology interrelates concepts and facts with many-to-many relationships that are generationally more advanced and appropriate for artificial intelligence applications than standard relational databases. An ontology:

  • Shares common understanding of the structure of information. Enables reuse of domain knowledge. Separates domain knowledge from the operational knowledge.
  • Makes domain assumptions explicit and allows for encoding subtle and rich, multi-faceted relationships.
  • Naturally allows for perturbative analyses. Analyzes information and can expose non-obvious relationships.

Within an ontology, relationships can be very rich and can be used to model the complexities of real-world relationships. Enterra Solutions® cognitive computing products often leverage the common sense provided by an ontology.
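To make the “tank” example concrete, here is a minimal Python sketch of how an ontology-style graph might help pick a word sense. It is an illustration under stated assumptions, not a description of any particular product: the `Ontology` class, the `disambiguate` function, the relation names, and the handful of concepts are all hypothetical, and a real ontology would be far richer. The idea is simply that each sense of “tank” is linked to related concepts, and the sense whose related concepts overlap most with the words in the query wins.

```python
from collections import defaultdict


class Ontology:
    """A toy ontology: concepts linked by named, many-to-many relationships."""

    def __init__(self):
        # concept -> set of (relation, related_concept) pairs
        self.edges = defaultdict(set)

    def relate(self, subject, relation, obj):
        self.edges[subject].add((relation, obj))

    def neighbors(self, concept):
        """Return every concept directly related to the given concept."""
        return {obj for _, obj in self.edges[concept]}


def build_tank_ontology():
    """Hypothetical facts about four senses of the word 'tank'."""
    onto = Ontology()
    facts = [
        ("water_tank", "stores", "water"),
        ("water_tank", "collects", "rain"),
        ("water_tank", "used_for", "storage"),
        ("fuel_tank", "stores", "fuel"),
        ("fuel_tank", "part_of", "car"),
        ("fish_tank", "houses", "fish"),
        ("fish_tank", "contains", "water"),  # 'water' relates to two senses
        ("fish_tank", "has_part", "filter"),
        ("military_tank", "operated_by", "army"),
        ("military_tank", "has_part", "turret"),
    ]
    for subject, relation, obj in facts:
        onto.relate(subject, relation, obj)
    return onto


TANK_SENSES = ["water_tank", "fuel_tank", "fish_tank", "military_tank"]


def disambiguate(ontology, senses, query):
    """Pick the sense whose related concepts overlap most with the query words.

    Returns None when no sense shares any concept with the query.
    """
    words = {w.strip(".,?!").lower() for w in query.split()}
    scores = {s: len(ontology.neighbors(s) & words) for s in senses}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    onto = build_tank_ontology()
    print(disambiguate(onto, TANK_SENSES, "How often should I clean my fish tank filter?"))
    print(disambiguate(onto, TANK_SENSES, "The army deployed a tank."))
```

Counting shared concepts is a deliberately crude scoring rule chosen to keep the sketch short. The point is the structure: because a concept such as “water” can be related to more than one sense, the surrounding words in the query are what tip the decision.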

The importance of Natural Language Processing

Whether you are technically challenged or a seasoned data scientist, precision in language is important for understanding documents or conversations. Analysts from Expert System note, “Social media, blog posts, comments in forums, documents, group chat applications or dialog with customer service chatbots: Text is at the heart of how we communicate with companies online. Each type of communication, whether it’s a tweet, a post on LinkedIn or a review in the comments section of a website, contains potentially relevant, even valuable information that must be captured and understood by companies who want to stay ahead. Capturing the information isn’t the hard part. What’s really difficult is understanding what is being said, and doing it at scale.”[3] Because understanding context is difficult, Harrine Freeman (@harrine) asserts, “Great advances have been made in the Artificial Intelligence industry, but there is still a long way to go.”[4] She believes, however, that cognitive technologies with embedded Natural Language Processing can move the needle. “Natural Language Processing technology or cognitive technology,” she explains, “broadens the power of information technology to perform tasks traditionally performed by humans such as account status, order status, performing transactions, and queries. Such technologies help companies improve the quality of services, reduce response time for customers, and reduce costs.” Robin Sandhu, a technology consultant, describes five different ways NLP is being or will be used.[5] They are:

  • Machine Translation: “The challenge of making the world’s information accessible to everyone, across language barriers, has simply outgrown the capacity for human translation. … But machine translation offers an even more scalable alternative to harmonizing the world’s information.”
  • Fighting Spam: “Almost everyone that uses email extensively has experienced agony over unwanted emails that are still received, or important emails that have been accidentally caught in the filter. The false-positive and false-negative issues of spam filters are at the heart of NLP technology.”
  • Information Extraction: “Financial decisions are impacted by news, by journalism which is still presented predominantly in English. A major task, then, of NLP has become taking these plain text announcements, and extracting the pertinent info in a format that can be factored into algorithmic trading decisions.”
  • Summarization: “Information overload is a real phenomenon in our digital age, and already our access to knowledge and information far exceeds our capacity to understand it. … Based on aggregated data from social media, can a company determine the general sentiment for its latest product offering? This branch of NLP will become increasingly useful as a valuable marketing asset.”
  • Answering Questions: “Though certainly improving, [answering specific questions] remains a major challenge for search engines, and one of the main applications of natural language processing research.”

Sandhu’s list is not exhaustive, but it does provide a glimpse into the importance of Natural Language Processing. Companies leveraging cognitive systems can make insights and analysis available to all employees thanks to NLP. This is becoming increasingly important. Charles Roe explains, “The business narrative coming out of many cubicles, board rooms, and various staff members puzzling over dashboards and spreadsheets is that there is too much data and it’s too hard to compile all together into meaningful information — data is useless in and of itself. It must become an information asset before real insight is gained.”[6] That’s where NLP or natural language generation plays an important role. Roe explains, “[Natural language generation] literally takes an organization’s data and transforms it into language, not standard computer-generated text that is overly technical and difficult to read, but natural human language that reads like a literate and well-educated person wrote it.” The process may begin with computational linguistics but the ultimate result is better decision-making.

Footnotes
[1] Bob Violino, “Natural language processing slow to reap benefits of AI enthusiasm,” Information Management, 17 October 2018.
[2] Pat Gaston, “What is computational linguistics?” Quora, 14 November 2016.
[3] Staff, “Natural Language Process semantic analysis: definition,” Expert System, 14 November 2017.
[4] Harrine Freeman, “Big Data and Artificial Intelligence: Advances in Natural Language Processing,” Dataversity, 1 March 2016.
[5] Robin Sandhu, “Applications of Natural Language Processing,” Lifewire, 7 June 2018.
[6] Charles Roe, “Natural Language Generation: A Revolution in Business Insight,” Dataversity, 19 May 2016.