Artificial Intelligence and the Semantic Web

Stephen DeAngelis

January 14, 2013

James Hendler, the Tetherless World Professor of Computer and Cognitive Science at Rensselaer Polytechnic Institute, asserts that accurately interpreting natural-language questions and providing better responses to those questions is “one of the hardest problems on the World Wide Web.” [“2012 in Review: The Semantic Web,” Encyclopaedia Britannica Blog, 14 December 2012] He reports, however, that during this past year “computer programmers working on the Web were able to take advantage of an emerging technology that may hold the key to helping to solve … how to accurately interpret natural-language questions and provide better responses to the questions that users are asking.” He continues:

“This technology, known as the Semantic Web, is providing new techniques that can be used to help create ‘intelligent agents’ that can allow users to find the answers to their queries more precisely. Suppose, for example, you want to know the average size of an elephant. You could go online and type ‘what is the average size of an elephant’ into a search engine. Unfortunately, most search engines will not tell you the answer. Rather, they will identify many documents that might include the information that you are seeking. Instead, you are likely to find articles about the average weight of an elephant, some general articles about elephants, and maybe an article about the average size of an elephant’s foot. Clearly, the search engine does not really understand what you want to know. If, on the other hand, you have an Apple iPhone running the Siri™ application, which was introduced in 2011, you can ask it the same question, and you will see a screen telling you the average length of an elephant—‘(18 to 25) feet’—and a number of other relevant facts about elephants. Siri, it seems, figured out what you meant in your question and produced a single relevant, and (one hopes) correct answer.”

For those unfamiliar with the term “intelligent agent,” Wikipedia describes it this way:

“In artificial intelligence, an intelligent agent (IA) is an autonomous entity which observes through sensors and acts upon an environment using actuators (i.e., it is an agent) and directs its activity towards achieving goals (i.e., it is rational). Intelligent agents may also learn or use knowledge to achieve their goals. They may be very simple or very complex: a reflex machine such as a thermostat is an intelligent agent. … Intelligent agents are often described schematically as an abstract functional system similar to a computer program. For this reason, intelligent agents are sometimes called abstract intelligent agents (AIA) to distinguish them from their real world implementations as computer systems, biological systems, or organizations. Some definitions of intelligent agents emphasize their autonomy, and so prefer the term autonomous intelligent agents. Still others … considered goal-directed behavior as the essence of intelligence and so prefer a term borrowed from economics, ‘rational agent’.”
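
To make the thermostat example in that definition concrete, here is a minimal sketch in Python of a reflex agent. The class name, the action strings, and the tolerance band are illustrative assumptions, not anything taken from Wikipedia or from a real thermostat API.

```python
from dataclasses import dataclass


@dataclass
class Thermostat:
    """A simple reflex agent: it senses a temperature and picks an action."""
    target: float           # goal temperature (hypothetical units: degrees C)
    tolerance: float = 0.5  # dead band around the target (assumed value)

    def act(self, sensed_temperature: float) -> str:
        # The sensor reading arrives as an argument; the returned string
        # stands in for a command that would be sent to an actuator.
        if sensed_temperature < self.target - self.tolerance:
            return "heat_on"
        if sensed_temperature > self.target + self.tolerance:
            return "heat_off"
        return "hold"


agent = Thermostat(target=20.0)
readings = [18.0, 20.2, 22.5]
actions = [agent.act(t) for t in readings]
print(actions)
```

The agent keeps no memory and does no learning. It maps each observation directly to an action in pursuit of a goal, which is all the definition requires at the simple end of the spectrum.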

Hendler admits that despite applications like Siri, “there is still a long way to go.” When Siri was first introduced, Alexis Madrigal, a senior editor at The Atlantic, reported that Siri is “a voice-driven artificial intelligence system created with DARPA funds and, … if the hype holds up, the software will be the biggest deployment of human-like AI the world has seen.” [“Siri: The Perfect Robot for Our Time,” 12 October 2011] Hendler agrees that “even a few years ago, the idea of a working ‘intelligent agent’ on a phone seemed to be the stuff of science fiction.” He continues:

“Indeed, the quest for more intelligent computers has been a dream of artificial intelligence researchers for many years. … Human language is an amazingly flexible tool, and the understanding of how our brains process language and produce answers remains one of the significant challenges for modern science. In the past few years, however, with the growing power of computer devices and the huge amount of data available on the Web, computer programmers have been learning to ‘cheat,’ producing applications that can process massive amounts of textual information to find answers to questions in a human-seeming way.”

Hendler goes on to discuss the well-known Jeopardy! challenge back in February 2011 during which IBM’s Watson computer took on some of the show’s best champions and beat them. The three-day event made headlines around the world, and IBM has capitalized on Watson’s success through a series of advertisements. Remarking on Watson’s triumph, innovator and entrepreneur Ray Kurzweil wrote, “The point has been made: Watson can compete at the championship level—and is making it more difficult for anyone to argue that there are human tasks that computers will never achieve.” [“When Computers Beat Humans on Jeopardy,” The Wall Street Journal, 17 February 2011] Hendler admits that IBM used more brute computing force than elegance to win the match. “The techniques developed at IBM work a surprising amount of the time,” he writes, “and many of the sources available on the Web make this match-based process more usable. Unfortunately, there are significant limits to what Watson can do and how far this technique can be pushed.” He goes on to discuss why the ambiguity of language makes the Semantic Web so hard to achieve. He writes:

“The key problem is one of semantics—that is, the meaning of the words and symbols that people use in their day-to-day lives. If someone asks, ‘Can you pass the salt?’ people typically understand that they are not inquiring about a capability but are merely asking for the salt. When told to ‘put the fish in the tank,’ a person would generally look for a container of water and not an army tank or any of the many other things in the world for which the term tank might be used. Human language is inherently ambiguous, with most words having multiple meanings, and the context in which a word is used makes a huge difference to its intended results. Despite the vast number of documents on the World Wide Web, estimated to be in the tens of billions, the context and use of documents remains something difficult to pin down, and even the best programs are limited. Without the context, identifying whether the word Gates is being used to describe a person or a garden item—and if the former, which person—is hard.”

I agree with Hendler that the world of semantics is complex. I know this because Enterra Solutions currently uses Natural Language Processing to parse out terms used by our clients to help them discover relationships and insights about their business. Hendler continues:

“One technique that has been proving very powerful is for humans to provide ‘hints’ to the computer by making certain kinds of semantics available in online documents. Using a technology known as the Semantic Web, developers putting information on the Web can provide machine-readable annotations that make it clear, for example, whether the word Apple is intended to describe the computer company, the fruit, or something else. These annotations, in the form of embedded markup within a page or ontology descriptions that supply separate information, or metadata, about the items in a document or database, provide powerful techniques that can be used on the Web.”
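
The idea of a machine-readable “hint” can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the example.org URLs, the entity records, and the `resolve` helper are all hypothetical, and real Semantic Web annotations use standard markup and vocabularies rather than Python dictionaries.

```python
# Hypothetical entity identifiers under example.org; not real vocabulary terms.
ENTITIES = {
    "http://example.org/entity/AppleInc": {"type": "Company", "label": "Apple Inc."},
    "http://example.org/entity/AppleFruit": {"type": "Fruit", "label": "apple"},
}

# Each (hypothetical) page carries an annotation stating which entity
# its use of the word "Apple" refers to.
PAGE_ANNOTATIONS = {
    "http://example.org/pages/earnings": {
        "Apple": "http://example.org/entity/AppleInc",
    },
    "http://example.org/pages/pie-recipe": {
        "Apple": "http://example.org/entity/AppleFruit",
    },
}


def resolve(page_url, word):
    """Return the entity type the page's annotation assigns to `word`, or None."""
    entity_uri = PAGE_ANNOTATIONS.get(page_url, {}).get(word)
    if entity_uri is None:
        return None  # no hint was published, so the word stays ambiguous
    return ENTITIES[entity_uri]["type"]


print(resolve("http://example.org/pages/earnings", "Apple"))
print(resolve("http://example.org/pages/pie-recipe", "Apple"))
```

The program never has to guess what “Apple” means from the surrounding text. The publisher has already said which meaning applies, and the software simply looks it up.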

Hendler notes that the concept of a semantic web was first “envisioned as a crucial part of the Web by its inventor, Sir Tim Berners-Lee, who unveiled his idea for the Semantic Web at the first International Conference on the World Wide Web in 1994, only a few years after he began developing the Web in 1989.” He then discusses how annotations have allowed programmers to start linking databases. He writes:

“The Semantic Web allows more and more of the structured data preferred by computer programs to be made sharable between applications and Web sites. … Linking databases using semantic descriptions has become known as ‘linked data,’ and it is a powerful emerging technology on the Web. As these linked data start to increasingly interact with the semantic annotations on Web pages, new and dynamic techniques can be designed to better match capabilities and needs, to disambiguate complex terms, and to provide for better question answering on the Web.”
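
A rough sketch of what “linking databases using semantic descriptions” means in practice: two independently maintained datasets can be combined when both describe an item using the same identifier. In the Python below, the example.org identifiers, the property names, and the `describe` helper are hypothetical, and the datasets are toy stand-ins; the length value is the figure from Hendler’s Siri example.

```python
ELEPHANT = "http://example.org/id/elephant"  # shared, hypothetical identifier

# Dataset A: a hypothetical zoology catalog.
CATALOG_TRIPLES = [
    (ELEPHANT, "http://example.org/prop/label", "elephant"),
    (ELEPHANT, "http://example.org/prop/class", "Mammal"),
]

# Dataset B: a hypothetical measurements database published elsewhere.
MEASUREMENT_TRIPLES = [
    (ELEPHANT, "http://example.org/prop/averageLengthFeet", "18 to 25"),
]


def describe(subject, *datasets):
    """Collect every (property, value) pair about `subject` across datasets."""
    facts = {}
    for dataset in datasets:
        for s, p, o in dataset:
            if s == subject:
                # Use the last path segment of the property URI as a short key.
                facts[p.rsplit("/", 1)[-1]] = o
    return facts


facts = describe(ELEPHANT, CATALOG_TRIPLES, MEASUREMENT_TRIPLES)
print(facts)
```

Because both sources use the same identifier for the elephant, an application can answer a question that neither database could answer alone, without anyone having designed the two to work together.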

Hendler concludes, “Over the next decade users are likely to see the Web appearing to ‘get smarter’ as these new Semantic Web capabilities are more and more widely used.” The following six-minute video by Manu Sporny provides a good primer about the Semantic Web.

The most important thing to note in the comments by Hendler, Berners-Lee, and Sporny is that the Semantic Web involves tagging or annotating data. Many people confuse the Semantic Web with semantic interpretation, which is a much more difficult thing to achieve. I will discuss that topic in a future post.