Google’s Knowledge Graph Brings Semantic Web a Step Closer

Stephen DeAngelis

August 10, 2012

This past May, Google quietly introduced the world to its “Knowledge Graph.” Gary Marcus writes, “In the short-term, Knowledge Graph will not make a big difference in your world. … But what’s under the hood represents a significant change in engineering for the world’s largest search-engine company.” [“The Web Gets Smarter,” The New Yorker, 23 May 2012] Google hopes that its Knowledge Graph is a hit with customers, and yesterday it “began rolling out Knowledge Graph, a new feature of its ubiquitous search engine, to Canada and other English-speaking nations other than the United States.” [“Google Knowledge Graph aims to parse search words for deeper meaning,” by Matt Hartley, Financial Post, 8 August 2012] Back in May, Marcus wrote:

“In a decade or two, scientists and journalists may well look back at this moment as the dividing line between machines that dredged massive amounts of data — with no clue what that data meant — and machines that started to think, just a little bit, like people. Since its beginning, Google has used brute force as its main strategy to organize the Internet’s knowledge, and not without reason. Google has one of the largest collections of computers in the world, wired up in parallel, housing some of the largest databases in the world. Your search queries can be answered so quickly because they are outsourced to immense data farms, which then draw upon enormous amounts of precompiled data, accumulated every second by millions of virtual Google ‘spiders’ that crawl the Web. In many ways, Google’s operation has been reminiscent of I.B.M.’s Deep Blue chess-playing machine, which conquered all human challengers not by playing smarter but by computing faster. Deep Blue won through brute force and not by thinking like humans do. The computer was all power, no finesse.”

I’ve made the same point before about IBM’s “smart” machines that have taken on and bested their human counterparts: they rely on brute force rather than finesse. Google’s Knowledge Graph uses a more sophisticated approach to gather and present knowledge. The following video provides a brief overview of what Google is trying to accomplish.

I’m obviously a fan of this approach because my company, Enterra Solutions, uses a similar approach to create a Knowledge Base for clients that takes advantage of a sophisticated ontology and artificial intelligence. Terrence O’Brien notes, “The ability to discern your intended search goal and present you with relevant information immediately, as opposed to just a page of links, is the next step in search technology and the secret sauce powering the somewhat creepy Google Now in Jelly Bean.” [“Google Knowledge Graph coming to all English-speaking nations tomorrow, adds lists to results,” Engadget, 8 August 2012] He continues:

“The Graph has also received a few enhancements and tweaks, including the ability to answer queries with collections and lists. So, say you’re looking for rides in Disney World, a thumbnail of every attraction will appear at the top in a horizontally scrollable list. How much longer till our Spanish, French or Chinese speaking pals can get in on the action? That’s anyone’s guess. But, if English is the native tongue of your home, then rest assured your flavor of Google has just gotten a little bit smarter.”

Hartley writes, “Google Inc. believes its latest innovation brings Web surfers a step closer to a truly futuristic experience, one that’s kind of like having Hal 9000 from 2001: A Space Odyssey run your searches — with a profound (but preferably less malicious) understanding of the universe and your place in it.” Undoubtedly some critics will see a malicious intent behind the Knowledge Graph (mostly, that Google will be censoring what information actually makes it to your device). But algorithms are already deciding what information you receive (see my post entitled The Big Data Dialogues, Part 5: Algorithms). Marcus reminded us that the brute force approach is not always bad. He wrote:

“Sometimes, of course, power has its advantages. Google’s immense computing resources have allowed them to revolutionize how classical problems in artificial intelligence are solved. Take the process of spell-checking. One of the features that first made word processing really popular was the automatic spell-checker. Engineers at places like Microsoft catalogued the most common errors that people made, such as doubled letters and transpositions (‘poeple’), and built upon these patterns to make educated guesses about users’ intentions.”

We all know, however, that there is more than one way to solve a problem. Marcus wrote that “Google solves the spelling-correction problem entirely differently — and much more efficiently — by simply looking at a huge database of users correcting their own errors. What did users most often type next after failing to find what they wanted with the word ‘peopple’? Aha, ‘people.’” He continued:

“Google’s algorithm doesn’t know a thing about doubled letters, transpositions, or the psychology of how humans type or spell, only what people tend to type after they make an error. The lesson, it seemed, was that with a big enough database and fast enough computers, human problems could be solved without much insight into the particulars of the human mind.”
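To make the contrast concrete, here is a minimal Python sketch of the log-based idea Marcus describes: count what users typed right after a failed query and suggest the most frequent follow-up. It is an illustration only, not Google's actual algorithm. The function names, the log format, and the toy data are all assumptions invented for this example.

```python
from collections import Counter, defaultdict


def build_correction_table(query_log):
    """Count which query users typed immediately after each query.

    query_log is a list of (failed_query, next_query) pairs. This toy
    format is an assumption for illustration, not a real log schema.
    """
    followups = defaultdict(Counter)
    for failed, retry in query_log:
        if failed != retry:  # ignore users who simply repeated the query
            followups[failed][retry] += 1
    return followups


def suggest(followups, query):
    """Return the most frequent follow-up to `query`, or None if unseen."""
    counts = followups.get(query)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical log entries, made up for this sketch.
TOY_LOG = [
    ("peopple", "people"),
    ("peopple", "people"),
    ("peopple", "purple"),
    ("poeple", "people"),
]

table = build_correction_table(TOY_LOG)
print(suggest(table, "peopple"))  # prints: people
```

Notice that nothing in the sketch knows about doubled letters or transpositions. The suggestion comes entirely from what other users did next, which is exactly why the approach needs a very large log to work well.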

That’s because machines running artificial intelligence programs can learn patterns from data on their own. Marcus reported that “for the last decade, most work in artificial intelligence has been dominated by approaches similar to Google’s: bigger and faster machines with larger and larger databases.” But he also wrote that “no matter how capacious your database is, the world is complicated, and data dredging alone is not enough.” He noted:

“Even in a Web search, Google’s bread and butter, brute force is defeated often, and annoyingly, by the problem of homonyms. The word ‘Boston,’ for instance, can refer to a city in Massachusetts or to a band; ‘Paris’ can refer to the city or to an exhibitionist socialite. To deal with the ‘Paris’ problem, Google Knowledge Search revives an idea first developed in the nineteen-fifties and sixties, known as semantic networks, that was a first guess at how the human mind might encode information in the brain. In place of simple associations between words, these networks encode relationships between unique entities. Paris the place and Paris the person get different unique I.D.s — sort of like bar codes or Social Security numbers — and simple associations are replaced by (or supplemented by) annotated taxonomies that encode relationships between entities. So, ‘Paris1’ (the city) is connected to the Eiffel tower by a ‘contains’ relationship, while ‘Paris2’ (the person) is connected to various reality shows by a ‘cancelled’ relationship. As all the places, persons, and relationships get connected to each other, these networks start to resemble vast spiderwebs. In essence, Google is now attempting to reshape the Internet and provide its spiders with a smarter Web to crawl.”
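The structure Marcus describes, unique entity I.D.s joined by labeled relationships, can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not a description of how the Knowledge Graph is built. The class name, the method names, and the identifiers "EiffelTower" and "RealityShow1" are hypothetical; only "Paris1", "Paris2", "contains", and "cancelled" come from the passage above.

```python
from collections import defaultdict


class SemanticNetwork:
    """A toy semantic network: unique entity IDs plus labeled relationships."""

    def __init__(self):
        self.kinds = {}                  # entity ID -> kind of thing it is
        self.names = defaultdict(set)    # lowercased word -> entity IDs
        self.edges = defaultdict(list)   # entity ID -> [(relation, target ID)]

    def add_entity(self, entity_id, name, kind):
        """Register an entity under a unique ID and a (possibly shared) name."""
        self.kinds[entity_id] = kind
        self.names[name.lower()].add(entity_id)

    def relate(self, source_id, relation, target_id):
        """Record a labeled, directed relationship between two entities."""
        self.edges[source_id].append((relation, target_id))

    def candidates(self, word):
        """Return every entity ID that a bare word could refer to."""
        return sorted(self.names.get(word.lower(), set()))

    def related(self, entity_id, relation):
        """Return the targets linked to an entity by the given relation."""
        return [t for r, t in self.edges.get(entity_id, []) if r == relation]


net = SemanticNetwork()
net.add_entity("Paris1", "Paris", "city")
net.add_entity("Paris2", "Paris", "person")
net.add_entity("EiffelTower", "Eiffel Tower", "landmark")    # hypothetical ID
net.add_entity("RealityShow1", "a reality show", "tv show")  # placeholder entity
net.relate("Paris1", "contains", "EiffelTower")
net.relate("Paris2", "cancelled", "RealityShow1")

print(net.candidates("Paris"))            # prints: ['Paris1', 'Paris2']
print(net.related("Paris1", "contains"))  # prints: ['EiffelTower']
```

The word "Paris" alone maps to two candidates, which is the homonym problem in miniature. Once a query is tied to one I.D., the relationships attached to that I.D. supply the context that a plain word match lacks.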

Ben Gomes, Google’s search vice-president, told Hartley, “Just based on the word, it’s not really possible to know what you’re talking about. So, what we embarked on was a quest to understand the real-world objects that underlie these words. In many ways, this is one of the biggest launches Google has ever done. It touches every part of search, and it’s just the beginning of a long trajectory in front of us, turning search into something that understands and translates your words into the real-world entities you’re talking about.” Marcus reported that research into semantic networks has been ongoing for decades. During the 1980s, he noted, neural networks received more attention because they more closely imitated how human brains function. He added, however, “Neural nets apply a blanket learning rule that treats all associations as equal, differentiated only by how often they appear in the world.” He continued:

“This battle between structured knowledge and huge databases of statistics echoes one of the longest debates in psychology and philosophy, the debate between ‘nativists’ (like Plato, Kant, and, in recent times, Noam Chomsky and Steve Pinker) that believe the mind comes equipped with important basic knowledge, and ‘empiricists’ (like John Locke and B. F. Skinner) who believed the mind starts as blank slate, with virtually all knowledge acquired through association and experience.”

The reason that the Knowledge Graph represents a break with Google’s past, Marcus noted, is that “Google used to be essentially an empiricist machine, crafted with almost no intrinsic knowledge, but endowed with an enormous capacity to learn associations between individual bits of information.” With the introduction of the Knowledge Graph, he writes, “Google is becoming something else, a rapprochement between nativism and empiricism, a machine that combines the great statistical power empiricists have always yearned for with an enormous built-in database of the structured categories of persons, places, and things, much as nativists might have liked.” He continued:

“There’s very good reason for Google to move in this direction. As the pioneering developmental psychologist Elizabeth Spelke … put it: ‘If children are endowed [innately] with abilities to perceive objects, persons, sets, and places, then they may use their perceptual experience to learn about the properties and behaviors of such entities… It is far from clear how children could learn anything about the entities in a domain, however, if they could not single out those entities in their surroundings.’ The same goes for computers.”

Obviously, Marcus believes that Google is moving in the right direction. He concluded:

“I personally have long envied the way in which Google and its main competitor, Microsoft’s Bing, keep so much information at their virtual fingertips, but we humans have a few tricks left. It’s refreshing to see that computer engineers still occasionally need to steal a page from the human mind.”

In past posts about innovation, I’ve noted that humans are still learning to appreciate how millions of years of evolution have allowed animals and plants to create solutions to some of the knottiest challenges they face in nature. How to deal with data is another area where I suspect we will come to understand that nature has more lessons to teach us.