Importance of Latent Semantic Indexing (LSI) in SEO: latent semantic content

Latent Semantic Indexing (LSI) is a method employed by Google when indexing and retrieving documents on the web in order to return more relevant results for a search query. Given the sheer size of the Internet, the relevant information we search for may be buried among irrelevant pages. This is where latent semantic content comes in very handy.

Latent Semantic Indexing was originally used in Google’s AdSense program to select the most relevant ads for a particular web site. In its effort to use latent semantic indexing concepts and ideas in its search rankings, Google even bought a company based in Santa Monica, California, called Applied Semantics* in 2003 for US$102 million (source: Wikipedia). The company had developed a proprietary search algorithm that was based on word meanings and built upon an underlying lexical database for the English language called WordNet (created and maintained at Princeton University).

*Semantics is the study of word meaning. Applied semantics (in linguistics) is the study and application of semantics, for example, in advertising, text analysis, web traffic, search engine queries, web page rankings, word indexing.

Latent Semantic Content

Meaning of the words contained in this term (according to the Free Online Dictionary by Farlex):

  • Latent: “present or potential but not evident or active” (synonyms: potential, possible),
  • Semantic: “Of or relating to meaning, especially meaning in language”,
  • Content: “The material, including text and images, that constitutes a publication or document.”

As the name suggests, the term refers to semantically related words that are latent in a collection of text.

Simply said, latent semantic content refers to words that are thematically connected.

For example, “The Big 3” (Dwyane Wade, LeBron James, and Chris Bosh, 2010-present), Florida, and basketball are all thematically connected to the Miami Heat. If the topic of your web page is the Miami Heat, the page doesn’t need the keyword Miami Heat repeated numerous times in its content in order to rank for that term.

Search engines consider a web site’s overall theme (that is, how many of its pages relate to the same topic, including internal links), and then rank the page based on relevance and authority, not on targeted keyword density.

Latent semantic content allows a content writer to create content relevance through thematically linked terms while maintaining the organic, natural flow of content.

How to Find & Use Latent Semantic Content

The best two sources for latent semantic keyword research are Google’s Latent Semantic Tool and Quintura.com.

Google’s Latent Semantic Tool

Navigate to www.google.com

Type ~example as your search term.

Example #1:

Type ~car as your search term.

Note the bolded words and other words appearing in link titles and descriptions: BMW North America, automobiles, new & used cars for sale, auto dealers, car reviews, auto trader, BMW UK (see Figure 1).

Figure 1

Example #2:

Type ~vehicle as your search term.

Note the bolded words and other words appearing in link titles and descriptions: new & used cars for sale, auto dealers, car reviews, auto trader, cars, bikes, vans, trucks and caravans, car prices…auto leasing (see Figure 2).

Figure 2

Example #3:

Type ~snow as your search term.

Note the bolded words and other words appearing in link titles and descriptions: make-a-flake (a snowflake maker), snowboards, snowboarding, list of resorts, ski, ski clothing & equipment retailer, ski vacations, and ski resorts (see Figure 3).

Figure 3

Quintura.com

Navigate to www.quintura.com.

Example #1:

Type car as your search term.

Note the words in the red frame; in the world of organic search, they are considered thematically related to the word car (surprised to see smell, research or carrier in the list?) (see Figure 4).

Figure 4

Example #2:

Type vehicle as your search term.

Note the words in the red frame; they are considered thematically related to the word vehicle (e.g. new, history, shipping, etc.) (see Figure 5).

Figure 5

Example #3:

Type snow as your search term.

Note the words in the red frame; they are considered thematically related to the word snow (e.g. gun, shovel) (see Figure 6).

Figure 6

NOTE!

If you click any of these related terms, you will be presented with another list on a deeper level introducing the latent semantic words for the term you have chosen. For example, if you click snow → shovel, you’ll find related terms such as square shovel, ergonomic, garden, electric, etc. (see Figure 7).

What is Latent Semantic Indexing (LSI)?

LSI is a method for retrieving documents that are relevant to a search even if they do not contain the specific keyword entered by a user. It allows the search engine to return documents that do not match our exact search phrase, but which are still relevant to our search. A key characteristic of latent semantic indexing is its ability to extract the conceptual content of a body of text by establishing associations between words or groups of words that occur in similar contexts.
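The idea of associating words that occur in similar contexts can be illustrated with a minimal sketch. It assumes a hypothetical five-document toy corpus and uses only the Python standard library; a plain power iteration stands in for the singular value decomposition that LSI is normally built on. The names DOCS, top_eigenpairs, and to_latent are made up for this illustration and do not describe how any search engine actually implements LSI.

```python
import math

# Hypothetical toy corpus: "automobile" and "car" never appear in the same
# document, but both appear next to "engine".
DOCS = [
    "car engine",
    "automobile engine",
    "car wheel",
    "snow ski",
    "snow resort",
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    norm = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / norm if norm else 0.0

def top_eigenpairs(matrix, k, iterations=500):
    """Top-k eigenpairs of a symmetric matrix via power iteration with deflation."""
    m = [row[:] for row in matrix]
    n = len(m)
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iterations):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        value = dot(v, [dot(row, v) for row in m])
        pairs.append((value, v))
        # Remove the component just found so the next pass finds the next one.
        m = [[m[i][j] - value * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

VOCAB = sorted({word for doc in DOCS for word in doc.split()})

def term_vector(text):
    """Raw term counts over the corpus vocabulary."""
    words = text.lower().split()
    return [words.count(term) for term in VOCAB]

DOC_VECTORS = [term_vector(doc) for doc in DOCS]
# Document-by-document matrix of shared term counts (A^T A).
GRAM = [[dot(a, b) for b in DOC_VECTORS] for a in DOC_VECTORS]

# Each "concept axis" is a weighted mix of terms (a left singular vector).
AXES = []
for value, v in top_eigenpairs(GRAM, k=2):
    sigma = math.sqrt(value)
    AXES.append([
        sum(DOC_VECTORS[j][t] * v[j] for j in range(len(DOCS))) / sigma
        for t in range(len(VOCAB))
    ])

def to_latent(text):
    """Project a query or document onto the two concept axes."""
    vec = term_vector(text)
    return [dot(vec, axis) for axis in AXES]

query = "automobile"
for doc in DOCS:
    raw = cosine(term_vector(query), term_vector(doc))
    latent = cosine(to_latent(query), to_latent(doc))
    print(f"{doc!r}: keyword match {raw:.2f}, latent match {latent:.2f}")
```

In this toy setup the query automobile shares no word with the document "car wheel", so a plain keyword comparison scores it at zero, while in the two-concept space the two end up pointing in almost the same direction. The snow documents stay unrelated in both views.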

The Purpose of LSI

Latent semantic indexing has proved to be a very helpful method of solving a number of conceptual matching problems. Based on a mathematical technique and on how we use language in real life, LSI is used to:

1) perform automated document categorization

Crawlers, spiders, robots, or “bots” assign documents to one or more categories based on their similarity to the conceptual content of those categories. Latent semantic indexing either uses example documents to set up a conceptual basis for each category, or groups documents by their conceptual similarity to each other without example documents, using so-called dynamic clustering (which is very useful when working with an unknown collection of unstructured text).
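Categorization against example documents can be sketched as follows. This is a simplified illustration that compares raw word counts rather than LSI concept vectors; the category names, the example texts in CATEGORY_EXAMPLES, and the function categorize are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical example documents, one per category.
CATEGORY_EXAMPLES = {
    "cars": "car engine wheel dealer",
    "winter sports": "snow ski resort snowboard",
}

def bag_of_words(text):
    return Counter(text.lower().split())

def cosine(a, b):
    shared = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(c * c for c in a.values()) * sum(c * c for c in b.values()))
    return shared / norm if norm else 0.0

def categorize(text):
    """Return the category whose example document is most similar to the text."""
    doc = bag_of_words(text)
    return max(
        CATEGORY_EXAMPLES,
        key=lambda name: cosine(doc, bag_of_words(CATEGORY_EXAMPLES[name])),
    )

print(categorize("used car dealer prices"))
print(categorize("ski resort with fresh snow"))
```

A real LSI system would compare documents in the reduced concept space instead, so that a text about automobiles could still land in the cars category without sharing any exact word with the example.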

2) overcome the constraints of synonymy and polysemy

Two of the most problematic constraints of Boolean keyword queries are synonymy (multiple words with similar meanings) and polysemy (words with more than one meaning).

For example:

  1. answer and reply are synonyms
  2. apple can refer to the fruit or to the computer company and its products, such as QuickTime or the Mac.

Generally, synonyms and words with multiple meanings cause mismatches between the vocabulary used by content writers and the vocabulary used by people searching an information retrieval system. As a result, Boolean keyword queries often miss relevant information and return irrelevant results.
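Both problems are easy to reproduce with a minimal sketch of a Boolean AND query. The four documents and the function boolean_and_search are hypothetical and exist only for this illustration.

```python
# Hypothetical mini-collection of documents.
DOCUMENTS = [
    "please reply to my email",
    "the answer to your email is attached",
    "apple pie recipe with cinnamon",
    "apple releases a new mac computer",
]

def boolean_and_search(query, documents):
    """Return the documents that contain every word of the query."""
    terms = set(query.lower().split())
    return [doc for doc in documents if terms <= set(doc.lower().split())]

# Synonymy: the second document talks about an answer, so it is missed.
print(boolean_and_search("reply email", DOCUMENTS))

# Polysemy: the fruit and the computer company are both returned.
print(boolean_and_search("apple", DOCUMENTS))
```

The first query misses a relevant document because it says answer instead of reply, and the second query cannot tell which meaning of apple the user had in mind.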

3) perform cross-linguistic concept searching

A query made in one language can return conceptually similar documents written in another language, or in several languages, even when there are no results in the language of the query.

4) deal effectively with sparse, ambiguous, and contradictory data

LSI copes with noise in the text such as unreadable characters, misspelled words, and typographical errors.

Other uses of latent semantic indexing include customer management (for example, online customer support), automatic keyword annotation of images, software engineering (understanding software source code), and system administration (filtering spam).

Final notes on LSI

Anchor text when acquiring backlinks

One of the top Internet marketing experts, Kristopher B. Jones, strongly recommends mixing up your anchor text when acquiring backlinks. If your web page deals with search engine optimization, but the only anchor text you use in backlinks is “search engine optimization,” search engines may see this as unnatural and penalize your web page. Instead, you should use various words in the field of search engine optimization. Anchor text such as SEO, keyword research, linking strategy, etc. enriches your link portfolio.

How search engines view web pages

LSI looks at word patterns across documents on a statistical and mathematical basis, and although it does focus on word meaning, it doesn’t pay attention to every single word on a web page. Some words carry meaning (content words such as car, vehicle, snow, apple, computer, love, devotion, etc.), but some don’t (function words such as a, an, the, but, to, etc.).

LSI ignores function words and irrelevant terms, and it focuses on terms with semantic meaning related to the topic.
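In practice, function words are usually filtered out in a preprocessing step before any statistics are computed. A minimal sketch, assuming a small hand-picked list (FUNCTION_WORDS and content_words are hypothetical names, and real systems use much longer lists):

```python
import re

# Hypothetical, deliberately short list of function words.
FUNCTION_WORDS = {"a", "an", "the", "but", "to", "is", "and", "of", "in", "on"}

def content_words(text):
    """Lowercase the text, split it into words, and drop function words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in FUNCTION_WORDS]

print(content_words("The car is a vehicle, but the snow is cold"))
```

Only the words that remain after this step would be counted when looking for patterns across documents.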

Conclusion

Although Google doesn’t rely entirely on LSI for finding relevant results, the use of latent semantic indexing has significantly increased in recent years. The idea of working with text on a semantic basis is now seen as imperative to modern information retrieval systems.
