SEO Goodies - June 29, 2005 6:36 pm

Most people are familiar with the traditional keyword search used by today’s popular search engines.

In the traditional keyword search approach, a document collection is scanned with an accountant’s mentality: either a document contains the typed keyword or it doesn’t. There is no middle ground.

The result set is created by looking through each document for the typed keywords and phrases, ignoring any documents that don’t contain them, and ordering what remains with some ranking algorithm. Each document that contains the keyword stands alone in judgment before the search algorithm - there is no interdependence of any kind between documents, which are evaluated solely on their own contents.
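To make that concrete, here is a minimal Python sketch of such an all-or-nothing search. The function name `keyword_search`, the sample documents, and the ranking rule (simply counting how often the keyword occurs) are my own assumptions for illustration; real search engines rank in far more elaborate ways.

```python
def keyword_search(query, documents):
    """Return (document index, score) pairs for documents containing every query word."""
    terms = query.lower().split()
    results = []
    for i, text in enumerate(documents):
        words = text.lower().split()
        # Accountant mentality: the document contains the keyword or it doesn't.
        if all(t in words for t in terms):
            # Hypothetical ranking rule: total occurrences of the query words.
            score = sum(words.count(t) for t in terms)
            results.append((i, score))
    # Highest score first; each document was judged on its own contents only.
    return sorted(results, key=lambda r: r[1], reverse=True)


docs = [
    "laptop battery life and laptop weight",
    "notebook computers with long battery life",
    "choosing a laptop for travel",
]
hits = keyword_search("laptop", docs)
print(hits)  # [(0, 2), (2, 1)]
```

The second document never shows up for "laptop", even though it is about the same kind of machine, because the literal word is missing from it.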

Traditional keyword search approaches like this are popular, but they are far from delivering the desired results, as anyone who has used a Web search engine will vouch. One important aspect of the problem is relevancy: on average, 50% of the information retrieved will be irrelevant.

The primary reason relevant information gets missed is that there are surprisingly many different ways to describe an idea or concept. If a document author uses one word and a searcher another, relevant material will be missed. For example, a simple query about “laptop” computers will ignore articles about “portable” or “lightweight” or “notebook” or “palmtop” or “ThinkPad” computers. Searchers and authors alike find it very difficult to anticipate the many ways in which the same idea might be described.

To overcome the limits of the traditional keyword search approach, a concept-based retrieval method is being put forward as the answer. It overcomes many of the problems in today’s popular word-based retrieval systems. This method is called Latent Semantic Indexing (LSI).

LSI is fully automatic and widely applicable, and has been shown to be 30% more effective at finding and ranking relevant items than comparable word-matching methods.

LSI adds an important step to the document indexing process of word-based retrieval systems. In addition to recording which keywords a document contains, it looks at the document collection as a whole to determine which other documents contain some of those same words, and then assigns similarity values to the words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant.
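As a rough illustration of "many words in common", the sketch below scores two documents by the cosine of their word-count vectors. This is only the word-overlap idea from the paragraph above, not the full LSI computation, which goes further and applies a truncated singular value decomposition to the term-document matrix. The helper names and sample texts are made up for the example.

```python
import math
from collections import Counter


def word_counts(text):
    """Count how often each word occurs in a piece of text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0 means no shared words)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "laptop battery life",
    "notebook battery life",
    "apple pie recipe",
]
vectors = [word_counts(d) for d in docs]
close = cosine_similarity(vectors[0], vectors[1])    # two of three words shared
distant = cosine_similarity(vectors[0], vectors[2])  # no words shared
print(round(close, 3), distant)  # 0.667 0.0
```

Note that the first two documents come out as close even though one says "laptop" and the other "notebook": the words around them do the work.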

When a user searches an LSI-indexed database, the search engine looks at the similarity values it has calculated for every content word and returns the documents that it thinks best fit the typed query. Because two documents may be semantically very close even if they do not share a particular keyword, LSI does not require an exact match to return useful results. Where a plain keyword search fails if there is no exact match, LSI will often return relevant documents that don’t contain the typed keyword at all.
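Here is a toy sketch of that behaviour in plain Python, under several assumptions of mine: raw word counts as the term-document matrix, two "concepts" (k=2), and a simple power iteration standing in for a proper SVD routine. The function names and the four sample documents are hypothetical; this shows the general technique, not any particular search engine's implementation.

```python
import math


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw word counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.lower().split():
            matrix[row[w]][j] += 1.0
    return row, matrix


def top_eigenpairs(sym, k, iterations=500):
    """Top-k (eigenvalue, eigenvector) pairs of a symmetric positive semidefinite
    matrix via power iteration with deflation. Assumes k <= rank of the matrix."""
    n = len(sym)
    work = [r[:] for r in sym]
    pairs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed start keeps runs repeatable
        for _step in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * work[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def lsi_scores(query, docs, k=2):
    """Cosine similarity between the query and each document in a k-concept space."""
    row, a = term_document_matrix(docs)
    n, terms = len(docs), range(len(row))
    # Eigenvectors of A^T A are the right singular vectors of A.
    gram = [[sum(a[t][i] * a[t][j] for t in terms) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    q = [0.0] * len(row)
    for w in query.lower().split():
        if w in row:
            q[row[w]] += 1.0
    # Fold the query into concept space: Sigma^-1 U^T q = Sigma^-2 V^T (A^T q).
    atq = [sum(a[t][j] * q[t] for t in terms) for j in range(n)]
    q_hat = [sum(v[j] * atq[j] for j in range(n)) / lam for lam, v in pairs]
    return [cosine(q_hat, [v[j] for _, v in pairs]) for j in range(n)]


docs = ["laptop battery", "notebook battery", "apple pie recipe", "pie recipe"]
scores = lsi_scores("laptop", docs)
for d, s in zip(docs, scores):
    print(f"{s:6.3f}  {d}")
```

On this toy collection the two computer documents should both score close to 1 for the query "laptop" and the two cooking documents close to 0, even though "notebook battery" never mentions the word "laptop". The shared word "battery" is what ties the two together in concept space.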

This simple method of recognizing associations between keywords more or less reflects how a human being might classify a document collection after scanning its content. The LSI algorithm doesn’t understand anything about what the words mean (it is, after all, just human-written code), but the patterns it recognizes can make it look as if it is showing a degree of artificial intelligence.

SEO Goodies - May 8, 2005 4:21 pm

The other day I was reading a research paper on link popularity from MarketingSherpa titled “How to (Really) Gain Link Popularity: 5 Mistakes & 3 Proven Tactics”.

Everyone seems to know the importance of link popularity, and everyone is obsessed with getting a better PR (PageRank) whenever Google updates its backlinks, which it does once in a while.

I’ve seen a wide range of personal reactions to Google’s latest backlink update last week. One of the reactions was: “I hate Google. My PR dropped from PR5 to PR3”. :(

But has anyone wondered about the real reason why the PR dropped from PR5 to PR3? Surely Google doesn’t hold a personal grudge against your website or, for that matter, against anyone else’s.

Google is very choosy about attaching importance to the links on a given website. It doesn’t pass a link’s full importance straight through to a site’s PR. Instead, it passes on a percentage of that importance and keeps the link on hold for a while until the link matures, and only then passes on the full importance.