Search engines store web pages in large databases and apply a number of
techniques to retrieve the stored data. One such data retrieval technique is
Latent Semantic Indexing, popularly known as “LSI”. It is based on a
mathematical technique known as singular value decomposition (SVD) and is
mainly used to find words that are used in the same context. This helps to
extract the conceptual content of a body of text by noticing connections
between words, for example synonyms.
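To make the idea concrete, here is a minimal sketch in Python, assuming NumPy is available. The four tiny documents and all variable names are made up for illustration and are not how any particular search engine implements LSI. In this toy corpus "dog" and "canine" never appear in the same document, but both appear alongside "pet", so a truncated SVD should place them close together.

```python
import numpy as np

# Hypothetical toy corpus: "dog" and "canine" never share a document,
# but both appear alongside "pet".
docs = ["dog pet", "canine pet", "car engine", "car engine repair"]
terms = sorted({word for doc in docs for word in doc.split()})

# Term-document count matrix: one row per term, one column per document.
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Singular value decomposition, truncated to the k strongest "concepts".
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]  # each row places one term in the k-dimensional concept space

dog, canine, car = (terms.index(t) for t in ("dog", "canine", "car"))
raw_sim = cosine(A[dog], A[canine])                      # similarity on raw counts
latent_sim = cosine(term_vecs[dog], term_vecs[canine])   # similarity after SVD
unrelated_sim = cosine(term_vecs[dog], term_vecs[car])   # a term from the other topic

print("dog vs canine, raw counts:   ", round(raw_sim, 3))
print("dog vs canine, concept space:", round(latent_sim, 3))
print("dog vs car, concept space:   ", round(unrelated_sim, 3))
```

On the raw counts the two words have zero similarity because they share no document. In the reduced concept space their similarity is close to 1, while "dog" and "car" stay unrelated. The choice of k = 2 is an assumption that suits this toy corpus; real collections use far more dimensions.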
Example of Latent Semantic Indexing
An easier way to understand this concept is through the following example:
Suppose we have 2 different web pages containing information related to dog
food. The main content (the semantically related words used) of the 2 pages is
as follows:
Page A - Words used: dogs, dog food, meat, diet, pedigree, foods, breed,
breeding, canine, meds, cat
Page B - Words used: dogs, dog meal, pets, dog food information, pet food,
nutrition, dog health, breeders, Great Dane, German Shepherd, Pug, Cocker
Spaniel, grains, meats, Quaker Oats, bone, meal, raw food, samples, biscuits,
wheat gluten, Meat Inspection Act, etc.
When these 2 pages are retrieved from the database, Page B would be judged more
relevant to the user query “dog food”, because it contains a larger number of
words that LSI associates with the topic of the query.
Please note: LSI often returns relevant documents that don't contain the
keyword at all.
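The note above can be illustrated with a small sketch in Python, again assuming NumPy. The corpus, the query and the helper names (`to_concepts`, `cosine`) are hypothetical and chosen only to show the effect; this is not a description of a production search engine. The query is projected into the same reduced concept space as the documents, and documents are scored by cosine similarity instead of by exact keyword match.

```python
import numpy as np

# Hypothetical toy corpus: document 1 is about dog food but never uses the word "dog".
docs = ["dog food", "canine food", "car engine", "car engine repair"]
terms = sorted({word for doc in docs for word in doc.split()})

# Term-document count matrix: one row per term, one column per document.
A = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)

# Keep only the k strongest concepts found by the SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k = U[:, :k]


def to_concepts(text):
    """Project a piece of text (query or document) into the k-dimensional concept space."""
    counts = np.array([text.split().count(t) for t in terms], dtype=float)
    return U_k.T @ counts


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


query = "dog"
q_vec = to_concepts(query)

# Plain keyword matching: only documents that literally contain the query word.
keyword_hits = [i for i, doc in enumerate(docs) if query in doc.split()]

# LSI-style matching: similarity between query and documents in concept space.
scores = [cosine(q_vec, to_concepts(doc)) for doc in docs]

for doc, score in zip(docs, scores):
    print(f"{doc!r}: {score:.3f}")
```

Keyword matching finds only the first document here, whereas the concept-space score for "canine food" is also close to 1 even though that document does not contain the word "dog". The two car-related documents score close to 0.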