The future of the Internet: Real-time Web Search and the Semantic Web

The big boom in search engine effectiveness and usage, which led to the rise of Google, took place in 1998, when research on the use of link analysis in search engine algorithms led to a new breed of search engines. Nowadays, all major search engines incorporate link analysis in their ranking algorithms.

What is the concept behind link analysis?

Since there is no central authority on the web to judge content quality, an incoming link from a relevant page of a certain quality is treated as the equivalent of a citation in a recognized academic paper. Google’s dominant position in the search engine market, together with webmasters’ efforts at search engine optimization, has attracted enormous attention to Google’s link analysis algorithm (called PageRank) and its inner workings.

Most popular engines currently provide search as follows. Crawlers visit registered sites periodically, at a frequency that depends on their PageRank: pages with a low PageRank are crawled less often than pages with a high PageRank. This is how search engines keep the huge computational load of crawling manageable. Crawling produces an index of the content of the crawled pages, which is stored in the search engine’s database. When a keyword or key phrase is entered into a search engine, the search runs against this index of pages to find relevant content. The relevant content retrieved may amount to thousands or millions of pages. How is this presented to the user?

This is where the PageRank score comes into play: pages with a higher PageRank are placed higher on the search engine results pages. Older search engine algorithms ranked the retrieved pages only by their keyword density for the selected keyword.

News websites, which update their content frequently, do not work well with index-based search engines, because the indexed information may become obsolete at any moment. Databases accessed through dynamic web pages are, in fact, inaccessible via search engines, since crawlers cannot reach them. Computing PageRank is also demanding: even the score of a single page depends on the scores of the pages linking to it, and through them on the link structure of the wider web graph. Intensive research has been under way to analyze the PageRank algorithm and find ways to compute it efficiently. Improved link-analysis algorithms, combined with brute computational force, may gradually increase the ‘freshness’ of the data accessed via search engines, even as the number of web pages keeps growing. Still, the holy grail of real-time or near-real-time web search may not be the only direction in which the Internet evolves.
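The ranking step and the PageRank computation discussed above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not any search engine’s actual implementation: the four-page link graph and the page texts are hypothetical, the damping factor of 0.85 is the value commonly quoted for PageRank, and rank held by pages without outgoing links is spread evenly over all pages. The hypothetical `matching_pages` function stands in for the index lookup, and the retrieved pages are then ordered either by PageRank or by keyword density.

```python
"""Toy sketch: PageRank by power iteration, then two ways to order search hits.

The link graph and page texts are hypothetical; damping=0.85 is the
commonly quoted PageRank default.
"""


def pagerank(links, damping=0.85, tol=1e-12, max_iter=200):
    """Return {page: score} for a dict mapping each page to its outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to every page it links to,
        # much like a citation passing credit to the cited paper.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the scores no longer change noticeably
            break
    return rank


def keyword_density(text, keyword):
    """Fraction of the words in `text` that equal `keyword` (case-insensitive)."""
    words = text.lower().split()
    return words.count(keyword.lower()) / len(words) if words else 0.0


def matching_pages(texts, keyword):
    """Pages whose text contains the keyword (a stand-in for an index lookup)."""
    kw = keyword.lower()
    return [page for page, text in texts.items() if kw in text.lower().split()]


if __name__ == "__main__":
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    texts = {
        "a": "search engines rank pages by links",
        "b": "search search search tips",  # keyword stuffing
        "c": "a search engine index of the web",
        "d": "news about the weather",
    }
    scores = pagerank(links)
    hits = matching_pages(texts, "search")
    print(sorted(hits, key=lambda p: scores[p], reverse=True))  # ['c', 'a', 'b']
    print(sorted(hits, key=lambda p: keyword_density(texts[p], "search"), reverse=True))  # ['b', 'a', 'c']
```

Ordering by keyword density puts page `b` first simply because it repeats the keyword, while PageRank favors page `c`, which the other pages link to.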

A different direction is the effort to give web content more structure through taxonomies and semantics (the Semantic Web), which can improve the search experience by letting users focus on their area of interest. However, a static, tree-like website structure is not always effective, since content topics change over time. Most sites can only afford to maintain a very simple content structure, which cannot reflect a detailed classification. Moreover, content classifications vary from site to site, since there is no common standard, and nothing ensures that semantics are used appropriately.
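How a taxonomy narrows a search to an area of interest can be sketched in a few lines. This is a hypothetical illustration, not a Semantic Web standard: it assumes each page is classified under a slash-separated category path, and the page ids, paths and texts are made up. A keyword search can then be limited to one branch of the tree.

```python
"""Toy sketch: restricting a keyword search to one branch of a site taxonomy.

The category paths and page texts are hypothetical examples.
"""

# Each page is classified under a slash-separated category path (hypothetical).
PAGES = {
    "jaguar-habitat": ("nature/animals/cats", "the jaguar hunts in the rainforest"),
    "jaguar-xk": ("vehicles/cars", "the jaguar xk is a sports car"),
    "leopard-spots": ("nature/animals/cats", "leopard spots and jaguar rosettes"),
}


def in_branch(path, branch):
    """True if `path` equals `branch` or lies beneath it in the tree."""
    return path == branch or path.startswith(branch + "/")


def search(keyword, branch=None):
    """Return ids of pages containing `keyword`, optionally limited to a taxonomy branch."""
    kw = keyword.lower()
    results = []
    for page_id, (path, text) in PAGES.items():
        if kw in text.lower().split() and (branch is None or in_branch(path, branch)):
            results.append(page_id)
    return results


if __name__ == "__main__":
    print(search("jaguar"))            # ['jaguar-habitat', 'jaguar-xk', 'leopard-spots']
    print(search("jaguar", "nature"))  # ['jaguar-habitat', 'leopard-spots']
```

The filter only helps when sites share the same tree: a page filed under a hypothetical `wildlife/felines` path would be missed by the `nature` filter, which is the lack of a common classification standard noted above.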
