A Model of Vertical Crawler Based on Hidden Markov Chain

Abstract: The large size and dynamic nature of the Web make it necessary to continually maintain Web-based information retrieval systems. To collect more target objects while visiting fewer irrelevant web pages, a web crawler usually adopts a heuristic search strategy that ranks URLs by importance and preferentially visits the more important pages. While some systems rely on crawlers that exhaustively crawl the Web, others incorporate “focus” within their crawlers to harvest application- or topic-specific collections. In this paper, we use the learning ability of the Hidden Markov Model (HMM) to address the problem of topic drift in the crawler, and obtain a certain degree of improvement.
Keywords: hidden Markov model, crawler, uniform resource locator
Authors: Ye Hu, Jun Tu, Wangyu Tong
Journal Code: jptkomputergg140104
