A Model of Vertical Crawler Based on Hidden Markov Chain
Abstract: The large size and dynamic nature of the Web make it necessary to
continually maintain Web-based information retrieval systems. To collect more
relevant objects while visiting fewer irrelevant web pages, a web crawler
usually adopts a heuristic search strategy that ranks URLs by importance and
visits the more important pages first. While some systems rely on crawlers that
exhaustively crawl the Web, others incorporate “focus” within their crawlers to
harvest application- or topic-specific collections. This paper uses the
learning ability of the Hidden Markov Model (HMM) to address the problem of
topic drift in the crawler, and the approach proves effective to some degree.
Author: Ye Hu, Jun Tu, Wangyu Tong
Journal Code: jptkomputergg140104