Using Reinforcement Learning to Spider the Web Efficiently

One of the major issues faced by the crawlers is that it is increasingly arduous to learn that some sets of off-topic documents often lead reliably to highly relevant documents. An important observation of the topic-specific spidering is that the environment presents situations with delayed reward – this makes reinforcement learning the appropriate framework.

To help explain how reinforcement learning relates to spidering, the on-topic documents are immediate rewards. An action is following a particular hyperlink and the state is the set of on-topic documents remaining to be found and the set of hyperlinks to be discovered. One preliminary observation is that the state space is huge and difficult to generalize as it encompasses not only the on-topic documents remaining to be explored but also the set of the hyperlinks which serve as possible actions. Also, the number of the available actions is large and difficult to generalize if we consider the number of distinct hyperlinks within the Web as our action space, as described earlier.

To explicitly address this problem, Rennie and McCallum do not model the state space and instead, capture relevant distinctions between the actions using only the words in the neighborhood of the corresponding hyperlink. For every action, a single Q-value is determined by calculating the discounted sum of rewards received by following the optimal policy after traversing the given hyperlink. Thus the Q-value becomes a mapping from a “bag of words” to a scalar.

Efficient spidering is achieved when this mapping is learnt from a training dataset containing the bag-of-words/Q-value pairs. The authors chose greedy search, which yields in following hyperlinks to maximize immediate reward.

The paper also discusses a method for using Naive Bayes as a function approximator. In addition, the text classification task is built on a Bayesian learning framework, where the text data is generated by a parametric model and the training data(BOW/Q-value tuples) is used to calculate the MAP estimates of the model parameters. Each class is modeled by a mutlinomial over words. Therefore, classification essentially is the simple matter of selecting the most probable class given the document’s words.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

webSpidering.md

webSpidering.md

Using Reinforcement Learning to Spider the Web Efficiently

Files

webSpidering.md

Latest commit

History

webSpidering.md

File metadata and controls

Using Reinforcement Learning to Spider the Web Efficiently