Detecting Nepotistic Links by Language Model
In this short note we demonstrate the applicability of hyperlink downweighting by means of language model disagreement. The method filters out hyperlinks with no relevance to the target page, without the need for whitelists, blacklists, or human interaction. We fight comment spam in blogs and guestbooks as well as various forms of nepotism such as common maintainers, ads, and link exchanges. Our method is tested on a 31-million-page crawl of the .de domain with a manually classified random sample of 1000 pages.
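As an illustration of what language model disagreement between a link and its target could look like, the following is a minimal sketch and not the exact procedure used in this note. It assumes unigram language models, linear interpolation smoothing against a background corpus, and Kullback-Leibler divergence as the disagreement measure. The function names, the smoothing weight `lam`, and the cutoff `threshold` are hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def smoothed_prob(word, counts, total, background, bg_total, lam):
    """Interpolate a document's unigram estimate with a background model.

    The background estimate uses add-one smoothing so that it is never zero,
    even for words that do not occur in the background corpus.
    """
    p_doc = counts[word] / total if total else 0.0
    p_bg = (background[word] + 1) / (bg_total + len(background) + 1)
    return (1 - lam) * p_doc + lam * p_bg


def disagreement(anchor_text, target_text, background, lam=0.2):
    """KL divergence from the anchor-text model to the smoothed target model.

    The anchor side uses the plain maximum-likelihood estimate, so the sum
    only runs over words that actually occur in the anchor text.
    """
    anchor = Counter(tokenize(anchor_text))
    target = Counter(tokenize(target_text))
    a_total = sum(anchor.values())
    t_total = sum(target.values())
    bg_total = sum(background.values())
    kl = 0.0
    for word, count in anchor.items():
        p = count / a_total
        q = smoothed_prob(word, target, t_total, background, bg_total, lam)
        kl += p * math.log(p / q)
    return kl


def link_weight(kl, threshold=3.0):
    """Hypothetical hard cutoff: drop links whose disagreement is too high."""
    return 0.0 if kl > threshold else 1.0
```

A link whose anchor text shares vocabulary with the target page gets a low divergence, while unrelated anchor text (for example, a comment spam link) gets a high one. For instance, with `background = Counter(tokenize(corpus_text))` built from some sample of the crawl, `link_weight(disagreement(anchor, page, background))` gives the weight to apply to that link in a later link analysis step. In practice the cutoff would have to be tuned on labeled data such as the manually classified sample.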