Journal Title: International Journal of Computer Science and Mobile Computing - IJCSMC
Abstract
Forum Crawler Under Supervision (FoCUS) is a supervised web-scale forum crawler. The web
contains a vast amount of data spread across innumerable websites, which are traversed by a program known as a crawler. The
goal of FoCUS is to crawl relevant forum content from the web with minimal overhead. Although forums have different layouts or
styles and are powered by different forum software packages, they have similar implicit navigation paths
connected by specific URL types that lead users from entry pages to thread pages. This observation reduces the web forum
crawling problem to a URL-type recognition problem. The paper also shows how to learn accurate and effective regular
expression patterns of implicit navigation paths from automatically created training sets, using aggregated results
from weak page-type classifiers. These classifiers can be trained and applied to a large set of unseen forums. The approach
achieves high effectiveness, addresses the scalability issue, and incorporates sentiment
analysis.
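
As an illustration of what treating forum crawling as URL-type recognition can look like, the following is a minimal sketch, not the FoCUS implementation. The URL type names (`index`, `thread`, `page_flipping`), the regular expressions, and the helper functions `classify_url` and `urls_to_follow` are all assumptions made for this example; in FoCUS the patterns are learned from automatically created training sets rather than written by hand.

```python
import re

# Hypothetical, hand-written patterns for a single imaginary forum.
# They only stand in for the learned regular expression patterns the
# abstract describes.
URL_TYPE_PATTERNS = {
    "index": re.compile(r"^/forum/\d+/?$"),
    "thread": re.compile(r"^/thread/\d+/?$"),
    "page_flipping": re.compile(r"^/(?:forum|thread)/\d+/page/\d+/?$"),
}


def classify_url(path):
    """Return the first URL type whose pattern matches the path, else 'other'."""
    for url_type, pattern in URL_TYPE_PATTERNS.items():
        if pattern.match(path):
            return url_type
    return "other"


def urls_to_follow(paths):
    """Keep only URLs that lie on a navigation path toward thread content."""
    return [p for p in paths if classify_url(p) != "other"]


if __name__ == "__main__":
    sample = ["/forum/12", "/thread/345", "/thread/345/page/2", "/login"]
    for path in sample:
        print(path, classify_url(path))
```

Under these assumptions, a crawler would enqueue only the URLs returned by `urls_to_follow` and skip pages such as login or profile links, which is one way to keep crawling overhead low.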