An Approach to Main Web Content and Keyword Extraction to Develop a Vietnamese Contextual Advertising Engine
Abstract
Contextual advertising is the next generation of online marketing. A contextual advertising system scans the text of a web page for keywords and returns advertisements to that page based on what the user is viewing. For example, a user viewing a web page about sports may see advertisements for sports-related items, such as sports equipment or sporting events. Building such a system involves two main problems: detecting the main content of the web page, and automatically extracting keywords from Vietnamese text. For the first problem, we propose a web page segmentation method that employs a histogram scheme. For the second problem, we measure the local statistic and the global statistic of each term, combine them to estimate the term's importance, and then determine the main keywords. Our methods were evaluated and compared with others based on the following measures: precision, recall, and F-measure. The results obtained are very positive.
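As a rough illustration of the keyword step and its evaluation, the sketch below combines a local statistic with a global statistic and then scores the extracted keywords. The abstract does not specify the actual statistics, so term frequency within the page (local) and a smoothed inverse document frequency over a reference corpus (global) are assumed here as stand-ins; the function names `score_terms` and `precision_recall_f1` are hypothetical, and the input is assumed to be already word-segmented.

```python
import math
from collections import Counter


def score_terms(doc_tokens, corpus):
    """Score terms by local frequency times global rarity.

    ASSUMPTION: TF and smoothed IDF stand in for the paper's
    unspecified local and global statistics.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        # Global statistic: number of corpus documents containing the term.
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        # Local statistic: relative frequency of the term in this page.
        scores[term] = (count / len(doc_tokens)) * idf
    return scores


def precision_recall_f1(predicted, gold):
    """Compare extracted keywords against a reference keyword set."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example with placeholder tokens (not real data).
page = ["match", "goal", "match", "the"]
corpus = [set(page), {"the", "price"}, {"the", "gold"}]
scores = score_terms(page, corpus)
top_keywords = sorted(scores, key=scores.get, reverse=True)[:2]
```

In this toy example a term that is frequent in the page but rare in the corpus ranks above one that appears in every document, which is the intuition behind combining the two statistics.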