The Mugatiya and Uguduwa proxies have a content rating engine. It looks at the content of a page and rates it as G (General), M (Mature), R (Restricted) or U (Unclassified). To do this, the engine goes through the text of the page and performs the steps outlined below.
It first looks for common English words such as "the", "to" and "of" to determine whether the page is in English. If there are not enough of these words to tell whether the page is in English, the page is classified as U (Unclassified).
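The language check described above could be sketched roughly as follows. This is only an illustration, not the engine's actual code: the word list, the `MIN_COMMON_WORD_HITS` threshold and the function names are all assumptions made for the example.

```python
import re

# Hypothetical list of common English words; the engine's real list is not given in this post.
COMMON_ENGLISH_WORDS = {"the", "to", "of", "and", "a", "in", "is", "it", "that", "for"}

# Assumed minimum number of common-word hits needed to treat a page as English.
MIN_COMMON_WORD_HITS = 5


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def looks_english(text, min_hits=MIN_COMMON_WORD_HITS):
    """Return True if the text contains at least min_hits common English words."""
    hits = sum(1 for word in tokenize(text) if word in COMMON_ENGLISH_WORDS)
    return hits >= min_hits


def rate_if_not_english(text):
    """Return "U" when the page cannot be identified as English, else None."""
    return None if looks_english(text) else "U"
```

A page for which `rate_if_not_english` returns `None` would move on to the scoring step; anything else stops here as U.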
If there are enough words, it proceeds to score the page. "Stop words" (here meaning words that strongly signal restricted content) such as "dismember" and "maimed" score very highly, while slang and milder words such as "horny", "cocaine" and "pharmaceuticals" score lower. The points are then summed, and the page is categorized as G, M or R according to the range the total falls into.
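As a rough sketch of the scoring step, assuming the page has already passed the English check: the weights and the two thresholds below are made up for illustration, since the post only says that some words score higher than others and that the total is mapped to a range.

```python
import re

# Hypothetical weights. The post only says that words like "dismember" score
# very highly and words like "cocaine" score lower; the numbers are assumptions.
WORD_SCORES = {
    "dismember": 10,
    "maimed": 10,
    "horny": 3,
    "cocaine": 3,
    "pharmaceuticals": 3,
}

# Assumed range boundaries. M_THRESHOLD is set low to reflect a strict rating
# where any flagged word lifts a page out of G.
M_THRESHOLD = 1
R_THRESHOLD = 10


def score_page(text):
    """Sum the scores of all flagged words found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(WORD_SCORES.get(word, 0) for word in words)


def rate_scored_page(text):
    """Map the summed score to G, M or R using the assumed thresholds."""
    total = score_page(text)
    if total >= R_THRESHOLD:
        return "R"
    if total >= M_THRESHOLD:
        return "M"
    return "G"
```

With these assumed numbers, a page with no flagged words is rated G, a single lower-scoring word is enough for M, and one high-scoring word (or several lower ones) pushes the page to R.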
Note that the rating is very strict, so only squeaky-clean pages are rated 'G'. Advertising based on the content rating will be better able to target users and to keep adult adverts from appearing on innocuous sites.