Classifying news automatically using a Bayesian filter
Thanksgiving 2010 was great; time to burn off all that fat with more programming.
A friend of mine mentioned that he would like Cooln.es(s) better if the programming section were more accurate. He is one of the many programmers who just want to read news without participating in a community, quickly glancing at what's happening on the web.
So, what's Cooln.es(s)? It is an hourly newspaper, simple as that. The programming section started out simple: scrape obvious sources of tech news. But as we try to expand to many more sources, classification becomes trickier and less obvious.
There are endless sources of news and news publishers on the web. Anyone with an internet connection can publish, thanks to various (free) blog platforms. Many of these platforms use tagging as a free-form way of classification. Unfortunately, most blog posts are tagged vaguely or not at all.
This is where a Bayesian filter comes in handy. We can train the filter on the obvious sources of news, then gradually start using it to classify incoming news.
That's how BayesOnRedis came to life. It is both fast and persistent, perfect for weeks of continuous machine learning. Hopefully, with it, Cooln.es(s) can avoid manual classification of news.
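To make the train-then-classify idea concrete, here is a minimal sketch of a naive Bayes text classifier in Python. It is not BayesOnRedis and does not reflect its API: the class name, method names, category labels, and sample headlines are all made up for illustration, the counts live in memory rather than in Redis, and the add-one (Laplace) smoothing is my own assumption about how to handle unseen words.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Hypothetical in-memory naive Bayes classifier (illustration only)."""

    def __init__(self):
        # category -> word -> number of times the word was seen in training
        self.word_counts = defaultdict(Counter)
        # category -> number of training documents
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        # Lowercase and split into runs of letters and digits.
        return re.findall(r"[a-z0-9]+", text.lower())

    def train(self, category, text):
        self.doc_counts[category] += 1
        self.word_counts[category].update(self.tokenize(text))

    def score(self, text):
        words = self.tokenize(text)
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_docs = sum(self.doc_counts.values())

        scores = {}
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior: share of training documents in this category.
            log_prob = math.log(self.doc_counts[category] / total_docs)
            for word in words:
                # Add-one smoothing (an assumption of this sketch) keeps
                # unseen words from zeroing out the whole product.
                log_prob += math.log(
                    (counts[word] + 1) / (total_words + len(vocabulary))
                )
            scores[category] = log_prob
        return scores

    def classify(self, text):
        scores = self.score(text)
        return max(scores, key=scores.get)


# Train on "obvious" sources first (made-up headlines), then classify new items.
clf = NaiveBayesSketch()
clf.train("programming", "Ruby gem released with faster Redis client and new API")
clf.train("programming", "Python compiler bug fixed in latest release")
clf.train("other", "Turkey recipes and holiday travel tips for Thanksgiving")
clf.train("other", "Football scores and weekend weather forecast")

print(clf.classify("New Redis release fixes API bug"))
print(clf.classify("Holiday weather and travel forecast"))
```

Working in log probabilities avoids underflow when multiplying many small numbers. A persistent version would keep the same per-category word counts in a store like Redis instead of Python dictionaries, so training can continue across restarts.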















