Tonight I made good progress in data mining and machine learning. I recommend Weka to anyone interested in the subject. It's a Java machine learning library with command-line tools and a convenient graphical UI built around it.
As a training exercise, I took 5000 products with 1800 characteristics from eBay (only 0.25% of their database) and clustered them on the characteristics alone. The result split the items by type: cases in one group, laptops in another. New products are assigned to the right group too, hooray.
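To make the clustering step concrete, here is a minimal sketch in plain Python rather than Weka itself. It assumes k-means over binary characteristic vectors, which is only one of the clusterers Weka offers and not necessarily the one used above. The four characteristics and the toy products are hypothetical stand-ins for the real 1800-column data.

```python
from math import dist

def assign(point, centroids):
    """Return the index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids so the run is deterministic."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[assign(p, centroids)].append(p)
        for i, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

# Hypothetical characteristics: has_screen, has_keyboard, has_zipper, is_leather
products = [
    [1, 1, 0, 0],  # laptop
    [0, 0, 1, 1],  # case
    [1, 1, 0, 0],  # laptop
    [0, 0, 1, 0],  # case
    [1, 0, 0, 0],  # laptop with a missing characteristic
]

centroids = kmeans(products, k=2)

# A product that was not in the training set is placed by its nearest centroid.
new_laptop = [1, 1, 0, 1]
new_case = [0, 0, 1, 1]
laptop_cluster = assign(new_laptop, centroids)
case_cluster = assign(new_case, centroids)
```

With real data the number of clusters is unknown up front, so k would have to be chosen by trying several values and inspecting the groups.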
I also played with time series forecasting. I loaded the daily search query counts for the last couple of weeks, and Weka estimates the number of queries for the coming days. Cool and useful. For values that fall outside the original range, some sort of notification could be devised to flag a significant rise or fall.
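The notification idea could look roughly like the sketch below. It is plain Python, not part of Weka, and it assumes the check is applied to forecast values against the minimum and maximum of the observed history. The function name `range_alerts` and all the numbers are made up for illustration.

```python
def range_alerts(history, forecast):
    """Return (day_offset, value, direction) for forecasts outside the historical min/max."""
    low, high = min(history), max(history)
    alerts = []
    for offset, value in enumerate(forecast, start=1):
        if value > high:
            alerts.append((offset, value, "rise"))
        elif value < low:
            alerts.append((offset, value, "fall"))
    return alerts

# Hypothetical daily query counts for two weeks, followed by a three-day forecast
history = [120, 135, 128, 140, 150, 110, 105, 125, 138, 131, 144, 152, 112, 108]
forecast = [130, 171, 96]

alerts = range_alerts(history, forecast)
for offset, value, direction in alerts:
    print(f"day +{offset}: {value} queries, significant {direction}")
```

A strict min/max check is the simplest reading of "beyond the original range". A tolerance band around the range would cut down on alerts for values that barely cross it.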
I also have a database of 550,000 records from an online store (order number, product number, price, user number, date, time). I'm still not quite sure how to extract new knowledge from it with machine learning algorithms. Everything that comes to mind seems doable by much simpler means. Any ideas?








