October 13 2017, 00:35

Tonight I made good progress in data mining & machine learning. I recommend Weka to everyone interested in the subject. It’s a Java library of machine learning algorithms with command-line tools and a convenient graphical UI on top.

As a training exercise, I took 5000 products with 1800 characteristics from eBay (which is only 0.25% of their database) and clustered them based on the characteristics alone. The result: cases ended up in one group, laptops in another, and so on. New products are correctly assigned to the right group, hooray.
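To make the idea concrete, here is a minimal sketch of clustering products by their characteristics. It is plain k-means written in Python, not what Weka does internally, and the feature names, the toy products, and the farthest-first initialization are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def farthest_first(points, k):
    """Deterministic init: start at points[0], then keep adding the farthest point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, max_iterations=50):
    """Plain k-means; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each product goes to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # nothing moved, so the clustering is stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical binary characteristics: screen, keyboard, battery, zipper, padding
products = [
    [1, 1, 1, 0, 0],  # laptop
    [1, 1, 0, 0, 0],  # laptop
    [0, 0, 0, 1, 1],  # case
    [0, 0, 0, 1, 0],  # case
]
centroids, labels = kmeans(products, k=2)
new_product_group = nearest([1, 1, 1, 0, 0], centroids)
print(labels, new_product_group)
```

A new product is assigned to a group simply by finding its nearest centroid, which is the "new products land in the right group" step.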

I also played with time series forecasting. I uploaded the daily counts of search queries for the last couple of weeks, and Weka estimates the number of queries for the coming days. Cool, useful. For values that fall outside the usual range, some sort of notification could be devised to flag a significant rise or fall.
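The notification idea could look something like the sketch below. This is not Weka's forecaster: it is a simple z-score check against recent history, and the function name, the threshold of 3 standard deviations, and the sample counts are all made up for illustration.

```python
from statistics import mean, stdev

def check_daily_count(history, today, threshold=3.0):
    """Return 'rise', 'fall', or None for today's count versus recent history.

    history must contain at least two daily counts.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Flat history: any change at all counts as a deviation.
        if today == mu:
            return None
        return "rise" if today > mu else "fall"
    z = (today - mu) / sigma
    if z > threshold:
        return "rise"
    if z < -threshold:
        return "fall"
    return None

# Hypothetical query counts for the last two weeks
history = [120, 135, 128, 140, 131, 125, 138, 133, 129, 136, 127, 134, 130, 132]
print(check_daily_count(history, 210))
print(check_daily_count(history, 133))
```

In practice the check would be run against the forecast rather than the plain average, so that expected weekly swings do not trigger alerts.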

There’s a database of 550,000 records from an online store (order number, product number, price, user number, date, time). I’m still not quite sure how to extract new knowledge from it with machine learning algorithms. Everything that comes to mind seems doable with much simpler tools. Any ideas?

https://www.cs.waikato.ac.nz/ml/weka/

October 12 2017, 13:03

(TIL) In my time off work, I watch Lavrenko’s lectures. This morning I listened to the lecture “Laws of the Text”.

For example, did you know that there’s something called Zipf’s Law, stating that the frequency of the n-th word in the list of most common words of any language is roughly inversely proportional to its rank n?
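A quick way to look at this on your own text is to rank words by frequency and multiply each count by its rank: if Zipf's Law holds, the product stays roughly constant. The sketch below is only an illustration; the tokenizer (lowercase Latin letters and apostrophes) and the function name are assumptions, and a text as tiny as the sample will not show the law properly.

```python
import re
from collections import Counter

def rank_frequencies(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]

sample = "the cat and the dog and the bird"
for rank, word, count in rank_frequencies(sample):
    # Under Zipf's Law, rank * count is roughly the same for every row.
    print(rank, word, count, rank * count)
```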

Or here’s the empirical Benford’s Law: in tables of numbers drawn from real-life sources (anything from electricity bills to house numbers in cities), the leading digit is 1 much more often than any other (in about 30% of cases), 2 is more common than, say, 8, and so on. Simply put, Benford’s Law can be described thus: there are always more small things in the world than large ones. The usual explanation is that many quantities in this world grow exponentially rather than linearly. Very intriguing.
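The law has a compact formula: the expected share of leading digit d is log10(1 + 1/d), which gives about 30.1% for d = 1. The sketch below compares that with the leading digits of the first 1000 powers of 2, a textbook case of exponential growth. The function names are mine, and the helper assumes positive integers.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit d under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit_shares(values):
    """Observed share of each leading digit among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

powers = [2 ** n for n in range(1000)]  # exponentially growing sequence
observed = leading_digit_shares(powers)
expected = benford_expected()
for d in range(1, 10):
    print(d, round(observed[d], 3), round(expected[d], 3))
```

The same comparison, run on a column of real-world figures, is the basis of the "naturalness" check: a large gap between observed and expected shares is a reason to look closer.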

Or take Heaps’ Law. The number of unique words in a text of N words grows roughly as f(N) = k*N^b, where b is typically between 0.4 and 0.6, so about 1/2.
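To check f(N) = k*N^b against a real text, one could track vocabulary size as the text is read and then fit k and b with a straight line in log-log space. This is a sketch under assumptions: the function names are mine, the fit is ordinary least squares on logarithms, and the demo uses synthetic points generated with k = 10 and b = 0.5 rather than a real corpus.

```python
import math

def vocabulary_growth(words):
    """Number of distinct words seen after each word of the text."""
    seen = set()
    sizes = []
    for w in words:
        seen.add(w)
        sizes.append(len(seen))
    return sizes

def fit_heaps(ns, vs):
    """Least-squares fit of log(v) = log(k) + b * log(n); returns (k, b)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in vs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    k = math.exp(my - b * mx)
    return k, b

print(vocabulary_growth("a b a c".split()))

# Synthetic data that follows the law exactly (k = 10, b = 0.5)
ns = [100, 1000, 10000, 100000]
vs = [10 * n ** 0.5 for n in ns]
k, b = fit_heaps(ns, vs)
print(round(k, 3), round(b, 3))
```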

These laws make it possible, for example, to check data or a text for “naturalness”.

Or another example. For any very rare word, the probability that it will occur in a text is very low, which makes sense. But if this word does appear in the text, the likelihood of it appearing again is very high.

October 11 2017, 18:07

A wonderful text about the peculiarities of e-commerce in the Russian hinterlands. It’s over a year old, and I’m probably the only one who hasn’t read it yet. It’s a long read, but an interesting one.

October 11 2017, 13:14

I look at the current fascination with chatbots, Yandex’s Alice, etc., and recall how I toyed with something similar back in 2003. We had a chat room, Starchat.ru, where people constantly hung out and talked to each other.

I developed the chat, so for fun, I made a bot that you could chat with simply by sending it a private message. It was always online, and not everyone realized that it was a bot. When the robot received a message, it searched the chat logs for messages containing the maximum number of words from the query that also had a response. A response is defined as the next message directed to the user by someone (like “Vasya: go to hell!” being a response to Vasya’s message). When there were multiple options (and there always were), a random one was chosen.
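The matching logic described above can be reconstructed roughly as follows. This is a sketch, not the original 2003 code: the log format (author, addressee, text), the tokenizer, and the rule that a message counts as unanswered if its author speaks again before anyone replies are all assumptions.

```python
import random
import re

def words(text):
    """Lowercased set of word tokens in a message."""
    return set(re.findall(r"\w+", text.lower()))

def build_pairs(log):
    """Pair each message with the next message addressed to its author.

    log is a list of (author, addressee_or_None, text) tuples in chat order.
    """
    pairs = []
    for i, (author, _, text) in enumerate(log):
        for reply_author, addressee, reply_text in log[i + 1:]:
            if reply_author == author:
                break  # assumption: the author spoke again, so no direct response
            if addressee == author:
                pairs.append((text, reply_text))
                break
    return pairs

def answer(query, pairs, rng=random):
    """Pick a random response among the logged messages sharing the most words with the query."""
    q = words(query)
    scored = [(len(q & words(msg)), reply) for msg, reply in pairs]
    best = max((score for score, _ in scored), default=0)
    if best == 0:
        return None  # nothing in the log overlaps with the query
    return rng.choice([reply for score, reply in scored if score == best])

# Hypothetical chat log
log = [
    ("Vasya", None, "what is your name"),
    ("Masha", "Vasya", "I am Masha :)"),
    ("Petya", None, "where do you live"),
    ("Olya", "Petya", "in Moscow"),
    ("Vasya", None, "tell me your name please"),
    ("Kolya", "Vasya", "call me Kolya"),
]
pairs = build_pairs(log)
print(answer("where do you live", pairs))
print(answer("your name", pairs))
```

With a large enough log, several messages tie for the best overlap, and the random choice among them is what makes the bot answer the same question differently each time.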

The result was a robot that amusingly responded to questions. If you asked its name, it would always respond with different names, but relevantly, complete with emojis and suffixes. The bot also always provided suitable answers to standard questions like “where do you live” or “how old are you”. Since there was a huge history and people discussed everything in general, it was hard to find a question that the system couldn’t give an interesting/correct/funny answer to.

The bot also had an interesting side effect. If you started swearing at it, it would swear back even more offensively. In general, it often overreacted to attacks and reproaches, simply because in real dialogs a polite question gets a polite answer, and a rude one, of course, gets a rude answer. The audience had a lot of fun with this bot.

It was especially interesting to read the bot’s own logs later. People there didn’t understand that it was a robot. They asked it questions, argued with it, and made up with it. It was fun.

October 10 2017, 21:19

Somehow it’s not yet in the regular news: the Brazil national team will be based in “our dear Lobnya.” The locals are proud; the Brazilians don’t understand yet. They really are renovating the stadium there, though it was in pretty bad shape before. Still, it’s funny: the Brazilian national team in Lobnya, riding a Moskvich 🙂

October 10 2017, 07:02

In 1975, instead of installing expensive road signs or “speed bumps,” Napa, California experimented with using chickens to slow down drivers on one of its streets: Streblow Drive, next to Kennedy Park. They simply released 85 chickens to roam as they pleased. Park manager Bob Pelusi said, “Only occasionally would an impatient driver cut through the flock. Over nine months, we lost just 12 of them. You could say they died in the line of duty.”

An interesting idea. Only I think that in the Russian hinterlands, these chickens wouldn’t survive a night. Perhaps they should have had POLICE painted on their sides, so that if someone tried to harm them, it could trigger a criminal charge?

October 09 2017, 22:24

I’ve published the second part of the video from my presentation at SAP Moscow two weeks ago.

I discuss Search Analytics, a tool that identifies and helps fix site search problems by analyzing user behavior. The approach will work on any site, but it is designed specifically for eCommerce. It surfaces issues such as “search queries that underperform but could do better” or “products that turn out to be difficult to find”.

I recommend watching it to everyone involved in online retail and search. This video does not cover Hybris; it is all about site search. Slides are included.

Stay tuned for another interesting topic on hybrismart.com in about a week.