Thursday, August 14, 2014

And now for something completely different

I started this blog roughly two years ago. At the time I felt an occasional frustration with my work as I started to get involved in a completely new kind of science: Big Data, Big Graphs, Statistics, Machine Learning... Whatever you want to call it - it was great and difficult at the same time. While I learned a lot, it also meant dealing with a whole bunch of issues, for which I found sub-optimal solutions - then changed direction, found better solutions and so on. Yet, even after many months of hard work I was still far away from publishing at a conference. In fact, I would be for quite a while.

So I started this blog. It became a logbook for many of the things I have done since. In contrast to scientific conferences, there is no program committee to get past and nobody dictates what to publish, except me. And better yet, it might even be useful for others. Heck, I'd even go as far as to say that this blog has probably had more readers than some of my actual scientific publications. And after all, it's a lot of fun to write.

So what has changed after these years? Well, I'm on a train to Warsaw, where I'll be presenting the work that frustrated me so much and motivated me to start this blog. And it turns out I'm a pretty lucky man in that my current job allows me to do so many seemingly unrelated things. So while I was busy doing my research on community detection, I also joined a project with the goal of providing a highly scalable publish/subscribe system. At the time, we called it StreamHUB. It was a massive beast, limited only by the number of machines we had at our disposal. But it was also hard to configure and run (and, well, to understand once there were problems). It was what people call a "research prototype". Eventually, I had to use it again for a project we received funding for, and I knew I'd have to put in a massive amount of work to get StreamHUB ready for use within that project. To this day, I'm thankful to a colleague who suggested that if I went for a complete rewrite, Erlang might be a better fit for this problem than C++ was.

So I tried it and was sold. Having done distributed systems in C++ and a little in Python, I was flat-out amazed at what Erlang could do out of the box. Sending messages between processes, regardless of which machine they run on and without having to choose and then learn some library, was a revelation to me, and I finally started to understand what "the right language for the problem" actually means. After a couple of months I was not only at feature parity with StreamHUB but I also had elasticity - the ability to add more silicon to the computation without restarting the system.

So without any further ado, I present uPS, a scalable, elastic, content-based publish/subscribe system. Please go ahead and try it out. It's far from perfect and there are some things missing (see the Issues section), but perfection is impossible anyway and it feels like the right time to publish. If you have any thoughts on it, please let me know.