This article was originally posted on Space Monkey’s blog.
Optimal performance has always been a top goal of Space Monkey’s product offering, and though we are making steady progress on this journey, we haven’t arrived yet. So we wanted to talk a bit about that and dig into what’s been going on behind the scenes for the last few months. Fair warning: this post gets fairly technical.
The core of our software began as distributed systems research, written in Python. Space Monkey benefited greatly from Python’s expressiveness, rich standard library, and extensibility, as well as from Twisted, the event-driven networking library. Both Python and Twisted are wonderful systems and we love them.
One common refrain in the software development community is that programmer time is more valuable than processor time. This is sage advice. Many problems can be solved by throwing more hardware at them, provided the problem isn’t algorithmic complexity, and sometimes even then. With Python in particular, it is very easy to optimize hot paths by rewriting them in C. Even though we were shipping a performance-critical distributed system on embedded hardware with low-resource ARM chipsets, we expected the system to be heavily I/O-bound, and any CPU-intensive parts could easily be rewritten in C.
Our back-of-the-napkin calculations said this approach should have been sufficient. Our Ethernet interface tops out above 20 MB/s, and our crypto chip and CPU, used together solely for data transfer, can do SSL at 8 MB/s. Yet our initial versions of the system routinely achieved transfer rates of only about 1 MB/s.
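The napkin math treats the transfer path as a pipeline whose throughput is set by its slowest stage. A minimal Go sketch of that calculation, using only the figures quoted above; the two-stage pipeline model and the constant names are illustrative assumptions, not Space Monkey’s tooling:

```go
package main

import "fmt"

// Stage rates in MB/s, taken from the figures quoted in the post.
// Modeling the transfer path as just these two stages is an assumption.
const (
	ethernetMBps = 20.0 // Ethernet interface ceiling ("above 20 MB/s")
	sslMBps      = 8.0  // SSL with the crypto chip and CPU working together
	observedMBps = 1.0  // what early versions actually delivered
)

// ceiling returns the throughput limit of a pipeline: its slowest stage.
func ceiling(stages ...float64) float64 {
	limit := stages[0]
	for _, s := range stages[1:] {
		if s < limit {
			limit = s
		}
	}
	return limit
}

func main() {
	c := ceiling(ethernetMBps, sslMBps)
	fmt.Printf("expected ceiling: %.1f MB/s\n", c)
	fmt.Printf("observed: %.1f MB/s (%.1f%% of ceiling)\n", observedMBps, 100*observedMBps/c)
}
```

By that model, SSL is the bottleneck at 8 MB/s, and early builds delivered about one eighth of it, leaving a large gap to explain.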
So what was happening? Our throughput was thoroughly CPU-constrained. The work our little ARM CPU did to keep your encrypted data safe on the Space Monkey network (Reed-Solomon! DHT maintenance! Oh my!), on top of SSL for file transfers and general Python runtime and Twisted overhead, was enough to peg the CPU, so it couldn’t process anything any faster.
Time to optimize, right? Starting from a 90k-line codebase of Python and C, we’ve spent months finding hotspots, optimizing, configuring, and writing C modules. Lather, rinse, repeat.
After all of this optimization, we got up to 1.2 MB/s.
We spent long nights poring over profiling output and traces, and we wrote our own monitoring system and benchmarking tools. The fundamental holdup seemed to be that the CPU was simply doing too much non-optional work: our main event library was doing too much bookkeeping. When your I/O loop framework is your hot path and all of your code fundamentally relies on that framework, rewriting it in C is tantamount to starting over.
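For contrast with a user-level event loop, here is a minimal sketch of the goroutine-per-connection style Go encourages. Each connection gets its own goroutine making ordinary blocking calls, and the Go runtime’s network poller (epoll on Linux) parks and wakes goroutines as sockets become ready, which is the bookkeeping a reactor like Twisted’s does in library code. This is an illustrative echo server with hypothetical names (`serve`, `echoOnce`), not Space Monkey’s code:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
)

// serve accepts connections and gives each one its own goroutine. The
// handler is plain blocking I/O; the runtime tracks socket readiness.
func serve(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener was closed
		}
		go func(c net.Conn) {
			defer c.Close()
			io.Copy(c, c) // echo bytes back until the peer closes
		}(conn)
	}
}

// echoOnce starts a server on a random local port, sends one line,
// and returns whatever comes back.
func echoOnce(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go serve(ln)

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintln(conn, msg); err != nil {
		return "", err
	}
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	reply, err := echoOnce("hello")
	if err != nil {
		panic(err)
	}
	fmt.Print(reply)
}
```

The handler contains no callbacks or reactor registration; the scheduling that an application-level event loop would do sits inside the compiled runtime instead.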
Twisted and Python are great on adequate hardware, but our little ARM devices were pooped.
We hoped that PyPy, the JIT-compiling Python interpreter, might save us. Unfortunately, our ARM architecture does not support the floating-point instruction set PyPy requires.
Frankly, Python’s performance doesn’t match that of compiled languages. In practice this is hardly an issue, until it is. Python no longer holds the powerful position it once did at Google, and Dropbox is writing a new Python runtime to deal with issues they’ve found. Building our own Python runtime was infeasible, and PyPy isn’t yet ready for prime time on our hardware. So we started looking at compiled languages to switch to. The Space Monkey development team has spent many years at previous employers working in C and C++ (and we actually maintain a large C++ codebase for our desktop clients), so those were seriously considered.
But we’ve also relied on Google’s new Go language for many of our supporting cloud services since Space Monkey’s inception. Our NAT-failover relay system has been written in Go since before Go 1. With 40k lines of Go already written and plenty of experience with the language, we knew Go was semantically very similar to Python.
So in the bottom of the ninth, we decided to transliterate our 90k lines of Python directly to Go, line by line.
It took us about 4 weeks.
It was a heroic effort by the whole team. With a very clear set of transliteration rules agreed on up front, we carefully converted the code from Python to Go. Transliterating line by line let us avoid many of the mistakes teams typically make when they decide to rewrite. We then did pair-programming audits of each and every line. We ported our integration tests. We ran our system tests. The tests passed.
Our very rough initial draft achieved 4 MB/s at only 16% CPU utilization. Deciding that an optimization in the hand is worth two in the bush, we’ve spent the last few weeks putting the code through its paces, stabilizing the new codebase, and hunting down and eliminating every bug we can find.
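Throughput alone understates the change, since the Go draft left most of the CPU idle. A rough sketch that normalizes both builds by CPU utilization, under an assumption the post doesn’t state outright: that the optimized Python build was still at 100% CPU when it reached 1.2 MB/s.

```go
package main

import "fmt"

// throughputPerCPU returns MB/s delivered per percentage point of CPU.
func throughputPerCPU(mbps, cpuPercent float64) float64 {
	return mbps / cpuPercent
}

func main() {
	// Assumption: the optimized Python build pegged the CPU (100%) at
	// 1.2 MB/s. The post says the CPU was pegged but gives no exact figure.
	oldEff := throughputPerCPU(1.2, 100)
	// Reported for the first Go draft: 4 MB/s at 16% CPU.
	newEff := throughputPerCPU(4, 16)
	fmt.Printf("old: %.3f MB/s per CPU%%\n", oldEff)
	fmt.Printf("new: %.3f MB/s per CPU%%\n", newEff)
	fmt.Printf("gain: %.1fx\n", newEff/oldEff)
}
```

Under that assumption the Go draft moves roughly 20 times more data per unit of CPU. That does not imply throughput would keep scaling with CPU; the 8 MB/s SSL figure is still a hard ceiling.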
This new code began rolling out this week. It will likely take a few more weeks to reach all of our current customers, but once it does, you should see drastic performance improvements.
Edit: A few more things I probably should have mentioned, since people seem curious about them.
Obviously we aren’t done. 4 MB/s is roughly four times as fast as before on most workloads, but the team isn’t satisfied to rest here. We’ll be working tirelessly to push that number even higher.
Our CPU issues in the past have delayed the release of new on-device features such as SMB and DLNA support. The demos you saw in Kickstarter Update 16 were enabled by this new Go codebase. Since we now have CPU room to spare, SMB and DLNA will be out shortly.
So expect great things soon!
As the adage says, programmer time is much more valuable than processor time. So you should use Go.
It’s a shame Go didn’t exist when Space Monkey started. We’re very lucky to have it now.
In transliterating a large Python codebase to Go, we ended up porting libraries we had already written in Python, as well as writing Go equivalents of things Python already provided. We’ve also written some useful tools for understanding and debugging Go.
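The post doesn’t say which pieces had to be ported. As one illustration of the kind of gap involved: Python has a built-in set type and Go does not, so Go code usually builds one on top of a map. A minimal, hypothetical sketch (`StringSet` and its methods are illustrative names, not a Space Monkey library):

```go
package main

import (
	"fmt"
	"sort"
)

// StringSet is a hypothetical stand-in for Python's set, limited to
// strings. The empty struct value takes no storage, so only keys matter.
type StringSet map[string]struct{}

// NewStringSet builds a set from its arguments, like Python's set([...]).
func NewStringSet(items ...string) StringSet {
	s := make(StringSet, len(items))
	for _, it := range items {
		s.Add(it)
	}
	return s
}

// Add inserts item; adding an existing member is a no-op.
func (s StringSet) Add(item string) {
	s[item] = struct{}{}
}

// Contains mirrors Python's `item in s`.
func (s StringSet) Contains(item string) bool {
	_, ok := s[item]
	return ok
}

// Intersection mirrors Python's `a & b`.
func (s StringSet) Intersection(other StringSet) StringSet {
	out := StringSet{}
	for k := range s {
		if other.Contains(k) {
			out.Add(k)
		}
	}
	return out
}

// Sorted returns the members in order, since Go map iteration order
// is deliberately unspecified.
func (s StringSet) Sorted() []string {
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	a := NewStringSet("node1", "node2", "node3")
	b := NewStringSet("node2", "node3", "node4")
	fmt.Println(a.Intersection(b).Sorted())
}
```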
We’ll be open sourcing a handful of libraries and utilities we’ve written to give back to the Go community in the coming weeks, but while we work on getting that ready to go, we wanted to let our users know what’s going on. So stay tuned!
Go’s standard library is very rich and useful and it was rare that we had to stray outside of it, but we wanted to give a special shout out to Walter Schulze’s excellent gogoprotobuf extended Protocol Buffer library.
Update: We open-sourced some stuff!
We’re a small team based in Salt Lake City, and we’re looking for Go developers! Drop us a line.