Tuesday, February 14, 2012

Ruby vs. Go... FIGHT!

Only sorta-kinda.  I've been trying to use Go for some tasks for which I'd normally reach for Ruby; most recently, that was grabbing some date elements from a large-ish XML file.  I know, I know... Ruby has the best XML library ever built into it, but I'm more aware than anybody of the performance issues it has, so I tend to use it for only very small files.  So when I needed to extract some information out of this fat XML, I thought I'd try Go.

Warning: Micro-Benchmarks ahead!  Keep this in mind as you read.  Usually, micro-benchmarks are looked at with skepticism; however, I'm claiming this is useful information because it's a real-world application, solving a real-world need... even if it is only a very tiny little program in a very large world, after all.

The file wasn't so big that I was worried about memory use, but since I only needed a leaf from each branch of the tree, I chose the SAX-ish API anyway.  That's going to make any code more bloated, but it wasn't too bad; the results are in Version 1.  After I got everything working, I got to wondering about the performance, so I wrote it again in Ruby.  I did not try to use the same logic; instead, I did it in what, for me, is more idiomatic Ruby code.  The timings (I chose the average-looking times; I did not properly benchmark these, but I did run each program several times and then grabbed the middlin' looking one), which may or may not be surprising, look like this:

Version    Language  Total time (s)  CPU usage
Version 1  Go        0.113           96%
Version R  Ruby      0.099           66%

"Hmmm", I hear you say.  Well, the Go version is actually parsing the XML, and we all know XML for the bloated, expensive-to-parse format that it is.  OTOH, Ruby is doing regexp on every line, and is additionally reading the entire file into memory first and splitting it into an array on line endings.  Hmmm. Well, let's try a Go version that is a little more like the Ruby version.  That's Version 2:

Version    Language  Total time (s)  CPU usage
Version 2  Go        0.284           103%

Yowsa!  That's going in the wrong direction.  Interestingly, it's now using more than one core of my CPU, so it's doing something thready underneath.  Maybe it's because I'm reading the file line-by-line off the disk?  Let's make it even more like the Ruby version; Version 3:

Version    Language  Total time (s)  CPU usage
Version 3  Go        0.292           105%

Definitely going in the wrong direction. Maybe it's the sre2 library? Let's try Version 4:

Version    Language  Total time (s)  CPU usage
Version 4  Go        0.037           89%

Ok, that's better. Armed with this, I went back to not reading the file entirely into memory in Version 5:

Version    Language  Total time (s)  CPU usage
Version 1  Go        0.113           96%
Version R  Ruby      0.099           96%
Version 2  Go        0.284           103%
Version 3  Go        0.292           105%
Version 4  Go        0.037           89%
Version 5  Go        0.034           91%

Not a lot of difference; I did see a couple of runs where the CPU use dropped to 89% without affecting the total time, but these are pretty small numbers and we could be seeing actual run time being overwhelmed by the program initialization and what-not.

Anyway, I thought it was interesting.  Ruby is slow as all get-out, but for micro-tasks where most of the heavy lifting is running in native C (regexp in Ruby is native, as is IO), it's more than capable enough. It's also worth noticing that this was with ca. 30 lines of Go code, vs. 8 lines of Ruby code.

Thursday, January 26, 2012

Mozilla's Rust Language

Mozilla released version 0.1 of its programming language offering, called Rust.  There are a number of things about Rust which are nice; it uses LLVM, which means it gets tail-call-optimization, which Go doesn't have; it has isolated, lightweight tasks, and channels much like Go; I've come across discussions about Erlang-style supervisors, which is encouraging; it has a Ruby-like syntax for closures; it has some limited type inference (very similar in scope to what Go can do); and it has pattern matching à la OCaml, which Go doesn't.  On top of all of this, the executables it produces for small applications are two orders of magnitude smaller than those produced by Go (Go's "Hello World" is 1.2MB; Rust's is 14KB).  I don't have any performance comparisons yet, but I wouldn't be surprised if Rust was (currently) faster than Go. Go's compiler is faster, though, and the value of that increases in proportion to the size of the project on which you're working.

Despite all of this, I haven't switched over to Rust yet; it's a touchy-feely reason, more than anything concrete.  Go feels more mature, despite Rust being older.  The Go library documentation is much better, and the developer tools feel more complete.  And it's just easier to code in; Rust is a bit more wordy in tiny little ways.  For example, to do something multithreaded, in Rust you'll end up doing something like this:

  let in = comm::port::int();
  let out = comm::chan::int(in);
  task::spawn { ||
    some_func(out);
  }
  let res = comm::recv(in);


Ports are for reading from, and chans are for writing to, and never the twain shall meet.  In Go, the equivalent would be:

  var inout = make(chan int)
  go some_func(inout)
  res := <-inout


There isn't a huge difference, but the Rust version is more wordy, and not any more clear.  I just get a tiny, nagging feeling that, over time, the verbosity of Rust would start to wear.  Especially with the whole port/chan thing, which smells an awful lot like boiler-plate.

The other thing is that I keep having trouble writing threaded apps in Rust.  At some point in every attempt, something or other blocks me and I can't get around it.  Most recently, it had to do with the fact that you can't send a port to a function if that function is in another task.  This means that you can't share a port between tasks.  You can't do this:


   let in = comm::port::int();
   task::spawn { || comm::recv(in); }


The Rust compiler raises this as an error (unsendable value).  I will readily admit that this is almost certainly a failure on my part to understand the idiomatic idiosyncrasies of  Rust, and that there may be a way to accomplish this, but my question remains: if you can't share ports between tasks, then how do you implement a single-producer / multiple-consumer multi-threaded application?  I didn't find any code in any of the examples in src/test (in the Rust repository) which demonstrated something like this, and when I asked on #rust IRC, my question was met with silence.
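For comparison, the single-producer / multiple-consumer shape is straightforward in Go, because any number of goroutines may receive from the same channel.  A minimal sketch follows; fanOut, jobs, and sums are names invented for illustration, and the "work" is just adding up integers.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut sends the integers 0..n-1 down a single channel and lets `workers`
// goroutines compete to receive them. It returns the sum each worker saw.
func fanOut(n, workers int) []int {
	jobs := make(chan int)
	sums := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for v := range jobs { // every worker reads from the same channel
				sums[id] += v // each worker only touches its own slot
			}
		}(w)
	}
	for i := 0; i < n; i++ { // the single producer
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return sums
}

func main() {
	sums := fanOut(100, 4)
	total := 0
	for _, s := range sums {
		total += s
	}
	fmt.Println(sums, total)
}
```

How the values get divided among the workers varies from run to run; only the grand total is fixed.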

What it all boils down to is that, at least for me, Rust is being a PITA to get started with, whereas Go wasn't.  And in some ways, they're really very similar languages; they both have the "arg TYPE" ordering of type declarations; they both have "type SOMETHING {" structure for records; they have similar threading models and channel communications... but Google, in my mind, at least, has been much more successful at making Go accessible than Mozilla has with Rust.  So for now, I'm going to continue working with Go, and keep an eye on developments in Rust.

P.S., I can't properly annotate the Rust examples because blogger.com doesn't handle less-than / greater-than signs.

Thursday, December 8, 2011

Droid Incredible 2

I have a new company phone; I got it yesterday, and have been spending the usual amount of time getting it set up and playing with it.  I'm pleasantly surprised by the battery life, actually; at the time of this writing, the phone has been on for 31 hours and has about 40% battery left; 39% has been used by the display.  I don't know how much time I've spent in calls, but I've made or received 12 of them in that 31 hours.  Not too shabby.

Thursday, October 27, 2011

The cult of technology personality

Recently, I've come to the conclusion that products are irrelevant; popularity is all in branding and marketing. We developers (of hardware and software) like to kid ourselves into thinking that we're the ones who do the "real work," but really, it's the sales and marketing people who are the backbone. Apple didn't "invent" the smartphone, any more than they invented the MP3 player (they were three years late on that), or the laptop, or the slate PC (again, late by several years), or any of the other stuff they've been successful with in the past ten years. They've just been able to corner the "sexy" market, through good advertising and branding. I think that since Jobs returned to the company, they've also paid more attention to quality and product polish, and have been willing to sacrifice volume to the increased costs that often incurs; he had the same attention to detail at NeXT, although he failed to identify the right market for that platform. But I really think what makes a successful product is the cult of personality.

Other examples:
  • Microsoft. There has almost always been a better competing product to whatever Microsoft is selling, but Microsoft managed to capture the business sector with its early and intimate association with IBM. Even OS/2, an arguably better OS, couldn't wrestle that crown away, and that's because it didn't have Bill Gates, not because it was a technically inferior product.
  • Linux. Minix predates Linux, and had the potential to be as successful as Linux, and can be argued to have a better architecture, but Tanenbaum had different priorities and isn't, I dare suggest, the personality that Linus is. Or, if you don't like microkernels, BSD. Same thing: they don't lack technology, they lack Linus.
  • Java. There are many at least equivalent languages out there, even if you restrict yourself to the OO space, but none of them had Sun behind it.  Sun pushed Java aggressively. I'm not going to credit McNealy or Gosling directly for that; I don't think there was a personality behind that one, just aggressive and persistent marketing.
History is littered with the detritus of better products that lost to inferior products, simply because of better marketing.

Thursday, October 6, 2011

Universal truths

The fact that channel communications (whatever their implementation details) take much, much longer than function calls seems to be a constant truth no matter what the programming language.  I don't have the benchmarks for Erlang offhand, but here are ones that I just recently ran for Go:

main.BenchmarkChannel-2     500000              6106 ns/op
main.BenchmarkFunction-2        100000000               10.2 ns/op
main.BenchmarkAnonymous-2       100000000               11.5 ns/op

Here's the source code in case you want to pick holes in my benchmark:

package main

import "testing"

func BenchmarkChannel(b *testing.B) {
        c := make(chan int, 1000)
        accum := 0
        go func() { for { accum += <-c } }()
        for i := 0; i < b.N; i++ {
                c <- i
        }
}

func BenchmarkFunction(b *testing.B) {
        accum := 0
        for i := 0; i < b.N; i++ {
                handle(i, &accum)
        }
}

func handle(v int, accum *int) {
        *accum += v
}

func BenchmarkAnonymous(b *testing.B) {
        accum := 0
        f := func(i int) { accum += i }
        for i := 0; i < b.N; i++ {
                f(i)
        }
}

This was run with:

gotest -run -cpu=2 -bench='.*'

Changing GOMAXPROCS didn't make any difference. I was running it on a Core 2 Duo.  It's not hugely surprising, but still.  It makes me wonder how the ratio of performance compares between languages.
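One hole worth picking: BenchmarkChannel never waits for the consumer goroutine, so some sends may still be sitting in the buffer when the timer stops, and accum is written from another goroutine with no synchronization.  Here's a sketch of a variant that closes the channel and waits for the consumer's total before returning.  The function names are invented for illustration, it uses testing.Benchmark from a plain main rather than the test runner, and I'm not claiming any particular numbers for it.

```go
package main

import (
	"fmt"
	"testing"
)

// sumViaChannel pushes 0..n-1 through a buffered channel to a consumer
// goroutine, then waits for the consumer to report its total.
func sumViaChannel(n int) int {
	c := make(chan int, 1000)
	done := make(chan int)
	go func() {
		accum := 0
		for v := range c {
			accum += v
		}
		done <- accum
	}()
	for i := 0; i < n; i++ {
		c <- i
	}
	close(c) // ends the consumer's range loop
	return <-done
}

// sumViaFunction does the same additions through a plain function call.
func sumViaFunction(n int) int {
	accum := 0
	for i := 0; i < n; i++ {
		add(i, &accum)
	}
	return accum
}

// add is kept out of line so the call isn't optimized away entirely.
//
//go:noinline
func add(v int, accum *int) { *accum += v }

// sink keeps the results live so the compiler can't discard the work.
var sink int

func main() {
	ch := testing.Benchmark(func(b *testing.B) { sink = sumViaChannel(b.N) })
	fn := testing.Benchmark(func(b *testing.B) { sink = sumViaFunction(b.N) })
	fmt.Println("channel: ", ch)
	fmt.Println("function:", fn)
}
```

The per-operation time for the channel case now includes the one-off cost of starting the goroutine and draining the buffer, which is amortized over b.N sends.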

Wednesday, September 21, 2011

Just a rant

I need to get this out of my system.  It's probably better if you just go ahead and skip this rant; there's nothing constructive in it.

Windows
Windows sucks.  There's really nothing more to say about it; it's a horrible operating system, and I don't know why anybody -- especially anybody in a management position -- would think that it's suitable for running a business on.  It's buggy, slow, bloated, and obtuse.  It's so bad at being a server, I don't even know where to begin criticizing it.  And as bad as it sucks being a server, it really sucks at being a desktop computer, too. There really is nothing redeeming about it.  Using it is a painful chore; it's my personal belief that anybody who thinks otherwise has simply been conditioned to believe so.  It's like, if you hit somebody frequently and often enough, they get used to being beaten.  That's Windows.  It is like being beaten by an entire company.

Dell
Dell hardware sucks.  I have both a (personal) two-year-old MacBook Pro and a one-year-old (company) Dell Latitude E6410.  If I put the Dell to sleep (not hibernate), the battery gives out within 48 hours.  If I put the Mac to sleep, the battery gives out after... well, I don't know how long, because I've never had the battery give out.  It lasts at least a week, though; I went backpacking once and was gone for about a week, and the Mac still woke up after I got back.  Granted, the Dell does have more memory than the Mac (8GB vs 4GB), and that's going to affect the battery drain.  That's still pretty pathetic.  And the battery on the Mac lasts a lot longer; I can get a full 6 hours out of it, with constant use (if not heavy load)... the Dell gives out in about 3.  The CPUs in both have two cores each; Windows claims the Dell's CPU is running at 2.4GHz, and OSX claims the Mac's is running at 2.5GHz.  The Mac's display is larger (15" vs. 14"), and that's a big battery drain.  Of course, the Dell is running Windows, so you have suckage upon suckage.  Windows is probably doing more to kill the battery life than the Dell hardware.  I've heard that the Mac will hibernate itself while it is asleep if it detects that the battery is going to give out.  Again, I wouldn't know, because I've never had the battery drain entirely while the Mac was asleep, and because OSX does so much of this smart stuff behind the scenes.

If I close the lid on my Mac, it goes to sleep.  If I open the lid, it wakes up.  All of the Dells in our company have this feature turned off because if you sleep a Windows laptop by closing the lid, odds are good that the laptop will never wake up, or won't be able to find a wireless connection after it resumes, or something.  That's why you see people walking the halls with their laptop lids only partially closed.  Every time you see that, just remember: Windows sucks.


All of the managers above a certain level at my company get Macs.  All of the peons get Dells.  Honestly, the Macs aren't much more expensive.  A new 15" Pro starts at $1800 for a quad-core 2GHz 4GB RAM 500GB HD; a new 14" E6410 dual-core 2.8GHz 4GB RAM 380GB HD starts at $1400.  Not much difference in price there -- and you're getting better, newer hardware in the Mac.

It baffles me why people are still running their businesses on Windows, and on Dell hardware.

Wednesday, September 7, 2011

Incremental backups with btrfs

btrfs is teh win.  No, seriously.  It's not the only file system that can do this, but it's the first one I've had installed, and it's beautiful, man.

I got myself a little 2TB external USB 3.0 hard drive ($99!  Past Sean, be very jealous) and wrote a backup script; my first version had all of this complex Towers of Hanoi rotation scheme, but then I realized I didn't need any of it if I used btrfs's snapshots.  Now, my backup script consists mainly of:

sg_start --start /dev/sdh   # Tell the drive to spin up
mount /dev/sdh1 /mnt/backups
btrfsctl -s /mnt/backups/backup-$todays_date \
            /mnt/backups/backup-$yesterdays_date
rsync -va --numeric-ids --delete-before --ignore-errors \
      --partial --inplace $backup_paths \
      /mnt/backups/backup-$todays_date
sync
umount /mnt/backups
sg_start --stop /dev/sdh

It does a little more than that (error checking, logging, etc), but not much more.  After the first backup, the disk usage was 15GB; ten days later, I have ten incremental backups, and it's still 15GB.

Of course, the problem with this is that snapshots are, essentially, COW hard links; this means that if there's a corruption on the disk for a file, it'll affect all child snapshots.  My mitigation is to add another backup disk: they're only $99 (cripes, I can't get over that price), and each backup is taking about 8 minutes to run (most of that time is spent in rsync, detecting changes) -- I can easily afford to run two backups a night.  It's not as safe as a rotation backup, but it's safe enough.