Monday, June 27, 2011

Now we get to the annoying aspects of Go

I've spent another ten hours or so writing extensive functional tests for my code, and fixing bugs, so I have some further thoughts about Go.

POLA

gotest continues to be easy to use and is versatile, even if you do have to implement a bunch of basic functionality yourself (there are no convenient "assert" functions, there's no setup or teardown functionality, and nothing like TestNG's DataProvider mechanism).  Even functional testing of a server process is easier than in other languages, although I think most of that is again due to Go helping me separate my code into modules.  I also found my first non-contrived reason to use goroutines.  I really like goroutines, but that's probably because I really like Erlang's process paradigm.
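As a rough sketch of what "implementing it yourself" can look like: the assertEqual helper and the cases table below are names made up for illustration, not part of gotest. The table stands in for a DataProvider, and the helper returns an error so the sketch runs as an ordinary program; in a real _test.go file it would more likely take a *testing.T and call t.Errorf.

```go
package main

import (
	"fmt"
	"strings"
)

// assertEqual is a hypothetical helper, not something gotest provides.
// It returns an error instead of failing a test so it can run anywhere.
func assertEqual(name, got, want string) error {
	if got != want {
		return fmt.Errorf("%s: got %q, want %q", name, got, want)
	}
	return nil
}

// cases plays the role of a data provider: one row per scenario.
var cases = []struct{ name, in, want string }{
	{"lower", "go", "GO"},
	{"mixed", "Go", "GO"},
	{"empty", "", ""},
}

// runCases checks strings.ToUpper against every row and counts failures.
func runCases() int {
	failures := 0
	for _, c := range cases {
		if err := assertEqual(c.name, strings.ToUpper(c.in), c.want); err != nil {
			fmt.Println("FAIL", err)
			failures++
		}
	}
	return failures
}

func main() {
	fmt.Println("failures:", runCases())
}
```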

What really bugs me are the inconsistencies.  Go is self-consistent, which seems to be the Go-team's answer to any criticism of consistency in the language, which is itself annoying, but it isn't consistent in the Principle of Least Astonishment way.  In fact, Go is really pretty bad when it comes to POLA.  The thing that kept biting me this weekend -- and, admittedly, it was my fault resulting from a lack of experience with Go -- was the following case:


  package main
  type Foo map[string]string
  type Bar struct {
    thingOne string
    thingTwo int 
  }
  func main() {
    x := new(Foo)
    y := new(Bar)
    (*y).thingOne = "hello"
    (*y).thingTwo = 1 
    (*x)["x"] = "goodbye"
    (*x)["y"] = "world"
  }


Go requires that you know details about the implementation of the types defined above.  x is a valid pointer, but the map it points to is nil (new() only zeroes memory; it doesn't initialize the map), while y points to a fully usable struct.  If you try to store anything through x, you'll get a runtime panic and your program will crash.  Instead, you have to do this:


    x := make(Foo)


However, if you try to do the same thing with Bar:


    y := make(Bar)


you get a compiler error.  This bothers me, because it's one of those submarine bugs -- if you make a mistake in how you allocate an instance, you won't find out unless your tests actually hit that code.  I had hoped we'd have gotten away from that by now.  I'd rather Go just made make() work on structures, and then you could eschew new() entirely (unless you really needed it for a specific purpose), and this type of error would become exceedingly rare.  It's a gotcha; it's an inconsistency in the syntax.
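Here's a minimal sketch of the difference, reusing the Foo type from above.  The writeVia helper is something I made up for the illustration; it just recovers from the panic so that both cases can be shown in one run.

```go
package main

import "fmt"

type Foo map[string]string

// writeVia stores a key through a *Foo and reports whether the write panicked.
// (Hypothetical helper, used only to make the crash observable.)
func writeVia(p *Foo) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	(*p)["x"] = "goodbye"
	return false
}

func main() {
	x := new(Foo) // a valid *Foo, but the map it points to is nil
	fmt.Println(*x == nil, writeVia(x))

	m := make(Foo) // an initialized, empty map
	fmt.Println(m == nil, writeVia(&m))
}
```

The first line printed should be "true true" (nil map, write panicked) and the second "false false".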

I've mentioned before that I like catching errors as early in the development process as possible, and this is the reason why I prefer strongly typed languages. The sort of problem that I've just mentioned weakens Go a bit in this area; for comparison, Haskell eliminates this sort of submarine bug -- if you can get your program to compile, it will run, and if it fails, it'll be entirely due to a logical, algorithmic error, not a type-usage one.  There's no compiler yet that will check your logic for you, so Haskell is pretty close to perfect, in that respect.

Before anybody comments on my syntax use above, I'll point out that the dereferencing of the y pointer is unnecessary: Go does that for you.  I did it for illustration purposes.

Pointers

It's been a long time since I've had to use pointers, and during my development and testing I was floundering around trying to determine when and when not to use them.  At one point, I was getting annoyed and was thinking that this was another POLA violation in Go, but it turns out I didn't have to worry about it.  All you have to do is to ensure that any function that needs to change a structure (or variable) gets passed a pointer, and Go happily figures out the rest.  Another example:

  func (b *Bar) foo() {  b.thingTwo = 1 }
  func (b Bar) write() string { return fmt.Sprint(b) }
  func tralaTrala() {
      var b Bar
      b.foo()
      fmt.Println(b.write())


      a := new(Bar)
      a.foo()
      fmt.Println(a.write())
  }

tralaTrala() does not need to know whether a and b are pointers or not; Go will "do the right thing" here, and the syntax is the same despite the fact that the types of the two variables are different (one's a pointer, the other isn't).  The only real issue is that, when writing a function, the programmer does need to know whether foo() and write() modify the state of the Bar they belong to, because s/he needs to know whether to request a pointer or not:

  func doSomething( b Bar ) {
      b.foo()
  }

will probably end up being a runtime logic error, and Go does nothing to help you with these.  This makes me think that the safest way to use Go is to pretend that it's entirely functional, and write all of your code that way, until you need to optimize it.  This would mean that the signature for foo() would be:

    foo() Bar

and it'd be used like this:

  func doSomething( b Bar ) Bar {
      rv := b.foo()
      return rv
  }

This is another thing that I'm not entirely happy about, but it may be a reasonable trade-off to provide a powerful optimization path.
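To make the pitfall and the functional-style alternative concrete, here's a small sketch.  Bar is the struct from earlier; withFoo and doSomethingPtr are names I've invented for the illustration.

```go
package main

import "fmt"

type Bar struct {
	thingOne string
	thingTwo int
}

// foo mutates its receiver, so it has a pointer receiver.
func (b *Bar) foo() { b.thingTwo = 1 }

// withFoo is the functional-style variant: it returns a modified copy.
func (b Bar) withFoo() Bar {
	b.thingTwo = 1
	return b
}

// doSomething receives a copy of the Bar; b.foo() changes only that copy.
func doSomething(b Bar) { b.foo() }

// doSomethingPtr receives a pointer, so the caller sees the change.
func doSomethingPtr(b *Bar) { b.foo() }

func main() {
	var b Bar
	doSomething(b)
	fmt.Println(b.thingTwo) // 0: the caller's Bar is untouched

	doSomethingPtr(&b)
	fmt.Println(b.thingTwo) // 1

	var c Bar
	d := c.withFoo()
	fmt.Println(c.thingTwo, d.thingTwo) // 0 1
}
```

The compiler accepts all three versions without complaint, which is exactly why the first one is easy to miss.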

25 comments:

Stefan said...

Can I assume that they solved run-time loading then? When I last ventured into golang my journey came to an abrupt halt as I was trying to load a shared object at run-time to have compiled module support.

The list told me that this was simply not implemented yet.

Also regex was lacking.

paul said...

As far as new and make, is it not a go idiom to have a factory for types? That would be workaround if nothing else. I do agree the new make thing is odd.

Jessta said...

Both make() and new() work as they are defined.
make() is a constructor for built-in types.
new() allocates a block of zero'd memory.

You can't get rid of make() because built-ins need constructing and you can't use make() to allocate memory for structs (without initializing them) because that's semantically weird: make() would initialize built-in types but not structs.

Your pointer problem is really weird. I can't imagine writing a function that called methods on its parameters without first looking at what those methods do.
In fact, how do you find out the name of the method without looking at its definition?

Unknown said...

Jessta,

Your argument is a bit circular. make() and new() exist because Russ decided there were two semantic categories where the rest of the programming universe considers there to be one. That's why everybody complains about it except Russ and people who have only used Go. There are two ways to resolve the semantic oddity:

- have the compiler notice you're allocating a built-in and perform the construction step for you (eliminating make)
- provide a way to associate construction routines with user-defined structs (regularizing make and retaining the distinction)

Otherwise, you have to keep track in your mind of what you're making, and whenever the type of your variable changes, you have to once-over the uses of the variable to make sure you're using the right function.

Russ et al. like this scheme because they actually wrote the standard library and are aware of the places they wrote constructors, and they can appeal to Go's self-consistency that it doesn't magically construct things for you while denying you access to the constructor mechanism in the name of compiler speed and simplicity. Other languages that provide such high-level types as maps have no problem with either making the compiler more complex or hiding initialization from you. While I admire their passion and no-holds-barred attitude, this is one case where they should just shut up and relent to what the rest of the world is doing.

The pointer problem can be mitigated when writing small programs, but as your programs grow in size and age you're bound to have less control over the type signatures of your interfaces. So you'll get bitten by it more in the future.

SER said...

"Unknown" already responded more succinctly than I could to Jessta's response about new() and make().

As for the pointers: I'd hope that I could use library functions without having to inspect the code for all of the functionality that the library provides. That's what API documentation is for. It isn't really a "problem," per se. I'm pretty happy with what Go has done with pointers. It's just another potential "gotcha."

Daniel Lyons said...

@SER: Unknown here. Google has decided I don't have an identity. Hopefully they locate it again soon, or my wife will be very upset.

Utopian Nihilist said...

> That's why everybody complains about it except Russ and people who have only used Go.

I don't think there is anyone yet lucky enough to "have only used Go".

Also the designers of the language are: Robert Griesemer, Rob Pike, and Ken Thompson. Russ is just the most visible face of the development process.

> denying you access to the constructor mechanism in the name of compiler speed and simplicity.

I doubt very much anyone ever claimed that Go lacks constructors in the name of compiler speed and simplicity.

I find constructors clumsy and unnecessary when you can have 'factory' functions that work just fine and don't introduce extra complexity *to the language*.

Also, the Go way is to try to make the 'zero value' directly usable, and this is the case for most types.
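A small sketch of what "usable zero value" means in practice (zeroValues is a hypothetical helper name). Note the exception relevant to the post: reading from a nil map is fine, but writing to one panics.

```go
package main

import (
	"bytes"
	"fmt"
)

// zeroValues uses three zero values without any explicit construction.
func zeroValues() (string, []int, int) {
	var buf bytes.Buffer // zero value is an empty, ready-to-use buffer
	buf.WriteString("hello")

	var s []int // nil slice; append allocates as needed
	s = append(s, 1, 2)

	var m map[string]int // nil map; reads return the zero value
	return buf.String(), s, m["missing"]
}

func main() {
	text, nums, n := zeroValues()
	fmt.Println(text, nums, n)
}
```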

> While I admire their passion and no-holds-barred attitude, this is one case where they should just shut up and relent to what the rest of the world is doing.

I have yet to see any compelling argument as to what real value constructors add to the language.

Steve Phillips said...

I would personally rewrite your first example as

// Steve Phillips / elimisteve
// 2011.07.06

package main
import "fmt"

type Foo map[string]string

type Bar struct {
thingOne string
thingTwo int
}

func main() {
x := Foo{"x": "goodbye", "y": "world"}
y := Bar{thingOne: "hello", thingTwo: 1}
fmt.Printf("%v, %v\n", x, y)
}

I've now written >2000 lines of Go and I've yet to use new(). I use make() to make, say, channels of booleans -- with make(chan bool), which couldn't possibly be any simpler IMO.

Just because it's possible to screw up doesn't mean your tools are broken.

Go is the best-designed language I've ever used, and battles daily with Python for the title of my favorite programming language of all time.

Eric said...

Steve Phillips,

If you don't mind me asking, what kind of work do you do where you're using Go and Python daily?

Daniel Lyons said...

@Nihilist: forgive my simplification vis-a-vis the face of Go. Russ is also the face of Limbo and Plan 9, and I see a lot of those fingerprints on Go. But you're right, it's a group effort and lots of people have been involved from the beginning. On the other hand, I think fame should come with a price.

My argument for constructors is simply that it's one way to retain make() without the subtle and easy-to-screw-up distinction between it and new. I put forth two solutions to the problem because the problem isn't the lack of constructors, it's the presence of two different mechanisms that do something like allocation and setup where every other language on earth has just one mechanism for both. I'm sure I can rest easy knowing that it being a stumbling block to the legions of programmers who tried Go but did not fall madly in love with it will not constitute a compelling argument to you or anyone else who does love it. And that's fine, the world can always use another programming language with a "community" of self-assured assholes that do their best to squelch any form of criticism with a firm slap and some elitist condescension, thereby driving out anyone with a slightly different opinion and ensuring complete mental uniformity amongst the system's increasingly extremist userbase. It's not like we've seen that before with Plan 9 or anything else.

My point, and I believe Sean's point, is just that it adds a bit of needless pain to a system that is actually quite good.

@Steve- As I get older and better with Haskell, I have come to expect the computer to do more of the work when it can, due to the work being rote or largely meaningless. In this particular case, you have two functions that are used in mutually exclusive contexts to do essentially the same thing. There's nothing meaningful about new()ing a channel or make()ing a struct, so it's not that the tool is being misused, it's that the tool has two identical looking parts that do almost identical things but which cause the machine to explode if you use them backwards.

This is the kind of detail we invented computers and programming languages to avoid having to remember. Unless you're using OCaml, you don't have to keep track of whether you're adding integers or floats, despite the fact the hardware has to use two different opcodes depending; the compiler knows their type and does the right thing. In fact, in that example, there could even be cases where there is meaning in performing an integer addition on a float or vice versa, which isn't the case with new() and make(). I'm glad this problem hasn't bitten you yet, but that doesn't mean the problem doesn't have teeth.

steven099 said...

You've run into a situation where something doesn't work as you expected. That doesn't make this a universal thing. Your arguments about why this is universally wrong don't make sense (imo, as I will try to demonstrate in hopes that you'll see my point of view :)

new(T) returns a *T. The equivalent make (if it were legal) would be make(*T), not make(T), since you're making a pointer. If you just want a struct value, declare it with `var t T`. You only need new to get a pointer to a T without giving a name to the thing it points to (I know you don't like pointers, but frankly, they're a lot more flexible than what other languages provide as "replacements", and often simpler).

Your concerns about libraries are unfounded. Normally, a library will supply a function to call in order to get a value of a type. This function is the only place that has to worry about the representation of the type and use the built-in functions as needed. Users just call the function and don't care what the representation is.
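A minimal sketch of that idiom, with hypothetical names (Registry, NewRegistry, Set, Get). Only the factory function knows that a map has to be created with make().

```go
package main

import "fmt"

// Registry is a hypothetical library type; callers need not know it wraps a map.
type Registry struct {
	entries map[string]string
}

// NewRegistry is the only place that deals with the representation.
func NewRegistry() *Registry {
	return &Registry{entries: make(map[string]string)}
}

// Set stores a value under a key.
func (r *Registry) Set(k, v string) { r.entries[k] = v }

// Get looks up a key and reports whether it was present.
func (r *Registry) Get(k string) (string, bool) {
	v, ok := r.entries[k]
	return v, ok
}

func main() {
	r := NewRegistry()
	r.Set("x", "goodbye")
	v, ok := r.Get("x")
	fmt.Println(v, ok)
}
```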

Consider this: you have to know whether the value is a map or a struct [pointer] to access its fields or index it. So you're already making your code dependent on the kind of type. If you want to generalize access to the type, you'll need to provide functions and methods. One of those functions will initialize a new value for the user.

Have you read the go spec, effective go, and the FAQ? These really clear up some of these questions.

PS - you don't need to do (*x).a, x.a works fine.

PPS - Try to think of pointers as a distinct kind of type, not a quirky way to handle a value. Confront what you're finding confusing rather than hiding from it :). They're like a single element slice with syntactic benefits. A pointer is a distinct value with an element, and all copies of the same pointer value refer to the same copy of the element, so changes via a pointer can be seen via a copy of that pointer. I don't know if this helped, but pointers aren't really all that complicated and are a really useful tool if you can get past your mental blocks.

Karol Mieczysław Marcjan said...

I'd say you're overstating the importance of the make / new difference. IMHO it ain't really such a big deal.

Also there are two ways to define new methods on types from outside the package. You can either use embedding (as someone has suggested before) or define a new type in this way:

type MyType package.Type

And then define methods on MyType.

xan.php said...

Regarding the pointer issue, it's interesting that Google's C++ Style Guide forbids reference arguments to avoid the same confusion you're pointing out in Go. "References can be confusing, as they have value syntax but pointer semantics."

http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Reference_Arguments

steven099 said...

@xan I wouldn't say it's the same issue. By that language, map, slice and chan types use both pointer semantics and syntax. If you assign to a slice/map/chan parameter, it doesn't affect the original argument, same as with a pointer. However, if you alter an element of the type, that change affects the original argument, the same as with a pointer. You just have to recognize that Go has 4 (5 counting closures) reference types (which get passed by value like any other type), rather than just one (pointers in C/C++). There are no pass by reference semantics in Go, which is the problem with C++ references.
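A short sketch of that distinction (the function names are invented for the illustration): reassigning the parameter is invisible to the caller, while changing an element is visible.

```go
package main

import "fmt"

// reassign replaces its local copy of the slice header; the caller is unaffected.
func reassign(s []int) { s = []int{9, 9, 9} }

// setFirst writes through the slice to the shared backing array.
func setFirst(s []int) { s[0] = 42 }

// addKey writes into the map the caller's value also refers to.
func addKey(m map[string]int) { m["k"] = 1 }

func main() {
	s := []int{1, 2, 3}
	reassign(s)
	fmt.Println(s) // [1 2 3]

	setFirst(s)
	fmt.Println(s) // [42 2 3]

	m := map[string]int{}
	addKey(m)
	fmt.Println(m) // map[k:1]
}
```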

steven099 said...

@Karol — It isn't a big deal once you understand and accept it. You never get confused once you know the distinction.

From a learning perspective though, it's a bit of a distraction from the real questions. You have to realize that new(T) is equivalent to make(*T) — if it were legal — not make(T). Comparing make(*T) with make(T) makes the distinction plainly evident, since you can see clearly that you're making two different types. The question becomes "do I want to make a *T or a T?". new(T), however, hides the pointer from you, making it less evident, so you just have to know the pointer is there.

Again, it's fine once you get it, which is always going to be the case for some things, but it isn't particularly newbie friendly or intellectually satisfying.

Steve Phillips said...

Eric,

I'm using Go to create Decentra, a distributed computing platform. It's like SETI@Home and other distributed computing projects, but it's unique in that _anyone_ can submit a task to be computed, not just a single, central entity.

I'm using Python and Django to create, well, lots of stuff... an eLearning platform, a conference call scheduling and automation system about to enter closed beta, and various Santa Barbara Hackerspace projects.

I do Rails consulting to pay rent but am focusing as much as possible on the more entrepreneurial endeavors, some of which are also in Rails.

If you asked because getting paid to use Python and Go all day would be awesome, I agree!

SER said...

@Steve Phillips: That's a useful shorthand. It alleviates the need for new(), basically, and reduces everything to {}, or make() in some cases where {} wouldn't make sense anyway. I still think the dichotomy is both unnecessary as well as a poor design decision.

@Daniel Lyons: Your point about Haskell is on the money. I'm increasingly becoming impatient with compilers that don't figure things out on their own, when it's reasonable to expect that they can.

@Karol Mieczysław Marcjan: Can you provide a full example? That doesn't seem to accomplish what I want it to. I have a function that takes an interface of my devising:
func F( m MyInterface ) { ... }
and I want to extend an external type to support my interface. For example, say I want to do this:
func (j json.Encoder) Serialize(i interface{}) string {
err := j.Encode(i)
// handle error
// do some other stuff
// return something
}

Are you suggesting that there's a way to do this?

steven099 said...

@SER — This should work:

type JsonEncoder struct { *json.Encoder }
func (enc JsonEncoder) Serialize(i interface{}) string {
    err := enc.Encode(i)
    // handle error
    // do some other stuff
    // return something
}
...
enc := json.NewEncoder(wr)
str := F(JsonEncoder{enc})
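
A runnable variant of the same embedding idea, as a sketch under some assumptions: the buf field, the NewJSONEncoder constructor, and the exact signatures of MyInterface and F are invented here so that Serialize has something to return; they are not taken from the discussion above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// MyInterface is an assumed shape for the interface under discussion.
type MyInterface interface {
	Serialize(v interface{}) (string, error)
}

// JSONEncoder embeds *json.Encoder, so Encode is promoted onto it.
// The buf field is an assumption added to capture the encoded output.
type JSONEncoder struct {
	*json.Encoder
	buf *bytes.Buffer
}

// NewJSONEncoder wires the embedded encoder to an in-memory buffer.
func NewJSONEncoder() JSONEncoder {
	buf := &bytes.Buffer{}
	return JSONEncoder{Encoder: json.NewEncoder(buf), buf: buf}
}

// Serialize is the locally added method; it calls the promoted Encode.
func (e JSONEncoder) Serialize(v interface{}) (string, error) {
	e.buf.Reset()
	if err := e.Encode(v); err != nil {
		return "", err
	}
	return e.buf.String(), nil
}

// F accepts anything that satisfies MyInterface.
func F(m MyInterface) (string, error) {
	return m.Serialize(map[string]int{"a": 1})
}

func main() {
	s, err := F(NewJSONEncoder())
	fmt.Printf("%q %v\n", s, err)
}
```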

SER said...

Notice that your solution isn't the same as what I was looking for. My signature is a function that operates on a json.Encoder, not some other struct that I had to define and then wrap in another struct just to use it.

I'm not sure what the purpose of this limitation in Go is; it seems sort of arbitrary.

steven099 said...

@SER — The "limitation" is there so that the method set of a type, and the meaning of a method call, doesn't depend on what packages you've imported, and what packages those packages have imported, etc... The behaviour you want is fine in a scripting language for small programs, but in a larger project, which is what Go is intended for, it quickly becomes unmanageable. A change in some fourth order dependency down a separate import path from the one providing the type shouldn't be able to break your code using that type.

Go gives you the flexibility of taking a type and adding methods to it, but you have to provide the resulting type as your own type, so people can explicitly choose to use it, or not. This is the same as Go not having implicit conversions. They're nice on a small scale, but become unmanageable on the larger scale.

So let me ask you this: why is it important to you to alter the original type rather than making your own? Creating the type is just a one liner in Go. Wrapping the value is a no-op that takes only a tiny bit of extra work to do. Having a static method set is nearly indispensable in a larger project. Giving people the choice to ignore your "improvements" might hurt your ego a little, but in the end it's for the better.

SER said...

@steven099
How does it become unmanageable? If you use an API that expects a certain function, you include the module that provides the function. It's no different from having to import "strings" to get the "extra" string operations. There are large applications written entirely in scripting languages, and they get along pretty well, so I don't accept that argument.

Why is it important to alter the original type rather than making my own? Because it makes for cleaner code. If I can define functions on types in my module, then I can expose functions that accept types and pass them around within my module without having to track what kind of type they are, without having to define N new types for N supported external types, without having to write giant case statements whose only purpose is to determine what's being passed in so that I can wrap it up in a struct whose sole purpose is to allow me to call functions on it. You can write code to do all of this; it's annoying that you have to.


type MyInterface interface {
F()
}
func (k p.ExternalType1) F() { ... }
func (k p.ExternalType2) F() { ... }
func (k p.ExternalType3) F() { ... }
...
func MyFunc(k MyInterface) {
k.F()
}


Now I can pass in any of the supported types to MyFunc(). This would be an elegant (and clean!) solution to the problems introduced by the fact that Go does not support polymorphism. I'd argue that instead of making large applications less maintainable, it'd make them much more maintainable; this is easier to read than a bunch of wrapper structs and (possibly) a verbose switch/case statement testing type and wrapping structs in other structs. Additionally, it would allow for even more clean module separation by allowing specialist code (new F() declarations for additional types) to be defined in different modules, which could be included (or not) at compile time based on user preference or need.

Database support might be a good example. I could have a FetchData module that exposes functions for getting data from some source, maybe a file system, or a database. I might have a SQLite module, and an Oracle module, and a FileSystem module that each adds functions to some external SQLite library, or Oracle library -- functions used only by my FetchData module. I can do it your way, but it means adding more boilerplate structs and wrapping constructors, but I think it's much cleaner to simply add the functions needed by FetchData to the external structs. Let's compare my code above with how I have to do it in Go (well, one way; I'll be happy to hear of a less ugly one):


type MyInterface interface {
F()
}

type MyExternalType1 struct { *p.ExternalType1 }
type MyExternalType2 struct { *p.ExternalType2 }
type MyExternalType3 struct { *p.ExternalType3 }

func (k MyExternalType1) F() { ... }
func (k MyExternalType2) F() { ... }
func (k MyExternalType3) F() { ... }

func MyFunc( k MyInterface ) {
k.F()
}


And rather than calls to MyFunc() being merely:

MyFunc(externalInstance1)

it's now

MyFunc(MyExternalType1{externalInstance1})

Worse, this code:


func Bar(k interface{}) {
MyFunc(k)
}


becomes:


func Bar(k interface{}) {
var m MyInterface
switch v := k.(type) {
case *p.ExternalType1:
m = MyExternalType1{v}
case *p.ExternalType2:
m = MyExternalType2{v}
case *p.ExternalType3:
m = MyExternalType3{v}
default:
panic("unsupported type")
}
MyFunc(m)
}


Every time I add support for a new external type, I have to modify Bar() (whereas I don't, in the other version). This makes Bar more brittle, and the application more difficult to maintain, and more error-prone. I can't see any way in which the Go limitation improves the situation.

steven099 said...

@SER — You're assuming that the end user is the only one importing packages. You're assuming that there aren't any conflicts. I don't control what packages the packages I am using import, or what those packages import. The method I add could conflict with a method added by a package I didn't directly import (and thus have no control over importing), and this could break either my code, the code in the package I'm importing (directly or indirectly) or both. Heck, two packages I've indirectly imported from different sources could conflict. You're essentially creating an unregulated global namespace.

Here are the key words in your post: "without having to track what types they are". This is a very dynamic idea of what you should be able to do. If your function accepts an arbitrary type, it can't expect that you have added a specific method to that particular type and you're good to go. There's no static justification for that assumption.

The Go way would be to make that method part of the interface type of the parameter, so that the choice of how to wrap the value is up to the user of the function. By moving the choice back, you allow it to be done statically. And this gives the user more latitude in what they give you, since it doesn't have to be one you've considered, it just has to implement the interface, and the code will work. They could add their own method. And they can change that choice between calls to your function, as it suits them.

Another way, if you want to apply the method to an arbitrary type passed in (I didn't realize this might be what you wanted, since you specified statically known types) is:

type I1 interface { Bar(); Baz() }
type I2 interface { Foo(); Bar(); Baz() }

type Wrapper struct { I1 }
func (w Wrapper) Foo() {
    // Do something requiring only Bar and Baz
}

func F(v I1) I2 { return Wrapper{v} }

This way, you can do it based only on statically known information. If you want, you can decide whether to wrap it or not based on whether it already implements the interface, but that's the only type switching you might need.

So, either the user knows you added the method, and therefore you can leave it up to them to wrap the type however they want or provide their own (giving the user choice), or you're adding it on-the-fly to an arbitrary type, so that's all you have to worry about.

I hope this has helped you see how you might solve this kind of problem in Go.

steven099 said...

@SER — I should mention that a common idiom is to use a function rather than a method:

type Fooer interface { Foo() }
func Foo(v interface{}) {
    if v, ok := v.(Fooer); ok {
        v.Foo()
        return
    }
    // default implementation
}

Of course, the behavior doesn't have to be a direct override. For example, the fmt package has a Stringer interface which types can implement if they want to override their default string representation.

SER said...

@steven099: I understand your point about the namespace conflicts, but the compiler can easily figure this out; I shouldn't have to. If I add a function to a namespace, and so does another package, then the only possible conflict is in code that imports both packages. The compiler can easily verify at compile time that there are no namespace conflicts in those packages.

I'm not claiming that go should statically analyze dynamic casting situations. The point in my example was not that I shouldn't have to use the "t.(Interface)" idiom, but that I shouldn't have to use the "if x is type k, then wrap in kwrapper, else if x is type y then wrap in ywrapper, else...".

I'm claiming that it is both practical and useful to be able to add functions to an existing namespace, and that I haven't yet heard a good reason why this is neither desirable, nor technically possible.

steven099 said...

@SER — I'm not quite sure you realize the implications of having incompatible libraries. It's not just about keeping track of it. The D1 community was fractured because they'd developed two mutually exclusive ecosystems. D2 is supposed to have fixed that, but it was a huge issue for the language.

So, you'd think you'd need some way to resolve the conflict, but that would be impossible. If I add a String method to a type so it gets JSON formatted when printed, and someone else adds a String method to the same type so that it's HTML formatted, there's simply no resolving it. One package or the other will be broken. As long as you are altering the original type, these kinds of unresolvable conflicts can and will occur, at great detriment to the users of the language. Under Go's approach, these would be two distinct types, so it would be a non-issue.

I can see the appeal of wanting to add a method to a type so that you don't have to worry about it, and the user doesn't have to worry about it, it just works.

But, aside from the conflict issue, since this means altering the type itself, it means the user has less flexibility in overriding this behaviour. They don't get to choose which method to use on a case by case basis, because the type can only have one.

Go's approach allows choice. Sure, you have to define N local types for N supported external types, but you already have to write N*k methods, and the types are just an extra line each. You can hide the wrapping from the user using an N+2 case type switch (+2: default wrapping and user defined), or you can leave it up to the user, which is fairly easy for them since they statically know the type of the value they're giving you and may want to do it anyway.