Entirely Missing the Pointer

Whenever I read about Swift, I read about the distinction between reference types and value types.

In the C-based languages I used (and still use), I never thought about it like that. Instead, I thought in terms of pointers1 and everything else, which here I’ll call non-pointers.

You could have a pointer to anything, but in Objective-C they are used especially for class instances. And you could have a “non-pointer” to anything, including scalars, structs, and (in C++) class instances.

And it’s always been visually easy to distinguish between the two: one has an asterisk, and one doesn’t.2

Check it out. The ones with the asterisks have reference-based semantics, and the ones without the asterisks have value-based semantics:

Reference:

@interface Foo : NSObject
@property int bar;
@end

@implementation Foo
@end

struct Bar {
	int foo;
};

void referenceTest() {
	int intStorage = 0;
	int *myInt1 = &intStorage;
	int *myInt2 = myInt1;
	*myInt1 = 10;
	printf("%d\n", *myInt2); // Result: 10, same as myInt1
	
	struct Bar barStorage = { 0 };
	struct Bar *bar1 = &barStorage;
	struct Bar *bar2 = bar1;
	bar1->foo = 10;
	printf("%d\n", bar2->foo); // Result: 10, same as bar1
	
	Foo *foo1 = [Foo new];
	Foo *foo2 = foo1;
	foo1.bar = 10;
	printf("%d\n", foo2.bar); // Result: 10, same as foo1
}

Value:

// Same declarations/definitions as above

void valueTest() {
	int myInt1 = 0;
	int myInt2 = myInt1;
	myInt1 = 10;
	printf("%d\n", myInt2); // Result: 0, not same as myInt1
	
	struct Bar bar1 = { 0 };
	struct Bar bar2 = bar1;
	bar1.foo = 10;
	printf("%d\n", bar2.foo); // Result: 0, not same as bar1
}

Note that in Objective-C we can’t have a value-based version of the class. (Though we could in C++.)

Swift has a completely different philosophy. Reference vs. value isn’t syntax-based, it’s identity-based. The exact same syntax will produce different results depending on how the thing you’re working with was originally defined.

Reference:

class Foo {
	var bar: Int = 0
}

var foo1 = Foo()
var foo2 = foo1

foo1.bar = 10
foo2.bar // Result: 10, same as foo1

Value:

struct Foo {
	var bar: Int = 0
}

var foo1 = Foo()
var foo2 = foo1

foo1.bar = 10
foo2.bar // Result: 0, not same as foo1

The only difference in the two samples above is the class vs. struct keyword.

Because that’s such a stark philosophical gap, and for me an unexamined one, it was quite hard to get my mind around at first.

Notes:

  • For me, Java was the first mainstream language that removed the asterisk for reference types and stopped calling them “pointers”. Though amusingly, they still have a java.lang.NullPointerException, which I’ve always assumed has to be confusing to newbies!
  • Swift further muddies my concept of reference-as-pointer and value-as-non-pointer by allowing multiple value-type instances to point to the same memory as long as you don’t modify their contents, which C-based value types never did. So Swift value types can actually be implemented with C-style pointers under the hood.
  • For Objective-C users, our first taste of this kind of thing was with blocks. A newly-created block is kind of a “value” type, created on the stack like all other non-pointer C types. But then you copy it, and it becomes a kind of “reference” type you can pass around outside of the function scope. Same syntax for either type, just like Swift.

1. Here, I define “pointer” as an explicit C reference to memory, detached from its management. Non-pointers are still referencing memory locations, but the runtime manages their creation and destruction as part of something larger: the stack, a class or struct, etc. ↩︎
2. C++ muddied this distinction a bit by introducing references, though they had the decency to give them a different punctuation mark. ↩︎

One comment

  1. Nate

    I’ve been thinking recently that the problem with C isn’t so much the existence of pointers but that pointers are unrestricted: a pointer plus some arithmetic can give you access to an entire machine’s memory, which a function doesn’t own and has no business accessing. And that’s how we get all the security pain we’re now experiencing. I guess that’s why modern languages trend toward Java-like ‘references’ that don’t let you get at their innards to break them. But as you say, the lack of that visual syntax (the asterisk) makes it harder to grasp the semantics of whether it’s stack-allocated vs heap-allocated, and how it behaves under mutation.

    Could a C-like language have C-style pointer syntax (eg asterisk) without letting you do arbitrary operations on the value you get from accessing a pointer without the asterisk?

    Alternatively, might it be useful for a machine-word-sized pointer to specify a range, rather than just a single cell of memory? Eg (off the top of my head) add a couple of bits to the top or bottom end specifying a power-of-2 block size? I suppose that’s still unhelpful unless something is checking every pointer operation against the range bits, and that would get slow (this should be the CPU’s job in any decent architecture, but we don’t get to have decent architectures since the Intel x86 ate the world).

    The Linux scene (eg KDE and GNOME) seems to be filled with various kinds of mostly incompatible object systems layered over C/C++ with various ‘smart pointer’ implementations, which do… I dunno what… but I guess none of this is very helpful unless it’s implemented and enforced at the level of at least a language runtime, if not a full virtual machine.

    Between Swift/Go/Rust, is it looking like there’ll be a winner that could take over from C yet? Eg open-sourced, available on all platforms, moderately secure, doesn’t randomly change core design specs every few months?