For those who haven’t seen it yet: due to popular demand, the StackOverflow people created a new site called programmers.stackexchange.com, a home for the more subjective questions that StackOverflow isn’t really designed for. Someone recently set up a poll there: “What’s your favorite programming language?” You can probably guess what my answer was.
Delphi
It’s the standard, imperative, object-oriented paradigm that most coders are familiar with, but it gets all the little details right that the C family always gets wrong. Plus the more recent versions have started to add support for mixing in functional programming concepts, without running into the ugly abstraction inversions that typically come with functional languages.
I’ve gotten some interesting replies to that. One of them asked, “to justify the C-bashing (so they don’t clobber you!), and to satisfy my curiosity, what are the little details Delphi/Pascal gets right?”
Well, I’d answer that in a comment, but there isn’t enough space in 500 characters to do it justice, so I figured I’d write up my answer here. Let’s start with the elephant in the room.
SECURITY!!!
There is no good reason why anyone should ever, under any circumstances, write a networked program with security requirements (such as an operating system or a Web browser) in C, C++ or Objective-C. We’ve known that for 22 years now, ever since Robert Tappan Morris released the Internet Worm, which used buffer overflow exploits in Unix to crash about 10% of the Internet, causing tens of millions of dollars in damage. (It was a very small Internet at the time. Today, the cost would be measured in hundreds of billions, and we have probably reached that figure already, if not more, once you add up all the damage caused by buffer overflows and other C-specific vulnerabilities exploited over the last couple of decades.)
This should have been a major wake-up call. You cannot write programs that require reliable security in a language that was designed with no thought to it! At least not with any degree of consistency. Just look at how many patches are still being issued today for Windows, Linux, OSX, iOS, and various Internet services and Web browsers, all due to buffer overflows. And it’s getting worse. We’ve got a lot of people talking about computerizing the power grid, which makes a lot of sense in theory, but it’s likely to open up a few hundred million new vulnerabilities to terrorists both domestic and foreign. (Who all’s seen Live Free Or Die Hard? Anyone want to live in that world, except without Bruce Willis and that “I’m a Mac” kid to conveniently step in and save the day?)
It’s the same old issue over and over and over again. It keeps showing up in C programs because it simply can’t be solved in C without breaking backwards compatibility. In any sane world, the C language would have been dead by 1989–the Morris Worm having shown it to be utterly unsuitable for its intended purpose: building operating systems–and all our computers would be safer for it. And it’s not like there are no alternatives. By the time the worm hit, Apple had been building the most advanced operating system of its day for several years already, largely in Pascal. Continuing to write Internet-facing OSes, browsers and apps in C (or C++ or Objective-C) ought to be treated as an act of criminal negligence.
In Delphi, on the other hand, we have a real string type, the best-thought-out string type I’ve seen in any language. It’s reference-counted and grows and resizes automatically as needed, which frees the coder from dealing with string-size and string-memory hassles. It’s bounds-checked (as long as you leave range checking turned on), and its character data lives on the heap rather than the stack, so there’s no way to use a Delphi string buffer overflow for a stack-smashing exploit. Likewise, for non-string data, Delphi has a real array type which is also bounds-checked. In fact, we’ve got two real array types, both bounds-checked, and the one whose size is not fixed (the more dangerous kind) also does not live on the stack. We’ve also got a string-format routine that doesn’t use varargs, and string-output code that doesn’t assume its input is a format string in the first place, which means that Delphi programs are immune to format string exploits.
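For readers who think in C-family terms, here is a rough sketch of the two behaviors described above: a string that grows on its own, and an index check that refuses an out-of-range read instead of wandering off into neighboring memory. It uses C++'s std::string purely as a stand-in. This is not Delphi code, and the function names join_words and try_read_char are made up for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Builds "a b" without any manual buffer sizing. The string grows as
// needed, so there is no fixed-size stack buffer to overrun.
std::string join_words(const std::string& a, const std::string& b) {
    std::string out = a;
    out += ' ';
    out += b;
    return out;
}

// Checked read: at() validates the index and throws std::out_of_range
// when it is past the end, rather than reading whatever memory is there.
// Returns true and fills `out` on success, false on a rejected index.
bool try_read_char(const std::string& s, std::size_t index, char& out) {
    try {
        out = s.at(index);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Calling try_read_char("abc", 3, c) returns false, where a raw C buffer read at the same index would simply hand back whatever byte happened to be sitting there.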
In the interest of fairness, of course, I should point out that as with any language that supports pointers, it’s still possible to write unsafe code in Delphi. But you have to really go out of your way to do it; it’s not the default state of the language, the way it is in C! (And removing pointers from a language causes more problems than it fixes. This is why Java and C# both have explicit “unsafe” features: because they’re necessary to actually accomplish important tasks.)
Syntax and semantics
OK, enough ranting about security. Let’s move on to other things. A lot of C’s security problems have been fixed by managed languages like C# and Java. But what they can’t fix is the syntax, at least not without abandoning a great deal of their C roots, which is a very important marketing device. For example, C was designed without a boolean type. (C99 eventually bolted one on, but conditions still accept any number or pointer.) Some of its descendants have a proper one, but they haven’t managed to escape the ramifications of this blunder: because C started out with no boolean type, anything can be treated as a boolean.
I heard a really horrible pun a while ago: The cake may be a lie, but Pi is always True. (Because it’s a nonzero number, and in C, anything can be treated as a boolean.) When everything is a boolean, including an assignment operation, it’s not safe to write “if x = 5”, no matter how intuitive that looks. And when everything is a boolean, including a number, if you try to and two expressions, the compiler doesn’t know if you mean a logical or a bitwise and, so you need two versions of all the boolean operators. And if you get them wrong, it might work, or you might end up with some very hard-to-debug issues.
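Both pitfalls fit in a few lines. This is a minimal sketch with made-up function names (buggy_is_five, logical_and, bitwise_and), written as C++ so it compiles cleanly. The doubled parentheses around the assignment are only there to silence the warning that modern compilers usually raise; the bug is the same without them.

```cpp
// Meant to ask "is x equal to 5?" but uses = instead of ==.
// The assignment yields 5, which is nonzero and therefore "true",
// so the branch is always taken and x is silently overwritten.
bool buggy_is_five(int& x) {
    if ((x = 5)) {
        return true;
    }
    return false;
}

// Logical and: true when both operands are nonzero.
bool logical_and(int a, int b) {
    return a && b;
}

// Bitwise and: true only when the operands share at least one set bit.
bool bitwise_and(int a, int b) {
    return (a & b) != 0;
}
```

With a = 1 and b = 2, both operands are "true", so logical_and says yes. But 1 and 2 share no bits, so bitwise_and says no. Swap one operator for the other by accident and the code still compiles.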
Java, JavaScript and C# still have double versions of all the operators. And I know the “if x = 5” bug still exists in JavaScript. (C# and Java both reject it when x is a number, because the condition has to be a real boolean there. It can still slip through when x is itself a boolean.)
In Delphi, a boolean is a boolean, and a number (or a string or an object) is not. This means that we have one and, one or, one xor and one not, and the compiler knows what to do with them by looking at the operands. And if you try to do something nonsensical like anding a boolean and a number, the compiler throws an error instead of silently accepting it and generating nonsensical code.
And while we’re on the subject of operators, can anyone tell me what * or & do in C? “Well, it depends on whether–” Oh, I see. Fundamental syntactic elements whose meanings are context sensitive. How lovely. In Delphi, “a * b” means multiplication and nothing else. And for addressing and dereferencing, we’ve got the @ and ^ symbols, which actually make sense mnemonically.
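Here is every one of those meanings crammed into a single function, as a small sketch. The name operator_contexts is made up for illustration, and the reference declarator is a C++ addition rather than plain C.

```cpp
// Shows the same two symbols doing five different jobs.
int operator_contexts() {
    int a = 6;
    int b = 3;
    int product = a * b;  // '*' as multiplication: 18
    int* p = &a;          // '*' as pointer declarator, '&' as address-of
    int& r = b;           // '&' as reference declarator (C++ only)
    *p = 4;               // '*' as dereference: a is now 4
    r = 5;                // writes through the reference: b is now 5
    int masked = a & b;   // '&' as bitwise and: 4 & 5 is 4
    return product + masked + a + b;  // 18 + 4 + 4 + 5
}
```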
Then we’ve got object-oriented programming. C++’s object model is a big mess. There’s no base object class, which means that there’s no way to pass an object of arbitrary type between one routine and another. This also means that there’s no standardized way to take an object and get RTTI about it. And objects are value types, declared by default on the stack (or inline in the larger object that contains them), and passed around by value by default. This wreaks havoc on inheritance and polymorphism.
For example, what’s the output of this program? And if you change the signature of Foo to pass the object by reference, does it alter the output of the program?
#include <iostream>

class Parent
{
public:
    int a;
    int b;
    int c;
    Parent(int ia, int ib, int ic) {
        a = ia; b = ib; c = ic;
    }
    virtual void doSomething(void) {
        std::cout << "Parent doSomething" << std::endl;
    }
};

class Child : public Parent {
public:
    int d;
    int e;
    Child(int id, int ie) : Parent(1,2,3) {
        d = id; e = ie;
    }
    virtual void doSomething(void) {
        std::cout << "Child doSomething : D = " << d << std::endl;
    }
};

// Takes its argument by value, not by reference or pointer.
void foo(Parent a) {
    a.doSomething();
}

int main(void)
{
    Child c(4, 5);
    foo(c);
    return 0;
}
If you have to stop and reason about it for any length of time, that’s a warning sign. I asked our local C++ expert at work what this would do. He thought about it for a few minutes and came to a logical-sounding conclusion about what it ought to do, then wrote up the code above to test it, just to be sure. (This is a guy who’s been writing C++ professionally since I was in high school, and he’s really good at it. But even with all that experience, he wasn’t confident about what it would do without testing it.)
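For the record, here is the answer, boiled down to a smaller sketch. The Base and Derived classes and the two describe functions are mine, made up for illustration, and are not part of the program above. Passing by value copies only the parent part of the object (C++ people call this "slicing"), so the original program prints "Parent doSomething". Change foo to take a Parent& and it prints "Child doSomething : D = 4" instead.

```cpp
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// By value: the parameter is a fresh Base copied from the argument.
// The Derived part is sliced off, so Base::name() runs.
std::string describe_by_value(Base b) {
    return b.name();
}

// By reference: the parameter refers to the caller's actual object,
// so virtual dispatch finds Derived::name().
std::string describe_by_ref(const Base& b) {
    return b.name();
}
```

Hand the same Derived object to both functions and you get "Base" from the first and "Derived" from the second. One character in the signature changes which method runs.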
In Delphi, there aren’t a bunch of arcane passing and copying rules to keep track of when you’re working with polymorphism. Objects are always reference types, so when you pass an object to a function, it passes that object, and when you call a virtual method on an object, it calls that object’s class’s version. Always.
External code
There are at least technical reasons that can explain a lot of the above issues. But here’s something really bizarre that I’ve never heard a good explanation for. A while ago, I had to debug a C DLL that one of my Delphi programs calls into, to fix some problems in it. I opened it up in Visual Studio and tried to get it to build. Everything was syntactically correct, and it compiled just fine… and then failed at the link phase, because it couldn’t find the .lib file for a second DLL that this DLL requires.
.lib file? What in the name of Turing is a .lib file?!? Turns out it’s a file that describes… something… about the other DLL so that the linker can hook… something… up properly. I really have no clue what it is or why it’s necessary. I’ve never had to deal with them before. In Delphi, if you need to link to a DLL, you put a function header in the code, declare that it’s an external reference, and provide the name of the DLL it’s found in, and that’s it.
The C code had all the same information: there was a .h file containing the function headers and… ohhhh, wait. Now I see what’s going on! That’s actually the exact same .h file that’s in the DLL I’m linking against. So it doesn’t specify that these functions are external references, or where they’re found. That information, even though it’s an important part of your source code, needs to be provided in a .lib file, a binary blob generated when the external DLL is built, that’s not human-editable and not version-control friendly. (And if your external DLL wasn’t written in a C family language, you’re in for even more fun trying to generate a .lib file.)
What it all boils down to is that, for some bizarre reason, Delphi does a more hassle-free job of linking to C DLLs than C does. That doesn’t even make any sense, but it’s true. Delphi can talk to external C code better than C can!
I could go on (I haven’t even mentioned templates yet!), but this post is getting long enough already, and I think the facts speak for themselves. By paying attention to little details like the things I’ve mentioned here and thinking through the ramifications, the Delphi language designers have managed to build a language that is easier to work with and easier to write correct code in. I hope this clarifies what I meant when I posted that.