
Tuesday, June 2, 2009

Parallel Objects Never Meet: What to do with all those cores

In the last few years the steady increase of microprocessor clock rates and decrease of power consumption have leveled off, due to physical limits that IC designers haven't been able to overcome. This change in a decades-long trend meant that companies like Intel, whose business model is based on continually increasing the performance of their products by following Moore's Law toward ever smaller features, and thus ever larger numbers of transistors on each chip, have had to find some way other than faster memory and computing elements to turn those extra transistors into performance. The strategy that Intel, and the other companies trailing it down the Moore's Law curve, have taken is to freeze the speed and functionality of each processor and then put multiple processors, called "cores," on each chip. The current generation of chips on the market has 4 or 6 cores per chip, with 8-core chips coming soon. Chip designs with 64 cores have been demonstrated in the lab.

Of course, it's not that simple to take advantage of those extra cores with existing software. Parallel processing was a very active field of research in the 1980's, but the conclusion of that research was that new programming languages would be needed, and all new software would have to be written, with programmers controlling the parallelism of their code explicitly. Moreover, most of the research was in specialized application domains like numeric programming and computer vision, where the required parallelism fell easily out of the problem descriptions. But most computing from the 90's on was business or personal computing, not scientific, so the chip manufacturers didn't see parallel architectures as an important part of their market. Parallel processing fell out of vogue even in research, and very little money was available for research and development in the commercial sector of the computer industry.

The rise of the web application and the development of large-scale application servers provided one way to use parallelism for business and personal computing: since a server is doing pretty much the same thing, only with different data, for each web client it serves, it's not hard to build server software that can take advantage of large numbers of processors. Server computers have been built with multiple processors for more than 10 years now; moving them to multi-core chips isn't difficult.
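As a rough illustration (not any particular server's code), here's a Java sketch of why this is easy: a pool of worker threads, one per core, all running the same request-handling code on different data, with no coordination needed between them.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RequestServer {
    public static void main(String[] args) {
        // One worker per core; each request is independent of the others.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        for (int i = 0; i < 100; i++) {
            final int requestId = i;  // stands in for one client's request
            pool.execute(new Runnable() {
                public void run() {
                    // Same code, different data: handle one request.
                    System.out.println("Handled request " + requestId
                            + " on " + Thread.currentThread().getName());
                }
            });
        }
        pool.shutdown();
    }
}
```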

But that doesn't help the large number of personal and business desktop and laptop (and soon palmtop) computers. Utilizing 2 cores isn't too hard, and 4 cores can be kept somewhat busy, but after that it's not clear that there's a lot of benefit to more cores without completely rewriting the software, a monumental and very costly task for most computer applications. It's not surprising then that a lot of research and money is going into looking for ways to reduce the amount of rewriting necessary. Some of the ideas being investigated include extensions to existing programming languages to allow explicit control of parallelism and runtime support libraries that do the work of synchronizing and controlling code running on multiple processors.
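To give a sense of what "explicit control of parallelism" means in practice, here is a hand-written Java sketch of the kind of partitioning and synchronization those libraries still leave to the programmer; the array-summing task is just an invented stand-in for real application work.

```java
public class ExplicitSum {
    public static void main(String[] args) throws InterruptedException {
        final long[] data = new long[1000000];
        for (int i = 0; i < data.length; i++) data[i] = i;

        int nThreads = Runtime.getRuntime().availableProcessors();
        final long[] partial = new long[nThreads];
        Thread[] workers = new Thread[nThreads];

        // The programmer must split the work, start the threads,
        // and wait for them all by hand -- nothing is automatic.
        int chunk = data.length / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            final int start = t * chunk;
            final int end = (t == nThreads - 1) ? data.length : start + chunk;
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    long sum = 0;
                    for (int i = start; i < end; i++) sum += data[i];
                    partial[id] = sum;   // each thread writes only its own slot
                }
            });
            workers[t].start();
        }

        long total = 0;
        for (int t = 0; t < nThreads; t++) {
            workers[t].join();           // explicit synchronization point
            total += partial[t];
        }
        System.out.println("total = " + total);
    }
}
```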

I think there's another way to do the job, one that may even allow many existing programs simply to be recompiled, and will almost certainly allow tools to examine and modify, or at least flag, code that needs to be changed to take advantage of multiple cores. It's based on the fundamental nature of object-oriented programming, in which an object is an encapsulated piece of computation, including both code and data.

When I first learned about objects, they were described to me as conceptually little computers, with inputs and outputs. Each object could, at least in theory, be run on a separate computer, as long as its internal state was not accessible to other objects. In some of the first object-oriented languages this was the basic model of computation: there was no way for one object to access another object's data; the only thing an object could do was send a message to another object and receive a reply. That model works very nicely in a parallel environment, because the message-sending mechanism can have synchronization between processors, or even between distributed computers, built into it.
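Here's a minimal Java sketch of that model, using nothing beyond the standard java.util.concurrent queues: a counter object whose state can only be reached by putting messages in its mailbox, with the queue itself providing all the synchronization. The "protocol" of string messages is invented for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// The object as a "little computer": its state is private, and the only
// way in or out is a message through its mailbox or its reply queue.
class CounterObject implements Runnable {
    private long count = 0;   // internal state, never shared
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<String>();
    private final BlockingQueue<Long> replies = new LinkedBlockingQueue<Long>();

    public void send(String message) throws InterruptedException {
        mailbox.put(message);         // the send could cross threads, processes, or machines
    }

    public long receiveReply() throws InterruptedException {
        return replies.take();
    }

    public void run() {
        try {
            while (true) {
                String msg = mailbox.take();   // wait for the next message
                if (msg.equals("increment")) {
                    count++;
                } else if (msg.equals("read")) {
                    replies.put(count);        // answer with a message, not shared memory
                } else if (msg.equals("stop")) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

public class MessageDemo {
    public static void main(String[] args) throws InterruptedException {
        CounterObject counter = new CounterObject();
        new Thread(counter).start();   // the object runs "on its own computer"
        counter.send("increment");
        counter.send("increment");
        counter.send("read");
        System.out.println("count = " + counter.receiveReply());
        counter.send("stop");
    }
}
```

Nothing in the sending code cares which processor, or which machine, the counter is actually running on; that's the property that makes the model attractive for multiple cores.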

In fact, there are several object-oriented environments that allow a message to be passed to a local object on the same computer as the sender, or to an object on another computer, depending on where the destination object is running. It's even possible for an object to move between computers and have its messages follow it correctly, without any change in the program that the sender or receiver is running. Smalltalk, for instance, does not allow one object to access another's data; only message passing is allowed between objects, so implementing such a system is relatively easy. Java does allow an object of a given class to access the data in an object of that same class, so it's not as easy. The difference between the two languages is that in Smalltalk any object of any class can be moved around, whereas in Java an interface has to be designed for the destination class, and the sending object has to use a reference of that interface type for the destination object; an arbitrary class cannot be used. And, of course, non-object-oriented languages need not apply.
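To make the Java pattern concrete (the Account names below are invented for illustration, though java.rmi.Remote interfaces work on the same principle): the sender is written against an interface, so the object behind the reference can be a local implementation or a proxy that forwards each message to another machine.

```java
// The sender only ever sees this interface.
interface Account {
    void deposit(long amount);
    long balance();
}

// A local implementation: method calls are ordinary message sends.
class LocalAccount implements Account {
    private long balance = 0;
    public void deposit(long amount) { balance += amount; }
    public long balance() { return balance; }
}

// A stand-in for a remote proxy: it would forward each message over the
// wire (elided here) instead of touching any local state.
class RemoteAccountProxy implements Account {
    public void deposit(long amount) { /* marshal and send to the remote object */ }
    public long balance() { /* send the request, wait for the reply */ return 0; }
}

class Teller {
    // Written once against the interface; whether the account is local
    // or remote never appears in this code.
    static void payday(Account account) {
        account.deposit(1000);
        System.out.println("balance = " + account.balance());
    }

    public static void main(String[] args) {
        payday(new LocalAccount());
        payday(new RemoteAccountProxy());
    }
}
```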

Unfortunately this style of program design is not common even among object programmers. Most OO languages don't have any way to prevent an object of a given class from accessing the data of another object of that class, and many programmers have fallen into the habit of using this feature on the assumption that direct data access is faster than messaging in the local case. In fact, most programmers never think about the remote case unless their program requirements include it. So most OO programs have inter-object data accesses, and often global data (such as class variables), that would have to be converted into message sends or into replicated and synchronized data.
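A small, invented Java example of the habit I mean: the access to another Point's private fields and the update of the shared class variable are both perfectly legal, and neither one goes through anything that looks like a message.

```java
class Point {
    private double x, y;
    private static int instanceCount = 0;   // class variable: global, shared state

    Point(double x, double y) {
        this.x = x;
        this.y = y;
        instanceCount++;                     // direct write to shared data
    }

    double distanceTo(Point other) {
        // Direct access to another object's private data -- legal because
        // 'other' is the same class, and invisible to any messaging layer.
        double dx = x - other.x;
        double dy = y - other.y;
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        Point a = new Point(0, 0);
        Point b = new Point(3, 4);
        System.out.println(a.distanceTo(b) + " (" + instanceCount + " points)");
    }
}
```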

With some languages this may not be a great problem: the semantics of the language allow a smart compiler or runtime support system to recognize the data accesses and convert them into message sends. Global data accesses can be converted into message sends to class objects, which allows objects to communicate even if there is no shared memory area for global data. Smalltalk would require only a minor modification to the compiler to handle class variable accesses. Java would require a new compiler to detect and modify data accesses. Both languages could benefit from compiler and runtime support to detect object dependencies and to optimize the placement of objects in memory spaces and the grouping of objects into threads or processes. Python might require some constraints on use, since it has some non-object functionality in it (the module namespace might cause some trouble, for instance), but a suitably jiggered compiler might be able to catch the problems, and perhaps even fix them. CLOS (the Common Lisp Object System) would be a good language to try; I suspect that some features of the base Lisp would have to be constrained (macros, for instance).
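Here's a hand-written sketch of what such a compiler rewrite of the earlier Point example might produce: every cross-object data access becomes a message send (an accessor call), and the class variable becomes state held by a class-side object reached only by messages. The generated names (getX, PointClass, and so on) are just illustrative.

```java
// Holds what used to be the static class variable; it is now reached
// only by sending it messages, so it needs no shared memory.
class PointClass {
    private int instanceCount = 0;
    synchronized void noteNewInstance() { instanceCount++; }
    synchronized int instanceCount() { return instanceCount; }
}

class Point2 {
    static final PointClass CLASS_OBJECT = new PointClass();
    private double x, y;

    Point2(double x, double y) {
        this.x = x;
        this.y = y;
        CLASS_OBJECT.noteNewInstance();    // was: instanceCount++
    }

    double getX() { return x; }            // generated accessors
    double getY() { return y; }

    double distanceTo(Point2 other) {
        double dx = x - other.getX();      // was: other.x
        double dy = y - other.getY();      // was: other.y
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        Point2 a = new Point2(0, 0);
        Point2 b = new Point2(3, 4);
        System.out.println(a.distanceTo(b) + " ("
                + CLASS_OBJECT.instanceCount() + " points)");
    }
}
```

Once every interaction is a message send, the runtime is free to put the two Point2 objects, and the class object, on whatever cores (or machines) it likes.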

The more impure object languages like C++ are problematic: the more ways there are to code without, or in spite of, objects, the harder it will be to find all the dependencies that would break correct parallelism, and the more the culture of the language's users is likely to cause them to reject the idea. That's fine with me, because I'm not a fan of those languages (because of having been a C++ programmer for years, and a member of the C++ standards committee). All the arguments I've heard for low-level languages have left me cold: I don't believe that the language I use has to look like the inside of a crippled von Neumann machine, because I don't see any advantage to that. Making it difficult to do things in a natural way because that's "good programming discipline" is just silly1. And the efficiency and performance arguments are based on not understanding the alternatives; a good Lisp compiler can give a C program a run for its money any day.

I'm curious to see if there are software developers, programming language and compiler experts, or parallel system gurus out there who can tell me why this scheme won't work, or why it's obviously less practical than any other scheme for making use of multiple cores for existing software applications, not system software.

1. To me, programming languages are user interfaces to computer capabilities. Making a bad interface deliberately is downright perverse in my view, no matter what the justification. Instead of forcing the programmer to bend to some model of the computer, why not spend the time and energy finding ways of optimizing programs written in user-friendly languages?

