I have an exceptionally long history with software portability which
began in the early 1980s. My first C programming was on the legendary
VAX-11/780 running Berkeley UNIX, and I quickly learned the dangers of
that platform. The 32-bit VAX architecture was so forgiving of error
that moving to other platforms quickly revealed the bad assumptions
made while programming on the VAX. These lessons have
stuck with me for years.
Even in 1983, I routinely moved my C code back and forth between the
32-bit VAX and my 8-bit CP/M Z80 system which ran the also legendary BDS
C Compiler by Leor Zolman (I still have the v1.5 user manual on my shelf,
the first piece of software I ever purchased). This was the start of a
career in software portability, and it even included teaching a week-long
class at AT&T Bell Laboratories in Holmdel NJ entitled "Portability,
Efficiency and Maintainability".
Below are two specific projects I've worked on that had "portability"
higher on their to-do list than most, followed by some more detailed
thoughts on portability in general. This is by no means an exhaustive
list of software that runs on multiple platforms, but it gives a
flavor of the work involved.
Gosling/UniPress Emacs
James Gosling, now best known for his involvement in the creation of
Java, was well known in the early 1980s as the author of Gosling Emacs.
This very powerful editor predated GNU Emacs, and at the time was marketed
commercially by UniPress Software.
We had a tape of Gosling Emacs at school, and I undertook to learn it
inside and out.
In addition to using lint extensively on the code to clean up bugs
and nonportable constructs, I set out to port Emacs to the Onyx C8002, which
was based on the Zilog Z8000 chip. This was a quasi-16-bit architecture
which had a much smaller address space than the VAX provided. It
took a great deal of conditional compilation to eliminate the fluff and
non-critical components: this Onyx system ran UNIX 7th Edition and had
only the ed line editor - no vi.
I ultimately was successful in making this port, and the resulting
code was dubbed minimacs and was used quite a bit on campus. It also
somehow came to the attention of the folks at UniPress Software, and
they licensed my changes from the university: the Math Department where
I was based received royalty checks from UniPress for several years due
to my work.
UniPress also hired me to visit their New Jersey headquarters and help
in a porting effort: they had many UNIX machines around that needed
Emacs builds, and I got familiar with many of them. Though much of the
code was quite portable, areas such as signal handling and terminal
I/O were troublesome and required changes to the code and the master
build environment.
I don't believe UniPress still distributes Gosling Emacs - GNU seems
to have taken over - but this was my first large-scale introduction
to porting.
VSI-FAX
I was the principal author of the first commercial release of the VSI-FAX
UNIX facsimile system, and in that capacity was responsible for porting
our software to the more than 30 supported UNIX platforms. As with Emacs,
we found that some parts of the code ported very nicely, but others
(signals, terminal I/O, shared memory) did not.
I created the very extensive build environment and front-ends to the
C compiler and linker that crafted the compilation environment
(#include paths, macro definitions, library locations) without burdening
the individual subdirectory makefiles with huge command lines. While
working in one of these subdirectories, it was a simple matter to type
"make" and it would use the correct compiler, compiler flags,
and other associated tools. This approach proved to be extraordinarily
successful over the years: the build system is largely as I created it
more than 15 years ago.
Then in 1997 I led the effort to port our software to Windows
NT. Originally another developer attempted this by using a commercial
UNIX-emulation layer (NuTCracker), but we found it wanting. Though much
of the code had long had its UNIX-dependent parts isolated, we found
that some of the key parts simply did not lend themselves to abstraction:
whole new versions were created to take proper advantage of the functions
offered by the Win32 architecture.
Win32 provided a radically different approach to interprocess
communication, plus the entire NT "service" model and installers had to
be accommodated. But through the fortuitous use of the MKS Toolkit - with
its stable of traditional UNIX tools ported to NT - we were able to use
the same build environment as the UNIX product line. The early NT
developers were all UNIX folks, and we all preferred the Korn Shell and
vi to the Microsoft IDE tools anyway.
What is "Portability"?
The notion of "portability" is widely used, and it's often attached
to software inappropriately. Software is only "portable" if it can
actually be moved to a different platform, and these are some of
the issues that are considered when a port is imminent.
- CPU Architecture Portability
- Differences in the underlying CPU architecture expose themselves in many
ways to the C programmer, and they can erupt in surprising and disturbing
ways. We can touch on a few of them here:
- Wordsize
-
Modern machines are typically 16, 32 or 64 bits for
their "natural" words, and the scalar variables short,
int and long have varying sizes. Portable software
does not typically rely on an integer being larger than 16
bits: a long must be used if 32 bits are required. On
a 32-bit processor they're the same, but on a 16-bit Windows
system the difference is noted properly by the compiler.
-
Using correct word sizes leads to efficient data storage and processor
use, with minimal space wasted on data types that are "obviously" too
large for their intended purpose.
- Word Order
-
Integral quantities can be stored in "big-endian" or "little-endian"
order, which describes whether the high-order or the low-order bits
of the word are stored in low memory. This is not normally visible
to the developer, but for network data communications or moving data
between systems of different architectures, this distinction cannot
be ignored. Examples: the DEC/Compaq Alpha and Intel processors are
little-endian, and the SPARC, Motorola 68000, and AT&T 3B2 processors
are all big-endian. Traditionally, TCP/IP uses big-endian word order
as data travels over the network. Software that does not take into
account word order will find very rude surprises when being ported
to a platform with different endian-ness.
- Word Alignment
-
Traditionally, scalar data items are aligned at memory addresses
that are multiples of their own size: 16-bit values are found
at even addresses, and 32-bit values are stored at addresses divisible
by four. But not all processors enforce this, to the chagrin of
nonportable software.
-
Most modern processors will fault when accessing values at misaligned
addresses, but architectures like the VAX permitted it (though with
a modest performance penalty). Software that packed data very tightly
into memory buffers without alignment would work correctly on the VAX
but would fail badly on machines with stricter alignment restrictions.
The "fix" often required copying data from their unaligned forms into
temporary variables for manipulation, then copying them back. This
made for very ugly code. Proper attention to
word alignment (typically by using C structures instead of maintaining
"raw" buffers) would have alleviated this whole mess.
- Compiler Portability
-
Not all C compilers accept precisely the same language, though in practice
this has been getting easier over time. In the early 1980s, many who ported
software avoided C language constructs such as bitfields and structure copy
because they were notorious for being implemented badly. Dealing with sign
extensions from char and short to int were also dependent on
the compiler vendor in ways that were not always easy to check.
-
The introduction of ANSI C in the late 1980s raised an even
stickier problem: should one use the outstanding features of ANSI C
(function prototypes, the const and volatile type qualifiers,
the <stdarg.h> facilities, etc.) or not? At the time, portable
software simply could not rely on ANSI C being widely available - GNU C
was not yet mature - and it was often an agonizing tradeoff just how
much of ANSI C could be used.
-
Ultimately, most software porters created portability macros that allowed
the use of many of these features in code that could be straight K&R
or ANSI. Found in header files would be:
-
#ifdef __STDC__
# define PROTO(args) args
#else
# define PROTO(args) (/*nothing*/)
# define const /*nothing*/
# define volatile /*nothing*/
#endif
...
extern char *strcpy PROTO((char *dst, const char *src));
extern int printf PROTO((const char *format, ...));
- This compiled in function prototypes when they were available,
but omitted them otherwise. Sadly, this only worked for the function
declarations: the function definitions didn't lend themselves
to convenient macro support such as this.
- But no amount of clever macros could completely hide the differences:
some facilities simply had to be conditionally compiled depending on
the ANSI- or non-ANSI-ness of the compiler. Notable was support for
variadic functions:
-
#ifdef __STDC__
# include <stdarg.h>
#else
# include <varargs.h>
#endif
#ifdef __STDC__
void die(const char *format, ...)
#else
void die(va_alist) va_dcl
#endif
{
va_list args;
#ifndef __STDC__
char *format;
#endif
#ifdef __STDC__
va_start(args, format);
#else
va_start(args);
format = va_arg(args, char *);
#endif
vfprintf(stderr, format, args);
va_end(args);
exit(1);
}
-
This kind of mechanism - though not terribly attractive - permits use
of the best features of both worlds.
- Build Environment Portability
- The
"build environment" is the set of all software and tools required to go
from "source code" to "delivered product". This includes the compiler,
of course, and even in this area there can be wide diversity. The GNU
compilers have a largely standard set of command-line parameters across
platforms, but vendor-provided compilers have a very wide array of
features that can and should be enabled for product builds. Even minor
features such as "enable maximum compiler warnings" must be determined for
each platform so that the feature may always be invoked properly.
-
In addition to the compiler, there is the linker, the set of
third-party libraries required to build the software, the make
tool, and the overall configuration scripts that set up the developer's
environment. Software that uses TCP/IP sockets often requires more than
one library, and the particular libraries vary widely.
-
Even the ability to use perl and shell scripts as "helpers" is a consideration:
fantastic tools can be built with perl, but a requirement that a platform even
have a recent perl interpreter might be an onerous one.
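A minimal sketch of what such a shared build environment can look like (the file and variable names are hypothetical, not those of the actual VSI-FAX system): every subdirectory makefile includes a single platform-selected rules file, so a bare "make" picks up the correct compiler, flags, and libraries.

```make
# rules.mk -- included by every subdirectory makefile (hypothetical names).
# PLATFORM is set once by the top-level configuration script, so a bare
# "make" in any subdirectory picks up the right compiler and flags.

ifeq ($(PLATFORM),solaris)
CC      = cc
CFLAGS  = -v -I$(TOP)/include     # -v: maximum compiler warnings here
NETLIBS = -lsocket -lnsl          # sockets require extra libraries here
else
CC      = gcc
CFLAGS  = -Wall -I$(TOP)/include
NETLIBS =                         # sockets live in libc
endif
```

Centralizing these choices in one place is what keeps the per-directory makefiles down to a list of sources and targets.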
- Operating System Portability
-
Here we find the most troublesome issues, because operating systems vary
much more widely than compilers, CPU architectures and build environments. In
addition, it's often simply not possible to "work around" an issue with an
operating system in the same way that conditional compilation can resolve one
with the compiler.
-
Some examples limited to just the UNIX platforms:
-
- Length of filenames: 14 characters, or 255?
- Are symbolic links supported?
- Is the select() system call present?
- Will select() work on a serial device?
- Is serial I/O control done via System V or Berkeley ioctl() semantics?
- Are signals managed with Berkeley, System V or POSIX semantics?
- Are UNIX domain sockets available?
- Is job control available?
- Can the operating system map a file into memory?
- Can shell scripts be run setuid?
- Can a user "give away" a file with chown()?
- Are threads supported? Which semantics?
-
These are just the start of questions that arise when porting
software. Some of the questions are of relatively minor impact, such as
knowing how signals are handled. With a certain amount of conditional
compilation, it's possible to support all the varied flavors of signal
semantics and be relatively sure that they will work correctly, but if
(for instance) the select() system call is not available on all
platforms that are to be supported, it puts a very serious damper on
how a program does I/O.
-
Virtually all modern systems support select(), but this wasn't the
case in the early nineties, and even those that did would not include
serial devices in the class of supported file descriptors. "Working
around" the lack of select() meant a complete re-architecting of
the software that could have benefitted from it.
-
When considering a port to entirely different operating systems, such as
Windows NT, these issues become much larger. Though the ANSI C library
is highly portable across all supported platforms, most "real" software
uses operating system functions that are far beyond the standard C
library. UNIX and Win32 have fundamentally different approaches to things
like process management and interprocess communication, and these often
cannot be abstracted away. Win32's WaitForMultipleObjects() API call
and I/O completion ports are often too useful to forgo simply in
the name of portability, so separate modules for UNIX and Win32 evolve.
Ultimately, the final test of software portability is in the porting itself:
by moving the source to the new system and trying to build it, a whole
throng of issues suggested above will veritably come out of the woodwork
to frustrate the efforts. Only after solving these problems many times does
one start to take pre-emptive action to engineer portability in from the very
start.