using nwcc (in a gcc world)
Whether you are a developer or a user of open-source software (or
both), using a compiler other than gcc poses a variety of simple but
diverse problems. I recommend that you read this document to get an
overview of some of them before you begin using nwcc.
Before I get to the problems of using nwcc in place of gcc, I'll
first describe some ways in which nwcc can actually benefit from gcc
and associated tools in a symbiotic way.
This text is therefore divided into two parts:
- Part 1: The GNU Toolchain
- Part 2: C Standards and Compatibility
If you have a working nwcc setup, and are only interested in using it to
compile (and develop) arbitrary open-source applications, you can skip
the first part.
If you have a broken nwcc setup, or are interested in combining gcc
code with nwcc code, the first part is for you.
Part 1: The GNU Toolchain
gcc is typically used in conjunction with GNU binutils to build a
complete stack of applications - or "toolchain" - to turn C code into
executable machine code:
- cpp
The GNU preprocessor; This used to do preprocessing for gcc,
but since the introduction of libcpp, preprocessing is done inside
of gcc
- gcc
The C compiler driver; Translates C code into textual
assembly language output, and drives assembling and linking
- gas
The GNU assembler; Translates textual assembly language
code into binary machine code object files
- ld
The GNU linker; Combines multiple object files into shared
libraries, or multiple object files and libraries into executable
programs
(There are also libraries and loaders, but we shall ignore those
for now.)
Let's begin with the lower, less interesting (for nwcc) parts of
the software stack before we get to gcc itself.
1.1 Linking
nwcc always uses the most "native" linker of a given system. On
Linux and BSD systems this means GNU ld, and on proprietary vendor
systems, it means the corresponding vendor linker. Thus there is no
question about which linker to use, or how.
Note, however, that some linking problems can be worked around by
using gcc to give a helping hand; This is explained in the gcc section
later.
1.2 Assembling
On x86 and AMD64 systems, nwcc supports the GNU assembler as well
as nasm, yasm and nwasm. The NWCC_ASM environment variable, the -asm
command line option, and the "asm" configuration file entry can be
used to select the desired assembler to use; See the USAGE file for
details.
On other UNIX systems, the vendor assembler is preferred. As with
the linker, there isn't much more to be done about this.
1.3 Preprocessing
nwcc prefers to use "gcc -E" for preprocessing, if possible.
External GNU cpp programs can also be used, but those were found to be
buggy on OpenBSD, and may possibly use an unsupported output format on
various proprietary UNIX versions. Although using gcc might seem to be
an odd dependency, nwcc can also work without it, and will fall back to
using cpp if gcc is not available.
In addition, there is the nwcpp preprocessor, which is part of the
nwcc distribution, but which is neither mature nor well-tested (it is
slower than GNU's preprocessor too).
Also, it has not been properly ported to various supported systems
yet. You can use nwcpp by setting the NWCC_CPP environment variable,
but this is not generally desirable.
Thus it is recommended that gcc be available to assist nwcc
with preprocessing.
1.4 gcc
On to the most interesting component of the toolchain: The gcc
compiler itself!
A more or less up-to-date and correctly installed gcc version on
a given system knows two things which nwcc doesn't necessarily know
that well:
- How to link an application
- How to generate very correct code
The first point matters because of unanticipated variations in library
components, such as the bitness of a given library directory (32bit
vs 64bit), as well as the locations of the dynamic linker, the
standard C libraries, and C runtime support modules.
Thus we find the second symbiotic relationship:
If nwcc cannot link applications correctly, use gcc instead.
(And mail me a bug report!)
That is to say, if the final linking step fails, but nwcc can
otherwise compile your code, you can use the -c option to have nwcc
output object files, and then use gcc to link them into a final
application:
nwcc -c foo.c # compile and assemble foo.c
gcc -o foo foo.o # link foo.o to create program foo
But wait, there's more to it.
In some circumstances on some platforms, gcc and nwcc use software
library routines to emulate what might in other circumstances or on
other platforms be done in hardware.
Consider the simple example of the at-least-64-bit type "long long",
which is available in C99 and GNU C. This type supports all of the
arithmetic operations, including multiplication and division, but
64bit multiplication and division generally cannot be performed
directly in hardware by 32bit processors such as x86.
Thus the solution is to implement a non-trivial software algorithm
to multiply or divide two numbers which are too large to be handled
by the processor. These software implementations are typically too
large to generate inline in the resulting assembly code, and they
would also be inconvenient to write that way, so compiler developers
put them into a library of support routines.
As a result, the snippet
long long foo = somevalue;
long long bar = othervalue;
foo /= bar;
... may invoke a library routine behind your back:
push 64
push 0
push ecx
push eax
call __nwcc_lldiv
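To give an idea of what such a support routine has to do, here is a
minimal sketch of unsigned 64bit division by shift-and-subtract. It is
an illustration only: the name sketch_udiv64 is made up, and the actual
interface and algorithm of __nwcc_lldiv are not described in this
document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a compiler support routine; NOT the real
 * __nwcc_lldiv. Divides n by d bit by bit (restoring division) and
 * optionally stores the remainder through rem.
 */
uint64_t sketch_udiv64(uint64_t n, uint64_t d, uint64_t *rem)
{
	uint64_t q = 0;   /* quotient, built from the top bit down */
	uint64_t r = 0;   /* running remainder, always smaller than d */
	int i;

	assert(d != 0);   /* division by zero is left to the caller */

	for (i = 63; i >= 0; --i) {
		/* bring down the next bit of the dividend */
		r = (r << 1) | ((n >> i) & 1);
		if (r >= d) {
			r -= d;
			q |= (uint64_t)1 << i;
		}
	}
	if (rem != NULL)
		*rem = r;
	return q;
}
```

The loop needs only shifts, comparisons and subtractions on 64bit
values, each of which a 32bit processor can do with a short sequence of
instructions on register pairs. The loop itself is what is too bulky to
emit inline at every division.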
What you need to know when linking nwcc-generated code with gcc is
that nwcc on some systems references libnwcc.o (which is just an
object file rather than a shared library). So to link an nwcc
application with gcc, the command line sometimes has to look like
this:
gcc foo.o /usr/local/nwcc/lib/libnwcc.o
(Or wherever you installed nwcc.)
What you need to know when linking gcc-generated code with nwcc is
that it usually needs libgcc:
nwcc foo.o -lgcc_s
or
nwcc foo.o `gcc -print-libgcc-file-name`
In either case, each compiler drags in its own library automatically,
so when mixing code from both compilers you only have to explicitly
tell nwcc about libgcc, and gcc only about libnwcc.
Another way to use gcc in conjunction with nwcc is to compile some
code with gcc and some with nwcc, and to link it all together. This
may be needed because:
- nwcc generates bad code for one or more files
- gcc generates bad code for one or more files (unlikely)
- You need optimization for one or more files
- You need strong GNU C support for one or more files
Also, some libraries may have been compiled with gcc and are about
to be used with nwcc. In any case, when linking gcc code with nwcc
code, you have to watch out for ABI incompatibilities. In particular:
- Structs and unions passed and returned by value (on non-x86)
- Bitfields
If either side uses these features in any interfaces, the code will
probably break horribly, and there is nothing you can do about it,
short of compiling all with gcc or with nwcc, or changing the
interfaces.
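As an example of what changing an interface can look like, the sketch
below passes a struct through pointers instead of by value, and uses
explicit flag bits in a plain unsigned int instead of bitfields. All
names are hypothetical, and the sketch assumes that both compilers
agree on pointers, plain integers and the layout of simple structs on
your platform, which you should verify.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_point {
	long x;
	long y;
};

/*
 * Risky across compilers (struct passed and returned by value):
 *     struct sketch_point sketch_add(struct sketch_point a,
 *                                    struct sketch_point b);
 * Variant below: arguments and result all go through pointers.
 */
void sketch_point_add(const struct sketch_point *a,
                      const struct sketch_point *b,
                      struct sketch_point *out)
{
	assert(a != NULL && b != NULL && out != NULL);
	out->x = a->x + b->x;
	out->y = a->y + b->y;
}

/* Explicit flag bits instead of "unsigned dirty:1; unsigned locked:1;" */
#define SKETCH_FLAG_DIRTY  0x1u
#define SKETCH_FLAG_LOCKED 0x2u

unsigned sketch_set_flag(unsigned flags, unsigned flag)
{
	return flags | flag;
}

int sketch_has_flag(unsigned flags, unsigned flag)
{
	return (flags & flag) != 0;
}
```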
Part 2: C Standards and Compatibility
The C language was initially defined by Kernighan and Ritchie's first
edition of the book "The C Programming Language". This was generally
viewed as a "normative" reference to the language by developers and
users of C implementations (C originated on Unix, but quickly spread
over to lots of other systems as well.)
This language became known as "K&R C", and is generally irrelevant
today.
Some of the finer aspects of the language were not described clearly
and unambiguously by K&R C, which led to diverging semantics between the
various C implementations. Also, many implementations added their own
extensions, which were useful and sensible, but made code nonportable to
other implementations that didn't support these features. In order to
remove ambiguities, and to standardize popular extensions, C was first
standardized by ANSI.
This language became known as ANSI C89, the C "as we know it today".
ISO later developed a new standard which standardizes another
set of popular extensions (such as inline functions and the long long
type), and introduces a set of language inventions of its own (such as
type-generic math functions and restricted pointers.)
This language became known as ISO C99, and is still not widely
implemented and used, though some of the most useful features were adopted
by many C89 implementations.
The gcc authors have also invented and added many of their own extensions
to C. Some of these features overlap with C99 (C99 adopted GNU features),
while others are for syntactical convenience, and still others are
strongly targeted at implementors of C system software, such as libraries
and operating systems.
For example, there are features to help write type-safe and multi-
evaluation-safe macros, to control linking, and to embed inline assembly
into C code. There are also lots of builtin library functions which can
be used in freestanding environments.
This language became known as "GNU C", and is the dominant language
in today's open-source landscape.
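To make the macro case concrete: a "maximum" macro written in standard
C evaluates one of its arguments twice, whereas GNU C's statement
expressions and __typeof__ allow a version that evaluates each argument
exactly once. The sketch below assumes a compiler that accepts these
two extensions (gcc does); the SKETCH_ and sketch_ names are made up
for this example.

```c
/* Standard C: the larger argument is evaluated a second time. */
#define SKETCH_NAIVE_MAX(a, b) ((a) > (b) ? (a) : (b))

/*
 * GNU C only: a statement expression with __typeof__ temporaries.
 * Each argument is evaluated once and keeps its own type.
 */
#define SKETCH_SAFE_MAX(a, b) (__extension__ ({ \
	__typeof__(a) sketch_a_ = (a);          \
	__typeof__(b) sketch_b_ = (b);          \
	sketch_a_ > sketch_b_ ? sketch_a_ : sketch_b_; }))

/* Returns how often the argument "++i" was evaluated by the macro. */
int sketch_naive_evaluations(void)
{
	int i = 0;

	(void)SKETCH_NAIVE_MAX(++i, 0);
	return i;
}

int sketch_safe_evaluations(void)
{
	int i = 0;

	(void)SKETCH_SAFE_MAX(++i, 0);
	return i;
}
```

With the naive macro the side effect happens twice, with the GNU C
version only once. Code that relies on the second form will not build
with a compiler that lacks either extension.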
2.1 The Key Players
A given C source, as seen by the C compiler, is usually composed of two
separate and independent entities:
- The system library headers
- The actual source file
Where the source file drags in the system headers by using preprocessor
#include directives.
It is important to distinguish between these two because they often
have different ideals of portability.
The system headers are usually tailored to a specific operating system
and will never be used anywhere else (this is not true of libraries such
as glibc, which run on multiple systems.) Thus it may make sense for them
to assume a very specific environment, even down to a particular C
compiler which will always be used with them.
2.2 GNU C vs Standard C
GNU C has a lot of sensible and desirable features; however, it also has a
lot of junk. GNU C is huge!
nwcc (as well as other compilers such as tinycc) supports a subset of
sensible GNU C features, but other features which are deemed "not sensible"
or "very specialized", or even just "very difficult to implement", are
missing.
If nwcc supported all of gcc's features, in exactly the same way as gcc,
then there would be no problem whatsoever, and nwcc could always be used
in place of gcc. But it does not.
The problem is GNU C features - used by applications and library headers -
which are not supported by nwcc.
nwcc can be put into "ANSI" mode; In this mode, it accepts GNU extensions
(with a warning), but does not claim to be gcc. In GNU mode it additionally
claims to be gcc. Read on for some advantages and disadvantages of both
ways.
2.3 Applications
Applications usually come in three flavors:
- They are unconditionally standard C (89)
This means the application always works with nwcc (except bugs and some corner
cases)
- They are unconditionally GNU C/C99
This means the application always uses one or more GNU C features. If nwcc
supports those features, the program will work. If it doesn't, it won't
- They are conditionally GNU C/C99
This means the application only uses GNU C features if it is compiled with
gcc (or with a compiler that pretends to be gcc, such as nwcc!)
For nwcc, the first two cases are the easiest to deal with because no standards
choice is required - the program either works or it doesn't.
The last case is the most dangerous one, because it means we have to choose
between ANSI and GNU C mode. In particular, many applications which test for
gcc by looking at the __GNUC__ macro are quite aware of GNU C. In fact, they
are so aware that they use every conceivable GNU C feature! Many GNU apps
are written in this way, such as gcc.
In general, for applications, we don't want to pretend to be gcc because if
the application checks for __GNUC__, it may use unsupported features, and if
it unconditionally uses GNU C features, there is still a chance that those
are implemented by nwcc regardless.
Thus:
For applications, we don't want to define __GNUC__!
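A "conditionally GNU C" source file typically looks like the following
sketch. The function name is hypothetical. __builtin_popcount is a gcc
builtin; whether nwcc provides it is not something this document
states, which is exactly the risk: once __GNUC__ is defined, the first
branch is chosen whether or not the compiler really supports it.

```c
/* Counts the set bits in value. */
unsigned sketch_count_bits(unsigned value)
{
#ifdef __GNUC__
	/* GNU C path: only safe if the compiler truly has the builtin */
	return (unsigned)__builtin_popcount(value);
#else
	/* Standard C path: works with any conforming compiler */
	unsigned count = 0;

	while (value != 0) {
		value &= value - 1;   /* clear the lowest set bit */
		++count;
	}
	return count;
#endif
}
```

Compiled without __GNUC__, this file only needs standard C, so it
cannot fail because of a missing extension.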
2.4 Headers
System library headers are the most painful part. Some headers were only
written for gcc and gcc-compatible compilers. For example, the FreeBSD
headers "mostly" work with non-gcc compilers, but a few essential ones do
not.
In general, with a GNU-C-aware set of headers, the GNU C mode implements
some macros using inline assembly instead of standard C operations, decorates
some function and structure declarations with attributes for alignment and
things like that, and, most importantly, opens up a lot of GNU declarations.
If we look at standard C and beyond, we will find lots of incompatible
standards:
- C89
- C99
- Various versions of POSIX
- Various versions of UNIX
- The above with additional, system-specific extensions
These standards mostly differ in what functions and macros they provide, and
in some cases, they provide things with the same names but different
interfaces.
This stuff is mostly about namespaces. For example, the various POSIX and
UNIX standards declare the fileno() function in stdio.h. However, if you have
a standard C89 program which does this:
#include <stdio.h>
void fileno() { puts("hello"); }
int main()
{
fileno();
}
Then this program has to work because fileno() is in no way reserved in C89, and
you are (supposedly) perfectly free to use this name. But with a set of UNIX
and POSIX headers, this program will break because your fileno() declaration
clashes with the one in stdio.h.
The solution is to introduce "feature selection" macros. A typical system
header will recognize all of these industry standards, and use #if/#else chains
to keep incompatible features such as fileno() separated.
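From the application's side, feature selection looks roughly like the
sketch below. It assumes a POSIX system whose headers honor
_POSIX_C_SOURCE; the function name is made up, and the header excerpt
in the comment is simplified rather than copied from any real system.

```c
/*
 * Ask for the POSIX.1-2008 namespace. This has to be defined before
 * the first system header is included.
 */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>

/*
 * Inside <stdio.h>, the declaration is typically guarded along these
 * lines (simplified, details differ from system to system):
 *
 *     #if defined(_POSIX_C_SOURCE) || defined(_XOPEN_SOURCE)
 *     int fileno(FILE *);
 *     #endif
 */
int sketch_descriptor_of(FILE *stream)
{
	assert(stream != NULL);
	return fileno(stream);
}
```

A program that defines the macro gets fileno() declared and in return
gives up the right to use that name for its own purposes. A program
that requests plain C89 keeps the name but cannot call the function.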
Finally, most headers also have an "extended" mode, which is a (hopefully
sensible) choice of one of these standards, plus a lot of added stuff. For
example, there are lots of glibc-specific functions and macros which are not
available in any of the industry standards, and are thus put into the extended
default headers.
A given application may come to depend on macros, functions or other types
which are only declared in GNU mode, e.g.:
/home/nils/nwcc_ng [0]$ cat y.c
#include <sys/types.h>
int
main() {
int64_t i;
}
/home/nils/nwcc_ng [0]$ ./nwcc y.c
/home/nils/nwcc_ng [0]$ ./nwcc y.c -ansi
y.c.4: Error: Parse error at `i' (#2)
int64_t i;
^ here
/var/tmp/cpp239.cpp - 1 error(s), 0 warning(s)
No valid files to link.
/home/nils/nwcc_ng [1]$
(A different header could be used to work around this particular problem, but
that is beside the point.)
If we add this usually minor problem to the major problems of e.g. FreeBSD's
headers, we can conclude:
For some headers, we do want to define __GNUC__!
2.5 Conclusion
Sometimes an application will work with any standard C compiler, and sometimes
it will only work if the compiler is, or claims to be, gcc. Historically many
important system headers have been found to be useless in non-GNU mode because
they used huge amounts of obscure GNU C features, or exhibited bugs because
they were simply never tested in a non-GNU configuration.
Lately I've found that most applications work best with nwcc on most systems
if they are compiled in non-GNU mode. Therefore, this is the default setting
on all systems apart from OSX (many OSX versions up until at least a few
years ago seem to absolutely require GNU mode) and the BSDs (where the situation
is unclear). It has been the default on Linux since version 0.8.2.
It is always possible to override the default choice explicitly by using the
"-gnu" (enables GNU mode) and "-notgnu" (disables GNU mode) command line
options. These can also be put into one of the configuration files
(~/.nwcc/nwcc.conf or the system-wide file - see the USAGE file for details):
options = notgnu
Note that all GNU C features are still accepted, so defining __GNUC__ is primarily
needed to fool the system headers, or to compile apps which do things like
#if !__GNUC__
# error This program only works with gcc
#endif
There are some other minor points to note here:
- nwcc's ANSI mode isn't quite "conforming"; it only means "don't pretend to
be gcc", and that namespace rules are honored. Some mandatory warnings are not
emitted yet
- The GNU C version makes a difference in many cases. nwcc currently only
pretends to be gcc 3.0. You may wish to set the __GNUC__ and __GNUC_MINOR__ macros to
a different version yourself
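Headers and applications usually test the GNU C version with a
comparison like the one sketched below. The macro and function names
are invented for this example. The run-time function merely mirrors
the preprocessor test so that its effect can be tried out: a compiler
announcing itself as gcc 3.0 fails a check for 3.4, so code guarded
that way takes its fallback path.

```c
/* True if the compiler claims to be at least gcc major.minor. */
#if defined(__GNUC__) && defined(__GNUC_MINOR__)
#define SKETCH_GNUC_PREREQ(major, minor) \
	(__GNUC__ > (major) || \
	 (__GNUC__ == (major) && __GNUC_MINOR__ >= (minor)))
#else
#define SKETCH_GNUC_PREREQ(major, minor) 0
#endif

/* The same comparison as a function, with the claimed version passed in. */
int sketch_version_at_least(int have_major, int have_minor,
                            int want_major, int want_minor)
{
	return have_major > want_major ||
	       (have_major == want_major && have_minor >= want_minor);
}
```

For instance, sketch_version_at_least(3, 0, 3, 4) is false, while the
same call with a claimed version of 4.2 is true.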
There's more to say about this, but this should be enough for an introduction. You
may contact me if you have any further questions.