using nwcc (in a gcc world)

Regardless of whether you are a developer or a user of open-source software (or both), using compilers other than gcc poses a variety of simple but diverse problems. I recommend that you read this document to get an overview of some of them before you begin using nwcc.

Before I get to the problems of using nwcc in place of gcc, I'll first describe some ways in which nwcc can actually benefit from gcc and associated tools in a symbiotic way.
This text is therefore divided into two parts:

Part 1: The GNU Toolchain
Part 2: C Standards and Compatibility

If you have a working nwcc setup, and are only interested in using it to compile (and develop) arbitrary open-source applications, you can skip the first part.

If you have a broken nwcc setup, or are interested in combining gcc code with nwcc code, the first part is for you.

Part 1: The GNU Toolchain

gcc is typically used in conjunction with GNU binutils to build a complete stack of applications - or "toolchain" - to turn C code into executable machine code:

cpp
The GNU preprocessor; This used to do preprocessing for gcc, but since the introduction of libcpp, preprocessing is done inside of gcc

gcc
The C compiler driver; Translates C code into textual assembly language output, and drives assembling and linking

gas
The GNU assembler; Translates textual assembly language code into binary machine code object files

ld
The GNU linker; Combines multiple object files into shared libraries, or multiple object files and libraries into executable programs

(There are also libraries and loaders, but we shall ignore those for now.)
Let's begin with the lower, less interesting (for nwcc) parts of the software stack before we get to gcc itself.

1.1 Linking

nwcc always uses the most "native" linker of a given system. On Linux and BSD systems this means GNU ld, and on proprietary vendor systems, it means the corresponding vendor linker. Thus there is no question about which linker to use, or how.

Note, however, that some linking problems can be worked around by letting gcc lend a helping hand; this is explained in the gcc section later.

1.2 Assembling

On x86 and AMD64 systems, nwcc supports the GNU assembler as well as nasm, yasm and nwasm. The NWCC_ASM environment variable, the -asm command line option, and the "asm" configuration file entry can be used to select the assembler; see the USAGE file for details.

On other UNIX systems, the vendor assembler is preferred. As with the linker, there isn't much more to be done about this.

1.3 Preprocessing

nwcc prefers to use "gcc -E" for preprocessing, if possible. External GNU cpp programs can also be used, but those were found to be buggy on OpenBSD, and may use an unsupported output format on various proprietary UNIX versions. Although relying on gcc might seem to be an odd dependency, nwcc does not strictly require it: it will fall back to using cpp if gcc is not available.

In addition, there is the nwcpp preprocessor, which is part of the nwcc distribution, but which is neither mature nor well-tested (it is slower than GNU's preprocessor too). Also, it has not been properly ported to various supported systems yet. You can use nwcpp by setting the NWCC_CPP environment variable, but this is not generally desirable.

Thus it is recommended that gcc be available to assist nwcc with preprocessing.

1.4 gcc

On to the most interesting component of the toolchain: the gcc compiler itself!

A reasonably up-to-date and correctly installed gcc on a given system knows two things which nwcc doesn't necessarily know as well:

How to link an application
How to generate correct code reliably

The first point matters when library components vary in unanticipated ways, such as the bitness of a given library directory (32-bit vs 64-bit), or the locations of the dynamic linker, the standard C libraries, and the C runtime support modules.

Thus we find the second symbiotic relationship:

If nwcc cannot link applications correctly, use gcc instead.
(And mail me a bug report!)

That is to say, if the final linking step fails but nwcc can otherwise compile your code, you can use the -c option to have nwcc output object files and then use gcc to link them into the final application:

nwcc -c foo.c # compile and assemble foo.c
gcc -o foo foo.o # link foo.o to create program foo

But wait, there's more to it.

In some circumstances on some platforms, gcc and nwcc use software library routines to emulate what might in other circumstances or on other platforms be done in hardware.

Consider the simple example of the "long long" type, which is at least 64 bits wide and is available in C99 and GNU C. This type supports all of the arithmetic operations, including multiplication and division, but 32-bit processors such as x86 generally cannot perform these operations on 64-bit operands with a single instruction.

Thus the solution is to implement a non-trivial software algorithm to multiply or divide two numbers which are too large to be handled by the processor. These software implementations are typically too large to generate inline in the resulting assembly code, and they would also be inconvenient to write that way, so compiler developers put them into a library of support routines.

As a result, the snippet

long long foo = somevalue;
long long bar = othervalue;

foo /= bar;

... may invoke a library routine behind your back:

push 64
push 0
push ecx
push eax
call __nwcc_lldiv

What you need to know when linking nwcc-generated code with gcc is that on some systems nwcc references libnwcc.o (which is just an object file rather than a shared library). So to link an nwcc application with gcc, the command line sometimes has to look like this:

gcc foo.o /usr/local/nwcc/lib/libnwcc.o

(Or wherever you installed nwcc.)

What you need to know when linking gcc-generated code with nwcc is that it usually needs libgcc:

nwcc foo.o -lgcc_s
or
nwcc foo.o `gcc -print-libgcc-file-name`

In either case, each compiler drags in its own library automatically, so when mixing code from both compilers you only have to explicitly tell nwcc about libgcc, and gcc only about libnwcc.

Another way to use gcc in conjunction with nwcc is to compile some files with gcc and others with nwcc, and then link them all together. This may be needed because:

nwcc generates bad code for one or more files
gcc generates bad code for one or more files (unlikely)
You need optimization for one or more files
You need strong GNU C support for one or more files

Also, some libraries may have been compiled with gcc and are about to be used with nwcc. In any case, when linking gcc code with nwcc code, you have to watch out for ABI incompatibilities. In particular:

Structs and unions passed and returned by value (on non-x86)
Bitfields

If either side uses these features in any interfaces, the code will probably break horribly, and there is nothing you can do about it, short of compiling everything with the same compiler (gcc or nwcc) or changing the interfaces.

Part 2: C Standards and Compatibility

The C language was initially defined by Kernighan and Ritchie's first edition of the book "The C Programming Language". This was generally viewed as a "normative" reference to the language by developers and users of C implementations (C originated on Unix, but quickly spread to lots of other systems as well.)

This language became known as "K&R C", and is generally irrelevant today.

Some of the finer aspects of the language were not described clearly and unambiguously by K&R C, which led to diverging semantics between the various C implementations. Also, many implementations added their own extensions, which were useful and sensible, but made code nonportable to other implementations that didn't support these features. In order to remove ambiguities, and to standardize popular extensions, C was first standardized by ANSI.

This language became known as ANSI C89, the C "as we know it today".

ISO later developed a new standard which standardized another set of popular extensions (such as inline functions and the long long type), and introduced a set of language inventions of its own (such as type-generic math functions and restrict-qualified pointers.)

This language became known as ISO C99. It is still not widely implemented and used, though some of its most useful features were adopted by many C89 implementations.

The gcc authors have also invented and added many of their own extensions to C. Some of these features overlap with C99 (C99 adopted GNU features), while others are for syntactical convenience, and still others are strongly targeted at implementors of C system software, such as libraries and operating systems.

For example, there are features to help write type-safe macros that evaluate each argument only once, to control linking, and to embed inline assembly into C code. There are also lots of builtin library functions which can be used in freestanding environments.

This language became known as "GNU C", and is the dominant language in today's open-source landscape.

2.1 The Key Players

A given C source, as seen by the C compiler, is usually composed of two separate and independent entities:

The system library headers
The actual source file

The source file drags in the system headers by using preprocessor #include directives.

It is important to distinguish between these two because they often have different ideals of portability.

The system headers are usually tailored to a specific operating system and will never be used anywhere else (this is not true of libraries such as glibc, which run on multiple systems.) Thus it may make sense for them to assume a very specific environment, even down to a particular C compiler which will always be used with them.

2.2 GNU C vs Standard C

GNU C has a lot of sensible and desirable features, however it also has a lot of junk. GNU C is huge!

nwcc (as well as other compilers such as tinycc) supports a subset of sensible GNU C features, but other features which are deemed "not sensible" or "very specialized", or even just "very difficult to implement", are missing.

If nwcc supported all of gcc's features, in exactly the same way as gcc, then there would be no problem whatsoever, and nwcc could always be used in place of gcc. But it does not.

The problem lies in GNU C features, used by applications and library headers, which nwcc does not support.

nwcc can be put into "ANSI" mode. In this mode, it accepts GNU extensions (with a warning) but does not claim to be gcc. In GNU mode, it additionally claims to be gcc, i.e. it defines __GNUC__. Read on for some advantages and disadvantages of both approaches.

2.3 Applications

Applications usually come in three flavors:

They are unconditionally standard C (89)
This means the application always works with nwcc (barring bugs and some corner cases)

They are unconditionally GNU C/C99
This means the application always uses one or more GNU C features. If nwcc supports those features, the program will work. If it doesn't, it won't

They are conditionally GNU C/C99
This means the application only uses GNU C features if it is compiled with gcc (or with a compiler that pretends to be gcc, such as nwcc!)

For nwcc, the first two cases are the easiest to deal with because no standards choice is required - the program either works or it doesn't.

The last case is the most dangerous one, because it forces us to choose between ANSI and GNU C mode. In particular, many applications which test for gcc by looking at the __GNUC__ macro are very aware of GNU C. In fact, they are so aware that they use every conceivable GNU C feature! Many GNU applications, such as gcc itself, are written this way.

In general, for applications, we don't want to pretend to be gcc: if the application checks for __GNUC__, it may use unsupported features, whereas if it uses GNU C features unconditionally, there is still a chance that nwcc implements them regardless.

Thus:

For applications, we don't want to define __GNUC__!

2.4 Headers

System library headers are the most painful part. Some headers were only written for gcc and gcc-compatible compilers. For example, the FreeBSD headers "mostly" work with non-gcc compilers, but a few essential ones do not.

In general, with a GNU-C-aware set of headers, GNU C mode implements some macros using inline assembly instead of standard C operations, decorates some function and structure declarations with attributes (for alignment and the like), and, most importantly, exposes a lot of additional GNU declarations.

If we look at standard C and beyond, we will find lots of incompatible standards:

C89
C99
Various versions of POSIX
Various versions of UNIX
The above with additional, system-specific extensions

These standards mostly differ in what functions and macros they provide, and in some cases, they provide things with the same names but different interfaces.

This stuff is mostly about namespaces. For example, the various POSIX and UNIX standards declare the fileno() function in stdio.h. However, if you have a standard C89 program which does this:

#include <stdio.h>

void fileno() { puts("hello"); }
int main()
{
        fileno();
        return 0;
}

Then this program has to work, because the name fileno is in no way reserved in C89, and you are (supposedly) perfectly free to use it. But with a set of UNIX and POSIX headers, this program will break, because your fileno() definition clashes with the declaration in stdio.h.

The solution is to introduce "feature selection" macros (also known as feature test macros, such as _POSIX_C_SOURCE). A typical system header recognizes all of these industry standards, and uses #if/#else chains to keep incompatible features such as fileno() separated.

Finally, most headers also have an "extended" mode, which is a (hopefully sensible) choice of one of these standards, plus a lot of added stuff. For example, glibc has lots of functions and macros which are not part of any of the industry standards, and are thus only available in the extended default mode.

A given application may come to depend on macros, functions, or types which are only declared in GNU mode, e.g.:

/home/nils/nwcc_ng [0]$ cat y.c
#include <sys/types.h>

int
main() {
        int64_t i;
}
/home/nils/nwcc_ng [0]$ ./nwcc y.c
/home/nils/nwcc_ng [0]$ ./nwcc y.c -ansi
y.c.4: Error: Parse error at `i' (#2)
        int64_t i;
                ^ here
/var/tmp/cpp239.cpp - 1 error(s), 0 warning(s)
No valid files to link.
/home/nils/nwcc_ng [1]$

(A different header could be used to work around this particular problem, but that is beside the point.)

If we add this usually minor problem to the major problems of e.g. FreeBSD's headers, we can conclude:

For some headers, we do want to define __GNUC__!

2.5 Conclusion

Sometimes an application will work with any standard C compiler, and sometimes it will only work if the compiler is, or claims to be, gcc. Historically, many important system headers were found to be useless in non-GNU mode, because they used huge amounts of obscure GNU C features, or exhibited bugs because they were simply never tested in a non-GNU configuration. Lately, however, I've found that most applications work best with nwcc on most systems if they are compiled in non-GNU mode. Therefore, non-GNU mode is the default on all systems except OSX (many OSX versions, up until at least a few years ago, seem to absolutely require GNU mode) and the BSDs (where the situation is unclear). It has been the default on Linux since version 0.8.2.
It is always possible to override the default choice explicitly by using the "-gnu" (enables GNU mode) and "-notgnu" (disables GNU mode) command line options. These can also be put into one of the configuration files (~/.nwcc/nwcc.conf or the system-wide file - see the USAGE file for details):

options = notgnu

Note that all GNU C features are still accepted in either mode, so defining __GNUC__ is primarily needed to fool the system headers, or to compile applications which do things like

#if !__GNUC__
# error This program only works with gcc
#endif

There are some other minor points to note here:

- nwcc's ANSI mode isn't quite "conforming"; it only means "don't pretend to be gcc", and that namespace rules are honored. However, some mandatory warnings are not emitted yet.

- The GNU C version makes a difference in many cases. nwcc currently only pretends to be gcc 3.0. You may wish to set the __GNUC__ and __GNUC_MINOR__ macros to a different version yourself.

There's more to say about this, but this should be enough for an introduction. You may contact me if you have any further questions.