People sometimes ask me about some of the workings of nwcc, so here's a small general introduction (thanks to a suggestion by Basile Starynkevitch.)
nwcc is completely hand-written; it does not use any kind of code generator for lexing, parsing, or code generation. Like traditional compilers, nwcc is separated into the actual compiler application - called nwcc1 (cc1_main.c and most other files) - and a driver application that glues the compilation together with the assembler and linker - called nwcc (mostly composed of driver.c and cc_main.c.) For simplicity I'll just refer to the whole thing as nwcc.
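In other words, a command like ``nwcc foo.c'' roughly amounts to running the preprocessor on foo.c, feeding the result to nwcc1 to produce assembly output, and then invoking the system assembler and linker on that output.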
The compiler does not have a builtin preprocessor; it passes the C input file to an external preprocessor, which can be GNU cpp or nwcpp, the latter of which is a completely new preprocessor that evolved out of nwcc's lexical analyzer - see the cpp sub directory. The preprocessed translation unit is then read into a list of tokens, i.e. keywords, numbers, operators and so on are recognized and stored in a convenient format (of course, some operators are still ambiguous at this point, such as the unary vs. binary plus operator.) This is driven by lex.c, uses helper functions from token.c, and revolves around the data structure ``struct token''. Storing the whole file in a list, rather than reading tokens ``on the fly'' while parsing, means that nwcc's memory use grows linearly with the size of its input and that it takes a bit of additional processing time. However, the parser was also easier to code this way.
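To give a rough picture of what this looks like, here is a minimal sketch of a token list node; the field names are illustrative only and do not reproduce nwcc's actual ``struct token'':

/* Illustrative sketch only - not nwcc's real struct token */
struct token {
        int             type;   /* e.g. identifier, number, operator         */
        char            *ascii; /* original spelling, for diagnostics        */
        void            *data;  /* e.g. the value of a numeric constant      */
        int             line;   /* source line number                        */
        struct token    *next;  /* the whole input forms a linked list       */
};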
The next pass performs a syntactical analysis on this token list; it detects and reads all kinds of declarations, function definitions, control structures, and expressions. This also records all scope information, and expressions are already stored as expression trees (all operator ambiguity is resolved at this point.) These things are all glued together by the first part of analyze() in analyze.c. It's important to note that the resulting code representation has not been fully analyzed for semantic correctness yet. This part mostly uses decl.c/decl_adv.c, control.c, expr.c, and some parts of subexpr.c for syntax analysis. In other words, nonsense code like the following:
struct foo { int x; char *p; } foo, bar;
foo = foo + bar; /* bogus! structs cannot be added! */
... is silently accepted because the declarations of ``foo'' and ``bar'', as well as their use in the expression, are syntactically correct. It's in the next step that this invalid use of the two structure objects is detected.
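As a rough illustration of the expression trees mentioned above, a node for the (bogus) expression ``foo + bar'' could be pictured like this; the field names are a sketch, not nwcc's actual ``struct expr'':

/* Illustrative sketch only - not nwcc's real struct expr */
struct expr {
        int             op;     /* operator - here the (now unambiguous) binary plus */
        struct expr     *left;  /* left operand subtree - here the identifier foo    */
        struct expr     *right; /* right operand subtree - here the identifier bar   */
        struct token    *tok;   /* leaf nodes point at the underlying token          */
};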
Also of importance is that the data structures generated by this step currently aren't designed with optimization in mind; a more tree-like structure that permits easy analysis of a function's components will be needed. In addition, semantic analysis probably should have been done at this point already, at least partially.
The next step is the transformation of these function, declaration, scope, statement, and expression data structures into ``intermediate code'' (icode), on a per-function basis (it's just a loop at the bottom of analyze() which calls xlate_to_icode() on every function body - icode.c and parts of subexpr.c do the high-level parsing and code generation; icodeinstr.c provides various primitive utility functions for this.) That's the time when most type checking and semantic correctness checking is done. Thus an expression like
int foo, bar, baz;
...
foo = bar + baz;
... generates an icode list containing at least the instructions LOAD, LOAD, ADD and STORE to load the values into registers, perform the addition, and save the result in foo. The registers to use are also allocated at this point, so it may be necessary to generate code for saving register contents to the stack if no unused registers are available.
As you can see, the icode mostly resembles typical assembly language instructions. However, there are also a few pseudo-instructions that do not usually map directly to machine instructions, such as COPYSTRUCT to copy the contents of one struct to another, COPYINIT to copy an initializer to its destination, etc. Finally, because C compilation on most machines requires some instructions that are unique to the particular architecture, there are also a few architecture-specific intermediate instructions.
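Conceptually, each icode instruction is a small record that ties an opcode to its operand virtual registers, and a function body becomes a linked list of such records. The sketch below is only meant to convey the idea and is not nwcc's actual icode definitions:

/* Illustrative sketch only - not nwcc's real icode structures */
struct icode_instr {
        int                     type;       /* e.g. LOAD, ADD, STORE, COPYSTRUCT   */
        struct vreg             *src_vreg;  /* source operand, if any              */
        struct vreg             *dest_vreg; /* destination operand, if any         */
        struct icode_instr      *next;
};

struct icode_list {
        struct icode_instr      *head;
        struct icode_instr      *tail;
};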
Naturally, icode generation requires some architecture-specific knowledge. For example, function calls are performed in different ways on different machines. These machine details are hidden behind a few generic backend interfaces (these are stored in the *_gen.c files; there's one *_gen.c file for every architecture, except for AMD64, which also uses much of x86_gen.c.) For instance, to perform a function call (after having evaluated all arguments), the frontend invokes the following function:
struct vreg *
icode_make_fcall(struct fcall_data *fdat,
                 struct vreg **args,
                 int nargs,
                 struct icode_list *il);
Every backend has its own implementation of this function, and the invocation happens through a function pointer (there's a backend structure that exports most backend functionality through function pointers.) This function generates the code to pass the arguments to the function, to perform the actual call, and finally to obtain the result. The return value is the result ``virtual register'' (primarily managed by reg.c) - an abstraction representing a data item, its associated type, and its location - which in this case is probably a machine register. As you can see, this neatly allows the frontend to ignore all ABI details.
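The dispatch itself is the usual C idiom of a structure filled with function pointers, roughly along these lines (the member and variable names here are illustrative, not necessarily the ones nwcc uses):

/* Illustrative sketch of the backend dispatch idiom */
struct backend {
        struct vreg     *(*icode_make_fcall)(struct fcall_data *fdat,
                                struct vreg **args, int nargs,
                                struct icode_list *il);
        /* ... many more code generation hooks ... */
};

extern struct backend   *backend;       /* set at startup, e.g. to the x86 backend */

/* The frontend then simply calls through the pointer: */
/* res = backend->icode_make_fcall(fdat, args, nargs, il); */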
(On a side note: When I started thinking about register allocation and stack management, I found it very enlightening to draw an analogy to virtual memory management - data items are essentially like virtual pages, machine registers are like page frames, stack space for saving registers and intermediate expression results is like a swap file, and so on. It's remarkably similar, and the terminology used by nwcc occasionally hints at the similarities. For instance, loading an item into a register is done by vreg_faultin(), though unlike a page fault handler it has to be called explicitly.)
The final intermediate code is passed on to the ``emitters'' to output assembly code. The emitters are meant to abstract assembler syntax; there's one emitter per assembler. The rest of the backend code (the *_gen.c files corresponding to the emitters) provides general architecture-specific functionality, such as the aforementioned function call handling, register allocation, conversion between data types, allocation of storage, and putting all the pieces together by invoking the emitter functions to output the final assembly code.
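To give a rough idea of what that abstraction buys, the same ADD icode instruction might come out as ``addl %ecx, %eax'' in AT&T syntax from a gas emitter, but as ``add eax, ecx'' in Intel syntax from a nasm emitter; keeping such syntax differences out of the rest of the backend is exactly what the emitters are for.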
Refer to the CROSSCOMP file in the nwcc source directory to get an overview of how cross-compilation is done.
Well, I hope that answers some questions. I could easily write a whole book about nwcc, but this is just an introduction. If you have any further questions, feel free to ask!