                     M C P P - M A N U A L . T X T
                         == How to Use MCPP ==
                Kiyoshi Matsui      kmatsui@t3.rim.or.jp

V.2.0   1998/08     First released.
                                                                kmatsui
V.2.1   1998/09     Updated according to C99 1998/08 draft.
                                                                kmatsui
V.2.2   1998/11     Updated according to C++98 Standard.
                                                                kmatsui
V.2.3 prerelease 1      2002/08     Updated according to C99 Standard.
                Added porting to Linux / GCC, CygWIN and LCC-Win32.
                GCC-compatible features augmented.
                                                                kmatsui
V.2.3 prerelease 2      2002/12     Added porting to GCC V.3.2.
                Revised some wording.
                                                                kmatsui
V.2.3 release   2003/02     Finally released.
                                                                kmatsui
V.2.3 patch 1   2003/03     Slightly modified.
                                                                kmatsui
V.2.4 prerelease    2003/11     Added porting to Visual C++.
                Added #pragma MCPP preprocess, #pragma MCPP preprocessed
                                                                kmatsui
V.2.4 release   2004/02     Extended multi-byte character handling.
                Added porting to Plan 9/pcc.
                                                                kmatsui
V.2.4.1         2004/03     Revised recursive macro expansion, and added
                    -c option.
                                                                kmatsui
V.2.5           2005/03     Absorbed POST_STANDARD into STANDARD as an
                    execution time option, absorbed OLD_PREPROCESSOR
                    setting as an execution option of PRE_STANDARD.
                Renamed most of #pragma __* directives as #pragma MCPP *.
                Added porting to GCC V.3.3 and 3.4, changed some options
                    accordingly.
                Removed documents on older compiler-systems (DJGPP,
                    compiler-systems on MS-DOS except Borland C 4.0).  
                                                                kmatsui
V.2.6           2006/07     Integrated STANDARD and PRE_STANDARD
                    modes into one executable, differentiating the modes
                    by the invocation options.
                Removed compiler-specific behaviors from stand-alone
                    build.
                Removed settings for pre-C90 compilers, MS-DOS
                    compilers and Plan 9 / cpp.
                Added comment on GCC system header and #include_next.
                Removed #pragma MCPP include_next.
                                                                kmatsui

V.2.6.1         2006/08     Added porting to MinGW.
                Revised some subtle points.
                                                                kmatsui


                                Contents

1.  Overview
    1.1     High Portability
    1.2     Standard C Mode with Highest Conformance and Other Modes

2.  Invocation Options and Environment Settings
    2.1     2 Kinds of Build and 5 Behavioral Modes
    2.2     How to Specify Invocation Options
    2.3     Common Options
    2.4     Options by MCPP Behavioral Modes
    2.5     Common Options Except for Some Compiler Systems
    2.6     Options by Compiler System
    2.7     Environment Variables
    2.8     Multi-Byte Character Encodings
    2.9     How to Use MCPP in One-Pass Compilers
    2.10    How to Use MCPP in IDE

3.  Enhancement and Compatibility
    3.1     #pragma MCPP put_defines,
                    #pragma MCPP preprocess, #pragma MCPP preprocessed,
                    #put_defines, #preprocess, #preprocessed
        3.1.1   Pre-preprocessing of Header File
    3.2     #pragma once
        3.2.1   Tool to Write #pragma once to Header Files
    3.3     #pragma MCPP warning,
                    #include_next, #warning
    3.4     #pragma MCPP push_macro, #pragma MCPP pop_macro,
                    #pragma push_macro, #pragma pop_macro
                    #pragma __setlocale, #pragma setlocale
    3.5     #pragma MCPP debug, #pragma MCPP end_debug,
                    #debug, #end_debug
    3.6     #assert, #asm, #endasm
    3.7     New C99 Features (_Pragma() Operator, Variable Argument
                    Macro and others)
    3.8     Asm Statement in Borland C and Other Special Syntaxes
    3.9     Compatibility with GCC
        3.9.1   Preprocessing FreeBSD 2/Kernel
        3.9.2   Preprocessing FreeBSD 2/Libc
        3.9.3   Problems Concerning GCC 2/cpp
        3.9.4   Preprocessing Linux/glibc 2.1
        3.9.5   To Use MCPP with GCC 2
        3.9.6   Preprocessing GCC 3.2
        3.9.7   To Use MCPP with GCC 3 or 4
        3.9.8   The Problems of Linux / stddef.h, limits.h
                        and #include_next
    3.10    Visual C++ System Header Problems
        3.10.1  Comment Generating Macro?

4.  Implementation-defined Behaviors
    4.1     Status Value on Exit
    4.2     Include Directory Search Path
    4.3     How to Construct Header Name
    4.4     Evaluation of #if Expression
    4.5     Character Constant Evaluation in #if Expression
    4.6     #if sizeof (type)
    4.7     How to Handle White-Space Sequence
    4.8     Default Specifications for MCPP Executables

5.  Diagnostic Messages
    5.1     Diagnostic Messages Format
    5.2     Translation Limits
    5.3     Fatal Errors
        5.3.1   MCPP's Own Bugs
        5.3.2   Physical Errors
        5.3.3   Translation Limits and Internal Buffer Errors
        5.3.4   #pragma MCPP preprocessed Related Errors
    5.4     Errors
        5.4.1   Character and Token Related Errors
        5.4.2   Unterminated Source File Related Errors
        5.4.3   Ill-Balanced Preprocessing Group Related Errors
        5.4.4   Simple Syntax Errors on Directive Lines
        5.4.5   Syntax Errors in #if Expressions
        5.4.6   #if Expression Evaluation Errors
        5.4.7   #define Related Errors
        5.4.8   #undef Related Errors
        5.4.9   Macro Expansion Errors
        5.4.10  #error and #assert
        5.4.11  Failure of #include
        5.4.12  Other Errors
    5.5     Warnings (Class 1)
        5.5.1   Character, Token and Comment Related Warnings
        5.5.2   Unterminated Source File Related Warnings
        5.5.3   Directive Line Related Warnings
        5.5.4   #if Expression Related Warnings
        5.5.5   Macro Expansion Related Warnings
        5.5.6   Line Number Related Warnings
        5.5.7   #pragma MCPP warning, #warning
    5.6     Warnings (Class 2)
    5.7     Warnings (Class 4)
    5.8     Warnings (Class 8)
    5.9     Warnings (Class 16)
    5.10    Diagnostic Messages Index

6.  Reporting on Bugs and Others
    6.1     MCPP's Bug?
    6.2     malloc() Related Bugs
    6.3     How to Report Bugs
    6.4     Give Us Your Feedback


                              1. Overview

MCPP is a C preprocessor developed by kmatsui (Kiyoshi Matsui) based on
DECUS cpp written by Martin Minow, and then rewritten entirely.  MCPP
means Matsui cpp.  This software is supplied as source code; to use
MCPP with a given compiler system, a small number of modifications to
adapt it to that compiler system are required before it can be compiled
into an executable.

This document describes the specification of an MCPP executable that has
already been ported to a certain compiler system.  Those who want to
know more about MCPP or want to port it to other compiler systems should
refer to the MCPP source and its document "mcpp-porting.txt".

All these sources and related documents are provided as free software.

Before going into detail, some of the MCPP features are introduced here.
(The sections 1.1 and 1.2 are identical with those of mcpp-porting.txt.)


1.1     High Portability

MCPP is a portable preprocessor, supporting various operating systems,
including Linux, FreeBSD and Windows.  Its source is highly portable
and can be compiled by any compiler that supports Standard C or C++
(ANSI/ISO C or C++).  Only classic library functions are used.

To port MCPP to a compiler system, in many cases one only needs to
change some macro definitions in the header files and simply compile it.
In the worst case, adding several dozen lines to the source file
system.c would be enough.

To process multi-byte characters (Kanji), it supports Japanese EUC-JP,
shift-JIS and ISO2022-JP, Chinese GB-2312, Taiwanese Big-5 and Korean
KSC-5601 (KSX 1001), as well as UTF-8.  For shift-JIS, ISO2022-JP or Big-
5, MCPP can complement the compiler-proper if it does not recognize them.


1.2     Standard C Mode with Highest Conformance and Other Modes

MCPP has various behavioral modes.  Other than Standard-conforming mode,
there are K&R 1st mode, "Reiser" cpp mode and what I call post-Standard
mode.  MCPP also has an execution option for C++ preprocessing.

Unlike many existing preprocessors, Standard mode of MCPP has the
highest conformance to the Standards: all of C90, C99 and C++98.  It has
been developed aiming to become the reference model of the Standard C
preprocessor.  The version of the Standard can be specified by an
execution option. [1]

In addition, it provides several useful enhancements: #pragma MCPP debug,
which traces the process of macro expansion or #if expression evaluation,
and the header file "pre-preprocessing" facility.

MCPP also provides several useful execution options, such as warning
level or include directory specification options.

Even if there are mistakes in the source, MCPP handles them properly,
issuing accurate and plain diagnostic messages without running out of
control or displaying misleading error messages.  It also displays
warnings for portability problems.  Detailed documents are also
provided.

In spite of its high quality, MCPP's code size and memory usage are
relatively small.

A disadvantage of MCPP, if any, is slower processing speed.  It takes
about twice as long as GCC V.3/cc1, but seeing that its processing speed
is almost the same as that of Borland C 5.5/cpp32 and that it runs a
little faster when the header file pre-preprocessing facility is used,
it cannot be described as particularly slow.  MCPP puts an emphasis on
standard conformance, source portability and operability in a small
memory space, making this level of processing speed inevitable.

The Validation Suite for Standard C Preprocessing, which is used to test
the extent to which a preprocessor conforms to Standard C, and its
documentation cpp-test.txt, which contains the results of applying the
Validation Suite to various preprocessors, are also released with MCPP.
When looking through this file, you will notice that so-called
Standard-conforming preprocessors have many conformance-related
problems.

During the course of developing MCPP V.2.3, it was selected as one of
the "Exploratory Software Projects for 2002" by Information-technology
Promotion Agency (IPA), Japan, along with its Validation Suite.  From
July 2002 to February 2003, the project, financed by IPA, proceeded
under the advice of project manager Yutaka Niibe.  I asked "HighWell,
Inc." Limited Company, Tokyo, to translate all the documents.  As for
technical details, I revised and corrected the translated documents.

MCPP was again adopted as one of the "Exploratory Software Projects"
in 2003, under project manager Hiroshi Ichiji.  The update of MCPP
proceeded into the next version, V.2.4. [2]

Since the project ended, I have continued to update MCPP and the
Validation Suite.

Note:
[1] ISO/IEC 9899:1990 (JIS X 3010-1993) had been used as C Standard, but
in 1999, ISO/IEC 9899:1999 was adopted as a new Standard.  This document
calls the former C90 and latter C99.  The former is generally called
ANSI C or C89 because it migrated from ANSI X3.159-1989.  ISO/IEC 9899:
1990 + Amendment 1995 is sometimes called C95.  C++ Standards are ISO/
IEC 14882:1998 and its corrigendum version ISO/IEC 14882:2003.  This
document calls both of them C++98.

[2] The outline of the "Exploratory Software Project" can be seen at the
following site (Japanese only).

    http://www.ipa.go.jp/jinzai/esp/

MCPP from V.2.3 through V.2.5 had been located at:

    http://www.m17n.org/mcpp/

In April 2006, MCPP project moved to:

    http://mcpp.sourceforge.net/

MCPP V.2.2 and Validation Suite V.1.2 are located in the following
Vector's web site.  They are in the directory called dos/prog/c, but
they are not for MS-DOS exclusively.  Sources are for UNIX, WIN32, MS-
DOS.

    http://download.vector.co.jp/pack/dos/prog/c/cpp22src.lzh
    http://download.vector.co.jp/pack/dos/prog/c/cpp22bin.lzh
    http://download.vector.co.jp/pack/dos/prog/c/cpp12tst.lzh

    http://download.vector.co.jp/
and
    ftp://ftp.vector.co.jp/
seem to be the same.

The text files in these archive files available at Vector use [CR]+[LF]
as a <newline> and encode Kanji in shift-JIS for DOS/Windows.  On the
other hand, those from V.2.3 through V.2.5 available at SourceForge use
[LF] as a <newline> and encode Kanji in EUC-JP for UNIX.  From V.2.6 on
two types of archive, .tar.gz file with [LF]/EUC-JP and .zip file with
[CR]+[LF]/shift-JIS, are provided.


            2.  Invocation Options and Environment Settings


2.1     2 Kinds of Build and 5 Behavioral Modes

There are two types of build (or configuration) for the MCPP executable.

The first is the stand-alone-build: a preprocessor which behaves on its
own, not depending on any compiler system.  The invocation options of
the stand-alone-build are the same regardless of the compiler with which
MCPP is compiled.  Although it can preprocess source files, it cannot
behave as an integrated part of a compiler system.

The other is the compiler-specific-build: a preprocessor that replaces
the resident preprocessor of a certain compiler system, where possible.
Each compiler-specific-build has some different specifications for
compatibility with the compiler system.  It has the same options as the
stand-alone-build, except for a few options that differ from the common
ones to avoid conflicts with the compiler system.

The MCPP executable has the following 5 behavioral modes regardless of
the build type.

STD
    Standards (C90, C99, C++98) conforming mode.  This is the default.
COMPAT
    A variation of STD mode, which expands recursive macro more than the
    Standards specification.
POSTSTD
    Special "post-Standard" mode created by the author, based on the
    Standards and simplified by removing all the irregular
    specifications of the Standards.
KR
    K&R 1st specification mode.
OLDPREP
    "Reiser" model cpp mode (old-preprocessor mode).

The mode of MCPP is specified by the run-time options as follows:

-@std
    The STD mode (default).
-@compat
    The COMPAT mode.
-@poststd, -@post
    The POSTSTD mode.
-@kr
    The KR mode.
-@oldprep
    The OLDPREP mode.

In this document, I group OLDPREP and KR into pre-Standard modes, and
group STD, COMPAT and POSTSTD into Standard modes.  Since COMPAT mode is
almost the same as STD mode, STD includes COMPAT unless otherwise
mentioned.

There are differences in the macro expansion methods between Standard
and pre-Standard modes.  Roughly speaking, this is the difference
between C90 and pre-C90.  The biggest difference is in the expansion of
function-like macros (macros with arguments).  When an argument contains
macros, in Standard mode MCPP completely expands the argument and then
substitutes it for the parameter within the replacement list of the
original macro, whereas in pre-Standard mode MCPP substitutes the
argument without expanding it, and the argument is expanded at rescan
time.
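
As a small illustration of the Standard-mode rule (the macro names
VALUE, STR and XSTR and the two helper functions are made up for this
sketch and are not part of MCPP), the following C fragment shows an
argument being expanded completely before it is substituted, except
where the # operator is applied to the parameter directly:

```c
#include <string.h>

#define VALUE   42
#define STR(x)  #x        /* # is applied to the unexpanded argument */
#define XSTR(x) STR(x)    /* x is fully expanded, then passed to STR */

/* Returns nonzero: STR(VALUE) stringizes the argument as written. */
int str_keeps_macro_name(void)
{
    return strcmp(STR(VALUE), "VALUE") == 0;
}

/* Returns nonzero: XSTR(VALUE) expands VALUE to 42 before substitution. */
int xstr_expands_argument(void)
{
    return strcmp(XSTR(VALUE), "42") == 0;
}
```

This only shows what a Standard-conforming preprocessor does; how a
pre-Standard preprocessor treats the same source is not demonstrated
here.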

Also, in Standard mode, a macro is not expanded recursively in principle,
even if the macro definition is recursive directly or indirectly.  If
there is a recursive macro definition in pre-Standard mode, it becomes
an error at expansion time.
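
A minimal sketch of a directly recursive definition as handled in
Standard mode follows.  The names depth and read_depth are hypothetical
and chosen only for this example:

```c
int depth = 10;             /* an ordinary object named depth */

#define depth (depth + 1)   /* directly recursive macro definition */

/* In Standard mode the inner "depth" is not replaced again, so the
   expression below becomes (depth + 1) and refers to the object. */
int read_depth(void)
{
    return depth;
}
```

Under a Standard-conforming preprocessor read_depth() yields 11.  In
pre-Standard mode, according to the description above, the same
definition would be reported as an error when the macro is expanded.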

Handling of \ at line end is also different by mode.  In Standard mode,
after processing the trigraph, the sequence of <backslash> <newline>
gets deleted before tokenization, but in pre-Standard mode, these only
get deleted when they are within the string literals or in a #define
line.
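
The Standard-mode rule can be seen in the following sketch, where a
<backslash> <newline> sequence appears in the middle of a number.  The
function name spliced_value is an assumption made for this example:

```c
/* <backslash><newline> is deleted before tokenization in Standard mode,
   so the two physical lines below form the single token 1234. */
int spliced_value(void)
{
    return 12\
34;
}
```

Since this splice is neither inside a string literal nor on a #define
line, the description above implies that pre-Standard mode would not
delete it.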

There is a subtle difference in tokenization (token parsing,
decomposition into tokens).  Standard mode tokenizes on the
"token based processing" principle.  Concretely, in Standard mode,
spaces are inserted around an expanded macro to prevent it from merging
unexpectedly with its adjacent tokens.  Pre-Standard mode retains
traces of the traditional, convenient and tacit tokenization and macro
expansion methods of "character based text replacement".  For details,
please see cpp-test.txt Sec 1.
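
The following sketch shows why the token-based principle matters.  The
macro NEG and the function negate_twice are hypothetical names used
only for illustration:

```c
#define NEG -

/* The tokens are '-', NEG and x.  After expansion they are '-', '-'
   and x: two separate unary minus operators, not the single token
   "--".  A preprocessor that writes text output keeps them apart by
   emitting a space between them. */
int negate_twice(int x)
{
    return -NEG x;
}
```

With token-based processing negate_twice(5) is 5.  A purely
character-based replacement could join the two minus signs into "--",
which would change the meaning of the expression.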

Standard mode handles the numeric token, called a preprocessing
number, according to the Standard specification.  In pre-Standard mode,
numeric tokens are the same as integer constant tokens or floating point
tokens.  The suffixes 'U', 'u', 'LL' and 'll' of integer constants and
the suffixes 'F', 'f', 'L' and 'l' of floating point constants are not
recognized as a part of the tokens in pre-Standard mode.

The string literals and character constants of wide characters are
recognized as single tokens only in Standard mode.

Digraph, #error, #pragma, and _Pragma() operator are available only in
Standard mode.  Also, -S <n> option (strict-ansi specs) and -+ option
(the one run as C++ preprocessor) are used only in Standard mode.  Pre-
defined macros __STDC__, __STDC_VERSION__ are defined in Standard mode,
and they don't get defined in pre-Standard.

#if defined, #elif cannot be used in pre-Standard mode.  Macros cannot
be used within the argument of #include or #line in pre-Standard mode.
The predefined macros __FILE__, __LINE__, __DATE__, __TIME__ are not
defined in pre-Standard mode.

On the other hand, #assert, #asm (#endasm), #put_defines and #debug are
available in pre-Standard mode only.

#if expression is evaluated in long / unsigned long or long long /
unsigned long long in Standard mode, and in (signed) long only in
pre-Standard mode.  sizeof (type) in #if expression can be used only in
pre-Standard mode.
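
The effect of unsigned evaluation in Standard mode can be sketched as
follows.  IF_USES_UNSIGNED and if_uses_unsigned are names invented for
this example:

```c
/* 0U - 1 is evaluated in an unsigned type, so it wraps around to the
   maximum value of that type and compares greater than 0. */
#if 0U - 1 > 0
#define IF_USES_UNSIGNED 1
#else
#define IF_USES_UNSIGNED 0
#endif

int if_uses_unsigned(void)
{
    return IF_USES_UNSIGNED;
}
```

A Standard-conforming preprocessor selects the first branch.  A
preprocessor that evaluates #if in signed long only would compute -1
instead, assuming it accepted the constant at all.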

Trigraphs and UCN (universal character name) are available only in STD
mode.

The output of diagnostic messages is also slightly different between the
modes.  Please see chapter 5 for details.

Any other items, which do not have any distinct rules between K&R 1st
and the Standards, follow the C90 rules in pre-Standard mode.

The differences of OLDPREP mode from KR mode, and of POSTSTD and COMPAT
modes from STD mode, are as follows:

OLDPREP
    1. Converts a comment to 0 spaces instead of 1 space.  Usually this
    conversion is done in the output at the end.  In a macro definition,
    however, the conversion is done immediately after the definition.
    2. When there are string literals or character constants in the
    replacement list of a macro definition, and any of the parameter
    names matches any part of them, that part is substituted with the
    argument corresponding to the parameter when the macro is called.
    That is to say, the content of the string literal or character
    constant, with the enclosing quotes stripped, is searched as a token
    sequence, and any parameter name found is substituted.
    3. You can write anything you like on #else and #endif lines.
    (One usually writes the MACRO of the corresponding #if MACRO or
    #ifdef MACRO.)
    4. It suppresses the "unterminated string literal" and "unterminated
    character constant" errors.  If the closing " or ' of a literal is
    missing, it assumes that the literal is closed at the line end.
    5. It treats a '# 123' line as '#line 123'.

COMPAT
    Expand recursive macro more than the Standard's specification.  On
    expanding recursive macro, set the range of non-re-replacing of the
    same name narrower than the Standard.
    Refer to cpp-test.txt section 2.4.26 about the specifications of
    recursive macro expansion.  See test-t/recurs.t for a sample of
    recursive macro. [1]

POSTSTD
    This mode differs from STD mode in the following points:

    1. Does not recognize trigraphs.  Digraphs are converted at
    translation phase 1, that is, the beginning of preprocessing.  Does
    not deal with digraph as a token.
    2. Simplified tokenization according to complete token-base rule.
    When there is no white space, as a token separator between
    preprocessing tokens in the source code, insert a space
    automatically.  (However, this does not get inserted between macro
    name and the following "(" within macro definition).  Therefore,
    even for stringizing by # operator, it gets stringized after a space
    is inserted between all the preprocessing tokens.  Also, at the re-
    definition of macros, it does not matter whether there is a token
    separator or not.
    3. At the re-definition of function-like macros, the difference of
    the parameter name is not relevant.
    4. Character constants cannot be used in #if expressions (it will
    cause an error).
    5. It removed irregular "function-unlike" rules for function-like
    macro expansion.  Hence, rescanning only targets to the replacement
    list of the macro, and not the sequence after that.
    6. Normally, a header name in the format of #include <stdio.h>
    is accepted, but it gets a warning (when class 2 warnings are
    enabled).  If a header name in the format of <stdio.h> is used in a
    macro, it can cause an error in particular cases.  Using the format
    of #include "stdio.h" is recommended.
    7. C99 added the rule that a space is required between the macro
    name and the replacement list in a macro definition, but this mode
    does not enforce it. (A space is inserted automatically at
    tokenization.)
    8. UCN (universal-character-name) is not recognized.  Multi-byte
    characters in identifiers are not recognized.
    9. In C++, the eleven identifier-like operators are not treated as
    operators.
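
Item 2 of the OLDPREP list can be contrasted with Standard behavior in
a short sketch.  The macro and function names are hypothetical, and the
code shows only what a Standard-conforming preprocessor does:

```c
#include <string.h>

#define QUOTE_OLD_STYLE(x) "x"   /* relies on substitution inside quotes */
#define QUOTE_STANDARD(x)  #x    /* the Standard way: the # operator */

/* Nonzero under a Standard preprocessor: the literal stays "x". */
int old_style_is_literal_x(void)
{
    return strcmp(QUOTE_OLD_STYLE(abc), "x") == 0;
}

/* Nonzero: the # operator turns the argument into "abc". */
int standard_quotes_argument(void)
{
    return strcmp(QUOTE_STANDARD(abc), "abc") == 0;
}
```

According to the OLDPREP description, QUOTE_OLD_STYLE(abc) would yield
"abc" in that mode, because the parameter name found inside the literal
is replaced by the argument.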

For the above reasons, MCPP executables differ in some of their
specifications, so please read this manual carefully.  This chapter
describes first the common options, next the behavioral-mode-dependent
options, then the options common to most compiler systems, and finally
the compiler-dependent options for each compiler-specific-build.

Note:
[1] This option is for compatibility with GCC, Visual C++ and other
major implementations.  'compat' means "compatible mode".


2.2    How to Specify Invocation Options

The <arg> and [arg] shown below indicate required and optional arguments
respectively.  Note that the <,  >,  [, or ] character itself must not
be entered.

MCPP invocation takes a form of:

    mcpp [-<opts> [-<opts>]] [in_file] [out_file] [-<opts> [-<opts>]]

Note that you may have to replace the above "mcpp" with another name,
depending on how MCPP is installed.

When out_file (an output path) is omitted, stdout is used unless the -o
option is specified.  When in_file (an input path) is omitted, stdin is
used.  A diagnostic message is output to stderr unless the -Q option is
specified.

If any of these files cannot be opened, preprocessing is terminated,
issuing an error message.

MCPP uses getopt() to get an option.

For an option with argument, white-space characters may or may not be
inserted between the option character and an argument.  In other words,
both of "-I<arg>" and "-I <arg>" are acceptable.  For options without
argument, both of "-Qi" and "-Q -i" are valid.

For an option with an argument, omitting a required argument causes an
error, except for the -M option.

If -D, -U, -I, or -W option is specified multiple times, each of them is
valid.  For -S, -V, or -+ option, only the first one is valid.  For -2,
or -3 option, its specification switches each time an option is
specified.  For other options, the last one is valid.

The option letters are case sensitive.

The switch character is '-', not '/', even under Windows.

When invalid options are specified, a usage statement is displayed.  To
check valid options, enter a command, such as "mcpp -?".  In addition to
the usage message, there are several error messages, but they are self-
explanatory.  I will omit their explanations.


2.3     Common Options

This section covers common options across MCPP modes or compiler systems.

-C
    Output also comments in source code.  I hear this option is required
    when the UNIX lint utility is used.  This option is useful for
    debugging even when the lint utility is not used.  Note that a
    comment is moved ahead of a logical source line when output.  This
    is because a comment is processed before macro expansion or
    directive processing, and a comment may appear during a macro
    invocation.

-D <macro>[=[<value>]]
-D <macro(a,b)>[=[<value>]]
    Define a macro named "macro".  This option can be used to change the
    definitions of predefined macros other than __STDC__,
    __STDC_VERSION__,  __FILE__,  __LINE__,  __DATE__,  __TIME__, and
    __cplusplus. (__STDC_HOSTED__, C99's predefined macro, is
    exceptionally redefined by this option, because some compiler
    systems, like GCC V.3, use the -D option to define __STDC_HOSTED__.)
    To specify a value, use "=<value>".  If "=<value>" is omitted, 1 is
    assumed. (Note that in bcc32, the macro is defined as zero-token by
    default.)  Do not enter white-space characters immediately before
    "=".  If a white-space character is entered immediately after "=",
    the macro is defined as zero token.
    A macro with arguments can be defined by this option.  This option
    can be specified repeatedly.

-e <encoding>
    Change a multi-byte character encoding to <encoding>. For <encoding>,
    refer to 2.8.

-I <directory>
    Specify the first directory in the include directory search path
    order with <directory>.  For a search path, refer to 4.2.  If a
    directory name contains spaces, it must be enclosed with " and ".

-I 1, -I 2, -I 3
    Specify a directory from which MCPP begins searching when it
    encounters a #include "header" directive (i.e. not <header> format).
    -I1, -I2 and -I3 indicate the current directory, the source file
    (i.e. includer) directory, and both, respectively.  For details,
    see 4.2.

-j
    On outputting a diagnostic message, MCPP displays only one line of
    diagnostic without additional information, such as source lines.
    (By default, one line of diagnostic message is followed by a source
    code line having a problem.  If the source code line in question is
    found in a #included file, all the #including file names and
    including line numbers are also displayed in sequence.  For a
    diagnostic on macro, MCPP displays also its definition information).
    When Validation Suite is used in the GCC testsuite, this option must
    be specified to output a diagnostic message in the same format as
    GCC.

-M* options are to output source file dependency lines for makefile.
When there are several source files and the -M* option is specified for
each of these source files to process and merge the outputs into a file,
dependency description lines are aligned.  These options are similar to
those of GCC, but there are several differences. [1]

-M
    Output lines that describe dependency among source files.  The
    output destination is the file specified in a command line, or
    stdout if omitted.  If a dependency description is too long to fit
    in a line, it is folded over the next lines.  The preprocessing
    result is not output.
-MM
    Almost the same as -M, except that the following header files are
    not output.
    1. Files specified in the format of #include <stdio.h>
    2. Files specified using an absolute path name, such as #include
    "/include/stdio.h".
    3. Files specified in the format of #include "stdio.h" that are
    found not in the current or source directory, depending on compiler
    systems or the -I <n> option, but in system include directories,
    including those specified with the -I <directory> option or with
    environment variables.
-MD [FILE]
    Almost the same as -M, except that the preprocessing result is
    output to the file specified on the command line or to stdout.  If
    FILE is specified, MCPP outputs dependency description lines to that
    file.  Otherwise, they are output to a file having the same base
    filename as the source file and the suffix ".d" instead of ".c".
-MMD [FILE]
    Almost the same as -MD, except that, like -MM, the files that are
    regarded as system headers are not output.  The file to which MCPP
    outputs dependency description lines is the same as for -MD [FILE].
-MF FILE
    The dependency lines are output to FILE.  -MF FILE takes precedence
    over -MD FILE or -MMD FILE.
-MP
    "Phony targets" are also output.  Each included file can be written
    as a phony target without a dependency as follows:
        test.o: test.c test.h
        test.h:
-MT TARGET
    The target name is specified as TARGET not foo.o.
    -MT '$(objpfx)foo.o' outputs the following line.
        $(objpfx)foo.o: foo.c
-MQ TARGET
    Same as -MT, except that a string that has a special meaning to
    'make' is quoted as follows:
        $$(objpfx)foo.o: foo.c

-N
    Disable all the predefined macros, including those that begin with
    "_", except for the ones required by Standards and __MCPP.  The
    Standard predefined macros include __FILE__, __LINE__,  __DATE__,
    __TIME__,  __STDC__, and __STDC_VERSION__, as well as
    __STDC_HOSTED__ for C99 and __cplusplus for C++.  If you want to
    disable __MCPP, use the -U option.

-o <file>
    Output the preprocessed source to <file>.  If this option is
    omitted, the second argument ([out_file]) is regarded as the output
    path, so this option is not strictly necessary.  However, some
    compiler drivers use it.

-P
    Do not output line number information for the compiler-proper.
    Specify this option when you want to use MCPP for purposes other
    than C preprocessing.

-Q
    Output diagnostic messages to the "mcpp.err" file in the current
    directory.  As these messages are appended to this file, it may
    become bigger.  Delete it from time to time.

-U <macro>
    Disable predefined macro named "macro".  This option cannot disable
    __FILE__, __LINE__,  __DATE__,  __TIME__,  __STDC__,
    __STDC_VERSION__ (and __STDC_HOSTED__ for C99), as well as
    __cplusplus invoked with -+ options.

-v
    Output the MCPP version and a search order of include directories to
    stderr.

-W <level>
    Specify a warning level with <level>.  <level> should be 0 or "OR"
    of any one or more values of 1, 2, 4, 8 or 16.  1, 2, 4, 8, or 16
    indicates a warning class.  For example, if -W 5 is specified,
    warnings of classes 1 and 4 are output.  If 0 is specified, no
    warnings are output.  If this option is specified several times, all
    the specified values are "ORed" together.  For example, -W 1 -W 2 -W
    4 is equivalent to -W 7.  Instead of -W 7 you can also write -W
    "1|2|4". (Enclose it with " and " so that | is not interpreted as a
    pipe.)
    If this option is omitted, -W 1 is assumed.  For warning messages,
    refer to 5.5 to 5.9.

-z
    The preprocessing result of the #included files is not output, but
    their macros are defined.  This option is used for debugging
    preprocessing.
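
The way -W <level> values combine can be modeled with plain bit
operations.  The following is only a sketch: the enum and function
names are hypothetical and are not taken from the MCPP source.

```c
/* Hypothetical names; the bit values are the warning classes of -W. */
enum warning_class {
    WARN_CLASS_1  = 1,
    WARN_CLASS_2  = 2,
    WARN_CLASS_4  = 4,
    WARN_CLASS_8  = 8,
    WARN_CLASS_16 = 16
};

/* Fold the value of one more -W option into the current level. */
int add_warning_level(int current, int option_value)
{
    return current | option_value;
}

/* Nonzero if warnings of the given class are enabled by the level. */
int warning_class_enabled(int level, int wclass)
{
    return (level & wclass) != 0;
}
```

For example, folding in 1, 2 and 4 one after another gives 7, and a
level of 5 enables classes 1 and 4 but not class 2, which matches the
description of -W 5 above.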

Note:
[1] MCPP differs from GCC in that:
1. MCPP does not provide the -MG option because its option
specifications are too complicated. (Therefore, I will omit their
explanations.)  The -M option can substitute for the -MG option because
when include files cannot be found using the -M option, MCPP fails but
outputs dependency description lines.
2. MCPP excludes a wider range of header files when using the -MM and
-MMD options.  The GCC 2/cpp does not exclude the header files shown in
2 and 3 of the -MM option.  The GCC 3/cpp0 now excludes the header files
shown in 3 that are found in the system header directory.


2.4     Options by MCPP Mode

MCPP has several behavioral modes.  For their specifications refer to
sec 2.1.

This manual lists the various MCPP behaviors by mode, which may not be
easy to read.  Please be patient.  All the uppercased names below
(including Chapters 3-5) that do not begin with "__", such as MODE, STDC,
TFLAG_INIT, etc., are macros defined in system.H.  These macros are used
only for compiling MCPP itself, and the generated MCPP executable does
not contain them.  You must understand this point clearly.

The following options are available in Standard mode:

-+
    Behave as a C++ preprocessor.  MCPP predefines the __cplusplus macro
    (its value is defined in system.H and defaults to 1), interprets the
    text from // to the end of a logical line as a comment and
    recognizes each of "::", ".*" and "->*" as a single token.  It
    evaluates "true" and "false" tokens in a #if expression to 1 and 0,
    respectively.  If __STDC__ and __STDC_VERSION__ are defined, they
    are undefined.  In the GCC-specific-build, however, __STDC__ is not
    undefined, for compatibility with GCC/cc1.  The predefined macros
    that do not begin with "_" are also undefined.  However, extended
    characters are not converted to UCN. [1], [2]

-2
    Reverse the initial setting for digraph processing.  With
    DIGRAPHS_INIT == FALSE, MCPP recognizes digraphs.  Otherwise, it
    does not.
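
    As a reminder of what digraph recognition means, here is a minimal
    C sketch written with digraphs.  It assumes a compiler that
    recognizes digraphs by default (recent GCC and Clang do); the macro
    FIRST and the function first_element are made up for illustration.

```c
/* Digraph spellings:  <: is [   :> is ]   <% is {   %> is }   %: is # */
%:define FIRST 7

int first_element(void)
<%
    int a<:2:> = <% FIRST, 9 %>;

    return a<:0:>;
%>
```

    When digraphs are recognized, the text above is equivalent to a
    function that declares int a[2] = { 7, 9 } and returns a[0].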

-h <n>
    Define the __STDC_HOSTED__ macro as <n>.

-S <n>
    Change the value of __STDC__ to <n> in C.  In C++, this option is
    ignored.  <n> must be in the range 0-9.  With <n> set to 1 or
    higher, the predefined macros that do not begin with "_", such as
    unix and linux, are disabled.  S stands for __STDC__.  If this
    option is omitted, __STDC__ is set to the default value (i.e. 1).
    In the GCC-specific-build, -pedantic, -pedantic-errors, or
    -lang-c89 is equivalent to -S1, so a subsequent -S is ignored.

-V <value>
    Change the values of the predefined macros __STDC_VERSION__ for C
    and __cplusplus for C++ to <value>.  <value> is of a long type. (In
    C95, C99, and C++ Standard, this value is set to 199409L, 199901L
    and 199711L, respectively.)  With __STDC__  set to 0,
    __STDC_VERSION__  is always set to 0L, overriding the -V option.

    If this option is omitted for C, __STDC_VERSION__ is set to the
    value of STDC_VERSION in system.H. (For GCC V.2.7 - V.2.9, 199409L.
    For others, 0L.)  If specifying -V199901L results in
    __STDC_VERSION__ >= 199901L, MCPP conforms to the following C99
    specifications (See 3.7.):

    1. Treats the text from // to the end of a line as a comment. [3]
    2. Allows the sequence of p+, P+, p-, and P-, as well as e+, E+, e-,
    and E-, in the preprocessing-number.  This is to represent a bit
    pattern of a floating-point number in Hex, like 0x1.FFFFFEp+128.
    3. Enables the _Pragma operator.  (_Pragma( "foo bar") has the same
    effect as writing #pragma foo bar.)
    4. MCPP compiled with the EXPAND_PRAGMA macro set to TRUE
    macro-expands the arguments on a #pragma line that does not begin
    with STDC or MCPP.  (By default, EXPAND_PRAGMA is set to FALSE
    except in the Visual C-specific-build, so macro expansion does not
    occur.)
    5. Allows an escape sequence of Universal-Character-Name (UCN) in
    identifiers, character constants, string literals and pp-numbers.
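
    Items 1 to 3 can be seen together in a short C99 fragment.  This is
    only a sketch; the pragma chosen here (GCC diagnostic push/pop) is
    an assumption made because GCC and Clang accept it silently, and
    the function name hex_float is hypothetical.

```c
// 1. A // comment runs to the end of the line.

/* 3. _Pragma takes a string literal.  The two lines below act like
 *    "#pragma GCC diagnostic push" and "#pragma GCC diagnostic pop". */
_Pragma("GCC diagnostic push")
_Pragma("GCC diagnostic pop")

/* 2. "p+" inside a pp-number: 0x1.8p+1 is 1.5 * 2^1, that is 3.0. */
double hex_float(void)
{
    return 0x1.8p+1;
}
```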

    Note that although C99 provides for variable argument macros, MCPP
    allows them in the C90 and C++ modes. [4]

    In C++ also, when specifying -V199901L results in __cplusplus >=
    199901L, MCPP will enter the C99 compatibility mode, providing the
    above 2-4 enhancements.  (1 is enabled unconditionally and 5 is
    almost the same.)  These are MCPP's own enhancements that do not
    conform to the C++ Standard.

    The -D option cannot be used with __STDC__, __STDC_VERSION__, and
    __cplusplus.  This is to distinguish system-defined macros from user-
    defined ones.

The following option is available for STD mode:

-3
    Reverse initial settings for the trigraphs processing.  With
    TFLAG_INIT == FALSE, MCPP recognizes trigraphs.  Otherwise, it does
    not.

Note:
[1] Defining __STDC__ in C++ is not desirable and causes many problems.
The GCC documentation says that __STDC__ needs to be predefined in C++
because many header files expect __STDC__ to be defined.  The header
files should be blamed for this.  For parts common to C90, C99 and C++,
"#if __STDC__ || __cplusplus" should be used.

[2] Unlike C99, the C++ Standard makes much of UCN.  So did the C
1997/11 draft.  A half-hearted implementation is not permitted.
However, implementing Unicode in earnest is too much of a burden for a
preprocessor.

[3] In C90 MCPP treats // as a comment but issues a warning.

[4] This is for compatibility with GCC.
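
The guard recommended in note [1] can be sketched as follows.  The
macro HAVE_PROTOTYPES and the function have_prototypes are hypothetical
names chosen only to make the fragment self-contained.  Note that an
identifier that is not defined as a macro evaluates to 0 in a #if
expression, so the test works whether or not __cplusplus is defined.

```c
/* Take the Standard-conforming branch for C90, C99 and C++ alike. */
#if __STDC__ || __cplusplus
#define HAVE_PROTOTYPES 1   /* hypothetical feature macro */
#else
#define HAVE_PROTOTYPES 0
#endif

int have_prototypes(void)
{
    return HAVE_PROTOTYPES;
}
```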


2.5    General Options Except for Some Compiler Systems

Since GCC has so many options, the GCC-specific-build of MCPP has some
options that differ from those of the other builds in order to avoid
conflicts with GCC.  Note that the options of the stand-alone-build are
all the same even if it is compiled by GCC.  The options common to the
builds other than the GCC-specific one are as follows.

-a
    Accept the following notations used in some assembler sources
    without causing an error.

    1.
      #APP
    Even if a line that begins with # does not agree with any of C
    directives, MCPP outputs this line without causing an error.
    2.
      "A very very
      long long
      string literal"
    The above old-fashioned string literals are concatenated into
    "A very very\nlong long\nstring literal".
    3.  Even if token concatenation using a ## operator generates an
    invalid pp-token, it is not regarded as error.

    These notations sometimes appear in GNU source code.  In the
    GCC-specific-build, however, the equivalent option is -x
    assembler-with-cpp or -lang-asm.
    This option cannot be used in POSTSTD mode.

-I-
    Cancel the default include directories and enable only those
    specified with an environment variable and the -I option.  Instead
    of -I-, the GCC-specific-build uses -nostdinc.  In GCC, the -I-
    option provides quite different functionality.  See 2.6.


2.6     Options by Compiler System

To use MCPP in place of the compiler system's resident preprocessor,
install it under an appropriate name in the directory where the resident
preprocessor is located.  Before copying MCPP, be sure to rename the
compiler system's own preprocessor so that it is not overwritten.

For settings on Linux, FreeBSD, or CygWIN see 3.9.5.  For settings in
GCC 3.*, 4.*, see also 3.9.7. and 3.9.7.1.  For MinGW, see 3.9.7.1.

The compiler driver may be unable to pass some options to MCPP in the
normal manner.  However, GCC provides the almighty -Wp option, which
allows you to pass any option to the preprocessor.  For example, specify
the following:

    gcc -Wp,-W31,-Q23

The -W31 and -Q23 options are passed to the preprocessor.  The options
you want to pass to the preprocessor must follow -Wp, with each option
delimited by ",". [1], [2]

For other compiler systems, if the compiler driver source is available,
it is recommended to add this kind of almighty option to the source.
For example, if you modify the compiler driver so that -P<opt> passes
only -<opt> to the preprocessor, it would be very convenient because any
option can be passed.

An alternative way to use all the options of MCPP is to write a makefile
that first preprocesses with MCPP and then compiles the output file of
MCPP as a source file.  For this method, refer to sections 2.9 and 2.10.

The following options are available for some compiler-specific-builds.
The stand-alone-build does not have these options, of course.

The following options are available for the LCC-Win32-specific-build.

-g <n>
    Define the __LCCDEBUGLEVEL macro as <n>.
-O
    Define the __LCCOPTIMLEVEL macro as 1.

The following options are available for the Visual C-specific-build.

-arch:SSE, -arch:SSE2
    Define the macro _M_IX86_FP as 1 or 2, respectively.
-Fl <file>
    Same as -include <file> for GCC.
-G<n>
    If <n> is one of 3, 4, 5, 6, B, define the macro _M_IX86 as 300, 400,
    500, 600, 600, respectively.
-GR
    Define the macro _CPPRTTI to 1.
-GX
    Define the macro _CPPUNWIND to 1.
-GZ
    Define the macro __MSVC_RUNTIME_CHECKS to 1.
-J
    Define the macro _CHAR_UNSIGNED to 1.
-RTC*
    If an option such as -RTC1, -RTCc, -RTCs or -RTCu is specified,
    define the macro __MSVC_RUNTIME_CHECKS to 1.
-Tc, -TC
    Specify that the source is written in C.  The result is the same
    with or without this option.
-Tp, -TP
    Same as -+.
-u
    Same as -N.
-Wall
    Same as -W17 (-W1 -W16).
-WL
    Same as -j.
-w
    Same as -W0.
-X
    Same as -I-.
-Zc:wchar_t
    Define the macros _NATIVE_WCHAR_T_DEFINED and  _WCHAR_T_DEFINED to 1.
-Zl
    Define the macro _VC_NODEFAULTLIB to 1.

The following options (through the end of this section 2.6) are
available for the GCC-specific-build.  Note that since __STDC__ is set
to 1 for GCC, the result is the same with or without the -S1 option.

The following options are available in all modes.

-b
    Output line number information in the C source format.
    The format used to pass the line number information from a
    preprocessor to the compiler-proper is usually as follows:

    #line 123 "filename"

    Most compiler systems can use this C source format, but some systems
    cannot.  By default, in a compiler-specific-build for a compiler
    system that cannot use the C source format, MCPP outputs the line
    number information in a format that the compiler-proper can accept.
    With this option specified, however, even such a compiler-specific-
    build outputs the line number information in the C source format.
    This option is used with #pragma MCPP preprocess to pre-preprocess
    header files.
-dD, -dM
    Output valid macro definitions in the form of #define lines at the
    end of preprocessing.
    With the -dD option specified, the preprocessing result is output
    too.  Predefined macros are not output.
    With the -dM option specified, the preprocessing result is not
    output, and predefined macros are output except the Standard
    predefined ones. [3], [4]
-finput-charset=<encoding>
    Same as -e <encoding>.  Note that GCC converts the <encoding> to
    UTF-8 with this option, whereas MCPP does not convert any encoding.
-fworking-directory
    Emit a special #line as the second line of preprocessor's output to
    convey the current working directory.
-I-
    Switch the specification of the -I <directory> before and after this
    option; directories specified with the -I options before -I- are
    used to search for header files only in the form of #include
    "header.h"; the directories specified with -I after -I-, if any, are
    used to search for all #include directives.  In addition, during the
    former search, includer's directories are not used.
-include <file>
    #include the <file> before processing the main source file.  This is
    equivalent to writing #include <file> at the beginning of the main
    source file.
-isystem <dir>
    Add <dir> to the include path immediately before system-specific
    directories and immediately after site-specific directories.
-lang-c, -x c
    Perform C preprocessing.  The same as not specifying this option at
    all.
-nostdinc
    Same as -I- for other compiler systems.
-undef
    Same as -N.
-Wcomment, -Wcomments
    Same as -W1.  The result is the same with or without this option.
-Wtrigraphs
    Same as -W16.
-Wall
    Same as -W17.  (With -Wall, MCPP does not issue class 2 and 4
    warnings because these warnings are issued frequently and are
    annoying for the GCC standard header files.  Class 8 warnings are
    generally superfluous and bothersome, but are helpful for checking
    portability and so on.  To get all of these warnings, be sure to
    specify gcc -Wp,-W31.)
-w
    Same as -W0.

The following options are available for Standard mode.

-ansi
    Define macro __STRICT_ANSI__ as 1.
-digraphs
    Recognize digraphs.  Digraphs specification is also reversed by -2.
-lang-c89, -std=c89, -std=gnu89
    Same as -S1.  Not only C90 but also C95 specifications are used.
    The result is the same with or without this option.
-lang-c99, -lang-c9x, -std=c99, -std=c9x, -std=gnu99, -std=gnu9x
    Same as -V199901L.
-lang-c++, -x c++
    Perform C++ preprocessing.  Same as -+.
-pedantic, -pedantic-errors
    Same as -W7 (i.e. -W1 -W2 -W4).
-std=iso<n>:<ym>
    Specify a version of the Standard.  <n> is 9899 for C and 14882 for
    C++.  If <n> is 9899, <ym> is any of 1990, 199409, 1999 and 199901.
    If <n> is 14882, <ym> is 199711.  If you specify a value other than
    these for <ym>, __STDC_VERSION__ or __cplusplus is set to that
    value.  In this case, <ym> must be specified in six digits, like
    200503.

The following options are available for STD mode.

-lang-asm, -x assembler-with-cpp
    Same as -a for other compiler systems.  This option cannot be used
    in POSTSTD mode.
-trigraphs
    Recognize trigraphs.  Trigraphs specification is also reversed by -3.

The following option is available for pre-Standard mode of GCC-specific-
build.

-traditional, -traditional-cpp
    Same as -@old.

The next option is available on CygWIN GCC-specific-build.

-mno-cygwin
    Alter the include directory from /usr/include to /usr/include/mingw,
    and alter the predefined macros from the ones for cygwin1.dll to the
    ones for msvcrt.dll.

MCPP accepts the following options without error but does nothing with
them.  (It sometimes issues a warning.)

-A <predicate(answer)>
    MCPP ignores this option.  In GCC, this option is equivalent to
    writing #assert <predicate (answer)> in the source code.  Standard C
    does not permit extension directives other than #pragma.
    Fortunately, so far, gcc by default passes an equivalent macro with
    the -D option, so there are no actual problems unless a source
    program uses #assert, which is a rare case.

-$
-g <n>
-idirafter <dir>
-iprefix <dir>, -iwithprefix <dir>, -iwithprefixbefore <dir>
-noprecomp
-remap

In GCC V.3.3 or later, the preprocessor has been absorbed into the
compiler, and an independent preprocessor does not exist.  Moreover, gcc
often passes options that are not meant for the preprocessor to it, even
when gcc is invoked with the -no-integrated-cpp option.  The
GCC-specific-build of MCPP for V.3.3 or later treats the following
options as such pseudo-options and ignores them if it cannot recognize
them.

-c
-E
-f*
-m*
-quiet
-W*

Note:
[1] -Wa and -Wl are almighty options for assembler and linker,
respectively.  The documentation on UNIX/System V/cc describes these
options.  Probably, GCC provides the -W<x> option for compatibility.

[2] In GCC V.3, cpp was absorbed into cc1 (cc1plus).  Therefore, the
options specified with -Wp are normally passed to cc1 (cc1plus).  To
have cpp (cpp0), not cc1, preprocess, the -no-integrated-cpp option must
be specified on gcc invocation.

[3] GCC V.3.3 or later predefines several dozen macros.  The -dD option
does not regard these macros as predefined and outputs them.

[4] The output of -dM option is similar to that of '#pragma MCPP
put_defines' ('#put_defines') with the following differences:
1. 'put_defines' also outputs Standard predefined macros as comments.
2. 'put_defines' also outputs the file name and the line number of each
macro definition as a comment, arranged in a readable format.  On the
other hand, the -d* options output in the same simple format as GCC,
because some makefiles expect that format.


2.7    Environment Variables

In the stand-alone-build of MCPP, no include directories are set up
other than /usr/include and /usr/local/include on UNIX systems.  Other
directories, if required, must be specified using environment variables
or runtime options.  The environment variable in the stand-alone-build
is INCLUDE for C and CPLUS_INCLUDE for C++.  Searching for a file starts
from the includer's source directory by default.  (Refer to 4.2 for the
search rule.)  In addition, on Linux there is some confusion of include
directories, so special setup is necessary to cope with this problem.
Refer to 3.9.8 for the problem.

For the default include directories on GCC-specific-build, refer to
noconfig/*.dif files, and for search rule and environment variable name,
refer to 4.2.

For the environment variables LC_ALL, LC_CTYPE and LANG, refer to 2.8.


2.8       Multi-Byte Character Encodings

MCPP can process various multi-byte character encodings as follows.

    EUC-JP:     Japanese extended UNIX code (UJIS)
    shift-JIS:  Japanese MS-Kanji
    GB-2312:    EUC-like Chinese encoding (Simplified Chinese)
    Big-Five:   Taiwanese encoding (Traditional Chinese)
    KSC-5601:   EUC-like Korean encoding (KSX 1001)
    ISO-2022-JP1:   International standard Japanese
    UTF-8:      A kind of Unicode encoding

The encoding used during execution can be specified as follows (Priority
is given in this order):

  1.  The encoding specified in '#pragma __setlocale( "<encoding>")' in
    source code. (For Visual C-specific-build, '#pragma setlocale
    ( "<encoding>")'.)  This directive allows you to specify several
    encodings in one source file.
  2.  The encoding specified with -e <encoding> or -finput-charset=
    <encoding> as run-time option.
  3.  The encoding specified with the LC_ALL, LC_CTYPE and LANG
    environment variables.  Priority is given in this order.
  4.  The default encoding specified when MCPP is compiled.
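
The priority order above can be sketched as a simple first-match
selection.  This is only an illustration; the function select_encoding
and its parameter names are hypothetical and are not taken from the MCPP
source.  A NULL argument stands for "not specified".  The special case
of an empty name in #pragma __setlocale is not modeled here.

```c
#include <stddef.h>

/* Return the encoding name that wins under the priority order:
 * #pragma, then the run-time option, then LC_ALL, LC_CTYPE and LANG,
 * then the default chosen when MCPP was compiled. */
const char *select_encoding(const char *pragma_enc,
                            const char *option_enc,
                            const char *lc_all,
                            const char *lc_ctype,
                            const char *lang,
                            const char *default_enc)
{
    const char *candidates[5];
    size_t i;

    candidates[0] = pragma_enc;
    candidates[1] = option_enc;
    candidates[2] = lc_all;
    candidates[3] = lc_ctype;
    candidates[4] = lang;

    for (i = 0; i < 5; i++) {
        if (candidates[i] != NULL)
            return candidates[i];
    }
    return default_enc;
}
```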

How to specify an <encoding> is basically the same across #pragma
__setlocale, the -e option, and the environment variables.  In the table
below, the encoding on the left-hand side is selected by any of the
<encoding> names on the right-hand side.  <encoding> is not case
sensitive, and '-' and '_' are ignored.  Moreover, if it contains '.',
the character sequence up to the '.' is ignored.  Therefore, EUC_JP,
EUC-JP, EUCJP, euc-jp, eucjp and ja_JP.eucJP are regarded as the same.
'*' represents any character sequence of zero or more bytes.  (For
example, iso8859-1 and iso8859-2 both match iso8859*.)

    EUC-JP:     eucjp, euc, ujis
    shift-JIS:  sjis, shiftjis, mskanji
    GB-2312:    gb2312, cngb, euccn
    BIG-FIVE:   bigfive, big5, cnbig5, euctw
    KSC-5601:   ksc5601, ksx1001, wansung, euckr
    ISO-2022-JP1:   iso2022jp, iso2022jp1, jis
    UTF-8:      utf8, utf
    Not specified:  c, en*, latin*, iso8859*
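
The name-matching rules (case-insensitive, '-' and '_' ignored, text up
to '.' ignored) can be sketched in C as follows.  This is a minimal
illustration, not MCPP's own code: the function names are hypothetical,
the first '.' in the name is assumed to be the one that counts, and the
'*' wildcard is not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Normalize an encoding name into out (capacity out_size, which must
 * be at least 1): skip everything up to and including the first '.',
 * drop '-' and '_', and fold letters to lower case. */
void normalize_encoding(const char *name, char *out, size_t out_size)
{
    const char *dot = strchr(name, '.');
    const char *p = (dot != NULL) ? dot + 1 : name;
    size_t n = 0;

    for (; *p != '\0' && n + 1 < out_size; p++) {
        if (*p == '-' || *p == '_')
            continue;
        out[n++] = (char)tolower((unsigned char)*p);
    }
    out[n] = '\0';
}

/* Nonzero if two names denote the same encoding after normalization. */
int same_encoding(const char *a, const char *b)
{
    char na[64], nb[64];

    normalize_encoding(a, na, sizeof na);
    normalize_encoding(b, nb, sizeof nb);
    return strcmp(na, nb) == 0;
}
```

Under these rules EUC_JP, euc-jp and ja_JP.eucJP all normalize to
"eucjp".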

If any of the following encodings is specified, MCPP is no longer able
to recognize multi-byte characters: C, en* (english), latin* and
iso8859*.  When a non-ASCII ISO-8859 Latin-<n> single-byte character set
is used, one of these encodings must be specified.  When an empty name
is used (#pragma __setlocale( "")), the encoding is restored to the
default.

Only in the Visual C-specific-build, the following encoding names can be
specified with '#pragma setlocale'.  This is for compatibility with
Visual C++.  It is recommended that you use these names because the
Visual C++ compiler cannot recognize encoding names other than these.
('-' can be omitted for MCPP, but not for the Visual C++ compiler-proper.)

    shift-JIS:  japanese, jpn
    GB-2312:    chinese-simplified, chs
    BIG-FIVE:   chinese-traditional, cht
    KSC-5601:   korean, kor
    Not specified:  C, english

In Visual C++, the default multi-byte character encoding varies
depending on the language parameter and on the "Region and Language
Option" settings of Windows.  However, the #pragma setlocale
specification takes precedence over these Windows settings.

Only in the GCC-specific-build, the following encoding names can be
specified with the environment variable LANG.  This is for compatibility
with GCC.  It is recommended that you use these names because the GCC
compiler cannot recognize encoding names other than these.  ('-' can be
omitted and lowercase letters can be used for MCPP, but for the GCC
compiler-proper, these names must be specified exactly as shown below.)

    EUC-JP:         C-EUC
    shift-JIS:      C-SJIS
    ISO-2022-JP1:   C-JIS
    Not specified:  C

Depending on the configuration used to compile the GCC compiler, the GCC
compiler sometimes recognizes the environment variable LANG's C-*
specification and sometimes not. [1]  When the compiler fails to
recognize it, MCPP complements it.

Note:
[1] According to gcc's info, if the --enable-c-mbchar option is
specified when a configure script is used to compile GCC itself, an
encoding can be specified using an environment variable such as LANG.
This way of compilation seems to have been available from July 1998
onward, but its implementation does not yet work properly, at least on
V.3.2.  It is documented that, besides LANG, environment variables such
as LC_ALL and LC_CTYPE can be used to specify an encoding.  However,
whether LC_ALL and LC_CTYPE are used or not makes a difference only in
the diagnostic messages.


2.9       How to Use MCPP in One-Path Compilers

Compilers whose preprocessor is integrated into themselves are called
one-path compilers.  These include Visual C, Borland C, and LCC-Win32.
Such compilers are becoming more popular because they can achieve a
little higher processing speed.  However, the time for preprocessing is
becoming shorter anyway due to better hardware performance.  In the
first place, there is much point in preprocessing being a common phase,
mostly independent of run-time environment and compiler systems.  It is
not desirable that one-path compilers become more popular, since there
will be more compiler-system-specific specifications.

Anyhow, it is impossible to replace the preprocessor of a one-path
compiler with MCPP.  To use MCPP, a source program is preprocessed with
MCPP and then the output is passed to the one-path compiler.  As you
see, preprocessing takes place twice.  It is wasteful but inevitable.
Using MCPP still has the merit of source checking and makes available
functions that the resident preprocessor does not provide.

To use MCPP with a one-path compiler, the procedure must be written in a
makefile.  For sample procedures, refer to the re-compilation settings
in the makefiles used to compile MCPP itself, such as visualc.mak,
borlandc.mak, and lcc_w32.mak.

Although the GCC 3 and 4 compilers now integrate the preprocessing
facility into themselves, gcc provides an option to use an external
preprocessor.  Use this option when using MCPP. (See 3.9.7.)


2.10       How to Use MCPP in IDE

It is difficult to use MCPP in an Integrated Development Environment
(IDE) because an IDE's GUI follows compiler-system-specific
specifications and its internal interfaces are not usually made
available to third parties.  Furthermore, one-path compilers make it
more difficult to insert a phase to use MCPP.

This subsection describes how to make MCPP available in the Visual C++
2003 and 2005 IDEs.  I have only version 4 of the Borland C++ IDE, which
is too old to do so.  I think the same could be done for LCC-Win32's IDE
because LCC-Win32 is shareware, although it may take time.  I have not
tried it.  Use the compiler-specific-builds for Borland C and LCC-Win32
on command lines.

2.10.1      How to Make MCPP Available in Visual C++ IDE

MCPP cannot be used in a normal "project" since the internal
specifications of Visual C++'s IDE are not made available to third
parties and the compiler is a one-path compiler.  However, once a
makefile that uses MCPP is created, Visual C++'s IDE can recognize the
makefile and you can create a "makefile project" using that file.  This
allows you to utilize most of the IDE functions, including source
editing, search, and source level debugging.

"Creating a Makefile Project" in the Visual C++ 2003 documentation and
in the Visual C++ 2005 Express Edition Help describes how to make a
makefile project.  Perform the following procedure to create a makefile
project.

  1. Log in as a user with debugging privilege. [1]
  2. Create a makefile that specifies MCPP. (Refer to
    noconfig/visualc.mak.)
  3. Start Visual Studio. [2]
  4. Click "New Project" to display the "New Project" window.  Select
    "Makefile Project" and specify "Name" and "Location", and then click
    "OK".
  5. Then the "Makefile Application Wizard" window appears.  Click
    "Application settings", and enter appropriate values in the "Build
    command line", "Output", "Clean commands", and "Rebuild command
    line" fields.

    Let me explain the appropriate values for these fields by taking an
    example of making the stand-alone-build of MCPP itself. (Assuming
    the name of MCPP executable as mcpp.exe.)

        "Build command line":   nmake
        "Output":               mcpp.exe
        "Clean command":        nmake clean
        "Rebuild command line": nmake PREPROCESSED=1

    To make the Visual C-specific-build of MCPP, add the option
    COMPILER=MSC as follows:

        "Build command line":   nmake COMPILER=MSC
        "Output":               mcpp.exe
        "Clean command":        nmake clean
        "Rebuild command line": nmake COMPILER=MSC PREPROCESSED=1

    Since a Makefile project does not provide a 'make install'
    equivalent command, you must write the makefile in such a way that
    the commands you specify in "Build command line" and "Rebuild
    command line" also perform installation.

    If you do not compile MCPP, "Build command line" and "Rebuild
    command line" can be the same.

    When completed, click "Finish".

  6.  Then the Makefile project appears in "Solution Explorer".  Click
    the "Source Files" folder, choose "Add Existing Solution Item" from
    the "Project" menu, select all the source files, and then click "OK".
    Then the source file names appear in Solution Explorer.

You can now use all the functions, including Edit, Build, Rebuild and
Debugging.

Note:
[1]  To use the debugging function under Windows XP Professional or
Windows 2000, a user must belong to a group called "Debugger users".
However, Windows XP Home Edition does not provide such a group, so one
must log in as administrator.  In addition, in order to use source level
debugging, the makefile must be written in such a way that cl.exe is
called with the -Zi option appended to generate debugging information.

[2]  If you start Visual Studio by selecting "Start" -> "Programs",
environment variables, such as those for include directories, are not
set.  In order to have these variables set, you must open the 'Visual
Studio command prompt' and start Visual Studio from it by typing, on VC
2003:

    devenv <Project File> /useenv

On VC 2005 express edition:

    vcexpress <Project File> /useenv


                   3.  Enhancements and Compatibility

MCPP has its own enhancements.  Each compiler-system-specific
preprocessor has its own enhancements, some of which are not available
in MCPP.  This section covers these enhancements and their compatibility
problems.

Principally, MCPP outputs #pragma lines as they are.  This principle is
applied to the #pragma lines processed by MCPP itself.  This is because
the compiler-proper may interpret the same #pragma for itself.

However, MCPP does not output the lines beginning with '#pragma MCPP',
since these are for MCPP only.  Also, MCPP does not output lines of
'#pragma GCC' followed by 'poison', 'dependency' or 'system_header'.
Moreover, MCPP outputs none of '#pragma once', '#pragma push_macro' and
'#pragma pop_macro', because they are useless in the later phases.
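
The rules above about which #pragma lines are passed on can be
summarized in a small C sketch.  This is only an illustration of the
rules as stated in this section; the function pragma_is_output and its
helpers are hypothetical and are not taken from the MCPP source.  The
argument is the text that follows "#pragma" on the line.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if s begins with word and the word ends there. */
static int starts_with_word(const char *s, const char *word)
{
    size_t n = strlen(word);

    if (strncmp(s, word, n) != 0)
        return 0;
    return !(isalnum((unsigned char)s[n]) || s[n] == '_');
}

static const char *skip_blanks(const char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    return s;
}

/* Return 1 if the #pragma line is output for the compiler-proper,
 * 0 if it is consumed by the preprocessor. */
int pragma_is_output(const char *body)
{
    static const char *const plain[] = { "once", "push_macro", "pop_macro" };
    static const char *const gcc[] = { "poison", "dependency", "system_header" };
    size_t i;

    body = skip_blanks(body);
    if (starts_with_word(body, "MCPP"))
        return 0;
    if (starts_with_word(body, "GCC")) {
        const char *sub = skip_blanks(body + 3);

        for (i = 0; i < sizeof gcc / sizeof gcc[0]; i++)
            if (starts_with_word(sub, gcc[i]))
                return 0;
        return 1;
    }
    for (i = 0; i < sizeof plain / sizeof plain[0]; i++)
        if (starts_with_word(body, plain[i]))
            return 0;
    return 1;
}
```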

MCPP compiled with EXPAND_PRAGMA == TRUE expands macros in #pragma lines
(in practice, EXPAND_PRAGMA is set to TRUE only for the Visual
C-specific-build).  However, #pragma lines followed by STDC, MCPP or GCC
are never expanded.

#pragma sub-directives are implementation-defined, hence there is a risk
of a sub-directive of the same name having different meanings in
different compiler systems.  Some device is necessary to avoid name
collision.  Moreover, when EXPAND_PRAGMA == TRUE, there should be a
device to prevent the name of a #pragma sub-directive itself from being
macro-expanded.  This is why MCPP-specific sub-directives begin with
'#pragma MCPP' and are not subject to macro expansion.  This device is
adopted from '#pragma STDC' of C99 and '#pragma GCC' of GCC 3.

'#pragma once' is, however, implemented as it is, since this pragma has
been implemented in many preprocessors and now carries no risk of name
collision.  '#pragma __setlocale' is prefixed with "__" instead of MCPP,
because it is also meaningful to the compiler-proper, and because the
prefix keeps it out of the user name space.


3.1    #pragma MCPP put_defines,
                #pragma MCPP preprocess, #pragma MCPP preprocessed,
                #put_defines, #preprocess, #preprocessed

MCPP in Standard mode uses '#pragma MCPP put_defines', '#pragma MCPP
preprocess' and '#pragma MCPP preprocessed'.  Pre-Standard mode uses
#put_defines, #preprocess and #preprocessed.  The explanation below
takes the #pragma forms as examples.

When MCPP encounters a '#pragma MCPP put_defines' directive, it outputs
all the macros defined at that time in the form of #define lines.  Of
course, the #undef-ed macros are not output.  The macros that cannot be
#defined or #undef-ed, such as __STDC__, are output in the form of
#define lines, but are enclosed in comment marks.  (Since __FILE__ and
__LINE__ are special macros defined dynamically on a macro invocation,
the replacement list output here means nothing.)

In pre-Standard mode and POSTSTD mode, MCPP does not memorize the
parameter names of function-like macro definitions.  So, these
directives mechanically represent the names of the first, second and
third parameters as a, b, c, and so on.  On reaching the 27th parameter,
the names continue with a1, b1, c1, ..., then a2, b2, c2, ..., and so
on.
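
The naming scheme can be sketched in C as follows.  This is only an
illustration of the sequence described above; the function param_name is
hypothetical and is not taken from the MCPP source.  The index is
zero-based, so index 26 is the 27th parameter.

```c
#include <stdio.h>

/* Write the mechanical name of the parameter at zero-based position
 * index into out: a..z, then a1..z1, then a2..z2, and so on. */
void param_name(int index, char *out, size_t out_size)
{
    static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
    char letter = letters[index % 26];
    int round = index / 26;

    if (round == 0)
        snprintf(out, out_size, "%c", letter);
    else
        snprintf(out, out_size, "%c%d", letter, round);
}
```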

If you enter the following directive after invoking MCPP from keyboard
without specifying input and output files, all the predefined macros are
listed.

    #pragma MCPP put_defines

MCPP also outputs a comment that indicates the source file name where
each macro definition is found, as well as its line number.  If you
invoke MCPP with options such as -S1 or -N, you will see a different set
of predefined macros.

When MCPP encounters a '#pragma MCPP preprocess' directive, it outputs
the following line:

    #pragma MCPP preprocessed

This indicates that the source file has been preprocessed already.

When MCPP encounters a '#pragma MCPP preprocessed' directive, it
determines that the source file has been preprocessed by MCPP and
continues to output the code it reads as it is, until it encounters a
#define line.  When MCPP does encounter a #define directive, it
determines that the rest of the source file consists entirely of #define
lines and defines the macros.  At this time, MCPP also memorizes the
source file name and line number given in the comment. [1], [2]

A '#pragma MCPP preprocessed' is applied only to the lines that follow
the directive in the source file where the '#pragma MCPP preprocessed'
directive is found.  If the source file is an #included one, when
control is returned to the #including file, '#pragma MCPP preprocessed'
is no longer applied.

Note:
[1]  Actual processing is a little more complex.  After MCPP encounters
a '#pragma MCPP preprocessed', it outputs the lines it reads just as
they are, except for #line lines, which a compiler-specific-build of
MCPP converts into a format that the compiler-proper can accept.  MCPP
disregards the predefined Standard macros because their #define lines
are enclosed in comment marks.

[2]  Therefore, information on where a macro definition is found is not
lost during pre-preprocessing.

3.1.1   Pre-Preprocessing Header File

With the above directives, you can "pre-preprocess" header files.
Pre-preprocessing considerably reduces the total preprocessing time.  I
think the explanation so far has already given you an understanding of
how to pre-preprocess header files, but to deepen your understanding,
let me explain it by taking the example of MCPP's own source code.

MCPP source consists of eight *.c files, of which seven files include
"system.H" and "internal.H".  No other headers are included.  The source
looks like this:

#if PREPROCESSED
#include  "mcpp.H"
#else
#include  "system.H"
#include  "internal.H"
#endif

The system.H includes noconfig.H or configed.H, as well as several
standard header files.  mcpp.H is not a source file I provide and is a
"pre-preprocessed" header file I am going to generate.

To generate mcpp.H (Of course, after setting up noconfig.H and other
headers), invoke MCPP as follows:

    mcpp > mcpp.H

For compiler systems, such as GCC, also specify the -b option.

Enter the following directives from the keyboard:

#pragma MCPP preprocess
#include "system.H"
#include "internal.H"
#pragma MCPP put_defines

Enter "end-of-file" to terminate MCPP.

This produces mcpp.H, which consists of the preprocessed system.H and
internal.H and a set of #define lines following them.  Including mcpp.H
gives the same effect as including system.H and internal.H, but it is
only a fraction of the size of the original header files together with
the standard ones they include, because #if sections and comments are
eliminated.  It takes far less time to include mcpp.H in seven *.c files
than to include system.H and internal.H seven times.  By using #pragma
MCPP preprocess, much more time can be saved.

On compilation, use the -DPREPROCESSED=1 option.

It is recommended that the above procedure be written in a file and that
the makefile refer to it.  The makefile and preproc.c supplied with the
MCPP sources contain the procedure.  Please refer to them.

Although the use of an independent preprocessor is limited for one-pass
compilers like Visual C, Borland C or LCC-Win32, the pre-preprocessing
facility is still useful.

The pre-preprocessing facility of header files is similar to that of the
-dD option of GCC, but it differs from it in that:

1. GCC outputs line number information not in the form of #line 123
"filename", but in the form of # 123 "filename", which allows GCC to
reprocess the information, but the Standard C preprocessor cannot.

2. Older versions of GCC/cpp output a #define line whenever they
encounter one, but do not output #undef lines.  Therefore, reprocessing
the preprocessed result may produce a different result from what the
original source intends.

3. By using #pragma MCPP preprocess, which is not provided by GCC, MCPP
can provide a higher processing speed.

As far as the pre-preprocessing facility is concerned, MCPP is more
accurate and practical than GCC.


3.2    #pragma once

The #pragma once directive is available in Standard mode.

#pragma once is also available in GCC, Visual C, LCC-Win32 and a
stand-alone preprocessor called Wave.

This directive is used when you want to include a header file only once.
With the following directive in a header file, MCPP includes the header
file only once even if a #include line for that file appears many times.

#pragma once

Usually, compiler-system-specific standard header files prevent
duplicate definitions by using the following code:

#ifndef __STDIO_H
#define __STDIO_H
/* Contents of stdio.h */
#endif

#pragma once provides similar functionality to this.  Using macros,
however, always involves reading the header file. (The preprocessor
cannot skip code as people do; it must read the entire header file to
find the #if's and #endif's.  It must read through comments before it
can determine whether a line is a control line, that is, a line with #
at the beginning followed by a preprocessing directive, and to do so it
must also identify string literals.  After all, it must read through the
entire header file and perform most of tokenization.)  #pragma once
eliminates the need even to access the header file, resulting in an
improved processing speed for multiple includes.

To determine whether two header files are identical, the file names,
including the directory names in the search path, are compared.
"/DIR1/header.h" and "/DIR2/header.h" are therefore regarded as
distinct.  Since Windows is not case sensitive, "header.h" and
"HEADER.H" are regarded as the same on Windows, but as distinct on
UNIX-like systems.  A directory is memorized after conversion to an
absolute path, and a symbolically linked file on UNIX systems is
memorized after dereferencing, so identical files are always determined
correctly. [1], [2]
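
The comparison described above can be sketched as follows for a POSIX
system.  This is only an illustration of the idea (convert to an
absolute path, dereference symbolic links, then compare), not MCPP's own
code; the function name same_header is hypothetical, and case folding
for Windows is left out.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask for realpath() on POSIX systems */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of MCPP: returns 1 if both paths name
 * the same file after resolving ".", ".." and symbolic links, 0 if they
 * differ, and -1 if either path cannot be resolved. */
int same_header(const char *path_a, const char *path_b)
{
    char *real_a = realpath(path_a, NULL);  /* malloc'ed absolute path */
    char *real_b = realpath(path_b, NULL);
    int result = -1;

    if (real_a != NULL && real_b != NULL)
        result = (strcmp(real_a, real_b) == 0);
    free(real_a);   /* free(NULL) is harmless */
    free(real_b);
    return result;
}
```

For example, same_header("/", "/.") reports the two spellings as the
same directory, while a path that does not exist yields -1.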

I borrowed the idea of #pragma once from GCC V.1.*/cpp.  GCC V.2.* and
V.3.* still have this functionality but it is regarded as obsolete.  The
specification of GCC V.2.*/cpp has been changed as follows: If the
entire header file is enclosed with #ifndef _MACRO, #define _MACRO, and
#endif, the cpp memorizes it and inclusion occurs only once, even
without #pragma once.

However, this GCC V.2 and V.3 specification sometimes does not work for
commercially available compiler systems that are not based on the GCC
specification, due to a difference in the standard header file notation.
In addition, the GCC V.2 and V.3 specification is more complex to
implement.  For this reason, I decided to implement only #pragma once.

As with other preprocessors, it is not advisable to rely only on #pragma
once when the same header files are used.  It is recommended that
#pragma once should be combined with macros as follows:

#ifndef __STDIO_H
#define __STDIO_H
#pragma once
/* Contents of stdio.h */
#endif

Note that #pragma once must not be written in <assert.h>.  For the
reason, see 4.1.2 of cpp-test.txt.  The same thing can be said with
<cassert> and <cassert.h> of C++.

Another problem is that the recent GCC/GLIBC system has header files,
like <stddef.h>, which are repeatedly #included by other system headers.
They define macros, such as __need_NULL, __need_size_t, and
__need_ptrdiff_t, and then #include <stddef.h>.  Each time they do so,
definitions such as NULL, size_t, and ptrdiff_t are defined in
<stddef.h>.  The same thing can be said of <errno.h> and <signal.h>,
and even of <stdio.h>.  Other system headers define macros, such as
__need___FILE, and then #include <stdio.h>.  Each time they do so,
definitions such as FILE may be defined in <stdio.h>.  #pragma once
cannot be used in such header files. [3]

Note:
[1] On CygWIN, / and /usr are in reality the same directory, and
supposing / is C:/dir/cygwin on Windows, /cygdrive/c/dir/cygwin is the
same as /.  MCPP treats these directories as the same, converting the
path-list to the format of /cygdrive/c/dir/cygwin/dir-list/file.

[2] On MinGW, /bin and /usr/bin are in reality the same directory, and
so are /lib and /usr/lib.  Supposing / is C:/dir/msys/1.0,
/c/dir/msys/1.0 is the same as /, and supposing /mingw is C:/dir/mingw,
/c/dir/mingw is the same as /mingw.  MCPP treats each of these as the
same directory, converting the path-list to the format of
c:/dir/msys/1.0/dir-list/file or c:/dir/mingw/dir-list/file.

[3] This applies at least to Linux/GCC 2.9x, 3.* and 4.*/glibc 2.1, 2.2
and 2.3.  FreeBSD 4, 5 has much simpler system headers because it does
not use glibc.

3.2.1   Tool to Write #pragma once to Header Files

With a small number of header files, writing #pragma once to them does
not require much effort, but it would be tremendous work if there are
many header files.  I wrote a simple tool to write it automatically to
header files.

tool/ins_once.c is a tool written for old versions of GCC.  As Borland C
4.0, 5.5 conform to the same standard header file notation as GCC, this
tool can be used with them.  However, it is advisable not to use this
tool on systems like Glibc 2 that have the many exceptions shown above.

Even in the compiler systems that can use the tool, some header files do
not strictly conform to the GCC notation.  GCC's read-once functionality
also does not work properly for these header files.

Compile ins_once.c and perform the following command in a directory,
such as /usr/include or /usr/local/include, under UNIX.

    chmod -R u+w *

and then execute ins_once as follows:

    ins_once -t *.h */*.h */*/*.h

Ins_once reports header files that do not begin with #ifndef or
#if !defined.  Manually modify these files.  Then, execute ins_once as
follows:

    ins_once *.h */*.h */*/*.h

If the first directive in each header file is #ifndef or #if !defined, a
#pragma once line is inserted immediately below the line.  Only a root
user or a user with appropriate permission can make this modification.
If you changed the access permission, use 'chmod -R u-w *' to restore
the original permission.

Ins_once provides the following options.  Select the most appropriate
one for your system.

    -t:  Check whether a header file begins with #ifndef or #if !defined,
        excluding a comment.  This option does not modify the file.
    -p:  Insert a #pragma once line at the beginning of file.  By
        default, this line is inserted immediately below the #ifndef or
        #if !defined line.
    -g:  For GCC system, <stddef.h>, <stdio.h>, <signal.h>, <errno.h>
        are also excluded.  By default, only <assert.h>, <cassert> and
        <cassert.h> are excluded.

ins_once roughly checks to write a #pragma once line only once in the
same header file even if it is executed several times, but the check is
not very strict.  As this ins_once is of temporary and tentative nature,
it scarcely performs tokenization.  It worked as I expected with FreeBSD
2.0 and 2.2.7, Borland C 4.0, 5.5, but it may not work properly for
special header files.  So before executing this tool, be sure to make a
backup of an original file.
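
The kind of rough check that the -t option performs can be sketched as
below.  This is not the code of ins_once.c; the function names are
hypothetical, only /* */ comments are skipped, and, like the tool, the
sketch does no real tokenization.

```c
#include <ctype.h>
#include <string.h>

/* Skip white space and block comments; return the first other character. */
static const char *skip_blank(const char *p)
{
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (p[0] == '/' && p[1] == '*') {
            const char *end = strstr(p + 2, "*/");
            if (end == NULL)
                return p + strlen(p);   /* unterminated comment */
            p = end + 2;
        } else {
            return p;
        }
    }
}

/* Skip spaces and tabs only, staying on the directive line. */
static const char *skip_hspace(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Hypothetical check: returns 1 if the header text begins with
 * #ifndef or #if !defined (comments excluded), and 0 otherwise. */
int begins_with_guard(const char *text)
{
    const char *p = skip_blank(text);

    if (*p != '#')
        return 0;
    p = skip_hspace(p + 1);
    if (strncmp(p, "ifndef", 6) == 0)
        return p[6] == ' ' || p[6] == '\t';
    if (strncmp(p, "if", 2) != 0)
        return 0;
    p = skip_hspace(p + 2);
    if (*p != '!')
        return 0;
    p = skip_hspace(p + 1);
    return strncmp(p, "defined", 7) == 0;
}
```

A header that starts with #include or #ifdef is rejected by this check,
which corresponds to the files ins_once -t asks you to modify manually.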

Have the shell expand a wild-card. (In case of buffer overflow, execute
ins_once several times by specifying some of your system header files.)


3.3    #pragma MCPP warning,
                #include_next, #warning

These directives are provided for compatibility with GCC.  GCC provides
the #include_next and #warning directives.  Although these directives
are non-conforming, they are used not only by some source programs but
also by some Glibc2 system header files.  Taking this situation into
consideration, I implemented the #include_next and #warning directives
in the GCC-specific-build to allow compilation of such source programs.
However, MCPP issues a warning when it finds these directives.
Regardless of the compiler system MCPP is ported to, MCPP in Standard
mode also implements #pragma MCPP warning.

With the following directive, MCPP skips the current file's directory
and starts searching for header.h from the next directory in the search
path.

    #include_next  <header.h>

CygWIN and MinGW ignore the distinction of alphabetical case in header
names.

The following code outputs 'any message' to stderr as a warning message:

    #pragma MCPP warning    any message
    #warning  any message

Different from #error, this is not counted as an error.


3.4     #pragma MCPP push_macro, #pragma MCPP pop_macro,
                #pragma push_macro, #pragma pop_macro
                #pragma __setlocale, #pragma setlocale

When I ported MCPP to Visual C, I implemented these directives in MCPP,
and then made them available for other systems.

#pragma MCPP push_macro( "MACRO") and #pragma MCPP pop_macro( "MACRO")
are used to "push" or "pop" a macro definition (MACRO) to the current
macro definition stack.

#pragma push_macro( "MACRO") and #pragma pop_macro( "MACRO") are also
available for Visual C.

push_macro saves a macro definition to the stack, and pop_macro
retrieves the macro definition.   The pushed macro definition remains
valid after push_macro.  To invalidate it, use #undef or redefine the
macro with a new definition.  push_macro can be used many times.
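
The following sketch shows the save and restore sequence.  It uses the
Visual C spelling, which GCC and Clang also accept, so that it can be
tried without MCPP; with MCPP the directives can equally be written as
#pragma MCPP push_macro( "LIMIT") and #pragma MCPP pop_macro( "LIMIT").
The macro and enumerator names are made up for the illustration.

```c
#define LIMIT   10

#pragma push_macro( "LIMIT")    /* save the definition (LIMIT is 10)    */
/* The pushed definition remains valid: LIMIT is still 10 here.         */
enum { LIMIT_AFTER_PUSH = LIMIT };

#undef LIMIT                    /* invalidate it explicitly ...         */
#define LIMIT   20              /* ... and give a temporary definition  */
enum { LIMIT_REDEFINED = LIMIT };

#pragma pop_macro( "LIMIT")     /* retrieve the saved definition        */
enum { LIMIT_AFTER_POP = LIMIT };
```

The three enumerators record the value of LIMIT after the push, after
the redefinition, and after the pop.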

#pragma __setlocale( "<encoding>") changes the current multi-byte
character encoding to <encoding>.  The argument of setlocale must be a
string literal.  For <encoding>, refer to 2.8.  This directive allows
you to use several encodings in one translation unit.

In Visual C++, #pragma __setlocale cannot be used.  Use #pragma
setlocale instead.  Encoding specification must be conveyed not only to
MCPP but also to the compiler-proper.  The latter can recognize only
#pragma setlocale.  For other compiler systems, when the compiler-proper
cannot recognize an encoding, MCPP complements it.

There is not yet any compiler-proper which can recognize #pragma
__setlocale.


3.5    #pragma MCPP debug, #pragma MCPP end_debug,
                #debug, #end_debug

#pragma MCPP debug and #pragma MCPP end_debug are for Standard mode.
#debug and #end_debug are for pre-Standard mode.

The #pragma MCPP debug <args> directive can be written anywhere in a
source program.  <args> specifies the debug information types.  One
#pragma MCPP debug directive can take several <arg>s.  One or more <arg>
must be specified for each directive.  MCPP begins to output debug
information when it finds this directive, and stops when it encounters
#pragma MCPP end_debug <args>.  The <args> can be omitted, in which case
all types of debug information are reset.  If <args> contains an
argument that is not supported by MCPP, MCPP issues a warning, but all
the preceding arguments are regarded as valid.

All the debug information is written to the same output as the
preprocessing result, so that the two stay synchronized.  Therefore, the
output cannot be compiled while this directive is in effect.

When you notice something wrong with the preprocessing result, enclose
the code you want to debug with the following directives, for example:

#pragma MCPP debug token expand
/* Coding you want to debug  */
#pragma MCPP end_debug

As this directive was originally used for debugging MCPP itself, it was
not developed with end users in mind.  So, you may not understand its
behavior unless you read its source code, and you may sometimes feel it
outputs too much information, but it is useful for tracing the
preprocessing process.  Be patient.

The following debug information types can be specified with <arg>.

    path        Displays the include file search path.
    token       Parses tokens one by one and displays its type.
    expand      Traces a macro expansion process.
    if          Displays the result (true or false) of #if, #elif,
                    #ifdef and #ifndef.
    expression  Traces #if expression evaluation.
    getc        Traces preprocess 1-byte by 1-byte.
    memory      Displays the status of heap memory used by MCPP.

3.5.1   #pragma MCPP debug path, #debug path

With these directives, MCPP displays include directories in the search
path (excluding the current and source directories with which search
begins) in the order of priority, starting with the highest one first.

In addition, with a #include directive, MCPP displays all the
directories, including the current one, it actually searched for the
#include file.  When a header file with #pragma once specified is
#included again, a message to that effect is displayed.

3.5.2   #pragma MCPP debug token, #debug token

With these directives, MCPP displays each source line it has read, and
then displays each token on that line together with its type as the
token is read.  This token, more specifically, is a preprocessing-token
(pp-token).  Not only pp-tokens on a source line but also ones MCPP
reads again internally during macro expansion are displayed repeatedly.

However, the following 1-byte tokens are not displayed, for the
convenience of the MCPP program:

1. '#' at the beginning of a preprocessing directive line.
2. '(' at the beginning of a parameter list of a function-like macro
definition.
3. ',' delimiting between function-like macro definition parameters.
4. '(' at the beginning of an argument list used for a function-like
macro invocation.

A pp-token has the following types:

    NAM: Identifier                     STR: String literal
    NUM: Preprocessing-number           WSTR: Wide string literal
    OPE: Operator or punctuator         CHR: Character constant
    SPE: Special pp-tokens, such as $ and @
    SEP: Token separator white space    WCHR: wide character constant

Of the SEP tokens, only <newline> is normally displayed.  Control codes
such as <newline> are displayed as <^J> or <^M>.

3.5.3   #pragma MCPP debug expand, #debug expand

With these directives, MCPP traces the expansion process of a macro
invocation.  When MCPP in Standard mode encounters a #pragma MCPP debug
expand, it behaves as follows:

If there is a macro invocation, MCPP displays the macro definition.
Each argument is read, the argument is substituted for the corresponding
parameter in the replacement list and the replacement list is rescanned.
MCPP displays this whole process.  In case of nested macro definitions,
they are rescanned and expanded one by one.  If an argument has a macro,
MCPP traces the above process recursively before parameter substitution.

Each time control is passed to and returned from a certain set of MCPP
internal functions, MCPP displays the trace information along with the
function name.  The following table shows the role of these functions.
Reading the MCPP source code will give you a concrete idea of what each
function is doing.

    expand          Entrance routine for macro expansion
    replace         Expands a macro one level down.
    collect_args    Collects arguments.
    prescan         Scans a replacement list and processes # and ##
                        operator.
    substitute      Substitutes a parameter with an argument.
    rescan          Rescans a replacement list.

Except for expand, the above functions call one another in an
indirectly recursive manner.

For replace and collect_args, MCPP displays data it internally stacks
during macro expansion.  This data is displayed using the following
internal codes:

    <n>         Nth parameter
    <TSEP>      Token delimiter inserted by MCPP
    <MAGIC>     Code that inhibits re-replacement of the macro of the
                same name
    <RT_END>    Code that indicates the end of a replacement list
    <SRC>       Code that indicates an identifier taken from source
                    file while rescanning

<SRC> is used only in MCPP when it is in STD mode.

It is recommended that '#pragma MCPP debug token' should be also used.

For #debug expand, MCPP uses internal routines considerably different
from those used for Standard mode.  The explanations are omitted.

3.5.4   #pragma MCPP debug if, #debug if

With these directives, MCPP displays #if, #elif, #ifdef and #ifndef
lines and reports their evaluation result (true or false).  No report
is made for directives inside a #if section that is being skipped.

3.5.5   #pragma MCPP debug expression, #debug expression

With these directives, MCPP traces evaluation of a #if or #elif
expression.  DECUS cpp, on which MCPP is based, provided these
directives for the purpose of debugging cpp itself, and I have scarcely
modified them.  They output a very long list of internal functions, as
well as variable names and their values.  Unless you read the MCPP
source code, you may not understand these variables.  However, even
without the source code, you can manage to see how MCPP pushes the
values of a complex expression onto an evaluation stack and pops them
off.

3.5.6   #pragma MCPP debug getc, #debug getc

With these directives, MCPP outputs detailed data each time it calls
get(), a function to read one byte.  When MCPP in Standard mode scans a
pp-token, this routine is called to read only the first byte of the
pp-token.

With a #debug getc, MCPP calls this routine during token scan, resulting
in a tremendous amount of data output.

In any case, using these directives outputs a huge amount of data, so
you will scarcely need to use them.

3.5.7   #pragma MCPP debug memory, #debug memory

With these directives, MCPP reports the status of the heap memory it has
internally allocated or released using malloc(), realloc() or free()
only once.  Only the kmmalloc I developed and some other types of
malloc() provide this functionality.  Refer to "4.extra" of
mcpp-porting.txt.  With any other malloc(), MCPP neither causes an error
nor reports a status.

MCPP reports the heap memory status again when it terminates with these
directives on.  The same thing happens when MCPP terminates due to out
of memory.


3.6     #assert, #asm, #endasm

#assert is available in pre-Standard mode, except the GCC-specific-build.
#assert provides the functionality equivalent to the #error directive in
the Standard C.  The following code in the Standard C:

#if ULONG_MAX/2 < LONG_MAX
#error Bad unsigned long handling.
#endif

can be expressed as:

#assert LONG_MAX <= ULONG_MAX/2

The argument of #assert is evaluated as a #if expression.  If it
evaluates to true (non-zero), MCPP does nothing and if false (0), it
displays the following message and then the argument line (after
processing line splicing and comments):

    Preprocessing assertion failed

MCPP counts this as error but continues processing.

This #assert is quite different from that of System V or GCC.

MCPP in pre-Standard mode regards a block enclosed with the #asm and #
endasm directives as assembler coding.  MCPP implements this
functionality for Microware C/6809 only.  To implement this
functionality in other compiler systems, do_old() and put_asm() in
system.c must be modified.

For a #asm block, MCPP performs trigraphs conversion and deletes
<backslash><newline> sequence, but it neither performs comment
processing, checks tokens or characters, nor deletes white-space
characters at the beginning of a line.  Also, it does not expand a token
that happens to have the same name as a macro and outputs it as it is.
Other directive lines have no meaning within the #asm block.

These #asm and #endasm directives do not conform to Standard C.  In the
first place, extension directives in any form other than "#pragma
sub-directive" are not Standard C conforming.  Changing their directive
names to #pragma asm and #pragma endasm does not solve this problem.  In
Standard C, the source code must consist of a C token sequence (more
precisely, a preprocessing token sequence), however, an assembler
program is not a C token sequence.  To use assembly code in the Standard
C, there is no other way but to embed it in a string literal token.
Then, you have to implement a built-in function that processes that
string literal in the compiler-proper and call it as follows:

    asm (
        " leax _iob+13,y\n"
        " pshs x\n"
    );

However, this is not suitable for a longer assembly code, in which case,
you had better write the assembly code as a separate file like a library
function, and assemble and link the program.  This seems to be
inconvenient, but it is necessary to separate the assembler portion
completely to write a portable C program.  It is recommended that you
should write assembly code in a separate file rather than using #asm.


3.7    New C99 Features (_Pragma() Operators, Variable Argument
                Macros and others)

These features are available in Standard mode.  The -V199901L option
with __STDC_VERSION__ set to 199901L enables the following C99
features.  The same can be said for C++ with the -V199901L option, which
sets __cplusplus to 199901L or more.  Although the C++ Standard does not
provide for the features other than 1 and 7, MCPP in Standard mode
provides them for better compatibility with C99.  Standard mode also
allows variable argument macros even in the C90 and C++ modes. [1]

1. Treats the text from // to the end of a line as a comment.

2. Enables variable argument macros.

3. Allows the sequence of p+, P+, p-, and P-, as well as e+, E+, e-, and
E-, in the preprocessing-number.  This is to represent a bit pattern of
a floating-point number in Hex, like 0x1.FFFFFEp+128.

4. Enables the _Pragma() operator.

5. MCPP compiled with the EXPAND_PRAGMA macro set to TRUE macro-expands
the arguments of #pragma lines that do not begin with STDC, MCPP or GCC.
(By default, MCPP is compiled with EXPAND_PRAGMA == FALSE, so these
arguments are not subject to macro expansion.  They are macro-expanded
only in the Visual C-specific-build.)

6. For compiler-systems with long long, a #if expression is evaluated in
long long or unsigned long long.

7. Allows an escape sequence named UCN for Unicode in the forms of
\unnnn and \Unnnnnnnn in identifiers, character constants, string
literals and pp-numbers.  The value of a UCN in #if expression is
evaluated as a hexadecimal representation. (UCN cannot be used in
POSTSTD mode.)
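
Items 1 and 3 can be seen together in a small sketch such as the
following, which should be accepted by any C99 compiler.  The function
name is made up for the illustration.

```c
/* 0x1.8p+1 is a single preprocessing-number: hexadecimal 1.8 (decimal
 * 1.5) scaled by 2 to the power +1.  Without item 3 the number would
 * end at "p", leaving "+" and "1" as separate tokens. */
double hex_float_value(void)
{
    return 0x1.8p+1;    // a C99 style comment (item 1); the value is 3.0
}
```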

A variable argument macro takes a form of:

    #define debug(...)  fprintf(stderr, __VA_ARGS__)

Here is a macro invocation:

    debug( "X = %d\n", x);

This macro is expanded as follows:

    fprintf(stderr, "X = %d\n", x);

"..." in the parameter list corresponds to one or more parameters.  In
the above example, "..." corresponds to __VA_ARGS__ in the replacement
list.  During a macro invocation, several arguments that correspond to
the "...", including ",", are concatenated to be treated as one argument.
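
As a further sketch of how the trailing arguments travel together, the
macro below passes everything after its second argument, commas
included, to snprintf().  The names format_debug and format_point are
hypothetical and do not appear in MCPP.

```c
#include <stdio.h>

/* Hypothetical variant of the debug macro that writes into a buffer so
 * that the result can be inspected.  Everything after the second
 * argument is substituted for __VA_ARGS__. */
#define format_debug( buf, size, ...)   snprintf( buf, size, __VA_ARGS__)

/* Formats "X = <x>, Y = <y>" into buf and returns snprintf's result. */
int format_point( char *buf, size_t size, int x, int y)
{
    return format_debug( buf, size, "X = %d, Y = %d", x, y);
}
```

Here the three arguments "X = %d, Y = %d", x and y all correspond to the
"..." in the parameter list.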

_Pragma( "foo bar") has the same effect as specifying #pragma foo bar.
The argument of the _Pragma() operator must be one string literal or
wide string literal.  For a wide string, the prefix (L) is deleted, and
for a string literal, the " enclosing that string literal is deleted, and
\" and \\ in that literal are replaced with " and \, respectively, before
it is treated as a #pragma argument.
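
The conversion just described can be sketched as a small function.  This
is an illustration only, not MCPP's own routine; the name destringize
and its interface are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical helper, not MCPP's own routine.  Converts the spelling
 * of a string literal, with or without the L prefix, into the text of a
 * #pragma argument.  Returns 0 on success, -1 if the input is not a
 * string literal or the output buffer is too small. */
int destringize(const char *literal, char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return -1;
    if (*literal == 'L')            /* delete the wide-string prefix */
        literal++;
    if (*literal != '"')
        return -1;
    literal++;                      /* delete the opening quote */

    while (*literal != '"') {
        if (*literal == '\0')
            return -1;              /* no closing quote */
        /* Replace \" with " and \\ with \ by skipping the backslash. */
        if (literal[0] == '\\' && (literal[1] == '"' || literal[1] == '\\'))
            literal++;
        if (n + 1 >= out_size)
            return -1;
        out[n++] = *literal++;
    }
    out[n] = '\0';                  /* the closing quote is dropped */
    return 0;
}
```

Other escape sequences, such as \n, are copied unchanged, since only \"
and \\ are replaced.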

#pragma must be written somewhere in one logical line and its argument
is not macro-expanded, at least in C90.  On the other hand, the _Pragma()
operator can be written anywhere in source code (even in a replacement
list) and has the same effect as #pragma written in a logical line.  A
_Pragma() operator generated during macro expansion is also valid.  This
flexibility makes pragma directives far more portable and allows a
header file to absorb the differences in #pragma among compiler systems.
(For a sample, see pragmas.h and pragmas.t of "Validation Suite".) [2]

C99 stipulates a #if expression is of maximum integer type.  As "long
long" and "unsigned long long" are required types, the type of an #if
expression is "long long / unsigned long long" or larger.  C90 and C++98
stipulate the type is long / unsigned long.  MCPP, however, evaluates it
in long long / unsigned long long even in C90 and C++98, and issues a
warning when the value is out of range of long / unsigned long. [1]
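
The effect of the wider type can be observed with a test like the
following sketch.  The macro and function names are made up; on systems
where long is already 64 bits wide, the result is the same under the C90
rule as well.

```c
/* In 32-bit long arithmetic 0x7FFFFFFF * 4 does not fit; evaluated in
 * long long, as C99 requires at the least, it is 8589934588. */
#if (0x7FFFFFFF * 4) > 0x7FFFFFFF
#define IF_EXPR_IS_WIDE     1
#else
#define IF_EXPR_IS_WIDE     0
#endif

/* Returns 1 when the preprocessor evaluated the #if above in a type
 * wider than 32 bits. */
int if_expr_is_wide(void)
{
    return IF_EXPR_IS_WIDE;
}
```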

Note:
[1] This is for compatibility with GCC and Visual C++ 2005.  It is
difficult also for other compiler systems to implement C99
specifications all at once.  Probably, they will begin to implement them
little by little with __STDC_VERSION__ set to 199409L or so.

[2] C99 says that a #pragma argument that begins with STDC is not macro-
expanded.  For other #pragma arguments, whether macro is expanded is
implementation-defined.


3.8     Asm Statement in Borland C and Other Special Syntaxes

Borland C has the asm keyword.  This keyword is used to write assembly
code as follows:

    asm {
        mov x,4;
        ...;
    }

This is quite irregular and deviates from the C grammar more than #asm.
If there happens to be a token with the same name as a macro, it will be
macro-expanded.  The same can be said with Borland C itself and MCPP.
It is recommended that an assembler program should be written in a
separate .ASM file.

Visual C++ also has the __asm keyword, which provides similar
functionality.

GCC provides a Standard-conforming built-in function, asm( " mov x,4\n").


3.9     Compatibility with GCC

Although I tried to develop MCPP in such manner that the GCC-specific-
build provides compatibility with GCC / cpp (cc1) to the extent that it
does not hinder practical use, it is still incompatible in many aspects.

First of all, as shown in Chapter 2, there are many differences in
execution options.  MCPP implements neither -A option nor non-conforming
directives, including #assert and #ident. [1]

Fortunately, there seem to be very few sources that cannot be compiled
due to this lack of compatibility.

It is more problematic that there are some sources that assume special
behaviors of old preprocessors.  Most of such source code receives a
warning when -pedantic is specified in GCC.  MCPP in Standard mode, by
default, provides almost the same behavior as GCC's -pedantic since it
implements Standard conforming error checking.  However, since GCC/cpp,
by default, allows such Standard violations without issuing a diagnostic,
there are some sources that take advantage of this.

It is very easy to rewrite such non-conforming code to Standard-
conforming code, so it is meaningless to take the trouble to write non-
conforming code only to impair portability and, what is worse, to
provide a hotbed of bugs.  When you find such code, do not hesitate to
correct it.

Note:
[1] The functionality of #assert and #ident should be implemented using
#pragma, if necessary.  The same can be said with #include_next and #
warning, but these directives seem to be sometimes used in GCC system,
so I grudgingly implemented them in GCC-specific-build, however, a
warning is issued when they are used.

3.9.1   Preprocessing FreeBSD 2/Kernel Source

Taking FreeBSD 2.2.2-R (1997/05) kernel source code as an example, this
section explains some preprocessing problems.  All the directories that
appear in this section are installed in /sys (/usr/src/sys).  Of the
items I point out below, 3.9.1.7 and 3.9.1.8 are not necessarily
Standard violations and work as expected in MCPP, but MCPP issues a
warning because their coding is confusing.  3.9.1.6 is an enhancement
and C99 provides the same functionality, but it differs from GNUC/cpp in
notation.

3.9.1.1     Multi-Line String Literal

Assembly code is embedded in the following manner in i386/apm/apm.c,
i386/isa/npx.c, i386/isa/seagate.c, i386/scsi/aic7xxx.h,
dev/aic7xxx/aic7xxx_asm.c, dev/aic7xxx/symbol.c,
gnu/ext2fs/i386-bitops.h, pc98/pc98/npx.c:

asm("
    asm code0
#ifdef PC98
    asm code1
#else
    asm code2
#endif
    ...
");

When no " closing a string literal appears by the end of line, GCC/cpp,
by default, interprets that the string literal ends at the end of line.
The above coding is based on this specification.  In addition, the
compiler-proper seems to interpret the whole content of asm() as a
string literal spreading across lines.

I think that assembler source code should be written in a separate file,
but if you want to embed it in a ".c" file by all means, write it in the
following manner, instead of using the confusing coding shown above.

asm(
    "  asm code0\n"
#ifdef PC98
    "  asm code1\n"
#else
    "  asm code2\n"
#endif
    "  ...\n"
);

Standard C conforming preprocessors will accept it.
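
The rewritten form works because adjacent string literals are
concatenated after preprocessing, so directives may appear between them.
The sketch below shows this with an ordinary character array instead of
asm(); the "asm code" lines are the placeholders used above, not real
instructions, and the names asm_text and asm_line_count are made up.

```c
#include <string.h>

/* Each line is a complete string literal; the #ifdef selects which
 * literals take part in the concatenation. */
static const char asm_text[] =
    "  asm code0\n"
#ifdef PC98
    "  asm code1\n"
#else
    "  asm code2\n"
#endif
    "  ...\n";

/* Returns the number of lines in the assembled text. */
size_t asm_line_count(void)
{
    size_t count = 0;
    const char *p;

    for (p = asm_text; (p = strchr(p, '\n')) != NULL; p++)
        count++;
    return count;
}
```

Whether PC98 is defined or not, the array holds three lines, and every
source line is a valid preprocessing token sequence.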

3.9.1.2     #else junk, #endif junk

The following line appears in ddb/db_run.c, netatalk/at.h,
netatalk/aarp.c, net/if-ethersubr.c, i386/isa/isa.h, i386/isa/wdreg.h,
i386/isa/tw.c, i386/isa/b004.c, i386/isa/matcd/matcd.c,
i386/isa/sound/sound_calls.h, i386/isa/pcvt/pcvt_drv.c, pci/meteor.c,
and pc98/pc98/pc98.h:

#endif MACRO

This line should be changed to:

#endif /* MACRO */

3.9.1.3     #ifdef 0

To my surprise, i386/apm/apm.c contains the following strange line:

#ifdef 0

Of course, this should be written as:

#if 0

This code must have been neither debugged nor used.

3.9.1.4     Duplicate Definition of Macro

gnu/i386/isa/dgb.c has a duplicate definition of the following macro:

#define DEBUG

Some of header files have a macro definition conflicting with this.

The Standard C regards duplicate definitions as "undefined", but how
they are treated depends on compiler systems; some make the first
definition valid after issuing an error message and others, like GCC 2/
cpp, make the last definition valid without issuing any messages.  To
make the last definition valid, the following code should be added
immediately before the last definition.

#undef DEBUG

3.9.1.5     #warning

i386/isa/if_ze.c and i386/isa/if_zp.c have the #warning directive.
This is the only non-Standard directive I found in the kernel source.
To conform to Standard C, there is no way but to comment out these
lines.

MCPP accepts #warning.

3.9.1.6     Variable Argument Macros

gnu/ext2fs/ext2_fs.h and i386/isa/mcd.c have the following macros that
take a variable number of arguments:

#define MCD_TRACE(fmt, a...)        \
{                                   \
    if (mcd_data[unit].debug) {     \
        printf("mcd%d: status=0x%02x: ",    \
            unit, mcd_data[unit].status);   \
        printf(fmt, ## a);          \
    }                               \
}

#   define ext2_debug(fmt, a...)  { \
        printf("EXT2-fs DEBUG (%s, %d): %s:",   \
            __FILE__, __LINE__, __FUNCTION__);  \
        printf(fmt, ## a);          \
        }

This is a GCC-specific extension and cannot be used with other compiler
systems.  The above "## a" can be simply written as "a".  With ##, when
a macro invocation has no argument corresponding to "a...", the
preceding comma is deleted.  C99 also provides variable argument
macros, but their notation differs from that of GCC.  The above
examples are written as follows in C99:

#define MCD_TRACE( ...)             \
{                                   \
    if (mcd_data[unit].debug) {     \
        printf("mcd%d: status=0x%02x: ",    \
            unit, mcd_data[unit].status);   \
        printf( __VA_ARGS__);       \
    }                               \
}

#  define ext2_debug( ...)     {    \
            printf("EXT2-fs DEBUG (%s, %d): %s:",   \
                __FILE__, __LINE__, __FUNCTION__);  \
            printf( __VA_ARGS__);   \
            }

The most annoying difference is that C99 requires one or more arguments
in a macro invocation corresponding to "...", while GCC/cpp accepts
zero or more arguments corresponding to "a...".  To handle this, when
there is no argument corresponding to "...", MCPP issues a warning
instead of making it an error.  Therefore, you can change the above code
as follows:

#define MCD_TRACE(fmt, ...)         \
{                                   \
    if (mcd_data[unit].debug) {     \
        printf("mcd%d: status=0x%02x: ",    \
            unit, mcd_data[unit].status);   \
        printf(fmt, __VA_ARGS__);   \
    }                               \
}

#  define ext2_debug(fmt, ...)     {    \
            printf("EXT2-fs DEBUG (%s, %d): %s:",   \
                __FILE__, __LINE__, __FUNCTION__);  \
            printf(fmt, __VA_ARGS__);   \
            }

This is simpler, with one-to-one correspondence.  However, this way of
writing has the disadvantage that the comma immediately before an empty
argument remains, resulting in, for example, printf( fmt, ).  In this
case, there is no other way but to write the macro definition in
accordance with the C99 specification, or to avoid using an empty
argument in a macro invocation by passing a harmless token, such as
NULL or 0, as in MCD_TRACE(fmt, NULL). [1]

Note:
[1] To use MCPP, source code must be rewritten in this way.  In addition,
with the -Q option, the huge number of warnings is output not to the
screen but to the mcpp.err file.
GCC 2.95.3 or later also implements variable argument macros based on
the C99 syntax.  It is recommended to use this syntax in the future.
The GCC-specific notation provides the flexibility of allowing zero
variable arguments, but it is bad in that (1) for the "args..."
parameter, no white space may be inserted between "args" and "...",
although no such pp-token as "args..." exists, and that (2) it is not
desirable that the notation for the token concatenation operator is
used to mark a variable argument in a replacement list.  It is
desirable to allow zero variable arguments based on the C99 notation.
GCC 3 introduced a notation for variable argument macros that is a
mixture of GCC 2's traditional notation and the C99 one.  For details,
refer to 3.9.6.3.

3.9.1.7     Empty Argument in Macro Call

The following macro invocations appear in nfs/nfs.h, nfs/nfsmount.h,
nfs/nfsmode.h, netinet/if_ether.c, netinet/in.c, sys/proc.h,
sys/socketvars.h, i386/scsi/aic7xxx.h, i386/include/pmap.h,
dev/aic7xxx/scan.l, dev/aic7xxx/aic7xxx_asm.c, kern/vfs_cache.c,
pci/wd82371.c, vm/vm_object.h, and vm/device/pager.c.  They also appear
in /usr/include/nfs/nfs.h.

    LIST_HEAD(, arg2)
    TAILQ_HEAD(, arg2)
    CIRCLEQ_HEAD(, arg2)
    SLIST_HEAD(, arg2)
    STAILQ_HEAD(, arg2)

The first argument is empty.  C99 approved empty arguments, but C90
regarded them as undefined.  Considering that an argument may happen
to become empty during a nested macro invocation, empty arguments
should be approved; however, it is neither necessary nor desirable to
write an empty argument in source code.  Note that for a one-argument
macro, there is a syntactic ambiguity between an empty argument and a
missing argument.

Taking everything into consideration, the following notation is
recommended:

#define EMPTY

    LIST_HEAD(EMPTY, arg2)
    TAILQ_HEAD(EMPTY, arg2)
    CIRCLEQ_HEAD(EMPTY, arg2)
    SLIST_HEAD(EMPTY, arg2)
    STAILQ_HEAD(EMPTY, arg2)

Any Standard C conforming preprocessor will accept this notation.

By the way, some of the header files (in the nfs directory) listed
above neither have the macro definitions shown above nor #include any
other header files.  This is because such header files assume that
these macro definitions exist in sys/queue.h and that *.c programs will
#include sys/queue.h first.  Such files give rise to ambiguity.

kern/kern_mib.c has the following macro invocation:

    SYSCTL_NODE(, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9)

In this case, the first argument cannot be changed to EMPTY, because
the corresponding macro definition in sys/sysctl.h is as follows:

#define SYSCTL_NODE(parent, nbr, name, access, handler, descr)      \
    extern struct linker_set sysctl_##parent##_##name;              \
    SYSCTL_OID(parent, nbr, name, CTLTYPE_NODE|access,              \
        (void*)&sysctl_##parent##_##name, 0, handler, "N", descr);  \
    TEXT_SET(sysctl_##parent##_##name, sysctl__##parent##_##name);

In other words, parent is an operand of the ## operator, so the
corresponding argument is not macro-expanded, and EMPTY would be pasted
as it is.  The arguments of the SYSCTL_OID macro shown above, including
the first one, are not macro-expanded either.  In this case, there is
no way but to leave the empty argument as it is.  [1]

Note:
[1] C99 approves empty arguments as legitimate.  Considering macros such
as SYSCTL_NODE() and SYSCTL_OID(), the EMPTY macro is not a cure-all,
and there is some reason to use empty arguments.  In addition, even if
EMPTY is used, a nested macro invocation may produce empty arguments.
However, for source readability, using EMPTY is recommended whenever
possible.

3.9.1.8     Object-Like Macros Replaced with Function-like Macro Name

i386/include/endian.h, as well as /usr/include/machine/endian.h, has the
following macro definitions.  (There are four definitions of this kind.)

#define __byte_swap_long(x) (replacement text)
#define NTOHL(x)            (x) = ntohl ((u_long)x)
#define ntohl               __byte_swap_long

The problem is the ntohl definition.  Although ntohl is an object-like
macro, it is expanded to a function-like macro name, then rescanned
together with the subsequent text, and expanded as if it were a
function-like macro.  This way of macro expansion has been regarded as
an implicit specification since K&R 1st, and Standard C somehow
approved it as legitimate.  However, as I discuss in other documents,
it is this specification that makes macro expansion unnecessarily
complicated and brings confusion to the Standard documents.  This is a
defective specification. [1]

This ntohl is actually a function-like macro, written as an object-like
macro by omitting the parameter list.  It is better to define it as the
function-like macro that it is:

#define ntohl(x)    __byte_swap_long(x)

This causes no problem.

i386/isa/sound/os.h has the same kind of macro definitions:

#define INB         inb
#define INW         inb

This should be written as follows:

#define INB(x)      inb(x)
#define INW(x)      inb(x)

Note:
[1] ISO 9899:1990 Corrigendum 1:1994 regarded the notation as undefined.
C99 replaced this clause with another.  However, the Standard documents
are still confusing about this.  For details, see 1.7.6 of cpp_test.txt.

3.9.1.9     Preprocessing .S File

Some kernel sources are contained in several ".S" files, that is, they
are written in assembler.  These sources contain #include's or #ifdef's,
which require preprocessing.  To preprocess them, in FreeBSD 2.2.2-R,
'cc' is called with the '-x assembler-with-cpp' option, and 'cc' calls
'/usr/libexec/cpp' with the '-lang-asm' option and then calls 'as'.

Of course, this way of using .S files is non-conforming.  The assembler
source code must not contain a token that happens to have the same name
as a macro.  White spaces between tokens and at the beginning of a line
must be retained during preprocessing.  In addition, if the first token
on a line is a # indicating an assembler comment, special processing is
required on the preprocessor side.  This not only considerably limits
the available preprocessors but also increases the possibility of
unknowingly introducing bugs.  So, using .S files in this way is not
recommended. [1]

To preprocess source code for use with several types of machines, the
code should be written in the following manner and be saved not in a
".S" file but in a ".c" file.  4.4BSD-Lite actually adopts this way of
coding.

asm(
    "  asm code0\n"
#ifdef Machine_A
    "  asm code1\n"
#else
    "  asm code2\n"
#endif
    "  ...\n"
);

Note:
[1] In FreeBSD 2.0-R, these kernel sources are contained not in *.S but
in *.s files.  The Makefile is written so as to call 'cpp', instead of
'cc', to process them.  Then the 'cc' calls 'as'.  When the 'cpp' is
called, '/usr/bin/cpp' is invoked.  '/usr/bin/cpp' is a shell-script
that calls '/usr/libexec/cpp -traditional'.  This method was more
convenient in that it provides a way to change preprocessors to be used
by modifying the script.

3.9.2   Preprocess of FreeBSD 2/libc Source

I recompiled all the source files in /usr/src/lib/libc of FreeBSD 2.2.2-R.
There was no problem, probably because most of them come from
4.4BSD-Lite without modification.  It is quite rare and surprising to
find such a huge number of source files of excellent quality gathered
together.

In only one place, gen/getgrent.c, I found the following coding.  Of
course, the ";" at the end of the line is superfluous.

#endif;

3.9.3   Problems Concerning GCC 2/cpp

As seen so far, writing Standard-conforming source code, which is more
portable and more secure, neither requires much effort nor has any
disadvantages.  In spite of this, why does source code that does not
conform to the Standards still exist at all?

When the FreeBSD 2.0-R kernel sources are compared with those of
2.2.2-R, the non-conforming sources have not decreased in number.  The
problem is that newer sources are not necessarily more conforming to
the Standards.  There are few non-conforming sources in 4.4BSD-Lite.
This is probably because the 4.4BSD sources were rewritten to conform
to Standard C and POSIX.  However, during the process of porting these
sources to FreeBSD, the old writing style was revived in some sources.
For example, although the ntohl shown above is written as 'ntohl(x)' in
4.4BSD-Lite, it is written as 'ntohl' in FreeBSD.  Why was the notation
that had once been put away revived?

I blame GCC/cpp for this revival, since it passes these non-conforming
sources without issuing a diagnostic.  If -pedantic had been the
default behavior, the old-style source would never have been revived.
If -pedantic-errors had been the default behavior, however, GCC/cpp
would not have been put into practical use, because too many sources
would have failed to compile.  The gcc man page describes the -pedantic
option as: "There is no reason to use this option except for satisfying
pedants."  Now that eight years have already passed since Standard C
was established, it is high time that GCC/cpp set -pedantic as the
default, even if it does not go so far as to set -pedantic-errors. [1]

In FreeBSD 2.0-R, nested comments were sometimes found, but in 2.2.2-R,
they disappeared.  This is because GCC/cpp no longer allowed them.  This
has nothing to do with -pedantic, but it shows how influential a
preprocessor's source checking is.

Note:
[1] I wrote 3.9.3 in 1998.  Since then, this expression has been deleted
from gcc's man page and info; however, the specification remains almost
the same.

3.9.4   Preprocessing Linux/glibc 2.1 Source

I recompiled glibc 2.1.3 sources on Vine Linux 2.1 (i386).  Unlike the
FreeBSD libc sources, they had many problems.  Some sources are written
based on GCC/cpp's undocumented specifications, and it took me a lot of
time to identify them.

3.9.4.1     Multi-Line String Literal

sysdeps/i386/dl-machine.h and stdlib/longlong.h have many multi-line
string literals as shown below:

#define MACRO asm("
    instr 0
    instr 1
    instr 2
")

Some string literals are very long.  compile/csu/version-info.h created
by make also has a multi-line string literal.  Of course, it is non-
conforming, but GCC treats it as a string literal with embedded <newline>.

The -lang-asm (-x assembler-with-cpp, -a) option allows MCPP to convert
a multi-line string literal into the following code:

#define MACRO asm("\n  instr 0\n  instr 1\n  instr 2\n")

However, this option cannot work properly for a string literal with a
directive inserted in the middle as shown in 3.9.1.1, in which case
there is no way but to rewrite the source.

3.9.4.2     #include_next, #warning

#include_next appears in the following files:

catgets/config.h, db2/config.h, include/fpu_control.h, include/limits.h,
include/bits/ipc.h, include/sys/sysinfo.h, locale/programs/config.h, and
sysdeps/unix/sysv/linux/a.out.h

sysvipc/sys/ipc.h has #warning.

Although these directives are not approved by Standard C,
#include_next, in particular, has become indispensable for glibc 2.
So, MCPP for GCC implements #include_next and #warning.

The problem with #include_next is not only that it violates the
Standard but also that which headers are actually included depends on
the setting of include directories and their search order, which users
can change via environment variables.

When glibc is installed, some files in glibc's include directory are
copied to the /usr/include directory.  These files are used as system
header files.  That these header files contain #include_next means
system headers become patchy.  It seems to be time to reorganize them.

3.9.4.3     Variable Argument Macros

The following files contain definitions of macros with variable number
of arguments based on the GCC specification, as well as macro
invocations:

elf/dl-lookup.c, elf/dl-version.c, elf/ldsodefs.h,
glibc-compat/nss_db/db-XXX.c, glibc-compat/nss_files/files-XXX.c,
linuxthreads/internals.h, locale/loadlocale.c,
locale/programs/linereader.h, locale/programs/locale.c,
nss/nss_db/db-XXX.c, nss/nss_files/files-XXX.c, sysdeps/unix/sysdep.h,
sysdeps/unix/sysv/linux/i386/sysdep.h, and
sysdeps/i386/fpu/bits/mathinline.h

This is a deviation from the C99 Standard.  You must rewrite the source
code before you can use MCPP.

3.9.4.4     Empty Argument During Macro Calls

The following files have macro invocations with empty arguments:

catgets/catgetsinfo.h, elf/dl-open.c, grp/fgetgrent_r.c,
libio/clearerr_u.c, libio/rewind.c, libio/clearerr.c,
libio/iosetbuffer.c, locale/programs/ld-ctype.c, locale/setlocale.c,
login/getutent_r.c, malloc/thread-m.h, math/bits/mathcalls.h,
misc/efgcvt_r.c, nss/nss_files/files-rpc.c,
nss/nss_files/files-network.c, nss/nss_files/files-hosts.c,
nss/nss_files/files-proto.c, pwd/fgetpwent_r.c, shadow/sgetspent_r.c,
sysdeps/unix/sysv/linux/bits/sigset.h, sysdeps/unix/dirstream.h

math/bits/mathcalls.h, in particular, contains as many as 79 empty
arguments.  This header file is installed as
/usr/include/bits/mathcalls.h and is #included by /usr/include/math.h.
Even with an EMPTY macro, nested macro invocations generate a lot of
empty arguments.  Is there no clearer way to write these macros?

3.9.4.5     Object-Like Macros Replaced with Function-like Macro Name

The following files contain object-like macro definitions replaced with
function-like macro names:

argp/argp-fmtstream.h, ctype/ctype.h, elf/sprof.c, elf/dl-runtime.c,
elf/do-rel.h, elf/do-lookup.h, elf/dl-addr.c, io/ftw.c, io/ftw64.c,
io/sys/stat.h, locale/programs/ld-ctype.c, malloc/mcheck.c,
math/test-*.c, nss/nss_files/files-*.c, posix/regex.c, posix/getopt.c,
stdlib/gmp-impl.h, string/bits/string2.h, string/strcoll.c,
sysdeps/i386/i486/bits/string.h, sysdeps/generic/_G_config.h,
sysdeps/unix/sysv/linux/_G_config.h

Of these, some function-like macros, such as those in math/test-*.c,
are first replaced with an object-like macro name and then further
replaced with a function-like macro name.  Why did these macros have to
be written in this way?

3.9.4.6     Macros Expanded to 'defined'

sysdeps/generic/_G_config.h, sysdeps/unix/sysv/linux/_G_config.h, and
malloc/malloc.c contain the following macro definition expanded to the
"defined" pp-token.

#define HAVE_MREMAP defined(__linux__) && !defined(__arm__)

The intention of this macro definition is that with the following
directive,

#if HAVE_MREMAP

, the above line is expected to be expanded as follows:

#if defined(__linux__) && !defined(__arm__)

However, the behavior is undefined in Standard C when a #if line has a
"defined" pp-token in a macro expansion result.  Apart from that, this
macro definition is strange in the first place.

The HAVE_MREMAP macro is first replaced with the following,

defined(__linux__) && !defined(__arm__)             (1)

, and then the identifiers "defined", "__linux__" and "__arm__" are
rescanned for more macro replacement.  If any of them is a macro, it is
expanded.  In this case, "defined" cannot be defined as a macro
(Otherwise, it causes another undefined result), and if __linux__ is
defined as 1 and __arm__ is not defined, this macro is finally expanded
as follows:

defined(1) && !defined(__arm__)

defined(1), of course, is a syntax error in a #if expression.

However, GCC/cpp stops macro expansion at (1) and regards it as the
final macro expansion result of the #if line.  Since this is "undefined"
anyhow, this GNU specification cannot be described as wrong, but it
lacks consistency in that the way a macro is expanded differs between
a #if line and other lines.  At the least, it lacks portability. [1]

The above code should be written as follows:

#if defined(__linux__) && !defined(__arm__)
#define HAVE_MREMAP 1
#endif

I hope this kind of confusing code will be eliminated as early as
possible. [2]

Note:
[1] GCC 2/cpp internally treats "defined" in a #if line as a special
macro.  For this reason, when GCC/cpp rescans the following sequence of
tokens for macro expansion, it evaluates it as a #if expression, as a
result of the special handling of the "defined" pseudo-macro, instead
of expanding the original macro.  In other words, the distinction
between macro expansion and #if expression evaluation is ambiguous.

    defined(__linux__) && !defined(__arm__)

This problem relates to GCC/cpp's own program structure.  GCC 2/cpp has
a de facto main routine rescan(), which is a macro rescanning routine.
This routine reads and processes a source file from beginning to end,
and in the course of doing so, it calls the routine that processes
preprocessing directives.  Although implementing everything as macros
is a traditional program structure for a macro processor, this
structure can be thought to cause the mixture of macro expansion and
other processing.

[2] In glibc 2.2, this macro was corrected.

3.9.4.7     Preprocessing .S File

The files named *.S contain assembler source code requiring
preprocessing.  Some of these files have preprocessing directives, such
as #include, #define, and #if.  In addition, the file named compile/csu/
crti.S generated by Make contains the following lines:

#APP

or

#NO_APP

From a syntax point of view, preprocessors cannot tell whether these
lines are invalid preprocessing directives or valid assembler comments.
GCC seems to leave these lines as they are during preprocessing and
treat them as assembler comments.

Concatenation of pp-tokens using the ## operator sometimes generates an
invalid pp-token.  GCC/cpp outputs these pp-tokens without issuing a
diagnostic.

For compatibility with GCC, I reluctantly decided that, with the
-lang-asm (-x assembler-with-cpp, -a) option, MCPP does not treat these
non-conforming directives and invalid pp-tokens generated by ## as
errors; it outputs them as they are and issues a warning.

Essentially, these sources should be processed with an assembler macro
processor.  GNU seems to provide a macro processor called gasp, but it
seems to be scarcely used for some reason.

3.9.4.8     Problems of rpcgen and -dM Option

When invoked with the -dM option, GCC outputs only macro definitions,
which is used by stdlib/isomac.c in 'make check' routine.

The problem with isomac.c is that it accepts only GCC/cpp's macro
definition file format and regards a comment or a blank line as an
error. [1]

Glibc make sometimes uses a program called rpcgen.  The problem with
rpcgen is that it accepts only GCC/cpp's output format for preprocessor
line number information, as follows:

#123 "filename"

Rpcgen accepts neither:

#line 123

nor

#line 123 "filename"

Rpcgen regards them as errors.

I reluctantly decided that GCC-specific-MCPP uses the GCC format by
default.  Rpcgen's specification is poor in that it is based on a
particular compiler system's format and cannot accept the standard one.

Note:
[1] MCPP V.2.5 changed the output of the -d* options to the same format
as GCC's.

3.9.4.9     -include, -isystem and -I- Options

Glibc 2.1 'makefile' often uses the -include option and sometimes uses
-isystem and -I- options.  The former can be substituted with #include
at the beginning of source code.  The latter two are less necessary;
these are only necessary to update system headers.

Only GCC-specific-build implements these two options, but I would like
these less necessary options to be made obsolete. [1]

Note:
[1] GCC/cpp provides several more options that specify include
directories and their search orders, such as -iprefix, -iwithprefix, and
-idirafter.  It also provides the -remap option that specifies mapping
between long-file-names and MS-DOS 8+3 format filenames.  On CygWIN
systems, specs files contain these options, but it is not necessary to
use these options because include directories can be specified with
environment variables and because such mapping is no longer necessary on
CygWIN.

3.9.4.10    Undocumented Predefined Macros

The following macros are GCC/cpp predefined macros although their names
do not appear in documentation.

    __VERSION__,  __SIZE_TYPE__,  __PTRDIFF_TYPE__, and __WCHAR_TYPE__

On Vine Linux 2.1 (egcs-1.1.2) systems, __VERSION__ is set to
"egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)".  On many systems,
including Linux/i386, the other three macros are defined as the types
unsigned int, int, and long int, respectively.  However, on FreeBSD and
CygWIN systems, the types are slightly different (I do not know why).
Why do these predefined macros remain undocumented?

3.9.4.11    Undocumented Environment Variables

The strangest thing is the undocumented environment variable named
SUNPRO_DEPENDENCIES.  sysdeps/unix/sysv/linux/Makefile contains the
following script:

    SUNPRO_DEPENDENCIES='$(@:.h=.d)-t $@'
    $(CC) -E -x c $(sysinclude) $< -D_LIBC -dM |
    ...
    etc.

The intent of this script is to specify a file name with the environment
variable SUNPRO_DEPENDENCIES, and to have cpp output macro definitions
in source code and dependency description lines between source files to
that file.

I had no other way but to read the GCC/cpp source code (egcs-1.1.2/gcc/
cccp.c) to know how this environment variable works.

In addition, there is another environment variable, DEPENDENCIES_OUTPUT,
which has a similar function.  The difference between the two is that
SUNPRO_DEPENDENCIES also outputs dependency description lines among
system headers, but DEPENDENCIES_OUTPUT does not.

Only GCC-specific-build enables these two environment variables, but I
would like these undocumented specifications to be made obsolete as
early as possible.

3.9.4.12    Other Problems

Linux (i386)/GCC 2 appends -Asystem(unix), -Acpu(i386) and
-Amachine(i386) to the cpp invocation options by using the specs file.
As far as glibc 2.1.3 for Linux/x86 is concerned, there seems to be no
source code that utilizes this functionality.

It is a big problem that glibc's system headers have become patchy and
very complicated.  A small difference in settings may result in a big
difference in preprocessing results.

On the other hand, Glibc 2.1.3 did not contain #else junk, #endif junk,
or duplicate macro definitions that were found in FreeBSD 2.2.2/kernel
sources.  In some aspects, Glibc 2.1 source is better organized than
FreeBSD 2/kernel source.

However, as a whole, glibc 2.1 contains quite a few sources that are
based on GCC-specific specifications, which impairs portability to
other compiler systems, although such sources form only a small portion
of several thousand source files.  Dependence on GCC local
specifications is not desirable for program readability and
maintainability.  I hope that GCC V.3 will make these local
specifications obsolete and that all the source code based on them will
be completely rewritten.

3.9.5   To Use MCPP with GCC 2

You must modify some source code as follows before you can use MCPP to
compile glibc 2.1 sources:

1. Macro definitions with a variable number of arguments: Modify the 14
files in 3.9.4.3 as shown in 3.9.1.6.  Of course, you had better save
the original files.

2. Macros contained in the three files shown in 3.9.4.6 that have
"defined" in their replacement lists:  /usr/include/_G_config.h is a
file generated when sysdeps/unix/sysv/linux/_G_config.h is installed
and has the same contents as that file.  You had better modify
/usr/include/_G_config.h.

In addition to the options specified in Makefile or specs file, you must
specify the -lang-asm (-xassembler-with-cpp) option to process *.S files
containing multi-line string literals or assembler comments before you
can invoke MCPP.  Usually, you can leave this option specified when
preprocessing other files.

When you want to switch between GCC/cpp and MCPP, or change the default
options, you had better perform the following steps:

1. Become superuser and move to the directory where cpp resides (here
assuming /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66).  Let me
assume that this directory has GCC/cpp installed under the name of cpp
and MCPP as mcpp.

2. Create a file called mcpp.sh with the following contents. [1]

#!/bin/sh
/usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/mcpp -Q -lang-asm "$@"

The -Q option is optional; however, I recommend that you use -Q to
record the large amount of diagnostic messages.

3.  Enter the following commands:

    chmod a+x mcpp.sh
    mv cpp cpp_gnuc
    ln -sf mcpp.sh cpp

These commands execute mcpp.sh linked to cpp when gcc calls cpp, and
mcpp.sh calls MCPP using the above options before the ones specified by
gcc.

4.  To change default options, modify mcpp.sh or call mcpp directly.  To
use GCC/cpp do:

    ln -sf cpp_gnuc cpp

Another problem of using MCPP is that it issues a huge number of warning
messages.  You can redirect them to a file using the -Q option, but when
you use the -W3 option to preprocess a large amount of source code, such
as glibc, a total of several hundred MB of mcpp.err files is created,
so it is impossible to look through all of them.

Taking a close look at mcpp.err, you will find the same warnings being
issued repeatedly.  This is because the same *.h files are #included by
many source programs.  To make the files more readable, perform the
following procedure:

1.  To find error messages, enter the following commands:

    grep fatal `find . -name mcpp.err`
    grep error `find . -name mcpp.err`

2.  To sort warning messages, enter the following command:

    grep warning `find . -name mcpp.err` | sort -k3 -u | less

3.  To find all the source lines causing a warning, enter the following
command:

    grep warning `find . -name mcpp.err` | sort -k3 | uniq | less

4.  To find a particular type of warnings, enter the following command,
for example:

    grep 'warning: Replacement' `find . -name mcpp.err` | sort -k3   \
        | uniq | less

After you get an overall idea of what source lines are causing what
kinds of errors or warnings, you can see a particular mcpp.err by "less"
and then, if necessary, see the source file in question.

In addition, you can sandwich the source code in question between
'#pragma MCPP debug expand' and '#pragma MCPP end_debug' and preprocess
it again to see the output, in which case I recommend that you invoke
MCPP in the following manner so that preprocessing results and
diagnostic messages are output to the same file:

    mcpp <-opts> in-file.c > in-file.i 2>&1

When you use "make", you must temporarily change the above shell-script.

Note:
[1] If you use 'configure' and 'make' to compile GCC-specific-build of
MCPP, the 'make install' command will set the script appropriately.  The
only thing left for you here is to add '-Q -lang-asm' options to the
script.

3.9.6     Preprocessing GCC 3.2 Source

I first compiled GCC 3.2 sources on Linux and FreeBSD, then I used the
generated gcc to compile MCPP and then I recompiled GCC 3.2 sources
using MCPP for preprocessing.

New GCC compilers are bootstrapped during various phases of make; gcc
and cc1, etc generated in an earlier phase are used to recompile
themselves, and those generated compiler drivers and compiler-propers
are used again to recompile themselves, and so on.  During the bootstrap,
gcc exists under the name of xgcc.

Other than cc1 and cc1plus, GCC 2 has a separate preprocessor called cpp.
In GCC 3, cpp was absorbed into cc1 and cc1plus.  However, there still
exists a separate cpp (or cpp0).  To have cpp0 preprocess, the
-no-integrated-cpp option must be specified when you invoke gcc or g++.
Therefore, to have MCPP preprocess, you must use a shell-script that
has gcc (xgcc) or g++ invoke MCPP first and then invoke cc1 or cc1plus.
[1]

In the GCC compiler system, the settings of system headers and their
search order are becoming very complex.  So, a small difference in
settings may result in a difference in preprocessing results.  Even
successful compilation was often difficult to attain.  In addition,
compilation and tests require a lot of other software.  Older versions
of such software may cause failure in compilation or tests.  Actually,
compilation sometimes failed due to some hardware problems on my machine.

Actually, I failed to compile GCC 3.2 source under FreeBSD 4.4R.  I had
to upgrade FreeBSD to 4.7R and changed software packages to those for
FreeBSD 4.7R before I was able to succeed in compilation. [2]

I use VineLinux 2.5 on two PCs.  Although compilation of GCC 3.2 sources
using GCC 2.95.3 was successful on one PC (K6/200MHz), recompilation of
GCC 3.2 sources using the generated GCC 3.2/cc1 failed, and caused many
segmentation faults.  Then I changed CPU from K6 to AthlonXP.  This time,
recompilation was successful; no segmentation faults occurred.  Hardware
may have caused the problem.

When I compiled GCC 3.2 sources using GCC 2.95.4 under FreeBSD on K6,
"make -k check" of the generated gcc was almost successful.  When I
recompiled GCC 3.2 itself using the generated GCC 3.2, about 20 percent
of the testsuite failed in "make -k check" of g++ and libstdc++-v3.
However, when using AthlonXP, instead of K6, everything went OK.
Hardware may have caused the problem.

On both VineLinux PCs, when I recompiled GCC 3.2 sources using GCC 3.2
itself and MCPP, "make -k check" of the generated gcc was successful.
However, in "make -k check" of g++ and libstdc++-v3, 20 percent of
testsuite failed.  [3], [4], [5]

In any case, the cause of this testsuite failure seems to lie not in the
generated compilers themselves, such as gcc, g++, cc1 and cc1plus, but
in the header files or some other settings.

MCPP cannot be described as completely compatible with GCC/cpp, but is
highly compatible.  So, MCPP and GCC/cpp can be used interchangeably.

GCC 3.2 sources were compiled in the following environment:

    OS              make        library     CPU
    VineLinux 2.5   GNU make    glibc 2.2.4 Celeron/1060MHz
    VineLinux 2.5   GNU make    glibc 2.2.4 K6/200MHz, AthlonXP/2.0GHz
    FreeBSD 4.7R    UCB make    libc.so.4   K6/200MHz, AthlonXP/2.0GHz

Only C and C++ were compiled.

My Validation Suite has an edition for use in the GCC testsuite.
Validation Suite allows you to perform detailed and systematic
preprocessor tests using "make check" or "runtest".  Validation Suite
checks not sources but the preprocessor's behavior.  For details, see
2.2.3 of cpp-test.txt.

Note:
[1] I had to do this for each bootstrap stage.  Since the makefile is
too large and too complex to change, I employed an inelegant method: I
sat in front of the PC screen during the entire bootstrap process.  At
the end of each stage, I entered ^C and replaced xgcc and others with
shell-scripts.

[2] Due to dependency between packages, the system falls into confusion
unless appropriate versions are installed.  Actually, for this reason,
my FreeBSD temporarily failed to invoke kterm.

[3] "make -k check" cannot be used with MCPP because diagnostics of MCPP
are different from those of GCC.

[4] "make -k check" seems to require an English environment, so the LANG
environment variable must be set to C.

[5] All the testsuite failures were caused by the pthread_* functions,
such as pthread_getspecific and pthread_setspecific, failing to be
linked in the library
i686-pc-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.5.0.0.  When a
correctly generated library was installed, "make -k check" was
successful.  On FreeBSD, this problem never happened.  This is probably
because of small differences in settings.

3.9.6.1     Multi-Line String Literal

This very old way of coding is no longer found in GCC 3.2 sources.
Multi-line string literals were made obsolete only as late as GCC 3.2.
GCC 3.2 processes a source with a multi-line string literal as you
expect, but issues a warning.

3.9.6.2     #include_next and #warning

limits.h and syslimits.h in build/gcc/include generated during the
course of make have #include_next.  When GCC 3.2 is installed, these
header files are copied to limits.h and syslimits.h in
lib/gcc-lib/i686-pc-linux-gnu/3.2/include.

GCC 3.2 sources do not have #warning.

3.9.6.3     Variable Argument Macros

GCC 3.2 sources have some variable argument macros, but most of them are
found in testsuite and they are nothing but test samples.  Although GCC
3.2 still supports variable argument macros in GCC 2 notation, the ones
using __VA_ARGS__ (in C99 notation) are more frequently found in GCC 3.2
sources.

In GCC 3, variable argument macros in a mixed notation of GCC 2 and C99
are found:

    #define eprintf( fmt, ...)   fprintf( stderr, fmt, ##__VA_ARGS__)

According to the GCC 3 specification, in the absence of an argument
corresponding to "...", the comma immediately before "..." is deleted.
So, this is expanded as follows:

    eprintf( "success!\n")  ==>  fprintf( stderr, "success!\n")

As far as this example is concerned, this specification seems to be
convenient, but is not desirable in that (1) a comma in a replacement
list of a macro definition is not always used to delimit parameters, (2)
it allows a token concatenation operator (##) to have other
functionality, (3) it makes rules more complex by allowing exceptions.
MCPP does not implement this functionality.  MCPP does not regard this
macro definition as an error, but it does not delete the comma
immediately before the empty argument in a macro invocation.
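
For comparison, the same kind of macro can be written within C99 alone
by letting the format string be part of __VA_ARGS__, so that no comma
has to be deleted.  The following is only a minimal sketch under
assumptions: the helper emit() and the buffer msg_buf are hypothetical
names introduced for this illustration so that the result can be
inspected (a real definition would simply call
fprintf( stderr, __VA_ARGS__)).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer that stands in for stderr in this sketch. */
static char msg_buf[ 128];

/* Hypothetical helper: formats into msg_buf as fprintf would format
 * to stderr, and returns the length that was produced.             */
static int emit( const char *fmt, ...)
{
    va_list ap;
    int     n;

    assert( fmt != NULL);
    va_start( ap, fmt);
    n = vsnprintf( msg_buf, sizeof msg_buf, fmt, ap);
    va_end( ap);
    return n;
}

/* C99 notation: the format is the first of the variable arguments,
 * so the replacement list contains no comma before __VA_ARGS__.    */
#define eprintf( ...)   emit( __VA_ARGS__)
```

With this definition both eprintf( "success!\n") and
eprintf( "%d files\n", 3) are valid C99 invocations, because "..."
always receives at least the format string.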

3.9.6.4     Empty Arguments in Macro Invocation

Apart from #include-ed system headers, such as
/usr/include/bits/mathcalls.h and /usr/include/bits/sigset.h, empty
arguments in a macro invocation are found only in gcc/libgcc2.h of GCC
3.2 sources themselves. [1]

Note:
[1] These two header files are copied into the system header directory
when glibc is installed.  They do not exist on FreeBSD because glibc is
not used.

3.9.6.5     Object-Like Macros Replaced with Function-Like Macros

gcc/fixinc/gnu-regex.c and libiberty/regex.c have object-like macros
that are replaced with function-like macro names.  /usr/lib/bison.simple,
a #included file, also has such macros.  These macros are all related
to alloca.  For example, libiberty/regex.c has the following macro
definitions.

    #define REGEX_ALLOCATE  alloca
    #define alloca( size)   __builtin_alloca( size)

This should be written as follows:

    #define REGEX_ALLOCATE( size)   alloca( size)

Why did they omit (size)?

In addition, regex.c also has another alloca, which is defined as
follows:

    #define alloca  __builtin_alloca

Their writing style is inconsistent.

Furthermore, regex.c has a #include "regex.c" line, which includes
itself.  regex.c is a strange and unnecessarily complicated source.
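
The difference between the two styles of definition can be seen in the
following sketch.  It is an illustration under assumptions, not code
from regex.c: counted_alloc, my_alloca, OBJ_ALLOCATE and FUNC_ALLOCATE
are hypothetical names, and malloc is used in place of alloca only to
keep the sketch portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static size_t total_requested;      /* hypothetical bookkeeping     */

/* Hypothetical stand-in for __builtin_alloca.                      */
static void *counted_alloc( size_t size)
{
    assert( size > 0);
    total_requested += size;
    return malloc( size);
}

#define my_alloca( size)        counted_alloc( size)

/* Style criticized above: an object-like macro whose expansion is
 * the name of a function-like macro.                               */
#define OBJ_ALLOCATE            my_alloca

/* Style recommended above: the argument is written explicitly.     */
#define FUNC_ALLOCATE( size)    my_alloca( size)

static size_t allocate_in_both_styles( void)
{
    void    *p, *q;

    total_requested = 0;
    /* OBJ_ALLOCATE becomes my_alloca, which is then expanded only
     * because "( 16)" happens to follow it in the source text.     */
    p = OBJ_ALLOCATE( 16);
    q = FUNC_ALLOCATE( 8);
    free( p);
    free( q);
    return total_requested;
}
```

Both invocations end up calling counted_alloc, but the object-like form
depends on the tokens that follow the macro name in the source, while
the function-like form is complete in its own definition.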

3.9.6.6     Macros Expanded to 'defined'

GCC 3.2 sources do not have macros expanded to 'defined'.  According to
GCC 3.2 documents, this type of macro is preprocessed in the same way as
in GCC 2/cpp, but GCC 3.2 issues a warning that it "may not be
portable".  However, when I tested, GCC 3.2 did not seem to issue a
warning for the example shown in 3.9.4.6.
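
A portable way to get the same effect is to evaluate 'defined' directly
in a #if line and to define a plain 0 or 1 macro from the result.  The
following is a minimal sketch; HAVE_FEATURE_X, USE_X and
feature_x_in_use are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical configuration macro.                                */
#define HAVE_FEATURE_X  1

/*
 * Not portable, because 'defined' would appear as a result of macro
 * expansion in #if (undefined behavior in the C Standard):
 *     #define USE_X   defined( HAVE_FEATURE_X)
 *     #if USE_X
 *
 * Portable: 'defined' is written in the #if line itself.
 */
#if defined( HAVE_FEATURE_X)
#define USE_X   1
#else
#define USE_X   0
#endif

static int feature_x_in_use( void)
{
    assert( USE_X == 0 || USE_X == 1);
#if USE_X
    return 1;
#else
    return 0;
#endif
}
```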

3.9.6.7     Preprocessing of .S Files

The gcc/config directory has several *.S files.

3.9.6.8     rpcgen and -dM Option

Make of GCC 3.2 uses neither rpcgen nor the -dM option.  However, the
specifications for rpcgen and the -dM option do not seem to have changed
from the previous versions.

3.9.6.9     -include, -isystem and -I- Options

These options are frequently used in make of GCC 3.2.  Sometimes, the
-isystem option is used to specify several system include directories at
one time.  Is it inevitable to use this option when compiling software
that updates the system headers themselves?  I think it would be better
to use an environment variable to specify all the system include
directories.

On the other hand, GCC 3/cpp documents discourage the use of the
-iwithprefix and -iwithprefixbefore options.  GCC provides many options
to specify include directories.  Is GCC 3.2 moving toward reorganizing
them or reducing their number? [1]

Note:
[1]  GCC 3.2 Makefile uses the -iprefix option in a stand-alone manner
(without using -iwithprefix or -iwithprefixbefore), although the
-iprefix option makes sense only when used with one of these two options
following it.

3.9.6.10    Undocumented Predefined Macros

GCC 2 did not document predefined macros, such as __VERSION__,
__SIZE_TYPE__, __PTRDIFF_TYPE__ and __WCHAR_TYPE__.  Even with the -dM
option, their existence was unknown.  GCC 3 not only documents them but
also enhances -dM to show their values.

3.9.6.11    Undocumented Environment Variables

GCC 3 documents the SUNPRO_DEPENDENCIES environment variable, which GCC
2 did not.  (I do not know why this environment variable is needed.)

3.9.6.12    Other Problems

GCC 3 implements the following #pragmas:

    #pragma GCC poison
    #pragma GCC dependency
    #pragma GCC system_header

Of these, GCC 3.2 sources use poison and system_header.  MCPP does not
support these #pragmas because I do not think them necessary. (I omit
explanation of their specifications.)

GCC 3 deprecates assertion directives, such as #assert, although gcc, by
default, specifies the -A option.

In GCC 2, the -traditional option is implemented in one and the same cpp,
resulting in a strange mixture of very old specifications and C99 ones.
In GCC 3, the preprocessor was divided into two: non-traditional cpp0
and tradcpp0.  The -traditional option is valid only for gcc.  cpp0 does
not provide it.  gcc -traditional invokes tradcpp0 for preprocessing.

tradcpp0 is getting closer to a true traditional preprocessor before C90.
They say that they no longer maintain tradcpp0 except for serious bugs.

The strange specifications of GCC 2/cpp seem to have been significantly
revised.

3.9.7   To Use MCPP with GCC 3 or 4

As seen above, as far as preprocessing is concerned, GCC 3.2 sources
are much improved over glibc 2.1.3 sources in that the traditional way
of writing has been almost eliminated and that meaningless options are
no longer used.

GCC 3.2/cpp0 itself is also much superior to GCC 2/cpp in that it
regards traditional specifications as obsolete and articulates the
token-based principle.  Undocumented specifications have been
significantly reduced.  Although these improvements are still not
sufficient, GCC is certainly moving in the right direction.

However, GNU system headers have become so complex that it is difficult
to grasp their entire structure, which may be one of the biggest causes
of problems in the GNU system.

Another regrettable fact is that the preprocessor has been absorbed into
the compiler-proper.  Therefore, to use MCPP, the -no-integrated-cpp
option must be specified when invoking gcc or g++.  If you compile a
large number of source files with complicated or many makefiles, or if
some program automatically invokes gcc, you must create a shell-script
that invokes gcc or g++ with the -no-integrated-cpp option automatically
specified.

Let me take an example of this.  Place the following shell-scripts in
the directory where gcc and g++ reside (on my Linux,
/usr/local/gcc-3.2/bin), under the names of gcc.sh and g++.sh,
respectively.

    #!/bin/sh
    /usr/local/gcc-3.2/bin/gcc_proper -no-integrated-cpp "$@"

    #!/bin/sh
    /usr/local/gcc-3.2/bin/g++_proper -no-integrated-cpp "$@"

Move to this directory and enter the following commands:

    chmod a+x gcc.sh g++.sh
    mv gcc gcc_proper
    mv g++ g++_proper
    ln -sf gcc.sh gcc
    ln -sf g++.sh g++

In the directory where cpp is located (on my Linux,
/usr/local/gcc-3.2/lib/gcc-lib/i686-pc-linux-gnu/3.2), create a script
that executes MCPP when cpp0 is invoked, as you did for GCC 2 (See
3.9.5).  By doing this, gcc or g++ first invokes MCPP and then invokes
cc1 or cc1plus with the -fpreprocessed option appended.  -fpreprocessed
indicates the source has been preprocessed already. [1]

Note that when a GCC version other than the system standard one is
installed, additional include directory settings may be required.  MCPP
embeds these settings when MCPP itself is compiled, thus eliminating the
need to set them with environment variables.

On my Linux, the /usr/local/gcc-3.2/lib line has been added to
/etc/ld.so.conf, and the following setting has been added to
~/.bash_profile.

    export PATH=/usr/local/gcc-3.2/bin:$PATH

If possible, I would like to replace the cpplib source, the
preprocessing part of cc1 or cc1plus, with MCPP.  However, the source
files that define the internal interface between cpplib and cc1 or
cc1plus, as well as the external interface between cpplib and user
programs that use it, amount to as much as 46KB.  Replacement is
practically impossible.  Why are the interfaces so complex?  It is a
pity.

Note:
[1] MCPP gets all the necessary information by 'configure' and sets
these scripts by 'make install'.

3.9.7.1     To Use MCPP with GCC 3.3 and 3.4-4.1

Although GCC 3.2 seemed to go in the direction of better portability,
GCC turned toward a different goal in 3.3 and 3.4.  V.3.3 and 3.4
differ from 3.2 in the following points.

  1. The independent preprocessor cpp0 was abolished.  The execution
    option '-no-integrated-cpp' changed its meaning: gcc invokes cc1
    (cc1plus) instead of cpp0 as a preprocessor even if this option is
    specified, and gcc passes to the preprocessor some options which are
    irrelevant to preprocessing. (What a dirty implementation!)
  2. Many (several dozen) macros are predefined.  The relationship
    between the system headers and GCC became more complicated.
  3. Tradcpp was also abolished and absorbed into cc1 as an execution
    option.  Some old specifications, which were obsoleted or deprecated
    in V.3.2, were restored.

GCC / cc1 is becoming one huge and complex compiler that absorbs the
preprocessor and the contents of some system headers.  I doubt whether
this is a better way of constructing a compiler, especially an open
source one.

As regards MCPP, it is a nuisance that gcc arbitrarily hands some
irrelevant options to the preprocessor.  Since it is risky to ignore all
the options unrecognized by MCPP, I did not adopt that approach.
Although MCPP ignores pseudo-options such as -c or -m*, which are
frequently handed from gcc, it will report an error if other unexpected
options are passed on.

In order to avoid conflicts with those wrong options, MCPP changed some
options, -c to -@compat, -m to -e, and some others.

To use MCPP with GCC 3.2 or earlier, it is necessary only to replace the
invocation of cpp0 with MCPP.  To use MCPP with GCC 3.3 or later, it is
necessary to divide the invocation of cc1 into MCPP and cc1.
src/set_mcpp.sh will write shell-scripts for this purpose in the GCC
libexec directory on MCPP installation.  The 'make install' command will
also get the GCC predefined macros using the -dM option and set them for
MCPP. [1], [2]

In addition, GCC 3.4 changed the processing of multi-byte characters.
Its documentation says:

  1. It converts every encoding of multi-byte characters to UTF-8 at
    the first phase of preprocessing.
  2. It uses libiconv functions to convert, therefore it can handle all
    the encodings iconv can do.
  3. It has '-finput-charset=<encoding>' option to specify the source
    file's encoding.  (In other words, the encoding is not converted
    without this option.)
  4. It has also '-fexec-charset=<encoding>' option to specify the
    output encoding which defaults to UTF-8.

There is a trend to identify "internationalization" with "unicodization",
especially among Western people who do not use multi-byte characters.
It seems that this trend has reached GCC.

What is worse, GCC 3.4 or later does not implement its specification
sufficiently.  In practice, it behaves as follows: [3]

  1. As for EUC-JP, GB2312 and KSC-5601, it converts these encodings to
    UTF-8 correctly with the -finput-charset option, and it passes them
    as they are without this option.
  2. The -fexec-charset option has no effect on V.3.4 or V.4.0.  On
    V.4.1, the option has effect, and works correctly for EUC-JP, GB2312
    and KSC-5601.
  3. As for Big5, GCC can convert the encoding to UTF-8; however, it
    misbehaves in converting back to Big5 with the -fexec-charset option
    even on V.4.1.
  4. With ISO2022-JP, it cannot preprocess on V.3.4 or 4.0, whereas on
    V.4.1 it can preprocess the encoding but misbehaves in converting
    back with the -fexec-charset option.
  5. With shift-JIS, all the versions get confused in preprocessing.

MCPP takes the -e <encoding> option to specify an encoding.  When the
encoding is one of BIG-5, shift-JIS or ISO2022-JP, the
GCC-specific-build inserts a <backslash> at any byte in a multi-byte
character that has the same value as <backslash>, '"' or '\'', in order
to compensate for GCC's inability.  However, it does not convert the
encoding to UTF-8.  MCPP also treats -finput-charset as the same option
as -e.  I adopted these specifications because: [4], [5]

  1. As for shift-JIS, ISO2022-JP and Big5, the encoding converted to
    UTF-8 cannot be converted back correctly even on GCC V.4.1.  If the
    encoding is not converted and <backslash>es are supplemented, the
    multi-byte characters are output as they are by cc1 on any version
    of GCC.  That is to say, the multi-byte characters are treated as
    single-byte character sequences.
  2. GCC / cc1 up to V.4.0 does not convert back from UTF-8 to the
    original encodings.
  3. I hope that GCC will change the multi-byte character handling in
    the near future.
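
The idea of supplementing <backslash> can be sketched for shift-JIS as
follows.  This is an illustration only, not MCPP's actual code.  It
assumes the usual shift-JIS lead byte ranges (0x81-0x9F and 0xE0-0xFC)
and assumes that the <backslash> is placed immediately before the byte
in question; sjis_lead and supplement_backslash are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if c is the first byte of a two-byte shift-JIS character. */
static int sjis_lead( unsigned char c)
{
    return (0x81 <= c && c <= 0x9F) || (0xE0 <= c && c <= 0xFC);
}

/*
 * Copy 'in' to 'out'.  When the second byte of a two-byte character
 * has the value of backslash, '"' or '\'', put a backslash before it.
 * (In shift-JIS only the backslash value can actually occur there;
 * the quote checks merely mirror the description in the text.)
 * Returns the output length, or -1 if 'out' is too small.
 */
static int supplement_backslash( const char *in, char *out,
        size_t outsize)
{
    const unsigned char *p = (const unsigned char *) in;
    size_t  o = 0;

    assert( in != NULL && out != NULL);
    while (*p != '\0') {
        if (sjis_lead( *p) && p[ 1] != '\0') {
            if (o + 4 > outsize)    /* lead + '\\' + trail + NUL    */
                return -1;
            out[ o++] = (char) *p++;
            if (*p == '\\' || *p == '"' || *p == '\'')
                out[ o++] = '\\';
            out[ o++] = (char) *p++;
        } else {
            if (o + 2 > outsize)    /* one byte + NUL               */
                return -1;
            out[ o++] = (char) *p++;
        }
    }
    out[ o] = '\0';
    return (int) o;
}
```

For example, the two-byte character 0x95 0x5C becomes the three bytes
0x95 0x5C 0x5C, which a compiler that reads single bytes then takes as
0x95 followed by an escaped backslash.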

Note:
[1] MinGW does not support symbolic links.  Though the 'ln -s' command
exists, it does not link but only copies.  Moreover, MinGW's GCC refuses
to invoke a shell-script even if it is named cc1.  To cope with this,
MCPP's MinGW GCC-specific-build generates a binary executable named
cc1.exe (copied also to cc1plus.exe) which invokes mcpp.exe or GCC's
cc1.exe/cc1plus.exe.

[2] CygWIN / GCC has the -mno-cygwin option, which alters the system
include directory and GCC's predefined macros.  From MCPP V.2.6.1
onward, the CygWIN GCC-specific-build supports this option and generates
two sets of header files for the predefined macros.

[3] On GCC in my FreeBSD 5.3, multi-byte character conversion to UTF-8
does not work at all, though libiconv is linked.

[4] When you pass the output of MCPP to cc1, you should specify neither
the -fexec-charset option nor the -finput-charset option.

[5] MCPP V.2.5 did not supplement <backslash> for GCC V.3.4.  Since
GCC's behavior was not stable, I waited to see where it would settle
down.  However, it is still not stable on V.4.1, hence I reverted the
MCPP specification in V.2.6 to that of V.2.4; that is, it supplements
<backslash> for GCC V.3.4 and later as well as for V.3.3 and earlier.

3.9.8   The Problems of Linux / stddef.h, limits.h and #include_next

On Linux, GCC installs a version specific include directory such as
/usr/lib/gcc-lib/i386-vine-linux/3.3.2/include where the Standard
headers stddef.h, limits.h and some others are located.  These headers
and GCC behavior on them have some problems.  The problems are the same
on CygWIN as on Linux.

3.9.8.1     /usr/include Lacks Standard Headers

To begin with, on Linux, five of the Standard C header files, float.h,
iso646.h, stdarg.h, stdbool.h and stddef.h, are located only in the GCC
version specific directory, not in /usr/include or /usr/local/include.
The system headers on Linux seem to be designed so that compiler systems
other than GCC use only /usr/include, while GCC uses its version
specific directory in addition to /usr/include.  In fact, /usr/include
lacks some Standard headers, which is a problem for non-GCC compilers
and preprocessors.

If so, what if a non-GCC preprocessor also uses the GCC version specific
directory?  Then, in limits.h in this directory, the preprocessor
encounters #include_next, which is a GCC specific directive.  In that
case, why not have the preprocessor implement #include_next?  Then
limits.h causes a problem, because it is not cleanly written.  What is
worse, GCC V.3.3 or later practically predefines by itself the macros to
be defined by limits.h, hence the header is useless for other
preprocessors.

Besides, as for GCC itself, it shows queer behavior with #include_next
in this header.

Although these problems are complicated to explain, I will describe them
here, because they have been neglected for years for some reason.

3.9.8.2     Queer Handling of #include_next

The include directories for GCC are typically set as:

    /usr/local/include
    /usr/lib/gcc-lib/SYSTEM/VERSION/include
    /usr/include

These are searched from top to bottom.  The second is the GCC specific
include directory.  SYSTEM is i386-vine-linux, i386-redhat-linux or such,
VERSION is 3.3.2, 3.4.3 or such.  If you install another version of GCC
into /usr/local, the /usr/lib/gcc-lib part above will become
/usr/local/lib/gcc.  In C++, some other directories are set with higher
priority than /usr/local/include.  For GCC V.3.* and 4.*, those are:

    /usr/include/c++/VERSION
    /usr/include/c++/VERSION/SYSTEM
    /usr/include/c++/VERSION/backward

The names of these directories seem GCC specific; nevertheless, no other
C++ standard directories exist, so other preprocessors can use no
directories but these.  For GCC 2.95, the include directory in C++ was:

    /usr/include/g++-3

In addition, the directories specified by -I option or by environment
variables are prepended to the list.

Let me take an example of limits.h in C on GCC V.3.3 or later focusing
on definition of LONG_MAX, in order to make the explanations below
simple.  There are two limits.h: one in /usr/include and another in the
version specific directory.

    #include <limits.h>

By this line, GCC includes
/usr/lib/gcc-lib/SYSTEM/VERSION/include/limits.h.  This header file
starts as:

    #ifndef _GCC_LIMITS_H_
    #define _GCC_LIMITS_H_
    #ifndef _LIBC_LIMITS_H_
    #include "syslimits.h"
    #endif

Then, GCC includes /usr/lib/gcc-lib/SYSTEM/VERSION/include/syslimits.h,
which is a short file as follows:

    #define _GCC_NEXT_LIMITS_H
    #include_next <limits.h>
    #undef _GCC_NEXT_LIMITS_H

Now, limits.h is included again.  Which limits.h?  Since this directive
is #include_next, it would skip /usr/lib/gcc-lib/SYSTEM/VERSION/include
and search /usr/include.  GCC's cpp.info says:

    This directive works like `#include' except in searching for the
    specified file: it starts searching the list of header file
    directories _after_ the directory in which the current file was
    found.

In fact, however, GCC does not include /usr/include/limits.h, but
includes /usr/lib/gcc-lib/SYSTEM/VERSION/include/limits.h again somehow.
This time _GCC_LIMITS_H_ has been defined already, so the block
beginning with the line:

    #ifndef _GCC_LIMITS_H_

is skipped, and the next block is evaluated:

    #else
    #ifdef _GCC_NEXT_LIMITS_H
    #include_next <limits.h>
    #endif
    #endif

Again, this is just the same #include_next <limits.h> that was found in
/usr/lib/gcc-lib/SYSTEM/VERSION/include/syslimits.h.  Does GCC include
/usr/lib/gcc-lib/SYSTEM/VERSION/include/limits.h, which is the current
file, again as it did the previous time, and run into infinite
recursion?  No, it does not; it includes /usr/include/limits.h this
time.  The behavior of GCC is beyond my understanding.

In /usr/include/limits.h, <features.h> and some other headers are
included.  Also, /usr/include/limits.h has a block beginning with the
line:

    #if !defined __GNUC__ || __GNUC__ < 2

In this block, <bits/wordsize.h> is included, and the Standard required
macros are defined depending on whether wordsize is 32 bit or 64 bit.
For example, if wordsize is 32 bit, LONG_MAX is defined as:

    #define LONG_MAX     2147483647L

Of course, GCC skips this block.  Then, reaching the end of this file,
it returns to /usr/lib/gcc-lib/SYSTEM/VERSION/include/limits.h.  Then,
at the end of the second inclusion of that file, it returns to
/usr/lib/gcc-lib/SYSTEM/VERSION/include/syslimits.h.  Then, this file
ends too, and GCC returns to the first inclusion of
/usr/lib/gcc-lib/SYSTEM/VERSION/include/limits.h.  In this file, after
the part cited above, there are definitions of the Standard required
macros.  For instance, LONG_MAX is defined as:

    #undef LONG_MAX
    #define LONG_MAX __LONG_MAX__

Then, the file ends.

    #include <limits.h>

The processing of this line has ended.  After all, LONG_MAX is defined
as __LONG_MAX__ and that is all.  What is __LONG_MAX__?  As a matter of
fact, GCC V.3.3 or later predefines many macros including __LONG_MAX__,
which is predefined as 2147483647L on a 32 bit system.  The situation
for the other Standard required macros is almost the same as for
LONG_MAX, because they are defined using the predefined ones.  If so,
what is the purpose of these complicated header files and the
#include_next handling at all?

The behavior of GCC V.2.95, V.3.4, V.4.0 and V.4.1 on #include_next is
the same as V.3.3.  That is to say:

    #include_next <limits.h>

by this line in /usr/lib/gcc-lib/SYSTEM/VERSION/include/syslimits.h, GCC
includes /usr/lib/gcc-lib/SYSTEM/VERSION/include/limits.h, and by the
same line in this file:

    #include_next <limits.h>

it includes /usr/include/limits.h.  As a result, in processing the line:

    #include <limits.h>

/usr/lib/gcc-lib/SYSTEM/VERSION/include/limits.h is included twice.
This duplicate inclusion happens to produce the same result;
nevertheless, it is redundant, and above all, the behavior differs from
the specification and is not consistent.  In addition, this part of the
file would be redundant if the behavior accorded with the specification.

    #else
    #ifdef _GCC_NEXT_LIMITS_H
    #include_next <limits.h>
    #endif
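
The search order that the cpp.info text describes can be modeled as
follows.  This is only a sketch: struct inc_dir, dir_has, search_from
and sample_dirs are hypothetical names, and the table merely mirrors the
three C include directories listed earlier in this section.  It is
neither GCC's nor MCPP's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one include directory and its headers.     */
struct inc_dir {
    const char  *path;
    const char  *headers[ 4];       /* NULL-terminated list         */
};

static int dir_has( const struct inc_dir *d, const char *name)
{
    size_t  i;

    for (i = 0; i < 4 && d->headers[ i] != NULL; i++)
        if (strcmp( d->headers[ i], name) == 0)
            return 1;
    return 0;
}

/*
 * Search dirs[ start] .. dirs[ n - 1] and return the index of the
 * first directory that has 'name', or -1 if none has it.
 * #include <name> corresponds to start == 0.  #include_next, as
 * documented, corresponds to start == (index of the directory in
 * which the current file was found) + 1.
 */
static int search_from( const struct inc_dir *dirs, int n, int start,
        const char *name)
{
    int     i;

    assert( dirs != NULL && name != NULL && start >= 0);
    for (i = start; i < n; i++)
        if (dir_has( &dirs[ i], name))
            return i;
    return -1;
}

static const struct inc_dir sample_dirs[] = {
    { "/usr/local/include", { NULL } },
    { "/usr/lib/gcc-lib/SYSTEM/VERSION/include",
            { "limits.h", "syslimits.h", NULL } },
    { "/usr/include", { "limits.h", NULL } },
};
```

In this model, #include <limits.h> finds index 1, and #include_next
<limits.h> issued from syslimits.h (also found at index 1) starts at
index 2 and finds /usr/include/limits.h directly, without the second
visit to the version specific limits.h.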

3.9.8.3     Standard Headers not Available for Preprocessors
                    other than GCC

Now, what happens to a compiler system or preprocessor other than GCC
that uses the Linux standard headers?  stddef.h and some other Standard
headers are found in neither /usr/include nor /usr/local/include.  If
so, what if a non-GCC preprocessor also uses the GCC version specific
directory?

    #include <limits.h>

By this line, the preprocessor includes
/usr/lib/gcc-lib/SYSTEM/VERSION/include/limits.h, and from this file it
includes /usr/lib/gcc-lib/SYSTEM/VERSION/include/syslimits.h, and in
this file, it sees the line:

    #include_next <limits.h>

Then, what if the preprocessor implements #include_next?  If
#include_next is implemented according to its specification, the
preprocessor searches by this line the "next" include directory
/usr/include, and includes /usr/include/limits.h.  Then, this non-GCC
preprocessor processes the block beginning with this line:

    #if !defined __GNUC__ || __GNUC__ < 2

In this block it defines LONG_MAX as:

    #define LONG_MAX     2147483647L

and also defines the other macros appropriately.  Then, it reaches the
end of this file and returns to
/usr/lib/gcc-lib/SYSTEM/VERSION/include/syslimits.h.  Then, it reaches
the end of that file and returns to
/usr/lib/gcc-lib/SYSTEM/VERSION/include/limits.h, where it encounters
these lines:

    #undef LONG_MAX
    #define LONG_MAX __LONG_MAX__

At the end of the long run, all the correct definitions are canceled,
and they become the undefined name __LONG_MAX__ or such!

Up to GCC V.3.2, the corresponding part of the version specific limits.h
had lines like:

    #define __LONG_MAX__ 2147483647L

Hence the canceled macros were redefined correctly.  Although most of
the processing was useless, the results were correct.  With the header
files of V.3.3 or later, a non-GCC preprocessor is led around here and
there only to get useless results.
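
A source file can guard itself against this situation.  In a #if
expression, an identifier that is not a defined macro is evaluated as 0,
so if LONG_MAX ends up as an undefined name such as __LONG_MAX__, the
check below fails at preprocessing time.  This is a sketch only: the
function name long_max_is_usable is hypothetical, and 2147483647L is
the minimum magnitude that the C Standard requires for LONG_MAX.

```c
#include <assert.h>
#include <limits.h>

/* If LONG_MAX expands to an undefined identifier, the expression
 * below is evaluated as 0 < 2147483647L and the #error is issued.  */
#if LONG_MAX < 2147483647L
#error "LONG_MAX is unusable: <limits.h> may rely on an undefined macro"
#endif

/* Run-time counterpart of the check above.                         */
static int long_max_is_usable( void)
{
    long    v = LONG_MAX;

    assert( v > 0);
    return v >= 2147483647L;
}
```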

3.9.8.4     Workarounds for the Present

The problems are summarized as below: [1], [2], [3]

  1. /usr/include lacks the Standard C headers float.h, iso646.h,
    stdarg.h, stdbool.h and stddef.h, which are necessary to make Linux
    system headers available to non-GCC compiler systems.
  2. C++ include directories do not exist other than
    /usr/include/c++/VERSION/*.  In order to make the C++ standard
    include directories independent of the GCC version, /usr/include/c++
    should be used instead of /usr/include/c++/VERSION, which should be
    limited to the headers of GCC proper.
  3. The behavior of GCC on #include_next differs from its specification
    and is inconsistent.
  4. It is meaningless to process the complicated limits.h headers,
    since GCC practically predefines the Standard required macros.  It
    is doubly meaningless, since
    /usr/lib/gcc-lib/SYSTEM/VERSION/include/limits.h does #undef all of
    them.  As far as Linux and CygWIN are concerned, there seems to be
    no necessity for splitting limits.h into two.  Since the headers in
    this directory are autogenerated by GCC installation, some
    redundancies are inevitable.  Yet, a cleaner method is desirable.

Under these problems lies the excessively complicated system header
structure.  The extension directive #include_next adds to the
complication.  The use of this directive is very limited.  Though GCC
and glibc use it in compiling and installing themselves, it does not
exist in the installed system headers except for limits.h.  The rare
example in limits.h causes the confusion in GCC mentioned above.  This
raises a question about the reason for its existence.

Anyway, the stand-alone-build of MCPP needs the following workarounds
for the present.  The stand-alone-build neither implements #include_next
nor uses the GCC specific include directories.

  1. Link /usr/include/stddef.h to
    /usr/lib/gcc-lib/SYSTEM/VERSION/include/stddef.h.  In case multiple
    versions of GCC have been installed, a link to any of them will work
    for mere preprocessing.  This setting does no harm to GCC or the
    GCC-specific-build of MCPP.  The same can be said about stdarg.h,
    though it expands macros to GCC built-in functions.
  2. Copy or move iso646.h and stdbool.h from one of the GCC specific
    directories to /usr/include, since these are quite simple headers
    and independent of any system.  As for limits.h, the existing
    /usr/include/limits.h is enough for a non-GCC preprocessor.
  3. float.h is useless for other preprocessors, since, for example,
    DBL_MAX_EXP is defined as __DBL_MAX_EXP__.  If required, you must
    write the header referring to the internal settings of GCC or some
    other source. [4]
  4. Do not set the GCC specific include directory in the C include
    directories list by environment variable.
  5. Set the C++ include directories by the environment variable
    CPLUS_INCLUDE as
    /usr/include/c++/VERSION:/usr/include/c++/VERSION/SYSTEM:/usr/include/c++/VERSION/backward.

For the GCC-specific-build of MCPP, no special setting is required,
because it has the GCC specific include directories list, implements
#include_next according to its specification, and, in the build for GCC
V.3.3 or later, predefines the macros as GCC does.

Note:
[1] I have checked the descriptions of this 3.9.8 section on Linux / GCC
2.95.3, 3.3.2, 3.4.3, 4.0.2, 4.1.1 and on CygWIN / GCC 2.95.3, 3.4.4.
On CygWIN, the behavior on #include_next conformed to its specification
on GCC 2.95.3, but on 3.4.4 it changed to the same behavior as on Linux.
The C++ include directory on CygWIN was /usr/include/g++-3 on 2.95.3,
whereas it is /usr/lib/gcc/i686-pc-cygwin/3.4.4/include/c++ and its
sub-directories on 3.4.4.

[2] On FreeBSD 5.3 and its bundled GCC 3.4.2, all the Standard C headers
are present in /usr/include, #include_next is not used in any system
headers, and a GCC specific C include directory does not exist.  However,
the C++ include directories are GCC version dependent, as
/usr/include/c++/3.4 and /usr/include/c++/3.4/backward, which is a pity.
Even on FreeBSD, an installation of another version of GCC makes a
GCC-version-specific include directory.  Most of the headers in the
directory are redundant.  However, the headers in /usr/include remain
unchanged.

[3] On MinGW, though the include directories and their precedence differ
from the other systems, the behavior of GCC on #include_next is the same,
and some Standard headers are likewise located not in the standard
include directory /mingw/include but in its version-specific directory.

[4] It can be written as follows referring to GCC's setting:

/* float.h  */ 

#ifndef _FLOAT_H___
#define _FLOAT_H___

#define FLT_ROUNDS      1
#define FLT_RADIX       2

#define FLT_MANT_DIG    24
#define DBL_MANT_DIG    53
#define LDBL_MANT_DIG   64

#define FLT_DIG         6
#define DBL_DIG         15
#define LDBL_DIG        18

#define FLT_MIN_EXP     (-125)
#define DBL_MIN_EXP     (-1021)
#define LDBL_MIN_EXP    (-16381)

#define FLT_MIN_10_EXP  (-37)
#define DBL_MIN_10_EXP  (-307)
#define LDBL_MIN_10_EXP (-4931)

#define FLT_MAX_EXP     128
#define DBL_MAX_EXP     1024
#define LDBL_MAX_EXP    16384

#define FLT_MAX_10_EXP  38
#define DBL_MAX_10_EXP  308
#define LDBL_MAX_10_EXP 4932

#define FLT_MAX         3.40282347e+38F
#define DBL_MAX         1.7976931348623157e+308
#define LDBL_MAX        1.18973149535723176502e+4932L

#define FLT_EPSILON     1.19209290e-7F
#define DBL_EPSILON     2.2204460492503131e-16
#define LDBL_EPSILON    1.08420217248550443401e-19L

#define FLT_MIN         1.17549435e-38F
#define DBL_MIN         2.2250738585072014e-308
#define LDBL_MIN        3.36210314311209350626e-4932L

#if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define FLT_EVAL_METHOD 2
#define DECIMAL_DIG     21
#endif /* C99 */

#endif /* _FLOAT_H___ */


3.10    Visual C++ System Header Problems

I used MCPP to preprocess some sample programs provided by Visual C++
2003.  The system headers seem to have only the few compatibility
problems shown below.  These problems are often seen in other compiler
systems and do not have a serious impact on preprocessing.

  1.  Although Visual C++ scarcely implements the C99 specifications, //
    comments are often used in C source code.
  2.  Object-like macro definitions that are expanded into function-like
    macro names are sometimes found.
  3.  There is one erroneous macro definition in limits.h. (See Note 2
    in 4.1.3.1 of cpp-test.txt)

Although the Linux and glibc system headers often contain coding based
on GCC local specifications, Visual C++ system headers scarcely have
Visual C++ local coding.

3.10.1  Comment Generating Macro?

I found only one outrageous macro in Visual C++.  Vc7/PlatformSDK/
Include/WTypes.h has the following macro definition: [1]

    #define _VARIANT_BOOL   /##/

This macro definition is used in oaidl.h and propidl.h in Vc7/
PlatformSDK/Include/ as follows:

    _VARIANT_BOOL bool;

What does this macro aim at?

This macro seems to expect _VARIANT_BOOL to be expanded into // and the
line to be commented out.  Actually, this expectation is met in Visual C
cl.exe!

In the first place, // is not a token (preprocessing-token).  Macro
definitions should be processed and expanded after the source is parsed
into tokens and each comment is converted into one space.  Therefore, it
is irrational for a macro to generate comments.  When this macro is
expanded into //, the result is undefined because // is not a valid
preprocessing-token.

In order to use these header files with MCPP, comment out these macro
definitions and change many _VARIANT_BOOL occurrences as follows:

    #if !__STDC__ && (_MSC_VER <= 1000)
        _VARIANT_BOOL bool;
    #endif

If you use only Visual C 5.0 or higher, this line can be simply
commented out as follows:

    // _VARIANT_BOOL bool;

This macro is, indeed, out of the question.  However, it is Visual C/
cl.exe, which allows such an outrageous macro to be preprocessed as a
comment, that should be blamed.  This example reveals the following
serious problems in this preprocessor:

  1.  Preprocessing is not token-based but character-based.
  2.  The macro expansion result is treated as a comment, which
    indicates that the translation phases are confused.

Probably, the cl.exe preprocessor was developed based on a very old
somewhat character-based preprocessor.  It is easy to presume that the
preprocessor has been upgraded by repeating partial revision to the old
preprocessor.

There are many preprocessors which presumably have a very old program
structure.  GCC 2/cpp, shown in 3.9, is one such preprocessor.
Repeated partial revision of such a preprocessor will only make its
program structure more complicated.  However much such revision may be
made, there are limits to the quality such a preprocessor can achieve.
Unless the old source is given up and completely rewritten, a clear and
well-structured preprocessor cannot be obtained.

At GCC 3/cpp0, a total revision was made to GCC 2; the entire source
code was rewritten.  So, GCC 3/cpp0 has become quite different from GCC
2. Although MCPP was initially developed based on the source of an old
preprocessor, DECUS cpp, the source code was totally rewritten soon.

Note:
[1] Visual C++ 2005 Express Edition does not contain Platform SDK.
However, you can download "Platform SDK for Windows 2003", and use it
with VC2005.  wtypes.h, oaidl.h, propidl.h in this PlatformSDK/Include
directory also have the same macro definition and its usage as VC2003
Platform SDK.


                  4.  Implementation-defined Behaviors

I have neither time nor space to write all the C preprocessor
specifications here.  For details on Standard C preprocessing, refer to
cpp-test.txt.  For MCPP behaviors in each mode, refer to 2.1.  This
chapter covers several preprocessor-related specifications, including
those called implementation-defined by Standards.  For more details on
MCPP implementation-defined-behaviors, see Chapter 5, "Diagnostic
Messages".


4.1     Status Value on Exit

The header file internal.H defines the values MCPP returns to its
parent process.  MCPP returns 0 on success, which means that no error
has occurred.  On error, it returns errno if errno != 0, and 1 if
errno == 0.
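
The rule above can be sketched in C as follows.  This is only an
illustration: the function name exit_status() and its two parameters
are hypothetical and are not taken from the MCPP source.

```c
#include <assert.h>

/* Hypothetical helper, not MCPP's actual code.
 * error_count: number of errors diagnosed during preprocessing.
 * saved_errno: value of errno captured when the run finished. */
static int exit_status(int error_count, int saved_errno)
{
    assert(error_count >= 0);
    if (error_count == 0)
        return 0;                       /* success: no error occurred */
    /* On error: errno if it is set, otherwise 1. */
    return saved_errno != 0 ? saved_errno : 1;
}
```

A parent process such as a compiler driver or make only needs to test
whether the status is zero.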


4.2     Include Directory Search Path

This section explains the order in which MCPP searches directories for
an include file when it encounters a #include directive.

1. If a #include directive argument takes the form of neither
"file-name" nor <file-name>, and is a macro, the macro is expanded.  The
resulting filename must take the form of either "file-name" or
<file-name>.  Otherwise, it causes an error.

2. If the resulting filename, either in form of "file-name" or <file-
name>, is a full path name, MCPP tries to open it.  If it fails, it
causes an error.

3. If the resulting filename is not a full path but takes a form of
"file-name", MCPP regards it as a filename relative from the current
directory or source file directory, and begins searching from that
directory.  The former is a directory from which MCPP was invoked and
the latter is a directory where the source file that includes the
"file-name" resides.  Depending on the specified options and compiler
systems, MCPP begins searching directories as follows:

  If -I1 is specified, search begins from current directory.
  If -I2 is specified, source file directory.
  If -I3 is specified, current first and then source file directory.

By default, the compiler-specific-builds for UNIX compiler systems, GCC
or Visual C begin searching from the source file directory.
The other compiler-specific-builds begin searching at the current
directory, with one exception: the Borland C-specific-build begins at
the current directory for BC 4 or lower, but for BC 5 it searches the
current directory first and then the source file directory.
The stand-alone-build of MCPP begins searching from the source file
directory.
If MCPP fails to find the desired file, it begins searching as shown in
step 4.

In case of a nested #include, if the search begins at the current
directory, the base directory is always the same.  If the search begins
at the source file directory, the base directory changes each time a
header file resides in another directory.

4. If the resulting filename is not a full path name but takes the form
of <file-name>, MCPP searches directories in the following order.  If
any of these directories is specified as a relative path, MCPP regards
it as relative to the current directory.  If MCPP fails to find or open
the desired file after searching all the directories in this order, it
causes an error.

4.1. Directories specified with the -I <directory> option on MCPP
invocation.  If several directories are specified, they are searched in
the order specified (from the left).
4.2. For GCC-specific-build, directories specified with the -isystem
option.  If several directories are specified, they are searched in the
order specified (from the left).
4.3. Directories specified with an environment variable.
ENV_C_INCLUDE_DIR in noconfig.H (configed.H) defines the environment
variable name.  In C++, ENV_CPLUS_INCLUDE_DIR, if defined, takes
precedence over ENV_C_INCLUDE_DIR.  The GCC-specific-build uses
C_INCLUDE_PATH (and also CPLUS_INCLUDE_PATH for C++) as the default
environment variable.  The other builds of MCPP use INCLUDE (and also
CPLUS_INCLUDE for C++) as the default.  If an environment variable
specifies several directories separated by a delimiter, they are
searched in the order specified.  The delimiter is ";" on Windows and
":" on other OSs.
4.4. Implementation-specific directories defined by the
CPLUS_INCLUDE_DIR? macros in noconfig.H (configed.H).
4.5. Site-specific directories defined by setsysdirs() in system.c (For
UNIX systems, /usr/local/include).
4.6. Implementation-specific directories defined by the C_INCLUDE_DIR?
macros in noconfig.H (configed.H).
4.7. System-specific directories defined by setsysdirs() in system.c
(For UNIX systems, /usr/include).

The total number of include directories above must be equal to or less
than the number specified with NINCLUDE in system.H.  With the -I-
option (-nostdinc option for GCC-specific-build and -X for Visual C-
specific-build), the directories specified in 4.4 and later are not
searched.
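
The search order of steps 3 and 4 can be summarized by the following C
sketch.  It is an illustration under stated assumptions, not MCPP's
code: the enum names, search_order() and its parameters are all
hypothetical, i_mode stands for the effective -I1, -I2 or -I3 setting,
and full path names (step 2) are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Search stages, named after the steps of this section (hypothetical). */
enum stage {
    ST_CURRENT_DIR,   /* step 3: directory MCPP was invoked from   */
    ST_SOURCE_DIR,    /* step 3: directory of the including file   */
    ST_OPT_I,         /* 4.1: -I <directory>                       */
    ST_OPT_ISYSTEM,   /* 4.2: -isystem (GCC-specific-build only)   */
    ST_ENVIRONMENT,   /* 4.3: environment variable                 */
    ST_IMPL_CPLUS,    /* 4.4: CPLUS_INCLUDE_DIR? macros            */
    ST_SITE,          /* 4.5: site-specific directories            */
    ST_IMPL_C,        /* 4.6: C_INCLUDE_DIR? macros                */
    ST_SYSTEM         /* 4.7: system-specific directories          */
};

#define MAX_STAGES 9

/* quoted:      nonzero for "file-name", zero for <file-name>
 * i_mode:      1, 2 or 3, as in -I1, -I2, -I3
 * gcc_build:   nonzero for the GCC-specific-build
 * no_std_dirs: nonzero when -I- (or -nostdinc, -X) was given
 * Returns the number of stages written to out[]. */
static size_t search_order(int quoted, int i_mode, int gcc_build,
                           int no_std_dirs, enum stage out[MAX_STAGES])
{
    size_t n = 0;

    assert(out != NULL);
    assert(i_mode >= 1 && i_mode <= 3);

    if (quoted) {
        if (i_mode == 1 || i_mode == 3)
            out[n++] = ST_CURRENT_DIR;
        if (i_mode == 2 || i_mode == 3)
            out[n++] = ST_SOURCE_DIR;
    }
    out[n++] = ST_OPT_I;
    if (gcc_build)
        out[n++] = ST_OPT_ISYSTEM;
    out[n++] = ST_ENVIRONMENT;
    if (!no_std_dirs) {             /* 4.4 and later are skipped by -I- */
        out[n++] = ST_IMPL_CPLUS;
        out[n++] = ST_SITE;
        out[n++] = ST_IMPL_C;
        out[n++] = ST_SYSTEM;
    }
    return n;
}
```

For example, a "file-name" include under -I3 with -I- is searched in the
current directory, the source file directory, the -I directories and
the environment variable directories, in that order.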

ANSI C Rationale says the ANSI committee intends to define a current
directory as base directory.  I think this is acceptable, in that the
base directory is always constant and that the specification is clearer.
However, some implementations, such as UNIX, seem to define a source
file directory as base one at least for #include "header".  The stand-
alone-build of MCPP also takes source file directory as base, according
to the majority.


4.3     How to Construct Header Name

This section explains how to construct a header-name pp-token and
extract a file name from it.

1. If source code contains a header file name in the string literal
format, MCPP regards it as a header-name and removes the " at both ends
to construct a filename.  This also applies to a string literal
resulting from macro expansion in source code.

2. If source code contains a header file name in the <filename> format,
MCPP regards it as a header-name and removes the < and > at both ends
to construct a filename.  This also applies to a <filename> format
sequence resulting from macro expansion.  The spaces in the macro are
retained, except that multiple spaces are squeezed into one space.

3. In any case, MCPP converts \ to /, although both of "\" and "/" can
be used as path delimiters on Windows.

4. On Windows, all the uppercase letters in file names are converted
into lowercase letters.
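
The four rules above can be sketched as one small C function.  This is
a simplified illustration, not MCPP's implementation: the name
header_to_filename() and its parameters are hypothetical, and the
sketch handles single-byte characters only (a multi-byte encoding whose
second byte can be '\' would need extra care).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Builds a file name from a header-name such as "dir\Name.h" or <x.h>.
 * Returns 0 on success, -1 if hdr is not delimited by " " or < >,
 * or if out is too small. */
static int header_to_filename(const char *hdr, int on_windows,
                              char *out, size_t out_size)
{
    size_t len, i;
    char close;

    assert(hdr != NULL && out != NULL);
    len = strlen(hdr);
    if (len < 2)
        return -1;
    if (hdr[0] == '"')
        close = '"';                        /* rule 1 */
    else if (hdr[0] == '<')
        close = '>';                        /* rule 2 */
    else
        return -1;
    if (hdr[len - 1] != close || len - 2 >= out_size)
        return -1;

    for (i = 0; i < len - 2; i++) {
        unsigned char c = (unsigned char)hdr[i + 1];
        if (c == '\\')
            c = '/';                        /* rule 3 */
        else if (on_windows)
            c = (unsigned char)tolower(c);  /* rule 4 */
        out[i] = (char)c;
    }
    out[len - 2] = '\0';
    return 0;
}
```

With on_windows set, the header-name "Sub\Foo.H" becomes sub/foo.h;
without it, <sys/Types.h> becomes sys/Types.h unchanged.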


4.4     Evaluation of #if Expression

Evaluation of #if expression depends on the largest integer type of the
host compiler (by which MCPP was compiled) and that of the target
compiler (which uses MCPP).  Since the stand-alone-build has no target
compiler, the type depends only on the host compiler.

MCPP in Standard mode evaluates #if expression in the common largest
integer type of the host and target compiler.  Nevertheless, MCPP in pre-
Standard mode evaluates it in (signed) long.

In the compiler-systems having type "long long", if __STDC_VERSION__ is
set to 199901L or higher using the -V199901L option, MCPP evaluates a
#if expression in "long long" or "unsigned long long", according to the
C99 specification.  Although C90 and C++98 stipulate that a #if
expression is evaluated in long / unsigned long, MCPP evaluates it in
long long / unsigned long long even in C90 or C++98 mode, and issues a
warning if the value overflows the range of long / unsigned long. [1]

Visual C and Borland C 5.5 do not have a "long long" type, but have an
__int64 type of the same length.  So, a #if expression is evaluated as
__int64 / unsigned __int64.  (However, since LL and ULL suffixes cannot
be used in Visual C++ 2002 or earlier and Borland C 5.5, these suffixes
must not be used in coding other than #if lines.)

In addition, when you invoke with the -+ option for C++ preprocessing,
MCPP evaluates pp-tokens 'true' and 'false' in a #if expression to 1L
and 0L, respectively.

MCPP in Standard mode evaluates #if expression as follows.  For a
compiler without long long, please read "long long" and "unsigned long
long" hereinafter, until the end of 4.5, as "long" and "unsigned long",
respectively.  For pre-Standard mode read all of them as "long".

1. An integer constant token with a U suffix, including character
constants, is evaluated in unsigned long long. (Note that pre-Standard
mode does not recognize the U suffix).
2. Otherwise, a token within the range of non-negative long long is
evaluated in long long.
3. Otherwise, a token within the range of unsigned long long is
evaluated in unsigned long long.
4. Otherwise, it is diagnosed as an out of range error.
5. In a binary operation, if either operand is unsigned long long, both
are converted to unsigned long long.  Otherwise, an operation is
performed in signed long long.
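
Rules 1 to 4 for a single integer token can be sketched as follows.
This is an illustration only: classify_integer() and the enum names are
hypothetical, character constants are left out, and the sketch relies on
strtoull() and <limits.h> from the host C library, which the MCPP source
itself does not do.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

enum tok_type { TOK_LLONG, TOK_ULLONG, TOK_OUT_OF_RANGE };

/* Classifies an integer constant token such as "1", "0x10U" or "077".
 * On success the (always non-negative) value is stored in *value. */
static enum tok_type classify_integer(const char *token,
                                      unsigned long long *value)
{
    char *end;
    unsigned long long v;

    assert(token != NULL && value != NULL);
    errno = 0;
    v = strtoull(token, &end, 0);   /* base 0: decimal, octal or hex */
    if (errno == ERANGE)
        return TOK_OUT_OF_RANGE;    /* rule 4 */
    *value = v;
    for (; *end != '\0'; end++)     /* scan the suffix, if any */
        if (*end == 'u' || *end == 'U')
            return TOK_ULLONG;      /* rule 1 */
    if (v <= (unsigned long long)LLONG_MAX)
        return TOK_LLONG;           /* rule 2 */
    return TOK_ULLONG;              /* rule 3 */
}
```

On a host with 64-bit long long, 9223372036854775808 is classified as
unsigned long long and 18446744073709551616 as out of range.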

Anyway, an integer constant token always has a non-negative value.  In
pre-Standard mode, an integer constant token is evaluated within the
range of non-negative long.  A token beyond that range is diagnosed as
an out of range error.  All the operations are performed within the
range of long.

If both the host and target compilers have type unsigned long long and
the range of unsigned long long of the host is narrower than that of the
target, a token beyond the host range causes an out of range error.

If an operation using constant tokens produces a result out of range of
long long, an out of range error occurs.  If it produces a result out of
range of unsigned long long, a warning is issued.  This can be applied
to intermediate operation results.

Since a bitwise right shift of a negative value or a division operation
using it does not provide portability, MCPP issues a warning.  If an
operation using a mixture of unsigned and signed operands converts a
signed negative value to an unsigned positive value, a warning is also
issued.  How these values are processed depends on the specification of
the compiler-proper of the host system.

C90 and C++98 make it a rule that a preprocessor evaluates a #if
expression in long/unsigned long (in C99, the maximum integer type is
used).  These specifications are rougher than those of compiler-propers.
A (#)if expression is often evaluated differently between preprocessor
and compiler-proper, especially when sign extension is involved.

In addition, since keywords are not used during Standard C preprocessing,
sizeof or cast cannot be used in a #if expression.  Of course, neither
variables, enumeration constants, nor floating point numbers can be used
there.  Standard mode allows the "defined" operator in a #if expression
as well as the #elif directive.  Except for these differences, MCPP
evaluates a #if expression in accordance with the precedence and
associativity of operators, just as compiler-propers do.  In a binary
operation, an arithmetic conversion often takes place to equalize the
types of both operands; if one operand is unsigned long long and the
other is long long, both are converted to unsigned long long.
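
The effect of this conversion on a comparison such as -1 < 0U can be
sketched as follows.  The type pp_value and the functions are
hypothetical names used only for this illustration; they model an
operand that remembers whether it is signed or unsigned.

```c
/* One #if operand: either long long or unsigned long long. */
typedef struct {
    int is_unsigned;            /* nonzero: u is valid, zero: s is valid */
    long long s;
    unsigned long long u;
} pp_value;

static pp_value pp_signed(long long v)
{
    pp_value r;
    r.is_unsigned = 0;
    r.s = v;
    r.u = 0;
    return r;
}

static pp_value pp_unsigned(unsigned long long v)
{
    pp_value r;
    r.is_unsigned = 1;
    r.s = 0;
    r.u = v;
    return r;
}

static unsigned long long as_unsigned(pp_value v)
{
    /* A negative signed value becomes a large positive one here. */
    return v.is_unsigned ? v.u : (unsigned long long)v.s;
}

/* a < b under the conversion rule: if either operand is unsigned,
 * both are compared as unsigned long long. */
static int pp_less(pp_value a, pp_value b)
{
    if (a.is_unsigned || b.is_unsigned)
        return as_unsigned(a) < as_unsigned(b);
    return a.s < b.s;
}
```

Under this rule -1 < 0 is true, but -1 < 0U is false, because -1 is
converted to the largest unsigned long long value before the comparison.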

The Standard C <limits.h> shows the range of the types long, unsigned
long, long long and unsigned long long.  However, the MCPP source code
does not use <limits.h>, because so-called Standard C conforming
compiler systems sometimes have a wrong <limits.h>.

Note:
[1] MCPP up to V.2.5 evaluated #if expression in C90 and C++98 by long
long / unsigned long long internally, and issued an error on overflow of
long / unsigned long.  From V.2.6 onward, MCPP degraded the error to
warning for compatibility with GCC or Visual C.


4.5    Character Constant Evaluation in #if Expression

Constant tokens in a #if expression include identifiers (macros and non-
macros), integer tokens and character constants.  How to evaluate
character constants is implementation-defined and lacks portability.
Even (#)if 'const' is sometimes evaluated differently between
preprocessor and compiler-proper.  Note that the Standards do not even
guarantee that (#)if 'const' is evaluated to the same value in both.

MCPP in POSTSTD mode does not evaluate a character constant in a #if
expression, which is almost meaningless, and makes it an error.

Like other integer constant tokens, MCPP evaluates a character constant
in a #if expression within the range of long long or unsigned long long.
(In pre-Standard mode, long only.)

A multi-byte character or a wide character is generally evaluated as a
2-byte type.  The exception is the UTF-8 encoding: since UTF-8 has a
variable length, MCPP evaluates it as a 4-byte type.  MCPP does not
support EUC's 3-byte encoding scheme. (A 3-byte character is recognized
as 1 byte + 2 bytes.  As a consequence, its value is evaluated
correctly.)  Although some implementations using a 2-byte encoding
scheme define wchar_t as 4 bytes, MCPP has no relevance to wchar_t.  The
following paragraphs describe two-byte multi-byte character encodings.

Multi-byte character constants, such as 'X' where X stands for one
two-byte multi-byte character, are evaluated to ((First byte value <<
CHARBIT) + Second byte value).  CHARBIT has the value of CHAR_BIT in
<limits.h>.

Let me take an example of multi-character character constants, such as
'ab', '\x12\x3', and '\x123\x45'.  'a', 'b', '\x12', '\x3' and '\x123'
are regarded as one byte.  When a multi-character character constant is
evaluated, each one byte, starting from the highest one, is evaluated
within the range of [0, UCHARMAX] and combined by shifting it to left by
CHARBIT.  If the value of one escape sequence exceeds UCHARMAX, an out
of range error occurs.  Therefore, in the implementation with CHARBIT ==
8 and the ASCII character set, the above three tokens are evaluated to
0x6162, 0x1203 and error, respectively.

L'X', where X stands for one two-byte multi-byte character, is evaluated
to the same value as 'X'.  Let me take an example of
multi-character wide character constants, such as L'ab', L'\x12\x3', and
L'\x123\x45'.  L'a', L'b', L'\x12', L'\x3', L'\x123', and L'\x45' are
regarded as one wide character.  When a multi-character wide character
constant is evaluated, each wide character, starting from the highest
one, is evaluated within the range of [0, (UCHARMAX << CHARBIT) |
UCHARMAX] and combined by shifting it to left by CHARBIT * 2.  If the
value of one escape sequence exceeds the maximum value of an unsigned 2-
byte integer, an out of range error occurs.  Therefore, in the
implementation with CHARBIT * 2 == 16 and the ASCII character set, the
above three tokens are evaluated to 0x00610062, 0x00120003, and
0x01230045, respectively.

If the values of a multi-character character constant and a multi-
character wide character constant exceed the range of unsigned long long,
an out of range error occurs.
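
The evaluation of multi-character constants described above can be
sketched as follows.  This is an illustration, not the code of eval.c:
eval_char_constant() and its parameters are hypothetical, and the
caller is assumed to have already split the constant into the values of
its individual characters or escape sequences.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* units:     values of the individual characters or escape sequences
 * count:     number of units
 * unit_bits: CHARBIT for '...' constants, CHARBIT * 2 for L'...' ones
 * Returns 0 and stores the value in *out, or -1 on an out of range
 * error. */
static int eval_char_constant(const unsigned long *units, size_t count,
                              unsigned unit_bits, unsigned long long *out)
{
    unsigned long long unit_max;
    unsigned long long value = 0;
    size_t i;

    assert(units != NULL && out != NULL);
    assert(unit_bits > 0 && unit_bits < sizeof value * CHAR_BIT);
    unit_max = (1ULL << unit_bits) - 1;

    for (i = 0; i < count; i++) {
        if (units[i] > unit_max)
            return -1;      /* one unit exceeds its own range */
        if (value > (ULLONG_MAX >> unit_bits))
            return -1;      /* whole constant exceeds unsigned long long */
        value = (value << unit_bits) | units[i];
    }
    *out = value;
    return 0;
}
```

With unit_bits == 8, the units {0x61, 0x62} give 0x6162 and
{0x123, 0x45} give an error; with unit_bits == 16 the same units give
0x00610062 and 0x01230045, matching the examples above.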

With __STDC_VERSION__ or __cplusplus set to 199901L or higher, MCPP
evaluates a Universal Character Name (UCN) in the form of \uxxxx and
\Uxxxxxxxx as a hex escape sequence. (I know this evaluation is
nonsense, but there is no other way.)

If the compiler-proper of the target compiler system uses a signed char
or signed wchar_t, a character constant in a (#)if expression may be
evaluated differently between MCPP and compiler-proper.  The range that
causes a range error may also differ between them.  In addition,
evaluation of multi-character character constants and multi-byte
character constants varies even among preprocessors and among compilers.
Standard C does not define whether, with CHAR_BIT set to 8, 'ab' is
evaluated to 'a' * 256 +'b' or 'a' + 'b' * 256.

In general, character constants should not be used in an #if expression,
as long as you have an alternative method.  I think an alternative
method always exists.


4.6     #if sizeof (type)

Standard C stipulates that preprocessing is a process independent of run-
time environments or compiler-proper specifications, thus prohibiting it
from using 'sizeof' and cast in an #if expression.  However, pre-
Standard mode allows sizeof (type) in a #if expression.  This was done
as a part of my effort to add necessary modifications to DECUS cpp, such
as adding long long and long double processing, while retaining its
original functionality.  As to cast, I neither implemented nor had a
will to do so because it would require troublesome work.

A series of macros beginning with S_, such as S_CHAR, in eval.c define
the size of each type.  Under cross implementation, these macros must be
modified to specify size of the types, in integer values, used in the
target system.

I have to admit that MCPP does not provide the full functionality of #if
sizeof.  MCPP just ignores the word "signed" or "unsigned" preceding
char, short, int, long, and long long when it appears in a #if sizeof.
MCPP also supports sizeof (void *).  I know this is a half-hearted
implementation, but I do not want to increase the number of flags in
system.H in vain for this non-conforming function.  I initially thought
of removing the sizeof code from the original version because I did not
intend to support cast at all, but on second thought, I decided to make
a small number of modifications to make use of the existing code.


4.7     How to Handle White-Space Sequence

MCPP compresses a white-space sequence, excluding <newline>, as a token
separator into one space character during tokenization in the
translation phase 3. It also deletes a white-space sequence at the end
of a line.  A white-space sequence at the beginning of a line is deleted
in POSTSTD mode, and squeezed to one space in other modes.

This compression and deletion occurs during the intermediate phase.  The
next phase 4 involves macro expansion and preprocessor directive line
processing.  Macro expansion may sometimes produce several space
characters before and after the macro.  Of course, the number of space
characters does not affect compilation results.

Standard C says that whether implementation compresses a white-space
sequence into one space character during the translation phase 3 is
implementation-defined, but you usually do not have to worry about this.
<Vertical-tab> or <form-feed> in a preprocessor directive line may
adversely affect portability, since this is undefined in Standard C.
MCPP converts it to one space character.
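
The compression rules of this section can be sketched for one source
line as follows.  This is a simplified illustration: squeeze_line() is
a hypothetical name, string literals, character constants and comments
are not treated specially, and <vertical-tab> and <form-feed> are
handled like other white space on every line.

```c
#include <assert.h>
#include <stddef.h>

static int is_hspace(char c)
{
    return c == ' ' || c == '\t' || c == '\v' || c == '\f';
}

/* Copies one source line (without its newline) from in to out.
 * out must have room for strlen(in) + 1 bytes.
 * poststd: nonzero for POSTSTD mode, zero for the other modes. */
static void squeeze_line(const char *in, char *out, int poststd)
{
    size_t n = 0;
    int at_start = 1;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (is_hspace(*in)) {
            while (is_hspace(*in))
                in++;
            if (*in == '\0')
                break;              /* trailing white space is deleted */
            if (!(at_start && poststd))
                out[n++] = ' ';     /* one space for the whole run */
        } else {
            out[n++] = *in++;
        }
        at_start = 0;
    }
    out[n] = '\0';
}
```

A line consisting of two spaces, a, a space, a tab, a space, b and two
more spaces becomes " a b" in the ordinary modes and "a b" in POSTSTD
mode.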


4.8     Default Specifications for MCPP Executables

This section describes the specifications of MCPP executables generated
when the DIFfile and makefile for each compiler system in the noconfig
directory are used to compile MCPP with default settings.  When a
configure script is used to compile MCPP, the generated MCPP may differ,
depending on configure's results.  However, as long as the OS and
compiler system versions are the same, the generated MCPPs would be the
same except for include directories.

The stand-alone-build of MCPP has the constant specifications regardless
of the compiler system with which MCPP was compiled, except a few OS-
dependent features and the evaluation type of #if expression which
depends on whether the host compiler has long long or not.

There are stand-alone-build and compiler-specific-build for MCPP
executables, and each executable has several behavioral modes.  For
those, refer to 2.1.  This section describes the settings centering on
STD mode.

DIFfiles and makefiles are for the following compiler systems:

    FreeBSD 5.3:        GCC V.3.4
    VineLinux 3.2-x86:  GCC V.2.95, V.3.2, V.3.3, V.3.4
    openSUSE Linux 10.0:    GCC V.4.0
    CygWIN 1.3.10 (GCC V.2.95), 1.5.18 (GCC 3.4)
    MinGW & MSYS (GCC 3.4)
    WIN32:  LCC-Win32 V.3.2, V.3.8
    WIN32:  Visual C++ 2002, 2003, 2005
    WIN32:  Borland C++ V.4.0, V.5.5

Of all the macros defined in noconfig.H and system.H, the settings of
those mentioned below are identical among every MCPP executable,
regardless of their compiler systems.

Each MCPP is compiled with DIGRAPHS_INIT == FALSE, so digraphs are
enabled when the -2 (-digraphs) option is specified.
With TFLAG_INIT == FALSE, trigraph is enabled with the -3 (-trigraphs)
option.  With OK_UCN set to TRUE, Universal Character Name (UCN) can be
used in C99 and C++.
DOLLAR_IN_NAME is also set to FALSE, so '$' cannot be used in names.
With OK_MBIDENT set to FALSE, multi-byte-characters cannot be used in
identifiers.

With STDC set to 1, the initial value of __STDC__ is 1.

The translation limits are set as follows.

    NMACPARS (Maximum number of macro arguments)                : 255
    NEXP (Maximum number of nested levels of #if expressions)   : 256
    BLK_NEST (Maximum number of nested levels of #if section)   : 256
    RESCAN_LIMIT (Maximum number of nested levels of macro rescans) : 64
    IDMAX (Valid length of identifier)                          : 1024
    NINCLUDE (Maximum number of include directories)            : 64
    INCLUDE_NEST(Maximum number of #include nest level)         : 256
    NBUFF (Maximum length of a source line after converting a comment
        into a space and line splicing)                         : 65536
    NWORK (Maximum length of an output line)                    : 65536
    NMACWORK (Size of internal buffers used for macro expansion): 262144

The following macro differs by OS, regardless of build type.

    MBCHAR (Default encoding of multibyte character)
        Linux, FreeBSD              : EUC-JP
        WIN32, CygWIN, MinGW        : SJIS

The settings of the macros below are different among compiler systems.

    STDC_VERSION (Initial value of __STDC_VERSION__)
        Stand-alone, GCC 2          : 199409L
        Others                      : 0L
    HAVE_DIGRAPHS   (Are digraphs output as they are?)
        Stand-alone, GCC and Visual C    : TRUE
        Others                      : FALSE
    EXPAND_PRAGMA   (Is a #pragma line macro-expanded in C99?)
        Visual C                    : TRUE
        Others                      : FALSE

The number of nested levels of #include can exceed the limit imposed by
OS of the number of simultaneously opened files.

GCC 2.7-2.95 defines __STDC_VERSION__ to 199409L.  However, in GCC V.3.*,
V.4.*, __STDC_VERSION__ is no longer predefined by default and is now
defined in accordance with an execution option.  MCPP setting for GCC
follows these variations.

If STDC_VERSION is set to 0L, MCPP predefines __STDC_VERSION__ as 0L.
So, specifying the -V199409L option sets __STDC__ and __STDC_VERSION__
to 1 and 199409L, respectively and allows only predefined macros that
begin with '_', resulting in MCPP in the strictly C95 conforming mode.
The -V199901L option specifies C99 mode.

In C99 mode, MCPP predefines __STDC_HOSTED__ as 1.

MCPP itself predefines neither __STDC_ISO_10646__, __STDC_IEC_559__ nor
__STDC_IEC_559_COMPLEX__.  These values are compiler-system-specific.
In glibc 2 / x86, the system header defines __STDC_IEC_559__ and
__STDC_IEC_559_COMPLEX__ as 1.  Other compiler systems do not define
them.

If HAVE_DIGRAPHS is set to FALSE, digraph is output after conversion.

The argument of #pragma line beginning with STDC, MCPP or GCC is never
macro-expanded even if EXPAND_PRAGMA == TRUE.

Include directories are set as follows:

System-specific or site-specific directories under UNIX-like OSs are as
follows (common to stand-alone-build and compiler-specific-build):
        FreeBSD, Linux and CygWIN:  /usr/include, /usr/local/include

For the implementation-specific directories that vary among compiler
systems and their versions, see the DIFfiles.  The stand-alone-build
does not set implementation-specific directories.  MCPP for the compiler
systems on Windows does not preset any directory but uses the
environment variables: INCLUDE, CPLUS_INCLUDE.  These environment
variables are used by the stand-alone-build too.

If these default settings do not suit you, change settings to recompile
MCPP, or use environment variables or the -I option.

When the length of a preprocessed line exceeds NWORK-1, MCPP divides it
into several parts so that each part length becomes equal to or less
than NWORK-1.  A string literal length must be equal to or less than
NWORK-2.

Again for confirmation, the macros mentioned above are used only to
compile MCPP, and are not built-in macros in an MCPP executable.

MCPP has the following built-in macros.  The (=value) below indicates
that the macro is set to the value.  Macros without (=value) are defined
as 1.

With __STDC__ set to 1 or higher, the macros that do not begin with '_'
are deleted.  The -N (-undef) option deletes all the macros other than
__MCPP.  After -N, you can use -D to define macro symbols over again.
When you use a different compiler system version from those specified
here, -N and -D allow you to redefine your version macro without
recompiling MCPP.  The -D option allows you to redefine a particular
macro without using -N or -U.

When you invoke MCPP without an input file and enter #pragma MCPP
put_defines, the following built-in macros are displayed:

    FreeBSD 5 / Stand-alone:    __i386__, unix, __unix, __unix__,
                __FreeBSD__ (=5), __MCPP (=2)
    FreeBSD 5 / GCC 3.4:    __i386__, unix, __unix, __unix__,
                __FreeBSD__ (=5), __MCPP (=2), __GNUC__ (=3),
                __GNUC_MINOR__ (=4), __SIZE_TYPE__ (=unsigned int),
                __PTRDIFF_TYPE__ (=int), __WCHAR_TYPE__ (=int)
    Linux / Stand-alone:    __i386__, unix, __unix, __unix__, __linux__,
                __MCPP (=2)
    Linux / GCC 2.95:   __i386__, unix, __unix, __unix__, __linux__,
                __MCPP (=2), __GNUC__ (=2), __GNUC_MINOR__ (=95),
                __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int),
                __WCHAR_TYPE__ (=long int)
    Linux / GCC 3.2:    __i386__, unix, __unix, __unix__, __linux__,
                __MCPP (=2), __GNUC__ (=3), __GNUC_MINOR__ (=2),
                __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int),
                __WCHAR_TYPE__ (=long int)
    Linux / GCC 3.3.2:  __i386__, unix, __unix, __unix__, __linux__,
                __MCPP (=2), __GNUC__ (=3), __GNUC_MINOR__ (=3),
                __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int),
                __WCHAR_TYPE__ (=long int)
    Linux / GCC 3.4.3:  __i386__, unix, __unix, __unix__, __linux__,
                __MCPP (=2), __GNUC__ (=3), __GNUC_MINOR__ (=4),
                __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int),
                __WCHAR_TYPE__ (=long int)
    Linux / GCC 4.0.2:  __i386__, unix, __unix, __unix__, __linux__,
                __MCPP (=2), __GNUC__ (=4), __GNUC_MINOR__ (=0),
                __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int),
                __WCHAR_TYPE__ (=long int)
    CygWIN 1.3.10, 1.5.18   / Stand-alone:  __i386__, __CYGWIN__,
                __CYGWIN32__, __MCPP (=2)
    CygWIN 1.3.10 / GCC 2.95.3: __i386__, __CYGWIN__, __CYGWIN32__,
                __MCPP (=2), __GNUC__ (=2), __GNUC_MINOR__ (=95),
                __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int),
                __WCHAR_TYPE__ (=short unsigned int)
    CygWIN 1.5.18 / GCC 3.4.4:  __i386__, __CYGWIN__, __CYGWIN32__,
                __MCPP (=2), __GNUC__ (=3), __GNUC_MINOR__ (=4),
                __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int),
                __WCHAR_TYPE__ (=short unsigned int)
    MinGW / Stand-alone:    __i386__, __MINGW__, __MINGW32__,
                __MCPP (=2)
    MinGW / GCC 3.4.2:      __i386__, __MINGW__, __MINGW32__,
                __MCPP (=2), __GNUC__ (=3), __GNUC_MINOR__ (=2),
                __SIZE_TYPE__ (=unsigned int), __PTRDIFF_TYPE__ (=int),
                __WCHAR_TYPE__ (=short unsigned int)
    Win32 / Stand-alone:    __i386__, __WIN32__, _WIN32,  WIN32,
                __FLAT__, __MCPP (=2)
    LCC-Win32:  __i386__, __WIN32__, WIN32, _WIN32, __FLAT__,
                __MCPP (=2),
                __LCC__, __LCCDEBUGLEVEL (=0), __LCCOPTIMLEVEL (=0)
    Visual C++ 2003:    __i386__, __WIN32__, _WIN32,  WIN32, __FLAT__,
                __MCPP (=2), _MSC_VER (=1310),
                _MSC_FULL_VER (=13103077), _MSC_EXTENSIONS,
                _M_IX86 (=600), _INTEGRAL_MAX_BITS (=64)
    Visual C++ 2005:    __i386__, __WIN32__, _WIN32,  WIN32, __FLAT__,
                __MCPP (=2), _MSC_VER (=1400),
                _MSC_FULL_VER (=140050320), _MSC_EXTENSIONS,
                _M_IX86 (=600), _INTEGRAL_MAX_BITS (=64), _MT
    BC 5.5  :   __i386__, __WIN32__, WIN32, __FLAT__, __MCPP (=2),
                __BORLANDC__ (=0x0550), __TURBOC__ (=0x0550)
    BC 4.0  :   __i386__, __WIN32__, WIN32, __FLAT__, __MCPP (=2),
                __BORLANDC__ (=0x0452), __TURBOC__ (=0x0452)

When you use the -+ (-lang-c++) option to specify C++ preprocessing,
__cplusplus is predefined with its initial value of 1L.  In addition,
the following macros are also predefined:

    GCC V.2.*   :    __GNUG__ (=2)
    GCC V.3.*   :    __GNUG__ (=3)
    GCC V.4.*   :    __GNUG__ (=4)
    Visual C++ 2005 :   _WCHAR_T_DEFINED, _NATIVE_WCHAR_T_DEFINED
    BC 4.0:     __BCPLUSPLUS__ (=0x0320)
    BC 5.5:     __BCPLUSPLUS__ (=0x0550)

Until GCC V.3.2, GCC itself predefined only a few macros; most of the
others are passed from gcc to cpp by the -D option.  So, it is not
necessary for MCPP to define them for compatibility.  However, MCPP
predefines these macros so that it can be used in a stand-alone manner,
for example for pre-preprocessing.

GCC V.3.3 and later suddenly predefine 60 to 70 macros.  A GCC-specific-
build of MCPP V.2.5 or later for GCC V.3.3 or later also includes these
predefined macros in addition to the ones above.  These GCC-specific
predefined macros are written in the mcpp_g*.h header files, which are
generated by the installation of MCPP.

Since FreeBSD, Linux, CygWIN, MinGW / GCC and LCC-Win32 have the type
long long, a #if expression is evaluated in long long or unsigned long
long.  Visual C and Borland C 5.5 do not have long long but have __int64
and unsigned __int64 instead, so these types are used.  In Borland C
4.0, which has neither long long nor __int64, a #if expression is
evaluated in long or unsigned long.

In the above compiler systems, the range of type long is:
    [-2147483647-1, 2147483647] ([-0x7fffffff-1, 0x7fffffff]),
and the range of unsigned long is:
    [0, 4294967295] ([0, 0xffffffff]).

In the compiler systems with type long long, its range is:
    [-9223372036854775807-1, 9223372036854775807]
   ([-0x7fffffffffffffff-1, 0x7fffffffffffffff]),
and the range of unsigned long long is:
    [0, 18446744073709551615] ([0, 0xffffffffffffffff]).
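
The ranges above can be checked from a program.  The sketch below
assumes a C99 compiler with <limits.h>; the function names are
hypothetical.  It also shows why the minimum is written as
"-9223372036854775807-1": the literal 9223372036854775808 by itself does
not fit in long long, so the minimum is formed by subtraction.

```c
#include <limits.h>

/* Fail at preprocessing time if long is narrower than 32 bits. */
#if LONG_MAX < 2147483647
#error "long is narrower than 32 bits"
#endif

/* The most negative long long, written the same way as in the range
 * notation above. */
static long long long_long_min(void)
{
    return -9223372036854775807LL - 1;
}

/* Returns 1 when #if arithmetic is wide enough to tell the long long
 * maximum from the 32-bit long maximum, 0 otherwise. */
static int if_expr_is_64bit(void)
{
#if 0x7fffffffffffffff > 0x7fffffff
    return 1;
#else
    return 0;
#endif
}
```

On a preprocessor that evaluates #if in long or unsigned long only, the
64-bit constant in if_expr_is_64bit() is out of range and is expected to
be diagnosed rather than compared.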

All the compiler-propers of the above compiler systems internally
represent a signed integer in two's complement, and perform bit
operations accordingly.  The same applies to MCPP's #if expressions.

A right shift of a negative integer is an arithmetic shift.  The same
applies to MCPP's #if expressions. (Shifting an integer right by one bit
halves the value with the sign retained.)

In an integer division or modulus operation, if either or both operands
are negative, an algebraic operation like that of Standard C's ldiv()
function is performed.  The same applies to MCPP's #if expressions.
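
The division rule can be illustrated with ldiv() itself, which truncates
the quotient toward zero and gives the remainder the sign of the
dividend.  This is a sketch with hypothetical function names.  The last
function only reports what the preprocessor in use does with a right
shift of a negative value; its result is not guaranteed by Standard C.

```c
#include <stdlib.h>

/* Quotient and remainder as ldiv() defines them. */
static long ldiv_quot(long n, long d) { return ldiv(n, d).quot; }
static long ldiv_rem(long n, long d)  { return ldiv(n, d).rem; }

/* Returns 1 when #if division and modulus follow the ldiv() rule. */
static int if_division_truncates(void)
{
#if (-7 / 2) == -3 && (-7 % 2) == -1
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when a right shift of a negative value in #if keeps the
 * sign (arithmetic shift), 0 otherwise. */
static int if_shift_is_arithmetic(void)
{
#if (-8 >> 1) == -4
    return 1;
#else
    return 0;
#endif
}
```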

These OSs use the ASCII basic character set.  So does MCPP.

There is a memory management routine, kmmalloc, that I developed.  This
routine has malloc(), free(), realloc() and other memory handling
functions.  If kmmalloc is installed in systems other than CygWIN or
Visual C 2005, kmmalloc is linked when the MALLOC=KMMALLOC (or
-DKMMALLOC=1) option is specified in make.  Also its heap memory
debugging routine is linked.  MCPP for Linux and LCC-Win32 uses EFREEP,
EFREEBLK, EALLOCBLK, EFREEWRT and ETRAILWRT with an errno of 2120, 2121,
2122, 2123 and 2124 assigned, and other MCPP uses 120, 121, 122, 123,
and 124. (Refer to 4.extra of mcpp-porting.txt.)  [1]

On systems other than GNU and Visual C, you must preset the environment
variable TZ to JST-9.  Otherwise, the __DATE__ and __TIME__ macros are
not set correctly.

Note:
[1] CygWIN 1.3.10 and 1.5.18 provide a malloc() with an internal routine
named _malloc_r(), which is called by a few other library functions.  So
this malloc() cannot be replaced with another malloc().  Also in Visual
C 2005, the program terminating routine calls an internal routine of the
resident malloc(), hence another malloc() cannot be used.


                        5.  Diagnostic Messages


5.1     Diagnostic Messages Format

This section covers diagnostic messages issued by MCPP, as well as their
meaning.  By default, these messages are output to stderr.  With the -Q
option, they are redirected to the mcpp.err file in the current
directory.  A diagnostic message is output in the following manner:

1. "filename: line: " is followed by "fatal error: ", "error: " or
"warning: " and then by any of the diagnostic messages shown in sections
5.3 to 5.9.  Although the requirement that a diagnostic message fit in
one line beginning with "filename: line:" seems to lack flexibility, I
followed it because it is the traditional way of implementing messages
in C on UNIX and because various tools already assume it.  Some MCPP
messages do not fit in one line of a usual terminal.

2. If an error occurs during macro expansion, the macro invocation is
displayed.  For nested macro invocations, MCPP traces back the nested
macros in reverse order and displays each macro name.  MCPP shows the
macro definition, as well as the source filename and line number where
the macro definition is found.

3. The source file name, the line number and the line at which the error
has occurred are displayed.  If the error has occurred in an included
file, the names, line numbers and #include lines of all the including
files are displayed.  Usually, a logical line with comments replaced
with a space character is displayed.  The logical line is constructed
from one or more physical lines with '\' at the line end.  If a comment
spreads over several lines, several logical lines are concatenated into
one, which is displayed as the line.  In this case, the line number of
the last concatenated physical line is displayed.  Note that if an error
occurs in a translation phase before comments are processed, the line as
it is in that phase is displayed.

If the -j option is specified, MCPP outputs neither 2 nor 3 above.
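
Because the first line of a diagnostic follows the fixed pattern in 1
above, a tool can split it mechanically.  The following is a sketch of
such a tool-side parser, not code from MCPP: parse_diag() and the enum
names are hypothetical, and it assumes that the file name itself
contains no ':' (which does not hold for a path with a drive letter).

```c
#include <stdlib.h>
#include <string.h>

enum diag_level { DIAG_UNKNOWN, DIAG_FATAL, DIAG_ERROR, DIAG_WARNING };

/* Parse "filename: line: level: message" and return the level.
 * The line number is stored through line_out. */
static enum diag_level parse_diag(const char *text, long *line_out)
{
    const char *p = strchr(text, ':');      /* end of the file name */
    char *end;

    if (p == NULL)
        return DIAG_UNKNOWN;
    *line_out = strtol(p + 1, &end, 10);    /* strtol() skips blanks */
    if (end == p + 1 || *end != ':')        /* no digits, or no ':' */
        return DIAG_UNKNOWN;
    end++;
    while (*end == ' ')
        end++;
    if (strncmp(end, "fatal error: ", 13) == 0)
        return DIAG_FATAL;
    if (strncmp(end, "error: ", 7) == 0)
        return DIAG_ERROR;
    if (strncmp(end, "warning: ", 9) == 0)
        return DIAG_WARNING;
    return DIAG_UNKNOWN;
}
```

The parser accepts the line number with or without a blank after the
first colon, since only the order of the fields matters here.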

Diagnostic messages are divided into three levels:

fatal error: Indicates an error so serious that it is no longer
    meaningful to continue preprocessing.
error: Indicates there is a syntax or usage error.
warning: Indicates that the code lacks portability or may contain a bug.

Warnings are further divided into five classes:

Class 1: Source code may contain a bug or at least lack portability.
Class 2: Probably, source code will present no problem in practical use,
    but is problematic in terms of Standard conformance.
Class 4: Probably, source code will present no problem in practical use,
    but is problematic in terms of portability.
Class 8: Rather superfluous warnings on skipped #if groups, on sub-
    expressions in a #if expression whose evaluation is skipped, on
    concatenation of string literals, etc.
Class 16: Warnings on trigraphs and digraphs.

Warnings other than Class 1 or 2 are rather specific to MCPP.
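
The class numbers are powers of two, so a set of classes can be written
as one number, for example 3 for classes 1 and 2.  The sketch below
shows only this encoding, on the assumption that a warning level is the
sum of the wanted classes.  The names are hypothetical and the code is
not taken from MCPP.

```c
/* The five warning classes as bit flags. */
enum warn_class {
    WARN_CLASS_1  = 1,
    WARN_CLASS_2  = 2,
    WARN_CLASS_4  = 4,
    WARN_CLASS_8  = 8,
    WARN_CLASS_16 = 16
};

/* Returns 1 when the class cls is included in the level, 0 otherwise. */
static int warn_enabled(int level, enum warn_class cls)
{
    return (level & cls) != 0;
}
```

For example, warn_enabled(3, WARN_CLASS_2) is 1 and
warn_enabled(3, WARN_CLASS_8) is 0.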

MCPP has various types of diagnostic messages.  For example, STD mode
provides the following types of diagnostics for each level and class.

    fatal error     : 20 types
    error           : 72 types
    warning class 1 : 46 types
    warning class 2 : 14 types
    warning class 4 : 15 types
    warning class 8 : 31 types
    warning class 16:  2 types

Principally, these messages point to the code in question.  The
diagnostic messages below have a sample token or a sample numeric value
from source code embedded.  For the messages with a macro name embedded,
real messages show the value into which the macro is expanded.

In some cases, the same message is issued as a warning or as an error.
This manual gives a detailed description at the first occurrence and
only lists the message at subsequent occurrences.

Note that under Windows, MCPP converts all path-lists and file names in
diagnostic messages to lowercase letters for normalization.


5.2     Translation Limits

Of the errors shown below, some, such as a buffer overflow, are due to
restrictions of the MCPP specification.  Some macros in system.H define
translation limits, such as a buffer size.  If necessary, enlarge the
buffer size and recompile MCPP, but be careful not to enlarge it too
much: a large buffer on a system with a limited amount of memory may
cause frequent "out of memory" errors.


5.3     Fatal Errors

A fatal error occurs and preprocessing is terminated when it is no
longer possible to continue preprocessing due to an I/O error or a
shortage of memory, or it is no longer meaningful to do so due to a
buffer overflow.  A status value of failure is returned to a parent
process.

5.3.1   MCPP's Own Bugs

Bug:
    This message has several types.  Should it be issued, it would
    indicate MCPP's own bug.  I think this message is rarely issued, but
    should it be issued, do not hesitate to let me know the situation.

5.3.2   Physical Errors

File read error
    An error has occurred during reading a source file.  Disk may have
    been damaged.

File write error
    An error has occurred during writing to a file.  Disk may have been
    damaged or full.

Out of memory (required size is 0x123 bytes)
    MCPP has run short of memory.  It tried to obtain 0x123 bytes of
    memory from the heap, but in vain.  This error occurs when there are
    too many long macro definitions on a system with a small amount of
    memory.  Divide your source file to decrease the number of macro
    definitions in one translation unit.

5.3.3   Translation Limits and Internal Buffer Errors

Too long header name "long-file-name"
    The length of the full path name of a file to include (file name
    concatenated with the specified directory path) has exceeded
    FILENAMEMAX.

Too long source line
    The length of a physical line in source file has exceeded NBUFF-2.
    The source code may not be written in C.

Too long logical line
    The length of a logical line, which is constructed from the several
    physical lines with \ at the line end, has exceeded NBUFF-2.  This
    error may occur when a defined macro is too long.  The code should
    be written not as a macro but as a function.

Too long line spliced by comments
    The length of a preprocessed line with a comment replaced with a
    space character has exceeded NBUFF-2.  This error occurs when
    several lines are concatenated into one if a comment spreads over
    several lines.  Divide the comment into several parts and write each
    on a separate line.

Too long output line
    The length of a preprocessed line has exceeded NWORK-2.  Several
    long macro calls may be contained in a line.  Divide the line.

Too long token
    A preprocessed line has a token with a length more than NWORK-2.
    MCPP compiled with NWORK < NMACWORK tries to divide the preprocessed
    line into a length that the compiler-proper can accept.  However, if
    a line contains a too long token, it sometimes fails to do so.

The following four errors may also be caused by a buffer overflow during
macro expansion at a token that is not particularly long, in which case
you must divide the macro invocation.

Too long quotation "long-string"
    A string literal, character constant or header-name is too long.  In
    the case of a string literal, divide it.  A Standard-conforming
    compiler concatenates adjacent string literals for you.

Too long pp-number token "1234567890toolong"
    A preprocessing-number token is too long.  This error is issued in
    Standard mode.

Too long number token "12345678901234......"
    A number token is too long.  This error is issued in pre-Standard
    mode.

Buffer overflow scanning token "token"
    A buffer overflow has occurred during token scan.  This message is
    issued to tokens other than string literals, character constants,
    header-names and pp-numbers.

More than BLK_NEST nesting of #if (#ifdef) sections
    The depth of nested #if, #ifdef, and #ifndef has exceeded BLK_NEST.
    (In real message, the macro name BLK_NEST is replaced with an actual
    numerical value. This is applied to all the messages below with a
    macro name embedded.)  Divide the #if section.

More than INCLUDE_NEST nesting of #include
    The depth of nested #include has exceeded INCLUDE_NEST.  Probably
    the #includes are in infinite recursion.

Too many include directories "dir"
    The number of include directories specified has exceeded NINCLUDE.

Too many include files
    The number of #included header files preprocessed in one source file
    has exceeded NINCLUDE*4.  Duplicated included header files are
    counted as one.
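
For the "Too long quotation" error above, a long string literal can be
divided into adjacent shorter literals, which a Standard-conforming
compiler-proper concatenates into one.  A minimal sketch, with
hypothetical names:

```c
/* Three adjacent string literals; the compiler-proper joins them into
 * the single string "first part, second part, third part." */
static const char divided_message[] =
    "first part, "
    "second part, "
    "third part.";

static const char *get_divided_message(void)
{
    return divided_message;
}
```

Each literal is a separate preprocessing token, so none of them needs to
be longer than the preprocessor's token buffer allows.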

5.3.4   #pragma MCPP preprocessed Related Errors

This is not the preprocessed source
    Although the "#pragma MCPP preprocessed" directive is found, this is
    not a source preprocessed by MCPP.

This preprocessed file is corrupted
    This seems to be a source preprocessed by MCPP, but cannot be used
    because it is destroyed.


5.4     Errors

MCPP issues an error message when it finds a grammatical error.
Standard C stipulates that a compiler system should issue a diagnostic
message when it encounters a violation of syntax rules or constraints.
Principally, Standard mode issues an error message for this type of
violation, but sometimes issues a warning.

MCPP issues an error message or warning for most of the items that are
undefined in Standard C.  However, MCPP issues neither an error nor a
warning for the following undefined items:

1. ' or /* in a header name in the form of a string literal: MCPP
regards them as characters, resulting in a file open error. (' or /* in
a header name enclosed with < and > is regarded as the beginning of a
character constant or a comment, resulting in some errors.)  Although
how to treat \ in a header name is undefined in Standard C, MCPP does
not check it because it may eventually cause an error when MCPP actually
tries to open the file.  MCPP on Windows issues a class 2 warning to \
and converts it to /.

2. #undef defined: Although #undef-ing a name "defined" yields an
undefined result, MCPP does not issue a message because, in the first
place, MCPP does not allow definition of a macro name "defined", so it
does not think of revoking the definition.

3. Illegal multi-byte character sequence in a comment: Although how to
deal with such character sequence is undefined in Standard C, MCPP does
not issue a message because it does no harm. (MCPP issues a warning to
an illegal multi-byte character sequence in string literals, character
constants and header names.)

4. Identifiers that begin with _ (Reserved for compiler systems):
Although using these identifiers in a user program will cause an
undefined result, MCPP does not check it because MCPP does not always
have a means to decide whether these identifiers are used in a user
program or the compiler-system.

5. __STDC_ISO_10646__, __STDC_IEC_559__, and __STDC_IEC_559_COMPLEX__:
Although #defining or #undef-ing these optional C99 predefined macros
yields an undefined result, MCPP does not check it because MCPP does not
always have a means to determine whether these macros appear in a user
program or the compiler-system. (These macros are most likely to be
defined in a header file of a compiler system.)

6. UCN equivalent sequence: Although it is undefined in C99 how to deal
with a UCN equivalent sequence generated by deleting <backslash><newline>
during the translation phase 2 or by concatenating string literals, MCPP
does not issue a message and regards it as a UCN.

For details on what is a violation of syntax rule or constraint,
undefined, unspecified or implementation-defined in Standard C
preprocessing, refer to cpp_test.txt.

Even if an error occurs, MCPP continues preprocessing as long as the
error is not a fatal one.  MCPP shows the number of errors and returns
the status of failure to the parent process when it exits.

5.4.1   Character and Token Related Errors

Illegal control character 0x1b, skipped the character
    A control code other than a white space character is found in a
    string literal, character constant, header name or comment.  MCPP
    skips it and continues preprocessing.

The following five messages are all token-related errors.  For the first
four, MCPP skips the line in question and continues preprocessing.  The
first three are string literal or other token-related errors, indicating
that a closing quotation mark is not found by the end of the logical
line.  This type of error occurs when you write text that does not take
the form of a preprocessing-token sequence outside a string literal or
comment, as shown below:

    #error I can't understand.

As preprocessing-tokens are not so strictly defined as C tokens in the
compiler-proper, most character sequences are regarded as token
sequences, as long as they belong to the source character set.
Therefore, it is only this type of coding that causes a pp-token error.
Pp-token errors may occur even in a skipped #if group.

Unterminated string literal "string
    A string literal is unterminated.  A string literal cannot spread
    over several logical lines.  If necessary, write a string literal on
    each of several lines and have the compiler concatenate them.  This
    error may occur during conversion into a string by the # operator,
    in which case the line in question is not skipped.  MCPP in OLDPREP
    mode does not make an unterminated string literal an error. (Instead,
    it regards the line end as the literal end.)  Nor does MCPP when
    invoked with the -a (-lang-asm, -x assembler-with-cpp) option (it
    issues a warning); it regards an unterminated string literal as a
    literal spreading over several lines and concatenates a line with
    the next by inserting \n.

Unterminated character constant 't understand.
    A character constant is not terminated.  MCPP in OLDPREP mode does
    not make it an error. (Instead, it regards the line end as literal
    end.)

Unterminated header name <header.h
    A header-name is not terminated.  " or ' in a header-name enclosed
    with < and > causes the above two errors, not this one.  If /* is
    found in a header-name enclosed with < and >, MCPP regards it and
    the following text as a comment.

Empty character constant ''
    A character constant is empty.

Illegal UCN sequence
    MCPP in STD mode invoked with __STDC_VERSION__ set to 199901L or in
    C++ mode recognizes UCNs.  This message is issued when a hex
    sequence that begins with \u or \U in an identifier has fewer than
    four or eight digits, respectively.  (If this occurs in a character
    constant in a #if expression, an undefined escape sequence warning
    results.  Other tokens are not checked by MCPP.)

UCN cannot specify the value "0000007f"
    A UCN cannot specify a hex value in the range of 0 to 9f, except for
    0x24 ($), 0x40 (@) and 0x60 (`), nor in the range of d800 to dfff.
    The former range agrees with the range of the basic source character
    set.  The latter range falls into the reserved area for special
    characters.  Note that C++ does not have the latter restriction.
    (Specifications slightly differ among Standards for an unknown
    reason.)  However, when MCPP in STD mode is invoked as C++ with
    -V199901L to preset the __cplusplus macro to 199901L or higher, MCPP
    behaves in accordance with the C99 specifications in this respect.

Illegal multi-byte character sequence "XY"
    MCPP in STD mode compiled with OK_MBIDENT == TRUE allows multi-byte
    characters in identifiers in C99.  However, it issues this error
    when it finds a character sequence that cannot be regarded as a
    multi-byte character.
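
The value restriction described above under "UCN cannot specify the
value" can be written as a small predicate.  This is a sketch of the C99
rule as stated in this section, not code from MCPP; the function name is
hypothetical.

```c
/* Returns 1 when the value v may be specified by a UCN under the C99
 * rule, 0 otherwise. */
static int ucn_value_allowed_c99(unsigned long v)
{
    if (v >= 0xd800UL && v <= 0xdfffUL)     /* reserved range */
        return 0;
    if (v < 0xa0UL)                         /* 0 to 9f: three exceptions */
        return v == 0x24UL || v == 0x40UL || v == 0x60UL;
    return 1;
}
```

The value 0x7f in the sample message is rejected by the second test.
For the C++ rule described above, the first test would be dropped.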

5.4.2   Unterminated Source File Related Errors

This section covers messages issued when a source file ends with an
unterminated #if section or macro invocation.  If the file (not included
file) marks the end of input, the message "End of input", not
"End of file", is issued.

These diagnostic messages are issued as an error or warning, depending
on MCPP modes.

Standard mode issues these messages as errors, in which case MCPP skips
the macro invocation in question and restores the relationship between
paired directives in a #if section to what it was when the file was
initially included.

On the other hand, pre-Standard mode issues warnings.  OLDPREP mode does
not even issue a warning, except on an unterminated macro call.

End of file within #if (#ifdef) section started at line 123
    #if (#ifdef or #ifndef) on the line 123 does not have a
    corresponding #endif.

End of file within macro invocation started at line 123
    A macro invocation that begins at line 123 is not terminated by the
    end of the file.  This error may occur when an argument has an
    ill-balanced parenthesis, or when a token error occurs between
    opening and closing parentheses, in which case MCPP continues to
    read tokens for a corresponding parenthesis until it reaches the
    end of the file. (A buffer overflow will probably occur before
    reaching there.)  In addition, since macro expansion specifications
    vary among modes, a macro that is successfully expanded in one mode
    may not be in other modes.

5.4.3   Ill-Balanced Preprocessing Group Related Errors

This section covers errors caused by ill-balanced directives such as
#if and #else.  Even if MCPP finds ill balance among these directives,
it continues processing, assuming that the processing group so far still
continues.  MCPP checks whether directives are balanced even in a
skipped #if group.

The #if (#ifdef) section is a block between #if (#ifdef or #ifndef) and
#endif.  The #if (#elif, #else) group is a smaller block within the #if
(#ifdef) section, say, between #if (#ifdef or #ifndef) and #elif,
between #elif and #else, or between #else and #endif.
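
The terms "section" and "group" can be seen in the following sketch.
TARGET_BITS and word_size() are hypothetical names used only for
illustration.

```c
#define TARGET_BITS 32          /* hypothetical configuration macro */

static const char *word_size(void)
{
#if TARGET_BITS == 64           /* the #if section begins: #if group */
    return "64-bit";
#elif TARGET_BITS == 32         /* #elif group */
    return "32-bit";
#else                           /* #else group */
    return "other";
#endif                          /* the #if section ends */
}
```

Only the #elif group is processed here.  The other two groups are
skipped, but the balance of directives is still checked in them.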

Already seen #else at line 123
    Another #else (#elif) is found after #else at the line 123.  #endif
    may be missing.

Not in a #if (#ifdef) section
    #else (#elif, #endif) is found without #if (#ifdef or #ifndef).

Not in a #if (#ifdef) section in a source file
    An included file has #else (#elif or #endif) without #if (#ifdef or
    #ifndef).  If the contents of the included file had been written in
    the including source file, this error would not have occurred.  In
    other words, these directives are not balanced within the included
    file by itself.  Only Standard mode issues this error.
    (Pre-Standard mode issues a warning.)

The following two errors occur when #asm and #endasm are not balanced.
These messages are issued only by compiler-specific-build for a
particular compiler system and in pre-Standard mode.

In #asm block started at line 123
    A #asm block that begins at the line 123 has another #asm.  #asm
    cannot be nested.  Maybe, the programmer forgot to write #endasm.

Without #asm
    #endasm is found in a non #asm block.

5.4.4   Simple Syntax Errors on Directive Lines

This section covers simple syntax errors on directive lines that begin
with #.  The errors hereinafter discussed until 5.4.12 do not occur
within a skipped #if group. (MCPP invoked with the -W8 option issues a
warning to an unknown directive.)

When MCPP finds a directive line with a syntax error, it ignores the
line and continues processing, in which case, it neither regards #if as
the beginning of a section nor changes line numbers even with a #line.
If a #include or #line line has a macro argument, Standard mode expands
the macro and checks the syntax.  Pre-Standard mode does not expand the
macro.

Although the messages below do not show the directive name in question,
the source line that follows the message shows it. (A control line with
comments converted into a space character always becomes one line, which
is called a "preprocessed line" here.)

Illegal #directive "123"
    A token that immediately follows # is not a name.  The token must be
    a directive name. ('oldprep' mode regards #123 as #line 123.)

Unknown #directive "pseudo-directive"
    The directive "pseudo-directive" is not implemented.  MCPP invoked
    with the -a (-lang-asm or -x assembler-with-cpp) option issues a
    warning, not an error.

No argument
    #if, #elif, #ifdef, #ifndef, #assert or #line has no arguments.

No header name
    A #include line does not have an argument, or expansion of a macro
    argument of a #include line results in no token.

Not a header name "UNDEFINED_MACRO"
    The specified argument is not a header name.  This message is issued
    when a macro that should define a header name is not defined.  A
    header name must be enclosed with < and >, or with " and ".

Not an identifier "123"
    #ifdef, #ifndef, #define or #undef requires an identifier as an
    argument, but 123 is not an identifier.

No identifier
    #define or #undef does not have an argument.

No line number
    #line has a macro argument, but its expansion has resulted in no
    token.

Not a line number "name"
    The first argument of a #line is not a numeric token (preprocessing
    number).

Line number "0x123" isn't a decimal digits sequence
    The first argument of a #line must be a decimal integer.  Standard
    mode issues this message.  In pre-Standard mode, hex and octal
    integer tokens are allowed although a warning is issued.

Line number "2147483648" is out of range of [1,2147483647]
    The first argument of a #line must be within the range of 1 to
    2147483647.  0 is regarded as an error.  This applies to Standard
    mode.  With __STDC_VERSION__ < 199901L or __cplusplus < 199901L, the
    valid range is 1 to 32767; a value between 32768 and 2147483647 is
    not regarded as an error, but a warning is issued.

Not a file name "name"
    The second argument of a #line, if any, must be a string literal.
    An identifier or wide string literal is not allowed here.
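
A well-formed #line that satisfies all the conditions above has a
decimal line number within range and, optionally, an ordinary string
literal as the file name.  In this sketch the file name "generated.c"
and the function names are hypothetical.

```c
/* The line that follows the directive is numbered 1000 and is treated
 * as belonging to "generated.c". */
#line 1000 "generated.c"
static int line_seen(void) { return __LINE__; }
static const char *file_seen(void) { return __FILE__; }
```

The number 1000 is valid in every mode, since it is within the C90
range of 1 to 32767.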

The following error occurs only in Standard mode and this directive is
ignored.  OLDPREP mode issues neither an error nor a warning.  KR mode
issues a warning and continues preprocessing as if there had been no
"junk" text.

Excessive token sequence "junk"
    A #else, #endif, #asm or #endasm line has junk text, or such text
    follows a valid argument of a #ifdef, #ifndef, #include, #line or
    #undef line.

5.4.5   Syntax Related Errors in #if Expressions

This section covers syntax-related errors in #if, #elif and #assert
directives.  If a #if (#elif) line has these errors, MCPP evaluates it
to false, skips the #if (#elif) group, and continues processing.

For a skipped #if (#ifdef, #ifndef, #elif or #else) group, MCPP checks
validity of C preprocessing tokens and balance of these directives, but
not other grammatical errors.

A #if line has a sub-expression whose evaluation is skipped.  For
example, in case of #if a || b, if "a" is evaluated to true, "b" is not
evaluated at all.  However, the following 14 types of syntax errors or
translation limit errors are checked, even if they are located in a sub-
expression whose evaluation is skipped.

More than NEXP*2-1 constants stacked at "12"
    The number of constants in the stack has exceeded NEXP*2-1 when MCPP
    tried to evaluate "12" in a #if expression.  The #if expression is
    nested too deeply.

More than NEXP*3-1 operators and parens stacked at "+"
    The total number of operators and parentheses in the stack has
    exceeded NEXP*3-1 when MCPP tried to evaluate '+' in a #if
    expression. (A pair of parentheses is counted as two.)  The #if
    expression is nested too deeply.

Misplaced constant "12"
    A #if expression has a constant '12' where no constant should be
    found.  This error occurs when casting, such as (int)0x8000, is used
    in a #if expression, where casting is not allowed.  In this case,
    (int)0x8000 is evaluated to (0)0x8000, causing this error.  The int
    is regarded as an identifier that is not defined as macro and is
    evaluated to 0.

Operator ">" in incorrect context
    A #if expression has a > operator where no > should be found.  If a
    macro MACRO is defined as 0 token, #if MACRO > 0 will be expanded to
    #if > 0, causing this error, which is indicated by the preceding
    warning -- Macro "MACRO" is expanded to 0 token.

Unterminated expression
    A #if expression is not terminated.  This error is caused by, for
    example, #if a || MACRO with MACRO defined as 0 token.

Excessive ")"
    A #if expression has a ")" that does not correspond to "(".

Missing ")"
    A #if expression does not have a ")" that corresponds to "(".

Misplaced ":", previous operator is "+"
    : without a corresponding ?.

Bad defined syntax
    A #if defined has a syntax error.  This error is caused by an
    unbalanced parenthesis or missing identifier in an argument.  When a
    macro expansion causes this error, MCPP displays this message
    followed by an expansion result.

Can't use a string literal "string"
    A string literal is not allowed as a constant in a #if expression.

Can't use a character constant 'a'
    In 'poststd' mode, a character constant, or a wide character
    constant is not allowed as a constant in a #if expression.

Can't use the operator "++"
    A #if expression has an illegal operator, such as = or ++.

Not an integer "1.23"
    Only integers, including character constants, are allowed as a
    constant in a #if expression.

Can't use the character 0x24
    A #if expression contains an illegal character (code 0x24), which is
    not part of any preprocessing token: identifiers, operators,
    punctuators, string literals, character constants, and preprocessing
    numbers. (Control codes are excluded since they have been checked
    before.)  To avoid this error, a compiler-specific-build for a
    compiler system that allows $ in identifiers must be compiled with
    OK_DOLLAR == TRUE.  Of course, this is not checked in a skipped
    group.
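
The errors 'Operator ">" in incorrect context' and "Unterminated
expression" above are typically caused by a macro that is defined but
expands to no token.  One common way to keep the #if expression valid in
that case is to add "+ 0" to the macro.  This is a sketch of that idiom,
not a recommendation from MCPP; FEATURE_LEVEL and feature_enabled() are
hypothetical names, and a preprocessor may still warn that the macro is
expanded to 0 token.

```c
#define FEATURE_LEVEL           /* defined, but expands to no token */

static int feature_enabled(void)
{
/* "#if FEATURE_LEVEL > 0" would expand to "#if > 0" and be rejected.
 * With "+ 0" the expression becomes "( + 0) > 0", in which + is a
 * unary plus, so it is valid and evaluates to false. */
#if (FEATURE_LEVEL + 0) > 0
    return 1;
#else
    return 0;
#endif
}
```

When FEATURE_LEVEL is defined as a number such as 2, the same expression
becomes "(2 + 0) > 0" and evaluates to true.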

The following error messages are relevant to #if sizeof.  Only pre-
Standard mode issues these errors.

sizeof: Syntax error
    A #if sizeof has a syntax error.  This error is caused by an
    unbalanced parenthesis or missing arguments.

sizeof: No type specified
    Like sizeof(*), the "type" of #if sizeof (type) is not specified.
    Note that sizeof ((*)()) is a valid syntax to determine the size of
    a pointer to a function.

5.4.6   #if Expression Evaluation Errors

The following errors do not occur in a sub-expression whose evaluation
is skipped. (MCPP invoked with the -W8 option issues a warning.)

The Standards say that a #if expression is evaluated in the largest
integer type in C99 and in long / unsigned long in C90 and C++98.  MCPP
evaluates it in long long / unsigned long long even in C90 or C++98, and
issues a warning on a value outside the range of long / unsigned long in
C90 and C++98.  In this subsection, please read long long / unsigned
long long as long / unsigned long for a compiler without long long, and
as long in pre-Standard mode.  In POSTSTD mode, a character constant is
not available in a #if expression and causes a different error.

Constant "123456789012345678901" is out of range
    An integer constant has a value that exceeded the range of unsigned
    long long.

Integer character constant 'abcdefghi' is out of range
    A character constant 'abcdefghi' has a value that exceeded the range
    of unsigned long long.

Wide character constant L'abcde' is out of range
    A wide character constant L'abcde' has a value that exceeded the
    range of unsigned long long.  This error occurs only in STD mode.

CHARBIT bits can't represent escape sequence '\x123'
    An escape sequence in a character constant has exceeded the range of
    CHARBIT bits ([0, UCHARMAX]).

CHARBIT*2 bits can't represent escape sequence L'\x12345'
    An escape sequence in a wide character constant has exceeded the
    range of CHARBIT*2 bits (CHARBIT*4 bits for UTF-8).  This error
    occurs only in STD mode.

Division by zero
    A #if expression contains a division by zero.  A division can be
    expressed using / or %.  This error may be caused by
    "#if dividend/divisor" with the divisor not defined as a macro.  To
    avoid this error, "#if defined divisor && (dividend/divisor ..)" is
    recommended.

Result of "op" is out of range
    An operation result using the operator op is out of range of
    (unsigned) long long.  Op is any of binary operators: *, /, %, +,
    and -.  When two's complement representation is used, the unary
    operator '-' will cause an overflow with -LLONG_MIN.  Unsigned long
    long will never cause an overflow, so it does not cause this error.
    If the result of an algebraic calculation is out of range, a warning
    is issued.

The following errors are relevant to sizeof and occur only in pre-
Standard mode.  They are not issued in a sub-expression whose evaluation
is skipped. (MCPP invoked with the -W8 option issues a warning.)

sizeof: Unknown type "type"
    The "type" of #if sizeof (type) is unknown.

sizeof: Illegal type combination with "type"
    A type combination, like #if sizeof (long float), is invalid.

5.4.7   #define Related Errors

This section covers #define related errors.  A macro will not be defined
if an error occurs at #define.  The errors related to the # and ##
operators occur in Standard mode, as do the __VA_ARGS__ related errors.
Although the variable argument macro is a C99 specification, MCPP allows
these macros to be used in C90 and C++ modes for compatibility with GCC
and Visual C++ 2005. (A warning is issued.)

"defined" shouldn't be defined
    A macro name "defined" cannot be defined.  Standard mode checks this.

"__STDC__" shouldn't be redefined
    The __STDC__ macro cannot be #defined.  The same can be said with
    __STDC_VERSION__, __FILE__, __LINE__, __DATE__ and __TIME__
    (__STDC_HOSTED__ in C99 mode, and __cplusplus when MCPP is invoked
    with -+ option).  Standard mode checks these macros.

"__VA_ARGS__" shouldn't be defined
    C99 allows a variable argument macro with the __VA_ARGS__ parameter
    in the replacement list, but this identifier cannot be defined as a
    macro.

More than NMACPARS parameters
    The number of parameters of a macro definition has exceeded NMACPARS.

Empty parameter
    A macro definition has an empty parameter.

Illegal parameter "123"
    A token other than an identifier is used in a parameter of a macro
    definition.  In Standard mode, even an identifier __VA_ARGS__ cannot
    be used.

Duplicate parameter name "a"
    A macro definition has a duplicate parameter name "a".

Missing "," or ")" in parameter list "(a,b"
    A macro definition does not have a parenthesis ")" that closes a
    parameter list.  Or, a parameter is followed by neither ',' nor ')'.

No token before ##
    No token precedes the ## operator in the replacement list of a macro
    definition.

No token after ##
    No token follows the ## operator in the replacement list of a macro
    definition.

## after ##
    The replacement list of a macro definition has a token sequence of
    "## ##".  Some may not regard this as an error, but since
    concatenating ## with another token always generates an invalid
    token, it always causes an error when it happens in macro expansion.
    MCPP therefore makes it an error as soon as it finds this sequence
    in a macro definition.

Not a formal parameter "id"
    A function-like macro definition has a # operator whose operand id
    is not a parameter name.

"..." isn't the last parameter
    "..." must be the last parameter of a macro definition.  In pre-
    Standard mode, "..." causes an illegal parameter error.

"__VA_ARGS__" without corresponding "..."
    "__VA_ARGS__", an identifier in a replacement list, can be used only
    when it has a corresponding "..." parameter.
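
    For contrast with the errors above, a well-formed variable argument
    macro might look like the following sketch.  FORMAT_INTO and
    format_pair are hypothetical names chosen for illustration.

```c
#include <stdio.h>

/* "..." is the last parameter, and __VA_ARGS__ appears only in the
 * replacement list of a macro that has a "..." parameter. */
#define FORMAT_INTO(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

/* Formats "a-b" into buf and returns the length of the formatted text
 * (not counting the terminating null character). */
int format_pair(char *buf, size_t size, int a, int b)
{
    return FORMAT_INTO(buf, size, "%d-%d", a, b);
}
```

    Moving "..." ahead of another parameter, or using __VA_ARGS__ in a
    macro without "...", would trigger the corresponding errors listed
    above.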

5.4.8   #undef Related Errors

This section covers #undef related errors.

"__STDC__" shouldn't be undefined
    The __STDC__ macro cannot be #undefined. The same can be said with
    __STDC_VERSION__, __FILE__, __LINE__, __DATE__ and __TIME__
    (__STDC_HOSTED__ in C99 mode, and __cplusplus when invoked with -+
    option).  Standard mode checks these macros.

5.4.9   Macro Expansion Errors

This section covers macro expansion errors.  MCPP displays a macro
definition, as well as the source filename and line number where it is
found.  The errors related to # or ## operator can occur only in
Standard mode.

Less than necessary N argument (s) in macro call "macro( a)"
    A macro invocation has an insufficient number of arguments.  This
    macro requires N arguments.  MCPP assigns zero tokens to the
    missing arguments and continues processing.  MCPP does not regard
    it as an error when a macro that takes only one parameter is
    invoked with zero arguments, because it cannot distinguish an empty
    argument from a missing argument.  OLDPREP mode issues a warning
    instead of an error in this case.

More than necessary N argument (s) in macro call "macro( a, b, c)"
    A macro invocation has too many arguments.  The macro should take N
    number of arguments.  MCPP ignores surplus arguments and continues
    processing.  In OLDPREP mode, a warning is issued instead of an
    error.

Not a valid preprocessing token "+12"
    The ## operator has concatenated two pp-tokens, resulting in an
    invalid token "+12".  The token may be separated at a later time.
    Standard mode continues processing.  COMPAT mode issues a warning
    instead of an error.  STD mode invoked with the -lang-asm (-x
    assembler-with-cpp, -a) option also issues a warning.

Not a valid string literal "\\"str\""
    When a # operator tried to convert a macro invocation's argument
    into a string, a token sequence of "\\"str"" has resulted instead of
    a single valid string literal.  A \ that precedes or follows the
    literal causes the error. (When Standard mode tries to convert such
    an argument into a string, it may or may not cause an unterminated
    string literal error.)  Standard mode tries to continue processing,
    but an error may occur again in the compilation phase.  This error
    cannot occur in POSTSTD mode. (An unterminated string literal error
    may occur instead.)
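
    As a point of reference for the two messages above, the following
    sketch shows uses of ## and # that do produce valid tokens.  CAT,
    STR and the helper functions are hypothetical names.

```c
/* CAT and STR are hypothetical helper macros. */
#define CAT(a, b) a ## b
#define STR(x)    #x

/* 12 ## 34 forms the single valid pp-number 1234.  By contrast,
 * CAT(+, 12) would form "+12", which is not a single pp-token. */
int cat_value(void) { return CAT(12, 34); }

/* Stringizing a string literal escapes its quotes, giving one valid
 * string literal whose contents are the five characters "str". */
const char *quoted(void) { return STR("str"); }
```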

When the following errors occur, the macro invocation will be skipped.

Buffer overflow expanding macro "macro" at "something"
    A buffer overflow has occurred at "something" during macro expansion.
    Divide the macro.

Unterminated macro call "macro( a, (b, c)"
    A macro invocation is not terminated.  This error usually occurs
    when a macro invocation on the control line is not terminated at
    that line.  In Standard mode, a macro in an argument is expanded
    before argument substitution, in which case, the macro invocation
    must be terminated in the argument.  In POSTSTD mode, a macro
    invocation unterminated in a replacement list also causes this error.

Rescanning macro "macro" more than RESCAN_LIMIT times at "something"
    The depth of nested macros is so deep that the number of rescans has
    exceeded RESCAN_LIMIT at "something" during expansion.  This error
    occurs only in Standard mode but it is quite rare.

Recursive macro definition of "macro" to "macro"
    A macro definition is recursive.  This error occurs only in pre-
    Standard mode.  When the number of rescans has exceeded RESCAN_LIMIT,
    MCPP regards it as a recursive macro definition.

5.4.10  #error and #assert

#error
    A #error directive has been executed.  Following this message, the
    #error line is displayed.  If an argument itself contains a token
    error, such as an unterminated string, #error is not executed.  Only
    Standard mode has #error.
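
    A typical use of #error is to stop preprocessing when an assumption
    about the target does not hold.  The following is a minimal sketch;
    bits_per_byte is a hypothetical helper, and the 8-bit assumption is
    only an example.

```c
#include <limits.h>

/* Stop preprocessing with a diagnostic if the assumption fails. */
#if CHAR_BIT != 8
#error "this code assumes 8-bit bytes"
#endif

int bits_per_byte(void) { return CHAR_BIT; }
```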

Preprocessing assertion failed:
    A #assert directive has been executed.  Following this message, the
    #assert line arguments are displayed.  If any of the arguments
    contains an error, MCPP regards the assertion as failed.  Only MCPP
    builds other than the GCC-specific-build have #assert, and only in
    pre-Standard mode.

5.4.11  Failure of #include

Can't open include file "file-name"
    This error occurs when a file to include does not exist.  Probably
    the file name is misspelled or an "include directory" should have
    been specified.

5.4.12  Other Errors

Operand of _Pragma() is not a string literal
    The _Pragma() operator must take one string literal or wide string
    literal as its argument.  This is checked when MCPP is invoked with
    the -V199901L option.  The same applies when MCPP is invoked with
    the -V199901L option in C++ mode.


5.5     Warnings (Class 1)

A warning is issued when source, although syntactically correct,
possibly contains some coding mistakes or has a portability problem.
Warnings are divided into five classes: 1, 2, 4, 8, and 16.  These
classes are enabled when the -W <n> option is specified on MCPP
invocation.  <n> specifies an ORed value of any of 1, 2, 4, 8, and 16.
Class 4, for example, can be specified explicitly with -W4, and
implicitly with -W<n>, where <n> is 1|4, 1|2|4, 2|4, 1|4|8, 4|8, 4|16,
etc., because the AND-ed value of <n> and 4 is 4 (true).
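
The class test described above amounts to a bitwise AND.  The following
sketch models that description only; it is not MCPP's actual
implementation, and the enum and function names are hypothetical.

```c
/* Hypothetical names; the values are the class numbers from the text. */
enum warning_class {
    WARN_CLASS_1  = 1,
    WARN_CLASS_2  = 2,
    WARN_CLASS_4  = 4,
    WARN_CLASS_8  = 8,
    WARN_CLASS_16 = 16
};

/* Returns nonzero when class cls is enabled by the -W <n> value n. */
int warning_class_enabled(int n, int cls)
{
    return (n & cls) != 0;
}
```

For example, warning_class_enabled(1 | 4, WARN_CLASS_4) is true, while
warning_class_enabled(1 | 2, WARN_CLASS_4) is false.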

Standard mode issues an error message for most source code that causes
a Standard C undefined behavior, but only a warning for some.

Likewise, Standard mode always issues a warning for source code that
relies on Standard C unspecified behavior, except for the following:

1. Evaluation order of sub-expressions in a #if expression: Although the
evaluation order of the operands of operators other than ||, &&, ? , and
: is unspecified in Standard C, MCPP does not issue a warning because a
#if expression does not cause side-effects and therefore the evaluation
order does not affect results.  MCPP always evaluates integer constant
tokens from left to right in the order they appear and performs an
operation using the tokens in accordance with an operator grouping rule
when their values are needed.

Standard mode issues a warning to many implementation-defined behaviors,
except for the following:

1. Directories the #include directive searches for a file to include and
how to construct a header-name pp-token from #include's argument: MCPP
does not issue a warning because there would be too many warnings if it
did.  Unless a header name is a macro, the source token sequence,
including spaces, is used as it is.  If it is a macro, the expanded
result, including spaces, is used.  (In POSTSTD mode, a space
character is inserted between pp-tokens during macro expansion.  A
header-name is constructed by concatenating the resulting pp-tokens from
< to > and then by removing space characters.  In POSTSTD, a header-name
enclosed with < and > is obsolete.)  When MCPP encounters
'#pragma MCPP debug path' or '#debug path', it displays a search path,
instead of issuing a warning.

2. Evaluation of a single byte character constant, such as 'a', and of a
wide character constant that consists of only one multi-byte character,
such as L'', in a #if expression: MCPP does not issue a warning
because even with the same basic character set used, there are an
unlimited number of factors that limit portability, such as single
byte Katakana, presence or absence of a sign, encoding scheme of Kanji,
etc.  The same thing can be said of UCNs.

3. Bit operations using negative numbers in a #if expression: Although
bit operation results depend on internal representation of an integer on
a machine, most of the machines use two's complement representation,
thus causing no portability problem.  However, MCPP issues a warning for
a right bit shift operation on a negative value and for a division
operation in which either or both operands are negative, because they
lack portability.

4. A sequence of several white space characters as a token separator:
Standard C states that it is implementation-defined whether a sequence
of white space characters is replaced by one space character in the
translation phase 3, but you do not have to worry about this.
Portability becomes an issue only when a preprocessing directive line
has <vertical-tab> or <form-feed>.  MCPP converts it into one space
character and issues a warning.  For a sequence of several space
characters and tabs, MCPP compresses it into one space character without
a warning.

5. A compiler system's own built-in macros do not cause a warning.

6. #pragma sub-directive: In principle, MCPP does not issue a warning
for a #pragma sub-directive.  However, for #pragma once, #pragma
__setlocale, and #pragma MCPP *, which MCPP itself processes, it issues
a warning if they have an invalid argument.  In addition, MCPP issues a
warning for GCC V.3's #pragmas, such as #pragma GCC poison (dependency,
system_header), that the resident preprocessor processes but MCPP does
not.

7. Doubled \: Although it is implementation-defined in C99 whether a
single \ is changed into double \ (\\) when the # operator converts a
UCN sequence into a string, MCPP does not issue a warning to this.  MCPP
does not double \.

As you see, MCPP can perform almost all the portability checks necessary
at a preprocessing level.

POSTSTD mode is identical with STD mode except for some specification
differences described in section 2.1.

Regardless of the number of warnings, MCPP always returns a status of
success.  MCPP invoked with the -W0 option does not issue a warning.

5.5.1   Character, Token and Comment Related Warnings

Converted [CR+LF] to [LF]
    Converted the newline code from [CR+LF] to [LF].  This warning is
    issued when the source files for Windows are compiled on UNIX-like
    systems.  This warning is issued only once, on warning level of
    class 1 in stand-alone-build and class 2 in compiler-specific-build.

Illegal control character 0x1b in quotation
    A string literal, character constant or header name has a control
    code other than a white space character, which may cause an error in
    the compiler-proper.  This way of coding is not desirable.  A
    control code in string literals or character constants should be
    written using an escape sequence.

Illegal multi-byte character sequence "XY" in quotation
    The first byte (X) of "XY" in a string literal, character constant,
    or header name is the first byte of a multi-byte character (Kanji),
    while the second byte (Y) is not the second byte of the character.
    "XY" may be displayed garbled.  MCPP does not regard "XY" as a
    single multi-byte character.  It treats the first byte as a single-
    byte character and the second byte as the next character.

    MCPP does not issue a warning to a character in an external
    character set, as long as it is within the proper range.  Even
    within the proper range, there are some holes (no corresponding
    characters).  MCPP does not check whether such a character is
    defined or not.  The following table shows the range of each multi-
    byte character set:

        Encoding        first byte              second byte
        shift-JIS       0x81-0x9f, 0xe0-0xfc    0x40-0x7e, 0x80-0xfc
        EUC-JP          0x8e, 0xa1-0xfe         0xa1-0xfe
        KS C 5601       0xa1-0xfe               0xa1-0xfe
        GB 2312-80      0xa1-0xfe               0xa1-0xfe
        Big Five        0xa1-0xfe               0x40-0x7e, 0xa1-0xfe
        ISO-2022-JP     0x21-0x7e               0x21-0x7e

    Besides character codes, ISO-2022-JP has a shift sequence.  Apart
    from the shift sequence, all the multi-byte characters other than
    UTF-8 are two bytes.

    In UTF-8, multi-byte characters are two bytes or three bytes.  Kanji
    is encoded in three bytes.  The first byte is within the range of
    0xc2 to 0xef, second and third 0x80 to 0xbf.  Details are omitted
    here.  Anyway, all these bytes must fall within the appropriate
    ranges.

    Note that since MCPP is unable to recognize EUC-JP's three byte
    encoding (JIS X 0213), it regards 0x8f + 0xa1-0xfe + 0xa1-0xfe not
    as one character but as two characters of 0x8f and 0xa1-0xfe + 0xa1-
    0xfe.  As a result, MCPP does not issue a warning to the three byte
    encoding and can evaluate it correctly, except for a wide character
    constant in a #if expression.  In EUC-JP, a character with the first
    byte of 0x8e (a half-width Katakana) is encoded in two bytes, and
    treated as a multi-byte character.  This warning is not issued in a
    skipped #if group.

"/*" in comment
    A comment has a sequence of /*.  Unless it is intended, the
    programmer may have forgotten to close the preceding comment.
    Comments cannot be nested.

Too long identifier, truncated to "very_long_identifier"
    Since the length of an identifier has exceeded IDMAX, it is
    truncated to IDMAX.

Illegal digit in octal number "089"
    An octal numeric token contains 8 or 9.  Only pre-Standard mode
    issues this warning.  Standard mode does not check whether a
    numerical token on lines other than #if directives is correct or not.
    If a #if expression has an octal numeric token containing 8 or 9, it
    causes a "Not an integer" error.

Unterminated string literal, catenated to the next line
    Although an unterminated string literal in a logical line is
    normally regarded as an error, MCPP invoked with the -lang-asm (-x
    assembler-with-cpp, -a) option regards it as a multi-line string
    literal and concatenates the line with the next by inserting '\n'.
    This way of writing has no advantage.  Using a functionality to
    concatenate adjacent string literals is preferable.

5.5.2   Unterminated Source File Related Warnings

The following messages are issued on an unterminated line or comment.
OLDPREP mode does not issue these warnings.

End of file with no newline, supplemented newline
    A file must be terminated with a newline.  MCPP supplemented
    <newline>.

End of file with \, deleted the \
    A file must not be terminated with <backslash><newline>.  MCPP
    deleted the <backslash>.

End of file with unterminated comment, terminated the comment
    A comment is not terminated.  MCPP terminated the comment.

The following warning messages are issued in pre-Standard mode.  Pre-
Standard mode only warns and continues processing until it reaches the
end of input, which may cause many unexpected results.  Standard mode
issues an error.  OLDPREP mode does not issue even a warning, except on
an unterminated macro.

End of file within #if (#ifdef) section starting from line 123
End of file within macro invocation starting from line 123
End of file with unterminated #asm block starting from line 123
    #asm on the line 123 does not have a corresponding #endasm.

5.5.3   Directive Line Related Warnings

The macro is redefined
    MCPP displays this message followed by the source filename and line
    number where this macro definition is found.  The macro has been
    redefined with different contents.  The source is probably not
    well organized.  Macro definitions with the same name may coexist
    only if the following conditions are met; otherwise, a warning is
    issued.

    1.  They have the same number of parameters.
    2.  They have the same replacement list (one or more white space
        characters between tokens are regarded as one.  In POSTSTD, the
        difference of the token separators does not matter because any
        number of space characters is changed into one, regardless of
        the presence or absence of the token separators.)
    3.  In Standard mode, parameter names must be the same.  In POSTSTD
        mode and in pre-Standard mode, they are not checked.
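
    The conditions above can be illustrated with a redefinition that
    differs only in the amount of white space.  AREA and area are
    hypothetical names; the rejected variants are shown in a comment
    only.

```c
/* Hypothetical macro.  The second definition differs from the first
 * only in the amount of white space between tokens, so it is a
 * compatible redefinition. */
#define AREA(w, h) ((w) * (h))
#define AREA(w, h) ((w)   *   (h))

/* By contrast, either of these would be a different definition:
 *     #define AREA(w, h) ((h) * (w))     different replacement list
 *     #define AREA(x, y) ((x) * (y))     different parameter names
 */
int area(int w, int h) { return AREA(w, h); }
```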

Unknown argument "name"
    "name" is not a valid argument of #pragma MCPP debug or #debug.
No argument
    A #pragma MCPP debug or #debug does not have an argument.
Not an identifier "123"
    The argument of a #pragma MCPP debug or #debug is not an identifier.

The following message is issued only in Standard mode.

"and" is defined as macro
    "and" is defined as a macro in C++.
    In C95, "and" and the ten other names are defined as macros by
    <iso646.h>, whereas in C++ they are operator tokens.

The following message is issued only in STD mode.

No space between macro name "MACRO" and repl-text
    There is no space between the macro name and the replacement list of
    a #define line.  Normally this does not happen, but it does happen
    when an illegal character is used in a macro name as follows:

        #define THIS$AND$THAT(a, b)   ((a) + (b))

    MCPP interprets this as follows:

        #define THIS  $AND$THAT(a, b) ((a) + (b))

    and issues a warning.  Of course, this is a quite rare case.

The following warnings on #pragma line are issued only in Standard mode.

No sub-directive
    A #pragma line does not have any argument.  The line is ignored.

Unknown encoding "encoding"
    The encoding name, "encoding", specified with #pragma __setlocale
    ( "encoding") is not implemented.

Too long encoding name "encoding"
    The encoding name, "long-long-encoding", specified with #pragma
    __setlocale( "long-long-encoding") exceeds 19 bytes.  MCPP ignores
    it.

Bad push_macro syntax
Bad pop_macro syntax
    There is a syntax error in #pragma MCPP push_macro, #pragma MCPP
    pop_macro, #pragma push_macro or #pragma pop_macro.  To use these
    #pragma directives, enclose the macro name in double quotes and
    then enclose that in parentheses, for example ("MACRO"). (This
    redundant notation is for compatibility with Visual C.)

"MACRO" has not been defined
    MACRO in the argument ("MACRO") for #pragma MCPP push_macro,
    #pragma MCPP pop_macro, #pragma push_macro, or #pragma pop_macro is
    not defined as a macro.
"MACRO" is already pushed
    MACRO of #pragma MCPP push_macro ("MACRO") has been pushed and then
    further #undef-ed.  Without redefining the MACRO, push would not be
    possible.
"MACRO" has not been pushed
    MACRO in #pragma MCPP pop_macro( "MACRO") has not been pushed.  It
    may have been already popped.
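
    A correctly paired push and pop might look like the sketch below.
    It uses the Visual C compatible spelling and assumes a compiler
    whose preprocessor implements these pragmas (GCC, for example);
    LIMIT and the two functions are hypothetical names.

```c
/* LIMIT is a hypothetical macro. */
#define LIMIT 10

#pragma push_macro("LIMIT")   /* save the current definition */
#undef LIMIT
#define LIMIT 99              /* temporary definition */
int inner_limit(void) { return LIMIT; }
#pragma pop_macro("LIMIT")    /* restore the saved definition */

int outer_limit(void) { return LIMIT; }
```

    Omitting the quotes or the parentheses in the argument is what
    produces the "Bad push_macro syntax" and "Bad pop_macro syntax"
    warnings.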

The GCC-specific-build issues the following warnings:

Ignored #ident
Ignored #sccs
    #ident or #sccs lines are ignored.

GCC-specific-build issues a Class 2 warning for a line with #pragma GCC
followed by either poison, dependency or system_header and does not
output the line.  The GCC V.3 resident preprocessor processes the line
but MCPP does not.

The following warnings are issued only in pre-Standard mode.  Standard
mode regards them as errors.

Not in a #if (#ifdef) section in a source file
Line number "0x123" isn't a decimal digit sequence

KR mode issues the following warning.  Standard mode issues the same
warning only to #pragma once, #pragma MCPP put_defines, #pragma MCPP
push_macro, #pragma MCPP pop_macro, #pragma push_macro, #pragma
pop_macro, #pragma MCPP debug, and #pragma MCPP end_debug; for other
directives, Standard mode issues an error.  OLDPREP mode issues neither
an error nor a warning.

"Excessive token sequence "junk"

5.5.4   #if Expression Related Warnings

The following three warnings are relevant to an argument of #if, #elif,
or #assert:

Macro "MACRO" is expanded to "defined"
    The macro MACRO in a #if expression has been expanded to "defined".
    MCPP treats this strange macro not as identifier but as operator.
    How to treat it is undefined in Standard C.

Macro "MACRO" is expanded to "sizeof"
    The macro MACRO in a #if expression has been expanded to sizeof.
    MCPP treats this strange macro not as identifier but as operator.
    Pre-Standard mode issues this warning.

Macro "MACRO" is expanded to 0 token
    The macro MACRO has been expanded to zero token.  If this happens in
    a #if expression, it almost always causes an error.  The purpose of
    this warning is to indicate the cause of an error.

The following warning is relevant to an argument of #if, #elif or
#assert.  It is not issued in a sub-expression whose evaluation is
skipped. (MCPP invoked with the -W8 option issues it.)

Undefined escape sequence '\x'
    There is no such escape sequence as \x.  \x is evaluated to a two
    byte sequence. (Of course, an escape sequence of "\x" followed by a
    hex string is valid.)  This warning is also issued for a UCN with an
    insufficient number of digits.

The following warnings are relevant to operations and types in a
constant expression on #if, #elif or #assert lines.  They are also not
issued in a skipped sub-expression. (MCPP invoked with -W8 issues
them.)

MCPP evaluates a #if expression in long long / unsigned long long even
in C90 or C++98, and issues a warning on a value outside the range of
long / unsigned long in C90 and C++98.  MCPP also issues a warning on an
LL suffix in modes other than C99.  These warnings are of class 1 in
stand-alone-build and class 2 in compiler-specific-build.  In POSTSTD
mode, character constants cannot be used in a #if expression, hence no
warning is issued. (They cause errors instead.)

Constant "123456789012" is out of range of (unsigned) long
    An integer constant has a value that exceeded the range of unsigned
    long.
Integer character constant 'abcde' is out of range of unsigned long
    A character constant 'abcde' has a value that exceeded the range of
    unsigned long.
Wide character constant L'abc' is out of range of unsigned long
    A wide character constant L'abc' has a value that exceeded the range
    of unsigned long.  This warning is issued only in STD mode.
Result of "op" is out of range of (unsigned) long
    An operation result using the operator op is out of range of
    (unsigned) long.  Op is any of binary operators: *, /, %, +, and -.
    When two's complement representation is used, the unary operator '-'
    will cause an overflow with -LONG_MIN.  Unsigned long will never
    cause an overflow, so it does not cause an error.  If the result of
    an algebraic calculation is out of range, a warning is issued.
LL suffix is used in other than C99 mode "123LL"
    LL suffix is used for an integer in other than C99 mode.
Shift count "40" is larger than bit count of long
    The value of the right operand of a bit shift operator, << or >>,
    exceeds the bit count of long.

Negative value "-1" is converted to positive "18446744073709551615"
    A mixture of signed and unsigned operations results in conversion of
    a signed negative value into an unsigned positive value.  This is
    not an error, but indicates source code may contain a bug.  For both
    operands of a binary operator, such as *, /, %, +, -, <, >, <=, >=,
    ==, !=, &, ^ and | , and the second and third operands of a ternary
    operator, ? and :, if one operand is unsigned and the other is
    signed, the signed one is always converted into unsigned.
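
    The conversion can be observed directly.  In the sketch below the
    names are hypothetical, and some preprocessors (not only MCPP) may
    emit a sign-change warning on the #if line, which is exactly the
    situation this message describes.

```c
#include <limits.h>

/* In a #if expression, -1 is signed and 0u is unsigned, so -1 is
 * converted to the largest unsigned value and the test is false. */
#if -1 < 0u
#define MINUS_ONE_BELOW_ZERO 1
#else
#define MINUS_ONE_BELOW_ZERO 0
#endif

int minus_one_below_zero(void) { return MINUS_ONE_BELOW_ZERO; }

/* The same conversion at run time. */
unsigned long long as_unsigned(long long value)
{
    return (unsigned long long)value;
}

/* Returns 1: converting -1 yields ULLONG_MAX. */
int wraps_to_max(void) { return as_unsigned(-1) == ULLONG_MAX; }
```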

Illegal shift count "-1"
    The value of the right operand of a bit shift operator, << or >>, is
    a negative number or exceeds the bit count of long long.  Probably,
    this is also a bug in source code.

"op" of negative number isn't portable
    If either or both operands of a binary operator (op) are negative,
    the operation lacks portability. "Op" is any of /, %, and >>.  The
    >> operator with a negative left operand is portable only across
    compiler systems on computers that have an arithmetic shift
    instruction, where a one-bit shift means a division by 2.
    Otherwise, it is not portable.

5.5.5   Macro Expansion Related Warnings

In these warnings, MCPP displays a macro definition followed by the
source filename and line number where the macro definition is found.

Macro started at line 123 swallowed directive-like line
    MCPP has read a line that begins with # as an argument of the macro
    that begins at the line 123.  Maybe, the macro invocation has a bug.
    If it had not been for the macro, the line that begins with # would
    have been interpreted as a directive line.  The same thing could be
    said if the macro had been located in a #if group whose evaluation
    is skipped, and the line is treated as a directive, because such
    macro is never expanded.

Replacement text "sub(" of macro "head" involved subsequent text
    Rescanning of the replacement list "sub(" of the macro "head" has
    involved the text succeeding the macro invocation.  From K&R 1st to
    Standard C, this has not been regarded as an error, but if you
    receive this warning without having intended such a macro, your
    macro definition or macro invocation is probably incorrect.  Even
    if it is intended, this is an unusual macro.  Only STD mode issues
    this warning.  COMPAT mode issues this warning only on class 8.  In
    pre-Standard mode, the same situation may arise but no warning is
    issued.  POSTSTD mode never issues this warning because rescanning
    does not involve the text succeeding the replacement list. (The
    macro may be expanded quite differently or cause an "unterminated
    macro call" error.)

Less than necessary N argument (s) in macro call "macro( a)"
    A macro invocation has an insufficient number of arguments.
    Normally this causes an error, but when only one argument is
    missing from a macro that takes a variable number of arguments,
    MCPP issues a warning instead.  This is to ease migration of
    variable argument macros between GCC and C99.

5.5.6   Line Number Related Warnings

This section covers line number related warnings.

Line number "32768" is out of range of [1,32767]
    In C90 and C++, the first argument of a #line must be within the
    range of 1 to 32767.  0 is also out of range.  With __STDC_VERSION__
    >= 199901L or __cplusplus >= 199901L, the valid range is 1 to
    2147483647.  Therefore, in C90 or C++ mode, MCPP issues a warning,
    not an error, to the range of 32768 to 2147483647.  Standard mode
    issues this warning.
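
    For reference, a #line directive with an in-range argument behaves
    as in the sketch below; line_after_directive is a hypothetical
    name.

```c
/* #line sets the number of the line that follows the directive.
 * 1000 is within the C90 range of 1 to 32767. */
#line 1000
int line_after_directive(void) { return __LINE__; }
```

    Here __LINE__ expands to 1000, because the function is written on
    the line immediately after the directive.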

In C90, when you use #line to specify a value slightly below 32767, you
won't receive an error, but sooner or later, the line number will exceed
32767, in which case, MCPP continues to increase the line number while
issuing a warning.  Some compiler-proper may not accept this large line
number.  It is not desirable to specify a large number with #line.

Line number 32768 got beyond range
    The source line number has reached 32768, at which a warning is
    issued one time.

Line number 32769 is out of range
    When the __LINE__ macro is expanded, the line number has exceeded
    32767.

5.5.7   #pragma MCPP warning, #warning

#warning
#pragma MCPP warning
    A #pragma MCPP warning (#warning) directive has been executed.
    Following the above message, the line is displayed. (If an argument
    of #pragma MCPP warning has a token error, such as unterminated
    string, #pragma MCPP warning is not executed.)  Although this
    directive appears in the Warning Level 1 section, this warning is
    issued at every warning level.  Standard mode has #pragma MCPP
    warning, while pre-Standard mode has #warning.


5.6     Warnings (Class 2)

This section covers warnings to code that does not contain a bug but
causes a portability problem.

Converted [CR+LF] to [LF]
    Converted the newline code from [CR+LF] to [LF].  This warning is
    issued on warning level of class 1 in stand-alone-build and class 2
    in compiler-specific-build.

    MCPP evaluates a #if expression in long long / unsigned long long
    even in C90 or C++98, and issues a warning on a value outside the
    range of long / unsigned long in C90 and C++98.  An LL suffix in
    modes other than C99 also gets a warning, as does the i64 suffix in
    compiler-specific-builds for Visual C and Borland C.  These warnings
    are of class 1 in stand-alone-build and class 2 in compiler-
    specific-build.

    Constant "123456789012" is out of range of (unsigned) long
    Integer character constant 'abcde' is out of range of unsigned long
    Wide character constant L'abc' is out of range of unsigned long
    Result of "op" is out of range of (unsigned) long
    LL suffix is used in other than C99 mode "123LL"
    I64 suffix is used in other than C99 mode "123i64"
    Shift count "40" is larger than bit count of long

    Only the Standard mode issues the following five warnings:

Parsed "//" as comment
    A text from // to the end of the line is interpreted as a comment.
    This is a legal notation in C99 and C++.  In C90 mode MCPP treats it
    as a comment after issuing a warning.

Variable argument macro is defined
    Although it is the C99 Standard that stipulates variable argument
    macros, a variable argument macro has been defined in C90 or C++
    mode.
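
    One way to keep such code usable in C90 is to provide an ordinary
    variadic function and to offer the variadic macro only when
    __STDC_VERSION__ indicates C99.  The following is a minimal sketch;
    format_into and FORMAT are hypothetical names chosen for this
    example:

```c
#include <stdarg.h>
#include <stdio.h>

/* Works in C90: an ordinary variadic function, no variadic macro needed.
 * The caller must supply a buffer large enough for the result. */
int format_into(char *buf, const char *fmt, ...)
{
    va_list ap;
    int written;

    va_start(ap, fmt);
    written = vsprintf(buf, fmt, ap);
    va_end(ap);
    return written;
}

/* Offer the variadic macro only where C99 guarantees it. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define FORMAT(buf, ...)  format_into((buf), __VA_ARGS__)
#endif
```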

Empty argument in macro call "MACRO( a, ,"
    A macro invocation has an empty argument, in which case MCPP
    regards the argument as a sequence of zero pp-tokens and treats it
    as such.  The empty argument is legal in C99, while it is undefined
    in C90, thus causing a lack of portability.  (MCPP regards an empty
    argument that is not even marked by a ',' not as an empty argument
    but as a missing argument, and issues an error.  Since zero
    arguments and one empty argument are syntactically
    indistinguishable, MCPP does not make either an error.)  Writing an
    empty argument in source code is not generally preferable.  If
    possible, I recommend that you define:
        #define EMPTY
    and then write EMPTY wherever an empty argument would otherwise be
    written.
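
    The recommendation can be sketched as follows.  DECLARE, counter,
    step and next_count are hypothetical names used only for this
    illustration; the point is that EMPTY is a real pp-token in the
    argument list and expands to nothing during argument expansion:

```c
#define EMPTY                                   /* expands to no tokens */
#define DECLARE(qualifier, type, name)  qualifier type name

DECLARE(EMPTY, int, counter) = 3;     /* becomes: int counter = 3;    */
DECLARE(const, int, step)    = 4;     /* becomes: const int step = 4; */

int next_count(void)
{
    return counter + step;
}
```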

Skipped the #pragma line
    GCC V.3 provides several #pragma directives in the form of #pragma
    GCC <args>.  Its preprocessor processes some of them, but MCPP does
    not support them.  This warning is issued for a #pragma directive
    that a compiler-specific preprocessor processes but MCPP does not.

Not a valid preprocessing token "+12"
    Concatenating two pp-tokens with the ## operator results in an
    invalid token "+12", which normally causes an error.  However, MCPP
    invoked with the -lang-asm  (-x assembler-with-cpp, -a) option does
    not regard it as an error.

The following warning is issued only in POSTSTD mode.

Header-name enclosed by <, > is an obsolescent feature <stdio.h>
    The header name in the form of <stdio.h> is one of the
    specifications I want to abolish.  I recommend using "stdio.h".

The following two warnings are issued only in some compiler systems.  Of
course, the coding in question is valid in those particular systems, but
it lacks portability, so a warning is issued to remind users of it.

#include_next is not allowed by Standard
#warning is not allowed by Standard
    These directives are valid in GCC but do not conform to Standard C
    and lack portability.

Converted \ to /
    A #include directive contains \ in the header name.  MCPP converts \
    into /.  "\\" is a valid path delimiter in OSs, such as Windows, but
    undefined in Standard C.  It is safe to use /.  MCPP on Windows
    issues this warning only once. (MCPP does not regard " preceded by \
    as a delimiter of a string literal, raising an "unterminated string
    literal" error.)

'$' in identifier "THIS$AND$THAT"
    An identifier contains a '$'.  Only MCPP compiled with
    DOLLAR_IN_NAME set to TRUE issues this warning, and only once,
    because '$' lacks portability although it is valid in this MCPP.
    Other builds of MCPP regard '$' as a pp-token and parse
    THIS$AND$THAT into five components THIS, $, AND, $ and THAT,
    resulting in a compiler error.


5.7     Warnings (Class 4)

Standard C guarantees some minimum translation limits.  It is desirable
that a preprocessor imposes translation limits that exceed these values,
but source programs that rely on a preprocessor's own translation limits
will restrict portability.  MCPP provides some macros in "system.H" that
allow you to set translation limits to any values you like.  MCPP in
Standard mode issues a warning for source code that exceeds a Standard C
guaranteed limit.  However, these messages are excluded from Class 1 and
2 because they may be issued frequently, depending on standard headers
of compiler systems or source programs.

Logical source line longer than 509 bytes
    The length of a logical source line has exceeded 509 bytes.

Quotation longer than 509 bytes "very_very_long_string"
    The length of a string literal, character constant or header name
    has exceeded 509 bytes.

More than 8 nesting of #include
    The depth of nested #includes has exceeded 8.  This warning is
    issued only when it reaches 9.

More than 8 nesting of #if (#ifdef) sections
    The depth of nested #ifs, #ifdefs, or #ifndefs has exceeded 8.  This
    warning is issued only when it reaches 9.

More than 1024 macros defined
    The number of defined macros has reached 1024.  This number includes
    both pre-defined macros and those defined in header files.

String literal longer than 509 bytes "very_very_long_string"
    Expansion of a macro using the # operator has generated a string
    literal longer than 509 bytes.

The following warnings are not issued in a skipped #if group.

More than 32 nesting of parens in #if expression
    The depth of nested parentheses in a #if expression has exceeded 32.
    This warning is issued only when it reaches 33.

More than 31 parameters
    The number of parameters of a macro definition has exceeded 31.

Identifier longer than 31 bytes "very_very_long_name"
    The length of an identifier has exceeded 31 bytes.

With __STDC_VERSION__ >= 199901L, the Standard specified translation
limits are as follows:

    Length of logical source line:                  4095 bytes
    Length of string literal, character constant, or header name:
                                                    4095 bytes
    Identifier length:                              63 characters
    Depth of nested #includes:                      15
    Depth of nested #ifs, #ifdefs, or #ifndefs:     63
    Depth of nested parentheses in #if expression:  63
    Number of macro parameters:                     127
    Number of definable macros:                     4095

Note that the length of a UCN or multi-byte-character as an identifier
is expressed as the number of characters, not bytes.
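
The two sets of guaranteed minimums (the C90 values quoted in the
messages above and the C99 values in the table) can be collected in one
place, for example when writing a checking tool.  The following is only
a sketch; the struct, its field names and both functions are
hypothetical and are not part of MCPP:

```c
/* Minimum translation limits as listed in this section.
 * All names here are illustrative only. */
struct translation_limits {
    int line_bytes;         /* logical source line                  */
    int quotation_bytes;    /* string literal, character constant,
                               or header name                       */
    int identifier_length;  /* bytes in C90, characters in C99      */
    int include_depth;      /* nested #includes                     */
    int if_depth;           /* nested #ifs, #ifdefs, or #ifndefs    */
    int paren_depth;        /* nested parentheses in #if expression */
    int macro_params;       /* parameters of one macro              */
    int macro_count;        /* definable macros                     */
};

/* Returns 1 when this translation unit is compiled as C99 or later. */
int is_c99(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    return 1;
#else
    return 0;
#endif
}

/* Returns the limits guaranteed for the language version in use. */
struct translation_limits guaranteed_limits(void)
{
    static const struct translation_limits c90 =
        { 509, 509, 31, 8, 8, 32, 31, 1024 };
    static const struct translation_limits c99 =
        { 4095, 4095, 63, 15, 63, 63, 127, 4095 };

    return is_c99() ? c99 : c90;
}
```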

When MCPP is invoked with the -+ option to specify C++ preprocessing,
the Standard's guideline translation limits are as follows:

    Length of logical source line:                  65536 bytes
    Length of string literal, character constant, or header name:
                                                    65536 bytes
    Identifier length:                              1024 characters
    Depth of nested #includes:                      256
    Depth of nested #ifs, #ifdefs, or #ifndefs:     256
    Depth of nested parentheses in #if expression:  256
    Number of macro parameters:                     256
    Number of definable macros:                     65536

Note that MCPP allows at most 255 macro parameters, so it issues an
error when the number reaches 256.

The following warnings are excluded from class 1 and 2 because they are
issued too frequently.

Converted 0x0c to a space
    [FF], [VT], and [CR] (other than in a [CR][LF] sequence) used in
    source code as token separators are converted into a space
    character.  How to deal with these token separators located on a
    directive line is undefined in Standard C.  If they are located in
    comments, string literals, or character constants, MCPP does not
    convert them. (Of course, MCPP could do so, but I do not want MCPP
    to impose a greater restriction on the character set used, since
    that essentially depends on the compiler-proper.)  On the other
    hand, [TAB] as a token separator is converted into a space
    character, but no warning is issued because it does not affect
    compilation at all. ([TAB] means nothing but a space to both the
    preprocessor and the compiler-proper.)
    [FF] is sometimes found in actual source to indicate "end of page".
    This is not a recommendable style.

Undefined symbol
    In a #if line, the identifier "name" is not defined as a macro.  It
    is evaluated to zero.  This is not an error at all, but may be a
    program bug.  No warning is issued to an argument of a #if defined.
    This warning can be avoided by writing #if defined name && (name ..),
    instead of #if name .., or by invoking MCPP with the -D name=0
    option.  C++ gives "true" and "false" tokens special treatment and
    evaluates them to 1 and 0, respectively, without a warning.
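
    The "#if defined name && (name ..)" form can be sketched as
    follows.  FEATURE_LEVEL is a hypothetical configuration macro that
    is deliberately left undefined here; USE_FAST_PATH and
    use_fast_path are likewise names invented for the example:

```c
/* FEATURE_LEVEL is intentionally not defined anywhere in this file. */
#if defined FEATURE_LEVEL && (FEATURE_LEVEL >= 2)
#define USE_FAST_PATH 1
#else
#define USE_FAST_PATH 0
#endif

/* Returns 1 only when FEATURE_LEVEL is defined and at least 2. */
int use_fast_path(void)
{
    return USE_FAST_PATH;
}
```

    Because the left operand of && is false, the comparison on the
    right is not evaluated, so the undefined identifier never
    contributes a value to the result.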

Multi-character wide character constant L'ab' isn't portable
    A wide character constant value varies even among compiler systems
    using the same character set because the encoding scheme of wide
    character constants and how to evaluate multi-characters depend on
    compiler systems.  Therefore, #if expressions using them do not
    provide portability.  Only STD mode issues this warning.  POSTSTD
    mode does not permit a character literal in a #if expression, so
    this causes an error. (The next item is treated the same way.)

Multi-character or multi-byte character constant '' isn't portable
    Since how to evaluate the value of a multi-character or multi-byte
    character constant depends on compiler systems,  #if expressions
    using them do not provide portability.  Only STD mode issues this
    warning.

The following two warnings are issued only in Standard mode.

Macro with mixing of ## and # operators isn't portable
    A function-like macro has a token sequence of "## #" in the
    replacement list.  This sequence of two operators lacks portability
    because their priority is unspecified in Standard C.  MCPP gives #
    precedence over ##.  Note that if a function-like macro has a token
    sequence in the reverse order "# ##", MCPP regards it as an error
    because the operand of the # operator must be a parameter.
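
    A common way to avoid depending on the relative priority of # and
    ## is to give each operator its own macro.  The sketch below uses
    hypothetical macro names (CAT, STR_, STR, CAT_STR) and is not taken
    from MCPP:

```c
#define CAT(a, b)      a ## b          /* only ## in this macro        */
#define STR_(x)        #x              /* only #  in this macro        */
#define STR(x)         STR_(x)         /* expands x, then stringizes   */
#define CAT_STR(a, b)  STR(CAT(a, b))  /* paste first, then stringize  */

/* Returns the string "foobar". */
const char *joined_name(void)
{
    return CAT_STR(foo, bar);
}
```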

Macro with multiple ## operators isn't portable
    A macro definition has only one token or parameter inserted between
    ## operators in the replacement list.  This macro may lack
    portability because the evaluation order of ## operators is
    unspecified in Standard C.  MCPP applies the ## operator from left
    to right.
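
    Where the order matters, it can be made explicit by nesting macros
    that each contain a single ## operator.  The following is a minimal
    sketch with hypothetical names (CAT2_, CAT2, CAT3, item_7):

```c
#define CAT2_(a, b)     a ## b               /* a single ## per macro    */
#define CAT2(a, b)      CAT2_(a, b)          /* expands arguments first  */
#define CAT3(a, b, c)   CAT2(CAT2(a, b), c)  /* explicit left-to-right   */

static const int CAT3(item, _, 7) = 42;      /* declares item_7          */

int read_item(void)
{
    return item_7;
}
```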


5.8     Warnings (Class 8)

There is little chance that the indicated source code contains a bug,
but these messages are issued to call attention to it.  MCPP invoked
with the -W8 option issues these warnings.

In a skipped #if group, whether preprocessing directives, such as #ifdef,
#ifndef, #elif, #else, and #endif, are balanced or not is checked.
However, MCPP invoked with the -W8 option also checks non-conforming or
unknown directives.  Standard mode issues a warning when the depth of
nested #ifs exceeds 8.

Illegal #directive "123" (in skipped block)
Unknown #directive "pseudo-directive" (in skipped block)
More than 8 nesting of #if (#ifdef) sections (in skipped block)
#include_next is not allowed by Standard
#warning is not allowed by Standard

The following warnings are related to #if expression.  Given an
expression of #if a || b, for example, if "a" is true, "b" is not
evaluated.  However, MCPP invoked with -W8 issues a warning to non-
evaluated sub-expressions, in which case, the note saying "in non-
evaluated sub-expression" is appended.
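
As a small illustration of a sub-expression that is never evaluated,
consider the following sketch.  DIVISOR, RATIO_IS_LARGE and
ratio_is_large are hypothetical names.  The right operand of && contains
a division by zero, but it is not evaluated because the left operand is
false; this is the kind of sub-expression the paragraph above says -W8
reports on:

```c
#define DIVISOR 0

/* The division is guarded, so it is skipped when DIVISOR is 0. */
#if (DIVISOR != 0) && (100 / DIVISOR > 5)
#define RATIO_IS_LARGE 1
#else
#define RATIO_IS_LARGE 0
#endif

int ratio_is_large(void)
{
    return RATIO_IS_LARGE;
}
```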

Constant "123456789012345678901" is out of range
Constant "123456789012" is out of range of (unsigned) long
LL suffix is used in other than C99 mode "123LL"
I64 suffix is used in other than C99 mode "123i64"
Shift count "40" is larger than bit count of long
Integer character constant 'abcdefghi' is out of range
Integer character constant 'abcde' is out of range of unsigned long
Wide character constant L'abcdef' is out of range
Wide character constant L'abc' is out of range of unsigned long
CHARBIT bits can't represent escape sequence '\x123'
CHARBIT*2 bits can't represent escape sequence L'\x12345'
Division by zero
Undefined symbol "name", evaluated to 0
sizeof: Unknown type "type"
sizeof: Illegal type combination with "type"
Multi-character wide character constant L'ab' isn't portable
Multi-character or multi-byte character constant '' isn't portable
Undefined escape sequence '\x'
UCN cannot specify the value "0000007f"
Negative value "-1" is converted to positive "18446744073709551615"
Result of "op" is out of range
Result of "op" is out of range of (unsigned) long
Illegal shift count "-1"
"op" of negative number isn't portable

sizeof is disallowed in C Standard
    The purpose of this warning is to remind users of the fact that
    Standard C does not allow for #if sizeof, although pre-Standard mode
    implements it.

"MACRO" wasn't defined
    An undefined name is specified with #undef.  Standard C does not
    regard it as an error.

Macro "macro" needs arguments
    A token with the same name as a macro with arguments appears in a
    stand-alone manner.  MCPP does not expand it and leaves it as it is.
    Only pre-Standard mode issues this warning. (Standard mode does not
    issue a warning since such a token does not cause any problem.)

Replacement text "sub(" of macro "head" involved subsequent text
    Rescanning of the replacement list "sub(" of the macro "head" has
    involved the text succeeding the macro invocation.  COMPAT mode
    issues this warning only on class 8, whereas STD mode issues it on
    class 1.
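
    The situation can be reproduced with the macro names used in the
    message.  In the sketch below, "head" expands to "sub(" and
    completes the call from the text that follows it; head_ok is a
    hypothetical self-contained alternative added for comparison:

```c
#define sub(x, y)      ((x) - (y))
#define head           sub(        /* relies on text after the call  */
#define head_ok(x, y)  sub(x, y)   /* complete within its own call   */

/* "head 5, 3)" is rescanned together with the following tokens and
 * becomes sub(5, 3). */
int from_head(void)
{
    return head 5, 3);
}

int from_head_ok(void)
{
    return head_ok(5, 3);
}
```

    Both functions compute the same value; only the first form draws
    the warning described above.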


5.9     Warnings (Class 16)

Trigraphs and digraphs are not used at all in an environment where they
are not needed.  If they are found in such an environment, attention
needs to be paid.  The purpose of the -W16 option is to find such
trigraphs and digraphs.  On the other hand, these warnings are very
bothersome in an environment where trigraphs or digraphs are used on a
regular basis because they are issued very frequently.  For this reason,
I set up a separate class for these warnings.  Anyway, MCPP issues these
messages only in the state where the trigraphs or digraphs are enabled.
Digraphs are for Standard mode only, and trigraphs are for STD mode
only.

2 trigraph(s) converted
    Two trigraph sequences in this physical line have been converted.
    Does the programmer really intend to write trigraphs?

2 digraph(s) converted
    Two digraph sequences in this line have been converted.  Does the
    programmer really intend to write digraphs?  MCPP compiled with
    HAVE_DIGRAPHS == FALSE in STD mode converts a digraph into a regular
    token in the following manner after preprocessing:

        <% -> {      <: -> [      %:    -> #
        %> -> }      :> -> ]      %:%:  -> ##

    Therefore, the compiler-proper does not need to be able to handle
    digraphs.  However, POSTSTD mode converts a digraph into a regular
    pp-token during translation phase 1.  The difference in this
    behavior between the modes appears when a # operator converts a
    digraph into a string; STD mode directly converts a digraph sequence
    into a string, while POSTSTD mode converts it into a regular pp-
    token, and then into a string.  In addition, if a string literal
    contains a character sequence which is equivalent to a digraph
    sequence, STD mode does not convert it, while POSTSTD mode converts
    it into a character sequence of the corresponding pp-tokens.

    STD mode does not issue a warning for a digraph that appears on a
    preprocessing-directive line and disappears in due course, because
    this warning is issued only for converted digraphs.
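
    The stringizing difference described above can be observed with a
    short test.  This sketch assumes a Standard C compiler with
    digraphs enabled; STR, table and the two functions are hypothetical
    names.  In Standard C a digraph behaves like its regular token
    except for its spelling, so the # operator yields the spelling that
    was written, which matches the STD mode behavior:

```c
#define STR(x)  #x

/* Digraphs used as punctuators: <: :> <% %> stand for [ ] { }. */
static const int table<:3:> = <% 10, 20, 30 %>;

int second_entry(void)
{
    return table<:1:>;
}

/* Stringizing keeps the spelling the programmer wrote. */
const char *digraph_spelling(void)
{
    return STR(<:);
}
```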


5.10    Diagnostic Messages Index

    Diagnostic Messages             Fatal   Error   Warning class
                                    error           1   2   4   8  16

"..." isn't the last parameter      [5.4.7]
"/*" in comment                             [5.5.1]
"MACRO" has not been defined                [5.5.3]
"MACRO" has not been pushed                 [5.5.3]
"MACRO" is already pushed                   [5.5.3]
"MACRO" wasn't defined                                        [5.8]
"__STDC__" shouldn't be redefined   [5.4.7]
"__STDC__" shouldn't be undefined   [5.4.8]
"__VA_ARGS__" without corresponding "..."
                                    [5.4.7]
"and" is defined as macro                   [5.5.3]
"defined" shouldn't be defined      [5.4.7]
"op" of negative number isn't portable      [5.5.4]           [5.8]
## after ##                         [5.4.7]
#error                              [5.4.10]
#warning                                    [5.5.7]
#include_next is not allowed by Standard          [5.6]       [5.8]
'$' in identifier "THIS$AND$THAT"                 [5.6]
2 digraph(s) converted                                              [5.9]
2 trigraph(s) converted                                             [5.9]
CHARBIT bits can't represent escape sequence '\x123'
                                    [5.4.6]                   [5.8]
CHARBIT*2 bits can't represent escape sequence L'\x12345'
                                    [5.4.6]                   [5.8]
Already seen #else at line 123      [5.4.3]
Bad defined syntax                  [5.4.5]
Bad push_macro syntax                       [5.5.3]
Bad pop_macro syntax                        [5.5.3]
Buffer overflow expanding macro "macro" at "something"
                                    [5.4.9]
Buffer overflow scanning token "token"
                            [5.3.3]
Bug:                        [5.3.1]
Can't open include file "file-name"
                                    [5.4.11]
Can't use the character 0x24        [5.4.5]
Can't use a character constant 'a'  [5.4.5]
Can't use a string literal "string"
                                    [5.4.5]
Can't use the operator "++"         [5.4.5]
Constant "1234567890123456789012" is out of range
                                    [5.4.6]                   [5.8]
Constant "123456789012" is out of range of (unsigned) long
                                            [5.5.4][5.6]      [5.8]
Converted [CR+LF] to [LF]                   [5.5.1][5.6]
Converted 0x0c to a space                               [5.7]
Converted \ to /                                  [5.6]
Division by zero                    [5.4.6]                   [5.8]
Duplicate parameter names "a"       [5.4.7]
Empty argument in macro call "MACRO( a, ,"        [5.6]
Empty character constant ''         [5.4.1]
Empty parameter                     [5.4.7]
End of file with \, deleted the \           [5.5.2]
End of file with unterminated comment, terminated the comment
                                            [5.5.2]
End of file with no newline, supplemented the newline
                                            [5.5.2]
End of file with unterminated #asm block started at line 123
                                    [5.4.2] [5.5.2]
End of file within #if (#ifdef) section started at line 123
                                    [5.4.2] [5.5.2]
End of file within macro call started at line 123
                                    [5.4.2] [5.5.2]
Excessive ")"                       [5.4.5]
Excessive token sequence "junk"     [5.4.4] [5.5.3]
File read error             [5.3.2]
File write error            [5.3.2]
Header-name enclosed by <, > is an obsolescent feature <stdio.h>
                                                  [5.6]
I64 suffix is used in other than C99 mode "123i64"
                                                  [5.6]       [5.8]
Identifier longer than 31 bytes "very_very_long_name"   [5.7]
Ignored #ident                              [5.5.3]           [5.8]
Ignored #sccs                               [5.5.3]           [5.8]
Illegal #directive "123"            [5.4.4]                   [5.8]
Illegal control character 0x1b in quotation
                                            [5.5.1]
Illegal control character 0x1b, skipped the character
                                    [5.4.1]
Illegal digit in octal number "089"         [5.5.1]
Illegal multi-byte character sequence "XY"
                                    [5.4.1]
Illegal multi-byte character sequence "XY" in quotation
                                            [5.5.1]
Illegal parameter "123"             [5.4.7]
Illegal shift count "-1"                    [5.5.4]           [5.8]
Illegal UCN sequence                [5.4.1]
In #asm block started at line 123   [5.4.3]
Integer character constant 'abcdefghi' is out of range
                                    [5.4.6]                   [5.8]
Integer character constant 'abcde' is out of range of unsigned long
                                            [5.5.4][5.6]      [5.8]
The macro is redefined                      [5.5.4]
Less than necessary N argument(s) in macro call "macro( a)"
                                    [5.4.9] [5.5.5]
Line number "32768" got beyond range        [5.5.6]
Line number "0x123" isn't a decimal digits sequence
                                    [5.4.4] [5.5.6]
Line number "32769" is out of range         [5.5.6]
Line number "2147483648" is out of range of [1,2147483647]
                                    [5.4.4]
Line number "32768" is out of range of [1,32767]
                                            [5.5.6]
LL suffix is used in other than C99 mode "123LL"
                                            [5.5.4][5.6]      [5.8]
Logical source line longer than 509 bytes               [5.7]
Macro "MACRO" is expanded to "defined"      [5.5.4]
Macro "MACRO" is expanded to "sizeof"       [5.5.4]
Macro "MACRO" is expanded to 0 token        [5.5.4]
Macro "macro" needs arguments                                 [5.8]
Macro started at line 123 swallowed directive-like line
                                            [5.5.5]
Macro with mixing of ## and # operators isn't portable
                                                        [5.7]
Macro with multiple ## operators isn't portable         [5.7]
Misplaced ":", previous operator is "+"
                                    [5.4.5]
Misplaced constant "12"             [5.4.5]
Missing ")"                         [5.4.5]
Missing "," or ")" in parameter list "(a,b"
                                    [5.4.7]
More than BLK_NEST nesting of #if (#ifdef) sections
                            [5.3.3]
More than 8 nesting of #if (#ifdef) sections            [5.7] [5.8]
More than INCLUDE_NEST nesting of #include
                            [5.3.3]
More than 8 nesting of #include                         [5.7]
More than 32 nesting of parens in #if expression        [5.7]
More than NEXP*2-1 constants stacked at "12"
                                    [5.4.5]
More than NEXP*3-1 operators and parens stacked at "+"
                                    [5.4.5]
More than 1024 macros defined                           [5.7]
More than NMACPARS parameters       [5.4.7]
More than 31 parameters                                 [5.7]
More than necessary N argument(s) in macro call "macro( a, b, c)"
                                    [5.4.9]
Multi-character or multi-byte character constant '' isn't portable
                                                        [5.7] [5.8]
Multi-character wide character constant L'ab' isn't portable
                                                        [5.7] [5.8]
Negative value "-1" is converted to positive "18446744073709551615"
                                            [5.5.4]           [5.8]
No argument                         [5.4.4] [5.5.3]
No header name                      [5.4.4]
No identifier                       [5.4.4]
No line number                      [5.4.4]
No space between macro name "MACRO" and repl-text
                                            [5.5.3]
No sub-directive                            [5.5.3]
No token after ##                   [5.4.7]
No token before ##                  [5.4.7]
Not a file name "name"              [5.4.4]
Not a formal parameter "id"         [5.4.7]
Not a header name "UNDEFINED_MACRO"
                                    [5.4.4]
Not a line number "name"            [5.4.4]
Not a valid preprocessing token "+12"
                                    [5.4.9]       [5.6]
Not a valid string literal          [5.4.9]
Not an identifier "123"             [5.4.4] [5.5.3]
Not an integer "1.23"               [5.4.5]
Not in a #if (#ifdef) section       [5.4.3]
Not in a #if (#ifdef) section in a source file
                                    [5.4.3] [5.5.3]
Operand of _Pragma() is not a string literal
                                    [5.4.12]
Operator ">" in incorrect context   [5.4.5]
Out of memory (required size is 0x123 bytes)
                            [5.3.2]
Parsed "//" as comment                            [5.6]
Preprocessing assertion failed      [5.4.10]
Quotation longer than 509 bytes "very_very_long_string"
                                                        [5.7]
Recursive macro definition of "macro" to "macro"
                                    [5.4.9]
Replacement text "sub(" of macro "head" involved subsequent text
                                            [5.5.5]           [5.8]
Rescanning macro "macro" more than RESCAN_LIMIT times at "something"
                                    [5.4.9]
Result of "op" is out of range      [5.4.6]                   [5.8]
Result of "op" is out of range of (unsigned) long
                                            [5.5.4][5.6]      [5.8]
Shift count "40" is larger than bit count of long
                                            [5.5.4][5.6]      [5.8]
sizeof is disallowed in C Standard                            [5.8]
sizeof: Illegal type combination with "type"
                                    [5.4.6]                   [5.8]
sizeof: No type specified           [5.4.5]
sizeof: Syntax error                [5.4.5]
sizeof: Unknown type "type"         [5.4.6]                   [5.8]
Skipped the #pragma line                          [5.6]
String literal longer than 509 bytes "very_very_long_string"
                                                        [5.7]
String literals "str1" "str2" are concatenated                [5.8]
This is not a preprocessed source
                            [5.3.4]
This preprocessed file is corrupted
                            [5.3.4]
Too long header name "long-file-name"
                            [5.3.3]
Too long identifier, truncated to "very_long_identifier"
                                            [5.5.1]
Too long line spliced by comments
                            [5.3.3]
Too long logical line       [5.3.3]
Too long number token "12345678901234"
                            [5.3.3]
Too long output line        [5.3.3]
Too long pp-number token "1234toolong"
                            [5.3.3]
Too long quotation "long-string"
                            [5.3.3]
Too long source line        [5.3.3]
Too long token              [5.3.3]
Too many include directories "dir"
                            [5.3.3]
Too many include files      [5.3.3]
UCN cannot specify the value "0000007f"
                                    [5.4.1]                   [5.8]
Undefined escape sequence '\x'              [5.5.4]           [5.8]
Undefined symbol "name", evaluated to 0                 [5.7] [5.8]
Unknown #directive "pseudo-directive"
                                    [5.4.4] [5.5.4]           [5.8]
Unknown argument "name"                     [5.5.3]
Unterminated character constant 't understand.
                                    [5.4.1]
Unterminated expression             [5.4.5]
Unterminated header name <header.h  [5.4.1]
Unterminated macro call "macro( a, (b,c)"
                                    [5.4.9]
Unterminated string literal
                                    [5.4.1]
Unterminated string literal, catenated to the next line
                                            [5.5.1]
Variable argument macro is defined                [5.6]
Wide character constant L'abc' is out of range
                                    [5.4.6]                   [5.8]
Wide character constant L'abc' is out of range of unsigned long
                                            [5.5.4][5.6]      [5.8]


                    6.  Reporting on Bugs and Others

6.1     MCPP's Bug?

I have developed the Validation Suite to verify conformance of
preprocessing to Standard C, and released it along with MCPP source.
The Validation Suite is intended to allow you to verify all the Standard
C preprocessing specifications.  Of course, I used the Validation Suite
to check MCPP.  What is more, I have compiled MCPP in many compiler
systems to verify its behavior.  Therefore, I am confident that MCPP is
now almost flawless, free of bugs and misinterpretation of
specifications.  However, I cannot deny the possibility that it still
contains some bugs.

If you find a strange behavior, do not hesitate to let me know.  If you
receive a diagnostic message saying "Bug: ...", it is undoubtedly a bug
of MCPP or a compiler system. (Probably, it's MCPP's.)  However illegal
a user program may be, if MCPP loses control, it is MCPP that is to
blame.


6.2     malloc() Related Bugs

I have written a library called kmmalloc that contains malloc() and
other memory handling routines. (For details, see 4.extra of mcpp-
porting.txt.)

If an MCPP built with this library exits with an error number from 2120
to 2124 (or 120 to 124), MCPP or a compiler system undoubtedly contains
a bug. (Possibly, the bug is in a library function.)

If you write the following directive near the end of a sample source for
testing:

    #pragma MCPP debug memory

the heap memory information is output both at the directive and at the
end of the preprocessing.  If a message saying "Heap error: ... " is
output, it undoubtedly indicates that MCPP or a compiler system contains
a bug.

If you find a bug, try to narrow down the problematic area by
sandwiching a portion of the sample source with #if 0 and #endif and
testing it.  Repeat this process until you spot the code with the bug.
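
The narrowing-down procedure can be pictured with a small sketch.  The
function and its three parts are hypothetical; in practice the excluded
region would be whichever directives or macro calls are under suspicion:

```c
int checksum(void)
{
    int sum = 0;

    sum += 1;      /* part A: still active                              */
#if 0              /* part B: excluded while narrowing down the problem */
    sum += 10;
#endif
    sum += 100;    /* part C: still active                              */
    return sum;
}
```

If the problem disappears while part B is excluded, part B is the next
region to subdivide; otherwise the #if 0 and #endif pair is moved to
another part.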


6.3     How to Report Bugs

When you report a bug, please be sure to provide the following
information:

    Compiler system MCPP is ported to
    A sample source (shorter is better) that allows reproduction of what
        looks like a bug
    Preprocessing results


6.4     Give Us Your Feedback

Other than bugs, I would appreciate it if you gave me feedback on MCPP
usage, diagnostic messages or this manual.

For your feedback or information, please post to "Open Discussion Forum"
at:

    http://mcpp.sourceforge.net/

or send via e-mail.

                                                                   [eof]
