C and C++

Source Code Analysis

using

CodeCheck

 

by

Loren Cobb, PhD.

 

CodeCheck is a product of Abraxas Software, Inc.

CodeCheck was designed & written by Loren Cobb.

 

 

 

 

For more information, contact:

Abraxas Software, Inc.

 

Phone:  503-232-0540

Fax:  206.309.0304

Email: support@abxsoft.com

www.abraxas-software.com

 


Table of Contents

 


Preface

Acknowledgments

Quick Start:

0.1    Installation

0.2    Command-Line Options

0.3    CodeCheck File Names

0.4    How to Use CodeCheck

Introduction to CodeCheck:

1.1    The Elements of Style

1.2    Why Programs Break

1.3    Why Programs Fail to Port

1.4    The Structure of CodeCheck

1.5    Debugging with CodeCheck

1.6    Predefined Macro Constants

Checking Types:

3.1    How to Analyze a Type Declaration

3.2    How to Determine the Type of an Identifier

3.3    How to Determine the Type of an Operand

3.4    How to Detect Implicit Type Conversions

Portable Style:

4.1    Lexical Issues in Portability

4.2    Preprocessor Considerations

4.3    Portability in Declarations

4.4    Portability at the Expression Level

4.5    Portability of Functions

4.6    C Compiler Limits

Maintainable Style:

5.1    Lexical Issues in Program Maintenance

5.2    Preprocessor Considerations

5.3    Maintainability in Declarations

5.4    Maintainability at the Project Level

Software Metrics:

6.1    Program Size

6.2    Logical Complexity

6.3    Code Density

CodeCheck Rule Sets:

7.1    Verifying POSIX.1 Compliance

7.2    Compliance with Coding Standards

7.3    Porting to ANSI C

7.4    Porting to Strict K&R Compilers

7.5    Measuring Code Complexity

7.6    Verifying the Order of Module Elements

7.7    C++ Rules

7.8    Advanced C++ Rules

Supporting Material:

8.1    Glossary

8.2    Bibliography

8.3    Index


Preface

Producing accurate, reliable and flexible programs in C or C++ is a difficult task. Even experienced programmers need tools to aid in the program development process, but all too few tools on today’s market can detect bugs in C and C++ source code and help the programmer avoid problems.

CodeCheck is a powerful tool for analyzing C and C++ source code. Unlike other tools, CodeCheck is itself fully programmable. It performs its primary task, analyzing and critiquing C and C++ source code, entirely under the direction of a user-written control program.

CodeCheck is not a new version of that old C programmer’s standby, lint, although it can perform some lint-like error detection. For example, CodeCheck compares all declarations and macro definitions across all modules of a project to detect inconsistencies. The main thrust of CodeCheck is to detect noncompliance with codified style standards, to detect maintenance or portability problems in code that already compiles perfectly on today’s compilers, and to compute customized quantitative indicators of code size, complexity, and density.

Users can specify standards and measures for a great many features of C code that affect portability, maintainability, and style. CodeCheck is designed to enhance dramatically the effectiveness and efficiency of project management in commercial and industrial programming efforts. A project leader can write a custom CodeCheck program specifying code standards and measures in the CodeCheck language, which is a restricted subset of C itself. CodeCheck can be programmed to:

a.   Monitor compliance with standards for programming style, rules for type-encoded prefixes for identifiers, proper use of macros and typedefs, prototypes, etc.

b.   Identify code that is not portable to or from any particular environment (machine, compiler, operating system, or interface standard).

c.   Quantify code maintainability with user-defined measures at all levels: line, statement, function, file, and project. Compute McCabe and Halstead complexity measures.

Sample CodeCheck programs are provided for a variety of problems, ranging from portability to complexity to compliance with style standards.


Acknowledgments

We gratefully acknowledge the invaluable help given to the CodeCheck project by the following individuals, who contributed suggestions and bug reports. We couldn't have done it without you!

 


Jan-Anders Åkerholm

Wendy Averdung

George Baker

Wahab Baldwin
Ed Batutis
Nasser Bazzi
John Benson
Bill Bentley
Dana Birkby
Dale Bremer
John Bradley

Mike Branson

Linda Brigham
Jeff Brown
Van Brollini

Thomas Brustbauer

Walt Buehring
Laura Burke
Bill Campbell
Pat Cappelare
Camille Carum

Rob Chambers
Alex Chervet

Tim Child

John Clinton
Patrick Conley

Darryl Cornish
Bill Costello
Kevin Coyle
Mike Curry

Mark De May

Matt Diamond
David Doerner
John Doggett
Bob Domitz

Tom Dropka
George Entwhistle

Richard Evans

Aaron Fager
Brent Fairbanks

Bud Feuless

Steve Fine

Julianne Fontenoy

Keith Fulton
Greg Germano
Shawn Garbett

Jerry Garcia
Bonnie Gilmore

Dennis Glenn

Soo Hye Goh

David Gordon

Bruce Graham

Elaine Granoff
Jeff Johnson
Brett Halle
Esko Hannula

Othar Hansson
Tris Harkless
Bill Hazzard

John Herbold

Alison Hine

Paul Hurley

Jim Jacobson

David Johnson
Mike Johnson
Darrell Jones

Arun Joshi

Ken Joyner

Ed Kirk

Ian Koenig

Tom Kohler

Hannu Kokko

Detlef Kowalewski
Ron Kuhn

Patrica Langer

Mark Lamer

Eric Lear

David Linsky

Alan Liu

Martin Lord

Tom Lucas
Frank Lusardi
Paul McGlashan

Bill McMahon

Terry McNulty

Eric Melbardis

Marcel Meyer
Deborah Miller
Steve Monett
Stephen Montgomery
Tom Moreaux

Peter Morse

Mike Muegel

Greg Munger

Rick Murnane

Hugh Njemanze
John Norby

Michael O’Leary
Ingmar Olsson

Lyle Parkyn
Bob Peterson
Steve Peterson

Greg Pilkington
Darrel Pinson
Th. Pfister

Karl Pingle

John Plocher

Alan Pope
Chris Prendergast

Steve Ray
Mike Reid

Steve Reynolds
Dan Richards

Robin Riley

Kay Roche
Jim Roskind

Stefan Roth
Florian Sachse
Richard Sargent

Jay Sarkar
Alan Sauls

Karl Schopmeyer

Rick Schuessler
Peter Schwaller
Roshin Sharma

Andrew Shebanow

Ursula Shelander
Hartmut Stein
Jeffrey Smith

Cass Smith
Tim Southgate

Brian Stromquist

Dan Sullivan

Malcolm Sutter

Padma Talasila
Larry Thiel

Julie Tiemann

Esther Tong

Gino van den Bergen

Thomas Wikshult

Clayton Wilkinson

David Williams

Roderick Williams
Tim Wint

Aaron Wohl
Matt Woodward
Heinz Wrosch

Cornelia Yoder

Robert Yu
Doug Zimmerman
Bruce Zimov


Chapter 0: Quick Start

 

 

 

0.1    Installation

 

1.    Copy the CodeCheck program into the directory in which you normally keep your programming tools.

2.    Create a new directory for the collection of CodeCheck rule files supplied on the distribution disk. Copy these rule files to this new directory.

3.    Assign the pathname of the rule directory that you created in Step 2 to an environment variable named CCRULES. CodeCheck uses this variable to locate rule files. The variable may hold multiple pathnames; separate them with the character normally used for this purpose on your operating system (Unix: colon, DOS & OS/2: semicolon, Macintosh: comma). On Unix systems only, if this variable is not defined, CodeCheck looks for rule files in a directory named /usr/CodeCheck.

4.    CodeCheck looks for header files in the paths listed in the INCLUDE environment variable. (On Macintosh systems it looks for CIncludes.) Make sure that this variable is correctly defined. This is important: CodeCheck cannot function without access to the same header files that your C or C++ compiler uses. This variable may also hold multiple pathnames if necessary.

5.    If you have header files (e.g. system headers) that you want CodeCheck to read but not check, assign the pathnames of the directories containing these headers to an environment variable named CCEXCLUDE. This step is optional: it is useful only when you wish to apply rules to some but not all of the header files that are included in your C or C++ source files.
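Steps 3 through 5 all rely on environment variables that hold a list of directories separated by a system-specific character. The C fragment below is a minimal sketch of how a tool might split such a list and apply the Unix-only /usr/CodeCheck fallback for CCRULES; it is an illustration, not code taken from CodeCheck. The names split_path_list, rule_dirs, PATH_LIST_SEP and MAX_DIR_LEN, and the 256-byte limit, are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Directory-list separator for the host system: Unix uses a colon and
   DOS/OS/2 a semicolon. (The classic Macintosh comma convention is not
   handled in this sketch.) */
#if defined(_WIN32) || defined(__MSDOS__) || defined(__OS2__)
#define PATH_LIST_SEP ';'
#else
#define PATH_LIST_SEP ':'
#endif

#define MAX_DIR_LEN 256   /* hypothetical limit on one directory name */

/* Split `list` at each `sep` into `out`, skipping empty entries and
   entries too long to store. Returns the number of directories stored
   (never more than `max`). */
int split_path_list(const char *list, char sep,
                    char out[][MAX_DIR_LEN], int max)
{
    int count = 0;

    assert(list != NULL && out != NULL && max >= 0);
    while (*list != '\0' && count < max) {
        const char *end = strchr(list, sep);
        size_t len = (end != NULL) ? (size_t)(end - list) : strlen(list);

        if (len > 0 && len < MAX_DIR_LEN) {
            memcpy(out[count], list, len);
            out[count][len] = '\0';
            count++;
        }
        if (end == NULL)
            break;
        list = end + 1;               /* continue after the separator */
    }
    return count;
}

/* Directory list used to find rule files: the value of CCRULES if it
   is set, otherwise /usr/CodeCheck on Unix and NULL elsewhere. */
const char *rule_dirs(void)
{
    const char *value = getenv("CCRULES");

    if (value != NULL)
        return value;
#if defined(__unix__) || defined(__unix)
    return "/usr/CodeCheck";
#else
    return NULL;
#endif
}
```

A caller would check rule_dirs() for NULL, split the result with PATH_LIST_SEP, and look for the wanted file in each directory in order; the same splitter works for INCLUDE and CCEXCLUDE. Skipping empty entries, such as those produced by a doubled separator, is a simplification chosen for this sketch.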

 

 

0.2    Command-Line Options

 

CodeCheck is invoked by means of a command line in either of these formats:

 

     check -options foo.c

     check foo.c -options

 

In these command lines, foo.c is the name of the C source file to be analyzed. Any number of source files may be specified, arbitrarily intermixed with options.

The rules to be used for this analysis can be specified in the options list, as described below. If no rule file is specified, CodeCheck looks for a precompiled rule file named default.cco, first in the current directory and then in the directories specified in the CCRULES environment variable. If this file is not found, CodeCheck performs a simple syntactic scan of the source file without any user-defined rules.

To analyze a multiple-file project with CodeCheck, either list all of the source filenames on the command line, or create a new file containing the names of all of the source files (excluding the names of header files and libraries). Give this project file the extension “.ccp”. Then invoke CodeCheck, specifying the project file instead of a source file:

   check -options myproject.ccp

CodeCheck will apply its rules to each source file named in myproject.ccp, and will apply project-level checking across all the files in the project. The .ccp extension tells CodeCheck that the specified file is a project file rather than a C source file; the extension may be omitted on the command line. Command-line options may also be specified in the project file, one per line. Every option placed in a project file applies to every source file in the project.

Command-line options override default actions or conventions, or indicate additional actions that you want CodeCheck to perform. CodeCheck command-line options are not case-sensitive. The available options are:
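Because a project file is just a list of lines, reading one is straightforward. The sketch below is a hypothetical reader, not CodeCheck's own: it assumes that any line whose first non-blank character is '-' holds an option, that every other non-blank line names a source file, and that blank lines are ignored. The struct ccp_summary type, the read_ccp function and the 512-character line limit are illustrative inventions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define CCP_MAX_LINE 512   /* hypothetical line-length limit */

/* Counts gathered while reading one project file. */
struct ccp_summary {
    int sources;   /* lines naming a source module */
    int options;   /* lines holding a command-line option */
};

/* Remove trailing whitespace, including the newline, in place. */
static void chomp(char *s)
{
    size_t n = strlen(s);

    while (n > 0 && isspace((unsigned char)s[n - 1]))
        s[--n] = '\0';
}

/* Read a .ccp project file from fp. Assumption: a line whose first
   non-blank character is '-' is an option; any other non-blank line
   names a source file. Each entry is handed to visit (if non-NULL)
   with a flag telling whether it is an option. */
struct ccp_summary read_ccp(FILE *fp,
                            void (*visit)(const char *entry, int is_option))
{
    struct ccp_summary sum = { 0, 0 };
    char line[CCP_MAX_LINE];

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        const char *p;

        chomp(line);
        for (p = line; isspace((unsigned char)*p); p++)
            ;                                   /* skip leading blanks */
        if (*p == '\0')
            continue;                           /* blank line */
        if (*p == '-')
            sum.options++;
        else
            sum.sources++;
        if (visit != NULL)
            visit(p, *p == '-');
    }
    return sum;
}
```

Passing a callback keeps the reader separate from what is done with each entry: one caller might record the options, another might open each source file in turn. In this sketch a line longer than the buffer would be split into pieces by fgets().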

–A       Reserved for CodeCheck expansion. Please do not use.

–B       Tell CodeCheck that braces are at the same nesting level as the material they enclose. If this option is not specified, CodeCheck assumes that the braces are at the previous nesting level. This option affects only the predefined variable lin_nest_level.

–C       Reserved for CodeCheck expansion. Please do not use.

–D       Define a macro. The name of the macro must follow immediately. An optional macro definition can be specified after an equal sign. The macro may not have any arguments. For example (quoted so that the shell does not interpret the parentheses and semicolons),

         check "-DFOREVER=for(;;)" foo.c

            has the same effect as starting each source code file with

       

         #define FOREVER for(;;)

            If no macro definition is given, then CodeCheck assigns the value 1 to the macro by default.

–E       Include tokens derived from macro expansion when performing counts, e.g. of operators and operands. The default (–E not specified) is for CodeCheck to ignore all macro-derived tokens when counting.

–F       Count tokens, lines, operators, or operands when reading header files. The default (–F not specified) is for CodeCheck not to count tokens, lines, operators, or operands in header files.

–G       Do not read any header file more than once. Caution: some header files are designed to be read multiple times, with conditional access to different sections of the header.

–H       List all header files in the listing file. The –L option is assumed if this option is found. If –L is found without –H, the listing file created by CodeCheck will not display the contents of header files.

–I         Specify a path to search when looking for header files. Use a separate –I for each path. The pathname must follow –I immediately, e.g.

        check -I/usr/myheaderpath src.c

            Header directory pathnames identified with the –I command-line option are searched before any directory paths listed in the INCLUDE environment variable. Unix versions of CodeCheck only: the default header directory path is /usr/include.

–J        Suppress all CodeCheck-generated error messages, e.g. syntax warnings. This option does not suppress warning messages generated by rules.

–Kn     Identify the dialect of C or C++ to be assumed for the source files. A number identifying the dialect should follow immediately. The dialects of C and C++ currently available are:

                      0      Strict K&R (1978) C

                      1      Strict ANSI standard C

                      2      K&R C with common extensions

                      3      ANSI C with common extensions (default)

                      4      AT&T C++  

                      5      Symantec C++ 

                      6      Borland C++ 

                      7      Microsoft C++ 

                      8      IBM Visual Age C++

                      9      Metrowerks CodeWarrior C/C++

                    10      DEC VAX C and HP/Apollo C

                    11      Metaware High C

            If this option is not specified, CodeCheck assumes that the source code is ANSI C with common extensions (–K3).

            If option –K is specified with no number following, CodeCheck assumes that the user meant strict K&R C (–K0).

–L       Make a listing file for the source file or project, with CodeCheck messages interspersed at appropriate points in the listing. The name of the listing file may follow immediately. If no name is given, the listing file will be check.lst. The listing file will be created in the current directory, unless a target directory is specified with the –Q option.

–M      List all macro expansions in the listing file. Each line containing a macro is listed twice: first as it appears in the source file, and second with all macros expanded. The –L option is assumed if –M is found. If –L is found without –M, the listing file created by CodeCheck will not show macro expansions.

–N       Allow nested /* ... */ comments.  

–NEST         Allow C++ nested class definitions.

–O       Append all CodeCheck stderr output to the file stderr.out. This is useful under the MS-DOS operating system, which does not permit the redirection of stderr output.

–P        Show progress of code checking. When this option is given, CodeCheck identifies each file in the project as it is opened, and each function definition as it is parsed.

–Q       Specify a target directory. The pathname of the directory into which all CodeCheck output files are to be placed must follow immediately, e.g.

         check -L -Q./temp mysource.c

            Examples of such output files are the listing and prototype files. If this option is omitted, CodeCheck writes its output files to the current working directory.

–R       Specify a rule file. The name of the rule file must follow immediately, e.g. if the rule file name is foobar.cc and the C or C++ source filename is mysource.c:

         check -Rfoobar mysource.c

            CodeCheck first looks for an object (i.e. compiled) rule file of this name (e.g. foobar.cco). If this file is out of date or not found, CodeCheck recompiles the rule file (foobar.cc) into an object file (foobar.cco) before applying these rules to the source file.

            More than one –R file may be specified: in this case all the rules will be compiled together into an object file named temp.cco.

            If no –R file is specified, CodeCheck first looks for an object file named default.cco. If this file is found, its rules are used. If it is not found, checking proceeds with no user-defined rules.

–Sn      Apply rules while reading header files. A digit should follow immediately, which identifies the kinds of header files:

                      0      No header files (default).

                      1      Headers enclosed in double quotes.

                      2      Headers enclosed in angle brackets.

                      3      All header files.

            For example, suppose that these two lines are in a source file:

 

           #include <ctype.h>        //  A standard system header

           #include "project.h"      //  An application header

            When option –S1 is in effect, CodeCheck will apply its rules to project.h but not to ctype.h. Please note that CodeCheck must always read every header included in a source file: this option only determines whether or not CodeCheck rules are applied to the contents of the various headers.

            CodeCheck’s default behavior is not to apply its rules to the contents of any included header files.

            The environment variable CCEXCLUDE, if it is used, takes precedence over this option. Rules are never applied to files that are found in directories listed in this variable.

–SQL  Enables embedded SQL code. Note: this option must be spelled in all uppercase. 

–T       Create a file of prototypes for all functions defined in a project. The name of the prototype file may follow immediately. If no name is given, the prototype file will be named myprotos.h. The prototype file will be created in the current directory, unless a target directory is specified with the –Q option.

–U       Undefine a macro constant. The name of the macro must follow immediately. Thus  check -UMSDOS foo.c  has the effect of treating foo.c as though it began with the preprocessor directive #undef MSDOS.

–V       Reserved for CodeCheck users. See Section 1.4 of the Reference Manual for usage suggestions.

–W      Reserved for CodeCheck users. See Section 1.4 of the Reference Manual for usage suggestions.

–X       Reserved for CodeCheck users. See Section 1.4 of the Reference Manual for usage suggestions.

–Y       Reserved for CodeCheck users. See Section 1.4 of the Reference Manual for usage suggestions.

–Z       Suppress cross-module checking. Macro definitions and variable and function declarations will not be checked for consistency across the modules of a project.
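The –D and –U entries above describe each option as equivalent to a #define or #undef line at the start of every source file, with 1 as the default value for –D. The sketch below makes that equivalence concrete by turning one option string into the directive it stands for. The function name option_to_directive is hypothetical, and the sketch recognizes only the ASCII '-' prefix followed by a single D or U in either case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Translate a -D or -U option into the preprocessor directive it stands
   for, writing the result into buf. Returns 0 on success, -1 if `opt` is
   not a -D/-U option, has no macro name, or does not fit in buf. */
int option_to_directive(const char *opt, char *buf, size_t size)
{
    const char *name, *eq;
    int n;

    assert(opt != NULL && buf != NULL);
    if (opt[0] != '-' || opt[1] == '\0')
        return -1;
    name = opt + 2;                       /* macro name follows immediately */
    if (*name == '\0' || *name == '=')
        return -1;

    switch (opt[1]) {
    case 'D': case 'd':                   /* option letters ignore case */
        eq = strchr(name, '=');
        if (eq == NULL)                   /* no definition given: value 1 */
            n = snprintf(buf, size, "#define %s 1", name);
        else
            n = snprintf(buf, size, "#define %.*s %s",
                         (int)(eq - name), name, eq + 1);
        break;
    case 'U': case 'u':
        n = snprintf(buf, size, "#undef %s", name);
        break;
    default:
        return -1;
    }
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

For example, "-DFOREVER=for(;;)" becomes "#define FOREVER for(;;)", "-DDEBUG" becomes "#define DEBUG 1", and "-UMSDOS" becomes "#undef MSDOS".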

Any letter of the alphabet may be used as a command-line option. Every option is remembered by CodeCheck and passed to the rule interpreter. CodeCheck rules can refer to and change these options by calling the functions option, set_option, str_option, and set_str_option (see Sections 1.3–1.5 of the Reference Manual for details). Option –X is recommended for users who wish to design custom rule files whose behavior is controlled by a command-line option.
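One simple way to picture "every option is remembered" is a table with one slot per letter. The sketch below is a hypothetical model only: it does not reproduce the option, set_option, str_option or set_str_option functions documented in the Reference Manual, nor CodeCheck's internal data. It records single-letter options case-insensitively together with any number that follows, then answers two questions from Section 0.2: which dialect –K selects (3 when absent, 0 for a bare –K), and whether the –S level, together with CCEXCLUDE, calls for rules to be applied to a given header. Multi-letter options such as –NEST and –SQL, and options followed by a name or path, are rejected rather than parsed.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical option table: one slot per letter A..Z. */
struct opt_table {
    int present[26];   /* nonzero once the letter has been seen */
    int value[26];     /* number following the letter, 0 if none */
};

/* Map a letter to 0..25 without assuming that 'A'..'Z' are contiguous
   (they are not in EBCDIC). Returns -1 for anything else. */
static int letter_index(int c)
{
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    if (c == '\0')
        return -1;
    p = strchr(letters, toupper((unsigned char)c));  /* case-insensitive */
    return (p != NULL) ? (int)(p - letters) : -1;
}

/* Record a single-letter option such as "-L", "-k" or "-K10".
   Returns -1 for anything else (e.g. "-NEST", "-Rfoobar"). */
int record_option(struct opt_table *t, const char *arg)
{
    const char *p;
    int idx, v = 0;

    assert(t != NULL && arg != NULL);
    if (arg[0] != '-' || (idx = letter_index((unsigned char)arg[1])) < 0)
        return -1;
    for (p = arg + 2; isdigit((unsigned char)*p); p++)
        v = v * 10 + (*p - '0');          /* digits are contiguous in C */
    if (*p != '\0')
        return -1;                        /* letters or a path follow */
    t->present[idx] = 1;
    t->value[idx] = v;
    return 0;
}

/* Dialect from -K: 3 when absent; a bare -K has value 0 (strict K&R). */
int dialect(const struct opt_table *t)
{
    int k = letter_index('K');

    return t->present[k] ? t->value[k] : 3;
}

/* Apply rules to a header? `angled` is nonzero for <...> headers and
   `excluded` for headers in a CCEXCLUDE directory, which always wins. */
int apply_rules_to_header(const struct opt_table *t, int angled, int excluded)
{
    int s = letter_index('S');
    int level = t->present[s] ? t->value[s] : 0;   /* default: -S0 */

    if (excluded)
        return 0;
    switch (level) {
    case 1:  return angled == 0;   /* only "..." headers */
    case 2:  return angled != 0;   /* only <...> headers */
    case 3:  return 1;             /* all headers */
    default: return 0;             /* no headers */
    }
}
```

For example, after recording "-K1" and "-S2", dialect() returns 1 and apply_rules_to_header() returns nonzero only for angle-bracket headers outside CCEXCLUDE. Looking letters up in a string, rather than computing c - 'A', keeps the sketch correct on character sets where the letters are not contiguous.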


0.3    CodeCheck File Names 

The conventions used by CodeCheck for filename extensions are:

.cc        A CodeCheck rule file, containing a set of rules for compilation by CodeCheck. These rules are written in a subset of the C language. CodeCheck requires that this extension be used for rule filenames, though it may be omitted in the –R command-line option.

.cch     A CodeCheck header file, for inclusion in a CodeCheck rule file.

.cco     A CodeCheck object file, produced by the CodeCheck compiler. This file contains a compilation of the rules found in the rule file with the same prefix.

.ccp     A project file for CodeCheck. This file contains a simple list of the filenames of all of the source modules that make up a project, one filename per line. Header files and libraries should not be listed in this file. (Command-line options may also appear in this file, one per line, as described in Section 0.2.)
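The pairing of a .cc rule file with its .cco object file, together with the –R behavior of recompiling a missing or out-of-date object file, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and treating "out of date" as "the .cc file was modified after the .cco file" is the usual convention rather than a documented CodeCheck rule. The modification-time check uses the POSIX stat() call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Derive rule-file and object-file names from a -R argument, which may
   omit the ".cc" extension: "foobar" and "foobar.cc" both yield
   "foobar.cc" and "foobar.cco". Returns 0 on success, -1 if a name
   does not fit in `size` bytes. */
int rule_file_names(const char *arg, char *src, char *obj, size_t size)
{
    size_t len;
    int n1, n2;

    assert(arg != NULL && src != NULL && obj != NULL);
    len = strlen(arg);
    if (len >= 3 && strcmp(arg + len - 3, ".cc") == 0)
        len -= 3;                                  /* strip ".cc" */
    n1 = snprintf(src, size, "%.*s.cc", (int)len, arg);
    n2 = snprintf(obj, size, "%.*s.cco", (int)len, arg);
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= size || (size_t)n2 >= size)
        return -1;
    return 0;
}

/* Assumed rule: recompile when the object file is missing or older
   than the rule file. Uses the POSIX stat() call. */
int needs_recompile(const char *src, const char *obj)
{
    struct stat s, o;

    if (stat(obj, &o) != 0)
        return 1;                   /* no object file yet */
    if (stat(src, &s) != 0)
        return 0;                   /* no source to rebuild from */
    return s.st_mtime > o.st_mtime; /* source edited since last build */
}
```

Comparing modification times is the same approach make uses; it can be fooled by clock skew on networked file systems, which is one reason a real tool might choose a different test.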

 

Depending on command line options, the following files may be created by CodeCheck:

check.lst        The default filename for the listing file (–L option).

myprotos.h      The default filename for the prototype file (–T option).

stderr.out      The filename for stderr output (–O option).

temp.cco           The object file created by CodeCheck when more than one rule file is specified (–R option).


0.4    How to Use CodeCheck

 

 

0.4.1  A Single User with Prepackaged Rules

Let us suppose that you simply want to check your C source file foo.c for some of the common errors that are not usually detected by C compilers. You want to see the warning messages in context, in a listing file. The command is

   check -Rerror -L foo.c

When CodeCheck completes execution, open the listing file check.lst with your editor. Each warning will be shown under the line that caused the warning, with a marker immediately under the token that was being scanned when the error was detected.

This command made use of the prepackaged rule file error.cc, supplied by Abraxas. Some other prepackaged rule files that you may find helpful are:

Tutorial and example rule files:

 

•     dcl.cc             Example rules that use the declarator variables.

•     cplist.cc           Lists and describes all classes in each module.

•     cplus.cc           Example rules for C++ style checking.

•     declare.cc        Interpret global declarations in ordinary “English”.

•     fcncalls.cc       Generate a list of functions called by each function.

•     forward.cc       Example rules that illustrate forward chaining.

•     lex.cc               Example rules that use the lexical variables.

•     nesting.cc        Example rules for measuring iteration nesting.

•     oometric.cc     Computes several object-oriented metrics.

•     order.cc           Check for standard ordering of file elements.

•     pp.cc               Example rules that use the preprocessor variables.

•     prefix.cc          Example rules for checking declarator prefixes.

•     sample.cc        Example rules for compliance with standards.

•     wrapper.cc      Detect headers and #includes that are not “wrapped”.

Production rule files:

•     ansi.cc            Check for compatibility with the ANSI C standard.

•     BSD43.cc        Check for use of BSD 4.3 features that are not POSIX.

•     braces.cc         Check for consistent use of braces in high-level statements.

•     complex.cc      Measures of program complexity (McCabe, etc.).

•     error.cc           Check for errors that compilers may not find.

•     fromHP.cc       Check for portability from HP/Apollo C.

•     fromVAX.cc    Check for portability from VAX C.

•     general.cc        Check for general portability.

•     indent.cc          Check for proper indentation.

•     Halstead.cc     Measures of program size developed by Halstead.

•     logical.cc        Check for if-conditions that are too complex.

•     maintain.cc      Check for general maintainability.

•     posix.cc           Check for violations of the POSIX namespaces.

•     size.cc            Measures of program size based on lines & statements.

•     style.cc            Check for compliance with Comeau's C style standards.

•     SVID.cc           Check for use of SVID features that are not POSIX.

•     toIntel.cc         Check for portability to the Intel iC-386 V4.2 compiler.

•     toKR.cc           Check for portability to the 1978 K&R C standard.

•     toMPW.cc       Check for portability to Macintosh MPW C version 3.2.

•     toSuncpp.cc     Check for portability to Sun C++ version 2.1.

•     toVAX.cc         Check for portability to VAX C.

 

0.4.2  Multiple Users with Custom Rules

Large corporations with many programmers often have staff assigned to maintaining the tools used by these programmers, including tools like CodeCheck that are used for quality assurance. It is often appropriate to give a single individual the responsibility to write and maintain a CodeCheck rule file that encodes the corporate standards for C style. This compiled rule file would be placed in a network directory with the name default.cco. Each programmer can then check his or her code with a command like:

   check foo.c

Assuming that each programmer has defined an environment variable CCRULES that points to the directory containing default.cco, this command will cause CodeCheck to apply the corporate rules to his or her source code.

Note: Please contact Abraxas for site license information.


Introduction to CodeCheck

 

 

1.1    The Elements of Style

 

Every programmer has a distinctive style of writing. This style is an expression of many things: the programmer’s sense of æsthetics, the demands of speed and efficiency, the requirements of the customer, the needs of maintenance programmers, and the possibility that the program will need to be ported to another computer or translated into another language. Many of these elements of style require careful value judgments by the programmer or project leader. Once the stylistic requirements are clearly defined, CodeCheck can be an invaluable tool for monitoring each of these elements of program style.

The principal elements of good programming style are the requirements of æsthetics, maintenance, and portability. Fortunately, there are significant overlaps between these elements, as illustrated in Figure 1. Can C programmers achieve a style that is at once portable, easily maintainable, and elegant? The answer is an emphatic “yes”, and CodeCheck can help any C or C++ programmer to develop his or her personal style towards this goal.

Figure 1:  Overlap among the elements of good programming style.

 

There are several general principles that govern the overlap between these elements:

1.   Code that is technically “portable” but not easily maintained is not truly portable. A nontrivial C program that is universally portable is a very rare animal indeed. There are so many dark and dusty corners in the syntax and semantics of C that universal portability is next to impossible. Therefore, programmers must make the purpose of their portable code evident, and this is the essence of maintainability.

2.   Code that is elegant without being maintainable will have a short life. What good is elegant code if even its author cannot understand it one year later? The C grammar lends itself to code that is highly abstract and superficially elegant, especially in the area of character processing, but maintenance programmers may consider this style to be anything but æsthetically pleasing. What is wrong here is the equation of elegance with æsthetics: the two concepts are not identical.

3.   Simplicity lies at the heart of portability, maintainability, and æsthetics. For reasons that are difficult to understand, many C programmers never learn this essential truth. Quite possibly the fault lies not in the language itself, but in the culture that has grown up around it. This culture seems to value complexity, density and abstractness over all other considerations, perhaps for the sheer fun of creating puzzles that others find impossible to solve. Whatever the reason, these values militate against clean, simple and understandable code.

 


1.2    Why Programs Break

 

Despite the C language’s reputation for portability, it is an unfortunate fact of life that an apparently flawless C program will usually fail when compiled with a different compiler, or under a different operating system, or on another type of computer. The reasons are many, but three principal forces underlie almost all such problems.

 

1.2.1  Force #1: Language Parochialism

First, most programmers become thoroughly versed in only one implementation of C, on only one computer, under only one operating system. As they learn more about this programming environment, they unwittingly begin to use its many nonstandard features, and to take advantage of each of its quirks and foibles. These nonstandard features are typically concentrated in the lowest levels of operation: the preprocessor and the lexical analyzer of the compiler, the file management routines of the operating system, and the bit-manipulation instructions of the machine. Inexperienced programmers do not realize the extent to which computer languages depend on low-level conventions, and are wholly unaware of the implications of using nonstandard features. Unfortunately, these low-level nonstandard features tend to spawn an unending stream of the most amazingly mysterious bugs, often months or years after a program is first written.
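A classic example of such a low-level convention is the signedness of plain char, which the C standard leaves to each implementation. The sketch below (function names hypothetical, 8-bit chars assumed) counts 0xFF bytes in a buffer in two ways. The first version works only where plain char is unsigned; the second converts each byte to unsigned char and behaves the same everywhere.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Whether plain char is signed is implementation-defined. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Non-portable: where plain char is signed and 8 bits wide, a byte
   holding 0xFF is read as -1 and never equals 0xFF, so this count
   silently drops to zero on such compilers. */
size_t count_ff_nonportable(const char *buf, size_t n)
{
    size_t i, count = 0;

    assert(buf != NULL);
    for (i = 0; i < n; i++)
        if (buf[i] == 0xFF)
            count++;
    return count;
}

/* Portable: view each byte as unsigned char, whose range is
   0..UCHAR_MAX on every conforming implementation. */
size_t count_ff_portable(const char *buf, size_t n)
{
    size_t i, count = 0;

    assert(buf != NULL);
    for (i = 0; i < n; i++)
        if ((unsigned char)buf[i] == 0xFF)
            count++;
    return count;
}
```

Code like the first version can pass every test on its author's machine and then fail silently after a port, which makes it the kind of construct a portability check can be written to look for.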

 

1.2.2  Force #2: Programmer Machismo

The second force at work to make programs break is pro­grammer mach­ismo. Many programmers are young, smart, brash, fearless, and anxious to prove themselves to be wickedly clever. These are the program­mers who write macros like

#define put(x,p) (--(p)->cnt>=0?(*(p)->ptr++=(x)):flush(x,p))

and think that by getting all this power on one single, completely undocumented line they have achieved something special. What they have actually created is a maintenance nightmare for someone else. (This delightful example is due to Andrew Koenig, who dissects a slightly different version on page 80 of C Traps and Pitfalls.)
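For comparison, the sketch below spells out what the put macro does, one step per line, as an ordinary C function. The structure outbuf, its buf and sink fields, BUF_SIZE, and the body of flush() are hypothetical stand-ins invented so the example runs on its own; they are not the stdio internals that Koenig discusses.

```c
#include <string.h>

#define BUF_SIZE 4

/* Hypothetical buffered-output structure: only cnt and ptr come from the
   macro; buf, sink and sink_len are invented so the sketch can run. */
struct outbuf {
    int    cnt;             /* free slots left in buf */
    char  *ptr;             /* next free slot in buf */
    char   buf[BUF_SIZE];
    char   sink[64];        /* stands in for the real output device */
    size_t sink_len;        /* no bounds check on sink: sketch only */
};

void outbuf_init(struct outbuf *p)
{
    p->cnt = BUF_SIZE;
    p->ptr = p->buf;
    p->sink_len = 0;
}

/* Hypothetical flush(): drain the full buffer into the sink, then store x
   as the first character of the emptied buffer. */
int flush(int x, struct outbuf *p)
{
    size_t used = (size_t)(p->ptr - p->buf);

    memcpy(p->sink + p->sink_len, p->buf, used);
    p->sink_len += used;
    p->ptr = p->buf;
    *p->ptr++ = (char)x;
    p->cnt = BUF_SIZE - 1;  /* one slot is now occupied by x */
    return x;
}

/* The put macro, unrolled into one step per line. */
int put(int x, struct outbuf *p)
{
    p->cnt = p->cnt - 1;    /* claim a slot: the macro's --(p)->cnt */
    if (p->cnt >= 0) {      /* room left: store x and advance ptr */
        *p->ptr = (char)x;
        p->ptr = p->ptr + 1;
        return x;
    }
    return flush(x, p);     /* buffer full: let flush() make room */
}
```

The function is longer, but each step can be read, debugged, and commented on its own. It also evaluates its arguments exactly once, whereas the macro evaluates p twice on either path, so a call such as put(c, q++) would advance q twice.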

 

1.2.3  Force #3: Compiler Drift

The third program-destroying force operates not on application program­mers, but on compiler writers. This force is the almost ir­resist­ible temptation to in­clude a new feature or language ex­ten­sion (non­stan­dard, naturally) that will ease the life of pro­gram­mers and sell lots more compilers. The tempta­tion to in­clude these fea­tures is certainly not all bad, as it does gen­erate an endless stream of new ideas for compilers, but it feeds directly into the other two forces. The re­sult, if un­controlled by tough pro­ject management and pro­grammer self-dis­ci­pline, is code that is neither maintainable nor ­port­able.

 

1.2.4  CodeCheck can help!

For the C programmer, defending against every programming practice that can threaten portability or maintainability requires both an encyclopædic knowledge of C compilers and an almost superhuman level of self-discipline and attention to detail. For the project leader, enforcing uniform standards for code structure and style is a severe test of the ability to read and critique great volumes of dense code. CodeCheck is designed to automate these tasks:

• The C programmer can use CodeCheck to review his or her code at the end of the day, and to identify ques­tionable con­struc­tions that might have crept in. This daily Code­Check pro­gram can implement many lint-like code checking opera­tions, as well as checking for adherence to pro­ject style specifica­tions.

• The project leader can use a different CodeCheck program on a weekly ba­sis to verify the programmers’ adherence to the project style specifica­tions, to quantify the amount of code pro­duced, and to measure critical qualities of the code, e.g. den­sity and complexity.

• Software contractors are frequently required to certify that their code con­forms to published governmental and industrial standards for code com­plexity, among them the McCabe and Hal­stead measures. A CodeCheck program can be run at the con­clu­sion of a project to docu­ment these partic­ular measures, and many others too.


1.3    Why Programs Fail to Port

 

1.3.1  The Many Standards for the C Language

There seem to be no fewer than four “standards” for the C language, plus two more for C++, all of which are covered by CodeCheck. Figure 2 depicts the family tree for C standards, with the earliest version on top:

Figure 2:  The Evolution of C Standards.

 

Each descendant of the original C has added significant extensions to the original language, while trying to remain true to the spirit of C.

◊    The K&R standard, as described in the first edition of Kernighan & Ritchie (1978). This is certainly the single most influential book in the history of C. The language was only loosely defined in this “standard,” however, and it lacks many of the features that are commonplace now (e.g. enumerated constants, prototypes, the void type). Although the K&R dialect is obsolete, many K&R compilers are still in daily use around the world.

◊    The H&S standard, as described in the first edition of Harbison & Steele (1984). This was the first careful description of the K&R stan­dard, with many mod­ern ex­ten­sions included (e.g. the enum and void types). The H&S stan­dard represents a transi­tional phase be­tween K&R and ANSI. Most pre-ANSI compilers in use today are best described as adhering to the H&S standard.

◊    The ANSI C standard, as defined by the American National Stan­dards In­stitute and certified internationally as ISO/IEC 9899. This version represented a significant ad­vance in precision over H&S. It also introduced several significant innovations (e.g. the prepro­ces­sor paste operator).

◊    The POSIX standard, as defined by the IEEE (IEEE Std 1003.1) and certified internationally as ISO/IEC 9945. Part 1 of this standard includes and extends the ANSI C standard, and details the interface and behavior of a standard library of operating system services.

◊    The C++ 2.0 standard, as defined in “The Annotated C++ Refer­ence Manual,” by Ellis and Stroustrup (1990). This book is the base doc­u­ment for an ANSI committee that is now de­veloping an official standard for C++.

◊    The C++ 3.0 standard, as defined in “The C++ Programming Language, 3rd Edition” by Bjarne Stroustrup (1997). This book is the base document of the pending ANSI C++ standard.

 

 

1.3.2  Two kinds of incompatibility

It is useful to break down a portation problem into two separate sources of in­compati­bility:

1.   The source environment will invariably have a vari­ety of idiosyn­crasies which are com­mon to no other, and which differ from the ostens­ible standard on which the envi­ronment is based. These differences are source portation problems.

2.   The target environment will differ somewhat from the stan­dard on which it was based. These differences are target portation problems.

The prepackaged rule files supplied by Abraxas with CodeCheck include several that address source and target portation problems. The source portation rule files have names that begin with “from”, as in “fromVax.cc”, which detects special keywords and other peculiarities that are found only on Vax C compilers. The target portation rule files have names that begin with “to”, as in “toKR.cc”, which tests for non-K&R syntax and keywords.


1.4    The Structure of CodeCheck

 

A CodeCheck program looks just like a very simple C program. Indeed, CodeCheck programs are written using a small subset of the C grammar, so anyone who can read C can also read CodeCheck. A CodeCheck program is, in fact, just a collection of if-statements (called “rules”) and variable declarations. The CodeCheck interpreter translates this collection of rules into pseudocode, which is used during the analysis of a C source file to control the code checking operation.

Figure 5: Actions of the two components of CodeCheck.

To analyze a C source file, the user has only to specify the name of the C source file and the name of the CodeCheck program. The CodeCheck program is compiled (if necessary), and then the C source file is analyzed in accordance with the CodeCheck rules. As depicted in Figure 5, CodeCheck has two logically separate components: the Code Analyzer and the Rule Compiler.

 

A brief bibliographic note

For those who are interested in referring to original sources, this manual makes many references to the C literature. These are given in a compressed for­mat, as illus­trated below. Details (title, etc.) are given in the bibliography.

HS84:182              means              Harbison & Steele, 1984, page 182.

RJ88:52                 means              R. Jaeschke, 1988, page 52.

AK89:99               means              A. Koenig, 1989, page 99.


1.5    Debugging with CodeCheck

 

CodeCheck is, in addition to all of its other functions, a sensitive bug de­tector, ca­pable of identifying subtle bugs that many compilers miss. A pro­gram that com­piles without error may still fail to pass CodeCheck’s rigor­ous cross-module syntactic and semantic analyses. There are two common rea­sons for this: (a) your program de­viates from strict C in ways that your com­piler permits, or (b) your program actually has a fault whose presence has gone unnoticed. The former case is a mild problem — it only implies a lack of portability — but the latter case may be quite serious.

To use CodeCheck as a bug-detector, use the rule file error.cc for an effi­cient “once-over-lightly” check of your project or any one of its source files. Even with no rule file at all, CodeCheck still performs a tremendous variety of unusual syntac­tic and semantic checks on its input. Given an entire pro­ject, for exam­ple, Code­Check will compare ex­ternal declarations and macro defini­tions across files and will advise if any discrepan­cies are found — re­gardless of how many or how few rules are included in the rule file. Please note that Code­Check does not per­form many of the standard semantic checks that all C and C++ compilers do. CodeCheck is designed to provide error checking that complements the checking performed by your compiler.


1.6    Predefined Macro Constants

 

CodeCheck has a predefined macro constant, CODECHECK, which is designed to permit conditional checking of C code. This macro constant has the value 100*(CodeCheck version number). Thus in CodeCheck version 8.02 this con­stant will have the value 802.     

The CODECHECK macro can be used to hide code from Code­Check, so that it will not be checked. This is extremely useful, for example, when in-line as­sem­­bler code is intermixed with C code. Here is an example:

 

#ifndef CODECHECK
     /* ... code to be hidden from CodeCheck, e.g. in-line assembler ... */
#endif

The lint macro can be used in exactly the same way. CodeCheck pre­de­fines this macro with the value 2.

CodeCheck also predefines the macro constant BETA. For a version numbered x.xxBy, BETA has the value y; for a version numbered x.xx, BETA is 0. Combined with CODECHECK, BETA lets you distinguish between minor releases.
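As a minimal sketch, the function below uses CODECHECK and BETA together to tell which tool is processing the file. The release tested for (8.02B3, i.e. CODECHECK == 802 and BETA == 3) is only an example, and analysis_context() is a hypothetical name. When the file is built by an ordinary compiler, neither macro is defined and the function returns "compiler".

```c
/* Classify the tool that is processing this translation unit, using the
   CODECHECK and BETA macros. 802/3 is just an example release. */
const char *analysis_context(void)
{
#if defined(CODECHECK) && defined(BETA) && CODECHECK == 802 && BETA == 3
    return "CodeCheck 8.02B3";
#elif defined(CODECHECK) && CODECHECK == 802
    return "CodeCheck 8.02, other beta or final release";
#elif defined(CODECHECK)
    return "CodeCheck, other version";
#else
    return "compiler";          /* neither macro defined */
#endif
}
```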

Depending on the specific environment for which it is implemented, CodeCheck will predefine certain additional macro constants. These constants (as of CodeCheck version 8.02) are listed in the following table. See the update notes that accompany CodeCheck for the latest additions and changes. The table of macros is organized by operating system and compiler. Any of these may be changed by the user on the command line (using the -D or -U options) or from within rules (using the CodeCheck define and undefine functions).


Constant                  Value         Comment

CODECHECK                 802           Major Rev 802
BETA                      3             Minor Rev 3
lint                      2
__STDC__                  1             Option -k2 only.
__STDC__                  0             Except option -k2.
__cplusplus               1             C++ only (-k4 through -k9).
cplusplus                 1             C++ only (-k4 through -k9).
__FILE__                  <file name>
__LINE__                  <line number>
__DATE__                  <date>
__TIME__                  <time>

Unix Operating System

unix                      1
__unix                    1

DOS Operating System

MSDOS                     1
M_I386                    1
M_I86                     1
M_I86LM                   1
__386__                   1
__I386__                  1
__MSDOS__                 1
__LARGE__                 1             Except option -k7 or -k11
__BORLANDC__              0x0500        Except option -k7 or -k11
__TURBOC__                0x0500        Except option -k7 or -k11
_WIN32                    1

OS/2 Operating System

__OS2__                   1
__FLAT__                  1
__IBMC__                  200           Except option -k6
__IBMCPP__                200           Option -k4 only
__32BIT__                 1             Except option -k6
_M_I386                   1             Except option -k6

NT Operating System

i386                      1
MSDOS                     1
_M_IX86                   300
_MSDOS                    1
_X86_                     1
_WIN32                    1

VMS Operating System

vax                       1
vms                       1
vaxc                      1
vax11c                    1
VAX                       1
VMS                       1
VAXC                      1
CC$gfloat                 1
CC$parallel               1

Macintosh Operating System

applec                    1
MC68000                   1
mc68000                   1
m68k                      1
macintosh                 1

Borland C++

__BCPLUSPLUS__            0x0340
__TCPLUSPLUS__            0x0340
__CDECL__                 1
_Windows                  1
__TEMPLATES__             1
wchar_t                   short

Microsoft C++

__single_inheritance                    Expands to nothing.
__multiple_inheritance                  Expands to nothing.
__virtual_inheritance                   Expands to nothing.
_M_I86                    1             Except Windows NT.
_M_I86LM                  1             Except Windows NT.
_M_IX86                   300
_MSC_VER                  1200
_MSDOS                    1
_X86_                     300
i386                      1
MSDOS                     1
_WIN32                    1

Metaware High C

__HIGHC__                 1

Symantec C++

__SC__                    700

IBM Visual Age C++

__IBMCPP__                350

Metrowerks CodeWarrior

__MWERKS__                1

 

Debugging your source code with the CodeCheck preprocessor can be greatly enhanced by using the "-D?" switch, which displays the current state of CodeCheck's internal preprocessor symbol table. If a particular intrinsic definition is undesirable, the "-U" switch can be used to undefine the macro.

The CodeCheck product does not include the C/C++ system header files (stdio.h, iostream.h, …). These must be obtained from your compiler vendor: for example, if the source code to be analyzed explicitly references stdio.h (#include <stdio.h>), then that header file must be available for CodeCheck to analyze. In short, every file that the project being analyzed depends on must be present for the analysis to succeed.


Chapter 3: Checking Types

 

This chapter describes how to use CodeCheck rules to analyze type information. These techniques are necessary for those who wish to write their own CodeCheck rules which detect conditions that depend on type information.

The ability to detect and analyze type information in declarations has been a part of CodeCheck since version 4.03. However, the ability to detect and analyze type information within executable code is relatively new to CodeCheck, having been introduced for C in version 5.04, and for C++ in version 5.05. Please refer to the CodeCheck Reference Manual for the exact definitions of the CodeCheck variables and functions mentioned in this chapter, and to the Abraxas Technical Note series (www.abraxas-software.com/TechNotes.html) for recent changes and enhancements.

There are five broad categories of type-related rules that CodeCheck can en­force. Rules can be written that detect:

1.         a declaration of any specified type,

2.         a cast to or from any specified type,

3.         an implicit type conversion to or from any specified type,

4.         use of a variable or function of any specified type,

5.         use of an operand of any specified type for any specified opera­tor.

In addition, CodeCheck auto­mati­cally checks function argument types for com­patibility with the prototype for the function called, if one is in scope, and also checks the function return value for compatibility with the declared function return type.

 

3.1    How to Analyze a Type Declaration

Types in C and C++ are either simple or complex. A simple type consists of an unmodified base type, e.g. int or float, with possible qualifiers such as const, volatile, near, far, huge, export, etc. A complex type has a simple type as its base, and in addition has one or more additional levels, e.g. pointer to…, array of…, function returning…, or reference to… (the latter is allowed only in C++ and a few nonstandard C dialects). Each of these levels may also have qualifiers (e.g. const, pascal, interrupt, etc.).

When an identifier is declared, CodeCheck sets the variable dcl_levels to the number of levels in the type. Thus for simple variables dcl_levels will be zero. CodeCheck sets the variable dcl_base to an integer that identifies the base type of the identifier. The possible values of dcl_base are defined as manifest constants in the CodeCheck header file check.cch, which should be included in every rule file that makes use of type-checking services. Here are the first five base types from check.cch:

 

     #define VOID_TYPE            1
     #define CHAR_TYPE            2
     #define SHORT_TYPE           3
     #define INT_TYPE             4
     #define LONG_TYPE            5

As an example of the use of these CodeCheck variables to detect a specified type, here is a rule that will issue a message whenever a simple variable of type char is declared:

 

     if ( dcl_base == CHAR_TYPE )
        if ( dcl_levels == 0 )
           warn( 1234, "Variable %s is a char.", dcl_name() );

When the type in a declaration is complex, the function dcl_level() returns an integer that identifies the kind of each level. Here is an example rule that prints out the type of every global or local identifier that is declared:

 

     int  i, kind;

     if ( dcl_global || dcl_local )
        {
        printf( "Variable %s:  ", dcl_name() );
        i = 0;
        while ( i < dcl_levels )
           {
           kind = dcl_level( i++ );
           switch( kind )
              {
           case ARRAY:
              printf( "array of " );
              break;
           case POINTER:
              printf( "pointer to " );
              break;
           case REFERENCE:
              printf( "reference to " );
              break;
           case FUNCTION:
              printf( "function() returning " );
              break;
              }
           }
        printf( "%s\n", dcl_base_name() );
        }

The manifest constants ARRAY, POINTER, REFERENCE, and FUNCTION are defined in check.cch. In addition to dcl_level(), this rule also uses the functions dcl_name() and dcl_base_name(). These functions return the declarator name and the name of the base type of the declaration, respectively. The variables in the trigger for this rule are dcl_global and dcl_local, which CodeCheck sets to 1 when a global or local identifier is declared, respectively. (In this context “global” means file scope and external linkage, while “local” means function or block scope.)
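The same level-walking logic can be sketched in plain C. Here struct type_desc and the LV_ constants are hypothetical stand-ins for CodeCheck's internal type representation (the real ARRAY, POINTER, REFERENCE, and FUNCTION values come from check.cch), and level 0 is assumed to be the outermost level, which is the order in which the rule above prints them. For the declaration char *argv[], the sketch produces "array of pointer to char".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical level kinds, not the check.cch constants. */
enum level_kind { LV_ARRAY, LV_POINTER, LV_REFERENCE, LV_FUNCTION };

/* Hypothetical declared type: levels[0] is the outermost level. */
struct type_desc {
    int             nlevels;
    enum level_kind levels[8];
    const char     *base_name;
};

/* Build the phrase the printing rule emits, e.g. "array of pointer to char".
   out must hold at least size bytes, with size >= 1. */
void describe(const struct type_desc *t, char *out, size_t size)
{
    int i;

    out[0] = '\0';
    for (i = 0; i < t->nlevels; i++) {
        const char *word = "";
        switch (t->levels[i]) {
        case LV_ARRAY:     word = "array of ";             break;
        case LV_POINTER:   word = "pointer to ";           break;
        case LV_REFERENCE: word = "reference to ";         break;
        case LV_FUNCTION:  word = "function() returning "; break;
        }
        strncat(out, word, size - strlen(out) - 1);   /* never overflow out */
    }
    strncat(out, t->base_name, size - strlen(out) - 1);
}
```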

To obtain the type qualifiers for each level of a type, including the base type level, use the function dcl_level_flags(). This function takes as its argument the level, just like dcl_level(), and returns an integer that has a bit set for each qualifier present. The header file check.cch contains manifest constants that can be used as masks to obtain each of these qualifier flags. For example, the fol­lowing rule can be used to detect declarations in which the first level has the const qualifier:

 

     if ( dcl_level_flags(0) & CONST_FLAG )
        warn( 1234, "%s has been declared constant.", dcl_name() );

There are many other declarator variables and functions that are useful for analyzing declared types — refer to the CodeCheck Reference Manual for details.
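Because that const-detection rule tests one specific level, it matters where const is written in a declaration. In the C sketch below, the const in p qualifies the base type (p points to constant characters), while the const in q qualifies the pointer level itself. Assuming level 0 is the outermost level, as the printing rule above suggests, the rule would be expected to report q but not p; check the CodeCheck Reference Manual for the exact level numbering. The function name is hypothetical.

```c
/* Shows the two placements of const in a pointer declaration. */
int const_placement_demo(void)
{
    static const char msg[] = "hi";
    char buf[] = "ok";
    const char *p = msg;    /* pointer to const char: const on the base type */
    char *const q = buf;    /* const pointer to char: const on the pointer level */

    p++;                    /* allowed: p itself is not const */
    /* *p = 'H';               would not compile: *p is const */
    *q = 'O';               /* allowed: the characters q points to are not const */
    /* q++;                    would not compile: q is const */

    return p[0] == 'i' && buf[0] == 'O';
}
```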

 

 

3.2    How to Determine the Type of an Identifier

The CodeCheck variables and functions for determining the type of an identifier that is used within executable code are almost identical to those for declarations, except that they carry the prefix idn_ instead of dcl_.

The function idn_filename() and the variable idn_line can be used to determine the location (i.e. file name and line number) of the declaration that is currently in scope for the identifier as it is used in the executable code.

The variables idn_global, idn_local, idn_member, and idn_parameter also resemble their dcl_ counterparts: they are used to determine whether the identifier has global, local, or class scope, or is a function parameter, respectively.

 

 

3.3    How to Determine the Type of an Operand

The CodeCheck variables and functions for determining the types of all the operands of executable operators are similar to their counterparts for declara­tions and identifiers. The major difference is that they require an additional argument which specifies which operand to describe.

Operators may be unary (one operand, e.g.  ~ ), binary (two operands, e.g. += ), or ternary (three operands, e.g. ?: ). Functions have as many operands as they have arguments. Whenever an executable operator is encountered, Code­Check sets the appropriate op_ variables to indicate which operator was found, and sets op_operands to the number of operands taken by this operator.

The CodeCheck functions op_base(), op_levels(), op_level(), and op_level_flags() differ from their declarator and identifier counterparts in only one way: their first argument specifies which operand is to be described. The operands are indexed from right to left. Thus the rightmost operand is operand 1, while the leftmost operand is the last (index given by op_operands). For example, in the expression x = a + b the first operand for op_add is b, and the second is a. For the operator op_assign, the first operand is the result of (a+b), and the second is x.
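The right-to-left numbering can be captured in a one-line helper. The sketch below is hypothetical (operand_index() is not a CodeCheck function) and assumes the same numbering applies to every operator, including the ternary ?: operator.

```c
/* Convert a 0-based left-to-right operand position into the 1-based
   right-to-left index used by op_base(), op_levels(), etc.
   n is the operand count (the value of op_operands).
   Returns 0 for a position that does not exist. */
int operand_index(int n, int left_pos)
{
    if (left_pos < 0 || left_pos >= n)
        return 0;
    return n - left_pos;
}
```

For a + b, a is at position 0 and so has index 2, while b has index 1; for c ? t : f, c has index 3 and f has index 1.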

In the special case of the cast operator, the first operand is the type of the value to be cast, while the second operand is the result type after the cast has been performed. Here is an example rule that detects every cast of a pointer to a struct XYZ, to any result type, i.e. a cast that looks like (struct XYZ *):

 

     if ( op_cast )
        {
        if (  (op_levels(1) == 2)                &&
              (op_level(1,0) == POINTER)         &&
              (op_base(1) == STRUCT_TYPE)        &&
              (strcmp(op_base_name(1),"XYZ") == 0) )
           warn( 1234, "Cast from (struct XYZ *) to anything." );
        }

 

 

3.4    How to Detect Implicit Type Conversions

Implicit type conversions can happen in three different contexts. First, when a value of one type is assigned to an object of another type without an explicit cast, an implicit type conversion takes place. Second, when the type of an argument passed to a function differs from the type of the corresponding parameter in the function's prototype, the argument must be implicitly converted. Third, an implicit type conversion occurs when the type of the value in a return statement differs from the return type of the function.

The CodeCheck functions used for determining the types involved in all implicit type conversions are the same functions as used for operators. However, to detect an implicit type conversion, one of the cnv_ variables must be used as the trigger in the rule. When one of these variables is set, then CodeCheck uses the op_ functions as though a cast operator were present for the conversion.

Here is an example rule that detects every implicit conversion of anything to a pointer to a struct XYZ:

 

     if ( cnv_any_to_ptr || cnv_ptr_to_ptr )
        {
        if (  (op_levels(2) == 2)                &&
              (op_level(2,0) == POINTER)         &&
              (op_base(2) == STRUCT_TYPE)        &&
              (strcmp(op_base_name(2),"XYZ") == 0) )
           warn( 1234, "Implicit conversion to (struct XYZ *)." );
        }
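The following C fragment contains one conversion to struct XYZ * in each of the three contexts, which makes it a convenient test input for the rule above. It is C rather than C++, since C++ rejects implicit conversions from void *. The rule would be expected to report the three marked conversions, but which cnv_ variable CodeCheck sets in each case is not confirmed here; the function names are hypothetical.

```c
struct XYZ { int id; };

/* Context 2: a void * argument is converted to the parameter type. */
static int get_id(struct XYZ *p) { return p->id; }

/* Context 3: the void * in the return statement is converted to the
   declared return type. */
static struct XYZ *as_xyz(void *v) { return v; }

int implicit_conversion_demo(void)
{
    struct XYZ obj = { 42 };
    void *v = &obj;          /* struct XYZ * -> void *: not a conversion TO struct XYZ * */
    struct XYZ *p;

    p = v;                   /* context 1: assignment, void * -> struct XYZ * */
    return p->id + get_id(v) + as_xyz(v)->id;
}
```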

 


Chapter 4: Portable Style

 

This chapter describes how to use CodeCheck to monitor style for portability. Many of the rules described here are based on internal corporate standards for C coding that have been made available to Abraxas Software, and also on the recommendations of two influential books: C Programming Guidelines, Second Edition, by Thomas Plum, and Portability and the C Language, by Rex Jaeschke.

The guidelines and recommendations found in these sources were de­signed pri­marily to enforce both portability and maintainability. In the sub­sec­tions that follow, those guide­lines that primarily affect program portabil­ity are pre­sented. Each guideline is followed by a description of the relevant CodeCheck vari­ables that can be used to construct corresponding CodeCheck rules.

 

 

4.1    Lexical Issues in Portability

 

4.1.1  Lexical Rules for Variable Names

C programmers have evolved a great variety of lexical guidelines for variables and their declarations. The guidelines reported here are specifically intended to ensure portability. Additional guidelines that are intended to improve maintainability may be found elsewhere in this manual.

1.   Spell variable names in lower case only.

2.   Names with external linkage must be unique in their first 6 characters.

3.   Do not begin an identifier with an underscore character.

The predefined CodeCheck variables which are used to detect violations of the above guide­lines are, re­spectively:

Variable                                  Meaning

dcl_any_upper                 Set to 1 if an upper-case character is found in an iden­tifier name when it is declared.

dcl_extern_ambig          If two external identifiers have names that agree on the first 6 or more charac­ters, regardless of case, then this variable is set to the number of consecutive char­ac­ters on which they agree.

dcl_underscore              Set to 1 if the name of a declared identifier begins with an un­derscore character.
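The three tests can be sketched as ordinary C functions, which may be handy for auditing a list of names outside CodeCheck, for example the exported symbols of a library. These are illustrations of the checks described above, not CodeCheck's implementation, and the function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Guideline 1: return 1 if the name contains an upper-case letter
   (compare dcl_any_upper). */
int has_upper(const char *name)
{
    for (; *name != '\0'; name++)
        if (isupper((unsigned char)*name))
            return 1;
    return 0;
}

/* Guideline 2: number of leading characters two names share, ignoring case
   (compare dcl_extern_ambig, which is set only when this reaches 6 or more). */
size_t shared_prefix_nocase(const char *a, const char *b)
{
    size_t n = 0;

    while (a[n] != '\0' && b[n] != '\0' &&
           tolower((unsigned char)a[n]) == tolower((unsigned char)b[n]))
        n++;
    return n;
}

/* Guideline 3: return 1 if the name begins with an underscore
   (compare dcl_underscore). */
int leading_underscore(const char *name)
{
    return name[0] == '_';
}
```

For example, counter_total and Counter_max agree on their first 8 characters, so under guideline 2 they are not distinct external names.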

 

 

4.1.2  Nonstandard Characters

Many C compilers allow the use of characters that are not in the standard C char­acter set. Nonstandard characters are, by definition, not portable. The stan­dard char­acters are (HS88:6):

a b c d e f g h i j k l m n o p q r s t u v w x y z

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

0 1 2 3 4 5 6 7 8 9

! # % ^ & * ( ) - _ + = ~ [ ] \ | ; : ' " { } , . < > / ?

blank      newline     backspace     horizontal-tab

vertical-tab     form-feed     carriage-return

 

Recommendation: For general portability, do not use nonstandard charac­ters. Note that the char­­ac­ters $ and @ are nonstandard. CodeCheck provides a prede­fined vari­able with which to detect nonstandard characters:

Variable                                  Meaning

lex_nonstandard            Whenever a character is found that is not in the standard C set, the value of this variable is set to the integer representation of the nonstandard character.

To ignore nonstandard identifiers, call the function skip_nonansi_ident():

Function                                 Meaning

skip_nonansi_ident( char ) Skip non-ANSI identifiers beginning with '@', '$', or '`'. The char parameter specifies the character that begins the identifiers to be skipped; it may only be '@', '$', or '`'. Any other character has no effect.
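To make the definition concrete, the sketch below tests a character against the standard set listed above. It spells the set out explicitly instead of using letter ranges, because the letters need not be contiguous in every character set (they are not in EBCDIC). The names standard_chars and is_standard_char are hypothetical; this is not how CodeCheck implements lex_nonstandard.

```c
#include <string.h>

/* The standard C character set listed above, spelled out explicitly. */
static const char standard_chars[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "!#%^&*()-_+=~[]\\|;:'\"{},.<>/?"
    " \n\b\t\v\f\r";            /* blank and the control characters */

/* Return 1 if c is in the standard set, 0 otherwise. */
int is_standard_char(int c)
{
    if (c == '\0')              /* strchr would match the terminator */
        return 0;
    return strchr(standard_chars, c) != NULL;
}
```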

 

4.1.3  Trigraphs

Trigraphs are special 3-character sequences introduced in ANSI C. Tri­graphs are sig­nificant as a portability issue only to the extent that older pro­grams which unwit­tingly use trigraph sequences within literal strings will no longer compile correctly (RJ:28). ANSI C compilers translate trigraph se­quences into their cor­re­sponding sin­gle ASCII characters at a very early stage in the lexical analysis phase of compilation. The trigraph sequences and their corres­ponding ASCII characters are:

                  trigraph          meaning
                  ??=               #
                  ??(               [
                  ??/               \
                  ??)               ]
                  ??'               ^
                  ??<               {
                  ??!               |
                  ??>               }
                  ??-               ~

Recommendation: Search old programs for the ?? symbol pair, and replace every occurrence within string and character literals with ?\?. Older compilers will simply ignore the backslash, while ANSI compilers will treat \? as an escape sequence for a question mark, so in both cases the string still contains ??. CodeCheck provides two predefined variables for identifying inadvertent trigraphs:

Variable                                  Meaning

lex_trigraph                   Set to 1 if an ANSI trigraph is found.

lex_str_trigraph          Set to 1 if a trigraph is found within a string lit­eral.
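The sketch below counts trigraph sequences in a string, roughly the condition that lex_str_trigraph reports inside string literals. count_trigraphs() is a hypothetical name. Note that the sketch never writes two question marks in a row anywhere in its own source, so it means the same thing whether or not the compiler processes trigraphs.

```c
#include <stddef.h>
#include <string.h>

/* Count trigraph sequences (two question marks followed by one of the
   nine characters in the table above), scanning left to right. */
int count_trigraphs(const char *s)
{
    static const char third[] = "=(/)'<!>-";
    int n = 0;
    size_t i;

    for (i = 0; s[i] != '\0'; i++) {
        /* s[i + 1] == '?' guarantees that s[i + 2] exists (at worst '\0'). */
        if (s[i] == '?' && s[i + 1] == '?' &&
            s[i + 2] != '\0' && strchr(third, s[i + 2]) != NULL) {
            n++;
            i += 2;             /* trigraphs do not overlap */
        }
    }
    return n;
}
```

Applied to the text of the source line printf("What??!\n"); the function would return 1.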

 

 

4.1.4  Numeric Escape Codes

Numeric escape codes are permitted in C, but they are nor­mally used to re­fer to control characters in the ASCII character set (HS84:25). Such nu­meric es­cape codes will cause a program to fail when it is compiled and used in a non-ASCII (e.g. EBCDIC) envi­ronment.

There is a potentially confusing aspect of hexadecimal escape sequences. Unlike octal sequences, which may have at most three digits, the ANSI C standard specifies that a hexadecimal escape sequence may have any number of digits. Thus the string literal "\xabcd" contains only one escape sequence, not an escape followed by the letters bcd, because a, b, c, and d are all valid hex digits. (Its value, 0xabcd, is also too large for an 8-bit char.)

Recommendation: Do not use numeric escape codes unless it is absolutely necessary. Carefully document the meaning of each such usage in the source code, and use a manifest constant (#define) to make its meaning apparent. CodeCheck provides three predefined variables for detecting numeric escape codes:

Variable                                  Meaning

lex_hex_escape              When a hexadecimal escape sequence is found, this variable is set to the number of hexadecimal digits found.

lex_num_escape              When a non-zero numeric escape sequence is found, the value of this variable is set to the value of the es­cape se­quence.

lex_zero_escape            When a zero escape sequence is found (e.g. \0, \00, \x0, \0x0), the value of this variable is set to 1 if the context is a character literal, or 2 if the context is a string literal.
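The following sketch applies the recommendation to an ASCII bell character and also shows how to end a hexadecimal escape safely. ASCII_BEL and alert_msg are hypothetical names. Because adjacent string literals are concatenated only after escape sequences have been converted, splitting the literal stops the escape from absorbing the hex digit that follows it.

```c
/* Manifest constant naming an ASCII-specific numeric escape. */
#define ASCII_BEL "\x07"

/* Writing "\x07Alarm" instead would NOT give BEL followed by "Alarm":
   the escape would keep consuming hex digits and swallow the 'A',
   producing the single escape \x07A followed by "larm". */
static const char alert_msg[] = ASCII_BEL "Alarm";   /* BEL, 'A', 'l', ... */
```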

 

 

4.1.7  Escape Sequences in Character and String Literals

An escape sequence within a character or string literal is signaled by the back­slash char­acter: \. The unrestricted use of escape sequences is a fre­quent source of portability problems.

1. For reasons that shall remain forever mysterious, many pre-ANSI com­pil­ers al­low the digits 8 and 9 to appear within an octal escape sequence, as in \09. This usage has always been hopelessly confusing, has always been non-portable, and is now, fortunately, also ungrammatical.

2. Many pre-ANSI compilers do not support the hexadecimal escape sequence, as in \xA3. This usage is therefore not portable except among ANSI compilers. The hexadecimal escape sequence \0xA3 (with a zero appearing before the x) is even rarer, and is not recognized by ANSI C, which reads it as the null escape \0 followed by the characters xA3.

3. In the K&R dialect of C the only defined escape characters are in this set:

        \n   \b   \t   \r   \f   \\   \"   \'

In all other cases when a backslash is followed by a character, the backslash is ignored. The H&S dialect allows one more escape, \v (vertical tab). The ANSI standard includes three further escape characters: \a (alert or bell), \x (hexadecimal), and \? (to disambiguate trigraphs). Thus pre-ANSI programs which rely on the backslash being ignored before the letters a, v, or x are no longer portable.

4. The empty character constant formed by two successive single quote marks ('') is not con­sistently interpreted by C compilers, and is not allowed by many. In particular, it is not al­ways the same as the null character '\0'.

5. Many C compilers allow character constants to have more than one character. Even if all compilers did allow this usage, it would still be non-portable due to the infamous NUXI problem. (“NUXI” refers to the worst-case scrambling of the string “UNIX” that can result when character data is moved between machines with different byte orders.)

To detect the above portability problems, CodeCheck provides the following six predefined variables:

Variable                                  Meaning

lex_big_octal                 Set to 8 or 9, respectively, when a numeric escape sequence or octal integer contains the digit 8 or 9.

lex_hex_escape              When a hexadecimal escape sequence is found, this variable is set to the number of hexadecimal digits found.

lex_ansi_escape            Set to 1 if an escape sequence contains one of the new ANSI escape characters: a, v, or ?.

lex_not_KR_escape       Whenever an escape character is found that is not one of those defined by K&R (\n, \b, \t, \r, \f, \\, \", \'), this variable is set to the integer representation of the character.

lex_char_empty              Set to 1 if an empty character constant is found (e.g. ''). This variable does not flag the null character constant ('\0').

lex_char_long                 Set to 1 if a character constant contains more than one character.
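To make the meanings of the first four variables concrete, here is a hypothetical C sketch of how a lexer might classify the escape sequence that follows a backslash. The struct escape_info and the function classify_escape are invented for illustration and do not describe CodeCheck's actual implementation. The sketch treats octal digits as legitimate K&R escapes and does not cover the two character-constant variables.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical summary of one escape sequence. Field names echo the
   CodeCheck variables above; this is NOT CodeCheck's own lexer. */
struct escape_info {
    int big_octal;     /* 8 or 9 if a numeric escape contains that digit */
    int hex_digits;    /* number of hex digits following \x */
    int ansi_escape;   /* 1 if the escape letter is a, v, or ? */
    int not_kr_escape; /* the escape character if outside K&R's set, else 0 */
    int length;        /* characters consumed after the backslash */
};

/* Classify the escape sequence whose first character is *p, i.e. p
   points just past the backslash. */
struct escape_info classify_escape(const char *p)
{
    struct escape_info info = {0, 0, 0, 0, 0};

    if (*p >= '0' && *p <= '9') {
        /* Numeric escape: scan up to three decimal digits, as a lenient
           pre-ANSI lexer might, remembering any 8 or 9 it sees. */
        while (info.length < 3 && p[info.length] >= '0' && p[info.length] <= '9') {
            if (p[info.length] >= '8')
                info.big_octal = p[info.length] - '0';
            info.length++;
        }
        return info;
    }

    info.length = 1;
    /* The K&R letter escapes are n b t r f \ " ' */
    if (strchr("nbtrf\\\"'", *p) == NULL)
        info.not_kr_escape = (unsigned char)*p;
    if (*p == 'a' || *p == 'v' || *p == '?')
        info.ansi_escape = 1;
    if (*p == 'x') {
        /* Count the hexadecimal digits after the x. */
        while (isxdigit((unsigned char)p[info.length])) {
            info.hex_digits++;
            info.length++;
        }
    }
    return info;
}
```

For example, classify_escape("09") reports big_octal as 9, and classify_escape("x41") reports two hex digits with 'x' as the non-K&R escape character.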

4.1.8  System Variables

A useful convention has evolved within the C community in which identifiers that are defined and used by the system (i.e. variables used by the compiler, the linker, or standard system header files) are spelled with a leading underscore character. If programmers conform to this convention by never spelling an identifier in this way, then name conflicts are prevented. Unfortunately, some programmers are unaware of this convention, and may inadvertently spell an identifier with a leading underscore. Even if the program compiles without error, it may break as soon as it is compiled on another system that happens to use one of these names. This is, therefore, an obscure but significant portability problem.

Recommendation: Adhere to this convention religiously. Do not spell identifiers with a leading underscore character unless you are writing system code. CodeCheck predefined variable:

Variable                                  Meaning

dcl_underscore              Set to 1 if an identifier begins with an underscore character.
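A minimal sketch of the test that dcl_underscore performs, using hypothetical helper names (this is not CodeCheck code). The second helper goes one step further, assuming the ANSI reservation rules: identifiers that begin with an underscore followed by an uppercase letter or a second underscore are reserved for any use, while other leading-underscore names are reserved at file scope.

```c
#include <ctype.h>

/* Hypothetical helper: 1 if the identifier begins with an underscore,
   the condition that dcl_underscore reports. */
int has_leading_underscore(const char *id)
{
    return id[0] == '_';
}

/* Hypothetical helper: 1 for the stricter ANSI case, an underscore
   followed by an uppercase letter or by a second underscore. Such
   names are reserved in every scope, not only at file scope. */
int is_reserved_everywhere(const char *id)
{
    return id[0] == '_' && (id[1] == '_' || isupper((unsigned char)id[1]));
}
```

A name such as _count is flagged by the first helper only, while _Count and __count are flagged by both.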

4.1.9  The Numeric Constant Suffixes U, F, and L

The ANSI standard and some pre-ANSI compilers allow a suffix of U (for unsigned) or F (for float) on numeric constants, in precisely the same way that the suffix L (for long) is allowed in K&R C. Needless to say, this is not a portable usage among pre-ANSI compilers.

The suffix L is generally portable, except when it is used on a floating constant. Among pre-ANSI compilers, only some recognize the long float type, so this is a non-portable use of the L suffix. ANSI compilers accept it, but give the constant the type long double.
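The following C sketch, which assumes an ANSI compiler and uses hypothetical function names, reports the size the compiler assigns to the same constant written with each suffix. Because long double may have the same size as double on some platforms, equal sizes do not by themselves mean the types are identical.

```c
#include <stddef.h>

/* Under ANSI rules 1.5f has type float, 1.5 has type double, and
   1.5L has type long double. A pre-ANSI compiler may reject the
   f and L forms outright. */
size_t size_with_f_suffix(void)  { return sizeof 1.5f; }
size_t size_with_no_suffix(void) { return sizeof 1.5; }
size_t size_with_L_suffix(void)  { return sizeof 1.5L; }

/* On an integer constant, L means long, exactly as in K&R C. */
size_t size_of_long_int_constant(void) { return sizeof 10L; }
```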

Recommendation: The new suffixes U and F solve many problems of numeric ambiguity, and should be used on the grounds that they increase program clarity and make maintenance easier. The cost in terms of lack of portability is substantially less than the benefit for program clarity.
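A short C sketch of the clarity argument, assuming an ANSI compiler (the function names are hypothetical): the suffix fixes the type of the constant, and therefore the kind of arithmetic, right where the constant is written.

```c
#include <limits.h>

/* Without a suffix, 1 - 2 is signed int arithmetic and yields -1. */
int signed_difference(void) { return 1 - 2; }

/* With the U suffix both operands are unsigned int, so the result
   wraps modulo UINT_MAX + 1 and equals UINT_MAX. */
unsigned int unsigned_difference(void) { return 1u - 2u; }

/* With the F suffix, float * float stays float; an unsuffixed 2.5
   would be a double and would promote the whole product to double. */
float scale_float(float x) { return x * 2.5f; }
```

Without the f in scale_float, the expression x * 2.5 would be computed as a double and then converted back to float on return.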

CodeCheck provides four predefined variables for suffix detection:

Variable                                  Meaning

lex_float                          Set to 1 if a numeric constant is found with the suffix 'F' or 'f'.

lex_long_float              Set to 1 if a floating constant is found with the suffix 'L' or 'l'.

lex_suffix