STAT:7400 (22S:248) Computer Intensive Statistics
Coding Standards
Luke Tierney
Date: Spring 2016
Except for binary machine code, all computer code is intended to
be read by humans. Well-written code makes this easy, and coding
standards or guidelines help you create well-written code. Here
are some guidelines to follow in code written for this course.
[Adapted from the coding standards for Roger Peng's Biostat 776 at
Johns Hopkins University.]
- Program files should always be ASCII text files. It should always
be possible to source a program file directly into R or, for C
programs, to pass it directly to a C compiler. If you cannot source
your file directly into R, then the file format is not
acceptable. Word processing programs like Microsoft Word, by
default, do not save files as text files.
- Always use a monospace font to write or display code.
Variable-width fonts like Times New Roman are not appropriate and
can alter the apparent structure of a program (and hence its
readability).
- Always indent your code. If you use an editor like GNU Emacs,
then there is support for automatic indentation of code. I prefer
4-space indentation, as recommended in the R coding standards.
Comments should be indented to the same level as the code to which
they pertain. Comments can also appear at the end of a code line,
if space permits (but see below).
- Put spaces around operators and after punctuation marks like
commas and semicolons. This makes the code easier to read.
- Your code should not extend past 80 columns. This is because
standard Unix terminal windows are 80 columns wide and if your
code wraps around the end of the line it becomes very difficult
to read. Break long lines if you have to. Exceptions should be
made only for hard-coded constants (such as path names or URLs)
which cannot easily be wrapped or shortened.
- As a rule, no function or subroutine should be longer than about
30 lines. In particular, it should be fully visible, without the
need to scroll, in an editor using a reasonable font size.
Being able to see the full code helps in understanding the
logic of the code and helps limit the complexity of individual
functions. With lower-level languages like C this rule
occasionally needs to be broken, but exceptions should be
thought through very carefully.
- Don't repeat yourself. In particular, don't cut and paste. If
you find yourself writing the same bit of code, or very similar
bits of code, multiple times then it is time to think about
abstracting the core idea out into a function of its own.
- Use a consistent scheme for naming variables. I happen to
prefer so-called camel case, as in fileLength, to
underscore-separated names such as file_length, but either is fine
as long as you are consistent.
- Ideally, code should be sufficiently well factored into
functions and subroutines, with well-chosen function and
variable names, to be easy to read and understand without
comments. Comments should be used only to explain non-obvious
steps in tricky computations, or to provide background or
attribution.
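As an illustration of the "don't repeat yourself" guideline, here is a
minimal sketch in C. The function names (sumSquaredDev, sampleMean,
sampleVariance, populationVariance) are hypothetical and chosen only
for this example. The point is that the loop accumulating squared
deviations is written once and shared, rather than pasted into each
variance function.

```c
#include <stddef.h>

/* Sum of squared deviations of x[0..n-1] from center.  Written once
   and shared by both variance functions below. */
static double sumSquaredDev(const double *x, size_t n, double center)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++) {
        double dev = x[i] - center;

        total += dev * dev;
    }
    return total;
}

/* Arithmetic mean; assumes n >= 1. */
double sampleMean(const double *x, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++)
        total += x[i];
    return total / n;
}

/* Variance with divisor n - 1; assumes n >= 2. */
double sampleVariance(const double *x, size_t n)
{
    return sumSquaredDev(x, n, sampleMean(x, n)) / (n - 1);
}

/* Variance with divisor n; assumes n >= 1. */
double populationVariance(const double *x, size_t n)
{
    return sumSquaredDev(x, n, sampleMean(x, n)) / n;
}
```

Each function is short enough to read without scrolling, and a change
to how deviations are accumulated only has to be made in one place.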
Good programming editors will help immensely in following good
programming practices.
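The layout guidelines above (4-space indentation, comments indented
with the code they describe, spaces around operators and after commas,
and lines kept within 80 columns) might look like this in a small C
file. This is only a sketch; the functions maxAbsDiff and isConverged
and their parameters are hypothetical names made up for the example.

```c
#include <stddef.h>

/* Largest absolute difference between corresponding elements of two
   arrays of length n; returns 0.0 when n is 0. */
double maxAbsDiff(const double *x, const double *y, size_t n)
{
    double maxDiff = 0.0;

    for (size_t i = 0; i < n; i++) {
        /* this comment sits at the same level as the code it explains */
        double diff = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i];

        if (diff > maxDiff)
            maxDiff = diff;
    }
    return maxDiff;
}

/* The parameter list would run past column 80 on one line, so it is
   broken after a comma and the continuation is aligned. */
int isConverged(const double *oldEst, const double *newEst, size_t n,
                double absTol)
{
    return maxAbsDiff(oldEst, newEst, n) < absTol;
}
```

An editor with C support will usually produce this indentation and
alignment automatically as you type.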
Some other coding style guides:
Some useful tools:
- A good programming editor that is aware of R and C syntax.
- The indent command for formatting C code.
- The formatR package for formatting R code.
Luke Tierney
2016-01-22