Program design

Yutaka Masuda

February 2020

We have looked at the language elements of Fortran, and I believe they are enough to write small numerical programs. However, even when you are familiar with the syntax, it may not be clear what makes a program good and sustainable. Code easily becomes hard to read because pieces with different roles are mixed in the same program. A clear program structure makes the program much easier to maintain.

This chapter demonstrates how to make a program clearer. When you become more familiar with programming, please read more substantial textbooks on this topic, e.g., Software Tools by Kernighan and Plauger and The Art of Readable Code by Boswell and Foucher. Ondřej Čertík provides good suggestions on programming style in Modern Fortran.

Programming style

Use of constants

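If you need a numerical constant, you should declare it as a named constant with the parameter attribute. Consider a case where you allocate a temporary array (vector) with 5 elements. The first piece of code below uses the literal 5 directly, and the second uses a named constant.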

real,allocatable :: temporary(:)
...
allocate(temporary(5))

real,allocatable :: temporary(:)
integer,parameter :: leng=5
...
allocate(temporary(leng))

In programming, a numerical literal (like 5 in the above code) that carries a special meaning is called a magic number. Sometimes a single magic number appears in many places in the code and has to be changed often. If you define it as a constant, you change the value in one place only. The name also makes the meaning of the number easier to understand.

Comments

The comments in your program should tell your future self what the code is for. They should explain why the code is needed and what the program intends to do. See the following piece of code.

! LESS MEANINGFUL:
! the constant holds 5
integer,parameter :: leng = 5

! MORE MEANINGFUL:
! the default length of a temporary array
! it may be too small, and it can be larger in the future
integer,parameter :: leng = 5

The first comment only describes what the code is doing; it does not provide anything useful for programmers (even though it can be helpful for beginners). A comment should include information and background that cannot be understood directly from the code.

Appropriate names

It is important to give appropriate names to variables, constants, and program units. A good name is self-descriptive: you understand what it stands for as soon as you look at it. In the above example, leng is too general, and default_array_length is perhaps more useful. If that is too long, default_length can be acceptable.

You do not have to make every variable-name meaningful. For example, i is good enough for a counter in many cases. Variables a and b may be okay to describe small numerical algorithms. When your program is complicated, a descriptive name is more helpful for you to understand the code.

The naming rule should be unified when you work with collaborators. If you are the only person on the project, you can define any naming rule. In Fortran code written in lowercase, a name often consists of one or more English words joined by underscores (_). Fortran supports names of up to 63 characters.

Fortran is case insensitive, and lowercase is generally recommended for writing a program. In some cases, uppercase is useful to tell the programmer that a symbol has a particular meaning. For example, a matrix can be A or M rather than a and m (although these are not meaningful names). Uppercase can also emphasize a constant or a special function to improve the readability of the code. In any case, do not overuse uppercase.
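
As a small illustration of case insensitivity, the following sketch refers to one variable with three different spellings. The program name case_demo and the variable total are made up for this example; a standard-conforming compiler should treat all three spellings as the same name.

```fortran
program case_demo
   implicit none
   integer :: total    ! declared in lowercase

   Total = 3           ! the same variable with different capitalization
   TOTAL = total + 1   ! still the same variable
   print *, total      ! the value is 4
   if (total /= 4) error stop "unexpected value"
end program case_demo
```

Because the compiler does not distinguish the spellings, capitalization is purely a convention for human readers.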

Clear statements

A better program is readable. It clearly shows what it is. Compare the following 2 pieces of programs.

! program 1
sd = sqrt(sum((x-sum(x)/size(x))**2)/(size(x)-1))

! program 2
n = size(x)
mean = sum(x)/n
var = sum((x-mean)**2)/(n-1)
sd = sqrt(var)

Both programs yield the same value, i.e., the standard deviation of the elements in x. The second program is more understandable because its variables hold meaningful values. A dense program may look nice, but it will not be readable when you look back at the code 3 months later. The purpose of the code becomes even more explicit if you define a function for the variance.

! program 3
! function variance defined elsewhere.
sd = sqrt(variance(x))
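
To check that the dense and the step-by-step versions agree, here is a minimal self-contained sketch. The sample values and the names sd_dense and sd_steps are assumptions for this illustration. Note that sum is applied to the squared deviations, so each variance is a single number.

```fortran
program compare_sd
   implicit none
   ! sample data, chosen only for this illustration
   real :: x(8) = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
   real :: mean, var, sd_dense, sd_steps
   integer :: n

   ! dense version: everything in one statement
   sd_dense = sqrt(sum((x-sum(x)/size(x))**2)/(size(x)-1))

   ! step-by-step version with named intermediate values
   n = size(x)
   mean = sum(x)/n
   var = sum((x-mean)**2)/(n-1)
   sd_steps = sqrt(var)

   print *, sd_dense, sd_steps
   if (abs(sd_dense-sd_steps) > 1.0e-6) error stop "the two versions disagree"
end program compare_sd
```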

In the past, the programming style could have an impact on performance (e.g., computing time). Today, the difference in performance among styles is negligible on modern computers with recent compilers (except for really heavy computing). So, more explicit code is preferred in general.

Here are some tips to improve the readability of code, in addition to using constants, comments, and clear names. The last tip is discussed in the next section.

Write the code as simply as possible for what you want to do.
Use temporary variables if needed for readability (but not too many).
Do not write overly dense code.
Define functions or subroutines to clarify what to do.
Consider what clear code would look like before writing a program.

Sometimes, a simple program is slow. Generally speaking, this is not because of the programming style but because of the choice of algorithm. You do not have to worry much about the algorithm at the beginning because you can easily change it once you have finished writing the program. The programming style becomes much more important when you modify the code.

Program structure

A typical numerical program consists of the following 4 steps.

Declaration of variables, modules, and interfaces.
Preparation of data (e.g., reading from files, generating in the program, or given through the argument).
Computation with the data and finding the results.
The output of results (e.g., printing on-screen or writing a file).

Each step can be divided into more detailed steps. The smallest step can be a block of code, and some blocks can form a program unit. You can put a comment on each block or unit to make clear what it does.

To define each block in the above steps, you have to think about a blueprint of the program. The plan (a flowchart or blueprint) can be a description on paper or be written in a (pseudo) programming language. Once you have written it, you can translate it into Fortran (or any other programming language). When you find while coding that a block needs more detail, go back to planning, divide the block into pieces, and then resume coding. In this section, we look at some practices for writing code with a reasonable structure.

Descriptive blueprint

First, a programmer has to make a blueprint (flowchart) of the program. Note that you do not have to draw a formal flowchart; what I suggest here is that it is important to know the flow of your program. We will look at this process using a small program.

Making a blueprint

Rough plan

The program needs to compute the standard deviation of data. The data is stored in a file, and the output should be on screen. A rough description of the program is as follows.

Read the data and store it into a vector.
Compute the standard deviation.
Print the result on the screen.

Some considerations

We need to break down each step. In the first step, you have to think about more details.

How do you get the file name? Is the name fixed, or given from the keyboard?
What data type do you need? Probably real?
What is the name of the vector? data, or just x?

We choose a file name given from the keyboard and a real vector named data. The file name is stored in a character variable, filename. The size of data is unknown, so it should be allocatable. Once the size of data is known, it is stored in an integer variable, n.

In the second step, a typical question is how to compute the standard deviation. Our choice is the naive formula shown above.

In the third step, there is no question. You can use either print or write.

A revised description looks like this.

Declare variables: data(:), n, filename.
Read the file name from the keyboard.
Read the data from the file and store it into a vector.
Compute the standard deviation with a simple formula.
Print the result on the screen.

More details

You may notice unclear details in the flowchart. We have to open the file for reading and close it at the end of the program (just in case), so the flowchart should include open and close. A unit variable, unit, is also needed.

Another issue is that we do not know how many values are stored in the file. We have to know n before allocating data(:), so we should preliminarily read the file just to determine n.

Thinking about the details means considering the logic and the algorithms to be implemented in the program, so this step may need some technical consideration. If you are not sure of the best option, simple logic is enough as a placeholder.

The final flowchart is as follows.

Declare variables: data(:), n, filename, and unit.
Read the file name from the keyboard, and open it.
Pre-read it to determine n as the number of values in the file.
Allocate data and rewind the file.
Read the data from the file and store it into a vector.
Compute the standard deviation with a simple formula.
Print the result on the screen.
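
One way to keep the flowchart next to the code is to write it as comments in an otherwise empty program and fill in each step later. The following skeleton is only a sketch of that idea; it compiles but does nothing yet.

```fortran
program compute_sd
   implicit none
   ! 1. declare variables: data(:), n, filename, and unit
   ! 2. read the file name from the keyboard, and open the file
   ! 3. pre-read the file to determine n
   ! 4. allocate data and rewind the file
   ! 5. read the data from the file and store it into data
   ! 6. compute the standard deviation with a simple formula
   ! 7. print the result on the screen
end program compute_sd
```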

Structured programming

Structured programming is a broad concept for improving the clarity and quality of a computer program using control flow statements and program units. In a narrow sense, a program built from sequence (flowing from the top to the bottom), selection (if), and iteration (do) is called a structured program. These 3 control flows are enough to write any computer program. In a broad sense, the program should also use program units (procedures and modules) to clarify its structure.

We will see how it works using the example shown above.

Coding each step

Variable declaration

We define the main program as compute_sd. First, we declare all variables at the beginning of the program. When needed, we will put some more variables here.

program compute_sd
   implicit none
   character(len=50) :: filename
   real,allocatable :: data(:)
   integer :: n,unit

end program compute_sd

Read the file name

The file name is read from the keyboard, and then the file is opened.

   read *,filename
   open(newunit=unit,file=filename)

Count lines

Then, we determine the number of lines in the file. There is an idiom for this in Fortran. Prepare a dummy real variable, x, and use it to read the file one line at a time. Repeat the read until the end of the file, counting up n (with the initial value 0) in each round. To detect the end of the file, you get the end-of-file (EOF) status through an extra integer variable, say info.

A reference code looks like this.

   real :: x
   integer :: n,unit,info
...
   ! pre-read the file to determine n as the number of lines in the file
   n = 0
   do
      read(unit,*,iostat=info) x
      if(info/=0) exit
      n = n + 1
   end do
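
To try the idiom without preparing a data file, the following sketch writes three values to a temporary (scratch) file and then counts its lines. The scratch file and its values are assumptions made for this demonstration; the counting loop is the same as above.

```fortran
program count_demo
   implicit none
   real :: x
   integer :: n,unit,info

   ! create a temporary file with three values (test data for this sketch)
   open(newunit=unit,status="scratch",action="readwrite")
   write(unit,*) 1.5
   write(unit,*) 2.5
   write(unit,*) 3.5
   rewind(unit)

   ! count the lines until a read fails (end of file or an invalid line)
   n = 0
   do
      read(unit,*,iostat=info) x
      if(info/=0) exit
      n = n + 1
   end do
   print *, "number of lines =", n
   if(n/=3) error stop "unexpected number of lines"
   close(unit)
end program count_demo
```

Note that list-directed input skips blank lines, so strictly speaking the loop counts the lines from which a value could be read.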

Preparation for reading the data

Knowing n, you can allocate data(n). For reading the data, you should rewind the file pointer to the top of the file.

   allocate(data(n))
   rewind(unit)

Reading the data

Finally, we can read the data and store it in a vector. There are a few ways to do it. We know there are n lines in the file, so we can use a counter (i) to define the order of data. In this way, we use a do loop similar to the previous one, but with a counter.

   real :: x
   integer :: n,unit,info,i
...
   ! repeat the data reading n times
   do i=1,n
      read(unit,*,iostat=info) x
      if(info/=0) exit
      data(i) = x
   end do
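
One of the other ways is to read the whole array with a single statement, since list-directed input continues over as many lines as it needs to fill the array. The sketch below is an alternative to the loop above, not the method used in the rest of this chapter; the scratch file and its three values are assumptions made for this demonstration.

```fortran
program read_all_demo
   implicit none
   real,allocatable :: data(:)
   integer :: n,unit,info

   ! temporary file with three values, one per line (test data for this sketch)
   open(newunit=unit,status="scratch",action="readwrite")
   write(unit,*) 1.5
   write(unit,*) 2.5
   write(unit,*) 3.5
   rewind(unit)

   n = 3                 ! assumed to be known from the counting step
   allocate(data(n))

   ! a single statement fills all n elements, reading as many lines as needed
   read(unit,*,iostat=info) data
   if(info/=0) error stop "failed to read the data"
   print *, data
   if(abs(sum(data)-7.5) > 1.0e-6) error stop "unexpected values"
   close(unit)
end program read_all_demo
```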

Computing the standard deviation

The original formula is already shown above. Some more variables should be used to improve the readability.

   real :: x,mean,var,sd
   integer :: n,unit,info,i
...
   ! computation of variance with a naive formula
   mean = sum(data)/n
   var = sum((data-mean)**2)/(n-1)
   sd = sqrt(var)

Output

The output is simple.

   print "(a,es12.4)","standard deviation = ",sd

Termination

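You do not strictly need the following statements because the file is closed and the array is deallocated automatically when the program ends. We put them here just to make the code explicit.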

   close(unit)
   deallocate(data)

Entire program

We combine all the above pieces of code to complete the program.

program compute_sd
   implicit none
   character(len=50) :: filename
   real,allocatable :: data(:)
   real :: x,mean,var,sd
   integer :: n,unit,info,i

   read *,filename
   open(newunit=unit,file=filename)

   ! pre-read the file to determine n as the number of lines in the file
   n = 0
   do
      read(unit,*,iostat=info) x
      if(info/=0) exit
      n = n + 1
   end do

   allocate(data(n))
   rewind(unit)

   ! repeat the data reading n times
   do i=1,n
      read(unit,*,iostat=info) x
      if(info/=0) exit
      data(i) = x
   end do

   ! computation of variance with a naive formula
   mean = sum(data)/n
   var = sum((data-mean)**2)/(n-1)
   sd = sqrt(var)
   print "(a,es12.4)","standard deviation = ",sd

   close(unit)
   deallocate(data)
end program compute_sd

Structured more

The above code is structured in the sense of flow controls. This program is good enough to do a small task. However, if you want to extend this program to compute more statistics, or modify it to be more general (e.g., reading multiple columns), some blocks can be separated as procedures for reuse and extension.

In the above code, there are 3 blocks that could be separated as independent procedures.

A block to count the number of lines in a text file.
A block to read a real vector from a file.
A block to compute the variance of elements in a real vector.

These procedures should be in a custom module for reuse. Here, for simplicity, we use internal procedures for them.

Counting the number of lines

To turn a block into a procedure, think about which is better, a function or a subroutine. Counting the number of lines always returns a single number n, so a function seems to be the right choice. Technically, you can use either of them, but here, I am going to use a subroutine.

Here is a direct translation of the above code into an independent subroutine that counts the number of lines in a text file.

subroutine count_lines(unit,n)
   integer,intent(in) :: unit
   integer,intent(out) :: n
   real :: x
   integer :: info
   n = 0
   rewind(unit)
   do
      read(unit,*,iostat=info) x
      if(info/=0) exit
      n = n + 1
   end do
   rewind(unit)
end subroutine count_lines

We assume that the file has already been opened. The unit number is the only input argument, and the output is n.

You may notice that rewind(unit) is inserted into the subroutine. The first rewind reduces the risk of a failed read caused by a missing rewind. The last rewind is there just in case the programmer forgets to rewind the unit after calling this subroutine and fails in the next step.

So, why did I choose a subroutine, not a function? It is because this procedure has a side effect. The procedure moves the file position, and on exit, the position may differ from the original position. If a program unit changes any variable or status defined outside the procedure, we say that there is a side effect. Otherwise, the program unit is pure. There is a custom in Fortran: if the program unit has a side effect, use a subroutine instead of a function.
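
Fortran also lets you state the absence of side effects explicitly with the pure prefix; the compiler then rejects, for example, an I/O statement on an external file inside the procedure. The sketch below uses a hypothetical module mystat and function mean_of, which are not part of the program in this chapter.

```fortran
module mystat
   implicit none
contains

! pure: the result depends only on x, and nothing outside is changed
pure function mean_of(x) result(mean)
   real,intent(in) :: x(:)
   real :: mean
   mean = sum(x)/size(x)
   ! a statement such as "print *,mean" or "rewind(10)" is not allowed here
end function mean_of

end module mystat
```

A procedure like count_lines cannot be declared pure because it reads from a unit and moves the file position.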

Reading real vector from file

We have the same question as above: which is better for reading a vector from a file, a function or a subroutine? I choose a subroutine for the same reason as described above.

subroutine read_real_vector(unit,n,data)
   integer,intent(in) :: unit,n
   real,intent(out) :: data(:)
   real :: x
   integer :: i,info
   rewind(unit)
   do i=1,n
      read(unit,*,iostat=info) x
      if(info/=0) exit
      data(i) = x
   end do
   rewind(unit)
end subroutine read_real_vector

The file should be opened before calling this subroutine. Also, the number of lines should be known. This subroutine also has rewind as a fail-safe.

Variance function

A function is a natural choice to compute the variance.

function variance(data) result(var)
   real,intent(in) :: data(:)
   real :: mean,var
   integer :: n
   n = size(data)
   mean = sum(data)/n
   var = sum((data-mean)**2)/(n-1)
end function variance

Structured program

Here is the final program.

program compute_sd
   implicit none
   character(len=50) :: filename
   real,allocatable :: data(:)
   real :: sd
   integer :: unit,n

   read *,filename
   open(newunit=unit,file=filename)

   call count_lines(unit,n)
   allocate(data(n))

   call read_real_vector(unit,n,data)
   sd = sqrt(variance(data))
   print "(a,es12.4)","standard deviation = ",sd

   close(unit)
   deallocate(data)

contains

! pre-read the file to determine n as the number of lines in the file
subroutine count_lines(unit,n)
   integer,intent(in) :: unit
   integer,intent(out) :: n
   real :: x
   integer :: info
   n = 0
   rewind(unit)
   do
      read(unit,*,iostat=info) x
      if(info/=0) exit
      n = n + 1
   end do
   rewind(unit)
end subroutine count_lines

! repeat the data reading n times from a unit
subroutine read_real_vector(unit,n,data)
   integer,intent(in) :: unit,n
   real,intent(out) :: data(:)
   real :: x
   integer :: i,info
   rewind(unit)
   do i=1,n
      read(unit,*,iostat=info) x
      if(info/=0) exit
      data(i) = x
   end do
   rewind(unit)
end subroutine read_real_vector

! computation of variance with a naive formula
function variance(data) result(var)
   real,intent(in) :: data(:)
   real :: mean,var
   integer :: n
   n = size(data)
   mean = sum(data)/n
   var = sum((data-mean)**2)/(n-1)
end function variance

end program compute_sd

This program is better structured compared with the original program. You see that the main program needs fewer variables than the original one. Also, you do not need further comments in the main program because the names of the procedures clearly describe what is going on.

Structured with custom module

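The program is further structured using a custom module. All the procedures can be moved from the main program into the module, which makes them reusable by any program. With common compilers, the module has to be compiled before the program that uses it, so if both are kept in the same file, the module should be placed first.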

program compute_sd
   use mymod
   implicit none
   character(len=50) :: filename
   real,allocatable :: data(:)
   real :: sd
   integer :: unit,n

   read *,filename
   open(newunit=unit,file=filename)

   call count_lines(unit,n)
   allocate(data(n))

   call read_real_vector(unit,n,data)
   sd = sqrt(variance(data))
   print "(a,es12.4)","standard deviation = ",sd

   close(unit)
   deallocate(data)
end program compute_sd

module mymod

implicit none

contains

! pre-read the file to determine n as the number of lines in the file
subroutine count_lines(unit,n)
   integer,intent(in) :: unit
   integer,intent(out) :: n
   real :: x
   integer :: info
   n = 0
   rewind(unit)
   do
      read(unit,*,iostat=info) x
      if(info/=0) exit
      n = n + 1
   end do
   rewind(unit)
end subroutine count_lines

! repeat the data reading n times from a unit
subroutine read_real_vector(unit,n,data)
   integer,intent(in) :: unit,n
   real,intent(out) :: data(:)
   real :: x
   integer :: i,info
   rewind(unit)
   do i=1,n
      read(unit,*,iostat=info) x
      if(info/=0) exit
      data(i) = x
   end do
   rewind(unit)
end subroutine read_real_vector

! computation of variance with a naive formula
function variance(data) result(var)
   real,intent(in) :: data(:)
   real :: mean,var
   integer :: n
   n = size(data)
   mean = sum(data)/n
   var = sum((data-mean)**2)/(n-1)
end function variance

end module mymod

Improvement of reusability

Clear interface for the future use

In the above code, some procedures have rewind to reduce the risk of failure. This improves their reusability: they are safer and easier to understand for programmers. You may notice that there are alternative sets of arguments for count_lines and read_real_vector that could be more useful in the future.

For count_lines, it can receive filename instead of unit so that the programmer does not have to open the file before calling it. In this style, count_lines can be a function because there is no lasting side effect if the file is opened and closed inside it. This interface is also easier for programmers to remember.
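
As a minimal sketch of this alternative interface, the function below takes the file name, opens the file by itself, and closes it before returning. The module name myfile, the function name count_lines_in_file, and the convention of returning -1 when the file cannot be opened are assumptions made for this illustration, not part of the program in this chapter.

```fortran
module myfile
   implicit none
contains

! count the lines of a text file given by name; returns -1 if it cannot be opened
function count_lines_in_file(filename) result(n)
   character(len=*),intent(in) :: filename
   integer :: n
   real :: x
   integer :: unit,info

   n = -1
   open(newunit=unit,file=filename,status="old",action="read",iostat=info)
   if(info/=0) return

   n = 0
   do
      read(unit,*,iostat=info) x
      if(info/=0) exit
      n = n + 1
   end do
   close(unit)   ! the file is closed again, so the caller sees no open unit
end function count_lines_in_file

end module myfile
```

With this interface, the caller writes n = count_lines_in_file(filename) and never handles a unit number for the counting step.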

Also, for read_real_vector, filename can be an argument, but n is still needed, meaning count_lines should be called before read_real_vector. A possible, more straightforward interface takes filename only, and data is allocated inside the program unit. It can be a function because there is no side effect anymore if the file is closed inside the function. Although this is possible in modern Fortran, we do not use this feature here because it raises more questions about the best design of procedures for future use.

A lesson in this section is that there are many choices about arguments and return values (i.e., interface to procedures), and the best interface may not be easily found.

Error handling

The programs shown above do not handle any errors. If an error occurs, the program stops immediately with some messages prepared by the compiler. Sometimes the error message is cryptic. The programmer often needs to handle the error to show kinder messages or to provide alternative ways to proceed with the program.

Main program

In the main program, a typical error comes from the open statement. When the file does not exist, open fails. The open statement can have an option, status="old", which requires that the file exists. If it does not, the statement returns a non-zero value to the integer variable specified in iostat=. A safer program is as follows.

   open(newunit=unit,file=filename,status="old",iostat=info)
   if(info/=0) stop "The file is not found."
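
If you want a more specific message, the open statement also accepts iomsg=, which returns a description of the failure prepared by the compiler. The following sketch assumes a file name that does not exist, and the variable message is introduced only for this example; the text of the message depends on the compiler.

```fortran
program open_demo
   implicit none
   character(len=50) :: filename
   character(len=200) :: message
   integer :: unit,info

   filename = "no_such_file_for_this_demo.txt"   ! assumed not to exist
   open(newunit=unit,file=filename,status="old",iostat=info,iomsg=message)
   if(info/=0) then
      ! message holds a compiler-dependent explanation of the failure
      print "(a)","Cannot open "//trim(filename)//": "//trim(message)
   else
      print "(a)","The file was opened."
      close(unit)
   end if
end program open_demo
```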

Another risk is an empty file. If the file is empty, n equals 0. In this case, the variance function cannot return a meaningful value because the variance is undefined for an empty data set (and n-1 in the denominator is zero when there is only one value). A good practice is to check whether n is positive.

   call count_lines(unit,n)
   if(n<=0) stop "The file is empty or invalid."

The above program tests if n<=0. So why not n==0? It is an idiom in programming: the valid case is a positive value, i.e., n>0, and its negation is n<=0, even if n does not seem to become negative. Note that n can actually become negative if it overflows, e.g., with more than 2147483647 lines in the file for a default 32-bit integer.

You may see that perfect error handling is very hard for general programs. My suggestion is that a programmer should be responsible for handling a minimal set of frequently observed errors. Do not put too much effort into trying to remove rare errors; instead, put a comment in the code about the potential risk.

Other program units

For reusable procedures in a module, error handling is essential because these procedures are used in the future, and the future programmers may not know the details of procedures. They would easily hit an error but miss it just because they are unaware of the inside.

If perfect error handling is hard, the module developer should prepare documentation of the reusable procedures. It reduces the risk of missing errors.

There are several ways to handle an error. The simplest way is to stop the program inside the procedure. It is useful if the error is unrecoverable and is caused by a mistake of the programmer, such as wrong usage of the procedure. For example, count_lines can stop if the file is not opened. This subroutine works only when the file is opened, and an unopened unit indicates that the procedures were called in the wrong order.

Another option is to return an error status like open(iostat=) and read(iostat=). It is useful when the procedure is supposed to have both a normal and an abnormal status. The error code reports the abnormal status so that the calling program can take action to handle it. If the procedure returns an error code, it should usually be a subroutine.
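
As a sketch of the second option, count_lines can return a status code through an extra argument. The module name mymod_stat, the argument stat, and the meaning of its values (0 for success, 1 for an unopened unit) are assumptions chosen for this illustration.

```fortran
module mymod_stat
   implicit none
contains

! count the lines of an opened unit; stat is 0 on success, non-zero otherwise
subroutine count_lines(unit,n,stat)
   integer,intent(in) :: unit
   integer,intent(out) :: n,stat
   real :: x
   integer :: info
   logical :: opened

   n = 0
   stat = 0
   inquire(unit=unit,opened=opened)
   if(.not.opened) then
      stat = 1          ! error code chosen for this sketch: unit is not opened
      return
   end if

   rewind(unit)
   do
      read(unit,*,iostat=info) x
      if(info/=0) exit
      n = n + 1
   end do
   rewind(unit)
end subroutine count_lines

end module mymod_stat
```

The caller checks stat after the call, in the same way as info is checked after open, and decides whether to stop or to try something else.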
