Data preparation with RENUMF90

Yutaka Masuda

September 2019


Advanced usage of RENUMF90

In this section, we introduce some advanced features of RENUMF90. Throughout the section, we use the following raw data file and pedigree file. The files come from the multiple-trait model shown in the previous chapter.

  ID006  A  1  1.0  3.0  4.5
  ID009  A  2  1.0  2.0  7.5
  ID012  A  1  2.0  4.0  3.5
  ID007  B  2  2.0  6.0 -0.5
  ID010  B  1  1.0  3.0  5.5
  ID013  B  2  2.0  6.0  1.5
  ID008  C  1  2.0  6.0 -1.5
  ID011  C  2  1.0  6.0  2.5
  ID014  C  1  1.0  8.0  0.5
  ID015  C  2  2.0  4.0  4.5

The pedigree file is as follows.

 ID001      0      0
 ID002      0      0
 ID003      0      0
 ID004      0      0
 ID005      0      0
 ID006      0      0
 ID007  ID002  ID005
 ID008  ID001  ID004
 ID009  ID002  ID003
 ID010  ID007  ID006
 ID011  ID007  ID004
 ID012  ID011  ID008
 ID013  ID011  ID010
 ID014  ID009  ID013
 ID015  ID011  ID010

Combined effects

RENUMF90 can create a new field as a combination of existing fields, which is useful for fitting an interaction effect. The COMBINE keyword provides this feature. The following example adds the keyword to the parameter file from the previous section.

COMBINE
6 2 3
DATAFILE
rawdata9.txt
TRAITS
5
FIELDS_PASSED TO OUTPUT

WEIGHT(S)

RESIDUAL_VARIANCE
1.0
EFFECT          # 1st effect
2 cross alpha
EFFECT          # 2nd effect
3 cross alpha
EFFECT          # 3rd effect
4 cov
EFFECT          # interaction effect 1 x 2
6 cross alpha

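The COMBINE statement should be used as follows. It is placed at the very top of the instruction file, before DATAFILE. The next line lists the position of the field that holds the combined value (here, 6), followed by the positions of the fields to be combined (here, 2 and 3). The combined field can then be used in an EFFECT statement like any other field (here, 6 cross alpha).

Run RENUMF90 and check renf90.tables. Columns 2 and 3 have been combined into one effect (effect group 4), whose levels are the concatenated values such as A1 and B2.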

 Effect group 1 of column 1 with 3 levels , effect # 1
 Value    #    consecutive number
A 3 1
B 3 2
C 4 3
 Effect group 2 of column 1 with 2 levels , effect # 2
 Value    #    consecutive number
1 5 1
2 5 2
 Effect group 4 of column 1 with 6 levels , effect # 4
 Value    #    consecutive number
A1 2 1
A2 1 2
B1 1 3
B2 2 4
C1 2 5
C2 2 6
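To see what COMBINE does to the data, the following minimal Python sketch (our illustration, not part of RENUMF90) concatenates columns 2 and 3 of the raw data and counts each level. We assume that the combined value is a plain string concatenation (as the levels A1, B2, ... in the table suggest) and that levels are numbered in sorted order, which reproduces the table above. The helper names combine_fields and level_table are hypothetical.

```python
from collections import Counter

# Raw data from this section (column 1 = animal ID, columns 2-6 = other fields).
RAW_DATA = """\
ID006  A  1  1.0  3.0  4.5
ID009  A  2  1.0  2.0  7.5
ID012  A  1  2.0  4.0  3.5
ID007  B  2  2.0  6.0 -0.5
ID010  B  1  1.0  3.0  5.5
ID013  B  2  2.0  6.0  1.5
ID008  C  1  2.0  6.0 -1.5
ID011  C  2  1.0  6.0  2.5
ID014  C  1  1.0  8.0  0.5
ID015  C  2  2.0  4.0  4.5
"""


def combine_fields(rows, positions):
    """Concatenate the fields at the given 1-based positions, e.g. 'A' + '1' -> 'A1'."""
    return ["".join(row[p - 1] for p in positions) for row in rows]


def level_table(values):
    """Return (value, count, consecutive number), numbering levels in sorted order (assumed)."""
    counts = Counter(values)
    return [(v, counts[v], i) for i, v in enumerate(sorted(counts), start=1)]


rows = [line.split() for line in RAW_DATA.splitlines() if line.strip()]
combined = combine_fields(rows, [2, 3])  # mimics "COMBINE / 6 2 3"
for value, count, code in level_table(combined):
    print(value, count, code)
```

The printed lines match effect group 4 in renf90.tables. Note that sorting strings puts "10" before "9", so the numbering of RENUMF90 may differ from this sketch for other data.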

Pedigree manipulation

RENUMF90 prunes the pedigree. By default, the program traces back 3 generations (up to great-grandparents) from the animals with phenotype or genotype. The number of generations can be changed with the PED_DEPTH option, which should be placed just after SNP_FILE (or after FILE_POS if no genetic markers are used). The following statements trace back 10 generations.

PED_DEPTH
10

If you set a large number (e.g., 100), effectively all ancestors of the current animals are traced. If you set 0, RENUMF90 includes all animals found in the raw pedigree file, even those not related to the animals with phenotype or genotype.
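The idea behind PED_DEPTH can be sketched in a few lines of Python. This is our simplified illustration, not the code of RENUMF90: starting from the animals with records, it adds parents one generation at a time, and depth 0 keeps the whole pedigree. The function names read_pedigree and prune are hypothetical.

```python
# Raw pedigree from this section: animal, sire, dam ("0" = unknown parent).
RAW_PEDIGREE = """\
ID001 0 0
ID002 0 0
ID003 0 0
ID004 0 0
ID005 0 0
ID006 0 0
ID007 ID002 ID005
ID008 ID001 ID004
ID009 ID002 ID003
ID010 ID007 ID006
ID011 ID007 ID004
ID012 ID011 ID008
ID013 ID011 ID010
ID014 ID009 ID013
ID015 ID011 ID010
"""


def read_pedigree(text):
    """Map each animal to its (sire, dam) pair."""
    ped = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            ped[fields[0]] = (fields[1], fields[2])
    return ped


def prune(ped, start, depth):
    """Keep the start animals and their ancestors up to `depth` generations back.

    depth == 0 keeps every animal in the pedigree, mimicking PED_DEPTH 0.
    """
    if depth == 0:
        return set(ped)
    keep = set(start)
    current = set(start)
    for _ in range(depth):
        parents = {p for a in current for p in ped.get(a, ("0", "0")) if p != "0"}
        current = parents - keep  # animals kept earlier have already been expanded
        keep |= current
    return keep


ped = read_pedigree(RAW_PEDIGREE)
phenotyped = {f"ID{i:03d}" for i in range(6, 16)}  # ID006 to ID015 have records
print(len(prune(ped, phenotyped, 3)))   # all 15 animals are kept
print(sorted(prune(ped, {"ID014"}, 2)))  # ID014, its parents, and grandparents
```

In this small pedigree, the default depth of 3 already keeps every animal. Starting only from ID014, however, ID005 (a great-great-grandparent through ID013, ID011, and ID007) is dropped with depth 3 and kept with depth 4.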

Animal model options

RENUMF90 supports the permanent environmental (PE) effect, the maternal genetic (MG) effect, and the maternal permanent environmental (MPE) effect as additional random effects in the animal model, using the OPTIONAL keyword. The following parameter file uses all three options (PE, MG, and MPE).

DATAFILE
rawdata9.txt
TRAITS
5
FIELDS_PASSED TO OUTPUT

WEIGHT(S)

RESIDUAL_VARIANCE
1.0
EFFECT          # 1st effect
2 cross alpha
EFFECT          # 2nd effect
3 cross alpha
EFFECT          # 3rd effect
4 cov
EFFECT          # 4th effect = animal
1 cross alpha
RANDOM
animal
OPTIONAL
pe mat mpe
FILE
rawpedigree9.txt
(CO)VARIANCES
1.0 0.2
0.2 0.3
(CO)VARIANCES_PE
1.5
(CO)VARIANCES_MPE
0.5

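You can read this file as follows. OPTIONAL with pe mat mpe adds the PE, MG, and MPE effects to the animal effect declared by RANDOM. Because the MG effect is correlated with the direct genetic effect, (CO)VARIANCES is a 2 x 2 matrix holding the direct genetic variance (1.0), the direct-maternal genetic covariance (0.2), and the maternal genetic variance (0.3). (CO)VARIANCES_PE and (CO)VARIANCES_MPE give the variances of the PE effect (1.5) and the MPE effect (0.5).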
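As a quick sanity check on such starting values, the Python sketch below computes the direct-maternal genetic correlation implied by the (CO)VARIANCES block and confirms that the 2 x 2 matrix is positive definite. We assume that the first row and column belong to the direct effect and the second to the maternal effect; the helper names are hypothetical.

```python
import math

# (CO)VARIANCES block from the parameter file above:
# row/column 1 = direct genetic effect, row/column 2 = maternal genetic effect (assumed).
G = [[1.0, 0.2],
     [0.2, 0.3]]


def genetic_correlation(g):
    """Correlation between the two effects: covariance / sqrt(variance1 * variance2)."""
    return g[0][1] / math.sqrt(g[0][0] * g[1][1])


def is_positive_definite_2x2(g):
    """A symmetric 2 x 2 matrix is positive definite iff g11 > 0 and its determinant > 0."""
    return g[0][0] > 0 and g[0][0] * g[1][1] - g[0][1] * g[1][0] > 0


print(round(genetic_correlation(G), 3))  # 0.365
print(is_positive_definite_2x2(G))       # True
```

A matrix that is not positive definite (for example, a covariance larger than the square root of the product of the variances) is not a valid covariance matrix, so this check is worth doing before running the analysis.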

Definition of unknown parent groups (UPGs)

RENUMF90 can assign UPGs to unknown parents. There are two ways to do this; for simplicity, we introduce only one of them here. The following keyword generates UPGs.

UPG_TYPE
in_pedigrees

This option should be placed after PED_DEPTH. With this option, RENUMF90 interprets a negative sire (or dam) ID as a UPG. You should therefore prepare separate sire and dam columns that contain negative integers, instead of 0, for unknown parents. All real animals must still appear in the pedigree file with their standard IDs (not negative integers).

The following is a simple example with UPGs. The 4th column holds sires, with groups -1 and -2 for unknown sires, and the 5th column holds dams, with groups -3 and -4 for unknown dams. With the FILE_POS keyword, you can specify column 4 (instead of 2) as the sire and column 5 (instead of 3) as the dam.

 ID001      0      0     -1     -4
 ID002      0      0     -2     -3
 ID003      0      0     -1     -3
 ID004      0      0     -2     -3
 ID005      0      0     -2     -4
 ID006      0      0     -1     -3
 ID007  ID002  ID005  ID002  ID005
 ID008  ID001  ID004  ID001  ID004
 ID009  ID002  ID003  ID002  ID003
 ID010  ID007  ID006  ID007  ID006
 ID011  ID007  ID004  ID007  ID004
 ID012  ID011  ID008  ID011  ID008
 ID013  ID011  ID010  ID011  ID010
 ID014  ID009  ID013  ID009  ID013
 ID015  ID011  ID010  ID011  ID010

The instruction file can be as follows.

DATAFILE
rawdata9.txt
TRAITS
5
FIELDS_PASSED TO OUTPUT

WEIGHT(S)

RESIDUAL_VARIANCE
1.0
EFFECT          # 1st effect
2 cross alpha
EFFECT          # 2nd effect
3 cross alpha
EFFECT          # 3rd effect
4 cov
EFFECT          # 4th effect = animal
1 cross alpha
RANDOM
animal
FILE
rawpedigree9c.txt
FILE_POS        #  positions of animal, sire, dam, 0, 0
1 4 5 0 0
UPG_TYPE
in_pedigrees
(CO)VARIANCES
1.0

The resulting pedigree file is as follows. Although there are only 15 real animals, the sire and dam columns contain codes greater than 15 (16 to 19). These codes represent UPGs: in this output, group -1 became 16, -2 became 17, -3 became 18, and -4 became 19.

1 7 5 1 0 2 1 0 0 ID015
13 17 18 3 0 0 0 0 2 ID004
11 17 19 3 0 0 0 0 1 ID005
2 16 18 3 0 0 1 0 1 ID006
3 12 11 1 0 2 1 2 0 ID007
4 14 13 1 0 2 1 0 1 ID008
5 3 2 1 0 2 1 0 2 ID010
6 12 15 1 0 2 1 1 0 ID009
7 3 13 1 0 2 1 3 0 ID011
8 7 4 1 0 2 1 0 0 ID012
14 16 19 3 0 0 0 1 0 ID001
9 7 5 1 0 2 1 0 1 ID013
12 17 18 3 0 0 0 2 0 ID002
10 6 9 1 0 2 1 0 0 ID014
15 16 18 3 0 0 0 0 1 ID003
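The UPG codes in this output can be reproduced with a short Python sketch. This is our illustration under an assumption: each group gets a code after the last real animal, in order of the absolute value of the group number. The sketch reproduces this example, but we have not confirmed that RENUMF90 always numbers UPGs this way. The name upg_codes is hypothetical.

```python
# Founder rows of the UPG pedigree above (columns 1, 4 and 5):
# negative integers are unknown parent groups (UPGs).
FOUNDERS = [
    ("ID001", "-1", "-4"),
    ("ID002", "-2", "-3"),
    ("ID003", "-1", "-3"),
    ("ID004", "-2", "-3"),
    ("ID005", "-2", "-4"),
    ("ID006", "-1", "-3"),
]
N_REAL_ANIMALS = 15  # ID001 to ID015


def upg_codes(rows, n_animals):
    """Give each UPG a code after the real animals, in order of |group| (assumed)."""
    groups = {int(p) for _, sire, dam in rows for p in (sire, dam) if p.startswith("-")}
    ordered = sorted(groups, key=abs)
    return {g: n_animals + i for i, g in enumerate(ordered, start=1)}


codes = upg_codes(FOUNDERS, N_REAL_ANIMALS)
print(codes)  # {-1: 16, -2: 17, -3: 18, -4: 19}
for animal, sire, dam in FOUNDERS:
    # For example, ID004 gets 17 and 18, as in the renumbered file above.
    print(animal, codes[int(sire)], codes[int(dam)])
```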

Considering inbreeding coefficients in \(\mathbf{A}^{-1}\)

By default, RENUMF90 creates a pedigree file that yields \(\mathbf{A}^{-1}\) without inbreeding. To account for inbreeding coefficients in \(\mathbf{A}^{-1}\), you have to let RENUMF90 generate a special renumbered pedigree file using the INBREEDING keyword (see note 1 at the end of this page). This keyword should be placed after UPG_TYPE.

The basic usage of this option is shown below.

INBREEDING
pedigree

Put the literal word pedigree here; do not replace it with the name of the pedigree file. With this option, RENUMF90 calculates inbreeding coefficients using the algorithm of Meuwissen and Luo (1992) on the pedigree saved in renaddxx.ped, which may have been pruned by some generations (see PED_DEPTH for details).

If you want to supply precalculated inbreeding coefficients stored in a file, use a different option. Assuming your file is inb.txt, the following option can be used.

INBREEDING
file inb.txt

The file is a space-separated text file with 2 columns: 1) the original animal ID and 2) inbreeding coefficient (ranging from 0 to 1).
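If you want to prepare such a file yourself, the Python sketch below computes inbreeding coefficients for the pedigree of this section and formats them as the two columns described above. Note that it uses the classic tabular method (building the full additive relationship matrix), not the Meuwissen and Luo (1992) algorithm used by RENUMF90; the tabular method is simple but needs memory proportional to the square of the number of animals, so it only suits small pedigrees. The function name tabular_inbreeding is hypothetical.

```python
# Raw pedigree from this section; parents appear before their offspring.
RAW_PEDIGREE = """\
ID001 0 0
ID002 0 0
ID003 0 0
ID004 0 0
ID005 0 0
ID006 0 0
ID007 ID002 ID005
ID008 ID001 ID004
ID009 ID002 ID003
ID010 ID007 ID006
ID011 ID007 ID004
ID012 ID011 ID008
ID013 ID011 ID010
ID014 ID009 ID013
ID015 ID011 ID010
"""


def tabular_inbreeding(order, ped):
    """Inbreeding coefficients by the tabular method.

    `order` must list parents before their offspring; "0" marks an unknown parent.
    """
    index = {a: i for i, a in enumerate(order)}
    n = len(order)
    A = [[0.0] * n for _ in range(n)]  # additive relationship matrix
    for i, animal in enumerate(order):
        s = index.get(ped[animal][0])  # None if the sire is unknown
        d = index.get(ped[animal][1])  # None if the dam is unknown
        for j in range(i):
            # a_ij = (a_j,sire + a_j,dam) / 2
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        # a_ii = 1 + F_i, where F_i = a_sire,dam / 2
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
    return {a: A[i][i] - 1.0 for a, i in index.items()}


order, ped = [], {}
for line in RAW_PEDIGREE.splitlines():
    fields = line.split()
    if fields:
        order.append(fields[0])
        ped[fields[0]] = (fields[1], fields[2])

F = tabular_inbreeding(order, ped)
lines = [f"{animal} {F[animal]:.6f}" for animal in order]  # original ID, coefficient
print("\n".join(lines))
```

Saving the printed lines as inb.txt gives a file in the two-column format expected by INBREEDING with the file option. In this pedigree, ID012, ID013, and ID015 are offspring of half-sibs (F = 0.125), and ID014 has F = 0.0625.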

The following example uses the pedigree option.

DATAFILE
rawdata9.txt
TRAITS
5
FIELDS_PASSED TO OUTPUT

WEIGHT(S)

RESIDUAL_VARIANCE
1.0
EFFECT          # 1st effect
2 cross alpha
EFFECT          # 2nd effect
3 cross alpha
EFFECT          # 3rd effect
4 cov
EFFECT          # 4th effect = animal
1 cross alpha
RANDOM
animal
FILE
rawpedigree9.txt
INBREEDING
pedigree
(CO)VARIANCES
1.0

The resulting pedigree file renadd04.ped has the inb/upg code in the 4th column.

1 7 5 2000 0 2 1 0 0 ID015
13 0 0 1000 0 0 0 0 2 ID004
11 0 0 1000 0 0 0 0 1 ID005
2 0 0 1000 0 0 1 0 1 ID006
3 12 11 2000 0 2 1 2 0 ID007
4 14 13 2000 0 2 1 0 1 ID008
5 3 2 2000 0 2 1 0 2 ID010
6 12 15 2000 0 2 1 1 0 ID009
7 3 13 2000 0 2 1 3 0 ID011
8 7 4 2000 0 2 1 0 0 ID012
14 0 0 1000 0 0 0 1 0 ID001
9 7 5 2000 0 2 1 0 1 ID013
12 0 0 1000 0 0 0 2 0 ID002
10 6 9 2133 0 2 1 0 0 ID014
15 0 0 1000 0 0 0 0 1 ID003

In the 4th column, an animal whose parents are not inbred has the value 1000 if both parents are unknown, 1333 if one parent is unknown, or 2000 if both parents are known. Animal 10 (ID014) has an inbred parent, ID013 (the offspring of the paternal half-sibs ID011 and ID010), so its inb/upg code (2133) differs from those of the other animals.
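The values in the 4th column are consistent with 1000 divided by the Mendelian sampling variance of the animal (relative to the additive genetic variance), which is 1 minus (1 + F)/4 for each known parent with inbreeding coefficient F. The Python sketch below implements this reading. It reproduces every code in the output above, but the formula and the rounding to an integer are our assumptions, not taken from the RENUMF90 source. The function name inb_upg_code is hypothetical.

```python
def inb_upg_code(f_sire=None, f_dam=None):
    """Approximate inb/upg code; pass None for an unknown parent, else its inbreeding coefficient."""
    # Mendelian sampling variance (relative to the additive genetic variance):
    # start from 1 and subtract (1 + F) / 4 for each known parent.
    d = 1.0
    for f in (f_sire, f_dam):
        if f is not None:
            d -= (1.0 + f) / 4.0
    return round(1000.0 / d)


print(inb_upg_code())            # 1000: both parents unknown
print(inb_upg_code(0.0))         # 1333: one non-inbred parent known
print(inb_upg_code(0.0, 0.0))    # 2000: both non-inbred parents known
print(inb_upg_code(0.0, 0.125))  # 2133: ID014 (ID009 with F = 0, ID013 with F = 0.125)
```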

Order of keywords

The keywords in the instruction file must follow the order given in the manual. The following table shows the exact order of the keywords supported by RENUMF90.

Keyword                    Status      Possible values
COMBINE                    optional    definition of a new field as a combination of existing fields
DATAFILE                   mandatory   name of the raw data file
TRAITS                     mandatory   positions of observations in the raw data file
FIELDS_PASSED TO OUTPUT    mandatory   positions of items in the raw data file to be passed to renf90.dat
WEIGHT(S)                  mandatory   positions of weights in the raw data file
RESIDUAL_VARIANCE          mandatory   residual (co)variance matrix
EFFECT                     mandatory   effect description
NESTED                     optional    positions of nested covariates
RANDOM                     optional    declaration of a random effect
OPTIONAL                   optional    additional random effects in the animal model (pe, mat, mpe)
FILE                       optional    name of the raw pedigree file
FILE_POS                   optional    positions of animal ID, sire ID, and dam ID
SNP_FILE                   optional    name of the SNP marker file
PED_DEPTH                  optional    maximum number of generations traced back from animals with phenotype or genotype
GEN_INT                    optional    generation interval used to set unknown parent groups (UPGs)
REC_SEX                    optional    check if records are found in a specific sex
UPG_TYPE                   optional    UPG specification
INBREEDING                 optional    create a pedigree file with inbreeding
RANDOM_REGRESSION          optional    put covariates for random regressions
RR_POSITION                optional    positions of covariates for random regressions
(CO)VARIANCES              optional    covariance components
(CO)VARIANCES_PE           optional    covariance components for the animal's PE effect
(CO)VARIANCES_MPE          optional    covariance components for the maternal PE effect
OPTION                     optional    option parameters
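Because a misplaced keyword is an easy mistake to make, a small checker can help. The Python sketch below is our own illustration, not a RENUMF90 feature: it extracts the keywords from an instruction file and reports pairs that appear out of the order in the table. Treating each EFFECT as the start of a new block (so that the effect-level keywords may repeat) is our assumption, and the names keywords and order_violations are hypothetical.

```python
# Keyword order taken from the table above.
ORDER = [
    "COMBINE", "DATAFILE", "TRAITS", "FIELDS_PASSED", "WEIGHT(S)",
    "RESIDUAL_VARIANCE", "EFFECT", "NESTED", "RANDOM", "OPTIONAL", "FILE",
    "FILE_POS", "SNP_FILE", "PED_DEPTH", "GEN_INT", "REC_SEX", "UPG_TYPE",
    "INBREEDING", "RANDOM_REGRESSION", "RR_POSITION", "(CO)VARIANCES",
    "(CO)VARIANCES_PE", "(CO)VARIANCES_MPE", "OPTION",
]
RANK = {keyword: i for i, keyword in enumerate(ORDER)}


def keywords(text):
    """Return the keywords of an instruction file, ignoring values and # comments."""
    found = []
    for line in text.splitlines():
        tokens = line.split("#")[0].split()
        if tokens and tokens[0] in RANK:
            found.append(tokens[0])
    return found


def order_violations(text):
    """Return (previous, current) keyword pairs that appear out of order.

    Assumption: every EFFECT starts a new block, so it may follow any keyword.
    """
    violations = []
    previous = None
    for keyword in keywords(text):
        if keyword != "EFFECT" and previous is not None and RANK[keyword] < RANK[previous]:
            violations.append((previous, keyword))
        previous = keyword
    return violations


GOOD = """\
DATAFILE
rawdata9.txt
TRAITS
5
FIELDS_PASSED TO OUTPUT

WEIGHT(S)

RESIDUAL_VARIANCE
1.0
EFFECT          # 1st effect
2 cross alpha
EFFECT          # 4th effect = animal
1 cross alpha
RANDOM
animal
OPTIONAL
pe mat mpe
FILE
rawpedigree9.txt
(CO)VARIANCES
1.0 0.2
0.2 0.3
"""

BAD = """\
EFFECT
1 cross alpha
RANDOM
animal
FILE
rawpedigree9.txt
INBREEDING
pedigree
UPG_TYPE
in_pedigrees
"""

print(order_violations(GOOD))  # []
print(order_violations(BAD))   # [('INBREEDING', 'UPG_TYPE')]
```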

  1. Technically, the alternative renumbered pedigree file is the same as the standard one except for the 4th column. In the alternative pedigree, the 4th column holds a special 4-digit integer (the inb/upg code) that combines the inbreeding coefficients of the parents and the number of missing parents. Although you can calculate the code manually, the process may be complicated, so I recommend using RENUMF90 to generate a pedigree file with this code.
