1.4.1 Birth Weight Data Set
The Birth Weight data set consists of data collected on 189 women to identify the risk factors associated with the birth of a low birth weight baby. The data set was collected at the Baystate Medical Center in Springfield, Massachusetts. The variables included in this data set are summarized in Table 1.1.
Table 1.1 A Description of the Variables in the Birth Weight Data Set
Variable | Description | Codes/Values | Name
1 | Identification code | ID number | ID
2 | Low birth weight | 1 = BWT ≤ 2500 g; 0 = BWT > 2500 g | LOW
3 | Age of mother | Years | AGE
4 | Weight of mother at last menstrual period | Pounds | LWT
5 | Race | 1 = White; 2 = Black; 3 = Other | RACE
6 | Smoking status during pregnancy | 0 = No; 1 = Yes | SMOKE
7 | History of premature labor | 0 = None; 1 = One; 2 = Two, etc. | PTL
8 | History of hypertension | 0 = No; 1 = Yes | HT
9 | Presence of uterine irritability | 0 = No; 1 = Yes | UI
10 | Number of physician visits during the first trimester | 0 = None; 1 = One; 2 = Two, etc. | FTV
11 | Birth weight | Grams | BWT
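The LOW indicator in Table 1.1 is a dichotomized version of BWT. The following is a minimal sketch of that coding rule, assuming birth weights are available as plain numbers in grams. The function name `low_birth_weight` and the sample weights are hypothetical and are not records from the data set.

```python
def low_birth_weight(bwt_grams):
    """Return 1 if birth weight is at most 2500 g, else 0 (the LOW coding in Table 1.1)."""
    return 1 if bwt_grams <= 2500 else 0


# Hypothetical birth weights in grams, chosen to straddle the 2500 g cutoff.
sample_bwt = [2400, 2500, 2501, 3200]
sample_low = [low_birth_weight(w) for w in sample_bwt]
print(sample_low)
```

Note that a baby weighing exactly 2500 g is coded LOW = 1, because the cutoff in Table 1.1 is inclusive.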
1.4.2 Body Fat Data Set
The Body Fat data set consists of data collected on 252 adult males. The data were originally collected to build a model relating body density and percentage of body fat in adult males to several body measurement variables. These data were originally used in the article “Generalized body composition prediction equation for men using simple measurement techniques,” published in Medicine and Science in Sports and Exercise (Penrose et al., 1985). The variables included in this data set are summarized in Table 1.2. Two additional data sets have been created from the Body Fat data set by random sampling: a training set of 189 observations called bodyfat-tr.xlsx and a validation set of 63 observations called bodyfat-val.xlsx. Both have the same variables as the Body Fat data set and are used in the model validation sections of the text.
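The random split of the 252 observations into 189 training and 63 validation observations can be sketched as below. This is only an illustration of one common way to make such a split and not the procedure the authors actually used. The function name `split_train_validation` and the seed are assumptions, and the code works on row indices rather than on the real files.

```python
import random


def split_train_validation(n_total=252, n_train=189, seed=0):
    """Randomly partition row indices 0..n_total-1 into training and validation sets."""
    rng = random.Random(seed)  # arbitrary seed, used only to make the sketch reproducible
    indices = list(range(n_total))
    rng.shuffle(indices)
    train = sorted(indices[:n_train])
    validation = sorted(indices[n_train:])
    return train, validation


train_idx, val_idx = split_train_validation()
print(len(train_idx), len(val_idx))  # 189 63
```

Because the two index lists partition the full set, no observation appears in both the training and the validation set.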
Table 1.2 A Description of the Variables in the Body Fat Data Set
Variable | Description | Codes/Values | Name
1 | Density determined from underwater weighing | Density | Density
2 | Percent body fat from Siri’s (1956) equation | Percentage | PCTBF
3 | Age in years | Years | Age
4 | Weight in pounds | Pounds | Weight
5 | Height in inches | Inches | Height
6 | Neck circumference | Centimeters | Neck
7 | Chest circumference | Centimeters | Chest
8 | Abdomen circumference | Centimeters | Abdomen
9 | Hip circumference | Centimeters | Hip
10 | Thigh circumference | Centimeters | Thigh
11 | Knee circumference | Centimeters | Knee
12 | Ankle circumference | Centimeters | Ankle
13 | Biceps extended circumference | Centimeters | Biceps
14 | Forearm circumference | Centimeters | Forearm
15 | Wrist circumference | Centimeters | Wrist
16 | Body Mass Index | BMI | BMI
17 | Overweight indicator | 0 = Not Overweight; 1 = Yes | Overweight
18 | Obese indicator | 0 = Not Obese; 1 = Yes | Obese
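Several variables in Table 1.2 are derived from others. The sketch below shows how such derived values are commonly computed: Siri's equation for percent body fat from density (in g/cm³), and the usual BMI formula for weight in pounds and height in inches. The text does not state the cutoffs behind the Overweight and Obese indicators, so the thresholds of 25 and 30 used here are an assumption based on conventional BMI categories, and all function names are hypothetical.

```python
def siri_percent_body_fat(density):
    """Siri's equation: percent body fat from body density in g/cm^3."""
    return 495.0 / density - 450.0


def bmi_from_pounds_inches(weight_lb, height_in):
    """Body mass index (kg/m^2) from weight in pounds and height in inches."""
    return 703.0 * weight_lb / height_in ** 2


def overweight_indicator(bmi, cutoff=25.0):
    """1 if BMI is at or above the cutoff, else 0 (cutoff is an assumed convention)."""
    return 1 if bmi >= cutoff else 0


def obese_indicator(bmi, cutoff=30.0):
    """1 if BMI is at or above the cutoff, else 0 (cutoff is an assumed convention)."""
    return 1 if bmi >= cutoff else 0


# Hypothetical subject, not a record from the data set.
bmi = bmi_from_pounds_inches(200.0, 70.0)
print(round(bmi, 1), overweight_indicator(bmi), obese_indicator(bmi))
```

The data set's own BMI, Overweight, and Obese columns should be treated as authoritative if they disagree with values computed this way.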
1.4.3 Coronary Heart Disease Data Set
The Coronary Heart Disease data set consists of 100 observations on patients selected for a study of the relationship between age and the presence of coronary heart disease. The variables included in this data set are summarized in Table 1.3.
Table 1.3 A Description of the Variables in the Coronary Heart Disease Data Set
Variable | Description | Codes/Values | Name
1 | Identification code | ID number | ID
2 | Age in years | Years | Age
3 | Coronary heart disease | 0 = Absent; 1 = Present | CHD
1.4.4 Prostate Cancer Study Data Set
The Prostate Cancer Study data set consists of data on 380 patients in a study to determine whether variables measured at a baseline medical examination can be used to predict whether a prostatic tumor has penetrated the prostatic capsule. The data were collected by Dr. Donn Young at the Ohio State University Comprehensive Cancer Center and have been modified to protect subject confidentiality. The variables included in this data set are summarized in Table 1.4.
Table 1.4 A Description of the Variables in the Prostate Cancer Study Data Set
Variable | Description | Codes/Values | Name
1 | Identification code | ID number | ID
2 | Tumor penetration of prostatic capsule | 0 = No penetration; 1 = Penetration | CAPSULE
3 | Age | Years | AGE
4 | Race | 1 = White; 2 = Black | RACE
5 | Results of the digital rectal exam | 1 = No nodule; 2 = Unilobar nodule (left); 3 = Unilobar nodule (right); 4 = Bilobar nodule | DPROS
6 | Detection of capsular involvement in rectal exam | 1 = No; 2 = Yes | DCAPS
7 | Prostatic-specific antigen value | mg/ml | PSA
8 | Tumor volume obtained from ultrasound | cm³ | VOL
9 | Total Gleason score | 0–10 | GLEASON
1.4.5 Intensive Care Unit Data Set
The Intensive Care Unit data set consists of 200 observations on subjects involved in a study on the survival of patients following admission to an adult intensive care unit (ICU). The data set was collected at the Baystate Medical Center in Springfield, Massachusetts, and the variables included in this data set are summarized in Table 1.5.