
The ROUSSEEUW datasets
This directory contains the datasets taken from
Robust Regression and Outlier Detection
by Peter J. Rousseeuw and Annick M. Leroy
About the datasets
The datasets are plain, unformatted text files. Their names have the extension .dat.
To get all datasets at once, copy the file all.dat.
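Since the files are unformatted, a small reader is usually all that is needed. The following is a minimal sketch, assuming each row holds one observation with whitespace-separated fields and no header line; that layout is an assumption and is not confirmed by this page. The function name read_dat and the sample values are hypothetical and only illustrate the parsing.

```python
from io import StringIO


def read_dat(source):
    """Parse whitespace-separated rows from an iterable of text lines.

    Assumption: one observation per line, no header. Numeric fields
    become floats; anything else (e.g. a species or state name) stays text.
    """
    rows = []
    for line in source:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = []
        for field in fields:
            try:
                row.append(float(field))
            except ValueError:
                row.append(field)  # keep non-numeric fields as strings
        rows.append(row)
    return rows


# Made-up placeholder values, NOT taken from any of the datasets.
sample = StringIO("1 10.0 20.5\n2 11.5 23.0\n\n3 12.0 24.5\n")
rows = read_dat(sample)
```

To read a real file, pass an open file object for one of the .dat files instead of the StringIO sample. If a file turns out to use a different layout (a header line, or names containing spaces), the reader would need adjusting.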
The data
page 22, table 1
- Pilot-Plant Data Set from Daniel and Wood (1971),
The response variable corresponds to the acid content determined by
titration, and the explanatory variable is the organic acid content
determined by extraction and weighing.
20 subjects, 3 variables:
- Observation (i)
- Extraction (x[i])
- Titration (y[i])
page 26, table 2
- Number of International Calls from Belgium, taken from the Belgian
Statistical Survey, published by the Ministry of Economy,
24 subjects, 2 variables:
- Year (x[i])
- Number of Calls (y[i], in tens of millions)
page 27, table 3
- Data for the Hertzsprung-Russell Diagram of the Star Cluster CYG OB1,
from C. Doom
47 subjects, 3 variables:
- Index of Star (i)
- Logarithm of the effective temperature at the surface of the star (x[i])
- Logarithm of the light intensity of the star (y[i])
page 47, table 4
- First Word - Gesell Adaptive Score Data (from Mickey et al., 1967),
21 subjects, 3 variables:
- Child (i)
- Age in Months (x[i])
- Gesell Score (y[i])
page 57, table 7
- Body and Brain Weight for 28 Animals, from Weisberg (1980) and
Jerison (1973),
28 subjects, 4 variables:
- Index (i)
- Species
- Body Weight (x[i], in kilograms)
- Brain Weight (y[i], in grams)
page 62, table 10
- Data on the Calibration of an Instrument that Measures Lactic Acid
Concentration in Blood, from Afifi and Azen (1979),
20 subjects, 3 variables:
- Index (i)
- True Concentration (x[i])
- Instrument (y[i])
page 73, table 13
- Pension Funds for 18 Professional Branches, from de Wit (1982)
The table lists the total 1981 premium income of pension funds of Dutch
firms, for 18 professional branches. The other column gives the
corresponding premium reserves.
18 subjects, 3 variables:
- Index
- Premium Income (in millions of guilders)
- Premium Reserves (in millions of guilders)
page 76, table 1
- Stackloss data, from Brownlee (1965)
The data describe the operation of a plant for the oxidation of
ammonia to nitric acid.
21 subjects, 5 variables:
- Index (i)
- Rate (x[1])
- Temperature (x[2])
- Acid Concentration (x[3])
- Stackloss (y)
page 79, table 2
- Coleman Data Set, Containing Information on 20 Schools from the
Mid-Atlantic and New England States, from Mosteller and Tukey (1977)
20 subjects, 7 variables:
- Index
- staff salaries per pupil (x[1])
- percent of white-collar fathers (x[2])
- socioeconomic status composite deviation: means for family size, family
intactness, father's education, mother's education, and home items
(x[3])
- mean teacher's verbal test score (x[4])
- mean mother's educational level (x[5]), one unit is equal to two
school years
- verbal mean test score (y, all sixth graders)
page 82, table 5
- Salinity Data, from Ruppert and Carroll (1980)
This is a set of measurements of water salinity (i.e., its salt
concentration) and river discharge taken in North Carolina's
Pamlico Sound.
28 subjects, 5 variables:
- Index (i)
- Lagged Salinity (x[1])
- Trend (x[2])
- Discharge (x[3])
- Salinity (y)
page 86, table 6
- Air Quality Data Set for May 1973, from Chambers et al. (1983)
31 subjects, 5 variables:
- Index (i)
- Solar Radiation (x[1])
- Windspeed (x[2], in miles per hour)
- Temperature (x[3], in degrees Fahrenheit)
- Ozone (y, in parts per billion)
page 94, table 9
- Artificial Data Set Generated by Hawkins, Bradu, and Kass (1984)
75 subjects, 5 variables:
- Index
- x[1]
- x[2]
- x[3]
- y
page 96, table 10
- Cloud point of a Liquid, from Draper and Smith (1969)
The cloud point is a measure of the degree of crystallization in a
stock.
19 subjects, 3 variables:
- Index (i)
- Percentage of I-8 (x)
- Cloud point (y)
page 103, table 13
- Heart Catheterization Data, from Weisberg (1980)
A catheter is passed into a major vein or artery at the femoral
region and moved into the heart. The physician has to guess the proper
length of the introduced catheter. The aim of the data is to describe
the relation between the catheter length and the patient's height
and weight.
12 subjects, 4 variables:
- Index (i)
- Height (x[1], in inches)
- Weight (x[2], in pounds)
- Catheter Length (y, in centimeters)
page 110, table 16
- Education Expenditure Data, from Chatterjee and Price (1977)
50 subjects, 7 variables:
- Index
- State
- Region (1=Northeastern, 2=North central, 3=Southern, 4=Western)
- Number of residents per thousand residing in urban areas in 1970 (x[1])
- Per capita personal income in 1973 (x[2])
- Number of residents per thousand under 18 years of age in 1974 (x[3])
- Per capita expenditure on public education in a state, projected for 1975 (y)
page 154, table 22
- Aircraft Data, from the Office of Naval Research
The data cover 23 single-engine aircraft built over the years 1947-1979.
23 subjects, 6 variables:
- Index
- Aspect Ratio
- Lift-to-Drag Ratio
- Weight
- Thrust
- Cost
page 155, table 23
- Delivery Time Data, from Montgomery and Peck (1982)
The aim is to explain the time required to service a vending machine
by means of the number of products stocked and the distance walked by
the route driver.
25 subjects, 4 variables:
- Index (i)
- Number of Products (x[1])
- Distance (x[2])
- Delivery time (y)
page 156, table 24
- Phosphorus Content Data, from Prescott (1975)
The data investigate the effect of inorganic and organic phosphorus in
the soil on the phosphorus content of the corn grown in that soil.
18 subjects, 4 variables:
- Index (i)
- Inorganic Phosphorus (x[1])
- Organic Phosphorus (x[2])
- Plant Phosphorus (y)