Data portal of the Chair of Statistics and Econometrics
Blood Transfusion data
The data set (and description) can be downloaded here:
http://archive.ics.uci.edu/ml/machine-learning-databases/blood-transfusion/transfusion.data
Description:
Title: Blood Transfusion Service Center Data Set
Abstract: Data taken from the Blood Transfusion Service Center in Hsin-Chu City
in Taiwan -- this is a classification problem.
-----------------------------------------------------
Data Set Characteristics: Multivariate
Number of Instances: 748
Area: Business
Attribute Characteristics: Real
Number of Attributes: 5
Date Donated: 2008-10-03
Associated Tasks: Classification
Missing Values? N/A
-----------------------------------------------------
Source:
Original Owner and Donor
Prof. I-Cheng Yeh
Department of Information Management
Chung-Hua University,
Hsin Chu, Taiwan 30067, R.O.C.
e-mail:icyeh 'at' chu.edu.tw
TEL:886-3-5186511
Date Donated: October 3, 2008
-----------------------------------------------------
Data Set Information:
To demonstrate the RFMTC marketing model (a modified version of RFM), this study
adopted the donor database of the Blood Transfusion Service Center in Hsin-Chu City
in Taiwan. The center sends its blood transfusion service bus to one
university in Hsin-Chu City about every three months to collect donated blood. To
build an RFMTC model, we selected 748 donors at random from the donor database.
Each of the 748 donor records includes R (Recency - months since last
donation), F (Frequency - total number of donations), M (Monetary - total blood
donated in c.c.), T (Time - months since first donation), and a binary variable
representing whether he/she donated blood in March 2007 (1 stands for donating
blood; 0 stands for not donating blood).
-----------------------------------------------------
Attribute Information:
For each variable, the name, type, measurement unit, and a brief description
are given. The "Blood Transfusion Service Center" data set poses a classification
problem. The order of this listing corresponds to the order of the values in each
row of the database.
R (Recency - months since last donation),
F (Frequency - total number of donations),
M (Monetary - total blood donated in c.c.),
T (Time - months since first donation), and
a binary variable representing whether he/she donated blood in March 2007 (1
stands for donating blood; 0 stands for not donating blood).
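The column order described above can be turned into a small reader. The following is a minimal sketch, assuming the file is comma-separated with one donor per row; the names Donor and parse_donors are hypothetical, and the sample rows are made up for illustration rather than taken from the data set. Non-numeric lines are skipped in case the file starts with a header line.

```python
import csv
import io
from typing import NamedTuple


class Donor(NamedTuple):
    recency: float    # R: months since last donation
    frequency: float  # F: total number of donations
    monetary: float   # M: total blood donated in c.c.
    time: float       # T: months since first donation
    donated: int      # 1 = donated in March 2007, 0 = did not


def parse_donors(text: str) -> list[Donor]:
    """Parse comma-separated rows in the column order R, F, M, T, label."""
    donors = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        try:
            r, f, m, t = (float(value) for value in row[:4])
            label = int(row[4])
        except ValueError:
            continue  # skip a non-numeric line such as a header
        donors.append(Donor(r, f, m, t, label))
    return donors


# Made-up rows for illustration only (assumption: comma-separated layout).
sample = "R,F,M,T,label\n3,4,1000,20,1\n10,1,250,10,0\n"
for donor in parse_donors(sample):
    print(donor)
```

To read the real file, the text passed to parse_donors would be the contents of transfusion.data from the URL given at the top of this page.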
Table 1 shows the descriptive statistics of the data. We selected 500 records at
random as the training set and the remaining 248 as the testing set.
Table 1. Descriptive statistics of the data
Variable                       Data Type     Measurement  Description  min   max    mean     std
--------------------------------------------------------------------------------------------------
Recency                        quantitative  Months       Input        0.03  74.4   9.74     8.07
Frequency                      quantitative  Times        Input        1     50     5.51     5.84
Monetary                       quantitative  c.c. blood   Input        250   12500  1378.68  1459.83
Time                           quantitative  Months       Input        2.27  98.3   34.42    24.32
Whether donated in March 2007  binary        1=yes 0=no   Output       0     1      1 (24%)  0 (76%)
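The random split into 500 training and 248 testing records mentioned above could be reproduced along the following lines. This is only a sketch: the function name split_train_test and the seed are assumptions, since the description does not say how the original split was drawn, so it will not recover the authors' exact partition.

```python
import random


def split_train_test(records, n_train=500, seed=0):
    """Randomly split records into n_train training items and the rest for testing."""
    if n_train > len(records):
        raise ValueError("n_train exceeds the number of records")
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    train = [records[i] for i in indices[:n_train]]
    test = [records[i] for i in indices[n_train:]]
    return train, test


records = list(range(748))  # stand-ins for the 748 donor rows
train, test = split_train_test(records)
print(len(train), len(test))  # 500 248
```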
-----------------------------------------------------
Citation Request:
NOTE: Reuse of this database is unlimited with retention of copyright notice for
Prof. I-Cheng Yeh and the following published paper:
Yeh, I-Cheng, Yang, King-Jang, and Ting, Tao-Ming, "Knowledge discovery on RFM
model using Bernoulli sequence," Expert Systems with Applications, 2008
(doi:10.1016/j.eswa.2008.07.018).
Please refer to the repository http://archive.ics.uci.edu/ml (see citation policy).
See also Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml].
Irvine, CA: University of California, School of Information and Computer Science.
Descriptive statistics:
Dataset= blood-transfusion : n= 748 , d= 3
Class1: n= 178
Covariance matrix:
[,1] [,2] [,3]
[1,] 26.7353 -7.9413 8.6257
[2,] -7.9413 64.5916 140.5756
[3,] 8.6257 140.5756 558.3500
Correlation matrix:
[,1] [,2] [,3]
[1,] 1.0000 -0.1911 0.0706
[2,] -0.1911 1.0000 0.7402
[3,] 0.0706 0.7402 1.0000
Median: 4 6 28
Mean: 5.4551 7.7978 32.7191
MCD-estimated:
MCD-0.975-Mean: 3.0087 5.9565 24.1739
MCD-0.750-Mean: 3.0087 5.9565 24.1739
MCD-0.500-Mean: 3.0087 5.9565 24.1739
Class2: n= 570
Covariance matrix:
[,1] [,2] [,3]
[1,] 70.9813 -5.0734 36.3289
[2,] -5.0734 22.5318 76.3885
[3,] 36.3289 76.3885 605.4251
Correlation matrix:
[,1] [,2] [,3]
[1,] 1.0000 -0.1269 0.1752
[2,] -0.1269 1.0000 0.6540
[3,] 0.1752 0.6540 1.0000
Median: 11 3 28
Mean: 10.7719 4.8018 34.7702
MCD-estimated:
MCD-0.975-Mean: 11.2011 2.4589 20.3994
MCD-0.750-Mean: 11.2011 2.4589 20.3994
MCD-0.500-Mean: 11.2011 2.4589 20.3994
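Each correlation matrix above follows from the corresponding covariance matrix by dividing every entry by the product of the two standard deviations, r_ij = s_ij / sqrt(s_ii * s_jj). The sketch below checks this for Class1 using the rounded values printed above; the function name cov_to_corr is hypothetical.

```python
import math


def cov_to_corr(cov):
    """Convert a covariance matrix (list of rows) into a correlation matrix."""
    n = len(cov)
    std = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(n)] for i in range(n)]


# Class1 covariance matrix as printed above.
cov_class1 = [
    [26.7353, -7.9413, 8.6257],
    [-7.9413, 64.5916, 140.5756],
    [8.6257, 140.5756, 558.3500],
]
for row in cov_to_corr(cov_class1):
    print(" ".join(f"{value:7.4f}" for value in row))
```

Up to rounding, the off-diagonal entries agree with the Class1 correlation matrix listed above.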
Measures:
Mah.Dist: 0.9028
Mah.Dist-MCD-0.975: 1.3704
Mah.Dist-MCD-0.750: 1.3704
Mah.Dist-MCD-0.500: 1.3704
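The page does not state how Mah.Dist is computed. One plausible reading, shown here as a sketch under that assumption, is the Mahalanobis distance between the two class means under a pooled covariance matrix, with the class covariances weighted by n - 1. The helper names solve and mahalanobis are hypothetical. With the rounded means and covariances printed above, this gives a value close to the reported 0.9028; the MCD-based variants cannot be checked the same way because the MCD covariance estimates are not listed.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis(mean1, cov1, n1, mean2, cov2, n2):
    """Distance between two class means under a pooled covariance (assumption)."""
    d = len(mean1)
    w1, w2 = n1 - 1, n2 - 1
    pooled = [[(w1 * cov1[i][j] + w2 * cov2[i][j]) / (w1 + w2)
               for j in range(d)] for i in range(d)]
    diff = [a - b for a, b in zip(mean1, mean2)]
    x = solve(pooled, diff)  # x = pooled^{-1} diff
    return math.sqrt(sum(di * xi for di, xi in zip(diff, x)))


# Class means and covariance matrices as printed above.
mean1 = [5.4551, 7.7978, 32.7191]
cov1 = [[26.7353, -7.9413, 8.6257],
        [-7.9413, 64.5916, 140.5756],
        [8.6257, 140.5756, 558.3500]]
mean2 = [10.7719, 4.8018, 34.7702]
cov2 = [[70.9813, -5.0734, 36.3289],
        [-5.0734, 22.5318, 76.3885],
        [36.3289, 76.3885, 605.4251]]
dist = mahalanobis(mean1, cov1, 178, mean2, cov2, 570)
print(f"{dist:.4f}")
```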
Last modified on 17.02.2013