
50 Real-Data Classification Tasks

  This site contains the data sets used in the joint project of the University of Cologne and the Hochschule Merseburg, “Classifying real-world data with the DDα-procedure”. A comprehensive description of the methodology, the experimental settings, and the results of the study is given in:

Mozharovskyi, P., Mosler, K. and Lange, T. (2013): “Classifying real-world data with the DDα-procedure”. Mimeo.

For a more complete explanation of the technique and further experiments see:
Lange, T., Mosler, K. and Mozharovskyi, P. (2012): “Fast nonparametric classification based on data depth”. Statistical Papers (The final publication is available at www.springerlink.com).

The 50 binary classification tasks were obtained by partitioning 33 freely accessible data sets. Multiclass problems were reasonably split into binary classification problems, and some of the data sets were lightly preprocessed by removing objects or attributes and by selecting the prevailing classes. Each data set comes with a short description and brief descriptive statistics. The task name reflects the origin of the data. A letter after the name is a property filter; letters (or combinations of letters) in brackets, separated by "vs", are the two opposed classes. The letters, combinations, or words stand for class labels (names of properties) and are intuitive. Each description contains a link to the original data.
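The naming convention described above can be read mechanically. The following is a minimal sketch, assuming that the property filter is always a single capital letter placed directly before the bracketed classes and that the classes are separated by the literal word "vs"; the names `TaskName` and `parse_task_name` are hypothetical and are not part of the project.

```python
import re
from typing import NamedTuple, Optional, Tuple


class TaskName(NamedTuple):
    """Parts of a task name (hypothetical helper type)."""
    dataset: str
    property_filter: Optional[str]
    classes: Optional[Tuple[str, str]]


# Assumed layout: "<data set> [<filter letter>] (<class A> vs <class B>)".
# Brackets that do not contain "vs" are treated as part of the data set name.
_TASK_RE = re.compile(
    r"^(?P<dataset>.+?)"                              # data set name
    r"(?:"
    r"(?:\s+(?P<filter>[A-Z]))?"                      # optional filter letter
    r"\s*\((?P<a>[^()]+?)\s+vs\s+(?P<b>[^()]+?)\)"    # the two opposed classes
    r")?$"
)


def parse_task_name(name: str) -> TaskName:
    """Split a task name into data set, property filter and class pair."""
    match = _TASK_RE.match(name.strip())
    if match is None:
        raise ValueError(f"cannot parse task name: {name!r}")
    classes = None
    if match["a"] is not None:
        classes = (match["a"], match["b"])
    return TaskName(match["dataset"], match["filter"], classes)


if __name__ == "__main__":
    for task in ("Crabs B (M vs F)", "Iris Plants (SET vs VER)", "PIMA (training)"):
        print(parse_task_name(task))
```

Names without a "vs" pair, such as "PIMA (training)", are returned unchanged as the data set name, with no filter and no classes.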

The data were collected as open-source data in January 2013. The owners of this web page decline any responsibility for their correctness or for the consequences of their use. If you publish material based on these data, please cite the original source. Special requests regarding citations can be found on each data set's web page.

The general list of sources consists of:

http://archive.ics.uci.edu/ml, see also Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
http://cran.r-project.org/web/packages
http://lib.stat.cmu.edu/datasets
http://stat.ethz.ch/Teaching/Datasets
http://www.stats.ox.ac.uk/pub/PRNN

All the data sets as *.zip:  

Data table

 #  Dataset  n1  n2  n1+n2  d  ln(n1/n2)  (n1+n2)/d  # tied
 1  Baby  161  86  247  5  0.626  49.4  0
 2  Banknoten  100  100  200  6  0  33.3  0
 3  Biomedical  67  127  194  4  -0.635  48.5  0
 4  Blood Transfusion  178  570  748  3  -1.171  249.3  246
 5  Breast Cancer Wisconsin  458  241  699  9  0.642  77.7  236
 6  Bupa Liver Disorder  145  200  345  6  -0.329  57.5  4
 7  Chemical Diabetes (C vs N)  36  76  112  5  -0.755  22.4  0
 8  Chemical Diabetes (C vs O)  36  33  69  5  0.086  13.8  0
 9  Chemical Diabetes (N vs O)  76  33  109  5  0.833  21.8  0
 10  Cloud  54  54  108  7  0  15.4  0
 11  Crabs (B vs O)  100  100  200  5  0  40.0  0
 12  Crabs (M vs F)  100  100  200  5  0  40.0  0
 13  Crabs B (M vs F)  50  50  100  5  0  20.0  0
 14  Crabs F (B vs O)  50  50  100  5  0  20.0  0
 15  Crabs M (B vs O)  50  50  100  5  0  20.0  0
 16  Crabs O (M vs F)  50  50  100  5  0  20.0  0
 17  Cricket (C vs P)  78  78  156  4  0  39.0  7
 18  Diabetes (of Pima Indians)  268  500  768  8  -0.616  96.0  0
 19  Ecoli (CP vs IM)  143  77  220  5  0.621  44.0  0
 20  Ecoli (CP vs PP)  143  52  195  5  1.012  39.0  0
 21  Ecoli (IM vs PP)  77  52  129  5  0.392  25.8  0
 22  Gemsen (M vs F)  796  553  1349  6  0.365  224.8  27
 23  Glass (F vs NF)  70  76  146  9  -0.083  16.2  1
 24  Groessen (M vs F)  116  114  230  3  0.020  76.7  0
 25  Haberman's Survival  225  81  306  3  1.022  102.0  23
 26  Heart  120  150  270  13  -0.223  20.8  0
 27  Hemophilia  30  45  75  2  -0.400  37.5  0
 28  Indian Liver Patient (1 vs 2)  414  165  579  10  0.920  57.9  13
 29  Indian Liver Patient (M vs F)  140  439  579  9  -1.139  64.3  13
 30  Iris Plants (SET vs VER)  50  50  100  4  0  25.0  2
 31  Iris Plants (SET vs VIR)  50  50  100  4  0  25.0  3
 32  Iris Plants (VER vs VIR)  50  50  100  4  0  25.0  1
 33  Irish Educational Transitions (M vs F)  250  250  500  5  0  100.0  44
 34  Kidney (M vs F)  20  56  76  5  -1.022  15.2  0
 35  PIMA (training)  132  68  200  7  0.663  28.6  0
 36  Plasma Retinol and Beta-Carotene Levels (M vs F)  273  42  315  13  1.872  24.2  0
 37  Segmentation (C vs W)  330  330  660  10  0  66.0  62
 38  Social Mobility (I vs NI)  578  578  1156  5  0  231.2  45
 39  Social Mobility (W vs B)  578  578  1156  5  0  231.2  8
 40  Teaching Assistant Evaluation (E vs NE)  29  122  151  5  -1.427  30.2  43
 41  Tennis (M vs F)  42  45  87  15  -0.073  5.8  0
 42  Tips (D vs N)  176  68  244  6  0.952  40.7  1
 43  Tips (M vs F)  87  157  244  6  -0.598  40.7  1
 44  US Crime (S vs N)  16  31  47  13  -0.654  3.6  0
 45  Vertebral Column  210  100  310  6  0.742  51.7  0
 46  Veteran Lung Cancer (S vs T)  69  68  137  7  0.010  19.6  0
 47  Vowel (M vs F)  528  462  990  13  0.131  76.2  0
 48  Wine (1 vs 2)  59  71  130  13  -0.186  10.0  0
 49  Wine (1 vs 3)  59  48  107  13  0.207  8.2  0
 50  Wine (2 vs 3)  71  48  119  13  0.392  9.2  0
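
The descriptive columns of the table can be recomputed from the two classes of a task. The sketch below is illustrative only: the function name `describe_task` and the dictionary keys are hypothetical, and the page does not define "# tied", so it is assumed here to be the number of observations minus the number of distinct observations.

```python
import math
from typing import Dict, Sequence


def describe_task(class1: Sequence[Sequence[float]],
                  class2: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Compute the table's descriptive columns for one binary task.

    class1 and class2 hold the observations (rows of d attributes) of the
    two opposed classes; both are assumed to be non-empty.
    """
    n1, n2 = len(class1), len(class2)
    n = n1 + n2
    d = len(class1[0])  # number of attributes
    # ASSUMPTION: "# tied" = observations minus distinct observations.
    distinct = {tuple(row) for row in list(class1) + list(class2)}
    return {
        "n1": n1,
        "n2": n2,
        "n": n,                          # n1+n2
        "d": d,
        "log_ratio": math.log(n1 / n2),  # ln(n1/n2), class balance
        "n_per_dim": n / d,              # (n1+n2)/d
        "tied": n - len(distinct),
    }


# Size-based columns for the "Heart" row (n1 = 120, n2 = 150, d = 13):
print(round(math.log(120 / 150), 3), round((120 + 150) / 13, 1))  # -0.223 20.8
```

A value of ln(n1/n2) equal to 0 indicates balanced classes, and a small (n1+n2)/d indicates few observations per attribute.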

 

Last modified on 18 February 2013