Data Portal of the Chair of Statistics and Econometrics
Vowel (MvsF) data
The data set (and description) can be downloaded here:
http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/vowel/vowel-context.data
Description:
Introduction
============
In my work on context-sensitive learning, I used the "Deterding Vowel
Recognition Data", but I found it necessary to reformulate the data.
Implicit in the original data is contextual information on the
speaker's gender and identity. For my work, it was necessary to make
this information explicit. The file "vowel-context.data" adds the
speaker's sex and identity as new features. The format of the data file
is described below.
Peter Turney
peter@ai.iit.nrc.ca
References
==========
P. Turney. "Robust Classification With Context-Sensitive Features."
Proceedings of the Sixth International Conference on Industrial
and Engineering Applications of Artificial Intelligence and Expert
Systems (IEA/AIE-93): 268-276. 1993.
URL: ftp://ai.iit.nrc.ca/pub/ksl-papers/NRC-35074.ps.Z
P. Turney. "Exploiting Context When Learning to Classify."
Proceedings of the European Conference on Machine Learning
(ECML-93): 402-407. 1993.
URL: ftp://ai.iit.nrc.ca/pub/ksl-papers/NRC-35058.ps.Z
File Structure
==============
Column    Description
-------------------------------
0         Train or Test
1         Speaker Number
2         Sex
3         Feature 0
4         Feature 1
5         Feature 2
6         Feature 3
7         Feature 4
8         Feature 5
9         Feature 6
10        Feature 7
11        Feature 8
12        Feature 9
13        Class
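The column layout above can be turned into a small parser. The following is a minimal sketch, assuming each row of "vowel-context.data" holds 14 whitespace-separated values in the order listed; the names `VowelRecord`, `parse_line`, and `parse_text` are hypothetical and not part of the original description, and the sample row in the usage note is synthetic.

```python
from typing import List, NamedTuple, Tuple


class VowelRecord(NamedTuple):
    """One row of the file, following the column table (hypothetical name)."""
    is_test: int                 # column 0: 0 = train, 1 = test
    speaker: int                 # column 1: speaker number, 0..14
    sex: int                     # column 2: 0 = male, 1 = female
    features: Tuple[float, ...]  # columns 3..12: Feature 0 .. Feature 9
    vowel: int                   # column 13: class number, 0..10


def parse_line(line: str) -> VowelRecord:
    """Parse one row; assumes whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 14:
        raise ValueError(f"expected 14 fields, got {len(fields)}")
    return VowelRecord(
        is_test=int(fields[0]),
        speaker=int(fields[1]),
        sex=int(fields[2]),
        features=tuple(float(x) for x in fields[3:13]),
        vowel=int(fields[13]),
    )


def parse_text(text: str) -> List[VowelRecord]:
    """Parse all non-empty lines of the file contents."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]
```

For example, `parse_line("0 4 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 10")` yields a training record for speaker 4 (female) with ten feature values and vowel class 10.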
Numerical Codes
===============
Speaker   Code Number
---------------------------
Andrew    0
Bill      1
David     2
Mark      3
Jo        4
Kate      5
Penny     6
Rose      7
Mike      8
Nick      9
Rich      10
Tim       11
Sarah     12
Sue       13
Wendy     14
Set       Number
---------------------------
Train     0
Test      1
Sex       Number
---------------------------
Male      0
Female    1
Class     Number
---------------------------
hid       0
hId       1
hEd       2
hAd       3
hYd       4
had       5
hOd       6
hod       7
hUd       8
hud       9
hed       10
Speaker   Code Number   Sex   Train/Test
---------------------------------------------------------------
Andrew    0             0     0
Bill      1             0     0
David     2             0     0
Mark      3             0     0
Jo        4             1     0
Kate      5             1     0
Penny     6             1     0
Rose      7             1     0
Mike      8             0     1
Nick      9             0     1
Rich      10            0     1
Tim       11            0     1
Sarah     12            1     1
Sue       13            1     1
Wendy     14            1     1
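The speaker table is enough to cross-check the first two columns of the descriptive statistics for the male/female (MvsF) split. The sketch below assumes that every speaker contributes the same number of rows (990 / 15 = 66), which the description does not state explicitly; `SPEAKERS` and `group_summary` are hypothetical names.

```python
# (name, code number, sex, train/test) copied from the speaker table
SPEAKERS = [
    ("Andrew", 0, 0, 0), ("Bill", 1, 0, 0), ("David", 2, 0, 0),
    ("Mark", 3, 0, 0), ("Jo", 4, 1, 0), ("Kate", 5, 1, 0),
    ("Penny", 6, 1, 0), ("Rose", 7, 1, 0), ("Mike", 8, 0, 1),
    ("Nick", 9, 0, 1), ("Rich", 10, 0, 1), ("Tim", 11, 0, 1),
    ("Sarah", 12, 1, 1), ("Sue", 13, 1, 1), ("Wendy", 14, 1, 1),
]

# Assumption: equal number of rows per speaker (990 rows, 15 speakers).
ROWS_PER_SPEAKER = 990 // len(SPEAKERS)


def group_summary(sex: int) -> dict:
    """Row count and column means implied by the table for one sex."""
    group = [s for s in SPEAKERS if s[2] == sex]
    n = len(group)
    return {
        "n_rows": n * ROWS_PER_SPEAKER,
        "mean_train_test": sum(s[3] for s in group) / n,
        "mean_speaker": sum(s[1] for s in group) / n,
    }


male = group_summary(0)
female = group_summary(1)
```

Under that assumption the male group has 528 rows with means 0.5 (Train/Test) and 5.5 (Speaker Number), and the female group has 462 rows with means of about 0.4286 and 8.7143. These agree with the n and Mean lines reported for Class1 and Class2 in the descriptive statistics, which suggests that Class1 is male and Class2 is female.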
Citation Request:
Please refer to the repository http://archive.ics.uci.edu/ml (see citation policy).
See also Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml].
Irvine, CA: University of California, School of Information and Computer Science.
Descriptive statistics:
Dataset= vowel_MvsF : n= 990 , d= 13
Class1: n= 528
Covariance matrix:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,] 0.2505 2.0038 -0.0834 0.0821 -0.0115 -0.0612 0.0893 -0.0184 0.0330 -0.0330 -0.0529 -0.0546 0.0000
[2,] 2.0038 17.2827 -0.5461 0.6315 -0.0508 -0.5634 0.8658 -0.2462 0.3434 -0.2071 -0.4088 -0.4880 0.0000
[3,] -0.0834 -0.5461 0.8118 -0.5088 -0.2941 -0.1309 0.0855 0.0474 0.0073 0.0923 -0.0885 -0.0866 -1.7375
[4,] 0.0821 0.6315 -0.5088 1.2556 0.1582 -0.2629 -0.3330 -0.3351 -0.0014 -0.0144 0.2397 0.0591 2.0623
[5,] -0.0115 -0.0508 -0.2941 0.1582 0.5472 0.0383 -0.1417 -0.2189 -0.0969 -0.0566 0.1868 0.2024 0.8977
[6,] -0.0612 -0.5634 -0.1309 -0.2629 0.0383 0.4924 -0.0328 0.0867 -0.1012 0.0127 -0.0491 0.0675 -0.0315
[7,] 0.0893 0.8658 0.0855 -0.3330 -0.1417 -0.0328 0.3676 0.1126 0.0688 -0.0631 -0.1678 -0.1463 -0.7607
[8,] -0.0184 -0.2462 0.0474 -0.3351 -0.2189 0.0867 0.1126 0.3733 0.0424 0.0283 -0.1638 -0.1012 -0.5770
[9,] 0.0330 0.3434 0.0073 -0.0014 -0.0969 -0.1012 0.0688 0.0424 0.1717 -0.0257 -0.0476 -0.0898 -0.2811
[10,] -0.0330 -0.2071 0.0923 -0.0144 -0.0566 0.0127 -0.0631 0.0283 -0.0257 0.2106 -0.0034 0.0250 -0.0379
[11,] -0.0529 -0.4088 -0.0885 0.2397 0.1868 -0.0491 -0.1678 -0.1638 -0.0476 -0.0034 0.2885 0.0965 0.7798
[12,] -0.0546 -0.4880 -0.0866 0.0591 0.2024 0.0675 -0.1463 -0.1012 -0.0898 0.0250 0.0965 0.2689 0.5671
[13,] 0.0000 0.0000 -1.7375 2.0623 0.8977 -0.0315 -0.7607 -0.5770 -0.2811 -0.0379 0.7798 0.5671 10.0190
Correlation matrix:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,] 1.0000 0.9631 -0.1849 0.1464 -0.0311 -0.1742 0.2944 -0.0602 0.1589 -0.1437 -0.1969 -0.2104 0.0000
[2,] 0.9631 1.0000 -0.1458 0.1356 -0.0165 -0.1931 0.3435 -0.0969 0.1993 -0.1086 -0.1831 -0.2264 0.0000
[3,] -0.1849 -0.1458 1.0000 -0.5040 -0.4413 -0.2070 0.1565 0.0862 0.0196 0.2233 -0.1828 -0.1854 -0.6092
[4,] 0.1464 0.1356 -0.5040 1.0000 0.1909 -0.3343 -0.4901 -0.4894 -0.0029 -0.0280 0.3983 0.1017 0.5814
[5,] -0.0311 -0.0165 -0.4413 0.1909 1.0000 0.0737 -0.3159 -0.4844 -0.3160 -0.1667 0.4700 0.5277 0.3834
[6,] -0.1742 -0.1931 -0.2070 -0.3343 0.0737 1.0000 -0.0770 0.2021 -0.3479 0.0393 -0.1302 0.1856 -0.0142
[7,] 0.2944 0.3435 0.1565 -0.4901 -0.3159 -0.0770 1.0000 0.3039 0.2737 -0.2266 -0.5151 -0.4653 -0.3964
[8,] -0.0602 -0.0969 0.0862 -0.4894 -0.4844 0.2021 0.3039 1.0000 0.1676 0.1008 -0.4991 -0.3195 -0.2983
[9,] 0.1589 0.1993 0.0196 -0.0029 -0.3160 -0.3479 0.2737 0.1676 1.0000 -0.1350 -0.2140 -0.4179 -0.2143
[10,] -0.1437 -0.1086 0.2233 -0.0280 -0.1667 0.0393 -0.2266 0.1008 -0.1350 1.0000 -0.0140 0.1050 -0.0261
[11,] -0.1969 -0.1831 -0.1828 0.3983 0.4700 -0.1302 -0.5151 -0.4991 -0.2140 -0.0140 1.0000 0.3466 0.4586
[12,] -0.2104 -0.2264 -0.1854 0.1017 0.5277 0.1856 -0.4653 -0.3195 -0.4179 0.1050 0.3466 1.0000 0.3455
[13,] 0.0000 0.0000 -0.6092 0.5814 0.3834 -0.0142 -0.3964 -0.2983 -0.2143 -0.0261 0.4586 0.3455 1.0000
Median: 0.5007 5.5049 -2.9235 1.7224 -0.526 0.5637 -0.5822 0.7309 -0.0554 0.5883 0.0034 -0.208 4.9953
Mean: 0.5 5.5 -2.9803 1.6699 -0.5887 0.5886 -0.531 0.8074 -0.0202 0.5767 -0.0438 -0.246 5
MCD-estimated:
MCD-0.975-Mean: 0.45 5.0767 -2.5109 1.3426 -0.8244 0.6016 -0.4631 0.825 0.0098 0.5845 -0.1444 -0.3825 3
MCD-0.750-Mean: 0.5159 5.7134 -2.5315 1.4633 -0.8073 0.4965 -0.4265 0.8424 0.0078 0.6055 -0.1521 -0.4031 3.1242
MCD-0.500-Mean: 0.5808 6.1078 -2.8884 1.7077 -0.786 0.4062 -0.4314 0.8473 0.0914 0.4699 -0.0901 -0.4441 3.9311
Class2: n= 462
Covariance matrix:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,] 0.2454 1.8407 0.0346 0.0912 -0.0487 0.0531 0.0121 0.0584 -0.0614 -0.0745 -0.0006 0.0919 0.0000
[2,] 1.8407 14.8076 0.0192 0.8937 -0.4954 0.3621 0.1423 0.4198 -0.4907 -0.6062 -0.0874 0.8986 0.0000
[3,] 0.0346 0.0192 0.5694 -0.2785 -0.1439 -0.0223 -0.1300 0.0216 0.0420 -0.0774 -0.0119 0.0786 -0.6309
[4,] 0.0912 0.8937 -0.2785 1.4177 -0.3074 -0.6624 -0.3132 -0.0019 0.2496 0.2919 -0.0161 -0.2901 2.3842
[5,] -0.0487 -0.4954 -0.1439 -0.3074 0.4458 0.2091 0.0815 -0.1447 -0.1605 0.0351 0.0312 0.0140 -0.5654
[6,] 0.0531 0.3621 -0.0223 -0.6624 0.2091 0.6607 0.1885 -0.0691 -0.2809 -0.2335 0.0785 0.2196 -1.4070
[7,] 0.0121 0.1423 -0.1300 -0.3132 0.0815 0.1885 0.4026 0.0503 -0.1393 -0.1593 -0.0996 0.0979 -0.3427
[8,] 0.0584 0.4198 0.0216 -0.0019 -0.1447 -0.0691 0.0503 0.2786 0.0518 -0.0075 -0.0636 -0.0625 0.0734
[9,] -0.0614 -0.4907 0.0420 0.2496 -0.1605 -0.2809 -0.1393 0.0518 0.2608 0.0900 -0.0126 -0.1327 0.4993
[10,] -0.0745 -0.6062 -0.0774 0.2919 0.0351 -0.2335 -0.1593 -0.0075 0.0900 0.3229 0.0065 -0.2279 0.9318
[11,] -0.0006 -0.0874 -0.0119 -0.0161 0.0312 0.0785 -0.0996 -0.0636 -0.0126 0.0065 0.2027 -0.0364 -0.2090
[12,] 0.0919 0.8986 0.0786 -0.2901 0.0140 0.2196 0.0979 -0.0625 -0.1327 -0.2279 -0.0364 0.4004 -1.0323
[13,] 0.0000 0.0000 -0.6309 2.3842 -0.5654 -1.4070 -0.3427 0.0734 0.4993 0.9318 -0.2090 -1.0323 10.0217
Correlation matrix:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,] 1.0000 0.9656 0.0926 0.1546 -0.1471 0.1319 0.0385 0.2234 -0.2427 -0.2647 -0.0028 0.2931 0.0000
[2,] 0.9656 1.0000 0.0066 0.1951 -0.1928 0.1158 0.0583 0.2067 -0.2497 -0.2772 -0.0504 0.3690 0.0000
[3,] 0.0926 0.0066 1.0000 -0.3100 -0.2856 -0.0363 -0.2715 0.0543 0.1091 -0.1805 -0.0351 0.1647 -0.2641
[4,] 0.1546 0.1951 -0.3100 1.0000 -0.3867 -0.6845 -0.4145 -0.0030 0.4105 0.4314 -0.0301 -0.3851 0.6325
[5,] -0.1471 -0.1928 -0.2856 -0.3867 1.0000 0.3853 0.1924 -0.4107 -0.4706 0.0924 0.1038 0.0331 -0.2675
[6,] 0.1319 0.1158 -0.0363 -0.6845 0.3853 1.0000 0.3655 -0.1611 -0.6766 -0.5056 0.2144 0.4269 -0.5468
[7,] 0.0385 0.0583 -0.2715 -0.4145 0.1924 0.3655 1.0000 0.1501 -0.4299 -0.4417 -0.3486 0.2439 -0.1706
[8,] 0.2234 0.2067 0.0543 -0.0030 -0.4107 -0.1611 0.1501 1.0000 0.1920 -0.0251 -0.2677 -0.1873 0.0439
[9,] -0.2427 -0.2497 0.1091 0.4105 -0.4706 -0.6766 -0.4299 0.1920 1.0000 0.3101 -0.0549 -0.4106 0.3088
[10,] -0.2647 -0.2772 -0.1805 0.4314 0.0924 -0.5056 -0.4417 -0.0251 0.3101 1.0000 0.0253 -0.6339 0.5180
[11,] -0.0028 -0.0504 -0.0351 -0.0301 0.1038 0.2144 -0.3486 -0.2677 -0.0549 0.0253 1.0000 -0.1278 -0.1467
[12,] 0.2931 0.3690 0.1647 -0.3851 0.0331 0.4269 0.2439 -0.1873 -0.4106 -0.6339 -0.1278 1.0000 -0.5153
[13,] 0.0000 0.0000 -0.2641 0.6325 -0.2675 -0.5468 -0.1706 0.0439 0.3088 0.5180 -0.1467 -0.5153 1.0000
Median: 0.3587 8.2849 -3.4217 2.1937 -0.4829 0.3294 -0.1385 0.41 0.0857 0.0998 -0.6072 0.12 5.0087
Mean: 0.4286 8.7143 -3.4591 2.1239 -0.4153 0.4319 -0.0481 0.4278 0.0137 0.0621 -0.5992 0.1282 5
MCD-estimated:
MCD-0.975-Mean: 0 5.5 -3.5196 1.9647 -0.3303 0.3392 -0.0692 0.3258 0.121 0.1922 -0.5981 -0.0322 5
MCD-0.750-Mean: 0 5.5 -3.5196 1.9647 -0.3303 0.3392 -0.0692 0.3258 0.121 0.1922 -0.5981 -0.0322 5
MCD-0.500-Mean: 0 5.5 -3.5196 1.9647 -0.3303 0.3392 -0.0692 0.3258 0.121 0.1922 -0.5981 -0.0322 5
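Each correlation matrix above is its covariance matrix rescaled by the standard deviations, corr[i][j] = cov[i][j] / sqrt(cov[i][i] * cov[j][j]). The sketch below applies this to the top-left 2x2 block of the Class1 covariance matrix; `cov_to_corr` is a hypothetical helper, not the routine used to produce the tables.

```python
from math import sqrt
from typing import List


def cov_to_corr(cov: List[List[float]]) -> List[List[float]]:
    """Rescale a covariance matrix to a correlation matrix."""
    sd = [sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]


# Entries [1,1], [1,2], [2,1], [2,2] of the Class1 covariance matrix
class1_block = [
    [0.2505, 2.0038],
    [2.0038, 17.2827],
]
corr_block = cov_to_corr(class1_block)
```

The off-diagonal entry of `corr_block` comes out at roughly 0.963, in line with the 0.9631 shown in the Class1 correlation matrix; the small difference is due to the covariances being printed with only four decimals.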
Measures:
Mah.Dist: 3.9868
Mah.Dist-MCD-0.975: 4.281
Mah.Dist-MCD-0.750: 4.0891
Mah.Dist-MCD-0.500: 4.0898
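The page does not say which covariance matrix enters the Mahalanobis distance between the two class means. A common choice, assumed here, is the pooled covariance S = ((n1 - 1) S1 + (n2 - 1) S2) / (n1 + n2 - 2), with distance sqrt((m1 - m2)' S^-1 (m1 - m2)). The following sketch implements that variant in plain Python; `solve`, `pooled_cov`, and `mahalanobis` are hypothetical names, and the result is not claimed to reproduce the values listed above.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def pooled_cov(s1: Matrix, n1: int, s2: Matrix, n2: int) -> Matrix:
    """Pooled covariance of two groups (assumed weighting: n - 1 each)."""
    w1, w2 = n1 - 1, n2 - 1
    return [
        [(w1 * a + w2 * b) / (w1 + w2) for a, b in zip(r1, r2)]
        for r1, r2 in zip(s1, s2)
    ]


def mahalanobis(mean1: List[float], mean2: List[float], cov: Matrix) -> float:
    """Mahalanobis distance between two mean vectors under cov."""
    diff = [a - b for a, b in zip(mean1, mean2)]
    z = solve(cov, diff)  # z = cov^-1 diff, without forming the inverse
    return sqrt(sum(d * v for d, v in zip(diff, z)))
```

To apply it to this data set, one would pass the two Mean vectors, the two 13x13 covariance matrices, and n1 = 528, n2 = 462; the MCD variants would substitute robust location and scatter estimates, which this sketch does not compute.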
Last modified on 17.02.2013