Begin with the “census2.csv” data file, which contains census data on various tracts in a district. The fields in the data are:
• Total Population (thousands)
• Professional degree (percent)
• Employed age over 16 (percent)
• Government employed (percent)
• Median home value (dollars)

a) Conduct a principal component analysis using the covariance matrix (the default for prcomp and many routines in other software), and interpret the results. How much of the variance is accounted for by the first component, and why?
b) Try dividing the MedianHomeValue field by 100,000 so that the median home value in the dataset is measured in $100,000s rather than in dollars. How does this change the analysis?
c) Compute the PCA with the correlation matrix instead. How does this change the result, and how does your answer compare (if you did it) with your answer in b)?
d) Analyze the correlation matrix for this dataset for significance, and also look for variables that are extremely correlated or uncorrelated. Discuss the effect of this on the analysis.
e) Discuss what using the correlation matrix does and why it may or may not be appropriate in this case.
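Parts a) through c) turn on the fact that covariance-based PCA is not scale-invariant: a column with a very large variance (such as home values in dollars) can dominate the first component, while correlation-based PCA effectively standardizes every column first. The following is a minimal NumPy sketch of that effect, not a solution for census2.csv. It uses synthetic, independently generated columns, and the helper name `pca_variance_shares` and all generated means and spreads are illustrative assumptions. The covariance option mirrors prcomp's default; the correlation option mirrors `prcomp(..., scale. = TRUE)`.

```python
import numpy as np

def pca_variance_shares(X, use_correlation=False):
    """Fraction of total variance explained by each principal component.

    X is an (n_samples, n_features) array. use_correlation=False decomposes the
    covariance matrix; use_correlation=True decomposes the correlation matrix,
    which is equivalent to standardizing each column before PCA.
    """
    M = np.corrcoef(X, rowvar=False) if use_correlation else np.cov(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(M)[::-1]  # eigvalsh sorts ascending; flip to descending
    return eigvals / eigvals.sum()

# Hypothetical stand-in for the five census fields (NOT the real census2.csv data).
rng = np.random.default_rng(0)
n = 200
population = rng.normal(5, 1.5, n)           # thousands
prof_degree = rng.normal(10, 4, n)           # percent
employed = rng.normal(60, 8, n)              # percent
gov_employed = rng.normal(15, 5, n)          # percent
home_value = rng.normal(250_000, 80_000, n)  # dollars
X = np.column_stack([population, prof_degree, employed, gov_employed, home_value])

cov_shares = pca_variance_shares(X)               # dollars dominate the total variance
X_rescaled = X.copy()
X_rescaled[:, 4] /= 100_000                       # home value in $100,000s
rescaled_shares = pca_variance_shares(X_rescaled)
corr_shares = pca_variance_shares(X, use_correlation=True)
corr_rescaled = pca_variance_shares(X_rescaled, use_correlation=True)

print("covariance PCA:            ", np.round(cov_shares, 4))
print("covariance PCA, rescaled:  ", np.round(rescaled_shares, 4))
print("correlation PCA:           ", np.round(corr_shares, 4))
print("correlation PCA, rescaled: ", np.round(corr_rescaled, 4))
```

Because correlations do not depend on units, the last two lines print the same shares, whereas rescaling one column changes the covariance-based result substantially.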
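For part d), a common test of H0: ρ = 0 for one pair of variables uses the statistic t = r·√((n − 2)/(1 − r²)), compared with a t distribution on n − 2 degrees of freedom. The sketch below computes that statistic for every pair and, to stay NumPy-only, swaps in a two-sided permutation p-value instead of the exact t-distribution p-value (R's cor.test or scipy.stats.pearsonr report the latter). The helper `correlation_report`, the synthetic columns, and the 0.9 / 0.1 cutoffs for "extreme" and "near zero" are hypothetical choices for illustration.

```python
import numpy as np

def correlation_report(X, names, n_perm=2000, seed=0):
    """Return (name_i, name_j, r, t, perm_p) for every pair of columns in X."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    rows = []
    for i in range(p):
        for j in range(i + 1, p):
            r = R[i, j]
            # Usual t statistic for H0: rho = 0 (undefined if |r| == 1 exactly).
            t = r * np.sqrt((n - 2) / (1 - r**2))
            # Shuffling column j destroys any association with column i (the null).
            perm_r = np.array([np.corrcoef(X[:, i], rng.permutation(X[:, j]))[0, 1]
                               for _ in range(n_perm)])
            perm_p = (np.sum(np.abs(perm_r) >= abs(r)) + 1) / (n_perm + 1)
            rows.append((names[i], names[j], r, t, perm_p))
    return rows

# Hypothetical data: y is nearly collinear with x; z is generated independently.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.9 * x + 0.1 * rng.normal(size=200)
z = rng.normal(size=200)
rows = correlation_report(np.column_stack([x, y, z]), ["x", "y", "z"])
for a, b, r, t, p in rows:
    flag = "extreme" if abs(r) > 0.9 else ("near zero" if abs(r) < 0.1 else "")
    print(f"{a}-{b}: r={r:+.3f} t={t:+.2f} perm_p={p:.4f} {flag}")
```

With many pairs tested at once, some adjustment for multiple comparisons may be worth considering. Nearly collinear pairs tend to load together on a single component, while a variable that is uncorrelated with the others tends to end up on a component of its own.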
