# Cycle tour through multivariate statistics: Part 4 Principal Component Analysis (Factor Analysis)

The research question I wish to answer using principal component analysis is: can any factors be extracted from the continuous variables with medium or high correlations? Based upon the Pearson correlations (see Table 2), the following variables have a correlation ≥ 0.3: Age, BudgetAccom, Budget, RideDays, TotalDays, TotalDist, and Variety. I therefore used these variables in a principal component analysis to see if I could find any common factors that would allow me to reduce the number of variables for analysis.

I ran the principal component analysis and checked that KMO > .6 and that Bartlett's test of sphericity was significant. Based upon this, I concluded that my data set was appropriate for further analysis (see Table 3). I then checked the scree plot (see Figure 1) to see if I could visually identify factors: the curve flattens after the second component, indicating two factors. The total variance explained table showed that the first factor explained 41.2% of the variance and the second factor explained 22.1%.

Next, I examined the rotated component matrix, which again validated the presence of two factors. Given my sample size and the variables associated with the factors, I decided on an absolute loading cutoff of .7. This meant that I had two factors: 1) TotalDist, TotalDays, and Variety; and 2) Budget and BudgetAccom. Based on what these variables represented, I decided to name the factors: 1) tourAwesomeness, a measure of how awesome the tour is, and 2) tourMoney, a measure of how money is used on the tour.

I then ran a principal component analysis with only the five variables associated with the two factors, forcing a two-factor solution. The resulting analysis had KMO = .647 and a significant Bartlett's test. The total variance explained in the two-factor solution was 51.6% for the first factor and 28.0% for the second factor.
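
The checks described above (KMO, Bartlett's test of sphericity, and the share of variance explained by each component) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the procedure that produced the numbers in this post: the data are synthetic stand-ins, the function names are my own, and component rotation is not included. It uses the standard formulas, with KMO computed from simple and partial correlations and Bartlett's statistic computed from the determinant of the correlation matrix.

```python
import numpy as np

def correlation_matrix(X):
    """Pearson correlation matrix of the columns of X (rows = observations)."""
    return np.corrcoef(X, rowvar=False)

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure from a correlation matrix R."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)          # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)    # mask selecting off-diagonal cells
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom for Bartlett's test."""
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return stat, df

def variance_explained(R):
    """Proportion of total variance carried by each principal component."""
    eigvals = np.linalg.eigvalsh(R)[::-1]    # eigenvalues, largest first
    return eigvals / eigvals.sum()

# Synthetic stand-in data (NOT the tour survey): 200 rows, 7 columns
# driven by two hypothetical latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = np.array([[0.8, 0.8, 0.7, 0.6, 0.0, 0.0, 0.3],
                     [0.0, 0.0, 0.0, 0.3, 0.8, -0.8, 0.5]])
X = latent @ loadings + 0.5 * rng.normal(size=(200, 7))

R = correlation_matrix(X)
kmo_value = kmo(R)
chi2, df = bartlett_sphericity(R, n=X.shape[0])
explained = variance_explained(R)

print(f"KMO = {kmo_value:.3f}")
print(f"Bartlett chi-square = {chi2:.1f} on {df} df")
print("Variance explained:", np.round(explained, 3))
```

The eigenvalues of the correlation matrix are what a scree plot displays, so plotting them against component number gives the same visual check. If SciPy is available, a p-value for Bartlett's statistic can be obtained with `scipy.stats.chi2.sf(chi2, df)`.
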

I ran a principal component analysis with only the variables in the first factor, forcing a single factor and adding the scores option to display the component score coefficient matrix. The KMO and Bartlett's test indicated a valid analysis. I examined the component score coefficient matrix and noted that the values were roughly equal. Based on that, I decided to create the tourAwesomeness factor using the average z-score of the three variables: tourAwesomeness = (zTotalDist + zTotalDays + zVariety) / 3.

I then ran a principal component analysis with only the variables in the second factor. The KMO = .5 and Bartlett's test was significant. With only two variables, KMO is always exactly .5, because the partial correlation between the two equals their simple correlation, so this value says nothing about sampling adequacy. Since I had already established the two factors, I determined that the sample size was adequate to determine coefficient weighting (so KMO < .6 was OK). I examined the component score coefficient matrix and noted that the values were equal but opposite in sign. Based on that, I decided to create the tourMoney factor using the following formula: tourMoney = (zBudget - zBudgetAccom) / 2.
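
The two composite scores can be computed directly from z-scores. The sketch below uses only the Python standard library and made-up values for five tours, so the numbers are purely illustrative. Standardizing with the sample standard deviation is an assumption on my part, as are the variable names.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize a list of numbers using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical values for five tours (NOT the survey data).
total_dist = [500.0, 1200.0, 3000.0, 800.0, 4500.0]
total_days = [7.0, 14.0, 30.0, 10.0, 60.0]
variety = [2.0, 3.0, 5.0, 2.0, 6.0]
budget = [20.0, 35.0, 50.0, 25.0, 40.0]
budget_accom = [80.0, 50.0, 20.0, 70.0, 30.0]

z_dist = z_scores(total_dist)
z_days = z_scores(total_days)
z_var = z_scores(variety)
z_budget = z_scores(budget)
z_accom = z_scores(budget_accom)

# tourAwesomeness = (zTotalDist + zTotalDays + zVariety) / 3
tour_awesomeness = [(a + b + c) / 3 for a, b, c in zip(z_dist, z_days, z_var)]

# tourMoney = (zBudget - zBudgetAccom) / 2
tour_money = [(b - a) / 2 for b, a in zip(z_budget, z_accom)]

for i, (aw, mo) in enumerate(zip(tour_awesomeness, tour_money), start=1):
    print(f"tour {i}: tourAwesomeness = {aw:.2f}, tourMoney = {mo:.2f}")
```

Because each composite is a linear combination of z-scores, both have a mean of zero across the sample, which makes a positive score read as "above average" on that factor.
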