R Biplot Example

The biplot capitalizes on correlations between variables in reducing the dimensionality. For example, variables x and y in the scatterplot of Exhibit 0.1 appear to have high positive correlation and would be represented in a biplot in approximately the same orientation, like x1 and x.

The ggbiplot package is an implementation of the biplot using ggplot2. The package provides two functions: ggscreeplot and ggbiplot. The ggbiplot function aims to be a drop-in replacement for the built-in R function biplot.princomp, with extended functionality for labeling groups, drawing a correlation circle, and adding Normal probability ellipsoids.
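As a rough sketch of how the two approaches might be called side by side, the snippet below fits a PCA with base R and draws a base biplot, then draws the ggbiplot versions only if that package happens to be installed. The built-in iris data set is used purely as a stand-in (an assumption for illustration; it is not the data discussed in this text), and the particular ggbiplot options shown (groups, ellipse, circle) are just one plausible combination.

```r
# Stand-in data: the four numeric columns of the built-in iris data set
# (an assumption for illustration, not the data discussed in this text).
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Base R biplot of the first two principal components.
biplot(pca, choices = 1:2)

# ggbiplot versions, run only if the package is installed.
if (requireNamespace("ggbiplot", quietly = TRUE)) {
  library(ggbiplot)
  # Scree plot of the proportion of variance explained per component.
  print(ggscreeplot(pca))
  # Biplot with points grouped by species, group ellipses,
  # and a correlation circle.
  print(ggbiplot(pca, groups = iris$Species, ellipse = TRUE, circle = TRUE))
}
```

The guard around the ggbiplot calls keeps the example runnable on a plain R installation; with the package available, the same fitted object is passed to both functions.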

1.5 Biplots and Interpretation

The meaning of the principal components can be made clear by means of a biplot, which graphically displays the results of the PCA.

What is this plot telling us? Each variable that went into the PCA has an associated arrow. Arrows for each variable point in the direction of increasing values of that variable.
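To make the "arrow points towards increasing values" idea concrete, here is a minimal sketch on made-up data. The beers data frame, its column names, and the simulated values are all hypothetical; the real beer data are not reproduced here. The sketch assumes the PCA is run on standardized variables and checks that the sign of each variable's PC1 loading matches the sign of its correlation with the PC1 scores.

```r
set.seed(1)

# Hypothetical beer data, simulated only for illustration.
n <- 50
abv <- rnorm(n, mean = 6, sd = 1.5)
rating <- 3 + 0.3 * (abv - 6) + rnorm(n, sd = 0.4)
beers <- data.frame(ABV = abv, Rating = rating)

# PCA on standardized variables.
pca <- prcomp(beers, scale. = TRUE)

# The loadings give the direction of each variable's arrow
# on the (PC1, PC2) plane.
arrow_directions <- pca$rotation[, 1:2]

# Correlation of each original variable with the PC1 scores.
cor_pc1 <- cor(beers, pca$x[, 1])[, 1]

print(arrow_directions)
print(cor_pc1)

# A negative PC1 loading goes with a negative correlation:
# lower PC1 scores then mean higher values of that variable.
print(sign(cor_pc1) == sign(pca$rotation[, 1]))
```

The overall sign of each component is arbitrary, so a rerun on other data may flip every arrow along an axis; the agreement between loading sign and correlation sign is what carries the interpretation.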

If you look at the ‘Rating’ arrow, it points towards low values of PC1 - so we know the lower the value of PC1, the higher the Drinker Rating.

If you look at the ‘ABV’ arrow, it also points towards low values of PC1 - so we know the lower the value of PC1, the higher the Alcohol Content.

So we now understand that our summary score that we obtained for each beer, that is, the value of PC1, is lower if a beer is both well-regarded and high in alcohol and higher if a beer is not well-regarded and low in alcohol.

The arrows on the biplot are actually representative of the eigenvectors (loadings), so we could just as easily obtain this information from the matrix of the loadings:
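One way the loadings matrix might be pulled out in R is sketched below, again on simulated stand-in data (the beers matrix and its values are hypothetical). It assumes the PCA is run on standardized variables, in which case the rotation element returned by prcomp should agree, up to sign, with the eigenvectors of the correlation matrix.

```r
set.seed(2)

# Hypothetical beer data, simulated only for illustration.
n <- 40
abv <- rnorm(n, mean = 6, sd = 1.5)
rating <- 3 + 0.3 * (abv - 6) + rnorm(n, sd = 0.4)
beers <- cbind(ABV = abv, Rating = rating)

# Loadings as reported by prcomp (columns are PC1, PC2).
pca <- prcomp(beers, scale. = TRUE)
loadings_prcomp <- pca$rotation

# The same loadings obtained as eigenvectors of the correlation matrix.
eig <- eigen(cor(beers))
loadings_eigen <- eig$vectors
dimnames(loadings_eigen) <- dimnames(loadings_prcomp)

# Eigenvectors are defined only up to sign, so compare absolute values.
print(round(loadings_prcomp, 3))
print(max(abs(abs(loadings_prcomp) - abs(loadings_eigen))))
```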

\[
\begin{array}{ccc}
 & PC1 & PC2 \\
ABV & -0.707 & 0.707 \\
Rating & -0.707 & -0.707
\end{array}
\]

PC1 is negatively associated with both ABV and Rating (both loadings are negative), so we expect low values of PC1 to go with high values of ABV and Rating. PC2 is positively associated with ABV and negatively associated with Rating, so we expect beers with high PC2 scores to be high in alcohol but poorly rated.
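The value 0.707 is not specific to the beer data. As a short worked check, assuming the PCA was run on the two standardized variables (so that it decomposes their correlation matrix, with r denoting the correlation between ABV and Rating, a symbol introduced here for illustration):

\[
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad
R \, \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1 + r) \, \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
R \, \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = (1 - r) \, \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}
\]

Since \(1/\sqrt{2} \approx 0.707\), every loading has absolute value 0.707 whenever \(r \neq 0\). When \(r > 0\), the first component (eigenvalue \(1 + r\)) weights both variables with the same sign and the second (eigenvalue \(1 - r\)) contrasts them, which matches the sign pattern in the loadings matrix. The overall sign of each column is arbitrary.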

Note that also from the biplot, we can see that higher ratings are associated with Stout (and not Lager) because the arrow points in the direction of the cluster of Stout points (in purple) and away from the cluster of Lager points (in green). Higher alcohol might be associated with Belgian beers (in orange) and not Wheat beers (in pink).

1.5.1 Extending the Example

What happens if we add some more data into the PCA? Let’s rerun the PCA and include a new piece of information: the year the beer was released.

\[
\begin{array}{cccc}
 & PC1 & PC2 & PC3 \\
ABV & -0.65 & 0.31 & -0.70 \\
Rating & -0.67 & 0.20 & -0.70 \\
Year & -0.37 & -0.93 & -0.08
\end{array}
\]

In this case, we see that high values of PC1 are associated with low Alcohol Content, low Drinker Rating, and older years. So low values of PC1 are associated with well-regarded beers (loading: -0.67) that are also high in alcohol (loading: -0.65). Low values of PC1 are somewhat less strongly associated with newness (loading: -0.37). PC1 explains 60.2% of the total variance, making it a fairly good summary measure.
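Written out as a formula, and assuming the scores are computed from standardized variables (the symbols \(z_{ABV}\), \(z_{Rating}\), and \(z_{Year}\) are introduced here for illustration and denote each variable after centering and scaling), the PC1 score of a beer is the loading-weighted sum:

\[
PC1 = -0.65 \, z_{ABV} - 0.67 \, z_{Rating} - 0.37 \, z_{Year}
\]

For example, a beer that sits one standard deviation above the mean on all three variables would score \(-0.65 - 0.67 - 0.37 = -1.69\) on PC1, while a beer one standard deviation below the mean on all three would score \(+1.69\).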

PC2, on the other hand (explaining 29% of the variance), is largely influenced by year (the associated loading is -0.93). This implies that there is some aspect of the beer data, independent of being well-regarded and strong, that is explained by the newness of the beer. Note that the composite measure PC2 actually explains less of the variance than any one of the original variables (ABV, Rating, Year) alone: since the total variance is 3, each variable alone would explain 33.3% of the variance.
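The variance bookkeeping above can be sketched in R as follows. The data are simulated stand-ins (the beers data frame and its values are hypothetical), so the printed proportions will not match the 60.2% and 29% quoted in the text. The point is only that with three standardized variables the total variance is 3 and the single-variable baseline is one third.

```r
set.seed(3)

# Hypothetical beer data, simulated only for illustration.
n <- 60
abv <- rnorm(n, mean = 6, sd = 1.5)
rating <- 3 + 0.3 * (abv - 6) + rnorm(n, sd = 0.4)
year <- round(runif(n, min = 1990, max = 2020))
beers <- data.frame(ABV = abv, Rating = rating, Year = year)

# PCA on standardized variables: each variable contributes variance 1.
pca <- prcomp(beers, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components.
eigenvalues <- pca$sdev^2
total_variance <- sum(eigenvalues)

# Proportion of the total variance explained by each component.
prop_explained <- eigenvalues / total_variance

# Share that any single standardized variable accounts for on its own.
baseline <- 1 / ncol(beers)

print(round(prop_explained, 3))
print(prop_explained < baseline)  # TRUE marks components weaker than one variable
```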