Scatter Matrices using pandas
A scatter matrix plots every pair of numeric columns against each other, which makes it easy to spot trends in the data. pandas draws scatter matrices with matplotlib.
We start with our imports and tell matplotlib to display visuals inline:
In [1]:
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
%matplotlib inline
plt.style.use('ggplot')
In [2]:
# Load some data
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris['data'], columns=iris['feature_names'])
iris_df['species'] = iris['target']
# In current pandas versions scatter_matrix lives in pd.plotting
pd.plotting.scatter_matrix(iris_df, alpha=0.2, figsize=(10, 10))
plt.show()
Instead of histograms on the diagonal, we can show each variable's distribution with a smoother and arguably more aesthetically pleasing kernel density estimate (KDE).
KDE is a non-parametric way to estimate the probability density function of a variable: it makes no assumption about the shape of the underlying distribution.
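To make the idea concrete, here is a minimal sketch of a one-dimensional Gaussian KDE written with NumPy only. It is not the code pandas runs internally. The function name `gaussian_kde_1d`, the hand-picked `bandwidth` and the synthetic sample data are all assumptions made for illustration, and libraries normally choose the bandwidth automatically.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate on `grid` (illustrative helper)."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance between every grid point and every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # Standard normal kernel evaluated at each scaled distance
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the estimate integrates to 1
    return kernels.sum(axis=1) / (len(samples) * bandwidth)


# Synthetic stand-in data (an assumption, not the iris measurements)
rng = np.random.default_rng(0)
samples = rng.normal(loc=5.8, scale=0.8, size=150)

# Evaluate the estimate on a grid that extends well past the data
grid = np.linspace(samples.min() - 3, samples.max() + 3, 500)
density = gaussian_kde_1d(samples, grid, bandwidth=0.3)

# A density should be non-negative and integrate to roughly 1
area = np.sum(density) * (grid[1] - grid[0])
```

Each sample contributes one small bell curve, and the estimate is their average. A larger `bandwidth` gives a smoother curve, while a smaller one follows the individual samples more closely.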
In [3]:
# diagonal='kde' draws a density estimate instead of a histogram (requires SciPy)
pd.plotting.scatter_matrix(iris_df, alpha=0.2, figsize=(10, 10), diagonal='kde')
plt.show()