Reminder: Please complete the O’Reilly login steps (see the Accessing O’Reilly link in the Week 1 Resources area). You will receive an email that allows your browser to save your password, which prevents access problems in the future.
Mukhiya, S. K., & Ahmed, U. (2020). Hands-on exploratory data analysis with Python. Packt Publishing. Read all sections between Grouping Datasets and Summary.
These sections describe the end-to-end process of exploratory data analysis and the key concepts involved in carrying out a complete analysis.
Datar, R. (2019). Hands-on exploratory data analysis with R. Packt Publishing. Read all sections between Manipulating and Mutating Data and Summary.
These sections describe the basics of exploratory data analysis and the key concepts involved in setting up the data analysis environment.
Ciaburro, G. (2018). Regression analysis with R: Design and develop statistical nodes to identify unique relationships within data. Packt Publishing. Read all sections between Dimensionality Reduction and Summary.
These sections describe how to reduce the number of dimensions (variables) through feature selection or feature extraction while keeping the loss of information to a minimum.
Fandango, A. (2017). Python data analysis (2nd ed.). Packt Publishing. Read all sections between Statistics and Linear Algebra and Summary.
These sections discuss the statistics and linear algebra behind exploratory data analysis. They also demonstrate how to use the statsmodels library to compute the five-number summary (minimum, first quartile, median, third quartile, and maximum).
Martin, O. (2018). Bayesian analysis with Python (2nd ed.). Packt Publishing. Read all sections between Mixture Models and Summary.
These sections discuss how mixtures of Gaussian distributions can be used to detect subgroups within the data.
Grogan, M. (2018). Machine learning in R – Automated algorithms for business analysis. O'Reilly Media, Inc. Watch the video section "Clustering (K-Means)," including the segments "Cluster Determination: Within Groups Sum of Squares" and "K-Means Clustering."
These video segments demonstrate how to choose the number of clusters using the within-groups sum of squares and how to run a k-means clustering algorithm.
Fandango, A., Idris, I., & Navlani, A. (2021). Python data analysis (3rd ed.). Packt Publishing. Read the section Unsupervised Learning – PCA and Clustering.
This section demonstrates how to perform principal component analysis (PCA), a dimensionality reduction technique, and clustering in Python.