
Data Science Ph.D. Program

A website for the Data Science students in the Doctorate Program

Data Science and Artificial Intelligence

Data science / John D. Kelleher and Brendan Tierney.

Kelleher, J. D., & Tierney, B. (2018). Data science. The MIT Press.

Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking

Provost, F., & Fawcett, T. (2013). Data science for business: What you need to know about data mining and data-analytic thinking. O'Reilly Media.

Data Scientist: The Sexiest Job of the 21st Century

Davenport, T. H., & Patil, D. J. (2012, October). Data scientist: The sexiest job of the 21st century. Harvard Business Review.

The Data Notebook

Rampsy K. (2020). The Data Notebook. What is Data-Driven Research, Pressbooks.pub
Read Chapter 1: What is Data-Driven Research? Understanding the different types of research methods is essential to ensure that the correct method is used during a study. In this chapter, you will be introduced to DDR or data-driven research.

Bayesian Analysis with Python

Martin, O. (2018, December). Bayesian analysis with Python - Second Edition. Packt Publishing.

Ethical Data Mining Applications for Socio-Economic Development

Rahman, H., & Ramos, I. (Eds.). (2013). Ethical data mining applications for socio-economic development. IGI Publishing.

Ethics of data mining

Cook, J. (2005). Ethics of data mining. Rochester Institute of Technology.

Data Ethics

Samiksha S., Jossy P. G., & Kapil T. (2022). Data Ethics and Challenges.

Front Matter - Data Ethics and Challenges

Data Ethics and Challenges, 2022

Data and Its Dimensions

Samiksha S., Jossy P. G., & Kapil T. (2022). Data Ethics and Challenges.

Culture of Ethics in Adopting Learning Analytics

Augmented Intelligence and Intelligent Tutoring Systems, 2023, Volume 13891

Dimitrios Tzimas, Stavros Demetriadis


Business Statistics: For Contemporary Decision Making, 7th Edition

Black, K. (2011). Business statistics: For contemporary decision making (7th ed.). John Wiley.

 

NIST/SEMATECH Engineering Statistics Handbook

Smeaton, A. (2003). NIST/SEMATECH Engineering Statistics Handbook. https://www.itl.nist.gov/div898/handbook/

INFORMS Analytics Body Of Knowledge

Cochran, J.J. (2018, October). INFORMS analytics body of knowledge. Wiley.

 

Python Data Analysis – Third Edition

Navlani, A., Fandango, A., & Idris, I. (2021, February). Python data analysis-third edition. Packt Publishing.

 

Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators

Hsing, T., & Eubank, R. (2015). Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley.

Hands-on Exploratory Data Analysis with R: Become an Expert in Exploratory Data Analysis using R Packages

Datar, R., & Garg, H. (2019). Hands-on exploratory data analysis with R: Become an expert in exploratory data analysis using R packages. O'Reilly Media, Inc.

 

Practical Feature Engineering

Dunning, T. (2020). Practical feature engineering. O’Reilly Media.

Data Mining: Concepts and Techniques, 3rd Edition

Han, J., Kamber, M., & Pei, J. (2011). Data mining: Concepts and techniques (3rd ed.). Morgan Kaufmann - O’Reilly.

An Introduction to Statistical Learning with Applications in R

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2023). An introduction to statistical learning with applications in R (2nd ed.). Springer International Publishing.

An introduction to statistical learning with applications in Python

James, G., Witten, D., Hastie, T., Tibshirani, R., & Taylor, J. (2023). An introduction to statistical learning with applications in Python. Springer Nature.

Pattern Recognition and Machine Learning

Bishop, C. M. (2016). Pattern recognition and machine learning.

Deep Learning

Heaton, J. (2018). Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep learning: The MIT Press, 2016

Data Mining and Data Warehousing Principles and Practical Techniques

Bhatia, P. (2019). Data mining and data warehousing: Principles and practical techniques. Cambridge University Press.

Data Mining: Concepts and Techniques

Han, J., Kamber, M., & Pei, J. (2012). Data mining: Concepts and techniques (3rd ed.). Elsevier.

Probabilistic Machine Learning : An Introduction

Murphy, K.P. (2022). Probabilistic machine learning : An introduction. The MIT Press.

The Elements of Statistical Learning Data Mining, Inference, and Prediction

Hastie, T., Tibshirani, R., & Friedman, J. (2017). The elements of statistical learning: Data mining, inference, and prediction (2nd ed., Springer Series in Statistics). Springer.

Outlier Analysis

Aggarwal, C. C. (2013). Outlier analysis. Springer.

Association Analysis

Kotu, V., & Deshpande, B. (2018). Association analysis. In Data science: Concepts and practice (2nd ed.). Morgan Kaufmann.

Deep Learning

Kelleher, J. D. (2019). Deep learning. The MIT Press.

Machine Learning with TensorFlow / Chris Mattmann 

Mattmann, C. A. (2020). Machine Learning with TensorFlow (S. Penberthy (Ed.); Second edition.). Manning.

Aurélien Géron on Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, third edition

Aurélien Géron. (2023, January 1). O’Reilly Book Club: Aurélien Géron on Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, third edition. O’Reilly Media, Inc.

 

Building Machine Learning Powered Applications: Going from Idea to Product by Emmanuel Ameisen (2022) 

Ameisen, E. (2020). Building machine learning powered applications : going from idea to product.

Deep Learning with Python, Second Edition by François Chollet (2022) 

Francois Chollet. (2021, January 1). Deep Learning with Python, Second Edition, Video Edition. Manning Publications.

Deep Learning for Coders with fastai and PyTorch, Revised Edition by Jeremy Howard and Sylvain Gugger (2022)

Jeremy Howard, & Sylvain Gugger. (2020). Deep Learning for Coders with fastai and PyTorch. O’Reilly Media, Inc.

Generative Adversarial Networks applied to Telecom Data - Using GANs to generate synthetic features regarding Wi-Fi signal quality

Espindola, T. S. (2021). Generative Adversarial Networks applied to Telecom Data - Using GANs to generate synthetic features regarding Wi-Fi signal quality.

Advanced Deep Learning with Python: Design and Optimize Neural Network Models by Ivan Vasilev (2023) 

Vasilev, I. (2019). Advanced Deep Learning with Python. Packt Publishing.

Transformers for Natural Language Processing by Denis Rothman (2022) 

Rothman, D. (2024). Transformers for Natural Language Processing and Computer Vision : Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3. Packt Publishing.

Natural Language Processing with Transformers, Revised Edition by Lewis Tunstall, Leandro von Werra, and Thomas Wolf (2022) 

Lewis Tunstall, & Leandro von Werra. (2023, January 1). O’Reilly Book Club: Lewis Tunstall and Leandro von Werra on Natural Language Processing with Transformers, revised edition. O’Reilly Media, Inc.

Introduction to transformers for NLP. With the Hugging Face Library and Models to Solve Problems. 

Jain, S. M. (2022). Introduction to transformers for NLP. With the Hugging Face Library and Models to Solve Problems. 

 

Digital Twins in Urban Informatics

Goodchild, M.F., Connor, D., Fotheringham, A.S. et al. Digital twins in urban informatics. Urban Info 3, 16 (2024). https://doi.org/10.1007/s44212-024-00048-6

Digital Twins Technologies

Wang, W. et al. (2024). Digital Twins Technologies. In: Digital Twin Technologies in Transportation Infrastructure Management. Springer, Singapore. https://doi.org/10.1007/978-981-99-5804-7_2

Digital Twins in Medicine

Laubenbacher, R., Mehrad, B., Shmulevich, I. et al. Digital twins in medicine. Nat Comput Sci 4, 184–191 (2024). https://doi.org/10.1038/s43588-024-00607-6

The Rise of Digital Twins

There has been a growing interest and enthusiasm in using digital twins to accelerate scientific discovery and to help researchers and stakeholders with critical decision-making tasks. Various areas of science – including, but not limited to, engineering, climate sciences, medicine, and social sciences – have realized the potential of digital twins for bringing value and innovation to myriad applications. Nevertheless, many challenges still need to be addressed before the research community can bring the promise of digital twins to fruition. This Focus highlights the state of the art, challenges, and opportunities in developing and using digital twins across different domains, with the goal of fostering discussion and collaboration within the computational science community regarding this burgeoning field.

The role of computational science in digital twins

Willcox, K., Segundo, B. The role of computational science in digital twins. Nat Comput Sci 4, 147–149 (2024). https://doi.org/10.1038/s43588-024-00609-4

An Introduction to Statistical Learning with Applications in R

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2023). An introduction to statistical learning with applications in R (2nd ed.). Springer International Publishing.

Introduction to Probability

Blitzstein, J. K., & Hwang, J. (2015). Introduction to probability. CRC Press.

The Theory of Probability : Explorations and Applications

Venkatesh, S. S. (2012). The theory of probability : Explorations and applications. Cambridge University Press.

Designing Surveys: A Guide to Decisions and Procedures

Blair, J., Czaja, R. F., & Blair, E. A. (2013). Designing surveys: A guide to decisions and procedures. Sage publications.

An Introduction to Bootstrap Methods with Applications to R

Chernick, M. R., & LaBudde, R. A. (2011). An Introduction to Bootstrap Methods with Applications to R. John Wiley & Sons.

VIT: Visual Inference Tools

VIT is a developing collection of software modules for use in teaching and learning experiences aimed at developing the core concepts of statistical inference. VIT is desktop software.

Forecasting: Principles and Practice

Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and practice (3rd ed.). OTexts.

Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis

McCue, C. (2006). Data mining and predictive analysis: Intelligence gathering and crime analysis. Elsevier Science & Technology.

Data Visualization: A Successful Design Process

Kirk, A. (2020). Data visualization: A successful design process. O’Reilly.

Data Visualization and Graph Types

The Data Visualisation Catalogue. (2023). Datavizcatalogue.com. https://datavizcatalogue.com/

Handbook of Data Visualization

Chen, C., Härdle, W., & Unwin, A. (2008). Handbook of data visualization. Springer-Verlag.

Storytelling with Data

Knaflic, C. N. (2019). Storytelling with data. O’Reilly.

The Data Visualization Lifecycle

Meeks, E. (2022). The data visualization lifecycle [Video file]. Addison-Wesley Professional.

Please refer to the specific subject above. In this tab you will find the AGI books we recommend.

Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking

Provost, F., & Fawcett, T. (2013). Data science for business: What you need to know about data mining and data-analytic thinking. O'Reilly Media.

Data Science

Kelleher, J. D., & Tierney, B. (2018). Data science. The MIT Press Essential Knowledge Series.
Kelleher, one of the pioneers in the data science field, gives an introduction to data science and its components. Please read the first chapter up to page 28.

The Data Notebook

Rampsy K. (2020). The Data Notebook. What is Data-Driven Research, Pressbooks.pub
Read Chapter 1: What is Data-Driven Research? Understanding the different types of research methods is essential to ensure that the correct method is used during a study. In this chapter, you will be introduced to DDR or data-driven research.

Data Science and Machine Learning with R from A-Z Course [Video]

Galvan, J. E. (2020). Data Science and Machine Learning with R from A-Z Course [Video]. O’Reilly.

 

R Programming for Statistics and Data Science

R Programming for Statistics and Data Science (Media from Packt Publishing available freely through O’Reilly Media Inc.). (2018).

 

R in Action, Third Edition

Kabacoff, R. (2022). R in Action, Third Edition. O’Reilly Online Learning. https://learning.oreilly.com/library/view/r-in-action/9781617296055/

 

R for Data Science

Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for data science. O'Reilly Media, Inc.

 

An Introduction to Statistical Learning with Applications in R

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2023). An introduction to statistical learning with applications in R (2nd ed.). Springer International Publishing.

 

RStudio: Integrated Development Environment for R

Posit Team. (2023). RStudio: Integrated development environment for R. Posit Software, PBC. Retrieved from http://www.posit.co/

 

The R Project for Statistical Computing

R Core Team. (2023). R: A language and environment for statistical computing. Retrieved from https://www.R-project.org/

 

An Introduction to Bootstrap Methods with Applications to R

Chernick, M. R., & LaBudde, R. A. (2011). An Introduction to Bootstrap Methods with Applications to R. John Wiley & Sons.

 

Introduction to Data Science in Biostatistics

MacFarland, T. W. (2024). Introduction to Data Science in Biostatistics : Using R, the Tidyverse Ecosystem, and APIs. https://doi.org/10.1007/978-3-031-46383-9

 

Python: Data Analytics and Visualization

Vo, T. H. P., Czygan, M., Kumar, A., & Raman, K. (2017, March). Python: Data analytics and visualization. Packt Publishing.

Python for data analysis, 2nd ed.

McKinney, W. (2017). Python for data analysis, 2nd ed. O’Reilly Media.

 

Data Analysis Using SQL and Excel

Linoff, G. S. (2015). Data analysis using SQL and Excel. John Wiley & Sons, Incorporated.