Data Science Ph.D. Program

A website for the Data Science students in the Doctorate Program

Programming Resources

RStudio: Integrated Development Environment for R

Posit Team. (2023). RStudio: Integrated development environment for R. Posit Software, PBC. Retrieved from http://www.posit.co/

The R Project for Statistical Computing

R Core Team. (2023). R: A language and environment for statistical computing. Retrieved from https://www.R-project.org/

R Markdown: The Definitive Guide

Xie, Y., Allaire, J. J., & Grolemund, G. (2018). R markdown: The definitive guide. CRC Press. https://bookdown.org/yihui/rmarkdown/

Hands-On Programming with R

R Programming

R for Data Science

R Programming in One Hour

Advantages and Disadvantages of Reference Storage Methods

Reference Management Software (EndNote, Zotero, Mendeley, RefWorks)
Advantages:
- Automated citation and bibliography generation.
- Easy organization with tagging, folders, and search functions.
- Integration with word processing software for seamless writing and citing.
- Ability to import references directly from academic databases.
Disadvantages:
- Learning curve to master software features.
- Potential for software costs, although some tools offer free versions.
- Risk of data loss if not regularly backed up or synced.

Spreadsheets (Excel, Google Sheets)
Advantages:
- Customizable format to suit personal preferences and specific needs.
- Free to use and widely accessible.
- Easy to sort and filter references based on various criteria.
Disadvantages:
- Time-consuming manual entry and updating process.
- Limited advanced features for citation and bibliography management.
- Higher potential for human error in data entry.

Manual Filing Systems (physical folders, digital folders)
Advantages:
- No need for learning new software tools.
- Can be organized in a way that makes sense to the researcher.
- Useful for managing hard-copy articles and physical books.
Disadvantages:
- Difficulty in searching and retrieving specific references quickly.
- Lack of integration with word processing software for automatic citations.
- Takes up physical space and can become unwieldy with many references.

EndNote. (2024). EndNote Reference Management Software. https://endnote.com/

Google Sheets. (2024). Google Sheets. https://www.google.com/sheets/about/

Microsoft Excel. (2024). Microsoft Excel. https://www.microsoft.com/en-us/microsoft-365/excel

Mendeley. (2024). Mendeley Reference Manager. https://www.mendeley.com/

RefWorks. (2024). RefWorks. https://refworks.proquest.com/

Zotero. (2024). Zotero. https://www.zotero.org/
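
To make the spreadsheet option more concrete, here is a minimal Python sketch of sorting and filtering a reference list stored as CSV, the format Excel and Google Sheets can export. The column names (author, year, title, tag), the tag values, and the helper functions are assumptions chosen for illustration; the three entries are taken from the citations earlier on this page. The citation string is still assembled by hand, which is the limitation noted under Disadvantages.

```python
import csv
import io

# Hypothetical reference list in CSV form; column names and tags are assumptions.
CSV_TEXT = """author,year,title,tag
Xie,2018,R markdown: The definitive guide,r
R Core Team,2023,R: A language and environment for statistical computing,r
Posit Team,2023,RStudio: Integrated development environment for R,ide
"""


def load_references(text):
    """Parse CSV text into a list of dicts, converting the year to an integer."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["year"] = int(row["year"])
    return rows


def filter_by_tag(rows, tag):
    """Keep only the references whose tag column matches exactly."""
    return [row for row in rows if row["tag"] == tag]


def sort_by(rows, key):
    """Return a new list sorted by the given column."""
    return sorted(rows, key=lambda row: row[key])


references = load_references(CSV_TEXT)
r_refs = sort_by(filter_by_tag(references, "r"), "year")

# Manual formatting: nothing here generates a citation style automatically.
for ref in r_refs:
    print(f'{ref["author"]} ({ref["year"]}). {ref["title"]}.')
```

A mistyped tag or year in the CSV would silently change the filter or sort result, which is the kind of data-entry error the comparison warns about.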

 

Google Colab

Google Colaboratory. (2022). Google.com. https://colab.research.google.com/notebooks/intro.ipynb
Sign in with a Google account, then in Google Drive click New | More | Google Colaboratory to open a notebook. Colab is a free, hosted Python notebook platform that can support univariate analysis projects. The Getting Started section under the Table of Contents is helpful for those new to the environment.
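
As a rough idea of what a first univariate analysis cell in a notebook might look like, the sketch below summarizes a single numeric variable using only the Python standard library. The sample scores and the describe helper are hypothetical and not part of any course material; a real project would more likely load data from a file and may use other libraries.

```python
import statistics


def describe(values):
    """Return basic univariate summary statistics for a list of numbers."""
    ordered = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation
        "min": ordered[0],
        "max": ordered[-1],
        "iqr": q3 - q1,
    }


# Hypothetical sample: one variable (exam scores) with eight observations.
scores = [72, 85, 90, 66, 78, 95, 88, 70]
summary = describe(scores)

for name, value in summary.items():
    print(f"{name}: {value}")
```

Pasting this into a new notebook cell and running it should print one line per statistic.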

GitHub Docs

GitHub docs. (n.d.). GitHub. https://docs.github.com/en/get-started/start-your-journey/hello-world

Git started on your first repository in the third installment of GitHub for Beginners. Discover the essential features and settings to manage your projects effectively.

Google Colaboratory

Google Colaboratory. (n.d.). Google Colab. https://colab.google/

Colab is a hosted Jupyter Notebook service that requires no setup to use and provides free access to computing resources, including GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units). Colab is especially well suited to machine learning, data science, and education.

Posit Cloud Documentation

Posit Cloud Documentation. (n.d.). Posit. https://docs.posit.co/cloud/

Posit makes it easy to deploy open-source data science work across the enterprise safely and securely. Share Jupyter notebooks, Plotly dashboards, or interactive applications built with popular R and Python frameworks. You may want to review documentation on how to set up a Posit Cloud Account.

GitHub Learning Labs

GitHub Learning Labs: https://github.com/apps/github-learning-lab

NVIDIA LAMBDA LABS

https://lambdalabs.com/service/gpu-cloud/faqs

Lambda's Private Cloud is intended for organizations that require a single-tenant cluster with more than 512 GPUs. Private Cloud customers have low-level access to their cluster infrastructure; Managed Kubernetes, Preinstalled Kubernetes, and Slurm are available. ML engineers and researchers have put together guides and tutorials to get you up and running quickly on Lambda's cloud and hardware solutions.

 

VIT: Visual Inference Tools

VIT is a growing collection of desktop software modules for teaching and learning the core concepts of statistical inference.

Data manipulation tools

U.S. Department of Energy Office of Science. (n.d.). Data manipulation tools. Advanced Photon Source