Content: Practical guides to data analysis, consisting of peer-reviewed datasets and tools for managing data.
Purpose: Use to learn and practice data analysis, including cleaning and normalizing data.
These websites may contain government data or publications that are no longer on the web.
Internet Archive's Wayback Machine: The Internet Archive has been crawling and preserving websites since the late 1990s.
DataLumos: An ICPSR archive for government data resources. DataLumos accepts deposits of public data resources from the community, as well as recommendations of public data resources for ICPSR itself to add to the archive.
Dataverse: Researcher-deposited data from many disciplines. Includes some public datasets used in research projects for replication or archiving purposes.
Dimensions: Register for a free account to search journal publications and datasets. Datasets are generally tied to a publication, but some are copies of public datasets that the authors have archived.
ICPSR: Social sciences data repository, including many government datasets.
IPUMS: Primarily provides access to public use microdata from government surveys, including data on demographics, socioeconomic factors, health, and higher education, as well as international data. IPUMS also hosts copies of the historical volumes of the Decennial Census of Population and Housing that were recently removed from the web.
NIH Figshare: Find NIH-funded data in this repository.
PolicyMap: Multidisciplinary interactive online mapping and visualization tool that provides access to data, including some government data and some private-sector data.
Public Environmental Data Partners: In the process of archiving 57 high-priority, climate-change-related datasets. Currently available: CDC’s Social Vulnerability Index and Environmental Justice Index, the Council on Environmental Quality's EJScorecard, and the Climate and Economic Justice Screening Tool.
A dataset (also spelled ‘data set’) is a collection of raw statistics and information generated by a research study. Datasets produced by government agencies or non-profit organizations can usually be downloaded free of charge. However, datasets developed by for-profit companies may be available for a fee.
Most datasets can be located by identifying the agency or organization that focuses on your research area of interest. For example, if you are interested in public opinion on social issues, Pew Research Center would be a good place to look. For population data, the U.S. Census Bureau’s Population Estimates Program would be a good source; its data, formerly distributed through American FactFinder, is now available through data.census.gov.
An “open data” philosophy, the belief that data should be freely accessible, is becoming more common among governments and business organizations around the world. Open data efforts have been led both by governments and by non-governmental organizations such as the Open Knowledge Foundation. Learn more by exploring The Open Data Handbook. Two related trends are also helping to drive the availability and accessibility of datasets and statistics: “Big Data,” in which extremely large amounts of data are analyzed for new and interesting perspectives, and data visualization.
Don't know where to begin? Here is a quick view of our recommendations.
| Site | Structure | Source Type | Topics |
| --- | --- | --- | --- |
| Data.gov | Repository | Public | U.S. Environment, Climate, Health, Government |
| DataPlanet | Repository | Public | Multidisciplinary |
| Dept. of Education | Website | Public | Education, Educational Institutions |
| Dryad | Repository | Public | Health, Biology |
| Google Dataset Search | Search Engine | Public | Multidisciplinary |
| Harvard Dataverse | Repository | Public | Multidisciplinary, *Social Sciences |
| Healthdata.gov | Repository | Public | Health, Healthcare |
| ICPSR | Repository | 3rd Party | Multidisciplinary, *Social Sciences |
| Kaggle | Repository | Public | Multidisciplinary |
| Mendeley Data | Search Engine | Public | Multidisciplinary |
| National Artificial Intelligence Research Resource Pilot (NAIRR Pilot) | Repository | Public | AI, Computer Science, Multidisciplinary |
| NCES | Repository | Public | Education, Educational Institutions |
| Pew Research Center | Website | Public | Social Science Demographics, Trends |
| Quandl (now Nasdaq Data Link) | Repository | Mixed | Financial, Business |
| Re3 | Registry of Repositories | Public | n/a |
| Registry of Open Data on AWS | Search Engine | Mixed | Multidisciplinary |
| Statista (contains mostly aggregated data; raw data may be available via the "source link") | Database | Subscription (provided by NU Library) | All, *Business |
| Zenodo | Repository | Public | Multidisciplinary |
* Indicates that datasets on this topic are prominent in the source
For additional information about locating statistics, please see our Statistics page.
Sources for statistics on hospitals and/or hospital spending.
A list of public datasets by topic, from the Society of General Internal Medicine.
Not sure where or how to start your data search? This webinar provides a basic overview of how to find datasets using Google Dataset Search and other dataset directories and repositories, and will answer any questions you bring to the session.
Google Dataset Search is a search engine that indexes metadata for millions of datasets in thousands of repositories across the Web. Similar to how Google Scholar works, Google Dataset Search lets you find datasets wherever they’re hosted, whether on a publisher's site, in a digital library, or on an author's personal web page.
Dataset Search can be useful to a broad audience, whether you're looking for scientific data, government data, or data provided by news organizations. Simply enter what you are looking for, and the results will guide you to the published dataset on the repository provider’s site.
Persistent links to datasets may be found by clicking on the share icon. You can then copy and paste the link to share or save the location.
Content: The Association for Computing Machinery (ACM) Digital Library is a research, discovery, and networking platform. The database provides journals, conference proceedings, technical magazines, newsletters, and books.
Purpose: An essential database for computing and technology research topics.
Special Features: Provides a list of authors after an initial topic search, a dataset search filter, and the ability to sort results by most cited.
Use the following steps to locate the actual dataset used in a research article within the ACM Digital Library database.
Content: Full-text peer-reviewed journals, transactions, magazines, conference proceedings, and published standards in the areas of electrical engineering, computer science, and electronics.
Purpose: Users may find technology industry information.
Special Features: Users may search for datasets.
To limit to full-text only, change the results from "All Results" to "My Subscribed Content".
Use the following steps to locate the actual dataset used in a research article within the IEEE Xplore Digital Library database.
© Copyright 2025 National University. All Rights Reserved.