
Research Process

These pages offer an introduction to the research process at a very general level.

Datasets

A dataset (also spelled ‘data set’) is a collection of raw statistics and information generated by a research study. Datasets produced by government agencies or non-profit organizations can usually be downloaded free of charge. However, datasets developed by for-profit companies may be available for a fee.

Most datasets can be located by identifying the agency or organization that focuses on your research area of interest. For example, if you are interested in learning about public opinion on social issues, Pew Research Center would be a good place to look. For data about population, the U.S. Census Bureau’s Population Estimates Program would be a good source; its data is available through data.census.gov, which replaced American FactFinder in 2020.

An “open data” philosophy, the belief that data should be freely accessible, is becoming more common among governments and business organizations around the world. Open data efforts have been led by both government and non-government organizations, such as the Open Knowledge Foundation. Learn more by exploring The Open Data Handbook. Two related trends are also helping to drive the availability and accessibility of datasets and statistics: “Big Data,” in which extremely large amounts of data are analyzed for new and interesting perspectives, and data visualization.

Don't know where to begin? Here is a quick view of our recommendations.

Site | Structure | Source Type | Topics
Data.gov | Repository | Public | U.S. Environment, Climate, Health, Government
DataPlanet | Repository | Public | Multidisciplinary
Dept. of Education | Website | Public | Education, Educational Institutions
Dryad | Repository | Public | Health, Biology
Google Dataset Search | Search Engine | Public | Multidisciplinary
Harvard Dataverse | Repository | Public | Multidisciplinary, *Social Sciences
Healthdata.gov | Repository | Public | Health, Healthcare
ICPSR | Repository | 3rd Party | Multidisciplinary, *Social Sciences
Kaggle | Repository | Public | Multidisciplinary
Mendeley Data | Search Engine | Public | Multidisciplinary
National Artificial Intelligence Research Resource Pilot (NAIRR Pilot) | Repository | Public | AI, Computer Science, Multidisciplinary
NCES | Repository | Public | Education, Educational Institutions
Pew Research Center | Website | Public | Social Science Demographics, Trends
Quandl | Repository | Mixed | Financial, Business
Re3 | Registry of Repositories | Public | n/a
Registry of Open Data on AWS | Search Engine | Mixed | Multidisciplinary
Statista (contains mostly aggregated data; raw data may be available by clicking the "source link") | Database | Subscription (provided by NU Library) | All, *Business
Zenodo | Repository | Public | Multidisciplinary

* Indicates that datasets on this topic are prominent in the source

For additional information about locating statistics, please see our Statistics page.

Subject Specific and Additional Dataset Resources

Health Dataset Sites

Sources for statistics on hospitals and/or hospital spending.

A list of public datasets by topic, from the Society of General Internal Medicine.

Digging for Data Webinar

Not sure where or how to start your data search? This webinar provides a basic overview of how to find datasets using Google Dataset Search and other dataset directories and repositories, and answers any questions you bring to the session.

Searching for Datasets Online

Google Dataset Search is a search engine across metadata for millions of datasets in thousands of repositories across the Web. Similar to how Google Scholar works, Google Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher's site, a digital library, or an author's personal web page.

Dataset Search can be useful to a broad audience, whether you're looking for scientific data, government data, or data provided by news organizations. Simply enter what you are looking for, and the results will guide you to the published dataset on the repository provider’s site.

Screenshot of search results for Google Dataset Search

Persistent links to datasets may be found by clicking on the share icon. You can then copy and paste the link to share or save the location.

Screenshot showing the share feature in Google Dataset Search

  • To find open data for a particular U.S. state or country, try using a search engine and the keywords: open data [name of state or country], as shown in the image below.

Screenshot of Google search results for search terms arizona open data.

  • You can also search Google for datasets by typing in your topic followed by the keywords "raw data" or "datasets". For example: barriers to AI adoption raw data, or barriers to AI adoption datasets.
  • Lastly, you can restrict a Google search to the .xls file type, which returns Excel documents that might contain raw data. Write the operator with no space after the colon. For example: artificial intelligence filetype:xls
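
If you run these searches often, the three query patterns above can be generated with a short script. The following is a minimal Python sketch, not an official tool: the function names are hypothetical, and the base address https://www.google.com/search with a q parameter is assumed here only for illustration. Note that the file type operator is written without a space after the colon.

```python
from urllib.parse import urlencode

# Assumed base address for a Google web search (illustrative only).
GOOGLE_SEARCH = "https://www.google.com/search"


def open_data_query(place: str) -> str:
    """Keywords for finding open data for a state or country."""
    return f"open data {place}"


def raw_data_query(topic: str, keyword: str = "raw data") -> str:
    """Topic followed by a keyword such as 'raw data' or 'datasets'."""
    return f"{topic} {keyword}"


def filetype_query(topic: str, extension: str = "xls") -> str:
    """Topic restricted to one file type (no space after the colon)."""
    return f"{topic} filetype:{extension}"


def search_url(query: str) -> str:
    """URL-encode the query so it can be pasted into a browser."""
    return f"{GOOGLE_SEARCH}?{urlencode({'q': query})}"


if __name__ == "__main__":
    queries = [
        open_data_query("arizona"),
        raw_data_query("barriers to AI adoption"),
        raw_data_query("barriers to AI adoption", keyword="datasets"),
        filetype_query("artificial intelligence"),
    ]
    for query in queries:
        print(query)
        print("  " + search_url(query))
```

Running the script prints each query followed by a link you can open in a browser. Changing the extension argument (for example, to "csv") is one way to look for other spreadsheet-style files.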

Locating an Original Dataset from a Journal Article

Use the following steps to locate the actual dataset used in a research article within the ACM Digital Library database. 

  1. Access the ACM Digital Library database from the A-Z Databases List
  2. Using the search box, enter your keyword terms to locate relevant research articles on your topic. 
  3. Datasets from a research article may be included as Zip files or Txt files. Using the filters on the left-hand side of the search results page, under Refine by Publications, limit your results to Zip or Txt under Content Formats.
     ACM Digital Library search results refined by content formats
  4. Review your results and ensure that icons for Artifacts are included. Artifacts include digital objects that were either created by the authors to be used as part of the study or generated by the experiment itself. For example, artifacts can be software systems, scripts used to run experiments, input datasets, raw data collected in the experiment, or scripts used to analyze results. Below is a list of badges that denote the availability of datasets, including Artifacts Available, Artifacts Evaluated-Functional and Artifacts Evaluated-Reusable:
     Artifact badges in ACM Digital Library
     Below is a sample search result:
     Sample search result in ACM Digital Library with Artifacts badges displayed
  5. To access the artifacts for a particular search result, click on the article link. You will be routed to the resource page. Using the Source Material tab, review the content links under Linked Artifacts to access them.
     ACM search result with linked artifacts under Source Materials

Use the following steps to locate the actual dataset used in a research article within the IEEE Xplore Digital Library database. 

  1. Access the IEEE Xplore Digital Library database from the A-Z Databases List
  2. Using the search box, enter your keyword terms to locate relevant research articles on your topic.
     IEEE Xplore Digital Library search box
  3. Using the filters on the left-hand side of the search results page, select Datasets under Supplemental Items and apply the filter. Search results will reflect the full research article available in a PDF or HTML format along with access to the dataset, which is indicated by the "dataset icon."
     Search Filters in IEEE Xplore Digital Library showing limiter for Datasets
     IEEE search result with a dataset
