
Research Data Toolkit

This guide helps researchers, data creators, and others who manage digital data as part of a research project to plan, organize, describe, share, and preserve their research data for the long term.

README File Creation

A README file is an essential component when sharing datasets. It provides context and instructions, helping others understand and effectively use the data. Below are the steps and key elements to include in a README file:

1. Title

  • Provide a clear and descriptive title for your dataset.

2. Introduction

  • Dataset Name: The name of your dataset.
  • Description: A brief overview of what the dataset contains and its purpose.
  • Authors: List the authors or creators of the dataset.
  • Date of Creation: The date when the dataset was created.

3. Dataset Description

  • Contents: Describe what is included in the dataset (e.g., number of files, types of files).
  • Variables: List and explain each variable or column in the dataset, including data types and possible values.
  • Data Collection: Explain how the data was collected, including any relevant methodologies or instruments used.
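
A first draft of the variable list can be generated from the data itself and then edited by hand. The following is a minimal sketch, assuming the dataset is a CSV file with a header row; the sample data, the column names, and the helper names `infer_type` and `describe_columns` are hypothetical and used only for illustration. The inferred types are rough guesses and do not replace a written explanation of each variable.

```python
import csv
import io


def infer_type(values):
    """Guess a simple type label for a column from its non-empty string values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for label, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "text"


def describe_columns(csv_text, max_examples=3):
    """Return one dict per column: name, inferred type, and a few example values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    summary = []
    for name in rows[0].keys():
        # Treat missing cells as empty strings so they are skipped below.
        values = [(row[name] or "") for row in rows]
        examples = sorted(set(v for v in values if v != ""))[:max_examples]
        summary.append(
            {"variable": name, "type": infer_type(values), "examples": examples}
        )
    return summary


# Hypothetical sample data, for illustration only.
sample = "site_id,temperature_c,habitat\n1,12.5,forest\n2,14.0,wetland\n3,,forest\n"
for entry in describe_columns(sample):
    print(f"{entry['variable']}: {entry['type']}, e.g. {', '.join(entry['examples'])}")
```

The printed lines can be pasted into the Variables part of the README and extended with units, allowed values, and the meaning of any missing-value codes.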

4. File Information

  • File Structure: Describe the structure and format of the files (e.g., CSV, Excel, JSON).
  • Naming Conventions: Explain the naming conventions used for the files and variables.
  • Size: Indicate the size of the dataset and its individual files.
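
The file listing and sizes can be collected with a short script instead of by hand. This is a sketch under stated assumptions: the dataset lives in a single folder, sizes are reported in binary units, and the function names `human_size`, `inventory`, and `format_inventory` are hypothetical. The temporary folder at the end only stands in for a real dataset folder.

```python
import tempfile
from pathlib import Path


def human_size(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024


def inventory(root):
    """Return (relative path, size in bytes) for every file under root, sorted by path."""
    root = Path(root)
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


def format_inventory(root):
    """Build a plain-text file listing with a total line, ready to paste into a README."""
    entries = inventory(root)
    lines = [f"{path}  ({human_size(size)})" for path, size in entries]
    total = sum(size for _, size in entries)
    lines.append(f"Total: {len(entries)} files, {human_size(total)}")
    return "\n".join(lines)


# Demonstration on a throwaway folder with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / "sites.csv").write_bytes(b"site_id\n1\n2\n")
    (Path(tmp) / "README.txt").write_bytes(b"placeholder")
    print(format_inventory(tmp))
```

To describe a real dataset, call `format_inventory` with the path to the dataset folder.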

5. Usage Notes

  • Instructions: Provide detailed instructions on how to access, open, and use the dataset.
  • Requirements: List any software or tools needed to use the dataset.
  • Examples: Offer examples of how to load and analyze the data, possibly with code snippets.
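
As an illustration of the kind of snippet a README might include, the sketch below loads a CSV file and computes a simple summary using only the Python standard library. The data, the column names (`habitat`, `temperature_c`), and the function names are hypothetical; an in-memory string stands in for a real file so that the example runs on its own.

```python
import csv
import io
import statistics


def load_rows(handle):
    """Read a CSV file object with a header row into a list of dicts."""
    return list(csv.DictReader(handle))


def mean_by_group(rows, group_col, value_col):
    """Average a numeric column within each group, skipping empty cells."""
    groups = {}
    for row in rows:
        raw = row[value_col]
        if raw == "":
            continue
        groups.setdefault(row[group_col], []).append(float(raw))
    return {name: statistics.mean(values) for name, values in groups.items()}


# Hypothetical sample data standing in for a real file.
sample = io.StringIO(
    "site_id,temperature_c,habitat\n"
    "1,12.5,forest\n"
    "2,14.0,wetland\n"
    "3,13.5,forest\n"
)
rows = load_rows(sample)
print(mean_by_group(rows, "habitat", "temperature_c"))
# prints {'forest': 13.0, 'wetland': 14.0}
```

With a real dataset, replace the in-memory sample with `open("your_file.csv", newline="")` and list the required Python version under Requirements.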

6. Data Quality and Limitations

  • Quality: Discuss the quality of the data, including any known issues or anomalies.
  • Limitations: Mention any limitations or constraints of the dataset that users should be aware of.

7. Citation

  • How to Cite: Provide the preferred citation format for users who reference your dataset in their work.
  • DOI: If available, include the Digital Object Identifier (DOI) for the dataset.

8. Licensing

  • License: Specify the license under which the dataset is shared (e.g., a Creative Commons license such as CC BY 4.0 or CC0 for data, or the MIT License for accompanying code).
  • Terms of Use: Outline any specific terms of use or restrictions associated with the dataset.

9. Contact Information

  • Contact Details: Provide contact information for the authors or the person responsible for the dataset, in case users have questions or need further assistance.

10. Versioning

  • Version History: Include a version history with dates and descriptions of changes or updates made to the dataset.
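
The ten elements above can be turned into an empty README skeleton to fill in. The following is one possible sketch, not an official template: the section names and fields are taken from the list above, while the `TODO` placeholders, the layout, and the function name `render_skeleton` are assumptions made for illustration.

```python
# Section names and fields follow the ten elements listed in this guide.
SECTIONS = [
    ("Title", ["Dataset title"]),
    ("Introduction", ["Dataset name", "Description", "Authors", "Date of creation"]),
    ("Dataset Description", ["Contents", "Variables", "Data collection"]),
    ("File Information", ["File structure", "Naming conventions", "Size"]),
    ("Usage Notes", ["Instructions", "Requirements", "Examples"]),
    ("Data Quality and Limitations", ["Quality", "Limitations"]),
    ("Citation", ["How to cite", "DOI"]),
    ("Licensing", ["License", "Terms of use"]),
    ("Contact Information", ["Contact details"]),
    ("Versioning", ["Version history"]),
]


def render_skeleton(sections=SECTIONS):
    """Build a plain-text README skeleton with one numbered heading per section."""
    lines = []
    for number, (heading, fields) in enumerate(sections, start=1):
        lines.append(f"{number}. {heading}")
        # Each field becomes a placeholder line to be replaced by the author.
        lines.extend(f"   {field}: TODO" for field in fields)
        lines.append("")
    return "\n".join(lines)


print(render_skeleton())
```

The returned text can be saved as README.txt alongside the data and each `TODO` replaced with the relevant details.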

Sample README Files

Sample README files are available from many online sources. The following are reliable places to find examples that can guide you in creating your own:

  • GitHub Repositories

    • GitHub is a popular platform for sharing code and data. Many repositories include README files that can serve as good examples. You can search for repositories related to your field of interest and review their README files.
  • Kaggle Datasets

    • Kaggle is a platform for data science competitions and data sharing. Many datasets on Kaggle include README files that provide comprehensive details about the data.
  • Zenodo

    • Zenodo is an open-access repository where researchers can share datasets, software, and other research outputs. Many datasets on Zenodo include README files.
  • Dryad Data Repository

    • Dryad is a curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. Datasets on Dryad often include detailed README files.
  • Figshare Data

    • Figshare is an online digital repository where researchers can preserve and share their research outputs, including datasets with README files.
  • Open Science Framework (OSF)

    • OSF is a free and open platform to support research and enable collaboration. Projects on OSF often include README files.
  • Public Data Repositories

    • Harvard Dataverse: The Harvard Dataverse is an online data repository where researchers can share, preserve, cite, explore, and analyze research data. Many datasets include comprehensive README files.
    • ICPSR: The Inter-university Consortium for Political and Social Research provides access to an extensive archive of social science data. Datasets often come with detailed documentation and README files.
    • Data.gov: Data.gov is the U.S. government's open data portal. It includes datasets from various federal agencies, many of which include README files for better understanding and use.

README Template

Sample README