Research & Subject Guides: Research Data Services : Organizing and Describing Data

Why Is Good File Management Important?

Another important aspect of research data management is developing effective file management practices. Proper file management helps in identifying, locating, and maintaining data efficiently and effectively.

Furthermore, establishing consistent naming conventions is crucial for distinguishing between files. This becomes especially important when dealing with multiple files in various formats, as it allows for quick and easy retrieval.

Tips for Organizing and Naming Data Files

Organizing electronic files systematically and consistently in folders will save you time when you and your research partners are searching for them. Additionally, applying file naming conventions (FNC) helps bring order to a complex group of files, allowing you to logically group files with similar information.

Tips for Naming Your Files

You don't have to include all of these elements, but a good file name may include:

Version Name
Date the Record was Created
Initials of Person Who Created the Record
Short Description of Record Contents
Name of Research Team Associated with the Data Record
Date Data was Published
Project Name and Number
Type of Data

Best Practices for Naming Files

Keep file names short: A common guideline is fewer than 25 characters.
Use a standard and meaningful vocabulary: Ensure everyone uses a common language for file names.
Format dates consistently: Use Year-Month-Day order (e.g., YYYY-MM-DD, YYYY-MM, or YYYY-YYYY for a range of years).
Avoid blank spaces: Use underscores, dashes, or no separation (e.g., file_name.xx, file-name.xx, filename.xx).
Avoid special characters: Refrain from using characters like ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ' ".
Use leading zeros in numbering: For sequential numbering, use "001, 002, ...010, 011".
Include file extensions: Differentiate file types and versions (e.g., docx, xls, pdf, jpeg).
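
The practices above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function names build_file_name and check_file_name, the element order (project, description, date, sequence number), the underscore separator, and the choice not to count the extension toward the length guideline are all assumptions made for the example. Adapt them to your team's own convention.

```python
from datetime import date

# Characters the list above recommends avoiding, plus the blank space.
FORBIDDEN = set("~!@#$%^&*()`;<>?,[]{}'\" ")
MAX_STEM_LENGTH = 25  # length guideline; the extension is not counted (assumption)


def build_file_name(project, description, created, sequence, extension):
    """Join name elements as project_description_YYYY-MM-DD_NNN.ext"""
    stem = "_".join([
        project,
        description,
        created.isoformat(),   # date written as YYYY-MM-DD
        f"{sequence:03d}",     # leading zeros: 001, 002, ... 010
    ])
    return f"{stem}.{extension}"


def check_file_name(name):
    """Return a list of problems found; an empty list means the name passes."""
    problems = []
    stem, dot, extension = name.rpartition(".")
    if not dot or not extension:
        problems.append("missing file extension")
        stem = name
    if len(stem) >= MAX_STEM_LENGTH:
        problems.append("name is 25 characters or longer")
    bad = sorted(ch for ch in set(name) if ch in FORBIDDEN)
    if bad:
        problems.append("contains spaces or special characters: " + repr("".join(bad)))
    return problems
```

With placeholder values, build_file_name("soil", "ph", date(2024, 3, 5), 7, "csv") returns "soil_ph_2024-03-05_007.csv", and check_file_name reports no problems for that name. Putting the date in Year-Month-Day order and padding the sequence number means that an alphabetical sort in a file browser is also a chronological and numerical sort.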

Keeping Track of File Versions (Version Control)

Versioning means saving a new copy of a file each time you change it, rather than overwriting the original. Document each new version so the latest one can be easily identified. Use whole numbers for major versions and decimals for minor changes (e.g., v1, v1.1, v2.6).
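
As a rough sketch of how version labels like these can be handled consistently, the Python below parses a label, produces the next one, and picks the latest from a list. The function names are hypothetical, and one assumption deserves attention: the part after the dot is treated as a counter, as is common in software versioning, so v1.10 comes after v1.9 rather than being read as the decimal 1.1.

```python
import re

# Accepts labels such as "v1", "v1.1", or "v2.6".
VERSION_PATTERN = re.compile(r"^v(\d+)(?:\.(\d+))?$")


def parse_version(label):
    """Split a label into (major, minor); a bare 'v1' becomes (1, 0)."""
    match = VERSION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a version label: {label!r}")
    major, minor = match.groups()
    return int(major), int(minor or 0)


def next_version(label, major_change=False):
    """Return the label that follows: a new major version or the next minor one."""
    major, minor = parse_version(label)
    if major_change:
        return f"v{major + 1}"
    return f"v{major}.{minor + 1}"


def latest(labels):
    """Pick the most recent label by comparing (major, minor) pairs."""
    return max(labels, key=parse_version)
```

Comparing the parsed number pairs instead of the raw text avoids a common pitfall: sorted as plain text, "v1.10" would be placed before "v1.9".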

Metadata Creation

Metadata is critical for managing, discovering, and understanding datasets. It provides essential information about the data, making it easier for users to find, use, and interpret the data effectively. The following sections summarize how to create metadata for a dataset.

What is Metadata?

Metadata is data about data. It includes descriptions and attributes that provide context and meaning to the dataset, such as the who, what, when, where, why, and how of the data.

Key Elements of Metadata

When creating metadata for a dataset, include the following key elements:

Title: A clear and concise title that accurately reflects the content of the dataset.
Creator: Information about the individuals or organizations responsible for creating the dataset.
Description: A detailed description of the dataset, including its purpose, scope, and context.
Date: Dates relevant to the dataset, such as creation date, publication date, and data collection period.
Format: The file format(s) of the dataset (e.g., CSV, JSON, Excel).
Keywords: Keywords or tags that describe the content and subject matter of the dataset, aiding in search and discovery.
Methodology: Details about how the data was collected, processed, and analyzed, including instruments and software used.
Usage Rights: Information on the usage rights, licenses, and restrictions associated with the dataset.
Contact Information: Contact details for the person or organization responsible for the dataset, for users who may have questions.
Geographical Coverage: The spatial coverage of the data, specifying the geographical area to which the data pertains.
Subject: The main subject area or theme of the dataset.
Identifier: Unique identifiers for the dataset, such as DOIs (Digital Object Identifiers) or other persistent identifiers.
Language: The language in which the dataset and its metadata are written.
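
One lightweight way to record the elements above is a small machine-readable file kept next to the data. The Python sketch below builds such a record and serializes it as JSON. Everything specific in it is an assumption for illustration: the key names, the choice of which elements count as required, and every value in the record, all of which are placeholders rather than real data. A disciplinary metadata standard, where one exists, should take precedence over this ad hoc layout.

```python
import json

# Elements this sketch treats as mandatory (an assumption, not a rule).
REQUIRED = ["title", "creator", "description", "date",
            "format", "keywords", "usage_rights", "contact"]


def missing_elements(record):
    """List the required elements that are absent or empty in the record."""
    return [key for key in REQUIRED if not record.get(key)]


# Placeholder values only; replace each one with your dataset's details.
record = {
    "title": "Placeholder dataset title",
    "creator": ["Placeholder Researcher"],
    "description": "Placeholder summary of purpose, scope, and context.",
    "date": {"created": "2024-01-15", "collected": "2023-06/2023-09"},
    "format": ["CSV"],
    "keywords": ["placeholder", "example"],
    "methodology": "Placeholder note on instruments and software used.",
    "usage_rights": "Placeholder license statement",
    "contact": "data-contact@example.org",
    "geographical_coverage": "Placeholder region",
    "subject": "Placeholder subject area",
    "identifier": "PLACEHOLDER-ID-0001",
    "language": "en",
}

# Serialize for saving alongside the data files.
metadata_json = json.dumps(record, indent=2)
```

Calling missing_elements(record) before sharing a dataset gives a quick check that none of the required descriptions were left blank.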

Metadata Formats and Standards

When determining the appropriate metadata schema for your dataset, there are a few valuable resources to consult.

The Digital Curation Centre offers a comprehensive catalog of metadata standards organized by discipline, which can be accessed on their website: Digital Curation Centre Metadata Standards.

The Research Data Alliance (RDA) also provides a useful "Metadata Directory," listing potential metadata standards by discipline. You can explore their offerings at: RDA Metadata Directory.

Links to Commonly Used Metadata Schemas

General/Multidisciplinary

Dublin Core: A simple and widely used standard for cross-domain information resource description.
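
To give a sense of what a Dublin Core description looks like in practice, the Python sketch below writes a few descriptive fields as XML elements in the Dublin Core element set namespace. The input key names, the pairing of each key with a Dublin Core element, the wrapper element name "metadata", and the sample values are assumptions made for this example. Repositories usually define their own required wrapper and profile, so treat this as an illustration of the idea only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

# Hypothetical input key -> Dublin Core element (pairing assumed for this sketch).
TO_DUBLIN_CORE = {
    "title": "title",
    "creator": "creator",
    "description": "description",
    "date": "date",
    "format": "format",
    "keywords": "subject",
    "usage_rights": "rights",
    "geographical_coverage": "coverage",
    "identifier": "identifier",
    "language": "language",
}


def to_dublin_core(record):
    """Build an XML tree with one dc: element per value found in the record."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for key, dc_name in TO_DUBLIN_CORE.items():
        value = record.get(key)
        if value is None:
            continue
        # A list (e.g., several keywords) becomes repeated elements.
        values = value if isinstance(value, list) else [value]
        for item in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
            child.text = str(item)
    return root


# Placeholder values, not a real dataset.
example = {
    "title": "Placeholder dataset title",
    "keywords": ["placeholder", "example"],
    "language": "en",
}
xml_text = ET.tostring(to_dublin_core(example), encoding="unicode")
```

Dublin Core elements are repeatable, which is why each keyword is written as its own dc:subject element instead of being joined into one string.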

Life Sciences

Minimum Information About a Microarray Experiment (MIAME): Standards for microarray data to ensure that the data can be easily interpreted and verified.
Data Documentation Initiative (DDI): Standard for documenting social, behavioral, economic, and health sciences data.
Minimum Information for Biological and Biomedical Investigations (MIBBI): A set of guidelines for the reporting of biological and biomedical research.

Earth and Environmental Sciences

Federal Geographic Data Committee (FGDC): Publishes the Content Standard for Digital Geospatial Metadata (CSDGM), a standard for documenting geospatial data.
ISO 19115: An international standard for the description of geographic information and services.
Climate and Forecast (CF) Metadata Conventions: Standards for climate and forecast data.

Physical Sciences

International Virtual Observatory Alliance (IVOA): Standards for astronomical data and services.

Social Sciences

Data Documentation Initiative (DDI): A metadata standard for documenting and managing data in the social, behavioral, economic, and health sciences.

Humanities

Text Encoding Initiative (TEI): Guidelines for encoding and exchanging digital texts.

Health and Medicine

Health Level 7 (HL7): Standards for the exchange of clinical and administrative data.
Clinical Data Interchange Standards Consortium (CDISC): Standards for clinical research data.

Education

Learning Object Metadata (LOM): A standard for educational resources.

Arts and Humanities

Categories for the Description of Works of Art (CDWA): Standards for describing works of art and cultural objects.
VRA Core: A data standard for the description of works of visual culture as well as the images that document them.