Research Data Services @ Bluford Library

This guide is intended to help researchers, data creators, and others who manage digital data as part of a research project to plan, organize, describe, share, and preserve their research data for the long term.

Principles for Selecting File Formats

Non-Proprietary vs. Proprietary Formats

When saving your files, it is recommended to choose non-proprietary or open-source software formats that are royalty-free and free of intellectual property restrictions, or those that conform to standards in the public domain. Open, non-proprietary formats are more likely to remain accessible even if the software that created them becomes unavailable or non-functional.

Unencrypted Formats

It is also advisable to use unencrypted formats. Unlike encrypted formats, which require passwords or passphrases, unencrypted formats allow you to retrieve data without needing to remember or store additional access credentials. This ensures that your data remains accessible even if passwords are lost or forgotten.

Compressed Files

Some compression methods are "lossy": they permanently discard part of the data to reduce file size. To avoid this, use "lossless" compression formats, which allow the original data to be reconstructed exactly. Lossless compression is essential when the integrity of the original dataset must be maintained, as it preserves data quality and prevents any changes to the original data.
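As a minimal sketch of what "lossless" means in practice, the Python example below compresses some sample bytes with gzip, decompresses them, and compares checksums of the original and the restored data. The sample data and the helper name `sha256` are illustrative assumptions, not part of any library recommendation.

```python
import gzip
import hashlib


def sha256(data: bytes) -> str:
    """Return the SHA-256 checksum of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample data standing in for a real dataset.
original = b"sample_id,value\n" + b"A1,0.125\n" * 1000

# gzip is a lossless method: decompressing returns the exact original bytes.
compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

is_lossless = sha256(restored) == sha256(original)
print(len(original), len(compressed), is_lossless)
```

If the two checksums match, nothing was lost or altered by the compression step. The same comparison can be used to confirm that an archive still unpacks to the data you originally stored.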

Examples of Open File Formats

 Type

 Description

 Container / Archive

 File type used for compressing and storing a collection of files and folders to a single file.

 GZIP/TAR- Two of the most common utilities for archiving and compressing files.

 ZIP (7-Zip, WinZip, WinRAR)- Good for archiving many files, supports lossless data compression.

 Database

 Consists of collections of data organized so it can be easily accessed and managed.

 XML- A general-purpose markup language, standardized by W3C.

 CSV- Comma-separated values, commonly used for spreadsheets or simple databases.

 Geospatial

 File type commonly used for encoding geographical information.

 SHP- Shapefile format for storing geometric location and associated attribute information.
 GeoTIFF- Allows geo-referencing information to be embedded within a TIFF file.
 KML- XML notation for expressing geographic annotation and visualization.

 Tabular Data / Spreadsheets

 File type for storing data elements arranged in tables.

 CSV- Comma-separated values, commonly used for spreadsheets or simple databases.

 Still Images

 File format for storing a single static image (e.g., photographs, graphs, scans, autoradiograms).

 JPG- Most used image file format.

 PDF/A- Differs from PDF by prohibiting features unsuitable for long-term archiving.

 JPEG/JPEG2000- JPEG is a “lossy” format, meaning quality can easily be compromised in editing and saving; JPEG 2000 also offers a lossless mode.
 TIFF- Most used file format by photographers and designers.

 Audio / Sound

 File format for storing recorded digital audio data (e.g., music, sound effects, speech).

 MP3- “Lossy” format with moderate-quality audio; may not be suitable for high-fidelity audio.
 AIFF, WAV, FLAC- Audio recording formats (lossless), best for maintaining audio quality.

 Moving Images

 File type used for saving motion pictures, film, movies, video, etc.

 AVI- Audio Video Interleave, a widely supported container format for storing video and audio.
 M-JPEG2000- File format used to store video, audio, subtitles, and images; based on the MP4/QuickTime format.

Text

 File type for data viewed and edited on text terminals or in simple text editors.

 Plain text (ASCII, UTF)- Most portable format; supported by most machines and applications.

 JSON- Good for structured data (e.g., numbers, dates, groups of words).

 XML- Good for semi-structured, non-tabular data stored as plain text (e.g., formats used for nucleotide/protein sequences, alignments, and phylogenies).

 Note: We recommend that a README be a plain text file; however, if text formatting is important, PDF is also acceptable.
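To illustrate how data might be saved in several of the open formats listed above, here is a minimal Python sketch that writes tabular data to CSV, descriptive metadata to JSON, and a plain text README, all encoded as UTF-8. The function name `export_open_formats`, the file names, and the sample values are hypothetical and chosen only for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_open_formats(rows, metadata, out_dir):
    """Write rows to CSV, metadata to JSON, and a short README as plain text."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Tabular data: CSV with a header row, UTF-8 encoded.
    csv_path = out_dir / "measurements.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    # Structured, non-tabular information: JSON.
    json_path = out_dir / "metadata.json"
    json_path.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # README: plain text, readable in any text editor.
    readme_path = out_dir / "README.txt"
    readme_path.write_text(
        "measurements.csv: one row per sample\n"
        "metadata.json: title and units for each column\n",
        encoding="utf-8",
    )
    return csv_path, json_path, readme_path


# Hypothetical example data.
rows = [
    {"sample_id": "A1", "temperature_c": 21.5},
    {"sample_id": "A2", "temperature_c": 22.0},
]
metadata = {"title": "Example dataset", "units": {"temperature_c": "degrees Celsius"}}

with tempfile.TemporaryDirectory() as tmp:
    csv_path, json_path, readme_path = export_open_formats(rows, metadata, tmp)
    csv_text = csv_path.read_text(encoding="utf-8")
    loaded_metadata = json.loads(json_path.read_text(encoding="utf-8"))
    readme_text = readme_path.read_text(encoding="utf-8")
```

All three files can be opened without the software that created them, which is the main reason to prefer them for long-term storage.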

 More information on file formats:

Sustainability of Digital Formats (Library of Congress)

Examples of open formats (Wikipedia)

Tips for Backing Up Files

Having duplicate copies of data files ensures their safety in case of issues with your local workstation. Original data files can be lost due to hardware and software failures, virus infections, malicious hacking, power failures, and human errors. Developing a robust data backup strategy ensures that your files can be restored and remain accessible over the long term, even if the originals are damaged or lost.
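One way to confirm that a duplicate copy really matches the original is to compare checksums. The Python sketch below copies a file and then compares SHA-256 checksums of the source and the copy; the helper names `file_sha256` and `copy_and_verify` and the file names are assumptions made for this example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 checksum, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy a file, then return True if the copy matches the source."""
    shutil.copy2(source, destination)
    return file_sha256(source) == file_sha256(destination)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical data file standing in for a real dataset.
    source = Path(tmp) / "results.csv"
    source.write_text("sample_id,value\nA1,0.125\n", encoding="utf-8")

    backup = Path(tmp) / "results_backup.csv"
    backup_ok = copy_and_verify(source, backup)

    # Simulate a damaged backup to show that the check detects changes.
    backup.write_text("sample_id,value\nA1,0.999\n", encoding="utf-8")
    damage_detected = file_sha256(source) != file_sha256(backup)
```

Recording the checksum alongside each backup also makes it possible to re-check the copy later, even when the original is no longer at hand.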

Data Security Best Practices

  • Physical Security: Control access to buildings or rooms where your data is stored. This can be as straightforward as ensuring that the lab where your workstation is located is locked, with key card access limited to you and authorized lab personnel.

  • Network Security: Implement firewall protection and ensure that operating system patches and updates are regularly applied to your computers to prevent security vulnerabilities.

  • Device & File Security: Protect your devices with strong passwords and take measures to anonymize sensitive data, ensuring that only authorized individuals can access and interpret it.

Backup Strategies

The 3-2-1 Rule

  • 3 Copies: Keep at least three copies of your data.
  • 2 Different Media: Store copies on two different types of media (e.g., external hard drive, cloud storage).
  • 1 Offsite Copy: Store one copy offsite to protect against local disasters.
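The three conditions of the rule can be written down as a simple check. The Python sketch below is only an illustration: the `BackupCopy` class, the `check_3_2_1` function, and the media labels are hypothetical, and the list of copies would need to be filled in by hand for a real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    label: str        # where the copy lives, in plain words
    media_type: str   # e.g. "internal_disk", "external_hard_drive", "cloud_storage"
    offsite: bool     # True if stored away from the primary location


def check_3_2_1(copies):
    """Report which parts of the 3-2-1 rule a set of copies satisfies."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media_types": len({c.media_type for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
    }


# Hypothetical backup plan for one dataset.
plan = [
    BackupCopy("lab workstation", "internal_disk", offsite=False),
    BackupCopy("external drive in office", "external_hard_drive", offsite=False),
    BackupCopy("institutional cloud", "cloud_storage", offsite=True),
]

result = check_3_2_1(plan)
satisfies_rule = all(result.values())
print(result, satisfies_rule)
```

With only the first two copies in the plan, the check would report that the three-copy and offsite conditions are not met, which shows where the plan needs another copy.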

Regular Backup Schedule

  • Frequency: Daily, weekly, or after significant changes.
  • Automated Backups: Use automated tools to ensure consistency and reduce human error.
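A small script is one possible way to automate backups. The Python sketch below copies a project folder into a new, timestamped backup folder each time it runs; the function name `make_timestamped_backup` and the folder names are assumptions for this example. Scheduling is not shown: the script would need to be run by whatever scheduler your operating system provides.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def make_timestamped_backup(source_dir, backup_root, now=None):
    """Copy source_dir into backup_root under a name that includes the date and time."""
    now = now or datetime.now()
    source_dir = Path(source_dir)
    target = Path(backup_root) / f"{source_dir.name}_{now:%Y%m%d_%H%M%S}"
    # copytree raises an error if the target already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(source_dir, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical project folder with one file.
    project = Path(tmp) / "project_data"
    project.mkdir()
    (project / "notes.txt").write_text("run 1 complete\n", encoding="utf-8")

    backup_root = Path(tmp) / "backups"
    target = make_timestamped_backup(
        project, backup_root, now=datetime(2024, 1, 15, 9, 30, 0)
    )
    backup_name = target.name
    restored_text = (target / "notes.txt").read_text(encoding="utf-8")
```

Because every run creates a separate folder, older versions remain available if a later change turns out to be a mistake. Storage use grows with each run, so old backups may need to be pruned from time to time.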

How and Where to Store Data Securely? Pros and Cons

Data Storage & Privacy

It is important that you carefully consider storage options for your data as well as how you will control access. It is recommended that you save your data on several different media or devices, ensure that those devices are password-protected, limit access to the data to a small number of authorized people, and anonymize identifiable human subject information.

Here at NCA&T, Information Security Services works with individuals across the campus to ensure the security of technology and data, and manages the campus cybersecurity awareness program.

The list below gives the pros and cons of recommended storage options for your data:

  • Local Storage: Hard drives, SSDs, USB drives.
    • Pros: Fast access, offline availability.
    • Cons: Vulnerable to physical damage, theft, limited capacity.
  • Network Attached Storage (NAS): A dedicated file storage device connected to a network.
    • Pros: Centralized data, accessible by multiple users.
    • Cons: Requires network access, initial setup cost.
  • Cloud Storage: Services like Google Drive, Dropbox, and institutional cloud solutions.
    • Pros: Scalable, accessible from anywhere, disaster recovery.
    • Cons: Subscription costs, data privacy concerns.

 Note: If you suspect any incident of unauthorized access to and acquisition of your research data, contact Information Technology Services or follow the university's Data Security Breach Procedures.