Non-Proprietary vs. Proprietary Formats
When saving your files, choose non-proprietary (open) formats that are royalty-free and free of intellectual property restrictions, or formats that conform to standards in the public domain. Open, non-proprietary formats are more likely to remain usable even if the software that created them is no longer available or functional.
Unencrypted Formats
It is also recommended that you use unencrypted formats because, unlike encrypted formats, they do not require a password or passphrase to open. This ensures that even if a password or passphrase is lost or forgotten, you will still be able to retrieve the data from the file later.
Compressed Files
Some compression methods ("lossy" compression) permanently discard part of the data to reduce file size. Using "lossless" compression formats prevents this, because the original data can be restored exactly when the file is decompressed. Lossless compression is best when it is important to maintain the integrity of the original dataset and when any change to the original data would reduce its quality.
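To illustrate what "lossless" means in practice, the short Python sketch below compresses some sample bytes with the standard-library `gzip` module, decompresses them, and checks that the restored data is byte-for-byte identical to the original. The sample data is hypothetical; you would substitute the contents of your own file.

```python
import gzip

def roundtrip_is_lossless(original: bytes) -> bool:
    """Compress with gzip, decompress, and check the result is byte-for-byte identical."""
    restored = gzip.decompress(gzip.compress(original))
    return restored == original

# Hypothetical sample data: a small CSV table encoded as UTF-8 bytes.
rows = ["id,temp_c"] + [f"{i},{20 + i % 5}.5" for i in range(1000)]
sample = ("\n".join(rows) + "\n").encode("utf-8")

compressed = gzip.compress(sample)
print(f"original: {len(sample)} bytes, compressed: {len(compressed)} bytes")
print("lossless round trip:", roundtrip_is_lossless(sample))
```

For a file on disk, `gzip.open(path, "rb")` reads the compressed data back in the same way, so the same equality check applies.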
Examples of Open File Formats
| Type | Description | Examples |
| --- | --- | --- |
| Container / Archive | File type used for compressing and storing a collection of files and folders in a single file. | GZIP/TAR- Two of the most common utilities for archiving and compressing files (TAR bundles files into a single archive; GZIP compresses it losslessly). ZIP (e.g., 7-Zip, WinZip)- Good for archiving many files; supports lossless data compression. |
| Database | Consists of collections of data organized so they can be easily accessed and managed. | XML- A general-purpose markup language standardized by the W3C. CSV- Comma-separated values, commonly used for spreadsheets or simple databases. |
| Geospatial | File type commonly used for encoding geographic information. | SHP- Shapefile format for storing geometric location and associated attribute information. |
| Tabular Data / Spreadsheets | File type for storing data elements arranged in tables. | CSV- Comma-separated values, commonly used for spreadsheets or simple databases. |
| Still Images | File format for storing a single static image (e.g., photographs, graphs, scans, autoradiograms). | JPEG (JPG)- The most widely used image file format; a “lossy” format, meaning quality can easily be degraded by repeated editing and saving. JPEG 2000- Supports both lossy and lossless compression. PDF/A- Differs from PDF by prohibiting features unsuitable for long-term archiving. |
| Audio / Sound | File format for storing recorded digital audio data (e.g., music, sound effects, speech). | MP3- “Lossy” format with moderate audio quality; may not be suitable for high-fidelity audio. |
| Moving Images | File type used for saving motion pictures, film, movies, video, etc. | AVI- Audio Video Interleave, a widely supported container format for video with synchronized audio. |
| Text | File type for data viewed and edited on text terminals or in simple text editors. | Plain text (ASCII, UTF-8)- The most portable format, supported by most machines and applications. JSON- Good for structured data (e.g., numbers, dates, groups of words). XML- Good for semi-structured, non-tabular data (e.g., nucleotide/protein sequences, alignments, and phylogenies). Note: We recommend that a README be a plain text file; however, if text formatting is important, PDF is also acceptable. |
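As a minimal sketch, the Python standard library can write the same small dataset to two of the open, plain-text formats in the table, CSV and JSON, using UTF-8 encoding. The records, field names and file names below are hypothetical, and both files are written to the current working directory.

```python
import csv
import json

# Hypothetical sample records; replace with your own data.
records = [
    {"sample_id": "S01", "date": "2024-03-01", "value": 4.2},
    {"sample_id": "S02", "date": "2024-03-02", "value": 3.9},
]

def save_csv(rows, path):
    """Write a list of dicts as comma-separated values with one header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

def save_json(rows, path):
    """Write the same records as human-readable, indented JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2, ensure_ascii=False)

save_csv(records, "measurements.csv")
save_json(records, "measurements.json")
```

One design difference worth noting: CSV stores only a flat table, so values such as `4.2` are read back as text, while JSON keeps numbers as numbers.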
Data Backup
Keeping duplicate copies of your data files protects them if anything goes wrong with your local workstation. Original data files can be lost through hardware and software failures, virus infection, malicious hacking, power failures, and human error. A backup strategy ensures that data files can be restored and remain accessible over the long term if the originals are damaged or go missing.
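One way to confirm that a backup copy is identical to the original is to compare checksums. The Python sketch below, using only the standard library, copies a file into a backup folder and verifies the copy with a SHA-256 digest. The function names are hypothetical, and the backup folder could be any location you choose, such as an external drive or network share.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_and_verify(source, backup_dir):
    """Copy source into backup_dir and confirm the copy matches byte for byte."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    destination = backup_dir / Path(source).name
    shutil.copy2(source, destination)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(destination):
        raise OSError(f"Checksum mismatch for {destination}")
    return destination
```

Recording the digest when you make the backup and recomputing it later can also reveal silent corruption of the stored copy.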
For more information on backup storage options offered to researchers at NCA&T, contact Information Technology Services.
Data Storage & Security
Data storage refers to where and how you keep your data, including the choice of appropriate media for physically storing it. Data security, by contrast, refers to keeping your data safe: protecting it from malicious activity and preventing breaches of sensitive data.
Carefully consider both where you will store your data and how you will control access to it. It is recommended that you save your data on several different media or devices, make sure those devices are password-protected, limit access to the data to only those who need it, and anonymize identifiable human subject information.
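As a minimal sketch of one de-identification step, the Python code below drops a name field and replaces an email address with a keyed-hash (HMAC-SHA256) code, using only the standard library. This is pseudonymization rather than full anonymization: anyone holding the key can re-link codes to known identifiers, and the remaining fields (for example, dates or locations) may still identify people, so check your IRB's requirements. The field names, records and key handling here are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical key generated for illustration. In practice, generate it once
# and store it securely, separate from the data it protects.
SECRET_KEY = secrets.token_bytes(32)

def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a stable keyed-hash code."""
    mac = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]  # shortened for readability; same input + key gives same code

def pseudonymize_records(records, id_field, drop_fields=()):
    """Return copies of records with id_field coded and drop_fields removed."""
    cleaned = []
    for rec in records:
        new = {k: v for k, v in rec.items() if k not in drop_fields}
        new[id_field] = pseudonymize(str(rec[id_field]))
        cleaned.append(new)
    return cleaned

# Hypothetical participant records.
participants = [
    {"name": "Jane Doe", "email": "jane@example.org", "score": 17},
    {"name": "John Roe", "email": "john@example.org", "score": 22},
]
print(pseudonymize_records(participants, "email", drop_fields=("name",)))
```

Because the same identifier always maps to the same code under one key, records from different files can still be linked without storing the raw email addresses.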
At NCA&T, Information Security Services works with individuals across campus to ensure the security of technology and data, and it manages the campus cybersecurity awareness program.
The recommended storage options for your data, each with its own pros and cons, are:
Physical Hardware
External Storage
Institution provided Network Storage
Cloud Storage
Note: If you suspect any unauthorized access to or acquisition of your research data, contact NCA&T's IT Security and Audit Department or follow the university's Data Security Breach Procedures.
The videos below, created by IBM Security, explain the importance of data security and privacy.