Non-Proprietary vs. Proprietary Formats
When saving your files, choose non-proprietary, open formats that are royalty-free and free of intellectual property restrictions, or formats that conform to standards in the public domain. Open, non-proprietary formats are more likely to remain accessible even if the software that created them becomes unavailable or non-functional.
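As an illustration, the sketch below writes a small table to CSV, an open plain-text format, and reads it back using only the Python standard library. The column names and sample values are hypothetical and stand in for your own data.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text (an open, plain-text format)."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Read CSV text back into a list of dicts; every value is returned as a string."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical sample data, for illustration only.
samples = [
    {"sample_id": "S1", "temperature_c": "21.5"},
    {"sample_id": "S2", "temperature_c": "22.0"},
]

csv_text = rows_to_csv(samples, ["sample_id", "temperature_c"])
restored = csv_to_rows(csv_text)
```

Because the result is plain text, it can be opened in any text editor or spreadsheet program without the software that produced it.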
Unencrypted Formats
It is also advisable to use unencrypted formats. Unlike encrypted formats, which require passwords or passphrases, unencrypted formats allow you to retrieve data without needing to remember or store additional access credentials. This ensures that your data remains accessible even if passwords are lost or forgotten.
Compressed Files
Some compression methods are "lossy": they permanently discard part of the data to reduce file size. To avoid this, use "lossless" compression formats, which allow the original data to be reconstructed exactly. Lossless compression is essential when the integrity of the original dataset must be maintained.
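One way to confirm that a compression step is lossless is to decompress the result and compare it with the original. The sketch below does this with GZIP from the Python standard library; the sample bytes are hypothetical.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical data standing in for the contents of a real file.
original = b"sample_id,temperature_c\nS1,21.5\nS2,22.0\n" * 100

compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

# Lossless compression means the restored bytes match the original exactly.
is_lossless = sha256_hex(restored) == sha256_hex(original)
```

The same check applies to files on disk: compute a checksum before compressing and again after decompressing, and confirm that the two values match.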
Examples of Open File Formats
Type | Description | Example formats
Container / Archive | File type used for compressing and storing a collection of files and folders in a single file. | GZIP/TAR: Two of the most common utilities for archiving and compressing files. ZIP (7-Zip, WinZip, ZipRAR): Good for archiving many files; supports lossless data compression.
Database | Collections of data organized so that they can be easily accessed and managed. | XML: A general-purpose markup language, standardized by W3C. CSV: Comma-separated values, commonly used for spreadsheets or simple databases.
Geospatial | File type commonly used for encoding geographical information. | SHP: Shapefile format for storing geometric location and associated attribute information.
Tabular Data / Spreadsheets | File type for storing data elements arranged in tables. | CSV: Comma-separated values, commonly used for spreadsheets or simple databases.
Still Images | File format for storing a single static image (e.g., photographs, graphs, scans, autoradiograms). | JPG: Most widely used image file format. PDF/A: Differs from PDF by prohibiting features unsuitable for long-term archiving. JPEG/JPEG 2000: Typically “lossy” formats, meaning quality can degrade when a file is edited and re-saved.
Audio / Sound | File format for storing recorded digital audio data (e.g., music, sound effects, speech). | MP3: “Lossy” format that provides moderate-quality audio and may not be suitable for high-fidelity audio.
Moving Images | File type used for saving motion pictures, film, movies, and video. | AVI: Audio Video Interleave, a widely supported container format for video.
Text | File type for data viewed and edited on text terminals or in simple text editors. | Plain text (ASCII, UTF-8): Most portable format; supported by most machines and applications. JSON: Good for structured data (e.g., numbers, dates, groups of words). XML: Good for semi-structured plain-text formats for non-tabular data (e.g., those used for nucleotide/protein sequences, alignments, and phylogenies).
Note: We recommend that a README be a plain text file; however, if text formatting is important, PDF is also acceptable.
Having duplicate copies of data files ensures their safety in case of issues with your local workstation. Original data files can be lost due to hardware and software failures, virus infections, malicious hacking, power failures, and human errors. Developing a robust data backup strategy ensures that your files can be restored and remain accessible over the long term, even if the originals are damaged or lost.
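A backup is only useful if the copy matches the original. The sketch below copies a file and verifies the copy with a SHA-256 checksum, using the Python standard library. The helper names `file_sha256` and `backup_file` and the file name `results.csv` are hypothetical; in practice the backup directory would be on a separate device or storage service rather than in the same temporary folder used here for demonstration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and confirm the copy matches the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if file_sha256(source) != file_sha256(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target


# Demonstration in a temporary folder, so no real files are touched.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "results.csv"
    source.write_text("sample_id,temperature_c\nS1,21.5\n", encoding="utf-8")
    copy_path = backup_file(source, Path(workdir) / "backup")
    backup_matches = file_sha256(source) == file_sha256(copy_path)
```

Recording the checksum alongside the backup also makes it possible to detect later corruption of either copy.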
Data Security Best Practices
Physical Security: Control access to buildings or rooms where your data is stored. This can be as straightforward as ensuring that the lab where your workstation is located is locked, with key card access limited to you and authorized lab personnel.
Network Security: Implement firewall protection and ensure that operating system patches and updates are regularly applied to your computers to prevent security vulnerabilities.
Device & File Security: Protect your devices with strong passwords and take measures to anonymize sensitive data, ensuring that only authorized individuals can access and interpret it.
Data Storage & Privacy
Carefully consider where you will store your data and how you will control access to it. Save your data on several different media or devices, ensure that those devices are password-protected, limit access to the people who need it, and anonymize identifiable human subject information.
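One common first step toward protecting identifiable information is to replace names with coded identifiers. The sketch below uses a keyed hash (HMAC-SHA-256) from the Python standard library so that the same person always receives the same code. The function name, key, and records are hypothetical. This is pseudonymization rather than full anonymization: anyone holding the key can recompute the codes, so the key must be stored separately and securely, and other fields may still identify a person.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash so the same input maps to the same code."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability


# Assumption: in real use the key is randomly generated and kept apart from the data.
secret_key = b"replace-with-a-randomly-generated-key"

# Hypothetical records, for illustration only.
records = [
    {"participant": "Jane Doe", "score": 87},
    {"participant": "John Roe", "score": 92},
    {"participant": "Jane Doe", "score": 90},
]

coded = [
    {"participant": pseudonymize(r["participant"], secret_key), "score": r["score"]}
    for r in records
]
```

Repeated measurements from the same participant keep the same code, so the data can still be analyzed per person without exposing names.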
Here at NCA&T, Information Security Services works with individuals across campus to ensure the security of technology and data, and manages the campus cybersecurity awareness program.
The list below gives the pros and cons of recommended storage options for your data:
Note: If you suspect unauthorized access to or acquisition of your research data, contact Information Technology Services or follow the university's Data Security Breach Procedures.