Data should stand the test of time (10 years for NIH) and be machine-readable.
● Include a header line (first line or record)
● Label each column with a short but descriptive name
○ Names should be unique
○ Use only letters, numbers, or the underscore (_)
○ Do not include blank spaces or symbols such as + - & *
● Columns of data should be consistent (use the same naming convention for text data)
● Each line should be complete
● Each column should hold only a single kind of data: text (“string data”), integers, or floating-point (real) numbers
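The checks above can be sketched as a small script. This is only an illustration, not a standard tool: the function names `check_table` and `kind_of` are made up here, and the rules are stricter than some projects may want (for example, a column that mixes integers and decimals is reported as mixed).

```python
import csv
import io
import re

# Assumed rule: a column name is letters, digits, or underscores only.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")


def kind_of(value):
    """Classify one cell as 'integer', 'float', or 'text'."""
    for label, convert in (("integer", int), ("float", float)):
        try:
            convert(value)
            return label
        except ValueError:
            pass
    return "text"


def check_table(text):
    """Return a list of human-readable problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty, so there is no header line"]
    header, records = rows[0], rows[1:]
    problems = []

    # Header: names must be unique and use only the allowed characters.
    for name in header:
        if not NAME_PATTERN.match(name):
            problems.append(f"bad column name: {name!r}")
    if len(set(header)) != len(header):
        problems.append("column names are not unique")

    # Records: every line complete, every column a single kind of data.
    kinds = {}  # column index -> set of kinds seen in that column
    for line_number, record in enumerate(records, start=2):
        if len(record) != len(header) or any(c.strip() == "" for c in record):
            problems.append(f"line {line_number} is incomplete")
            continue
        for index, cell in enumerate(record):
            kinds.setdefault(index, set()).add(kind_of(cell.strip()))

    for index, seen in kinds.items():
        if len(seen) > 1:
            problems.append(
                f"column {header[index]!r} mixes kinds: {sorted(seen)}"
            )
    return problems
```

Calling `check_table` on the contents of a file returns an empty list when none of these checks fail.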
Organizing your files and directories will help with searching & finding, sharing, security, clarity and preservation.
● Name folders for major functions and activities
● Structure by date or event (especially subfolders)
● Names should be self-explanatory
● Avoid duplication
● Make it simple and consistent
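A folder layout along these lines can be created in one step. The sketch below is only one possible arrangement: the helper `make_project_tree` and the folder names in the usage example are hypothetical, chosen to show activity folders with dated subfolders.

```python
from pathlib import Path


def make_project_tree(root, layout):
    """Create root/<activity>/<subfolder> directories and return their paths.

    layout maps an activity folder name to a list of subfolder names;
    an empty list means the activity folder has no subfolders yet.
    """
    created = []
    for activity, subfolders in layout.items():
        for sub in subfolders or [""]:
            path = Path(root) / activity / sub
            path.mkdir(parents=True, exist_ok=True)  # safe to re-run
            created.append(path)
    return created


# Example layout (names are illustrative assumptions):
EXAMPLE_LAYOUT = {
    "rawData": ["20240315_fieldSurvey", "20240402_fieldSurvey"],
    "analysis": [],
    "docs": [],
}
```

Running `make_project_tree("myProject", EXAMPLE_LAYOUT)` would create the folders under `myProject`. Keeping the layout in one dictionary makes it easy to reuse the same structure across projects.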
● Use descriptive names
● Keep names reasonably short; use camel case
● Try to include the date or time in the name
● Write dates as YYYYMMDD
● Use version numbers
● Don’t use spaces; use - or _ instead
● Don’t change default extensions
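These naming rules can be applied automatically. The sketch below is one possible convention, not a prescribed one: the helper names, the order of the parts (description, date, version), and the two-digit `v02` version style are all assumptions made for illustration.

```python
import re
from datetime import date
from pathlib import Path


def to_camel_case(words):
    """Join words as camelCase, dropping anything but letters and digits."""
    parts = [re.sub(r"[^A-Za-z0-9]", "", w) for w in words.split()]
    parts = [p for p in parts if p]
    if not parts:
        raise ValueError("description must contain letters or digits")
    first, rest = parts[0].lower(), parts[1:]
    return first + "".join(p[0].upper() + p[1:].lower() for p in rest)


def build_name(description, when, version, original):
    """Build a file name: description, YYYYMMDD date, version, extension."""
    suffix = Path(original).suffix  # the default extension is left unchanged
    return f"{to_camel_case(description)}_{when:%Y%m%d}_v{version:02d}{suffix}"


# Example with made-up values:
print(build_name("soil moisture", date(2024, 3, 15), 2, "export.csv"))
# prints: soilMoisture_20240315_v02.csv
```

Names that begin with the same description and carry a YYYYMMDD date sort chronologically in a file listing.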
● Who created the data?
● Who maintains them?
● When were the data collected? When were they published?
● Where were they collected (geographic location)?
● What is the content of the data? The structure?
● Why were the data created?
● How were they produced and analyzed?
● Variable names and descriptions
● Explanation of codes and schemas used
● Algorithms used to transform data
● File format and software (including version) used
● Readme file
● Data dictionary
● Codebook
● Structured documentation in XML-based metadata standards such as DDI, FGDC, and EML
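A plain-text readme that answers the questions above can be generated from a few values kept alongside the data. This is a minimal sketch under assumptions: `render_readme` is a hypothetical helper, the layout is not a standard format, and every value in the usage example is a placeholder.

```python
def render_readme(info, variables):
    """Return README text.

    info maps a question or label (e.g. "Created by") to its answer;
    variables maps each variable name to a short description.
    """
    lines = ["README", ""]
    for label, answer in info.items():
        lines.append(f"{label}: {answer}")
    lines += ["", "Variables:"]
    # Pad names so the descriptions line up in a column.
    width = max((len(name) for name in variables), default=0)
    for name, description in variables.items():
        lines.append(f"  {name.ljust(width)}  {description}")
    return "\n".join(lines) + "\n"


# Example with placeholder values:
text = render_readme(
    {
        "Created by": "A. Researcher",
        "Collected": "20240315",
        "Software": "Python 3 (csv module)",
    },
    {
        "site_id": "Site identifier",
        "temp_c": "Air temperature in degrees Celsius",
    },
)
print(text)
```

The variables section doubles as a very small data dictionary. A fuller one would also record units, allowed codes, and missing-value markers.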