Data Management

This guide helps researchers learn about data management, why it is important, and what tools and resources are available to help researchers do it well.

Spreadsheet Basics

Data should remain usable over the long term (10 years for NIH) and be machine-readable.

●      Include a header line (first line or record)

●      Label each column with a short but descriptive name

               ○      Names should be unique

               ○      Use letters, numbers or underscore _

               ○      Do not include blank spaces or symbols such as +-&*

●      Columns of data should be consistent (use the same naming convention for text data)

●      Each line should be complete

●      Each column should hold only one kind of data: text (“string data”), integers, or floating-point (real) numbers
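The header and row rules above can be checked automatically before a spreadsheet is shared. The following is a minimal sketch, not a standard tool: the function names check_header and check_rows are hypothetical, and it assumes the sheet has already been read into lists of strings (for example with Python's csv module).

```python
import re

# Rule from the list above: letters, numbers, or underscore only.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def check_header(names):
    """Return a list of problems found in a header row (hypothetical helper)."""
    problems = []
    seen = set()
    for name in names:
        # fullmatch rejects empty names, spaces, and symbols such as + - & *
        if not VALID_NAME.fullmatch(name):
            problems.append(f"invalid or empty name: {name!r}")
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
    return problems


def check_rows(names, rows):
    """Flag lines that are incomplete relative to the header."""
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if len(row) != len(names):
            problems.append(
                f"line {line_no}: expected {len(names)} values, found {len(row)}"
            )
        elif any(cell.strip() == "" for cell in row):
            problems.append(f"line {line_no}: empty cell")
    return problems
```

An empty list from both functions means the sheet passed these particular checks; it does not check that values within a column share one data type.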

Organizing Your Files and Directories

Organizing your files and directories will help with searching & finding, sharing, security, clarity and preservation.

●      Name folders for major functions and activities

●      Structure by date or event (especially subfolders)

●      Names should be self-explanatory

●      Avoid duplication

●      Make it simple and consistent

File Names

●      Use descriptive names

●      Keep names reasonably short and use camel case

●      Try to include the date or time

●      Write dates as YYYYMMDD

●      Use version numbers

●      Don’t use spaces; use - or _ instead

●      Don’t change default extensions
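One way to apply these naming rules consistently is to generate file names rather than type them by hand. The sketch below is only an illustration: build_file_name is a hypothetical helper, and the order of the parts (description, date, version) and the two-digit version number are assumptions, not requirements of this guide.

```python
import re
from datetime import date


def build_file_name(description, collected_on, version, extension):
    """Compose a name such as 'soilMoisture_20240315_v02.csv' (hypothetical helper)."""
    # Keep only letters and numbers, then join the words in camel case.
    words = re.findall(r"[A-Za-z0-9]+", description)
    if not words:
        raise ValueError("description must contain letters or numbers")
    camel = words[0].lower() + "".join(w.capitalize() for w in words[1:])
    # YYYYMMDD, so that names sort in chronological order.
    stamp = collected_on.strftime("%Y%m%d")
    # Zero-padded version number; the extension is passed through unchanged.
    return f"{camel}_{stamp}_v{version:02d}.{extension.lstrip('.')}"


print(build_file_name("soil moisture", date(2024, 3, 15), 2, "csv"))
# prints: soilMoisture_20240315_v02.csv
```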

 

Provide Good Metadata

●      Who created the data?

●      Who maintains it?

●      When were the data collected? When were they published?

●      Where were they collected (geographic location)?

●      What is the content of the data? The structure?

●      Why were the data created?

●      How were they produced and analyzed?
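The answers to these questions can be kept in a small machine-readable file stored next to the data. The sketch below is one possible layout, not a formal metadata standard: the class DatasetMetadata, its field names, and every value in the example are placeholders invented for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    """One field per question in the list above (hypothetical layout)."""
    creator: str      # who created the data
    maintainer: str   # who maintains it
    collected: str    # when the data were collected
    published: str    # when they were published
    location: str     # where they were collected
    content: str      # what the data contain
    structure: str    # how the data are structured
    purpose: str      # why the data were created
    methods: str      # how they were produced and analyzed


def to_json(meta):
    """Serialize the record so it can be saved alongside the data files."""
    return json.dumps(asdict(meta), indent=2)


# Placeholder values only; replace with the real details of a dataset.
example = DatasetMetadata(
    creator="Example Researcher",
    maintainer="Example Lab",
    collected="20240315",
    published="20240601",
    location="Example field site",
    content="Soil moisture readings",
    structure="One CSV file, one row per reading",
    purpose="Example study of seasonal change",
    methods="Handheld probe; averaged per site",
)
print(to_json(example))
```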

 

Dataset documentation should include:

●      Variable names and descriptions

●      Explanation of codes and schemas used

●      Algorithms used to transform data

●      File format and software (including version) used

●      Readme file

●      Data dictionary

●      Codebook

●      Structured documentation in XML-based metadata standards such as DDI, FGDC, and EML
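A data dictionary can be started from the data file itself and then completed by hand. The following sketch assumes the data are in CSV form with a header line; the function names infer_kind and draft_data_dictionary are hypothetical, the type guess is a simple heuristic, and the descriptions are deliberately left blank for the researcher to fill in.

```python
import csv
import io


def infer_kind(values):
    """Guess whether a column holds integers, floating-point numbers, or text."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except (ValueError, TypeError):  # TypeError covers missing cells (None)
            return False

    if values and all_parse(int):
        return "integer"
    if values and all_parse(float):
        return "float"
    return "text"


def draft_data_dictionary(csv_text):
    """Return one entry per column: variable name, guessed kind, empty description."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        entries.append(
            {"variable": name, "kind": infer_kind(values), "description": ""}
        )
    return entries


# Made-up sample data for illustration.
sample = "site_id,temp_c,count\nA1,12.5,3\nB2,13.0,4\n"
for entry in draft_data_dictionary(sample):
    print(entry)
```

The guessed kinds are a starting point only; codes, units, and variable descriptions still have to be written by someone who knows the data.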