
Setting Up Your Data Set

A good place to start when setting up your data frame is the set of tidy data principles (as used by the Tidyverse in R). There might be a good reason to store data in a different format, but make that decision deliberately and be able to justify it.

There are three rules to make a data set tidy:

1. Each variable must have its own column.

2. Each observation must have its own row.

3. Each value must have its own cell.
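As a rough illustration of the three rules, the sketch below reshapes a small "wide" table (one column per year) into a tidy one where each row is a single observation. The table contents, the column names (`site`, `year`, `count`), and the helper `to_tidy` are all hypothetical, made up for this example; in R, the Tidyverse provides its own functions for this kind of reshaping.

```python
# Hypothetical wide-format table: one row per site, one column per year.
# Here the "year" variable is hidden in the column headers, which breaks rule 1.
wide_rows = [
    {"site": "A", "2019": 12, "2020": 15},
    {"site": "B", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows so that each observation gets its own row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row
            tidy.append(
                {
                    id_column: row[id_column],
                    variable_name: column,  # former header becomes a value
                    value_name: value,
                }
            )
    return tidy


tidy_rows = to_tidy(wide_rows, "site", "year", "count")
for tidy_row in tidy_rows:
    print(tidy_row)
```

After reshaping, `site`, `year`, and `count` each have their own column, and every site-year combination has its own row.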

Other norms:

  • Maintain consistency in all respects! For example, if a column has a set of allowed categorical responses, use exactly those responses, including their capitalization.
  • Dates – Excel and dates are not friends. Ideally the column(s) containing dates should be formatted for text, not for date. All dates should be written in the format YYYY-MM-DD, and time should be in a separate column as hh:mm:ss.
  • Metadata – A metadata sheet is a separate tab that contains all of the column headers of your data set, plus definitions and possible values for each column. Anybody who looks at your data should be able to read the metadata and clearly understand what all your columns and values mean. Metadata should clarify the format of your date and time columns, report all units of measurement, explain any abbreviations or indicator variables that you used, and provide empirical definitions for all your values. 
  • If you are using Excel, avoid coloring cells to indicate different states; record those states in a separate column with a clear metadata description.  Only you will understand the coloring, and you are likely to forget it sooner than you expect.
  • For comparative data, be sure to include pointers for each datum to the source from which it was obtained.
  • Boolean variables take on one of two states, such as true or false.  One standard is to set True = 1, False = 0.
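A few of the norms above can be sketched in code. This is only a minimal example under stated assumptions: the raw timestamp format (`%m/%d/%Y %H:%M`), the allowed category set, and the helper names `split_timestamp`, `normalize_category`, and `encode_boolean` are hypothetical and would need to match your own data.

```python
from datetime import datetime

# Hypothetical set of allowed categorical responses for one column.
ALLOWED_SEX = {"female", "male", "unknown"}


def split_timestamp(raw, fmt="%m/%d/%Y %H:%M"):
    """Parse a raw timestamp and return (YYYY-MM-DD, hh:mm:ss) as text.

    The default input format is an assumption; change it to match your data.
    """
    parsed = datetime.strptime(raw, fmt)
    return parsed.strftime("%Y-%m-%d"), parsed.strftime("%H:%M:%S")


def normalize_category(value, allowed):
    """Trim whitespace and lowercase, then reject values outside the allowed set."""
    cleaned = value.strip().lower()
    if cleaned not in allowed:
        raise ValueError(f"unexpected category: {value!r}")
    return cleaned


def encode_boolean(flag):
    """Store a true/false state as 1 or 0."""
    return 1 if flag else 0


date_text, time_text = split_timestamp("3/7/2021 14:05")
print(date_text, time_text)
print(normalize_category(" Female ", ALLOWED_SEX))
print(encode_boolean(True), encode_boolean(False))
```

Raising an error on an unexpected category, rather than silently keeping it, is one way to catch inconsistent entries early; whichever convention you choose, document it in the metadata sheet.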

Using GitHub

GitHub is a hosting platform built on Git, an open-source version control system that keeps a record of all versions and edits made to a project. For example, rather than saving drafts of files on your computer with titles like "Project_code1", "Project_code2", etc., you can keep your project on GitHub and let it track all your drafts for you. GitHub is most often used to store and edit code, but it works with many file formats. GitHub also contains many features that allow you to collaborate effectively with others on a project.

To get started, first join GitHub by creating an account. Make sure to add a profile picture so others in the lab recognize you!

Next, complete the following steps to learn how to navigate GitHub. There are many tutorials online (like this one) to help if you get stuck. 

  1. Download GitHub Desktop. The app allows you to clone projects from online and work on them locally on your computer. 
  2. Create your own repository. Include a README file with a description. 
  3. Clone the repository onto your local computer using GitHub Desktop. 
  4. On your local computer, add code to the repository, commit the changes, and push.
  5. Make changes to the code locally, and this time commit the changes to a new branch. Then, merge this branch into the master branch (named main in newer repositories). 
  6. Open an issue in your repo and assign a label to the issue. 
  7. Open a project, link issues to your project management columns, and create milestones. 
  8. Create a wiki page and add some text, links, code chunks, and figures. 
  9. Create a repository for your real project and start working in GitHub!
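The steps above use GitHub Desktop, but steps 4 and 5 can also be done with the `git` command line. The sketch below lists one possible command sequence and optionally runs it from Python. It is an illustration only: the branch name `new-feature`, the commit messages, the `STEPS` dictionary, and the `run_step` helper are hypothetical, and the default branch is assumed to be `master` with a remote named `origin`.

```python
import subprocess

DEFAULT_BRANCH = "master"   # assumption; newer repositories often use "main"
NEW_BRANCH = "new-feature"  # hypothetical branch name

# Command-line equivalents of steps 4 and 5, grouped by stage.
# Edit your files between "start_branch" and "commit_on_branch".
STEPS = {
    "commit_and_push": [
        ["git", "add", "--all"],
        ["git", "commit", "-m", "Add analysis code"],
        ["git", "push", "origin", DEFAULT_BRANCH],
    ],
    "start_branch": [
        ["git", "checkout", "-b", NEW_BRANCH],
    ],
    "commit_on_branch": [
        ["git", "add", "--all"],
        ["git", "commit", "-m", "Revise analysis code"],
    ],
    "merge_branch": [
        ["git", "checkout", DEFAULT_BRANCH],
        ["git", "merge", NEW_BRANCH],
        ["git", "push", "origin", DEFAULT_BRANCH],
    ],
}


def run_step(name, repo_dir, dry_run=True):
    """Print the commands for one stage; execute them only if dry_run is False."""
    for command in STEPS[name]:
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(command, cwd=repo_dir, check=True)


# Dry run: only prints the commands, so it is safe to try anywhere.
run_step("commit_and_push", repo_dir=".")
```

Calling `run_step(..., dry_run=False)` inside a cloned repository would actually run the commands, so check the printed sequence first.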
