13 Study definition

13.1 Reproducible dummy data

To ensure the dummy data system generates exactly the same data every time you run it, set a random number generator seed at the top of the study_definition.py file:
```
import numpy as np
# Change this number to one for which your scripts 
# successfully run on the dummy data
np.random.seed(123456)
```
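
As a rough illustration of why the seed matters, the sketch below reseeds NumPy's global generator and draws the same values twice. The helper `draw_dummy_values` is hypothetical and only stands in for any code that draws random numbers; it is not part of cohortextractor, and it assumes the randomness comes from NumPy's global generator.

```
import numpy as np


def draw_dummy_values(seed, n=5):
    """Hypothetical helper: seed NumPy's global generator, then draw n integers."""
    np.random.seed(seed)
    return np.random.randint(0, 100, size=n).tolist()


# Reseeding with the same number reproduces the same sequence of draws.
first = draw_dummy_values(123456)
second = draw_dummy_values(123456)
print(first == second)
```

Because the draws depend only on the seed, rerunning with an unchanged seed should leave the values unchanged, provided the code that consumes the random numbers has not changed.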

13.2 File formats

Use .feather files for outputs from the cohortextractor. To do this, specify an action in your project.yaml as follows:

generate_study_population:
  run: cohortextractor:latest generate_cohort --study-definition study_definition --output-format feather
  needs:
    - design
  outputs:
    highly_sensitive:
      cohort: output/input.feather

Use the arrow package to read .feather files into R:
```
arrow::read_feather(file = file.path("output", "input.feather"))
```
- Use the col_select argument to read in only the columns you need.

Start each project with a preprocessing action that formats the .feather files and outputs gzipped .rds files, which can be saved with readr::write_rds():

readr::write_rds(object, 
                 file.path("output", "mydata.rds"), 
                 compress = "gz")