13 Study definition
13.1 Reproducible dummy data
To ensure the dummy data system generates exactly the same data every time you run it, set a random number generator seed at the top of the study_definition.py file:

    import numpy as np

    # Change this number to one for which your scripts
    # successfully run on the dummy data
    np.random.seed(123456)
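The effect of fixing the seed can be checked in isolation. The following is a minimal sketch, assuming the dummy data generator draws from NumPy's global random state (as the np.random.seed call suggests). The names SEED and draw_sample are hypothetical and used only for illustration; they are not part of the cohortextractor.

```python
import numpy as np

SEED = 123456  # illustrative value; any fixed integer works


def draw_sample(seed, size=5):
    """Reset NumPy's global random state, then draw `size` uniform numbers."""
    np.random.seed(seed)
    return np.random.random(size)


first = draw_sample(SEED)
second = draw_sample(SEED)
different = draw_sample(SEED + 1)

# Re-seeding with the same value reproduces the draws exactly;
# a different seed produces a different sequence.
print(np.array_equal(first, second))
print(np.array_equal(first, different))
```

If scripts fail on the dummy data for one seed, changing the number and re-running gives a different but equally reproducible dataset.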
13.2 File formats
Use .feather files for outputs from the cohortextractor, so specify an action in your project.yaml as follows:

    generate_study_population:
      run: cohortextractor:latest generate_cohort --study-definition study_definition --output-format feather
      needs:
      - design
      outputs:
        highly_sensitive:
          cohort: output/input.feather
Use the arrow package to read .feather files into R:

    arrow::read_feather(file = file.path("output", "input.feather"))
- The col_select argument can be used to read in just the columns you need.
Start each project with a preprocessing action that formats .feather files and outputs (gzipped) .rds files, which can be saved with readr::write_rds():

    readr::write_rds(object, file.path("output", "mydata.rds"), compress = "gz")