How to use dummy data in an ehrQL measures definition🔗
Refer to the Measures reference documentation for more information on how to use measures.
Similarly to dataset definitions, there are also three ways to use dummy data with a measures definition in ehrQL.
Let ehrQL generate dummy measures from your measures definition🔗
You do not need to add anything to the measures definition itself in order to generate a dummy dataset in this way. ehrQL will use the measures definition to set up dummy data and generate matching patients.
By default, 500 patients will be generated in a dummy measures output. If you need to increase this number, you can configure it in the measures definition with:
measures.configure_dummy_data(population_size=1000)
Increasing the population size will increase the time required to generate the measures.
Supply your own dummy measures🔗
You can provide a dummy measures file in the following formats.
Format | File extension |
---|---|
CSV | .csv |
Compressed CSV | .csv.gz |
Arrow | .arrow |
Your file must have the relevant file extension shown in the table above.
For example, take this simple measures definition:
from ehrql import create_measures, years
from ehrql.measures import INTERVAL
from ehrql.tables.beta.core import patients, clinical_events
events_in_interval = clinical_events.where(clinical_events.date.is_during(INTERVAL))
had_event = events_in_interval.exists_for_patient()
intervals = years(2).starting_on("2020-01-01")
measures = create_measures()
measures.define_measure(
"had_event_by_sex",
numerator=had_event,
denominator=patients.exists_for_patient(),
group_by={"sex": patients.sex},
intervals=intervals,
)
And this dummy measures, in a CSV file named dummy_measures.csv
:
measure | interval_start | interval_end | ratio | numerator | denominator | sex |
---|---|---|---|---|---|---|
had_event_by_sex | 2020-01-01 | 2021-12-31 | 0.25 | 2 | 8 | female |
had_event_by_sex | 2020-01-01 | 2021-12-31 | 0.5 | 3 | 6 | male |
had_event_by_sex | 2021-01-01 | 2021-12-31 | 0.1 | 1 | 10 | female |
had_event_by_sex | 2021-01-01 | 2021-12-31 | 0.0 | 0 | 2 | male |
Run the measures definition with the dummy measures output file:
opensafely exec ehrql:v0 generate-measres measures_definition.py --dummy-data-file dummy_measures.csv
Now, instead of generated dummy measures output, you'll see the data from the dummy data file that you provided.
Dummy measures errors🔗
ehrQL will check the column names, types and categorical values in your dummy measures output file. If errors are found, they will be shown in the terminal output.
Supply your own dummy tables🔗
A measures definition uses the same underlying data tables as a dataset definition. As such, you can use the same process to supply dummy data tables for a measures definition.