Synthetic Data: `quickq seed`¶

quickq seed generates plausible synthetic responses against a loaded questionnaire. It is the fastest way to populate the analytics layer for development, demos, scoring-rule validation, and tutorial walkthroughs without collecting real data.

quickq seed study.db <questionnaire_id> --n 50 --seed 42

The generator reads the questionnaire structure (question types, option sets, numeric ranges, skip rules) directly from the OLTP and produces responses that respect each. Output is one new response_session per call to the per-respondent loop, plus the corresponding response rows.

Arguments¶

DB_PATH (required): the OLTP study.db to write into.
QUESTIONNAIRE_ID (required): the integer ID returned by quickq load (also visible in quickq list surveys).
--n N (default: 50): number of synthetic response sessions to generate.
--seed N: random seed for reproducible output. Same seed plus same questionnaire produces identical responses across runs.
--study-id N: associate the synthetic respondents with an existing study; defaults to the questionnaire's study.

After seeding, run quickq refresh study.db analytics.duckdb to materialize the OLAP layer with scores, distributions, and aggregates ready to query.

What gets generated¶

Layer	Behaviour
Question types	All 12 supported types (`single_choice`, `multiple_choice`, `sata_other`, `boolean`, `numeric`, `date`, `datetime`, `slider`, `likert`, `ranked`, `grid`, `text`, `repeating_group`) produce values shaped to their type and constraints. For repeating groups, seed honors `count_from`: if the group is linked to a numeric count question, the seeded answer to that question drives the number of instances. For free-add groups (no `count_from`) seed picks a small random N (0–5).
Option sets	Choice questions sample uniformly from the configured options. Exclusive options ("None of the above") are honoured: if selected, no other option is chosen alongside.
Numeric ranges	`numeric` and `slider` items sample uniformly from the `[min, max]` range when set; otherwise from a sensible default (`0–100`).
Skip logic	A question is answered only when its `show_when` condition evaluates true against the prior generated answers in the same session. Skipped questions produce no `response` row, exactly mirroring real data collection.
Free text	`text` items get one of a small library of plausible short strings; `sata_other` produces a short phrase only when "Other" was selected.

The point is not realism (the distributions are uniform, not survey-realistic). The point is structural correctness: a query that works against seeded data works against real data.

Reproducibility¶

The same --seed value produces identical synthetic responses across runs and across machines (Python's random module is deterministic; quickq does not introduce any non-deterministic shuffling). This makes seeded data suitable for:

Regression tests that assert on aggregate counts or score distributions
Tutorial walkthroughs where readers expect to see the same numbers as the prose
CI builds that compare report output across changes

If --seed is omitted, output varies run to run.

When to use seed vs. real data¶

Use case	Recommended path
Build the analytical layer to learn quickq	`quickq seed` then `quickq refresh`
Validate scoring rules before collection begins	`quickq seed` to produce edge cases, then inspect `agg_respondent_scores`
Demo the analytics tutorials with realistic instrument structure	`scripts/generate_demo.py` (uses the FHIR import path; richer than seed)
Demonstrate the full FHIR round-trip	`quickq fhir export` followed by `quickq fhir import-response` against external responses

For repeating-group instruments, quickq seed writes child sub-question rows with sequential repeat_index values per session. Per-instance skip logic is not evaluated (each instance gets the same set of children); for fully realistic per-instance behaviour, quickq fhir import-response against synthetic FHIR QuestionnaireResponse JSON gives finer control. The demo script (scripts/generate_demo.py) takes that route for the prenatal-visit-log example.

Where seed fits in the pipeline¶

   YAML
     ↓ quickq load
study.db (instruments)
     ↓ quickq seed --n 50 --seed 42
study.db (synthetic responses)
     ↓ quickq refresh
analytics.duckdb (queryable OLAP)
     ↓ quickq report / quickq analytics

quickq seed writes to the same response and response_session tables that quickq fhir import-response writes to. Downstream commands cannot tell the difference between seeded and real responses; that is the design.

Synthetic Data: quickq seed¶