Skip to content

Analytics & Refresh

The analytical layer is a DuckDB star schema populated on demand from the SQLite OLTP file. All reports, cohort queries, and cross-study analyses run here. The OLTP is never queried directly for analysis.


Refresh

quickq refresh study.db analytics.duckdb

Refresh is incremental. It reads refresh_log to find the high-water mark from the last run, loads only new response and response_session rows, recomputes scores and aggregates, and updates the watermark on success. A failed run leaves the watermark unchanged so the next run retries the same window cleanly.

DuckDB attaches the SQLite file directly — no intermediate export, no staging database:

ATTACH 'study.db' AS oltp (TYPE sqlite, READ_ONLY);

What Refresh Produces

Output Description
fact_response One row per answer atom, pre-joined with concept and dimension keys
dim_question, dim_respondent, dim_session, etc. Dimension tables for filtering and grouping
agg_question_distribution Response frequency and percentage per question per option
agg_numeric_stats Mean, median, SD, percentiles for numeric questions
agg_session_completion Daily enrollment, completion rate, and median session duration by admin mode
agg_respondent_scores Computed scale scores (PHQ-9 total, GAD-7 severity, etc.) per respondent per session
omop_survey_conduct OMOP CDM SurveyConducts projection (when person_map is populated)
omop_observation OMOP CDM Observations projection for concept-mapped responses
omop_unmapped_questions Questions excluded from OMOP export due to missing concept_id

Querying

The fact table is the starting point for most analytical queries. Dimensions provide the labels.

Prevalence of a symptom:

SELECT
    option_value,
    COUNT(*) AS n,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM fact_response fr
JOIN dim_question dq USING (question_id)
WHERE dq.link_id = 'phq-1'
GROUP BY option_value
ORDER BY option_value;

PHQ-9 scores by admin mode:

SELECT
    ds.admin_mode,
    ROUND(AVG(score_raw), 2) AS mean_phq9,
    COUNT(*) AS n
FROM agg_respondent_scores ars
JOIN dim_session ds USING (session_id)
WHERE ars.scoring_rule_name = 'PHQ-9 Total'
GROUP BY ds.admin_mode;

Repeating group: per-visit data across pregnancies:

SELECT
    fr.respondent_id,
    fr.repeat_index     AS visit_number,
    fr.response_numeric AS gestational_week
FROM fact_response fr
JOIN dim_question dq USING (question_id)
WHERE dq.link_id = 'visits.week'
ORDER BY respondent_id, visit_number;


Cross-Study Harmonization

dim_question.equivalence_group_id is a computed cluster ID — the connected-component ID of the question_equivalence graph declared in the OLTP. Questions in the same group measure the same construct across different instruments or studies.

-- PHQ-9 item 1 prevalence pooled across instruments with different link_ids
SELECT
    dq.source_instrument,
    AVG(fr.response_numeric) AS mean_score
FROM fact_response fr
JOIN dim_question dq USING (question_id)
WHERE dq.equivalence_group_id = (
    SELECT equivalence_group_id FROM dim_question WHERE link_id = 'phq-1'
)
GROUP BY dq.source_instrument;

For concept-based queries, question_concept_id and option_concept_id are pre-joined onto fact_response — no extra join required:

-- All responses to any LOINC 44250-9 question (PHQ-9 item 1, any instrument)
SELECT respondent_id, response_numeric
FROM fact_response
WHERE question_concept_id = (
    SELECT concept_id FROM dim_concept
    WHERE vocabulary_id = 'LOINC' AND concept_code = '44250-9'
);

OMOP Extraction

Studies participating in federated networks (PCORnet, TriNetX, i2b2) can export OMOP CDM tables via quickq refresh once person_map is populated in the OLTP.

omop_unmapped_questions is the first data quality check before any federated query: it lists every question excluded from OMOP output due to a missing concept_id, with its response count. High response counts on unmapped questions indicate data that will be silently excluded from network queries.

SELECT link_id, question_text, response_count
FROM omop_unmapped_questions
ORDER BY response_count DESC;

Reports

quickq report analytics.duckdb study.db <questionnaire_id>

Generates a Markdown summary from the OLAP: enrollment counts, completion rates, question distributions, and scored scale summaries. The report reads exclusively from aggregate tables and requires a prior quickq refresh. Pass --output report.md to write to a file instead of stdout.