Skip to content

Tutorial: Analyzing Study Data

This tutorial covers the analytical layer of a quickq study: running scoring queries, exploring response distributions, and joining across instruments. The queries below assume the demo database (PHQ-9 + prenatal, 400 sessions). If you have not generated it yet, see Analytics phase tutorials for the one-line setup.

Once the demo is generated, open the UI:

quickq analytics demo/analytics.duckdb

The analytical database

After quickq refresh, the DuckDB OLAP contains:

  • fact_response — one row per answer atom, with typed value columns (response_numeric, response_text, option_id, response_date)
  • dim_* tables — respondent, question, session, study, questionnaire, concept, response option, date
  • agg_* tables — pre-computed distributions, scores, and completion stats materialized on refresh
  • omop_* tables — OMOP-aligned projection for federated network queries (see OMOP Interoperability)

The six demo views (v_phq9_scores, v_phq9_severity_distribution, v_phq9_by_admin_mode, v_prenatal_visits, v_prenatal_summary, v_phq9_prenatal_overlap) are pre-loaded from quickq/sql/demo_views.sql and available immediately.

Two design points worth noting before diving into the queries:

  • v_phq9_scores joins agg_respondent_scores to respondent and session context. No scoring logic lives in the view — score_raw and score_category were already computed by quickq refresh from the YAML scoring rule.
  • v_phq9_prenatal_overlap is one JOIN ... USING (respondent_id) — possible because both instruments share the same respondent_id. The multi-instrument linking happened at import time, not in the view.

PHQ-9 severity distribution

The scoring rule — defined once in the YAML, computed automatically on refresh — turns raw item scores into a severity category for every respondent. No scoring logic needed in the query.

SELECT severity, n, pct, mean_score
FROM v_phq9_severity_distribution;
severity n pct mean_score
Minimal depression 104 41.6% 1.4
Mild depression 50 20.0% 7.0
Moderate depression 49 19.6% 11.8
Moderately severe depression 26 10.4% 16.8
Severe depression 21 8.4% 22.3

Scoring is automatic

The PHQ-9 total and severity category come from agg_respondent_scores, populated by quickq refresh using the scoring rule defined at authoring time. There is no scoring logic to maintain in your analysis code.

To query scores directly without the view:

SELECT
    dr.external_id,
    ars.score_raw,
    ars.score_category
FROM agg_respondent_scores ars
JOIN dim_respondent dr USING (respondent_id)
WHERE ars.scoring_rule_name = 'PHQ-9 Total Score'
ORDER BY ars.score_raw DESC;

Score by delivery mode

admin_mode is a first-class column on every session — not buried in metadata. Mode-effect analysis is one GROUP BY away.

SELECT admin_mode, n, mean_score, min_score, max_score
FROM v_phq9_by_admin_mode;
admin_mode n mean_score min_score max_score
paper 47 7.3 0 25
phone 70 7.9 0 25
web 133 8.6 0 25

In a real study you would investigate whether the score difference across modes reflects a genuine mode effect or selection bias (e.g., phone interviews reaching sicker participants).


Repeating group data: prenatal visits

Each row in fact_response for a repeating group child carries a repeat_index — the 0-based visit number within that respondent's session. This is what makes loop data queryable with standard SQL instead of JSON parsing or custom ETL.

SELECT respondent, visit_number, gestational_week, provider, concern_noted
FROM v_prenatal_visits
WHERE respondent = 'respondent-001';
respondent visit_number gestational_week provider concern_noted
respondent-001 0 8.0 midwife false
respondent-001 1 20.0 midwife false
respondent-001 2 24.0 ob false
respondent-001 3 28.0 ob false
respondent-001 4 36.0 midwife true

Provider mix across all visits:

SELECT provider, COUNT(*) AS n,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM v_prenatal_visits
GROUP BY provider
ORDER BY n DESC;

Repeating groups are just rows

The repeat_index column makes each visit a distinct, queryable row. Pivoting, filtering by visit number, or computing per-visit statistics requires no special handling — it is ordinary SQL.


Cross-instrument join: PHQ-9 score vs visit attendance

The 150 respondents who completed both instruments share the same respondent_id. Joining across instruments is one SQL join.

SELECT
    severity,
    ROUND(AVG(total_visits), 1)  AS avg_visits,
    COUNT(*)                     AS n
FROM v_phq9_prenatal_overlap
GROUP BY severity
ORDER BY AVG(phq9_total);

This query asks: do participants with higher depression scores attend fewer prenatal visits? In a real cohort you would adjust for gestational age at enrollment and other covariates — but the data is already joined and queryable with no additional ETL.

-- Participants with concerns at any visit, stratified by PHQ-9 severity
SELECT
    severity,
    COUNT(CASE WHEN visits_with_concern > 0 THEN 1 END)  AS had_concern,
    COUNT(*)                                              AS total,
    ROUND(100.0 * COUNT(CASE WHEN visits_with_concern > 0 THEN 1 END) / COUNT(*), 1) AS pct
FROM v_phq9_prenatal_overlap
GROUP BY severity
ORDER BY AVG(phq9_total);

Cross-instrument joins are free

Because all respondents share a single respondent_id regardless of which instruments they completed, joining across instruments is a standard SQL join. No custom mapping tables, no manual ID reconciliation.


Exploring by question type

Each question type stores responses in a specific column in fact_response. Use dim_question.question_type to build a catalog of what is in the study:

SELECT question_type, link_id, question_text, COUNT(fr.response_id) AS response_count
FROM dim_question dq
LEFT JOIN fact_response fr USING (question_id)
GROUP BY question_type, link_id, question_text
ORDER BY question_type, response_count DESC;

Note that repeating_group shows response_count = 0 — it is a container whose children carry the responses, not an answer-bearing item itself.

For query patterns by type (single_choice, boolean, numeric, likert, grid, and others), see the Query Patterns reference.


What the data model does for you

Challenge Typical approach quickq
Scale scoring Custom scoring script per instrument Scoring rule in YAML; auto-computed on refresh
Repeating/loop data JSON parsing or bespoke ETL repeat_index — each instance is a standard row
Mode-effect analysis Join to a separate metadata table admin_mode on every session
Cross-instrument joins Manual ID reconciliation Shared respondent_id across all instruments
FHIR export Custom mapping per instrument quickq fhir export — lossless, one command
Federated analysis Manual OMOP mapping omop_observation table on refresh (when concepts mapped)