Federated Analytics¶
quickq supports a federated analysis pattern for multi-institution studies where individual-level data cannot leave each institution's boundary. Each site runs the same query against its local OLAP database; only aggregate results (counts, means, distributions) leave the site, and rows whose cell counts are below a configurable threshold are suppressed entirely.
This sidesteps the data use agreement and IRB amendment process that direct data sharing requires, which is the primary adoption barrier for multi-institution studies.
For the design rationale (why federated, scaling considerations) see Design Decisions.
quickq federated query¶
Runs an aggregate SELECT query against a local OLAP database, suppresses small cells per disclosure-control rules, and writes a JSON document suitable for sharing with a coordinating center.
quickq federated query analysis.sql analytics.duckdb --output result.json
Arguments:
QUERY_PATH(required): a.sqlfile containing a singleSELECTstatement.OLAP_PATH(required): the localanalytics.duckdbto query against.--min-cell N(default: 5): minimum cell size; rows with counts below this threshold are suppressed entirely.--output PATH/-o: write JSON to file instead of stdout.
Query format¶
The .sql file contains a single SELECT statement. Aggregations should produce summary rows (counts, means, distributions); individual-identifier columns are blocked by the executor.
Example query: PHQ-9 severity distribution by site.
SELECT
s.name AS site,
ars.score_category,
COUNT(*) AS n,
ROUND(AVG(ars.score_raw), 1) AS mean_score
FROM agg_respondent_scores ars
JOIN dim_session sess USING (session_id)
JOIN dim_study s USING (study_id)
WHERE ars.scoring_rule_name = 'PHQ-9 Total Score'
GROUP BY s.name, ars.score_category;
Because every quickq deployment shares the same fact_response / dim_question / dim_respondent star schema, a query written once runs identically at every site. See Query Patterns by Question Type for the canonical patterns to build from.
Output JSON¶
{
"query_hash": "sha256:abc123...",
"min_cell": 5,
"rows_total": 42,
"rows_suppressed": 3,
"columns": ["site", "score_category", "n", "mean_score"],
"rows": [
["Site A", "Minimal", 87, 1.4],
["Site A", "Mild", 34, 6.7],
["Site B", "Minimal", 102, 1.6]
],
"disclosure_control": {
"min_cell": 5,
"rows_suppressed": 3,
"note": "Rows with cell counts below min_cell were excluded entirely."
}
}
The disclosure_control block makes the result self-describing: a coordinating center receiving this JSON knows exactly how it was produced and can document the suppression in its own analysis.
Disclosure control¶
The default --min-cell 5 matches the NCHS standard used in BRFSS and NHANES. Some IRBs require 10 or higher; verify the threshold for your specific study and override with --min-cell N.
Suppression removes entire rows rather than blanking individual cells. This prevents information leakage via subtraction (known total minus visible cells equals the suppressed cell value), at the cost of slightly less granular output.
Audit trail¶
Every federated query run is recorded in the local study's tool_audit_log table (best-effort: skipped if no companion OLTP file exists alongside the OLAP). The audit row records the query hash, --min-cell threshold, total rows, suppressed rows, and output destination. This gives each site a complete record of which queries it has run on its data.
Workflow: coordinating center and sites¶
A typical multi-site federated analysis:
- Coordinating center authors the query as a
.sqlfile against the standard OLAP schema. Tests it on a single-site database (or the demo) first. - Distributes the query file to each site via email, secure file share, or git.
- Each site runs the query locally:
quickq federated query analysis.sql analytics.duckdb --output result.json. Reviewers check the suppressed-row count before sending. - Each site sends its
result.jsonback to the coordinating center. - Coordinating center assembles the per-site JSONs into a combined view. No individual-level data has moved.
This pattern is distinct from quickq merge, which combines site databases into a single combined OLTP for studies where individual-level pooling is permitted under a DUA. Use merge when records can move; use query when they cannot.
Companion commands: quickq fork and quickq merge¶
These are top-level study-management commands (not under the federated namespace) that complete the multi-site lifecycle. fork distributes a study's structure to collection sites; merge reassembles their populated databases.
quickq fork¶
Scaffolds a new study database from an existing one's structure (questions, options, scoring rules, skip rules, lineage records). Responses, sessions, respondents, audit history, and compliance records are not copied. The new database carries a provenance entry in its tool_audit_log linking back to the source.
quickq fork prod.db --questionnaire-id 1 --output site_a.db --site-id site_a
Useful for distributing a canonical instrument to sites at the start of a multi-site study, scaffolding dev or staging copies of prod, or handing a study to another investigator without exposing respondent data.
quickq merge¶
Combines multiple site databases into a single combined OLTP study database. Deduplicates by natural key (instrument definitions by canonical URL, sessions by FHIR QuestionnaireResponse.id, respondents by (study_id, external_id)). Question-text divergence by link_id is detected at merge time; deeper structural drift between questionnaire versions is not — pre-merge, hash the FHIR export from each site and confirm they match.
quickq merge site_a.db site_b.db site_c.db --output combined.db
See also¶
- Compliance & Governance: FAIR metadata, GDPR-style erasure, IRB-style withdrawal.
- Design Decisions: Federated analytics: the rationale.