Pipeline details
This section walks through each stage of the ffrprep pipeline
— BIDS validation, preprocessing, and analysis — covering the
nodes that make up each Nipype workflow and the file layout each
stage writes to disk.
Pipeline Overview
The ffrprep pipeline consists of three main stages:
1. BIDS Validation - Ensures dataset compliance with BIDS standards
2. Preprocessing - Filters, re-references, and epochs the EEG data
3. Analysis - Computes evoked responses, time-frequency representations, and FFR metrics
Each stage is implemented as a modular Nipype workflow. The CLI parallelizes
across (task, run) iterations within a subject via a ProcessPoolExecutor
sized by --n_procs; for cross-subject scaling on clusters, run one CLI
invocation per subject (e.g. via Slurm job arrays). See the
Parallelization section under Usage for details.
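The outer-loop dispatch described above can be sketched as follows. This is a minimal illustration, not the CLI's actual code: `process_iteration` and `run_iterations` are hypothetical names, and the `executor_cls` parameter is added here only so the pool type can be swapped.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def process_iteration(task, run):
    """Hypothetical stand-in for one (task, run) pipeline iteration."""
    return f"task-{task}_run-{run}"


def run_iterations(iterations, n_procs, executor_cls=ProcessPoolExecutor):
    """Dispatch every (task, run) pair to a pool with n_procs workers."""
    results = {}
    with executor_cls(max_workers=n_procs) as pool:
        # Map each future back to the (task, run) pair it is processing.
        futures = {
            pool.submit(process_iteration, task, run): (task, run)
            for task, run in iterations
        }
        for future in as_completed(futures):
            # result() re-raises any exception from the worker in the caller.
            results[futures[future]] = future.result()
    return results
```

With a process pool the worker function has to be defined at module level so it can be pickled. Passing `ThreadPoolExecutor` as `executor_cls` is a convenient way to debug the same loop in a single process.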
Stage 1: BIDS Validation
The first stage validates that your input dataset follows the Brain Imaging Data Structure (BIDS) specification. This ensures reproducibility and compatibility with other neuroimaging tools.
Purpose: Verify dataset structure, file naming conventions, and required metadata files before processing begins.
Implementation:
The validation uses the bids-validator tool with a custom configuration that ignores warnings not relevant to EEG/FFR data.
Key Functions:
validate_input_dir() - Main validation function
Custom validator configuration for EEG-specific requirements
Validation Steps:
Directory Structure Check
Verifies presence of required BIDS directories (sub-*/, derivatives/)
Checks for dataset_description.json and other required metadata files
Validates subject/session/task naming conventions
EEG-Specific Validation
Confirms presence of EEG data files (.edf, .bdf, .vhdr, .fif, .set)
Validates channel description files (*_channels.tsv)
Checks event files (*_events.tsv) for proper formatting
Verifies EEG-specific metadata in JSON sidecars
Participant Selection
Validates requested participant labels exist in dataset
Checks for required EEG data for specified participants
Reports any missing or incomplete data
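A few of these checks can be approximated with the standard library alone. The sketch below is illustrative and is not what validate_input_dir() or bids-validator does internally; `check_bids_layout` is a hypothetical helper, and matching recordings on an `_eeg` suffix is an assumption.

```python
from pathlib import Path

EEG_EXTENSIONS = (".edf", ".bdf", ".vhdr", ".fif", ".set")


def check_bids_layout(bids_dir, participant_labels):
    """Return a list of human-readable problems; empty means the checks passed."""
    root = Path(bids_dir)
    problems = []
    if not (root / "dataset_description.json").is_file():
        problems.append("missing dataset_description.json")
    for label in participant_labels:
        sub_dir = root / f"sub-{label}"
        if not sub_dir.is_dir():
            problems.append(f"sub-{label}: directory not found")
            continue
        # Search recursively so sub-XX/eeg/ and sub-XX/ses-YY/eeg/ are both covered.
        has_eeg = any(
            p.suffix in EEG_EXTENSIONS and "_eeg" in p.name
            for p in sub_dir.rglob("*")
        )
        if not has_eeg:
            problems.append(f"sub-{label}: no EEG recording found")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every missing participant in a single pass.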
Error Handling:
If validation fails, ffrprep provides detailed error messages indicating specific BIDS compliance issues and suggestions for resolution.
Skip Option:
Validation can be bypassed using --skip_bids_validation (not recommended for production analyses).
Stage 2: Preprocessing
The preprocessing stage converts raw EEG data into clean, epoched data suitable for FFR analysis. This stage implements standard electrophysiological preprocessing steps optimized for frequency-following responses.
Purpose: Transform raw continuous EEG into clean, filtered, and epoched data while preserving FFR-relevant neural signals.
Implementation: A Nipype workflow (create_preprocessing_workflow()) with the following nodes:
Preprocessing Workflow Nodes
1. Data Loading Node
Function: load_data()
Purpose: Robustly load EEG data from BIDS datasets using pybids and MNE-BIDS.
- Sub-steps:
Create BIDSLayout object for dataset querying
Query for EEG files matching participant/session/task/run criteria
Try multiple file extensions (.edf, .bdf, .vhdr, .fif, .set)
Load data using mne_bids.read_raw_bids()
Extract and validate channel information
Load associated event data and metadata
Outputs: Raw EEG data object, BIDS path information, original filename
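The "try multiple file extensions" step can be pictured with a plain pathlib stand-in. The real node queries a BIDSLayout and reads through mne_bids.read_raw_bids(); `find_recording` below is a hypothetical helper, and the extension priority order is an assumption.

```python
from pathlib import Path

# Same extensions as listed above; the priority order is an assumption.
EXTENSIONS = (".edf", ".bdf", ".vhdr", ".fif", ".set")


def find_recording(bids_root, subject, task, run=None, session=None):
    """Return the first existing *_eeg.<ext> file for the entities, else None."""
    folder = Path(bids_root) / f"sub-{subject}"
    parts = [f"sub-{subject}"]
    if session is not None:
        parts.append(f"ses-{session}")
        folder = folder / f"ses-{session}"
    parts.append(f"task-{task}")
    if run is not None:
        parts.append(f"run-{run}")
    stem = "_".join(parts) + "_eeg"
    for ext in EXTENSIONS:
        candidate = folder / "eeg" / (stem + ext)
        if candidate.is_file():
            return candidate
    return None
```

Returning None instead of raising leaves the decision about a missing run to the caller, which may want to skip it or abort.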
2. Re-referencing Node
Function: reference_data()
Purpose: Apply appropriate reference scheme to reduce common-mode noise and artifacts.
- Sub-steps:
Parse reference channel specification (average, single channel, or channel list)
Validate reference channels exist in data
Apply re-referencing using MNE's set_eeg_reference()
Update channel information and provenance
- Reference Options:
Average reference (--ref_channels average): Uses all EEG channels
Single channel (--ref_channels Cz): References to one electrode
Multiple channels (--ref_channels "Cz,Fz"): Average of specified channels
Outputs: Re-referenced EEG data
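Parsing and validating the reference specification might look like the sketch below. `parse_ref_channels` is a hypothetical helper, not the body of reference_data(); treating "average" case-insensitively is an assumption.

```python
def parse_ref_channels(spec, available_channels):
    """Turn a --ref_channels string into "average" or a validated list of names."""
    if spec.strip().lower() == "average":
        return "average"
    # Split a comma-separated list and drop empty entries and stray whitespace.
    requested = [name.strip() for name in spec.split(",") if name.strip()]
    if not requested:
        raise ValueError("no reference channels given")
    missing = [name for name in requested if name not in available_channels]
    if missing:
        raise ValueError(f"reference channels not in data: {missing}")
    return requested
```

The return value is either the string "average" or a list of channel names, which are the two forms MNE's set_eeg_reference() accepts for its ref_channels argument.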
3. Filtering Node
Function: filter_data()
Purpose: Apply temporal filtering to remove noise while preserving FFR signals.
- Sub-steps:
Apply high-pass filter to remove slow drifts and DC offsets
Apply low-pass filter to remove high-frequency noise
Use zero-phase FIR filters to avoid temporal distortions
Log filter parameters and transition bands
- Default Parameters:
High-pass: 1.0 Hz (removes slow drifts, preserves FFR frequencies)
Low-pass: 40.0 Hz (removes EMG and high-frequency noise)
Filter design: Zero-phase FIR with automatic transition bandwidth
Outputs: Filtered EEG data
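To see why a zero-phase FIR filter avoids temporal distortion, consider a symmetric kernel applied as a centered convolution. The pure-Python sketch below uses a Hamming-windowed sinc design with a hypothetical tap count; it is a teaching example and does not reproduce MNE's filter design or its automatic transition bandwidth.

```python
import math


def lowpass_kernel(cutoff_hz, sfreq, n_taps=101):
    """Hamming-windowed sinc low-pass kernel; n_taps must be odd for symmetry."""
    fc = cutoff_hz / sfreq  # cutoff in cycles per sample
    mid = (n_taps - 1) // 2
    taps = []
    for n in range(n_taps):
        k = n - mid
        sinc = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def filter_zero_phase(signal, kernel):
    """Centered convolution: output[i] lines up with input[i], so no net delay."""
    mid = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + mid - j
            if 0 <= idx < len(signal):  # samples beyond the edges count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out
```

Because the kernel is symmetric, shifting the output back by half the kernel length cancels the filter's constant delay, so a feature in the input stays at the same sample index in the output.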
4. Epoching Node
Function: epoch_data()
Purpose: Segment the filtered continuous EEG into time-locked epochs around stimulus events, applying baseline correction and amplitude-based rejection.
- Sub-steps:
Load events from the BIDS *_events.tsv (or use annotations embedded in the raw recording when no sidecar is present)
Build mne.Epochs with the requested tmin / tmax, baseline window, picks, and reject thresholds
Apply epochs.drop_bad() to materialize amplitude-based rejection; keep the resulting Epochs object as the workflow output
- Default Parameters (FFR-typical, all overridable from the CLI):
Epoch window: --tmin -0.2 to --tmax 0.6 seconds around stimulus onset
Baseline: --baseline -0.2 0 seconds (pre-stimulus)
Rejection: --reject-eeg 75e-6 (75 µV peak-to-peak); pass --no-auto-reject to disable
Outputs: Epoched EEG data and the post-rejection drop log.
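Amplitude-based rejection boils down to a peak-to-peak test per channel and epoch. The sketch below works on plain Python lists rather than mne.Epochs; `drop_bad_epochs` is a hypothetical helper, and only the 75e-6 V threshold comes from the defaults above.

```python
def drop_bad_epochs(epochs, reject_eeg=75e-6):
    """Split epochs into kept epochs and a drop log using peak-to-peak amplitude.

    epochs: list of dicts mapping channel name -> list of samples in volts.
    """
    kept, drop_log = [], []
    for epoch in epochs:
        # A channel is bad when its max-minus-min range exceeds the threshold.
        bad = [
            ch for ch, samples in epoch.items()
            if max(samples) - min(samples) > reject_eeg
        ]
        drop_log.append(tuple(bad))  # an empty tuple means the epoch was kept
        if not bad:
            kept.append(epoch)
    return kept, drop_log
```

The drop log has one entry per original epoch, so the kept and rejected counts can be recovered from it later.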
5. Save Preprocessing Outputs Node
Function: save_preprocessing_outputs()
(invoked through the internal save_preprocessing_node wrapper,
which fans the Epochs out by trial type before calling it).
Purpose: Persist the epoched data plus a self-describing BIDS
sidecar — one file per trial type by default, or a single
combined file under --no-split-by-trial-type.
- Sub-steps:
When --split-by-trial-type is on (the default), partition the input Epochs by event_id and write one _desc-preproc{Cond}_epo.fif per trial type with a matching sidecar carrying a Condition field. Under --no-split-by-trial-type, write a single bare _desc-preproc_epo.fif instead.
Reconstruct a fresh EpochsArray from the data + events + event_id so trial-type metadata survives the save / load round-trip (and the upstream Epochs object's internal state doesn't leak through .save()).
Write each sibling .json sidecar with EpochCount / EpochCountTotal / EpochCountRejected / RejectionThresholds / Filtering / SamplingFrequency / EpochTmin / EpochTmax / Channels plus run / session / ConcatenatedRuns / Condition provenance.
Initialize the per-derivatives dataset_description.json if missing.
Reporting runs after the workflow drains, in the CLI rather than as a workflow node — see reporting below.
Output Structure (default split-by-trial-type):
derivatives/ffrprep-preprocessing/
├── sub-XX/
│ └── eeg/
│ ├── sub-XX_task-YY_run-ZZ_desc-preprocPositive_epo.fif
│ ├── sub-XX_task-YY_run-ZZ_desc-preprocPositive_epo.json
│ ├── sub-XX_task-YY_run-ZZ_desc-preprocNegative_epo.fif
│ ├── sub-XX_task-YY_run-ZZ_desc-preprocNegative_epo.json
│ ├── sub-XX_preprocessing_report.html
│ └── sub-XX_preprocessing.log
Under --no-split-by-trial-type the per-trial-type files are
replaced by a single _desc-preproc_epo.fif + its sidecar.
The sidecar JSON carries provenance, EpochCount /
EpochCountTotal / EpochCountRejected, RejectionThresholds,
Filtering (high-pass and low-pass cut-offs), sampling frequency,
run / session identifiers, and Condition (for per-trial-type
files only).
Outputs: File paths, processing metadata.
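The naming and sidecar conventions above can be illustrated with a small writer. This is a sketch under assumptions: `write_epochs_sidecar` is hypothetical, EpochCount is taken to be the post-rejection count, and the shape of the RejectionThresholds value is a guess.

```python
import json
from pathlib import Path


def write_epochs_sidecar(out_dir, prefix, condition, n_kept, n_total, reject):
    """Write the JSON sidecar for one *_epo.fif file and return its path."""
    # Per-trial-type files append the condition label to the desc entity.
    desc = f"preproc{condition}" if condition else "preproc"
    sidecar = {
        "EpochCount": n_kept,  # assumed to mean epochs kept after rejection
        "EpochCountTotal": n_total,
        "EpochCountRejected": n_total - n_kept,
        "RejectionThresholds": reject,
    }
    if condition:
        sidecar["Condition"] = condition  # only per-trial-type files carry this
    path = Path(out_dir) / f"{prefix}_desc-{desc}_epo.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sidecar, indent=2))
    return path
```

Calling it with `condition=None` yields the bare `_desc-preproc_epo.json` name used under --no-split-by-trial-type.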
Stage 3: Analysis
The analysis stage averages each preprocessed epochs group into a structured set of evoked responses (per-trial-type + combined + optional difference) and saves them to BIDS-derivatives.
Purpose: Produce per-(task, run) evoked responses suitable for downstream statistical analysis or visualization, persisted in MNE-readable format with self-describing BIDS sidecars.
Implementation:
A Nipype workflow (create_analysis_workflow())
with two nodes. The CLI worker collects all per-trial-type
_desc-preproc{Cond}_epo.fif files for one (task, run) group
via _collect_analysis_groups, stitches them back together with
mne.concatenate_epochs (which preserves event_id), and
passes the resulting Epochs directly into the workflow’s
inputnode. The per-(task, run) granularity means each group
runs the workflow once regardless of how many trial types it
holds.
Analysis Workflow Nodes
1. Build Analysis Payload Node
Function: build_analysis_payload()
Purpose: From a single Epochs object, produce a structured
{by_type, combined, diff} payload covering every evoked the
analysis stage emits.
- Sub-steps:
Average all events into a single combined Evoked via make_combined_evoked (always emitted).
When split_by_trial_type is True (the default), partition by event_id via make_evoked(by_event_type=True) into a per-trial-type dict.
When at least two trial types are present and either (a) there are exactly two types (auto-paired) or (b) the user passed --difference-pairs A:B [C:D …], compute one difference Evoked per pair via make_difference_evokeds (which wraps mne.combine_evoked([A, B], weights=[1, -1])).
Outputs: dict with keys "by_type" (dict trial_type → Evoked),
"combined" (Evoked), and optionally "diff" (dict (A, B) →
Evoked).
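The payload logic can be mirrored on plain lists to make the three outputs concrete. The sketch below is not build_analysis_payload() itself: `build_payload`, `parse_pairs`, and `mean_waveform` are hypothetical, and the auto-pair order (first-seen trial type minus the second) is an assumption.

```python
def mean_waveform(waveforms):
    """Sample-wise mean of equally long waveforms."""
    return [sum(col) / len(waveforms) for col in zip(*waveforms)]


def parse_pairs(specs):
    """Turn ["A:B", "C:D"] into [("A", "B"), ("C", "D")]."""
    return [tuple(spec.split(":", 1)) for spec in specs]


def build_payload(epochs, difference_pairs=None):
    """epochs: list of (trial_type, waveform). Returns by_type / combined / diff."""
    grouped = {}
    for trial_type, waveform in epochs:
        grouped.setdefault(trial_type, []).append(waveform)
    by_type = {t: mean_waveform(w) for t, w in grouped.items()}
    payload = {
        "by_type": by_type,
        # The combined average runs over every epoch, not over the per-type means.
        "combined": mean_waveform([w for _, w in epochs]),
    }
    pairs = parse_pairs(difference_pairs) if difference_pairs else []
    if not pairs and len(by_type) == 2:
        pairs = [tuple(by_type)]  # auto-pair; the order here is an assumption
    if pairs:
        # A minus B, the same result as weights=[1, -1] on the two averages.
        payload["diff"] = {
            (a, b): [x - y for x, y in zip(by_type[a], by_type[b])]
            for a, b in pairs
        }
    return payload
```

The "diff" key is only present when at least one pair exists, which matches the "optionally" wording for that key in the outputs.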
2. Save Analysis Node
Function: save_analysis_outputs()
Purpose: Persist every Evoked in the payload plus a self-describing sidecar per file.
- Sub-steps:
For each entry in by_type: write _desc-evoked{Cond}.fif + matching sidecar with Condition: <cond>.
For combined: write the bare _desc-evoked.fif + sidecar (no Condition field).
For each entry in diff: write _desc-evokedDiff{A}Vs{B}.fif + sidecar carrying DifferenceOf: [A, B].
Every sidecar also carries AverageCount / Baseline / SamplingFrequency / Tmin / Tmax / Channels / TaskName / AnalysisType plus run / session / ConcatenatedRuns provenance.
Initialize the per-derivatives dataset_description.json if missing.
Output Structure (default split-by-trial-type, 2-trial-type dataset):
derivatives/ffrprep-analysis/
├── sub-XX/
│ ├── sub-XX_task-YY_run-ZZ_desc-evokedPositive.fif
│ ├── sub-XX_task-YY_run-ZZ_desc-evokedPositive.json
│ ├── sub-XX_task-YY_run-ZZ_desc-evokedNegative.fif
│ ├── sub-XX_task-YY_run-ZZ_desc-evokedNegative.json
│ ├── sub-XX_task-YY_run-ZZ_desc-evoked.fif
│ ├── sub-XX_task-YY_run-ZZ_desc-evoked.json
│ ├── sub-XX_task-YY_run-ZZ_desc-evokedDiffPositiveVsNegative.fif
│ ├── sub-XX_task-YY_run-ZZ_desc-evokedDiffPositiveVsNegative.json
│ ├── sub-XX_analysis_report.html
│ └── sub-XX_analysis.log
Under --no-split-by-trial-type, only the combined
_desc-evoked.fif + sidecar are emitted per (task, run).
Outputs: Saved file paths.
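When reading these outputs back, the desc entity is enough to tell the three kinds of file apart. The sketch below is one way to do that from the filename alone; `classify_evoked` is a hypothetical helper, and it assumes condition labels do not themselves start with "Diff".

```python
import re

DESC_RE = re.compile(r"_desc-evoked(?P<label>[A-Za-z0-9]*)\.fif$")
DIFF_RE = re.compile(r"^Diff(?P<a>.+?)Vs(?P<b>.+)$")


def classify_evoked(filename):
    """Return ("combined", None), ("by_type", cond) or ("diff", (a, b))."""
    match = DESC_RE.search(filename)
    if match is None:
        raise ValueError(f"not an evoked derivative: {filename}")
    label = match.group("label")
    if not label:
        return ("combined", None)  # bare _desc-evoked.fif
    diff = DIFF_RE.match(label)
    if diff:
        return ("diff", (diff.group("a"), diff.group("b")))
    return ("by_type", label)
```

The sidecar's Condition and DifferenceOf fields carry the same information, so parsing the filename is mainly useful for quick globbing.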
Reporting (post-workflow)
The single-file HTML reports are built in the CLI after the
workflow drains, not as a workflow node. The CLI’s
_build_preproc_report and _build_analysis_report glob the
saved _desc-preproc*_epo.fif and _desc-evoked*.fif files
via _collect_analysis_groups / _collect_evoked_groups,
group them by (task, run), and build sections via the
ffrprep.reports builders (build_raw_section /
build_epoch_section / build_evoked_section /
build_phase_consistency_section / make_group); the
subject-level HTML is rendered via build_subject_report /
build_analysis_report.
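Grouping saved files by (task, run) can be done from the BIDS entities in each filename. The sketch below shows the idea and is not the code of _collect_analysis_groups or _collect_evoked_groups; `group_by_task_run` is a hypothetical name.

```python
import re
from collections import defaultdict

ENTITY_RE = re.compile(r"(?:^|_)(task|run)-([A-Za-z0-9]+)")


def group_by_task_run(filenames):
    """Map (task, run) -> sorted filenames; run is None when the entity is absent."""
    groups = defaultdict(list)
    for name in filenames:
        # findall yields (key, value) pairs such as ("task", "ffr"), ("run", "01").
        entities = dict(ENTITY_RE.findall(name))
        groups[(entities.get("task"), entities.get("run"))].append(name)
    return {key: sorted(names) for key, names in groups.items()}
```

Each resulting group then corresponds to one block of sections in the subject-level report.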
Per (task, run) group layout in the analysis report:
one Evoked section per trial-type file (e.g. Positive, Negative). Each carries:
    waveform / PSD / TFR / autocorrelation / pitch-track figures (via ffrprep.reports.evoked_qa());
    scalar metrics: RMS SNR (100-200 ms), Mean power 90-110 Hz, 100-200 ms;
    when the BIDS stim_file column is populated in events.tsv, a Stim correlation (peak r) + Stim correlation (lag, ms) row plus a stim ↔ response cross-correlation lag plot (computed by _stim_correlation_data in ffrprep_cli.py).
one combined Evoked section (the across-events average) with the same plots + scalars, plus a second pair of rows / plot for the envelope correlation (combined ≈ ENV proxy in FFR, so |hilbert(stim)| is the natural reference).
one difference Evoked section per (A, B) pair (auto for the 2-trial-type case; opt-in via --difference-pairs) with the standard plots + a single raw-waveform stim correlation row + plot (diff ≈ TFS proxy).
when exactly two per-condition preproc files exist for the group, one Phase Consistency section combining ffrprep.analysis.compute_phase_consistency() with ffrprep.analysis.plot_phase_consistency() (or, when the caller passes mask=True, the masked variant). Single-subject reports default to unmasked; group-level builders pass mask=True. Uses seaborn's flare_r colormap; subplot titles surface the trial-type names plus auto-derived A + B / A − B for the sum and difference panels.
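The peak r and lag rows can be understood as a lagged Pearson correlation between stimulus and response. The sketch below is one plausible formulation, not the code of _stim_correlation_data: the function names, the restriction to non-negative lags, and the 20 ms search window are all assumptions, and both signals are taken to share one sampling rate.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / norm if norm else 0.0


def stim_correlation(stim, response, sfreq, max_lag_ms=20.0):
    """Return (peak r, lag in ms), searching lags where the response trails the stimulus."""
    max_lag = int(round(max_lag_ms * sfreq / 1000.0))
    best_r, best_lag = -1.0, 0
    for lag in range(max_lag + 1):
        n = min(len(stim), len(response) - lag)
        if n < 2:
            break
        # Compare the stimulus with the response shifted back by `lag` samples.
        r = pearson(stim[:n], response[lag:lag + n])
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, 1000.0 * best_lag / sfreq
```

For the envelope variant, the same search would be run against |hilbert(stim)| in place of the raw stimulus waveform.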
Per (task, run) group layout in the preprocessing report:
one Raw section loaded from the original BIDS recording (with run concatenation when the preproc output sourced multiple runs).
one Epoched section per trial-type file; each carries the standard epoch metadata plus a Mean trial-to-trial r row (via ffrprep.analysis.response_consistency()) when there are at least 10 trials.
Figures and scalar metrics are computed at report time from
the loaded Epochs / Evoked objects — they are not
separately persisted to disk. To recompute them yourself, see the
Working with Outputs in Python section of the
Tutorial walkthrough.
The single-file *_report.html embeds all figures as inline
base64 PNGs — there is no sibling figures/ directory.
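Inline embedding of this kind relies on base64 data URIs. A minimal sketch, with `png_to_img_tag` as a hypothetical helper rather than a function from ffrprep.reports:

```python
import base64


def png_to_img_tag(png_bytes, alt=""):
    """Embed raw PNG bytes in an <img> tag as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img alt="{alt}" src="data:image/png;base64,{encoded}">'
```

Because the image bytes travel inside the HTML itself, the report can be copied or emailed as a single file, at the cost of roughly a third more size than the raw PNGs.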
- File Formats:
MNE format (.fif): Epoched / Evoked data, loadable in MNE-Python.
WAV (under <dataset>/stimuli/): stimulus audio referenced by events.tsv's stim_file column. Fetched via ffrprep-download example --with-stimuli.
JSON: BIDS sidecar (human-readable, machine-parseable).
HTML: Self-contained single-file per-subject report.
Pipeline Integration and Quality Control
Workflow Management:
Each stage is implemented as a Nipype workflow for per-iteration dependency tracking
Outer-loop parallelism: the CLI dispatches per-(task, run) iterations to a ProcessPoolExecutor sized by --n_procs
Fail-fast on any iteration error; the exception propagates to the CLI entry point
Nipype caches per-iteration intermediates under work/, so re-runs that already have a saved _desc-preproc_epo.fif skip workflow re-execution
Quality Control Checkpoints:
BIDS validation before processing
Data quality assessment after loading
Preprocessing quality metrics and reports
Analysis validation and statistical checks
Output Organization:
BIDS-compatible directory structure
Per-file BIDS sidecars with run / session / condition provenance
Standardized file formats for interoperability
Version-controlled processing parameters
Customization:
All preprocessing and analysis parameters are exposed as CLI flags (see Usage).
The Nipype workflows can be imported and reused programmatically from ffrprep.preproc (create_preprocessing_workflow / create_analysis_workflow).
Section builders in ffrprep.reports accept extra_summary and extra_figures kwargs so downstream code can fold caller-computed scalars or figures into a section's table or figure gallery.