Pipeline details

This section walks through each stage of the ffrprep pipeline — BIDS validation, preprocessing, and analysis — covering the nodes that make up each Nipype workflow and the file layout each stage writes to disk.

Pipeline Overview

The ffrprep pipeline consists of three main stages:

  1. BIDS Validation - Ensures dataset compliance with BIDS standards

  2. Preprocessing - Filters, re-references, and epochs the EEG data

  3. Analysis - Averages epochs into evoked responses (per trial type, combined, and optional difference); time-frequency figures and FFR metrics are computed at report time

Each stage is implemented as a modular Nipype workflow. Within a subject, the CLI parallelizes across (task, run) iterations via a ProcessPoolExecutor sized by --n_procs. For cross-subject scaling on clusters, run one CLI invocation per subject (e.g. via Slurm job arrays). See the Parallelization section under Usage for details.
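The dispatch pattern described above can be sketched as follows. This is a minimal illustration, not the ffrprep implementation: process_iteration and run_iterations are hypothetical names, and the executor_cls parameter is added here only so the same loop can be tried with a thread pool.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def process_iteration(task, run):
    """Hypothetical stand-in for running one (task, run) workflow."""
    return f"task-{task}_run-{run}"


def run_iterations(iterations, n_procs, executor_cls=ProcessPoolExecutor):
    """Submit every (task, run) pair to a pool of n_procs workers."""
    results = {}
    with executor_cls(max_workers=n_procs) as pool:
        futures = {
            pool.submit(process_iteration, task, run): (task, run)
            for task, run in iterations
        }
        for future in as_completed(futures):
            # .result() re-raises a worker exception in the parent process,
            # so a failing iteration stops the collection loop.
            results[futures[future]] = future.result()
    return results
```

Calling run_iterations([("ffr", "01"), ("ffr", "02")], n_procs=2) returns a dict keyed by (task, run). With the default process pool, the worker function has to live at module top level so it can be pickled.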

Stage 1: BIDS Validation

The first stage validates that your input dataset follows the Brain Imaging Data Structure (BIDS) specification. This ensures reproducibility and compatibility with other neuroimaging tools.

Purpose: Verify dataset structure, file naming conventions, and required metadata files before processing begins.

Implementation: The validation uses the bids-validator tool with a custom configuration that ignores warnings not relevant to EEG/FFR data.

Key Functions:
  • validate_input_dir() - Main validation function

  • Custom validator configuration for EEG-specific requirements

Validation Steps:

  1. Directory Structure Check

    • Verifies presence of required BIDS directories (sub-*/, derivatives/)

    • Checks for dataset_description.json and other required metadata files

    • Validates subject/session/task naming conventions

  2. EEG-Specific Validation

    • Confirms presence of EEG data files (.edf, .bdf, .vhdr, .fif, .set)

    • Validates channel description files (*_channels.tsv)

    • Checks event files (*_events.tsv) for proper formatting

    • Verifies EEG-specific metadata in JSON sidecars

  3. Participant Selection

    • Validates requested participant labels exist in dataset

    • Checks for required EEG data for specified participants

    • Reports any missing or incomplete data

Error Handling: If validation fails, ffrprep provides detailed error messages indicating specific BIDS compliance issues and suggestions for resolution.

Skip Option: Validation can be bypassed using --skip_bids_validation (not recommended for production analyses).
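A few of the checks listed above can be approximated with the standard library alone. The sketch below is not a replacement for bids-validator and is not how validate_input_dir() is implemented; quick_bids_check is a hypothetical helper that only looks for dataset_description.json, the requested subject directories, and at least one EEG recording per subject.

```python
from pathlib import Path

# Extensions named in the EEG-specific validation step.
EEG_EXTENSIONS = {".edf", ".bdf", ".vhdr", ".fif", ".set"}


def quick_bids_check(bids_root, participant_labels):
    """Return a list of problems; an empty list means these checks passed."""
    root = Path(bids_root)
    problems = []
    if not (root / "dataset_description.json").is_file():
        problems.append("missing dataset_description.json")
    for label in participant_labels:
        sub_dir = root / f"sub-{label}"
        if not sub_dir.is_dir():
            problems.append(f"sub-{label}: directory not found")
            continue
        # Look for *_eeg.<ext> files anywhere below the subject directory.
        recordings = [
            path for path in sub_dir.rglob("*_eeg.*")
            if path.suffix in EEG_EXTENSIONS
        ]
        if not recordings:
            problems.append(f"sub-{label}: no EEG recordings found")
    return problems
```

A pre-check like this can give a fast failure before the full validator runs, but it says nothing about sidecar contents or naming rules.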

Stage 2: Preprocessing

The preprocessing stage converts raw EEG data into clean, epoched data suitable for FFR analysis. This stage implements standard electrophysiological preprocessing steps optimized for frequency-following responses.

Purpose: Transform raw continuous EEG into clean, filtered, and epoched data while preserving FFR-relevant neural signals.

Implementation: Implemented as a Nipype workflow (create_preprocessing_workflow()) with the following nodes:

Preprocessing Workflow Nodes

1. Data Loading Node

Function: load_data()

Purpose: Robustly load EEG data from BIDS datasets using pybids and MNE-BIDS.

Sub-steps:
  • Create BIDSLayout object for dataset querying

  • Query for EEG files matching participant/session/task/run criteria

  • Try multiple file extensions (.edf, .bdf, .vhdr, .fif, .set)

  • Load data using mne_bids.read_raw_bids()

  • Extract and validate channel information

  • Load associated event data and metadata

Outputs: Raw EEG data object, BIDS path information, original filename
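The extension fallback and the MNE-BIDS call can be sketched as below. This is an assumption-laden illustration rather than the body of load_data(): find_recording and load_raw are hypothetical names, the helper globs the filesystem instead of querying a BIDSLayout, and session entities are ignored for brevity.

```python
from pathlib import Path

EXTENSIONS = (".edf", ".bdf", ".vhdr", ".fif", ".set")


def find_recording(bids_root, subject, task, run=None):
    """Return the first existing EEG file, trying each extension in order."""
    stem = f"sub-{subject}_task-{task}"
    if run is not None:
        stem += f"_run-{run}"
    eeg_dir = Path(bids_root) / f"sub-{subject}" / "eeg"
    for ext in EXTENSIONS:
        candidate = eeg_dir / f"{stem}_eeg{ext}"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no EEG recording found for {stem}")


def load_raw(bids_root, subject, task, run=None):
    """Load the recording with MNE-BIDS once its extension is known."""
    # Imported lazily so find_recording() works without MNE installed.
    from mne_bids import BIDSPath, read_raw_bids

    recording = find_recording(bids_root, subject, task, run)
    bids_path = BIDSPath(
        subject=subject, task=task, run=run,
        datatype="eeg", suffix="eeg",
        extension=recording.suffix, root=bids_root,
    )
    return read_raw_bids(bids_path, verbose=False)
```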

2. Re-referencing Node

Function: reference_data()

Purpose: Apply appropriate reference scheme to reduce common-mode noise and artifacts.

Sub-steps:
  • Parse reference channel specification (average, single channel, or channel list)

  • Validate reference channels exist in data

  • Apply re-referencing using MNE’s set_eeg_reference()

  • Update channel information and provenance

Reference Options:
  • Average reference (--ref_channels average): Uses all EEG channels

  • Single channel (--ref_channels Cz): References to one electrode

  • Multiple channels (--ref_channels "Cz,Fz"): Average of specified channels

Outputs: Re-referenced EEG data
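The three reference options map naturally onto the ref_channels argument of MNE's set_eeg_reference(), which accepts either the string "average" or a list of channel names. The parser below is a sketch of that mapping; parse_ref_channels is a hypothetical helper, not necessarily how reference_data() handles the flag.

```python
def parse_ref_channels(spec, ch_names):
    """Turn a --ref_channels string into 'average' or a validated list."""
    if spec.strip().lower() == "average":
        return "average"
    requested = [name.strip() for name in spec.split(",") if name.strip()]
    missing = [name for name in requested if name not in ch_names]
    if missing:
        raise ValueError(f"reference channels not in data: {missing}")
    return requested


# With an mne.io.Raw object in hand, the result can be passed straight on:
#   raw.set_eeg_reference(ref_channels=parse_ref_channels("Cz,Fz", raw.ch_names))
```

Validating before the MNE call keeps the error message tied to the CLI flag rather than to an internal channel lookup.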

3. Filtering Node

Function: filter_data()

Purpose: Apply temporal filtering to remove noise while preserving FFR signals.

Sub-steps:
  • Apply high-pass filter to remove slow drifts and DC offsets

  • Apply low-pass filter to remove high-frequency noise

  • Use zero-phase FIR filters to avoid temporal distortions

  • Log filter parameters and transition bands

Default Parameters:
  • High-pass: 1.0 Hz (removes slow drifts, preserves FFR frequencies)

  • Low-pass: 40.0 Hz (removes EMG and high-frequency noise)

  • Filter design: Zero-phase FIR with automatic transition bandwidth

Outputs: Filtered EEG data
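To illustrate what "zero-phase FIR" means here, the sketch below builds a windowed-sinc band-pass with NumPy and applies it with centred convolution. It is a teaching example under stated assumptions: the fixed 501-tap Hamming design and the function names are choices made for this sketch, whereas MNE (for example raw.filter(l_freq, h_freq, phase="zero", fir_design="firwin")) chooses the filter length from the transition bandwidth.

```python
import numpy as np


def lowpass_kernel(cutoff_hz, sfreq, numtaps=501):
    """Windowed-sinc low-pass FIR; symmetric odd-length taps give linear phase."""
    n = np.arange(numtaps) - (numtaps - 1) / 2
    kernel = np.sinc(2 * cutoff_hz / sfreq * n) * np.hamming(numtaps)
    return kernel / kernel.sum()  # unit gain at DC


def bandpass_zero_phase(signal, sfreq, l_freq, h_freq, numtaps=501):
    """High-pass at l_freq and low-pass at h_freq without shifting the signal."""
    low = lowpass_kernel(h_freq, sfreq, numtaps)
    # Spectral inversion: a unit impulse minus a low-pass is a high-pass.
    high = -lowpass_kernel(l_freq, sfreq, numtaps)
    high[(numtaps - 1) // 2] += 1.0
    # mode="same" keeps the output centred on the input, which cancels the
    # (numtaps - 1) / 2 sample delay of each symmetric kernel.
    out = np.convolve(signal, low, mode="same")
    return np.convolve(out, high, mode="same")
```

Samples within roughly one kernel length of either end are affected by edge effects and should be treated with care.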

4. Epoching Node

Function: epoch_data()

Purpose: Segment the continuously-filtered EEG into time-locked epochs around stimulus events, applying baseline correction and amplitude-based rejection.

Sub-steps:
  • Load events from the BIDS *_events.tsv (or use annotations embedded in the raw recording when no sidecar is present)

  • Build mne.Epochs with the requested tmin / tmax, baseline window, picks, and reject thresholds

  • Apply epochs.drop_bad() to materialize amplitude-based rejection; keep the resulting Epochs object as the workflow output

Default Parameters (FFR-typical, all overridable from the CLI):
  • Epoch window: --tmin -0.2 to --tmax 0.6 seconds around stimulus onset

  • Baseline: --baseline -0.2 0 seconds (pre-stimulus)

  • Rejection: --reject-eeg 75e-6 (75 µV peak-to-peak); pass --no-auto-reject to disable

Outputs: Epoched EEG data and the post-rejection drop log.
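The epoching logic (window extraction, baseline subtraction, peak-to-peak rejection) can be shown on a plain NumPy array. This is a simplified stand-in for what mne.Epochs and drop_bad() do, not the epoch_data() implementation; epoch_array and its drop-log labels are hypothetical.

```python
import numpy as np


def epoch_array(data, sfreq, onsets, tmin=-0.2, tmax=0.6,
                baseline=(-0.2, 0.0), reject=75e-6):
    """Cut (n_channels, n_times) data into epochs around onset samples.

    Returns (epochs, drop_log): epochs has shape (n_kept, n_channels, n_samples)
    and drop_log holds one entry per onset ("" means the epoch was kept).
    """
    start = int(round(tmin * sfreq))
    stop = int(round(tmax * sfreq)) + 1
    b0 = int(round((baseline[0] - tmin) * sfreq))
    b1 = int(round((baseline[1] - tmin) * sfreq)) + 1
    kept, drop_log = [], []
    for onset in onsets:
        if onset + start < 0 or onset + stop > data.shape[1]:
            drop_log.append("TOO_SHORT")  # window runs off the recording
            continue
        epoch = data[:, onset + start:onset + stop].astype(float)
        # Baseline correction: subtract each channel's pre-stimulus mean.
        epoch -= epoch[:, b0:b1].mean(axis=1, keepdims=True)
        # Reject when any channel exceeds the peak-to-peak threshold.
        if reject is not None and np.ptp(epoch, axis=1).max() > reject:
            drop_log.append("PTP")
            continue
        kept.append(epoch)
        drop_log.append("")
    if kept:
        return np.stack(kept), drop_log
    return np.empty((0, data.shape[0], stop - start)), drop_log
```

Passing reject=None mirrors disabling rejection, in which case only epochs that fall off the ends of the recording are dropped.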

5. Save Preprocessing Outputs Node

Function: save_preprocessing_outputs() (invoked through the internal save_preprocessing_node wrapper, which fans the Epochs out by trial type before calling it).

Purpose: Persist the epoched data plus a self-describing BIDS sidecar — one file per trial type by default, or a single combined file under --no-split-by-trial-type.

Sub-steps:
  • When --split-by-trial-type is on (the default), partition the input Epochs by event_id and write one _desc-preproc{Cond}_epo.fif per trial type with a matching sidecar carrying a Condition field. Under --no-split-by-trial-type, write a single bare _desc-preproc_epo.fif instead.

  • Reconstruct a fresh EpochsArray from the data + events + event_id so trial-type metadata survives the save / load round-trip (and the upstream Epochs object’s internal state doesn’t leak through .save()).

  • Write each sibling .json sidecar with EpochCount / EpochCountTotal / EpochCountRejected / RejectionThresholds / Filtering / SamplingFrequency / EpochTmin / EpochTmax / Channels plus run / session / ConcatenatedRuns / Condition provenance.

  • Initialize the per-derivatives dataset_description.json if missing.
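The naming and sidecar conventions above can be sketched with the standard library. The helper names are hypothetical, only a subset of the sidecar fields is written, and the {"eeg": 75e-6} shape used for RejectionThresholds in the usage note is an assumption rather than something the source specifies.

```python
import json
from pathlib import Path


def preproc_basename(subject, task, run, condition=None):
    """Build the file stem; condition=None gives the bare combined name."""
    desc = "preproc" + (condition or "")
    return f"sub-{subject}_task-{task}_run-{run}_desc-{desc}_epo"


def write_sidecar(out_dir, basename, n_kept, n_total, reject, condition=None):
    """Write a JSON sidecar next to <basename>.fif and return its path."""
    sidecar = {
        "EpochCount": n_kept,
        "EpochCountTotal": n_total,
        "EpochCountRejected": n_total - n_kept,
        "RejectionThresholds": reject,
    }
    if condition is not None:
        # Only per-trial-type files carry a Condition field.
        sidecar["Condition"] = condition
    path = Path(out_dir) / f"{basename}.json"
    path.write_text(json.dumps(sidecar, indent=2))
    return path
```

For example, write_sidecar(out_dir, preproc_basename("01", "ffr", "01", "Positive"), 90, 100, {"eeg": 75e-6}, "Positive") writes sub-01_task-ffr_run-01_desc-preprocPositive_epo.json with EpochCountRejected set to 10.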

Reporting runs in the CLI after the workflow drains rather than as a workflow node; see Reporting (post-workflow) below.

Output Structure (default split-by-trial-type):

derivatives/ffrprep-preprocessing/
├── sub-XX/
│   └── eeg/
│       ├── sub-XX_task-YY_run-ZZ_desc-preprocPositive_epo.fif
│       ├── sub-XX_task-YY_run-ZZ_desc-preprocPositive_epo.json
│       ├── sub-XX_task-YY_run-ZZ_desc-preprocNegative_epo.fif
│       ├── sub-XX_task-YY_run-ZZ_desc-preprocNegative_epo.json
│       ├── sub-XX_preprocessing_report.html
│       └── sub-XX_preprocessing.log

Under --no-split-by-trial-type the per-trial-type files are replaced by a single _desc-preproc_epo.fif + its sidecar.

The sidecar JSON carries provenance, EpochCount / EpochCountTotal / EpochCountRejected, RejectionThresholds, Filtering (high-pass and low-pass cut-offs), sampling frequency, run / session identifiers, and Condition (for per-trial-type files only).

Outputs: File paths, processing metadata.

Stage 3: Analysis

The analysis stage averages each preprocessed epochs group into a structured set of evoked responses (per-trial-type + combined + optional difference) and saves them to BIDS-derivatives.

Purpose: Produce per-(task, run) evoked responses suitable for downstream statistical analysis or visualization, persisted in MNE-readable format with self-describing BIDS sidecars.

Implementation: Implemented as a Nipype workflow (create_analysis_workflow()) with two nodes. The CLI worker collects all per-trial-type _desc-preproc{Cond}_epo.fif files for one (task, run) group via _collect_analysis_groups, stitches them back together with mne.concatenate_epochs (which preserves event_id), and passes the resulting Epochs directly into the workflow’s inputnode. The per-(task, run) granularity means each group runs the workflow once regardless of how many trial types it holds.
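Grouping the per-trial-type files by (task, run) can be sketched with a regular expression over the filenames. This is an illustration of the idea only: group_by_task_run is a hypothetical helper, not _collect_analysis_groups, and the pattern assumes no session entity in the name.

```python
import re
from collections import defaultdict

# Matches e.g. sub-01_task-ffr_run-01_desc-preprocPositive_epo.fif;
# the condition group is empty for the bare _desc-preproc_epo.fif file.
PATTERN = re.compile(
    r"sub-(?P<sub>[^_]+)_task-(?P<task>[^_]+)(?:_run-(?P<run>[^_]+))?"
    r"_desc-preproc(?P<cond>[A-Za-z0-9]*)_epo\.fif$"
)


def group_by_task_run(filenames):
    """Map (task, run) to a dict of {condition: filename}."""
    groups = defaultdict(dict)
    for name in filenames:
        match = PATTERN.search(name)
        if match:  # silently skip reports, logs, and sidecars
            groups[(match["task"], match["run"])][match["cond"]] = name
    return dict(groups)
```

Each group's files could then be read with mne.read_epochs() and merged with mne.concatenate_epochs(), as the paragraph above describes.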

Analysis Workflow Nodes

1. Build Analysis Payload Node

Function: build_analysis_payload()

Purpose: From a single Epochs object, produce a structured {by_type, combined, diff} payload covering every evoked the analysis stage emits.

Sub-steps:
  • Average all events into a single combined Evoked via make_combined_evoked (always emitted).

  • When split_by_trial_type is True (the default), partition by event_id via make_evoked(by_event_type=True) into a per-trial-type dict.

  • When at least two trial types are present and either (a) there are exactly two types (auto-paired) or (b) the user passed --difference-pairs A:B [C:D …], compute one difference Evoked per pair via make_difference_evokeds (which wraps mne.combine_evoked([A, B], weights=[1, -1])).

Outputs: dict with keys "by_type" (dict trial_type → Evoked), "combined" (Evoked), and optionally "diff" (dict (A, B) → Evoked).
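The pairing rule for difference evokeds can be written out as a small function. This is a sketch of the rule as described above, not the make_difference_evokeds code; resolve_difference_pairs is a hypothetical name, and taking the trial types in their given order for the auto-paired case is an assumption.

```python
def resolve_difference_pairs(trial_types, requested=None):
    """Return the (A, B) pairs for which an A - B difference is computed."""
    types = list(trial_types)
    if len(types) < 2:
        return []
    if requested:
        pairs = []
        for item in requested:
            # Each item follows the --difference-pairs A:B syntax.
            a, _, b = item.partition(":")
            if a not in types or b not in types:
                raise ValueError(f"unknown trial type in pair {item!r}")
            pairs.append((a, b))
        return pairs
    if len(types) == 2:
        # Exactly two trial types: auto-pair them.
        return [(types[0], types[1])]
    return []


# Each pair then maps onto mne.combine_evoked([A, B], weights=[1, -1]),
# which yields the A - B waveform.
```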

2. Save Analysis Node

Function: save_analysis_outputs()

Purpose: Persist every Evoked in the payload plus a self-describing sidecar per file.

Sub-steps:
  • For each entry in by_type: write _desc-evoked{Cond}.fif + matching sidecar with Condition: <cond>.

  • For combined: write the bare _desc-evoked.fif + sidecar (no Condition field).

  • For each entry in diff: write _desc-evokedDiff{A}Vs{B}.fif + sidecar carrying DifferenceOf: [A, B].

  • Every sidecar also carries AverageCount / Baseline / SamplingFrequency / Tmin / Tmax / Channels / TaskName / AnalysisType plus run / session / ConcatenatedRuns provenance.

  • Initialize the per-derivatives dataset_description.json if missing.

Output Structure (default split-by-trial-type, 2-trial-type dataset):

derivatives/ffrprep-analysis/
├── sub-XX/
│   ├── sub-XX_task-YY_run-ZZ_desc-evokedPositive.fif
│   ├── sub-XX_task-YY_run-ZZ_desc-evokedPositive.json
│   ├── sub-XX_task-YY_run-ZZ_desc-evokedNegative.fif
│   ├── sub-XX_task-YY_run-ZZ_desc-evokedNegative.json
│   ├── sub-XX_task-YY_run-ZZ_desc-evoked.fif
│   ├── sub-XX_task-YY_run-ZZ_desc-evoked.json
│   ├── sub-XX_task-YY_run-ZZ_desc-evokedDiffPositiveVsNegative.fif
│   ├── sub-XX_task-YY_run-ZZ_desc-evokedDiffPositiveVsNegative.json
│   ├── sub-XX_analysis_report.html
│   └── sub-XX_analysis.log

Under --no-split-by-trial-type, only the combined _desc-evoked.fif + sidecar are emitted per (task, run).
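The file naming shown in the listing can be reproduced from the payload keys. The function below is a hypothetical helper written for illustration; it only builds names and does not write any files.

```python
def analysis_output_names(prefix, trial_types, diff_pairs=()):
    """List the .fif names for one (task, run) group; sidecars swap in .json."""
    names = [f"{prefix}_desc-evoked{cond}.fif" for cond in trial_types]
    # The combined evoked is always emitted and carries no condition label.
    names.append(f"{prefix}_desc-evoked.fif")
    names += [f"{prefix}_desc-evokedDiff{a}Vs{b}.fif" for a, b in diff_pairs]
    return names
```

With prefix "sub-XX_task-YY_run-ZZ", trial types ["Positive", "Negative"], and the pair ("Positive", "Negative"), this returns the four .fif names in the listing above. Passing an empty list of trial types gives only the combined file, which corresponds to the --no-split-by-trial-type case.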

Outputs: Saved file paths.

Reporting (post-workflow)

The single-file HTML reports are built in the CLI after the workflow drains, not as a workflow node. The CLI’s _build_preproc_report and _build_analysis_report glob the saved _desc-preproc*_epo.fif and _desc-evoked*.fif files via _collect_analysis_groups / _collect_evoked_groups and group them by (task, run). Sections are then built with the ffrprep.reports builders (build_raw_section / build_epoch_section / build_evoked_section / build_phase_consistency_section / make_group), and the subject-level HTML is rendered via build_subject_report / build_analysis_report.

Per (task, run) group layout in the analysis report:

  • one Evoked section per per-trial-type file (e.g. Positive, Negative). Each carries:

    • waveform / PSD / TFR / autocorrelation / pitch-track figures (via ffrprep.reports.evoked_qa());

    • scalar metrics: RMS SNR (100-200 ms) and Mean power (90-110 Hz, 100-200 ms);

    • when the BIDS stim_file column is populated in events.tsv, a Stim correlation (peak r) + Stim correlation (lag, ms) row plus a stim ↔ response cross-correlation lag plot (computed by _stim_correlation_data in ffrprep_cli.py).

  • one combined Evoked section (the across-events average) with the same plots + scalars, plus a second pair of rows / plot for the envelope correlation (combined ≈ ENV proxy in FFR, so |hilbert(stim)| is the natural reference).

  • one difference Evoked section per (A, B) pair (auto for the 2-trial-type case; opt-in via --difference-pairs) with the standard plots + a single raw-waveform stim correlation row + plot (diff ≈ TFS proxy).

  • when exactly two per-condition preproc files exist for the group, one Phase Consistency section that computes the values with ffrprep.analysis.compute_phase_consistency() and plots them with ffrprep.analysis.plot_phase_consistency() (the masked variant when the caller passes mask=True). Single-subject reports default to unmasked; group-level builders pass mask=True. The plot uses seaborn’s flare_r colormap; subplot titles show the trial-type names plus auto-derived A + B / A - B labels for the sum and difference panels.
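One plausible way to obtain a "peak r" and "lag" pair like the stim correlation rows above is a lagged Pearson correlation between the stimulus and the response. The sketch below is an assumption about the general technique, not a description of _stim_correlation_data: the function name, the non-negative lag range, and the 20 ms default are all hypothetical, and both signals are assumed to share one sampling rate.

```python
import numpy as np


def stim_response_xcorr(stim, response, sfreq, max_lag_ms=20.0):
    """Return (peak_r, lag_ms); a positive lag means the response trails."""
    max_lag = int(round(max_lag_ms * sfreq / 1000.0))
    best_r, best_lag = 0.0, 0
    for lag in range(max_lag + 1):
        # Shift the response back by `lag` samples, then trim to equal length.
        shifted = response[lag:]
        n = min(len(stim), len(shifted))
        coef = np.corrcoef(stim[:n], shifted[:n])[0, 1]
        if abs(coef) > abs(best_r):
            best_r, best_lag = coef, lag
    return float(best_r), 1000.0 * best_lag / sfreq
```

For the envelope variant mentioned above, the same function could be applied to the stimulus envelope in place of the raw stimulus waveform.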

Per (task, run) group layout in the preprocessing report:

  • one Raw section loaded from the original BIDS recording (with run concatenation when the preproc output sourced multiple runs).

  • one Epoched section per per-trial-type file — each carries the standard epoch metadata plus a Mean trial-to-trial r row (via ffrprep.analysis.response_consistency()) when there are at least 10 trials.
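A "mean trial-to-trial r" can be read as the average Pearson correlation over all pairs of trials. The sketch below implements that reading for a single channel; it is an assumption about the metric rather than the ffrprep.analysis.response_consistency() code, and the function name is hypothetical.

```python
import numpy as np


def mean_trial_to_trial_r(trials, min_trials=10):
    """trials: (n_trials, n_times) array. Returns None below min_trials."""
    trials = np.asarray(trials, dtype=float)
    if trials.shape[0] < min_trials:
        return None
    # corrcoef treats each row (trial) as one variable.
    r = np.corrcoef(trials)
    # Average only the distinct pairs above the diagonal.
    upper = np.triu_indices(trials.shape[0], k=1)
    return float(r[upper].mean())
```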

Figures and scalar metrics are computed at report time from the loaded Epochs / Evoked objects — they are not separately persisted to disk. To recompute them yourself, see the Working with Outputs in Python section of the Tutorial walkthrough.

The single-file *_report.html embeds all figures as inline base64 PNGs — there is no sibling figures/ directory.
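Embedding a figure inline comes down to base64-encoding the PNG bytes into a data URI. The helper below is a generic sketch of that technique with a hypothetical name; it is not taken from ffrprep.reports.

```python
import base64
import html


def inline_png(png_bytes, alt=""):
    """Return an <img> tag whose src carries the PNG as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img alt="{html.escape(alt)}" src="data:image/png;base64,{encoded}">'
```

With Matplotlib, the bytes would typically come from fig.savefig(buffer, format="png") on an io.BytesIO buffer, so no image file has to be written next to the report.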

File Formats:
  • MNE format (.fif): Epoched / Evoked data, loadable in MNE-Python.

  • WAV (under <dataset>/stimuli/): stimulus audio referenced by events.tsv’s stim_file column. Fetched via ffrprep-download example --with-stimuli.

  • JSON: BIDS sidecar (human-readable, machine-parseable).

  • HTML: Self-contained single-file per-subject report.

Pipeline Integration and Quality Control

Workflow Management:

  • Each stage implemented as a Nipype workflow for per-iteration dependency tracking

  • Outer-loop parallelism: the CLI dispatches per-(task, run) iterations to a ProcessPoolExecutor sized by --n_procs

  • Fail-fast on any iteration error; the exception propagates to the CLI entry point

  • Nipype caches per-iteration intermediates under work/, so re-runs that already have a saved _desc-preproc_epo.fif skip workflow re-execution

Quality Control Checkpoints:

  • BIDS validation before processing

  • Data quality assessment after loading

  • Preprocessing quality metrics and reports

  • Analysis validation and statistical checks

Output Organization:

  • BIDS-compatible directory structure

  • Per-file BIDS sidecars with run / session / condition provenance

  • Standardized file formats for interoperability

  • Version-controlled processing parameters

Customization:

  • All preprocessing and analysis parameters are exposed as CLI flags (see Usage).

  • The Nipype workflows can be imported and reused programmatically from ffrprep.preproc (create_preprocessing_workflow / create_analysis_workflow).

  • Section builders in ffrprep.reports accept extra_summary and extra_figures kwargs so downstream code can fold caller-computed scalars or figures into a section’s table or figure gallery.