nipoppy.NipoppyDataRetriever
- class nipoppy.NipoppyDataRetriever(path)
API for getting data from a Nipoppy study.
- Parameters:
path (nipoppy.env.StrOrPathLike) – Path to the root directory of the Nipoppy study.
- get_derivatives(derivatives)
Get derivative data from the Nipoppy study.
This function loads and combines derivative TSV files from the specified pipelines and versions, based on the provided filepath patterns. It then filters the rows to include only participants and sessions that are present in the study’s manifest.
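The combining and filtering steps described above can be sketched with plain pandas. This is a minimal sketch under stated assumptions, not nipoppy's implementation: `combine_derivatives` is a hypothetical helper, the derivative tables are assumed to be already loaded as dataframes with `"participant_id"` and `"session_id"` columns, the manifest is assumed to reduce to a list of `(participant_id, session_id)` pairs, and the outer join is an assumed choice.

```python
import pandas as pd


def combine_derivatives(tables, manifest_pairs):
    """Hypothetical helper: join derivative tables and keep manifest rows.

    Each table is assumed to carry "participant_id" and "session_id" columns;
    manifest_pairs is assumed to be an iterable of (participant_id, session_id)
    tuples taken from the study manifest.
    """
    keys = ["participant_id", "session_id"]
    indexed = [t.astype({k: str for k in keys}).set_index(keys) for t in tables]
    # Place the tables side by side, aligned on the (participant, session) index
    combined = pd.concat(indexed, axis=1, join="outer")
    # Drop rows whose (participant_id, session_id) pair is not in the manifest
    return combined[combined.index.isin(list(manifest_pairs))]


# Toy stand-ins for two derivative TSV files (values are illustrative only)
aseg = pd.DataFrame(
    {"participant_id": ["001", "002"], "session_id": ["1", "1"], "volume": [1.0, 2.0]}
)
aparc = pd.DataFrame(
    {"participant_id": ["001", "003"], "session_id": ["1", "1"], "thickness": [0.5, 0.7]}
)
df = combine_derivatives([aseg, aparc], [("001", "1"), ("003", "1")])
print(df)
```

The IDs are cast to strings before indexing because `pd.read_csv` would otherwise parse a value such as `001` as the integer `1`, and the pairs would no longer match string IDs from a manifest.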
The derivative TSV files are expected to have `"participant_id"` and `"session_id"` columns for participant and session identifiers.
- Parameters:
derivatives (List[Tuple[str, str, str]]) – List of (`pipeline_name`, `pipeline_version`, `filepath_pattern`) tuples, for specifying derivative data to retrieve. `filepath_pattern` may include wildcards as per `pathlib.Path.glob()`.
- Returns:
A dataframe containing the requested derivative data, with a `pd.MultiIndex` of participant IDs and session IDs.
- Return type:
pd.DataFrame
Examples
>>> from nipoppy import NipoppyDataRetriever
>>> api = NipoppyDataRetriever("/path/to/dataset")
>>> df = api.get_derivatives(
...     [
...         ("freesurfer", "6.0.1", "idp/fs_stats-0.2.0/fs6.0.1-aseg-volume.tsv"),
...         ("freesurfer", "6.0.1", "idp/fs_stats-0.2.0/fs6.0.1-aparc.DKTatlas-thickness.tsv"),
...     ],
... )
>>> df
                           Left-Lateral-Ventricle  Left-Inf-Lat-Vent  ...  rh_MeanThickness_thickness  rh_temporalpole_thickness
participant_id session_id                                             ...
001            1                          17025.9              593.8  ...                     2.40661                   0.000468
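Because `filepath_pattern` follows `pathlib.Path.glob()` semantics, one way to preview what a pattern will match is to call `glob()` directly. The sketch below builds a toy directory tree in a temporary folder. The directory that nipoppy resolves patterns against is not stated in this section, so `base_dir` and the `match_pattern` helper are hypothetical.

```python
import tempfile
from pathlib import Path


def match_pattern(base_dir, filepath_pattern):
    """Hypothetical helper: list files under base_dir matching a glob pattern."""
    base = Path(base_dir)
    return sorted(p.relative_to(base).as_posix() for p in base.glob(filepath_pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Toy layout mirroring the relative paths used in the example above
    for rel in (
        "idp/fs_stats-0.2.0/fs6.0.1-aseg-volume.tsv",
        "idp/fs_stats-0.2.0/fs6.0.1-aparc.DKTatlas-thickness.tsv",
        "idp/fs_stats-0.2.0/notes.txt",
    ):
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()

    # "*" matches within a single path component, so both the version
    # directory and the file stem can be wildcarded
    matches = match_pattern(tmp, "idp/fs_stats-*/fs6.0.1-*.tsv")

print(matches)
```

Here the single pattern picks up both FreeSurfer TSV files and skips `notes.txt`; how nipoppy combines several files matched by one pattern is not described in this section.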
- get_phenotypes(phenotypes)
Get harmonized phenotypic data from the Nipoppy study.
This function loads the study’s harmonized phenotypic TSV file (`<NIPOPPY_ROOT>/tabular/harmonized.tsv`). It then subsets the data to include only the requested phenotypic columns, and filters the rows to include only participants and sessions that are present in the study’s manifest.
The harmonized phenotypic TSV file is expected to have `"nb:ParticipantID"` and `"nb:SessionID"` columns for participant and session identifiers.
- Parameters:
phenotypes (List[str]) – List of Neurobagel TermURLs corresponding to phenotypic (demographics, assessments, etc.) columns to retrieve.
- Returns:
A dataframe containing the requested phenotypic data, with a `pd.MultiIndex` of participant IDs and session IDs.
- Return type:
pd.DataFrame
Examples
>>> from nipoppy import NipoppyDataRetriever
>>> api = NipoppyDataRetriever("/path/to/dataset")
>>> df = api.get_phenotypes(
...     [
...         "nb:Age",
...         "nb:Sex",
...         "nb:Diagnosis",
...         "snomed:859351000000102",  # MoCA
...     ],
... )
>>> df
                           nb:Age            nb:Sex  nb:Diagnosis  snomed:859351000000102
participant_id session_id
001            1             70.0  snomed:248153007   ncit:C94342            nb:available
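The column subsetting and row filtering described for `get_phenotypes` can be approximated with pandas as follows. This is a sketch, not the library's code: `subset_phenotypes` is a hypothetical helper, the manifest is assumed to reduce to `(participant_id, session_id)` pairs, and the `harmonized.tsv` content is a toy stand-in (the row for `002` is made up).

```python
import io

import pandas as pd


def subset_phenotypes(harmonized, phenotypes, manifest_pairs):
    """Hypothetical helper: select TermURL columns and keep manifest rows."""
    keys = ["nb:ParticipantID", "nb:SessionID"]
    df = harmonized.astype({k: str for k in keys}).set_index(keys)
    # Keep only the requested TermURL columns, in the requested order
    df = df[list(phenotypes)]
    # Rename the index levels to match the documented output
    df = df.rename_axis(["participant_id", "session_id"])
    return df[df.index.isin(list(manifest_pairs))]


# Toy harmonized.tsv content (the "002" row is illustrative only)
harmonized_tsv = io.StringIO(
    "nb:ParticipantID\tnb:SessionID\tnb:Age\tnb:Sex\n"
    "001\t1\t70.0\tsnomed:248153007\n"
    "002\t1\t64.0\tsnomed:248153007\n"
)
# Read the ID columns as strings so that "001" keeps its leading zeros
harmonized = pd.read_csv(
    harmonized_tsv, sep="\t", dtype={"nb:ParticipantID": str, "nb:SessionID": str}
)
df = subset_phenotypes(harmonized, ["nb:Age"], [("001", "1")])
print(df)
```

Only the `nb:Age` column survives the subset, and only participant `001` survives the manifest filter.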
- get_tabular_data(*, phenotypes=None, derivatives=None)
Get harmonized tabular data from the Nipoppy study.
This is a high-level wrapper function that combines phenotypic and derivative data retrieval.
Harmonized phenotypic data is loaded from the TSV file at `<NIPOPPY_ROOT>/tabular/harmonized.tsv` and subset to include only the requested phenotypic columns. This file is expected to have `"nb:ParticipantID"` and `"nb:SessionID"` columns for participant and session identifiers.
Derivative data is loaded from the specified pipelines and versions, based on the provided filepath patterns. These TSV files are expected to have `"participant_id"` and `"session_id"` columns for participant and session identifiers.
The `"nb:ParticipantID"` and `"participant_id"` columns are assumed to correspond to each other, as are `"nb:SessionID"` and `"session_id"`.
The output dataframe will only contain participants and sessions that are present in the study’s manifest.
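The column correspondence described above can be expressed as a rename followed by a merge. This is a minimal sketch, not nipoppy's implementation: `combine_tabular` is a hypothetical helper, both inputs are assumed to be already-loaded dataframes, the manifest is assumed to reduce to `(participant_id, session_id)` pairs, and the outer merge is an assumed choice (the section does not say which join is used).

```python
import pandas as pd


def combine_tabular(pheno_df, deriv_df, manifest_pairs):
    """Hypothetical helper: align harmonized phenotypes with derivative data."""
    keys = ["participant_id", "session_id"]
    # Treat the Neurobagel ID columns as equivalent to the derivative ID columns
    pheno = pheno_df.rename(
        columns={"nb:ParticipantID": "participant_id", "nb:SessionID": "session_id"}
    )
    merged = pd.merge(
        pheno.astype({k: str for k in keys}),
        deriv_df.astype({k: str for k in keys}),
        on=keys,
        how="outer",  # assumption: keep rows present in either source
    )
    merged = merged.set_index(keys)
    # Restrict to participant/session pairs listed in the manifest
    return merged[merged.index.isin(list(manifest_pairs))]


# Toy inputs; values are illustrative only
pheno_df = pd.DataFrame(
    {"nb:ParticipantID": ["001", "002"], "nb:SessionID": ["1", "1"], "nb:Age": [70.0, 64.0]}
)
deriv_df = pd.DataFrame(
    {"participant_id": ["001", "003"], "session_id": ["1", "1"], "rh_insula_thickness": [2.9, 2.7]}
)
df = combine_tabular(pheno_df, deriv_df, [("001", "1"), ("002", "1")])
print(df)
```

With the outer merge, participant `002` is kept with a missing thickness value, while `003` is dropped because it is not in the toy manifest. An inner merge would instead drop any participant lacking either phenotypic or derivative data.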
- Parameters:
phenotypes (Optional[List[str]]) – List of Neurobagel TermURLs, for specifying phenotypic (demographics, assessments, etc.) data to retrieve.
derivatives (Optional[List[Tuple[str, str, str]]]) – List of (`pipeline_name`, `pipeline_version`, `filepath_pattern`) tuples, for specifying derivative data to retrieve. `filepath_pattern` may include wildcards as per `pathlib.Path.glob()`.
- Returns:
A dataframe containing the requested phenotypic and derivative data, with a `pd.MultiIndex` of participant IDs and session IDs.
- Return type:
pd.DataFrame
Examples
>>> from nipoppy import NipoppyDataRetriever
>>> api = NipoppyDataRetriever("/path/to/dataset")
>>> df = api.get_tabular_data(
...     phenotypes=[
...         "nb:Age",
...         "nb:Sex",
...         "nb:Diagnosis",
...         "snomed:859351000000102",  # MoCA
...     ],
...     derivatives=[
...         ("freesurfer", "6.0.1", "idp/fs_stats-0.2.0/fs6.0.1-aseg-volume.tsv"),
...         ("freesurfer", "6.0.1", "idp/fs_stats-0.2.0/fs6.0.1-aparc.DKTatlas-thickness.tsv"),
...     ],
... )
>>> df
                           nb:Age            nb:Sex  nb:Diagnosis  ...  rh_insula_thickness  rh_MeanThickness_thickness  rh_temporalpole_thickness
participant_id session_id                                          ...
001            1             70.0  snomed:248153007   ncit:C94342  ...             2.897977                    2.406614                   0.000468