User Guide

Note

This documentation assumes a basic understanding of the command line. Here’s a quick (and free!) crash course if needed.

Background Info

Welcome to the user docs! Hopefully this documentation can help you tackle the gargantuan task of uploading data to the NIMH Data Archive (NDA).

ndagen is a Python command-line tool that’s meant to streamline the process of uploading neuroimaging data to the NDA. Generally, NIMH is only interested in the NIFTI files derived from raw dicoms. NIMH also requires a csv file containing metadata to be uploaded along with the converted NIFTIs. NIMH has very strict requirements for the accompanying csv file, including required column names. ndagen automates a large portion of creating the csv file with limited manual intervention from the user.

Installing ndagen

If you are a fasse user at Harvard, accessing ndagen is as simple as running:

module load venv/ndagen

All other users can install ndagen using a pip (or whatever package manager you prefer) command:

pip install ndagen

Verify that it installed correctly by running nda_gen.py --help.

The CSV File

The csv file that NDA requires consists of scan metadata, ranging from scan dimensions to the dicom conversion software used. The good news is that a big chunk of the csv can be generated by pulling information from the json files and NIFTI file headers. However, there are some fields that need a bit of help from the user, which get passed to ndagen as command-line arguments. Let’s dive into those below!

The Key File

There is certain information for each NDA upload that is unique to each research center and study and must be supplied by the user. ndagen requires that this information be passed to it as a csv file. More specifically, there are 5 columns that should be in the csv file for each subject being uploaded (written EXACTLY as shown here): subjectkey, src_subject_id, interview_date, interview_age, sex

The subjectkey column is the NDA-issued 12 character ID for that specific subject (so they can be tracked across studies).

The src_subject_id column is an arbitrary subject number running from 1 to N (the number of subjects being uploaded), for example P1 through PN.

The interview_date column is the date of the scan, though having the accurate year is the only requirement. Day and month can be arbitrarily chosen.

Note

interview_date must be written in MM/DD/YYYY format!

The interview_age column is the age of the subject at the time of the scan in months. (e.g. a 20 year old’s age would appear as 240 here).

The sex column is the sex of the subject written as F or M.

So the csv key file could look something like this:

subjectkey    src_subject_id  interview_date  interview_age  sex
NDA*********  P1              01/01/2023      288            M
NDA*********  P2              01/01/2023      312            M
NDA*********  P3              01/01/2023      336            F
NDA*********  P4              01/01/2023      252            M
NDA*********  P5              01/01/2023      264            M
NDA*********  P6              01/01/2024      228            M
NDA*********  P7              01/01/2024      300            F
NDA*********  P8              01/01/2024      276            M
NDA*********  PN              01/01/????      ???            ?
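Before handing a key file to ndagen, it can help to sanity-check it against the rules described above. The snippet below is a minimal sketch, not part of ndagen: the function name check_key_rows and the wording of its messages are made up for illustration, and it only checks the column names, the MM/DD/YYYY date format, the age in months, and the F/M sex values.

```python
import csv
import io
from datetime import datetime

# Column names exactly as the key file section lists them.
REQUIRED_COLUMNS = ["subjectkey", "src_subject_id", "interview_date", "interview_age", "sex"]


def check_key_rows(csv_text):
    """Return a list of human-readable problems found in key-file text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            datetime.strptime(row["interview_date"] or "", "%m/%d/%Y")
        except ValueError:
            problems.append(f"line {line_no}: interview_date is not MM/DD/YYYY")
        if not (row["interview_age"] or "").isdigit():
            problems.append(f"line {line_no}: interview_age must be a whole number of months")
        if row["sex"] not in ("F", "M"):
            problems.append(f"line {line_no}: sex must be F or M")
    return problems


example = """subjectkey,src_subject_id,interview_date,interview_age,sex
NDA*********,P1,01/01/2023,288,M
NDA*********,P2,2023-01-01,26 years,male
"""
for problem in check_key_rows(example):
    print(problem)
```

The first data row passes, while the second is flagged three times: once for the date, once for the age, and once for the sex value.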

So when you’re running nda_gen.py, you’ll pass the full path to the key csv file to the --key-file argument like this:

nda_gen.py --key-file /home/user/imager/nda_key_file.csv

Note

Don’t worry about copying and keeping track of the full nda_gen.py command. There’s a section further down dedicated to that; the example above is just a visual aid.

The Tasks File

One of the required columns for fMRI data is the task number. The task number is generated by NDA when you register a task (e.g. MOTOR) on the NDA website. ndagen can automatically insert the task number into the final csv file by using a yaml config file which contains task name and task number pairs. Here’s what it looks like as of February 2024:

tasks:
  EPROJ: 2337
  NBACK: 2348
  PAIN: 2351
  FALSBEL: 2350
  LANG: 2344
  MOTOR: 2347
  VISME: 2352
  VODDK: 2353
  REST: 2349

ndagen will look at the name of each NIFTI file and if one of the tasks above is in the name, it will insert the associated task number in the experiment_id column in the final csv file. For example, if a given NIFTI file name has MOTOR in it (e.g. NDA123456-sess01-run01-MOTOR1.nii.gz), ndagen will insert 2347 into the experiment_id column for that row.
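The lookup described above can be pictured as a simple substring search. This is a rough sketch of the idea rather than ndagen’s actual code: the function name find_experiment_id is hypothetical, and the case-sensitive, first-match-wins behavior is an assumption.

```python
# Task name -> NDA task number, copied from the default tasks file shown above.
TASKS = {
    "EPROJ": 2337,
    "NBACK": 2348,
    "PAIN": 2351,
    "FALSBEL": 2350,
    "LANG": 2344,
    "MOTOR": 2347,
    "VISME": 2352,
    "VODDK": 2353,
    "REST": 2349,
}


def find_experiment_id(nifti_name, tasks=TASKS):
    """Return the number of the first task whose name appears in the file name, else None."""
    for task_name, task_number in tasks.items():
        if task_name in nifti_name:  # assumed: plain, case-sensitive substring match
            return task_number
    return None


print(find_experiment_id("NDA123456-sess01-run01-MOTOR1.nii.gz"))  # 2347
print(find_experiment_id("NDA123456-sess01-T1w.nii.gz"))  # None
```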

The file above will be used by default so there’s no need to pass an argument if your task(s) are included. However, if you need to add a new task-number pair you can copy or download this file and add the pair to it. Be sure to follow the same formatting as shown above! You can pass your new tasks file to ndagen as a command line argument: --task-list /full/path/to/file/tasks.yaml

The Echo Time File

Many studies include the acquisition of multi-echo T1 scans and NDA requires all of the TEs to be reported in the csv upload file. Unfortunately, popular dicom to NIFTI conversion software (looking at you, dcm2niix) does not include all the TEs of multi-echo scans in the json sidecar files. As such, users with multi-echo scans will need to make use of ndagen’s --echo-times argument. Like the tasks file above, the --echo-times argument is the full path to a yaml file that could look something like this:

echo_times:
  T1w_MPR_vNav_4e_RMS: .00181,.0036,.00539,.00718
  T1_MEMPRAGE_1.2mm_p4_RMS: .00157,.00339,.00521,.00703

The echo_times.yaml file consists of key-value pairs where the key is the SeriesDescription field from the json file and the value is the echo times listed in succession, separated by commas (unit is seconds). It’s important that the SeriesDescription (e.g. T1w_MPR_vNav_4e_RMS) portion mirrors exactly what is in the SeriesDescription of the json file. Otherwise, ndagen will not detect it.
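To illustrate why the key has to match SeriesDescription exactly, here is a small sketch of the lookup. It is an illustration under assumptions, not ndagen’s implementation: the helper parse_echo_times is hypothetical, the sidecar contents are invented, and the yaml file is represented as a plain dictionary so the example needs nothing beyond the standard library.

```python
import json


def parse_echo_times(value):
    """Split a comma-separated echo-time string into floats (seconds). Hypothetical helper."""
    return [float(part) for part in value.split(",")]


# The same key-value pairs as the echo_times.yaml example above.
echo_times = {
    "T1w_MPR_vNav_4e_RMS": ".00181,.0036,.00539,.00718",
    "T1_MEMPRAGE_1.2mm_p4_RMS": ".00157,.00339,.00521,.00703",
}

# Invented sidecar contents, standing in for a real json file.
sidecar = json.loads('{"SeriesDescription": "T1w_MPR_vNav_4e_RMS"}')

series = sidecar["SeriesDescription"]
if series in echo_times:  # exact, case-sensitive match on the key
    print(series, parse_echo_times(echo_times[series]))
else:
    print(f"no echo times listed for {series!r}")
```

A key such as t1w_mpr_vnav_4e_rms, or one with a trailing space, would fall through to the else branch in this sketch, which mirrors the warning above about mirroring the SeriesDescription exactly.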

Using the --echo-times argument could look something like this:

nda_gen.py --echo-times /full/path/to/echo/times/file.yaml

Source Files Argument

NDA requires that all the NIFTIs being uploaded are found in the same directory. This required argument for ndagen is simply the full path to the directory where the NIFTIs are located:

nda_gen.py --source-files /full/path/to/all/niftis
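If you want to confirm that every NIFTI really sits directly in that directory, a quick check along these lines may help. It is only a sketch: the function names are hypothetical, treating both .nii and .nii.gz as NIFTI files is an assumption, and the temporary directory simply stands in for a real --source-files directory.

```python
import tempfile
from pathlib import Path

NIFTI_SUFFIXES = (".nii", ".nii.gz")  # assumed set of NIFTI extensions


def list_niftis(source_dir):
    """Return the sorted names of NIFTI files directly inside source_dir."""
    source = Path(source_dir)
    return sorted(
        p.name for p in source.iterdir()
        if p.is_file() and p.name.endswith(NIFTI_SUFFIXES)
    )


def nested_niftis(source_dir):
    """Return NIFTI paths found in subdirectories, which a flat directory should not have."""
    source = Path(source_dir)
    return sorted(
        p.relative_to(source).as_posix()
        for p in source.rglob("*")
        if p.is_file() and p.name.endswith(NIFTI_SUFFIXES) and p.parent != source
    )


# Demo on a throwaway directory with one NIFTI at the top level and one nested.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub01_T1w.nii.gz").touch()
    (root / "extra").mkdir()
    (root / "extra" / "sub02_T1w.nii.gz").touch()
    print(list_niftis(root))
    print(nested_niftis(root))
```

Anything reported by nested_niftis would need to be moved up into the main directory before running nda_gen.py.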

Reface Info Argument

Refacing T1 data has become standard practice at many research centers. At the time of writing, refacing T1 images does not include adding any metadata about the refacing to the associated json file. To report the refacing software used to NDA, ndagen has the --reface-info argument. Here’s an example use case:

nda_gen.py --reface-info "Refaced using NITRC mri_deface_0.3; https://www.nitrc.org/projects/mri_reface"

Note

Notice the double quotes placed around the input. Bash splits arguments on whitespace, so the quotes are needed to pass the whole description to --reface-info as a single argument!
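The effect of the quotes can be seen with Python’s shlex module, which splits a command line using shell-like rules. This is just a demonstration of the quoting behavior, and it does not run nda_gen.py.

```python
import shlex

quoted = 'nda_gen.py --reface-info "Refaced using NITRC mri_deface_0.3"'
unquoted = 'nda_gen.py --reface-info Refaced using NITRC mri_deface_0.3'

# With quotes, the description stays together as one argument.
print(shlex.split(quoted))

# Without quotes, every word becomes its own argument.
print(shlex.split(unquoted))
```

The quoted command splits into three arguments, with the whole description as the third. The unquoted one splits into six, so only the word Refaced would follow --reface-info.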

NDA Config Argument

This is an argument that you likely will not have to use. There is a yaml file used by default to generate all the column names in the upload csv file. You will only need to use this argument if NDA changes or adds required variables/colnames to the upload csv. Here’s a link to the yaml file for reference.

Running ndagen

Phew! Now that we’ve talked about ndagen’s background info and different arguments, let’s take a look at actually running it.

As mentioned above, ndagen only has two required arguments: --source-files and --key-file. --nda-config and --task-list are fairly stable and there’s a decent chance you will never have to mess with them. However, you will likely make use of --reface-info and --echo-times at some point. Below are a couple of examples; one is a general template for most nda_gen.py use cases while the other is a more concrete example.

Command Template

nda_gen.py --source-files /PATH/TO/ALL/NIFTIS --key-file /PATH/TO/KEY/FILE.csv --reface-info "REFACE SOFTWARE" --echo-times /PATH/TO/YAML/FILE.yaml

Command Example

nda_gen.py --source-files /users/home/nrg/studies/aging/all_niftis --key-file /users/home/nrg/studies/aging/subject_key_file.csv --reface-info "Refaced using NITRC mri_deface_0.3" --echo-times /users/home/nrg/studies/aging/echo_times.yaml

As ndagen runs, you will see the name of each NIFTI file printed to the terminal window as it is added to the upload csv file. It can take a few seconds or a few minutes depending on the number of files you’re uploading. Once it’s done, the output csv file will be placed in the --source-files directory and be named nda_upload_file-YYYY-MM-DD.csv.
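If you drive ndagen from a script, the command and the expected output location can be put together as in the sketch below. The helper names build_command and expected_output are hypothetical, and it is an assumption that the date in the file name is the day the command is run. The sketch only prints the command; it does not execute it.

```python
import shlex
from datetime import date
from pathlib import Path


def build_command(source_files, key_file, reface_info=None, echo_times=None):
    """Assemble the nda_gen.py argument list from the flags described in this guide."""
    args = ["nda_gen.py", "--source-files", str(source_files), "--key-file", str(key_file)]
    if reface_info:
        args += ["--reface-info", reface_info]
    if echo_times:
        args += ["--echo-times", str(echo_times)]
    return args


def expected_output(source_files, run_date=None):
    """Guess the output csv path, assuming the file name carries the run date."""
    run_date = run_date or date.today()
    return Path(source_files) / f"nda_upload_file-{run_date.isoformat()}.csv"


study = Path("/users/home/nrg/studies/aging")
args = build_command(
    study / "all_niftis",
    study / "subject_key_file.csv",
    reface_info="Refaced using NITRC mri_deface_0.3",
    echo_times=study / "echo_times.yaml",
)
print(shlex.join(args))  # quotes the reface description for the shell
print(expected_output(study / "all_niftis"))
```

To actually launch the tool from Python, the same argument list could be passed to subprocess.run(args, check=True).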

And that’s it! Please feel free to contact Daniel with any questions: danielasay@fas.harvard.edu