User Guide
==========
.. _NDA: https://nda.nih.gov/
.. _ndagen: https://github.com/harvard-nrg/ndagen

.. note::
    This documentation assumes a basic understanding of the command line. Here's a quick (and free!) `crash course <https://www.codecademy.com/learn/learn-the-command-line>`_ if needed.

Background Info
---------------
Welcome to the user docs! Hopefully this documentation can help you tackle the gargantuan task of uploading data to the NIMH Data Archive (`NDA`_).

`ndagen`_ is a Python command line tool that's meant to streamline the process of uploading neuroimaging data to the `NDA`_. Generally, NIMH is only interested in the NIFTI files derived from raw dicoms. NIMH also requires a csv file containing metadata to be uploaded along with the converted NIFTIs. NIMH has very strict parameters for the accompanying csv file, including required column names. `ndagen`_ automates a large portion of creating the csv file with limited manual intervention from the user.


Installing ndagen
-----------------
If you are a fasse user at Harvard, accessing ndagen is as simple as running:

.. code-block:: shell

    module load venv/ndagen

All other users can install ndagen with ``pip`` (or whichever package manager you prefer):

.. code-block:: shell
  
    pip install ndagen


Verify that it installed correctly by running ``nda_gen.py --help``.
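
If you would rather check from Python, here is a minimal sketch that only tests whether the ``nda_gen.py`` script is visible on your ``PATH``. The helper name ``is_on_path`` is made up for this illustration and is not part of ndagen.

.. code-block:: python

    import shutil


    def is_on_path(command="nda_gen.py"):
        """Return True if the given command can be found on the current PATH."""
        return shutil.which(command) is not None


    if __name__ == "__main__":
        print("nda_gen.py found:", is_on_path())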

The CSV File
------------
The csv file that NDA requires is comprised of scan metadata, ranging from scan dimensions to the dicom conversion software used. The good news is that a big chunk of the csv can be generated from pulling information from the json files and NIFTI file headers. However, there are some fields that need a bit of help from the user, which get passed to ndagen as command-line arguments. Let's dive into those below!

The Key File
^^^^^^^^^^^^
There is certain information for each NDA upload that is unique to each research center and study and must be supplied by the user. ndagen requires that this information be passed to it as a csv file. More specifically, there are 5 columns that should be in the csv file for each subject being uploaded (written EXACTLY as shown here): **subjectkey, src_subject_id, interview_date, interview_age, sex**

The ``subjectkey`` column is the NDA-issued 12 character ID for that specific subject (so they can be tracked across studies).

The ``src_subject_id`` column is an arbitrary subject number starting from 1 to N (number of subjects being uploaded).

The ``interview_date`` column is the date of the scan, though having the accurate year is the only requirement. Day and month can be arbitrarily chosen.

.. note::
    ``interview_date`` must be written in MM/DD/YYYY format!

The ``interview_age`` column is the age of the subject at the time of the scan in months (e.g. a 20-year-old's age would appear as 240 here).

The ``sex`` column is the sex of the subject written as F or M.

So the csv key file could look something like this:

.. csv-table::

    subjectkey,src_subject_id,interview_date,interview_age,sex
    NDA*********,P1,01/01/2023,288,M
    NDA*********,P2,01/01/2023,312,M
    NDA*********,P3,01/01/2023,336,F
    NDA*********,P4,01/01/2023,252,M
    NDA*********,P5,01/01/2023,264,M
    NDA*********,P6,01/01/2024,228,M
    NDA*********,P7,01/01/2024,300,F
    NDA*********,P8,01/01/2024,276,M
    NDA*********,PN,01/01/????,???,?
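
Before handing a key file to ndagen, it can help to sanity-check it against the rules described above. The following is a rough sketch, not part of ndagen itself: the function name ``check_key_file`` and the wording of the messages are invented for illustration, and it only checks the five column names, the date format, the age, and the sex values.

.. code-block:: python

    import csv
    import io
    from datetime import datetime

    REQUIRED_COLUMNS = [
        "subjectkey",
        "src_subject_id",
        "interview_date",
        "interview_age",
        "sex",
    ]


    def check_key_file(handle):
        """Return a list of human-readable problems found in a key file."""
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in REQUIRED_COLUMNS if name not in header]
        if missing:
            return [f"missing columns: {', '.join(missing)}"]

        problems = []
        # Line 1 is the header, so data rows start at line 2.
        for line_no, row in enumerate(reader, start=2):
            try:
                # Month/day/four-digit year, e.g. 01/01/2023.
                datetime.strptime(row["interview_date"] or "", "%m/%d/%Y")
            except ValueError:
                problems.append(f"line {line_no}: interview_date is not MM/DD/YYYY")
            if not (row["interview_age"] or "").isdigit():
                problems.append(f"line {line_no}: interview_age is not a whole number of months")
            if row["sex"] not in ("F", "M"):
                problems.append(f"line {line_no}: sex must be F or M")
        return problems


    if __name__ == "__main__":
        example = io.StringIO(
            "subjectkey,src_subject_id,interview_date,interview_age,sex\n"
            "NDA*********,P1,01/01/2023,288,M\n"
            "NDA*********,P2,2023-01-01,312,X\n"
        )
        for problem in check_key_file(example):
            print(problem)

To check a real file, open it with ``open(path, newline="")`` and pass the file handle to the function instead of the in-memory example.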

So when you're running ``nda_gen.py``, you'll pass the **full** path to the key csv file to the ``--key-file`` argument like this:

.. code-block:: shell
    
    nda_gen.py --key-file /home/user/imager/nda_key_file.csv

.. note::
    Don't worry about copying and keeping track of the full nda_gen.py command. There's a section further down dedicated to that; the example above is just a visual aid.

The Tasks File
^^^^^^^^^^^^^^
One of the required columns for fMRI data is the task number. The task number is generated by NDA when you register a task (e.g. MOTOR) on the NDA website. ndagen can automatically insert the task number into the final csv file by using a yaml config file which contains task name and task number pairs. Here's what it looks like as of February 2024:

.. code-block:: yaml

    tasks:
      EPROJ: 2337
      NBACK: 2348
      PAIN: 2351
      FALSBEL: 2350
      LANG: 2344
      MOTOR: 2347
      VISME: 2352
      VODDK: 2353
      REST: 2349

ndagen will look at the name of each NIFTI file and if one of the tasks above is in the name, it will insert the associated task number in the ``experiment_id`` column in the final csv file. For example, if a given NIFTI file name has ``MOTOR`` in it (e.g. NDA123456-sess01-run01-MOTOR1.nii.gz), ndagen will insert 2347 into the ``experiment_id`` column for that row.
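
To make the matching rule concrete, here is a small sketch of a name-based lookup. It assumes a plain substring search where the first matching task wins; ndagen's actual implementation may differ, and the function name ``experiment_id_for`` is hypothetical.

.. code-block:: python

    # Task name and task number pairs, copied from the default tasks file.
    TASKS = {
        "EPROJ": 2337,
        "NBACK": 2348,
        "PAIN": 2351,
        "FALSBEL": 2350,
        "LANG": 2344,
        "MOTOR": 2347,
        "VISME": 2352,
        "VODDK": 2353,
        "REST": 2349,
    }


    def experiment_id_for(filename, tasks=TASKS):
        """Return the task number whose name appears in the file name, else None."""
        for task_name, task_number in tasks.items():
            if task_name in filename:
                return task_number
        return None


    if __name__ == "__main__":
        print(experiment_id_for("NDA123456-sess01-run01-MOTOR1.nii.gz"))

Under this assumption the match is case-sensitive, so a file named with ``motor`` in lowercase would not pick up a task number.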

The file above will be used by default so there's no need to pass an argument if your task(s) are included. However, if you need to add a new task-number pair you can copy or `download <https://github.com/harvard-nrg/ndagen/blob/main/ndagen/config/tasks.yaml>`_ this file and add the pair to it. Be sure to follow the same formatting as shown above! You can pass your new tasks file to ndagen as a command line argument: ``--task-list /full/path/to/file/tasks.yaml`` 

The Echo Time File
^^^^^^^^^^^^^^^^^^
Many studies include the acquisition of multi-echo T1 scans and NDA requires all of the TEs to be reported in the csv upload file. Unfortunately, popular dicom to NIFTI conversion software (looking at you, dcm2niix) does not include all the TEs of multi-echo scans in the json sidecar files. As such, users with multi-echo scans will need to make use of ndagen's ``--echo-times`` argument. Like the tasks file above, the ``--echo-times`` argument is the full path to a yaml file that could look something like this:

.. code-block:: yaml

    echo_times:
      T1w_MPR_vNav_4e_RMS: .00181,.0036,.00539,.00718
      T1_MEMPRAGE_1.2mm_p4_RMS: .00157,.00339,.00521,.00703

The ``echo_times.yaml`` file consists of key-value pairs where the key is the ``SeriesDescription`` field from the json file and the value is the echo times listed in succession, separated by commas (unit is seconds). It's important that the ``SeriesDescription`` (e.g. T1w_MPR_vNav_4e_RMS) portion mirrors **exactly** what is in the ``SeriesDescription`` of the json file. Otherwise, ndagen will not detect it.
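
The exact-match requirement can be illustrated with a short sketch. It assumes the yaml contents have already been loaded into a dictionary of strings and that the sidecar is available as json text; the function name ``echo_times_for`` is hypothetical and this is not ndagen's own code.

.. code-block:: python

    import json

    # Key-value pairs as they appear in the example echo times file above.
    ECHO_TIMES = {
        "T1w_MPR_vNav_4e_RMS": ".00181,.0036,.00539,.00718",
        "T1_MEMPRAGE_1.2mm_p4_RMS": ".00157,.00339,.00521,.00703",
    }


    def echo_times_for(sidecar_text, echo_times=ECHO_TIMES):
        """Return the echo times in seconds for a json sidecar, or None if no key matches."""
        series_description = json.loads(sidecar_text).get("SeriesDescription")
        # Dictionary lookup is exact and case-sensitive.
        value = echo_times.get(series_description)
        if value is None:
            return None
        return [float(te) for te in value.split(",")]


    if __name__ == "__main__":
        print(echo_times_for('{"SeriesDescription": "T1w_MPR_vNav_4e_RMS"}'))
        print(echo_times_for('{"SeriesDescription": "t1w_mpr_vnav_4e_rms"}'))

The second call returns ``None`` because the lowercase name does not match the key exactly, which is the situation the paragraph above warns about.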

Using the ``--echo-times`` argument could look something like this:

.. code-block:: shell

    nda_gen.py --echo-times /full/path/to/echo/times/file.yaml

Source Files Argument
^^^^^^^^^^^^^^^^^^^^^
NDA requires that all the NIFTIs being uploaded are found in the same directory. This required argument for ndagen is simply the full path to the directory where the NIFTIs are located:

.. code-block:: shell

    nda_gen.py --source-files /full/path/to/all/niftis
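
Since part of the csv is built from json files, a quick pre-flight check of the directory can save a failed run. The sketch below assumes each NIFTI has a json sidecar with the same base name sitting next to it (the layout dcm2niix produces); whether ndagen requires this exact layout is an assumption, and the function name ``missing_sidecars`` is hypothetical.

.. code-block:: python

    from pathlib import Path


    def missing_sidecars(source_dir):
        """Return the names of NIFTI files that have no same-named json file beside them."""
        missing = []
        for path in sorted(Path(source_dir).iterdir()):
            name = path.name
            if name.endswith(".nii.gz"):
                stem = name[: -len(".nii.gz")]
            elif name.endswith(".nii"):
                stem = name[: -len(".nii")]
            else:
                continue  # not a NIFTI file
            if not (path.parent / f"{stem}.json").is_file():
                missing.append(name)
        return missing


    if __name__ == "__main__":
        import sys

        if len(sys.argv) > 1:
            for name in missing_sidecars(sys.argv[1]):
                print("no json sidecar for", name)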

Reface Info Argument
^^^^^^^^^^^^^^^^^^^^
Refacing T1 data has become standard practice at many research centers. At the time of writing, refacing T1 images does not include adding any metadata about the refacing to the associated json file. To report the refacing software used to NDA, ndagen has the ``--reface-info`` argument. Here's an example use case:

.. code-block:: shell

    nda_gen.py --reface-info "Refaced using NITRC mri_deface_0.3; https://www.nitrc.org/projects/mri_reface"

.. note::

    Notice the double quotes around the input. Bash splits arguments on whitespace, so the quotes are needed to pass the whole string as a single argument!
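
The effect of the quotes can be seen with Python's ``shlex`` module, which splits a command line using POSIX shell-style rules. This is only an illustration of the quoting behavior and does not run ndagen.

.. code-block:: python

    import shlex

    quoted = 'nda_gen.py --reface-info "Refaced using NITRC mri_deface_0.3"'
    unquoted = "nda_gen.py --reface-info Refaced using NITRC mri_deface_0.3"

    # With quotes, the description stays together as one argument.
    print(shlex.split(quoted))

    # Without quotes, every word becomes its own argument.
    print(shlex.split(unquoted))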

NDA Config Argument
^^^^^^^^^^^^^^^^^^^
This is an argument that you likely will not have to use. There is a yaml file used by default to generate all the column names in the upload csv file. You will only need to use this argument if NDA changes or adds required variables/colnames to the upload csv. Here's a `link <https://github.com/harvard-nrg/ndagen/blob/main/ndagen/config/variables.yaml>`_ to the yaml file for reference.

Running ndagen
--------------
Phew! Now that we've talked about ndagen's background info and different arguments, let's take a look at actually running it.

As mentioned above, ndagen only has two required arguments: ``--source-files`` and ``--key-file``. ``--nda-config`` and ``--task-list`` are fairly stable and there's a decent chance you will never have to mess with them. However, you will likely make use of ``--reface-info`` and ``--echo-times`` at some point. Below are a couple of examples; one is a general template for most nda_gen.py use cases while the other is a more concrete example.

Command Template
^^^^^^^^^^^^^^^^

.. code-block:: shell

    nda_gen.py --source-files /PATH/TO/ALL/NIFTIS --key-file /PATH/TO/KEY/FILE.csv --reface-info "REFACE SOFTWARE" --echo-times /PATH/TO/YAML/FILE.yaml
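
If you launch ndagen from a script, the same template can be assembled as an argument list, which avoids quoting problems altogether. This is a sketch under the assumption that ``nda_gen.py`` is on your ``PATH``; the helper name ``build_command`` is made up for this example, and only the flags described in this guide are used.

.. code-block:: python

    import shlex


    def build_command(source_files, key_file, reface_info=None, echo_times=None):
        """Assemble an nda_gen.py command as a list of arguments."""
        command = ["nda_gen.py", "--source-files", source_files, "--key-file", key_file]
        # Optional arguments are only added when a value is supplied.
        if reface_info:
            command += ["--reface-info", reface_info]
        if echo_times:
            command += ["--echo-times", echo_times]
        return command


    if __name__ == "__main__":
        command = build_command(
            "/PATH/TO/ALL/NIFTIS",
            "/PATH/TO/KEY/FILE.csv",
            reface_info="REFACE SOFTWARE",
            echo_times="/PATH/TO/YAML/FILE.yaml",
        )
        # shlex.join shows the command as it would be typed in a shell.
        print(shlex.join(command))

The resulting list can be passed to ``subprocess.run(command, check=True)`` to start the run.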

Command Example
^^^^^^^^^^^^^^^

.. code-block:: shell

    nda_gen.py --source-files /users/home/nrg/studies/aging/all_niftis --key-file /users/home/nrg/studies/aging/subject_key_file.csv --reface-info "Refaced using NITRC mri_deface_0.3" --echo-times /users/home/nrg/studies/aging/echo_times.yaml

As ndagen runs, you will see the name of each NIFTI file printed to the terminal window as it is added to the upload csv file. It can take a few seconds or a few minutes depending on the number of files you're uploading. Once it's done, the output csv file will be placed in the ``--source-files`` directory and be named ``nda_upload_file-YYYY-MM-DD.csv``.
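
A script that picks up the result afterwards can reconstruct the expected file name. The sketch below assumes the date in the file name is the day ndagen was run, which this guide does not state explicitly; the function name ``expected_output_path`` is hypothetical.

.. code-block:: python

    from datetime import date
    from pathlib import Path


    def expected_output_path(source_files, run_date):
        """Build the path where the upload csv is expected for a given run date."""
        # date.isoformat() yields YYYY-MM-DD.
        return Path(source_files) / f"nda_upload_file-{run_date.isoformat()}.csv"


    if __name__ == "__main__":
        output = expected_output_path("/users/home/nrg/studies/aging/all_niftis", date.today())
        print(output, "exists" if output.is_file() else "not found")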

And that's it! Please feel free to contact Daniel with any questions: danielasay@fas.harvard.edu