Datalad-hirni comes with a set of commands that support the following workflow for curating a study dataset and converting it. To a degree, this workflow reflects the envisioned structure of a study dataset as described in the concepts section.
This is a somewhat abstract description of what the workflow is supposed to do. You may find it more convenient to first look at the examples, which show this exact workflow: first the creation of a study dataset, then its conversion.
Build your study dataset
In order to build such a dataset that binds all your raw data, the first thing to do is to create a new dataset. To set
it up as a hirni dataset, you can use a built-in routine called
cfg_hirni, which is implemented as a Datalad
procedure. Ideally, you create your dataset right at the moment you start (planning) your study. Even without any actual
data, there is basic metadata you might already be able to capture by (partially) filling in
CHANGELOG etc. Hirni’s webUI might be of help here.
The idea is then to add all data to the dataset as it comes into existence. That is, for each acquisition, import the
DICOMs, import all additional data, possibly edit the specification. It’s like writing documentation for your code: If
you don’t do it at the beginning, chances are you’ll never properly do it at all.
You can always edit and add things later, of course.
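As a rough sketch of the setup step, the two commands could be assembled from Python as shown below. The procedure name cfg_hirni comes from the text above; the dataset path "my_study" and the helper names setup_commands and run_all are made up for illustration, and the command-line options should be checked against your installed DataLad version.

```python
import subprocess


def setup_commands(path):
    """Return the command lines that create a dataset and configure it for hirni."""
    return [
        # Create a new, empty DataLad dataset at `path`.
        ["datalad", "create", path],
        # Run the cfg_hirni procedure on that dataset.
        ["datalad", "run-procedure", "-d", path, "cfg_hirni"],
    ]


def run_all(commands):
    """Execute each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# "my_study" is a hypothetical dataset location; nothing is executed here.
for cmd in setup_commands("my_study"):
    print(" ".join(cmd))
```

Calling run_all(setup_commands("my_study")) would actually execute the commands, which requires DataLad and datalad-hirni to be installed.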
Import DICOM files
Importing the DICOMs consists of several steps. The
hirni-import-dcm command will help you, provided you can give it
a tarball containing all DICOMs of an acquisition (the internal structure of the tarball doesn’t matter). Of course, you can
achieve the same result differently.
The first step is to retrieve the tarball, extract its content, and create a dataset from it. If you passed an
acquisition directory to the command, it will create this dataset in
dicoms/ underneath that directory. Otherwise it
is created at a temporary location.
Then the DICOM metadata is extracted. If the acquisition directory wasn’t given, a name for the acquisition is derived
from that metadata (how exactly this is done is configurable), the respective directory is created, and the dataset is moved
into it from its temporary location.
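The placement logic described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the naming rule (patient ID plus study date) is an assumption chosen for the example, since the actual rule is configurable.

```python
from pathlib import Path


def derive_acquisition_name(metadata):
    """Hypothetical naming rule; the rule hirni actually applies is configurable."""
    return f"{metadata['PatientID']}_{metadata['StudyDate']}"


def dicom_dataset_path(study_root, metadata, acquisition=None):
    """Return where the DICOM subdataset ends up inside the study dataset."""
    if acquisition is None:
        # No acquisition directory given: fall back to a name taken from the metadata.
        acquisition = derive_acquisition_name(metadata)
    return Path(study_root) / acquisition / "dicoms"


# Example metadata values are invented for illustration.
meta = {"PatientID": "sub01", "StudyDate": "20190101"}
print(dicom_dataset_path("my_study", meta, "acq1"))
print(dicom_dataset_path("my_study", meta))
```

In both cases the dataset ends up in a dicoms/ directory beneath an acquisition directory; only the origin of the acquisition name differs.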
Either way, there is now a new subdataset beneath the respective acquisition directory, and it provides the extracted DICOM
metadata. Note that the metadata doesn’t technically describe DICOM files, but rather the image series found in
those files. The final step is to use that metadata to derive a specification. This is done by
hirni-dicom2spec, which is called automatically by
hirni-import-dcm. However, if you need to skip
hirni-import-dcm for whatever
reason (say you already have a DICOM dataset you want to use instead of creating a new one from such a tarball), you can
run hirni-dicom2spec directly. How the rule system is used to derive the specification deserves its own
chapter (at least if you wish to adjust those rules). This should now result in a
studyspec.json within the respective acquisition directory. You can now review the autogenerated entries and correct
or enhance them.
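For a quick review of the autogenerated entries, a small helper like the following may be enough. It is a sketch under a stated assumption: that the specification file holds one JSON object per non-empty line. The helper names load_spec and summarize are made up for this example, and nothing here is part of hirni's own API.

```python
import json


def load_spec(path):
    """Read a specification file, assuming one JSON object per non-empty line."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries


def summarize(entries):
    """Return one (index, sorted field names) pair per entry for a quick overview."""
    return [(i, sorted(entry)) for i, entry in enumerate(entries)]
```

Printing summarize(load_spec("acq1/studyspec.json")) (with "acq1" being a hypothetical acquisition directory) would list which fields each entry defines, which helps spot entries that still need attention.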
Add arbitrary data
Once an acquisition is established within a study dataset, you may add arbitrary additional files to that acquisition.
Protocols, stimulation log files, other data modalities … whatever else belongs to that acquisition. There are no
requirements on how to structure those additional files within the acquisition directory.
A specification for arbitrary data can be added as well, of course. It works exactly the same way as for the DICOM data,
with the only exception that there’s no automated attempt to derive a specification from the data. There is, however, the
hirni-spec4anything command to help with the creation of such a specification. It will fill the specification not
based on the data, but based on what is already specified (for the DICOMs, for example). So, hirni-spec4anything
will assume that specification values that are unambiguous throughout the existing specification of an acquisition are
valid for additional data as well. For example, if all existing specifications of an acquisition agree on a subject
identifier, this will be the guess for additional files.
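The idea of reusing unambiguous values can be illustrated with a minimal sketch. It does not reproduce the actual implementation; the function name and the flat dictionary layout of the entries are assumptions made for this example, as are the "session" and "description" fields.

```python
def unambiguous_values(snippets, fields):
    """Return {field: value} for every field on which all defining snippets agree."""
    guesses = {}
    for field in fields:
        # Collect the values of this field from every snippet that defines it.
        values = [s[field] for s in snippets if field in s]
        # Keep the value only if it exists and is identical everywhere.
        if values and all(v == values[0] for v in values):
            guesses[field] = values[0]
    return guesses


# Two existing entries of one acquisition (invented values).
existing = [
    {"subject": "02", "session": "1", "description": "T1w"},
    {"subject": "02", "session": "1", "description": "bold"},
]
guess = unambiguous_values(existing, ["subject", "session", "description"])
print(guess)  # {'subject': '02', 'session': '1'}
```

Here the subject and session are carried over to the new entry, while the description is left open because the existing entries disagree on it.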
This is how to create such a study dataset including its specification. See also this example.
Convert your dataset
The conversion of such datasets is meant to target a new dataset. That is, you create a new, empty dataset which is the
target of the conversion and make the study dataset a subdataset of that new one. Thereby the converted dataset keeps a
reference to the data it was created from. From within the target dataset you can then call hirni-spec2bids to
execute the actual conversion as specified in the specification files.
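The layout described above could be set up roughly as follows. This is a sketch, not a prescribed procedure: the target name "bids", the subdataset location "sourcedata", the study dataset source, and the helper names are all assumptions for illustration, and the exact options of the datalad and hirni-spec2bids commands should be verified against your installation.

```python
def conversion_setup_commands(study_source, target="bids", subdir="sourcedata"):
    """Return commands that create the target dataset and attach the study dataset."""
    return [
        # Create the new, empty dataset that will receive the converted data.
        ["datalad", "create", target],
        # Register the study dataset as a subdataset of the target dataset.
        ["datalad", "clone", "-d", target, study_source, f"{target}/{subdir}"],
    ]


def spec2bids_command(spec_files):
    """Return the conversion call; it is meant to be run from within the target dataset."""
    return ["datalad", "hirni-spec2bids"] + list(spec_files)


# Hypothetical study dataset location and specification file.
for cmd in conversion_setup_commands("path/to/my_study"):
    print(" ".join(cmd))
print(" ".join(spec2bids_command(["sourcedata/acq1/studyspec.json"])))
```

The commands are only printed here; running them requires DataLad and datalad-hirni.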
Note that it is not required to convert the entire dataset at once. Instead, the conversion is called on particular
specification files and can be further limited to convert a particular type of data as listed in the respective
specification files.
hirni-spec2bids comes with an
--anonymize switch. This will do several things. First, it determines which
subject identifier to use in the converted dataset. For that, a specification has a subject and an anon_subject field
to choose from. Usually, subject will contain the identifier as it comes from the DICOMs (likely pseudo-anonymized),
while anon_subject allows you to specify an anonymized identifier in addition.
In addition, --anonymize will cause the conversion to encrypt generated commit messages in order to disguise possibly
revealing paths. Finally, conversion procedures listed in specifications can declare that they are to be executed only if the
--anonymize switch was used. This mechanism allows you to trigger things like defacing after the conversion of DICOM files.
An example of such a conversion is to be found here.
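Two of the effects of --anonymize described above, the choice of subject identifier and the conditional execution of procedures, can be sketched as follows. The field names subject and anon_subject are taken from the text; the flat dictionary layout, the "on_anonymize" marker, the procedure names, and the function names are hypothetical and only serve this example.

```python
def subject_for_conversion(spec, anonymize):
    """Pick the subject identifier used in the converted dataset."""
    key = "anon_subject" if anonymize else "subject"
    return spec[key]


def procedures_to_run(procedures, anonymize):
    """Drop procedures marked as anonymize-only unless anonymization is requested."""
    return [p for p in procedures if anonymize or not p.get("on_anonymize", False)]


# Invented specification values and procedure names.
spec = {"subject": "xy123", "anon_subject": "001"}
procedures = [
    {"name": "convert"},
    {"name": "deface", "on_anonymize": True},
]

print(subject_for_conversion(spec, anonymize=True))
print([p["name"] for p in procedures_to_run(procedures, anonymize=False)])
```

Without the switch, only the regular conversion procedure remains and the original subject identifier is used; with it, the anonymized identifier is chosen and the defacing step is included.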