Register workflow runs with the CLI
The LabID CLI lets you automate the registration of Workflow Runs together with their data, for example for batch registration or to register a run automatically after executing a workflow.
Two commands are available: labid register wfrun, and labid register wfrun_crate for workflow runs packaged as a Workflow Run RO-Crate.
Both commands have similar options, but the latter automatically extracts more data and metadata from the RO-Crate metadata.
The following tutorials cover concrete examples of workflow run registration with the LabID CLI:
- Registering an image-conversion workflow
- Registration of a Nextflow nf-core workflow
- Register a workflow run from a Workflow Run RO-Crate (same Nextflow workflow)
The labid register wfrun and labid register wfrun_crate commands
These commands create a new workflow run, associate an existing Workflow Version with it (specified as an argument, or discovered from the run directory) and associate the data with the run.
The commands rely on a so-called sniffer class, which is configured to identify the files or directories to register as part of the workflow run. The CLI comes with a set of generic sniffers for workflow runs. The command automatically creates datasets in LabID for these files and directories and associates them with the run. Workflow runs might use input data that is already registered in LabID, in which case the command does not create new datasets but directly associates the existing ones with the run.
To view the parameters supported by each command, pass the --help flag to the command (e.g. labid register wfrun --help).
Below is a short explanation of some key parameters.
register wfrun_crate available since v26.05.1
This command is newer than register wfrun. The older register wfrun command can also register a Workflow Run RO-Crate, but it does not offer the same flexibility of configuration.
Parameters common to both commands
The register wfrun and register wfrun_crate commands share some parameters, detailed below.
--indir
This mandatory argument should be set to the workflow run directory.
The run directory should be located in the LabID User-dropbox of the user currently configured for the CLI.
To find out which dropbox belongs to the current user, call the labid config show command with the extra --verbose flag.
labid config show --verbose
--sniffer-name
This parameter is optional. If provided, it should be set to one of the values returned by the command labid get sniffers.
Each sniffer supports different options to parse a workflow run directory. Custom sniffers can be developed to parse a specific type of workflow run.
With the register wfrun command, when this parameter is omitted, the SampleSheetWorkflowRunSniffer is used if a samplesheet is provided; otherwise the GenericWorkflowRunSniffer is used.
--sniffer-param
This parameter allows passing sniffer-specific parameters.
As the description of --sniffer-param explains, you can list the parameters supported by a given sniffer by calling the labid get sniffers command with the name of the sniffer, for instance the GenericWorkflowRunSniffer.
labid get sniffers -n GenericWorkflowRunSniffer
The documentation for this sniffer states that it can be configured to look for files and folders to register in the workflow run directory by providing so-called "search strategies", which are detailed below.
--no-input-datasets-in-labid
By default, the register wfrun command expects the input datasets of the workflow run to be already registered in LabID. The command then checks LabID for existing datasets that have the same path or name as the files it discovers.
If the input datasets are not already registered, the command can create new ones and register them before associating them with the workflow run. For this to work, the flag --no-input-datasets-in-labid must be set explicitly.
--dry
When this boolean flag is passed, the command does not actually submit the data to the server; it only parses the run directory and shows a list of the datasets that would be registered.
This is a convenient way to check that the command behaves as expected.
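A dry run can be sketched as follows. The run directory name is a placeholder rather than a real path, and the guard around the call only exists so that the snippet still runs on a machine where the CLI is not installed.

```shell
# Hypothetical run directory; replace it with a folder located in your LabID User-dropbox
RUN_DIR="my_workflow_run"

if command -v labid >/dev/null 2>&1; then
    # --dry parses the run directory and lists the datasets without submitting anything
    labid register wfrun --indir "$RUN_DIR" --dry
else
    echo "labid CLI not found; skipping the dry run of $RUN_DIR"
fi
```

Once the listed datasets look right, the same command can be run again without --dry to perform the actual registration.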
Parameters specific to register wfrun
These parameters mostly specify how to identify the files and folders to register from a workflow run directory.
--samplesheet
This option lets the sniffer parse a samplesheet (CSV or TSV) to discover the input files of the workflow.
The demo workflow uses a samplesheet (see the workflow documentation), but when running the test profile, the default samplesheet is not "exposed" and contains references to files stored in the cloud, which LabID does not support for now. We are therefore not going to specify this argument.
If you run this workflow with your own data, you should provide this argument with the path to your samplesheet (either an absolute path, or a path relative to the workflow run directory).
The associated parameters --samplesheet-header-path and the optional --samplesheet-header-sample-id should be used together with this option, to specify the column(s) containing the file path and the sample identifier respectively (see the parameter documentation shown by labid register wfrun --help).
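As a sketch of how the samplesheet options fit together, the snippet below writes a small CSV into a run directory and passes it to the command in dry-run mode. The run directory, the file name, the column names sample and fastq_path, and the file paths are all made up for illustration; use the ones from your own samplesheet. The guard keeps the snippet runnable without the CLI.

```shell
RUN_DIR="my_workflow_run"        # hypothetical run directory
SAMPLESHEET="samplesheet.csv"    # hypothetical file name, relative to the run directory
mkdir -p "$RUN_DIR"

# Two illustrative samples; the column names are assumptions
printf '%s\n' \
    'sample,fastq_path' \
    'sample_A,inputs/sample_A.fastq.gz' \
    'sample_B,inputs/sample_B.fastq.gz' > "$RUN_DIR/$SAMPLESHEET"

if command -v labid >/dev/null 2>&1; then
    # Point the sniffer at the column holding the file paths and the one holding the sample IDs
    labid register wfrun --indir "$RUN_DIR" \
        --samplesheet "$SAMPLESHEET" \
        --samplesheet-header-path fastq_path \
        --samplesheet-header-sample-id sample \
        --dry
else
    echo "labid CLI not found; only wrote $RUN_DIR/$SAMPLESHEET"
fi
```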
--search-strategy-table
(new in v26.05.1)
This argument accepts the path to a tabular file that should contain a list of "search strategies" for some (or all) of the workflow run data types (INPUT, OUTPUT, CONFIG, REPORT, LOG).
A search strategy lets you specify a pattern (glob/wildcard or regex) used to identify files or folders of interest.
You can get an idea of what the table should look like, with some example file patterns, from the example table that the CLI can print:
| CATEGORY | PATTERN | PATTERN_TYPE | TARGET | RECURSIVE | EXCLUDE_HIDDEN | DIRECTORY | COLLECTION | REGEX_ON_FILENAME_ONLY |
|---|---|---|---|---|---|---|---|---|
| #INPUT/OUTPUT/LOG/CONFIG | a glob or regex pattern to match files or folders | type of pattern (GLOB or REGEX) | target to search for (files or directories) | whether the search should be recursive (TRUE/FALSE) | whether hidden files/folders should be excluded from the search (TRUE/FALSE, default TRUE if not specified) | optional subdirectory to start the search from (if empty the search will start from the workflow run directory) | optional name of the collection to which the found files/folders should be added. If not specified the found files/folders will be added to the default collection for the category. | for regex pattern type, whether the regex should be applied to the full path (FALSE) or only to the filename (TRUE). default FALSE if not specified) |
| INPUT | *.txt | GLOB | FILES | FALSE | TRUE | input_subdir | FALSE | |
| OUTPUT | output_* | GLOB | FOLDERS | FALSE | TRUE | output_subdir | Output directories | |
| LOG | *.log | GLOB | FILES | TRUE | FALSE | | | |
| REPORT | *.html | GLOB | FILES | TRUE | TRUE | | | |
| CONFIG | .+.json | REGEX | FILES | FALSE | TRUE | | | TRUE |
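A table like the one above can be written by hand or generated by a script. The sketch below assumes a tab-separated file carrying the same header row as the example. The exact format expected by the CLI (separator, whether the header row is required) is an assumption to check against labid register wfrun --help. The file name and run directory are placeholders.

```shell
RUN_DIR="my_workflow_run"          # hypothetical run directory
TABLE="search_strategies.tsv"      # hypothetical file name

# Assumption: tab-separated columns in the same order as the example table.
# The last three columns (DIRECTORY, COLLECTION, REGEX_ON_FILENAME_ONLY) are left empty.
{
    printf 'CATEGORY\tPATTERN\tPATTERN_TYPE\tTARGET\tRECURSIVE\tEXCLUDE_HIDDEN\tDIRECTORY\tCOLLECTION\tREGEX_ON_FILENAME_ONLY\n'
    printf 'LOG\t*.log\tGLOB\tFILES\tTRUE\tFALSE\t\t\t\n'
    printf 'REPORT\t*.html\tGLOB\tFILES\tTRUE\tTRUE\t\t\t\n'
} > "$TABLE"

if command -v labid >/dev/null 2>&1; then
    # Dry run first, to review which files each strategy picks up
    labid register wfrun --indir "$RUN_DIR" --search-strategy-table "$TABLE" --dry
else
    echo "labid CLI not found; only wrote $TABLE"
fi
```

Keeping the table in a file next to the workflow makes it easy to reuse the same strategies for every run of that workflow.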
Search strategies can also be passed via --sniffer-param
Search strategies can also be passed via the --sniffer-param argument, but the notation is a bit heavy (for example --sniffer-param "strategy-report=(, files, true, *_report*.html)"), so we do not recommend it.
The value (after the equals sign) should be a tuple (a,b,c,d) as follows:
- The first value (a) should be the name of a folder within the run directory where the search should start. It can be left empty to use the top-level run directory as the starting point.
- The second value (b) should be either files or folders, and sets whether the search should match files or folders.
- The third value (c) is a boolean (true/false) that sets whether the search for files or folders should be recursive, starting from the folder passed to (a). If false, the search only considers files and folders directly within the folder passed to (a).
- The last value (d) is a wildcard (also called glob) pattern that defines which files or folders to match with this search strategy.
Finally, as indicated in the documentation of the --sniffer-param option of the labid register wfrun command, search strategies are passed on the command line in the following way:
--sniffer-param "strategy-log=(, files, true, *.log)"
and
--sniffer-param "strategy-report=(, files, true, *_report*.html)"
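Because the tuple notation is easy to get wrong, a small helper can assemble it from its four parts. The function make_strategy below is hypothetical and not part of the CLI. The final call also assumes that --sniffer-param may be repeated on one command line and that the run directory placeholder is replaced with a real one.

```shell
# Hypothetical helper: builds "strategy-<category>=(<start dir>, <target>, <recursive>, <glob>)"
make_strategy() {
    printf 'strategy-%s=(%s, %s, %s, %s)' "$1" "$2" "$3" "$4" "$5"
}

# An empty start directory means the search starts at the top-level run directory
LOG_STRATEGY=$(make_strategy log "" files true '*.log')
REPORT_STRATEGY=$(make_strategy report "" files true '*_report*.html')

echo "$LOG_STRATEGY"       # strategy-log=(, files, true, *.log)
echo "$REPORT_STRATEGY"    # strategy-report=(, files, true, *_report*.html)

RUN_DIR="my_workflow_run"  # hypothetical run directory
if command -v labid >/dev/null 2>&1; then
    # Assumption: the flag can be given once per strategy
    labid register wfrun --indir "$RUN_DIR" \
        --sniffer-name GenericWorkflowRunSniffer \
        --sniffer-param "$LOG_STRATEGY" \
        --sniffer-param "$REPORT_STRATEGY" \
        --dry
else
    echo "labid CLI not found; strategies were only printed"
fi
```

Quoting the strategy strings matters here: without the double quotes, the shell would try to expand the glob patterns itself before the CLI ever sees them.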
Parameters specific to register wfrun_crate
The register wfrun_crate command automatically identifies files and folders, as well as their "role" in the workflow run (input, output, log...) from the RO-Crate metadata.
You can pass additional arguments to only include some of the files that were identified, or ensure the right type is assigned.
--regex-xxx
This argument comes in one variant per workflow run data type (--regex-input, --regex-output, ...).
You can pass custom regular expressions to these flags to include or exclude some of the files/folders that would normally be included.
An example would be to only include inputs with a specific extension.
Another use case of these flags is to identify logs and reports datasets, which are often listed as output of the workflow run (there is no notion of log or report dataset in a workflow run RO-Crate, only of input/output).
By default, all outputs with the extension .html are assigned the role REPORT, while outputs with the extension .log are assigned the role LOG.
You can pass a custom regular expression to --regex-report and --regex-log to change this default behaviour.
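Before passing a custom expression, it can help to preview what it would match. The sketch below tests a candidate report regex against a few made-up output paths with grep -E, then passes it to the command in dry-run mode. The paths, the regex, and the run directory are illustrative. grep -E uses POSIX extended syntax, which may differ from the regex dialect and matching rules (full path or file name, partial or full match) applied by the CLI, so treat the preview as a rough check only.

```shell
# Candidate regex: only files ending in _report.html should get the REPORT role
REPORT_REGEX='.*_report\.html$'

# Hypothetical output paths, as they might be listed in a crate
MATCHED=$(printf '%s\n' \
    'results/multiqc_report.html' \
    'results/index.html' \
    'logs/pipeline.log' | grep -E "$REPORT_REGEX")

echo "$MATCHED"    # results/multiqc_report.html

RUN_DIR="my_workflow_run_crate"    # hypothetical crate directory
if command -v labid >/dev/null 2>&1; then
    labid register wfrun_crate --indir "$RUN_DIR" --regex-report "$REPORT_REGEX" --dry
else
    echo "labid CLI not found; regex was only previewed locally"
fi
```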