ibllib.pipes.tasks
Functions
run_alyx_task - Runs a single Alyx job and registers output datasets
Classes
Task
Pipeline - Pipeline class: collection of related and potentially interdependent tasks
- class Task(session_path, parents=None, taskid=None, one=None, machine=None, clobber=True, location='server', **kwargs)[source]
Bases:
ABC
- log = ''
- cpu = 1
- gpu = 0
- io_charge = 5
- priority = 30
- ram = 4
- outputs = None
- time_elapsed_secs = None
- time_out_secs = 7200
- version = '2.19.0'
- signature = {'input_files': [], 'output_files': []}
- force = False
- job_size = 'small'
- one = None
- level = 0
- property name
- run(**kwargs)[source]
Do not overload this method; overload _run() instead (a subclassing sketch follows this class reference). run() wraps the _run() method with:
- error management
- logging to a variable
- writing a lock file if the GPU is used
- setting the status property of the object
The status value is set as follows:
- 0: Complete
- -1: Errored
- -2: Didn't run as a lock was encountered
- -3: Incomplete
- register_datasets(one=None, **kwargs)[source]
Register output datasets from the task to Alyx
- Parameters
one –
jobid –
kwargs – directly passed to the register_dataset function
- Returns
- get_signatures(**kwargs)[source]
This is the default implementation and should be overridden for each task.
- setUp(**kwargs)[source]
Setup method to get the data handler and ensure all data required to run the task is available locally.
- Parameters
kwargs –
- Returns
- tearDown()[source]
Method run after run(). Does not run if a lock was encountered by the task (status -2).
- assert_expected_outputs(raise_error=True)[source]
After a run, asserts that all the files declared in the signature are present at least once in the output files. Mainly useful for integration tests.
- assert_expected_inputs(raise_error=True)[source]
Before running a task, checks that all the files necessary to run the task have been downloaded or are already on the local file system.
- get_data_handler(location=None)[source]
Gets the relevant data handler based on the location argument.
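In practice Task is subclassed and the processing itself lives in _run(), which run() wraps with logging, locking and status handling as described above. Below is a minimal sketch; the task name, dataset names, signature entries and session path are placeholders chosen for illustration, not actual ibllib tasks or datasets.

```python
from pathlib import Path

from ibllib.pipes.tasks import Task


class TrialsTableExample(Task):
    """Hypothetical task computing a trials table for a session."""
    cpu = 1
    io_charge = 5
    priority = 30
    job_size = 'small'
    # placeholder signature: datasets expected as inputs and produced as outputs
    signature = {
        'input_files': [('_iblrig_taskData.raw.jsonable', 'raw_behavior_data', True)],
        'output_files': [('trials.table.pqt', 'alf', True)],
    }

    def _run(self, **kwargs):
        # the heavy lifting goes here; return the list of files written so that
        # register_datasets() can later upload them to Alyx
        out_file = Path(self.session_path).joinpath('alf', 'trials.table.pqt')
        # ... compute and save the trials table to out_file ...
        return [out_file]


# placeholder session path; with one=None no Alyx registration is possible
task = TrialsTableExample('/data/subject/2023-01-01/001', one=None)
task.run()
if task.status == 0:  # 0: Complete (see the status codes above)
    task.register_datasets(one=task.one)
```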
- class Pipeline(session_path=None, one=None, eid=None)[source]
Bases:
ABC
Pipeline class: collection of related and potentially interdependent tasks
- tasks = {}
- one = None
- create_alyx_tasks(rerun__status__in=None, tasks_list=None)[source]
Instantiate the pipeline and create the tasks in Alyx, then create the jobs for the session. If the jobs already exist, they are left untouched. The re-run parameter re-initialises the job by emptying the log and setting the status to Waiting.
- Parameters
rerun__status__in – by default no re-run. To re-run tasks that already exist, specify a list of status strings to re-run; the possible choices are ['Waiting', 'Started', 'Errored', 'Empty', 'Complete']. To always patch, the string '__all__' can also be provided.
- Returns
list of Alyx task dictionaries (existing and/or created)
- create_tasks_list_from_pipeline()[source]
From a pipeline with tasks, creates a list of dictionaries containing the task descriptions, which can be uploaded to create Alyx tasks.
- run(status__in=['Waiting'], machine=None, clobber=True, **kwargs)[source]
Get all the session-related jobs from Alyx and run them (a usage sketch follows this class reference).
- Parameters
status__in – list of status strings to run, among ['Waiting', 'Started', 'Errored', 'Empty', 'Complete']
machine – string identifying the machine the task is run on, optional
clobber – bool, if True any existing logs are overwritten, default is True
kwargs – arguments passed downstream to run_alyx_task
- Returns
jalyx: list of REST dictionaries of the jobs endpoint
job_deck: list of REST dictionaries of the jobs endpoint
all_datasets: list of REST dictionaries of the datasets endpoint
- property name
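A Pipeline subclass typically populates its tasks dictionary in __init__ and is then used to create and run the corresponding Alyx jobs for a session. A minimal sketch, reusing the hypothetical TrialsTableExample task from the sketch above, with a placeholder session path; it assumes the base constructor keeps session_path and the ONE instance on the object, and a real ONE connection is needed for any interaction with Alyx.

```python
from one.api import ONE

from ibllib.pipes.tasks import Pipeline


class ExamplePipeline(Pipeline):
    """Hypothetical pipeline made of a single task."""

    def __init__(self, session_path=None, one=None, eid=None):
        super().__init__(session_path=session_path, one=one, eid=eid)
        # tasks are keyed by name; dependencies are expressed through the tasks' parents
        self.tasks = {
            'TrialsTableExample': TrialsTableExample(self.session_path, one=self.one),
        }


one = ONE()  # connection to Alyx
pipe = ExamplePipeline(session_path='/data/subject/2023-01-01/001', one=one)

# create the jobs on Alyx for this session (existing jobs are left untouched)
alyx_tasks = pipe.create_alyx_tasks()

# run every job of the session currently in the 'Waiting' state
jalyx, job_deck, all_datasets = pipe.run(status__in=['Waiting'])
```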
- run_alyx_task(tdict=None, session_path=None, one=None, job_deck=None, max_md5_size=None, machine=None, clobber=True, location='server', mode='log')[source]
Runs a single Alyx job and registers output datasets
- Parameters
tdict –
session_path –
one –
job_deck – optional list of job dictionaries belonging to the session. Needed to check the dependency status if tdict has a parent field. If tdict has a parent and job_deck is not provided, the database will be queried.
max_md5_size – in bytes; if specified, the md5 checksum will not be computed above the given file size, to save time
machine – string identifying the machine the task is run on, optional
clobber – bool, if True any existing logs are overwritten, default is True
location – where the task is run: 'server' - local lab server, 'remote' - any compute node/computer, 'SDSC' - flatiron compute node, 'AWS' - using data from AWS S3
mode – str ('log' or 'raise'), behaviour to adopt if an error occurred. If 'raise', the error is raised at the very end of this function (i.e. after the tasks have been labelled)
- Returns
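A hedged example of running a single job dictionary for a session. The REST query and the eid lookup below are assumptions about how job dictionaries would typically be obtained from the Alyx tasks endpoint, and the exact return value of run_alyx_task is not detailed in this reference; the session path is a placeholder.

```python
from one.api import ONE

from ibllib.pipes.tasks import run_alyx_task

one = ONE()
session_path = '/data/subject/2023-01-01/001'   # placeholder path

# fetch the job dictionaries for this session from the Alyx tasks endpoint
# (assumed query; endpoint and filter names follow the Alyx REST conventions)
eid = one.path2eid(session_path)
job_deck = one.alyx.rest('tasks', 'list', session=eid)

# run the first job; passing job_deck avoids an extra database query when the
# task has a parent whose dependency status needs checking
result = run_alyx_task(tdict=job_deck[0], session_path=session_path, one=one,
                       job_deck=job_deck, location='server', mode='log')
```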