Feature Description - Resource Management

Component Versions

  • Portal: 4.0.0
  • Scheduler Service: 4.0.8
  • Object Store Service: 4.0.4

Resource Management Overview

When an individual Task requires an input file, a new Task setting can instruct the YellowDog Platform to verify that the file is present before running the Task. If the file is not present, the Scheduler Service can be set either to fail the Task or to wait until the file is present before continuing.

This is the first step towards a broader framework that can not only verify but also reserve different types of resource across multiple modalities.

Impact

Platform Updates

Task Changes

The input methods on Tasks (inputFromNamespace and inputFromTaskNamespace) have changed to support resource verification. These changes are:

  • The optional flag required is no longer supported
  • A new optional parameter verification has been added

If verification is required, the verification parameter should be set to one of the following:

  • VERIFY_AT_START: If the specified input file is not present when the Scheduler is ready to run the Task, the Task will automatically fail.
  • VERIFY_WAIT: If the specified input file is not present when the Scheduler is ready to run the Task, the Task remains in the new Pending state until the file appears. In this initial phase of Resource Management work, the Scheduler will wait indefinitely.
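
The two modes can be summarised as a small decision function. This is a minimal sketch in plain Python, not YellowDog SDK code: the function name next_status and its arguments are hypothetical, and only the VERIFY_AT_START and VERIFY_WAIT values and the Pending, Ready and Failed statuses are taken from this description.

```python
from typing import Optional

# The two verification values named in this description.
VERIFICATION_MODES = {"VERIFY_AT_START", "VERIFY_WAIT"}


def next_status(verification: Optional[str], file_present: bool) -> str:
    """Return a Task's status at the point the Scheduler is ready to run it.

    Hypothetical helper for illustration only; not part of any YellowDog API.
    """
    if verification is not None and verification not in VERIFICATION_MODES:
        raise ValueError(f"unknown verification mode: {verification}")
    if verification is None or file_present:
        # Nothing to verify, or the input file has been confirmed as present.
        return "Ready"
    if verification == "VERIFY_AT_START":
        # Missing file: the Task fails automatically.
        return "Failed"
    # VERIFY_WAIT with a missing file: the Task waits (indefinitely, for now).
    return "Pending"
```

For instance, next_status("VERIFY_WAIT", False) yields "Pending", and the same call with file_present=True yields "Ready".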

Status Changes

To support verification, the Task status workflow has changed. A Task that is waiting for a resource to become available will be in a new state, Pending. The parent Task Group ignores any Pending Tasks when requesting Workers. A Task only becomes Ready when any resources that require verification have been confirmed as present.

At the same time as this change, some other statuses have been renamed to improve consistency and clarity. These changes are:

  • Task: Running > Executing
  • Task Group: Working > Running
  • Work Requirement: Working > Running
  • Compute Requirement: Pending > Provisioning
  • Node: Alive > Running
  • Node: Unregistered > Deregistered
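
Client code that matches on status names may need a translation step. The following is a minimal sketch of such a lookup, assuming statuses are handled as the display names listed above (the values returned by the API may be spelled differently); STATUS_RENAMES and migrate_status are hypothetical names.

```python
# (entity, old status) -> new status, copied from the list above.
# Keys include the entity because the same word means different things:
# "Pending" is renamed for Compute Requirements but is a new Task state.
STATUS_RENAMES = {
    ("Task", "Running"): "Executing",
    ("Task Group", "Working"): "Running",
    ("Work Requirement", "Working"): "Running",
    ("Compute Requirement", "Pending"): "Provisioning",
    ("Node", "Alive"): "Running",
    ("Node", "Unregistered"): "Deregistered",
}


def migrate_status(entity: str, status: str) -> str:
    """Translate a pre-release status name; unchanged names pass through."""
    return STATUS_RENAMES.get((entity, status), status)
```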

Changes to AutoFail and AutoComplete

The Task Group settings autoFail and autoComplete have been replaced by the new properties finishIfAnyTaskFailed and finishIfAllTasksFinished. This change supports greater flexibility in using dynamic Work Requirements and when running applications that expect some Tasks to fail.

  • finishIfAnyTaskFailed: If set to true, the Task Group fails as soon as any Task fails. All remaining unallocated Tasks are discarded, and once all allocated Tasks have run the Task Group finishes with the status Failed. If this property is not set it defaults to false, and if a Task fails the Task Group continues to run its remaining Tasks. In that case, even though other Tasks may complete successfully, if the Task Group finishes (rather than, for example, being cancelled by the user) its final status will still be Failed.
  • finishIfAllTasksFinished: If set to false, the Task Group remains in the status Running even after all of its Tasks have run. This is useful where, for example, events may cause Tasks to be added to the Task Group after all previous Tasks have finished. If this property is not set it defaults to true, and once all of a group’s Tasks have finished (either Completed or Failed) the Task Group also finishes with the appropriate final status.

Existing Task Groups that used the autoFail or autoComplete settings must be updated to use the new properties. Existing Task Groups that did not use these settings can be left unchanged, as the default behaviour is the same.
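
The interaction of the two properties can be sketched as a function that returns a Task Group's final status, or None while it is still running. This is an illustrative model only, built on several assumptions that are not stated in this description: Tasks in the Executing state stand in for "allocated" Tasks, the successful final status is called Completed, and finishIfAnyTaskFailed takes precedence when both properties apply. The function name final_status is hypothetical.

```python
from typing import List, Optional

FINISHED = {"Completed", "Failed"}


def final_status(
    task_statuses: List[str],
    finish_if_any_task_failed: bool = False,    # default per this description
    finish_if_all_tasks_finished: bool = True,  # default per this description
) -> Optional[str]:
    """Return the Task Group's final status, or None if it has not finished."""
    any_failed = "Failed" in task_statuses
    if finish_if_any_task_failed and any_failed:
        # Unallocated Tasks are discarded; only Tasks already executing
        # (assumed here to be the allocated ones) are allowed to finish.
        return None if "Executing" in task_statuses else "Failed"
    all_finished = all(status in FINISHED for status in task_statuses)
    if finish_if_all_tasks_finished and all_finished:
        # A single failed Task makes the whole group Failed.
        return "Failed" if any_failed else "Completed"
    return None
```

With the defaults, a group whose Tasks are Completed and Failed finishes as Failed, while a group with a Ready Task remaining has no final status yet.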

Worker Claiming

As part of revisiting status and dependencies, two further changes have been made:

  • If a Task Group is part of a Held Work Requirement, the Scheduler Service will not attempt to claim any additional Workers for it.
  • If a Task Group is dependentOn another Task Group, it will not attempt to claim more than its minWorkers (default 0) while in the status Waiting. Only once the dependency Task Group has completed will this Task Group attempt to claim any additional Workers required.

These changes ensure that YellowDog will not provision additional Instances for Task Groups that cannot use them.
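
As a rough illustration of these two rules, the sketch below computes how many additional Workers a Task Group would try to claim. It is a simplified model, not the Scheduler's actual logic: the function workers_to_claim and the wanted and claimed parameters are hypothetical, and only the Held and Waiting conditions and the minWorkers default of 0 come from the text above.

```python
def workers_to_claim(
    wanted: int,
    claimed: int,
    held: bool = False,
    waiting_on_dependency: bool = False,
    min_workers: int = 0,
) -> int:
    """Number of additional Workers a Task Group may attempt to claim."""
    if held:
        # Part of a Held Work Requirement: claim nothing further.
        return 0
    if waiting_on_dependency:
        # While Waiting on another Task Group, never go beyond minWorkers.
        target = min(wanted, min_workers)
    else:
        target = wanted
    return max(0, target - claimed)
```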

Portal Updates

When viewing an individual Task, it is now possible to see its verification status.

[Image: AWS Fleet Configuration]

For example, the above Task requires three input files. One has been verified as present, but two have not yet been detected. The Task will not be run until all three files are present.

Workflow Changes

Task Dependencies

Direct dependencies between Tasks are not supported within YellowDog. In previous versions, the recommendation was to address any dependencies between Tasks by placing the dependent Tasks in separate Task Groups that do have a dependency relationship.

However, with this release it is also possible to create an effective dependency between Tasks if the output of one is used as the input to another. For example, Task A outputs File A, which must then be processed by Task B. In previous versions, Task B would have had to be placed in Task Group B, which would then have been set to be dependent on the Task Group that contained Task A. In the current version of YellowDog, however, resource verification can be used to ensure that Task B does not run until File A is present. This means that both Tasks can be placed in the same Task Group if that is desirable.

If necessary, Resource Management can be used to create a ‘pipeline’ of multiple Tasks, with the output of each Task supplying the input for the next. As long as each Task is set to wait for verification of its input file before running, the Scheduler will run the entire pipeline in the required order.
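
The pipeline behaviour can be illustrated with a small simulation. This is a sketch in plain Python rather than YellowDog code: the Task class, the run_pipeline function and the file names are hypothetical, and every Task is assumed to wait for its input (the VERIFY_WAIT behaviour) and to produce its output file as soon as it runs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Task:
    name: str
    input_file: Optional[str]  # None means the Task needs no input file
    output_file: str


def run_pipeline(tasks: List[Task], available_files: Iterable[str]) -> List[str]:
    """Return the order in which Tasks run when each waits for its input."""
    files = set(available_files)
    pending = list(tasks)
    order: List[str] = []
    progressed = True
    while pending and progressed:
        progressed = False
        for task in list(pending):
            if task.input_file is None or task.input_file in files:
                # Input verified: the Task runs and its output appears.
                order.append(task.name)
                files.add(task.output_file)
                pending.remove(task)
                progressed = True
    # Tasks whose input never appears stay pending and are not in the result.
    return order


# All three Tasks sit in one group and are submitted in the "wrong" order.
tasks = [
    Task("C", "file_b", "file_c"),
    Task("B", "file_a", "file_b"),
    Task("A", None, "file_a"),
]
print(run_pipeline(tasks, available_files=[]))
```

Even though the Tasks are listed in reverse, each one runs only after its input file exists, so the resulting order follows the file dependencies.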

Autoscaling

A Task that is in the Pending state, waiting for a resource, is ignored for all scaling calculations. However, if multiple Pending Tasks receive verification of their required input at the same time then the Task Group will attempt to claim Workers for all of these Tasks at once. If you plan to use resource verification with a large number of Tasks, you may wish to set your Task Groups’ Run Specifications to restrict the number of Workers each can demand from the Scheduler, in order to avoid excessive provisioning costs.
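
The effect of such a cap can be sketched as follows. This is a deliberately simplified model that assumes one Worker is demanded per Ready Task; the function workers_demanded and the max_workers parameter are hypothetical and do not correspond to a documented Run Specification field.

```python
from typing import List, Optional


def workers_demanded(task_statuses: List[str], max_workers: Optional[int] = None) -> int:
    """Workers a Task Group would demand, ignoring Pending Tasks."""
    # Pending Tasks are excluded from scaling; only Ready Tasks count.
    ready = sum(1 for status in task_statuses if status == "Ready")
    if max_workers is None:
        return ready
    # A cap limits the burst when many Tasks become Ready at once.
    return min(ready, max_workers)
```

In this model, 1,000 Pending Tasks demand no Workers, 1,000 Ready Tasks demand 1,000 Workers when uncapped, and a cap of 50 limits the demand to 50.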

Further Information

The Implementation Guide and other documents have been updated to provide information on Resource Management and guidance on its use.

To learn more, contact YellowDog.
