1.0 Runjob Project: Architecture Of Job Building and Configuration System With Shahkar
The following diagram shows how the Shahkar software produced by the Runjob project works with externally provided infrastructure to produce jobs for production processing. In the diagram, communication is assumed to flow among APIs within each box and vertically across box boundaries only.
At the top of the diagram are the three fundamental inputs to the system: (A) a macro file containing user directives, (B) one or more context files declaring configuration information for logical units such as the target site infrastructure or the target physics group, and (C) one or more external services such as parameter databases or tracking services.
A macro file contains directives to load modules corresponding to the services that the user wants to access and the applications that the user wants to configure for a run. It also contains the control structures needed to generate multiple jobs, and it may contain some on-the-fly configuration.
A site context file may declare that whenever the user loads an abstract "Runjob" module, a concrete "Condor Submit" module is loaded in its place. The context file is also responsible for configuring that module. A physics group context file may declare that whenever a user loads a "Pythia" module, a data card forcing the Higgs mass to 108 GeV/c^2 is added to the driver cards. Context files are generally written by site administrators or physics group leaders.
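The substitution and configuration behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ContextRule, resolve_module, and the setting names are hypothetical and are not taken from the Shahkar API or its context file syntax.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextRule:
    """One hypothetical context rule for a module name."""
    replace_with: Optional[str] = None             # concrete module to load instead
    settings: dict = field(default_factory=dict)   # configuration applied on load


def resolve_module(requested, contexts):
    """Apply each context in order; return (module name, merged settings)."""
    name, settings = requested, {}
    for context in contexts:
        rule = context.get(name)
        if rule is None:
            continue
        settings.update(rule.settings)
        if rule.replace_with is not None:
            name = rule.replace_with
    return name, settings


# Illustrative contexts only; the keys and values are assumptions.
site_context = {
    "Runjob": ContextRule(replace_with="Condor Submit",
                          settings={"universe": "vanilla"}),
}
physics_context = {
    "Pythia": ContextRule(settings={"higgs_mass_gev": 108}),
}
```

With these contexts, a request for "Runjob" resolves to "Condor Submit" together with the site settings, while a request for "Pythia" keeps its name and picks up the physics group setting.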
An external service such as a parameter database may contain information about how to configure jobs, as in a request database. For example, the CMS Reference Database holds all of the configuration parameters, driver card files, and even wrapper scripts needed to run CMS production jobs. The external services are adapted for use within the Shahkar core by small adapters that understand the MySQL API or the HTTP API, or that run client programs such as wget directly in a subshell.
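One way to picture such adapters is as small classes that share a single method, so the core does not need to know how the service is reached. The sketch below assumes a hypothetical fetch() interface and a "key=value" reply format; neither is taken from Shahkar or from the CMS Reference Database.

```python
import subprocess
import urllib.request
from typing import Protocol


class ParameterSource(Protocol):
    """Hypothetical common interface shared by all adapters."""
    def fetch(self, request_id: str) -> str: ...


class HttpAdapter:
    """Fetches the reply for a request over HTTP."""
    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def fetch(self, request_id):
        with urllib.request.urlopen(f"{self.base_url}/{request_id}") as response:
            return response.read().decode("utf-8")


class CommandAdapter:
    """Runs a client program in a subprocess and returns its standard output."""
    def __init__(self, argv_template):
        self.argv_template = list(argv_template)

    def fetch(self, request_id):
        argv = [arg.format(request_id=request_id) for arg in self.argv_template]
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        return result.stdout


def parse_parameters(text):
    """Parse 'key=value' lines into a dict, skipping lines without '='."""
    parameters = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        parameters[key.strip()] = value.strip()
    return parameters


def load_parameters(source, request_id):
    """Fetch through any adapter and return the parsed parameters."""
    return parse_parameters(source.fetch(request_id))
```

A wget-based adapter could then be written as CommandAdapter(["wget", "-q", "-O", "-", "http://example.invalid/{request_id}"]), where the URL is a placeholder.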
The ScriptObject layer provides an API for defining a sandbox that contains the job itself, the subsidiary data files it needs, methods for staging data into and out of worker nodes, the application software, a runtime environment specification, and so on. The API is loose, however, in that any or all of these elements may be omitted. In the CMS example, the ScriptObject contains only the wrapper scripts, card files, and descriptive metadata. The intent is that the ScriptObject contain all of the metadata needed to describe the job. ScriptObjects can be dressed for a variety of batch systems. For example, a Condor submit file may be added to the ScriptObject so that a Condor Configurator can then submit it to execution services.
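As a rough illustration of a loose sandbox API, the following Python sketch models a ScriptObject in which every element is optional, plus a helper that dresses it with a Condor submit file. The field names and the dress_for_condor helper are assumptions made for this example and do not reflect the actual ScriptObject API.

```python
from dataclasses import dataclass, field


@dataclass
class ScriptObject:
    """Hypothetical sandbox description; every element may be left empty."""
    metadata: dict = field(default_factory=dict)     # descriptive job metadata
    files: dict = field(default_factory=dict)        # file name -> contents
    stage_in: list = field(default_factory=list)     # data to copy to the worker
    stage_out: list = field(default_factory=list)    # data to copy back
    environment: dict = field(default_factory=dict)  # runtime environment

    def add_file(self, name, contents):
        self.files[name] = contents


def dress_for_condor(job, executable):
    """Add a minimal Condor submit file that ships the other sandbox files."""
    inputs = sorted(name for name in job.files if name != executable)
    lines = [f"executable = {executable}"]
    if inputs:
        lines.append("transfer_input_files = " + ", ".join(inputs))
    lines.append("queue")
    job.add_file("job.submit", "\n".join(lines) + "\n")
    return job
```

Keeping the batch-specific file inside the same sandbox means a later component only has to look at the ScriptObject to find everything it needs for submission.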
In this scheme, monitoring takes place in a parallel stack. In practice, monitoring may be infrastructure-based and functionally separate from the jobs. Monitoring may also be put into the job wrapper itself by an agency outside of the project.
Internally, as shown in the next diagram, macros and context directives are interpreted by a parser layer directly on top of the Linker. All directives go through the Linker. The Linker acts as a container for both Configurators and for ScriptObjects created by Configurators.
2.0 Runjob Project: Details of Framework Call Handling in Shahkar Core
The Shahkar Core layer is responsible for interpreting the macro and context directives, loading the actual modules corresponding to specific services and applications, ordering the modules, making framework calls, and collecting ScriptObjects. The modules, called Configurators, have metadata and are allowed to link their metadata to metadata in other Configurators. The Configurators are ordered by dependencies, or by the order in which they appear in the macro if no dependencies are given. To do work, the Linker issues framework calls that may be handled by the individual Configurators. Framework call handlers are registered to each Configurator and are bound to specific framework calls. Upon receipt of certain calls, a Configurator may create, configure, or emit a ScriptObject, depending on which functions were bound to which framework calls.
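The ordering rule (dependencies first, macro order otherwise) can be sketched as a stable dependency sort. The function below is a minimal illustration, not the Linker's actual algorithm; the function name and the dictionary-based dependency declaration are assumptions.

```python
def order_configurators(macro_order, depends_on):
    """Order names so each Configurator follows its dependencies.

    macro_order: names in the order they appear in the macro.
    depends_on:  name -> list of names it depends on (may be empty).
    Ties are broken by macro order.
    """
    known = set(macro_order)
    for name, dependencies in depends_on.items():
        missing = set(dependencies) - known
        if missing:
            raise KeyError(f"{name} depends on unknown modules: {sorted(missing)}")

    ordered, placed = [], set()
    remaining = list(macro_order)
    while remaining:
        # Pick the first module in macro order whose dependencies are placed.
        for name in remaining:
            if all(dep in placed for dep in depends_on.get(name, ())):
                break
        else:
            raise ValueError("circular dependency among: " + ", ".join(remaining))
        remaining.remove(name)
        ordered.append(name)
        placed.add(name)
    return ordered
```

With no dependencies declared, the function returns the macro order unchanged, which matches the fallback behavior described above.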
The figure above shows the Linker issuing framework calls to a series of Configurators. The "DB Service" Configurator has only one framework handler, bound to the call "ConfigJob". The "Application" Configurator has two handlers, one bound to the "ConfigJob" call and one to the "MakeJob" call. Finally, the execution Configurator has one handler, bound to the "RunJob" call.
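The three-Configurator arrangement can be mimicked with a small dispatch loop. In this sketch the Configurator and Linker classes, the bind and load methods, and the handler bodies are all hypothetical stand-ins; only the Configurator names and the call names come from the description above.

```python
class Configurator:
    """Hypothetical module with metadata and handlers bound to framework calls."""
    def __init__(self, name):
        self.name = name
        self.metadata = {}
        self.handlers = {}          # framework call name -> handler function

    def bind(self, call, handler):
        self.handlers[call] = handler


class Linker:
    """Hypothetical container for Configurators and the ScriptObjects they emit."""
    def __init__(self, call_order):
        self.call_order = list(call_order)
        self.configurators = []
        self.script_objects = []

    def load(self, configurator):
        self.configurators.append(configurator)
        return configurator

    def run(self):
        for call in self.call_order:
            for configurator in self.configurators:
                handler = configurator.handlers.get(call)
                if handler is not None:
                    handler(configurator, self)


def build_example():
    linker = Linker(["ConfigJob", "MakeJob", "RunJob"])
    db = linker.load(Configurator("DB Service"))
    app = linker.load(Configurator("Application"))
    execution = linker.load(Configurator("Execution"))
    log = []

    def db_config(cfg, lk):
        cfg.metadata["parameters"] = {"events": 100}   # stand-in for a DB lookup
        log.append("DB Service:ConfigJob")

    def app_config(cfg, lk):
        cfg.metadata["parameters"] = dict(db.metadata["parameters"])
        log.append("Application:ConfigJob")

    def app_make(cfg, lk):
        lk.script_objects.append({"job": "wrapper", **cfg.metadata["parameters"]})
        log.append("Application:MakeJob")

    def run_jobs(cfg, lk):
        for script_object in lk.script_objects:
            script_object["submitted"] = True           # stand-in for submission
        log.append("Execution:RunJob")

    db.bind("ConfigJob", db_config)
    app.bind("ConfigJob", app_config)
    app.bind("MakeJob", app_make)
    execution.bind("RunJob", run_jobs)
    return linker, log
```

Calling build_example() and then run() on the returned Linker records the handlers in the order DB Service, Application (twice), Execution, and leaves one submitted job in the Linker.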
A description of what this setup does can be found in the framework calling table. The figure below shows an example table corresponding to the above framework diagram. The calling order is defined in the Linker and is user-configurable. In the example, it is "Reset", "ConfigJob", "MakeJob", and "RunJob". These calls are listed in top-down order by row, and the Configurators are listed along the columns. The Linker issues the calls in top-down order according to the figure, and in Configurator order within each row. Framework calls are user-definable, and Configurator authors can bind arbitrary functions to framework calls of their choice. Each cell of the table contains a short description of what the corresponding handler does. The user/developer is responsible for the contents of all framework handlers. Shahkar core merely provides an API for registering the handlers to the appropriate metadata containers and to the framework itself. (The "Reset" call is always handled internally by the Configurators. If another application-specific reset is needed, it is recommended to handle it in another framework call.)
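A calling table of this kind can be represented as a mapping from (Configurator, call) pairs to short descriptions. The sketch below builds the rows, derives the order in which calls would be issued, and renders the table as text. The cell descriptions and function names are illustrative assumptions, not the contents of the original figure.

```python
CALL_ORDER = ["Reset", "ConfigJob", "MakeJob", "RunJob"]
CONFIGURATORS = ["DB Service", "Application", "Execution"]

# Hypothetical cell contents: (Configurator, call) -> what the handler does.
BINDINGS = {
    ("DB Service", "Reset"): "internal reset",
    ("Application", "Reset"): "internal reset",
    ("Execution", "Reset"): "internal reset",
    ("DB Service", "ConfigJob"): "fetch parameters",
    ("Application", "ConfigJob"): "apply parameters",
    ("Application", "MakeJob"): "create ScriptObject",
    ("Execution", "RunJob"): "submit ScriptObjects",
}


def calling_table(call_order, configurators, bindings):
    """One row per framework call, one column per Configurator."""
    return [
        [call] + [bindings.get((name, call), "") for name in configurators]
        for call in call_order
    ]


def issue_order(call_order, configurators, bindings):
    """Calls are issued row by row, in Configurator order within each row."""
    return [
        (call, name)
        for call in call_order
        for name in configurators
        if (name, call) in bindings
    ]


def format_table(call_order, configurators, bindings):
    """Render the table as aligned plain text."""
    header = ["Call"] + list(configurators)
    rows = [header] + calling_table(call_order, configurators, bindings)
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )
```

Printing format_table(CALL_ORDER, CONFIGURATORS, BINDINGS) gives a quick textual view of which handler runs at each step, with empty cells where a Configurator ignores a call.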
3.0 Runjob Project: Component Interactions
The next figure shows the component architecture of the Runjob Project software, Shahkar. While it looks complex, it is actually the result of combining many smaller elements.
The principal features of the diagram are that directives from the macro file and from the context are used to initialize Configurators, and that the Configurators then receive messages (framework calls) from the Linker in the order of the specified dependencies (not shown). Depending on which handlers are attached, the Configurators may contact an external DB for parameters, create a job in a ScriptObject and register that ScriptObject back with the Linker, or get ScriptObjects from the Linker and run them on execution services.
4.0 Runjob Project: Modeling Component Dependencies and Metadata Links
