Event Scripting, or RIND (RIND Is Not Dependencies), lets administrators trigger service transitions in response to events that occur during cluster operation. Programmatically, it moves most of rgmanager's business logic outside the daemon, which makes that logic customizable. This gives administrators the flexibility to create specific failover policies that rgmanager does not handle out of the box.

The Language Itself

Event scripts are written in a language called S-Lang; documentation for the language is available at http://www.s-lang.org
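
A few S-Lang idioms come up repeatedly in event scripts, so a minimal standalone sketch may help readers who are new to the language. It shows % comments, define, variable, multiple assignment with skipped return values, and discarding a return value with ()=. The functions fake_status and fake_start are hypothetical stand-ins invented for illustration; they are not part of rgmanager.

```slang
% Hypothetical stand-in that returns five values, the same number
% that Example 1 unpacks from service_status().
define fake_status()
{
        return 0, 1, 2, 3, "started";
}

% Hypothetical stand-in that returns a single status code.
define fake_start(name)
{
        return 0;
}

define show_idioms()
{
        variable owner, state;

        % Empty slots skip the first three return values.
        (,,, owner, state) = fake_status();

        % "()=" discards a return value that is not needed.
        ()=fake_start("service:example");

        vmessage("owner=%d state=%s", owner, state);
}

show_idioms();
```

Saved to a file, this sketch should run under the slsh interpreter that ships with S-Lang, with no cluster required.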

Notes

Basic configuration specification

  <rm>
    <events>
      <event class="node"/>        <!-- all node events -->
      <event class="node"
             node="bar"/>          <!-- events concerning 'bar' -->
      <event class="node"
             node="foo"
             node_state="up"/>     <!-- 'up' events for 'foo' -->
      <event class="node"
             node_id="3"
             node_state="down"/>   <!-- 'down' events for node ID 3 -->

      <!-- note: all service ops and such deal with node ID, not
           with node names -->

      <event class="service"/>     <!-- all service events -->
      <event class="service"
             service="A"/>         <!-- events concerning 'A' -->
      <event class="service"
             service="B"
             service_state="started"/> <!-- when 'B' is started... -->
      <event class="service"
             service="B"
             service_state="started"
             service_owner="3"/> <!-- when 'B' is started on node 3... -->

      <event class="service"
             service="B"
             priority="1"
             service_state="started"
             service_owner="3"/> <!-- when 'B' is started on node 3, do this
                                      before the other event handlers ... -->
    </events>
    ...
  </rm>

General globals available from all scripts

Node event globals (i.e. when event_type == EVENT_NODE)

Service event globals (i.e. when event_type == EVENT_SERVICE)

User event globals (i.e. when event_type == EVENT_USER)

Scripting functions - Informational

Scripting functions - Operational

Utility functions - Node list manipulation
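
As an illustration, the sketch below combines nodes_online() and subtract() the same way Example 1 does, to build a list of candidate nodes that excludes a service's current owner. The function name candidates_excluding and the service name service:example are hypothetical. Only calls that appear in Example 1 are used, and the script assumes it runs inside rgmanager as an event script.

```slang
% Hypothetical helper: online nodes other than the given owner.
define candidates_excluding(owner)
{
        variable nodes, allowed;

        nodes = nodes_online();            % all online nodes
        allowed = subtract(nodes, owner);  % drop the given node

        if (length(allowed) == 0) {
                notice("No online node other than ", owner);
        }

        return allowed;
}

define show_candidates()
{
        variable owner, state, allowed;

        % "service:example" is a placeholder service name.
        (,,, owner, state) = service_status("service:example");

        if (owner < 0) {
                % Not running anywhere, so there is no owner to exclude.
                return;
        }

        allowed = candidates_excluding(owner);
        notice("Candidate node count: ", length(allowed));
}

show_candidates();
```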

Utility functions - Logging

Log levels correspond to those described in the syslog.conf(5) manual page. Resource scripts inherit rgmanager's log filtering level, log facility, and log file (if specified).
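
A minimal sketch of logging from an event script, assuming the script is attached to a node event. It uses only notice(), event_type, EVENT_NODE, node_state, NODE_OFFLINE and node_id, all of which appear in Example 1. The function name log_node_offline is hypothetical.

```slang
define log_node_offline()
{
        % node_id and node_state are node event globals, so skip
        % any other kind of event.
        if (event_type != EVENT_NODE) {
                return;
        }

        if (node_state == NODE_OFFLINE) {
                % Same two-argument form of notice() as in Example 1.
                notice("Node went offline: ", node_id);
        }
}

log_node_offline();
```

Whether the message is actually recorded depends on the log filtering level the script inherits.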

Notes:

Error Conditions

Example 1: creating a follows-but-avoid-after-start behavior

%
% If the main queue server and replication queue server are on the same
% node, relocate the replication server somewhere else if possible.
%
define my_sap_event_trigger()
{
        variable state, owner_rep, owner_main;
        variable nodes, allowed;

        %
        % If this was a service event, don't execute the default event
        % script trigger after this script completes.
        %
        if (event_type == EVENT_SERVICE) {
                stop_processing();
        }

        (,,, owner_main, state) = service_status("service:main_queue");
        (,,, owner_rep, state) = service_status("service:replication_server");

        if ((event_type == EVENT_NODE) and (owner_main == node_id) and
            (node_state == NODE_OFFLINE) and (owner_rep >= 0)) {
                %
                % uh oh, the owner of the main server died.  Restart it
                % on the node running the replication server
                %
                notice("Starting Main Queue Server on node ", owner_rep);
                ()=service_start("service:main_queue", owner_rep);
                return;
        }

        %
        % S-Lang doesn't short-circuit prior to 2.1.0, so every operand
        % of the and/or expressions below is always evaluated.
        %
        if ((owner_main >= 0) and
            ((owner_main == owner_rep) or (owner_rep < 0))) {

                %
                % Get all online nodes
                %
                nodes = nodes_online();

                %
                % Drop out the owner of the main server
                %
                allowed = subtract(nodes, owner_main);
                if ((owner_rep >= 0) and (length(allowed) == 0)) {
                        %
                        % Only one node is online and the rep server is
                        % already running.  Don't do anything else.
                        %
                        return;
                }

                if ((length(allowed) == 0) and (owner_rep < 0)) {
                        %
                        % Only node online is the owner ... go ahead
                        % and start it, even though it doesn't increase
                        % availability to do so.
                        %
                        allowed = owner_main;
                }

                %
                % Move the replication server off the node that is
                % running the main server if a node's available.
                %
                if (owner_rep >= 0) {
                        ()=service_stop("service:replication_server");
                }
                ()=service_start("service:replication_server", allowed);
        }

        return;
}

my_sap_event_trigger();

Relevant <rm> section from cluster.conf:

        <rm central_processing="1">
                <events>
                        <event name="main-start" class="service"
                                service="service:main_queue"
                                service_state="started"
                                file="/tmp/sap.sl"/>
                        <event name="rep-start" class="service"
                                service="service:replication_server"
                                service_state="started"
                                file="/tmp/sap.sl"/>
                        <event name="node-up" node_state="up"
                                class="node"
                                file="/tmp/sap.sl"/>

                </events>
                <failoverdomains>
                        <failoverdomain name="all" ordered="1" restricted="1">
                                <failoverdomainnode name="molly"
                                        priority="2"/>
                                <failoverdomainnode name="frederick"
                                        priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
                <service name="main_queue"/>
                <service name="replication_server" autostart="0"/>
                <!-- replication server is started when main-server start
                     event completes -->
        </rm>

EventScripting (last edited 2009-12-04 17:59:55 by LonHohberger)