egee-uig-approved-logo

Page MPI

egee logo
UIG Page No
Version No
Last review date
Next review date
Contact
MPI
1.01
10/01/2007
09/07/2007

MPI Jobs on the Grid

This example demonstrates how to compile and run a simple MPI (Message Passing Interface) job on the grid. Two versions of the MPI-1 interface (LAM and MPICH) and two versions of the MPI-2 interface (OpenMPI and MPICH2) are supported.

The user must have valid grid credentials and understand basic job submission on the grid. The user must be a member of a VO that has at least one site supporting MPI.

References

The following references provide additional information about MPI.

Grid Services

MPI support is implemented by the following services provided by the Grid.

Introduction

The Message Passing Interface (MPI) is commonly used to handle the communications between tasks in parallel applications. There are two versions of MPI, MPI-1 and MPI-2. Two implementations of MPI-1 (LAM and MPICH) and two implementations of MPI-2 (OpenMPI and MPICH2) are supported. Individual sites may choose to support only a subset of these implementations, or none at all.

In the past, running MPI applications on the EGEE infrastructure required significant hand-tuning for each site. This was needed to compensate for sites that did or did not have a shared file system, location of the default scratch space, etc. The current configuration allows jobs to be more portable and allows the user more flexibility.

The increased portability and flexibility is achieve by working around hard-coded constraints from the Resource Broker (RB) and by off-loading much of the grunt work to the mpi-start scripts. The mpi-start scripts are developed by the int.eu.grid project and based on the work of the MPI working group that contains members from both int.eu.grid and EGEE.

Using the mpi-start system requires the user to define a "wrapper script" and a set of "hooks". The mpi-start system then handles most of the low-level details of running the MPI job on a particular site.

Wrapper script for mpi-start

The user typically uses a wrapper script to initiate the mpi-start processing. The following wrapper script (named "mpi-start-wrapper.sh") is fairly generic and users should not need to make significant modifications to it.

#!/bin/bash

# Pull in the arguments.
MY_EXECUTABLE=`pwd`/$1
MPI_FLAVOR=$2

# Convert flavor to lowercase for passing to mpi-start.
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`

# Pull out the correct paths for the requested flavor.
eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`

# Ensure the prefix is correctly set.  Don't rely on the defaults.
eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX

# Touch the executable.  It exist must for the shared file system
check.
# If it does not, then mpi-start may try to distribute the executable
# when it shouldn't.
touch $MY_EXECUTABLE

# Setup for mpi-start.
export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
export I2G_MPI_PRE_RUN_HOOK=mpi-hooks.sh
export I2G_MPI_POST_RUN_HOOK=mpi-hooks.sh

# If these are set then you will get more debugging information.
export I2G_MPI_START_VERBOSE=1
#export I2G_MPI_START_DEBUG=1

# Invoke mpi-start.
$I2G_MPI_START

The script first sets up the environment for the chosen flavor of MPI using environment variables supplied by the system administrator. It then defines the executable, arguments, MPI flavor, and location of the hook scripts for mpi-start. The user may optionally ask for more logging information with the verbose and debug environment variables. Lastly, the wrapper invokes mpi-start itself.

Hooks for mpi-start

The user may define a function (script) to call before and after the MPI executable is run. The "pre-hook" can be used, for example, to compile the executable itself or download data. The "post-hook" can be used to analyze results or to save the results on the grid.

The following example (named "mpi-hooks.sh") compiles the executable before running it; the post-hook only writes a message to the standard output. A real job would likely save the results of the job somewhere.

#!/bin/sh

#
# This function will be called before the MPI executable is started.
# You can, for example, compile the executable itself.
#
pre_run_hook () {

  # Compile the program.
  echo "Compiling ${I2G_MPI_APPLICATION}"

  # Actually compile the program.
  cmd="mpicc ${MPI_MPICC_OPTS} -o ${I2G_MPI_APPLICATION} ${I2G_MPI_APPLICATION}.c"
  echo $cmd
  $cmd
  if [ ! $? -eq 0 ]; then
    echo "Error compiling program.  Exiting..."
    exit 1
  fi

  # Everything's OK.
  echo "Successfully compiled ${I2G_MPI_APPLICATION}"

  return 0
}

#
# This function will be called before the MPI executable is finished.
# A typical case for this is to upload the results to a storage
element.
#
post_run_hook () {

  echo "Executing post hook."
  echo "Finished the post hook."

  return 0
}

The pre- and post-hooks may be defined in separate files, but the names of the functions must be exactly "pre_run_hook" and "post_run_hook".

Defining the job and executable

Running the MPI job itself is not significantly different from running a "normal" grid job. The user must define a JDL file describing the requirements for the job. An example is:

JobType = "MPICH";
NodeNumber = 16;
Executable = "mpi-start-wrapper.sh";
Arguments = "mpi-test OPENMPI";
StdOutput = "mpi-test.out";
StdError = "mpi-test.err";
InputSandbox = {"mpi-start-wrapper.sh","mpi-hooks.sh","mpi-test.c"};
OutputSandbox = {"mpi-test.err","mpi-test.out"};
Requirements =
  Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  && Member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  # && RegExp("grid.*.lal.in2p3.fr.*sdj$",other.GlueCEUniqueID)
  ;

The "JobType" must be "MPICH" and the attribute "NodeNumber" must be defined (16 in this example). Despite the name of the attribute, this attribute defines the number of CPUs required by the job. It is not possible to request more complicated topologies based on nodes and CPUs.

This example uses the OpenMPI implementation of the MPI-2 standard. The other supported implementations can be selected by changing "OPENMPI" (in two places) to the name of the desired implementation. The other names are "LAM", "MPICH", and "MPICH2". Note again that the "JobType" attribute must be "MPICH" in all cases; it selects for an MPI job in general and not the specific implementation.

All of the files for the above example JDL file have been defined except for the actual MPI program. The example uses a simple MPI HelloWorld example written in C. The code is:

/*  hello.c
 *
 *  Simple "Hello World" program in MPI.
 *
 */

#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[]) {

  int numprocs;  /* Number of processors */
  int procnum;   /* Processor number */

  /* Initialize MPI */
  MPI_Init(&argc, &argv);

  /* Find this processor number */
  MPI_Comm_rank(MPI_COMM_WORLD, &procnum);

  /* Find the number of processors */
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  printf ("Hello world! from processor %d out of %d\n", procnum, numprocs);

  /* Shut down MPI */
  MPI_Finalize();
  return 0;
}

Although it is not required to compile the MPI locally, it is highly recommended. Many compilation options are specific to the software installed or hardware installed on a site. Sending a binary risks sub-optimal performance at best and crashes at worst.

Running the MPI job

Running the MPI job is no different from any other grid job. Use the commands edg-job-submit, edg-job-status, and edg-job-get-output to submit, check the status, and recover the output of a job.

If the job ran correctly, then the standard output should contain something like:

Hello world! from processor 15 out of 16
Hello world! from processor 0 out of 16
Hello world! from processor 1 out of 16
Hello world! from processor 7 out of 16
Hello world! from processor 2 out of 16
Hello world! from processor 3 out of 16
Hello world! from processor 4 out of 16
Hello world! from processor 6 out of 16
Hello world! from processor 8 out of 16
Hello world! from processor 9 out of 16
Hello world! from processor 12 out of 16
Hello world! from processor 5 out of 16
Hello world! from processor 10 out of 16
Hello world! from processor 14 out of 16
Hello world! from processor 11 out of 16
Hello world! from processor 13 out of 16

If there are problems running the job and the standard output and error do not contain enough information, setting the mpi-start debug flag in the wrapper script may help.

Valid HTML 4.01! Valid CSS!