Mardi Gras Conference 2008 Tutorials
The 15th Mardi Gras Conference will have three tutorials, open to all attendees:
Introduction to the Condor High Throughput Computing System and the Metronome Build and Test System
Scientific Workflows: The Pegasus Workflow Management System Example
Swift: Scripting for fast and easy parallel computing with loosely-coupled tasks
Introduction to the Condor High Throughput Computing System and the Metronome Build and Test System
Presenter: Becky Gietzel
University of Wisconsin, Madison, WI
Abstract:
I. An introduction to the Condor High Throughput Computing System
Condor provides features commonly found in batch scheduling systems,
such as job queuing and scheduling policies, priorities, resource
monitoring, and resource management. Condor places submitted jobs into
a queue, decides where and when to run them based on flexible policy
expressions, monitors the jobs as they run and returns the results back
to the user.
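The queue-then-match behaviour described above can be illustrated with a toy sketch. It is not Condor's actual policy language or scheduling algorithm: the `Machine`, `Job`, and `match_jobs` names are hypothetical, and a plain Python function stands in for a policy expression.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Machine:
    name: str
    memory_mb: int
    idle: bool = True


@dataclass
class Job:
    name: str
    # Hypothetical stand-in for a policy expression:
    # returns True if the machine is acceptable for this job.
    requirements: Callable[[Machine], bool]
    assigned_to: Optional[str] = None


def match_jobs(queue, machines):
    """Assign each queued job to the first idle machine that meets its policy.

    Jobs that cannot be matched are put back in the queue for a later pass.
    """
    still_waiting = deque()
    while queue:
        job = queue.popleft()
        target = next((m for m in machines if m.idle and job.requirements(m)), None)
        if target is None:
            still_waiting.append(job)
        else:
            target.idle = False
            job.assigned_to = target.name
    queue.extend(still_waiting)
    return queue


machines = [Machine("node-a", 2048), Machine("desk-b", 512)]
jobs = deque([
    Job("big", lambda m: m.memory_mb >= 1024),
    Job("small", lambda m: m.memory_mb >= 256),
    Job("huge", lambda m: m.memory_mb >= 8192),
])
all_jobs = list(jobs)
waiting = match_jobs(jobs, machines)
```

In this sketch the "huge" job stays queued because no machine satisfies its requirement; a real system would retry the match as resources change.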
Condor can manage dedicated clusters of compute nodes and can also
harness unused compute cycles from idle workstations.
Condor can also be used to build Grid-style computing environments that
cross administrative boundaries. Condor's flocking technology allows
multiple Condor compute installations to work together. Condor also
incorporates many of the emerging Grid-based computing methodologies and
protocols, including but not limited to Condor-G (Globus).
II. An introduction to Metronome - for reliable, automated building & testing of software
Metronome is a distributed, multi-platform framework designed to provide
automated software building and testing capabilities to a variety of
grid computing projects. We believe that software is not reliable
unless it is regularly built and tested. Doing so requires not only a
significant number of CPU cycles, but often a variety of unusual and
difficult-to-maintain platforms, and a framework for automating,
tracking, and monitoring the entire process. Parameters for each run
are retained, making the builds reproducible.
The Metronome framework is not specific to any application or
programming language, making most builds and tests candidates for
automation in this system. Our goal is to provide an implementation of
this framework utilizing proven grid computing tools as a foundation, as
well as to support the growing number of Metronome facilities
internationally, including our own NMI Lab at the University of
Wisconsin-Madison. Metronome leverages various Condor features,
including scheduling, file transfer, resource management and failover
capabilities.
Scientific Workflows: The Pegasus Workflow Management System Example
Presenters: Ewa Deelman (1), Karan Vahi (1), Kent Wenger (2)
(1) USC Information Sciences Institute, Marina del Rey, CA
(2) University of Wisconsin, Madison, WI
Abstract:
Scientific workflows are becoming an important part of the
scientific discovery process. They capture the individual data
transformations and analysis steps as well as the mechanisms to carry
them out in a distributed environment. Each step in the workflow
specifies a process or computation to be executed (e.g., a software
program to be executed, a web service to be invoked). The steps are
linked according to the data flow and dependencies among them. The
representation of these computational workflows contains many details
required to carry out each analysis step, including the use of specific
execution and storage resources in distributed environments. Workflow
systems can exploit these explicit representations of the complex
computational processes to manage their lifecycle and to automate their
execution. Workflows can capture complex analysis processes at various
levels of abstraction, and also provide the provenance information
necessary for scientific reproducibility, result publication, and result
sharing among collaborators.
In this tutorial we will examine the opportunities and challenges of
designing and running scientific workflows in distributed environments.
In addition to a high-level overview of the issues, we will provide
hands-on experience with the Pegasus Workflow Management System
(Pegasus-WMS). The system is composed of the
Pegasus workflow mapper and the DAGMan workflow execution engine.
Pegasus allows users to design workflows at a high level of abstraction
and then automatically maps them onto distributed resources. The
tutorial will cover workflow composition (how to design a workflow in a
portable way) and workflow execution (how to run the workflow on a
variety of execution environments: a workstation, a campus cluster, a
Condor pool, or grid resources such as the Open Science Grid or
TeraGrid). The tutorial will also cover performance and disk space
optimization capabilities.
Pegasus-WMS has been in development for more than six years and is used in
production by several scientific applications, in projects such as
the Southern California Earthquake Center (SCEC), Montage (an astronomy
application), and the Laser Interferometer Gravitational Wave Observatory
(LIGO), running on the TeraGrid as well as OSG. DAGMan, the Pegasus-WMS
workflow executor, is production-quality software developed as part of
the Condor project. It executes workflows by performing dependency
analysis and releasing workflow jobs to the execution environment as
they become ready for execution.
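The idea of dependency analysis followed by releasing jobs as they become ready can be sketched in a few lines. This is an illustrative toy, not DAGMan's implementation or input format: the `release_order` function and the example job names are assumptions, and each job is treated as finishing the moment it is released.

```python
from collections import deque


def release_order(dependencies):
    """Return jobs in the order they become ready to run.

    `dependencies` maps each job name to the set of jobs it must wait for.
    A job is released only after all of its parents have finished.
    """
    remaining = {job: set(parents) for job, parents in dependencies.items()}
    children = {job: [] for job in remaining}
    for job, parents in remaining.items():
        for parent in parents:
            children[parent].append(job)

    # Jobs with no parents are ready immediately.
    ready = deque(sorted(job for job, parents in remaining.items() if not parents))
    released = []
    while ready:
        job = ready.popleft()
        released.append(job)  # a real engine would submit the job here
        for child in sorted(children[job]):
            remaining[child].discard(job)
            if not remaining[child]:
                ready.append(child)

    if len(released) != len(remaining):
        raise ValueError("workflow contains a cycle")
    return released


# Hypothetical four-job workflow: two analyses fan out, then merge.
workflow = {
    "preprocess": set(),
    "analyze_a": {"preprocess"},
    "analyze_b": {"preprocess"},
    "merge": {"analyze_a", "analyze_b"},
}
order = release_order(workflow)
```

Here `analyze_a` and `analyze_b` become ready together once `preprocess` is done, so an execution engine could run them in parallel; `merge` is held back until both have finished.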
Swift: Scripting for fast and easy parallel computing with loosely-coupled tasks
Presenters: Ben Clifford (1), Michael Wilde (2)
(1) Computation Institute, University of Chicago
(2) Argonne National Laboratory
Abstract:
Swift is a scripting language and system that makes it easy to create
applications which execute large numbers of tasks coupled by
disk-resident datasets. It meets prevalent needs in science and
engineering for analyzing vast quantities of data, performing parameter
studies, and executing ensemble simulations.
The open source Swift system combines a simple scripting language for
the concise, high-level specification of parallel computations, mappers
for accessing diverse disk-based data structures conveniently, and an
execution engine that efficiently manages the dispatch of tasks to
distributed processors on parallel clusters, campus networks, or
multi-site grids.
This introductory tutorial will provide a hands-on taste of Swift.
Participants will learn, via a series of examples, how to orchestrate
the execution of multiple independent programs; how to use mappers to
access data in various file-based structures; and how to do parallel
distributed computing with simple but powerful scripts.