Scientific workflow management system, Pegasus, available on Anvil

Pegasus, an NSF-funded scientific workflow management system, is now available for use on Purdue's Anvil supercomputer. With the addition of Pegasus, Anvil users can define, manage, and execute complex, multi-step computational tasks with ease through a web-based interface, reducing researcher workload and enabling faster time-to-discovery.

Pegasus is a tool that helps workflow-based applications run in a variety of environments, including desktops, clouds, and high-performance computing (HPC) systems. It was designed to let scientists construct workflows in abstract terms, without needing to understand the underlying execution environment. Pegasus has been used successfully in a number of scientific fields, including astronomy, bioinformatics, earthquake science, gravitational-wave physics, ecology, and cryo-EM. A workflow in Pegasus consists of multiple tasks with defined dependencies, and Pegasus handles job submission, data staging, execution ordering, and failure recovery. Some beneficial features of Pegasus include:

  • Data Management: Pegasus handles data transfers, input data selection, and output registration.
  • Automated Error Recovery and Reliability: Pegasus addresses errors automatically by retrying tasks, checkpointing at the workflow level, re-mapping, and trying alternative data sources for data staging.
  • Adaptability and Reuse: Pegasus works in a variety of distributed computing environments, and workflows can easily be run in different environments without alteration.
  • Scalability: Pegasus can scale both the size of the workflow and the resources the workflow is distributed over without impacting performance.
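
To make the idea of "tasks with defined dependencies" and automatic retries more concrete, here is a minimal conceptual sketch in plain Python. It is not the Pegasus API and does not reflect how Pegasus is implemented. The function name run_workflow, the task names, and the max_retries parameter are all hypothetical and chosen only for illustration. The sketch orders tasks so that each one runs after the tasks it depends on, and retries a failing task a limited number of times.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies, max_retries=2):
    """Run tasks in dependency order, retrying failures (illustrative only).

    tasks: dict mapping a task name to a zero-argument callable.
    dependencies: dict mapping a task name to the set of task names
        that must complete before it starts.
    Returns the task names in the order they completed.
    """
    # Include every task in the graph, even those without dependencies.
    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    completed = []
    for name in TopologicalSorter(graph).static_order():
        task = tasks[name]
        for attempt in range(max_retries + 1):
            try:
                task()
                break  # task succeeded, stop retrying
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries, surface the error
        completed.append(name)
    return completed


# Hypothetical three-step pipeline: preprocess -> analyze -> summarize.
log = []
attempts = {"analyze": 0}


def flaky_analyze():
    # Fails on the first call to simulate a transient error.
    attempts["analyze"] += 1
    if attempts["analyze"] == 1:
        raise RuntimeError("transient failure")
    log.append("analyze")


tasks = {
    "preprocess": lambda: log.append("preprocess"),
    "analyze": flaky_analyze,
    "summarize": lambda: log.append("summarize"),
}
dependencies = {"analyze": {"preprocess"}, "summarize": {"analyze"}}

order = run_workflow(tasks, dependencies)
print(order)
```

In this toy example the "analyze" step fails once, is retried, and the pipeline still completes in dependency order. A real workflow system such as Pegasus additionally handles job submission, data staging, and execution on remote resources, which this sketch deliberately leaves out.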

Pegasus is deployed on Anvil through the Anvil Notebook Service, which provides browser-based access to Jupyter Notebooks running on Anvil infrastructure. The Pegasus Notebook environment includes the Pegasus workflow management system, HTCondor for workflow execution management, and preconfigured integration with Anvil’s SLURM scheduler. This environment allows users to develop and debug workflows interactively using the Pegasus Python API or command-line tools, submit workflows to Anvil’s batch system using their allocations, and monitor workflow execution and logs directly from the notebook interface. No additional Pegasus installation or configuration is required by the user.

To learn more about Pegasus and how to access it on Anvil, please visit: Pegasus on Anvil

Anvil is one of Purdue University’s most powerful supercomputers, providing researchers from diverse backgrounds with advanced computing capabilities. Built through a $10 million system acquisition grant from the National Science Foundation (NSF), Anvil supports scientific discovery by providing resources through the NSF’s Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS), a program that serves tens of thousands of researchers across the United States. Anvil also supports advanced artificial intelligence research as an official resource provider of the National Artificial Intelligence Research Resource (NAIRR) Pilot.

Researchers may request access to Anvil via the ACCESS allocations process or through the NAIRR allocations process. More information about Anvil is available on Purdue’s Anvil website. Anyone with questions should contact anvil@purdue.edu. Anvil is funded under NSF award No. 2005632.

Written by: Jonathan Poole, poole43@purdue.edu
