Center for Informatics Research in Science and Scholarship

Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign


 

Bertram Ludaescher Lecture

 

Title: Modeling and Design of Scientific Workflows with Data Assembly Lines, Provenance, and Semantic Types

Date: Monday, September 21st

Time: 11:00 AM

Location: LIS 126

 

Abstract

Despite increasing interest in scientific workflow technologies in recent years, workflow design remains a challenging, slow, and often error-prone process, which limits the further adoption of scientific workflows. Based on practical experience with data-driven workflows, we identify and illustrate a number of recurring scientific workflow design challenges: parameter-rich functions; data assembly, disassembly, and cohesion; conditional execution; iteration; and, more generally, workflow evolution. In conventional approaches, such challenges usually lead to the introduction of different types of "shims", i.e., intermediary workflow steps that act as adapters between otherwise incorrectly wired components. However, relying heavily on shims leads to brittle (i.e., change-intolerant) workflow designs that are hard to comprehend and maintain. To address this, we first present a general workflow design paradigm called virtual data assembly lines (VDAL) and argue that the VDAL approach can overcome common scientific workflow design challenges and improve workflow designs by exploiting (i) a semistructured, nested data model like XML, (ii) a flexible, statically analyzable configuration mechanism (e.g., an XQuery fragment), and (iii) an underlying virtual assembly line model that is resilient to workflow and data changes. The approach has been implemented as Kepler/COMAD and applied to improve the design of complex, real-world workflows. In the second part of the talk, we discuss related workflow research issues, namely the importance of provenance and data lineage in scientific workflows and the use of logic-based semantic types in workflow design.

 

Biographical Sketch

Bertram Ludaescher is a professor in the Department of Computer Science and a member of the faculty at the UC Davis Genome Center, both at the University of California, Davis. His research focuses on modeling, design, and optimization of scientific workflows and databases, data and workflow provenance, and knowledge representation and reasoning for scientific workflows and scientific data integration. He is currently involved in several collaborative scientific data and workflow management projects, including the DOE Scientific Data Management (SciDAC/SDM) Center project and NSF projects to develop scientific workflow technology (Kepler-CORE), e.g., for bioinformatics and environmental observatory applications (REAP, COMET). Prof. Ludaescher received his M.S. (Dipl.-Inform.) in Computer Science from the University of Karlsruhe in 1992 and his Ph.D. from the University of Freiburg, Germany, in 1998. Until 2004 he was a research scientist at the San Diego Supercomputer Center and an adjunct faculty member in the Department of Computer Science and Engineering at UC San Diego.



Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA
cirssinfo@cirss.lis.uiuc.edu | (217) 333-1981 | [fax] (217) 244-3302