Title: Hybrid Workflow on the Cloud

Mahmoud Naghibzadeh (main author)
Book type: Authored work
Publish No: First
Publish Date: 2016-12-12
Publisher: Lambert Academic Publishing
Abstract:

Complex projects are usually decomposed into small units of work, each to be done with restricted and specific resources, personnel, and specialties in a limited amount of time. As part of a complex project, each such unit of work can be carried out only after all of its prerequisites and requirements are completed. In this book, these small units of work are called tasks. A graphical representation of the tasks of a multipart project that clearly shows the prerequisites of each task greatly helps in scheduling each task in a correct time interval. A workflow is one such graph. Workflow modeling is used in almost every large-scale project, such as climate modeling, disaster modeling and recovery, business process management, structural biology and chemistry, medical surgery, stock market modeling, financial risk analysis, DNA analysis, next-generation sequencing, building construction, and factory construction. There are many types of workflows, such as sequential, state machine, data-driven, fork–join, and loopy. However, workflows are often modeled as Directed Acyclic Graphs (DAGs) in which each vertex is a task and each directed edge represents both precedence and possible communication from its originating vertex to its ending vertex. With DAG workflows, when the execution of a task is completed, communication with its successor(s) can start and the anticipated results are transferred. The execution of a task can start only after all of its parents are completed and their results (if any) are received by the task. The constraint that tasks cannot communicate during their executions excludes the more general case in which some tasks interact directly or indirectly during their executions. In this book, a task model composed of both interaction and precedence of tasks is introduced.
It is shown that, under certain conditions, this kind of graph can be transformed into an extended DAG, called a Hybrid DAG (HDAG), composed of tasks and super-tasks. The validity of such graphs is verified, and any inconsistencies in the designs are diagnosed. With HDAGs, it becomes possible to model many applications that could not be modeled before. Our emphasis is on scientific workflows (sometimes called computational workflows), i.e., workflows composed of tasks that execute computation and/or data manipulation activities. Numerous scientific workflows have been developed, such as Montage in astronomy, SciEvol in bioinformatics, Cybershake in physics-based seismic analysis, Epigenomics in cancer, and Sipht to search for untranslated RNAs, to name a few. Realistic computational workflows are composed of hundreds or even thousands of tasks with long execution times. Running such workflows on conventional computers takes months or even years, by which time the results are most probably out of date. Special systems such as supercomputers and vast distributed systems are needed to run such workflows in acceptable timespans. Owning such systems may not be possible for many of the users and companies that need them to run their workflows. Even if they can afford the cost, it may not be economical to own such systems only to run workflows once in a while. The Cloud is the most common infrastructure used to run computationally intensive workflows. It provides a wide variety of resources and software for public use in the forms of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Users can lease these services for different applications and for the needed periods of time, paying for them as they are used. Proper scheduling of HDAGs can be tremendously beneficial to the user in terms such as time and money.
An algorithm for scheduling hybrid workflows is presented in this book and its performance is evaluated. The effect of different values for relative deadline on the schedulability of such workflows is also presented.
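The DAG execution rule described in the abstract (a task may start only after all of its parents have completed) can be illustrated with a short sketch. This is not the scheduling algorithm presented in the book. It is a standard topological ordering (Kahn's algorithm) applied to a hypothetical four-task workflow. The function name `execution_order` and the task labels are assumptions made for illustration, and execution times, communication costs, deadlines, and super-tasks are all ignored.

```python
from collections import deque


def execution_order(tasks, edges):
    """Return one valid execution order for a DAG workflow.

    tasks: iterable of task names (hypothetical labels).
    edges: iterable of (parent, child) precedence pairs.
    Raises ValueError if the precedence graph has a cycle.
    """
    tasks = list(tasks)
    children = {t: [] for t in tasks}
    pending_parents = {t: 0 for t in tasks}
    for parent, child in edges:
        children[parent].append(child)
        pending_parents[child] += 1

    # Tasks without parents are ready to start immediately.
    ready = deque(t for t in tasks if pending_parents[t] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Completing a task may make some of its children ready.
        for child in children[task]:
            pending_parents[child] -= 1
            if pending_parents[child] == 0:
                ready.append(child)

    # If some task never became ready, the graph is not acyclic.
    if len(order) != len(tasks):
        raise ValueError("precedence graph contains a cycle")
    return order


# Hypothetical fork-join workflow: A feeds B and C, which both feed D.
tasks = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(execution_order(tasks, edges))
```

In this example, D becomes ready only after both B and C have finished, which mirrors the precedence constraint stated above. A graph with a cycle is rejected, since it would not be a valid DAG workflow.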

Keywords:
Hybrid workflow, Workflow modeling, Directed Acyclic Graph