New Generation Computing, ( ISI ), Volume (30), No (1), Year (2012-2) , Pages (73-94)

Title : ( Predicting Job Failures in AuverGrid Based on Workload Log Analysis )

Authors: Hamid Saadatfar, Hamid Fadishei, Hossein Deldari

Grid systems are popular today because of their ability to solve large problems in business and science. Job failures, which are inherent in any computational environment, are more common in grids because of their dynamic and complex nature. Furthermore, traditional methods for job failure recovery have proven costly, so such systems need to shift toward proactive and predictive management strategies. In this paper, we make a novel attempt to predict the outcome of jobs in a production grid environment. First, we investigated the relationship between workload characteristics and job failures by analyzing workload traces of AuverGrid, which is part of the EGEE (Enabling Grids for E-science) project. After identifying failure patterns, we predicted the success or failure status of jobs over 6 months of AuverGrid activity with approximately 96% accuracy. The quality of service on the grid can be improved by integrating the results of this work into management services such as scheduling and monitoring.


Keywords: Job Failure Prediction, Grid Workload Archive, Trace Analysis, Bayesian Networks
