Software Fault Localization using Cross Entropy and N-gram Models

Authors

Abstract

This work aims to automate bug localization in program source code. The cause of a program failure can best be determined by comparing and analyzing the correct and incorrect execution paths produced by running the instrumented program on passing and failing test cases. One way to compare and analyze these paths is to cluster them by similarity. To measure the similarity between execution paths, an N-gram model (a Markov model) is built for each individual run. Each element of an execution path is a uni-gram, and counting the elements and their subsequences yields the maximum likelihood estimation (MLE) probabilities of the model. The constructed models are then analyzed using cross entropy to compute the similarities among their corresponding execution paths. Further analysis of the cross entropy of the sequences within each cluster identifies a set of fault-suspicious locations. Majority voting across clusters then selects the faulty locations, which are reported to the programmer as faulty subpath(s). Our experiments on the Siemens benchmark suite show that the proposed method locates faults with high accuracy.
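The core similarity measure described above can be sketched in Python: an MLE N-gram model is trained on one execution path, and the cross entropy of another path under it is computed. This is a minimal illustration, not the paper's implementation. It relies on several assumptions the abstract does not state. Execution paths are lists of hypothetical element IDs (for example, basic-block labels), n = 2 (bigrams), and add-alpha smoothing is applied because pure MLE assigns zero probability to unseen N-grams, which would make the cross entropy infinite. The function names `train_ngram` and `cross_entropy` are hypothetical. Clustering and majority voting are omitted.

```python
import math
from collections import Counter


def train_ngram(path, n=2):
    """Count n-grams and their (n-1)-element contexts in one execution path.

    The MLE estimate is P(w | h) = count(h, w) / count(h).
    """
    grams = [tuple(path[i:i + n]) for i in range(len(path) - n + 1)]
    ngram_counts = Counter(grams)
    context_counts = Counter(g[:-1] for g in grams)
    return ngram_counts, context_counts


def cross_entropy(model, path, vocab_size, n=2, alpha=1.0):
    """Average negative log2-probability (bits per n-gram) of `path` under `model`.

    Add-alpha smoothing is an assumption of this sketch: it keeps unseen
    n-grams from receiving zero probability.
    """
    ngram_counts, context_counts = model
    grams = [tuple(path[i:i + n]) for i in range(len(path) - n + 1)]
    if not grams:
        raise ValueError("path is shorter than n")
    total = 0.0
    for g in grams:
        p = (ngram_counts[g] + alpha) / (context_counts[g[:-1]] + alpha * vocab_size)
        total -= math.log2(p)
    return total / len(grams)


if __name__ == "__main__":
    # Hypothetical toy traces of basic-block IDs.
    passing = ["b1", "b2", "b3", "b2", "b3", "b4"]
    failing = ["b1", "b2", "b5", "b4"]
    vocab = len(set(passing) | set(failing))
    model = train_ngram(passing)
    print("passing:", cross_entropy(model, passing, vocab))
    print("failing:", cross_entropy(model, failing, vocab))
```

With this toy data, the failing path scores a higher cross entropy under the passing run's model than the passing path itself. That gap is the kind of dissimilarity signal a clustering step could use. Cross entropy is not symmetric, so scoring run A under run B's model and scoring run B under run A's model generally give different values.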

Keywords