Formal Verification

##The Formal Verification on RPython Backend

Introduction

We propose to conduct the verification evaluation on RPython backend, and guarantee that the RPython backend is memory safe. Formal verification allows us to prove conclusively that certain error states can never occur, so we leverage the formal verification technique to achieve our objective.

Terminology

  1. Sat vs Unsat Every verification tool will output their verification result as sat'' orunsat''. sat'' means that there exists the violation of the memory safty checking rule. In other word, it means it is unsafe for the target program.unsat'' means that there is no violation for the memory safty check. It means the target program is memory safe.
  2. Undefined The value of a variable in the program can be undefined. It means that the value of the variable is treated as a symbolic value, and it can be any value inside of its range.

Evaluation Approach

In this report, we only target on the verification of RPython backend C programs, and leverage three existing verification tools for the evaluation. To this end, we first collect all RPython backend C programs, and convert them into the format which can be used for verification. Then, we evaluate all C programs in the memory safety mode of the verification tools. Finally, we manually checked each alarm reported, and fixed them. Repeat this process until the tool considers the target program is safe. For these alarms we cannot fix due to some reasons, we just comment it to make the verification continue. As a result, the verification tool can successfully verify the target program.

We utilize three of the verification tools for our purpose, Seahorn~\cite{seahorn}, Smack~\cite{smack} and Infer~\cite{infer}. Seahorn is the unbounded verification framework, Smack is the bounded verification framework and infer is an abstract interpretation tool. We applied each of three tools on the RPython backend C programs. For Seahorn and Smack, we first built RPython C programs and extracted their bc code, feed these bc code for both of them, and collected their evaluation results. We wrote the script for both of them. For smack, we need to write the test driver for the verification. We wrote the test driver for all C backend programs to enable Smack evaluation. For infer, we wrote the Makefile to enable infer to handle C programs.

To evaluation the verification result, we manually check each of alarms reported by three verification tools. Since Smack and Seahorn do not provide the user friendly alarm results, we need to look into the intermediate results, understand their semantics and map their semantics to the soure code manually. This is a tedious process, and we need to repeat this process for each alarm reported by Smack and Seahorn.

Since all verification tools will terminate when they meet the first alarm. To cover the whole program, we need to fix the alarm manually to make the verification continue. Not all the fixes can be understood by the verification tool (We are working on this issue), so we just simply comment these ``buggy code'' in the program, and make the evaluation continue.