About MesaPy

MesaPy is a memory-safe Python implementation based on PyPy. In addition to PyPy's distinct features -- speed, memory usage, compatibility and stackless, MesaPy mainly focuses on improving its security and memory safety. On top of the enhancements, we also bring MesaPy into Intel SGX to write memory-safe applications running in the trusted execution environment.

We achieve the memory-safety promise through various methods: hardening RPython's type system (RPython is the language for writing PyPy), modifying PyPy/RPython's libraries, and verifying the RPython's libraries as well as its interpreter/JIT backend. Overall, there are four most notable security features of MesaPy:

  • Memory safety: To provide a memory-safe runtime, MesaPy replaces external libraries written in C, which could introduce memory issues, with Rust, a memory-safe programming language. This guarantees the memory safety across all libraries including those written in Python, but also external libraries.

  • Security hardening: PyPy is implemented with RPython, a statically-typed language with translation and support framework. We also enhanced memory-safety of RPython through hardening RPython's type system, i.e., the RPython Typer. For example, we improve RPython's list with runtime index check to avoid arbitrarily list read/write during PyPy's implementation.

  • Formal verification: Some code in RPython's libraries and its interpreter/JIT backend are still written in C, which may contain potential memory bugs. To prove the memory safety of RPython, we aim to formally verify its libraries and backend written in C using state-of-the-art verification tools.

  • SGX support: With the memory safety of MesaPy, we also port it to Intel SGX, which is a trusted execution environment to provide integrity and confidentiality guarantees to security-sensitive computation. Developers now can easily use MesaPy to implement SGX applications without worrying about memory issues.

Getting Started

Building MesaPy from Source

Building MesaPy is as easy as building PyPy. Most steps are same with PyPy. But because MesaPy deleted some non-memory-safe module, the number of dependencies of MesaPy are fewer than PyPy.

Clone the repository

Clone the repository and related projects first:

$ git clone -b mesapy2.7 --recursive https://github.com/mesalock-linux/mesapy.git

This command will clone MesaPy's source code from GitHub and sub-modules such as mesalock-linux/miniz_oxide and mesalock-linux/mesalink.

Install dependencies

Install build-time dependencies, on recent Debian and Ubuntu (16.04 onwards), this is the command to install all build-time dependencies:

$ apt-get install gcc make libffi-dev pkg-config libgdbm-dev libgc-dev python-cffi

Install rust toolchain:

$ curl https://sh.rustup.rs -sSf | sh

Start building

You can use Makefile to easily build MesaPy. Just one line:

$ make pypy-c

The building time will take about 30 minutes to an hours. Because parallelization of translating MesaPy is not supported, we suggest to use a CPU which with high frequency for a single core.

You can also build the project separately. Basically, it contains two steps: building the Rust projects and translating MesaPy.

Build Rust projects

The Rust projects are under the lib_rust directory. Go to each directory and build the Rust projects. The Rust projects are usually built by cargo. Take miniz_oxide as an example. You can build it by the following command:

$ cd lib_rust/miniz_oxide && cargo build --release
$ cd lib_rust/miniz_oxide && cp target/release/libminiz_oxide_c_api.so ../../pypy/goal/

Translate MesaPy

Same as PyPy, both Python and PyPy are supported to translate MesaPy.

$ cd pypy/goal
$ pypy ../../rpython/bin/rpython <rpython options> targetpypystandalone.py <pypy options>

Translate with JIT:

$ pypy ../../rpython/bin/rpython --opt=jit targetpypystandalone.py <pypy options>

Translate without JIT:

pypy ../../rpython/bin/rpython --opt=2 targetpypystandalone.py <pypy options>

Memory Safety

Security Hardening

Stay tuned.

Formal Verification

##The Formal Verification on RPython Backend

Introduction

We propose to conduct the verification evaluation on RPython backend, and guarantee that the RPython backend is memory safe. Formal verification allows us to prove conclusively that certain error states can never occur, so we leverage the formal verification technique to achieve our objective.

Terminology

  1. Sat vs Unsat Every verification tool will output their verification result as sat'' orunsat''. sat'' means that there exists the violation of the memory safty checking rule. In other word, it means it is unsafe for the target program.unsat'' means that there is no violation for the memory safty check. It means the target program is memory safe.
  2. Undefined The value of a variable in the program can be undefined. It means that the value of the variable is treated as a symbolic value, and it can be any value inside of its range.

Evaluation Approach

In this report, we only target on the verification of RPython backend C programs, and leverage three existing verification tools for the evaluation. To this end, we first collect all RPython backend C programs, and convert them into the format which can be used for verification. Then, we evaluate all C programs in the memory safety mode of the verification tools. Finally, we manually checked each alarm reported, and fixed them. Repeat this process until the tool considers the target program is safe. For these alarms we cannot fix due to some reasons, we just comment it to make the verification continue. As a result, the verification tool can successfully verify the target program.

We utilize three of the verification tools for our purpose, Seahorn~\cite{seahorn}, Smack~\cite{smack} and Infer~\cite{infer}. Seahorn is the unbounded verification framework, Smack is the bounded verification framework and infer is an abstract interpretation tool. We applied each of three tools on the RPython backend C programs. For Seahorn and Smack, we first built RPython C programs and extracted their bc code, feed these bc code for both of them, and collected their evaluation results. We wrote the script for both of them. For smack, we need to write the test driver for the verification. We wrote the test driver for all C backend programs to enable Smack evaluation. For infer, we wrote the Makefile to enable infer to handle C programs.

To evaluation the verification result, we manually check each of alarms reported by three verification tools. Since Smack and Seahorn do not provide the user friendly alarm results, we need to look into the intermediate results, understand their semantics and map their semantics to the soure code manually. This is a tedious process, and we need to repeat this process for each alarm reported by Smack and Seahorn.

Since all verification tools will terminate when they meet the first alarm. To cover the whole program, we need to fix the alarm manually to make the verification continue. Not all the fixes can be understood by the verification tool (We are working on this issue), so we just simply comment these ``buggy code'' in the program, and make the evaluation continue.

SGX Support

Roadmap

Contributing

Maintainers

The project is maintained by following people, feel free to contact them if you are interested in the project:

  • Memory-safety and security hardening: Mingshen Sun <[email protected]>
  • Formal verification: Qian Feng
  • SGX support: Huibo Wang and Yu Ding

Also, huge thanks to our steering committees:

  • Tao Wei
  • Yulong Zhang

Artwork