Sandboxing in Python

(Written by Lawrence Krubner; however, indented passages are often quotes.) You can contact me at lawrence@krubner.com, or follow me on Twitter.

After many years programming in Ruby, PHP, Clojure, etc., I am only now getting into Python. One thing that strikes me is the extent to which each ecosystem emphasizes different things. In theory, all of these languages are Turing-complete, so you could do anything in any of them, but the reality is different: each community regards some issues as important and others as unimportant, and so each ends up with tooling for different tasks. Among the surprises in the world of Python is the support for sandboxing. The PyPy documentation describes it this way:

PyPy offers sandboxing at a level similar to OS-level sandboxing (e.g. SECCOMP on Linux), but implemented in a fully portable way. To use it, a (regular, trusted) program launches a subprocess that is a special sandboxed version of PyPy. This subprocess can run arbitrary untrusted Python code, but all its input/output is serialized to a stdin/stdout pipe instead of being directly performed. The outer process reads the pipe and decides which commands are allowed or not (sandboxing), or even reinterprets them differently (virtualization). A potential attacker can have arbitrary code run in the subprocess, but cannot actually do any input/output not controlled by the outer process. Additional barriers are put to limit the amount of RAM and CPU time used.

Note that this is very different from sandboxing at the Python language level, i.e. placing restrictions on what kind of Python code the attacker is allowed to run (why? read about pysandbox).

Another point of comparison: if we were instead to try to plug CPython into a special virtualizing C library, we would get a result that is not only OS-specific, but unsafe, because CPython can be segfaulted (in many ways, all of them really, really obscure). Given enough efforts, an attacker can turn almost any segfault into a vulnerability. The C code generated by PyPy is not segfaultable, as long as our code generators are correct – that’s a lower number of lines of code to trust. For the paranoid, PyPy translated with sandboxing also contains systematic run-time checks (against buffer overflows for example) that are normally only present in debugging versions.

…One of PyPy’s translation aspects is a sandboxing feature. It’s “sandboxing” as in “full virtualization”, but done in normal C with no OS support at all. It’s a two-processes model: we can translate PyPy to a special “pypy-c-sandbox” executable, which is safe in the sense that it doesn’t do any library or system calls – instead, whenever it would like to perform such an operation, it marshals the operation name and the arguments to its stdout and it waits for the marshalled result on its stdin. This pypy-c-sandbox process is meant to be run by an outer “controller” program that answers these operation requests.

The pypy-c-sandbox program is obtained by adding a transformation during translation, which turns all RPython-level external function calls into stubs that do the marshalling/waiting/unmarshalling. An attacker that tries to escape the sandbox is stuck within a C program that contains no external function calls at all except for writing to stdout and reading from stdin. (It’s still attackable in theory, e.g. by exploiting segfault-like situations, but as explained in the introduction we think that PyPy is rather safe against such attacks.)
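(An aside of my own, not part of the quoted documentation.) To make the stub idea concrete, here is a minimal sketch of what "marshal the operation, then wait for the marshalled result" might look like in plain Python. The function name `sandboxed_call`, the tuple layout of the request, and the use of in-memory buffers in place of real pipes are all my assumptions for illustration; this is not PyPy's actual wire format.

```python
import io
import marshal


def sandboxed_call(op_name, args, out_stream, in_stream):
    """Hypothetical stub: send a request, then block until the reply arrives.

    out_stream stands in for the sandboxed process's stdout and
    in_stream for its stdin; both must be binary file-like objects.
    """
    # Serialize the operation name and its arguments instead of performing it.
    marshal.dump((op_name, args), out_stream)
    out_stream.flush()
    # The controller's decision comes back as a marshalled value.
    return marshal.load(in_stream)


# Simulate one round trip with in-memory buffers instead of real pipes.
request_pipe = io.BytesIO()
reply_pipe = io.BytesIO(marshal.dumps(3))  # pretend the controller answered 3
fd = sandboxed_call("os_open", ("/virtual/demo.txt", 0), request_pipe, reply_pipe)
```

The point of the sketch is only that the sandboxed side never touches the file system itself: it describes what it wants and trusts whatever answer comes back.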

The outer controller is a plain Python program that can run in CPython or a regular PyPy. It can perform any virtualization it likes, by giving the subprocess any custom view on its world. For example, while the subprocess thinks it’s using file handles, in reality the numbers are created by the controller process and so they need not be (and probably should not be) real OS-level file handles at all. In the demo controller I’ve implemented there is simply a mapping from numbers to file-like objects. The controller answers to the “os_open” operation by translating the requested path to some file or file-like object in some virtual and completely custom directory hierarchy. The file-like object is put in the mapping with any unused number >= 3 as a key, and the latter is returned to the subprocess. The “os_read” operation works by mapping the pseudo file handle given by the subprocess back to a file-like object in the controller, and reading from the file-like object.
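To check my own understanding of the controller described above, I wrote a rough sketch of the mapping from pseudo file handles to file-like objects. Everything here is hypothetical: the class name `VirtualController`, the `handle` dispatcher, the in-memory "directory hierarchy" dictionary, and the `os_close` operation are my own inventions for illustration, not the demo controller from the PyPy documentation.

```python
import io


class VirtualController:
    """Hypothetical controller that answers a few whitelisted operations."""

    def __init__(self, virtual_files):
        # virtual_files: dict mapping a virtual path to its bytes content.
        self.virtual_files = virtual_files
        # Pseudo file handle (int) -> file-like object owned by the controller.
        self.open_files = {}

    def _unused_handle(self):
        # Any unused number >= 3 will do; 0, 1 and 2 are left for std streams.
        fd = 3
        while fd in self.open_files:
            fd += 1
        return fd

    def os_open(self, path):
        if path not in self.virtual_files:
            raise FileNotFoundError(path)
        fd = self._unused_handle()
        self.open_files[fd] = io.BytesIO(self.virtual_files[path])
        return fd

    def os_read(self, fd, size):
        # Map the pseudo handle back to the file-like object and read from it.
        return self.open_files[fd].read(size)

    def os_close(self, fd):
        self.open_files.pop(fd).close()

    def handle(self, op_name, args):
        # Only operations listed here are ever performed; the rest are refused.
        allowed = {
            "os_open": self.os_open,
            "os_read": self.os_read,
            "os_close": self.os_close,
        }
        if op_name not in allowed:
            raise PermissionError(op_name)
        return allowed[op_name](*args)


controller = VirtualController({"/hello.txt": b"hi there"})
handle_a = controller.handle("os_open", ("/hello.txt",))
first_bytes = controller.handle("os_read", (handle_a, 2))
```

Note that the handles never correspond to anything the operating system knows about, which is exactly the property the quoted passage recommends.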

That is a serious commitment of energy to enabling sandboxing. I cannot think of an equivalent in Ruby or Clojure. PHP used to have some weak enforcement of security on directories, an issue it had to confront because it was used so widely in shared hosting environments, but even PHP’s official documentation admitted that the PHP approach was weak.

Post external references

  1. http://pypy.readthedocs.org/en/latest/sandbox.html