
.. _sandbox:

PyPy's sandboxing features
==========================

.. warning:: This describes the old, unmaintained version.  A new version
   is in progress and should be merged back to trunk at some point soon.
   Please see its description here:
   https://mail.python.org/pipermail/pypy-dev/2019-August/015797.html


Introduction
------------

PyPy offers sandboxing at a level similar to OS-level sandboxing (e.g.
SECCOMP_ on Linux), but implemented in a fully portable way.  To use it,
a (regular, trusted) program launches a subprocess that is a special
sandboxed version of PyPy.  This subprocess can run arbitrary untrusted
Python code, but all its input/output is serialized to a stdin/stdout
pipe instead of being directly performed.  The outer process reads the
pipe and decides which commands are allowed or not (sandboxing), or even
reinterprets them differently (virtualization).  A potential attacker
can have arbitrary code run in the subprocess, but cannot actually do
any input/output not controlled by the outer process.  Additional
barriers limit the amount of RAM and CPU time used.

Note that this is very different from sandboxing at the Python language
level, i.e. placing restrictions on what kind of Python code the
attacker is allowed to run (to see why that approach fails, read about
pysandbox_).

.. _SECCOMP: https://code.google.com/p/seccompsandbox/wiki/overview
.. _pysandbox: https://mail.python.org/pipermail/python-dev/2013-November/130132.html

Another point of comparison: if we were instead to try to plug CPython
into a special virtualizing C library, we would get a result
that is not only OS-specific, but also unsafe, because CPython can be
segfaulted (in many ways, all of them really, really obscure).
Given enough effort, an attacker can turn almost any
segfault into a vulnerability.  The C code generated by
PyPy is not segfaultable, as long as our code generators are correct;
that is a much smaller amount of code to trust.  For the paranoid,
PyPy translated with sandboxing also contains systematic run-time
checks (against buffer overflows, for example)
that are normally only present in debugging versions.

.. warning::

   The hard work on the PyPy side is done: you get a fully secure
   version.  The only part that is experimental and unpolished is the
   library to use this sandboxed PyPy from a regular Python interpreter
   (CPython, or an unsandboxed PyPy).  Contributions welcome.

.. warning::
  
  Tested with PyPy2.  May not work out of the box with PyPy3.


Overview
--------

One of PyPy's translation aspects is a sandboxing feature.  It is "sandboxing" as
in "full virtualization", but done in normal C with no OS support at all.  It is
a two-process model: we can translate PyPy to a special "pypy-c-sandbox"
executable, which is safe in the sense that it doesn't do any library or
system calls.  Instead, whenever it would like to perform such an operation, it
marshals the operation name and the arguments to its stdout and waits for
the marshalled result on its stdin.  This pypy-c-sandbox process is meant to be
run by an outer "controller" program that answers these operation requests.
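The request/answer exchange described above can be sketched in a few lines
of Python 3.  This is only an illustration of the idea, not the actual wire
format used by pypy-c-sandbox: the function names, the ``("ok", value)`` /
``("error", message)`` reply shape and the ``os_getcwd`` operation name are
all assumptions made for this example, and in-memory buffers stand in for
the real stdin/stdout pipes.

```python
import io
import marshal


def stub_send(pipe, opname, *args):
    # Sandboxed side: serialize the request instead of performing the call.
    marshal.dump((opname, args), pipe)


def controller_answer(request_pipe, reply_pipe, handlers):
    # Controller side: read one request, decide whether it is allowed,
    # and write back a marshalled reply.
    opname, args = marshal.load(request_pipe)
    handler = handlers.get(opname)
    if handler is None:
        reply = ("error", "operation not allowed: %s" % opname)
    else:
        reply = ("ok", handler(*args))
    marshal.dump(reply, reply_pipe)


def stub_receive(pipe):
    # Sandboxed side: wait for the marshalled result.
    status, value = marshal.load(pipe)
    if status != "ok":
        raise OSError(value)
    return value


def round_trip(handlers, opname, *args):
    # In-memory buffers stand in for the subprocess' stdout and stdin.
    request_pipe, reply_pipe = io.BytesIO(), io.BytesIO()
    stub_send(request_pipe, opname, *args)
    request_pipe.seek(0)
    controller_answer(request_pipe, reply_pipe, handlers)
    reply_pipe.seek(0)
    return stub_receive(reply_pipe)
```

With ``handlers = {"os_getcwd": lambda: "/tmp"}``, calling
``round_trip(handlers, "os_getcwd")`` returns the controller's virtual
answer, while any operation missing from ``handlers`` is refused.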

The pypy-c-sandbox program is obtained by adding a transformation during
translation, which turns all RPython-level external function calls into
stubs that do the marshalling/waiting/unmarshalling.  An attacker that
tries to escape the sandbox is stuck within a C program that contains no
external function calls at all except for writing to stdout and reading from
stdin.  (It's still attackable in theory, e.g. by exploiting segfault-like
situations, but as explained in the introduction we think that PyPy is
rather safe against such attacks.)

The outer controller is a plain Python program that can run in CPython
or a regular PyPy.  It can perform any virtualization it likes, by
giving the subprocess any custom view on its world.  For example, while
the subprocess thinks it's using file handles, in reality the numbers
are created by the controller process and so they need not be (and
probably should not be) real OS-level file handles at all.  In the demo
controller I've implemented there is simply a mapping from numbers to
file-like objects.  The controller answers the "os_open" operation by
translating the requested path to some file or file-like object in some
virtual and completely custom directory hierarchy.  The file-like object
is put in the mapping with any unused number >= 3 as a key, and the
latter is returned to the subprocess.  The "os_read" operation works by
mapping the pseudo file handle given by the subprocess back to a
file-like object in the controller, and reading from the file-like
object.
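The mapping from pseudo file handles to file-like objects might look roughly
like the following sketch.  It is not the code of the demo controller: the
``VirtualFiles`` class, its constructor argument (a dict from virtual paths
to bytes) and the ``os_close`` method are hypothetical names introduced here
for illustration; only the "os_open" and "os_read" operations and the
"unused number >= 3" rule come from the description above.

```python
import io


class VirtualFiles:
    """Hypothetical controller-side table of pseudo file handles."""

    def __init__(self, files):
        # Virtual, read-only hierarchy: path -> file content (bytes).
        self.files = dict(files)
        # Pseudo file handle -> file-like object.
        self.open_fds = {}

    def os_open(self, path, flags=0, mode=0o777):
        if path not in self.files:
            raise OSError("no such virtual file: %s" % path)
        # Pick any unused number >= 3; it is never a real OS-level handle.
        fd = 3
        while fd in self.open_fds:
            fd += 1
        self.open_fds[fd] = io.BytesIO(self.files[path])
        return fd

    def os_read(self, fd, count):
        # Map the pseudo handle back to the file-like object and read from it.
        if fd not in self.open_fds:
            raise OSError("bad pseudo file handle: %d" % fd)
        return self.open_fds[fd].read(count)

    def os_close(self, fd):
        # Assumption: closing frees the number for reuse.
        if fd not in self.open_fds:
            raise OSError("bad pseudo file handle: %d" % fd)
        self.open_fds.pop(fd).close()
```

Because the subprocess only ever sees the numbers handed out by ``os_open``,
it has no way to name a file outside the dictionary given to the controller.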

Translating an RPython program with sandboxing enabled also uses a special flag
that enables all sorts of C-level assertions against index-out-of-bounds
accesses.

By the way, as you should have realized, the sandboxing is really
independent of the fact that it is PyPy that we are translating.  Any
RPython program should do.  I've successfully tried it on the JS
interpreter.  The
controller is only called "pypy_interact" because it emulates a file
hierarchy that makes pypy-c-sandbox happy - it contains (read-only)
virtual directories like /bin/lib/pypy1.2/lib-python and
/bin/lib/pypy1.2/lib_pypy and it
pretends that the executable is /bin/pypy-c.


Howto
-----

Grab a copy of the pypy repository_.  In the directory pypy/goal, run::

   ../../rpython/bin/rpython -O2 --sandbox targetpypystandalone.py

If you don't have a regular PyPy installed, you should, because it's
faster to translate; but you can also run the same line with ``python``
in front.

.. _repository: https://bitbucket.org/pypy/pypy


To run it, use the tools in the pypy/sandbox directory::

   ./pypy_interact.py /some/path/pypy-c-sandbox [args...]

Just like with pypy-c, if you pass no argument you get the interactive
prompt.  In theory it's impossible to do anything bad or read a random
file on the machine from this prompt. To pass a script as an argument you need
to put it in a directory along with all its dependencies, and ask
pypy_interact to export this directory (read-only) to the subprocess'
virtual /tmp directory with the ``--tmp=DIR`` option.  Example::

   mkdir myexported
   cp script.py myexported/
   ./pypy_interact.py --tmp=myexported /some/path/pypy-c-sandbox /tmp/script.py

This is safe to do even if script.py comes from some random
untrusted source, e.g. if an HTTP server does this with scripts it
receives from its clients.

To limit the heap size used, pass the ``--heapsize=N`` option to
pypy_interact.py.  You can also limit the CPU time (measured as real time)
with the ``--timeout=N`` option.
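When the controller is started from another program, the command line can
be assembled programmatically.  The helper below is a minimal sketch that
uses only the options documented above; the function name and its
parameters are made up for this example, and the values of ``heapsize`` and
``timeout`` are passed through verbatim because their units are not
specified here.

```python
def build_sandbox_command(sandbox_binary, script_name=None, tmp_dir=None,
                          heapsize=None, timeout=None,
                          interact="./pypy_interact.py"):
    # Hypothetical helper: options first, then the sandboxed binary,
    # then the script path as seen inside the virtual /tmp directory.
    cmd = [interact]
    if tmp_dir is not None:
        cmd.append("--tmp=%s" % tmp_dir)
    if heapsize is not None:
        cmd.append("--heapsize=%s" % heapsize)
    if timeout is not None:
        cmd.append("--timeout=%s" % timeout)
    cmd.append(sandbox_binary)
    if script_name is not None:
        cmd.append("/tmp/" + script_name)
    return cmd
```

The resulting list can be handed to ``subprocess.call()``, run from the
pypy/sandbox directory so that the relative path to pypy_interact.py
resolves.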

Not all operations are supported; e.g. if you type os.readlink('...'),
the controller crashes with an exception and the subprocess is killed.
Other operations make the subprocess die directly with a "Fatal RPython
error".  None of this is a security hole.  More importantly, *most other
built-in modules are not enabled.  Please read all the warnings on this
page before complaining about this.  Contributions welcome.*