*********
Internals
*********

How does it work?
=================

Scheme
------
.. image:: _static/autodep_arch.png

Format of network messages
--------------------------


1. Format of messages to the File Access Registrar::

      <time of event: sec since 1970>
      <event type: open, read, write>
      <name of file>
      <building stage: stagename or unknown>
      <result: OK, ERR/errno, ASKING, DENIED>

2. Format of the File Access Registrar's answer to an ASKING packet::

      <ALLOW | DENY>

*Notes:*

* All sockets are ``SOCK_SEQPACKET``
* All fields are delimited with the character ``\0``
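
The message layout above can be illustrated with a minimal Python sketch. The
function names ``encode_event`` and ``decode_event`` are hypothetical, and the
sketch assumes that the ``\0`` character only separates fields (no trailing
delimiter) and that field values are encoded as UTF-8; the source does not
state either detail.

```python
import time

FIELD_SEPARATOR = b"\0"


def encode_event(event_type, filename, stage="unknown", result="ASKING",
                 timestamp=None):
    """Join the five documented fields with NUL bytes (assumed: no trailing NUL)."""
    if timestamp is None:
        timestamp = int(time.time())  # seconds since 1970
    fields = [str(timestamp), event_type, filename, stage, result]
    return FIELD_SEPARATOR.join(
        field.encode("utf-8", "surrogateescape") for field in fields
    )


def decode_event(packet):
    """Split a packet back into its five fields and return them as a dict."""
    parts = packet.split(FIELD_SEPARATOR)
    if len(parts) != 5:
        raise ValueError("expected 5 fields, got %d" % len(parts))
    timestamp, event_type, filename, stage, result = (
        part.decode("utf-8", "surrogateescape") for part in parts
    )
    return {
        "time": int(timestamp),
        "type": event_type,
        "file": filename,
        "stage": stage,
        "result": result,
    }
```

Because ``SOCK_SEQPACKET`` preserves message boundaries, one such byte string
can be sent as one packet without any extra length prefix.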


How does the Hooklib approach work?
===================================

The main idea behind the Hooklib approach is to load a hook library
**before** any other library (including the C runtime).
Calls to functions such as open, read and write are then intercepted
by this library instead of going directly to the ones in *libc*.

The Hooklib module changes the behavior of the Linux dynamic linker by setting
the LD_PRELOAD environment variable (see
`man 8 ld-linux <http://linux.die.net/man/8/ld-linux>`_ for details).
The module also protects the LD_PRELOAD variable from further changes by the
program being executed.
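
As a rough illustration of the LD_PRELOAD step, the sketch below prepares an
environment for a child process. The helper names and the library path passed
to them are hypothetical; the sketch only shows how a library can be placed
first in LD_PRELOAD and does not cover protecting the variable from later
changes.

```python
import os
import re
import subprocess


def preload_env(hooklib_path, base_env=None):
    """Return a copy of the environment with hooklib_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # ld.so accepts both spaces and colons as separators in LD_PRELOAD.
    others = [e for e in re.split(r"[ :]+", existing) if e and e != hooklib_path]
    env["LD_PRELOAD"] = ":".join([hooklib_path] + others)
    return env


def run_with_hooklib(argv, hooklib_path):
    """Run argv with the hook library preloaded and return its exit code."""
    return subprocess.call(argv, env=preload_env(hooklib_path))
```

Keeping any libraries that were already listed in LD_PRELOAD, rather than
overwriting the variable, is a design choice of this sketch and not something
the source specifies.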

When Hooklib is loaded, it connects to the File Access Registrar over a Unix domain
socket. If a program forks or creates a new thread, another copy of the library
is loaded to register events from the new process or thread.

When a program calls open(...), read(...) or write(...), Hooklib sends a message
about the call to the File Access Registrar. The Registrar can then block
or allow the event. If the Registrar answers the query with
an ALLOW packet, the original function is called. Otherwise, the function
is not called and a "File not found" error is returned instead.
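
The ask-then-call flow can be sketched in Python with a connected pair of
``SOCK_SEQPACKET`` sockets standing in for the hook and the Registrar. All
function names here are hypothetical, the policy callback ``is_allowed`` is
invented for the example, and mapping the "File not found" error to ``ENOENT``
is an assumption. The real hook is a preloaded library, not Python code.

```python
import errno
import socket
import threading
import time


def answer_query(packet, is_allowed):
    """Registrar side: turn an ASKING packet into an ALLOW or DENY reply."""
    fields = packet.split(b"\0")
    filename = fields[2].decode("utf-8", "surrogateescape")
    return b"ALLOW" if is_allowed(filename) else b"DENY"


def serve_once(sock, is_allowed):
    """Registrar side: receive one query and send one reply."""
    sock.send(answer_query(sock.recv(4096), is_allowed))


def hooked_open(sock, filename, real_open):
    """Hook side: ask first, call the real function only on ALLOW."""
    query = b"\0".join([
        str(int(time.time())).encode("ascii"),
        b"open",
        filename.encode("utf-8", "surrogateescape"),
        b"unknown",
        b"ASKING",
    ])
    sock.send(query)
    if sock.recv(4096) == b"ALLOW":
        return real_open(filename)
    # Assumption: "File not found" corresponds to ENOENT.
    raise OSError(errno.ENOENT, "No such file or directory", filename)


def demo(filename, is_allowed):
    """Run one query and reply over a connected SOCK_SEQPACKET pair (Linux)."""
    hook_side, registrar_side = socket.socketpair(
        socket.AF_UNIX, socket.SOCK_SEQPACKET
    )
    worker = threading.Thread(target=serve_once, args=(registrar_side, is_allowed))
    worker.start()
    try:
        # A stand-in for the original libc function.
        return hooked_open(hook_side, filename, lambda name: "opened " + name)
    finally:
        worker.join()
        hook_side.close()
        registrar_side.close()
```

Calling ``demo(path, policy)`` returns the stand-in result when the policy
allows the path and raises ``FileNotFoundError`` when it does not.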

How does the Fusefs approach work?
==================================

The main idea of the Fusefs approach is to create a filesystem in userspace that
logs file accesses and to jail the program inside it using a chroot.

Before the program is launched, the File Access Registrar prepares the mounts.
It usually takes the following steps:

1. mount -o bind / /mnt/rootfs/
2. bind-mount /dev, /dev/pts, /dev/shm, /proc/ and /sys/ under /mnt/rootfs
3. bind-mount /lib64/, /lib32/ and /var/tmp/portage/ the same way, which improves
   performance at the cost of accuracy
4. launch FUSE over /mnt/rootfs/
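
The bind-mount steps above can be written out as a list of commands. The
sketch below only builds the argument lists and does not run them, since
mounting requires root privileges. The function name ``mount_plan`` is
hypothetical, and the FUSE launch of step 4 is left out because the source
does not name the command used for it.

```python
SYSTEM_DIRS = ("/dev", "/dev/pts", "/dev/shm", "/proc", "/sys")
PASSTHROUGH_DIRS = ("/lib64", "/lib32", "/var/tmp/portage")


def mount_plan(root="/mnt/rootfs", system_dirs=SYSTEM_DIRS,
               passthrough_dirs=PASSTHROUGH_DIRS):
    """Return the bind-mount commands for steps 1 to 3 as argument lists."""
    # Step 1: bind the real root onto the jail directory.
    plan = [["mount", "-o", "bind", "/", root]]
    # Steps 2 and 3: bind each directory to the same path below the jail root.
    for directory in tuple(system_dirs) + tuple(passthrough_dirs):
        plan.append(["mount", "-o", "bind", directory, root + directory])
    return plan


if __name__ == "__main__":
    for command in mount_plan():
        print(" ".join(command))
```

Files below the directories of step 3 bypass the logging filesystem, which is
the accuracy cost mentioned in that step.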

The FUSE module blocks all external access to /mnt/rootfs while the program runs.
It also asks the File Access Registrar whether access to each file inside the
chroot is allowed or denied. As with the Hooklib approach, if
access to a file is denied, a "File not found" error is returned.

*Notes:*

* Checking for permission to access a file with the File Access Registrar takes a
  lot of time under this approach.

Further analysis of file access events
======================================

After the file access analyser receives the list of events, it maps them onto a
list of packages.

The analyser then builds a list of dependencies for the installed packages and
compares it with the list it got from the registrar. The analyser treats packages
from the system profile as implicit dependencies of every package in the system.

If a dependency reported by the registrar is unexpected, simple heuristics are
used to discard packages that are not useful.
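
The comparison described above amounts to a set difference. The following
sketch assumes a hypothetical ``file_owner`` mapping from file path to owning
package and plain sets of package names. How the analyser actually obtains
this data is not described here.

```python
def packages_for_events(accessed_files, file_owner):
    """Map accessed file paths to the set of packages that own them."""
    return {file_owner[path] for path in accessed_files if path in file_owner}


def unexpected_dependencies(accessed_files, file_owner, declared_deps,
                            system_packages):
    """Return packages that were used but are neither declared nor implicit."""
    used = packages_for_events(accessed_files, file_owner)
    # Packages from the system profile count as implicit dependencies.
    return used - set(declared_deps) - set(system_packages)
```

Files that no package owns are simply skipped in this sketch. The packages it
returns are the candidates that the heuristics are then applied to.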

Rules of heuristics
-------------------

1. *A package is not useful if all of its files are .desktop, .xml or .m4 files*.
   The aclocal utility tries to read all .m4 files in the /usr/share/aclocal directory.
   Files ending in .desktop and .xml are often read in the postrm phase.
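
A minimal sketch of this rule in Python is shown below. The function name is
hypothetical. The sketch assumes the input is the list of files accessed from
one package, and that a package with an empty file list is not filtered out.
Neither assumption is confirmed by the source.

```python
IGNORED_SUFFIXES = (".desktop", ".xml", ".m4")


def is_not_useful(files):
    """Return True if every file in the list ends in an ignored suffix."""
    # Assumption: an empty list gives no evidence, so it is not filtered out.
    return bool(files) and all(name.endswith(IGNORED_SUFFIXES) for name in files)
```

For example, a package whose only accessed file is
``/usr/share/aclocal/pkg.m4`` would be discarded, while one that was also
accessed through a header file would be kept.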