<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
<!-- $Header$ -->

<guide lang="en">
<title>Integrity - Introduction and Concepts</title>

<author title="Author">
  <mail link="swift"/>
</author>

<abstract>
Integrity validation is a wide field in which many technologies play a role.
This guide aims to offer a high-level view on what integrity validation is all
about and how the various technologies work together to achieve a (hopefully)
more secure environment to work in.
</abstract>

<!-- The content of this document is licensed under the CC-BY-SA license -->
<!-- See http://creativecommons.org/licenses/by-sa/3.0 -->
<license version="3.0" />

<version>1</version>
<date>2012-07-30</date>

<chapter>
<title>It is about trust</title>
<section>
<title>Introduction</title>
<body>

<p>
Integrity is about trusting components within your environment, and in our case
the workstations, servers and machines you work on. You definitely want to be
certain that the workstation you type your credentials on to log on to the
infrastructure is not compromised in any way. This "trust" in your environment
is a combination of various factors: physical security, system security patching
process, secure configuration, access controls and more.
</p>

<p>
Integrity plays a role in this security field: it tries to ensure that the
systems have not been tampered with by malicious people or organizations. This
tamper-proofing extends to a wide range of components that need to be
validated. You probably want to be certain that the binaries that are run (and
the libraries that are loaded) are those you built yourself (in case of Gentoo)
or were provided to you by someone (or something) you trust. Likewise, you want
the Linux kernel you booted (and the modules that are loaded) to be the ones you
made, and not someone else's.
</p>

<p>
Most people trust themselves and see integrity as proof that things are still
as they built them. But to support this claim, the systems you use to ensure
integrity need to be trusted too: you want to make sure that whatever system
gives you the final yes/no on integrity only uses trusted information (did it
really validate the binary?) and trusted services (is it not running on a
compromised system?). Many ideas, technologies, processes and algorithms have
been proposed and reviewed to support these claims.
</p>

<p>
In this document, we will talk about a few of those, and how they fit into the
Gentoo Hardened Integrity subproject's vision and roadmap.
</p>

</body>
</section>
</chapter>

<chapter>
<title>Hash results</title>
<section>
<title>Algorithmically validating a file's content</title>
<body>

<p>
Hashes are a primary method for validating that a file (or other resource) has
not been changed since it was first inspected. A hash is the result of a
mathematical calculation on the content of a file (most often a number or
ordered set of numbers), and exhibits the following properties:
</p>

<ul>
  <li>
    The resulting number has a <e>small (often fixed) length</e>. This is
    necessary to allow fast verification of whether two hash values are the
    same, but also to allow storing the value in a secure location (which is,
    more often than not, much more restricted in space).
  </li>
  <li>
    The hash function always <e>returns the same hash</e> (output) when the
    file it inspects (input) has not been changed. Otherwise it would be
    impossible to ensure that the file content hasn't changed.
  </li>
  <li>
    The hash function is fast to run (calculating a hash result does not take
    up too much time or resources). Without this property, it would take too
    long to generate and validate hash results, leaving users frustrated (and
    more likely to disable the validation altogether).
  </li>
  <li>
    The hash result <e>cannot be used to reconstruct</e> the file. Although this
    is often seen as a consequence of the first property (small length), it is
    important because hash results are often also seen as a "public validation"
    of data that is otherwise private in nature. In other words, many processes
    rely on the inability of users (or hackers) to reverse-engineer information
    from its hash result. A good example is password databases, which
    <e>should</e> store hashes of the passwords, not the passwords themselves.
  </li>
  <li>
    Given a hash result, it is nearly impossible to find another file with the
    same hash result (or to create such a file yourself). Since the hash result
    is limited in size, many inputs map onto the same hash result. The strength
    of a good hash function is that it is not feasible to find (or calculate)
    such inputs except by brute force. When such a match is found, it is called
    a <e>collision</e>.
  </li>
</ul>
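
<p>
The first properties are easy to observe in practice. The following is a
minimal sketch in Python, assuming the standard <c>hashlib</c> module and
SHA-256 as the hash function; the sample data is made up. Collision resistance
(the last property) cannot be shown this way, since by design no one should be
able to produce a collision on demand.
</p>

<pre caption="Observing hash properties (Python sketch)">
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


original = b"root:x:0:0:root:/root:/bin/bash\n"
tampered = b"root:x:0:0:root:/root:/bin/bask\n"  # a single byte differs

h1 = sha256_hex(original)
h2 = sha256_hex(original)
h3 = sha256_hex(tampered)
big = sha256_hex(b"x" * 1_000_000)

# Fixed size: 64 hex characters (256 bits), whatever the input size.
print(len(h1), len(big))  # 64 64
# Deterministic: the same input always yields the same hash.
print(h1 == h2)  # True
# A one-byte change yields a completely different hash.
print(h1 == h3)  # False
</pre>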

<p>
Compared with checksums, hashes try to be cryptographically secure (and as
such, more effort goes into the last property, making collisions very hard to
obtain). Some hash implementations even take care that the time needed to
calculate a hash cannot be used to obtain information about the data (such as
whether it contains more 0s than 1s).
</p>
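
<p>
To see why a plain checksum is not enough, consider CRC-32. It is an affine
function of its input: for three messages of equal length, the CRC of their
bytewise XOR equals the XOR of their CRCs. The Python sketch below (standard
<c>zlib</c> and <c>hashlib</c> modules; the messages are made up) shows that the
checksum of a forged message follows algebraically from the checksums of known
messages, a relation that a cryptographic hash such as SHA-256 does not exhibit.
</p>

<pre caption="CRC-32 is linear, SHA-256 is not (Python sketch)">
import hashlib
import zlib


def xor3(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bytewise XOR of three equally long byte strings."""
    return bytes(x ^ y ^ z for x, y, z in zip(a, b, c))


def sha_int(m: bytes) -> int:
    """SHA-256 digest of m as an integer, so digests can be XORed."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big")


a = b"amount=0000100;to=alice"
b = b"amount=0000900;to=alice"
c = b"amount=0000100;to=mallo"

# Combining the three known messages yields a new, forged one.
forged = xor3(a, b, c)
print(forged)  # b'amount=0000900;to=mallo'

# Its CRC-32 can be derived from the three known checksums alone.
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
print(zlib.crc32(forged) == predicted)  # True

# The same relation does not hold for SHA-256.
print(sha_int(forged) == sha_int(a) ^ sha_int(b) ^ sha_int(c))  # False
</pre>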

</body>
</section>
<section>
<title>Hashes in integrity validation</title>
<body>

<p>
Integrity validation services are often based on hash generation and validation.
Tools such as <uri link="http://www.tripwire.org/">tripwire</uri> or <uri
link="http://aide.sourceforge.net/">AIDE</uri> generate hashes of files and
directories on your systems and then ask you to store them safely. When you want
the integrity of your systems checked, you provide this information to the
program (most likely in a read-only manner since you don't want this list to
be modified while validating) which then recalculates the hashes of the files
and compares them with the given list. Any changes in files are detected and can
be reported to you (or the administrator).
</p>
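
<p>
The baseline-and-compare workflow of such tools can be sketched in a few lines
of Python. This is an illustration of the idea under simplifying assumptions,
not how tripwire or AIDE are implemented: it only hashes the content of regular
files (with SHA-256), keeps the baseline in memory instead of on read-only
media, and the function names <c>build_baseline</c> and <c>compare</c> are
hypothetical.
</p>

<pre caption="Baseline and compare (Python sketch)">
import hashlib
import os
import tempfile


def hash_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_baseline(root: str) -> dict:
    """Map every regular file below root (by relative path) to its hash."""
    baseline = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                baseline[os.path.relpath(path, root)] = hash_file(path)
    return baseline


def compare(old: dict, new: dict):
    """Return the changed, added and removed files between two baselines."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    return changed, added, removed


def write(path: str, data: bytes) -> None:
    with open(path, "wb") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as root:
    write(os.path.join(root, "a.conf"), b"alpha\n")
    write(os.path.join(root, "b.conf"), b"beta\n")
    stored = build_baseline(root)  # in practice: stored safely, read-only

    write(os.path.join(root, "a.conf"), b"tampered\n")  # modify a file
    os.remove(os.path.join(root, "b.conf"))             # remove a file
    write(os.path.join(root, "c.conf"), b"new\n")       # add a file
    changed, added, removed = compare(stored, build_baseline(root))

print("changed:", changed, "added:", added, "removed:", removed)
</pre>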

<p>
A popular hash function is SHA-1 (which you can generate and validate using the
<c>sha1sum</c> command), which gained momentum after MD5 (using <c>md5sum</c>)
was found to be less secure (nowadays collisions in MD5 are easy to generate).
SHA-2 also exists (at the time of writing less popular than SHA-1) and can be
played with using the commands <c>sha224sum</c>, <c>sha256sum</c>,
<c>sha384sum</c> and <c>sha512sum</c>.
</p>

<pre caption="Generating the SHA-1 sum of a file">
~$ <i>sha1sum ~/Downloads/pastie-4301043.rb</i>
6b9b4e0946044ec752992c2afffa7be103c2e748  /home/swift/Downloads/pastie-4301043.rb
</pre>
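
<p>
The same output format can be produced, and checked again later the way
<c>sha1sum -c</c> does, with a short Python sketch. It is simplified on
purpose: it ignores the escaping that <c>sha1sum</c> applies to file names
containing backslashes or newlines, and always uses the two-space separator of
text mode. The helper names are hypothetical.
</p>

<pre caption="Producing and checking sha1sum-style lines (Python sketch)">
import hashlib
import os
import tempfile


def sha1sum_line(path: str) -> str:
    """Return a line in sha1sum's default format: digest, two spaces, path."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return f"{h.hexdigest()}  {path}"


def check_line(line: str) -> bool:
    """Recompute the digest for one stored line, comparing with the stored one."""
    expected, path = line.split("  ", 1)
    actual = sha1sum_line(path).split("  ", 1)[0]
    return actual == expected


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello\n")

line = sha1sum_line(path)
print(line)
before = check_line(line)   # the file is unchanged
with open(path, "ab") as f:
    f.write(b"!")           # append a single byte
after = check_line(line)    # the content has now changed
os.remove(path)
print(before, after)  # True False
</pre>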

</body>
</section>
<section>
<title>Hashes are a means, not a solution</title>
<body>

<p>
Hashes, in the field of integrity validation, are a means to compare data and
verify integrity in a relatively fast way. However, hashes by themselves cannot
provide integrity assurance to the administrator. Take the use of
<c>sha1sum</c> by itself for instance.
</p>

<p>
You have no guarantee that the <c>sha1sum</c> application behaves correctly
(in other words, that it has not been tampered with). You can't use
<c>sha1sum</c> against itself, since a maliciously modified command can easily
return (print out) the expected SHA-1 sum rather than the real one. One way to
thwart this is to provide the binary together with the hash values on read-only
media.
</p>

<p>
But then you're still not certain that it is that application that is executed:
a modified system might make you think it is executing that application while
it is actually running a different one. To provide this level of trust, you
need assurance from a higher-positioned, trusted service that the right
application is being run. Running with a trusted kernel helps here (but might
not provide full certainty), but you most likely need assistance from the
hardware (we will talk about the Trusted Platform Module later).
</p>

<p>
Likewise, you have no guarantee that the list of hash results used to verify
the integrity of a file is still your list: another file (with modified
content) may be bind-mounted on top of it. To support integrity validation with
a trusted information source, some solutions use HMAC digests instead of plain
hashes.
</p>

<p>
Finally, checksums should not only be taken over file content, but also over
file attributes (which are often used to provide access controls or even to
toggle particular security measures on or off for a file, as is the case with
PaX markings), directories (holding information about directory updates such
as file additions or removals) and privileges. These are things that a program
like <c>sha1sum</c> doesn't offer (but tools like AIDE do).
</p>
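
<p>
A content-only hash misses changes such as a file suddenly becoming
executable. As a minimal sketch (Python standard library only;
<c>record_digest</c> is a hypothetical name), the permission bits, owner and
group can be hashed together with the content. Extended attributes (one place
where PaX markings can be stored) and directories are left out; tools such as
AIDE cover far more.
</p>

<pre caption="Hashing metadata together with content (Python sketch)">
import hashlib
import os
import stat
import tempfile


def record_digest(path: str) -> str:
    """Hash a file's permission bits, owner and group together with its content.

    Extended attributes and directory entries are not covered by this sketch.
    """
    st = os.lstat(path)
    h = hashlib.sha256()
    # Metadata as one fixed-format text line, followed by the content.
    meta = f"mode={stat.S_IMODE(st.st_mode):o} uid={st.st_uid} gid={st.st_gid}\n"
    h.update(meta.encode())
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def content_digest(path: str) -> str:
    """Plain content hash, for comparison."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o644)
record_before, content_before = record_digest(path), content_digest(path)

os.chmod(path, 0o755)  # e.g. someone makes the file executable
record_after, content_after = record_digest(path), content_digest(path)
os.remove(path)

print(content_before == content_after)  # True: the content hash sees nothing
print(record_before == record_after)    # False: the mode change is caught
</pre>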

</body>
</section>
</chapter>

<chapter>
<title>Hash-based Message Authentication Codes</title>
<section>
<title>Trusting the hash result</title>
<body>

<p>
In order to trust a hash result, some solutions use HMAC digests instead. An
HMAC digest combines a regular hash function (and its properties) with a
secret cryptographic key. As such, the function generates the hash of the
content of a file together with the secret cryptographic key. This not only
provides integrity validation of the file, but also tells the verification tool
that the hash was made in the past by a trusted application (one that knows the
cryptographic key) and has not been tampered with since.
</p>
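
<p>
Python's standard <c>hmac</c> module implements HMAC on top of any
<c>hashlib</c> hash function. The minimal sketch below (the file content and
both keys are made up) computes a plain SHA-1 hash and two HMAC-SHA1 digests of
the same content, showing that the result depends on the key.
</p>

<pre caption="Plain SHA-1 versus HMAC-SHA1 (Python sketch)">
import hashlib
import hmac

content = b"#!/bin/sh\necho hello\n"   # hypothetical file content
key = b"hypothetical secret key"
other_key = b"some other key"

plain = hashlib.sha1(content).hexdigest()
tag = hmac.new(key, content, hashlib.sha1).hexdigest()
other = hmac.new(other_key, content, hashlib.sha1).hexdigest()

print(len(plain), len(tag))  # 40 40: HMAC-SHA1 output has SHA-1's size
print(tag == plain)          # False: the key changes the result
print(tag == other)          # False: a different key, a different digest
</pre>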

<p>
By using HMAC digests, malicious users will find it more difficult to modify
code and then present a "fake" hash results file, since they do not know the
secret cryptographic key that is needed to generate the new hash result. When
you see terms like <e>HMAC-SHA1</e>, it means that SHA-1 is the hash function
used together with the cryptographic key.
</p>
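
<p>
The verification side, and why a forged entry fails, can be sketched as
follows. This is a hypothetical example: the key is hard-coded only for
illustration (the next section explains why that is a bad idea in practice),
and HMAC-SHA256 is used instead of HMAC-SHA1. The comparison uses
<c>hmac.compare_digest</c>, which avoids returning early at the first
differing character and so makes timing attacks harder.
</p>

<pre caption="Verifying stored HMAC digests (Python sketch)">
import hashlib
import hmac

KEY = b"hypothetical secret key"  # never stored next to the digest list


def make_tag(content: bytes, key: bytes = KEY) -> str:
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(content: bytes, stored_tag: str, key: bytes = KEY) -> bool:
    return hmac.compare_digest(make_tag(content, key), stored_tag)


original = b"trusted binary contents"
stored = make_tag(original)  # taken while the system was known to be good

# An attacker replaces the file and, without the key, tries to forge a
# matching entry: either a plain hash or an HMAC with a guessed key.
trojaned = b"trojaned binary contents"
forged_plain = hashlib.sha256(trojaned).hexdigest()
forged_guess = make_tag(trojaned, key=b"attacker guess")

print(verify(original, stored))        # True
print(verify(trojaned, stored))        # False
print(verify(trojaned, forged_plain))  # False
print(verify(trojaned, forged_guess))  # False
</pre>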

</body>
</section>
<section>
<title>Managing the keys</title>
<body>

<p>
Using keys to "protect" the hash results introduces another level of complexity:
how do you properly, securely store the keys and access them only when needed?
You cannot just embed the key in the hash list (since a tampered system might
read it out when you are verifying the system, generate its own results file and
have you check against that instead). Likewise you can't just embed the key in
the application itself, because a tampered system might just read out the
application binary to find the key (and once compromised, you might need to
rebuild the application completely with a new key).
</p>

<p>
You might be tempted to just provide the key as a command-line argument, but
then again you cannot be certain that no malicious user is idling on your
system, waiting to capture this valuable information from the output of
<c>ps</c> and similar tools.
</p>
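
<p>
A common way around this particular leak is to pass secrets over a pipe instead
of the argument list, since the argument list of a process can typically be
read by other users (that is where <c>ps</c> gets it from). The Python sketch
below starts a child process that reads the key from its standard input and
prints an HMAC. It is a hypothetical example and only keeps the key out of the
process list: a compromised kernel or root account can still read it from
memory.
</p>

<pre caption="Passing a key over standard input (Python sketch)">
import hashlib
import hmac
import subprocess
import sys

# Program run by the child: read the key from stdin, print an HMAC-SHA256 of
# a fixed message. Only this (secret-free) source appears in its arguments.
CHILD_SOURCE = (
    "import hashlib, hmac, sys\n"
    "key = sys.stdin.buffer.read()\n"
    "print(hmac.new(key, b'message', hashlib.sha256).hexdigest())\n"
)


def hmac_in_child(key: bytes) -> str:
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],  # no secret in argv
        input=key,                             # the key travels over a pipe
        capture_output=True,
        check=True,
    )
    return result.stdout.decode().strip()


key = b"hypothetical secret key"
digest = hmac_in_child(key)
print(digest == hmac.new(key, b"message", hashlib.sha256).hexdigest())  # True
</pre>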

<p>
Once again, the need arises to trust a higher-level component. When you trust
the kernel, you might be able to use the kernel key ring for this.
</p>

</body>
</section>
</chapter>

<chapter>
<title>Using private/public key cryptography</title>
<section>
<title>Validating integrity using public keys</title>
<body>

<p>
One way to work around the risk of a malicious user getting hold of the secret
key is to not rely on that key for authenticating the hash result when
verifying the integrity of the system. This can be accomplished if, instead of
using just an HMAC, you also encrypt the HMAC digest with a private key (in
other words, you sign it).
</p>

<p>
During validation of the hashes, you decrypt the signed HMAC with the public key
(not the private key) and compare the result with freshly generated HMAC
digests.
</p>

<p>
In this approach, an attacker cannot forge a fake HMAC, since forgery requires
access to the private key, and the private key is never used on the system to
validate signatures. And as long as no collisions occur, the attacker also
cannot reuse existing encrypted HMAC values for other content (which you could
consider to be a replay attack).
</p>
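
<p>
The sign-with-private, verify-with-public idea can be illustrated with
textbook RSA. The Python sketch below is a toy under loud assumptions: the key
is built from two Mersenne primes and is far too small to be secure, no padding
scheme is used, and a plain SHA-256 digest stands in for the HMAC digest
(signing an HMAC digest works the same way). Real deployments should rely on a
vetted cryptographic library rather than code like this.
</p>

<pre caption="Toy RSA signature over a digest (Python sketch, insecure)">
import hashlib

# Toy key pair. Two Mersenne primes: far too small and too structured for
# real use, but enough to show how signing and verification relate.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)


def digest_int(data: bytes) -> int:
    """SHA-256 of data as an integer, reduced modulo N to fit the toy key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Needs the private exponent D, so only the signing host can do this."""
    return pow(digest_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Needs only the public values N and E: undo the signature and compare."""
    return pow(signature, E, N) == digest_int(data)


signature = sign(b"original library contents")
print(verify(b"original library contents", signature))  # True
print(verify(b"trojaned library contents", signature))  # False
</pre>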

</body>
</section>
<section>
<title>Ensuring the key integrity</title>
<body>

<p>
Of course, this still requires that the public key cannot be modified by a
tampered system: a fake list of hash results can be made using a different
private key, and the moment the tool wants to decrypt the encrypted values, the
tampered system replaces the public key with its own, leaving the system
vulnerable again.
</p>
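
<p>
One way to reduce this risk is to pin the public key: record a hash
(fingerprint) of the genuine public key on media the validated system cannot
modify, and refuse any key that does not match it. The Python sketch below
shows the check; the key bytes and the function name <c>load_public_key</c>
are hypothetical. This only moves the problem: the pinned fingerprint itself
must now be trusted.
</p>

<pre caption="Pinning a public key fingerprint (Python sketch)">
import hashlib
import hmac


def fingerprint(public_key: bytes) -> str:
    return hashlib.sha256(public_key).hexdigest()


# Recorded once, when the key pair was created, and kept on read-only media.
genuine_key = b"hypothetical genuine public key bytes"
PINNED_FINGERPRINT = fingerprint(genuine_key)


def load_public_key(candidate: bytes) -> bytes:
    """Return the key only if its fingerprint matches the pinned value."""
    if not hmac.compare_digest(fingerprint(candidate), PINNED_FINGERPRINT):
        raise ValueError("public key does not match the pinned fingerprint")
    return candidate


load_public_key(genuine_key)  # accepted
try:
    load_public_key(b"attacker's public key bytes")
except ValueError as err:
    print("rejected:", err)
</pre>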

</body>
</section>
</chapter>

<chapter>
<title>Trust chain</title>
<section>
<title>Handing over trust</title>
<body>

<p>
As you've noticed from the methods and services above, you always need to have
something you trust and that you can build on. If you trust nothing, you can't
validate anything since nothing can be trusted to return a valid response. And
to trust something means you also want to have confidence that that system
itself uses trusted resources.
</p>

<p>
For many users, the hardware level is something they trust. After all, as long
as no burglar has come into the house and tampered with the hardware itself, it
is reasonable to expect that the hardware is still the same. In effect, these
users trust that the physical protection of their house is sufficient.
</p>

<p>
For companies, the physical protection of the working environment is not
sufficient for ultimate trust. They want to make sure that the hardware is not
tampered with (or that different hardware is not suddenly used), especially
when the company uses laptops instead of (less portable) workstations.
</p>

<p>
The less you trust, the more things you need to take care of in order to be
confident that the system has not been tampered with. In the Gentoo Hardened
Integrity subproject we will use the following "order" of resources:
</p>

<ul>
  <li>
    <e>System root-owned files and root-running processes</e>. In most cases
    and most households, properly configured and protected systems will trust
    root-owned files and processes. Any request for integrity validation is
    then usually applied to user-provided files (to check that no-one tampered
    with the user account or specific user files) and not to the system
    itself.
  </li>
  <li>
    <e>Operating system kernel</e> (in our case the Linux kernel). Although some
    precautions need to be taken, a properly configured and protected kernel can
    provide a higher trust level. Integrity validation on kernel level can offer
    higher trust in the system's integrity, although you must be aware that
    most kernels still reside on the system itself.
  </li>
  <li>
    <e>Live environments</e>. A bootable (preferably) read-only medium can be
    used to boot up a validation environment that scans and verifies the
    integrity of the system-under-investigation. In this case, even tampered
    kernel boot images can be detected, and by taking proper precautions when
    running the validation (such as ensuring no network access is enabled from
    the boot up until the final compliance check has occurred) you can make
    yourself confident of the state of the entire system.
  </li>
  <li>
    <e>Hypervisor level</e>. Many organizations see hypervisors as trusted
    resources (the isolation of a virtual environment is hard to break out
    of). Integrity validation on the hypervisor level can therefore provide
    confidence, especially when "chaining trusts": the hypervisor first
    validates the kernel to boot, and then boots this (now trusted) kernel which
    loads up the rest of the system.
  </li>
  <li>
    <e>Hardware level</e>. Whereas hypervisors are still "just software", you
    can lift trust up to the hardware level and use hardware-offered
    integrity features to give you confidence that the system you are about
    to boot has not been tampered with.
  </li>
</ul>

<p>
In the Gentoo Hardened Integrity subproject, we aim to eventually support all
these levels (and perhaps more) to provide you as a user the tools and methods
you need to validate the integrity of your system, up to the point that you
trust. The less you trust, the more complex a trust chain might become to
validate (and manage), but we will not limit our research and support to a
single technology (or chain of technologies).
</p>

<p>
Chaining trust is an important aspect to keep things from becoming too complex
and unmanageable. It also allows users to just "drop in" at the level of trust
they feel is sufficient, rather than requiring technologies for higher levels.
</p>

<p>
For instance:
</p>

<ul>
  <li>
    A hardware component that you trust (like a <e>Trusted Platform Module</e>
    or a specific BIOS-supported functionality) verifies the integrity of the
    boot regions on your disk. If they check out, it passes control over to
    the bootloader.
  </li>
  <li>
    The bootloader now validates the integrity of its configuration and of the
    files (kernel and initramfs) it is told to boot up. If it checks out, it
    boots the kernel and hands over control to this kernel.
  </li>
  <li>
    The kernel, together with the initial ram file system, verifies the
    integrity of the system components (for instance the SELinux policy) before
    the initial ram file system switches to the real root file system and boots
    up the (verified) init system.
  </li>
  <li>
    The (root-running) init system validates the integrity of the services it
    wants to start before handing over control of the system to the user.
  </li>
</ul>
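
<p>
The handover in such a chain can be modeled in a few lines. In the Python
sketch below (a simplified, hypothetical model, not how any real firmware or
bootloader works), every stage embeds the expected hash of the next stage, and
only the hash of the first stage has to be kept by the trusted component.
Tampering with any later stage is caught by its predecessor before control is
handed over.
</p>

<pre caption="A chain of verified handovers (Python sketch)">
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    code: bytes
    next_hash: Optional[str]  # expected hash of the next stage (None if last)

    def digest(self) -> str:
        # Covers the code and the embedded hash, so changing either is noticed.
        h = hashlib.sha256(self.code)
        h.update((self.next_hash or "").encode())
        return h.hexdigest()


def build_chain(parts: List[Tuple[str, bytes]]):
    """Build stages back to front so each one embeds its successor's hash."""
    stages: List[Stage] = []
    next_hash: Optional[str] = None
    for name, code in reversed(parts):
        stage = Stage(name, code, next_hash)
        stages.insert(0, stage)
        next_hash = stage.digest()
    return stages, next_hash  # next_hash now belongs to the first stage


def boot(stages: List[Stage], anchor: str) -> List[str]:
    """Verify every stage before handing over control to it."""
    expected, started = anchor, []
    for stage in stages:
        if stage.digest() != expected:
            raise RuntimeError(f"{stage.name}: integrity check failed, halting")
        started.append(stage.name)  # stands in for "hand over control"
        expected = stage.next_hash
    return started


parts = [("bootloader", b"bootloader code"), ("kernel", b"kernel image"),
         ("init", b"init binary")]
stages, anchor = build_chain(parts)  # anchor: kept by the trusted hardware
print(boot(stages, anchor))          # ['bootloader', 'kernel', 'init']

# Replace the kernel: the bootloader's check fails before the kernel runs.
stages[1] = Stage("kernel", b"rootkit", stages[1].next_hash)
try:
    boot(stages, anchor)
except RuntimeError as err:
    print(err)                       # kernel: integrity check failed, halting
</pre>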

<p>
An even longer chain can be seen with hypervisors:
</p>

<ul>
  <li>
    Hardware validates boot loader
  </li>
  <li>
    Boot loader validates hypervisor kernel and system
  </li>
  <li>
    Hypervisor validates kernel(s) of the images (or the entire images)
  </li>
  <li>
    Hypervisor-managed virtual environment starts the image
  </li>
  <li>
    ...
  </li>
</ul>

</body>
</section>
<section>
<title>Integrity on serviced platforms</title>
<body>

<p>
Sometimes you cannot trust higher positioned components, but still want to be
assured that your service is not tampered with. An example would be when you are
hosting a system in a remote, non-accessible data center or when you manage an
image hosted by a virtualized hosting provider (I don't want to say "cloud"
here, but it fits).
</p>

<p>
In these cases, you want a level of assurance that your own image has not been
tampered with while offline (imagine someone manipulating the guest image,
injecting trojans or other backdoors, and then booting the image) or even while
the system is running. Instead of trusting the higher components, you try to
manage a certain level of distrust.
</p>

<p>
Providing you with some confidence at this level too is our goal within the
Gentoo Hardened Integrity subproject.
</p>

</body>
</section>
<section>
<title>From measurement to protection</title>
<body>

<p>
When dealing with integrity (and trust chains), the idea behind the top-down
trust chain is that higher level components first measure the integrity of the
next component, validate (and take appropriate action) and then hand over
control to this component. This is what we call <e>protection</e> or
<e>integrity enforcement</e> of resources.
</p>

<p>
If the system cannot validate the integrity, or the system is too volatile to
enforce this integrity from a higher level, it is necessary to provide a trusted
method for other services to validate the integrity. In this case, the system
<e>attests</e> the state of the underlying component(s) towards a third party
service, which <e>appraises</e> this state against a known "good" value.
</p>
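
<p>
One way to make such attestation possible is to fold every measurement into a
single value that can only be extended, never overwritten, in the style of the
<e>extend</e> operation of a TPM Platform Configuration Register: the new value
is the hash of the old value concatenated with the hash of the new measurement.
The Python sketch below models only that arithmetic and a party that appraises
the reported value against a known-good one. A real attestation also has the
reported value signed by the TPM (a "quote") so the appraiser can trust where
it came from; that part is left out, and all names and sample data are
hypothetical.
</p>

<pre caption="Extend-style measurement and appraisal (Python sketch)">
import hashlib


def extend(register: bytes, measurement: bytes) -> bytes:
    """Extend-style update: new = SHA-256(old || SHA-256(measurement))."""
    return hashlib.sha256(register + hashlib.sha256(measurement).digest()).digest()


def measure(components) -> bytes:
    """Fold a sequence of components, in order, into one register value."""
    register = bytes(32)  # the register starts out as all zeroes
    for component in components:
        register = extend(register, component)
    return register


# Appraiser side: known-good value, computed from reference components.
KNOWN_GOOD = measure([b"bootloader", b"kernel", b"initramfs"])


def appraise(reported: bytes) -> bool:
    return reported == KNOWN_GOOD


print(appraise(measure([b"bootloader", b"kernel", b"initramfs"])))       # True
print(appraise(measure([b"bootloader", b"evil kernel", b"initramfs"])))  # False
# Order matters too: the same components measured in another order differ.
print(appraise(measure([b"kernel", b"bootloader", b"initramfs"])))       # False
</pre>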

<p>
In the case of our HMAC-based checks, there is no enforcement of integrity of
the files, but the tool itself attests the state of the resources by generating
new HMAC digests and validating (appraising) them against the list of HMAC
digests it took before.
</p>

</body>
</section>
</chapter>

<chapter>
<title>An implementation: the Trusted Computing Group functionality</title>
<section>
<title>Trusted Platform Module</title>
<body>

</body>
</section>
</chapter>

</guide>