author Filipe Brandenburger <filbranden@google.com> 2018-09-07 01:02:42 -0700
committer Filipe Brandenburger <filbranden@google.com> 2018-09-08 13:39:03 -0700
commit 9e825ebf4f5029ae6cb072ac568d7838e0762a9e
tree 4950d3f39fa29f8513a797d313ca6b48fd04c131 /docs
parent Merge pull request #9832 from yuwata/fix-9831
docs: move doc/ to docs/
The docs/ directory is special in GitHub, since GitHub Pages can be served from it, so switching to it lets us expose the documentation directly as a website. Updated references to it in the documents themselves, in the CONTRIBUTING.md file and in the Meson build files.
Diffstat (limited to 'docs')
-rw-r--r-- docs/BOOT_LOADER_SPECIFICATION.md 140
-rw-r--r-- docs/CGROUP_DELEGATION.md 456
-rw-r--r-- docs/CODE_QUALITY.md 64
-rw-r--r-- docs/CODING_STYLE 460
-rw-r--r-- docs/DISTRO_PORTING 71
-rw-r--r-- docs/ENVIRONMENT.md 124
-rw-r--r-- docs/HACKING 122
-rw-r--r-- docs/PORTABLE_SERVICES.md 251
-rw-r--r-- docs/TRANSIENT-SETTINGS.md 457
-rw-r--r-- docs/TRANSLATORS 27
-rw-r--r-- docs/UIDS-GIDS.md 278
-rw-r--r-- docs/sysvinit/README.in 27
-rw-r--r-- docs/sysvinit/meson.build 11
-rw-r--r-- docs/var-log/README.in 26
-rw-r--r-- docs/var-log/meson.build 11
15 files changed, 2525 insertions, 0 deletions
diff --git a/docs/BOOT_LOADER_SPECIFICATION.md b/docs/BOOT_LOADER_SPECIFICATION.md
new file mode 100644
index 000000000..bea380de8
--- /dev/null
+++ b/docs/BOOT_LOADER_SPECIFICATION.md
@@ -0,0 +1,140 @@
+# The Boot Loader Specification
+
+_TL;DR: Currently there's little cooperation between multiple distributions in dual-boot (or triple, ... multi-boot) setups, and we'd like to improve this situation by getting everybody to commit to a single boot configuration format that is based on drop-in files, and thus is robust, simple, works without rewriting configuration files and is free of namespace clashes._
+
+The Boot Loader Specification defines a scheme by which different operating systems can cooperatively manage a boot loader configuration directory. This directory accepts drop-in files for boot menu items, defined in a format that is shared between various boot loader implementations, operating systems, and userspace programs. The target audience for this specification is:
+
+* Boot loader developers, to write a boot loader that directly reads its configuration at runtime from these drop-in snippets
+* Distribution and Core OS developers, in order to create these snippets at OS/kernel package installation time
+* UI developers, for implementing a user interface that discovers the available boot options
+* OS Installer developers, for setting up the initial drop-in directory
+
+## Why is there a need for this specification?
+
+Of course, without this specification things already work mostly fine. But here's why we think this specification is needed:
+
+* To make the boot more robust, as no explicit rewriting of configuration files is required any more
+* To improve dual-boot scenarios. Currently, multiple Linux installations tend to fight over which boot loader becomes the primary one in possession of the MBR, and only that one installation can then freely update its boot loader configuration. Other Linux installs have to be manually configured to never touch the MBR and instead install a chain-loaded boot loader in their own partition headers. In this new scheme, as all installations share a loader directory, no manual configuration has to take place, and all participants implicitly cooperate due to the removal of name collisions and can install/remove their own boot menu entries at will, without interfering with the entries of other installed operating systems.
+* Drop-in directories are otherwise now pretty ubiquitous on Linux as an easy way to extend configuration without having to edit, regenerate or manipulate configuration files. For the sake of uniformity, we should do the same for extending the boot menu.
+* Userspace code can sanely parse boot loader configuration. This is essential with modern BIOSes, which do not necessarily initialize USB keyboards during boot anymore, making boot menus hard for the user to reach. If userspace code can parse the boot loader configuration too, this allows for UIs that select a boot menu item to boot into before rebooting the machine, thus not requiring interactivity during early boot.
+* To unify and thus simplify configuration of the various boot loaders around, which makes configuration of the boot loading process easier for users, administrators and developers alike.
+* For boot loaders with configuration _scripts_ such as grub2, adopting this spec allows for mostly static scripts that are generated only once at first installation, but then do not need to be updated anymore as that is done via drop-in files exclusively.
+
+## Why not simply rely on the EFI boot menu logic?
+
+The EFI specification provides a boot options logic that can offer similar functionality. Here's why we think that it is not enough for our uses:
+
+* The various EFI implementations implement the boot order/boot item logic to different levels. Some firmware implementations do not offer a boot menu at all and instead unconditionally follow the EFI boot order, booting the first item that is working.
+* If the firmware setup is used to reset all data, usually all EFI boot entries are lost, making the system entirely unbootable, as the firmware setups generally do not offer a UI to define additional boot items. By placing the menu item information on disk, it is always available, regardless of whether the BIOS setup data is lost.
+* Hard disk images should be movable between machines and be bootable without requiring explicit EFI variables to be set. This also requires that the list of boot options is defined on disk, and not in EFI variables alone.
+* EFI is not universal yet (especially on non-x86 platforms); this specification is useful both for EFI and non-EFI boot loaders.
+* Many EFI systems disable USB support during early boot to optimize boot times, thus making keyboard input unavailable in the EFI menu. It is thus useful if the OS UI has a standardized way to discover available boot options which can be booted to.
+
+## Technical Details
+
+Everything described below is located on a placeholder file system `$BOOT`. The installer program should pick `$BOOT` according to the following rules:
+
+* On disks with MBR disk labels
+  * If the OS is installed on a disk with MBR disk label, and a partition with the MBR type id of 0xEA already exists it should be used as `$BOOT`.
+  * Otherwise, if the OS is installed on a disk with MBR disk label, a new partition with MBR type id of 0xEA shall be created, of a suitable size (let's say 500MB), and it should be used as `$BOOT`.
+* On disks with GPT disk labels
+  * If the OS is installed on a disk with GPT disk label, and a partition with the GPT type GUID of bc13c2ff-59e6-4262-a352-b275fd6f7172 already exists, it should be used as `$BOOT`.
+  * Otherwise, if the OS is installed on a disk with GPT disk label, and an ESP partition (i.e. with the GPT type GUID of c12a7328-f81f-11d2-ba4b-00a0c93ec93b) already exists and is large enough (let's say 250MB) and otherwise qualifies, it should be used as `$BOOT`.
+  * Otherwise, if the OS is installed on a disk with GPT disk label, and if the ESP partition already exists but is too small, a new suitably sized (let's say 500MB) partition with GPT type GUID of bc13c2ff-59e6-4262-a352-b275fd6f7172 shall be created and it should be used as `$BOOT`.
+  * Otherwise, if the OS is installed on a disk with GPT disk label, and no ESP partition exists yet, a new suitably sized (let's say 500MB) ESP should be created and should be used as `$BOOT`.
+
+This placeholder file system shall be determined during _installation time_, and an fstab entry may be created. It should be mounted to either /boot or /efi. Additional locations like /boot/efi, with /boot being a separate file system, might be supported by implementations. This is not recommended because the mounting of `$BOOT` is then dependent on and requires the mounting of the intermediate file system.
+
+**Note:** _`$BOOT` should be considered **shared** among all OS installations of a system. Instead of maintaining one `$BOOT` per installed OS (as `/boot` was traditionally handled), all installed OSes share the same place to drop in their boot-time configuration._
+
+For systems where the firmware is able to read file systems directly, `$BOOT` must be a file system readable by the firmware. For other systems, `$BOOT` must be a VFAT (16 or 32) file system. Applications accessing `$BOOT` should hence not assume that fancier file system features such as symlinks, hardlinks, access control or case sensitivity are supported.
+
+### Boot loader specification entries
+
+We define two directories below `$BOOT`:
+
+* `$BOOT/loader/` is the directory containing all files defined by this specification
+* `$BOOT/loader/entries/` is the directory containing the drop-in snippets. This directory contains one `.conf` file for each boot menu item.
+
+**Note:** _In all cases the /loader directory should be located directly in the root of the file system. Specifically, if `$BOOT` is the ESP, then the /loader directory should be located directly in the root directory of the ESP, and not in the EFI/ subdirectory._
+
+Inside the `$BOOT/loader/entries/` directory each OS vendor may drop one or more configuration snippets with the suffix ".conf", one for each boot menu item. The name of the file is used for identification of the boot item but shall never be presented to the user in the UI. The file name may be chosen freely but should be unique enough to avoid clashes between OS installations. More specifically it is suggested to include the machine ID (`/etc/machine-id` or the D-Bus machine ID for OSes that lack `/etc/machine-id`), the kernel version (as returned by `uname -r`) and an OS identifier (the ID field of `/etc/os-release`). Example: `$BOOT/loader/entries/6a9857a393724b7a981ebb5b8495b9ea-3.8.0-2.fc19.x86_64.conf`.
+
+These configuration snippets shall be Unix-style text files (i.e. line separation with a single newline character), in the UTF-8 encoding. The configuration snippets are loosely inspired by Grub1's configuration syntax. Lines beginning with '#' shall be ignored and used for commenting. The first word of a line is used as key and shall be separated by a space from its value. The following keys are known:
+
+* `title` shall contain a human readable title string for this menu item. This will be displayed in the boot menu for the item. It is a good idea to initialize this from the `PRETTY_NAME` of `/etc/os-release`. This name should be descriptive and does not have to be unique. If a boot loader discovers two entries with the same title it is a good idea to show more than just the raw title in the UI, for example by appending the `version` field. This field is optional. Example: "Fedora 18 (Spherical Cow)".
+* `version` shall contain a human readable version string for this menu item. This is usually the kernel version and is intended for use by OSes to install multiple kernel versions at the same time with the same `title` field. This field shall be in a syntax that is useful for Debian-style version sorts, so that the boot loader UI can determine the newest version easily and show it first or preselect it automatically. This field is optional. Example: `3.7.2-201.fc18.x86_64`.
+* `machine-id` shall contain the machine ID of the OS, as found in `/etc/machine-id`. This is useful for boot loaders and applications to filter out boot entries, for example to show only a single newest kernel per OS, or to group items by OS, or to maybe filter out the currently booted OS in UIs that want to show only other installed operating systems. This ID shall be formatted as 32 lower case hexadecimal characters (i.e. without any UUID formatting). This key is optional. Example: `4098b3f648d74c13b1f04ccfba7798e8`.
+* `linux` refers to the kernel to spawn and shall be a path relative to the `$BOOT` directory. It is recommended that every distribution creates a machine id and version specific subdirectory below `$BOOT` and places its kernels and initial RAM disk images there. Example: `/6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/linux`.
+* `initrd` refers to the initrd to use when executing the kernel. This also shall be a path relative to the `$BOOT` directory. This key is optional. This key may appear more than once in which case all specified images are used, in the order they are listed. Example: `6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/initrd`.
+* `efi` refers to an arbitrary EFI program to spawn. This also takes a path relative to `$BOOT`. This key is only available on EFI systems.
+* `options` shall contain kernel parameters to pass to the Linux kernel to spawn. This key is optional and may appear more than once in which case all specified parameters are used in the order they are listed.
+* `devicetree` refers to the binary device tree to use when executing the
+kernel. This also shall be a path relative to the `$BOOT` directory. This
+key is optional. Example: `6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.armv7hl/tegra20-paz00.dtb`.
+* `architecture` refers to the UEFI architecture this entry is defined for. If specified and this does not match the local UEFI system architecture the entry is hidden.
+
+Each configuration drop-in snippet must include at least a `linux` or an `efi` key and is otherwise not valid. Here's an example for a complete drop-in file:
+
+    # /boot/loader/entries/6a9857a393724b7a981ebb5b8495b9ea-3.8.0-2.fc19.x86_64.conf
+    title        Fedora 19 (Rawhide)
+    version      3.8.0-2.fc19.x86_64
+    machine-id   6a9857a393724b7a981ebb5b8495b9ea
+    options      root=UUID=6d3376e4-fc93-4509-95ec-a21d68011da2
+    linux        /6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/linux
+    initrd       /6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/initrd
+
+On EFI systems all kernel images shall be EFI images. In order to be compatible with EFI systems it is highly recommended only to install EFI kernel images, even on non-EFI systems, if that's applicable and supported on the specific architecture.
+
+Note that these configuration snippets may only reference kernels (and EFI programs) that reside on the same file system as the configuration snippets, i.e. everything referenced must be contained in the same file system. This is by design, as referencing other partitions or devices would require a non-trivial language for denoting device paths. If kernels/initrds are to be read from other partitions/disks the boot loader can do this in its own native configuration, using its own specific device path language, and this is out of focus for this specification. More specifically, on non-EFI systems configuration snippets following this specification cannot be used to spawn other operating systems (such as Windows).
+
+### Unified kernel images
+
+A unified kernel image is a single UEFI executable combining a UEFI stub loader, a kernel image, an initramfs image, and the kernel command line. See the description of the `--uefi` option in [dracut(8)](http://man7.org/linux/man-pages/man8/dracut.8.html). Such images will be searched for under `$BOOT/EFI/Linux` and must have the extension `.efi`.
+
+A valid unified kernel image must contain two PE sections:
+
+* `.cmdline` section with the kernel command line
+* `.osrel` section with an embedded copy of the [os-release](https://www.freedesktop.org/software/systemd/man/os-release.html) file describing the image
+
+The `PRETTY_NAME=` and `VERSION_ID=` fields in the embedded os-release file are used the same way as `title` and `version` in the "boot loader specification" entries. The `.cmdline` section is used instead of the `options` field. `linux` and `initrd` fields are not necessary, and there is no counterpart for the `machine-id` field.
+
+Any such images shall be added to the list of valid boot entries.
+
+### Additional notes
+
+Note that these configuration snippets do not need to be the only configuration source for a boot loader. It may extend this list of entries with additional items from other configuration files (for example its own native configuration files) or automatically detected other entries without explicit configuration.
+
+To make this explicitly clear: this specification is designed with "free" operating systems in mind. Starting Windows or macOS is out of focus for these configuration snippets; use boot-loader specific solutions for that. In the text above, if we say "OS" we hence imply "free", i.e. primarily Linux (though this could easily be extended to the BSDs and whatnot).
+
+Note that all paths used in the configuration snippets use a Unix-style "/" as path separator. This needs to be converted to an EFI-style "\" separator in EFI boot loaders.
+
+
+## Logic
+
+A _boot loader_ needs a file system driver to discover and read `$BOOT`, then simply reads all files `$BOOT/loader/entries/*.conf`, and populates its boot menu with this. It then extends this with any unified kernel images found in `$BOOT/EFI/Linux`. It may also add additional entries, for example "Reboot into firmware". Optionally it may sort the menu based on the `machine-id` and `version` fields, and possibly others. It uses the file name to identify specific items, for example in case it supports storing away default entry information somewhere. A boot loader should generally not modify these files.
+
+For "boot loader specification" entries, the _kernel package installer_ installs the kernel and initrd images to `$BOOT` (it is recommended to place these files in a vendor and OS and installation specific directory) and then generates a configuration snippet for it, placing this in `$BOOT/loader/entries/xyz.conf`, with xyz as concatenation of machine id and version information (see above). The files created by a kernel package are private property of the kernel package and should be removed along with it.
+
+For "unified kernel images", the _kernel install_ creates the combined image and drops it into `$BOOT/EFI/Linux`. This file is also private property of the kernel package and should be removed along with it.
+
+A _UI application_ intended to show available boot options shall operate similarly to a boot loader, but might apply additional filters, for example by filtering out the booted OS via the machine ID, or by suppressing all but the newest kernel versions.
+
+An _OS installer_ picks the right place for `$BOOT` as defined above (possibly creating a partition and file system for it) and pre-creates the `/loader/entries/` directory in it. It then installs an appropriate boot loader that can read these snippets. Finally, it installs one or more kernel packages.
+
+
+## Out of Focus
+
+There are a couple of items that are out of focus for this specification:
+
+* If userspace can figure out the available boot options, then this is only of limited use: we'd still need to come up with a way for userspace to communicate the default boot loader entry to the boot loader, temporarily or persistently. Defining a common scheme for this is certainly a good idea, but out of focus for this specification.
+* This specification is just about "Free" Operating systems. Hooking in other operating systems (like Windows and macOS) into the boot menu is a different story and should probably happen outside of this specification. For example, boot loaders might choose to detect other available OSes dynamically at runtime without explicit configuration (like <strike>Gummiboot</strike> systemd-boot does it), or via native configuration (for example via explicit Grub2 configuration generated once at installation).
+* This specification leaves undefined what to do about systems which are upgraded from an OS that does not implement this specification. As the previous boot loader logic was largely handled in distribution-specific ways, we probably should leave the upgrade path (and whether there actually is one) to the distributions. The simplest solution might be to simply continue with the old scheme for old installations and use this new scheme only for new installations.
+
+
+## Links
+
+[systemd-boot(7)](https://www.freedesktop.org/software/systemd/man/systemd-boot.html)<br>
+[bootctl(1)](https://www.freedesktop.org/software/systemd/man/bootctl.html)
+
+[Obsolete patch adding Boot Loader Specification support to GNU grub 2](http://pkgs.fedoraproject.org/cgit/grub2.git/tree/0460-blscfg-add-blscfg-module-to-parse-Boot-Loader-Specif.patch?h=f20)
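As an illustration of the entry format described under "Boot loader specification entries" in the file above, a minimal parser might look like the following Python sketch. It is not part of this commit or of systemd. The names `parse_entry`, `is_valid` and `MULTI_KEYS` are hypothetical, and two behaviours are assumptions that the specification text does not settle: leading whitespace on a line is stripped, and a repeated single-valued key keeps its last value.

```python
# Keys that the specification says may appear more than once.
MULTI_KEYS = {"initrd", "options"}

EXAMPLE = """\
# /boot/loader/entries/6a9857a393724b7a981ebb5b8495b9ea-3.8.0-2.fc19.x86_64.conf
title        Fedora 19 (Rawhide)
version      3.8.0-2.fc19.x86_64
machine-id   6a9857a393724b7a981ebb5b8495b9ea
options      root=UUID=6d3376e4-fc93-4509-95ec-a21d68011da2
linux        /6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/linux
initrd       /6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/initrd
"""


def parse_entry(text):
    """Parse the text of one drop-in snippet into a dict.

    Repeatable keys map to a list of values in file order. All other
    keys map to a single string (assumption: the last occurrence wins).
    """
    entry = {}
    for raw in text.splitlines():
        line = raw.strip()  # assumption: surrounding whitespace is ignored
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no data
        parts = line.split(None, 1)  # first word is the key, rest is the value
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if key in MULTI_KEYS:
            entry.setdefault(key, []).append(value)
        else:
            entry[key] = value
    return entry


def is_valid(entry):
    """A snippet must include at least a `linux` or an `efi` key."""
    return "linux" in entry or "efi" in entry


if __name__ == "__main__":
    parsed = parse_entry(EXAMPLE)
    print(parsed["title"], is_valid(parsed))
```

A real boot loader would additionally resolve the `linux`, `initrd` and `efi` paths relative to `$BOOT` and convert "/" to "\" on EFI, which this sketch leaves out.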
diff --git a/docs/CGROUP_DELEGATION.md b/docs/CGROUP_DELEGATION.md
new file mode 100644
index 000000000..63d9d41b1
--- /dev/null
+++ b/docs/CGROUP_DELEGATION.md
@@ -0,0 +1,456 @@
+# Control Group APIs and Delegation
+
+*Intended audience: hackers working on userspace subsystems that require direct
+cgroup access, such as container managers and similar.*
+
+So you are wondering about resource management with systemd, you know Linux
+control groups (cgroups) a bit and are trying to integrate your software with
+what systemd has to offer there. Here's a bit of documentation about the
+concepts and interfaces involved with this.
+
+What's described here has been part of systemd and documented since v205.
+However, it has been updated and improved substantially since then, even
+though the concepts stayed mostly the same. This is an attempt to provide more
+comprehensive up-to-date information about all this, in particular in light of
+the poor implementations of the components interfacing with systemd in current
+container managers.
+
+Before you read on, please make sure you read the low-level [kernel
+documentation about
+cgroupsv2](https://www.kernel.org/doc/Documentation/cgroup-v2.txt). This
+documentation then adds in the higher-level view from systemd.
+
+This document augments the existing documentation we already have:
+
+* [The New Control Group Interfaces](https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/)
+* [Writing VM and Container Managers](https://www.freedesktop.org/wiki/Software/systemd/writing-vm-managers/)
+
+These wiki documents are not as up to date as they should be, currently, but
+the basic concepts still fully apply. You should read them too, if you do something
+with cgroups and systemd, in particular as they shine more light on the various
+D-Bus APIs provided. (That said, sooner or later we should probably fold that
+wiki documentation into this very document, too.)
+
+## Two Key Design Rules
+
+Much of the philosophy behind these concepts is based on a couple of basic
+design ideas of cgroupsv2 (which we however try to adapt as far as we can to
+cgroupsv1 too). Specifically two cgroupsv2 rules are the most relevant:
+
+1. The **no-processes-in-inner-nodes** rule: this means that it's not permitted
+to have processes directly attached to a cgroup that also has child cgroups and
+vice versa. A cgroup is either an inner node or a leaf node of the tree, and if
+it's an inner node it may not contain processes directly, and if it's a leaf
+node then it may not have child cgroups. (Note that there are some minor
+exceptions to this rule, though. E.g. the root cgroup is special and allows
+both processes and children, which is used in particular to maintain kernel
+threads.)
+
+2. The **single-writer** rule: this means that each cgroup only has a single
+writer, i.e. a single process managing it. It's OK if different cgroups have
+different processes managing them. However, only a single process should own a
+specific cgroup, and when it does that ownership is exclusive, and nothing else
+should manipulate it at the same time. This rule ensures that various pieces of
+software don't step on each other's toes constantly.
+
+These two rules have various effects. For example, one corollary of this is: if
+your container manager creates and manages cgroups in the system's root cgroup
+you violate rule #2, as the root cgroup is managed by systemd and hence off
+limits to everybody else.
+
+Note that rule #1 is generally enforced by the kernel if cgroupsv2 is used: as
+soon as you add a process to a cgroup it is ensured the rule is not
+violated. On cgroupsv1 this rule didn't exist, and hence isn't enforced, even
+though it's a good thing to follow it then too. Rule #2 is not enforced on
+either cgroupsv1 or cgroupsv2 (this is UNIX after all, in the general case
+root can do anything, modulo SELinux and friends), but if you ignore it you'll
+be in constant pain as various pieces of software will fight over cgroup
+ownership.
+
+Note that cgroupsv1 is currently the most deployed implementation, even though
+it's semantically broken in many ways, and in many cases doesn't actually do
+what people think it does. cgroupsv2 is where things are going, and most new
+kernel features in this area are only added to cgroupsv2, and not cgroupsv1
+anymore. For example cgroupsv2 provides proper cgroup-empty notifications, has
+support for all kinds of per-cgroup BPF magic, supports secure delegation of
+cgroup trees to less privileged processes and so on, which all are not
+available on cgroupsv1.
+
+## Three Different Tree Setups 🌳
+
+systemd supports three different modes of setting up cgroups. Specifically:
+
+1. **Unified**: this is the simplest mode, and exposes a pure cgroupsv2
+logic. In this mode `/sys/fs/cgroup` is the only mounted cgroup API file system
+and all available controllers are exclusively exposed through it.
+
+2. **Legacy**: this is the traditional cgroupsv1 mode. In this mode the
+various controllers each get their own cgroup file system mounted to
+`/sys/fs/cgroup/<controller>/`. On top of that systemd manages its own cgroup
+hierarchy for managing purposes as `/sys/fs/cgroup/systemd/`.
+
+3. **Hybrid**: this is a hybrid between the unified and legacy mode. It's set
+up mostly like legacy, except that there's also an additional hierarchy
+`/sys/fs/cgroup/unified/` that contains the cgroupsv2 hierarchy. In this mode
+compatibility with cgroupsv1 is retained while some cgroupsv2 features are
+available too. This mode is a stopgap. Don't bother with this too much unless
+you have too much free time.
+
+To say this clearly, legacy and hybrid modes have no future. If you develop
+software today and don't focus on the unified mode, then you are writing
+software for yesterday, not tomorrow. They are primarily supported for
+compatibility reasons and will not receive new features. Sorry.
+
+Superficially, in legacy and hybrid modes it might appear that the parallel
+cgroup hierarchies for each controller are orthogonal to each other. In
+systemd they are not: the hierarchies of all controllers are always kept in
+sync (at least mostly: sub-trees might be suppressed in certain hierarchies if
+no controller usage is required for them). The fact that systemd keeps these
+hierarchies in sync means that the legacy and hybrid hierarchies are
+conceptually very close to the unified hierarchy. In particular this allows us
+to talk of one specific cgroup and actually mean the same cgroup in all
+available controller hierarchies. E.g. if we talk about the cgroup `/foo/bar/`
+then we actually mean `/sys/fs/cgroup/cpu/foo/bar/` as well as
+`/sys/fs/cgroup/memory/foo/bar/`, `/sys/fs/cgroup/pids/foo/bar/`, and so on.
+Note that in cgroupsv2 the controller hierarchies aren't orthogonal, hence
+thinking about them as orthogonal won't help you in the long run anyway.
+
+If you wonder how to detect which of these three modes is currently used, use
+`statfs()` on `/sys/fs/cgroup/`. If it reports `CGROUP2_SUPER_MAGIC` in its
+`.f_type` field, then you are in unified mode. If it reports `TMPFS_MAGIC` then
+you are either in legacy or hybrid mode. To distinguish these two cases, run
+`statfs()` again on `/sys/fs/cgroup/unified/`. If that succeeds and reports
+`CGROUP2_SUPER_MAGIC` you are in hybrid mode, otherwise not.
+
+## systemd's Unit Types
+
+The low-level kernel cgroups feature is exposed in systemd in three different
+"unit" types. Specifically:
+
+1. 💼 The `.service` unit type. This unit type is for units encapsulating
+ processes systemd itself starts. Units of these types have cgroups that are
+ the leaves of the cgroup tree the systemd instance manages (though possibly
+ they might contain a sub-tree of their own managed by something else, made
+ possible by the concept of delegation, see below). Service units are usually
+ instantiated based on a unit file on disk that describes the command line to
+ invoke and other properties of the service. However, service units may also
+ be declared and started programmatically at runtime through a D-Bus API
+ (such units are called *transient* services).
+
+2. 👓 The `.scope` unit type. This is very similar to `.service`. The main
+ difference: the processes the units of this type encapsulate are forked off
+ by some unrelated manager process, and that manager asked systemd to expose
+ them as a unit. Unlike services, scopes can only be declared and started
+ programmatically, i.e. are always transient. That's because they encapsulate
+ processes forked off by something else, i.e. existing runtime objects, and
+ hence cannot really be defined fully in 'offline' concepts such as unit
+ files.
+
+3. 🔪 The `.slice` unit type. Units of this type do not directly contain any
+ processes. Units of this type are the inner nodes of part of the cgroup tree
+ the systemd instance manages. Much like services, slices can be defined
+ either on disk with unit files or programmatically as transient units.
+
+Slices expose the trunk and branches of a tree, and scopes and services are
+attached to those branches as leaves. The idea is that scopes and services can
+be moved around though, i.e. assigned to a different slice if needed.
+
+The naming of slice units directly maps to the cgroup tree path. This is not
+the case for service and scope units however. A slice named `foo-bar-baz.slice`
+maps to a cgroup `/foo.slice/foo-bar.slice/foo-bar-baz.slice/`. A service
+`quux.service` which is attached to the slice `foo-bar-baz.slice` maps to the
+cgroup `/foo.slice/foo-bar.slice/foo-bar-baz.slice/quux.service/`.
+
+By default systemd sets up four slice units:
+
+1. `-.slice` is the root slice, i.e. the parent of everything else. On the host
+ system it maps directly to the top-level directory of cgroupsv2.
+
+2. `system.slice` is where system services are by default placed, unless
+ configured otherwise.
+
+3. `user.slice` is where user sessions are placed. Each user gets a slice of
+ its own below that.
+
+4. `machines.slice` is where VMs and containers are supposed to be
+ placed. `systemd-nspawn` makes use of this by default, and you're very welcome
+ to place your containers and VMs there too if you hack on managers for those.
+
+Users may define any number of additional slices they like though, the four
+above are just the defaults.
+
+## Delegation
+
+Container managers and suchlike often want to control cgroups directly using
+the raw kernel APIs. That's entirely fine and supported, as long as proper
+*delegation* is followed. Delegation is a concept we inherited from cgroupsv2,
+but we expose it on cgroupsv1 too. Delegation means that some parts of the
+cgroup tree may be managed by different managers than others. As long as it is
+clear which manager manages which part of the tree each one can do within its
+sub-graph of the tree whatever it wants.
+
+Only sub-trees can be delegated (though whoever decides to request a sub-tree
+can delegate sub-sub-trees further to somebody else if they like). Delegation
+takes place at a specific cgroup: in systemd there's a `Delegate=` property you
+can set for a service or scope unit. If you do, it's the cut-off point for
+systemd's cgroup management: the unit itself is managed by systemd, i.e. all
+its attributes are managed exclusively by systemd, however your program may
+create/remove sub-cgroups inside it freely, and those then become exclusive
+property of your program; systemd won't touch them, and all attributes of *those*
+sub-cgroups can be manipulated freely and exclusively by your program.
+
+By turning on the `Delegate=` property for a scope or service you get a few
+guarantees:
+
+1. systemd won't fiddle with your sub-tree of the cgroup tree anymore. It won't
+ change attributes of any cgroups below it, nor will it create or remove any
+ cgroups thereunder, nor migrate processes across the boundaries of that
+ sub-tree as it deems useful anymore.
+
+2. If your service makes use of the `User=` functionality, then the sub-tree
+ will be `chown()`ed to the indicated user so that it can correctly create
+ cgroups below it. Note however that systemd will do that only in the unified
+ hierarchy (in unified and hybrid mode) as well as on systemd's own private
+ hierarchy (in legacy and hybrid mode). It won't pass ownership of the legacy
+ controller hierarchies. Delegation to less privileged processes is not safe
+ in cgroupsv1 (as a limitation of the kernel), hence systemd won't facilitate
+ access to it.
+
+3. Any BPF IP filter programs systemd installs will be installed with
+ `BPF_F_ALLOW_MULTI` so that your program can install additional ones.
+
+In unit files the `Delegate=` property is superficially exposed as
+boolean. However, since v236 it optionally takes a list of controller names
+instead. If so, delegation is requested for listed controllers
+specifically. Note that this only encodes a request. Depending on various
+parameters it might happen that your service actually will get fewer
+controllers delegated (for example, because the controller is not available on
+the current kernel or was turned off) or more. If no list is specified
+(i.e. the property simply set to `yes`) then all available controllers are
+delegated.
+
+Let's stress one thing: delegation is available on scope and service units
+only. It's expressly not available on slice units. Why? Because slice units are
+our *inner* nodes of the cgroup trees and we freely attach services and scopes
+to them. If we allowed delegation on slice units then this would mean that
+both systemd and your own manager would create/delete cgroups below the slice
+unit and that conflicts with the single-writer rule.
+
+So, if you want to do your own raw cgroups kernel level access, then allocate a
+scope unit, or a service unit (or just use the service unit you already have
+for your service code), and turn on delegation for it.
+
+## Three Scenarios
+
+Let's say you write a container manager, and you wonder what to do regarding
+cgroups for it, as you want your manager to be able to run on systemd systems.
+
+You basically have three options:
+
+1. 😊 The *integration-is-good* option. For this, you register each container
+ you have either as a systemd service (i.e. let systemd invoke the executor
+ binary for you) or a systemd scope (i.e. your manager executes the binary
+ directly, but then tells systemd about it). In this mode the administrator
+ can use the usual systemd resource management and reporting commands
+ individually on those containers. By turning on `Delegate=` for these scopes
+ or services you make it possible to run cgroup-enabled programs in your
+ containers, for example a nested systemd instance. This option has two
+ sub-options:
+
+ a. You transiently register the service or scope by directly contacting
+ systemd via D-Bus. In this case systemd will just manage the unit for you
+ and nothing else.
+
+ b. Instead you register the service or scope through `systemd-machined`
+ (also via D-Bus). This mini-daemon is basically just a proxy for the same
+ operations as in a. The main benefit of this: this way you let the system
+ know that what you are registering is a container, and this opens up
+ certain additional integration points. For example, `journalctl -M` can
+ then be used to directly look into any container's journal logs (should
+ the container run systemd inside), or `systemctl -M` can be used to
+ directly invoke systemd operations inside the containers. Moreover tools
+ like "ps" can then show you to which container a process belongs (`ps -eo
+ pid,comm,machine`), and even gnome-system-monitor supports it.
+
+2. 🙁 The *i-like-islands* option. If all you care about is your own cgroup tree,
+ and you want to have as little to do with systemd as possible and have no
+ interest in integration with the rest of the system, then this is a valid
+ option. For this all you have to do is turn on `Delegate=` for your main
+ manager daemon. Then figure out the cgroup systemd placed your daemon in:
+ you can now freely create sub-cgroups beneath it. Don't forget the
+ *no-processes-in-inner-nodes* rule however: you have to move your main
+ daemon process out of that cgroup (and into a sub-cgroup) before you can
+ start further processes in any of your sub-cgroups.
+
+3. 🙁 The *i-like-continents* option. In this option you'd leave your manager
+ daemon where it is, and would not turn on delegation on its unit. However,
+ as the first thing you register a new scope unit with systemd, and that scope
+ unit would have `Delegate=` turned on, and then you place all your
+ containers underneath it. From systemd's PoV there'd be two units: your
+ manager service and the big scope that contains all your containers in one.
+
+BTW: if for whatever reason you say "I hate D-Bus, I'll never call any D-Bus
+API, kthxbye", then options #1 and #3 are not available, as they generally
+involve talking to systemd from your program code, via D-Bus. You still have
+option #2 in that case however, as you can simply set `Delegate=` in your
+service's unit file and you are done and have your own sub-tree. In fact, #2 is
+the one option that allows you to completely ignore systemd's existence: you
+can entirely generically follow the single rule that you just use the cgroup
+you are started in, and everything below it, whatever that might be. That said,
+maybe if you dislike D-Bus and systemd that much, the better approach might be
+to work on that, and widen your horizon a bit. You are welcome.
+
+## Controller Support
+
+systemd supports a number of controllers (but not all). Specifically, supported
+are:
+
+* on cgroupsv1: `cpu`, `cpuacct`, `blkio`, `memory`, `devices`, `pids`
+* on cgroupsv2: `cpu`, `io`, `memory`, `pids`
+
+It is our intention to natively support all cgroupsv2 controllers as they are
+added to the kernel. However, regarding cgroupsv1: at this point we will not
+add support for any other controllers anymore. This means systemd currently
+does not and will never manage the following controllers on cgroupsv1:
+`freezer`, `cpuset`, `net_cls`, `perf_event`, `net_prio`, `hugetlb`. Why not?
+Depending on the case, either their API semantics or implementations aren't
+really usable, or it's very clear they have no future on cgroupsv2, and we
+won't add new code for stuff that clearly has no future.
+
+Effectively this means that all those mentioned cgroupsv1 controllers are up
+for grabs: systemd won't manage them, and hence won't delegate them to your
+code (however, systemd will still mount their hierarchies, simply because it
+mounts all controller hierarchies it finds available in the kernel). If you
+decide to use them, then that's fine, but systemd won't help you with it (but
+also not interfere with it). To be nice to other tenants it might be wise to
+replicate the cgroup hierarchies of the other controllers in them too however,
+but of course that's between you and those other tenants, and systemd won't
+care. Replicating the cgroup hierarchies in those unsupported controllers would
+mean replicating the full cgroup paths in them, and hence the prefixing
+`.slice` components too, otherwise the hierarchies will start being orthogonal
+after all, and that's not really desirable. One more thing: systemd will clean
+up after you in the hierarchies it manages: if your daemon goes down, its
+cgroups will be removed too. You basically get the guarantee that you start
+with a pristine cgroup sub-tree for your service or scope whenever it is
+started. This is not the case however in the hierarchies systemd doesn't
+manage. This means that your programs should be ready to deal with left-over
+cgroups in them from previous runs, and be extra careful with them as they
+might still carry settings that might not be valid anymore.
+
+Note a particular asymmetry here: if your systemd version doesn't support a
+specific controller on cgroupsv1 you can still make use of it for delegation,
+by directly fiddling with its hierarchy and replicating the cgroup tree there
+as necessary (as suggested above). However, on cgroupsv2 this is different:
+separately mounted hierarchies are not available, and delegation always has to
+happen through systemd itself. This means: when you update your kernel and it
+adds a new, so far unseen controller, and you want to use it for delegation,
+then you also need to update systemd to a version that groks it.
+
+## systemd as Container Payload
+
+systemd can happily run as a container payload's PID 1. Note that systemd
+unconditionally needs write access to the cgroup tree however, hence you need
+to delegate a sub-tree to it. Note that there's nothing too special you have to
+do beyond that: just invoke systemd as PID 1 inside the root of the delegated
+cgroup sub-tree, and it will figure out the rest: it will determine the cgroup
+it is running in and take possession of it. It won't interfere with any cgroup
+outside of the sub-tree it was invoked in. Use of `CLONE_NEWCGROUP` is hence
+optional (but of course wise).
+
+Note one particular asymmetry here though: systemd will try to take possession
+of the root cgroup you pass to it *in* *full*, i.e. it will not only
+create/remove child cgroups below it, it will also attempt to manage the
+attributes of it. OTOH as mentioned above, when delegating a cgroup tree to
+somebody else it only passes the rights to create/remove sub-cgroups, but will
+insist on managing the delegated cgroup tree's top-level attributes. Or in
+other words: systemd is *greedy* when accepting delegated cgroup trees and also
+*greedy* when delegating them to others: it insists on managing attributes on
+the specific cgroup in both cases. A container manager that is itself a payload
+of a host systemd which wants to run a systemd as its own container payload
+instead hence needs to insert an extra level in the hierarchy in between, so
+that the systemd on the host and the one in the container won't fight for the
+attributes. That said, you likely should do that anyway, due to the
+no-processes-in-inner-cgroups rule, see below.
+
+When systemd runs as container payload it will make use of all hierarchies it
+has write access to. For legacy mode you need to make at least
+`/sys/fs/cgroup/systemd/` available, all other hierarchies are optional. For
+hybrid mode you need to add `/sys/fs/cgroup/unified/`. Finally, for fully
+unified you (of course, I guess) need to provide only `/sys/fs/cgroup/` itself.
+
+## Some Dos
+
+1. ⚡ If you go for implementation option 1a or 1b (as in the list above), then
+ each of your containers will have its own systemd-managed unit and hence
+ cgroup with possibly further sub-cgroups below. Typically the first process
+ running in that unit will be some kind of executor program, which will in
+ turn fork off the payload processes of the container. In this case don't
+ forget that there are two levels of delegation involved: first, systemd
+ delegates a cgroup sub-tree to your executor. And then your executor should
+ delegate a sub-tree further down to the container payload. Oh, and because
+ of the no-processes-in-inner-nodes rule, your executor needs to migrate itself
+ to a sub-cgroup of the cgroup it got delegated, too. Most likely you hence
+ want a two-pronged approach: below the cgroup you got started in, you want
+ one cgroup maybe called `supervisor/` where your manager runs in and then
+ for each container a sibling cgroup of that maybe called `payload-xyz/`.
+
+2. ⚡ Don't forget that the cgroups you create have to have names that are
+ suitable as UNIX file names, and that they live in the same namespace as the
+ various kernel attribute files. Hence, when you want to allow the user
+ arbitrary naming, you might need to escape some of the names (for example,
+ you really don't want to create a cgroup named `tasks`, just because the
+ user created a container by that name, because `tasks` after all is a magic
+ attribute in cgroupsv1, and your `mkdir()` will hence fail with `EEXIST`). In
+ systemd we do escaping by prefixing names that might collide with a kernel
+ attribute name with an underscore. You might want to do the same, but this
+ is really up to you how you do it. Just do it, and be careful.
+
+## Some Don'ts
+
+1. 🚫 Never create your own cgroups below arbitrary cgroups systemd manages, i.e.
+ cgroups you haven't set `Delegate=` in. Specifically: 🔥 don't create your
+ own cgroups below the root cgroup 🔥. That's owned by systemd, and you will
+ step on systemd's toes if you ignore that, and systemd will step on
+ yours. Get your own delegated sub-tree, you may create as many cgroups there
+ as you like. Seriously, if you create cgroups directly in the cgroup root,
+ then all you do is ask for trouble.
+
+2. 🚫 Don't attempt to set `Delegate=` in slice units, and in particular not in
+ `-.slice`. It's not supported, and will generate an error.
+
+3. 🚫 Never *write* to any of the attributes of a cgroup systemd created for
+ you. It's systemd's private property. You are welcome to manipulate the
+ attributes of cgroups you created in your own delegated sub-tree, but the
+ cgroup tree of systemd itself is off limits for you. It's fine to *read*
+ from any attribute you like however. That's totally OK and welcome.
+
+4. 🚫 If you don't use `CLONE_NEWCGROUP` when delegating a sub-tree to a
+ container payload running systemd, then don't get the idea that you can bind
+ mount only a sub-tree of the host's cgroup tree into the container. Part of
+ the cgroup API is that `/proc/$PID/cgroup` reports the cgroup path of every
+ process, and hence any path below `/sys/fs/cgroup/` needs to match what
+ `/proc/$PID/cgroup` of the payload processes reports. What you can do safely
+ however, is mount the upper parts of the cgroup tree read-only (or even
+ replace the middle bits with an intermediary `tmpfs`, but be careful not to
+ break the `statfs()` detection logic discussed above), as long as the path
+ to the delegated sub-tree remains accessible as-is.
+
+5. ⚡ Currently, the algorithm for mapping between slice/scope/service unit
+ naming and their cgroup paths is not considered public API of systemd, and
+ may change in future versions. This means: it's best to avoid implementing a
+ local logic of translating cgroup paths to slice/scope/service names in your
+ program, or vice versa: it's likely going to break sooner or later. Use the
+ appropriate D-Bus API calls for that instead, so that systemd translates
+ this for you. (Specifically: each Unit object has a `ControlGroup` property
+ to get the cgroup for a unit. The method `GetUnitByControlGroup()` may be
+ used to get the unit for a cgroup.)
+
+6. ⚡ Think twice before delegating cgroupsv1 controllers to less privileged
+ containers. It's not safe, you basically allow your containers to freeze the
+ system with that and worse. Delegation is a strong point of cgroupsv2 though,
+ and there it's safe to treat delegation boundaries as privilege boundaries.
+
+And that's it for now. If you have further questions, refer to the systemd
+mailing list.
+
+-- Berlin, 2018-04-20
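
As an illustration of the name-escaping advice in item 2 of "Some Dos" in CGROUP_DELEGATION.md above, here is a minimal C sketch of prefixing colliding names with an underscore. It is not systemd's actual implementation: the helper names, the error codes, and the lists of reserved names and attribute prefixes are assumptions made for this example (the prefixes simply mirror the controllers the document lists as supported), and a real manager would need to check them against the kernels it targets.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if a cgroup directory name could be
 * mistaken for a kernel attribute file. Both lists are illustrative only. */
static bool name_needs_escape(const char *name) {
        static const char *const reserved[] = {
                "tasks", "notify_on_release", "release_agent",
        };
        static const char *const prefixes[] = {
                "cgroup.", "cpu.", "cpuacct.", "blkio.", "io.",
                "memory.", "devices.", "pids.",
        };
        size_t i;

        assert(name);

        /* Escape names that already start with "_", so unescaping stays unambiguous */
        if (name[0] == '_')
                return true;

        for (i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++)
                if (strcmp(name, reserved[i]) == 0)
                        return true;

        for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++)
                if (strncmp(name, prefixes[i], strlen(prefixes[i])) == 0)
                        return true;

        return false;
}

/* Hypothetical helper: writes the escaped form of "name" to "buf". Returns 0
 * on success, -EINVAL if the name is not usable as a file name, and -ENOBUFS
 * if "buf" is too small. "buf" is left untouched on failure. */
static int escape_cgroup_name(const char *name, char *buf, size_t size) {
        bool escape;

        assert(name);
        assert(buf);

        if (name[0] == 0 || strchr(name, '/') ||
            strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
                return -EINVAL;

        escape = name_needs_escape(name);
        if (strlen(name) + (escape ? 1 : 0) + 1 > size)
                return -ENOBUFS;

        if (escape)
                *buf++ = '_';
        strcpy(buf, name);
        return 0;
}
```

A container manager could run a user-chosen container name through `escape_cgroup_name()` and pass the result to `mkdir()` inside its delegated sub-tree, so that a container called `tasks` ends up in a cgroup named `_tasks`.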
diff --git a/docs/CODE_QUALITY.md b/docs/CODE_QUALITY.md
new file mode 100644
index 000000000..e673ad58a
--- /dev/null
+++ b/docs/CODE_QUALITY.md
@@ -0,0 +1,64 @@
+# Code Quality Tools
+
+The systemd project has a number of code quality tools set up in the source
+tree and on the GitHub infrastructure. Here's a non-exhaustive list of the
+available functionality:
+
+1. Use `ninja -C build test` to run the unit tests. Some tests are skipped if
+ no privileges are available, hence consider also running them with `sudo
+ ninja -C build test`. A couple of unit tests are considered "unsafe" (as
+ they change system state); to run those too, build with `meson
+ -Dtests=unsafe`. Finally, some unit tests are considered to be very slow;
+ build them too with `meson -Dslow-tests=true`. (Note that there are a couple
+ of manual tests in addition to these unit tests.)
+
+2. Use `./test/run-integration-tests.sh` to run the full integration test
+ suite. This will build OS images with a number of integration tests and run
+ them in nspawn and qemu. Requires root.
+
+3. Use `./coccinelle/run-coccinelle.sh` to run all
+ [Coccinelle](http://coccinelle.lip6.fr/) semantic patch scripts we ship. The
+ output will show false positives, hence take it with a pinch of salt.
+
+4. Use `./tools/find-double-newline.sh recdiff` to find double newlines. Use
+ `./tools/find-double-newline.sh recpatch` to fix them. Take this with a grain
+ of salt, in particular as we generally leave foreign header files we include in
+ our tree unmodified, if possible.
+
+5. Similarly, use `./tools/find-tabs.sh recdiff` to find TABs, and
+ `./tools/find-tabs.sh recpatch` to fix them. (Again, grain of salt, foreign
+ headers should usually be left unmodified.)
+
+6. Use `ninja -C build check-api-docs` to compare the list of exported
+ symbols of `libsystemd.so` and `libudev.so` with the list of man pages. Symbols
+ lacking documentation are highlighted.
+
+7. Use `ninja -C build hwdb-update` to automatically download and import the
+ PCI, USB and OUI databases into hwdb.
+
+8. Use `ninja -C build man/update-man-rules` to update the meson rules for
+ building man pages automatically from the docbook XML files included in
+ `man/`.
+
+9. There are multiple CI systems in use that run on every GitHub PR submission.
+
+10. [Coverity](https://scan.coverity.com/) analyzes systemd master at
+ regular intervals. The reports are available
+ [online](https://scan.coverity.com/projects/systemd).
+
+11. [oss-fuzz](https://oss-fuzz.com/) is continuously fuzzing the
+ codebase. Reports are available
+ [online](https://oss-fuzz.com/v2/testcases?project=systemd).
+
+12. Our tree includes `.editorconfig`, `.dir-locals.el` and `.vimrc` files, to
+ ensure that editors follow the right indentation styles automatically.
+
+13. When building systemd from a git checkout the build scripts will
+ automatically enable a git commit hook that ensures whitespace cleanliness.
+
+14. [LGTM](https://lgtm.com/) analyzes every commit pushed to master. The list
+ of active alerts can be found at
+ https://lgtm.com/projects/g/systemd/systemd/alerts/?mode=list.
+
+Access to Coverity and oss-fuzz reports is limited. Please reach out to the
+maintainers if you need access.
diff --git a/docs/CODING_STYLE b/docs/CODING_STYLE
new file mode 100644
index 000000000..26928d2e2
--- /dev/null
+++ b/docs/CODING_STYLE
@@ -0,0 +1,460 @@
+- 8ch indent, no tabs, except for files in man/ which are 2ch indent,
+ and still no tabs
+
+- We prefer /* comments */ over // comments in code you commit, please. This
+ way // comments are left for developers to use for local, temporary
+ commenting of code for debug purposes (i.e. uncommittable stuff), making such
+ comments easily discernable from explanatory, documenting code comments
+ (i.e. committable stuff).
+
+- Don't break code lines too eagerly. We do *not* force line breaks at 80ch,
+ all of today's screens should be much larger than that. But then again, don't
+ overdo it, ~119ch should be enough really. The .editorconfig, .vimrc and
+ .dir-locals.el files contained in the repository will set this limit up for
+ you automatically, if you let them (as well as a few other things).
+
+- Variables and functions *must* be static, unless they have a
+ prototype, and are supposed to be exported.
+
+- structs in MixedCase (with exceptions, such as public API structs),
+ variables + functions in lower_case.
+
+- The destructors always unregister the object from the next bigger
+ object, not the other way around
+
+- To minimize strict aliasing violations, we prefer unions over casting
+
+- For robustness reasons, destructors should be able to destruct
+ half-initialized objects, too
+
+- Error codes are returned as negative Exxx. e.g. return -EINVAL. There
+ are some exceptions: for constructors, it is OK to return NULL on
+ OOM. For lookup functions, NULL is fine too for "not found".
+
+ Be strict with this. When you write a function that can fail due to
+ more than one cause, it *really* should have "int" as return value
+ for the error code.
+
+- Do not bother with error checking whether writing to stdout/stderr
+ worked.
+
+- Do not log errors from "library" code, only do so from "main
+ program" code. (With one exception: it is OK to log with DEBUG level
+ from any code, with the exception of maybe inner loops).
+
+- Always check OOM. There is no excuse. In program code, you can use
+ "log_oom()" for then printing a short message, but not in "library" code.
+
+- Do not issue NSS requests (that includes user name and host name
+ lookups) from PID 1 as this might trigger deadlocks when those
+ lookups involve synchronously talking to services that we would need
+ to start up
+
+- Do not synchronously talk to any other service from PID 1, due to
+ risk of deadlocks
+
+- Avoid fixed-size string buffers, unless you really know the maximum
+ size and that maximum size is small. They are a source of errors,
+ since they possibly result in truncated strings. It is often nicer
+ to use dynamic memory, alloca() or VLAs. If you do allocate fixed-size
+ strings on the stack, then it is probably only OK if you either
+ use a maximum size such as LINE_MAX, or count in detail the maximum
+ size a string can have. (DECIMAL_STR_MAX and DECIMAL_STR_WIDTH
+ macros are your friends for this!)
+
+ Or in other words, if you use "char buf[256]" then you are likely
+ doing something wrong!
+
+- Stay uniform. For example, always use "usec_t" for time
+ values. Do not mix usec and msec, and usec and whatnot.
+
+- Make use of _cleanup_free_ and friends. It makes your code much
+ nicer to read (and shorter)!
+
+- Be exceptionally careful when formatting and parsing floating point
+ numbers. Their syntax is locale dependent (i.e. "5.000" in en_US is
+ generally understood as 5, while in de_DE as 5000.).
+
+- Try to use this:
+
+ void foo() {
+ }
+
+ instead of this:
+
+ void foo()
+ {
+ }
+
+ But it is OK if you do not.
+
+- Single-line "if" blocks should not be enclosed in {}. Use this:
+
+ if (foobar)
+         waldo();
+
+ instead of this:
+
+ if (foobar) {
+         waldo();
+ }
+
+- Do not write "foo ()", write "foo()".
+
+- Please use streq() and strneq() instead of strcmp(), strncmp() where
+ applicable (i.e. wherever you just care about equality/inequality, not about
+ the sorting order).
+
+- Preferably allocate stack variables on the top of the block:
+
+ {
+         int a, b;
+
+         a = 5;
+         b = a;
+ }
+
+- Unless you allocate an array, "double" is always the better choice
+ than "float". Processors speak "double" natively anyway, so there is
+ no speed benefit, and on calls like printf() "float"s get promoted
+ to "double"s anyway, so there is no point.
+
+- Do not mix function invocations with variable definitions in one
+ line. Wrong:
+
+ {
+         int a = foobar();
+         uint64_t x = 7;
+ }
+
+ Right:
+
+ {
+         int a;
+         uint64_t x = 7;
+
+         a = foobar();
+ }
+
+- Use "goto" for cleaning up, and only use it for that. i.e. you may
+ only jump to the end of a function, and little else. Never jump
+ backwards!
+
+- Think about the types you use. If a value cannot sensibly be
+ negative, do not use "int", but use "unsigned".
+
+- Use "char" only for actual characters. Use "uint8_t" or "int8_t"
+ when you actually mean a byte-sized signed or unsigned
+ integers. When referring to a generic byte, we generally prefer the
+ unsigned variant "uint8_t". Do not use types based on "short". They
+ *never* make sense. Use ints, longs, long longs, all in
+ unsigned+signed fashion, and the fixed size types
+ uint8_t/uint16_t/uint32_t/uint64_t/int8_t/int16_t/int32_t and so on,
+ as well as size_t, but nothing else. Do not use kernel types like
+ u32 and so on, leave that to the kernel.
+
+- Public API calls (i.e. functions exported by our shared libraries)
+ must be marked "_public_" and need to be prefixed with "sd_". No
+ other functions should be prefixed like that.
+
+- In public API calls, you *must* validate all your input arguments for
+ programming error with assert_return() and return a sensible return
+ code. In all other calls, it is recommended to check for programming
+ errors with a more brutal assert(). We are more forgiving to public
+ users than for ourselves! Note that assert() and assert_return()
+ really only should be used for detecting programming errors, not for
+ runtime errors. assert() and assert_return() by usage of _likely_()
+ inform the compiler that it should not expect these checks to fail,
+ and they inform fellow programmers about the expected validity and
+ range of parameters.
+
+- Never use strtol(), atoi() and similar calls. Use safe_atoli(),
+ safe_atou32() and suchlike instead. They are much nicer to use in
+ most cases and correctly check for parsing errors.
+
+- For every function you add, think about whether it is a "logging"
+ function or a "non-logging" function. "Logging" functions do logging
+ on their own, "non-logging" functions never log on their own and
+ expect their callers to log. All functions in "library" code,
+ i.e. in src/shared/ and suchlike must be "non-logging". Every time a
+ "logging" function calls a "non-logging" function, it should log
+ about the resulting errors. If a "logging" function calls another
+ "logging" function, then it should not generate log messages, so
+ that log messages are not generated twice for the same errors.
+
+- Avoid static variables, except for caches and very few other
+ cases. Think about thread-safety! While most of our code is never
+ used in threaded environments, at least the library code should make
+ sure it works correctly in them. Instead of doing a lot of locking
+ for that, we tend to prefer using TLS to do per-thread caching (which
+ only works for small, fixed-size cache objects), or we disable
+ caching for any thread that is not the main thread. Use
+ is_main_thread() to detect whether the calling thread is the main
+ thread.
+
+- Command line option parsing:
+ - Do not print full help() on error, be specific about the error.
+ - Do not print messages to stdout on error.
+ - Do not POSIX_ME_HARDER unless necessary, i.e. avoid "+" in option string.
+
+- Do not write functions that clobber call-by-reference variables on
+ failure. Use temporary variables for these cases and change the
+ passed in variables only on success.
+
+- When you allocate a file descriptor, it should be made O_CLOEXEC
+ right from the beginning, as none of our files should leak to forked
+ binaries by default. Hence, whenever you open a file, O_CLOEXEC must
+ be specified, right from the beginning. This also applies to
+ sockets. Effectively this means that all invocations to:
+
+ a) open() must get O_CLOEXEC passed
+ b) socket() and socketpair() must get SOCK_CLOEXEC passed
+ c) recvmsg() must get MSG_CMSG_CLOEXEC set
+ d) F_DUPFD_CLOEXEC should be used instead of F_DUPFD, and so on
+ e) invocations of fopen() should take "e"
+
+- We never use the POSIX version of basename() (which glibc defines in
+ libgen.h), only the GNU version (which glibc defines in string.h).
+ The only reason to include libgen.h is because dirname()
+ is needed. Every time you need that please immediately undefine
+ basename(), and add a comment about it, so that no code ever ends up
+ using the POSIX version!
+
+- Use the bool type for booleans, not integers. One exception: in public
+ headers (i.e. those in src/systemd/sd-*.h) use integers after all, as "bool"
+ is C99 and in our public APIs we try to stick to C89 (with a few extensions).
+
+- When you invoke certain calls like unlink(), or mkdir_p() and you
+ know it is safe to ignore the error it might return (because a later
+ call would detect the failure anyway, or because the error is in an
+ error path and you thus couldn't do anything about it anyway), then
+ make this clear by casting the invocation explicitly to (void). Code
+ checks like Coverity understand that, and will not complain about
+ ignored error codes. Hence, please use this:
+
+ (void) unlink("/foo/bar/baz");
+
+ instead of just this:
+
+ unlink("/foo/bar/baz");
+
+ Don't cast function calls to (void) that return no error
+ conditions. Specifically, the various xyz_unref() calls that return a NULL
+ object shouldn't be cast to (void), since not using the return value does not
+ hide any errors.
+
+- Don't invoke exit(), ever. It is not a replacement for proper error
+ handling. Please escalate errors up your call chain, and use normal
+ "return" to exit from the main function of a process. If you
+ fork()ed off a child process, please use _exit() instead of exit(),
+ so that the exit handlers are not run.
+
+- Please never use dup(). Use fcntl(fd, F_DUPFD_CLOEXEC, 3)
+ instead. For two reasons: first, you want O_CLOEXEC set on the new fd
+ (see above). Second, dup() will happily duplicate your fd as 0, 1,
+ 2, i.e. stdin, stdout, stderr, should those fds be closed. Given the
+ special semantics of those fds, it's probably a good idea to avoid
+ them. F_DUPFD_CLOEXEC with "3" as parameter avoids them.
+
+- When you define a destructor or unref() call for an object, please
+ accept a NULL object and simply treat this as NOP. This is similar
+ to how libc free() works, which accepts NULL pointers and becomes a
+ NOP for them. By following this scheme a lot of if checks can be
+ removed before invoking your destructor, which makes the code
+ substantially more readable and robust.
+
+- Related to this: when you define a destructor or unref() call for an
+ object, please make it return the same type it takes and always
+ return NULL from it. This allows writing code like this:
+
+ p = foobar_unref(p);
+
+ which will always work regardless of whether p is initialized or not, and
+ guarantees that p is NULL afterwards, all in just one line.
+
+- Use alloca(), but never forget that it is not OK to invoke alloca()
+ within a loop or within function call parameters. alloca() memory is
+ released at the end of a function, and not at the end of a {}
+ block. Thus, if you invoke it in a loop, you keep increasing the
+ stack pointer without ever releasing memory again. (VLAs have better
+ behaviour in this case, so consider using them as an alternative.)
+ Regarding not using alloca() within function parameters, see the
+ BUGS section of the alloca(3) man page.
+
+- Use memzero() or even better zero() instead of memset(..., 0, ...)
+
+- Instead of using memzero()/memset() to initialize structs allocated
+ on the stack, please try to use C99 structure initializers. It's
+ shorter, prettier and actually even faster at execution. Hence:
+
+ struct foobar t = {
+         .foo = 7,
+         .bar = "bazz",
+ };
+
+ instead of:
+
+ struct foobar t;
+ zero(t);
+ t.foo = 7;
+ t.bar = "bazz";
+
+- When returning a return code from main(), please preferably use
+ EXIT_FAILURE and EXIT_SUCCESS as defined by libc.
+
+- The order in which header files are included doesn't matter too
+ much. systemd-internal headers must not rely on an include order, so
+ it is safe to include them in any order possible.
+ However, to not clutter global includes, and to make sure internal
+ definitions will not affect global headers, please always include the
+ headers of external components first (these are all headers enclosed
+ in <>), followed by our own exported headers (usually everything
+ that's prefixed by "sd-"), and then followed by internal headers.
+ Furthermore, in all three groups, order all includes alphabetically
+ so duplicate includes can easily be detected.
+
+- To implement an endless loop, use "for (;;)" rather than "while
+ (1)". The latter is a bit ugly anyway, since you probably really
+ meant "while (true)"... To avoid the discussion about what the right
+ always-true expression for an infinite while() loop is, our
+ recommendation is to simply write it without any such expression by
+ using "for (;;)".
+
+- Never use the "off_t" type, and particularly avoid it in public
+ APIs. It's really weirdly defined, as it usually is 64bit and we
+ don't support it any other way, but it could in theory also be
+ 32bit. Which one it is depends on a compiler switch chosen by the
+ compiled program, which hence corrupts APIs using it unless they can
+ also follow the program's choice. Moreover, in systemd we should
+ parse values the same way on all architectures and cannot expose
+ off_t values over D-Bus. To avoid any confusion regarding conversion
+ and ABIs, always use simply uint64_t directly.
+
+- Commit message subject lines should be prefixed with an appropriate
+ component name of some kind. For example "journal: ", "nspawn: " and
+ so on.
+
+- Do not use "Signed-Off-By:" in your commit messages. That's a kernel
+ thing we don't do in the systemd project.
+
+- Avoid leaving long-running child processes around, i.e. fork()s that
+ are not followed quickly by an execv() in the child. Resource
+ management is unclear in this case, and memory CoW will result in
+ unexpected penalties in the parent much, much later on.
+
+- Don't block execution for arbitrary amounts of time using usleep()
+ or a similar call, unless you really know what you are doing. Just
+ "giving something some time" or so is a lazy excuse. Always wait for the
+ proper event, instead of doing time-based poll loops.
+
+- To determine the length of a constant string "foo", don't bother
+ with sizeof("foo")-1, please use STRLEN() instead.
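+
+ A simplified sketch of the idea follows. The macro below is a
+ stand-in written for this example and is not necessarily how the
+ tree defines STRLEN(); skip_foo_prefix() is a hypothetical helper:
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for STRLEN(); only valid for string literals. */
#define STRLEN(x) (sizeof("" x "") - 1)

/* Hypothetical helper: returns the value part of a "foo=..." assignment, or NULL. */
const char *skip_foo_prefix(const char *s) {
        assert(s);

        if (strncmp(s, "foo=", STRLEN("foo=")) != 0)
                return NULL;

        return s + STRLEN("foo=");
}
```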
+
+- If you want to concatenate two or more strings, consider using
+ strjoin() rather than asprintf(), as the latter is a lot
+ slower. This matters particularly in inner loops.
+
+- Please avoid using global variables as much as you can. And if you
+ do use them make sure they are static at least, instead of
+ exported. Especially in library-like code it is important to avoid
+ global variables. Why are global variables bad? They usually hinder
+ generic reusability of code (since they break in threaded programs,
+ and usually would require locking there), and as the code using them
+ has side-effects make programs non-transparent. That said, there are
+ many cases where they explicitly make a lot of sense, and are OK to
+ use. For example, the log level and target in log.c is stored in a
+ global variable, and that's OK and probably expected by most. Also
+ in many cases we cache data in global variables. If you add more
+ caches like this, please be careful however, and think about
+ threading. Only use static variables if you are sure that
+ thread-safety doesn't matter in your case. Alternatively consider
+ using TLS, which is pretty easy to use with gcc's "thread_local"
+ concept. It's also OK to store data that is inherently global in
+ global variables, for example data parsed from command lines, see
+ below.
+
+- If you parse a command line, and want to store the parsed parameters
+ in global variables, please consider prefixing their names with
+ "arg_". We have been following this naming rule in most of our
+ tools, and we should continue to do so, as it makes it easy to
+ identify command line parameter variables, and makes it clear why it
+ is OK that they are global variables.
+
+- When exposing public C APIs, be careful what function parameters you make
+ "const". For example, a parameter taking a context object should probably not
+ be "const", even if you are writing an otherwise read-only accessor function
+ for it. The reason is that making it "const" fixates the contract that your
+ call won't alter the object ever, as part of the API. However, that's often
+ quite a promise, given that this even prohibits object-internal caching or
+ lazy initialization of object variables. Moreover it's usually not too useful
+ for client applications. Hence: please be careful and avoid "const" on object
+ parameters, unless you are very sure "const" is appropriate.
+
+- Make sure to enforce limits on every user controllable resource. If the user
+ can allocate resources in your code, your code must enforce some form of
+ limits after which it will refuse operation. It's fine if it is hard-coded (at
+ least initially), but it needs to be there. This is particularly important
+ for objects that unprivileged users may allocate, but also matters for
+ everything else any user may allocate.
+
+- htonl()/ntohl() and htons()/ntohs() are weird. Please use htobe32() and
+ htobe16() instead, it's much more descriptive, and actually says what really
+ is happening, after all htonl() and htons() don't operate on longs and
+ shorts as their name would suggest, but on uint32_t and uint16_t. Also,
+ "network byte order" is just a weird name for "big endian", hence we might
+ want to call it "big endian" right-away.
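+
+ For example, a pair of hypothetical helpers that store and load a
+ 32bit value in big endian byte order could look like this (a sketch
+ using glibc's <endian.h>; the helper names are made up):
+
```c
#define _GNU_SOURCE
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Stores v at p in big endian ("network") byte order. */
void write_be32(uint8_t *p, uint32_t v) {
        uint32_t be = htobe32(v);

        memcpy(p, &be, sizeof(be));
}

/* Loads a big endian 32bit value from p and converts it to host byte order. */
uint32_t read_be32(const uint8_t *p) {
        uint32_t be;

        memcpy(&be, p, sizeof(be));
        return be32toh(be);
}
```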
+
+- You might wonder what kind of common code belongs in src/shared/ and what
+ belongs in src/basic/. The split is like this: anything that uses public APIs
+ we expose (i.e. any of the sd-bus, sd-login, sd-id128, ... APIs) must be
+ located in src/shared/. All stuff that only uses external libraries from
+ other projects (such as glibc's APIs), or APIs from src/basic/ itself should
+ be placed in src/basic/. Conversely, src/libsystemd/ may only use symbols
+ from src/basic, but not from src/shared/. To summarize:
+
+ src/basic/ → may be used by all code in the tree
+ → may not use any code outside of src/basic/
+
+ src/libsystemd/ → may be used by all code in the tree, except for code in src/basic/
+ → may not use any code outside of src/basic/, src/libsystemd/
+
+ src/shared/ → may be used by all code in the tree, except for code in src/basic/, src/libsystemd/
+ → may not use any code outside of src/basic/, src/libsystemd/, src/shared/
+
+- Our focus is on the GNU libc (glibc), not any other libcs. If other libcs are
+ incompatible with glibc it's on them. However, if there are equivalent POSIX
+ and Linux/GNU-specific APIs, we generally prefer the POSIX APIs. If there
+ aren't, we are happy to use GNU or Linux APIs, and expect non-GNU
+ implementations of libc to catch up with glibc.
+
+- Whenever installing a signal handler, make sure to set SA_RESTART for it, so
+ that interrupted system calls are automatically restarted, and we minimize
+ hassles with handling EINTR (in particular as EINTR handling is pretty broken
+ on Linux).
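+
+ A minimal sketch of installing a handler this way. The
+ install_handler() and on_signal() names are hypothetical, and the
+ handler merely records the signal number:
+
```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig) {
        got_signal = sig;
}

/* Installs on_signal() for sig with SA_RESTART set. Returns 0 or a negative errno. */
int install_handler(int sig) {
        struct sigaction sa = {
                .sa_handler = on_signal,
                .sa_flags = SA_RESTART,
        };

        sigemptyset(&sa.sa_mask);

        if (sigaction(sig, &sa, NULL) < 0)
                return -errno;

        return 0;
}
```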
+
+- When applying C-style unescaping as well as specifier expansion on the same
+ string, always apply the C-style unescaping first, followed by the specifier
+ expansion. When doing the reverse, make sure to escape '%' in specifier-style
+ first (i.e. '%' → '%%'), and then do C-style escaping where necessary.
+
+- It's a good idea to use O_NONBLOCK when opening 'foreign' regular files, i.e.
+ file system objects that are supposed to be regular files whose paths were
+ specified by the user and hence might actually refer to other types of file
+ system objects. This is a good idea so that we don't end up blocking on
+ 'strange' file nodes, for example if the user pointed us to a FIFO or device
+ node which may block when opening. Moreover even for actual regular files
+ O_NONBLOCK has a benefit: it bypasses any mandatory lock that might be in
+ effect on the regular file. If in doubt consider turning off O_NONBLOCK again
+ after opening.
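+
+ One way this could look is sketched below. The helper name is
+ hypothetical, and returning -EBADFD for non-regular, non-directory
+ nodes is an arbitrary choice made for this example:
+
```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Opens a user-specified path that is supposed to be a regular file, without
 * blocking on FIFOs or device nodes. Returns an fd or a negative errno. */
int open_regular_nonblock(const char *path) {
        struct stat st;
        int fd, flags, r;

        fd = open(path, O_RDONLY|O_CLOEXEC|O_NONBLOCK|O_NOCTTY);
        if (fd < 0)
                return -errno;

        if (fstat(fd, &st) < 0) {
                r = -errno;
                goto fail;
        }

        if (S_ISDIR(st.st_mode)) {
                r = -EISDIR;
                goto fail;
        }

        if (!S_ISREG(st.st_mode)) {
                r = -EBADFD;
                goto fail;
        }

        /* It really is a regular file, so turn O_NONBLOCK off again. */
        flags = fcntl(fd, F_GETFL);
        if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0) {
                r = -errno;
                goto fail;
        }

        return fd;

fail:
        close(fd);
        return r;
}
```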
+
+- When referring to a configuration file option in the documentation and such,
+ please always suffix it with "=", to indicate that it is a configuration file
+ setting.
+
+- When referring to a command line option in the documentation and such, please
+ always prefix with "--" or "-" (as appropriate), to indicate that it is a
+ command line option.
+
+- When referring to a file system path that is a directory, please always
+ suffix it with "/", to indicate that it is a directory, not a regular file
+ (or other file system object).
diff --git a/docs/DISTRO_PORTING b/docs/DISTRO_PORTING
new file mode 100644
index 000000000..d1a187aa4
--- /dev/null
+++ b/docs/DISTRO_PORTING
@@ -0,0 +1,71 @@
+Porting systemd To New Distributions
+
+HOWTO:
+ You need to make the following changes to adapt systemd to your
+ distribution:
+
+ 1) Find the right configure parameters for:
+
+ -D rootprefix=
+ -D sysvinit-path=
+ -D sysvrcnd-path=
+ -D rc-local=
+ -D halt-local=
+ -D loadkeys-path=
+ -D setfont-path=
+ -D tty-gid=
+ -D ntp-servers=
+ -D dns-servers=
+ -D support-url=
+
+ 2) Try it out. Play around (as an ordinary user) with
+ '/usr/lib/systemd/systemd --test --system' for a test run
+ of systemd without booting. This will read the unit files and
+ print the initial transaction it would execute during boot-up.
+ This will also inform you about ordering loops and suchlike.
+
+NTP POOL:
+ By default, systemd-timesyncd uses the Google Public NTP servers
+ time[1-4].google.com, if no other NTP configuration is available. They
+ serve time that uses a leap second smear, and can be up to .5s off from
+ servers that use stepped leap seconds.
+
+ https://developers.google.com/time/smear
+
+ If you prefer to use leap second steps, please register your own
+ vendor pool at ntp.org and make it the built-in default by
+ passing -D ntp-servers= to meson. Registering vendor
+ pools is free:
+
+ http://www.pool.ntp.org/en/vendors.html
+
+ Use -D ntp-servers= to direct systemd-timesyncd to different fallback
+ NTP servers.
+
+DNS SERVERS:
+ By default, systemd-resolved uses the Google Public DNS servers
+ 8.8.8.8, 8.8.4.4, 2001:4860:4860::8888, 2001:4860:4860::8844 as
+ fallback, if no other DNS configuration is available.
+
+ Use -D dns-servers= to direct systemd-resolved to different fallback
+ DNS servers.
+
+PAM:
+ The default PAM config shipped by systemd is really bare bones.
+ It does not include many modules your distro might want to enable
+ to provide a more seamless experience. For example, limits set in
+ /etc/security/limits.conf will not be read unless you load pam_limits.
+ Make sure you add modules your distro expects from user services.
+
+ Pass -D pamconfdir=no to meson to avoid installing this file and
+ instead install your own.
+
+CONTRIBUTING UPSTREAM:
+ We generally no longer accept distribution-specific patches to
+ systemd upstream. If you have to make changes to systemd's source code
+ to make it work on your distribution, unless your code is generic
+ enough to be generally useful, we are unlikely to merge it. Please
+ always consider adopting the upstream defaults. If that is not
+ possible, please maintain the relevant patches downstream.
+
+ Thank you for understanding.
diff --git a/docs/ENVIRONMENT.md b/docs/ENVIRONMENT.md
new file mode 100644
index 000000000..9d598a669
--- /dev/null
+++ b/docs/ENVIRONMENT.md
@@ -0,0 +1,124 @@
+# Known Environment Variables
+
+A number of systemd components take additional runtime parameters via
+environment variables. Many of these environment variables are not supported at
+the same level as command line switches and other interfaces are: we don't
+document them in the man pages and we make no stability guarantees for
+them. While they generally are unlikely to be dropped any time soon again, we
+do not want to guarantee that they stay around for good either.
+
+Below is a (non-exhaustive) list of the environment variables understood by
+the various tools. Note that this list only covers environment variables not
+documented in the proper man pages.
+
+All tools:
+
+* `$SYSTEMD_OFFLINE=[0|1]` — if set to `1`, then `systemctl` will
+ refrain from talking to PID 1; this has the same effect as the historical
+ detection of `chroot()`. Setting this variable to `0` instead has a similar
+ effect as `SYSTEMD_IGNORE_CHROOT=1`; i.e. tools will try to
+ communicate with PID 1 even if a `chroot()` environment is detected.
+ You almost certainly want to set this to `1` if you maintain a package build system
+ or similar and are trying to use a modern container system and not plain
+ `chroot()`.
+
+* `$SYSTEMD_IGNORE_CHROOT=1` — if set, don't check whether being invoked in a
+ `chroot()` environment. This is particularly relevant for systemctl, as it
+ will not alter its behaviour for `chroot()` environments if set. Normally it
+ refrains from talking to PID 1 in such a case, turning most operations such
+ as `start` into no-ops. If that's what's explicitly desired, you might
+ consider setting `SYSTEMD_OFFLINE=1`.
+
+* `$SD_EVENT_PROFILE_DELAYS=1` — if set, the sd-event event loop implementation
+ will print latency information at runtime.
+
+* `$SYSTEMD_PROC_CMDLINE` — if set, may contain a string that is used as kernel
+ command line instead of the actual one readable from /proc/cmdline. This is
+ useful for debugging, in order to test generators and other code against
+ specific kernel command lines.
+
+* `$SYSTEMD_BUS_TIMEOUT=SECS` — specifies the maximum time to wait for method call
+ completion. If no time unit is specified, assumes seconds. The usual other units
+ are understood, too (us, ms, s, min, h, d, w, month, y). If it is not set or set
+ to 0, then the built-in default is used.
+
+* `$SYSTEMD_MEMPOOL=0` — if set, the internal memory caching logic employed by
+ hash tables is turned off, and libc malloc() is used for all allocations.
+
+systemctl:
+
+* `$SYSTEMCTL_FORCE_BUS=1` — if set, do not connect to PID1's private D-Bus
+ listener, and instead always connect through the dbus-daemon D-Bus broker.
+
+* `$SYSTEMCTL_INSTALL_CLIENT_SIDE=1` — if set, enable or disable unit files on
+ the client side, instead of asking PID 1 to do this.
+
+* `$SYSTEMCTL_SKIP_SYSV=1` — if set, do not call out to SysV compatibility hooks.
+
+systemd-nspawn:
+
+* `$UNIFIED_CGROUP_HIERARCHY=1` — if set, force nspawn into unified cgroup
+ hierarchy mode.
+
+* `$SYSTEMD_NSPAWN_API_VFS_WRITABLE=1` — if set, make /sys and /proc/sys and
+ friends writable in the container. If set to "network", leave only
+ /proc/sys/net writable.
+
+* `$SYSTEMD_NSPAWN_CONTAINER_SERVICE=…` — override the "service" name nspawn
+ uses to register with machined. If unset, it defaults to "nspawn", but with
+ this variable it may be set to any other value.
+
+* `$SYSTEMD_NSPAWN_USE_CGNS=0` — if set, do not use cgroup namespacing, even if
+ it is available.
+
+* `$SYSTEMD_NSPAWN_LOCK=0` — if set, do not lock container images when running.
+
+systemd-logind:
+
+* `$SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1` — if set, report that
+ hibernation is available even if the swap devices do not provide enough room
+ for it.
+
+installed systemd tests:
+
+* `$SYSTEMD_TEST_DATA` — override the location of test data. This is useful if
+ a test executable is moved to an arbitrary location.
+
+nss-systemd:
+
+* `$SYSTEMD_NSS_BYPASS_SYNTHETIC=1` — if set, `nss-systemd` won't synthesize
+ user/group records for the `root` and `nobody` users if they are missing from
+ `/etc/passwd`.
+
+* `$SYSTEMD_NSS_DYNAMIC_BYPASS=1` — if set, `nss-systemd` won't return
+ user/group records for dynamically registered service users (i.e. users
+ registered through `DynamicUser=1`).
+
+* `$SYSTEMD_NSS_BYPASS_BUS=1` — if set, `nss-systemd` won't use D-Bus to do
+ dynamic user lookups. This is primarily useful to make `nss-systemd` work
+ safely from within `dbus-daemon`.
+
+systemd-timedated:
+
+* `$SYSTEMD_TIMEDATED_NTP_SERVICES=…` — colon-separated list of unit names of
+ NTP client services. If set, `timedatectl set-ntp on` enables and starts the
+ first existing unit listed in the environment variable, and
+ `timedatectl set-ntp off` disables and stops all listed units.
+
+systemd itself:
+
+* `$SYSTEMD_ACTIVATION_UNIT` — set for all NSS and PAM module invocations that
+ are done by the service manager on behalf of a specific unit, in child
+ processes that are later (after execve()) going to become unit
+ processes. Contains the full unit name (e.g. "foobar.service"). NSS and PAM
+ modules can use this information to determine in which context and on whose
+ behalf they are being called, which may be useful to avoid deadlocks, for
+ example to bypass IPC calls to the very service that is about to be
+ started. Note that NSS and PAM modules should be careful to only rely on this
+ data when invoked privileged, or possibly only when getppid() returns 1, as
+ setting environment variables is of course possible in any context, even an
+ unprivileged one.
+
+* `$SYSTEMD_ACTIVATION_SCOPE` — closely related to `$SYSTEMD_ACTIVATION_UNIT`,
+ it is either set to `system` or `user` depending on whether the NSS/PAM
+ module is called by systemd in `--system` or `--user` mode.
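+
+As a rough sketch of how an NSS or PAM module might consume
+`$SYSTEMD_ACTIVATION_UNIT` as described above, the following hypothetical
+helper only trusts the variable when the parent process is PID 1. The function
+name and the exact trust policy are assumptions made for this example:
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns true if we are being invoked by the service manager on behalf of
 * own_unit, in which case an IPC call to that very unit should be bypassed. */
bool invoked_for_own_unit(const char *own_unit) {
        const char *e;

        assert(own_unit);

        /* Anybody can set environment variables, so only rely on the
         * variable if our parent is PID 1. */
        if (getppid() != 1)
                return false;

        e = getenv("SYSTEMD_ACTIVATION_UNIT");
        return e && strcmp(e, own_unit) == 0;
}
```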
diff --git a/docs/HACKING b/docs/HACKING
new file mode 100644
index 000000000..c7b700e6a
--- /dev/null
+++ b/docs/HACKING
@@ -0,0 +1,122 @@
+HACKING ON SYSTEMD
+
+We welcome all contributions to systemd. If you notice a bug or a missing
+feature, please feel invited to fix it, and submit your work as a github Pull
+Request (PR):
+
+ https://github.com/systemd/systemd/pull/new
+
+Please make sure to follow our Coding Style when submitting patches. See
+docs/CODING_STYLE for details. Also have a look at our Contribution Guidelines:
+
+ https://github.com/systemd/systemd/blob/master/.github/CONTRIBUTING.md
+
+When adding new functionality, tests should be added. For shared functionality
+(in src/basic and src/shared) unit tests should be sufficient. The general
+policy is to keep tests in matching files underneath src/test,
+e.g. src/test/test-path-util.c contains tests for any functions in
+src/basic/path-util.c. If adding a new source file, consider adding a matching
+test executable. For features at a higher level, tests in src/test/ are very
+strongly recommended. If that is not possible, integration tests in test/ are
+encouraged.
+
+Please also have a look at our list of code quality tools we have set up for systemd,
+to ensure our codebase stays in good shape:
+
+ https://github.com/systemd/systemd/blob/master/docs/CODE_QUALITY.md
+
+Please always test your work before submitting a PR. For many of the components
+of systemd testing is straight-forward as you can simply compile systemd and
+run the relevant tool from the build directory.
+
+For some components (most importantly, systemd/PID1 itself) this is not
+possible, however. In order to simplify testing for cases like this we provide
+a set of "mkosi" build files directly in the source tree. "mkosi" is a tool for
+building clean OS images from an upstream distribution in combination with a
+fresh build of the project in the local working directory. To make use of this,
+please acquire "mkosi" from https://github.com/systemd/mkosi first, unless your
+distribution has packaged it already and you can get it from there. After the
+tool is installed it is sufficient to type "mkosi" in the systemd project
+directory to generate a disk image "image.raw" you can boot either in
+systemd-nspawn or in a UEFI-capable VM:
+
+ # systemd-nspawn -bi image.raw
+
+or:
+
+ # qemu-system-x86_64 -enable-kvm -m 512 -smp 2 -bios /usr/share/edk2/ovmf/OVMF_CODE.fd -hda image.raw
+
+Every time you rerun the "mkosi" command a fresh image is built, incorporating
+all current changes you made to the project tree.
+
+Alternatively, you may install the systemd version from your git check-out
+directly on top of your host system's directory tree. This mostly works fine,
+but of course you should know what you are doing as you might make your system
+unbootable in case of a bug in your changes. Also, you might step into your
+package manager's territory with this. Be careful!
+
+And never forget: most distributions provide very simple and convenient ways to
+install all development packages necessary to build systemd. For example, on
+Fedora the following command line should be sufficient to install all of
+systemd's build dependencies:
+
+ # dnf builddep systemd
+
+Putting this all together, here's a series of commands for preparing a patch
+for systemd (this example is for Fedora):
+
+ $ sudo dnf builddep systemd # install build dependencies
+ $ sudo dnf install mkosi # install tool to quickly build images
+ $ git clone https://github.com/systemd/systemd.git
+ $ cd systemd
+ $ vim src/core/main.c # or wherever you'd like to make your changes
+ $ meson build # configure the build
+ $ ninja -C build # build it locally, see if everything compiles fine
+ $ ninja -C build test # run some simple regression tests
+ $ (umask 077; echo 123 > mkosi.rootpw) # set root password used by mkosi
+ $ sudo mkosi # build a test image
+ $ sudo systemd-nspawn -bi image.raw # boot up the test image
+ $ git add -p # interactively put together your patch
+ $ git commit # commit it
+ $ git push REMOTE HEAD:refs/heads/BRANCH
+ # where REMOTE is your "fork" on github
+ # and BRANCH is a branch name.
+
+And after that, head over to your repo on github and click "Compare & pull request"
+
+Happy hacking!
+
+
+FUZZERS
+
+systemd includes fuzzers in src/fuzz that use libFuzzer and are automatically
+run by OSS-Fuzz (https://github.com/google/oss-fuzz) with sanitizers. To add a
+fuzz target, create a new src/fuzz/fuzz-foo.c file with a LLVMFuzzerTestOneInput
+function and add it to the list in src/fuzz/meson.build.
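+
+A self-contained sketch of what such a fuzz target could look like is shown
+below. The count_assignments() function merely stands in for whatever code is
+to be fuzzed and is made up for this example; a real target in src/fuzz/ would
+call into the actual code under test instead:
+
```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function under test: counts the '=' characters in a string. */
static size_t count_assignments(const char *s) {
        size_t n = 0;

        for (; *s; s++)
                if (*s == '=')
                        n++;

        return n;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        char *s;

        /* The fuzzer input is not NUL-terminated, so make a terminated copy. */
        s = malloc(size + 1);
        if (!s)
                return 0;

        if (size > 0)
                memcpy(s, data, size);
        s[size] = 0;

        (void) count_assignments(s);

        free(s);
        return 0;
}
```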
+
+Whenever possible, a seed corpus and a dictionary should also be added with new
+fuzz targets. The dictionary should be named src/fuzz/fuzz-foo.dict and the seed
+corpus should be built and exported as $OUT/fuzz-foo_seed_corpus.zip in
+tools/oss-fuzz.sh.
+
+The fuzzers can be built locally if you have libFuzzer installed by running
+tools/oss-fuzz.sh. You should also confirm that the fuzzer runs in the
+OSS-Fuzz environment by checking out the OSS-Fuzz repo, and then running
+commands like this:
+
+ python infra/helper.py build_image systemd
+ python infra/helper.py build_fuzzers --sanitizer memory systemd ../systemd
+ python infra/helper.py run_fuzzer systemd fuzz-foo
+
+If you find a bug that impacts the security of systemd, please follow the
+guidance in .github/CONTRIBUTING.md on how to report a security vulnerability.
+
+For more details on building fuzzers and integrating with OSS-Fuzz, visit:
+
+ https://github.com/google/oss-fuzz/blob/master/docs/new_project_guide.md
+
+ https://llvm.org/docs/LibFuzzer.html
+
+ https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md
+
+ https://chromium.googlesource.com/chromium/src/testing/libfuzzer/+/HEAD/efficient_fuzzer.md
diff --git a/docs/PORTABLE_SERVICES.md b/docs/PORTABLE_SERVICES.md
new file mode 100644
index 000000000..183324444
--- /dev/null
+++ b/docs/PORTABLE_SERVICES.md
@@ -0,0 +1,251 @@
+# Portable Services Introduction
+
+This systemd version includes a preview of the "portable service"
+concept. "Portable Services" are supposed to be an incremental improvement over
+traditional system services, making two specific facets of container management
+available to system services more readily. Specifically:
+
+1. The bundling of applications, i.e. packing up multiple services, their
+ binaries and all their dependencies in a single image, and running them
+ directly from it.
+
+2. Stricter default security policies, i.e. sand-boxing of applications.
+
+The primary tool for interfacing with "portable services" is the new
+"portablectl" program. It's currently shipped in /usr/lib/systemd/portablectl
+(i.e. not in the `$PATH`), since it's not yet considered part of the officially
+supported systemd interfaces — it's a preview still after all.
+
+Portable services don't bring anything inherently new to the table. All they do
+is put together known concepts in a slightly nicer way to cover a specific set
+of use-cases.
+
+# So, what *is* a "Portable Service"?
+
+A portable service is ultimately just an OS tree, either inside of a directory
+tree, or inside a raw disk image containing a Linux file system. This tree is
+called the "image". It can be "attached" or "detached" from the system. When
+"attached" specific systemd units from the image are made available on the host
+system, then behaving pretty much exactly like locally installed system
+services. When "detached" these units are removed again from the host, leaving
+no artifacts around (except maybe messages they might have logged).
+
+The OS tree/image can be created with any tool of your choice. For example, you
+can use `dnf --installroot=` if you like, or `debootstrap`, the image format is
+entirely generic, and doesn't have to carry any specific metadata beyond what
+distribution images carry anyway. Or to say this differently: the image format
+doesn't define any new metadata as unit files and OS tree directories or disk
+images are already sufficient, and pretty universally available these days. One
+particularly nice tool for creating suitable images is
+[mkosi](https://github.com/systemd/mkosi), but many other existing tools will
+do too.
+
+If you so will, "Portable Services" are a nicer way to manage chroot()
+environments, with better security, tooling and behavior.
+
+# Where's the difference to a "Container"?
+
+"Container" is a very vague term, after all it is used for
+systemd-nspawn/LXC-type OS containers, for Docker/rkt-like micro service
+containers, and even certain 'lightweight' VM runtimes.
+
+The "portable service" concept ultimately will not provide a fully isolated
+environment to the payload, like containers mostly intend to. Instead they are
+from the beginning more like regular system services, can be controlled with
+the same tools, are exposed the same way in all infrastructure and so on. Their
+main difference is that they use a different root directory than the rest of the
+system. Hence, the intention is not to run code in a different, isolated world
+from the host — like most containers would do it — but to run it in the same
+world, but with stricter access controls on what the service can see and do.
+
+As one point of differentiation: as programs run as "portable services" are
+pretty much regular system services, they won't run as PID 1 (like Docker would
+do it), but as normal processes. A corollary of that is that they aren't supposed
+to manage anything in their own environment (such as the network) as the
+execution environment is mostly shared with the rest of the system.
+
+The primary focus use-case of "portable services" is to extend the host system
+with encapsulated extensions, but provide almost full integration with the rest
+of the system, though possibly restricted by effective security knobs. This
+focus includes system extensions otherwise sometimes called "super-privileged
+containers".
+
+Note that portable services are only available for system services, not for
+user services, i.e. the functionality cannot be used for the stuff
+bubblewrap/flatpak is focusing on.
+
+# Mode of Operation
+
+If you have a portable service image, maybe in a raw disk image called
+`foobar_0.7.23.raw`, then attaching the services to the host is as easy as:
+
+```
+# /usr/lib/systemd/portablectl attach foobar_0.7.23.raw
+```
+
+This command does the following:
+
+1. It dissects the image, checks and validates the `/etc/os-release` data of
+ the image, and looks for all included unit files.
+
+2. It copies out all unit files with a suffix of `.service`, `.socket`,
+ `.target`, `.timer` and `.path` whose name begins with the image's name
+ (with the .raw removed), truncated at the first underscore (if there is
+ one). This prefix name generated from the image name must be followed by a
+ ".", "-" or "@" character in the unit name. Or in other words, given the
+ image name of `foobar_0.7.23.raw` all unit files matching
+ `foobar-*.{service|socket|target|timer|path}`,
+ `foobar@.{service|socket|target|timer|path}` as well as
+ `foobar.*.{service|socket|target|timer|path}` and
+ `foobar.{service|socket|target|timer|path}` are copied out. These unit files
+ are placed in `/etc/systemd/system/` like regular unit files. Within the
+ images the unit files are looked for at the usual locations, i.e. in
+ `/usr/lib/systemd/system/` and `/etc/systemd/system/` and so on, relative to
+ the image's root.
+
+3. For each such unit file a drop-in file is created. Let's say
+ `foobar-waldo.service` was one of the unit files copied to
+ `/etc/systemd/system/`, then a drop-in file
+ `/etc/systemd/system/foobar-waldo.service.d/20-portable.conf` is created,
+ containing a few lines of additional configuration:
+
+ ```
+ [Service]
+ RootImage=/path/to/foobar.raw
+ Environment=PORTABLE=foobar
+ LogExtraFields=PORTABLE=foobar
+ ```
+
+4. For each such unit a "profile" drop-in is linked in. This "profile" drop-in
+ generally contains security options that lock down the service. By default
+ the `default` profile is used, which provides a medium level of
+ security. There's also `trusted` which runs the service at the highest
+ privileges, i.e. host's root and everything. The `strict` profile comes with
+ the toughest security restrictions. Finally, `nonetwork` is like `default`
+ but without network access. Users may define their own profiles too (or
+ modify the existing ones)
+
+And that's already it.
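+
+The unit name matching rule from step 2 can be sketched as follows. This is an
+illustration written for this document and not the code `portablectl` uses; the
+function name is hypothetical, and it assumes the image argument is a plain
+file name without any directory components:
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if a unit file name would be copied out for the given image name. */
bool unit_name_matches_image(const char *image, const char *unit) {
        static const char *const suffixes[] = {
                ".service", ".socket", ".target", ".timer", ".path",
        };
        size_t n, l;

        assert(image);
        assert(unit);

        /* The prefix is the image name truncated at the first underscore, or,
         * if there is none, the image name with a trailing ".raw" removed. */
        n = strcspn(image, "_");
        if (image[n] == 0 && n > 4 && strcmp(image + n - 4, ".raw") == 0)
                n -= 4;
        if (n == 0)
                return false;

        /* The unit name must start with the prefix, followed by ".", "-" or "@". */
        if (strncmp(unit, image, n) != 0)
                return false;
        if (unit[n] == 0 || !strchr(".-@", unit[n]))
                return false;

        /* The unit name must end in one of the supported unit type suffixes. */
        l = strlen(unit);
        for (size_t i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++) {
                size_t sl = strlen(suffixes[i]);

                if (l >= sl && strcmp(unit + l - sl, suffixes[i]) == 0)
                        return true;
        }

        return false;
}
```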
+
+Note that the images need to stay around (and in the same location) as long as the
+portable service is attached. If an image is moved, the `RootImage=` line
+written to the unit drop-in would point to a non-existing place, and break the
+logic.
+
+The `portablectl detach` command executes the reverse operation: it looks for
+the drop-ins and the unit files associated with the image, and removes them
+again.
+
+Note that `portablectl attach` won't enable or start any of the units it copies
+out. This still has to take place in a second, separate step. (That said, we
+might add options to do this automatically later on.)
+
+# Requirements on Images
+
+Note that portable services don't introduce any new image format, but most OS
+images should just work the way they are. Specifically, the following
+requirements are made for an image that can be attached/detached with
+`portablectl`.
+
+1. It must contain the binary that shall be invoked, including all its
+ dependencies. If it is binary code, the code needs to be
+ compiled for an architecture compatible with the host.
+
+2. The image must either be a plain sub-directory (or btrfs subvolume)
+ containing the binaries and its dependencies in a classic Linux OS tree, or
+ must be a raw disk image either containing only one, naked file system, or
+ an image with a partition table understood by the Linux kernel with only a
+ single partition defined, or alternatively, a GPT partition table with a set
+ of properly marked partitions following the [Discoverable Partitions
+ Specification](https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/).
+
+3. The image must at least contain one matching unit file, with the right name
+ prefix and suffix (see above). The unit file is searched in the usual paths,
+ i.e. primarily /etc/systemd/system/ and /usr/lib/systemd/system/ within the
+ image. (The implementation will check a couple of other paths too, but it's
+ recommended to use these two paths.)
+
+4. The image must contain an os-release file, either in /etc/os-release or
+ /usr/lib/os-release. The file should follow the standard format.
+
+Note that generally images created by tools such as `debootstrap`, `dnf
+--installroot=` or `mkosi` qualify for all of the above in one way or
+another. If you wonder what the most minimal image would be that complies with
+the requirements above, it could consist of this:
+
+```
+/usr/bin/minimald # a statically compiled binary
+/usr/lib/systemd/system/minimal-test.service # the unit file for the service, with ExecStart=/usr/bin/minimald
+/usr/lib/os-release # an os-release file explaining what this is
+```
+
+And that's it.
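+
+As a rough sketch, the two text files in that minimal image could look like
+the following. The `[Unit]` description and the os-release values are
+illustrative assumptions, not taken from a real image; only the `ExecStart=`
+path is given above.
+
+```
+# minimal-test.service
+[Unit]
+Description=Minimal portable service (hypothetical example)
+
+[Service]
+ExecStart=/usr/bin/minimald
+
+# os-release
+ID=minimal
+NAME="Minimal Test Image"
+```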
+
+Note that qualifying images do not have to contain an init system of their
+own. If they do, it's fine, it will be ignored by the portable service logic,
+but they generally don't have to, and it might make sense to avoid any, to keep
+images minimal.
+
+Note that as no new image format or metadata is defined, it's very
+straightforward to define images that can be made use of in a number of
+different ways. For example, by using `mkosi -b` you can trivially build a
+single, unified image that:
+
+1. Can be attached as portable service, to run any container services natively
+ on the host.
+
+2. Can be run as OS container, using `systemd-nspawn`, by booting the image
+ with `systemd-nspawn -i -b`.
+
+3. Can be booted directly as VM image, using a generic VM executor such as
+ `virtualbox`/`qemu`/`kvm`.
+
+4. Can be booted directly on bare-metal systems.
+
+Of course, to facilitate 2, 3 and 4 you need to include an init system in the
+image. To facilitate 3 and 4 you also need to include a boot loader in the
+image. As mentioned, `mkosi -b` takes care of all of that for you, but any other
+image generator should work too.
+
+# Execution Environment
+
+Note that the code in portable service images is run exactly like regular
+services. Hence there's no new execution environment to consider. Also, unlike
+with Docker, these are regular system services, so they aren't run as PID
+1 either, but with regular PID values.
+
+# Access to host resources
+
+If services shipped with this mechanism shall be able to access host resources
+(such as files or AF_UNIX sockets for IPC), use the normal `BindPaths=` and
+`BindReadOnlyPaths=` settings in unit files to mount them in. In fact, the
+`default` profile mentioned above makes use of this to ensure that
+`/etc/resolv.conf`, the D-Bus system bus socket and write access to the logging
+subsystem are available to the service.
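+
+For illustration, a unit file in the image might pull in host resources like
+this. The service name, binary and the specific paths are assumptions chosen
+for the example; only the two setting names come from the text above.
+
+```
+# foobar.service (hypothetical)
+[Service]
+ExecStart=/usr/bin/foobard
+# Read-write access to a host directory holding an AF_UNIX socket
+BindPaths=/run/foobar-host
+# Read-only view of a host configuration file
+BindReadOnlyPaths=/etc/foobar.conf
+```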
+
+# Instantiation
+
+Sometimes it makes sense to instantiate the same set of services multiple
+times. The portable service concept does not introduce any new logic for this. It
+is recommended to use the regular unit templating of systemd for this, i.e. to
+include template units such as `foobar@.service`, so that instantiation is as
+simple as:
+
+```
+# /usr/lib/systemd/portablectl attach foobar_0.7.23.raw
+# systemctl enable --now foobar@instancea.service
+# systemctl enable --now foobar@instanceb.service
+…
+```
+
+The benefit of this approach is that templating works exactly the same for
+units shipped with the OS itself as for attached portable services.
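+
+A template unit shipped in the image might look roughly like this. The binary
+path and its `--instance` option are hypothetical; `%i` is the standard
+specifier that expands to the instance name (`instancea`, `instanceb`, …).
+
+```
+# foobar@.service (sketch)
+[Unit]
+Description=foobar instance %i
+
+[Service]
+ExecStart=/usr/bin/foobard --instance=%i
+
+[Install]
+WantedBy=multi-user.target
+```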
+
+# Immutable images with local data
+
+It's a good idea to keep portable service images read-only during normal
+operation. In fact all but the `trusted` profile will default to this kind of
+behaviour, by setting the `ProtectSystem=strict` option. In this case writable
+service data may be placed on the host file system. Use `StateDirectory=` in
+the unit files to enable such behaviour and add a local data directory to the
+services copied onto the host.
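+
+A minimal sketch of such a unit follows; the service and directory names are
+assumptions. For a system service, `StateDirectory=foobar` makes systemd
+create `/var/lib/foobar` on the host and keep it writable for the service even
+though the rest of the file system is read-only.
+
+```
+# foobar.service (hypothetical)
+[Service]
+ExecStart=/usr/bin/foobard
+ProtectSystem=strict
+StateDirectory=foobar
+```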
diff --git a/docs/TRANSIENT-SETTINGS.md b/docs/TRANSIENT-SETTINGS.md
new file mode 100644
index 000000000..bb13cfdbf
--- /dev/null
+++ b/docs/TRANSIENT-SETTINGS.md
@@ -0,0 +1,457 @@
+# What settings are currently available for transient units?
+
+Our intention is to make all settings that are available as unit file settings
+also available for transient units, through the D-Bus API. At the moment, some
+unit types (device, swap, target) are not supported at all as transient units,
+but most others are pretty well supported, with some notable omissions.
+
+The lists below contain all settings currently available in unit files. The
+ones currently available in transient units are prefixed with `✓`.
+
+## Generic Unit Settings
+
+Most generic unit settings are available for transient units.
+
+```
+✓ Description=
+✓ Documentation=
+✓ SourcePath=
+✓ Requires=
+✓ Requisite=
+✓ Wants=
+✓ BindsTo=
+✓ Conflicts=
+✓ Before=
+✓ After=
+✓ OnFailure=
+✓ PropagatesReloadTo=
+✓ ReloadPropagatedFrom=
+✓ PartOf=
+✓ JoinsNamespaceOf=
+✓ RequiresMountsFor=
+✓ StopWhenUnneeded=
+✓ RefuseManualStart=
+✓ RefuseManualStop=
+✓ AllowIsolate=
+✓ DefaultDependencies=
+✓ OnFailureJobMode=
+✓ IgnoreOnIsolate=
+✓ JobTimeoutSec=
+✓ JobRunningTimeoutSec=
+✓ JobTimeoutAction=
+✓ JobTimeoutRebootArgument=
+✓ StartLimitIntervalSec=SECONDS
+✓ StartLimitBurst=UNSIGNED
+✓ StartLimitAction=ACTION
+✓ FailureAction=
+✓ SuccessAction=
+✓ AddRef=
+✓ RebootArgument=STRING
+✓ ConditionPathExists=
+✓ ConditionPathExistsGlob=
+✓ ConditionPathIsDirectory=
+✓ ConditionPathIsSymbolicLink=
+✓ ConditionPathIsMountPoint=
+✓ ConditionPathIsReadWrite=
+✓ ConditionDirectoryNotEmpty=
+✓ ConditionFileNotEmpty=
+✓ ConditionFileIsExecutable=
+✓ ConditionNeedsUpdate=
+✓ ConditionFirstBoot=
+✓ ConditionKernelCommandLine=
+✓ ConditionKernelVersion=
+✓ ConditionArchitecture=
+✓ ConditionVirtualization=
+✓ ConditionSecurity=
+✓ ConditionCapability=
+✓ ConditionHost=
+✓ ConditionACPower=
+✓ ConditionUser=
+✓ ConditionGroup=
+✓ ConditionControlGroupController=
+✓ AssertPathExists=
+✓ AssertPathExistsGlob=
+✓ AssertPathIsDirectory=
+✓ AssertPathIsSymbolicLink=
+✓ AssertPathIsMountPoint=
+✓ AssertPathIsReadWrite=
+✓ AssertDirectoryNotEmpty=
+✓ AssertFileNotEmpty=
+✓ AssertFileIsExecutable=
+✓ AssertNeedsUpdate=
+✓ AssertFirstBoot=
+✓ AssertKernelCommandLine=
+✓ AssertKernelVersion=
+✓ AssertArchitecture=
+✓ AssertVirtualization=
+✓ AssertSecurity=
+✓ AssertCapability=
+✓ AssertHost=
+✓ AssertACPower=
+✓ AssertUser=
+✓ AssertGroup=
+✓ AssertControlGroupController=
+✓ CollectMode=
+```
+
+## Execution-Related Settings
+
+All execution-related settings are available for transient units.
+
+```
+✓ WorkingDirectory=
+✓ RootDirectory=
+✓ RootImage=
+✓ User=
+✓ Group=
+✓ SupplementaryGroups=
+✓ Nice=
+✓ OOMScoreAdjust=
+✓ IOSchedulingClass=
+✓ IOSchedulingPriority=
+✓ CPUSchedulingPolicy=
+✓ CPUSchedulingPriority=
+✓ CPUSchedulingResetOnFork=
+✓ CPUAffinity=
+✓ UMask=
+✓ Environment=
+✓ EnvironmentFile=
+✓ PassEnvironment=
+✓ UnsetEnvironment=
+✓ DynamicUser=
+✓ RemoveIPC=
+✓ StandardInput=
+✓ StandardOutput=
+✓ StandardError=
+✓ StandardInputText=
+✓ StandardInputData=
+✓ TTYPath=
+✓ TTYReset=
+✓ TTYVHangup=
+✓ TTYVTDisallocate=
+✓ SyslogIdentifier=
+✓ SyslogFacility=
+✓ SyslogLevel=
+✓ SyslogLevelPrefix=
+✓ LogLevelMax=
+✓ LogExtraFields=
+✓ SecureBits=
+✓ CapabilityBoundingSet=
+✓ AmbientCapabilities=
+✓ TimerSlackNSec=
+✓ NoNewPrivileges=
+✓ KeyringMode=
+✓ SystemCallFilter=
+✓ SystemCallArchitectures=
+✓ SystemCallErrorNumber=
+✓ MemoryDenyWriteExecute=
+✓ RestrictNamespaces=
+✓ RestrictRealtime=
+✓ RestrictAddressFamilies=
+✓ LockPersonality=
+✓ LimitCPU=
+✓ LimitFSIZE=
+✓ LimitDATA=
+✓ LimitSTACK=
+✓ LimitCORE=
+✓ LimitRSS=
+✓ LimitNOFILE=
+✓ LimitAS=
+✓ LimitNPROC=
+✓ LimitMEMLOCK=
+✓ LimitLOCKS=
+✓ LimitSIGPENDING=
+✓ LimitMSGQUEUE=
+✓ LimitNICE=
+✓ LimitRTPRIO=
+✓ LimitRTTIME=
+✓ ReadWritePaths=
+✓ ReadOnlyPaths=
+✓ InaccessiblePaths=
+✓ BindPaths=
+✓ BindReadOnlyPaths=
+✓ TemporaryFileSystem=
+✓ PrivateTmp=
+✓ PrivateDevices=
+✓ PrivateMounts=
+✓ ProtectKernelTunables=
+✓ ProtectKernelModules=
+✓ ProtectControlGroups=
+✓ PrivateNetwork=
+✓ PrivateUsers=
+✓ ProtectSystem=
+✓ ProtectHome=
+✓ MountFlags=
+✓ MountAPIVFS=
+✓ Personality=
+✓ RuntimeDirectoryPreserve=
+✓ RuntimeDirectoryMode=
+✓ RuntimeDirectory=
+✓ StateDirectoryMode=
+✓ StateDirectory=
+✓ CacheDirectoryMode=
+✓ CacheDirectory=
+✓ LogsDirectoryMode=
+✓ LogsDirectory=
+✓ ConfigurationDirectoryMode=
+✓ ConfigurationDirectory=
+✓ PAMName=
+✓ IgnoreSIGPIPE=
+✓ UtmpIdentifier=
+✓ UtmpMode=
+✓ SELinuxContext=
+✓ SmackProcessLabel=
+✓ AppArmorProfile=
+✓ Slice=
+```
+
+## Resource Control Settings
+
+All cgroup/resource control settings are available for transient units:
+
+```
+✓ CPUAccounting=
+✓ CPUWeight=
+✓ StartupCPUWeight=
+✓ CPUShares=
+✓ StartupCPUShares=
+✓ CPUQuota=
+✓ MemoryAccounting=
+✓ MemoryMin=
+✓ MemoryLow=
+✓ MemoryHigh=
+✓ MemoryMax=
+✓ MemorySwapMax=
+✓ MemoryLimit=
+✓ DeviceAllow=
+✓ DevicePolicy=
+✓ IOAccounting=
+✓ IOWeight=
+✓ StartupIOWeight=
+✓ IODeviceWeight=
+✓ IOReadBandwidthMax=
+✓ IOWriteBandwidthMax=
+✓ IOReadIOPSMax=
+✓ IOWriteIOPSMax=
+✓ BlockIOAccounting=
+✓ BlockIOWeight=
+✓ StartupBlockIOWeight=
+✓ BlockIODeviceWeight=
+✓ BlockIOReadBandwidth=
+✓ BlockIOWriteBandwidth=
+✓ TasksAccounting=
+✓ TasksMax=
+✓ Delegate=
+✓ IPAccounting=
+✓ IPAddressAllow=
+✓ IPAddressDeny=
+```
+
+## Process Killing Settings
+
+All process killing settings are available for transient units:
+
+```
+✓ SendSIGKILL=
+✓ SendSIGHUP=
+✓ KillMode=
+✓ KillSignal=
+✓ FinalKillSignal=
+```
+
+## Service Unit Settings
+
+Most service unit settings are available for transient units.
+
+```
+✓ PIDFile=
+✓ ExecStartPre=
+✓ ExecStart=
+✓ ExecStartPost=
+✓ ExecReload=
+✓ ExecStop=
+✓ ExecStopPost=
+✓ RestartSec=
+✓ TimeoutStartSec=
+✓ TimeoutStopSec=
+✓ TimeoutSec=
+✓ RuntimeMaxSec=
+✓ WatchdogSec=
+✓ Type=
+✓ Restart=
+✓ PermissionsStartOnly=
+✓ RootDirectoryStartOnly=
+✓ RemainAfterExit=
+✓ GuessMainPID=
+✓ RestartPreventExitStatus=
+✓ RestartForceExitStatus=
+✓ SuccessExitStatus=
+✓ NonBlocking=
+✓ BusName=
+✓ FileDescriptorStoreMax=
+✓ NotifyAccess=
+ Sockets=
+✓ USBFunctionDescriptors=
+✓ USBFunctionStrings=
+```
+
+## Mount Unit Settings
+
+All mount unit settings are available to transient units:
+
+```
+✓ What=
+✓ Where=
+✓ Options=
+✓ Type=
+✓ TimeoutSec=
+✓ DirectoryMode=
+✓ SloppyOptions=
+✓ LazyUnmount=
+✓ ForceUnmount=
+```
+
+## Automount Unit Settings
+
+All automount unit settings are available to transient units:
+
+```
+✓ Where=
+✓ DirectoryMode=
+✓ TimeoutIdleSec=
+```
+
+## Timer Unit Settings
+
+Most timer unit settings are available to transient units.
+
+```
+✓ OnCalendar=
+✓ OnActiveSec=
+✓ OnBootSec=
+✓ OnStartupSec=
+✓ OnUnitActiveSec=
+✓ OnUnitInactiveSec=
+✓ Persistent=
+✓ WakeSystem=
+✓ RemainAfterElapse=
+✓ AccuracySec=
+✓ RandomizedDelaySec=
+ Unit=
+```
+
+## Slice Unit Settings
+
+Slice units are fully supported as transient units, but they have no settings
+of their own beyond the generic unit and resource control settings.
+
+## Scope Unit Settings
+
+Scope units are fully supported as transient units (in fact they only exist as
+such).
+
+```
+✓ TimeoutStopSec=
+```
+
+## Socket Unit Settings
+
+Most socket unit settings are available to transient units.
+
+```
+✓ ListenStream=
+✓ ListenDatagram=
+✓ ListenSequentialPacket=
+✓ ListenFIFO=
+✓ ListenNetlink=
+✓ ListenSpecial=
+✓ ListenMessageQueue=
+✓ ListenUSBFunction=
+✓ SocketProtocol=
+✓ BindIPv6Only=
+✓ Backlog=
+✓ BindToDevice=
+✓ ExecStartPre=
+✓ ExecStartPost=
+✓ ExecStopPre=
+✓ ExecStopPost=
+✓ TimeoutSec=
+✓ SocketUser=
+✓ SocketGroup=
+✓ SocketMode=
+✓ DirectoryMode=
+✓ Accept=
+✓ Writable=
+✓ MaxConnections=
+✓ MaxConnectionsPerSource=
+✓ KeepAlive=
+✓ KeepAliveTimeSec=
+✓ KeepAliveIntervalSec=
+✓ KeepAliveProbes=
+✓ DeferAcceptSec=
+✓ NoDelay=
+✓ Priority=
+✓ ReceiveBuffer=
+✓ SendBuffer=
+✓ IPTOS=
+✓ IPTTL=
+✓ Mark=
+✓ PipeSize=
+✓ FreeBind=
+✓ Transparent=
+✓ Broadcast=
+✓ PassCredentials=
+✓ PassSecurity=
+✓ TCPCongestion=
+✓ ReusePort=
+✓ MessageQueueMaxMessages=
+✓ MessageQueueMessageSize=
+✓ RemoveOnStop=
+✓ Symlinks=
+✓ FileDescriptorName=
+ Service=
+✓ TriggerLimitIntervalSec=
+✓ TriggerLimitBurst=
+✓ SmackLabel=
+✓ SmackLabelIPIn=
+✓ SmackLabelIPOut=
+✓ SELinuxContextFromNet=
+```
+
+## Swap Unit Settings
+
+Swap units are currently not available at all as transient units:
+
+```
+ What=
+ Priority=
+ Options=
+ TimeoutSec=
+```
+
+## Path Unit Settings
+
+Most path unit settings are available to transient units.
+
+```
+✓ PathExists=
+✓ PathExistsGlob=
+✓ PathChanged=
+✓ PathModified=
+✓ DirectoryNotEmpty=
+ Unit=
+✓ MakeDirectory=
+✓ DirectoryMode=
+```
+
+## Install Section
+
+The `[Install]` section is currently not available at all for transient units, and it probably doesn't even make sense.
+
+```
+ Alias=
+ WantedBy=
+ RequiredBy=
+ Also=
+ DefaultInstance=
+```
diff --git a/docs/TRANSLATORS b/docs/TRANSLATORS
new file mode 100644
index 000000000..873ec7b01
--- /dev/null
+++ b/docs/TRANSLATORS
@@ -0,0 +1,27 @@
+Notes for translators
+=====================
+
+systemd depends on gettext for multilingual support.
+In the po/ directory you'll find the needed files.
+
+POT (Portable Object Template)
+------------------------------
+A text file with .pot extension, with all the extracted labels from code.
+
+To update the template:
+
+$ cd systemd/
+$ ninja -C build systemd-pot
+
+To start a new translation:
+
+$ cd po/
+$ cp systemd.pot <YOUR-LANG-CODE>.po
+
+Replace <YOUR-LANG-CODE> with the two-letter code from the ISO 639 standard.
+
+PO (Portable Object)
+--------------------
+A text file with .po extension, with all the available labels and some additional
+metadata fields. Any editor is ok, but a good standard is 'poedit', a graphical
+application specifically designed for this kind of task.
diff --git a/docs/UIDS-GIDS.md b/docs/UIDS-GIDS.md
new file mode 100644
index 000000000..775549131
--- /dev/null
+++ b/docs/UIDS-GIDS.md
@@ -0,0 +1,278 @@
+# Users, Groups, UIDs and GIDs on `systemd` systems
+
+Here's a summary of the requirements `systemd` (and Linux) make on UID/GID
+assignments and their ranges.
+
+Note that while in theory UIDs and GIDs are orthogonal concepts they really
+aren't IRL. With that in mind, when we discuss UIDs below it should be assumed
+that whatever we say about UIDs applies to GIDs in mostly the same way, and all
+the special assignments and ranges for UIDs always have mostly the same
+validity for GIDs too.
+
+## Special Linux UIDs
+
+In theory, the range of the C type `uid_t` is 32bit wide on Linux,
+i.e. 0…4294967295. However, four UIDs are special on Linux:
+
+1. 0 → The `root` super-user
+
+2. 65534 → The `nobody` UID, also called the "overflow" UID or similar. It's
+ where various subsystems map unmappable users to, for example file systems
+ only supporting 16bit UIDs, NFS or user namespacing. (The latter can be
+ changed with a sysctl during runtime, but that's not supported on
+ `systemd`. If you do change it you void your warranty.) Because Fedora is a
+ bit confused the `nobody` user is called `nfsnobody` there (and they have a
+ different `nobody` user at UID 99). I hope this will be corrected eventually
+ though. (Also, some distributions call the `nobody` group `nogroup`. I wish
+ they didn't.)
+
+3. 4294967295, aka "32bit `(uid_t) -1`" → This UID is not a valid user ID, as
+ `setresuid()`, `chown()` and friends treat -1 as a special request to not
+ change the UID of the process/file. This UID is hence not available for
+ assignment to users in the user database.
+
+4. 65535, aka "16bit `(uid_t) -1`" → Before Linux kernel 2.4 `uid_t` used to be
+ 16bit, and programs compiled for that would hence assume that `(uid_t) -1`
+ is 65535. This UID is hence not usable either.
+
+The `nss-systemd` glibc NSS module will synthesize user database records for
+the UIDs 0 and 65534 if the system user database doesn't list them. This means
+that any system where this module is enabled works to some minimal level
+without `/etc/passwd`.
+
+## Special Distribution UID ranges
+
+Distributions generally split the available UID range in two:
+
+1. 1…999 → System users. These are users that do not map to actual "human"
+ users, but are used as security identities for system daemons, to implement
+ privilege separation and run system daemons with minimal privileges.
+
+2. 1000…65533 and 65536…4294967294 → Everything else, i.e. regular (human) users.
+
+Note that most distributions allow changing the boundary between system and
+regular users, even during runtime as user configuration. Moreover, some older
+systems placed the boundary at 499/500, or even 99/100. In `systemd`, the
+boundary is configurable only during compilation time, as this should be a
+decision for distribution builders, not for users. Moreover, we strongly
+discourage downstreams from changing the boundary from the upstream default of
+999/1000.
+
+Also note that programs such as `adduser` tend to allocate from a subset of the
+available regular user range only, usually 1000…60000. This subset is usually
+user-configurable, too.
+
+Note that systemd requires that system users and groups are resolvable without
+networking available, a requirement that is not made for regular users. This
+means regular users may be stored in remote LDAP or NIS databases, but system
+users may not (except when there's a consistent local cache kept, that is
+available during earliest boot, including in the initial RAM disk).
+
+## Special `systemd` GIDs
+
+`systemd` defines no special UIDs beyond what Linux already defines (see
+above). However, it does define some special group/GID assignments, which are
+primarily used for `systemd-udevd`'s device management. The precise list of the
+currently defined groups is found in this `sysusers.d` snippet:
+[basic.conf](https://raw.githubusercontent.com/systemd/systemd/master/sysusers.d/basic.conf.in)
+
+It's strongly recommended that downstream distributions include these groups in
+their default group databases.
+
+Note that the actual GID numbers assigned to these groups do not have to be
+constant beyond a specific system. There's one exception however: the `tty`
+group must have the GID 5. That's because it must be encoded in the `devpts`
+mount parameters during earliest boot, at a time where NSS lookups are not
+possible. (Note that the actual GID can be changed during `systemd` build time,
+but downstreams are strongly advised against doing that.)
+
+## Special `systemd` UID ranges
+
+`systemd` defines a number of special UID ranges:
+
+1. 61184…65519 → UIDs for dynamic users are allocated from this range (see the
+ `DynamicUser=` documentation in
+ [`systemd.exec(5)`](https://www.freedesktop.org/software/systemd/man/systemd.exec.html)). This
+ range has been chosen so that it is below the 16bit boundary (i.e. below
+ 65535), in order to provide compatibility with container environments that
+ assign a 64K range of UIDs to containers using user namespacing. This range
+ is above the 60000 boundary, so that its allocations are unlikely to be
+ affected by `adduser` allocations (see above). And we leave some room
+ upwards for other purposes. (And if you wonder why precisely these numbers:
+ if you write them in hexadecimal, they might make more sense: 0xEF00 and
+ 0xFFEF). The `nss-systemd` module will synthesize user records implicitly
+ for all currently allocated dynamic users from this range. Thus, NSS-based
+ user record resolving works correctly without those users being in
+ `/etc/passwd`.
+
+2. 524288…1879048191 → UID range for `systemd-nspawn`'s automatic allocation of
+ per-container UID ranges. When the `--private-users=pick` switch is used (or
+ `-U`) then it will automatically find a so far unused 16bit subrange of this
+ range and assign it to the container. The range is picked so that the upper
+ 16bit of the 32bit UIDs are constant for all users of the container, while
+ the lower 16bit directly encode the 65536 UIDs assigned to the
+ container. This mode of allocation means that the upper 16bit of any UID
+ assigned to a container are kind of a "container ID", while the lower 16bit
+ directly expose the container's own UID numbers. If you wonder why precisely
+ these numbers, consider them in hexadecimal: 0x00080000…0x6FFFFFFF. This
+ range is above the 16bit boundary. Moreover it's below the 31bit boundary,
+ as some broken code (specifically: the kernel's `devpts` file system)
+ erroneously considers UIDs signed integers, and hence can't deal with values
+ above 2^31. The `nss-mymachines` glibc NSS module will synthesize user
+ database records for all UIDs assigned to a running container from this
+ range.
+
+Note for both allocation ranges: when a UID allocation takes place NSS is
+checked for collisions first, and a different UID is picked if an entry is
+found. Thus, the user database is used as synchronization mechanism to ensure
+exclusive ownership of UIDs and UID ranges. To ensure compatibility with other
+subsystems allocating from the same ranges it is hence essential that they
+ensure that whatever they pick shows up in the user/group databases, either by
+providing an NSS module, or by adding entries directly to `/etc/passwd` and
+`/etc/group`. Note that, for performance reasons, `systemd-nspawn` will only
+do an NSS check for the first UID of the range it allocates, not all 65536 of
+them. Also note that while the allocation logic is operating, the glibc
+`lckpwdf()` user database lock is taken, in order to make this logic race-free.
+
+## Figuring out the system's UID boundaries
+
+The most important boundaries of the local system may be queried with
+`pkg-config`:
+
+```
+$ pkg-config --variable=systemuidmax systemd
+999
+$ pkg-config --variable=dynamicuidmin systemd
+61184
+$ pkg-config --variable=dynamicuidmax systemd
+65519
+$ pkg-config --variable=containeruidbasemin systemd
+524288
+$ pkg-config --variable=containeruidbasemax systemd
+1878982656
+```
+
+(Note that the latter encodes the maximum UID *base* `systemd-nspawn` might
+pick. Given that 64K UIDs are assigned to each container according to this
+allocation logic, the maximum UID used for this range is hence
+1878982656+65535=1879048191.)
+
+Note that systemd does not make any of these values runtime-configurable. All
+these boundaries are chosen during build time. That said, the system UID/GID
+boundary is traditionally configured in /etc/login.defs, though systemd won't
+look there during runtime.
+
+## Considerations for container managers
+
+If you hack on a container manager, and wonder how and how many UIDs best to
+assign to your containers, here are a few recommendations:
+
+1. Definitely, don't assign fewer than 65536 UIDs/GIDs. After all the `nobody`
+user has magic properties, and hence should be available in your container, and
+given that it's assigned the UID 65534, you should really cover the full 16bit
+range in your container. Note that systemd will, as mentioned, synthesize
+user records for the `nobody` user, and assumes its availability in various
+other parts of its codebase, too, hence assigning fewer users means you lose
+compatibility with running systemd code inside your container. And most likely
+other packages have similar requirements.
+
+2. While it's fine to assign more than 65536 UIDs/GIDs to a container, there's
+most likely not much value in doing so, as Linux distributions won't use the
+higher ranges by default (as mentioned, neither `adduser` nor `systemd`'s
+dynamic user concept allocates from above the 16bit range). Unless you actively
+care for nested containers, it's hence probably a good idea to allocate exactly
+65536 UIDs per container, and neither fewer nor more. A pretty side-effect is
+that by doing so, you expose the same number of UIDs per container as Linux 2.2
+supported for the whole system, back in the day.
+
+3. Consider allocating UID ranges for containers so that the first UID you
+assign has the lower 16bits all set to zero. That way, the upper 16bits become
+a container ID of some kind, while the lower 16bits directly encode the
+internal container UID. This is the way `systemd-nspawn` allocates UID ranges
+(see above). Following this allocation logic ensures best compatibility with
+`systemd-nspawn` and all other container managers following the scheme, as it
+is sufficient then to check NSS for the first UID you pick regarding conflicts,
+as that's what they do, too. Moreover, it makes `chown()`ing container file
+system trees nicely robust to interruptions: as the external UID encodes the
+internal UID in a fixed way, it's very easy to adjust the container's base UID
+without the need to know the original base UID: to change the container base,
+just mask away the upper 16bit, and insert the upper 16bit of the new container
+base instead. Here are the easy conversions to derive the internal UID, the
+external UID, and the container base UID from each other:
+
+ ```
+ INTERNAL_UID = EXTERNAL_UID & 0x0000FFFF
+ CONTAINER_BASE_UID = EXTERNAL_UID & 0xFFFF0000
+ EXTERNAL_UID = INTERNAL_UID | CONTAINER_BASE_UID
+ ```
+
+4. When picking a UID range for containers, make sure to check NSS first, with
+a simple `getpwuid()` call: if there's already a user record for the first UID
+you want to pick, then it's already in use: pick a different one. Wrap that
+call in a `lckpwdf()` + `ulckpwdf()` pair, to make allocation
+race-free. Provide an NSS module that makes all UIDs you end up taking show up
+in the user database, and make sure that the NSS module returns up-to-date
+information before you release the lock, so that other system components can
+safely use the NSS user database as allocation check, too. Note that if you
+follow this scheme no changes to `/etc/passwd` need to be made, thus minimizing
+the artifacts the container manager persistently leaves in the system.
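+
+As a worked example of the conversions in recommendation 3 (the concrete
+numbers are picked for illustration only): take a container whose base UID is
+0x00120000 (1179648) and a user with UID 1000 (0x03E8) inside the container.
+
+```
+EXTERNAL_UID       = 0x000003E8 | 0x00120000 = 0x001203E8  (1180648)
+INTERNAL_UID       = 0x001203E8 & 0x0000FFFF = 0x000003E8  (1000)
+CONTAINER_BASE_UID = 0x001203E8 & 0xFFFF0000 = 0x00120000  (1179648)
+```
+
+Moving the container to a new base of 0x00340000 maps the same user to
+0x003403E8, without needing to know the old base.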
+
+## Summary
+
+| UID/GID | Purpose | Defined By | Listed in |
+|-----------------------|-----------------------|---------------|-------------------------------|
+| 0 | `root` user | Linux | `/etc/passwd` + `nss-systemd` |
+| 1…4 | System users | Distributions | `/etc/passwd` |
+| 5 | `tty` group | `systemd` | `/etc/passwd` |
+| 6…999 | System users | Distributions | `/etc/passwd` |
+| 1000…60000 | Regular users | Distributions | `/etc/passwd` + LDAP/NIS/… |
+| 60001…61183 | Unused | | |
+| 61184…65519 | Dynamic service users | `systemd` | `nss-systemd` |
+| 65520…65533 | Unused | | |
+| 65534 | `nobody` user | Linux | `/etc/passwd` + `nss-systemd` |
+| 65535 | 16bit `(uid_t) -1` | Linux | |
+| 65536…524287 | Unused | | |
+| 524288…1879048191 | Container UID ranges | `systemd` | `nss-mymachines` |
+| 1879048192…4294967294 | Unused | | |
+| 4294967295 | 32bit `(uid_t) -1` | Linux | |
+
+Note that "Unused" in the table above doesn't mean that these ranges are
+really unused. It just means that these ranges have no well-established
+pre-defined purposes between Linux, generic low-level distributions and
+`systemd`. There might very well be other packages that allocate from these
+ranges.
+
+## Notes on resolvability of user and group names
+
+User names, UIDs, group names and GIDs don't have to be resolvable using NSS
+(i.e. getpwuid() and getpwnam() and friends) all the time. However, systemd
+makes the following requirements:
+
+System users generally have to be resolvable during early boot already. This
+means they should not be provided by any networked service (as those usually
+become available during late boot only), except if a local cache is kept that
+makes them available during early boot too (i.e. before networking is
+up). Specifically, system users need to be resolvable at least before
+`systemd-udevd.service` and `systemd-tmpfiles.service` are started, as both
+need to resolve system users. Note, however, that there might be more services
+requiring full resolvability of system users than just these two.
+
+Regular users do not need to be resolvable during early boot; it is sufficient
+if they become resolvable during late boot. Specifically, regular users need to
+be resolvable at the point in time the `nss-user-lookup.target` unit is
+reached. This target unit is generally used as synchronization point between
+providers of the user database and consumers of it. Services that require that
+the user database is fully available (for example, the login service
+`systemd-logind.service`) are ordered *after* it, while services that provide
+parts of the user database (for example an LDAP user database client) are
+ordered *before* it. Note that `nss-user-lookup.target` is a *passive* unit: in
+order to minimize synchronization points on systems that don't need it the unit
+is pulled into the initial transaction only if there's at least one service
+that really needs it, and that means only if there's a service providing the
+local user database somehow through IPC or suchlike. Or in other words: if you
+hack on some networked user database project, then make sure you order your
+service `Before=nss-user-lookup.target` and that you pull it in with
+`Wants=nss-user-lookup.target`. However, if you hack on some project that needs
+the user database to be up in full, then order your service
+`After=nss-user-lookup.target`, but do *not* pull it in via a `Wants=`
+dependency.
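+
+Expressed as unit file fragments, the two cases look roughly like this (the
+service names are hypothetical):
+
+```
+# my-user-db-provider.service: provides part of the user database
+[Unit]
+Wants=nss-user-lookup.target
+Before=nss-user-lookup.target
+
+# my-consumer.service: needs the full user database
+[Unit]
+After=nss-user-lookup.target
+```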
diff --git a/docs/sysvinit/README.in b/docs/sysvinit/README.in
new file mode 100644
index 000000000..de5d80d90
--- /dev/null
+++ b/docs/sysvinit/README.in
@@ -0,0 +1,27 @@
+You are looking for the traditional init scripts in @SYSTEM_SYSVINIT_PATH@,
+and they are gone?
+
+Here's an explanation of what's going on:
+
+You are running a systemd-based OS where traditional init scripts have
+been replaced by native systemd service files. Service files provide
+very similar functionality to init scripts. To make use of service
+files simply invoke "systemctl", which will output a list of all
+currently running services (and other units). Use "systemctl
+list-unit-files" to get a listing of all known unit files, including
+stopped, disabled and masked ones. Use "systemctl start
+foobar.service" and "systemctl stop foobar.service" to start or stop a
+service, respectively. For further details, please refer to
+systemctl(1).
+
+Note that traditional init scripts continue to function on a systemd
+system. An init script @SYSTEM_SYSVINIT_PATH@/foobar is implicitly mapped
+into a service unit foobar.service during system initialization.
+
+Thank you!
+
+Further reading:
+ man:systemctl(1)
+ man:systemd(1)
+ http://0pointer.de/blog/projects/systemd-for-admins-3.html
+ https://www.freedesktop.org/wiki/Software/systemd/Incompatibilities
diff --git a/docs/sysvinit/meson.build b/docs/sysvinit/meson.build
new file mode 100644
index 000000000..fbac59ae4
--- /dev/null
+++ b/docs/sysvinit/meson.build
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: LGPL-2.1+
+
+file = configure_file(
+ input : 'README.in',
+ output : 'README',
+ configuration : substs)
+
+if conf.get('HAVE_SYSV_COMPAT') == 1
+ install_data(file,
+ install_dir : sysvinit_path)
+endif
diff --git a/docs/var-log/README.in b/docs/var-log/README.in
new file mode 100644
index 000000000..2e64fb196
--- /dev/null
+++ b/docs/var-log/README.in
@@ -0,0 +1,26 @@
+You are looking for the traditional text log files in @VARLOGDIR@, and
+they are gone?
+
+Here's an explanation of what's going on:
+
+You are running a systemd-based OS where traditional syslog has been
+replaced with the Journal. The journal stores the same (and more)
+information as classic syslog. To make use of the journal and access
+the collected log data simply invoke "journalctl", which will output
+the logs in the same text-based format that the syslog files in
+@VARLOGDIR@ used to have. For further details, please refer to
+journalctl(1).
+
+Alternatively, consider installing one of the traditional syslog
+implementations available for your distribution, which will generate
+the classic log files for you. Syslog implementations such as
+syslog-ng or rsyslog may be installed side-by-side with the journal
+and will continue to function the way they always did.
+
+Thank you!
+
+Further reading:
+ man:journalctl(1)
+ man:systemd-journald.service(8)
+ man:journald.conf(5)
+ http://0pointer.de/blog/projects/the-journal.html
diff --git a/docs/var-log/meson.build b/docs/var-log/meson.build
new file mode 100644
index 000000000..0ddff20ce
--- /dev/null
+++ b/docs/var-log/meson.build
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: LGPL-2.1+
+
+file = configure_file(
+ input : 'README.in',
+ output : 'README',
+ configuration : substs)
+
+if conf.get('HAVE_SYSV_COMPAT') == 1
+ install_data(file,
+ install_dir : varlogdir)
+endif