Evaluate Security of Service
Overall security scores for all running services:
$ systemd-analyze security
ModemManager.service      5.9 MEDIUM  😐
NetworkManager.service    7.8 EXPOSED 🙁
systemd-journald.service  4.3 OK      🙂
systemd-logind.service    2.6 OK      🙂
…
Detailed security analysis for a single service:
$ systemd-analyze security firstname.lastname@example.org
  NAME                                        DESCRIPTION                                 EXPOSURE
✗ PrivateNetwork=                             Service has access to the host's network         0.5
✗ User=/DynamicUser=                          Service runs as root user                        0.4
✗ CapabilityBoundingSet=~CAP_SET(UID|GID|PC…  Service may change UID/GID identities/cap…       0.3
✓ CapabilityBoundingSet=~CAP_SYS_ADMIN        Service has no administrator privileges
✓ CapabilityBoundingSet=~CAP_SYS_PTRACE       Service has no ptrace() debugging abiliti…
✗ RestrictAddressFamilies=~AF_(INET|INET6)    Service may allocate Internet sockets            0.3
✓ RestrictNamespaces=~CLONE_NEWUSER           Service cannot create user namespaces
✓ RestrictAddressFamilies=~…                  Service cannot allocate exotic sockets
✓ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|…  Service cannot change file ownership/acce…
✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|…  Service may override UNIX file/IPC permis…       0.2
✓ CapabilityBoundingSet=~CAP_NET_ADMIN        Service has no network configuration priv…
✓ CapabilityBoundingSet=~CAP_SYS_MODULE       Service cannot load kernel modules
✓ CapabilityBoundingSet=~CAP_SYS_RAWIO        Service has no raw I/O access
✓ CapabilityBoundingSet=~CAP_SYS_TIME         Service processes cannot change the syste…
✗ DeviceAllow=                                Service has a device ACL with some specia…       0.1
✗ IPAddressDeny=                              Service does not define an IP address all…       0.2
✓ KeyringMode=                                Service doesn't share key material with o…
✓ NoNewPrivileges=                            Service processes cannot acquire new priv…
…
List of All Options
Global Hardening Options
Global options apply to systemd itself and to all processes it starts. A reboot is required for the settings to take effect.
Example config (
Identical to CapabilityBoundingSet for services but applied to systemd itself and all processes it starts.
Dropping capabilities which are required for the system to boot will leave you with an unbootable system.
As of today, I do not believe that any capability can be dropped easily. There are some capabilities which aren’t usually needed, such as CAP_CHECKPOINT_RESTORE or CAP_PERFMON, but for historical reasons the same privileges can also be obtained via CAP_SYS_ADMIN. This renders dropping them moot.
Options that can be used in the [Service] section of systemd units.
Example, extending the exim4 service with some custom hardening
[Service]
PrivateTmp=yes
ProtectSystem=strict
TemporaryFileSystem=/run/exim4
ReadWritePaths=/var/lib/exim4
ReadWritePaths=/var/log/exim4
ReadWritePaths=/var/spool/exim4
See also More Examples below.
Enforce an AppArmor MAC profile for the service.
Enforce profile <profile_name>:
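A minimal sketch, assuming the directive described here is systemd's AppArmorProfile= (its name is not shown in the text above). <profile_name> is the placeholder from the text, not a real profile:

```ini
[Service]
# Switch to this AppArmor profile when the service is started.
AppArmorProfile=<profile_name>
```

Prefixing the profile name with - makes systemd ignore errors such as a profile that is not loaded.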
On Linux, super-user privileges are divided into capabilities. Available
capabilities are listed in capabilities(7) and
capability lists all capabilities known to systemd.
Restrict available capabilities (i.e. restrict super-user privileges).
Drop all capabilities:
Retain only capabilities CAP_SETGID and CAP_SETUID:
Drop only capabilities CAP_SETGID and CAP_SETUID:
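The three cases above might be written as follows, assuming they refer to CapabilityBoundingSet=. They are alternatives, so only the first is left uncommented:

```ini
[Service]
# Drop all capabilities (empty bounding set):
CapabilityBoundingSet=

# Retain only CAP_SETGID and CAP_SETUID:
#CapabilityBoundingSet=CAP_SETGID CAP_SETUID

# Drop only CAP_SETGID and CAP_SETUID (a leading ~ inverts the list):
#CapabilityBoundingSet=~CAP_SETGID CAP_SETUID
```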
By default, all capabilities are dropped when a service runs as a non-root user. This directive can be used to grant a non-root user limited super-user capabilities.
Grant user backup-daemon capability CAP_DAC_READ_SEARCH:
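A sketch of that grant, assuming the directive in question is AmbientCapabilities= (it is used in this way in the SocketBindAllow example further down):

```ini
[Service]
User=backup-daemon
# Keep this single capability even though the service runs as non-root.
AmbientCapabilities=CAP_DAC_READ_SEARCH
```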
This should generally be preferred to running a service as root and dropping capabilities via CapabilityBoundingSet because root will still have (write) access to most files as it owns most of them. Also, some services do permission checks based on UID. For instance, Postgres will check the UID/name of the connecting user.
Prevent memory allocations that are writeable and executable at the same time:
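A minimal sketch, assuming the directive meant here is MemoryDenyWriteExecute= (it appears in the examples at the end of this page). The memfd_create filter is an assumed companion setting, based on the conditions listed next:

```ini
[Service]
# Refuse mappings that are writable and executable at the same time.
MemoryDenyWriteExecute=yes
# Assumed companion: deny memfd_create so it cannot be used as a bypass.
SystemCallFilter=~memfd_create
```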
It may be possible to circumvent this protection through memfd_create or by writing a file to disk and mapping it as executable. To prevent this, the first condition and at least one of the other two should be met:
The memfd_create syscall is filtered (as shown above).
Write access to any file or directory is denied.
The noexec mount option is set on every writable filesystem. This may be achieved via NoExecPaths.
Disable emulation of different behaviors to support non-Linux-native binaries.
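Assuming this paragraph describes LockPersonality= (used in the examples at the end of this page), a minimal sketch:

```ini
[Service]
# Lock the personality(2) execution domain to its current value.
LockPersonality=yes
```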
Deny the process the ability to escalate privileges:
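A minimal sketch, assuming the directive meant here is NoNewPrivileges= (it also appears in the systemd-analyze output at the top of this page):

```ini
[Service]
# Processes of this service can never gain privileges through execve().
NoNewPrivileges=yes
```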
In particular, the service process and all its children will ignore the setuid and setgid bits that programs such as sudo use to gain privileges.
Note about systemd socket:
Services with access to run services via systemd (e.g. via
be able to get around this restriction.
Allow access to the devices /dev/loop-control and /dev/loop[0-9]:
Allow read-only access to /dev/sda:
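Both cases might be sketched as follows, assuming the directive is DeviceAllow= (listed in the systemd-analyze output above). The rw access mode for the loop devices is an assumption, as is the use of the block-loop device group to cover /dev/loop[0-9]:

```ini
[Service]
# Loop control node plus all loop block devices.
# "block-loop" is a device group name as listed in /proc/devices.
DeviceAllow=/dev/loop-control rw
DeviceAllow=block-loop rw

# Read-only access to a single disk:
DeviceAllow=/dev/sda r
```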
Use PrivateDevices when only the default set of pseudo-devices like /dev/null, /dev/zero and /dev/urandom is needed.
By default, access to common pseudo-devices like /dev/null or /dev/urandom is always granted. This behavior can be changed using systemd.resource-control(5) → DevicePolicy=.
Create a private IPC namespace for the service:
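A minimal sketch, assuming the directive is PrivateIPC= (the name is not shown above, but it matches the stated availability):

```ini
[Service]
# Give the service its own System V IPC and POSIX message queue namespace.
PrivateIPC=yes
```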
Multiple services can be made to share their IPC namespace using JoinsNamespaceOf.
Availability: systemd 248
Remove IPC objects when service is stopped:
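Assuming this refers to RemoveIPC= (used in the examples at the end of this page):

```ini
[Service]
# Clean up IPC objects of the service's user and group on stop.
RemoveIPC=yes
```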
Remove all System V and POSIX IPC objects owned by the user (and not the service) when the service is stopped.
Availability: systemd 248
Only allow access to PID information in
Control access to processes in
Deny access to other users' processes:
Hide other users' processes:
Hide non-ptraceable processes:
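The three cases above might look like this, assuming the directive is ProtectProc= (the values noaccess and invisible are named in the next paragraph). They are alternatives, so only the first is left uncommented:

```ini
[Service]
# Deny access to other users' processes:
ProtectProc=noaccess

# Hide other users' processes:
#ProtectProc=invisible

# Hide all processes the service cannot ptrace:
#ProtectProc=ptraceable
```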
You should usually prefer invisible over noaccess as many services do not handle being denied access well.
This directive corresponds to the hidepid= mount option of proc. See proc(5) → Mount options.
Prevent service from manipulating clock:
Prevent modifications to the cgroup hierarchies by the service:
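A sketch of both settings, assuming they are ProtectClock= and ProtectControlGroups= (both appear in the Apache2 example at the end of this page):

```ini
[Service]
# Deny writes to the system and hardware clock.
ProtectClock=yes
# Make the cgroup hierarchy under /sys/fs/cgroup/ read-only.
ProtectControlGroups=yes
```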
Only allow execution of /usr/bin/serviced:
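A minimal sketch, assuming the directives are NoExecPaths= and ExecPaths= (NoExecPaths is mentioned earlier on this page). The library directories in the commented line are illustrative and depend on the distribution:

```ini
[Service]
# Deny execution everywhere, then re-allow the service binary.
NoExecPaths=/
ExecPaths=/usr/bin/serviced

# A dynamically linked binary usually also needs its loader and libraries:
#ExecPaths=/usr/bin/serviced /usr/lib /usr/lib64
```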
This, in combination with MemoryDenyWriteExecute, may be used to make arbitrary code execution harder.
Availability: systemd 248
Create private, empty /tmp/ and /var/tmp/ for the service:
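This is the PrivateTmp= setting already used in the exim4 example above; on its own it reads:

```ini
[Service]
# Mount private, empty /tmp/ and /var/tmp/ for this service.
PrivateTmp=yes
```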
Multiple services can be made to share their /tmp and /var/tmp/ using JoinsNamespaceOf.
Temporary files are cleaned when the service is stopped.
Restrict access to /home/, /root/ and /run/user/ for a service.
Make /home/ inaccessible:
Make /home/ read-only:
ReadWritePaths may be used to lift read-only restriction on subdirectories.
Replace /home/ with an empty, read-only directory:
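The three variants above might be written as follows, assuming the directive is ProtectHome= (used in the examples at the end of this page). Only one value applies at a time:

```ini
[Service]
# Make /home/, /root/ and /run/user/ inaccessible:
ProtectHome=yes

# Make them read-only:
#ProtectHome=read-only

# Replace them with empty, read-only tmpfs mounts:
#ProtectHome=tmpfs
```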
Make directory/files at /etc/hidden, /hidden/ and /home/ inaccessible:
InaccessiblePaths=/etc/hidden /hidden/
InaccessiblePaths=/home/
Make directory/files at /etc/hidden, /hidden/ and /home/ read-only:
ReadOnlyPaths=/etc/hidden /hidden/
ReadOnlyPaths=/home/
Make directory/files at /etc/hidden, /hidden/ and /home/ readable/writable:
ReadWritePaths=/etc/hidden /hidden/
ReadWritePaths=/home/
Only allow opening files on an ext4 or tmpfs filesystem:
Only deny access to network filesystems:
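Both cases might look like this, assuming the directive is RestrictFileSystems= and that @network is the predefined group of network filesystems (the list of groups can be checked with the command shown next). They are alternatives:

```ini
[Service]
# Allow-list: only ext4 and tmpfs may be accessed.
RestrictFileSystems=ext4 tmpfs

# Deny-list: everything except network filesystems (~ inverts the list).
#RestrictFileSystems=~@network
```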
Obtain a list of all known filesystems and groups:
$ systemd-analyze filesystems
Availability: systemd 250
Mount /usr/, /boot/ and /efi/ read-only:
Additionally mount /etc/ read-only:
Mount everything read-only except /dev/, /proc/ and /sys/:
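The three levels above correspond to the values of ProtectSystem= (the strict value is used in the exim4 example earlier). A sketch with the alternatives commented out:

```ini
[Service]
# /usr/, /boot/ and /efi/ read-only:
ProtectSystem=yes

# Additionally /etc/ read-only:
#ProtectSystem=full

# Everything read-only except /dev/, /proc/ and /sys/:
#ProtectSystem=strict
```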
Use ReadWritePaths to allow write access to specific files or directories.
Prevent service from manipulating hostname (UTS namespace):
Deny service access to kernel logs (e.g. via dmesg(1)):
Prevent loading of kernel modules by service:
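A sketch of the three settings above, assuming they are ProtectHostname=, ProtectKernelLogs= and ProtectKernelModules= (all three appear in the Apache2 example at the end of this page):

```ini
[Service]
# Private UTS namespace; hostname and domain name cannot be changed.
ProtectHostname=yes
# No access to the kernel log ring buffer.
ProtectKernelLogs=yes
# No explicit loading or unloading of kernel modules.
ProtectKernelModules=yes
```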
Place an empty tmpfs filesystem at /path/directory:
The same but make the directory read-only:
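Both variants use TemporaryFileSystem=, as in the exim4 example earlier. A sketch, with /path/directory taken from the text:

```ini
[Service]
# Empty, writable tmpfs mounted over the directory:
TemporaryFileSystem=/path/directory

# The same, but mounted read-only:
#TemporaryFileSystem=/path/directory:ro
```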
This is often useful when a service can’t deal with a directory being read-only or inaccessible but is fine with it being empty.
Create a private network namespace with only a private loopback interface:
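A minimal sketch, assuming the directive is PrivateNetwork= (named in the systemd-analyze output above and in the RestrictNetworkInterfaces notes below):

```ini
[Service]
# New network namespace containing only its own loopback device.
PrivateNetwork=yes
```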
Multiple services can be made to share their network namespace using JoinsNamespaceOf. Restricting access to the (global) loopback interface, or any other interface, can be done using RestrictNetworkInterfaces.
Availability: systemd 250
Restrict socket access to the IPv4, IPv6 and Unix socket families:
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
Allow no address family:
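Using the special value none described in the next sentence, this might read:

```ini
[Service]
# Deny socket() for every address family.
RestrictAddressFamilies=none
```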
The special value of none is only supported starting with systemd 249.
Enumerating network interfaces, for instance, to be able to bind to specific interfaces.
Logging via syslog(3).
Restrict access to the loopback (lo) interface:
Deny access to interface eth0 only:
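Both cases might be written as follows, assuming the directive is RestrictNetworkInterfaces= (named elsewhere on this page). They are alternatives:

```ini
[Service]
# Allow-list: only the loopback interface may be used.
RestrictNetworkInterfaces=lo

# Deny-list: every interface except eth0 may be used (~ inverts the list).
#RestrictNetworkInterfaces=~eth0
```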
When no network access is needed use PrivateNetwork.
Availability: systemd 250
Without also specifying
service may bind to all ports.
Allow service to bind to TCP ports 80 and 443 only:
SocketBindAllow=tcp:80
SocketBindAllow=tcp:443
SocketBindDeny=any
Omit protocol to allow TCP and UDP:
SocketBindAllow=80
SocketBindAllow=443
SocketBindDeny=any
Port ranges, like 1200-1300, are accepted too.
Allow an unprivileged service to bind to TCP ports 80 and 443 only:
AmbientCapabilities=CAP_NET_BIND_SERVICE
SocketBindAllow=tcp:80
SocketBindAllow=tcp:443
SocketBindDeny=any
User=www-data
Capability CAP_NET_BIND_SERVICE is required to bind to any port lower than 1024. SocketBindAllow can be used to restrict this privilege to certain ports.
Availability: systemd 249
Without also specifying
service will be allowed to connect to any address.
Only allow connecting to CIDR networks 10.0.0.0/8 and fc00::/7:
IPAddressAllow=10.0.0.0/8 fc00::/7
IPAddressDeny=any
localhost can be used to restrict access to 127.0.0.0/8 and ::1. If you wish to restrict access to localhost only, consider using RestrictNetworkInterfaces=lo in addition.
Deny any namespace change:
Only allow access to the ipc and net namespaces:
Only deny access to the ipc and net namespaces:
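The three cases above might look like this, assuming the directive is RestrictNamespaces= (used in the examples at the end of this page). They are alternatives:

```ini
[Service]
# Deny creating or entering any namespace type:
RestrictNamespaces=yes

# Allow only IPC and network namespaces:
#RestrictNamespaces=ipc net

# Allow everything except IPC and network namespaces:
#RestrictNamespaces=~ipc net
```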
Deny access to any realtime scheduling functionality:
Prevent setting of SUID and SGID bits for file permissions:
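A sketch of both settings, assuming they are RestrictRealtime= and RestrictSUIDSGID= (both appear in the examples at the end of this page):

```ini
[Service]
# Refuse attempts to enable realtime scheduling.
RestrictRealtime=yes
# Refuse attempts to set the SUID or SGID bit on files and directories.
RestrictSUIDSGID=yes
```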
System Call Filtering (seccomp)
Allow native calls only:
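A minimal sketch, assuming this refers to SystemCallArchitectures= (named at the end of this section):

```ini
[Service]
# Only accept system calls made through the native ABI.
SystemCallArchitectures=native
```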
Disables the ABIs for non-native system calls. For example, this disables support for 32-bit x86 binaries on x86_64.
Allow only syscalls in group @system-service:
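With SystemCallFilter=, which is used in the next example, an allow-list for that single group might read:

```ini
[Service]
# Allow-list: anything outside @system-service is denied.
SystemCallFilter=@system-service
```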
Allow syscalls in group @system-service and syscall seccomp except those in group @chown:
SystemCallFilter=@system-service seccomp
SystemCallFilter=~@chown
Deny syscalls in group @chown with error EPERM rather than terminating the process:
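A sketch of a per-entry error code, using the name:errno form of SystemCallFilter= entries:

```ini
[Service]
# Calls in @chown fail with EPERM instead of killing the process.
SystemCallFilter=~@chown:EPERM
```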
Many services can deal with an EPERM, and other error codes, for certain calls only used for optional functionality.
A list of all known syscalls and groups can be obtained like this:
Rather than killing the process, systemd can also be instructed to return an error code like EPERM for all violations:
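A minimal sketch, assuming the directive meant here is SystemCallErrorNumber=:

```ini
[Service]
SystemCallFilter=@system-service
# Any filtered call returns EPERM instead of terminating the process.
SystemCallErrorNumber=EPERM
```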
Services using SystemCallFilter should also use SystemCallArchitectures=native.
Create files and directories that are only accessible by the user/owner if permissions are not explicitly set during creation:
Allow user and group only:
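Both cases might be expressed with UMask=, assuming that is the directive meant here. They are alternatives:

```ini
[Service]
# New files are accessible by the owner only:
UMask=0077

# New files are accessible by owner and group:
#UMask=0007
```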
User / Group
Dynamically create a Unix user that the service runs as:
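This is the DynamicUser= setting referenced in the note below; a minimal sketch:

```ini
[Service]
# Allocate a transient user and group for the lifetime of the service.
DynamicUser=yes
```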
This is not suitable for services that write persistent data to disk or have to read private data. This is because the UID/GID will be unpredictable and may be shared (though not at the same time) with other services.
Read systemd.exec(5) → DynamicUser= before use.
See also ExecStart (run ExecStart=, ExecStartPre=, etc. with full privileges)
Run service in a private user namespace:
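A minimal sketch, assuming the directive is PrivateUsers= (the name is not shown above):

```ini
[Service]
# Run in a user namespace with only a minimal user and group mapping.
PrivateUsers=yes
```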
Run process as user serviced:
The group is taken from the passwd database unless specified via Group, and supplementary groups are taken from the group database.
Set the service's group to serviced:
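With User= and Group=, both named above, the two settings might be combined like this:

```ini
[Service]
# Run the service processes as user serviced ...
User=serviced
# ... and with primary group serviced instead of the user's default group.
Group=serviced
```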
On Unix, any process belongs to a user (UID) and group (GID)
but it may also belong to additional/supplementary groups. Such
supplementary groups are shown in
$ id user
uid=1000(user) gid=1000(user) groups=1000(user),999(qubes),126(docker)
Add service to supplementary group inet:
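A minimal sketch using SupplementaryGroups=, which is named in the next sentence:

```ini
[Service]
# Add the inet group on top of the groups from the group database.
SupplementaryGroups=inet
```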
Groups from the system’s group database are left untouched and SupplementaryGroups are appended.
+ can be used to execute commands with full privileges (without User/Group/etc. being applied) and without filesystem access restrictions being applied.
Run mkdir /etc/directory/ as root and with /etc/ being writable:
ExecStartPre=+mkdir /etc/directory
ExecStart=serviced --foreground
ReadOnlyPaths=/etc/
User=serviced
! to only revert the effects of
These prefixes can be used with ExecStart, ExecStartPre, ExecStartPost, ExecStop and ExecStopPost.
Audit Seccomp Violations
SystemCallFilter and other directives employ seccomp(2) filters and terminate processes that violate the filter. You can use auditd to diagnose filter violations.
apt install auditd
Try to start the service.
Check exit status:
$ systemctl --user status remote-ssh-agent.service
● remote-ssh-agent.service - Connect to SSH agent on remote machine.
     Loaded: loaded (/home/user/.config/systemd/user/remote-ssh-agent.service; enabled; vendor preset: enabled)
     Active: failed (Result: signal) since Sat 2022-01-08 17:12:17 CET; 2min 41s ago
    Process: 41342 ExecStartPre=rm /var/run/user/1000/remote-ssh-agent.socket (code=exited, status=0/SUCCESS)
    Process: 41343 ExecStart=/usr/bin/ncat -k -l -U /var/run/user/1000/remote-ssh-agent.socket -c qrexec-client-vm svc-ssh-agent-git qubes.SshAgent (code=killed, signal=SYS)
   Main PID: 41343 (code=killed, signal=SYS)
        CPU: 12ms

Jan 08 17:12:17 dev systemd: Starting Connect to SSH agent on remote machine....
Jan 08 17:12:17 dev systemd: Started Connect to SSH agent on remote machine..
Jan 08 17:12:17 dev systemd: remote-ssh-agent.service: Main process exited, code=killed, status=31/SYS
Jan 08 17:12:17 dev systemd: remote-ssh-agent.service: Failed with result 'signal'.
Processes that violate the seccomp policy are terminated with signal SIGSYS.
Find recent (i.e. last 10 minutes) audit logs with a message type SECCOMP:
$ ausearch -i -m SECCOMP -ts recent
---
type=SECCOMP msg=audit(01/08/2022 17:12:17.214:96) : auid=user uid=user gid=user ses=1 subj==unconfined pid=41343 comm=ncat exe=/usr/bin/ncat sig=SIGSYS arch=x86_64 syscall=socket compat=0 ip=0x7b9e06e59477 code=kill
These logs indicate that the process was terminated with SIGSYS because the syscall socket was denied.
Fix the issue:
If the call is in fact needed, allow it. An alternative, in some cases, is to disable the features in the service that require the syscall.
You can allow the syscall explicitly:
Alternatively, you can allow a group that contains the socket syscall:
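Both options might be sketched as follows, assuming the unit already has an allow-list such as SystemCallFilter=@system-service and that @network-io is the group meant here (it contains socket). Further SystemCallFilter= lines extend an existing allow-list:

```ini
[Service]
# Allow the single denied syscall explicitly:
SystemCallFilter=socket

# Or allow the whole socket I/O group:
#SystemCallFilter=@network-io
```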
See SystemCallFilter for more details.
Apache2 serving static content only
[Service]
CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_CHOWN CAP_SETUID CAP_SETGID CAP_KILL
MemoryDenyWriteExecute=yes
NoNewPrivileges=yes
LockPersonality=yes
ProtectClock=yes
ProtectSystem=strict
ReadWritePaths=/var/log/apache2/
ReadWritePaths=/var/run
ProtectHome=yes
ProtectHostname=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
ProtectControlGroups=yes
RemoveIPC=yes
RestrictAddressFamilies=AF_INET AF_INET6
RestrictNamespaces=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@keyring

[Service]
MemoryDenyWriteExecute=yes
NoNewPrivileges=yes
LockPersonality=yes
ProtectHome=yes
RemoveIPC=yes
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX AF_NETLINK
RestrictNamespaces=yes
ProtectKernelModules=yes
ProtectKernelLogs=yes
ProtectControlGroups=yes
ProtectKernelTunables=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
SystemCallArchitectures=native
SystemCallFilter=@system-service

[Service]
MemoryDenyWriteExecute=yes
NoNewPrivileges=yes
LockPersonality=yes
ProtectHome=yes
RemoveIPC=yes
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX AF_NETLINK
RestrictNamespaces=yes
ProtectKernelModules=yes
ProtectKernelLogs=yes
ProtectControlGroups=yes
ProtectKernelTunables=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
SystemCallArchitectures=native
SystemCallFilter=@system-service chroot
Tor relay / onion service: roles/tor_server/templates/51-ansible-hardening.conf