PR_SET_NO_NEW_PRIVS (38) (since Linux 3.5)
Set the calling thread's no_new_privs bit to the value in arg2.
With no_new_privs set to 1, execve(2) promises not to grant
privileges to do anything that could not have been done without
the execve(2) call (for example, rendering the set-user-ID and
set-group-ID mode bits, and file capabilities non-functional).
Once set, this bit cannot be unset. The setting of this bit is
inherited by children created by fork(2) and clone(2), and
preserved across execve(2).
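For example, when no_new_privs is set in the calling shell, a
set-user-ID program such as sudo(8) no longer gains root privileges:
$ sudo su
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?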
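The following is a minimal sketch of setting and reading back the bit from C. The helper names enable_no_new_privs() and query_no_new_privs() are hypothetical and exist only for this illustration; PR_GET_NO_NEW_PRIVS is the companion prctl() operation that returns the current value of the bit.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Hypothetical helper: set no_new_privs for the calling thread.
 * Returns 0 on success, -1 on error. */
static int enable_no_new_privs(void)
{
    /* arg2 must be 1; arg3, arg4 and arg5 must be 0. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) {
        perror("prctl(PR_SET_NO_NEW_PRIVS)");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: read the bit back.
 * Returns 1 if set, 0 if clear, -1 on error. */
static int query_no_new_privs(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

A program would typically call enable_no_new_privs() before execve(2) of an untrusted binary. Because the bit cannot be unset, a later prctl(PR_SET_NO_NEW_PRIVS, 0, 0, 0, 0) is expected to fail.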
PR_SET_SECCOMP (since Linux 2.6.23)
Set the secure computing (seccomp) mode for the calling thread,
to limit the available system calls. The more recent seccomp(2)
system call provides a superset of the functionality of
PR_SET_SECCOMP.
The seccomp mode is selected via arg2. (The seccomp constants
are defined in <linux/seccomp.h>.)
With arg2 set to SECCOMP_MODE_STRICT, the only system calls that
the thread is permitted to make are read(2), write(2), _exit(2)
(but not exit_group(2)), and sigreturn(2). Other system calls
result in the delivery of a SIGKILL signal. Strict secure
computing mode is useful for number-crunching applications that
may need to execute untrusted byte code, perhaps obtained by
reading from a pipe or socket. This operation is available only
if the kernel is configured with CONFIG_SECCOMP enabled.
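Strict mode can be tried out safely in a child process, as in the sketch below. The function run_strict_child() and the exit code 100 are hypothetical names chosen for this illustration. The sketch assumes a kernel built with CONFIG_SECCOMP. Note that the C library's _exit() wrapper normally invokes exit_group(2), which strict mode does not permit, so the child terminates through the raw exit system call instead.

```c
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that enters strict seccomp mode.
 * If try_forbidden is nonzero, the child then makes a getpid system
 * call, which is outside the permitted set.
 * Returns the child's wait status, or -1 if fork/waitpid failed.
 */
static int run_strict_child(int try_forbidden)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) == -1)
            _exit(100);               /* strict mode not available */

        if (try_forbidden)
            syscall(SYS_getpid);      /* expected to end in SIGKILL */

        static const char msg[] = "still alive in strict mode\n";
        ssize_t n = write(STDOUT_FILENO, msg, sizeof(msg) - 1);
        (void)n;

        /* Raw exit(2), not exit_group(2); does not return. */
        for (;;)
            syscall(SYS_exit, 0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```

With try_forbidden set to 0 the child should exit normally with status 0; with it set to 1 the wait status should report termination by SIGKILL.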
With arg2 set to SECCOMP_MODE_FILTER (since Linux 3.5), the
system calls allowed are defined by a pointer to a Berkeley
Packet Filter passed in arg3. This argument is a pointer to
struct sock_fprog; it can be designed to filter arbitrary system
calls and system call arguments. This mode is available only if
the kernel is configured with CONFIG_SECCOMP_FILTER enabled.
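As an illustration of filter mode, the sketch below builds a four-instruction BPF program that makes one system call fail with EPERM and allows all others, then installs it with prctl(). The function name deny_syscall_with_eperm() is hypothetical. Two simplifications are assumed: the filter does not check the arch field of struct seccomp_data, which a production filter should do, and the caller is unprivileged, so no_new_privs is set first (the kernel requires either that bit or CAP_SYS_ADMIN before accepting a filter).

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: install a filter that makes system call number
 * nr fail with EPERM. Returns 0 on success, -1 on error.
 */
static int deny_syscall_with_eperm(int nr)
{
    struct sock_filter filter[] = {
        /* Load the system call number from struct seccomp_data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* Match: continue to the ERRNO return. No match: skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    /* arg3 is the pointer to struct sock_fprog. */
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0) == -1)
        return -1;
    return 0;
}
```

After deny_syscall_with_eperm(SYS_getppid) succeeds, a call such as syscall(SYS_getppid) should return -1 with errno set to EPERM, while other system calls continue to work.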
If SECCOMP_MODE_FILTER filters permit fork(2), then the seccomp
mode is inherited by children created by fork(2); if execve(2)
is permitted, then the seccomp mode is preserved across
execve(2). If the filters permit prctl() calls, then additional
filters can be added; they are run in order until the first non-
allow result is seen.
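Inheritance across fork(2) can be observed with PR_GET_SECCOMP, as in this sketch. It installs a trivial allow-everything filter (so that fork(2) and prctl() remain permitted), forks, and has the child report its own seccomp mode through its exit status. The function name child_seccomp_mode_after_filter() is hypothetical, and the sketch assumes an unprivileged caller that must set no_new_privs before installing the filter.

```c
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: install an allow-all filter, fork, and return
 * the seccomp mode the child observes (or -1 on error).
 */
static int child_seccomp_mode_after_filter(void)
{
    struct sock_filter allow_all[] = {
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = { .len = 1, .filter = allow_all };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* The child reports the mode it inherited from the parent. */
        _exit(prctl(PR_GET_SECCOMP, 0, 0, 0, 0));
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The child is expected to report SECCOMP_MODE_FILTER, since the filter installed by the parent before fork(2) also applies to it.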
For further information, see the kernel source file
/* seccomp actions */

/* Kill the process */
#define SCMP_ACT_KILL 0x00000000U
/* Throw a SIGSYS signal */
#define SCMP_ACT_TRAP 0x00030000U
/* Return the specified error code */
#define SCMP_ACT_ERRNO(x) (0x00050000U | ((x) & 0x0000ffffU))
/* Notify a tracing process with the specified value */
#define SCMP_ACT_TRACE(x) (0x7ff00000U | ((x) & 0x0000ffffU))
/* Allow the syscall to be executed after the action has been logged */
#define SCMP_ACT_LOG 0x7ffc0000U
/* Allow the syscall to be executed */
#define SCMP_ACT_ALLOW 0x7fff0000U
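The parameterized actions pack a 16-bit value into the low half of the 32-bit action word. The sketch below repeats two of the definitions from the listing so that it compiles on its own, and adds two hypothetical helpers, action_part() and data_part(), that are not part of libseccomp and serve only to show how the word splits.

```c
#include <errno.h>
#include <stdint.h>

/* Repeated from the listing above so this sketch is self-contained. */
#define SCMP_ACT_ERRNO(x) (0x00050000U | ((x) & 0x0000ffffU))
#define SCMP_ACT_TRACE(x) (0x7ff00000U | ((x) & 0x0000ffffU))

/* Hypothetical helper: upper 16 bits, which select the action. */
static uint32_t action_part(uint32_t act)
{
    return act & 0xffff0000U;
}

/* Hypothetical helper: lower 16 bits, which carry the errno or
 * tracer value. */
static uint32_t data_part(uint32_t act)
{
    return act & 0x0000ffffU;
}
```

For instance, on Linux EPERM is 1, so SCMP_ACT_ERRNO(EPERM) evaluates to 0x00050001U. A value wider than 16 bits is truncated: SCMP_ACT_TRACE(0x12345) carries only 0x2345 in its data part.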
int seccomp_rule_add(scmp_filter_ctx ctx,
uint32_t action, int syscall, unsigned int arg_cnt, ...);