Hiding in plain sight - Mount namespaces

Photo by Fejuz

Introduction

The following post explores a relatively unknown technique on Linux offering the ability to conceal artifacts on the filesystem with an exceptional level of stealth. We will explore how mount namespaces can be (ab)used to:

  • Conceal files from all users on a host (including the root user)
  • Masquerade a process whose image never touches a physical disk
  • Prevent file activity and command logging from a bash shell
  • Do all of the above as both privileged and unprivileged users

The second part of the post explores mitigation and detection strategies. Defenders may want to pay attention: despite the lack of public references, the underlying concept is not exactly new, and the (ab)use of mount namespaces to conceal artifacts has been observed in the wild.

A namespace is the mechanism made available by the Linux kernel (since 2008) that isolates resources between processes. Namespaces are commonly associated with 'containers', although in modern Linux distributions systemd also makes good use of them. No container runtime is needed to use namespaces. For further reading on the topic, see this post.

The examples described in this post all share a deceptively simple method:

  • Enter (or create a new) mount namespace
  • Mount a tmpfs filesystem over an arbitrary path

Files written to the new isolated tmpfs mount are concealed from all users and never touch a physical volume (hindering disk acquisition). Let's coin a term for this technique: creating a stashspace.

Another use for a stashspace is to masquerade a malicious process as a legitimate program while remaining 'fileless'. Here a binary is executed from an isolated tmpfs volume, at a path that mirrors that of a legitimate running process. This method of 'process masquerading' is arguably superior to the methods described in prior posts in this 'hiding in plain sight' series:

Part 1 - Modifying process names in UNIX-like systems
Part 2 - Abusing the dynamic linker

We are not limited to tmpfs mounts of course. Bind mounts can be used to temporarily 'overwrite' files from within a mount namespace, which prevents processes from touching the real files on the filesystem. An example would be preventing bash invoked in a reverse shell from reading system files, which would otherwise leave forensic marks in the form of updated access times.

Hiding within an existing namespace

The first example is the simplest. Here we assume a box has been compromised with root privileges. A mount namespace different from the 'default' one that users of the host are using is chosen. systemd offers a few; anything except /sbin/init's mount namespace will do:

NetworkManager (504) is chosen. We migrate the current process (bash) into NetworkManager's mount namespace:

root@debian:/home/debian# nsenter -t 504 --mount
root@debian(NS):/# 

(For clarity, 'NS' has been added to PS1 to differentiate between the attacker shell and a legitimate user's shell)

From within NetworkManager's mount namespace, a temporary file system is mounted to an arbitrary path:

root@debian(NS):/# mount -t tmpfs tmpfs /root

Artifacts now written within /root from within the current shell will not be visible to any user. Here a 'malicious' file is written to /root:

root@debian(NS):/# echo "root was here" > /root/CATCHMEIFYOUCAN
root@debian(NS):/# ls -al /root
total 8
drwxrwxrwt  2 root root   60 Jun 28 08:30 .
drwxr-xr-x 19 root root 4096 Oct 20  2023 ..
-rw-r--r--  1 root root   18 Jun 28 08:30 CATCHMEIFYOUCAN

The real root user now logs in and does a directory listing of their home directory. They see their own files in /root but CATCHMEIFYOUCAN is nowhere to be seen:

root@debian:~# ls -al /root
total 64
drwx------  5 root root  4096 Jun 18 20:57 .
drwxr-xr-x 19 root root  4096 Oct 20  2023 ..
-rw-------  1 root root  8156 Jun 18 20:57 .bash_history
-rw-r--r--  1 root root   571 Apr 10  2021 .bashrc

Furthermore, if a disk image were acquired, the files placed in /root would be absent from it (just as with files written to common tmpfs volumes such as /dev/shm).

Masquerading a running process

What happens if we run a program within a new mount namespace? How will it appear to other users, since they have no view of it?

Answer: the kernel lies (kind of). procfs (and hence ps) will show the path of the program as it appears in the mounted filesystem, even if that path is not accessible to users in a different namespace.

To demonstrate, assume /root/implant is a malicious executable. We will masquerade it as another arbitrary program on the host, auditd. To do this, a new mount namespace is created and a new tmpfs volume is mounted over /usr/sbin. The malicious binary implant is then copied to /usr/sbin/auditd and executed:

$ unshare -m -U --map-root-user
$ mount -t tmpfs tmpfs /usr/sbin/

$ cp /root/implant /usr/sbin/auditd
$ /usr/sbin/auditd

No need for any fancy process name stomping techniques: we have two identical process names and paths:

$ ps -e -o pid,ppid,comm,cmd --sort=start_time | grep sbin/auditd 
    PID    PPID COMMAND         CMD
   2627       1 auditd          /usr/sbin/auditd
   4510       1 auditd          /usr/sbin/auditd

4510 is the malicious process (the listing is sorted by start time). Luckily we have trusty /proc/[pid]/exe to recover the process image, right? Nope.

$ readlink /proc/4510/exe
/usr/sbin/auditd

$ readlink /proc/2627/exe
/usr/sbin/auditd

Effectively, what we have is fileless malware running in a way that avoids the common artifacts used for detections:

  • No /dev/shm or other common tmpfs path
  • No (deleted) marker, which the kernel appends when the binary of a running process is deleted from disk after execution
  • No memfd: prefix from the memfd_create technique
  • No process injection required

Unprivileged users

Let's extend the concept of mount namespaces to work for unprivileged users. If there is no existing mount namespace that an unprivileged user can enter, can that user create a new one?

debian@debian:~$ unshare --mount
unshare: unshare failed: Operation not permitted

Close but no cigar. A new user namespace must also be created, with the user ID of the process remapped to UID 0. The --map-root-user switch of the unshare utility conveniently creates a new user namespace and does the required remapping:

debian@debian:~$ unshare --mount --map-root-user

root@debian(NS):~# id
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)

Now a tmpfs can be mounted within the new namespace and content written to it:

root@debian(NS):~# mount -t tmpfs tmpfs /root
root@debian(NS):~# echo "I'm not even root" > /root/CATCHMEIFYOUCAN
root@debian(NS):~# ls -al /root
total 8
drwxrwxrwt  2 root   root      60 Jun 28 09:07 .
drwxr-xr-x 19 nobody nogroup 4096 Oct 20  2023 ..
-rw-r--r--  1 root   root      18 Jun 28 09:07 CATCHMEIFYOUCAN
root@debian(NS):~# 

Everything within this directory is effectively invisible to all users:

root@debian:/home/debian# ls -al /root | grep CATCHMEIFYOUCAN
root@debian:/home/debian# 

This is because every user who logs into the host lands in the same common default namespaces, and is therefore presented with a different view of the filesystem from the one seen inside the new mount namespace.

Mitigations are available to prevent unprivileged users from doing this, described later in this post.

Avoiding bash history and access times

The most common way to avoid bash writing to its history file is to set an environment variable to suppress the logging, e.g. export HISTFILE=/dev/null. Looking for that environment variable in running processes is useful from a detection perspective (in part 1 of this series we looked at how the environment variables can be tampered with to conceal this).

With the stashspace technique, the same effect is achieved if the tmpfs is mounted over $HOME. $HOME/.bash_history is 'recreated' temporarily on the tmpfs and destroyed once the mount namespace is removed by the kernel, and hence nothing is logged to the real history file:

mount("tmpfs", getenv("HOME"), "tmpfs", 0, NULL);

Furthermore, bash can be prevented from touching files elsewhere on disk, which would otherwise update their atime; suppressing those updates hinders timeline analysis. To mount over a file, a bind mount can be used. The following example takes a subset of the files outside of $HOME that bash attempts to read and bind mounts /dev/null over each of them from within the isolated mount namespace:

char *paths[] = { "/etc/nsswitch.conf", "/etc/bash.bashrc", "/etc/profile", NULL };  
char **p = paths;
while (*p != NULL)
   mount("/dev/null", *p++, NULL, MS_BIND, NULL);

Recommended reading on other ways bind mounts can be (ab)used here.

Persistence

There is one potential caveat to consider with the example given previously: if the bash shell terminates, all files written to the stashspace will evaporate into thin air. The reason for this is that the kernel holds a reference counter for each namespace, and when the counter reaches zero the namespace is removed (along with all file content). Seen from another perspective, this could be considered a benefit: when resources are no longer required, the kernel takes care of cleaning everything up, which may reduce the risk of human-made OPSEC failures.

To demonstrate the concept further, we will write a program that maintains persistence on a stashspace. We will add the requirement that it must be able to be run as an unprivileged user.

A recap on the functional requirements. The program needs to:

  • Stay alive so the kernel does not destroy the stashspace
  • Create two new namespaces (MNT and USER)
  • Remap the UID of the process to the root user (UID 0); the GID must also be 0
  • Mount a tmpfs filesystem (the stash path)
  • Allow the stashspace to be accessed on demand

To keep the process running in the background, glibc's daemon() detaches it from the current terminal; the process is then reparented to init.

daemon(0,0);

As with the utility unshare, the unshare system call is used to create the new namespaces (clone could also be used). The flags CLONE_NEWNS and CLONE_NEWUSER are specified for the MNT and USER namespaces respectively:

unshare(CLONE_NEWNS | CLONE_NEWUSER);

A process can write to /proc/self/uid_map to define how user IDs inside the USER namespace it is running in map to user IDs outside it. As a means to mitigate various privilege escalation vulnerabilities, a restriction was put in place in the Linux kernel: an unprivileged process must write deny to /proc/self/setgroups (disabling the setgroups syscall) before it is allowed to write /proc/self/gid_map. Additionally, both the user and group IDs need to be remapped, otherwise attempts to write to the new mount point will elicit the error: Value too large for defined data type.

Bringing the above requirements together:

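/* uid and gid hold the values of getuid() and getgid() captured BEFORE
   unshare(); inside the new user namespace both calls return the
   overflow ID (65534) until the maps are written */
fd = open("/proc/self/setgroups", O_WRONLY);
write(fd, "deny", 4);
close(fd);

fd = open("/proc/self/uid_map", O_WRONLY);
snprintf(map_buf, sizeof(map_buf), "0 %d 1", uid);
write(fd, map_buf, strlen(map_buf));
close(fd);

fd = open("/proc/self/gid_map", O_WRONLY);
snprintf(map_buf, sizeof(map_buf), "0 %d 1", gid);
write(fd, map_buf, strlen(map_buf));
close(fd);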
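
The fragment above omits error handling. A slightly more defensive sketch of the same three writes might look like the following. The helper names (format_id_map, write_proc_file, map_self_to_root) are hypothetical and not taken from the original program. It assumes unshare(CLONE_NEWUSER) has already succeeded, and that outer_uid and outer_gid were read with getuid() and getgid() before that call.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build a single-entry map line: "<inside_id> <outside_id> 1".
   Returns 0 on success, -1 if the buffer is too small. */
int format_id_map(char *buf, size_t len, unsigned inside_id, unsigned outside_id)
{
    int n = snprintf(buf, len, "%u %u 1", inside_id, outside_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a string to an existing file with a single write(2) call, as the
   kernel expects for the map files. Returns 0 on success, -1 on failure. */
int write_proc_file(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    size_t len = strlen(data);
    ssize_t n = write(fd, data, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

/* Map the caller to UID 0 / GID 0 inside its (already created) user
   namespace. outer_uid and outer_gid are the IDs the process had
   before unshare(). Returns 0 on success, -1 on the first failure. */
int map_self_to_root(uid_t outer_uid, gid_t outer_gid)
{
    char map[64];

    /* setgroups must be denied before gid_map may be written */
    if (write_proc_file("/proc/self/setgroups", "deny") != 0)
        return -1;
    if (format_id_map(map, sizeof(map), 0, (unsigned)outer_uid) != 0 ||
        write_proc_file("/proc/self/uid_map", map) != 0)
        return -1;
    if (format_id_map(map, sizeof(map), 0, (unsigned)outer_gid) != 0 ||
        write_proc_file("/proc/self/gid_map", map) != 0)
        return -1;
    return 0;
}
```

A caller would typically store getuid() and getgid() in local variables, call unshare(CLONE_NEWNS | CLONE_NEWUSER), and then pass the stored values to map_self_to_root().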

Next, a tmpfs filesystem is mounted:

mount("tmpfs", "/tmp", "tmpfs", 0, NULL);

And finally the process needs to stay alive to keep the references to the two new namespaces. Putting it to sleep is one approach:

pause();

Now when the program is run as an unprivileged user, a new /tmp is created and held in a new mount namespace (4026532451):

debian@debian:~$ ./stashspace
debian@debian:~$ ps -o pid,ppid,comm,mntns  -p `pidof stashspace`
    PID    PPID COMMAND              MNTNS
   3316       1 stashspace      4026532451

Our current bash shell remains in the default mount namespace (4026531841):

debian@debian:~$ ps -o pid,ppid,comm,mntns  -p $$
    PID    PPID COMMAND              MNTNS
   1833    1832 bash            4026531841

To gain access to the hidden stash, we need to move into stashspace's mount and user namespaces. In another running program, enter the user namespace using the setns system call:

snprintf(ns_path, sizeof(ns_path), "/proc/%d/ns/user", target_pid);
fd = open(ns_path, O_RDONLY);
setns(fd, CLONE_NEWUSER);

Followed by the mount namespace:

snprintf(ns_path, sizeof(ns_path), "/proc/%d/ns/mnt", target_pid);
fd = open(ns_path, O_RDONLY);
setns(fd, CLONE_NEWNS);

Then we can replace the current process image with bash and access the hidden file content:

execl("/bin/bash", "bash", NULL);
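
The three fragments above can be combined into one helper with basic error checking. This is a minimal sketch rather than the original tool: the names join_ns and enter_stashspace are hypothetical, and target_pid is assumed to be the PID of the running stashspace process. The user namespace is joined first because that is what grants the capabilities needed to join the mount namespace it owns.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Join one namespace of target_pid. name is the entry under
   /proc/<pid>/ns/ and nstype the matching CLONE_NEW* constant.
   Returns 0 on success, -1 on failure. */
static int join_ns(pid_t target_pid, const char *name, int nstype)
{
    char ns_path[64];
    snprintf(ns_path, sizeof(ns_path), "/proc/%d/ns/%s", (int)target_pid, name);

    int fd = open(ns_path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    int rc = setns(fd, nstype);
    close(fd);
    return rc;
}

/* Enter the stash held by target_pid and replace the current process
   with a shell. Only returns (with -1) if something failed. */
int enter_stashspace(pid_t target_pid)
{
    if (join_ns(target_pid, "user", CLONE_NEWUSER) != 0)
        return -1;
    if (join_ns(target_pid, "mnt", CLONE_NEWNS) != 0)
        return -1;

    execl("/bin/bash", "bash", (char *)NULL);
    return -1; /* reached only if execl failed */
}
```

Note that setns() into a user namespace is only permitted for a single-threaded process, so a helper like this is best called early from main().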

A 'living off the land' approach would be to simply use the nsenter utility:

debian@debian:~$ nsenter -t 3316 -m -U --preserve-credentials
root@debian(NS):/# ls -al /tmp
total 4
drwxrwxrwt  2 root   root      40 Jun 28 10:54 .
drwxr-xr-x 19 nobody nogroup 4096 Oct 20  2023 ..

Mitigations and Detections

Denying creation of user namespaces

It is possible to block the creation of new user namespaces for unprivileged users. The newer versions of Ubuntu now do this by default.

user@ubuntu:~$ unshare --mount --map-root-user
unshare: write failed /proc/self/uid_map: Operation not permitted

It does this with a preconfigured AppArmor profile that denies all capabilities inside the newly created user namespace. This effectively drops CAP_SYS_ADMIN, which is required to create a mount namespace and to mount filesystems.

$ cat /etc/apparmor.d/unprivileged_userns
...
profile unprivileged_userns {
     audit deny capability,
...

Note that there was a 'bypass' possible in Ubuntu 24.04, although it appears to have been mitigated now (tested with Ubuntu 25.04).

On Debian a sysctl is available to disable unprivileged user namespaces:

sysctl -w kernel.unprivileged_userns_clone=0

With some other distributions (Fedora, RHEL), the same outcome can be achieved by setting user.max_user_namespaces=0, effectively restricting the creation of any user namespace:

[root@fedora fedora]# sysctl -w user.max_user_namespaces=0
[root@fedora fedora]# unshare -U
unshare: unshare failed: No space left on device

There are other possible workable solutions such as seccomp filters for blocking unshare and clone(CLONE_NEWUSER | CLONE_NEWNS), although just dropping capabilities (CAP_SYS_ADMIN) might be a more robust approach. One implementation idea would be a small program that drops capabilities which is used in conjunction with pam_exec.so.

Detections

Accessing a stashspace

There are two ways to access the file content in a mount namespace: reading it through procfs, or entering the mount namespace of each process running on the host.

procfs offers a rather convenient way to access files within each process's mount namespace: enumerate /proc/[pid]/root/. (Side note: this also happens to be a useful way to access the filesystem of running containers in which you can't spawn a shell.)

Taking the very first example in this post, where a file was placed in a tmpfs mounted in /root and is not visible:

root@debian:/home/user# ls -al /root | grep CATCH
root@debian:/home/user#

When /proc/[pid]/root/ is enumerated across every PID, the file surfaces:

root@ubuntu:/home/user# find /proc/*/root/ | grep CATCH
/proc/5428/root/root/CATCHMEIFYOUCAN

The alternative is to enter the mount namespace before running the desired command. In this case, find:

root@debian:/home/user# ps -eo pid --no-headers | xargs -I{}  nsenter -t {} -m find / 2>/dev/null | grep CATCH
/root/CATCHMEIFYOUCAN

Uncovering malicious mount namespaces

The approach here is going to differ depending on what is considered a normal baseline. By default on many distributions, systemd creates quite a few mount namespaces:

$ readlink /proc/*/ns/mnt | sort | uniq -c
    141 mnt:[4026531841]
      1 mnt:[4026531862]
      1 mnt:[4026532182]
      1 mnt:[4026532183]
      1 mnt:[4026532185]
      1 mnt:[4026532187]
      1 mnt:[4026532325]
      1 mnt:[4026532330]
      1 mnt:[4026532332]
      1 mnt:[4026532333]
      1 mnt:[4026532335]

We can filter out the 'default' mount namespace by reading in the current bash shell's mount namespace:

$ readlink /proc/$$/ns/mnt
mnt:[4026531841]
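
The same comparison can be made from C by reading the link target and extracting the inode number between the brackets. The following is a minimal sketch; the function names (parse_ns_link, mnt_ns_of, in_foreign_mnt_ns) are hypothetical. Reading /proc/<pid>/ns/mnt for processes owned by other users generally requires root.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse a namespace link target such as "mnt:[4026531841]".
   type is the expected prefix ("mnt", "user", ...).
   Returns 0 and stores the inode on success, -1 otherwise. */
int parse_ns_link(const char *link, const char *type, unsigned long *inode)
{
    size_t tlen = strlen(type);
    char closing = '\0';

    if (strncmp(link, type, tlen) != 0 || strncmp(link + tlen, ":[", 2) != 0)
        return -1;
    if (sscanf(link + tlen + 2, "%lu%c", inode, &closing) != 2 || closing != ']')
        return -1;
    return 0;
}

/* Read the mount namespace inode of a process from /proc/<pid>/ns/mnt. */
int mnt_ns_of(pid_t pid, unsigned long *inode)
{
    char path[64], link[64];

    snprintf(path, sizeof(path), "/proc/%d/ns/mnt", (int)pid);
    ssize_t n = readlink(path, link, sizeof(link) - 1);
    if (n < 0)
        return -1;
    link[n] = '\0'; /* readlink does not terminate the string */
    return parse_ns_link(link, "mnt", inode);
}

/* Returns 1 if pid lives in a different mount namespace from the caller,
   0 if it shares the caller's namespace, -1 if either lookup failed. */
int in_foreign_mnt_ns(pid_t pid)
{
    unsigned long mine, theirs;

    if (mnt_ns_of(getpid(), &mine) != 0 || mnt_ns_of(pid, &theirs) != 0)
        return -1;
    return mine != theirs;
}
```

Calling in_foreign_mnt_ns() for each numeric entry in /proc gives the candidate list that the filtering described next would then narrow down.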

Using the control group name is one method to filter out the other default systemd processes. We also need to filter out kernel threads (ppid of 2). A simple bash script that enumerates all processes and does the filtering described:

#!/bin/bash

default_ns=$(readlink /proc/$$/ns/mnt)

ps -e -o pid,ppid,comm | while read -r pid ppid comm; do
  if [ -e "/proc/$pid/ns/mnt" ]; then
    [[ $ppid == 2 ]] && continue
    ns=$(readlink /proc/$pid/ns/mnt)
    [[ "$ns" == "$default_ns" ]] && continue
    grep -q -E 'system.slice|init.scope|systemd' /proc/$pid/cgroup && continue
    echo "$pid,$comm"
  fi
done

Let's run this now on a box that has all three primary examples used in this post (annotated):

$ bash ./script.sh
4510,auditd <-- masqueraded process
3316,bash <-- stashspace example
9800,bash <-- entering ns of existing systemd process

All malicious namespaces are found with no false positives. The next step would be further investigation, e.g. listing the tmpfs mounts from within each process. Taking the auditd example, we see /usr/sbin as an abnormality.

$ nsenter -t 4510 -m mount -t tmpfs | cut -d' ' -f3
/dev/shm
/run
/run/lock
/run/credentials/systemd-journald.service
/run/credentials/systemd-resolved.service
/run/credentials/systemd-networkd.service
/run/credentials/getty@tty1.service
/run/user/1000
/tmp
/usr/sbin

The script could be extended to filter out the common paths (/dev/shm, /tmp etc.) although this is not failsafe as 'legitimate' paths can be arbitrarily mounted, as was done in other examples in this post.
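
As a rough illustration of that filtering, the sketch below classifies a single line of /proc/<pid>/mounts, which lists the mounts visible in that process's mount namespace. The baseline list and the function names are assumptions made for this example and would need tuning per host. As noted above, a match against the baseline proves nothing, since a stash can just as easily be mounted over /tmp.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical baseline of tmpfs mount points considered normal. */
static const char *const expected_tmpfs[] = { "/dev/shm", "/run", "/tmp", NULL };

/* Returns 1 if path equals prefix or lies beneath it. */
static int path_under(const char *path, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(path, prefix, n) == 0 && (path[n] == '\0' || path[n] == '/');
}

/* Inspect one line of /proc/<pid>/mounts, whose fields are
   "source mountpoint fstype options dump pass".
   Returns 1 for a tmpfs mount outside the baseline, 0 otherwise. */
int is_unexpected_tmpfs(const char *mounts_line)
{
    char mountpoint[256], fstype[64];

    /* skip the source field, keep mount point and filesystem type */
    if (sscanf(mounts_line, "%*s %255s %63s", mountpoint, fstype) != 2)
        return 0;
    if (strcmp(fstype, "tmpfs") != 0)
        return 0;

    for (const char *const *p = expected_tmpfs; *p != NULL; p++)
        if (path_under(mountpoint, *p))
            return 0;
    return 1;
}
```

Fed the mounts of the auditd example, only the /usr/sbin entry would be flagged; /run/lock, /run/user/1000 and the /run/credentials paths all fall under the /run baseline entry.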

For hosts running containers, detection becomes a little more challenging. One approach here could be to correlate all tmpfs mounts to running containers and then do the necessary filtering.

System events

The unshare and clone system calls may capture malicious activity (and also generate many false positives). Let's use auditd as an example. The following rules match all calls to unshare, and calls to clone only when CLONE_NEWNS (0x00020000, the flag for a new mount namespace) is set in the flags argument. Both will yield plenty of false positives and would require further filtering.

-a always,exit -F arch=b64 -S unshare -k unshare
-a always,exit -F arch=b64 -S clone -F a0&0x20000 -k mount_namespace

For good measure, tmpfs mounts:

-a always,exit -F arch=b64 -S mount -F a2=tmpfs -k tmpfs_mount

While we are at it, why not the two programs unshare and nsenter:

-a always,exit -F arch=b64 -S execve -F path=/usr/bin/unshare -k unshare_cmd
-a always,exit -F arch=b64 -S execve -F path=/usr/bin/nsenter -k nsenter_cmd

The resulting events can then be searched by key:

$ ausearch -i -k unshare
...
type=PROCTITLE msg=audit(13/07/25 09:20:02.543:148) : proctitle=unshare -U -m --map-root-user
type=SYSCALL msg=audit(13/07/25 09:20:02.543:148) : arch=x86_64 syscall=unshare success=yes exit=0 a0=CLONE_NEWNS|CLONE_NEWUSER a1=0x7ffd6a33f230 a2=0x0 a3=0x8 items=0 ppid=934 pid=1025 auid=user uid=user gid=user euid=user suid=user fsuid=user egid=user sgid=user fsgid=user tty=pts0 ses=1 comm=unshare exe=/usr/bin/unshare subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=unshare
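
In the SYSCALL record above, ausearch -i has already interpreted a0 into flag names. When working with raw records, where the argument appears as a plain hexadecimal number, a small decoder helps with triage. The sketch below is an illustration only: decode_ns_flags is a hypothetical helper, and the table covers only the namespace flags, using the values from <linux/sched.h>.

```c
#include <stdio.h>
#include <string.h>

/* Namespace-related clone/unshare flags and their values. */
struct ns_flag { unsigned long bit; const char *name; };

static const struct ns_flag ns_flags[] = {
    { 0x00020000UL, "CLONE_NEWNS"     },
    { 0x02000000UL, "CLONE_NEWCGROUP" },
    { 0x04000000UL, "CLONE_NEWUTS"    },
    { 0x08000000UL, "CLONE_NEWIPC"    },
    { 0x10000000UL, "CLONE_NEWUSER"   },
    { 0x20000000UL, "CLONE_NEWPID"    },
    { 0x40000000UL, "CLONE_NEWNET"    },
};

/* Write the names of all namespace flags set in a0 into out, separated
   by '|'. Returns the number of flags found; out is always terminated
   and may be truncated if len is too small. */
int decode_ns_flags(unsigned long a0, char *out, size_t len)
{
    int count = 0;

    if (len == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < sizeof(ns_flags) / sizeof(ns_flags[0]); i++) {
        if (!(a0 & ns_flags[i].bit))
            continue;
        if (count > 0)
            strncat(out, "|", len - strlen(out) - 1);
        strncat(out, ns_flags[i].name, len - strlen(out) - 1);
        count++;
    }
    return count;
}
```

For the event shown above, the flags value 0x10020000 decodes to CLONE_NEWNS|CLONE_NEWUSER, the combination used by every unprivileged example in this post.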

Looking at open-source detections, there does not seem to be much out there (yet). Existing detections seem to focus on the use of unshare in the context of exploitation rather than anti-forensics. Elastic.co have a detection for the use of the unshare program:

Namespace Manipulation Using Unshare | Elastic Security Solution [8.18] | Elastic

And on the syscall front, Falco includes a rule that looks for the use of unshare within a container.

Default Rules
List of default rules for Falco

Perhaps after this post more detections will surface. As stated at the beginning of this post, mount namespaces have been observed in the wild being used for anti-forensic purposes. Further details on this may be elaborated in a future post.
