Hiding in plain sight: Modifying process names in UNIX-like systems (part 1)
Exploring ways malware on Linux and other UNIX-like systems can disguise their process names.
📅2024-08-02 - part 2 has been published here.
This post explores the defence evasion technique of dynamically modifying process names in UNIX-like systems. First observed as far back as the late '80s, the technique is certainly alive and well today. With a few minor tweaks to the original method, possibly first used by the Morris worm, threat actors employ this post-compromise technique today as a means to remain undetected. This post takes a look at various ways to "process masquerade" or "process stomp", primarily in the Linux operating system, with the occasional detour looking at the BSDs and Solaris. The second half of the post goes a little deeper into the mechanisms at play.
Note that this post does not cover ptrace, LD_PRELOAD or other "process injection" type techniques. These will be covered in later posts.
The Morris worm, the first worm to be unleashed on what could be called "the Internet" by Robert Morris, disguised its name as a standard shell in order to trick system administrators as the worm infected and propagated from their systems. In the worm's reverse engineered code we can see that Morris used a simple string copy to overwrite the first element of argv, the array containing process arguments, with the first element being the process name:
strcpy(argv[0], XS("sh"));
The technique, as used by Morris, targeted 4.3BSD on DEC VAX and Sun Microsystems Sun3 machines. This simple string copy still works today on Linux and on two of the three predominant BSD descendants. However, simply overwriting argv is not sufficient to avoid detection, as the (truncated) filename of the executable is exposed elsewhere.
Observe sysadmins demonstrating perhaps a mix of both annoyance and excitement pointing out some of these sh processes in an old news segment from 1988.
In a study of over 10,000 collected Linux malware samples, about half dynamically modified their process name to that of a well known process such as sshd or telnetd, cleared the name, or set it to something random:
In the 10k Linux malware study, it was noted that around half of the samples changed their name by modifying the "thread name" (comm) and the remainder by modifying argv. Interestingly enough, the authors mention that no sample applied both, which offers ample detection opportunities, although we do see both applied by the bpfdoor malware.
As this is a rather long post which covers quite some territory, here is a TL;DR:
Linux:
- A process that overwrites its own argv[0] will change /proc/[pid]/cmdline. The original process name will be disguised with default arguments to utilities like ps or top.
- argv can be reallocated to other memory segments such as the heap by using the prctl system call with the option PR_SET_MM. While not always necessary, this avoids any potential to corrupt the stack when the new name exceeds the stack frame boundaries.
- In newer Linux kernels, non-privileged processes can use PR_SET_MM_MAP when calling prctl to update the memory map.
- A process that invokes the prctl system call with the argument PR_SET_NAME will change the name found in /proc/[pid]/comm and elsewhere (/status, /[tid]/comm and /[tid]/status).
- Discrepancies between comm and cmdline can be used for detection, although malware can modify both cmdline and comm. This can be detected by checking for a mismatch with the symbolic link /proc/[pid]/exe.
- It is actually possible for a running process to tamper with and change /proc/[pid]/exe. This will be covered in a future post.
Other UNIX-like systems:
- On NetBSD and OpenBSD, changes to argv made within the process are reflected outside of the process.
- On FreeBSD and Solaris, changes to argv made within the process are not reflected outside of the process.
- In FreeBSD, argv can be changed with the sysctl MIB kern.proc.args.
- The process or thread name equivalent to Linux's comm cannot be changed (if you know of a way, then please let me know!).
- setproctitle is available in the BSDs, which effectively changes argv to a string of arbitrary length in a memory safe way. This is not available in Linux, resulting in developers having to write their own implementation.
- Invoking the execve system call on the same binary as the running process with a different argv would be one way to avoid the sysctl system call with FreeBSD (a tip from newcomer).
Linux comm and argv
In modern Linux distributions, commands like ps
which list running processes, read from the procfs
virtual filesystem to obtain the command line arguments, with the default options to ps
reading from /proc/[pid]/cmdline
. A process that overwrites argv
will be reflected in cmdline
. If that's all that is overwritten, then there is another place that the original process name could be obtained, specifically from reading /proc/[pid]/comm
. This is also reflected in the Name
field from /proc/[pid]/status
. The following program overwrites argv[0]
and pauses execution:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(int argc, char **argv) {
strcpy(argv[0],"stomped");
pause();
return 0;
}
We will call the program main, and run it as a background job. Its PID is 5514:
$ gcc main.c -o main
$ ./main &
[1] 5514
Running ps, we observe that the new name stomped is displayed:
$ ps aux | grep 5514
user 5514 0.0 0.0 2328 912 pts/0 S 14:27 0:00 stomped
But what about pgrep or explicitly specifying the process ID with ps? Here we see the original name, main. So overwriting argv[0] alone was not enough.
$ ps -p 5514
PID TTY TIME CMD
5514 pts/0 00:00:00 main
$ pgrep main
5514
ps
is reading from /proc/[pid]/comm
in the above examples. Confirming:
$ cat /proc/5514/comm
main
Let's then improve the program by adding a call to prctl
, a system call available in the Linux kernel which will change comm
:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/prctl.h>
int main(int argc, char **argv) {
strcpy(argv[0],"stomped");
prctl(PR_SET_NAME, "stomped");
pause();
return 0;
}
Running again, we can see that the process name is changed in both comm and cmdline:
$ gcc main.c -o main
$ ./main &
[1] 5841
$ ps -p 5841
PID TTY TIME CMD
5841 pts/0 00:00:00 stomped
$ pgrep stomped
5841
$ cat /proc/5841/comm
stomped
$ cat /proc/5841/status | grep Name
Name: stomped
A safer way to modify argv
Overwriting argv is not ideal: if the new string is longer than the space allocated on the stack, other data on the stack may be corrupted, starting with the environment variables. This can be avoided by ensuring the new name is shorter than or equal in length to the original name (plus any other arguments that are not used). It may also be acceptable to overwrite the space allocated to the environment variables. There is a "clean" way to change argv though.
A process can change its memory map related fields directly in the kernel using the option PR_SET_MM with the system call prctl. If the process is running privileged or has the capability CAP_SYS_RESOURCE, it can directly modify argv's address for itself by specifying the argument PR_SET_MM_ARG_{START|END} when calling PR_SET_MM.
This is what systemd does: it requests at least one page of memory from the kernel, copies over the new process name and then calls prctl accordingly. A simple reimplementation:
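#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/mman.h>
int main(int argc, char **argv) {
    char *nn;
    /* one page is plenty for the new name */
    size_t nn_size = (size_t) sysconf(_SC_PAGESIZE);
    char name[] = "I can be as long as I want";
    /* anonymous mappings take fd -1 and are zero-filled */
    nn = mmap(NULL, nn_size, PROT_READ|PROT_WRITE,
              MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if (nn == MAP_FAILED)
        return 1;
    strncpy(nn, name, nn_size - 1);
    /* both calls require root or CAP_SYS_RESOURCE */
    prctl(PR_SET_MM, PR_SET_MM_ARG_START, (unsigned long) nn, 0, 0);
    /* arg_end points just past the terminating null byte */
    prctl(PR_SET_MM, PR_SET_MM_ARG_END, (unsigned long) nn + strlen(name) + 1, 0, 0);
    sleep(100);
    return 0;
}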
It is actually possible to call prctl with PR_SET_MM as a non-privileged user, with no capabilities required. Instead of specifying individual fields, PR_SET_MM_MAP is used, with the complete memory map of the process provided.
prctl_map = (struct prctl_mm_map){
...
.arg_start = arg_start,
.arg_end = arg_end,
...
};
prctl(PR_SET_MM, PR_SET_MM_MAP, &prctl_map,
sizeof(prctl_map), 0);
This is a little more complex, as the running program needs to populate prctl_map with addresses for its own memory segments. (This seems to be supported via the kernel option CONFIG_CHECKPOINT_RESTORE, which allows non-privileged users to take "snapshots" of running processes and restore them, for example with criu.) At least in Debian Bookworm, this option is enabled.
Notably, LXC uses this approach with its own setproctitle implementation, which we go into in a little more detail later in this post.
The exe symbolic link
A naive detection approach is to check discrepancies between comm
and cmdline
, although this will result in false positives and does not deal with the case of malicious software changing both.
Another detection opportunity is to take a look at the symbolic link /proc/[pid]/exe
which points to the original executable binary on disk. Here is the "stomped" binary from before with its original file name exposed:
$ ls -al /proc/5868/exe
lrwxrwxrwx 1 user user 0 May 2 05:30 /proc/5868/exe -> /home/user/main
Deleting the file off disk won't help, as another red flag appears with the string (deleted) appended:
$ ls -al /proc/5868/exe
lrwxrwxrwx 1 user user 0 May 2 05:33 /proc/5868/exe -> '/home/user/main (deleted)'
While not so trivial, it is actually possible for a running process to change the symbolic link to exe
. It requires munmap
'ing pages of memory marked as executable and then invoking specific system calls in order to bypass some protections in the Linux kernel. After a bit of trial and error I did get this working and will describe how with sample code in a subsequent post on this site.
A tricky name
When it comes to processes disguising themselves as kernel threads (by prefixing and suffixing the name with [ and ]), the parent process can be checked to see if it is [kthreadd], which is generally PID 2 (but not always; for example, way back in RHEL 5 it is the init service, PID 1). If you know what to look for, these processes can be easily identified:
The gtpdoor
malware used this technique (overwriting argv[0]
, but only if the string length of the existing name was long enough).
Ultimately the best way to uncover processes that are modifying their name is to capture the initial execution event. EDR software has this capability and there are open-source options (for example, Sysdig Falco and osquery). I may consider writing a separate blog post just on detection with these tools.
A bit of theory
Ok, time to get back to the land of Linux. Let's now take a look "under the hood". The intention here is to understand the internal data structures at play. Refer to the following diagram, which depicts various internal structures within the Linux kernel when a python script was run from the command line. The pid of the process is 3389
:
Peeling back a few layers to the basic unit of execution, we have a kernel thread. A running process is really a thread, and there can be one or more threads per process. Information on running processes is stored in the data structure task_struct, which is known as a Process Control Block.
Each thread belonging to a process shares the same resources. This means two threads within the same process (two instances of task_struct) will point to the same resources, e.g. open files, regions of memory. All threads within a process share the same Tgid (thread group ID). This is why 3389 appears twice in Figure 1.
task_struct contains a "memory map" member, mm_struct, which represents the process memory layout, including two members, arg_start and arg_end, which are the start and end addresses of argv. When cmdline is read from /proc, it is arg_start and arg_end in the process's mm_struct that are referenced.
/proc/[pid]/comm, on the other hand, accesses a member directly in task_struct, comm, which is the basename of the file that is loaded upon execution (truncated to 15 characters, as TASK_COMM_LEN is 16 including the terminating null byte). The filename of the executable file that the process image is based on is copied into comm during the execve (and related) system calls.
The use of comm as an alternative to argv[0] serves a few purposes. It's a static, easily human recognised name for a process group for debugging purposes, and programs may use it to change their behaviour depending on the original file name. Take for example pkill and pgrep, which are actually the same file, although their functionality differs based on the name under which the executable was invoked.
$ ls -al /usr/bin/pkill
lrwxrwxrwx 1 root root 5 Dec 18 2022 /usr/bin/pkill -> pgrep
When running a python script directly from your shell with ./myscript.py, comm will be exactly that (myscript.py), whereas cmdline will include the full path of the python interpreter along with its arguments: /usr/bin/python3 myscript.py.
We can see that the comm
is of the filename, myscript.py
, but argv[0]
is /usr/bin/python3
and argv[1]
is myscript.py
.
Assume that myscript.py was written to be multi-threaded, spinning up two threads. Here we would see comm under /task three times: one for the initial process and the other two for its threads. Each thread's task_struct inherits comm from the thread that created it, so the same value is repeated. cmdline is identical too, as the threads share the same virtual memory.
$ cat /proc/3389/comm
python3
$ cat /proc/3389/task/*/comm
python3
python3
python3
$ cat /proc/3389/cmdline | tr '\0' ' '
python3 myscript.py
The command line arguments (argv) are stored as an array of pointers to null terminated strings, with the final element marking the end with a NULL (the null terminators explain the character substitution with tr in the above example, which adds a space between the filename and first argument). As illustrated:
The environment variables follow the same structure. Both argv and the environment variable pointers sit within the process's virtual memory address space just above the first stack frame. The following diagram has been taken from "The Linux Programming Interface" (I cannot recommend the book enough):
Now we have an understanding of why a process can change its command line arguments: it has direct access to its own virtual memory address space. task_struct->comm is only directly modifiable from within the kernel and hence a system call is needed to change it: prctl(PR_SET_NAME, name).
A little more on mm_struct, but we will keep it brief. mm_struct's member mmap points to a list of virtual memory areas (VMAs), which is a linked list of vm_area_struct. Each instance holds the start and end addresses of a region of memory such as the stack, heap, memory mapped files etc., which are made available to userspace processes via /proc/[pid]/maps (which utilities such as lsof use). Drilling further into the python example, various members of vm_area_struct are shown for a memory mapped file, the executable binary (python3.11) and the process's stack:
While this diagram is perhaps venturing into a little too much detail, this will be very important to understand for a later post where vm_file->f_path
is shown to be tampered which removes some very "noisy" artefacts for "memory only" running processes and binaries that have been subject to the process stomping techniques described in detail in this post.
(Memory safe) setproctitle
Later on we will see that the BSDs offer developers setproctitle. Linux does not, and some developers choose to implement it themselves, for example LXC here, with notes from an LXC developer here. Why not just overwrite argv? It should now be evident that overwriting argv past its boundaries could corrupt the stack, first hitting the environment variables, and then the process's first stack frame.
setproctitle is also mentioned in the Linux kernel source: see a comment in mm/util.c and the function get_mm_proctitle in fs/proc/base.c.
The LXC project had implemented setproctitle
in a safe manner, allowing an arbitrary length for argv
without touching the process's stack.
LXC's setproctitle uses the prctl system call with the option PR_SET_MM which, according to the man page, modifies "certain kernel memory map descriptor fields of the calling process." It passes PR_SET_MM_MAP and a prctl_map structure which is populated with the existing memory map of the running process with one change: new addresses for arg_start and arg_end, which can point to memory malloc'd on the heap, or just to page(s) obtained straight from mmap. Then the string copy is done on the newly allocated memory:
And indeed this does work, even with a very long process name. Take the following, with cmdline being a billion characters in length:
size_t size = 1000000000;
char *new = malloc(size);
/* fill all but the last byte, which holds the terminator */
for (size_t a = 0; a < size - 1; a++)
    new[a] = 'A';
new[size - 1] = '\0';
setproctitle(new);
The environment variables are no longer adjacent, and hence are untouched:
$ cat /proc/5092/cmdline | wc -c
1000000000
$ cat /proc/5092/environ | strings
SHELL=/bin/bash
PWD=/home/debian
...
Other UNIX-like systems
The BSDs
Directly overwriting argv in OpenBSD and NetBSD works to change the argument string(s), but in FreeBSD this does not work, and a sysctl system call is required.
Unlike glibc in Linux, the standard libraries of the three BSDs offer a function setproctitle.
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int main(int argc, char **argv) {
setproctitle("stomped");
pause();
return 0;
}
Let's see how it behaves (Here FreeBSD is used, but the result is the same on all three). The binary is called main
and the new name is stomped
:
# gcc main.c -o main
# ./main
# ps -ef -ocomm,args | grep stomped
main main: stomped (main)
As expected, comm
is not changed. We would get the same result by doing:
strcpy(argv[0], "stomped");
in either OpenBSD or NetBSD, but not FreeBSD, so this needs to be done by invoking sysctl
directly with the MIB kern.proc.args.[pid]
:
And we get the same result:
# gcc main.c -o main
# ./main &
# ps -ef -ocomm,args | grep stomp
main stomped (main)
So apparently we need to change comm. Digging into the kernel source, it appears that kinfo_proc->ki_comm needs to be modified, and this is not possible. In this example we report the error if the sysctl fails, which is to be expected.
...
struct kinfo_proc proc;
// kern.proc.pid.[pid]
mib[0] = CTL_KERN;
mib[1] = KERN_PROC;
mib[2] = KERN_PROC_PID;
mib[3] = getpid();
sysctl(mib, 4, &proc, &proc_len, NULL, 0);
strncpy(proc.ki_comm, "stomped", 8);
if (sysctl(mib, 4, NULL, 0, &proc, proc_len) == -1) {
perror("sysctl");
}
# gcc main2.c -o main
# ./main
sysctl: Operation not permitted
As far as I'm aware (please let me know if this is not the case), it is not possible to change comm in the BSDs or Solaris without loading a custom kernel module, and if we are going to do that, well, we may as well just write a rootkit to hide the process in the first place.
To get around these limitations, a technique can be used which at runtime replaces main
with the malicious code on startup via LD_PRELOAD
or by using ptrace
. These two methods will be covered in a future blog post.
Obtaining the process names on other systems
procfs is not always available (at least by default). There is a non-POSIX, but partially portable, set of functions related to "KVM", or "Kernel Memory Interface", which has its origins as far back as SunOS 4.0, released in 1988. As such, we can take a guess that "SolarOS" in the world of Tron Legacy likely used this interface.
It then made its way into 4.3BSD-Reno in 1989 and onwards. According to the FreeBSD man page for kvm_getprocs
:
The kvm interface was first introduced in SunOS. A considerable number
of programs have been developed that use this interface, making backward
compatibility highly desirable. In most respects, the Sun kvm interface
is consistent and clean. Accordingly, the generic portion of the
interface (i.e., kvm_open(), kvm_close(), kvm_read(), kvm_write(), and kvm_nlist()) has been incorporated into the BSD interface.
Two kvm
functions are required, kvm_getprocs
for comm
and kvm_getargv
for argv
.
The functions are just wrappers to the underlying system calls, so the example above could be replaced with two calls to sysctl.
See here for a working example.
Notably, as Solaris includes procfs by default, it does expose process info in /proc/[pid]/psinfo, although this needs to be parsed as it is in the format of psinfo_t.
Wrapping up
Hopefully you have found this post useful in understanding a little more about how process information is stored and retrieved and how it can be tampered with. There are certainly improvements that can be incorporated to further reduce the artifacts which can be used for detection purposes, and threat actors are certainly using these today with particular success. As such I will document these methods in future posts.
Part 2 of this series can be found here.
You can sign up with an email to receive notifications for updates. I often announce new content on X/Twitter -@haxrob