3.2.1 FD Heritage Dilemma.
As explained in Section 2.2, when an LD_PRELOAD library is used, the mapping table between FDs and user space file metadata may be stored in memory. Upon a fork, both the open FDs and the internal FS state maintained by the LD_PRELOAD library are copied from parent to child. However, if that state resides only in process memory, it is destroyed when the child invokes exec.
Illustration. Consider the processes in Figure 3: (a) User process p, in the LD_PRELOAD environment, issues system calls through library wrapper functions, and these calls are intercepted by the LD_PRELOAD library. (b) Both the LD_PRELOAD and FS libraries are stored in shared memory and the internal state of the FS resides within the address space of p. (c) When p performs an open system call, the LD_PRELOAD library makes the corresponding call to the FS library. The FS returns an internal representation of the file, F. Process p expects open to return an FD number, not the internal representation F. (d) To avoid passing the open call to the kernel, which would increase latency, the LD_PRELOAD library manufactures FD 4, records the mapping from FD 4 to F in its internal table, and returns the manufactured FD to p. This manufactured FD is valid only in the context of the LD_PRELOAD library and only while the mapping is in memory. Future system calls using FD 4 will be intercepted by the LD_PRELOAD library, mapped to F, and passed to the FS library. (e) Process p creates child process c, which inherits, among other things, the open FDs and the memory of the LD_PRELOAD library. When c performs an exec, its address space, including the internal state for manufactured FDs, is replaced by the image of the new executable. (f) Process c later writes to the inherited FD 4. (g) As the LD_PRELOAD library memory was cleared during exec, the library cannot map FD 4 to F, so it passes this I/O to the kernel. (h) The kernel does not know what FD 4 references, since the LD_PRELOAD library of parent process p manufactured it. Ultimately, the operation fails in the kernel with an invalid-FD error, which is returned to c via the LD_PRELOAD library.
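Steps (d) through (h) can be mimicked with a toy model. The following is a minimal sketch, not the mechanism of any particular library: `PreloadShim`, `simulate_exec`, and the base value 1,000,000 for manufactured FDs are hypothetical choices made for illustration (the figure uses FD 4; a large value is assumed here so that the number does not collide with a real kernel FD).

```python
import errno
import os

# Assumption: manufactured FDs start far above any FD the kernel has handed out,
# so a lookup miss reaches the kernel as an invalid descriptor.
MANUFACTURED_FD_BASE = 1_000_000


class PreloadShim:
    """Toy model of an LD_PRELOAD library in front of a user space FS."""

    def __init__(self):
        self.fd_table = {}  # manufactured FD -> FS-internal file object (F)
        self.next_fd = MANUFACTURED_FD_BASE

    def open(self, path):
        # Step (c): the FS library returns its internal representation F.
        file_obj = {"path": path, "data": bytearray()}
        # Step (d): manufacture an FD and remember the mapping in memory.
        fd = self.next_fd
        self.next_fd += 1
        self.fd_table[fd] = file_obj
        return fd

    def write(self, fd, payload):
        file_obj = self.fd_table.get(fd)
        if file_obj is not None:
            # Known manufactured FD: serve the request from the user space FS.
            file_obj["data"] += payload
            return len(payload)
        # Steps (g)-(h): unknown FD, so the call is passed to the kernel.
        return os.write(fd, payload)

    def simulate_exec(self):
        # Step (e): exec replaces the address space; the table is gone.
        self.fd_table.clear()
        self.next_fd = MANUFACTURED_FD_BASE


def demo():
    shim = PreloadShim()
    fd = shim.open("/userfs/out.txt")   # hypothetical path
    written = shim.write(fd, b"hello")  # served by the user space FS
    shim.simulate_exec()
    try:
        shim.write(fd, b"world")        # step (f): inherited FD reused
    except OSError as exc:
        return written, exc.errno
    return written, None
```

Under these assumptions, the first write is served from the in-memory table, whereas the write after the simulated exec reaches the kernel and is expected to fail with `EBADF`.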
Prominent examples. The FD heritage dilemma occurs commonly in shell file redirection. Consider a shell running under the context of an LD_PRELOAD library, where the user executes a cloud application running in Spark [99] by running spark-submit … > out.txt to capture the output to a file located in a user space FS. Before creating the spark-submit process, the shell opens out.txt, and this open call is intercepted by the LD_PRELOAD library. Due to the FD heritage dilemma, the FD for the output file out.txt inherited by the child is no longer valid after the exec call that invokes spark-submit. The FD heritage dilemma is also common in widely used tools such as gcc, which create child processes to perform subtasks, and in resource manager frameworks such as the PySpark workload daemon [76], which creates child processes for worker tasks at runtime, with each child task sending and receiving data to and from the daemon through FDs.
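For reference, the open, fork, and exec sequence that a shell performs for a redirection can be sketched as follows. This is an illustrative sketch using ordinary kernel FDs, where the redirection works; `run_redirected` is a hypothetical helper, not part of any shell. If the initial open returned a manufactured FD instead, the kernel-level `dup2` and the program started by `exec` would have no valid descriptor to work with.

```python
import os
import sys


def run_redirected(argv, out_path):
    """Run argv with stdout redirected to out_path, as in `cmd > file`."""
    # The shell opens the target file before creating the child.
    out_fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    pid = os.fork()
    if pid == 0:
        # Child: make the file its standard output, then replace the image.
        try:
            os.dup2(out_fd, 1)
            os.close(out_fd)
            os.execvp(argv[0], argv)  # does not return on success
        finally:
            os._exit(127)             # reached only if exec failed
    # Parent: drop its copy of the FD and wait for the child.
    os.close(out_fd)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

A possible invocation is `run_redirected([sys.executable, "-c", "print('captured')"], "out.txt")`. With a kernel-managed FD, the output lands in the file because the descriptor table entry survives both fork and exec.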
Impact in other interfaces. The user space FS library interface suffers from a related problem if a parent process needs to pass file metadata to a child. With this interface, the application must be aware of the FS library API and does not access the FS through FDs; instead, it uses the user space file metadata directly. As with open FDs, child processes cannot inherit this file metadata.
FUSE is not affected by the FD heritage dilemma, since the FUSE server is a single process, separated from the application processes. Hence the FUSE identifier table is unique. When a parent opens a file, the corresponding (inheritable) FD matches a unique inode within kernel space.
Like FUSE, in-kernel FSs are not affected by the FD heritage dilemma, since all file structures are managed within the kernel address space. References to files are synchronized by the FS driver within the kernel to ensure consistency when accessed by applications via FDs. The FDs maintained by the kernel and inherited by child processes map to a unique inode within kernel space.
3.2.2 Memory-mapped Files.
Mapping files into memory using mmap is a common method to share memory between processes. The LD_PRELOAD interface cannot provide consistent memory-mapping of files hosted by user space FSs for the following two reasons.
First, while an LD_PRELOAD library can intercept explicitly invoked system calls such as mmap, it cannot intercept implicit I/O operations. For instance, reads and writes to a memory-mapped file are performed using load and store instructions on the address region of the mapped memory. These instructions, which may result in an I/O operation to the underlying file through paging, are not explicit system calls and therefore cannot be intercepted.
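The distinction between the explicit mmap call and the implicit accesses that follow it can be seen in a small example. This is a minimal sketch on an ordinary kernel-backed file; `store_through_mapping` is a hypothetical helper name.

```python
import mmap


def store_through_mapping(path):
    """Modify a file through a shared mapping, without calling write()."""
    with open(path, "wb") as f:
        f.write(b"hello world\n")
    with open(path, "r+b") as f:
        # Explicit call: an interposing library could intercept this mmap.
        with mmap.mmap(f.fileno(), 0) as m:
            # Plain memory store into the mapped region: no system call is
            # issued here, so there is nothing for the library to intercept.
            m[0:5] = b"HELLO"
            m.flush()  # ask the kernel to write the dirty pages back
    with open(path, "rb") as f:
        return f.read()
```

The file content changes even though the program never issues a write call on the file; the kernel propagates the modified pages to the file on the application's behalf.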
Second, as a direct consequence of the FD heritage dilemma, a child process cannot access a user space file mapped in memory if the mapping has been done by its parent process. For example, consider the following set of actions taken by a process: (a) A process opens a user space file; the open call is intercepted and redirected to the user space FS library. (b) The process then maps this file into its address space using mmap for shared access; however, because the FD was manufactured by the intercepting FS library and is unknown to the kernel, the call to mmap fails with an invalid FD.
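The failure in step (b) can be reproduced by handing the kernel an FD number it never issued. The sketch below assumes that 1,000,000 stands in for a manufactured FD and is not open in the calling process; `mmap_manufactured_fd` is a hypothetical helper.

```python
import errno
import mmap

# Assumption: this number plays the role of a manufactured FD that exists
# only in the interposing library's table, not in the kernel's FD table.
MANUFACTURED_FD = 1_000_000


def mmap_manufactured_fd(fd=MANUFACTURED_FD, length=4096):
    """Try to map an FD unknown to the kernel; return the errno, if any."""
    try:
        mapping = mmap.mmap(fd, length)
    except OSError as exc:
        return exc.errno
    mapping.close()
    return None
```

On Linux, the call is expected to be rejected with `EBADF`, since the kernel has no file associated with the descriptor.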
Impact in other interfaces. Directly linked user space FS libraries suffer from the same problem with mmap-ed files as LD_PRELOAD libraries do. FUSE does not, however, since the FUSE kernel driver translates paging operations into read or write requests sent to the user space FUSE server.