Logo

Chapter 11: File Systems

Goals

Nonvolatile storage and file systems.
Main DRAM memory is volatile, i.e. it can be lost upon crashes/power failure. Nonvolatile storage, like hard disks and flash storage, is persistent/durable, retaining state across crashes/power outages.

One critically important aspect of nonvolatile storage, though, is that data access can be much, much slower (several orders of magnitude) than DRAM access. This plays a significant role in I/O interaction with these devices.

11.1 The File System Abstraction

Formally, a file system is an operating system abstraction that provides persistent, named data. Persistent meaning stored until explicitly deleted, and named meaning accessible via a human-readable identifier.

There are three major parts to the file system abstraction

Typically, the file system is constructed in a "tree-like" fashion. The all-encompassing directory is the root directory /, and other files/directories form other nodes. Notably, files must be leaf nodes. A string that identifies a file/directory is a path, e.g. /etc/passwd, and represents the traversal one must take to reach that file from the root directory. A path starting with / is absolute, and are interpreted relative to the root directory, while all other paths are relative, and are interpreted relative to the current working directory (cwd).

File systems, however, are not exactly trees. They are, in fact, DAGs—directed acyclic graphs. This is due to the possibility of multiple hard links to an underlying file, i.e. a mapping between a name and an underlying file. In other words, it's possible for multiple paths to point to the same file. Note that these hard links are typically restricted to avoid cycles, and hence file systems are usually DAGs.

Why no cycles?

Consider the path /a/b. If b maps to /, then we could have infinitely long paths... /a/b/a/b/a/b....

Many systems also support symbolic or soft links. This is a directory mapping that, instead of mapping a file name to an underlying file, maps a file name to another file name. Though moving the other file name may result in dangling links, symbolic links have two key advantages: (1)(1) systems typically allow symbolic links to directories (because these won't have cycles... traversing a symbolically linked directory will just bring you to the other directory, in contrast to a hard link) and (2)(2) a symbolic link can refer to a file in a different file system or volume (since it just maps to the file name, rather than the underlying file which may have different structure).

Others, namely Windows, support shortcuts, too, which are interpreted by a "windowing system." We won't discuss this in much detail.

A single computer, notably, can make use of multiple file systems stored on multiple volumes by mounting volumes in a single logical hierarchy.

11.2 API

create(path)
link(existing_name, new_name)
mkdir(path)
rmdir(path)

open(path)
close(fd)
read(fd, buf, len)
write(fd, buf, len)
seek(fd, offset)
mmap(fd, off, len)
munmap(ptr, len)
fsync(fd)

11.3 Software Layers

Broadly,

API and Performance

System calls and libraries provide the necessary API for file system usage.

The block cache keeps recently read/write blocks in main memory, keeping programs performant.

Prefetching, i.e. speculatively retrieving blocks (e.g. during large sequential access), reduces future I/O requests' latency, reduces storage device overheads (replace many small requests with one large request), and improves parallelism for RAID and flash drives. Notably, though, too aggressively prefetching has downsides: cache pressure, I/O contention (with active requests), and wasted effort on inaccurate guesses.

Device Drivers

Translate between abstraction layers of OS and I/O hardware devices. Typically, there at least exist block device (read data in fixed block sizes) and character devices (read data in stream) interfaces.

Device Access

Memory-mapped I/O provides a way for the OS to communicate with an I/O device by mapping a device's control registers to physical memory addresses on the memory bus. This is the modern solution.

Port-mapped I/O is the legacy solution, and involves specific architecture support (distinct I/O instructions) to communicate with special "ports" that map to device registers.

Direct Memory Access (DMA) allows a storage device to bypass the CPU and write data directly from its own memory to the system's main memory. This reduces the CPU's workload (otherwise it would need to copy all that data), and the operating system facilitates this by (1)(1) providing a target physical address to the device and (2)(2) pinning the target pages in memory until the device is done, i.e. ensuring they cannot be swapped to disk until explicitly unpinned.

Advanced DMA
Modern systems now frequently use an IOMMU (I/O memory management unit) that translates device virtual addresses to main memory physical addresses, like the CPU's MMU. This provides better protection and abstractions.

Interrupts are an efficient solution to inform the CPU once an I/O device is done handling a request / received a new external input. This is practically always a better option than polling