The Filesystem Structure
From OS X Scientific Computing
The File System Hierarchy
The file system hierarchy on OS X is similar to that of other unix systems, except that it contains a superset of the usual directories. Standard unix directory structures almost always include a handful of directories at the root level: /bin, /sbin, /usr, /etc, /var, and often /tmp and /home or its equivalent. Unix executables are usually housed in /bin, /sbin, /usr/bin, /usr/local/bin and so on. Mac OS X includes these as well as some others that are not canonical unix directories. These are generally the ones that start with a capital letter, such as /Users (its equivalent of /home or /usr/people), /System, /Library, and /Applications. The /System/Library directory contains all of the OS X-specific operating system files, including Frameworks and startup files; /Library contains third-party Frameworks, startup files and other components; and /Applications contains the Cocoa and Carbon applications and utilities that come with OS X, as well as third-party applications installed by the user.
Most applications (and all Cocoa applications) are actually directories, and their unix executable files can be run from the command line by giving the absolute path, e.g., issuing the command:
/Applications/TextEdit.app/Contents/MacOS/TextEdit
opens an instance of the TextEdit application. A more straightforward way to do this is to issue the command
open -a TextEdit
which is equivalent to double-clicking on the application's icon. The first command, though clumsier, offers more precise control. For example, when it is preceded by sudo, TextEdit runs as root, which lets one open and edit a system-owned file; this is not possible with the open -a TextEdit command.
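As a minimal sketch of how the bundle layout maps onto a command-line path, the function below builds the path to the executable inside an application bundle. The name bundle_executable is hypothetical, and the sketch assumes the executable carries the same name as the bundle, which is true of TextEdit but is not guaranteed for every application.

```shell
# bundle_executable APP: print the path of the unix executable inside an
# application bundle. Hypothetical helper; it ASSUMES the executable is named
# after the bundle, which holds for TextEdit but not necessarily elsewhere.
bundle_executable() {
    app=${1%/}                      # drop a trailing slash, if any
    name=$(basename "$app" .app)    # TextEdit.app -> TextEdit
    printf '%s/Contents/MacOS/%s\n' "$app" "$name"
}

bundle_executable /Applications/TextEdit.app
# prints /Applications/TextEdit.app/Contents/MacOS/TextEdit
```

The function only composes a string, so it can be tried on any unix system, whether or not the bundle exists there.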
The File System Format: HFS+
The other unusual property of the OS X filesystem is its format. Unlike other unix systems, the default installation disk format is case-insensitive, journaled HFS+, rather than the Unix File System (UFS) or similar. The main benefit (as well as irritation) of HFS+ is that any file can consist of multiple forks. These forks can be thought of as a collection of "files" (in the traditional unix sense of a simple stream of data) that share the same inode. They are thus treated as one object by the lay user.
Multi-fork files allow a great deal of structured data to be presented to the user very simply. Historically, a Mac program often consisted of a single file that could be installed by dragging and dropping it from the install disk. However, multi-fork files are less common in OS X because of the difficulties met by systems that expect single-fork files.
Common Forks: Data and Resource
While HFS+ allows a file to have any number of forks, HFS (its predecessor) allowed a maximum of two. These two, the Data fork and the Resource fork, are the most common kinds on any Mac. The Data fork may contain arbitrary data, so it may be used as one would use a file in a single-fork system. In contrast, the Resource fork has a consistent structure that is enforced by the Mac OS.
The resources in the Resource fork may be a mix of many kinds, including executable code, images (particularly icons), sounds and text (possibly with matching style resources). Many of the standard resources are easy to edit using ResEdit or the newer ResKnife. The classic game Crystal Quest is a notable example of this flexibility: it included a versatile editor for the resources in its executable file.
In Mac OS X, files that would traditionally have only a Resource fork can be stored differently if the programmer wishes. For ease of storage and transmission, the resources are stored in the Data fork instead, and the filename has a ".rsrc" suffix.
A Mac file may have any combination of forks, and empty files have none. The extra forks allowed by HFS+ may have arbitrary names and contents, but their alternative, the bundle, sees more use.
Multi-Fork Files in Single-Fork Systems
Many common programs, filesystems and network protocols expect single-fork files. When they encounter a multi-fork file, they typically process, store or transmit only its Data fork. Depending on the type of file, the resulting data loss may range from insignificant to disastrous.
For example, consider a Mac running a Web server. Because of the design of HTTP, files on the Web must have only one fork. Meanwhile, files in the Web server's filesystem may have many. An image file that contains image data in its Data fork and a preview image in its Resource fork will remain usable after transmission, though its preview will be missing even if it arrives on another Mac.
Conversely, Classic and Carbon Mac programs will always be destroyed if sent over the Web without suitable encapsulation. Though Cocoa applications may be more resilient, they too should be archived just in case they contain extra forks. Other kinds of file may be damaged too, particularly older Mac formats that were not designed to be cross-platform.
Common archive formats include AppleSingle and AppleDouble, and historically MacBinary. Others, such as tar, ZIP and BinHex, may use these formats internally, though the software that produces them has to be designed to do so. Some Mac Internet programs can archive and unarchive on the fly, normally by calling BOMArchiveHelper. The Mac OS does this transparently with non-Mac disks, though frustratingly OS X and OS 9 use different schemes!
When Mac OS X copies a file from an HFS+ disk to a non-Mac disk, it splits it into a pair of files in AppleDouble format. The first file gets the same name as the original, but contains only the Data fork. The second file goes in the same place and gets the same name, but with a "._" prefix. It contains any remaining forks and metadata, all condensed into one fork.
AppleDouble allows programs to read the Data fork as if it were a single-fork file, which is useful for most types of file where the Data fork contains the most important data. However, users who separate a file from its "._" friend should be aware that they may be losing more than just metadata!
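The "._" naming rule described above can be sketched in a few lines of shell. Both function names below are hypothetical, as is the /Volumes/USB path; the sketch only looks at file names and does not inspect what a "._" file contains.

```shell
# appledouble_name PATH: print the path of the "._" companion that would sit
# next to PATH on a non-Mac disk. Hypothetical helper name.
appledouble_name() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    printf '%s/._%s\n' "$dir" "$base"
}

# list_orphans DIR: print every "._" file in DIR whose Data-fork partner
# is missing, i.e. a companion that has been separated from its file.
list_orphans() {
    for companion in "$1"/._*; do
        [ -e "$companion" ] || continue     # the glob matched nothing
        name=$(basename "$companion")
        partner="$1/${name#._}"             # strip the "._" prefix
        [ -e "$partner" ] || printf '%s\n' "$companion"
    done
}

appledouble_name /Volumes/USB/photo.jpg
# prints /Volumes/USB/._photo.jpg
```

Running list_orphans on a directory copied from a non-Mac disk gives a quick list of companions that no longer have a partner.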
Metadata on the Mac
Any file can be associated with a whole host of "metadata", some of which is visible in Get Info windows, but all of which is revealed by the command-line program mdls. For example:
% mdls book.pdf
produces the following listing of metadata for the file book.pdf:
book.pdf -------------
kMDItemAttributeChangeDate = 2005-06-27 12:37:44 -0700
kMDItemContentCreationDate = 2005-06-27 12:10:50 -0700
kMDItemContentModificationDate = 2005-06-27 12:37:43 -0700
kMDItemContentType = "com.adobe.pdf"
kMDItemContentTypeTree = (
"com.adobe.pdf",
"public.data",
"public.item",
"public.composite-content",
"public.content"
)
kMDItemCreator = "TeX"
kMDItemDisplayName = "book.pdf"
kMDItemEncodingApplications = ("pdfeTeX-1.21a")
kMDItemFSContentChangeDate = 2005-06-27 12:37:43 -0700
kMDItemFSCreationDate = 2005-06-27 12:10:50 -0700
kMDItemFSName = "book.pdf"
kMDItemFSOwnerGroupID = 501
kMDItemFSOwnerUserID = 501
kMDItemFSSize = 219567
kMDItemFSTypeCode = 0
kMDItemID = 6278117
kMDItemKind = "PDF Document"
kMDItemLastUsedDate = 2005-06-27 12:10:50 -0700
kMDItemNumberOfPages = 19
kMDItemPageHeight = 792
kMDItemPageWidth = 612
kMDItemSecurityMethod = "None"
kMDItemUsedDates = (2005-06-27 12:10:50 -0700)
kMDItemVersion = "1.4"
Notice that the file's type is stored independently of its optional ".pdf" filename suffix, in the form of the kMDItemContentType attribute. Its value, "com.adobe.pdf", is a proprietary Uniform Type Identifier (UTI), while the "public." values are base UTIs listed in order of increasing vagueness.
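A script that needs only the content type can pick it out of such a listing. The sketch below is one way to do so with awk; the function name content_type is hypothetical, and it assumes the listing has the "name = "value"" layout shown above. A few lines of the listing are fed in directly so that the example can be tried without mdls.

```shell
# content_type: read an mdls-style listing on standard input and print the
# value of kMDItemContentType without its quotes. Hypothetical helper name.
content_type() {
    # Splitting on double quotes puts the attribute name in $1, the value in $2.
    awk -F'"' '$1 ~ /^[ \t]*kMDItemContentType[ \t]*=/ { print $2; exit }'
}

content_type <<'EOF'
kMDItemContentModificationDate = 2005-06-27 12:37:43 -0700
kMDItemContentType = "com.adobe.pdf"
kMDItemContentTypeTree = (
"com.adobe.pdf",
"public.data",
EOF
# prints com.adobe.pdf
```

On a Mac the same function could be used as mdls book.pdf | content_type.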
UTIs are relatively new but their hierarchical nature raises interesting possibilities for programs that support them. Suppose that the user double-clicked a file of type "public.xml" and the system could not find a program to open that specific type. The system could go down the list until it recognised a less-specific type, such as "public.text", and then open the file in a text editor.
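The fallback idea can be sketched as a walk along a list of types, from most to least specific. Everything here is illustrative: the handler table, the function names and the particular list of types are assumptions, and this is not a description of how OS X itself chooses an application.

```shell
# handler_for TYPE: print the program registered for exactly this type, or
# fail if there is none. The table is hypothetical and kept deliberately tiny.
handler_for() {
    case "$1" in
        public.text)   echo "TextEdit" ;;
        com.adobe.pdf) echo "Preview" ;;
        *)             return 1 ;;
    esac
}

# first_handler TYPE...: try each type in turn, most specific first, and
# print the first handler found. Fails if no type in the list has one.
first_handler() {
    for uti in "$@"; do
        if handler_for "$uti"; then
            return 0
        fi
    done
    return 1
}

# Nothing is registered for public.xml, so the search falls back to public.text.
first_handler public.xml public.text public.data public.item
# prints TextEdit
```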
Uniform Type Identifiers are also used to tag non-file data, such as the contents of the Clipboard. They are versatile enough to eventually replace the Mac's old four-byte Creator and Type codes. Unfortunately, as with Creator and Type codes, they will be lost if the file's corresponding "._" file is deleted, leaving the filename suffix as the main fallback.
In OS X version 10.4, the obscure directory /.Spotlight-V100/ contains an index of filesystem metadata, which can be searched rapidly and powerfully by Spotlight or its command-line equivalent mdfind, e.g.:
% mdfind -onlyin /Users/wgscott "kMDItemCreator == 'TeX'"
/Users/wgscott/Desktop/book.pdf
Case-Insensitivity
As well as supporting traditional case-sensitive filesystems, OS X supports case-insensitive ones. Most HFS+ ones are case-insensitive, so the commands
% which wtf
/sw/bin/wtf
% which WTf
/sw/bin/WTf
% WhiCh wTf
/sw/bin/wTf
all find the same file, even though each reply echoes whatever capitalisation was typed! Thankfully, HFS+ is case-preserving, so referring to the file "wtf" using the name "WTF" does not change its name on-disk.
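To find out whether a particular volume behaves this way, one can create a file under a mixed-case name and look for it under the lower-case equivalent. The sketch below does this; the function name is_case_insensitive and the probe file name are hypothetical, and the directory being tested must be writable.

```shell
# is_case_insensitive DIR: succeed (status 0) if the filesystem holding DIR
# ignores case, return 1 if it is case-sensitive, 2 if DIR is not writable.
is_case_insensitive() {
    probe="$1/CaseProbe.$$"         # $$ (the shell's PID) keeps the name unlikely to clash
    lower="$1/caseprobe.$$"
    : > "$probe" || return 2        # create an empty probe file
    if [ -e "$lower" ]; then
        result=0                    # the lower-case name finds the same file
    else
        result=1
    fi
    rm -f "$probe"
    return "$result"
}

if is_case_insensitive "${TMPDIR:-/tmp}"; then
    echo "case-insensitive"
else
    echo "case-sensitive"
fi
```

The answer depends on the volume holding the directory, so the same machine can give different results for different paths.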
Trying out a UFS Disk Image
In practice, you can strip a file down to its Data fork by copying it to a UFS disk image. Create a disk image, reformat it as UFS, and then use the command
mdutil -i off /Volumes/YourUFSdiskimage
to ensure that Spotlight keeps away from its future contents. Then, copy a Microsoft Word document file to the UFS disk image, stripping the Resource fork off as you go, with the command ditto as given below:
ditto --norsrc test.doc /Volumes/YourUFSdiskimage/.
If you copy a file using the default mode of ditto, or cp in 10.4, or /Developer/Tools/CpMac, it will attempt to preserve Resource forks.
Since the UFS file system doesn't permit tight association of the data and resource forks, you will often see a file called ._foo associated with a file foo on UFS and other non-HFS+ file systems. Preserving resource forks was critically important on pre-OS X Mac operating systems. In addition, legacy applications written in Carbon, such as Microsoft Word, Adobe Photoshop, and so on, generally require installation on an HFS+ formatted disk or partition. In other words, although OS X itself claims to be installable on a UFS partition, a lot of the associated applications just won't work. HFS+ is therefore a compromise that tries to keep everyone happy.
A very thorough discussion of metadata in OS X is available. The discussion is very enthusiastic about the virtues of metadata and the HFS+ format. It also goes into a fair amount of detail about extended attributes, permitting the user to encode an arbitrary set of name-data pairs with each file, as well as access control lists, which permit a fine-tuning of file permissions.
Others see things quite a bit differently. HFS+ is admittedly not standard unix. It tends to be more fragile than standard unix-formatted filesystems. Besides case-insensitivity (there is now a case-sensitive version of HFS+), the most striking difference in behavior is that hard links work differently, and occasionally do so with unpleasant surprises as described. The following experiment is adapted from that link, which explains what is going on:
Demonstration of the pathological behavior of hard links on HFS+
1. Make a disk image and, using Disk Utility, erase and reformat it as a Unix File System (UFS) disk image.
2. Make a pdf file (e.g., use the print/save dialog to print this browser window). Save the file to the Desktop, and copy it to the UFS disk image.
3. cd to the Desktop, and issue
ln yourfile.pdf hardlink.pdf
4. cd to the disk image, and issue the same command.
5. Open yourfile.pdf on the disk image, rotate the pdf by 90 degrees, and save the file to the disk image. All should go smoothly.
6. cd back to the Desktop and repeat step 5 on the copy there. This time it fails.
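When trying the experiment, it helps to be able to check whether two names still refer to the same file. The sketch below compares inode numbers as reported by ls -i; the function name same_file and the scratch file names are hypothetical, and it shows only what a working hard link looks like, not what HFS+ does internally.

```shell
# same_file A B: succeed if A and B are two names for one file on disk,
# judged by the inode number that ls -i reports. Hypothetical helper name.
# Assumes both names live on the same volume.
same_file() {
    inode_a=$(ls -di "$1" | awk '{print $1}')
    inode_b=$(ls -di "$2" | awk '{print $1}')
    [ -n "$inode_a" ] && [ "$inode_a" = "$inode_b" ]
}

# Work in a scratch directory so that nothing on the Desktop is touched.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/hardlink.XXXXXX")
echo "first line" > "$scratch/yourfile.txt"
ln "$scratch/yourfile.txt" "$scratch/hardlink.txt"

if same_file "$scratch/yourfile.txt" "$scratch/hardlink.txt"; then
    echo "still linked"
else
    echo "link broken"
fi
```

Running same_file on yourfile.pdf and hardlink.pdf before and after step 5, on each volume, shows whether the save left the link intact.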
As with all compromises, it seems impossible to keep everyone happy.
ZFS: The future
Rumor has it that ZFS will replace HFS+ as the file system on future versions of OS X.

