Technote 1150: HFS Plus Volume Format
This Technote describes the on-disk format for an HFS Plus volume. It does not describe any programming interfaces for HFS Plus volumes. This technote is directed at developers who need to work with HFS Plus at a very low level, below the abstraction provided by the File Manager programming interface. This includes developers of disk recovery utilities and programmers implementing HFS Plus support on other platforms. This technote assumes that you have a conceptual understanding of the HFS volume format, as described in Inside Macintosh: Files.
HFS Plus Basics

HFS Plus is a new volume format for Mac OS, introduced with Mac OS 8.1. HFS Plus is architecturally very similar to HFS, although there have been a number of changes. The following table summarizes the important differences. Table 1 HFS and HFS Plus Compared
The extent to which these HFS Plus features are available through a programming interface is OS dependent. Currently (as of Mac OS 8.5), Mac OS does not provide programming interfaces for any HFS Plus-specific features. To summarize, the key goals that guided the design of the HFS Plus volume format were:

- efficient use of disk space,
- international-friendly file names,
- future support for named forks, and
- easy startup of alternative operating systems.
The following sections describe these goals, and the differences between HFS and HFS Plus required to meet them.

Efficient Use of Disk Space

HFS divides the total space on a volume into equal-sized pieces called allocation blocks. It uses 16-bit fields to identify a particular allocation block, so there can be at most 2^16 (65,536) allocation blocks on an HFS volume. The size of an allocation block is typically the smallest multiple of 512 such that there are fewer than 65,536 allocation blocks on the volume (i.e., the volume size divided by 65,535, rounded up to a multiple of 512). Any non-empty fork must occupy an integral number of allocation blocks, so the amount of space occupied by a fork is rounded up to a multiple of the allocation block size. As volumes (and therefore allocation blocks) get bigger, the amount of allocated but unused space increases.

HFS Plus uses 32-bit values to identify allocation blocks. This allows up to 2^32 (4,294,967,296) allocation blocks on a volume. More allocation blocks means a smaller allocation block size, especially on volumes of 1 GB or larger, which in turn means less average wasted space (the fraction of an allocation block at the end of a fork, where the entire allocation block is not actually used). It also means you can have more files, since the available space can be more finely distributed among a larger number of files. This change is especially beneficial if the volume contains a large number of small files.

International-Friendly File Names

HFS uses 31-byte strings to store file names. HFS does not store any kind of script information with the file name to indicate how it should be interpreted. File names are compared and sorted using a routine that assumes a Roman script, wreaking havoc for names that use some other script (such as Japanese). Worse, this algorithm is buggy, even for Roman scripts.
The Finder and other applications interpret the file name based on the script system in use at runtime.
HFS Plus uses up to 255 Unicode characters to store file names. Allowing up to 255 characters makes it easier to have very descriptive names. Long names are especially useful when the name is computer-generated (such as Java class names). The HFS catalog B-tree uses 512-byte nodes. An HFS Plus file name can occupy up to 512 bytes (including the length field). Since a B-tree index node must store at least two keys (plus pointers and node descriptor), the HFS Plus catalog must use a larger node size. The typical node size for an HFS Plus catalog B-tree is 4 KB. In the HFS catalog B-tree, the keys stored in an index node always occupy a fixed amount of space, the maximum key size. In HFS Plus, the keys in an index node may occupy a variable amount of space determined by the actual size of the key. This allows for less wasted space in index nodes and creates, on typical disks, a substantially larger branching factor in the tree (requiring fewer node accesses to find any given record).

Future Support for Named Forks

Files on an HFS volume have two forks: a data fork and a resource fork, either of which may be empty (zero length). Files and directories also contain a small amount of additional information (known as catalog information or metadata) such as the modification date or Finder info. Apple software teams and third-party developers often need to store information associated with particular files and directories. In some cases (e.g., custom icons for files), the data or resource fork is appropriate. But in other cases (e.g., custom icons for directories, or File Sharing access privileges), using the data or resource fork is not appropriate or not possible. A number of products have implemented special-purpose solutions for storing their file- and directory-related data. But because these are not managed by the file system, they can become inconsistent with the file and directory structure.
HFS Plus has an attribute file, another B-tree, that can be used to store additional information for a file or directory. Since it is part of the volume format, this information can be kept with the file or directory as it is moved or renamed, and can be deleted when the file or directory is deleted. The contents of the attribute file's records have not been fully defined yet, but the goal is to provide an arbitrary number of forks, identified by Unicode names, for any file or directory.
Easy Startup of Alternative Operating Systems

HFS Plus defines a special startup file, an unstructured fork that can be found easily during system startup. The location and size of the startup file is described in the volume header. The startup file is especially useful on systems that don't have HFS or HFS Plus support in ROM. In many respects, the startup file is a generalization of the HFS boot blocks, one that provides a much larger, variable-sized amount of storage.
Core Concepts

HFS Plus uses a number of interrelated structures to manage the organization of data on the volume. These structures include:
Each of these complex structures is described in its own section. The purpose of this section is to give an overview of the volume format, describe how the structures fit together, and define the primitive data types used by HFS Plus.

Terminology

HFS Plus is a specification of how a volume (files that contain user data, along with the structure to retrieve that data) exists on a disk (the medium on which user data is stored). The disk is divided into 512-byte logical blocks of data, known as sectors. Sectors are identified by a sector number, starting at 0 and continuing to the last sector on the disk (whose number is the disk size divided by 512, minus 1). HFS Plus allocates sectors in groups called allocation blocks; an allocation block is simply a group of consecutive sectors. The size (in bytes) of an allocation block is a power of two, greater than or equal to 512, which is set when the volume is initialized. This value cannot be easily changed without reinitializing the volume. Allocation blocks are identified by a 32-bit allocation block number, so there can be at most 2^32 allocation blocks on a volume. Future implementations of the file system will be optimized for 4K allocation blocks. All of the volume's structures, including the volume header, are part of one or more allocation blocks. This differs from HFS, which has several structures (including the boot blocks, master directory block, and bitmap) which are not part of any allocation block. You can find the first sector of an allocation block by multiplying the allocation block number by the allocation block size and dividing by the sector size (512).
To promote file contiguity and avoid fragmentation, disk space is typically allocated to files in groups of allocation blocks, or clumps. The clump size is always a multiple of the allocation block size. The default clump size is specified in the volume header.
Every HFS Plus volume must have a volume header. The volume header contains information about the volume, such as the date and time of the volume's creation and the number of files on the volume, as well as the location of the other key structures on the volume. The volume header is always located at sector 2. A copy of the volume header, known as the alternate volume header, is stored in the second-to-last sector of the volume (starting 1024 bytes before the end of the volume). Sectors 0 and 1, and the last sector on the volume, are reserved. The actual number of allocation blocks occupied by the volume header and the alternate volume header varies depending on the allocation block size. An HFS Plus volume contains five special files, which store the file system structures required to access the file system payload: folders, user files, and attributes. The special files are the catalog file, the extents overflow file, the allocation file, the attributes file, and the startup file. Special files only have a single fork (the data fork), and the extents of that fork are described in the volume header. The catalog file is a special file that describes the folder and file hierarchy on a volume. The catalog file contains vital information about all the files and folders on a volume, as well as the catalog information for those files and folders, which is stored in the catalog file. The catalog file is organized as a B-tree (or "balanced tree") to allow quick and efficient searches through a large folder hierarchy. The catalog file stores the file and folder names, which consist of up to 255 Unicode characters, as described below.
The attributes file is another special file which contains additional data for a file or folder. Like the catalog file, the attributes file is organized as a B-tree. In the future, it will be used to store information about additional forks. (This is similar to the way the catalog file stores information about the data and resource forks of a file.) HFS Plus tracks which allocation blocks belong to a fork by maintaining a list of the fork's extents. An extent is a contiguous range of allocation blocks allocated to some fork, represented by a pair of numbers: the first allocation block number and the number of allocation blocks. For a user file, the first eight extents of each fork are stored in the volume's catalog file. Any additional extents are stored in the extents overflow file, which is also organized as a B-tree. The extents overflow file also stores additional extents for the special files, except for the extents overflow file itself. However, if the startup file required more than the eight extents recorded in the volume header (and thus required additional extents in the extents overflow file), it would be much harder to access, defeating the purpose of the startup file. So, in practice, a startup file should be allocated such that it doesn't need additional extents in the extents overflow file. The allocation file is a special file which specifies whether an allocation block is used or free. This performs the same role as the HFS volume bitmap, although making it a file adds flexibility to the volume format. The startup file is another special file, one which facilitates booting of non-Mac OS computers from HFS Plus volumes. Finally, the bad block file prevents the volume from using certain allocation blocks because the portion of the media that stores those blocks is defective. The bad block file is neither a special file nor a user file; it is merely a convention used in the extents overflow file. See Bad Block File for more details.
Broad Structure

The bulk of an HFS Plus volume consists of seven types of information or areas:
The general structure of an HFS Plus volume is illustrated in Figure 1. Figure 1 Organization of an HFS Plus Volume The volume header is always at a fixed location (sector 2). However, the special files can appear anywhere between the volume header block and the alternate volume header block. These files can appear in any order and are not necessarily contiguous. The information on HFS Plus volumes is organized solely in allocation blocks. Allocation blocks are simply a means of grouping sectors into more convenient parcels. The size of an allocation block is a power of two, and at least 512. The allocation block size is a volume header parameter whose value is set when the volume is initialized; it cannot be changed easily without reinitializing the volume.
Primitive Data Types

This section describes the primitive data types used on an HFS Plus volume. All data structures in this document are defined in the C language. The specification assumes that the compiler will not insert any padding fields. Any necessary padding fields are explicitly declared.
Reserved and Pad Fields

In many places this specification describes a field, or bit within a field, as reserved. This has a definite meaning, namely:

- An implementation must set the field to zero when it creates a structure containing the field.
- An implementation must not change the field when it modifies a structure containing the field.
- An implementation must ignore the value of the field when it reads a structure containing the field.
This definition allows for backward-compatible enhancements to the volume format. Pad fields have exactly the same semantics as a reserved field. The different name merely reflects the designer's goals when including the field, not the behavior of the implementation.

Integer Types

All integer values are defined by one of the following primitive types: UInt8, SInt8, UInt16, SInt16, UInt32, SInt32, and UInt64. All multi-byte integer values are stored in big-endian format. That is, the bytes are stored in order from most significant byte through least significant byte, in consecutive bytes, at increasing offset from the start of a block.

HFS Plus Names

File and folder names on HFS Plus consist of up to 255 Unicode characters with a preceding 16-bit length, defined by the type HFSUniStr255.
HFS Plus stores strings fully decomposed and in canonical order. HFS Plus compares strings in a case-insensitive fashion. Strings may contain Unicode characters that should be ignored by this comparison. For more details on these subtleties, see Unicode Subtleties.

Text Encodings

Current Mac OS programming interfaces pass filenames as Pascal strings in a Mac OS text encoding. HFS Plus includes two features specifically designed to help Mac OS handle the conversion between Mac OS-encoded Pascal strings and Unicode. The first feature is the textEncoding field of the file and folder records in the catalog, which stores a hint to be used when converting the record's Unicode name back to a Mac OS-encoded Pascal string. The valid values for the textEncoding field are shown in Table 2. Table 2 Text Encodings
The second use of text encodings in HFS Plus is the encodingsBitmap field of the volume header, which records the set of text encodings used by the names on the volume. It is acceptable for a bit in this bitmap to be set even though no names on the volume use that encoding. This means that when an implementation deletes or renames an object, it does not have to clear the encoding bit if that was the last name to use the given encoding.
HFS Plus Dates

HFS Plus stores dates in several data structures, including the volume header and catalog records. These dates are stored in unsigned 32-bit integers (UInt32) containing the number of seconds since midnight, January 1, 1904, GMT. The maximum representable date is February 6, 2040 at 06:28:15 GMT. The date values do not account for leap seconds. They do include a leap day in every year that is evenly divisible by four. This is sufficient given that the range of representable dates does not contain 1900 or 2100, neither of which has a leap day. The implementation is responsible for converting these times to the format expected by client software. For example, the Mac OS File Manager passes dates in local time; the Mac OS HFS Plus implementation converts dates between local time and GMT as appropriate.
HFS Plus Permissions

For each file and folder, HFS Plus maintains a record containing access permissions, defined by the HFSPlusPermissions structure.
The fields have the following meaning:
Fork Data Structure

HFS Plus maintains information about the contents of a file using the HFSPlusForkData structure. An unused extent descriptor in an extent record would have both startBlock and blockCount set to zero.
The fields have the following meaning:
The HFSPlusExtentRecord type is an array of eight HFSPlusExtentDescriptor structures.
The fields have the following meaning:
Volume Header

Each HFS Plus volume contains a volume header at sector 2. The volume header -- analogous to the master directory block (MDB) for HFS -- contains information about the volume as a whole, including the location of other key structures in the volume. The implementation is responsible for ensuring that this structure is updated before the volume is unmounted. A copy of the volume header, the alternate volume header, is stored in the second-to-last sector of the volume (starting 1024 bytes before the end of the volume). The implementation should only update this copy when the length of one of the special files changes. The alternate volume header is intended for use solely by disk repair utilities. The first two sectors (before the volume header) and the last sector (after the alternate volume header) are reserved.
The allocation blocks containing these five sectors (sectors 0 through 2, and the last two sectors) are marked as used in the allocation file (see the Allocation File section). The volume header is described by the HFSPlusVolumeHeader type.
The fields have the following meaning:
Volume Attributes

The attributes field of the volume header is a set of bit flags describing the state of the volume.
The bits have the following meaning:
B-Trees
This section describes the B-tree structure used for the catalog, extents overflow, and attributes files. A B-tree is stored in a file's data fork. Each B-tree has a header node, described below, containing information about the tree as a whole.
A B-tree file is divided up into fixed-size nodes, each of which contains records, which consist of a key and some data. The purpose of the B-tree is to efficiently map a key into its corresponding data. To achieve this, keys must be ordered; that is, there must be a well-defined way to decide whether one key is smaller than, equal to, or larger than another key. The node size (which is expressed in bytes) must be a power of two, from 512 through 32,768, inclusive. The node size of a B-tree is determined when the B-tree is created. The logical length of a B-tree file is just the number of nodes times the node size. There are four kinds of nodes: header nodes, map nodes, index nodes, and leaf nodes.
All nodes share a common structure, described in the next section.

Node Structure

Nodes are indicated by number. The node's number can be calculated by dividing its offset into the file by the node size. Each node has the same general structure, consisting of three main parts: a node descriptor at the beginning of the node, a list of record offsets at the end of the node, and a list of records. This structure is depicted in Figure 2. Figure 2 The structure of a node
The node descriptor contains basic information about the node as well as forward and backward links to other nodes. It is described by the BTNodeDescriptor type.
The fields have the following meaning:
A node descriptor is always 14 bytes long (the size of the BTNodeDescriptor structure). The records are accessed using the list of record offsets at the end of the node. Each entry in this list is a UInt16 containing the offset, in bytes, from the start of the node to the start of the corresponding record.
The list of record offsets always contains one more entry than there are records in the node; this last entry contains the offset to the first byte of free space in the node, and therefore indicates where the last record ends.
It's important to realize that the B-tree node type determines the type of records found in the node. Leaf nodes always contain data records. Index nodes always contain pointer records. Map nodes always contain map records. The header node always contains a header record, a reserved record, and a map record. The four node types and their corresponding records are described in the subsequent sections.

Header Nodes

The first node (node 0) in every B-tree file is a header node, which contains essential information about the entire B-tree file. There are three records in the header node. The first record is the B-tree header record. The second is a reserved record that is always 128 bytes long. The last record is the B-tree map record; it occupies all of the remaining space between the reserved record and the record offsets. The header node is shown in Figure 3. Figure 3 Header node structure

The Header Record

The B-tree header record contains general information about the B-tree such as its size, maximum key length, and the location of the first and last leaf nodes. It is described by the data type BTHeaderRec.
The fields have the following meaning:
The following constants define the various bits that may be set in the attributes field of the header record:
The bits have the following meaning:
Bits not specified here must be treated as reserved.

Reserved Record

The second record in a header node is always 128 bytes long. It is not used by HFS Plus B-trees. An implementation must treat the contents of this record as a reserved field.

Map Record

The remaining space in the header node is occupied by a third record, the map record. It is a bitmap that indicates which nodes in the B-tree are used and which are free. The bits are interpreted in the same way as the bits in the allocation file. All told, the node descriptor, header record, reserved record, and record offsets occupy 256 bytes of the header node. So the size of the map record (in bytes) is the node size minus 256.

Map Nodes

If the map record of the header node is not large enough to represent all of the nodes in the B-tree, map nodes are used to store the remaining allocation data. In this case, the fLink field of the header node's node descriptor contains the node number of the first map node.
A map node consists of the node descriptor and a single map record. The map record is a continuation of the map record contained in the header node. The size of the map record is the size of the node, minus the size of the node descriptor (14 bytes), minus the size of two offsets (4 bytes), minus two bytes of free space. That is, the size of the map record is the size of the node minus 20 bytes; this keeps the length of the map record an even multiple of 4 bytes. Note that the start of the map record is not aligned to a 4-byte boundary: it starts immediately after the node descriptor (at an offset of 14 bytes). The B-tree uses as many map nodes as needed to provide allocation data for all of the nodes in the B-tree. The map nodes are chained through the fLink fields of their node descriptors; the fLink field of the last map node's node descriptor is zero.
Keyed Records

The records in index and leaf nodes share a common structure. They contain a key followed by the record's data. The first part of the record, the key, begins with the key length, stored as a UInt8 or UInt16 depending on the kBTBigKeysMask bit in the header record's attributes; HFS Plus B-trees always use a UInt16 key length.
Immediately following the key length is the key itself. Following the key is the record's data. The format of this data depends on the node type, as explained in the next two sections. However, the data is always aligned on a two-byte boundary and occupies an even number of bytes. To meet the first alignment requirement, a pad byte must be inserted between the key and the data if the size of the key length field plus the size of the key is odd.

Index Nodes

The records in an index node are called pointer records. They contain a key and a node number, a UInt32 giving the number of the child node.
Leaf Nodes

The bottom level of a B-tree is occupied exclusively by leaf nodes, which contain data records instead of pointer records. The data records contain a key and the data associated with that key. In an HFS Plus B-tree, the data in a data record is an HFS Plus volume structure (such as a catalog record or an extent record), as described in subsequent chapters.

Searching for Keyed Records

A B-tree is highly structured to allow for efficient searching, insertion, and removal. This structure primarily affects the keyed records (pointer records and data records) and the nodes in which they are stored (index nodes and leaf nodes). The following are the ordering requirements for index and leaf nodes:
Keeping the keys ordered in this way makes it possible to quickly search the B-tree to find the data associated with a given key. Figure 4 shows a sample B-tree containing hypothetical keys (in this case, the keys are simply integers). When an implementation needs to find the data associated with a particular search key, it begins searching at the root node. Starting with the first record, it searches for the record with the greatest key that is less than or equal to the search key. It then moves to the child node (typically an index node) and repeats the same process. This process continues until a leaf node is reached. If the key found in the leaf node is equal to the search key, the found record contains the desired data associated with the search key. If the found key is not equal to the search key, the search key is not present in the B-tree. Figure 4 A sample B-Tree

HFS and HFS Plus B-Trees Compared

The structure of the B-trees on an HFS Plus volume is closely related to the B-tree structure used on an HFS volume. There are three principal differences: the size of nodes, the size of keys within index nodes, and the size of a key length (UInt8 vs. UInt16).

Node Sizes

In an HFS B-tree, nodes always have a fixed size of 512 bytes. In an HFS Plus B-tree, the node size is determined by a field (nodeSize) in the B-tree's header record.
HFS Plus uses the following default node sizes:

- 4 KB for the catalog file
- 1 KB for the extents overflow file
- 4 KB for the attributes file
These sizes are set when the volume is initialized and cannot be easily changed. It is legal to initialize an HFS Plus volume with different node sizes, but the node sizes must be large enough for an index node to contain two keys of maximum size (plus the other overhead such as a node descriptor, record offsets, and pointers to children).
Key Size in an Index Node

In an HFS B-tree, all of the keys in an index node occupy a fixed amount of space: the maximum key length for that B-tree. This simplifies the algorithms for inserting and deleting records because, within an index node, one key can be replaced by another key without worrying whether there is adequate room for the new key. However, it is also somewhat wasteful when the keys are variable length (such as in the catalog file, where the key length varies with the length of the file name). In an HFS Plus B-tree, the keys in an index node are allowed to vary in size. This complicates the algorithms for inserting and deleting records, but reduces wasted space when the length of a key can vary (such as in the catalog file). It also means that the number of keys in an index node will vary with the actual size of the keys.
Catalog File

HFS Plus uses a catalog file to maintain information about the hierarchy of files and folders on a volume. A catalog file is organized as a B-tree file, and hence consists of a header node, index nodes, leaf nodes, and (if necessary) map nodes. The location of the first extent of the catalog file (and hence of the file's header node) is stored in the volume header. From the catalog file's header node, an implementation can obtain the node number of the root node of the B-tree. From the root node, an implementation can search the B-tree for keys, as described in the previous section. The B-Trees chapter defined a standard rule for the node size of HFS Plus B-trees. As the catalog file is a B-tree, it inherits the requirements of this rule. In addition, the node size of the catalog file must be at least 4 KB. Each file or folder in the catalog file is assigned a unique catalog node ID (CNID). For folders, the CNID is the folder ID, sometimes called a directory ID, or dirID; for files, it's the file ID. For any given file or folder, the parent ID is the CNID of the folder containing the file or folder, known as the parent folder. The catalog node ID is defined by the HFSCatalogNodeID data type, a UInt32.
The first 16 CNIDs are reserved for use by Apple Computer, Inc., and include the following standard assignments:
These constants have the following meaning:
In addition, the CNID of zero is never used and serves as a nil value. The implementation must store a number greater than the largest CNID used by any file or folder on the volume in the nextCatalogID field of the volume header. As the catalog file is a B-tree file, it inherits its basic structure from the definition in B-Trees. Beyond that, you need to know only two things about an HFS Plus catalog file to interpret its data:
Catalog File Key

For a given file, folder, or thread record, the catalog file key consists of the parent folder's CNID and the name of the file or folder. This structure is described using the HFSPlusCatalogKey type.
The fields have the following meaning:
Catalog file keys are compared first by parentID and then by nodeName. For more information about how catalog keys are used to find file, folder, and thread records within the catalog tree, see Catalog Tree Usage.

Catalog File Data

A catalog file leaf node can contain four different types of data records:
Each record starts with a recordType field, which indicates the type of catalog data record and therefore the format of the data that follows.
The values have the following meaning:
The next three sections describe the folder, file, and thread records in detail.
Catalog Folder Records

The catalog folder record is used in the catalog B-tree file to hold information about a particular folder on the volume. The data of the record is described by the HFSPlusCatalogFolder type.
The fields have the following meaning:
Catalog File Records

The catalog file record is used in the catalog B-tree file to hold information about a particular file on the volume. The data of the record is described by the HFSPlusCatalogFile type.
The fields have the following meaning:
For each fork, the first eight extents are described by the HFSPlusForkData structures in the file record. The following constants define bit flags in the file record's flags field:
The values have the following meaning:
Catalog Thread Records

The catalog thread record is used in the catalog B-tree file to link a CNID to the file or folder record using that CNID. The data of the record is described by the HFSPlusCatalogThread type.
The fields have the following meaning:
The next section explains how thread records can be used to find a file or folder using just its CNID.

Catalog Tree Usage

File and folder records always have a key that contains a non-empty nodeName. The key for a thread record is the file's or folder's CNID (not the CNID of the parent folder) and an empty (zero-length) nodeName. Finding a file or folder by its CNID is a two-step process. The first step is to use the CNID to look up the thread record for the file or folder. This yields the file or folder's parent folder ID and name. The second step is to use that information to look up the real file or folder record. Since files do not contain other files or folders, there are no catalog records whose key has a file's CNID as the parentID, other than the file's own thread record.
Extents Overflow File

HFS Plus tracks which allocation blocks belong to a file's forks by maintaining a list of extents (contiguous allocation blocks) that belong to that file, in the appropriate order. Each extent is represented by a pair of numbers: the first allocation block number of the extent and the number of allocation blocks in the extent. The file record in the catalog B-tree contains a record of the first eight extents of each fork. If there are more than eight extents in a fork, the remaining extents are stored in the extents overflow file.
Like the catalog file, the extents overflow file is a B-tree. However, the structure of the extents overflow file is relatively simple compared to that of a catalog file. The extents overflow file has a simple, fixed-length key and a single type of data record.

Extents Overflow File Key

The structure of the key for the extents overflow file is described by the HFSPlusExtentKey type.
The fields have the following meaning:
Extents Overflow File Data

The data records for an extents overflow file (the extent records) are described by the HFSPlusExtentRecord type, an array of eight HFSPlusExtentDescriptor structures.
Extents Overflow File Usage

The most important thing to remember about the extents overflow file is that it is only used for forks with more than eight extents. In most cases, forks have fewer extents, and all the extents information for the fork is held in its catalog file record. However, for more fragmented forks, the extra extents information is stored in the extents overflow file. When an implementation needs to map a fork offset into a sector on disk, it first looks through the extent records in the catalog file record. If the fork offset is within one of these extents, the implementation can find the corresponding sector without consulting the extents overflow file. If, on the other hand, the fork offset is beyond the last extent recorded in the catalog file record, the implementation must look in the next extent record, which is stored in the extents overflow file. To find this record, the implementation must form a key, which consists of information about the fork (the fork type and the file ID) and the offset into the fork (the start block). Because extent records are partially keyed off the fork offset of the first extent in the record, the implementation must have all the preceding extent records in order to know the fork offset to form the key of the next extent record. For example, if the fork has two extent records in the extents overflow file, the implementation must read the first extent record to calculate the fork offset for the key for the second extent record. However, you can use the B-tree search operation to locate the desired extent record directly, since a search returns the record with the greatest key less than or equal to the search key. Here is a worked example. We've got a fork with a total of 23 extents (very fragmented!). The first eight extents are held in the catalog record's fork data; the block counts for the catalog's fork data are: 6, 1, 1, 1, 1, 1, 1, 1. There is an extent overflow record whose startBlock is 13 (0+6+1+1+1+1+1+1+1), with the following block counts: 1, 1, 1, 1, 1, 1, 1, 2. There is a second extent overflow record whose startBlock is 22 (13+1+1+1+1+1+1+1+2), holding the fork's remaining seven extents. Suppose the allocation block size for the volume is 4K. Suppose we want to start reading from the file at an offset of 108K.
We want to know where the data is on the volume, and how much contiguous data is there. First, we divide 108 KB (the fork offset) by 4 KB (the allocation block size) to get 27, which is the number of allocation blocks from the start of the fork. So, we want to know where fork allocation block #27 is. We notice that 27 is greater than or equal to 13 (the number of allocation blocks in the catalog's fork data), so we're going to have to look in the extents B-tree. We construct a search key from the appropriate fork type and file ID, with a startBlock of 27, and search for the record with the largest key that is less than or equal to our search key; this finds the second overflow record, whose startBlock is 22. We compute 27 (the desired fork allocation block) minus 22 (the startBlock of the record we found) to get 5, the number of allocation blocks into that record. We grab the record's extent descriptors and step through them, subtracting each extent's block count from the remaining offset (initially 5) until the remaining offset is less than the current extent's block count. That extent's start block, plus the remaining offset, is the allocation block on the volume where our data begins; the rest of that extent is the amount of contiguous data there.

Bad Block File

The extents overflow file is also used to hold information about the bad block file. The bad block file is used to mark areas on the disk as bad, unable to be used for storing data. This is typically used to map out bad sectors on the disk.
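The walkthrough above can be sketched in C. Given the extent descriptors of the record found in the B-tree, and that record's startBlock, finding the volume allocation block is a simple walk over the descriptors. The structure and function names here are illustrative, not part of the volume format:

```c
#include <stdint.h>

/* One extent: a starting allocation block on the volume and a length.
   (Illustrative stand-in for HFSPlusExtentDescriptor.) */
typedef struct { uint32_t startBlock; uint32_t blockCount; } ExtentDescriptor;

/* Given a record of up to 8 extents whose first extent begins at fork
   allocation block recordStart, find the volume allocation block that
   holds fork allocation block forkBlock. Returns 1 on success, 0 if the
   block lies beyond this record (consult the next extent record). */
static int MapForkBlock(const ExtentDescriptor extents[8],
                        uint32_t recordStart, uint32_t forkBlock,
                        uint32_t *volumeBlock)
{
    uint32_t offset = forkBlock - recordStart;  /* blocks into this record */
    for (int i = 0; i < 8; i++) {
        if (extents[i].blockCount == 0)
            break;                              /* empty descriptor: no more extents */
        if (offset < extents[i].blockCount) {
            *volumeBlock = extents[i].startBlock + offset;
            return 1;
        }
        offset -= extents[i].blockCount;
    }
    return 0;
}
```

With the example's numbers, the record found has startBlock 22 and we ask for fork block 27, so the walk begins 5 blocks into the record's extents.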
When an HFS Plus volume is embedded within an HFS wrapper (the way Mac OS normally initializes a hard disk), the space used by the HFS Plus volume is marked as part of the bad block file within the HFS wrapper itself. (This sounds confusing because you have a volume within another volume.)

The bad block file is not a file in the same sense as a user file (it doesn't have a file record in the catalog) or one of the special files (it's not referenced by the volume header). Instead, the bad block file uses a special CNID (kHFSBadBlockFileID, which is 5) as the fileID in the keys of its extent records in the extents overflow file.
Bad block extent records are always assumed to reference the data fork. The forkType field in the key of a bad block extent record must be 0.
HFS uses a similar mechanism to store information about bad blocks. This facility is used by the HFS Wrapper to hold an entire HFS Plus volume as bad blocks on an HFS disk. |
Allocation File

HFS Plus uses an allocation file to keep track of whether each allocation block in a volume is currently allocated to some file system structure or not. The contents of the allocation file are a bitmap. The bitmap contains one bit for each allocation block in the volume. If a bit is set, the corresponding allocation block is currently in use by some file system structure. If a bit is clear, the corresponding allocation block is not currently in use, and is available for allocation.
Each byte in the allocation file holds the state of eight allocation blocks. The byte at offset X into the file contains the allocation state of allocation blocks (X * 8) through (X * 8 + 7). Within each byte, the most significant bit holds information about the allocation block with the lowest number, and the least significant bit holds information about the allocation block with the highest number. Listing 1 shows how you would test whether an allocation block is in use, assuming that you've read the entire allocation file into memory.

Listing 1 Determining whether an allocation block is in use
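A minimal sketch of such a test, using the most-significant-bit-first ordering described above (the plain stdint type aliases substitute for the original Mac OS types):

```c
#include <stdint.h>

typedef uint8_t  UInt8;
typedef uint32_t UInt32;
typedef int      Boolean;

/* Test whether a given allocation block is in use, assuming the entire
   allocation file has been read into memory. The most significant bit of
   each byte corresponds to the lowest-numbered of its eight blocks. */
static Boolean IsAllocationBlockUsed(UInt32 thisAllocationBlock,
                                     const UInt8 *allocationFileContents)
{
    UInt8 thisByte;

    thisByte = allocationFileContents[thisAllocationBlock / 8];
    return (thisByte & (1 << (7 - (thisAllocationBlock % 8)))) != 0;
}
```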
The size of the allocation file depends on the number of allocation blocks in the volume, which in turn depends both on the number of sectors on the disk and on the size of the volume's allocation blocks (the number of sectors per allocation block). For example, a volume on a 1 GB disk with an allocation block size of 4 KB needs an allocation file of 256 Kbits (32 KB, or 8 allocation blocks). Since the allocation file itself is allocated using allocation blocks, it always occupies an integral number of allocation blocks (its size may be rounded up). The allocation file may be larger than the minimum number of bits required for the given volume size. Any unused bits in the bitmap must be set to zero.
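The rounding rule above can be made concrete with a small helper (an illustrative calculation, not a routine from the volume format):

```c
#include <stdint.h>

/* On-disk size, in bytes, of the allocation file for a volume with the
   given number of allocation blocks: one bit per block, rounded up to a
   whole number of allocation blocks. */
static uint64_t AllocationFileSize(uint64_t totalBlocks, uint32_t blockSize)
{
    uint64_t bitmapBytes = (totalBlocks + 7) / 8;                      /* one bit per block */
    uint64_t fileBlocks  = (bitmapBytes + blockSize - 1) / blockSize;  /* round up */
    return fileBlocks * blockSize;
}
```

For the 1 GB example: 1 GB / 4 KB gives 262,144 allocation blocks, so the bitmap needs 32 KB, which is exactly 8 allocation blocks.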
Attributes File

The HFS Plus attributes file is reserved for implementing named forks in the future. An attributes file is organized as a B-tree file. It is a special file, described by an HFSPlusForkData structure in the volume header.

It is possible for a volume to have no attributes file. If the first extent of the attributes file (stored in the volume header) has zero allocation blocks, the attributes file does not exist.

The B-Trees chapter defined a standard rule for the node size of HFS Plus B-trees. As the attributes file is a B-tree, it inherits the requirements of this rule. In addition, the node size of the attributes file must be at least 4 KB (kHFSPlusAttrMinNodeSize).
Attributes File Data
The leaf nodes of an attributes file contain data records, known as attributes. There are two types of attributes: fork data attributes and extension attributes.
Each record starts with a recordType field, which describes the type of attribute data record.
The values have the following meaning:
The next two sections describe the fork data and extension attributes in detail.

Fork Data Attributes

A fork data attribute is defined by the HFSPlusAttrForkData structure.
The fields have the following meaning:
Extension Attributes

An extension attribute is defined by the HFSPlusAttrExtents structure.
The fields have the following meaning:
Attributes and the Allocation File Consistency Check

While the key structure for the attributes file is not fully specified, it is still possible for an implementation to use attributes file information in its allocation file consistency check. The leaf records of the attributes file are fully defined, so the implementation can simply iterate over them to determine which allocation blocks on the disk are being used by fork data attributes. See Allocation File Consistency Check for details.
Startup File

The startup file is a special file intended to hold information needed when booting a system that does not have built-in (ROM) support for HFS Plus. A boot loader can find the startup file without full knowledge of the HFS Plus volume format (B-trees, catalog file, and so on). Instead, the volume header contains the location of the first eight extents of the startup file.
Unicode Subtleties

HFS Plus makes heavy use of Unicode strings to store file and folder names. However, Unicode is still evolving, and its use within a file system presents a number of challenges. This section describes some of the challenges, along with the solutions used by HFS Plus.
Unicode allows some sequences of characters to be represented by multiple, equivalent forms. For example, the character "é" can be represented as the single Unicode character u+00E9 (latin small letter e with acute), or as the two Unicode characters u+0065 and u+0301 (the letter "e" plus a combining acute symbol).

To reduce complexity in the B-tree key comparison routines (which have to compare Unicode strings), HFS Plus defines that Unicode strings will be stored in fully decomposed form, with composing characters stored in canonical order. The other equivalent forms are illegal in HFS Plus strings. An implementation must convert these equivalent forms to the fully decomposed form before storing the string on disk.

The Unicode Decomposition table contains a list of characters that are illegal as part of an HFS Plus string, and the equivalent character(s) that should be used instead. Any character appearing in a column titled "Illegal" must be replaced by the character(s) in the column immediately to the right (titled "Replace With"). In addition, the Korean Hangul characters with codes in the range u+AC00 through u+D7A3 are illegal and must be replaced with the equivalent sequence of conjoining jamos, as described in the Unicode 2.0 book, section 3.10.
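The decomposition rule can be illustrated for the single mapping used in the example above. A real implementation would drive this from the full Unicode Decomposition table; only the u+00E9 case is shown, and the function name is illustrative:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint16_t UniChar;

/* Decompose one precomposed character into base character plus combining
   mark, writing the result to out[] and returning the number of UniChars
   produced. Only the example mapping (u+00E9 -> u+0065 u+0301) is handled. */
static size_t DecomposeChar(UniChar c, UniChar out[2])
{
    if (c == 0x00E9) {      /* latin small letter e with acute */
        out[0] = 0x0065;    /* letter "e" */
        out[1] = 0x0301;    /* combining acute accent */
        return 2;
    }
    out[0] = c;             /* already decomposed (for this tiny table) */
    return 1;
}
```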
So, for the example given earlier, "é" must be stored as the two Unicode characters u+0065 and u+0301 (in that order). The Unicode character u+00E9 may not appear in a Unicode string used as part of an HFS Plus B-tree key.

String Comparison Algorithm

In HFS Plus, strings must be compared in a case-insensitive fashion. The Unicode standard does not strictly define upper and lower case equivalence, although it does suggest some equivalences. The HFS Plus string comparison algorithm (defined below) includes a concrete case equivalence definition. An implementation must use the equivalence expressed by this algorithm.

Furthermore, Unicode requires that certain formatting characters be ignored (skipped over) during string comparisons. The algorithm and tables used for case equivalence also arrange to ignore these characters. An implementation must ignore the characters that are ignored by this algorithm.

The HFS Plus string comparison algorithm is defined by the FastUnicodeCompare routine, shown below.
//
//  FastUnicodeCompare - Compare two Unicode strings; produce a relative ordering
//
//      IF              RESULT
//      --------------------------
//      str1 < str2     =>  -1
//      str1 = str2     =>  0
//      str1 > str2     =>  +1
//
//  The lower case table starts with 256 entries (one for each of the upper bytes
//  of the original Unicode char). If that entry is zero, then all characters with
//  that upper byte are already case folded. If the entry is non-zero, then it is
//  the _index_ (not byte offset) of the start of the sub-table for the characters
//  with that upper byte. All ignorable characters are folded to the value zero.
//
//  In pseudocode:
//
//      Let c = source Unicode character
//      Let table[] = lower case table
//
//      lower = table[highbyte(c)]
//      if (lower == 0)
//          lower = c
//      else
//          lower = table[lower+lowbyte(c)]
//
//      if (lower == 0)
//          ignore this character
//
//  To handle ignorable characters, we now need a loop to find the next valid
//  character. Also, we can't pre-compute the number of characters to compare;
//  the string length might be larger than the number of non-ignorable characters.
//  Further, we must be able to handle ignorable characters at any point in the
//  string, including as the first or last characters. We use a zero value as a
//  sentinel to detect both end-of-string and ignorable characters. Since the File
//  Manager doesn't prevent the NULL character (value zero) as part of a filename,
//  the case mapping table is assumed to map u+0000 to some non-zero value (like
//  0xFFFF, which is an invalid Unicode character).
//
//  Pseudocode:
//
//      while (1) {
//          c1 = GetNextValidChar(str1)     // returns zero if at end of string
//          c2 = GetNextValidChar(str2)
//
//          if (c1 != c2) break             // found a difference
//
//          if (c1 == 0)                    // reached end of string on both
//                                          // strings at once?
//              return 0;                   // yes, so strings are equal
//      }
//
//      // When we get here, c1 != c2. So, we just need to determine which one is
//      // less.
//      if (c1 < c2)
//          return -1;
//      else
//          return 1;
//

SInt32 FastUnicodeCompare ( register ConstUniCharArrayPtr str1, register ItemCount length1,
                            register ConstUniCharArrayPtr str2, register ItemCount length2)
{
    register UInt16 c1,c2;
    register UInt16 temp;
    register UInt16* lowerCaseTable;

    lowerCaseTable = gLowerCaseTable;

    while (1) {
        // Set default values for c1, c2 in case there are no more valid chars
        c1 = 0;
        c2 = 0;

        // Find next non-ignorable char from str1, or zero if no more
        while (length1 && c1 == 0) {
            c1 = *(str1++);
            --length1;
            if ((temp = lowerCaseTable[c1>>8]) != 0)        // is there a subtable
                                                            // for this upper byte?
                c1 = lowerCaseTable[temp + (c1 & 0x00FF)];  // yes, so fold the char
        }

        // Find next non-ignorable char from str2, or zero if no more
        while (length2 && c2 == 0) {
            c2 = *(str2++);
            --length2;
            if ((temp = lowerCaseTable[c2>>8]) != 0)        // is there a subtable
                                                            // for this upper byte?
                c2 = lowerCaseTable[temp + (c2 & 0x00FF)];  // yes, so fold the char
        }

        if (c1 != c2)       // found a difference, so stop looping
            break;

        if (c1 == 0)        // did we reach the end of both strings at the same time?
            return 0;       // yes, so strings are equal
    }

    if (c1 < c2)
        return -1;
    else
        return 1;
}

/* The lower case table consists of a 256-entry high-byte table followed by
   some number of 256-entry subtables. The high-byte table contains either an
   offset to the subtable for characters with that high byte or zero, which
   means that there are no case mappings or ignored characters in that block.
   Ignored characters are mapped to zero. */

UInt16 gLowerCaseTable[] = {

    // High-byte indices ( == 0 if no case mapping and no ignorables )

    // Full data tables omitted for brevity.
    // See the Downloadables section for URL to download the code.
};
HFS Wrapper

An HFS Plus volume may be contained within an HFS volume in a way that makes the volume look like an HFS volume to systems without HFS Plus support. This has two important advantages:
The rest of this section describes how the HFS wrapper is laid out and how the HFS Plus volume is embedded within the wrapper. |
HFS Master Directory Block

An HFS volume always contains a Master Directory Block (MDB), in sector 2. The MDB is similar to an HFS Plus volume header. In order to support volumes embedded within an HFS volume, several unused fields of the MDB have been changed, and are now used to indicate the type, location, and size of the embedded volume.

What was formerly the drVCSize field is now the drEmbedSigWord field; for an embedded HFS Plus volume, it must contain the HFS Plus signature, kHFSPlusSigWord. What were formerly the drVBMCSize and drCtlCSize fields are now the drEmbedExtent field, an extent descriptor giving the starting allocation block and length (in allocation blocks) of the embedded volume.
To actually find the embedded volume's sectors on disk, an implementation must use the drAlBlSt, drAlBlkSiz, and drEmbedExtent fields of the MDB to transform a sector offset within the embedded volume into a sector offset on the disk.
Listing 2 Sector transform for embedded volumes
static UInt32 HFSPlusSectorToDiskSector(UInt32 hfsPlusSector)
{
    UInt32 embeddedDiskOffset;

    embeddedDiskOffset = gMDB.drAlBlSt +
                         gMDB.drEmbedExtent.startBlock * (gMDB.drAlBlkSiz / 512);
    return embeddedDiskOffset + hfsPlusSector;
}
In order to prevent accidentally changing the files in the HFS wrapper, the wrapper volume must be marked as software-write-protected by setting the kHFSVolumeSoftwareLockBit bit of the drAtrb field in the MDB.

To improve performance of HFS Plus volumes, the size of the wrapper's allocation blocks should be a multiple of the size of the HFS Plus volume's allocation blocks. In addition, the wrapper's allocation block start (drAlBlSt) should be chosen so that the embedded volume begins on a multiple of the HFS Plus volume's allocation block size.

Allocating Space for the Embedded Volume

The space occupied by the embedded volume must be marked as allocated in the HFS wrapper's volume bitmap (similar to the HFS Plus allocation file) and placed in the HFS wrapper's bad block file (similar to the HFS Plus bad block file). This doesn't mean the blocks are actually bad; it merely prevents the HFS Plus volume from being overwritten by newly created files in the HFS wrapper, being deleted accidentally, or being marked as free, usable space by HFS disk repair utilities. The location and size of the embedded volume are recorded in the drEmbedExtent field of the MDB, as described above.
As initialized by the Mac OS Disk Initialization Package, the HFS wrapper volume contains five files in the root folder.
In addition, the root folder is set as the blessed folder by placing its folder ID in the first element of the drFndrInfo array in the MDB.
Volume Consistency Checks

An HFS Plus volume is a complex data structure, consisting of many different inter-related data structures. Inconsistencies between these data structures could cause serious data loss. When an HFS Plus implementation mounts a volume, it must perform basic consistency checks to ensure that the volume is consistent. In addition, the implementation may choose to implement other, more advanced, consistency checks.

Many of these consistency checks take a significant amount of time to run. While a safe implementation might run these checks every time a volume is mounted, most implementations will want to rely on the correctness of the previous implementation that modified the disk. The implementation may avoid unnecessary checking by determining whether the volume was last unmounted cleanly. If it was, the implementation may choose to skip a consistency check. An implementation can determine whether a volume was unmounted cleanly by looking at various flag bits in the volume header. See Volume Attributes for details.

Next Catalog Node ID Consistency Check

For an HFS Plus volume to work correctly, it's vital that the nextCatalogID field of the volume header be greater than the largest CNID currently assigned to any file or folder on the volume; otherwise, a newly created item could be given a CNID that is already in use.
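The check itself reduces to a scan of the catalog. A minimal sketch, where the array of CNIDs stands in for an iteration over every file and folder record in the catalog file (the names are illustrative):

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t HFSCatalogNodeID;

/* Returns 1 if nextCatalogID is strictly greater than every CNID in use,
   0 if some existing CNID would collide with a future allocation. */
static int NextCatalogIDIsConsistent(const HFSCatalogNodeID *cnids, size_t count,
                                     HFSCatalogNodeID nextCatalogID)
{
    for (size_t i = 0; i < count; i++) {
        if (cnids[i] >= nextCatalogID)
            return 0;   /* a newly created item could reuse this CNID */
    }
    return 1;
}
```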
Allocation File Consistency Check

For an HFS Plus volume to work correctly, it's vital that any allocation block in use by file system structures be marked as allocated in the allocation file. The algorithm to ensure this is as follows:
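One way to realize such a check, sketched under the assumption that the implementation can enumerate every extent of every file system structure (special files, user file forks, and the bad block file): build an expected bitmap from those extents, then compare it with the on-disk allocation file. The names and structures here are illustrative:

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t startBlock; uint32_t blockCount; } Extent;

/* Mark a run of allocation blocks as used in an expected bitmap,
   most significant bit first, as in the allocation file. */
static void MarkExtent(uint8_t *bitmap, Extent e)
{
    for (uint32_t b = e.startBlock; b < e.startBlock + e.blockCount; b++)
        bitmap[b / 8] |= (uint8_t)(0x80 >> (b % 8));
}

/* Compare the expected bitmap (built by calling MarkExtent for every
   extent in use on the volume) with the actual allocation file contents.
   Returns 1 if they agree, 0 if the volume is inconsistent. */
static int BitmapsMatch(const uint8_t *expected, const uint8_t *actual, size_t bytes)
{
    return memcmp(expected, actual, bytes) == 0;
}
```

A repair utility would go further and report which blocks differ; this sketch only answers the yes/no consistency question.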
Summary

Volume format specifications are fun.
Acknowledgments

Thanks to Quinn "The Eskimo!", Mark Day, Pete Gontier, Ingrid Kelly, Otto Schlosser, and all those brave souls who reviewed this document in its various incarnations.