
VxFS System Administrator's Guide

Disk Layout

Chapter 2


Introduction

Three disk layouts are available with the VERITAS File System:

Version 1
The Version 1 disk layout is the original VxFS disk layout provided with pre-2.0 versions of VxFS.

Version 2
The Version 2 disk layout was designed to support features such as filesets, dynamic inode allocation, and enhanced security.

Version 4
The Version 4 disk layout encompasses all file system structural information in files, rather than at fixed locations on disk, allowing for greater scalability. Version 4 supports files and file systems up to one terabyte in size, along with user quotas on file system resources.


Note: The Version 3 disk layout is not supported on SCO UnixWare.


The following topics are covered in this chapter:

All disk layout versions are supported by VxFS. Once VxFS Release 3.2 is installed on a system, new file systems are created with the Version 4 layout by default. Although mkfs_vxfs(1M) allows the user to specify other disk layouts, it is generally preferable to use the Version 4 layout for new file systems.

The vxupgrade command is provided to upgrade an existing VxFS file system to the Version 4 layout while the file system remains online. See the vxupgrade(1M) manual page for details on upgrading VxFS file systems.

Disk Space Allocation

Disk space is allocated by the system in 512-byte sectors. An integral number of sectors are grouped together to form a logical block. VxFS supports logical block sizes of 1024, 2048, 4096, and 8192 bytes. The default block size is 1024 bytes. The block size may be specified as an argument to the mkfs utility and may vary between VxFS file systems mounted on the same system. VxFS allocates disk space to files in extents. An extent is a set of contiguous blocks.
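The sector and block arithmetic above can be illustrated with a short Python sketch. The function names are hypothetical and are not part of any VxFS utility; only the sector size, the supported block sizes, and the default come from the text.

```python
SECTOR_SIZE = 512                            # bytes per sector
VALID_BLOCK_SIZES = (1024, 2048, 4096, 8192)
DEFAULT_BLOCK_SIZE = 1024


def _check(block_size):
    if block_size not in VALID_BLOCK_SIZES:
        raise ValueError("unsupported logical block size: %r" % block_size)


def sectors_per_block(block_size=DEFAULT_BLOCK_SIZE):
    """Number of 512-byte sectors grouped into one logical block."""
    _check(block_size)
    return block_size // SECTOR_SIZE


def blocks_needed(nbytes, block_size=DEFAULT_BLOCK_SIZE):
    """Smallest number of logical blocks that can hold nbytes of data."""
    _check(block_size)
    return -(-nbytes // block_size)          # ceiling division
```

With the default block size, each logical block spans two sectors, and a file of 4097 bytes on a 4096-byte block file system needs two blocks.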

The VxFS Version 1 Disk Layout

This section describes the VxFS Version 1 disk layout.

Overview

The VxFS Version 1 disk layout, as shown in Figure 1, includes

These elements are discussed in detail in the sections that follow.

Figure 1 VxFS Version 1 Disk Layout

Super-Block

The super-block contains important information about the file system, such as:

Refer to the fs(4) manual page for details on the contents of the super-block.

The super-block is always in a fixed location, offset from the start of the file system by 1024 bytes. This fixed location enables utilities to easily locate the super-block when necessary. The super-block is 1024 bytes long.

Copies of the super-block are kept in allocation unit headers: these copies can be used for recovery purposes if the super-block is corrupted or destroyed (see the fsck_vxfs(1M) manual page for more details).
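Because the super-block sits at a fixed offset, a utility can read it without consulting any other structure. The following Python sketch returns the raw super-block bytes from a device or image file. It makes no attempt to decode the fields (see fs(4) for those), and the function name is an assumption used for illustration only.

```python
SUPERBLOCK_OFFSET = 1024   # bytes from the start of the file system
SUPERBLOCK_SIZE = 1024     # bytes


def read_superblock_bytes(path):
    """Return the raw 1024-byte super-block region of a device or image file."""
    with open(path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        data = dev.read(SUPERBLOCK_SIZE)
    if len(data) != SUPERBLOCK_SIZE:
        raise ValueError("file is too short to contain a super-block")
    return data
```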

Intent Log

In the event of system failure, the VxFS file system uses intent logging to guarantee file system integrity.

The intent log is a circular activity log with a default size of 1024 blocks. If the file system is smaller than 4 MB, the default log size is reduced (by mkfs) to avoid wasting space. The intent log contains records of the intention of the system to update a file system structure. An update to the file system structure (a transaction) is divided into separate subfunctions for each data structure that needs to be updated. A composite log record of the transaction is created, containing the subfunctions constituting the transaction.

For example, the creation of a file that would expand the directory in which the file is contained would produce a transaction consisting of the following subfunctions:

VxFS maintains log records in the intent log for all pending changes to the file system structure and ensures that the log records are written to disk in advance of the changes to the file system. Once the intent log has been written, the transaction's other updates to the file system can be written in any order. In the event of a system failure, the pending changes to the file system are either nullified or completed by the fsck utility. The VxFS intent log generally only records changes to the file system structure. File data changes are not normally logged.
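The ordering rule described above (log record first, structure updates afterwards in any order, recovery by replay) can be modeled with a toy Python class. This is a simplified sketch, not the VxFS implementation: the class and method names are hypothetical, the "disk" is a dictionary, and recovery here always completes a pending transaction, whereas fsck may either complete or nullify it.

```python
class ToyIntentLog:
    def __init__(self):
        self.log = {}          # tid -> logged record, assumed to be durable
        self.structures = {}   # stand-in for on-disk file system structures

    def begin(self, tid, subfunctions):
        """Write the composite log record before touching any structure."""
        self.log[tid] = {"subfunctions": list(subfunctions), "done": False}

    def apply(self, tid, limit=None):
        """Apply the logged updates; a limit simulates a crash part-way through."""
        record = self.log[tid]
        for name, value in record["subfunctions"][:limit]:
            self.structures[name] = value
        if limit is None:
            record["done"] = True

    def replay(self):
        """Recovery: complete every transaction that was logged but not finished."""
        for record in self.log.values():
            if not record["done"]:
                for name, value in record["subfunctions"]:
                    self.structures[name] = value
                record["done"] = True
```

Each subfunction in this model is a simple assignment, so applying it twice is harmless. That property is what makes replay safe after a partial update.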

Allocation Unit

An allocation unit is a group of consecutive blocks in a file system that contain a resource summary, free resource maps, inodes, data blocks, and a copy of the super-block. An allocation unit in the VxFS file system is similar in concept to the ufs "cylinder group." Each component of an allocation unit begins on a block boundary. The VxFS Version 1 allocation unit is shown in Figure 2.

Figure 2 Allocation Unit Structure

One or more allocation units exist per file system. Allocation units are located immediately after the intent log. The number and size of allocation units can be specified when the file system is made. All of the allocation units, except possibly the last one, are of equal size. If space is limited, the last allocation unit can have a partial set of data blocks to allow use of all remaining blocks.

Allocation Unit Header

The allocation unit header contains a copy of the file system's super-block that is used to verify that the allocation unit matches the super-block of the file system. The super-block copies contained in allocation unit headers can also be used for recovery purposes if the super-block is corrupted or destroyed. The allocation unit header occupies the first block of each allocation unit.

Allocation Unit Summary

The allocation unit summary contains the number of inodes with extended operations pending, the number of free inodes, and the number of free extents in the allocation unit.

Free Inode Map

The free inode map is a bitmap that indicates which inodes are free and which are allocated. A free inode is indicated by the bit being on. Inodes zero and one are reserved by the file system; inode two is the inode for the root directory; inode three is the inode for the lost+found directory.
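A minimal Python sketch of such a bitmap follows. The convention that a set bit means "free" and the four pre-assigned inodes come from the text. The bit ordering within each byte (least significant bit first) and the function names are assumptions made for illustration.

```python
PREASSIGNED = {0: "reserved", 1: "reserved", 2: "root directory", 3: "lost+found"}


def is_free(bitmap, ino):
    """A free inode is indicated by its bit being on."""
    return bool(bitmap[ino // 8] & (1 << (ino % 8)))


def set_free(bitmap, ino, free):
    if free:
        bitmap[ino // 8] |= 1 << (ino % 8)
    else:
        bitmap[ino // 8] &= ~(1 << (ino % 8)) & 0xFF


def new_free_inode_map(ninodes):
    """Build a map in which every inode except the pre-assigned ones is free."""
    bitmap = bytearray((ninodes + 7) // 8)
    for ino in range(ninodes):
        set_free(bitmap, ino, ino not in PREASSIGNED)
    return bitmap


def allocate_inode(bitmap, ninodes):
    """Claim the lowest-numbered free inode, or return None if none is free."""
    for ino in range(ninodes):
        if is_free(bitmap, ino):
            set_free(bitmap, ino, False)
            return ino
    return None
```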

Extended Inode Operations Map

The extended inode operations map keeps track of inodes on which operations would remain pending for too long to reside in the intent log. The extended inode operations map is in the same format as the free inode map. To prevent the intent log from wrapping and overwriting the transaction, the required operations are stored in the affected inode. (A transaction that has not completed is never overwritten; instead, new log writes wait and the file system is frozen.) This map is then updated to identify the inodes that have extended operations that need to be completed.

Free Extent Map

The free extent map is a series of independent 512 byte bitmaps that are each referred to as a free extent map section. Each section is broken down into multiple regions. The first region, of 2048 bits, represents a section of 2048 one-block extents. The second region, of 1024 bits, represents a section of 1024 two-block extents. This regioning continues for all powers of 2 up to the single bit that represents one 2048 block extent.
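The region sizes can be checked with a few lines of Python. The sketch below lists each region's extent size, bit count, and starting bit. It assumes the regions are packed back to back in order of increasing extent size, which the text implies but does not state outright.

```python
SECTION_BYTES = 512
MAX_EXTENT_BLOCKS = 2048


def region_layout():
    """Return (extent_size_in_blocks, bits_in_region, starting_bit) per region."""
    layout = []
    start = 0
    size = 1
    while size <= MAX_EXTENT_BLOCKS:
        nbits = MAX_EXTENT_BLOCKS // size
        layout.append((size, nbits, start))
        start += nbits
        size *= 2
    return layout
```

There are twelve regions (extent sizes 1, 2, 4, and so on up to 2048 blocks) totalling 4095 bits, which fits in the 4096 bits of a 512-byte section.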

The one-block bitmaps always represent the true allocation of blocks from the allocation unit. The remaining bitmaps remap these same blocks, in a "binary buddy" scheme, in increasingly larger groups. As smaller extents are needed, the larger groups of blocks mapped by the buddy maps are broken apart to create the smaller extents. In this way, the file system can look for an extent closest in size to the space required, which keeps files as contiguous as possible for faster performance.
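The following Python sketch models one free extent map section as a binary buddy structure. It is a toy model under stated assumptions, not the VxFS allocator: each map is a Python list rather than a packed bitmap, requests are rounded up to a power of two, and the preference for groups whose enclosing larger group is already broken up is one plausible reading of "closest in size." All names are hypothetical.

```python
SECTION_BLOCKS = 2048
ORDERS = 12   # extent sizes 1, 2, 4, ..., 2048 blocks


def new_section():
    """maps[k][i] is True when the aligned group of 2**k blocks at index i is free."""
    return [[True] * (SECTION_BLOCKS >> k) for k in range(ORDERS)]


def _rebuild(maps):
    # The one-block map is the truth; every larger map is derived from it.
    for k in range(1, ORDERS):
        lower = maps[k - 1]
        maps[k] = [lower[2 * i] and lower[2 * i + 1] for i in range(len(maps[k]))]


def allocate(maps, nblocks):
    """Allocate a power-of-two extent of at least nblocks; return (start, size)."""
    if not 1 <= nblocks <= SECTION_BLOCKS:
        raise ValueError("request does not fit in one section")
    want = (nblocks - 1).bit_length()         # smallest k with 2**k >= nblocks
    free = [i for i, bit in enumerate(maps[want]) if bit]
    if not free:
        return None
    if want + 1 < ORDERS:
        # Prefer a group whose larger buddy group has already been broken apart.
        broken = [i for i in free if not maps[want + 1][i // 2]]
        if broken:
            free = broken
    start, size = free[0] << want, 1 << want
    for block in range(start, start + size):
        maps[0][block] = False
    _rebuild(maps)
    return start, size
```

In this model a two-block request made after blocks 0 through 5 and 8 through 11 are in use is satisfied from blocks 6 and 7, leaving the larger free groups intact.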

Inode List

An inode is a data structure that contains information about a file. The VxFS default inode size is currently 256 bytes.

Each inode stores information such as the following about a particular file:

There are up to ten direct extent address size pairs per inode. Each direct extent address indicates the starting block number of a direct extent; direct extent sizes can vary. If all of the direct extents are used, two indirect address extents are available for use in each inode:

Each indirect address extent is 8K long and contains 2048 entries. All indirect data extents for a given file have the same size, which is determined when the file's first indirect data extent is allocated.

The inode list is a series of inodes. There is one inode in the list for every file in the file system.

Padding

It may be desirable to align data blocks to a physical boundary. To facilitate this, the system administrator may specify that a gap be left between the end of the inode list and the first data block.

Data Blocks

The balance of the allocation unit is occupied by data blocks. Data blocks contain the actual data stored in files and directories. The default ratio of inodes to data blocks is one inode (256 bytes) for every four data blocks (1024 bytes each).

The VxFS Version 2 Disk Layout

This section describes the VxFS Version 2 disk layout.

Due to the relatively complex nature of the Version 2 layout, the sections that follow are arranged to cover the following general areas:

Overview

Many aspects of the Version 1 disk layout are preserved in the Version 2 disk layout. However, the Version 2 layout differs from the Version 1 layout in that it includes support for the following features:

The addition of filesets and dynamic allocation of inodes has affected the disk layout in various ways. In particular, many of the file system structures are now located in files (referred to as structural files) rather than in fixed disk areas. This provides a simple mechanism for dynamic growth of structures. For example, inodes are now stored in structural files and allocated as needed. In general, file system structures that deal with space allocation are still in fixed disk locations, while most other structures are dynamically allocated and have become clients of the file system's disk space allocation scheme.

Basic Layout

This section describes the structural elements of the file system that exist in fixed locations on the disk.

The VxFS Version 2 disk layout is illustrated in Figure 3 and is composed of

These and other elements are discussed in detail in the sections that follow.

Figure 3 VxFS Version 2 Disk Layout

Super-Block

The super-block contains important information about the file system, such as

The super-block is always in a fixed location, offset from the start of the file system by 1024 bytes. This fixed location enables utilities to easily locate the super-block when necessary. The super-block is 1024 bytes long.

Copies of the super-block are kept in allocation unit headers: these copies can be used for recovery purposes if the super-block is corrupted or destroyed (see the fsck(1M) manual page).

Object Location Table

The object location table (OLT) can be considered an extension of the super-block. The OLT contains information used at mount time to locate file system structures that are not in fixed locations. The OLT is typically located immediately after the super-block and is 8K long. However, if a Version 1 file system is upgraded to Version 2, the placement of the OLT depends on the availability of space.

The OLT is replicated and its replica is located immediately after the intent log. The OLT and its replica are separated in order to minimize the potential for losing both copies of the vital OLT information in the event of localized disk damage.

The contents and use of the OLT are described in detail in the section entitled "Locating Dynamic Structures."

Intent Log

The VxFS file system uses intent logging to guarantee file system integrity in the event of system failure.

The intent log is a circular activity log with a default size of 512 blocks. If the file system is smaller than 4 MB, the log size is reduced to avoid wasting space. The intent log contains records of the intention of the system to update a file system structure. An update to the file system structure (a transaction) is divided into separate subfunctions for each data structure that needs to be updated. A composite log record of the transaction is created that contains the subfunctions that constitute the transaction.

For example, the creation of a file that would expand the directory in which the file is contained will produce a transaction consisting of the following subfunctions:

VxFS maintains log records in the intent log for all pending changes to the file system structure, and ensures that the log records are written to disk in advance of the changes to the file system. Once the intent log has been written, the transaction's other updates to the file system can be written in any order. In the event of a system failure, the pending changes to the file system are either nullified or completed by the fsck utility. The VxFS intent log generally only records changes to the file system structure. File data changes are not normally logged.

Allocation Unit

An allocation unit is a group of consecutive blocks in a file system that contain a resource summary, a free resource map, data blocks, and a copy of the super-block. An allocation unit in the VxFS file system is similar in concept to the ufs "cylinder group." Each component of an allocation unit begins on a block boundary. All of the Version 2 allocation unit components deal with the allocation of disk space. Those components of the Version 1 allocation unit that deal with inode allocation have been relocated elsewhere for Version 2. In particular, the inode list now resides in an inode list file and the inode allocation information now resides in an inode allocation unit (described later). The VxFS Version 2 allocation unit is depicted in Figure 4.

Figure 4 Allocation Unit Structure

One or more allocation units exist per file system. Allocation units are located after the OLT replica. The number and size of allocation units can be specified when the file system is made. All of the allocation units, except possibly the last one, are of equal size. If space is limited, the last allocation unit can have a partial set of data blocks to allow use of all remaining blocks.

Allocation Unit Header

The allocation unit header contains a copy of the file system's super-block that is used to verify that the allocation unit matches the super-block of the file system. The super-block copies contained in allocation unit headers can also be used for recovery purposes if the super-block is corrupted or destroyed. The allocation unit header occupies the first block of each allocation unit.

Allocation Unit Summary

The allocation unit summary summarizes the resources (data blocks) used in the allocation unit. This includes information on the number of free extents of each size in the allocation unit and a flag indicating the status of the summary.

Free Extent Map

The free extent map is a series of independent 512 byte bitmaps that are each referred to as a free extent map section. Each section is broken down into multiple regions. The first region, of 2048 bits, represents a section of 2048 one-block extents. The second region, of 1024 bits, represents a section of 1024 two-block extents. This regioning continues for all powers of 2 up to the single bit that represents one 2048 block extent.

The one block bitmaps always represent the true allocation of blocks from the allocation unit. The remaining bitmaps remap these same blocks, in a "binary buddy" scheme, in increasingly larger sized groups. As smaller extents are needed, the larger groups of blocks mapped by the buddy maps are broken apart to create the smaller extents.

Padding

It may be desirable to align data blocks to a physical boundary. To facilitate this, the system administrator may specify that a gap be left between the end of the free extent map and the first data block. Refer to the "Alignment" section in Chapter 6, "Application Interface," for additional information.

Data Blocks

The balance of the allocation unit is occupied by data blocks. Data blocks contain the actual data stored in files and directories.

Filesets and Structural Files

This section describes the structural elements of the file system that are not necessarily in fixed locations on the disk.

With the Version 2 layout, many structural elements of the file system are encapsulated in files to allow dynamic allocation of the file system structure. Files that store this file system structural data are referred to as structural files. As the file system grows, more space is allocated to the structural files. Structural files are intended for file system use only and are not generally visible to users.

The Version 2 layout supports filesets, which are collections of files that exist within a file system. In the current release, each file system contains two filesets:

structural fileset

A special fileset that stores the structural elements of the file system in the form of structural files. These files are the "property" of the file system and are not normally visible to the user.

primary fileset

A fileset that contains files that are visible to and accessible by users.

Structural files exist in the structural fileset only and include the following:

fileset header file

A file that contains a series of fileset headers.

inode list file

A file that contains a series of inodes.

inode allocation unit (IAU) file

A file that contains a series of inode allocation units.

current usage table (CUT) file

A file that contains a series of fileset usage entries.

link count table file

A file that contains a link count for each inode in the structural fileset.

quotas file

A file containing user quota information (for the primary fileset only).

Structural files and their components are discussed in the sections that follow.

Although structural files are contained in the structural fileset, they can "belong" to another fileset. For example, the inode list file for the primary fileset is in the structural fileset, but the structural details that it contains are only applicable to the primary fileset.

Each fileset is defined by structural files as follows:

Fileset metadata that cannot be reconstructed using the inode list is replicated to help fsck reconstruct the file system in the event of disk damage.

Figure 5 Filesets and Structural Files

Fileset Header

Each fileset has a header containing information about the fileset's contents and characteristics. All fileset headers are stored in a single fileset header file in the structural fileset. The fileset header file contains one fileset header per fileset (see Figure 6). Each fileset header entry is 1 block long. The fileset header file is replicated because fileset headers cannot be rebuilt from other data structures.

Figure 6 Fileset Header File

The fileset header for a given fileset includes information such as:

Inodes

An inode is a data structure that contains information about a file. The VxFS default inode size is currently 256 bytes.

Each inode stores information such as the following about a particular file:

Refer to the inode_vxfs(4) manual page for details on the contents of a vxfs inode.

There are up to ten direct extent address size pairs per inode. Each direct extent address indicates the starting block number of a direct extent; direct extent sizes can vary. If all of the direct extents are used, two indirect address extents are available for use in each inode. The first indirect address extent is used for single indirection, where each entry in the extent indicates the starting block number of an indirect data extent. The second indirect address extent is used for double indirection, where each entry in the extent indicates the starting block number of a single indirect address extent. Each indirect address extent is 8K long and contains 2048 entries. All indirect data extents for a given file have the same size, which is determined when the file's first indirect data extent is allocated.
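The figures above imply a 4-byte entry (8K divided by 2048 entries) and fix how many indirect data extents each level of indirection can reach. The Python sketch below works this out. The function name is hypothetical, and the indirect data extent size is left as a parameter because the text says it is chosen per file.

```python
INDIRECT_EXTENT_BYTES = 8 * 1024
ENTRIES_PER_INDIRECT_EXTENT = 2048
ENTRY_BYTES = INDIRECT_EXTENT_BYTES // ENTRIES_PER_INDIRECT_EXTENT   # 4 bytes


def indirect_reach_blocks(data_extent_blocks):
    """Blocks addressable through (single, double) indirection.

    data_extent_blocks is the common size, in blocks, of the file's
    indirect data extents.
    """
    n = ENTRIES_PER_INDIRECT_EXTENT
    single = n * data_extent_blocks          # one level: n data extents
    double = n * n * data_extent_blocks      # two levels: n address extents of n entries
    return single, double
```

For example, with 8-block indirect data extents, single indirection reaches 16,384 blocks and double indirection reaches 33,554,432 blocks, in addition to whatever the direct extents map.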

Version 2 inodes differ from Version 1 inodes in that they are located in structural files to facilitate dynamic inode allocation, which is the allocation of inodes on an as-needed basis. Instead of allocating a fixed number of inodes into the file system, mkfs (see mkfs_vxfs(1M)) allocates a minimum number of inodes. Additional inodes are later allocated as the file system needs them.

The inode list is a series of inodes located in the inode list file. There is one inode in the list for every file in a given fileset. For recovery purposes, the inode list file is referenced by two inodes that point to the same set of data blocks. Although the inode addresses are replicated for recovery purposes, the inodes themselves are not.

An inode extent is an extent that contains inodes and is 8K long, by default. Inode extents are dynamically allocated to store inodes as they are needed.

Initial Inode List Extents

The initial inode list extents contain the inodes first allocated by mkfs for each fileset in a file system. During file system use, inodes are allocated as needed and are added into the inode list files for the filesets.

Figure 7 shows the initial inode list extents allocated for the primary and structural filesets. Each of these extents contains 32 inodes and is 8K long.

The construction of the primary fileset's inode list resembles that of the VxFS Version 1 file system layout, with the first two inodes reserved and inodes 2 and 3 pre-assigned to the root and lost+found directories. The structural fileset's inode list is similarly constructed, with certain inodes allocated for specific files and other inodes reserved or unallocated.

There are two initial inode list extents for the structural fileset. These contain the inodes for all structural files needed to find and set up the file system.

Some of the entries in the structural fileset's inode list are replicas of one another. For example, inodes 4 and 36 both reference copies of the fileset header file. The replicated inodes are used by fsck to reconstruct the file system in the event of damage to either one of the replicas. Although the two initial inode list extents belonging to the structural fileset are logically contiguous, they are physically separated. This helps to ensure the integrity of the replicated information and reduces the chance that localized disk damage might result in complete loss of the file system.

Note that inodes 6 and 38 in the structural fileset reference the inode list file for the structural fileset. In a newly created file system, this file contains the two inode extents pictured for the structural fileset. Likewise, the structural fileset inodes 7 and 39 reference the inode list file for the primary fileset. In a newly created file system, this file contains the single extent pictured for the primary fileset. All of the unused inodes in the initial extents of the structural inode list are reserved for future use.
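The inode numbers quoted above follow from the extent geometry: an 8K extent holds 32 inodes of 256 bytes, and each replica pair in the examples (4 and 36, 6 and 38, 7 and 39) differs by 32. The Python sketch below captures that arithmetic. It assumes that inodes are packed in numerical order and that a replica always occupies the same slot in the second initial extent. The text gives examples consistent with this but does not state it as a rule.

```python
INODE_SIZE = 256
INODE_EXTENT_BYTES = 8 * 1024
INODES_PER_EXTENT = INODE_EXTENT_BYTES // INODE_SIZE   # 32


def locate_inode(ino):
    """Return (logical extent index, byte offset within that extent)."""
    return ino // INODES_PER_EXTENT, (ino % INODES_PER_EXTENT) * INODE_SIZE


def replica_of(ino):
    """Assumed replica number for an inode in the first initial extent."""
    if not 0 <= ino < INODES_PER_EXTENT:
        raise ValueError("only inodes in the first initial extent have replicas here")
    return ino + INODES_PER_EXTENT
```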

Figure 7 Inode Lists

Inode Allocation Unit

An Inode Allocation Unit (IAU) contains inode allocation information for a given fileset. Each fileset contains one or more IAUs, each of which details allocation for a set number of inodes. The number of inodes per IAU varies, depending on the block size being used. One IAU exists for every 16,384 inodes in a fileset with the default block size (1024 bytes). If an IAU is damaged, the information that it contains can be reconstructed by examining the fileset's inode list.

The IAUs for a fileset are stored in sequential order in the fileset's IAU file. The fileset header identifies the structural fileset inode associated with that fileset's IAU file.

Figure 8 shows the inode allocation unit structure. All IAU components begin on a block boundary.

Figure 8 Inode Allocation Unit (IAU) Structure

IAU Header

The IAU header verifies that the inode allocation unit matches the fileset. The IAU header occupies the first block of each inode allocation unit. If damaged, the IAU can be reconstructed from inodes and other information.

IAU Summary

The IAU summary summarizes the resources used in the IAU. It includes information on the number of free inodes in the IAU and the number of inodes with extended operation sets in the IAU. The IAU summary is 1 block long.

Free Inode Map

The free inode map is a bitmap that indicates which inodes are free and which are allocated. A free inode is indicated by the bit being on. The length of the free inode map is 2K for file systems with 1K or 2K block sizes and is equal to the block size for file systems with larger block sizes.
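The map length determines how many inodes one IAU can describe. The Python sketch below assumes one bit per inode, so that an IAU covers eight inodes per byte of free inode map. For the default 1024-byte block size this reproduces the 16,384 figure given earlier. The values for the other block sizes are derived from that assumption rather than quoted from the text.

```python
VALID_BLOCK_SIZES = (1024, 2048, 4096, 8192)


def free_inode_map_bytes(block_size):
    """Map length: 2K for 1K or 2K blocks, otherwise equal to the block size."""
    if block_size not in VALID_BLOCK_SIZES:
        raise ValueError("unsupported logical block size: %r" % block_size)
    return 2048 if block_size in (1024, 2048) else block_size


def inodes_per_iau(block_size):
    """Assumed capacity of one IAU: one bit per inode in the free inode map."""
    return free_inode_map_bytes(block_size) * 8
```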

Extended Inode Operations Map

The extended inode operations map keeps track of inodes on which operations would remain pending for too long to reside in the intent log. The extended inode operations map is in the same format as the free inode map. To prevent the intent log from wrapping and the transaction from getting overwritten, the required operations are stored in the affected inode. This map is then updated to identify the inodes that have extended operations that need to be completed. This map allows the fsck utility to quickly identify which inodes had extended operations pending at the time of a system failure. The length of the extended inode operations map is 2K for file systems with 1K or 2K block sizes and is equal to the block size for file systems with larger block sizes.

Link Count Table

The link count table (LCT) contains a reference count for each inode in the associated fileset. This reference count is identical to the conventional link field of an inode. Each LCT entry contains the actual reference count for the associated fileset inode. The link count field in an inode itself is set to either 0 or 1, and the actual number of links is stored in the LCT entry for the associated fileset inode.

The link count table can be reconstructed using the inode list, so it is not replicated.

The current layout only uses the LCT for inodes in the structural fileset. The LCT supports quick updates of the link count for structural fileset inodes.

Current Usage Table

The current usage table (CUT) is a file that contains usage related information for each fileset. The information contained in the CUT changes frequently and is not replicated. The information in the CUT can, however, be reconstructed using the inode list if the CUT is damaged.

The CUT file contains one entry per fileset (see Figure 9). The CUT entry for a given fileset contains information such as the following:

Figure 9 Current Usage Table (CUT) File

Locating Dynamic Structures

The existence of dynamic structures in the Version 2 disk layout makes the task of initially locating those structures difficult. The object location table (OLT) contains information needed to initially locate important file system structural elements. In particular, the OLT records the starting block numbers of the initial inode list extents for the structural fileset and indicates which inodes within those initial extents reference the fileset header file.

Object Location Table Contents

The OLT is composed of records for the following:

fileset header inodes

This record identifies the inode numbers of the fileset header file and its replica.

initial inode list extent addresses

This record identifies the addresses of the beginning of each of two 8K inode extents. These are the initial inode list extents for the structural fileset, which contain the inodes for all structural files belonging to the structural fileset.

current usage table inode

This record identifies the inode number of the file that contains the current usage table.

Mounting and the Object Location Table

At mount time, the object location table provides essential information about the location of key file system components. The super-block plays an important role in locating the OLT, in that it contains pointers to both the OLT and its replica.

Using the OLT, the process of mounting a VxFS Version 2 file system is as follows:

1. Read in the super-block. Validate the super-block and its replicas (located in the allocation unit headers).

2. Read and validate the OLT and its replica at the locations recorded in the super-block.

3. Obtain the addresses of the initial inode list extents for the structural fileset from the OLT. Read in these initial inode extents.

4. Find the fileset header file, based on the fileset header file inode number recorded in the OLT.

5. Read the contents of the fileset header file. Each fileset header file entry represents a particular fileset and indicates the inode numbers of its inode list file and IAU file. The structural fileset is set up first so that subsequent references to its inode list can be resolved.
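The five steps can be traced with a toy in-memory model. In the Python sketch below the "disk" is a dictionary. The inode numbers 4, 36, 6, and 7 are taken from the examples earlier in this chapter. The block addresses, the IAU inode numbers 8 and 9, the dictionary keys, and the function names are hypothetical and exist only to make the sketch runnable.

```python
def mount(disk):
    """Follow the five mount steps against a toy in-memory disk."""
    # Step 1: read the super-block and validate it against its replicas.
    sb = disk["super_block"]
    if any(copy != sb for copy in disk["au_header_copies"]):
        raise ValueError("super-block replica mismatch")

    # Step 2: read the OLT and its replica at the recorded locations.
    olt = disk["blocks"][sb["olt_addr"]]
    if olt != disk["blocks"][sb["olt_replica_addr"]]:
        raise ValueError("OLT replica mismatch")

    # Step 3: read the initial inode list extents of the structural fileset.
    inodes = {}
    for addr in olt["initial_ilist_extents"]:
        inodes.update(disk["blocks"][addr])

    # Step 4: find the fileset header file through its inode number.
    header_file = inodes[olt["fileset_header_inode"]]

    # Step 5: read each fileset header, setting up the structural fileset first.
    headers = sorted(header_file["entries"], key=lambda h: h["name"] != "structural")
    return [(h["name"], h["ilist_inode"], h["iau_inode"]) for h in headers]


def make_toy_disk():
    """Build a tiny consistent disk image; all addresses are made up."""
    sb = {"olt_addr": 2, "olt_replica_addr": 600}
    olt = {"fileset_header_inode": 4, "initial_ilist_extents": [10, 700]}
    header_file = {"entries": [
        {"name": "primary", "ilist_inode": 7, "iau_inode": 9},
        {"name": "structural", "ilist_inode": 6, "iau_inode": 8},
    ]}
    return {
        "super_block": sb,
        "au_header_copies": [dict(sb), dict(sb)],
        "blocks": {2: olt, 600: dict(olt), 10: {4: header_file}, 700: {36: header_file}},
    }
```

Calling mount(make_toy_disk()) returns the structural fileset entry before the primary one, mirroring the ordering requirement in step 5.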

The VxFS Version 4 Disk Layout

The Version 4 disk layout was designed to allow the file system to scale easily to accommodate large files and large file systems.

The Version 1 and 2 disk layouts divided the file system space into allocation units. The first AU started part way into the file system, which could cause alignment problems depending on where the first AU started. Each allocation unit also had its own summary, bitmaps, and data blocks. Because this AU structural information was stored at the start of each AU, it also limited the maximum size of an extent that could be allocated. By replacing the allocation unit model of previous versions, Version 4 removes both the need for alignment of allocation units and the restriction on extent sizes.

The VxFS Version 4 disk layout divides the entire file system space into fixed-size allocation units. The first allocation unit starts at block zero and all allocation units are a fixed length of 32K blocks. (An exception may be the last AU, which occupies whatever space remains at the end of the file system.) Because the first AU starts at block zero instead of part way through the file system as in previous versions, there is no longer a need for explicit AU alignment or padding to be added when creating a file system (see mkfs(1M)).
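Because every allocation unit has the same length and the first starts at block zero, mapping a block number to its AU is a single division. The Python sketch below shows the arithmetic, reading "32K blocks" as 32,768 blocks. The function names are hypothetical.

```python
AU_BLOCKS = 32 * 1024   # fixed allocation unit length, in blocks


def au_of_block(block):
    """Index of the allocation unit that contains the given block number."""
    return block // AU_BLOCKS


def au_count(fs_blocks):
    """Number of allocation units, counting a short final AU as one."""
    return -(-fs_blocks // AU_BLOCKS)        # ceiling division


def last_au_blocks(fs_blocks):
    """Length of the final allocation unit, which may be shorter than the rest."""
    if fs_blocks <= 0:
        raise ValueError("file system must contain at least one block")
    return fs_blocks % AU_BLOCKS or AU_BLOCKS
```

A file system of 100,000 blocks, for example, has four allocation units, the last of which holds the remaining 1,696 blocks.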

The Version 4 file system also moves away from the model of storing AU structural data at the start of an AU and puts all structural information in files. So expanding the file system structures simply requires extending the appropriate structural files. This removes the extent size restriction imposed by the Version 1 and Version 2 layouts.

All Version 4 structural files reside in the structural fileset, which is similar to the Version 2 structural fileset. The structural files in the Version 4 disk layout are:

Object Location Table File
Contains the object location table (OLT). As with the Version 2 disk layout, the OLT, which is referenced from the super-block, is used to locate the other structural files.

Label File
Encapsulates the super-block and super-block replicas. Although the location of the primary super-block is known, the label file can be used to locate super-block copies if there is structural damage to the file system.

Device File
Records device information such as volume length and volume label, and contains pointers to other structural files.

Fileset Header File
Holds information on a per-fileset basis. This may include the inode of the fileset's inode list file, the maximum number of inodes allowed, an indication of whether the file system supports large files, and the inode number of the quotas file if the fileset supports quotas.

When a file system is created, there are two filesets: the structural fileset, which defines the file system structure, and the primary fileset, which contains user data.

Inode List File
Both the structural fileset and the primary fileset have their own inode lists which are stored in inode list files. Increasing the number of inodes involves increasing the size of the file after expanding the inode allocation unit file.

Inode Allocation Unit File
Holds the free inode map, extended operations map, and a summary of inode resources.

Log File
Maps the blocks used by the file system intent log.

Extent Allocation Unit State File
Indicates the allocation state of each AU by defining whether each AU is free, allocated as a whole (no bitmaps allocated), or expanded, in which case the bitmaps associated with each AU determine which extents are allocated.

Extent Allocation Unit Summary File
Contains the AU summary for each allocation unit, which contains the number of free extents of each size. The summary for an allocation unit is created only when that allocation unit is expanded for use.

Free Extent Map File
Contains the free extent maps for each of the allocation units.

Quotas Files
If the file system supports quotas, there is a quotas file which is used to track the resources allocated to each user.
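The three allocation states recorded in the extent allocation unit state file can be modeled as follows. This Python sketch is an illustration only: the enumeration, its values, and the function are hypothetical, and the free extent map is simplified to a list with one truth value per block.

```python
from enum import Enum


class AUState(Enum):
    FREE = "free"             # no part of the AU is in use
    ALLOCATED = "allocated"   # allocated as a whole, no bitmaps allocated
    EXPANDED = "expanded"     # bitmaps determine which extents are allocated


def free_blocks(state, au_blocks, free_bitmap=None):
    """Number of free blocks in one AU, consulting bitmaps only when expanded."""
    if state is AUState.FREE:
        return au_blocks
    if state is AUState.ALLOCATED:
        return 0
    if free_bitmap is None:
        raise ValueError("an expanded AU needs its free extent map")
    return sum(1 for bit in free_bitmap if bit)
```

The design point the model makes is that a wholly free or wholly allocated AU needs no bitmap at all, so the maps and summaries only have to exist for expanded AUs.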

Figure 10 shows how the kernel and utilities build information about the structure of the file system. The super-block is in a known location, from which the OLT can be located. From the OLT, the initial extents of the structural inode list can be located along with the inode number of the fileset header file. The initial inode list extents contain the inode for the fileset header file, from which the extents associated with the fileset header file are obtained.

As an example, when mounting the file system, the kernel needs to access the primary fileset in order to access its inode list, inode allocation unit, quotas file and so on. The required information is obtained by accessing the fileset header file from which the kernel can locate the appropriate entry in the file and access the required information.

Figure 10 VxFS Version 4 Disk Layout

