File Systems: An Overview
How various file systems in different
All computer
applications need to store and retrieve information. This information should
remain persistent even after the application using it terminates or the computer
system is shut down. The solution is to store information on non-volatile
storage like hard drives, magnetic tapes, and optical media. Information on such
media is stored using a logical storage unit, the file. The OS manages files and
the part of the OS dealing with files is known as the file system. A computer
system can have thousands of files so some mechanism is required to organize and
keep track of them, which is done by the file system. It consists of two
distinct parts: a collection of files, each storing related data, and a directory
structure, which organizes and provides information about all the files. In this
article we’ll look at the basic structure of a file system and some of the
popular ones.
The basics
Files are an abstraction mechanism. They provide a way to store and retrieve
information while shielding the user from the details of how and where on the
disk the data is kept. Files are named for the convenience of their users. Most
OSs allow strings of one to eight characters as file names, and some support
longer file names of up to 255 characters.
[Figure: FAT file system directory entry]
Files are stored on disk as discrete entities, but the OS links them logically
using the directory structure. A directory typically contains a number of
entries, one per file. Each entry stores information such as the name, location,
size, and type of a file placed logically under the directory. So although the
files are stored separately on the disk, the user gets the impression that they
are kept together in a directory. A directory is also sometimes called a folder.
Early systems had single-level directory structures, with all files contained in
a single directory. Since all files were in the same directory, they had to have
unique names, so two people using the system couldn't give their files the same
name. Then came two-level directory structures. A top-level directory, also
called the root directory, contained a sub-directory for each user of the
system. Users were allowed to save files in their own sub-directories but not in
the root directory. This way two users could have files with the same name, as
each stored files in his or her own sub-directory. But users soon felt the need
to create further sub-directories within their sub-directories, so
tree-structured, or hierarchical, directory structures came into being. A
directory (or sub-directory) can contain any number of files or further
sub-directories. A file in a directory has an absolute path name consisting of
the path from the root directory to the file.
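As a rough illustration of a hierarchical directory and absolute path names, the sketch below models directories as nested Python dictionaries. The tree contents and the function name `resolve` are made up for this example; a real file system keeps its directories on disk, not in dictionaries.

```python
# A toy hierarchical directory: each directory is a dict that maps a name
# either to a nested dict (a sub-directory) or to a string (file contents).
# All names here are hypothetical.
root = {
    "alice": {"notes.txt": "alice's notes", "projects": {"plan.txt": "draft"}},
    "bob": {"notes.txt": "bob's notes"},
}


def resolve(tree, absolute_path):
    """Walk from the root directory down to the entry named by absolute_path."""
    node = tree
    for part in (p for p in absolute_path.split("/") if p):
        if not isinstance(node, dict) or part not in node:
            raise FileNotFoundError(absolute_path)
        node = node[part]
    return node


# Two users can each own a file called notes.txt, because the absolute
# path names differ.
print(resolve(root, "/alice/notes.txt"))
print(resolve(root, "/bob/notes.txt"))
```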
Popular file systems
This was all about files and directories, but in the real world there are a
number of file systems in use. Let's look at some of the popular and widely
used ones.
FAT
FAT stands for File Allocation Table. The FAT file system uses the 8.3 file
naming convention, and all file names are created with the ASCII character set.
8.3 means that the file name can be up to eight characters long and can have a
three-character extension to indicate the file type. The name cannot contain
spaces and is not case sensitive; all characters are converted to uppercase
when a file is created. All Microsoft operating systems, Mac OS, and some
versions of Unix support FAT.
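The 8.3 rules described above can be sketched as a small check. The function name `to_83_name` is hypothetical, and the sketch covers only the rules stated in this article (length limits, no spaces, ASCII only, uppercase conversion); real FAT implementations also reject other characters.

```python
def to_83_name(filename):
    """Return the upper-cased 8.3 form of filename, or raise ValueError."""
    base, dot, ext = filename.partition(".")
    if " " in filename or not filename.isascii():
        raise ValueError("spaces and non-ASCII characters are not allowed")
    if not base or len(base) > 8:
        raise ValueError("base name must be 1 to 8 characters long")
    if "." in ext or len(ext) > 3 or (dot and not ext):
        raise ValueError("extension must be 1 to 3 characters, with one period")
    return (base + dot + ext).upper()


print(to_83_name("report.txt"))  # REPORT.TXT
```

A name such as `quarterly_report.txt` raises `ValueError` here, because its base name is longer than eight characters.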
In FAT, files are stored in clusters, whose size is determined by the size of
the partition. A file can be stored in a single cluster or can span multiple
clusters, depending on its size. Earlier versions of FAT used 16-bit addressing,
so the file system is also called FAT16. The FAT itself is a table indexed by
cluster number. With 16-bit addressing, a total of 2^16 (65,536 or 64K) clusters
can be present in the file system. To support large disks, the cluster size can
go up to 32 KB, which means the maximum disk size is 64K * 32 KB = 2 GB. When a
file is created, an entry is created in the directory containing the file name,
the file attributes, and the starting cluster number, which indexes into the
FAT. That entry in the FAT either indicates that this is the last cluster of the
file or points to the next cluster. To protect the data, two copies of the FAT
are maintained in case one becomes damaged.
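The way a file's clusters are chained through the FAT can be sketched as follows. This is a simplified model, not the on-disk format: the table is a plain Python list, and the single end-of-chain marker `END_OF_CHAIN` is an assumption made for the example.

```python
END_OF_CHAIN = 0xFFFF  # assumed marker for "last cluster of the file"


def cluster_chain(fat, start_cluster):
    """Return the cluster numbers a file occupies, in order."""
    chain = []
    cluster = start_cluster
    while cluster != END_OF_CHAIN:
        if cluster in chain:
            raise ValueError("loop detected in FAT")
        chain.append(cluster)
        cluster = fat[cluster]  # the FAT entry points to the next cluster
    return chain


# Toy FAT: index = cluster number, value = next cluster (or END_OF_CHAIN).
fat = [0, 0, 5, END_OF_CHAIN, 0, 3, 0, 0]

# The directory entry for this file would record 2 as its starting cluster.
print(cluster_chain(fat, 2))  # [2, 5, 3]

# Capacity arithmetic from the text: 2^16 clusters of 32 KB each is 2 GB.
max_bytes = 2**16 * 32 * 1024
print(max_bytes == 2 * 1024**3)  # True
```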
VFAT
VFAT is an extension of the FAT file system and was introduced with Win 95. VFAT
maintains backward compatibility with FAT but relaxes some of the rules. VFAT
file names can contain up to 255 characters, spaces, and multiple periods. VFAT
is not case sensitive but, unlike FAT, it preserves the case of the file name as
created. The maximum disk size supported by VFAT is 4 GB. In Win NT 4.0, if you
format a partition as FAT, it is actually formatted as VFAT.
FAT32
FAT32 is an extension of FAT and VFAT, first introduced with Win 95 OEM Service
Release 2 (OSR2). It has all the file name features of VFAT, and its greatest
advantage is 32-bit addressing. This results in smaller cluster sizes than
FAT16, and a smaller cluster size can dramatically increase the amount of free
hard disk space.
To illustrate this, consider a 2 GB FAT16 partition, which has a cluster size of
32 KB. On this partition even a 1-byte file occupies an entire 32 KB cluster.
Since this rule applies to every file on your hard disk, a lot of space is
wasted. In the FAT32 file system, partitions of less than 8 GB have a cluster
size of 4 KB, so it's not uncommon to gain back hundreds of megabytes by using a
FAT32 partition. The FAT32 file system supports disk sizes up to 2 TB. FAT32 is
supported by Win 95 (OSR2)/98/2000 but not by NT.
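The space wasted by large clusters can be estimated with a few lines of arithmetic. The sketch below assumes that space is handed out in whole clusters, as described above; the function names and sample file sizes are invented for illustration.

```python
def allocated_bytes(file_size, cluster_size):
    """Disk space a file occupies when space is allocated in whole clusters."""
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


def wasted_bytes(file_sizes, cluster_size):
    """Total slack: allocated space minus the data actually stored."""
    return sum(allocated_bytes(size, cluster_size) - size for size in file_sizes)


sizes = [1, 5000, 40000]  # hypothetical file sizes in bytes

print(allocated_bytes(1, 32 * 1024))   # 32768: a 1-byte file fills a 32 KB cluster
print(wasted_bytes(sizes, 32 * 1024))  # 86071 bytes wasted with 32 KB clusters
print(wasted_bytes(sizes, 4 * 1024))   # 8247 bytes wasted with 4 KB clusters
```

Even for three small files, the 4 KB clusters waste roughly a tenth of what the 32 KB clusters do; over thousands of files the difference adds up.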
NTFS
NTFS was created to provide the features that FAT lacked. It is sometimes called
the New Technology File System, but this is not the exact name. File and
directory names can be up to 255 characters long. File names preserve case but
are not case sensitive. The maximum size of an NTFS partition is 16 exabytes,
i.e. 2^64 bytes. NTFS cluster sizes are 512 bytes, 1 KB, 2 KB, and 4 KB,
depending on the partition size. The goals of NTFS include reliability, fault
tolerance, security, and POSIX support. Unlike FAT, NTFS does not treat a file
as a single stream of data; it supports multiple data streams. The additional
streams contain data that describe the file attributes. FAT supports only the
read-only, hidden, system, and archive file attributes. Apart from these, NTFS
also supports last-access, last-write, and file-creation date-time stamps and
security access restrictions. Because of the multiple data streams, a user can
add his or her own user-defined attributes to a file: each attribute of a file
is an independent byte stream that can be created, deleted, read, and written.
These attributes can be specific to certain kinds of files.
[Figure: Ext2fs directory entry]
NTFS meets its goal of providing reliability by organizing I/O into
transactions. Transactions are atomic, which means that either the entire I/O
operation completes or none of it does. If anything interrupts a transaction in
progress, such as loss of power to the computer or cancellation of the I/O
operation, any changes made to the file system as part of the I/O operation are
undone, or rolled back, returning the file system to its condition before the
I/O operation began. This allows the operating system to recover without having
to use disk-checking utilities like chkdsk, which are required for the FAT and
FAT32 file systems.
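The all-or-nothing behaviour of a transaction can be illustrated with a toy in-memory model. This is only a sketch of the idea of rollback: the class `ToyVolume` is hypothetical, and it restores a full snapshot on failure, which is a simplification standing in for the transaction log a real file system would use.

```python
import copy


class ToyVolume:
    """In-memory stand-in for file system state (hypothetical, not NTFS)."""

    def __init__(self):
        self.files = {}

    def transaction(self, operations):
        """Apply every operation, or leave the volume exactly as it was."""
        snapshot = copy.deepcopy(self.files)
        try:
            for operation in operations:
                operation(self.files)
        except Exception:
            self.files = snapshot  # roll back the partial changes
            raise


def simulated_power_loss(files):
    raise IOError("simulated power loss")


vol = ToyVolume()
vol.transaction([lambda files: files.update({"a.txt": "hello"})])

try:
    # The first step succeeds, the second fails, so both are undone.
    vol.transaction([
        lambda files: files.update({"b.txt": "partial"}),
        simulated_power_loss,
    ])
except IOError:
    pass

print(vol.files)  # {'a.txt': 'hello'}
```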
NTFS implements
files and directories as securable objects. NTFS fully supports the Win NT
security model. Access to file and directory objects can be restricted to
specific users and groups. NTFS keeps access lists with files, which define
which users and groups can access the file.
A FAT16 partition allows compression of the entire partition, but file access
slows down after compression. FAT32 offers no compression. NTFS offers a much
better option: it lets the user compress and encrypt individual files and
directories of choice. This way you can compress seldom-used files to save space
without slowing down your overall system performance.
NTFS also supports the creation of hard links, a technique that allows a file to
appear in more than one directory. The actual file remains the same, but
additional directory entries can be made that point to it. Any changes made to
the file through one link are visible to applications accessing the file through
the other links. A hard link is equivalent to the original directory entry, and
after creation there is no difference between the two.
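Hard links can be tried out from Python with the standard `os.link` call. The sketch below assumes it runs on a file system that supports hard links; the file names and the function name `hard_link_demo` are arbitrary.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file and a hard link to it, then edit through the link."""
    with tempfile.TemporaryDirectory() as directory:
        original = os.path.join(directory, "original.txt")
        link = os.path.join(directory, "link.txt")

        with open(original, "w") as f:
            f.write("first version")

        os.link(original, link)  # second directory entry for the same file

        with open(link, "a") as f:
            f.write(", edited through the link")

        with open(original) as f:
            content = f.read()

        same_file = os.path.samefile(original, link)
        link_count = os.stat(original).st_nlink
        return content, same_file, link_count


content, same_file, link_count = hard_link_demo()
print(content)     # first version, edited through the link
print(same_file)   # True
print(link_count)  # 2
```

The change made through `link.txt` shows up when reading `original.txt`, because both names refer to the same file.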
Ext2fs
Ext2fs, the second extended file system, is probably the most widely used file
system in the Linux community. The directory structure used in Ext2fs is
extremely simple, with each entry containing just the file name and its i-node
number. An i-node is a structure that describes the file. All information about
the file, including its type, size, timestamps, ownership, access rights, and
pointers to data blocks, is contained in the i-node.
When a file has to be accessed, the directory is checked for the file name to
find the i-node number. Then the i-node is located and the disk locations of the
file's blocks are read. Using the block addresses, the file is read from the
disk. File blocks are similar to the clusters used in FAT to store files. The
block size is typically 1 KB, 2 KB, or 4 KB.
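The lookup sequence above (file name, then i-node number, then i-node, then data blocks) can be sketched with a few in-memory tables. Everything here is a simplified model: the `Inode` fields, the tiny 4-byte block size, and the block and i-node numbers are invented for the example and do not reflect the on-disk Ext2fs layout.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """A few of the fields an i-node holds (simplified)."""
    size: int
    mode: int
    block_numbers: list = field(default_factory=list)


# "Disk": block number -> raw bytes (4-byte blocks keep the example short).
blocks = {7: b"hell", 9: b"o wo", 12: b"rld"}

# i-node table: i-node number -> i-node.
inodes = {14: Inode(size=11, mode=0o644, block_numbers=[7, 9, 12])}

# Directory: each entry is just a file name and an i-node number.
directory = {"greeting.txt": 14}


def read_file(name):
    """Follow name -> i-node number -> i-node -> data blocks."""
    inode = inodes[directory[name]]
    data = b"".join(blocks[number] for number in inode.block_numbers)
    return data[:inode.size]


print(read_file("greeting.txt"))  # b'hello world'
```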
Apart from regular files and directories, the file system also has block special
files and character special files. These special files represent block devices,
such as a hard disk, and character devices, such as a keyboard, in the file
system. This way applications can access a device directly through normal file
read and write operations. This mode of access is sometimes called raw I/O.
Ext2fs supports a maximum partition size of 4 TB and long file names of up to
255 characters, which can be extended to 1012 if needed. Hard links can also be
created in the file system. For security, files carry read, write, and execute
attributes for the user, the group, and others. A user can access a file in a
particular mode only if the appropriate attribute bit is set; otherwise an
access-denied message is displayed.
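The read, write, and execute bits can be inspected with the constants in Python's standard `stat` module. The helper `is_allowed` below is a hypothetical sketch that only tests the bits; it does not work out which class (user, group, or others) a given process falls into, which the kernel does in practice.

```python
import stat

# Permission bit for each (class, access) pair.
PERMISSION_BITS = {
    ("user", "r"): stat.S_IRUSR,
    ("user", "w"): stat.S_IWUSR,
    ("user", "x"): stat.S_IXUSR,
    ("group", "r"): stat.S_IRGRP,
    ("group", "w"): stat.S_IWGRP,
    ("group", "x"): stat.S_IXGRP,
    ("others", "r"): stat.S_IROTH,
    ("others", "w"): stat.S_IWOTH,
    ("others", "x"): stat.S_IXOTH,
}


def is_allowed(mode, who, access):
    """Return True if the attribute bit for this class and access is set."""
    return bool(mode & PERMISSION_BITS[(who, access)])


mode = 0o640  # user: read and write, group: read only, others: nothing

print(is_allowed(mode, "user", "w"))    # True
print(is_allowed(mode, "group", "w"))   # False
print(is_allowed(mode, "others", "r"))  # False
print(stat.filemode(stat.S_IFREG | mode))  # -rw-r-----
```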
Ext3fs
Ext3fs is an Ext2fs file system with a transaction log similar to that of the
NTFS file system. It is also called a journaling file system, as the transaction
log is called a journal. If there is a power loss and the system reboots, the
integrity of the file system is preserved and no fsck is necessary, so the
system is up and running quickly.
Reiser file system
ReiserFS is a comparatively new file system for Linux systems. Like Ext3fs, it
is a journaling file system; it facilitates crash recovery, speeds up the boot
process, and helps prevent data loss due to mechanical or serious user errors.
The maximum partition size is 16 TB, with block sizes starting at 4 KB and going
up to 64 KB. It provides fast performance when reading and writing small files
and may be well suited to database servers. However, the developers of the file
system say it is equally good at large files and is truly a general-purpose file
system.
There are lots of other file systems too, and covering them all is beyond the scope of this article.