Over the years, I’ve found that customers and partners often have basic questions about the architecture, inner workings, and capabilities of file systems. As a result, I thought it might be useful to cover some common topics of interest in a series of “101” blogs. In this initial blog, I’ll start with an overview of some key file system internals. I have several other topics in mind (e.g. soft links and hard links), but I would also love to hear your suggestions and requests. If there is a specific topic you’d like to hear about, follow us on Twitter and tweet your suggestion to us. I’ll be happy to consider it for a future post.
And with that, let’s begin…
File System Objects
Most file systems are based on two fundamental object types: files and directories.
The File Object
A file is a sequence (array) of bytes with an attached metadata structure containing file attributes (size, creation time, modification time, access permissions, etc.). This metadata structure (depicted below) is often referred to as an inode. Each inode has an internal ID, called the inode number (ino), that is unique within the file system, and it holds a reference to the file’s data. For this reason, the inode is typically used as the internal reference to a particular file.
Simplified File Object
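To see these attributes on a real system, the Python sketch below creates a throwaway file and reads its metadata with `os.stat`. The file name and contents are made up for illustration, and `inspect_file` is a hypothetical helper. It also renames the file to show that, on typical POSIX file systems, a rename rewrites only the directory entry while the inode number stays the same. The exact meaning of `st_ino` is platform dependent (on Windows, Python fills it from the file index), so treat this as an illustration rather than a guarantee.

```python
import os
import stat
import tempfile


def inspect_file() -> dict:
    """Create a throwaway file, read its metadata, then rename it (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "MyPreso.ppt")  # illustrative name only
        with open(path, "wb") as f:
            f.write(b"x" * 1024)  # 1 KiB of data
        st = os.stat(path)
        info = {
            "ino": st.st_ino,                          # inode number (file index on Windows)
            "size": st.st_size,                        # size in bytes
            "is_regular_file": stat.S_ISREG(st.st_mode),
            "permissions": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
            "mtime": st.st_mtime,                      # last modification, seconds since the epoch
        }
        # Renaming rewrites a directory entry; the inode it points to stays the same
        new_path = os.path.join(d, "Renamed.ppt")
        os.rename(path, new_path)
        info["same_ino_after_rename"] = os.stat(new_path).st_ino == st.st_ino
    return info


if __name__ == "__main__":
    print(inspect_file())
```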
The Directory Object
A directory is a set of name entries, and it also has an inode. Each name entry in the set contains a name and a pointer to an inode (i.e. the metadata structure of a file or a directory). Because directory entries can point to file inodes or to directory inodes, this layout supports a hierarchical namespace organized as a connected graph without cycles; in other words, a tree.
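The name-to-inode indirection can be modeled in a few lines of Python. This is a hypothetical in-memory sketch, not how any particular file system lays out directories on disk: `Inode`, `InodeTable`, and `add_entry` are illustrative names, a directory’s entries are a plain dictionary from name to inode number, and the table stands in for the file system’s inode store.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Hypothetical, simplified inode: metadata plus either file data or directory entries."""
    ino: int
    is_dir: bool
    data: bytes = b""                                      # file contents (files only)
    entries: dict[str, int] = field(default_factory=dict)  # name -> ino (directories only)


class InodeTable:
    """Hypothetical table that hands out unique inode numbers and finds inodes by number."""

    def __init__(self) -> None:
        self._inodes: dict[int, Inode] = {}
        self._next_ino = 1

    def alloc(self, is_dir: bool) -> Inode:
        inode = Inode(ino=self._next_ino, is_dir=is_dir)
        self._inodes[inode.ino] = inode
        self._next_ino += 1
        return inode

    def get(self, ino: int) -> Inode:
        return self._inodes[ino]


def add_entry(directory: Inode, name: str, child: Inode) -> None:
    """Link `child` into `directory` under `name`; names must be unique per directory."""
    if not directory.is_dir:
        raise NotADirectoryError(f"inode {directory.ino} is not a directory")
    if name in directory.entries:
        raise FileExistsError(name)
    directory.entries[name] = child.ino


table = InodeTable()
tmp = table.alloc(is_dir=True)
preso = table.alloc(is_dir=False)
preso.data = b"slides"
add_entry(tmp, "MyPreso.ppt", preso)
# Looking up a name yields an inode number; the table maps it to the inode itself
print(table.get(tmp.entries["MyPreso.ppt"]).data)  # b'slides'
```

In this model the directory stores only the inode number, so the name belongs to the directory entry rather than to the file’s inode.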
The File System Namespace
You’ll often hear people refer to a file system’s “namespace”. The namespace is simply the set of named entries that make up the file system. By convention, a file system namespace begins with a special directory named “root” and denoted “/” in Unix and “\” in Windows. In addition, each directory has a special entry to denote itself (the “.” entry) and another special entry to denote its parent (the “..” entry). An example of a simple file system namespace, formatted as a graph, is depicted below, along with a similar Windows-style view.
A simple hierarchical namespace
Windows namespace view
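The “.” and “..” conventions can be sketched with another hypothetical in-memory model. `Dir` and `mkdir` are illustrative names, and the tree below contains only the directories mentioned in this post. Following the Unix convention, the root directory’s “..” entry points back to the root itself.

```python
import itertools


class Dir:
    """Hypothetical in-memory directory whose entries map names to child directories."""

    _ino_counter = itertools.count(1)

    def __init__(self, parent: "Dir | None" = None) -> None:
        self.ino = next(Dir._ino_counter)
        # Every directory contains "." (itself) and ".." (its parent).
        # As on Unix, the root directory is its own parent.
        self.entries: dict[str, "Dir"] = {".": self, "..": parent if parent is not None else self}


def mkdir(parent: Dir, name: str) -> Dir:
    """Create a subdirectory `name` under `parent` (hypothetical helper)."""
    if name in parent.entries:
        raise FileExistsError(name)
    child = Dir(parent)
    parent.entries[name] = child
    return child


# Hypothetical namespace: only the directories mentioned in this post
root = Dir()
home = mkdir(root, "home")
tmp = mkdir(root, "tmp")
bob = mkdir(home, "bob")
alice = mkdir(home, "alice")

print(bob.entries[".."].entries["alice"] is alice)  # True
```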
The most common way to locate a file is to specify its “path”. A path is the ordered list of directories leading to the desired file, followed by the file’s own name, written as <directory-name><separator><name>…, where the path separator is “/” in Unix and “\” in Windows. For example, in the namespace above, the path to “MyPreso.ppt” is “/tmp/MyPreso.ppt” in Unix notation and “\tmp\MyPreso.ppt” in Windows notation. To locate the target file using this path, you start with the root directory (“/”), look up the subdirectory “tmp” in it, and then look up “MyPreso.ppt” inside “/tmp”. This process of repeated lookups is also called a “tree walk”. Also, note that file names (and directory names) are unique only within a single directory. Therefore, several files can have the name “MyPreso.ppt”, but only one of them can reside in the “/tmp” directory at any given time.
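A tree walk is essentially a loop that performs one lookup per path component. The sketch below assumes a hypothetical namespace that contains only the entries named in this post, with directories represented as Python dictionaries and files as the placeholder string `"file"`; `tree_walk` is an illustrative name. Real kernels add caching, permission checks, and symbolic-link handling on top of this basic loop.

```python
# Hypothetical namespace holding only the entries named in the text.
# Directories are dicts mapping names to children; files are the placeholder "file".
NAMESPACE = {
    "tmp": {"MyPreso.ppt": "file"},
    "home": {"bob": {"note.txt": "file"}, "alice": {}},
}


def tree_walk(root: dict, path: str, sep: str = "/"):
    """Resolve an absolute path by looking up one component at a time from root."""
    if not path.startswith(sep):
        raise ValueError(f"not an absolute path: {path!r}")
    node = root
    for name in path.split(sep):
        if name == "":  # skip empty components (leading or doubled separators)
            continue
        if not isinstance(node, dict):  # names can only be looked up inside a directory
            raise NotADirectoryError(name)
        if name not in node:
            raise FileNotFoundError(path)
        node = node[name]
    return node


print(tree_walk(NAMESPACE, "/tmp/MyPreso.ppt"))              # file
print(tree_walk(NAMESPACE, "\\tmp\\MyPreso.ppt", sep="\\"))  # file
```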
Absolute Paths vs. Relative Paths
A path that starts with the root directory is called an absolute path. Most operating systems also support a concept called relative paths. A relative path starts from a directory that is not specified in the path itself. For example, in the namespace depicted above, the path “/home/bob/note.txt” is absolute, while the path “bob/note.txt” is relative to some start directory. To reach the file located at /home/bob/note.txt using this relative path, we have to start from the /home directory.
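Because a relative path is interpreted against a start directory, turning it into an absolute path is, textually, just a matter of prefixing that directory. Python’s standard `posixpath` and `ntpath` modules apply Unix and Windows path rules respectively on any OS, so the sketch below uses them to illustrate this. Note that `join` works purely on strings and does not check that anything actually exists.

```python
import ntpath     # Windows path rules, usable on any OS
import posixpath  # Unix path rules, usable on any OS

# The same relative path names different files depending on the start directory
print(posixpath.join("/home", "bob/note.txt"))           # /home/bob/note.txt
print(posixpath.join("/", "bob/note.txt"))               # /bob/note.txt
# An absolute second argument discards the start directory entirely
print(posixpath.join("/home", "/tmp/MyPreso.ppt"))       # /tmp/MyPreso.ppt
# Windows notation
print(ntpath.join("\\home", "bob\\note.txt"))            # \home\bob\note.txt
```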
To walk “backwards” toward the root of the tree, you can use the special “..” entry, which denotes the parent directory. So, starting from /home/bob, you can reach the “alice” directory by following “../alice”. Relative paths are primarily used by users and applications to denote paths relative to the “current working directory”, a directory that the OS tracks for each process (and therefore for each shell) as its current location within the namespace. For example, in Windows, to list directory entries from the command line (“cmd”), you use a shell command named “dir”. To list the entries in “\home\bob”, you can either specify the absolute path (i.e. “dir \home\bob”) or change the current working directory to “\home” (using the “cd \home” command) and then use the relative path to list bob (i.e. “dir bob”). Relative paths can also be very useful when working with soft (symbolic) links…and, as mentioned above, I’ll be covering soft links (and hard links) in detail in a future “101” blog. Stay tuned.
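The current working directory and the “..” entry can be exercised against a real file system. The Python sketch below builds a throwaway copy of the relevant part of the example tree (`home/bob/note.txt` and `home/alice`) inside a temporary directory, so the absolute paths differ from the figure; `demo` is a hypothetical helper. It changes the process’s working directory with `os.chdir`, resolves relative paths against it, and restores the original directory afterwards.

```python
import os
import tempfile


def demo():
    """Exercise cwd-relative lookups and '..' on a temporary tree (hypothetical helper)."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as top:
        # Throwaway copy of part of the example tree: home/bob/note.txt and home/alice/
        os.makedirs(os.path.join(top, "home", "bob"))
        os.makedirs(os.path.join(top, "home", "alice"))
        with open(os.path.join(top, "home", "bob", "note.txt"), "w") as f:
            f.write("hello\n")
        try:
            # Roughly like "cd \home": relative paths now resolve against <top>/home
            os.chdir(os.path.join(top, "home"))
            bob_entries = sorted(os.listdir("bob"))  # roughly like "dir bob"
            # Move down into bob, then walk back up through ".."
            os.chdir("bob")
            alice_via_parent = os.path.join("..", "alice")
            found = os.path.samefile(alice_via_parent, os.path.join(top, "home", "alice"))
        finally:
            # Restore the cwd before the temporary tree is deleted
            os.chdir(original_cwd)
    return bob_entries, found


if __name__ == "__main__":
    print(demo())  # (['note.txt'], True)
```

Restoring the working directory in `finally` matters because it is process-wide state: leaving it inside a deleted directory would break later relative lookups (and, on Windows, would prevent the temporary directory from being removed).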