File-Related System Calls in FreeBSD
Copyright © 2002 Yaodong Bi

An Introduction to the FreeBSD File System

INODES

In addition to its actual data contents, every file in the FreeBSD file system has a unique descriptor called an inode. An inode holds information about the file, including at least the file's protection mode, the user ID of its owner, and its size in bytes.

Data stored in an inode may be accessed either through the ls command or the stat system call, which will be discussed shortly.

Structure of a Regular File

The data in a file is stored as a logical stream of bytes, although the actual physical storage may span multiple sectors and tracks. This logical stream runs from offset zero to the last byte, which has the highest offset. One can read data of variable length (limited by the actual length of the file) starting from any offset, and one can write data of any length at any offset, even an offset beyond the last byte of the file. For example, one can write 10 bytes of data starting at offset 1000 to a file with only one byte in it. After the write completes, the size of the file becomes 1010 (the last byte written is at offset 1009), with a gap between the first byte and the ten bytes just written. When a read takes data from inside the gap, the kernel treats the gap as if it were filled with zeros.
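
The gap behaviour described above can be checked with a short sketch. The helper name make_gap_file and its parameters are hypothetical and chosen only for illustration; the system calls it uses (open, write, lseek, read, fstat, close) are the ones introduced later in this tutorial.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` with one byte in it, then write
 * 10 bytes starting at offset 1000. Returns the resulting file size,
 * or -1 on any error. If gap_byte is not NULL, one byte read from
 * inside the gap (offset 500) is stored there.
 */
off_t make_gap_file(const char *path, char *gap_byte)
{
	struct stat sb;
	off_t size = -1;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);

	if (fd == -1)
		return -1;

	if (write(fd, "A", 1) == 1 &&              /* the single byte at offset 0 */
	    lseek(fd, 1000, SEEK_SET) == 1000 &&   /* move past the end of the file */
	    write(fd, "0123456789", 10) == 10 &&   /* fills offsets 1000..1009 */
	    fstat(fd, &sb) == 0) {
		size = sb.st_size;
		/* Read one byte from the middle of the gap. */
		if (gap_byte != NULL &&
		    (lseek(fd, 500, SEEK_SET) != 500 || read(fd, gap_byte, 1) != 1))
			size = -1;
	}
	close(fd);
	return size;
}
```

Called on a writable path, the helper should report a size of 1010, and the byte read from offset 500 should be zero.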

File Access Permissions

The FreeBSD file system protects files according to three classes of users: the owner of the file, the group owner of the file, and other users. Each class may be given the right to read, write, and execute the file. The three access rights are specified by three binary digits in the order read (r), write (w), and execute (x). For example, the digits 110 grant read and write access but not execute access.
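
As an illustration of the three-digit encoding, the following sketch turns nine permission bits (three per class, in the order owner, group, other) into the familiar rwx notation. The function name mode_to_string is hypothetical and not part of any system library.

```c
/*
 * Hypothetical helper: convert the low nine permission bits of `mode`
 * into a string such as "rwxr-xr--". `out` must have room for 10 chars.
 */
void mode_to_string(unsigned int mode, char out[10])
{
	static const char letters[] = "rwx";
	int i;

	for (i = 0; i < 9; i++) {
		/* Bit 8 is owner-read, bit 0 is other-execute. */
		unsigned int bit = 1u << (8 - i);
		out[i] = (mode & bit) ? letters[i % 3] : '-';
	}
	out[9] = '\0';
}
```

For instance, the octal value 0754 (binary 111 101 100) yields "rwxr-xr--": everything for the owner, read and execute for the group, and read only for other users.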


System calls for File Processing

This tutorial covers eight file-related system calls of FreeBSD (4.4). The following table briefly describes the function of each.
System call   Function
open          Open an existing file or create a new file
read          Read data from a file
write         Write data to a file
lseek         Move the read/write pointer to the specified location
close         Close an open file
unlink        Delete a file
chmod         Change the file protection attributes
stat          Read file information from inodes

Header files to be included for file-related system calls:
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/stat.h>

Open files

The open system call can be used to open an existing file or to create a new file if it does not exist already. The syntax of open has two forms:

int open(const char *path, int flags); and
int open(const char *path, int flags, mode_t modes);

The first form is normally used to open an existing file, and the second form to open a file, creating it if it does not already exist. Both forms return an integer called the file descriptor, which is used for reading from and writing to the file. If the file cannot be opened or created, open returns -1. The first parameter, path, specifies the name of the file to be opened or created. The second parameter, flags, specifies how the file may be used. The following table lists some commonly used flag values.

Flag          Description
O_RDONLY      open for reading only
O_WRONLY      open for writing only
O_RDWR        open for reading and writing
O_NONBLOCK    do not block on open
O_APPEND      append on each write
O_CREAT       create file if it does not exist
O_TRUNC       truncate size to 0
O_EXCL        error if create and file exists
O_SHLOCK      atomically obtain a shared lock
O_EXLOCK      atomically obtain an exclusive lock
O_DIRECT      eliminate or reduce cache effects
O_FSYNC       synchronous writes
O_NOFOLLOW    do not follow symlinks

The O_CREAT flag may be used to create the file if it does not exist. When this flag is used, the third parameter (modes) must be supplied to specify the access permissions of the new file. Commonly used modes (or access permissions) include:
Constant Name   Octal Value   Description
S_IRWXU         0000700       RWX mask for owner
S_IRUSR         0000400       R for owner
S_IWUSR         0000200       W for owner
S_IXUSR         0000100       X for owner
S_IRWXO         0000007       RWX mask for other
S_IROTH         0000004       R for other
S_IWOTH         0000002       W for other
S_IXOTH         0000001       X for other

R: read, W: write, and X: executable

For example, to open file "tmp.txt" in the current working directory for reading and writing:

fd = open("tmp.txt", O_RDWR);

To open "sample.txt" in the current working directory for appending, creating it if it does not exist with read, write, and execute permissions for the owner only:

fd = open("sample.txt", O_WRONLY|O_APPEND|O_CREAT, S_IRWXU);

A file may be opened or created outside the current working directory. In this case, an absolute or relative path may prefix the file name. For example, to create a file in the /tmp directory:

fd = open("/tmp/tmp.txt", O_RDWR|O_CREAT, S_IRWXU);
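
The examples above ignore the return value of open. A minimal sketch of how the -1 result might be checked is given here; the helper names open_for_append and create_new_only are hypothetical, and the permission choice (read and write for the owner) is just one reasonable assumption.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` for appending, creating it with
 * owner read/write permission if it does not exist. Prints a message
 * and returns -1 if open fails; otherwise returns the file descriptor.
 */
int open_for_append(const char *path)
{
	int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, S_IRUSR | S_IWUSR);

	if (fd == -1)
		perror(path);   /* reports the reason stored in errno */
	return fd;
}

/*
 * Hypothetical helper: create `path` only if it does not exist yet.
 * With O_CREAT|O_EXCL, open returns -1 and sets errno to EEXIST when
 * the file is already there.
 */
int create_new_only(const char *path)
{
	return open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
}
```

A caller would normally test the returned descriptor against -1 before passing it to read, write, or close.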

Read from files

The system call for reading from a file is read. Its syntax is

ssize_t read(int fd, void *buf, size_t nbytes);

The first parameter, fd, is the file descriptor of the file you want to read from; it is normally returned by open. The second parameter, buf, points to the memory location where the input data should be stored. The last parameter, nbytes, specifies the maximum number of bytes you want to read. The system call returns the number of bytes it actually read, which is never more than nbytes; it returns 0 at end of file and -1 on error. The following segment of code reads up to 1024 bytes from the file tmp.txt:

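	ssize_t actual_count = 0;		/* read returns ssize_t */
	int fd = open("tmp.txt", O_RDONLY);	/* -1 if the open fails */
	char *buf = malloc(1024);		/* needs #include <stdlib.h> */

	actual_count = read(fd, buf, 1024);	/* bytes read, 0 at EOF, -1 on error */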

Each open file has a pointer, normally called the read/write offset, indicating where the next read will start. This pointer is advanced by the number of bytes actually read by the read call. In the above example, if the offset was zero before the read and 1024 bytes were actually read, the offset will be 1024 when the read returns. The offset may be changed with the lseek system call, which will be covered shortly.
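
Because the offset advances after every read, calling read repeatedly walks through the whole file. The sketch below relies on that to count the bytes in a file; the function name count_bytes is hypothetical, and the 1024-byte buffer size simply mirrors the example above.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `path` in chunks of up to 1024 bytes and
 * return the total number of bytes read, or -1 on error.
 */
long count_bytes(const char *path)
{
	char buf[1024];
	long total = 0;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return -1;

	/* Each read starts where the previous one stopped; 0 means end of file. */
	while ((n = read(fd, buf, sizeof buf)) > 0)
		total += n;

	close(fd);
	return (n == -1) ? -1 : total;
}
```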

Write to files

The write system call writes data to a file. Its syntax is

ssize_t write(int fd, const void *buf, size_t nbytes);

It writes nbytes of data from the buffer pointed to by buf to the file referenced by file descriptor fd. The write starts at the position given by the file's offset. Upon return from write, the offset is advanced by the number of bytes successfully written. The function returns the number of bytes actually written, or -1 on failure.
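
Since write reports how many bytes were actually written, a caller that needs the whole buffer stored may have to call it more than once. The following is a minimal sketch of such a loop; the name write_all is hypothetical, and the sketch does not retry after an interrupted call.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling write until all nbytes from buf
 * have been written to fd. Returns 0 on success, -1 if a write fails.
 */
int write_all(int fd, const void *buf, size_t nbytes)
{
	const char *p = buf;

	while (nbytes > 0) {
		ssize_t n = write(fd, p, nbytes);

		if (n == -1)
			return -1;
		/* Skip past the part that was written and try the rest. */
		p += n;
		nbytes -= (size_t)n;
	}
	return 0;
}
```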

Reposition the R/W offset

The lseek system call allows random access to a file by repositioning the offset for the next read or write. The syntax of the system call is

off_t lseek(int fd, off_t offset, int reference);

It repositions the offset of the file descriptor fd to the argument offset according to the directive reference. The reference indicates whether offset should be measured from the beginning of the file (reference 0, SEEK_SET), from the current position of the read/write offset (reference 1, SEEK_CUR), or from the end of the file (reference 2, SEEK_END). The call returns the resulting byte offset, measured from the beginning of the file, where the next read or write will start; it returns -1 on failure.
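
To illustrate the reference argument, the sketch below reads a few bytes from an arbitrary position in a file. The function name read_from is hypothetical; it simply combines open, lseek, read, and close.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path`, move the offset to `offset`
 * relative to `reference` (SEEK_SET, SEEK_CUR, or SEEK_END), and read
 * up to nbytes into buf. Returns what read returns, or -1 on error.
 */
ssize_t read_from(const char *path, off_t offset, int reference,
    void *buf, size_t nbytes)
{
	ssize_t n = -1;
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return -1;
	if (lseek(fd, offset, reference) != -1)
		n = read(fd, buf, nbytes);
	close(fd);
	return n;
}
```

With a file containing the ten characters 0123456789, read_from(path, 2, SEEK_SET, buf, 3) stores "234" in buf, and read_from(path, -3, SEEK_END, buf, 3) stores "789", because a negative offset from the end counts backwards from the last byte.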

Close files

The close system call closes a file. Its syntax is

int close(int fd);

It returns the value 0 if successful; otherwise the value -1 is returned.

Delete files

The unlink system call may be used to delete a file. (A file may have multiple names, also called links; here we assume that the file has only one name or link.) Its syntax is:

int unlink(const char *path);

path is the name of the file to be deleted. The unlink system call returns the value 0 if successful; otherwise it returns the value -1.
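
A short sketch of checking the result of unlink is given below. The function name delete_file and its three-way return value are assumptions made for this example; errno and ENOENT come from <errno.h>.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: delete `path`.
 * Returns 0 if the file was deleted, 1 if it did not exist,
 * and -1 (after printing a message) for any other failure.
 */
int delete_file(const char *path)
{
	if (unlink(path) == -1) {
		if (errno == ENOENT)
			return 1;       /* nothing to delete */
		perror(path);
		return -1;
	}
	return 0;
}
```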

Change file access permissions

File access permissions may be set using the chmod system call. (Note that there is also a command with the same name for setting access permissions.) It has two forms:

int chmod(const char *path, mode_t mode); or

int fchmod(int fd, mode_t mode);

Both forms set the access permissions of a file to mode. In the first form the file is identified by its name, and in the second it is identified by a file descriptor returned from the open system call. For mode, you may use any of the constants defined in the Open files section of this tutorial.

Both calls return the value 0 if successful; otherwise they return the value -1.
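
The two forms can be sketched as follows. The helper names create_read_only and allow_owner_write are hypothetical; the first uses fchmod on a descriptor obtained from open, and the second uses chmod on a file name.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` (if needed) and then restrict it
 * to read-only for the owner, using the descriptor form fchmod.
 * Returns 0 on success, -1 on failure.
 */
int create_read_only(const char *path)
{
	int rc;
	int fd = open(path, O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);

	if (fd == -1)
		return -1;
	rc = fchmod(fd, S_IRUSR);
	close(fd);
	return rc;
}

/*
 * Hypothetical helper: give the owner read and write permission,
 * using the path form chmod. Returns 0 on success, -1 on failure.
 */
int allow_owner_write(const char *path)
{
	return chmod(path, S_IRUSR | S_IWUSR);
}
```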

Accessing file information from Inodes

The stat system call can be used to access information about a file from its inode. It has two forms:

int stat(const char *path, struct stat *sb); or

int fstat(int fd, struct stat *sb);

Both forms return the information through the stat structure pointed to by sb. In the first form, the file is identified by its name; in the second form, it is identified by a file descriptor returned from a call to open. The stat structure includes at least the following elements:

Element    Description
st_mode    file protection mode
st_uid     user ID of the file owner
st_size    file size in bytes

No permission on the file itself is needed to stat it, although search permission is required on every directory in the path. However, since the second form requires a file descriptor for the file, and a file descriptor can be obtained only through open, fstat can be applied only to files that have the proper access permissions.

The call returns the value 0 if successful; otherwise it returns the value -1. The stat call fails if the specified path or file does not exist.
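
As a closing illustration, the sketch below prints the three elements listed above for a given file. The function name print_file_info is hypothetical. Because st_mode also carries file type bits, the sketch masks it with 0777 to show only the permission bits.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: print the permission bits, owner, and size of
 * `path`. Returns 0 on success, or -1 if stat fails.
 */
int print_file_info(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) == -1) {
		perror(path);   /* for example, the file does not exist */
		return -1;
	}
	printf("%s: mode %o, owner uid %lu, size %lld bytes\n",
	    path, (unsigned int)(sb.st_mode & 0777),
	    (unsigned long)sb.st_uid, (long long)sb.st_size);
	return 0;
}
```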