C programming | Working with files
At some point when developing software, no matter how big or small the program is going to be, we need to store data on the computer and read it from other sources too. Let's take a look at how to work with external files in C.
Files in C programming don't have a predefined structure. They are meant to be a container for some sequence of bytes. That way the internal structure of a file is something that the program itself has to deal with.
As long as we know how a file is structured, we can open, read, and write any file.
Opening files
Opening files in C can be achieved in two ways: using the stdio function fopen(), or using the lower-level open().
The main difference between them is that open() is a system call (on POSIX systems), while fopen() is a library call.
On those systems fopen() calls open() under the hood and adds buffering, which usually improves throughput. When timing is critical (e.g. embedded systems), it can be better to use open() and take full control over when the data is processed.
— The fopen() way
The fopen() function associates a file with a stream and initializes an object of type FILE, a structure that holds the information needed to control the stream. If the file cannot be opened, fopen() returns NULL.
We can specify how we want to operate with the data by passing different values in the mode parameter.
Possible modes are:
- r opens an existing file for reading.
- w creates a file for writing. If the file already exists, its previous content is discarded.
- a opens a file, creating it if it doesn't exist, and writes at the end of it.
Adding a + sign after any of the letters makes the file work in update mode. That is, the mode allows both reading and writing.
FILE *file = fopen("path/to/file.type", "mode");
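The difference between the modes is easiest to see by using them one after another. The following is a minimal sketch, not a definitive implementation: the helper name demo_modes and the text it writes are assumptions made for illustration. It truncates with "w", appends with "a", and counts the resulting bytes with "r".

```c
#include <stdio.h>

/* Hypothetical helper (name chosen for this example): writes with "w",
 * appends with "a", then counts the bytes of the file with "r".
 * Returns the byte count, or -1 if any fopen() call fails. */
long demo_modes(const char *path)
{
    FILE *file = fopen(path, "w");   /* create the file or discard its content */
    if (file == NULL)
        return -1;
    fputs("abc", file);
    fclose(file);

    file = fopen(path, "a");         /* keep the content, write at the end */
    if (file == NULL)
        return -1;
    fputs("def", file);
    fclose(file);

    file = fopen(path, "r");         /* read only */
    if (file == NULL)
        return -1;
    long count = 0;
    while (fgetc(file) != EOF)
        count++;
    fclose(file);
    return count;
}
```

Calling demo_modes() twice on the same path should give the same count both times, because "w" discards whatever the previous call left behind.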
— The open() way
The open() function returns an int called a file descriptor. Every open file has a file descriptor number, which the operating system uses to keep track of it. On failure, open() returns -1.
Similar to fopen(), we can specify how we want to work with the opened file by passing specific flags in the flags parameter.
Exactly one of the following access flags is mandatory:
- O_RDONLY opens a file in read-only mode.
- O_WRONLY opens a file in write-only mode.
- O_RDWR opens a file in read/write mode.
Additional flags can be combined with a bitwise OR to request other behavior, such as O_APPEND to open a file in append mode, O_CREAT to create the file if it doesn't exist, or O_ASYNC to enable signal-driven I/O (on Linux this is available for terminals, sockets, pipes and FIFOs).
When O_CREAT is used, a third parameter specifies the permissions of the new file, like:
- S_IRUSR user has read permissions.
- S_IWUSR user has write permissions.
int fileData = open("path/to/file.type", flags, mode);
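As a rough illustration of how the flags and the permission bits fit together, here is a small sketch for POSIX systems. The helper name write_with_open is hypothetical, and the choice of O_CREAT | O_TRUNC is just one reasonable combination for "create or overwrite".

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates (or truncates) a file and writes text to it
 * with unbuffered system calls.
 * Returns the number of bytes written, or -1 on failure. */
ssize_t write_with_open(const char *path, const char *text)
{
    /* O_CREAT is what makes the third argument (the permissions) apply. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    ssize_t written = write(fd, text, strlen(text));
    if (close(fd) == -1)
        return -1;
    return written;
}
```

Note that write() may write fewer bytes than requested, so production code usually loops until everything has been written.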
Writing files
We can run a program that takes arguments from the user via the terminal emulator, and perform operations based on those arguments, print them back to the terminal, and ask for more operations if needed, but each time we close the program, that data is gone.
We can write data in binary files and in text files.
The standard library has two useful functions to help us save the data we ask for and process during program execution into a file: fwrite() and fprintf().
— Using fwrite()
The function fwrite() writes a number of objects of a given size to a file. It is often used to write binary data.
The information we need to pass to fwrite() is the following:
- A memory buffer, or the address of the data to store.
- The size in bytes of each element of the data to store.
- The number of elements to write.
- A pointer to a FILE object.
fwrite(data, sizeof(data_type), element_count, file);
The function returns the number of elements actually written. The result is a binary file, whose content we can check using a tool like hexdump(1).
typedef struct Car {
    int power;  //kW
    int torque; //Nm
    int wheels; //[4, 5]
    int seats;  //up to 7
    int doors;  //[3, 5]
} Car;

Car rallyCar = {
    .power = 235,
    .torque = 384,
    .wheels = 5,
    .seats = 2,
    .doors = 3
};

/* "wb" opens the file in binary mode */
FILE *file = fopen("cars.bin", "wb");
if (file != NULL) {
    fwrite(&rallyCar, sizeof(Car), 1, file);
    fclose(file);
}
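Reading the struct back is the mirror operation with fread(). The sketch below assumes two hypothetical helpers, save_car and load_car, and checks the element counts that fwrite() and fread() return. Keep in mind that a raw struct dump depends on the compiler's padding and the machine's byte order, so such a file is only guaranteed to be readable by the same build of the program.

```c
#include <stdio.h>

typedef struct Car {
    int power;  /* kW */
    int torque; /* Nm */
    int wheels;
    int seats;
    int doors;
} Car;

/* Hypothetical helper: writes one Car to path. Returns 0 on success. */
int save_car(const char *path, const Car *car)
{
    FILE *file = fopen(path, "wb");
    if (file == NULL)
        return -1;
    size_t items = fwrite(car, sizeof(Car), 1, file);
    if (fclose(file) != 0 || items != 1)
        return -1;
    return 0;
}

/* Hypothetical helper: reads one Car from path. Returns 0 on success. */
int load_car(const char *path, Car *car)
{
    FILE *file = fopen(path, "rb");
    if (file == NULL)
        return -1;
    size_t items = fread(car, sizeof(Car), 1, file);
    fclose(file);
    return items == 1 ? 0 : -1;
}
```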
— We can, however, write text files using fwrite() by making use of the function sprintf(), which writes its output as a string into the referenced buffer. Its bounded variant snprintf(), used here, never writes past the end of the buffer.
char buffer[40];
snprintf(buffer, sizeof(buffer), "The actual engine torque is %d.\n", engine.torque);
fwrite(buffer, sizeof(char), strlen(buffer), file);
fclose(file);
— Using fprintf()
Similar to the printf() function, we have fprintf() in the standard library, with which we can write formatted output into a file by passing a format string.
The information we need to pass to fprintf() is the following:
- A file pointer of type FILE *.
- The desired output format, which is a const char *.
- The desired content to format.
fprintf(file_pointer, format, content);
This way we store text data by default in a file.
FILE *file = fopen("temp.log", "a");
if (file != NULL) {
    fprintf(file, "%s\n", "Appending data to temp file.");
    fclose(file);
}
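fprintf() also reports how much it wrote: it returns the number of characters printed, or a negative value on error. A small sketch of using that, assuming a hypothetical helper append_reading and a made-up "label=value" line format:

```c
#include <stdio.h>

/* Hypothetical helper: appends a "label=value" line to a text file.
 * Returns the number of characters written, or -1 on failure. */
int append_reading(const char *path, const char *label, double value)
{
    FILE *file = fopen(path, "a");
    if (file == NULL)
        return -1;

    /* One decimal place, e.g. "temp=21.5\n" */
    int written = fprintf(file, "%s=%.1f\n", label, value);

    if (fclose(file) != 0 || written < 0)
        return -1;
    return written;
}
```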
At the end of the article we'll use this function to serialize some JSON data.
Other operations with files
Apart from opening and writing files, the header file stdio.h has more functions for working with I/O, which we can use to rename, remove, and close files, among other operations.
— Close a file
Once we are done working with a file, we can close the stream and free its resources using the function fclose(). The function writes any buffered output that is still pending for the stream and discards any unread buffered input. It returns 0 on success and EOF if an error occurs, for example when that final write fails.
fclose(file);
— Rename a file
We can rename a file using the function rename() by passing the current name of the file and a string (const char *) to use as the new one. It returns 0 on success.
rename("old_file_name", "new_file_name");
— Remove a file
We can make a file unavailable using the remove() function, passing the file's filename. If the file has no other names linked to it, the file is deleted. If the file is currently open, the behavior is implementation-defined, so the function may or may not be able to perform the deletion. It returns 0 on success.
remove("file_name");
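Both rename() and remove() return 0 on success and a nonzero value on failure, so it's worth checking them. A minimal sketch, assuming a hypothetical helper rename_then_remove that creates an empty file just to have something to operate on:

```c
#include <stdio.h>

/* Hypothetical helper: creates old_name, renames it to new_name and then
 * removes it. Returns 0 if every step succeeds, -1 otherwise. */
int rename_then_remove(const char *old_name, const char *new_name)
{
    FILE *file = fopen(old_name, "w");
    if (file == NULL)
        return -1;
    fclose(file);   /* close before renaming or removing */

    if (rename(old_name, new_name) != 0) {
        perror("rename");
        return -1;
    }
    if (remove(new_name) != 0) {
        perror("remove");
        return -1;
    }
    return 0;
}
```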
— Create a temporary file
Using tmpfile() we can create a temporary file with a unique name in wb+ mode, which is automatically removed once we close it or the program terminates.
If the function is unable to open a temporary file, it returns a NULL pointer; otherwise it returns a pointer to the temp file.
FILE *file = tmpfile(); //file is pointing to a tempfile.
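Because the temporary file is opened in update mode, we can write to it and read the data back through the same stream. The sketch below assumes a hypothetical helper tmpfile_roundtrip. The rewind() call matters: the C standard requires a repositioning call (or a flush) when switching from writing to reading on an update stream.

```c
#include <stdio.h>

/* Hypothetical helper: writes text to a temporary file and reads it back
 * into out, which must hold at least out_size (> 0) bytes.
 * Returns 0 on success, -1 if the temporary file can't be created. */
int tmpfile_roundtrip(const char *text, char *out, size_t out_size)
{
    FILE *file = tmpfile();
    if (file == NULL)
        return -1;

    fputs(text, file);
    rewind(file);   /* go back to the start before reading */

    size_t n = fread(out, 1, out_size - 1, file);
    out[n] = '\0';

    fclose(file);   /* the temporary file is removed here */
    return 0;
}
```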
How to map files in memory
There is a way to work more efficiently with files: mapping them into virtual memory with mmap().
Virtual memory helps when processes ask for more memory than the system has. At that point the operating system's memory manager takes pages from RAM and places them in swap, bringing them back to RAM when requested. It is basically moving data back and forth between RAM and the disk.
We can use the same mechanism to read and write files too.
Let's use mmap() to map a text file into memory (it can be any other file too):
— Open a file
int fileData = open("text_file.txt", O_RDONLY, S_IRUSR | S_IWUSR);
If we also want to write content into the file, we have to open it in read-write mode using different flags in the open() function:
int fileData = open("text_file.txt", O_RDWR, S_IRUSR | S_IWUSR);
We can do the same using fopen(), but it's a good idea not to mix high-level I/O with low-level operations. Doing so can hurt performance and makes it easy for the stream's buffer to get out of sync with the file.
If we use fopen(), then we need the function fileno() to get the file descriptor from our opened file.
FILE *fileData = fopen("text_file.txt", "r");
int fileDescriptor = fileno(fileData);
— Get the size of the file
We need to include <sys/stat.h> and <unistd.h> to help:
#include <sys/stat.h>
#include <unistd.h>
...
struct stat sb;
/* fileData is the descriptor returned by open() */
if (fstat(fileData, &sb) == -1)
    printf("couldn't get file size\n");
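Put together as a function, getting the size could look like the sketch below. The helper name file_size is an assumption made for this example; it opens the file itself so that it can be tried in isolation.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns the size of a file in bytes,
 * or -1 if it can't be opened or queried. */
off_t file_size(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat sb;
    off_t size = (fstat(fd, &sb) == -1) ? -1 : sb.st_size;

    close(fd);
    return size;
}
```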
— Allocate in memory using mmap()
We need to pass the following parameters to the function:
- The desired starting address, NULL in this case, letting the system choose the address.
- The length of the mapping. We are using the file status to get the total size in bytes with sb.st_size.
- The flag or flags representing how we want to operate with the memory pages. If we just want to read the file it's PROT_READ. If we want to read and write the file it needs to be PROT_READ | PROT_WRITE.
- The flag or flags representing whether the mapping is going to be shared with other processes or not. In this case MAP_PRIVATE. If we want to write to the file we need to change MAP_PRIVATE to MAP_SHARED; otherwise our changes stay in a private copy-on-write copy and are never written back to the file.
- The file descriptor from our opened file, fileData.
- The offset where to start mapping the file, in this case 0, which is the beginning.
char *fileInRAM = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fileData, 0);
— Operate with the data
Now that we have mapped our file we can start working freely with it.
for (off_t i = 0; i < sb.st_size; i++)
    printf("%c", fileInRAM[i]);
printf("\n");
— Unmap memory and close the file
Once we're done working with the file, closing the file descriptor alone doesn't unmap the data. The function munmap() takes the address and length of a mapping and deletes the mappings in that range.
After that we can close the file descriptor to finish.
munmap(fileInRAM, sb.st_size); close(fileData);
A complete view of the code should look like this:
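#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fileData = open("plain_text_file.txt", O_RDONLY);
    struct stat sb;
    /* the file size is needed for the length of the mapping */
    if (fileData == -1 || fstat(fileData, &sb) == -1)
        return 1;
    char *fileInRAM = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fileData, 0);
    if (fileInRAM == MAP_FAILED)
        return 1;
    for (off_t i = 0; i < sb.st_size; i++)
        printf("%c", fileInRAM[i]);
    munmap(fileInRAM, sb.st_size);
    close(fileData);
    return 0;
}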
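The program above only reads. For the writing case described earlier (O_RDWR, PROT_READ | PROT_WRITE and MAP_SHARED), a sketch could look like the following. The helper name uppercase_in_place and the idea of upper-casing the content are assumptions chosen only to have a visible change; the msync() call asks the system to flush the modified pages to the file before unmapping.

```c
#include <ctype.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: upper-cases a file in place through a shared
 * mapping. Returns 0 on success, -1 on failure (including empty files,
 * which cannot be mapped). */
int uppercase_in_place(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) {
        close(fd);
        return -1;
    }

    /* MAP_SHARED: stores to the mapping are carried through to the file. */
    char *data = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) {
        close(fd);
        return -1;
    }

    for (off_t i = 0; i < sb.st_size; i++)
        data[i] = (char)toupper((unsigned char)data[i]);

    msync(data, sb.st_size, MS_SYNC);   /* flush changes to the file */
    munmap(data, sb.st_size);
    close(fd);
    return 0;
}
```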
Structuring data
We know that the C programming language doesn't care about the type of file we use. Plain text files may be enough for some applications, but even text files may need to follow a structure so we can interoperate later with the data inside them.
To achieve this we need to convert the abstract in-memory data into a series of bytes that record the data structure in a recoverable format. This is called serialization.
Our data structure can be a simple list or array, a complex group of nested arrays and structs, or whatever is required.
Writing structured data to a file
— As an example, let's take a look at a program where the user can store information about a vehicle's engine.
- We should have a struct type that handles how an engine is defined.
/*simplified engine structure*/
typedef struct Engine {
    char model[10];        //engine model
    char manufacturer[10]; //engine manufacturer
    int power;             //kW
    int torque;            //Nm
    int cylinders;         //total cylinders in engine
    int structure;         //block structure [1, 2, 3] rows
    char fuelType[10];     //fuel type [gasoline, diesel]
} Engine;
- Once we are working in the program we can create an engine and assign values to it.
Engine engine = {
    .model = "RB26DETT",
    .manufacturer = "nismo",
    .power = 235,
    .torque = 384,
    .cylinders = 6,
    .structure = 1,
    .fuelType = "gasoline"
};
- Now it's time to define a constant to serialize the data into a file. Instead of reinventing the wheel, let's use an existing data-interchange format such as JSON (XML applies here too).
const char *ENGINE_EXPORT_FMT = "{\n\t\"model\": \"%s\",\n\t\"manufacturer\": \"%s\",\n\t\"power\": %d,\n\t\"torque\": %d,\n\t\"cylinders\": %d,\n\t\"structure\": %d,\n\t\"fuel\": \"%s\"\n}\n";
Most of the "complexity" here is in correctly describing our object. For this simple example, we can just go with this constant. For serious projects we would need to move this into a header file and probably write some functions that wrap the process.
- Moving on, we have to open a file to write the data to, or create a new one.
FILE *file = fopen("engine_data.json", "w+");
- Once we have our file opened, we need to print the content of our engine struct into it, using the function fprintf().
fprintf(file, ENGINE_EXPORT_FMT, engine.model, engine.manufacturer, engine.power, engine.torque, engine.cylinders, engine.structure, engine.fuelType);
Note that we have named our example file with a .json extension, but we could use any name and extension we want, and the result would be the same.
A complete view of the code should look like this:
#include <stdio.h>
#include <stdlib.h>

/*engine struct format data*/
const char *ENGINE_EXPORT_FMT = "{\n\t\"model\": \"%s\",\n\t\"manufacturer\": \"%s\",\n\t\"power\": %d,\n\t\"torque\": %d,\n\t\"cylinders\": %d,\n\t\"structure\": %d,\n\t\"fuel\": \"%s\"\n}\n";

/*simplified engine structure*/
typedef struct Engine {
    char model[10];        //engine model
    char manufacturer[10]; //engine manufacturer
    int power;             //kW
    int torque;            //Nm
    int cylinders;         //total cylinders in engine
    int structure;         //block structure [1, 2, 3] rows
    char fuelType[10];     //fuel type [gasoline, diesel]
} Engine;

int main()
{
    Engine engine = {
        .model = "RB26DETT",
        .manufacturer = "nismo",
        .power = 235,
        .torque = 384,
        .cylinders = 6,
        .structure = 1,
        .fuelType = "gasoline"
    };

    FILE *file = fopen("engine_data.json", "w+");
    if (file == NULL)
        return EXIT_FAILURE;
    fprintf(file, ENGINE_EXPORT_FMT, engine.model, engine.manufacturer, engine.power, engine.torque, engine.cylinders, engine.structure, engine.fuelType);
    fclose(file);
    return 0;
}
We should have a new file named engine_data.json in our directory with the engine struct serialized into it.
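One way to wrap the export step in a function, as a rough sketch: the name engine_export and its return convention are assumptions for illustration. Taking an already opened FILE * keeps the function independent of where the data goes (a file on disk, a temporary file, or stdout).

```c
#include <stdio.h>

typedef struct Engine {
    char model[10];
    char manufacturer[10];
    int power;      /* kW */
    int torque;     /* Nm */
    int cylinders;
    int structure;
    char fuelType[10];
} Engine;

static const char *ENGINE_EXPORT_FMT =
    "{\n\t\"model\": \"%s\",\n\t\"manufacturer\": \"%s\",\n"
    "\t\"power\": %d,\n\t\"torque\": %d,\n\t\"cylinders\": %d,\n"
    "\t\"structure\": %d,\n\t\"fuel\": \"%s\"\n}\n";

/* Hypothetical wrapper: serializes one Engine to an open stream.
 * Returns 0 on success, -1 if fprintf() reports an error. */
int engine_export(FILE *out, const Engine *engine)
{
    int written = fprintf(out, ENGINE_EXPORT_FMT,
                          engine->model, engine->manufacturer,
                          engine->power, engine->torque,
                          engine->cylinders, engine->structure,
                          engine->fuelType);
    return written < 0 ? -1 : 0;
}
```

This simple format string does not escape quotes or backslashes inside the text fields, so it only produces valid JSON for plain values like the ones in this example.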
Parsing structured data from a file
If we want the saved data to be used back in the program, we have to reverse engineer our constant, so to speak, to parse our object.
- Create a new format string constant. Each string conversion carries a maximum width of 9 so that a long value in the file can't overflow the 10-byte arrays of the struct.
const char *ENGINE_IMPORT_FMT = "{\n\t\"model\": \"%9[^\"]\",\n\t\"manufacturer\": \"%9[^\"]\",\n\t\"power\": %d,\n\t\"torque\": %d,\n\t\"cylinders\": %d,\n\t\"structure\": %d,\n\t\"fuel\": \"%9[^\"]\"\n}";
- We need to specify where we want to start reading the data from the file.
fseek(file, 0, SEEK_SET);
- Finally we can assign the read data to a new variable using fscanf(). The function returns the number of fields it managed to assign, which should be 7 here.
Engine iEngine;
fscanf(file, ENGINE_IMPORT_FMT, iEngine.model, iEngine.manufacturer, &iEngine.power, &iEngine.torque, &iEngine.cylinders, &iEngine.structure, iEngine.fuelType);
A complete view of the code should look like this:
#include <stdio.h>
#include <stdlib.h>

/*engine struct format data*/
const char *ENGINE_IMPORT_FMT = "{\n\t\"model\": \"%9[^\"]\",\n\t\"manufacturer\": \"%9[^\"]\",\n\t\"power\": %d,\n\t\"torque\": %d,\n\t\"cylinders\": %d,\n\t\"structure\": %d,\n\t\"fuel\": \"%9[^\"]\"\n}";

/*simplified engine structure*/
typedef struct Engine {
    char model[10];        //engine model
    char manufacturer[10]; //engine manufacturer
    int power;             //kW
    int torque;            //Nm
    int cylinders;         //total cylinders in engine
    int structure;         //block structure [1, 2, 3] rows
    char fuelType[10];     //fuel type [gasoline, diesel]
} Engine;

int main()
{
    Engine engine;

    FILE *file = fopen("engine_data.json", "r");
    if (file == NULL)
        return EXIT_FAILURE;
    fseek(file, 0, SEEK_SET);
    fscanf(file, ENGINE_IMPORT_FMT, engine.model, engine.manufacturer, &engine.power, &engine.torque, &engine.cylinders, &engine.structure, engine.fuelType);
    fclose(file);
    return 0;
}
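Since fscanf() stops at the first mismatch, checking its return value tells us whether the whole object was read. A sketch of an import wrapper, assuming a hypothetical function engine_import and a format string with width limits of 9 characters for the 10-byte text fields:

```c
#include <stdio.h>

typedef struct Engine {
    char model[10];
    char manufacturer[10];
    int power;      /* kW */
    int torque;     /* Nm */
    int cylinders;
    int structure;
    char fuelType[10];
} Engine;

/* %9[^\"] reads at most 9 characters up to the next double quote,
 * leaving room for the terminating '\0' in a 10-byte array. */
static const char *ENGINE_IMPORT_FMT =
    "{\n\t\"model\": \"%9[^\"]\",\n\t\"manufacturer\": \"%9[^\"]\",\n"
    "\t\"power\": %d,\n\t\"torque\": %d,\n\t\"cylinders\": %d,\n"
    "\t\"structure\": %d,\n\t\"fuel\": \"%9[^\"]\"\n}";

/* Hypothetical wrapper: parses one Engine from an open stream.
 * Returns 0 if all 7 fields were assigned, -1 otherwise. */
int engine_import(FILE *in, Engine *engine)
{
    int matched = fscanf(in, ENGINE_IMPORT_FMT,
                         engine->model, engine->manufacturer,
                         &engine->power, &engine->torque,
                         &engine->cylinders, &engine->structure,
                         engine->fuelType);
    return matched == 7 ? 0 : -1;
}
```

This only works for files laid out exactly like our own export format. It is not a general JSON parser: reordered keys or empty strings would make it fail.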
Summing up
Files play a really important role in software programs. We've seen how to work with operations that read, write and format text both from and into files, but the same can be achieved for binary files such as images or audio.
In addition to that, we can also implement ways to obfuscate how our program writes the data so that not everyone can read our format back. This is kind of an unfriendly way to do things, but companies often do it so the competition cannot just sneak into their new software and steal how they engineer things. But hey, we have reverse engineers for that (:
A further discussion of this topic will come in a future article.