When a mv is a cp

When I started diving deeper into the Ops world, I learned an interesting thing about Linux filesystems and moving files around. This was essential knowledge, as my team worked with very large files on a performance-critical system, and moving a file is sometimes much faster than copying it. Here, a move means transferring a file to another location so that it no longer exists at the original location. A copy preserves the file at the original location, essentially cloning it.

You will notice that copying (cp) a file takes more time than moving (mv) it. This is because a copy creates a new inode and writes a second set of data blocks. The bigger the file, the more data there is to read and write, and the longer the copy takes. In contrast, moving a file leaves the inode and its data where they are and only changes the directory entry, which can be thought of as basically just a name pointing at the inode. Regardless of the file size, a move is practically immediate…except in one particular circumstance.
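Before getting to that exception, the inode claim is easy to check by hand. The following is a minimal sketch, assuming GNU stat (for the -c %i format) and a scratch directory from mktemp, so that source and destination share a filesystem. The file names are made up for illustration.

```shell
# Scratch directory: everything below lives on one filesystem.
scratch=$(mktemp -d)
echo "test" > "$scratch/original"

# GNU stat: %i prints the inode number of a file.
inode_before=$(stat -c %i "$scratch/original")

# A copy is a new file, so it gets its own inode.
cp "$scratch/original" "$scratch/copy"
inode_copy=$(stat -c %i "$scratch/copy")

# A move within the filesystem keeps the inode and only changes the name.
mv "$scratch/original" "$scratch/moved"
inode_after=$(stat -c %i "$scratch/moved")

echo "original: $inode_before"
echo "copy:     $inode_copy"
echo "moved:    $inode_after"
```

The first and last numbers should match, while the copy reports a different one.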

If you attempt to use this time-saving technique across filesystems (e.g. onto a different hard drive, USB drive, or SSD) you will be disappointed. In fact, you will find that cp and mv take roughly the same amount of time. This is because each filesystem has its own inode table, so a directory entry on one filesystem cannot point at an inode on another. The kernel refuses the rename, and the mv command falls back to copying the file and then deleting the original. To the operator, only the mv command was used and it simply took longer than usual.
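One way to guess in advance which kind of mv you are about to get is to compare device numbers. This is only a sketch: same_filesystem is a hypothetical helper, it assumes GNU stat (for the -c %d format), and it is a heuristic rather than a guarantee, since a rename can still be refused across separate mount points of the same device, such as bind mounts.

```shell
# same_filesystem PATH_A PATH_B
# Hypothetical helper: succeeds when both paths report the same device number.
same_filesystem() {
    # GNU stat: %d prints the device number of the filesystem holding the path.
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

scratch=$(mktemp -d)
echo "test" > "$scratch/testfile"

# Compare the source file with the destination *directory*, because the
# destination file itself does not exist yet.
if same_filesystem "$scratch/testfile" "$scratch"; then
    verdict="rename"
else
    verdict="copy and delete"
fi
echo "mv would be a $verdict"
```

Swap the second argument for a directory on another drive to see the other branch.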

This behaviour can be seen by looking at the syscalls that cp and mv make on the same filesystem versus across different filesystems. I'll be using strace for this, which lets Linux administrators view the syscalls a particular command makes.

Let's copy a file within the same filesystem. In this case, the /tmp directory is on the same filesystem as $HOME.

$ echo "test" > testfile
$ strace -o strace.out cp testfile /tmp/testfile
$ cat strace.out
... snip ...
stat("/tmp/testfile", 0x7ffd86a86cd0)   = -1 ENOENT (No such file or directory)
stat("testfile", {st_mode=S_IFREG|0664, st_size=5, ...}) = 0
stat("/tmp/testfile", 0x7ffd86a86a50)   = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "testfile", O_RDONLY)  = 3
openat(AT_FDCWD, "/tmp/testfile", O_WRONLY|O_CREAT|O_EXCL, 0664) = 4
read(3, "test\n", 131072)               = 5
write(4, "test\n", 5)                   = 5
read(3, "", 131072)                     = 0
close(4)                                = 0
close(3)                                = 0
... snip ...

We can see that file descriptor 3 was opened (openat) to read from testfile, and file descriptor 4 was opened to write to /tmp/testfile. The copy happens by reading from one file and writing to the other, after which both file descriptors are closed. Because the destination is created as a new file (O_CREAT), it gets a new inode entry.

Now we'll try it across filesystems. In this case, the destination is a USB drive mounted at /media/scott.

$ echo "test" > testfile
$ strace -o strace.out cp testfile /media/scott/testfile
$ cat strace.out
... snip ...
stat("/media/scott/testfile", 0x7fff365e56f0) = -1 ENOENT (No such file or directory)
stat("testfile", {st_mode=S_IFREG|0664, st_size=5, ...}) = 0
stat("/media/scott/testfile", 0x7fff365e5470) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "testfile", O_RDONLY)  = 3
openat(AT_FDCWD, "/media/scott/testfile", O_WRONLY|O_CREAT|O_EXCL, 0664) = 4
read(3, "test\n", 131072)               = 5
write(4, "test\n", 5)                   = 5
read(3, "", 131072)                     = 0
close(4)                                = 0
close(3)                                = 0
... snip ...

Both strace results are basically the same. Now let's move a file. First we move a file on the same filesystem.

$ echo "test" > testfile
$ strace -o strace.out mv testfile /tmp/testfile
$ cat strace.out
... snip ...
stat("/tmp/testfile", 0x7ffe087f22b0)   = -1 ENOENT (No such file or directory)
lstat("testfile", {st_mode=S_IFREG|0664, st_size=5, ...}) = 0
lstat("/tmp/testfile", 0x7ffe087f1f90)  = -1 ENOENT (No such file or directory)
rename("testfile", "/tmp/testfile")     = 0
... snip ...

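Now we can see why the mv command is so much faster. There is just a single rename that points the new name at the existing inode. No new file is created (there is no openat syscall) and no data is read or written. Next, we move a file across filesystems, onto the USB drive.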

$ echo "test" > testfile
$ strace -o strace.out mv testfile /media/scott/testfile
$ cat strace.out
... snip ...
stat("/media/scott/testfile", 0x7ffc1d52bf20) = -1 ENOENT (No such file or directory)
lstat("testfile", {st_mode=S_IFREG|0664, st_size=5, ...}) = 0
lstat("/media/scott/testfile", 0x7ffc1d52bc00) = -1 ENOENT (No such file or directory)
rename("testfile", "/media/scott/testfile") = -1 EXDEV (Invalid cross-device link)
unlink("/media/scott/testfile")  = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "testfile", O_RDONLY|O_NOFOLLOW) = 3
openat(AT_FDCWD, "/media/scott/testfile", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
read(3, "test\n", 131072)               = 5
write(4, "test\n", 5)                   = 5
read(3, "", 131072)                     = 0
close(4)                                = 0
close(3)                                = 0
unlinkat(AT_FDCWD, "testfile", 0)       = 0
... snip ...

And here we see that mv first tries a rename, which the kernel rejects with EXDEV because we are crossing filesystems. The operation then falls back to the same syscalls as cp, followed by an unlinkat to delete the original file.

And for bonus points on any Linux certification test, know that a mv within a filesystem is atomic because it is a single rename syscall: other processes see the file at either the old path or the new one, never in between. A mv across filesystems, or any type of cp, is non-atomic because it takes a sequence of syscalls, so another process can observe a partially written destination.
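A common way to take advantage of that atomicity is to write new contents to a temporary file in the same directory and then mv it over the real one. The sketch below assumes GNU coreutils. The config.txt name and its contents are made up for illustration.

```shell
target_dir=$(mktemp -d)
target="$target_dir/config.txt"
echo "old contents" > "$target"

# Create the temporary file in the SAME directory as the target, so the
# final mv stays within one filesystem and is a single rename.
tmp=$(mktemp "$target_dir/config.txt.XXXXXX")
echo "new contents" > "$tmp"

# Readers of $target see either the old file or the new one, never a
# half-written mix.
mv -f "$tmp" "$target"
cat "$target"
```

If the temporary file were created in /tmp on a different filesystem instead, the mv would fall back to copy and delete, and the guarantee would be lost.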
