Friday 15 April 2016

How to correctly make "latest" symlinks

"Latest symlink"

A "latest" symlink, is a symbolic link (on Linux, Unix etc) which links to the "latest" version of a file.

Suppose we have a file which takes some effort to create, which is generated periodically or in response to some stimulus (e.g. user activity). Then we want to create a "latest version" symlink.

Ideally the properties should be
  • latest symlink always points at the latest version (duuh!)
  • latest symlink always exists
  • latest symlink never points at a partially completed, broken, missing or otherwise bad file
Sometimes people do this in a way which won't work.

How to create a symlink

Dead easy, right? Just call the "symlink" function. 

 int symlink(const char *oldpath, const char *newpath);

 DESCRIPTION
       symlink()  creates  a  symbolic  link  named newpath which contains the
       string oldpath.


But if we call "symlink" and the newpath param points to an already existing file (including a symlink) then it will return EEXIST.

The wrong (obvious) way

try:
  unlink(dest)
except OSError:
  pass
symlink(src, dest)

The correct (not so obvious) way

templink = dest + '.temp'
symlink(src, templink)
rename(templink, dest) 

Why?

Because we want to avoid a race condition where the destination symbolic link does not exist. Renaming files is atomic and will instantly replace the existing link with a new one; no other program can possibly see a non-existing file.

Other wrong ways

Some possibly common, but wrong (or even wronger) ways to do this
  • Just create the "latest" file using our "do lots of work" process directly. This is really bad, as during file creation, another process can see a partially completed file. If your code looks like this: write file header; do lots of work to create file body; write file footer, then there is a really good chance that another process sees an incomplete file.
  • Create a different file, then copy the file (file copying isn't atomic)

No comments: