Post by Thomas Levine
One inconvenience I noted is that the deconstruct command always writes
artefacts to the filesystem, even if a file of the appropriate name and
size and contents already exists.
You might want to split that observation into two, as rsync does:
- name, size, and modification date match
- contents also match
If you’re willing to gamble that the second test will return true whenever the first does, it buys you a big increase in speed. The gamble is worth taking as long as the files’ modification timestamps are trustworthy.
When the timestamps aren’t trustworthy, you do the first test and then, only if it returns true, also do the second as extra assurance.
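To make the two-step test concrete, here is a minimal Python sketch of the idea. It is not Fossil code and not rsync's actual implementation; the function name can_skip_write and the trust_mtime parameter are made up for illustration, and it assumes the caller knows what modification time the artefact ought to have.

```python
import os

def can_skip_write(path, data, mtime, trust_mtime=True):
    """Return True if the file at `path` can be assumed to hold `data`.

    Hypothetical helper: `mtime` is the timestamp (in seconds) the caller
    expects the existing file to carry.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False  # nothing there yet, so the write is needed

    # Test 1: cheap metadata check. The name matches by construction,
    # so only size and modification time are compared.
    if st.st_size != len(data) or int(st.st_mtime) != int(mtime):
        return False
    if trust_mtime:
        return True  # take the gamble and skip reading the file

    # Test 2: timestamps are not trusted, so compare the actual bytes.
    with open(path, "rb") as f:
        return f.read() == data
```

With trust_mtime=True, a file of the same size and timestamp but different contents is wrongly reported as up to date. That is exactly the gamble described above.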
Post by Thomas Levine
Would the developers welcome a flag
to blob_write_to_file in src/blob.c to skip the writing of a new
artefact file if the file already exists?
In addition to your backup case, it might also benefit snapshotting mechanisms found in many virtual machine systems and in some of the more advanced filesystems (ZFS, btrfs, APFS…).
However, I’ll also give a counterargument to the whole idea: you probably aren’t saving anything in the end. An intelligent deconstruct + backup probably saves no net I/O over just re-copying the Fossil repo DB to the destination unless the destination is *much* slower than the machine being backed up.
(rsync was created for the common case where networks are much slower than the computers they connect. rsync within a single computer is generally no faster than cp -r, and sometimes slower, unless you take the mtime optimization mentioned above.)
The VM/ZFS + snapshots case has a similar argument against it: if you’re using snapshots to back up a Fossil repo, deconstruction isn’t helpful. The snapshot/CoW mechanism will only clone the changed disk blocks in the repo.
So, what problem are you solving? If it isn’t the slow-networks problem, I suspect you’ve got an instance of the premature optimization problem here. If you go ahead and implement it, measure before committing the change, and if you measure a meaningful difference, document the conditions to help guide expectations.
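If it helps, here is a rough Python sketch of the kind of measurement I mean. It does not call Fossil at all: write_always and write_if_changed are hypothetical stand-ins for the current behaviour and the proposed skip-if-present behaviour, and the artefacts are just random bytes.

```python
import os
import tempfile
import time

def write_always(path, data):
    # Stand-in for the current behaviour: unconditionally write the file.
    with open(path, "wb") as f:
        f.write(data)

def write_if_changed(path, data):
    # Stand-in for the proposal: skip the write when size and contents match.
    try:
        if os.path.getsize(path) == len(data):
            with open(path, "rb") as f:
                if f.read() == data:
                    return
    except FileNotFoundError:
        pass
    write_always(path, data)

def time_strategy(write_fn, dest, artifacts):
    # Time one full pass over all artefacts using the given strategy.
    start = time.perf_counter()
    for name, data in artifacts.items():
        write_fn(os.path.join(dest, name), data)
    return time.perf_counter() - start

def compare(n_files=200, size=4096):
    # Random payloads; real artefacts will differ in size and count.
    artifacts = {f"{i:04d}": os.urandom(size) for i in range(n_files)}
    results = {}
    with tempfile.TemporaryDirectory() as dest:
        time_strategy(write_always, dest, artifacts)  # populate the destination
        for fn in (write_always, write_if_changed):
            results[fn.__name__] = time_strategy(fn, dest, artifacts)
    return results

if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f} s")
```

A run like this against a local temporary directory mostly exercises the OS page cache, so treat the numbers as a sanity check only. The measurement that matters is one against the real backup destination, with a real repository.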