Thus said Matt Welland on Wed, 16 Apr 2014 09:01:28 -0700:
> fossil commit cfgdat tests -m "Added another drc test"
> Autosync: ssh://host/path/project.fossil
> Round-trips: 1 Artifacts sent: 0 received: 0
> Error: Database error: database is locked: {UPDATE event SET mtime=(SELECT
> m1 FROM time_fudge WHERE mid=objid) WHERE objid IN (SELECT mid FROM
> time_fudge);}
> Round-trips: 1 Artifacts sent: 0 received: 0
> Pull finished with 360 bytes sent, 280 bytes received
> Autosync failed
> continue in spite of sync failure (y/N)? n
I've done a fair bit of profiling with this, and this seems to happen
primarily with the test-http command (the default sync method for SSH
clients). I don't know what the history is behind the test-http command,
but my guess is that it was really not intended to be a heavily used
sync method for shared repositories. I'm not really sure why this
particular database locking error happens so frequently with test-http,
but not at all with http. This is happening in manifest_crosslink_end()
when it's trying to fudge times.
If I force my SSH command to use http instead of test-http, this error
disappears entirely. I then see only an occasional locking error due
to multiple committers when I commit large change sets (like a
10,000 line, 840K change set), which is the same behavior I get with the
standard HTTP/HTTPS transports in my environment (slow disk/cpu/network).
Are all your users using SSH to access shared repositories? Or do you
just have a few users using SSH?
Perhaps it would be better to switch to using SSH keys and forced
commands to cause fossil to use http instead of test-http? This does
require a bit more setup. For example, each .fossil has to have the
remote_user_ok configuration enabled so that you can set up the REMOTE_USER
environment variable for each user. This is because there is currently no
mechanism to use Fossil authentication while using SSH as the transport,
and fossil http requires an authenticated user if you want to commit.
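As a rough sketch of the forced-command setup described above, something
along these lines could sit behind an authorized_keys command= entry. The
serve_repo function name, the FOSSIL_BIN override, the wrapper path, and the
repository path are assumptions made up for illustration; only fossil http,
REMOTE_USER, and remote_user_ok come from the discussion itself.

```shell
# Hypothetical forced-command helper (names and paths are illustrative).
#
# Assumed authorized_keys entry, one per user key:
#   command="/usr/local/bin/fossil-wrapper amb /path/project.fossil" ssh-rsa AAAA... amb
# where fossil-wrapper is a script containing this function followed by
#   serve_repo "$@"
#
# serve_repo USER REPO: answer one sync request on stdin/stdout with
# "fossil http", telling Fossil who the SSH-authenticated user is.
serve_repo() {
  user=$1
  repo=$2
  # REMOTE_USER is only trusted when remote_user_ok is enabled in REPO.
  # FOSSIL_BIN is an assumed override so the binary can be swapped out.
  REMOTE_USER="$user" "${FOSSIL_BIN:-fossil}" http "$repo"
}
```

Because the command is forced on the server side, whatever command the
client asks for (test-http by default) is ignored and fossil http runs
instead.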
I suppose an alternative configuration would be to give nobody/anonymous
users the ability to write, which may be acceptable if SSH authentication
is the only allowed sync method. The only drawback that I see
there is that the rcvfrom information would show up as having come from
nobody, e.g.,
User: amb
Received From: nobody @ 192.168.1.9 on 2014-04-20 04:33:35
I think one thing I've learned from all this is that forks and database
locking errors occur much more frequently on slow hardware and with large
change sets. Also, I seem to be able to cause forking that goes
undetected (without a warning). All of this probably explains why the
problem is difficult to reproduce except on older hardware.
As for making sync try harder, we could certainly just loop some number of
times if we think it is worth it (I'm not sure how feasible it will be to
make it silent, or whether there will be other side effects). Here I have it
loop up to 10 times before bailing. As you can see, it failed once, but
then succeeded the second time and received updates that indicate it is
out of sync:
$ fossil ci -m synctest2
Autosync: ssh://fossil/tmp/test.fossil
Round-trips: 1 Artifacts sent: 0 received: 0
Error: Database error: database is locked: {UPDATE event SET mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN (SELECT mid FROM time_fudge);}
Round-trips: 1 Artifacts sent: 0 received: 0
Pull finished with 314 bytes sent, 280 bytes received
Autosync failed
Autosync: ssh://fossil/tmp/test.fossil
Round-trips: 3 Artifacts sent: 0 received: 102
Pull finished with 3451 bytes sent, 170661 bytes received
would fork. "update" first or use --allow-fork.
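The retry idea can also be sketched outside of Fossil as a plain shell
wrapper. This is not the change that produced the output above, which loops
inside fossil itself. The retry function name, the RETRY_DELAY variable, and
the one-second default pause are assumptions for illustration.

```shell
# retry MAX CMD [ARGS...]
# Run CMD until it succeeds or MAX attempts have been used.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  max=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max failed" >&2
    attempt=$((attempt + 1))
    # Pause between attempts (assumed default: 1 second), but not after the last.
    if [ "$attempt" -le "$max" ]; then
      sleep "${RETRY_DELAY:-1}"
    fi
  done
  return 1
}

# Example: try a sync up to 10 times before giving up.
#   retry 10 fossil sync
```

A wrapper like this retries on any failure, not just a locking error, so it
is a blunt tool. It also does nothing about the fork warning, which still
needs an update before the commit can go through.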
There was also a sync failure on the first committer after it
successfully committed the artifacts:
$ fossil ci -m synctest1
Autosync: ssh://fossil/tmp/test.fossil
Round-trips: 1 Artifacts sent: 0 received: 0
Pull finished with 316 bytes sent, 229 bytes received
New_Version: 04e7debfa4f29ee3c1635007e3f380f0a0630366
Autosync: ssh://fossil/tmp/test.fossil
Round-trips: 3 Artifacts sent: 101 received: 0
Error: Database error: database is locked: {UPDATE event SET mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN (SELECT mid FROM time_fudge);}
Round-trips: 3 Artifacts sent: 101 received: 0
Sync finished with 179617 bytes sent, 3234 bytes received
Autosync failed
Autosync: ssh://fossil/tmp/test.fossil
Round-trips: 1 Artifacts sent: 0 received: 1
Sync finished with 4916 bytes sent, 2724 bytes received
Thoughts?
Andy
--
TAI64 timestamp: 40000000535358db