Discussion:
Backup traffic
(too old to reply)
Florian Balmer
2018-07-20 08:12:27 UTC
Permalink
As much as I like the simplicity of keeping the full repository
history in one single SQLite database, I see a minor downside.

There's a lot of backup traffic, if "non-contents changes" (such as
the admin and user logs, the login cookies, but also the "PRAGMA
optimize" information) are causing updates to the repository database,
marking it dirty for the next backup cycle.

(This was also the main reason for my complaining about the leftover
WAL and SHM files, recently, which accumulated in my backup logs.
Because in the end, WAL and SHM have to be kept together with the
SQLite database, as they might contain valuable information?)

From peeking at the Fossil timeline, my question is, will the new
"backoffice processing" cause even more frequent updates to the main
repository database, i.e. with the pids stored in the configuration
table, and updated after each web page display?

I have created a (surprisingly simple) patch to attach a separate
login cookie database (shared among all repositories in the same
directory), so that plain login and logout actions will no longer
cause repository database writes. With admin and user logs turned off,
and "PRAGMA optimize" removed, the repository database is only touched
if there's new contents, or new configuration settings.

What's your comments to this? Does anybody care about the repository
database, holding all your valuable contents, being modified
frequently with simple non-contents state information? Given the
reliability of SQLite, we probably shouldn't care. But what is a good
strategy to minimize backup traffic, if repository databases change
that often?

--Florian
Stephan Beal
2018-07-20 08:43:18 UTC
Permalink
Post by Florian Balmer
I have created a (surprisingly simple) patch to attach a separate
login cookie database (shared among all repositories in the same
directory), so that plain login and logout actions will no longer
cause repository database writes. With admin and user logs turned off,
and "PRAGMA optimize" removed, the repository database is only touched
if there's new contents, or new configuration settings.
What's your comments to this? Does anybody care about the repository
database, holding all your valuable contents, being modified
frequently with simple non-contents state information?
This behaviour doesn't bother me at all (in 10 years of using Fossil), but
if a patch for working around it is simple and non-intrusive, i would
consider it to be an interesting feature (with the caveat that it might
impact future changes).

i conceptually like the idea of the login cookie/timestamps being in a
separate db, but i'm not sure that i like it enough to justify the idea of
maintaining two files where one file is sufficient. That wouldn't really
impact me much, as i keep all of my hosted .fsl files in one directory, but
for a hoster like chisselapp, where each repo is (probably) in its own
directory, it doubles the number of fossil-related files. One _potential_
problem i see, but it's largely hypothetical, is that the login cookie db
could become a point of locking contention if is used together with many
very active .fsl files. That is probably only possible if several of those
repos are _extremely_ active, though.
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
Florian Balmer
2018-07-20 10:18:34 UTC
Permalink
.. i'm not sure that i like it enough to justify the idea of
maintaining two files where one file is sufficient.
The current implementation uses one single cookie database shared for
all repositories in the same directory, which can be excluded from
backups, and deleted (or, better, emptied by SQL script) to have
everybody logged off. But it's possible to modify the code to use one
single cookie database per system, or per repository.
.. the login cookie db could become a point of locking contention ...
Would WAL mode prevent this, mostly?

Below are my current "works for me on Windows and FreeBSD" patches. I
hope we still have the same definitions of "surprisingly simple" :)

The 2nd patch is only required with my previous patch to change ETag
generation to produce a "login-time-sensitive" hash.

I'm sorry e-mail processing may insert one or two unwanted line breaks
after column 72, as Fossil seems to use a source code line length
limit of 80 chars.

Some notes are included directly with the patch file headers, but I'd
like to emphasize that I haven't bothered making things work with
login groups, so far.

I'd be happy to do more work towards a more generalized "separate
(shared) database for non-repository contents, such as 'volatile' or
'system-specific' state information" approach, should this be
considered interesting for Fossil.

--Florian

===================== Patch for Fossil [e08f9c04] ======================

Baseline: Fossil [e08f9c0423]

Proof-of-concept to outsource login cookie information to a separate
database named "cookiestore", saved as "fossil-cookiestore.sqlite" in
the directory of the main repository database, and attached on demand.

The "cookiestore" database is left attached until shutdown; it may be
safer to have it detached explicitly as soon as possible.

HTTP cache handlers, and any other code relying on "user.cexpire", must
query "cookiestore.user.cexpire", instead.

Support to share login credentials across login groups is not
implemented by this patch; in fact, this may even break login group
features.

Admins changing their own password through the /setup_uedit page (not
through the /login page) are no longer logged out automatically.

To prevent writes to the main repository database caused by read-only
web server access, the "PRAGMA optimize" call needs to be removed, and
the "access_log" feature needs to be disabled (the logs could be
recorded to a plain text file, or outsourced to a separate database, if
required).

There may be more elegant SQL queries to work with the connected tables,
either by using JOINs, or FOREIGN KEYs (yet the latter have been
disabled by Fossil).

Windows batch file to dump or tweak the "cookiestore" database:

:: @echo off
:: setlocal
:: set c=fossil-cookiestore.sqlite
:: if not exist "%c%" goto:eof
:: (
:: echo ATTACH '%c%' AS 'c';
:: echo -- PRAGMA c.journal_mode;
:: echo -- PRAGMA c.page_size;
:: echo -- PRAGMA c.auto_vacuum;
:: echo SELECT * FROM c.user;
:: echo -- UPDATE c.user SET cexpire=0;
:: ) | fossil sql --no-repository

Index: src/login.c
==================================================================
--- src/login.c
+++ src/login.c
@@ -143,10 +143,53 @@
*/
static char *abbreviated_project_code(const char *zFullCode){
return mprintf("%.16s", zFullCode);
}

+/*
+** Attach the fossil-cookiestore.sqlite db to store login cookies.
+*/
+void attach_cookiestore()
+{
+ static int attached_cookiestore = 0;
+ char *zDBName;
+ Blob bDBFullName;
+ char *zProjCode;
+
+ if (attached_cookiestore) return;
+
+ zDBName = mprintf("%s/../fossil-cookiestore.sqlite",g.zRepositoryName);
+ file_canonical_name(zDBName,&bDBFullName,0);
+ sqlite3_free(zDBName);
+ db_attach(blob_str(&bDBFullName),"cookiestore");
+ blob_reset(&bDBFullName);
+
+ /* Initialize */
+ db_multi_exec(
+ "CREATE TABLE IF NOT EXISTS cookiestore.user( "
+ "repo TEXT, uid INTEGER, login TEXT, "
+ "cookie TEXT, ipaddr TEXT, cexpire DATETIME,"
+ "PRIMARY KEY (repo, uid), "
+ "UNIQUE (repo, uid, login) ON CONFLICT REPLACE );");
+ /* Clear expired cookies */
+ zProjCode = db_get("project-code",NULL);
+ db_multi_exec(
+ "DELETE FROM cookiestore.user WHERE "
+ "repo=%Q AND cexpire<julianday('now');",zProjCode);
+ free(zProjCode);
+ /* Shrink if not inside BEGIN..COMMIT */
+ if (sqlite3_get_autocommit(g.db)!=0)
+ db_multi_exec(
+ "PRAGMA cookiestore.journal_mode=DELETE;"
+ "PRAGMA cookiestore.page_size=512;"
+ "PRAGMA cookiestore.auto_vacuum=FULL;"
+ "VACUUM cookiestore;"
+ "PRAGMA cookiestore.journal_mode=WAL;");
+
+ attached_cookiestore = 1; /* Check with "PRAGMA database_list"? */
+}
+

/*
** Check to see if the anonymous login is valid. If it is valid, return
** the userid of the anonymous user.
**
@@ -266,30 +309,33 @@
int expires = atoi(zExpire)*3600;
char *zHash;
char *zCookie;
const char *zIpAddr = PD("REMOTE_ADDR","nil"); /* IP address of user */
char *zRemoteAddr = ipPrefix(zIpAddr); /* Abbreviated IP address */
+ char *zProjCode = db_get("project-code",NULL);

assert((zUsername && *zUsername) && (uid > 0) && "Invalid user data.");
+ attach_cookiestore();
zHash = db_text(0,
- "SELECT cookie FROM user"
- " WHERE uid=%d"
+ "SELECT cookie FROM cookiestore.user"
+ " WHERE repo=%Q AND uid=%d"
" AND ipaddr=%Q"
" AND cexpire>julianday('now')"
" AND length(cookie)>30",
- uid, zRemoteAddr);
+ zProjCode, uid, zRemoteAddr);
if( zHash==0 ) zHash = db_text(0, "SELECT hex(randomblob(25))");
zCookie = login_gen_user_cookie_value(zUsername, zHash);
cgi_set_cookie(zCookieName, zCookie, login_cookie_path(), expires);
record_login_attempt(zUsername, zIpAddr, 1);
db_multi_exec(
- "UPDATE user SET cookie=%Q, ipaddr=%Q, "
- " cexpire=julianday('now')+%d/86400.0 WHERE uid=%d",
- zHash, zRemoteAddr, expires, uid
- );
+ "INSERT OR REPLACE INTO cookiestore.user ("
+ " repo, uid, cookie, ipaddr, cexpire, login )"
+ " VALUES ( %Q, %d, %Q, %Q, julianday('now')+%d/86400.0, %Q );",
+ zProjCode, uid, zHash, zRemoteAddr, expires, zUsername );
free(zRemoteAddr);
free(zHash);
+ free(zProjCode);
if( zDest ){
*zDest = zCookie;
}else{
free(zCookie);
}
@@ -348,17 +394,19 @@
void login_clear_login_data(){
if(!g.userUid){
return;
}else{
const char *cookie = login_cookie_name();
+ char *zProjCode = db_get("project-code",NULL);
/* To logout, change the cookie value to an empty string */
cgi_set_cookie(cookie, "",
login_cookie_path(), -86400);
- db_multi_exec("UPDATE user SET cookie=NULL, ipaddr=NULL, "
- " cexpire=0 WHERE uid=%d"
- " AND login NOT IN ('anonymous','nobody',"
- " 'developer','reader')", g.userUid);
+ attach_cookiestore();
+ db_multi_exec(
+ "DELETE FROM cookiestore.user WHERE repo=%Q AND uid=%d;",
+ zProjCode, g.userUid);
+ free(zProjCode);
cgi_replace_parameter(cookie, NULL);
cgi_replace_parameter("anon", NULL);
}
}

@@ -841,21 +889,27 @@
const char *zLogin, /* User name */
const char *zCookie, /* Login cookie value */
const char *zRemoteAddr /* Abbreviated IP address for valid login */
){
int uid;
+ char *zProjCode;
if( login_is_special(zLogin) ) return 0;
+ attach_cookiestore();
+ zProjCode = db_get("project-code",NULL);
uid = db_int(0,
- "SELECT uid FROM user"
- " WHERE login=%Q"
+ "SELECT uid FROM cookiestore.user"
+ " WHERE ( repo=%Q AND login=%Q"
" AND ipaddr=%Q"
" AND cexpire>julianday('now')"
+ " AND constant_time_cmp(cookie,%Q)=0 )"
+ " AND EXISTS ( SELECT 1 FROM user"
+ " WHERE login=%Q"
" AND length(cap)>0"
- " AND length(pw)>0"
- " AND constant_time_cmp(cookie,%Q)=0",
- zLogin, zRemoteAddr, zCookie
+ " AND length(pw)>0);",
+ zProjCode, zLogin, zRemoteAddr, zCookie, zLogin
);
+ free(zProjCode);
return uid;
}

/*
** Return true if it is appropriate to redirect login requests to HTTPS.

===================== Patch for Fossil [e08f9c04] ======================

===================== Patch for Fossil [e08f9c04] ======================

Modify the HTTP cache handlers to query "cookiestore.user.cexpire"
instead of "user.cexpire".

Index: src/etag.c
==================================================================
--- src/etag.c
+++ src/etag.c
@@ -87,12 +87,18 @@
sqlite3_snprintf(sizeof(zBuf),zBuf,"mtime: %lld\n", mtime);
md5sum_step_text(zBuf, -1);

/* Include "user.cexpire" for logged-in users in the hash */
if ( (eFlags & ETAG_CEXP)!=0 && g.zLogin ){
- char *zCExp = db_text(0, "SELECT cexpire FROM user WHERE uid=%d",
- g.userUid);
+ char *zProjCode;
+ char *zCExp;
+ attach_cookiestore();
+ zProjCode = db_get("project-code",NULL);
+ zCExp = db_text(0,
+ "SELECT cexpire FROM cookiestore.user WHERE repo=%Q AND uid=%d",
+ zProjCode,g.userUid);
+ free(zProjCode);
if ( zCExp ){
md5sum_step_text("cexp: ", -1);
md5sum_step_text(zCExp, -1);
md5sum_step_text("\n", 1);
fossil_free(zCExp);

===================== Patch for Fossil [e08f9c04] ======================

===================== Patch for Fossil [e08f9c04] ======================

Baseline: Fossil [e08f9c0423]

Omit the "PRAGMA optimize" command prior to closing the repository
database, to prevent writes caused by read-only web server access.

Note: The sqlite_statN tables created by the "ANALYZE" command can be
managed using the --analyze and --deanalyze rebuild options.

Index: src/db.c
==================================================================
--- src/db.c
+++ src/db.c
@@ -1713,13 +1713,15 @@
while( db.pAllStmt ){
db_finalize(db.pAllStmt);
}
db_end_transaction(1);
pStmt = 0;
+#if 0
g.dbIgnoreErrors++; /* Stop "database locked" warnings from PRAGMA
optimize */
sqlite3_exec(g.db, "PRAGMA optimize", 0, 0, 0);
g.dbIgnoreErrors--;
+#endif
db_close_config();

/* If the localdb has a lot of unused free space,
** then VACUUM it as we shut down.
*/

===================== Patch for Fossil [e08f9c04] ======================
Richard Hipp
2018-07-20 11:04:56 UTC
Permalink
Post by Florian Balmer
But what is a good
strategy to minimize backup traffic, if repository databases change
that often?
Don't backup by copying the database file (which is not safe to do
anyhow, unless you shutdown Fossil during the copy, because otherwise
the database file might change while it is being copied, resulting in
a corrupt copy.). Instead, create your backups by cloning and
syncing. That is what DVCSes are designed to do.

The canonical Fossil self-hosting repository, and the SQLite source
repository that Fossil was created to manage, are both backed up this
way. There are three separate servers, each in separate
geographically distributed data centers, managed by two indenpendent
ISPs. These repos are all synced with one another automatically using
a cron-job.

One cool bonus feature of this approach is that the 'backups" are live
repositories, that can be directly accessed (as
https://www2.fossil-scm.org/ and
httpss://www3.fossil-scm.org/site.cgi) so it is easy to verify that
the backups are really happening and that they are correct.
--
D. Richard Hipp
***@sqlite.org
Warren Young
2018-07-20 13:15:27 UTC
Permalink
create your backups by cloning and syncing
…with Admin privileges. Otherwise, you won’t get important things like the user table. After the first clone, each backup should consist of both a “fossil sync” as well as a “fossil conf pull all”.

While you can recreate the user *list* from Fossil checkin contents and then recreate the users table and do whatever dance it is you do to pass out user passwords and get them changed to something secure, it’s better to just back all that up to begin with.
Florian Balmer
2018-07-20 12:36:40 UTC
Permalink
... create your backups by cloning and syncing ...
Thank you for your comments.

I see, this completely makes sense.

The process of "restoring" a repository from backup would include
copying database files, as syncing from backup → original might not
work if something's gone awry with the original. My main concern here
is that the cloned backup really includes everything from the original
(configuration, etc.). But hearing again (haven't you already outlined
the "cloning as backup strategy" recently, on this list?) that it
works for the experts should give me the faith to trust it.

Backing up "hot" databases is currently not a concern with my private,
traditional-style CGI-served repositories.

I would like to have some "rotating" backup, with a way to go back
certain steps with the complete repository, i.e. day-by-day, for up to
one week, so I could catch the "last good" if I notice something
wrong. Copying and replacing duplicate files with hard-links is an
extremely straight forward and space efficient process to achieve
this.

I will try the same with cloning new (some extra logic required) and
syncing existing repositories. But it may not be possible to detect
unchanged / duplicate repository database files, like this, as some
internally stored last sync or URL last access time stamps might
always result in a different database file, I assume.

--Florian
John P. Rouillard
2018-07-20 21:32:26 UTC
Permalink
Hi all:

In message <CAHgAu9J7-kp22SW_V8eMbLsuCMV_JaC3txj3F+***@mail.gmail.com>
,
Post by Florian Balmer
... create your backups by cloning and syncing ...
Thank you for your comments.
I see, this completely makes sense.
The process of "restoring" a repository from backup would include
copying database files, as syncing from backup → original might not
work if something's gone awry with the original. My main concern here
is that the cloned backup really includes everything from the original
(configuration, etc.). But hearing again (haven't you already outlined
the "cloning as backup strategy" recently, on this list?) that it
works for the experts should give me the faith to trust it.
Does a clone/sync grab passwords and user accounts as well? I thought those
weren't copied in the clone but were private to the repository.

--
-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.
Warren Young
2018-07-20 22:07:04 UTC
Permalink
Post by John P. Rouillard
Does a clone/sync grab passwords and user accounts as well? I thought those
weren't copied in the clone but were private to the repository.
You get a copy of the users table *if* you clone while logged in with a user with Setup privileges. It might also work with Admin, but I haven’t checked.

Otherwise, you’re right: Fossil strips the user table contents while cloning, on purpose.
Richard Hipp
2018-07-21 01:08:39 UTC
Permalink
Post by John P. Rouillard
Does a clone/sync grab passwords and user accounts as well? I thought those
weren't copied in the clone but were private to the repository.
If you have Admin or Setup privilege, you can do "fossil config sync user"
--
D. Richard Hipp
***@sqlite.org
Florian Balmer
2018-07-20 12:56:42 UTC
Permalink
Just one more thought:

Copying database files (vs. cloning) also includes any hand-made meta
changes, for example I remember adjusting the page size and journal
mode for older repositories, when the defaults for new Fossil
repositories were changed.

Of course `fossil rebuild --wal' after the sync can help with things
like these, but the database file checksum will definitely change and
trigger a complete backup, for the rebuilt repository.

I think I need to come away from my traditional "copy a file and get
exactly what you had" way of thinking ...

--Florian
Florian Balmer
2018-07-21 06:15:06 UTC
Permalink
Post by Warren Young
Quantify “a lot.”
I have some rarely committed-to but frequently web-accessed
repositories (with login), and I see daily backups of the modified
repository database, even though I'm sure I haven't committed
anything. It's like "hey, what's going on there with my babies?"
everytime, but I need to get used to it.
Post by Warren Young
I’d find out why the DB client is dying early and fix that, so that
the WAL ends up being deleted entirely upon a clean DB shutdown.
I think I found it:

https://www.mail-archive.com/fossil-***@lists.fossil-scm.org/msg27269.html

There's a call to fossil_exit() from within a
db_step()...db_finalize() block, and calling fossil_exit() only after
db_finalize() fixed it.

There's been some changes to fossil_exit() in the meantime, I'll check
these, and report back here.
Post by Warren Young
I find it odd that some people get so itchy over DB concurrency and
such with Fossil when highly active projects might have 40 or so
commits per day.
I'm not worried by this. Stephan just wondered if a shared cookie
database may be prone to locks contention, if I got him right.

I'd assume the main bottleneck to be high-frequency, read-only,
no-login web access (for a renowned project), in which case the cookie
database doesn't even need to be attached, and not the frequency of
commits.

--Florian
Florian Balmer
2018-07-21 12:09:22 UTC
Permalink
Post by Florian Balmer
There's been some changes to fossil_exit() in the meantime, I'll check
these, and report back here.
I was wrong, the changes were to fossil_fatal() and fossil_panic(),
and not to fossil_exit(). The current tip version of Fossil still
exhibits the behavior summarized here:

https://www.mail-archive.com/fossil-***@lists.fossil-scm.org/msg27269.html

--Florian
Richard Hipp
2018-07-21 16:50:43 UTC
Permalink
Post by Florian Balmer
The current tip version of Fossil still
I think this problem has been addressed in a more general way on the
latest trunk. Please let me know if you find otherwise.
--
D. Richard Hipp
***@sqlite.org
Warren Young
2018-07-20 13:11:50 UTC
Permalink
Post by Florian Balmer
There's a lot of backup traffic
Quantify “a lot.”

Do you have benchmark numbers showing that the current load is too high, and that your wished-for changes will reduce load to acceptable levels?
Post by Florian Balmer
(This was also the main reason for my complaining about the leftover
WAL and SHM files, recently, which accumulated in my backup logs.
Because in the end, WAL and SHM have to be kept together with the
SQLite database, as they might contain valuable information?)
The greater concern is that if these files are present after all clients have disconnected from the DB, it means you’ve got a DB client that is dying without closing the DB properly. That’s a problem in its own right, but it might also mean that the last transaction run might not have hit the journal before the program died, so it’s effectively rolled back upon replay of the journal.

Rather than worry over the resulting WAL size, I’d find out why the DB client is dying early and fix that, so that the WAL ends up being deleted entirely upon a clean DB shutdown.
Post by Florian Balmer
From peeking at the Fossil timeline, my question is, will the new
"backoffice processing" cause even more frequent updates to the main
repository database, i.e. with the pids stored in the configuration
table, and updated after each web page display?
How many checkins, syncs, etc. do you have per day?

I find it odd that some people get so itchy over DB concurrency and such with Fossil when highly active projects might have 40 or so commits per day. Amortized evenly over an 8-hour work day, that’s only one every 12 minutes. With real-world bursty traffic, there’s still an excellent chance that on every DB update, there is no write contention at all.
Post by Florian Balmer
Does anybody care about the repository
database, holding all your valuable contents, being modified
frequently with simple non-contents state information?
If I didn’t trust it to withstand that, I wouldn’t trust it to hold my unique work products, either.
Florian Balmer
2018-07-21 17:48:04 UTC
Permalink
Post by Richard Hipp
I think this problem has been addressed in a more general way
on the latest trunk. Please let me know if you find otherwise.
This works fine (tested only on Windows, so far), thank you very much!

--Florian

Loading...