Discussion:
Fossil server load control
Richard Hipp
2014-03-12 13:40:24 UTC
Permalink
A new feature was recently added to Fossil that allows it to deny expensive
requests (such as "blame" or "tarball" on a large repository) if the server
load average is too high. See
http://www.fossil-scm.org/fossil/doc/tip/www/server.wiki#loadmgmt for
further information.

This new feature was recently added to the server that self-hosts Fossil
and which also hosts a number of other projects including SQLite and
System.Data.SQLite.

I am pleased to announce that this new feature has passed its first test.

About three hours ago, a single user in Beijing began downloading multiple
copies of the same System.Data.SQLite tarball. As of this writing, he has
so far attempted to download that one tarball 11,784 times (at last count -
the download attempts are continuing...) Download requests are arriving at
a rate of about one per second, and each request takes about 3.1 seconds of
CPU time in order to compute the 80MB tarball. Since requests are arriving
faster than they can be serviced, this would formerly have resulted in
unlimited growth of the run-queue, essentially shutting down the server.
The effect is the same as having been "slashdotted". But thanks to the
recent enhancements, most of these massive download requests are rejected
with a "503" error, the server load average is staying below 4.0 at all
times, and the server thus continues to provide quick responses to the
other 10 requests/second that it normally receives.
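The gate described above can be sketched in a few lines of C. This is an illustrative sketch only, not Fossil's actual implementation; the function names and the idea of keying off the 1-minute load average via POSIX getloadavg() are assumptions for the example.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>

/* Return 1 if an expensive request (tarball, zip, blame, ...) should
   be rejected with HTTP 503; a limit of 0.0 disables the check. */
int should_reject_expensive(double load_now, double load_limit) {
    if (load_limit <= 0.0) return 0;
    return load_now > load_limit;
}

/* One-minute load average of the host, or -1.0 if unavailable. */
double current_load(void) {
    double avg[1];
    return getloadavg(avg, 1) == 1 ? avg[0] : -1.0;
}
```

The key design point is that only the expensive endpoints consult the gate; cheap page views are served regardless of load.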

If you are running your own publicly accessible Fossil server, you might
want to consider updating to the latest Fossil trunk and activating the
load-average limiter too. (Note to Tclers: I have already done this on
core.tcl.tk.)

And if you have alternative suggestions about how to keep a light-weight
host running smoothly under a massive Fossil request load, please post
follow-up comments.
--
D. Richard Hipp
***@sqlite.org
Andreas Kupries
2014-03-12 17:13:21 UTC
Permalink
Post by Richard Hipp
A new feature was recently added to Fossil that allows it to deny expensive
requests (such as "blame" or "tarball" on a large repository) if the server
load average is too high. See
http://www.fossil-scm.org/fossil/doc/tip/www/server.wiki#loadmgmt for
further information.
Interesting.
Post by Richard Hipp
I am pleased to announce that this new feature has passed its first test.
About three hours ago, a single user in Beijing began downloading multiple
copies of the same System.Data.SQLite tarball. As of this writing, he has
so far attempted to download that one tarball 11,784 times (at last count -
the download attempts are continuing...) Download requests are arriving at
a rate of about one per second, and each request takes about 3.1 seconds of
CPU time in order to compute the 80MB tarball.
And if you have alternative suggestions about how to keep a light-weight
host running smoothly under a massive Fossil request load, please post
follow-up comments.
How sensible do you think it would be to have a (limited-size)
(in-memory|disk) cache to hold the most recently requested tarballs?
That way a high-demand tarball, etc. would be computed only once and
then served statically from the cache.

Note that I actually see this as a possible complement to the load mgmt feature.
The cache would help if demand is high for a small number of
revisions, whereas load mgmt would kick in and restrict load if the
access pattern of revisions is sufficiently random/spread out to
negate the cache (i.e. cause it to thrash).

Side note: While the same benefits could be had by putting a regular
web cache (e.g. Squid or the like) in front of the fossil server, that
would require more work to set up and administer, and might be a
problem for the truly dynamic parts of the fossil web UI. An
integrated cache just for the assets which are expensive to compute
and yet (essentially) static does not have these issues.

I mentioned in-memory and disk caches ... I can see a two-level scheme
here: a smaller in-memory LRU cache for the really high-demand pieces,
and a larger disk cache for the things not so much in demand at the
moment, but possibly in the future. The disk cache could actually be
much larger (disks are large and cheap these days); this would help
with random-access attacks, as they would become asymptotically more
difficult as the disk cache extends its net of quickly served assets
over time.
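A rough sketch of the directory-with-files variant of this idea, keyed by artifact hash. All names and the on-disk layout here are hypothetical, not Fossil or teapot code:

```c
#include <stdio.h>
#include <string.h>

/* Build the on-disk path for a cached tarball, keyed by its
   artifact hash (hypothetical layout: <dir>/<hash>.tar.gz). */
void cache_path(const char *dir, const char *hash,
                char *out, size_t n) {
    snprintf(out, n, "%s/%s.tar.gz", dir, hash);
}

/* If the tarball is already cached, open it for reading and
   return 1; on a miss the caller generates the tarball once,
   stores it under cache_path(), and serves it from there after. */
int cache_hit(const char *dir, const char *hash, FILE **fp) {
    char path[1024];
    cache_path(dir, hash, path, sizeof path);
    *fp = fopen(path, "rb");
    return *fp != NULL;
}
```

Keying by artifact hash makes invalidation trivial: the content for a given hash never changes, so entries only ever need to be evicted for space.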
--
Andreas Kupries
Senior Tcl Developer
Code to Cloud: Smarter, Safer, Faster(tm)
F: 778.786.1133
***@activestate.com
http://www.activestate.com
Learn about Stackato for Private PaaS: http://www.activestate.com/stackato

EuroTcl'2014, July 12-13, Munich, GER
Richard Hipp
2014-03-12 17:25:01 UTC
Permalink
On Wed, Mar 12, 2014 at 1:13 PM, Andreas Kupries
Post by Andreas Kupries
Post by Richard Hipp
And if you have alternative suggestions about how to keep a light-weight
host running smoothly under a massive Fossil request load, please post
follow-up comments.
How sensible do you think it would be to have a (limited-size)
(in-memory|disk) cache to hold the most recently requested tarballs?
That way a high-demand tarball, etc. would be computed only once and
then served statically from the cache.
It's on my to-do list, actually. The idea is to have a separate database
that holds the cache. And yes it is complementary to the load management
feature.
Post by Andreas Kupries
Side note: While the same benefits could be had by putting a regular
web cache in front of the fossil server, ....
No they can't actually, at least not by any technology I'm aware of. The
problem is that these requests must be authenticated. Downloads might
only be authorized for certain users. If an authorized user does a download,
and squid caches it, some other unauthorized user might be able to obtain
the download from cache.

Even if downloads are currently authorized for anybody (which is the common
case, at least on public repos), I don't think you want them being cached,
since to do so would mean that turning off public downloads would be
ineffective until the caches all expired.

Post by Andreas Kupries
I mentioned in-memory and disk ... I can see that a two-level scheme
here ... A smaller in-memory cache for the really high-demand pieces
with LRU, and a larger disk cache for the things not so much in-demand
at the moment, but possibly in the future. The disk cache could
actually be much larger (disks are large and cheap these days), this
would help with random access attacks (as they would become
asymptotically more difficult as the disk cache over time extends its
net of quickly served assets).
The current Fossil implementation runs a separate process for each HTTP
request. So an in-memory cache wouldn't be helpful. It has to be
disk-based.
--
D. Richard Hipp
***@sqlite.org
Ramon Ribó
2014-03-12 17:31:29 UTC
Permalink
The current Fossil implementation runs a separate process for each HTTP
request. So an in-memory cache wouldn't be helpful. It has to be disk-
based.
Does not FastCGI do exactly the opposite?

RR
_______________________________________________
fossil-users mailing list
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Richard Hipp
2014-03-12 17:34:27 UTC
Permalink
Post by Richard Hipp
The current Fossil implementation runs a separate process for each HTTP
request. So an in-memory cache wouldn't be helpful. It has to be disk-
based.
Does not FastCGI do exactly the opposite?
Fossil doesn't support FastCGI, only SCGI. And even with SCGI, Fossil
forks a new process to handle each request.
--
D. Richard Hipp
***@sqlite.org
Stephan Beal
2014-03-12 17:53:35 UTC
Permalink
Post by Richard Hipp
The current Fossil implementation runs a separate process for each HTTP
request. So an in-memory cache wouldn't be helpful. It has to be disk-
based.
Does not FastCGI do exactly the opposite?
FastCGI requires that there be some sort of state object which can be
reset between calls, with that state fed into each child. Fossil
doesn't have such a state object (it has one, but not one which can
simply be reset/reused), so FastCGI can't really do its magic with
fossil. libfossil (currently under construction and moving along
nicely) provides such a construct.
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
Lluís Batlle i Rossell
2014-03-12 17:57:29 UTC
Permalink
Post by Richard Hipp

The current Fossil implementation runs a separate process for each HTTP
request. So an in-memory cache wouldn't be helpful. It has to be disk-
based.
Does not FastCGI do exactly the opposite?
The current implementation simply uses a fork() per request, not
fork+exec, which is the common (CGI) case that FastCGI is meant to improve.

AFAIU, using a separate process eases heap memory handling.
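That fork()-per-request pattern, and why it eases heap handling, can be illustrated with a tiny POSIX sketch. This is a generic illustration, not Fossil's actual server loop; the handler and helper names are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/wait.h>
#include <unistd.h>

/* Example handler: a real one would read an HTTP request from fd. */
void noop_handler(int fd) { (void)fd; }

/* Run one request handler in a freshly forked child and reap it.
   Every heap allocation the handler makes vanishes with the child,
   which is why no per-request free() bookkeeping is needed.
   Returns the child's exit status, or -1 on failure. */
int run_in_child(void (*handler)(int), int conn_fd) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        handler(conn_fd);
        _exit(0);          /* child ends; its heap goes with it */
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real server loop would accept() in the parent and pass each connection fd through a helper like this, letting the kernel reclaim all child memory on exit.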

Andreas Kupries
2014-03-12 17:36:26 UTC
Permalink
Post by Richard Hipp
Post by Andreas Kupries
Post by Richard Hipp
And if you have alternative suggestions about how to keep a light-weight
host running smoothly under a massive Fossil request load, please post
follow-up comments.
How sensible do you think it would be to have a (limited-size)
(in-memory|disk) cache to hold the most recently requested tarballs?
That way a high-demand tarball, etc. would be computed only once and
then served statically from the cache.
It's on my to-do list, actually. The idea is to have a separate database
that holds the cache.
Single-file strikes again ... while I was thinking of a regular
directory with files. But that is an implementation detail. A database
might be a bit easier to manage (i.e. set up / remove).

The teapot server [1] has a disk cache, but not as database, plain
directory with files.
[1] http://docs.activestate.com/activetcl/8.5/tpm/tpm/files/CTP_teapot.html
Post by Richard Hipp
And yes it is complementary to the load management
feature.
Post by Andreas Kupries
Side note: While the same benefits could be had by putting a regular
web cache in front of the fossil server, ....
No they can't actually, at least not by any technology I'm aware of. The
problem is that these request must be authenticated.
Ack. Forgot the permission issue. Yes, we do not want authenticated
downloads to end up in a public cache.
Post by Richard Hipp
Post by Andreas Kupries
I mentioned in-memory and disk ... I can see that a two-level scheme
The current Fossil implementation runs a separate process for each HTTP
request. So an in-memory cache wouldn't be helpful. It has to be
disk-based.
Right. Getting an in-memory cache would require redesigning the web
server parts themselves to be threaded or some such. ... Could be
done, but more work. ... Maybe Stephan can prototype that design in
his libfossil ;)
--
Andreas Kupries
Senior Tcl Developer
Code to Cloud: Smarter, Safer, Faster(tm)
F: 778.786.1133
***@activestate.com
http://www.activestate.com
Learn about Stackato for Private PaaS: http://www.activestate.com/stackato

EuroTcl'2014, July 12-13, Munich, GER
Stephan Beal
2014-03-12 17:26:25 UTC
Permalink
On Wed, Mar 12, 2014 at 6:13 PM, Andreas Kupries
Post by Andreas Kupries
How sensible do you think it would be to have a (limited-size)
(in-memory|disk) cache to hold the most recently requested tarballs?
That way a high-demand tarball, etc. would be computed only once and
then served statically from the cache.
FWIW: i was scratching down ideas for this very feature today for the
libfossil CGI demos, because i don't like the memory cost of generating
ZIP files from script code. Caching the (say) 10 most recent ZIPs could
alleviate some of my load concerns. It need not be a synchable table,
nor one which survives a rebuild.

Post by Andreas Kupries
Note that I actually see this as a possible complement to the load mgmt
feature.
The cache would help if demand is high for a small number of
revisions, whereas load mgmt would kick in and restrict load if the
access pattern of revisions is sufficiently random/spread out to
negate the cache (i.e. cause it to thrash).
+1
Post by Andreas Kupries
would require more work to set up and admin. And might be a problem
for the truly dynamic parts of the fossil web ui. An integrated cache
just for the assets which are expensive to compute and yet
(essentially) static does not have these issues.
In my experience, most proxies won't cache for requests which have URL
parameters. Whether or not that's generally true, i can't say. For static
content (lots of what fossil serves is static), the URLs can/should be
written as /path/arg1/arg2, rather than /path?arg1=...&arg2=..., to make
them "potentially more cacheable".
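As a purely hypothetical illustration of that path-style rewrite (the page name and parameters below are invented, not Fossil's actual URL scheme):

```c
#include <stdio.h>
#include <string.h>

/* Rewrite a /tarball?uuid=Y&name=X style request into the
   "potentially more cacheable" path form /tarball/Y/X. */
void path_style_url(const char *page, const char *uuid,
                    const char *name, char *out, size_t n) {
    snprintf(out, n, "/%s/%s/%s", page, uuid, name);
}
```

Because some proxies treat anything with a query string as uncacheable, encoding the arguments as path segments gives them at least a chance of being cached.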
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
Richard Hipp
2014-03-12 17:32:23 UTC
Permalink
Post by Stephan Beal
In my experience, most proxies won't cache for requests which have URL
parameters. Whether or not that's generally true, i can't say. For static
content (lots of what fossil serves is static), the URLs can/should be
written as /path/arg1/arg2, rather than /path?arg1=...&arg2=..., to make
them "potentially more cacheable".
With a few carefully chosen exceptions, Fossil always sets "Cache-control:
no-cache" in the header of its replies, due in large part to those pesky
authentication cookies.
--
D. Richard Hipp
***@sqlite.org