Discussion:
A fossil library
Sam Putman
2018-06-15 20:25:57 UTC
First post. Hi!

I've been lurking along, following the discussion here.

Common thread is a desire for 'more fossil'. I'm in this camp myself.

But I see the attraction of the core fossil application. It works perfectly
for a fairly close-knit community, and it follows a philosophy that's been
working for decades now. One that is, if anything, more effective as it
becomes less fashionable.

Let me make a suggestion: what we need is not more fossil, it is less
fossil.

I wrote to Dr. Richard Hipp about this earlier; his response was positive
enough that I felt encouraged to bring it to the community.

For my own projects, I've switched to fossil. It's the obvious choice, since
we're already using SQLite in preference to the old pile o' files.

The fossil codebase has all the core algorithms for storing deltas in a
single database file, merging, deduplication, Merkle hashing, key signature
management, extensible metadata... I don't have to sell you on the virtues
of this VCS!

I would benefit greatly from being able to use this excellent collection of
SQLite best practices and algorithms the same way I use SQLite: as a static
or linked library, one which can be wrapped in various FFIs for VMs, or
linked directly from a systems language.

In my own case I'd call this from LuaJIT, but what matters is that everyone
can be happy. Fossil proper can stay attuned to the SQLite/Tcl/Tk alliance,
as it should, and adventurers could wire it to mailing lists, wikis, and
forums.

I think this would help fossil really stand out. Just the fact that here we
have tools to read and write git to a single-file database, that's huge!

Tools for revision control would be a real boon to applications already
using SQLite as an application file format (AFF). I could go on.

I always feel some trepidation about what amounts to asking other people for
free work. I feel this refactoring could benefit fossil as well as my own
software. I'd be part of such an effort as soon as anything halfway
plausible was compiling, if invited.

Sincerely,

-Sam Putman
--
Special Circumstances
Stephan Beal
2018-06-15 22:46:28 UTC
i will write a longer response when i'm back on the PC, but short version:

- refactoring to a lib is a huge effort.

- up until late 2014 i was actively working on a library port and had most
of the core features working.

- RSI struck me down and has since effectively removed me from the
programming world, so libfossil has remained unmaintained and has not been
compatible with fossil since the addition of non-SHA1 hashes (and i have no
estimate for what it would take to bring it up to date).

More details upcoming about that first point in the morning.

----- stephan
Sent from a mobile device, possibly left-handed from bed. Please excuse
brevity, typos, and top-posting.
_______________________________________________
fossil-users mailing list
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Warren Young
2018-06-16 01:29:25 UTC
Post by Stephan Beal
- refactoring to a lib is a huge effort.
That’s the real trick, I think: the library needs to be part of Fossil proper, so that it stays up to date.

That in turn means finding and maintaining a strong boundary between whatever your conception of “less Fossil” is and “whole Fossil.”

If such a clear boundary doesn’t already exist, refactoring Fossil until such a boundary appears will be difficult, but perhaps worthwhile on its own merits, even if your liblessfossil is never used.

More than once, people have proposed applications of Fossil that could use this liblessfossil, where arm-twisting whole Fossil into the role was not sensible.
Stephan Beal
2018-06-16 10:07:28 UTC
Post by Stephan Beal
- refactoring to a lib is a huge effort.
...
More details upcoming about that first point in the morning.
So...

http://fossil.wanderinghorse.net/r/libfossil

that's now effectively defunct, though, as i've been on medical leave for
RSI most of the past 3.5 years and am currently on track to be forced
into early retirement within the next couple of months.

Several aspects of fossil make it very tedious (but not difficult, per se)
to port to a library:

1) it uses a great deal of global state. That's simple enough to factor
into a Context object, but...

2) it relies on a fail-fast-and-fail-loud allocator. Any allocation error
will immediately (intentionally) crash the app. While that saves literally
half (sometimes more) of code/error checking any place where memory is
allocated (that's a lot of places), that pattern is unusable for libraries.
Granted, allocation errors are rare, but every single C call which
allocates has to check for failure or risk Undefined Behaviour. To simplify
the vast majority of the implementation, Fossil does this checking in a
single place and abort()s the app if an allocation fails.

3) Fossil effectively uses exit() to handle just about any type of
non-allocation error. i.e. there's little library-friendly error handling
in fossil.

4) Last but not least: Fossil implements a great many intricate algorithms
which, if not ported 100% perfectly, could lead to all sorts of Grief, some
of it difficult to track down. Such ports typically require 2x as much
code, sometimes more, because of the addition of error checking and
handling (as opposed to using abort() and exit()).
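A minimal sketch of the contrast in point 2, using hypothetical names (`ff_malloc`, `lib_buf`, `lib_buf_append`) rather than Fossil's or libfossil's actual functions. The fail-fast wrapper checks once and aborts, so its callers never check anything; the library-style buffer reports failure to its caller, and every call site then needs its own check, which is where the extra code comes from.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical application-style allocator: checks in ONE place and
   aborts the whole process on failure, so callers never check. */
static void *ff_malloc(size_t n) {
  void *p = malloc(n ? n : 1);
  if (p == NULL) {
    fprintf(stderr, "out of memory\n");
    abort();
  }
  return p;
}

/* Hypothetical library-style growable buffer. */
typedef struct {
  char *mem;
  size_t used;
  size_t cap;
} lib_buf;

enum { LIB_OK = 0, LIB_ERR_OOM = 1 };

/* Ensures room for n more bytes. On failure the buffer is unchanged
   and still valid, and the caller decides what to do. */
static int lib_buf_reserve(lib_buf *b, size_t n) {
  assert(b != NULL);
  if (b->cap - b->used >= n) return LIB_OK;
  size_t want = b->used + n;
  void *p = realloc(b->mem, want);
  if (p == NULL) return LIB_ERR_OOM;
  b->mem = p;
  b->cap = want;
  return LIB_OK;
}

/* Appends bytes, propagating any allocation failure upward. */
static int lib_buf_append(lib_buf *b, const char *src, size_t n) {
  if (n == 0) return LIB_OK;
  int rc = lib_buf_reserve(b, n);
  if (rc != LIB_OK) return rc; /* the check every call site must repeat */
  memcpy(b->mem + b->used, src, n);
  b->used += n;
  return LIB_OK;
}
```

An application built on such a library is still free to wrap the checked calls in its own fail-fast layer, which is the split described later in the thread.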

libfossil had essentially all of the core-most functionality running
(documented, too) when RSI knocked me out, and was mainly missing
network-related functionality. It took, according to the timeline, about
16 months to get it to that point (noting that i also worked on other
projects at the time, so that's not "16 months of effort"). My plan was to
pick it back up when my RSI problems passed, but whether they will is now
an open question. In the meantime, the SHA-related changes have made
libfossil incompatible with fossil, meaning that it would be much more
difficult to get it back up and running.

i would be thrilled to see someone implement a library for fossil, but
anyone doing so needs to understand, in advance, that it's a large
undertaking.
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
Stephan Beal
2018-06-16 10:17:59 UTC
Post by Stephan Beal
libfossil had essentially all of the core-most functionality
running (documented, too)
http://fossil.wanderinghorse.net/repos/libfossil/doxygen/

Ah, those were the days... (i actually _miss_ documenting software.)
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
Sam Putman
2018-06-16 21:44:43 UTC
Post by Stephan Beal
Post by Stephan Beal
- refactoring to a lib is a huge effort.
...
More details upcoming about that first point in the morning.
So...
http://fossil.wanderinghorse.net/r/libfossil
that's now effectively defunct, though, as i've been on medical leave for
RSI most of the past 3.5 years and am currently on track to be forced
into early retirement within the next couple of months.
Stephan, thank you for your thoughtful reply, and your work as well. This
is encouraging.

I have been on the bench for medical reasons myself. You have my sympathy
for your condition, and I hope for the best outcome in treatment.

I'll be reading through the codebase and documentation, some initial
Post by Stephan Beal
Several aspects of fossil make it very tedious (but not difficult, per se)
1) it uses a great deal of global state. That's simple enough to factor
into a Context object, but...
An incremental refactoring of this into something more modular would
be a boon to maintenance and testing. Seems like a sleeping dog we can let
lie for now.
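As a minimal sketch of what factoring global state into a context object can look like (every name here is hypothetical, not one of Fossil's actual globals): state held in file-scope variables moves into a struct that each call receives explicitly, so independent contexts can coexist in one process and be tested in isolation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: application-style global state (hypothetical). */
static char g_repo_name[256];

static void app_open_repo(const char *name) {
  strncpy(g_repo_name, name, sizeof g_repo_name - 1);
  g_repo_name[sizeof g_repo_name - 1] = '\0';
}

/* After: the same state lives in a context passed to every call. */
typedef struct {
  char repo_name[256];
} repo_ctx;

static repo_ctx *ctx_new(void) {
  return calloc(1, sizeof(repo_ctx)); /* NULL on allocation failure */
}

static void ctx_open_repo(repo_ctx *ctx, const char *name) {
  assert(ctx != NULL && name != NULL);
  strncpy(ctx->repo_name, name, sizeof ctx->repo_name - 1);
  ctx->repo_name[sizeof ctx->repo_name - 1] = '\0';
}

static void ctx_free(repo_ctx *ctx) { free(ctx); }
```

Holding two different repositories side by side is exactly what the global-variable version cannot do.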
Post by Stephan Beal
2) it relies on a fail-fast-and-fail-loud allocator. Any allocation error
will immediately (intentionally) crash the app. While that saves literally
half (sometimes more) of code/error checking any place where memory is
allocated (that's a lot of places), that pattern is unusable for libraries.
Granted, allocation errors are rare, but every single C call which
allocates has to check for failure or risk Undefined Behaviour. To simplify
the vast majority of the implementation, Fossil does this checking in a
single place and abort()s the app if an allocation fails.
Ok, this doesn't sound /ideal/ granted, but maybe not so bad either.

I would likely prefer as much allocation as possible during load. An
allocation error during this stage is a show-stopper.

If it can be refactored into a goto cleanup that doesn't bring the whole
tower down, so it just returns a non-zero number unless setup succeeds,
that's plenty to get started.
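The "goto cleanup" idiom can be sketched as follows; `setup_state` and `setup_sketch` are hypothetical stand-ins for whatever a library's load step allocates. Each failure jumps to a single label that releases whatever was already acquired and returns non-zero, instead of aborting the host process.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  char *a;
  char *b;
  char *c;
} setup_state;

/* Returns 0 on success. On failure returns non-zero and leaves every
   pointer in *st NULL: nothing leaked, nothing half-initialized. */
static int setup_sketch(setup_state *st, size_t na, size_t nb, size_t nc) {
  assert(st != NULL);
  st->a = st->b = st->c = NULL;

  st->a = malloc(na);
  if (st->a == NULL) goto cleanup;
  st->b = malloc(nb);
  if (st->b == NULL) goto cleanup;
  st->c = malloc(nc);
  if (st->c == NULL) goto cleanup;
  return 0; /* success: the caller now owns a, b and c */

cleanup:
  free(st->a); /* free(NULL) is a no-op, so order does not matter */
  free(st->b);
  free(st->c);
  st->a = st->b = st->c = NULL;
  return 1;
}
```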
Post by Stephan Beal
3) Fossil effectively uses exit() to handle just about any type of
non-allocation error. i.e. there's little library-friendly error handling
in fossil.
I guess this bullet depends on how much error handling is possible at those
points, and how badly failures would bork the global state.

If the answer is "none" and "not a bit" then turning some of these exit()s
into a library error would be plenty.

I suspect it's more like "usually none" and "we don't know because fossil
just exits on error".
Post by Stephan Beal
4) Last but not least: Fossil implements a great many intricate algorithms
which, if not ported 100% perfectly, could lead to all sorts of Grief, some
of it difficult to track down. Such ports typically require 2x as much
code, sometimes more, because of the addition of error checking and
handling (as opposed to using abort() and exit()).
libfossil had essentially all of the core-most functionality running (documented,
too) when RSI knocked me out, and was mainly missing network-related
functionality. It took, according to the timeline, about 16 months to get
it to that point (noting that i also worked on other projects at the time,
so that's not "16 months of effort"). My plan was to pick it back up when
my RSI problems passed, but whether they will is now an open question. In
the mean time, the SHA-related changes have made libfossil incompatible
with fossil, meaning that it would be much more difficult to get it back up
and running.
The networking-related functionality is the part I personally don't need;
we're using the luv bindings to libuv and I'm quite happy with that.

The way I explained my desire in that initial email is "everything you
can't remove without breaking fossil". From what I gather, there are some
tasks which rely on the admin interface, and those SQL queries might need
to end up in some kind of controller module to make a durable API.

This also means you might be closer to done than you think!

I concur with Warren that the effort of a libfossil is best justified if it
becomes the core of fossil proper.

Keeping a libfossil in sync with an upstream fossil poses risks in both
directions. There are merges from fossil core, which is an arbitrary amount
of ongoing work. There's also the real possibility that libfossil would
start innovating in ways that would cause compatibility drift.

Tasks like isolating those intricate core algorithms into well-documented
modules, where errors and edge cases are handled where they occur, can
really pay off. Merging and patch theory are areas where real conceptual
leaps are still happening.

This is one of the shortcomings of git: aspects of the data structure are
baked into the filesystem-database hybrid, and aspects of the merge
algorithm leak into the data structure. Working with a dynamic, relational,
single-file database can only improve upon this.

The one area of fossil I've read enough about to feel comfortable in my
understanding is the file format itself. There's an edge to the
documentation and I'm kinda peering over that edge slightly.

It has the necessary sort of forward compatibility baked into it. Following
a SQLite model of using merge_v1, merge_v2, and so on could really come in
handy.
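SQLite keeps old entry points stable and adds suffixed ones when an interface has to change (for example `sqlite3_prepare` and `sqlite3_prepare_v2`, or `sqlite3_open` and `sqlite3_open_v2`). A minimal sketch of the same convention applied to a hypothetical `merge_v1`/`merge_v2` pair follows; neither function exists in Fossil, and the "merge" is a toy three-way merge of a single value. The old function keeps its signature and behavior and becomes a thin wrapper over the new one.

```c
#include <assert.h>
#include <string.h>

enum { MERGE_OK = 0, MERGE_CONFLICT = 1 };
enum { MERGE_PREFER_OURS = 0x01 }; /* hypothetical flag added in v2 */

/* Toy three-way merge of one value; v2 adds a flags argument. */
static int merge_v2(const char *base, const char *ours, const char *theirs,
                    unsigned flags, const char **out) {
  assert(base && ours && theirs && out);
  if (strcmp(ours, theirs) == 0 || strcmp(theirs, base) == 0) {
    *out = ours;   /* both sides agree, or only "ours" changed */
    return MERGE_OK;
  }
  if (strcmp(ours, base) == 0) {
    *out = theirs; /* only "theirs" changed */
    return MERGE_OK;
  }
  if (flags & MERGE_PREFER_OURS) {
    *out = ours;   /* new in v2: resolve conflicts in our favor */
    return MERGE_OK;
  }
  *out = NULL;
  return MERGE_CONFLICT;
}

/* v1 keeps its original signature and behavior: v2 with no flags. */
static int merge_v1(const char *base, const char *ours, const char *theirs,
                    const char **out) {
  return merge_v2(base, ours, theirs, 0, out);
}
```

Existing callers of `merge_v1` never see a behavior change, while new callers opt in to the extra options.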

Post by Stephan Beal
i would be thrilled to see someone implement a library for fossil, but
anyone doing so needs to understand, in advance, that it's a large
undertaking.
I'm happy to sign contribution agreements and otherwise smooth the way to
collaborating on this.

I am already a project lead on a daunting number of repos, so taking point
wouldn't be possible; I'm also basically brand new here.

A short introduction is probably in order: I'm with Special Circumstances,
a nascent think tank. We're contracting with the Defense Department, among
other partners, to work on open source software. Our initial brief is
concentrated on language-theoretic security; as we grow, hiring people to
work on this sort of project is how we intend to do it.

Thanks again, Stephan. I'll be looking into those links. Please don't feel
as though a back-and-forth on each email is necessary; do whatever is
comfortable for you.

cheers,
-Sam
Stephan Beal
2018-06-17 11:50:32 UTC
Post by Sam Putman
I'll be reading through the codebase and documentation, some initial
No pressure, but: i would _love_ to see someone pick up the torch and run
with it.

A bit of background: in Sept. 2011 i had the great pleasure of meeting
Richard in Munich (at which point i'd been active on the mailing list since
early 2008). He asked me what Fossil needed, to which i immediately
responded "a library". We quickly came to the conclusion that the effort
would be "herculean" (i believe that was his (apt) description of it (or maybe
that adjective got applied on the mailing list later on)), so i responded
with my second choice: a JSON interface. (HTTP/JSON interfaces are, in
essence, shared libraries with call-time linking. Many of Fossil's features
simply aren't realistic for a JSON interface, but most are.) Richard
promptly agreed, and i spent the next few months building the JSON API
(using a then-recent JSON wiki project of mine as the "structural basis").

Anyway...
Post by Sam Putman
Post by Stephan Beal
Several aspects of fossil make it very tedious (but not difficult, per
1) it uses a great deal of global state. That's simple enough to factor
into a Context object, but...
An incremental refactoring of this into something more modular would
be a boon to maintenance and testing. Seems like a sleeping dog we can
let lie for now.
That's actually the easy part. The real effort comes in with error checking
and handling, especially in cases where an error may (in a library) be
propagated from 3+ levels deep. The app will just exit() at that point, so
no thought has gone (nor needed to go) into error handling or propagation
(because there is no propagation - all errors are "immediate"). Not only
does the reporting at the error-triggering point need to be decided upon,
but also how the error will propagate arbitrarily far up the call stack. In
the end, libfossil went with a hybrid approach: functions return non-0
(from a well-defined enum of result codes) and the context object holds an
Error object which may (or may not, depending on context) hold more details
about the error.
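The hybrid approach described above (a non-zero code from a well-defined enum, plus an error object in the context that may carry more detail) can be sketched as follows. All names are hypothetical, not libfossil's actual API. The point is that every level of the call stack must check and forward the code, which is the interface rewrite that an exit()-based design never needed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

typedef enum { RC_OK = 0, RC_OOM, RC_NOT_FOUND, RC_CONSISTENCY } result_code;

typedef struct {
  result_code code;
  char msg[128];
} error_info;

typedef struct {
  error_info err; /* details of the most recent failure, if any */
} lib_ctx;

/* Records details in the context and returns the code, so a failing
   call site can simply write: return ctx_error(ctx, RC_X, "...", ...); */
static result_code ctx_error(lib_ctx *ctx, result_code rc, const char *fmt, ...) {
  assert(ctx != NULL);
  va_list ap;
  ctx->err.code = rc;
  va_start(ap, fmt);
  vsnprintf(ctx->err.msg, sizeof ctx->err.msg, fmt, ap);
  va_end(ap);
  return rc;
}

/* Level 3: where the error happens (instead of calling exit()). */
static result_code load_artifact(lib_ctx *ctx, int rid) {
  if (rid <= 0) return ctx_error(ctx, RC_NOT_FOUND, "no artifact with rid %d", rid);
  return RC_OK;
}

/* Level 2: must check and forward; details stay in ctx->err. */
static result_code load_manifest(lib_ctx *ctx, int rid) {
  result_code rc = load_artifact(ctx, rid);
  if (rc != RC_OK) return rc;
  return RC_OK;
}

/* Level 1: public entry point. */
static result_code checkout_version(lib_ctx *ctx, int rid) {
  result_code rc = load_manifest(ctx, rid);
  if (rc != RC_OK) return rc;
  /* ... write files to the checkout ... */
  return RC_OK;
}
```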
Post by Sam Putman
Post by Stephan Beal
2) it relies on a fail-fast-and-fail-loud allocator. Any allocation error
will immediately (intentionally) crash the app. While that saves literally
half (sometimes more) of code/error checking any place where memory is
allocated (that's a lot of places), that pattern is unusable for libraries.
Granted, allocation errors are rare, but every single C call which
allocates has to check for failure or risk Undefined Behaviour. To simplify
the vast majority of the implementation, Fossil does this checking in a
single place and abort()s the app if an allocation fails.
Ok, this doesn't sound /ideal/ granted, but maybe not so bad either.
Because allocations fail so rarely (at least ostensibly), it's "not that
big of a deal", but the library-level implementation code "needs" (in my
somewhat-purist point of view) to check for allocation errors nonetheless.
App-level code is free to use a fail-fast allocator, and libfossil's
app-level code did, in fact, use one because it speeds up writing the
app-level code so much. Fossil does _lots_ of allocation, and does, in
fact, sometimes run out of memory. i've never seen it happen on my
machines, but i've seen several reports from users who try to store
multi-GB files in fossil and then wonder why it fails on their Raspberry
Pi. Fossil needs scads of memory. Certain parts of that "could"
hypothetically be optimized to only alloc what they need (e.g. the diff
generator could arguably stream its output), but (1) that would greatly
complicate those parts and (2) very possibly wouldn't result in a leaner
app. e.g. constructing version X of a file from its parent version and the
diff of the versions requires allocating memory for X, X's parent, and the
diff (it knows all of those sizes in advance). In the average case that's
just a bit over 2X memory for each such operation, and fossil regularly has
to perform such an operation during many different types of activities.
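The memory arithmetic in that example can be made concrete with a tiny sketch (the function names and sizes are illustrative only). Rebuilding X means holding the parent, the delta, and the output buffer for X at the same time, so the peak is their sum; when X is about the size of its parent and the delta is small, that is "just a bit over 2X".

```c
#include <assert.h>
#include <stddef.h>

/* Peak bytes held while reconstructing a version from parent + delta:
   all three buffers coexist until the output is complete. */
static size_t delta_apply_peak(size_t parent_sz, size_t delta_sz, size_t out_sz) {
  return parent_sz + delta_sz + out_sz;
}

/* The same peak expressed as a percentage of the output size. */
static unsigned peak_percent_of_output(size_t parent_sz, size_t delta_sz,
                                       size_t out_sz) {
  assert(out_sz > 0);
  return (unsigned)(delta_apply_peak(parent_sz, delta_sz, out_sz) * 100 / out_sz);
}
```

For example, a 100 MiB file rebuilt from a 100 MiB parent and a 2 MiB delta peaks at 202 MiB, i.e. 202% of the file's size.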
Post by Sam Putman
I would likely prefer as much allocation as possible during load. An
allocation error during this stage
is a show-stopper.
Because fossil can be used in several discrete ways (e.g. within a
checkout, with (only) a repo, and with neither checkout nor repo (for a
limited subset of operations)), it's impossible to supply a single init
operation. An app needs to tell fossil to init into some specific mode of
operation, and the API "should" allow the user to toss that away and
re-init with a different mode (but that's kind of a free feature when you
create the API as library-centric).
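A minimal sketch of mode-specific initialization, assuming a hypothetical API (none of these names are Fossil's or libfossil's): the caller opens a context in a particular mode, and can close it and re-open it in another mode without restarting the process.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  MODE_NONE = 0,  /* neither checkout nor repository */
  MODE_REPO_ONLY, /* a repository database, no checkout */
  MODE_CHECKOUT   /* a checkout plus its repository */
} open_mode;

typedef struct {
  open_mode mode;
  const char *path; /* repo file or checkout dir; NULL in MODE_NONE */
} vcs_ctx;

enum { OPEN_OK = 0, OPEN_ERR_BUSY = 1, OPEN_ERR_ARG = 2 };

/* Opens ctx in the requested mode. Fails if already open, so callers
   must close explicitly before switching modes. */
static int vcs_open(vcs_ctx *ctx, open_mode mode, const char *path) {
  assert(ctx != NULL);
  if (ctx->mode != MODE_NONE) return OPEN_ERR_BUSY;
  if (mode != MODE_NONE && path == NULL) return OPEN_ERR_ARG;
  /* A real library would open its databases here and report failures. */
  ctx->mode = mode;
  ctx->path = path;
  return OPEN_OK;
}

/* Returns ctx to MODE_NONE, ready to be re-opened in any mode. */
static void vcs_close(vcs_ctx *ctx) {
  assert(ctx != NULL);
  ctx->mode = MODE_NONE;
  ctx->path = NULL;
}
```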
Post by Sam Putman
Post by Stephan Beal
3) Fossil effectively uses exit() to handle just about any type of
non-allocation error. i.e. there's little library-friendly error handling
in fossil.
I guess this bullet depends on how much error handling is possible at
those points, and how badly
failures would bork the global state.
In fossil proper there are very few (if any - i can't recall any at the
moment) places where it's considered feasible to even attempt recovery. It
either succeeds completely or fails right in the middle of what it was
doing (which might leave stale files lying around, but it won't break the
DBs, thanks to transactions (fossil adds pseudo-recursive transactions to
sqlite, btw, which greatly simplifies certain types of db operations)).
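SQLite rejects a BEGIN issued while a transaction is already open (its own nesting mechanism is SAVEPOINT). A pseudo-recursive layer can instead count nesting depth, so that only the outermost begin and end reach the database, and a rollback requested at any depth dooms the whole transaction. The sketch below illustrates that idea only; it is not Fossil's code, and to stay self-contained it records SQL strings in a log instead of calling `sqlite3_exec()`.

```c
#include <assert.h>

#define LOG_MAX 8

typedef struct {
  int depth;                /* current nesting level */
  int rollback;             /* set if any level asked to roll back */
  const char *log[LOG_MAX]; /* SQL that would be sent to SQLite */
  int n_log;
} txn_state;

static void emit(txn_state *t, const char *sql) {
  assert(t->n_log < LOG_MAX);
  t->log[t->n_log++] = sql; /* stand-in for sqlite3_exec(db, sql, ...) */
}

/* Only the outermost begin issues BEGIN. */
static void txn_begin(txn_state *t) {
  if (t->depth++ == 0) {
    t->rollback = 0;
    emit(t, "BEGIN");
  }
}

/* Ends one level. Only the outermost end touches the database: it
   commits unless some level requested a rollback. */
static void txn_end(txn_state *t, int want_rollback) {
  assert(t->depth > 0);
  if (want_rollback) t->rollback = 1;
  if (--t->depth == 0) emit(t, t->rollback ? "ROLLBACK" : "COMMIT");
}
```

This is what lets a helper function open "its own" transaction without caring whether a caller already has one open.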
Post by Sam Putman
If the answer is "none" and "not a bit" then turning some of these exit()s
into a library error would be plenty.
That requires, though (as touched on above), rewriting all of the
interfaces to allow such a propagation. That's a significant part of a
library port.

Post by Sam Putman
Post by Stephan Beal
4) Last but not least: Fossil implements a great many intricate algorithms
which, if not ported 100% perfectly, could lead to all sorts of Grief, some
of it difficult to track down. Such ports typically require 2x as much
code, sometimes more, because of the addition of error checking and
handling (as opposed to using abort() and exit()).
The networking-related functionality is the part I personally don't need;
we're using the luv bindings
to libuv and I'm quite happy with that.
Networking was slated for last - the underlying streaming interfaces were
in place (on top of which networking, or remote communication in general,
could be added), but there were, at the time, no concrete plans to
implement those particular features.

Post by Sam Putman
The way I explained my desire in that initial email is "everything you
can't remove without breaking
fossil". From what I gather there are some tasks which rely on the admin
interface, and those
SQL queries might need to end up in some kind of controller module to make
a durable API.
This also means you might be closer to done than you think!
i got it pretty far along, but the SHA-related changes in 2017(?) made
libfossil immediately incompatible with newer repos, which means that
getting it back up and running would take some effort (for which i have no
estimate, and can't get one without spending more time in the code than is
remotely good for my hands).
Post by Sam Putman
I concur with Warren that the effort of a libfossil is best justified if
it becomes the core of fossil proper.
Absolutely 100%, but it's essentially impossible to back-port it into
fossil proper without some massive upheaval. Since fossil lies at the heart
of the sqlite project, there's not (in my somewhat conservatively cautious
view) much room for such severe upheaval. A 3rd-party implementation is
interesting in and of itself, but it would also potentially be a point of
contention, as you say...

Post by Sam Putman
Keeping a libfossil in sync with an upstream fossil poses risks in both
directions. There are merges from
fossil core, which is an arbitrary amount of ongoing work. There's also
the real possibility that libfossil would
start innovating in ways that would cause compatibility drift.
There were _never_ any plans to innovate libfossil in terms of the SCM
features. The only "incompatible" thing libfossil ever did was to allow a
repo to be completely empty (no initial checkin). Fossil "seeds" new repos
with an empty checkin because that means it never has to deal with a
non-positive artifact ID, but that's not strictly a requirement of the
model, just an implementation convenience. (IIRC, Jan found and fixed all
such assertions in fossil at the time, but more may have snuck back in
since then.)

"Feature drift" was a genuine concern which (at the time) i hand-waved away
with 2 justifications: 1) i am/was active in Fossil, so my visibility into
fossil was high, i.e. it was unlikely that i'd forget to port some
important fix/feature. 2) it's actually extremely rare that the core
algorithms get (or need to be) touched - they've been in place for many
years and are low-maintenance.

Though point (2) still stands, point (1) was obviously overly optimistic -
i would have bet more on being hit by a bus than on having both of my elbow
nerves go on strike for so long.

Post by Sam Putman
Tasks like isolating those core intricate algorithms into well-documented
modules, where
errors and edge cases are handled where they occur, this can really pay
off.
That's all in libfossil.
Post by Sam Putman
Merging and patch theory are
areas where real conceptual leaps are still happening.
libfossil has all of fossil's diff algorithms but i don't think i ever
ported the full merge support (it can apply deltas but i don't recall
porting the type of merging decisions which are made during, e.g., a
checkout). Speaking of merging: that's often an interactive process, and
interactivity is difficult to define in a UI-ignorant library.
Post by Sam Putman
The one area of fossil I've done enough reading into to feel comfortable
in my understanding is the
file format itself. There's an edge to the documentation and I'm kinda
peering over that edge slightly.
The "artifact format" documentation is really Fossil's heart. All of the
other parts are implementation details for supporting that. Nonetheless,
any port will certainly want to take advantage of as many of those details
as possible (much of fossil's "heavy lifting" is done with sqlite, and
reimplementing many of those pieces without sqlite would be a massive
undertaking).

Post by Sam Putman
Post by Stephan Beal
i would be thrilled to see someone implement a library for fossil, but
anyone doing so needs to understand, in advance, that it's a large
undertaking.
I'm happy to sign contribution agreements and otherwise smooth the way to
collaborating on this.
None are needed if you just want access to libfossil (initially they
were, but that requirement was later dropped). If you'll send me your
preferred user name off-list, i'll get it set up.

Post by Sam Putman
Thanks again, Stephan. I'll be looking into those links, please don't feel
as though a back-and-forth on each
email is necessary, whatever is comfortable for you.
My hands have "good days" and "bad days", but today's relatively good. In
any case, every now and then i have to sit down and type for a while just
to see if my hands can take it.
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
David Mason
2018-06-17 13:39:43 UTC
Just had a quick thought that might make the conversion to library much
easier.

If you have a relatively small API interface, each of the API functions
could do a setjmp (https://en.wikipedia.org/wiki/Setjmp.h) and then the
fatal error routines could longjmp back. This would give you API safety
with very limited code intervention. And if a flag got set by the API
functions, then the fatal routines could check it, so that fossil the
program would need no changes, as only the API functions would change the
default fail-hard behaviour.
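The idea can be sketched in a few lines of C. The names `fatal_error`, `internal_step`, `api_do_work`, `g_api_jmp`, and `g_in_api` are hypothetical, not Fossil's. The fatal-error routine keeps exiting the process when called from the command-line program, but while a flag set by an API entry point is active it longjmp()s back to that entry point, which turns the failure into a return code. Two caveats: anything allocated, or any transaction opened, between setjmp() and longjmp() is not cleaned up automatically, and a single global jump buffer is not thread-safe.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

static jmp_buf g_api_jmp; /* where fatal_error() jumps back to */
static int g_in_api = 0;  /* set only while inside an API entry point */

/* The existing fail-hard routine, now aware of the API flag. */
static void fatal_error(const char *msg) {
  fprintf(stderr, "fatal: %s\n", msg);
  if (g_in_api) longjmp(g_api_jmp, 1); /* unwind to the API entry point */
  exit(1);                             /* unchanged CLI behaviour */
}

/* Deep inside existing code: unchanged, still "just dies" on error. */
static int internal_step(int input) {
  if (input < 0) fatal_error("negative input");
  return input * 2;
}

/* API wrapper: returns 0 and sets *out on success, or -1 if a fatal
   error occurred anywhere below it. */
static int api_do_work(int input, int *out) {
  assert(out != NULL);
  if (setjmp(g_api_jmp) != 0) {
    g_in_api = 0; /* arrived here via longjmp */
    return -1;
  }
  g_in_api = 1;
  *out = internal_step(input);
  g_in_api = 0;
  return 0;
}
```

With the flag clear, the command-line program calling `internal_step(-1)` still prints the message and exits, so its behaviour is unchanged.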

But perhaps the API would be too big to make this a win.

Just a thought.

../Dave
Post by Stephan Beal
Post by Sam Putman
I'll be reading through the codebase and documentation, some initial
No pressure, but: i would _love_ to see someone pick up the torch and run
with it.
A bit of background: in Sept. 2011 i had the great pleasure of meeting
Richard in Munich (at which point i'd been active on the mailing list since
early 2008). He asked me what Fossil needed, to which i immediately
responded "a library". We quickly came to the conclusion that the effort
would be "herculean" (i believe was his (apt) description of it (or maybe
that adjective got applied on the mailing list later on)), so i responded
with my second choice: a JSON interface. (HTTP/JSON interfaces are, in
essence, shared libraries with call-time linking. Many of Fossil's features
simply aren't realistic for a JSON interface, but most are.) Richard
promptly agreed, and i spent the next few months building the JSON API
(using a then-recent JSON wiki project of mine as the "structural basis").
Anyway...
Post by Sam Putman
Post by Stephan Beal
Several aspects of fossil make it very tedious (but not difficult, per
1) it uses a great deal of global state. That's simple enough to factor
into a Context object, but...
An incremental refactoring of this into something more modular would
be a boon to maintenance and testing. Seems like a sleeping dog we can
let lie for now.
That's actually the easy part. The real effort comes in with error
checking and handling, especially in cases where an error may (in a
library) propagated from 3+ levels deep. The app will just exit() at that
point, so no thought has gone (nor needed to go) into error handling or
propagation (because there is no propagation - all errors are "immediate").
Not only does the propagation at the error-triggering point need to be
decided upon, but how it will propagate arbitrarily far up the call stack.
In the end, libfossil went with a hybrid approach of returning non-0 (from
a well-defined enum of result codes) and the context object holds an Error
object which may (or may not, depending on context) hold more details about
the error.
Post by Sam Putman
2) it relies on a fail-fast-and-fail-loud allocator. Any allocation error
Post by Stephan Beal
will immediately (intentionally) crash the app. While that saves literally
half (sometimes more) of code/error checking any place where memory is
allocated (that's a lot of places), that pattern is unusable for libraries.
Granted, allocation errors are rare, but every single C call which
allocates has to check for failure or risk Undefined Behaviour. To simplify
the vast majority of the implementation, Fossil does this checking in a
single place and abort()s the app if an allocation fails.
Ok, this doesn't sound /ideal/ granted, but maybe not so bad either.
Because allocations fail so rarely (at least ostensibly), it's "not that
big of a deal", but the library-level implementation code "needs" (in my
somewhat-purist point of view) to check for allocation errors nonetheless.
App-level code is free to use a fail-fast allocator, and libfossil's
app-level code did, in fact, use one because it speeds up writing the
app-level code so much. Fossil does _lots_ of allocation, and does, in
fact, sometimes run out of memory. i've never seen it happen on my
machines, but i've seen several reports from users who try to store
multi-GB files in fossil and then wonder why it fails on their Raspberry
Pi. Fossil needs scads of memory. Certain parts of that "could"
hypothetically be optimized to only alloc what they need (e.g. the diff
generator could arguably stream its output), but (1) that would greatly
complicate those parts and (2) very possibly wouldn't result in a leaner
app. e.g. constructing version X of a file from its parent version and the
diff of the versions requires allocating memory for X, X's parent, and the
diff (it knows all of those sizes in advance). In the average case that's
just a bit over 2X memory for each such operation, and fossil regularly has
to perform such an operation during many different types of activities.
Post by Sam Putman
I would likely prefer as much allocation as possible during load. An
allocation error during this stage
is a show-stopper.
Because fossil can be used in several discrete ways (e.g. within a
checkout, with (only) a repo, and with neither checkout nor repo (for a
limited subset of operations)), it's impossible to supply a single init
operation. An app needs to tell fossil to init into some specific mode of
operation, and the API "should" allow the user to toss that away and
re-init with a different mode (but that's kind of a free feature when you
create the API as library-centric).
Post by Sam Putman
Post by Stephan Beal
3) Fossil effectively uses exit() to handle just about any type of
non-allocation error. i.e. there's little library-friendly error handling
in fossil.
I guess this bullet depends on how much error handling is possible at
those points, and how badly failures would bork the global state.
In fossil proper there are very few (if any - i can't recall any at the
moment) places where it's considered feasible to even attempt recovery. It
either succeeds completely or fails right in the middle of what it was
doing (which might leave stale files lying around, but it won't break the
DBs, thanks to transactions (fossil adds pseudo-recursive transactions to
sqlite, btw, which greatly simplifies certain types of db operations)).
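The "pseudo-recursive transactions" mentioned above can be sketched as a nesting counter. Only the outermost begin issues BEGIN, only the outermost end issues COMMIT, and a rollback requested at any depth turns that final COMMIT into ROLLBACK. This is a minimal sketch of the general technique, not fossil's actual code. To keep it runnable without SQLite, the hypothetical exec_sql() just records the SQL it would run; a real version might call sqlite3_exec().

```c
#include <assert.h>
#include <string.h>

/* Stand-in for sqlite3_exec(): records the SQL it would run. */
static char sql_log[256];
static void exec_sql(const char *sql) {
  if (strlen(sql_log) + strlen(sql) + 2 <= sizeof sql_log) {
    strcat(sql_log, sql);
    strcat(sql_log, ";");
  }
}

typedef struct {
  int depth;     /* current nesting level; 0 = no transaction */
  int rollback;  /* set if any level requested a rollback */
} txn_state;

/* Starts one nesting level. Only the outermost begin talks to the db. */
static void txn_begin(txn_state *t) {
  if (t->depth++ == 0) {
    t->rollback = 0;
    exec_sql("BEGIN");
  }
}

/* Ends one nesting level. Only the outermost end talks to the db,
   and it rolls back if any inner level asked for that. */
static void txn_end(txn_state *t, int want_rollback) {
  assert(t->depth > 0);
  if (want_rollback) t->rollback = 1;
  if (--t->depth == 0) {
    exec_sql(t->rollback ? "ROLLBACK" : "COMMIT");
  }
}
```

A helper that opens its own transaction can then be called both standalone and from inside a caller's larger transaction without issuing a nested BEGIN.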
Post by Sam Putman
If the answer is "none" and "not a bit" then turning some of these
exit()s into a library error would be plenty.
That requires, though (as touched on above), rewriting all of the
interfaces to allow such a propagation. That's a significant part of a
library port.
Post by Sam Putman
Post by Stephan Beal
4) Last but not least: Fossil implements a great many intricate algorithms
which, if not ported 100% perfectly, could lead to all sorts of Grief, some
of it difficult to track down. Such ports typically require 2x as much
code, sometimes more, because of the addition of error checking and
handling (as opposed to using abort() and exit()).
The networking-related functionality is the part I personally don't need;
we're using the luv bindings to libuv and I'm quite happy with that.
Networking was slated for last - the underlying streaming interfaces were
in place (on top of which networking and other remote communication could
be added), but there were, at the time, no concrete plans to implement
those particular features.
Post by Sam Putman
The way I explained my desire in that initial email is "everything you
can't remove without breaking fossil". From what I gather there are some
tasks which rely on the admin interface, and those SQL queries might need
to end up in some kind of controller module to make a durable API.
This also means you might be closer to done than you think!
i got it pretty far along, but the SHA-related changes in 2017(?) made
libfossil immediately incompatible with newer repos, which means that
getting it back up and running would take some effort (for which i have no
estimate, and can't get one without spending more time in the code than is
remotely good for my hands).
Post by Sam Putman
I concur with Warren that the effort of a libfossil is best justified if
it becomes the core of fossil proper.
Absolutely 100%, but it's essentially impossible to back-port it into
fossil proper without some massive upheaval. Since fossil lies at the heart
of the sqlite project, there's not (in my somewhat conservatively cautious
view) much room for such severe upheaval. A 3rd-party implementation is
interesting in and of itself, but it would also potentially be a point of
contention, as you say...
Post by Sam Putman
Keeping a libfossil in sync with an upstream fossil poses risks in both
directions. There are merges from fossil core, which is an arbitrary amount
of ongoing work. There's also the real possibility that libfossil would
start innovating in ways that would cause compatibility drift.
There were _never_ any plans to innovate libfossil in terms of the SCM
features. The only "incompatible" thing libfossil ever did was to allow a
repo to be completely empty (no initial checkin). Fossil "seeds" new repos
with an empty checkin because that means it never has to deal with a
non-positive artifact ID, but that's not strictly a requirement of the
model, just an implementation convenience. (IIRC, Jan found and fixed all
such assertions in fossil at the time, but more may have snuck back in
since then.)
"Feature drift" was a genuine concern which (at the time) i hand-waved
away with 2 justifications: 1) i am/was active in Fossil, so my visibility
into fossil was high, i.e. it was unlikely that i'd forget to port some
important fix/feature. 2) it's actually extremely rare that the core
algorithms get (or need to be) touched - they've been in place for many
years and are low-maintenance.
Though point (2) still stands, obviously point (1) was overly optimistic
- i would have bet more on being hit by a bus than having both of my elbow
nerves go on strike for so long.
Post by Sam Putman
Tasks like isolating those core intricate algorithms into well-documented
modules, where errors and edge cases are handled where they occur, this can
really pay off.
That's all in libfossil.
Post by Sam Putman
Merging and patch theory are areas where real conceptual leaps are still
happening.
libfossil has all of fossil's diff algorithms but i don't think i ever
ported the full merge support (it can apply deltas but i don't recall
porting the type of merging decisions which are made during, e.g., a
checkout). Speaking of merging: that's often an interactive process, and
interactivity is difficult to define in a UI-ignorant library.
Post by Sam Putman
The one area of fossil I've done enough reading into to feel comfortable in
my understanding is the file format itself. There's an edge to the
documentation and I'm kinda peering over that edge slightly.
The "artifact format" documentation is really Fossil's heart. All of the
other parts are implementation details for supporting that. Nonetheless,
any port will certainly want to take advantage of as many of those details
as possible (much of fossil's "heavy lifting" is done with sqlite, and
reimplementing many of those pieces without sqlite would be a massive
undertaking).
Post by Sam Putman
Post by Stephan Beal
i would be thrilled to see someone implement a library for fossil, but
anyone doing so needs to understand, in advance, that it's a large
undertaking.
I'm happy to sign contribution agreements and otherwise smooth the way to
collaborating on this.
None are needed if you just want access to libfossil (initially they
were, but that requirement was later dropped). If you'll send me your
preferred user name off-list i'll get it set up.
Post by Sam Putman
Thanks again, Stephan. I'll be looking into those links; please don't feel
as though a back-and-forth on each email is necessary, whatever is
comfortable for you.
My hands have "good days" and "bad days", but today's relatively good. In
any case, every now and then i have to sit down and type for a while just
to see if my hands can take it.
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
_______________________________________________
fossil-users mailing list
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Sam Putman
2018-06-17 21:11:04 UTC
Post by David Mason
Just had a quick thought that might make the conversion to library much
easier.
If you have a relatively small API interface, each of the API functions
could do a setjmp https://en.wikipedia.org/wiki/Setjmp.h and then the
fatal error routines could longjmp back. This would give you API safety, at
very limited code intervention. And if a flag got set by the API functions,
then the fatal routines could check, so that fossil the program would need
no changes as only the API functions would change the default fail-hard
behaviour.
But perhaps the API would be too big to make this a win.
Just a thought.
../Dave
This is the kind of approach I glossed over as a "goto cleanup", so we're on
the same track here.

I haven't had a chance to go over some of the core C files in libfossil yet,
curious to what degree it follows this pattern already.

Stephan has indicated that allocation is necessarily somewhat more
fine-grained than this, and elsewhere in the documentation, exceptions are
mentioned...

-Sam
Stephan Beal
2018-06-18 11:54:14 UTC
Post by Sam Putman
Post by David Mason
Just had a quick thought that might make the conversion to library much
easier.
If you have a relatively small API interface, each of the API functions
could do a setjmp https://en.wikipedia.org/wiki/Setjmp.h
This is the kind of approach I glossed over as a "goto cleanup", so we're on
Post by Sam Putman
the same track here.
I haven't had a chance to go over some of the core C files in libfossil yet,
curious to what degree it follows this pattern already.
i have to admit that you lost me at setjmp. There are certain C APIs which
i won't touch unless absolutely forced to, and setjmp/longjmp belong to
that category. gotos are widely used in libfossil to simplify error
handling/deallocation within a given function.
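For readers unfamiliar with the idiom, here is a small, hypothetical example of the goto-based cleanup style described above. It is not code from libfossil. Every failure jumps to one label, so each resource is released in exactly one place, and ownership passes to the caller only once everything has succeeded.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { RC_OK = 0, RC_OOM = 1, RC_MISUSE = 2 };

typedef struct {
  char *key;
  char *value;
} kv_pair;

static char *dup_cstr(const char *s) {
  size_t n = strlen(s) + 1;
  char *p = malloc(n);
  if (p) memcpy(p, s, n);
  return p;
}

/* Replaces kv's contents with copies of key/value. On failure kv is
   untouched and nothing leaks: every exit after the argument check
   goes through the single "end" label. */
static int kv_pair_set(kv_pair *kv, const char *key, const char *value) {
  int rc = RC_OK;
  char *k = NULL;
  char *v = NULL;
  if (!kv || !key || !value) return RC_MISUSE;
  k = dup_cstr(key);
  if (!k) { rc = RC_OOM; goto end; }
  v = dup_cstr(value);
  if (!v) { rc = RC_OOM; goto end; }
  free(kv->key);
  free(kv->value);
  kv->key = k;
  kv->value = v;
  k = v = NULL;   /* ownership moved into kv; don't free below */
end:
  free(k);
  free(v);
  return rc;
}
```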

In libfossil, all error state is propagated as an integer, with some cases
providing additional information in an Error object owned by the Context
object (each Context manages, at most, one opened repo instance). The API
docs describe, where relevant, which result codes must be considered
fatal/unrecoverable (allocation error being the primary case). An example
of propagating more information is SQL query preparation failure - the
error string from sqlite would be propagated back up via the Context's
Error object. An allocation error, on the other hand, is simply returned as
the enum entry FSL_RC_OOM, as we can't provide more information for that
case without more allocation (which would presumably fail).
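Here is a minimal sketch of that hybrid scheme. The names (demo_ctx, DEMO_RC_DB, DEMO_RC_OOM, demo_prepare) are hypothetical stand-ins for libfossil's real ones, such as FSL_RC_OOM. The return value is always the result code, and details go into an error object owned by the context. The OOM case deliberately records no message, because copying one would need another allocation. demo_prepare only simulates a failing SQL preparation; a real one would copy the text of sqlite3_errmsg().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
  DEMO_RC_OK = 0,
  DEMO_RC_OOM,   /* never carries a message */
  DEMO_RC_DB     /* message stored in cx->err.msg */
} demo_rc;

typedef struct {
  int code;
  char *msg;     /* heap-allocated, or NULL */
} demo_error;

typedef struct {
  demo_error err;   /* one error slot per context */
} demo_ctx;

/* Records an error and returns its code, so callers can write
   "return ctx_err_set(...)". */
static int ctx_err_set(demo_ctx *cx, int code, const char *msg) {
  char *copy;
  size_t n;
  free(cx->err.msg);
  cx->err.msg = NULL;
  cx->err.code = code;
  if (code == DEMO_RC_OOM || !msg) return code;
  n = strlen(msg) + 1;
  copy = malloc(n);
  if (!copy) {
    cx->err.code = DEMO_RC_OOM;  /* reporting failed: degrade to OOM */
    return DEMO_RC_OOM;
  }
  memcpy(copy, msg, n);
  cx->err.msg = copy;
  return code;
}

/* Simulated statement preparation: anything not starting with
   SELECT "fails" the way an SQL syntax error would. */
static int demo_prepare(demo_ctx *cx, const char *sql) {
  if (strncmp(sql, "SELECT", 6) != 0) {
    return ctx_err_set(cx, DEMO_RC_DB, "syntax error (simulated)");
  }
  return DEMO_RC_OK;
}
```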
Sam Putman
2018-06-18 17:23:48 UTC
Post by Sam Putman
Post by Sam Putman
Post by David Mason
Just had a quick thought that might make the conversion to library much
easier.
If you have a relatively small API interface, each of the API functions
could do a setjmp https://en.wikipedia.org/wiki/Setjmp.h
Post by Sam Putman
This is the kind of approach I glossed over as a "goto cleanup", so we're
on the same track here.
I haven't had a chance to go over some of the core C files in libfossil yet,
curious to what degree it follows this pattern already.
i have to admit that you lost me at setjmp. There are certain C APIs which
i won't touch unless absolutely forced to, and setjmp/longjmp belong to
that category. gotos are widely used in libfossil to simplify error
handling/deallocation within a given function.
setjmp/longjmp are of course weapons of awesome power. In this context,
they're just a way of jumping farther than goto alone allows for.

Never use them when a simple goto will do, and it sounds like it will. I
think the proposal was to replace every crash with the same longjmp, to
simplify porting the code, but you've already taken the time to do it right.

Post by Sam Putman
In libfossil, all error state is propagated as an integer, with some cases
providing additional information in an Error object owned by the Context
object (each Context manages, at most, one opened repo instance). The API
docs describe, where relevant, which result codes must be considered
fatal/unrecoverable (allocation error being the primary case). An example
of propagating more information is SQL query preparation failure - the
error string from sqlite would be propagated back up via the Context's
Error object. An allocation error, on the other hand, is simply returned as
the enum entry FSL_RC_OOM, as we can't provide more information for that
case without more allocation (which would presumably fail).
Excellent, no surprises here.
Sam Putman
2018-06-17 21:07:29 UTC
Post by Stephan Beal
Post by Sam Putman
An incremental refactoring of this into something more modular would
be a boon to maintenance and testing. Seems like a sleeping dog we can
let lie for now.
That's actually the easy part. The real effort comes in with error
checking and handling, especially in cases where an error may (in a
library) be propagated from 3+ levels deep. The app will just exit() at that
point, so no thought has gone (nor needed to go) into error handling or
propagation (because there is no propagation - all errors are "immediate").
Not only does the propagation at the error-triggering point need to be
decided upon, but how it will propagate arbitrarily far up the call stack.
In the end, libfossil went with a hybrid approach of returning non-0 (from
a well-defined enum of result codes) and the context object holds an Error
object which may (or may not, depending on context) hold more details about
the error.
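The propagation half of the problem can be sketched with three hypothetical call levels (checkout_one, write_file, load_blob); none of them are real fossil or libfossil functions. Only the deepest level knows why it failed, so only it records details in the context. Each level above does its own cleanup and passes the result code up unchanged, at the point where fossil(1) would have called exit(). Treating a non-positive artifact ID as "not found" is an assumption made for the example.

```c
#include <assert.h>
#include <string.h>

enum { RC_OK = 0, RC_NOT_FOUND = 1 };

typedef struct {
  int err_code;
  char err_msg[64];   /* fixed size only to keep the sketch short */
  int cleanups;       /* counts cleanup steps, to make them visible */
} ctx;

/* Level 3 (deepest): the only level that knows why it failed, so the
   only one that records details in the context. */
static int load_blob(ctx *cx, int rid) {
  if (rid <= 0) {
    cx->err_code = RC_NOT_FOUND;
    strcpy(cx->err_msg, "no such artifact");
    return RC_NOT_FOUND;
  }
  return RC_OK;
}

/* Level 2: does its own cleanup, then passes the code up unchanged. */
static int write_file(ctx *cx, int rid) {
  int rc = load_blob(cx, rid);
  cx->cleanups++;     /* would free this level's buffers */
  return rc;
}

/* Level 1: same pattern; where fossil(1) would exit(), this returns. */
static int checkout_one(ctx *cx, int rid) {
  int rc = write_file(cx, rid);
  cx->cleanups++;
  return rc;
}
```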
About those objects...

What your docs call fossil(1) is written in plain C. Or rather C, Tcl, SQL
and TH1, if you prefer.
In any case, not C++.

We have fairly strict requirements for the languages we include in our own
runtime, because we intend to parse both the syntax and the static
semantics of the languages we use.

That makes C++ a non-starter, as I would imagine it is for including a
libfossil as the core of fossil(1). Even for my own needs, the C89-99
family hits the sweet spot; I'm tolerant of ANSIsms in old code but try not
to emit them personally.
Post by Stephan Beal
Post by Sam Putman
I concur with Warren that the effort of a libfossil is best justified if
it becomes the core of fossil proper.
Absolutely 100%, but it's essentially impossible to back-port it into
fossil proper without some massive upheaval. Since fossil lies at the heart
of the sqlite project, there's not (in my somewhat conservatively cautious
view) much room for such severe upheaval. A 3rd-party implementation is
interesting in and of itself, but it would also potentially be a point of
contention, as you say...
I'm still pretty convinced the work on libfossil won't go to waste.

The excellent documentation alone has advanced my understanding
considerably.

What might make sense is a sort of 'parallel construction'. Nice thing
about a revision control system, it's got all the revisions.

So to write a clean C libfossil, we can start with the first commit and
follow the breadcrumbs.
Post by Stephan Beal
Post by Sam Putman
Tasks like isolating those core intricate algorithms into well-documented
modules, where errors and edge cases are handled where they occur, this can
really pay off.
That's all in libfossil.
Post by Sam Putman
Merging and patch theory are areas where real conceptual leaps are still
happening.
libfossil has all of fossil's diff algorithms but i don't think i ever
ported the full merge support (it can apply deltas but i don't recall
porting the type of merging decisions which are made during, e.g., a
checkout). Speaking of merging: that's often an interactive process, and
interactivity is difficult to define in a UI-ignorant library.
This is a legitimately hard problem. Pijul's approach of having a conflict
object instead of spraying the merge conflict all over the source file has
potential.

I don't think it's strongly tied to the patch-centric model, it's a
(relatively) simple matter of representing possible outcomes as distinct
states. There might be some potentially exponential bad behavior, we are
talking about permutations after all.

I'm way ahead of myself there; this is just to say that we can most likely
do better than pushing a diff into the source code, while still providing a
library API.
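Here is a rough sketch of the "conflict object" idea, under these assumptions: a merged file is held as a list of segments, each either agreed text or a conflict carrying two alternatives. Output is rendered only once every conflict has been resolved, so no conflict markers are ever written into the file. All names are hypothetical; this is not how Pijul or fossil actually represent merges. With k unresolved two-way conflicts there are 2^k possible outcomes, which is the exponential behavior mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One piece of a merged file: agreed text, or a two-way conflict. */
typedef struct {
  const char *text;     /* used when n_alts == 0 */
  const char *alts[2];  /* e.g. [0] = ours, [1] = theirs */
  int n_alts;           /* 0 = clean segment, 2 = conflict */
  int chosen;           /* index into alts, or -1 if unresolved */
} segment;

/* Number of unresolved conflicts; k of them means 2^k outcomes. */
static int unresolved(const segment *s, size_t n) {
  size_t i;
  int k = 0;
  for (i = 0; i < n; ++i)
    if (s[i].n_alts > 0 && s[i].chosen < 0) ++k;
  return k;
}

/* Writes the merged text into out (capacity cap >= 1). Returns 0 on
   success, -1 if a conflict is unresolved or out is too small (out's
   contents are then unspecified). */
static int render(const segment *s, size_t n, char *out, size_t cap) {
  size_t used = 0, i, len;
  const char *piece;
  for (i = 0; i < n; ++i) {
    if (s[i].n_alts == 0) {
      piece = s[i].text;
    } else if (s[i].chosen < 0 || s[i].chosen >= s[i].n_alts) {
      return -1;
    } else {
      piece = s[i].alts[s[i].chosen];
    }
    len = strlen(piece);
    if (used + len + 1 > cap) return -1;
    memcpy(out + used, piece, len);
    used += len;
  }
  out[used] = '\0';
  return 0;
}
```

A library could expose the unresolved segments to the application and let the UI decide how to ask the user, which sidesteps the difficulty of defining interactivity inside the library itself.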
Post by Stephan Beal
Post by Sam Putman
The one area of fossil I've done enough reading into to feel comfortable in
my understanding is the file format itself. There's an edge to the
documentation and I'm kinda peering over that edge slightly.
The "artifact format" documentation is really Fossil's heart. All of the
other parts are implementation details for supporting that. Nonetheless,
any port will certainly want to take advantage of as many of those details
as possible (much of fossil's "heavy lifting" is done with sqlite, and
reimplementing many of those pieces without sqlite would be a massive
undertaking).
SQLite is a core library for bridgetools. I took to heart the "SQLite as
alternative to fopen()" slogan, one of the best architectural decisions
I've made.

That's why we're using fossil as our DVCS, no matter what happens with this
libfossil proposal.

We are / will be using SQLite for dependency management, bundling, and
application file formats already, and even if we have to shell out to
fossil(1) we're ahead of the game using a compatible format for our repos.

But those intricate algorithms for deduplication, hash chaining, and
merging, those would come in handy across the board.

A bit about drift: it's a natural outcome of parallel codebases, even with
something like a common standard. Without that, it's guaranteed, unless one
of the forks doesn't get used.
Stephan Beal
2018-06-18 11:43:33 UTC
Post by Sam Putman
About those objects...
What your docs call fossil(1) is written in plain C. Or rather C, Tcl, SQL
and TH1, if you prefer.
In any case, not C++.
libfossil is 100% C89 except that it requires "long long" because sqlite3
requires it. long long is not strictly C89 but all compilers support it.
The C++ code in the tree is simply optional high-level wrappers, intended
primarily to allow me to "exercise" the core API, to make sure that it
would be useful in other contexts.
Post by Sam Putman
I'm still pretty convinced the work on libfossil won't go to waste.
The excellent documentation alone has advanced my understanding
considerably.
:)
Post by Sam Putman
What might make sense is a sort of 'parallel construction'. Nice thing
about a revision control system, it's got all the revisions.
So to write a clean C libfossil, we can start with the first commit and
follow the breadcrumbs.
The only "problem" with that (for a given definition of "problem") is that
i'm rather chaotic in terms of how i work on code: there's little rhyme or
reason, nor any specific ordering to commits or (for the most part)
features. You won't find any reasonable order to the timeline. The
implementation required, of course, certain features before others, though,
so... there is that.
Post by Sam Putman
I don't think it's strongly tied to the patch-centric model, it's a
(relatively) simple matter of representing possible outcomes as distinct
states. There might be some potentially exponential bad behavior, we are
talking about permutations after all.
It might even be feasible to store each such variation in the 'stash' table
(libfossil never got far enough to implement the 'stash' or 'undo' parts,
as both depend on the merge process, which was the final "big/scary" hurdle
left to port).
Sam Putman
2018-06-18 17:18:03 UTC
Post by Stephan Beal
Post by Sam Putman
About those objects...
What your docs call fossil(1) is written in plain C. Or rather C, Tcl,
SQL and TH1, if you prefer.
In any case, not C++.
libfossil is 100% C89 except that it requires "long long" because sqlite3
requires it. long long is not strictly C89 but all compilers support it.
The C++ code in the tree is simply optional high-level wrappers, intended
primarily to allow me to "exercise" the core API, to make sure that it
would be useful in other contexts.
Oh that's good news. A C++ wrapper is even a useful template for FFI
bindings.
Post by Stephan Beal
Post by Sam Putman
I'm still pretty convinced the work on libfossil won't go to waste.
The excellent documentation alone has advanced my understanding
considerably.
:)
Post by Sam Putman
What might make sense is a sort of 'parallel construction'. Nice thing
about a revision control system, it's got all the revisions.
So to write a clean C libfossil, we can start with the first commit and
follow the breadcrumbs.
The only "problem" with that (for a given definition of "problem") is that
i'm rather chaotic in terms of how i work on code: there's little rhyme or
reason, nor any specific ordering to commits or (for the most part)
features. You won't find any reasonable order to the timeline. The
implementation required, of course, certain features before others, though,
so... there is that.
It's never fun to take a forensic approach to a codebase.

This was all predicated on some references to a Context object, both in the
thread and the docs.

A Capitalized pure-C struct being referred to as an object is not unheard
of! But it did lead me down the wrong path.
Post by Stephan Beal
Post by Sam Putman
I don't think it's strongly tied to the patch-centric model, it's a
(relatively) simple matter of representing possible outcomes as distinct
states. There might be some potentially exponential bad behavior, we are
talking about permutations after all.
It might even be feasible to store each such variation in the 'stash'
table (libfossil never got far enough to implement the 'stash' or 'undo'
parts, as both depend on the merge process, which was the final "big/scary"
hurdle left to port).
Looking like all I've got is a small chicken-and-egg problem.

Needing to port a few more modules over is a-ok.

Trouble is the repos I want to work with are c. 2018 (or ported in 2018 from
older git), so they'd be using the new hash. I could start linking in
libfossil and poke around an older repository, but that breaks the feedback
loop. Big difference between playing around with a library and dogfooding
it.

I got the sense from the docs that the hash is using the SQLite style
versioned API, so it follows that the old hash code is sitting where it
needs to.

Does this amount to following the style of that file for another similar
file in fossil(1)?

That's a bite small enough I might be able to chew it.
Stephan Beal
2018-06-19 17:20:51 UTC
Post by Sam Putman
A Capitalized pure-C struct being referred to as an object is not unheard
of! But it did lead me down the wrong path.
Here's my little contribution to spreading the word about OO in C:

http://www.wanderinghorse.net/computing/papers/DoingOOInC.pdf

:)
Post by Sam Putman
I got the sense from the docs that the hash is using the SQLite style
versioned API, so it follows that the old hash code is sitting where it
needs to.
Does this amount to following the style of that file for another similar
file in fossil(1)?
i'm not clear what you mean :|.
Sam Putman
2018-06-19 18:13:08 UTC
Post by Stephan Beal
Post by Sam Putman
A Capitalized pure-C struct being referred to as an object is not unheard
of! But it did lead me down the wrong path.
http://www.wanderinghorse.net/computing/papers/DoingOOInC.pdf
:)
Looks like I've got some evening reading ahead of me!
Post by Stephan Beal
Post by Sam Putman
I got the sense from the docs that the hash is using the SQLite style
versioned API, so it follows that the old hash code is sitting where it
needs to.
Does this amount to following the style of that file for another similar
file in fossil(1)?
i'm not clear what you mean :|.
I'm sure that could have been more clear.

Did some poking around between fossil and libfossil, it looks like they
both have a sha-1.c, the difference starts with the copyright, 2006 vs 2013.
A cursory scan suggests that the ~200 line difference is more tests, and
some framework code that presumably libfossil doesn't need.

fossil also has sha1-hard.c (let's ignore this for now?) and sha3.c.

I haven't yet found where in libfossil sha-1.c is called, and what the
substantive differences are between the two. What I'm wondering is, can the
wrapper for sha-1.c be rewritten to also wrap sha-3? Possibly sha1-hard as
well, if it's on a critical path.

I know there are some wrinkles around how fossil picks a sha that allowed
for the transition, I'm content with being able to wield those sha functions
in a fossil context at a fairly low level, for now.
Stephan Beal
2018-06-19 18:28:27 UTC
Post by Sam Putman
Post by Stephan Beal
Post by Sam Putman
I got the sense from the docs that the hash is using the SQLite style
versioned API, so it follows that the old hash code is sitting where it
needs to.
Does this amount to following the style of that file for another similar
file in fossil(1)?
i'm not clear what you mean :|.
I'm sure that could have been more clear.
Did some poking around between fossil and libfossil, it looks like they
both have a sha-1.c, the difference starts with the copyright, 2006 vs 2013.
A cursory scan suggests that the ~200 line difference is more tests, and
some framework code that presumably libfossil doesn't need.
fossil also has sha1-hard.c (let's ignore this for now?) and sha3.c.
I haven't yet found where in libfossil sha-1.c is called, and what the
substantive differences are between the two. What I'm wondering is, can the
wrapper for sha-1.c be rewritten to also wrap sha-3? Possibly sha1-hard as
well, if it's on a critical path.
A few of the utility classes (most notably sha1 and md5) were originally
copied over 1-to-1 and renamed to match the libfossil project conventions.
sha1-hard and sha-3 came along after my RSI fallout, and are not included
in libfossil. i have _no_ idea what the differences are between sha1 and
sha1-hard, so can't comment on those. The buffer sizes differ between sha1
and sha-3, so i'm not sure whether those two could be reasonably/cleanly
combined. i have to resist the temptation to go poking around in the code
rabbit hole, as that almost invariably leads to days of hand pains :(.
(Software development was always like a drug to me, and i am very much a
recovering addict.)

Post by Sam Putman
I know there are some wrinkles around how fossil picks a sha that allowed
for the transition, I'm content with being able to wield those sha functions
in a fossil context at a fairly low level, for now.
It "should" be trivial to port the core sha1-hard and sha-3 to libfossil -
porting of sha1 and md5 was literally a copy/paste/rename job. However, the
assumption that SHA1 is "the" hash is "strongly embedded" in many places in
libfossil (md5 is only used in the manifest files and its usage does not
need to be modified/extended). It "should" be simple to find all such
places by grepping for FSL_UUID_STRLEN (defined in
include/fossil-scm/fossil-hash.h), and porting all such places to support
variable hashes is, AFAIK, the only critical piece needed for making
libfossil compatible (again) with fossil(1). If that hurdle can be
surpassed, the rest is "easy" (even the merging - it simply needs to be
ported over from fossil, adapting the API to a library interface along the
way).
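One way to replace a single FSL_UUID_STRLEN-style constant can be sketched under one assumption: fossil's SHA1 (and SHA1-hard) artifact names are 40 hex digits and its SHA3-256 names are 64. Code can then infer the hash kind from a name's length and size fixed buffers for the longest one. The macro, enum and function names below are hypothetical, not libfossil's.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define HASH_SHA1_HEXLEN 40   /* SHA1 and SHA1-hard: 20-byte digest */
#define HASH_SHA3_HEXLEN 64   /* SHA3-256: 32-byte digest */
#define HASH_MAX_HEXLEN  HASH_SHA3_HEXLEN  /* size buffers for this + 1 */

typedef enum { HASH_INVALID = 0, HASH_SHA1, HASH_SHA3_256 } hash_kind;

/* Classifies a hex artifact name by its length. Accepts either case of
   hex digit; stricter code might require lowercase. */
static hash_kind hash_kind_of(const char *h) {
  size_t n = 0;
  while (h[n] != '\0') {
    if (!isxdigit((unsigned char)h[n])) return HASH_INVALID;
    if (++n > HASH_MAX_HEXLEN) return HASH_INVALID;
  }
  if (n == HASH_SHA1_HEXLEN) return HASH_SHA1;
  if (n == HASH_SHA3_HEXLEN) return HASH_SHA3_256;
  return HASH_INVALID;
}

/* Expected hex length for a kind, replacing one global constant. */
static size_t hash_hexlen(hash_kind k) {
  switch (k) {
    case HASH_SHA1: return HASH_SHA1_HEXLEN;
    case HASH_SHA3_256: return HASH_SHA3_HEXLEN;
    default: return 0;
  }
}

/* Copies a valid name into a buffer sized for the longest hash.
   Returns its kind, or HASH_INVALID (dst is then left empty). */
static hash_kind hash_copy(char dst[HASH_MAX_HEXLEN + 1], const char *src) {
  hash_kind k = hash_kind_of(src);
  dst[0] = '\0';
  if (k != HASH_INVALID) memcpy(dst, src, hash_hexlen(k) + 1);
  return k;
}
```

Each place that currently declares a buffer of FSL_UUID_STRLEN + 1 or compares a length against that constant would then use the maximum length or the per-kind length, respectively.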
Sam Putman
2018-06-19 19:08:44 UTC
Post by Stephan Beal
Post by Sam Putman
Post by Stephan Beal
Post by Sam Putman
I got the sense from the docs that the hash is using the SQLite style
versioned API, so it follows that the old hash code is sitting where it
needs to.
Does this amount to following the style of that file for another similar
file in fossil(1)?
i'm not clear what you mean :|.
I'm sure that could have been more clear.
Did some poking around between fossil and libfossil, it looks like they
both have a sha-1.c, the difference starts with the copyright, 2006 vs 2013.
A cursory scan suggests that the ~200 line difference is more tests, and
some framework code that presumably libfossil doesn't need.
fossil also has sha1-hard.c (let's ignore this for now?) and sha3.c.
I haven't yet found where in libfossil sha-1.c is called, and what the
substantive differences are between the two. What I'm wondering is, can the
wrapper for sha-1.c be rewritten to also wrap sha-3? Possibly sha1-hard as
well, if it's on a critical path.
A few of the utility classes (most notably sha1 and md5) were originally
copied over 1-to-1 and renamed to match the libfossil project conventions.
sha1-hard and sha-3 came along after my RSI fallout, and are not included
in libfossil. i have _no_ idea what the differences are between sha1 and
sha1-hard, so can't comment on those. The buffer sizes differ between sha1
and sha-3, so i'm not sure whether those two could be reasonably/cleanly
combined. i have to resist the temptation to go poking around in the code
rabbit hole, as that almost invariably leads to days of hand pains :(.
(Software development was always like a drug to me, and i am very much a
recovering addict.)
I'm fairly sure the various hashes need to operate separately.

It also sounds like the fossil(1) side of this is a matter of updating
libfossil with the latest versions.
Post by Stephan Beal
Post by Sam Putman
I know there are some wrinkles around how fossil picks a sha that allowed
for the transition, I'm content with being able to wield those sha functions
in a fossil context at a fairly low level, for now.
It "should" be trivial to port the core sha1-hard and sha-3 to libfossil -
porting of sha1 and md5 was literally a copy/paste/rename job. However, the
assumption that SHA1 is "the" hash is "strongly embedded" in many places in
libfossil (md5 is only used in the manifest files and its usage does not
need to be modified/extended). It "should" be simple to find all such
places by grepping for FSL_UUID_STRLEN (defined in
include/fossil-scm/fossil-hash.h), and porting all such places to support
variable hashes is, AFAIK, the only critical piece needed for making
libfossil compatible (again) with fossil(1). If that hurdle can be
surpassed, the rest is "easy" (even the merging - it simply needs to be
ported over from fossil, adapting the API to a library interface along the
way).
That kind of breadcrumb (FSL_UUID_STRLEN) is really helpful, thanks.

Don't know when or even if I might, but I'm warming to the idea of trying it.
Richard Hipp
2018-06-19 19:46:03 UTC
Post by Stephan Beal
i have _no_ idea what the differences are between sha1 and
sha1-hard,
SHA1-hard is a modified SHA1 algorithm that is resistant to the
SHAttered attack (https://shattered.io/) against SHA1 that came out
about a year ago. SHA1-hard generates the same hashes as SHA1, except
in the extremely rare cases where the hash is vulnerable to SHAttered.
SHA1-hard works by detecting cases where the hash seems to be
exploiting weaknesses in the SHA1 compression function and then it
makes the hash "safe" by increasing the number of rounds in those rare
cases.

I converted from using SHA1 to SHA1-hard within about a day of
SHAttered being announced. Git also has converted, but it took them
months. I also added SHA3 support at the same time. Git is still
SHA1-only, the last time I checked.

The SHA1-hard code was stolen from
https://github.com/cr-marcstevens/sha1collisiondetection. The only
changes I made were to clean it up a little and convert it into a
single-file implementation so that it was easier to import into the
Fossil source tree.
--
D. Richard Hipp
***@sqlite.org
Alek Paunov
2018-06-18 17:42:24 UTC
Post by Sam Putman
But those intricate algorithms for deduplication, hash chaining, and
merging, those would come
in handy across the board.
A bit about drift: it's a natural outcome of parallel codebases, even with
something like a common
standard. Without that, it's guaranteed, unless one of the forks doesn't
get used.
Just a thought (probably stupid, since I haven't started to study fossil
and libfossil codebases yet):

Is it possible and feasible (i.e. without serious negative impact on
performance and resource usage) to gradually port fossil's internal
representations and algorithms to a collection of UDFs, VTables and
sqlite memdb tables where needed?

If the above is possible, both fossil(1) and libfossil core layers could
be written in SQL using the same sqlite extensions (eventually sharing
big portions of the code).

Kind Regards,
Alek
Alek Paunov
2018-06-18 17:44:40 UTC
Sorry: s/collection, UDFs/collection of UDFs/
Richard Hipp
2018-06-24 19:34:40 UTC
Post by Stephan Beal
3) Fossil effectively uses exit() to handle just about any type of
non-allocation error. i.e. there's little library-friendly error handling
in fossil.
Not just errors. If Fossil finds an opportunity to send a "304 Not
Modified" reply, it does so and then immediately calls exit(0),
without having to unwind the stack.
--
D. Richard Hipp
***@sqlite.org
David Mason
2018-06-24 21:19:55 UTC
I really don't understand the reticence to use setjmp/longjmp to turn all
of these short-cut exits into library return-to-API trampolines. It would
allow you to retain all the existing fossil codebase. Rewriting the code
into library form is an interesting project, but it seems like a huge
amount of work and unless Richard is going to change fossil to use the API,
it is also going to be a huge ongoing maintenance nightmare and fraught
with opportunities for failure.

All that is required is to declare a global setjmp buffer, and then each
API function would do a setjmp before calling into the existing code... and
you ignore that code if the longjmp is called. Then you simply need to
close any files that were opened and change working-directory if it might
have been changed, then return from the API. Then in any of the places that
exit is called, add a test to see if you were called by an API function and
if so do the longjmp instead.
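A minimal sketch of that trampoline, assuming a single-threaded library and the POSIX `getcwd`/`chdir` calls. Every name here (`api_jmp`, `in_api`, `app_exit`, `do_work`, `lib_run`) is hypothetical and not taken from the fossil source; `do_work` stands in for existing code that bails out through an exit call, and the cleanup covers only the two resources mentioned above (one open file and the working directory).

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* getcwd, chdir (POSIX) */

static jmp_buf api_jmp;             /* global setjmp buffer */
static int in_api = 0;              /* nonzero while an API call is active */
static int api_exit_status = 0;     /* status the old code tried to exit with */
static FILE *api_open_file = NULL;  /* a resource the old code may leave open */

/* Hypothetical replacement for each exit() call site. */
static void app_exit(int status){
  if( in_api ){
    api_exit_status = status;
    longjmp(api_jmp, 1);            /* unwind back to the API entry point */
  }
  exit(status);                     /* unchanged command-line behavior */
}

/* Stand-in for existing code that opens a file, changes directory,
** and then exits instead of returning. */
static void do_work(const char *dir){
  api_open_file = tmpfile();
  if( chdir(dir)!=0 ) app_exit(2);
  app_exit(0);                      /* e.g. an early "all done" exit */
}

/* Public API entry point: returns the would-be exit status. */
int lib_run(const char *dir){
  char saved_cwd[4096];             /* fixed size keeps the sketch short */
  if( getcwd(saved_cwd, sizeof saved_cwd)==NULL ) return -1;
  api_exit_status = 0;
  api_open_file = NULL;
  in_api = 1;
  if( setjmp(api_jmp)==0 ){
    do_work(dir);                   /* in this sketch it always longjmps */
  }
  /* Reached via longjmp (or a normal return): undo what exit() would have. */
  in_api = 0;
  if( api_open_file ){
    fclose(api_open_file);
    api_open_file = NULL;
  }
  if( chdir(saved_cwd)!=0 ) return -1;
  return api_exit_status;
}
```

The key design point is that `lib_run` closes the file and restores the working directory itself: `longjmp` only discards stack frames, it does not release anything those frames acquired.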

It's a few dozen lines of code (in addition to the actual API interface
code). Sure seems like worth the experiment to me!

../Dave
_______________________________________________
fossil-users mailing list
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Stephan Beal
2018-06-24 21:48:47 UTC
Post by David Mason
I really don't understand the reticence to use setjmp/longjmp to turn all
of these short-cut exits into library return-to-API trampolines.
To be clear: that's my reticence, not Richard's. libfossil was always
effectively a third-party effort which had Richard's blessing.

My aversion to setjmp/longjmp is that they're effectively global gotos, and
gotos, except in very tightly-controlled circumstances, quickly produce
spaghetti messes.

Post by David Mason
It would allow you to retain all the existing fossil codebase. Rewriting the
code into library form is an interesting project, but it seems like a huge
amount of work and unless Richard is going to change fossil to use the API,
it is also going to be a huge ongoing maintenance nightmare and fraught
with opportunities for failure.
Isn't adding hundreds (literally) of gotos just as fraught with
opportunities for failure ;)?

Post by David Mason
It's a few dozen lines of code (in addition to the actual API interface
code). Sure seems like worth the experiment to me!
i'll go make some popcorn :).

(To be fair: my [strong] aversion to that solution isn't intended to imply
that you can't pull it off. My journey with libfossil was always _at least_
as much about reimplementing it "cleanly" as it was about getting it
running at all. Without the former, the latter would have been, at best, a
hollow success.)
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
David Mason
2018-06-25 12:18:05 UTC
Actually, setjmp/longjmp isn't goto, it's try-catch/throw.

It unwinds the stack, so you can only (correctly) longjmp to a point that
is in the current call-chain. Of course, this is C, so you can screw it up,
but said screw-ups are not subtle! And subtle bugs are the ones to worry
about.
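To make the try/catch analogy concrete, here is a sketch (all names hypothetical, not from fossil or libfossil) of a handler stack built on setjmp/longjmp. Each "try" pushes a frame that lives on the C stack, so a "throw" can only target a frame whose function is still running, which is the current-call-chain restriction described above. The thrown code travels in a global because non-volatile locals of the function that called `setjmp`, if modified after the `setjmp`, have indeterminate values after the jump.

```c
#include <setjmp.h>
#include <stdlib.h>

/* One "try" frame; frames are linked innermost-first, mirroring the call chain. */
struct try_frame {
  jmp_buf env;
  struct try_frame *prev;
};

static struct try_frame *try_top = NULL;  /* innermost active frame */
static int thrown_code = 0;               /* value carried by the last throw */

/* "throw": pop the innermost frame and jump to it. */
static void throw_code(int code){
  struct try_frame *f = try_top;
  if( f==NULL ) abort();                  /* no handler: nothing to unwind to */
  thrown_code = code;
  try_top = f->prev;
  longjmp(f->env, 1);
}

/* "try { return fn(arg); } catch { return the thrown code; }" */
static int guarded(int (*fn)(int), int arg){
  struct try_frame frame;
  frame.prev = try_top;
  try_top = &frame;
  if( setjmp(frame.env)==0 ){
    int r = fn(arg);
    try_top = frame.prev;                 /* normal path: pop our frame */
    return r;
  }
  return thrown_code;                     /* throw path: already popped */
}

/* Example callees. */
static int risky(int n){
  if( n<0 ) throw_code(n);
  return n*2;
}
static int nested(int n){
  int r = guarded(risky, n);              /* inner try/catch */
  if( r<0 ) throw_code(100);              /* rethrow to the outer frame */
  return r+1;
}
```

Note that, unlike C++ exceptions, nothing here releases resources acquired by the frames that get skipped; the catching side has to do that.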

I count 35 direct uses of exit() (2 in fossil_panic, and one in
fossil_exit). I don't know why the other 32 calls to exit() don't call
fossil_exit - perhaps they don't want the db closed... but if you replace
those 35 calls with a call to fossil_exit_now then there would be only 1
place that had to do the longjmp logic.

There is often value in re-implementing programs! All I really said was
that it was a mug's game unless fossil-the-program is going to adopt it,
because you'd be constantly tracking the changes of fossil-the-program in
the re-implemented libfossil code.

I unfortunately don't have the time at the moment, or I would certainly try
the experiment. I suspect your work on libfossil would be an excellent API
definition! Perhaps I can find a student I could interest in the project in
the fall.

../Dave
Warren Young
2018-06-25 12:30:03 UTC
Post by Stephan Beal
Isn't adding hundreds (literally) of gotos just as fraught with
opportunities for failure ;)?
#ifdef LIBFOSSIL
# define FOSSIL_EXIT(n) longjmp(blabla)
#else
# define FOSSIL_EXIT(n) exit(n)
#endif

$ sed -i -e 's/exit(/FOSSIL_EXIT(/g' src/*.c
Stephan Beal
2018-06-25 12:37:42 UTC
Post by Warren Young
Post by Stephan Beal
Isn't adding hundreds (literally) of gotos just as fraught with
opportunities for failure ;)?
#ifdef LIBFOSSIL
# define FOSSIL_EXIT(n) longjmp(blabla)
#else
# define FOSSIL_EXIT(n) exit(n)
#endif
Yeah, i was exaggerating, but still... i think the required effort is being
underestimated by at least an order of magnitude. That said: i would
_absolutely love_ to be proven wrong.

Post by Warren Young
$ sed -i -e 's/exit(/FOSSIL_EXIT(/g' src/*.c
i recommend a slight variation:

perl -i -pe 's/\bexit\(/FOSSIL_EXIT(/g' src/*.c

sed probably also has a \b (at-word-boundary) equivalent, but i'm not as
well-versed in that flavor of regex.

Sidebar: i once corrupted a fossil checkout db by using $(find . -type f)
as my target for some perl -i-style refactoring :|. Never perlify your
sqlite db files.
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
David Mason
2018-06-25 12:49:03 UTC
I was thinking it would be a little more than this, but perhaps:
# define FOSSIL_EXIT(n) (api_exit_status-=n,longjmp(blabla))
would actually be enough.

And a similar setjmp-calling macro at the beginning of each API function
would be all that would be required there.
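A sketch of those two macros under stated assumptions: `FOSSIL_EXIT` follows Warren's snippet, while `api_jmp`, `api_active`, `api_exit_status`, `API_ENTER`, `API_LEAVE`, `api_leave`, `legacy_check` and `api_check` are made-up names for illustration, and only one single-threaded, non-reentrant API call runs at a time. The exit status travels through a global assigned before the jump; it cannot simply be the `longjmp` value, because `longjmp(buf, 0)` makes `setjmp` return 1, which would turn `exit(0)` into a nonzero result.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf api_jmp;          /* jump target set up by API_ENTER */
static int api_active = 0;       /* nonzero inside an API call */
static int api_exit_status = 0;  /* status passed to the last FOSSIL_EXIT */

/* Drop-in replacement for exit(n): jump back to the API entry point when
** called through the library, otherwise exit as before. Both branches are
** void expressions, which the conditional operator allows. */
#define FOSSIL_EXIT(n) \
  (api_active ? (api_exit_status = (n), longjmp(api_jmp, 1)) : exit(n))

/* Top of every API function: if FOSSIL_EXIT fires, return its status. */
#define API_ENTER()                  \
  do {                               \
    api_exit_status = 0;             \
    api_active = 1;                  \
    if( setjmp(api_jmp)!=0 ){        \
      api_active = 0;                \
      return api_exit_status;        \
    }                                \
  } while(0)

/* Normal return path. The argument is evaluated before api_leave runs,
** so a FOSSIL_EXIT inside it still jumps while the API is active. */
static int api_leave(int rc){
  api_active = 0;
  return rc;
}
#define API_LEAVE(rc) return api_leave(rc)

/* Stand-in for unchanged code after the exit( -> FOSSIL_EXIT( rewrite. */
static int legacy_check(int x){
  if( x<0 ) FOSSIL_EXIT(3);
  return x+1;
}

/* Hypothetical public API wrapper. */
int api_check(int x){
  API_ENTER();
  API_LEAVE(legacy_check(x));
}
```

Here `api_check(5)` returns 6 normally, while `api_check(-1)` returns 3 through the jump instead of terminating the process.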

../Dave
Dominique Devienne
2018-06-25 12:52:47 UTC
Post by Warren Young
Post by Stephan Beal
Isn't adding hundreds (literally) of gotos just as fraught with
opportunities for failure ;)?
#ifdef LIBFOSSIL
# define FOSSIL_EXIT(n) longjmp(blabla)
#else
# define FOSSIL_EXIT(n) exit(n)
#endif
If there's a libfossil ever, it should be C++ friendly IMHO, i.e. not use
longjmp :)

But whether it's C's longjmp or C++ throw, that doesn't solve the fact
that fossil doesn't clean up after itself properly: because it's
exit()-based, it lets the OS reclaim memory and, I guess, all other
resources like sockets, etc. That's obviously a nonstarter for a library,
as evidenced by all the end/free/finish APIs in SQLite, and the value
destructors passed to other APIs. My $0.02. --DD

PS: My own interest in fossil is not in the "less core" Sam advocates for,
although I do find that prospect very interesting and worthwhile, but in
its "fast-enough HTTP server and web-app framework" in C using a
lightweight template engine (TH1), FWIW.
David Mason
2018-06-25 13:24:51 UTC
There is nothing wrong with a C library handling its internal processing
using setjmp/longjmp, as long as there's no C++ callbacks or any other way
that C++ code that might use throw/catch can be executed from within calls
to that library.

It's a little bit more work than just replacing the calls to exit with
longjmps.

There are 5 OS resources that exit resolves: CWD, open files, memory, child
processes, and the stack.

setjmp/longjmp only directly resolves the stack, but it provides a hook to
resolve the others.

Fossil appears to be careful with memory allocation too, with very few raw
calls to malloc, so memory allocations can be unwound.

Again, chdir and fopen are rarely used raw, so they can also be unwound.
(opendir is called raw in a few places, so those may need a fossil_opendir.)
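One way to get that hook is a cleanup stack: thin wrappers record each resource as it is acquired, and the code that catches the longjmp releases whatever is still recorded. This is a sketch under assumptions: `track`, `tracked_malloc`, `tracked_opendir`, `tracked_release` and `run_cleanups` are hypothetical names (`tracked_opendir` plays the role of the `fossil_opendir` wrapper suggested above), the table has a fixed size, and `opendir`/`closedir` are the POSIX functions.

```c
#include <dirent.h>   /* opendir, closedir (POSIX) */
#include <stdlib.h>

/* One pending cleanup: how to release a resource, and the resource. */
struct cleanup {
  void (*fn)(void *);
  void *res;
};

#define MAX_CLEANUPS 256
static struct cleanup cleanups[MAX_CLEANUPS];
static int n_cleanups = 0;

/* Record a resource; a full table is treated as fatal in this sketch. */
static void track(void (*fn)(void *), void *res){
  if( n_cleanups==MAX_CLEANUPS ) abort();
  cleanups[n_cleanups].fn = fn;
  cleanups[n_cleanups].res = res;
  n_cleanups++;
}

static void *tracked_malloc(size_t n){
  void *p = malloc(n);
  if( p ) track(free, p);
  return p;
}

static void close_dir(void *p){ closedir((DIR *)p); }

static DIR *tracked_opendir(const char *path){
  DIR *d = opendir(path);
  if( d ) track(close_dir, d);
  return d;
}

/* Normal-path release: forget the resource and free/close it now. */
static void tracked_release(void *res){
  int i, j;
  for(i=n_cleanups-1; i>=0; i--){
    if( cleanups[i].res==res ){
      struct cleanup c = cleanups[i];
      for(j=i; j<n_cleanups-1; j++) cleanups[j] = cleanups[j+1];
      n_cleanups--;
      c.fn(c.res);
      return;
    }
  }
}

/* Unwind path: release everything still outstanding, newest first. */
static void run_cleanups(void){
  while( n_cleanups>0 ){
    n_cleanups--;
    cleanups[n_cleanups].fn(cleanups[n_cleanups].res);
  }
}
```

In a setjmp-based API wrapper, `run_cleanups()` would be called right after `setjmp` returns nonzero, before the API function returns to its caller.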

So as I originally said... it's a pretty small amount of overhead code to
write to do the cleanup.

BUT, the big advantage is that then libfossil *automatically* gets all the
fossil bug-fixes and some of the improvements for free! Sure seems worth it
to me.

(Note that the 3 files that might be the biggest problem, shell.c, fshell.c,
and main.c, wouldn't be part of libfossil.)

../Dave
Florian Balmer
2018-06-25 22:12:23 UTC
Post by David Mason
Fossil appears to be careful with memory allocation too, with very few
raw calls to malloc, so memory allocations can be unwound.
SQLite has the "Zero-malloc" and "Application-supplied memory
allocators" options [0], which may be helpful for cases without proper
db engine shutdown? On the other hand, at least the first option may
not be a good choice for Fossil's heavy-lifting work?

[0] https://www.sqlite.org/malloc.html

Also, I've recently come across an article to solve a problem with
memory fragmentation by using private heaps [1]. This way, the library
could simply dispose the entire heap when done, and also "have the
system do the cleanup". Or is this too heavy-weight for a library, or
not supported on all systems?
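The private heaps in that article are a Windows facility; a portable way to get the same "throw the whole heap away" effect is an arena allocator. The sketch below is an illustration under assumptions, not taken from fossil or SQLite: the names `arena`, `arena_alloc` and `arena_free_all` are hypothetical, blocks are at least 4096 bytes, and there is no per-object free. Allocations are carved out of malloc'd blocks, and disposing of the arena frees every block in one pass.

```c
#include <stddef.h>
#include <stdlib.h>

/* Union whose size and alignment suit any basic type we hand out. */
typedef union { long double ld; long long ll; void *p; void (*fp)(void); } arena_align;

struct arena_block {
  struct arena_block *next;  /* blocks are chained so they can all be freed */
  size_t used;               /* bytes handed out from this block */
  size_t cap;                /* bytes available in data[] */
  arena_align data[];        /* payload; the union type keeps it aligned */
};

struct arena { struct arena_block *head; };

#define ARENA_BLOCK_SIZE 4096

static void *arena_alloc(struct arena *a, size_t n){
  size_t align = sizeof(arena_align);
  struct arena_block *b = a->head;
  n = (n + align - 1) / align * align;   /* round up so offsets stay aligned */
  if( n==0 ) n = align;
  if( b==NULL || b->cap - b->used < n ){
    size_t cap = n > ARENA_BLOCK_SIZE ? n : ARENA_BLOCK_SIZE;
    b = malloc(sizeof *b + cap);
    if( b==NULL ) return NULL;
    b->next = a->head;
    b->used = 0;
    b->cap = cap;
    a->head = b;
  }
  {
    void *p = (unsigned char *)b->data + b->used;
    b->used += n;
    return p;
  }
}

/* Dispose of everything allocated from the arena at once. */
static void arena_free_all(struct arena *a){
  struct arena_block *b = a->head;
  while( b ){
    struct arena_block *next = b->next;
    free(b);
    b = next;
  }
  a->head = NULL;
}
```

The trade-off is that individual objects cannot be released early, so an arena fits short-lived, per-call work better than long-running operations.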

[1] https://blogs.msdn.microsoft.com/ricom/2006/02/02/unmanaged-memory-fragmentation-an-old-story/

--Florian
