Discussion:
Proposed roadmap for Fossil 2.0
Richard Hipp
2017-02-26 20:45:18 UTC
This message is cross-posted to fossil-users and fossil-dev.
Follow-ups should go to fossil-dev only, please. Thanks.

I propose that the next release of Fossil be called "Fossil 2.0", that
it occur before Easter (2017-04-16), and that it have the following
features:

(1) Fossil 2.0 is backwards compatible with Fossil 1.x. Fossil 2.0
can push and pull from a Fossil 1.x server. Fossil 2.0 can read and
write Fossil 1.x repositories, though only after having run "fossil
rebuild". The upgrade path is to first overwrite the older fossil 1.x
executable with a new fossil 2.0 executable, then run "fossil all
rebuild".

(2) Artifacts can be identified via multiple hash algorithms. The
initial implementation will support SHA1 and SHA3-224. (For brevity,
SHA3-224 will hereafter be referred to as K224.)

(3) The low-level file formats
(https://www.fossil-scm.org/fossil/doc/trunk/www/fileformat.wiki) are
unchanged except that the artifact hashes are allowed to be longer
than 40 hex digits for alternative hash algorithms. For K224, the
hashes are 56 hex digits long. Other hash algorithms may be supported
in future releases as long as each hash algorithm has a unique hash
length, thus enabling Fossil to figure out which algorithm is being
used simply by looking at the length of the hash.

(4) All artifact hashes within a single well-formed structure artifact
must use the same algorithm. This restriction does not apply to the
MD5 hash used by the R-card and the Z-card.

(5) Every repository will have a preferred hash algorithm. The
preferred hash algorithm can be changed by running "fossil rebuild"
with appropriate options. The artifact hashes displayed in the web
interface and on command-line output will be computed using the
preferred hash algorithm. This means that the displayed hash names
for legacy check-ins will change when the hash algorithm is changed.
However, references to the old hash values will still be correctly
resolved.

For example, the current tip of trunk in the Fossil self-hosting
repository is named using a SHA1 hash as:
ccdafa2a93e7bcefa1b4d0ea7474f9ce84c690f2. If the hash algorithm is
changed to K224, then this check-in will afterwards be displayed as
3c658054301feb7e1cd25b66e32c94ffbf48d0b2f67310d33fb79a50. However,
you will still be able to access the check-in using the
"https://www.fossil-scm.org/fossil/info/ccdafa2a93e7bcef" URL and you
will still be able to update to that check-in by typing "fossil update
ccdafa2a". In this way, a repository can transition from one hash
algorithm to another without breaking any legacy hyperlinks.
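Items (3) and (5) can be sketched together in Python: the hex length selects the algorithm, every artifact is indexed under each of its names, and lookups accept a full hash or a unique prefix of any of them. This is a minimal illustration under stated assumptions (only SHA1 and SHA3-224 are supported, and a name is the hash of the artifact's raw bytes); `NameIndex`, `algorithm_for`, `display_name`, and `verify` are hypothetical names, not Fossil's C code.

```python
import hashlib

# Item (3): each algorithm has a distinct hex length, so the length alone
# identifies the algorithm. Only the two algorithms in the proposal are listed.
ALGORITHM_BY_HEX_LENGTH = {40: "sha1", 56: "sha3_224"}

def algorithm_for(full_hash: str) -> str:
    """Infer the hash algorithm from the length of a full hex hash."""
    if len(full_hash) not in ALGORITHM_BY_HEX_LENGTH:
        raise ValueError(f"no algorithm produces {len(full_hash)} hex digits")
    return ALGORITHM_BY_HEX_LENGTH[len(full_hash)]

def display_name(content: bytes, preferred: str) -> str:
    """Item (5): the displayed name depends only on the preferred algorithm."""
    return hashlib.new(preferred, content).hexdigest()

def verify(content: bytes, full_hash: str) -> bool:
    """Recheck content against a name, using the algorithm its length implies."""
    return display_name(content, algorithm_for(full_hash)) == full_hash.lower()

class NameIndex:
    """Hypothetical index mapping every known name of an artifact to its rid."""

    def __init__(self) -> None:
        self.names = {}  # full hex hash -> rid

    def add(self, rid: int, content: bytes) -> None:
        # Record the artifact under every supported algorithm.
        for alg in ALGORITHM_BY_HEX_LENGTH.values():
            self.names[display_name(content, alg)] = rid

    def resolve(self, prefix: str) -> int:
        """Map a full hash or a unique prefix of ANY of its names to a rid."""
        prefix = prefix.lower()
        rids = {rid for name, rid in self.names.items() if name.startswith(prefix)}
        if len(rids) != 1:
            raise LookupError(f"{prefix!r} matches {len(rids)} artifacts")
        return rids.pop()
```

After `idx.add(1, content)`, both `idx.resolve(display_name(content, "sha1")[:16])` and `idx.resolve(display_name(content, "sha3_224"))` return 1, which is the property that lets an old SHA1 prefix keep working after the preferred algorithm changes.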

(6) Repositories can be configured to reject check-ins and other
structure artifacts that occur after a selected cut-off date and which
use the SHA1 hash algorithm.
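Items (4) and (6) are both acceptance rules for structure artifacts, so a rough sketch can combine them. It assumes a manifest is a list of one-letter cards with space-separated arguments, naively treats any 40- or 56-digit lowercase hex token as an artifact hash, and takes the artifact's own timestamp (for a check-in, its D-card) as the date compared against the cut-off. The function names and the `None`-means-no-cut-off convention are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Full-length artifact hashes: 40 hex digits (SHA1) or 56 (SHA3-224).
ARTIFACT_HASH = re.compile(r"[0-9a-f]{40}|[0-9a-f]{56}")
EXEMPT_CARDS = {"R", "Z"}  # item (4): their MD5 sums are not covered

def uses_single_algorithm(manifest: str) -> bool:
    """Item (4): all artifact hashes in the manifest share one length."""
    lengths = set()
    for line in manifest.splitlines():
        card, _, args = line.partition(" ")
        if card in EXEMPT_CARDS:
            continue
        for token in args.split(" "):
            if ARTIFACT_HASH.fullmatch(token):
                lengths.add(len(token))
    return len(lengths) <= 1

def _as_utc(t: datetime) -> datetime:
    """Treat naive timestamps as UTC (Fossil's D-card times are UTC)."""
    return t if t.tzinfo is not None else t.replace(tzinfo=timezone.utc)

def accept_structure_artifact(manifest: str, own_hash: str,
                              artifact_time: datetime,
                              sha1_cutoff: Optional[datetime]) -> bool:
    """Apply item (4), then item (6)'s optional SHA1 cut-off date."""
    if not uses_single_algorithm(manifest):
        return False
    if sha1_cutoff is None or len(own_hash) != 40:
        return True  # no cut-off configured, or not a SHA1 name
    return _as_utc(artifact_time) <= _as_utc(sha1_cutoff)
```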

(7) To implement the above, the BLOB.UUID field will be removed from
the repository database. In its place, a new table will be added,
tentatively declared as follows:

CREATE TABLE hname(
hash TEXT,
alg ANY,
rid INTEGER REFERENCES blob(rid),
aux ANY,
PRIMARY KEY(hash,alg)
) WITHOUT ROWID;
CREATE INDEX hname_rid ON hname(rid);

In Fossil 1.x, there was a 1-to-1 correspondence between hash values
and artifacts. Since it supports multiple hash algorithms, Fossil 2.0
now has a many-to-one relationship between hash values and artifacts,
and so the hash values need to be stored in a separate table. The
"alg" field will be a numeric 0 for the preferred hash, and some other
code (yet to be decided) for alternative hashes. Note that this new
table can also store git-style artifact hashes which would facilitate
creating a Fossil-to-Git bridge that enables a Fossil server to
directly respond to push/pull requests from Git clients using the Git
wire protocol. The "aux" field is included in anticipation of this
Fossil-to-Git bridge. For now, the "aux" field will always be NULL.
This Fossil-to-Git bridge will not be available in the first release
but might be a feature added in subsequent releases.

I believe that most of the work in creating Fossil 2.0 will involve
going through the source code, locating queries that use BLOB.UUID,
and revising those queries to use the HNAME table instead.
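The proposed HNAME table can be exercised directly in SQLite. The sketch below, using Python's built-in sqlite3 module, stores one artifact under two names and shows the kind of query that might replace a lookup of BLOB.UUID. The simplified BLOB table, the choice of SHA3-224 as preferred, and the alg code 1 for SHA1 are placeholders, since the real codes for alternative hashes are explicitly yet to be decided.

```python
import hashlib
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-in for Fossil's BLOB table (assumption).
CREATE TABLE blob(rid INTEGER PRIMARY KEY, content BLOB);
-- HNAME as tentatively declared in item (7).
CREATE TABLE hname(
  hash TEXT,
  alg ANY,
  rid INTEGER REFERENCES blob(rid),
  aux ANY,
  PRIMARY KEY(hash,alg)
) WITHOUT ROWID;
CREATE INDEX hname_rid ON hname(rid);
""")

ALG_PREFERRED = 0  # per the proposal: 0 marks the preferred hash
ALG_SHA1 = 1       # placeholder; the real code is "yet to be decided"

def store(content: bytes) -> int:
    """Insert an artifact plus one HNAME row per algorithm."""
    rid = conn.execute("INSERT INTO blob(content) VALUES (?)", (content,)).lastrowid
    conn.executemany(
        "INSERT INTO hname(hash, alg, rid, aux) VALUES (?, ?, ?, NULL)",
        [(hashlib.sha3_224(content).hexdigest(), ALG_PREFERRED, rid),
         (hashlib.sha1(content).hexdigest(), ALG_SHA1, rid)])
    return rid

def preferred_name(rid: int) -> str:
    """Stands in for the old 'SELECT uuid FROM blob WHERE rid=?'."""
    row = conn.execute("SELECT hash FROM hname WHERE rid=? AND alg=?",
                       (rid, ALG_PREFERRED)).fetchone()
    return row[0]

def all_names(rid: int) -> List[str]:
    """Every name known for one artifact: the many-to-one relationship."""
    return [h for (h,) in conn.execute(
        "SELECT hash FROM hname WHERE rid=? ORDER BY alg", (rid,))]
```

The `hname_rid` index is what keeps `preferred_name` and `all_names` cheap, since both look rows up by `rid` rather than by the `(hash, alg)` primary key.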

Unknowns:

(8) Is it possible for two Fossil servers to sync if they are using
different preferred hash algorithms? This is a desired goal, but I
do not yet understand how hard that will be.

(9) Can a Fossil 1.x client push/pull/clone from a Fossil 2.0 server,
assuming the repository uses SHA1 as its preferred hash algorithm?
This is desirable, but I am willing to sacrifice this capability in
order to reduce complexity.

(10) Should Keccak hashes that are not part of the SHA3 standard
(example: Keccak[192]) be supported? K192 is desirable in that its
hash length is 48 hex digits, only 8 digits longer than SHA1.

Feedback is welcomed and encouraged, though let's keep the discussion
on fossil-dev and off of fossil-users if possible. Thanks.
--
D. Richard Hipp
***@sqlite.org
Tony Papadimitriou
2017-02-27 01:34:47 UTC
Leaving aside for a moment the consequences in general of the presumed
imminent SHA1 collapse (and some of the valid points already made by Linus
regarding Git):

If FOSSIL will refuse (and I actually tried it with those two same SHA1
PDFs) to accept a file (commit, push, pull) with the same SHA1 as any of
those already in the repo (not sure about the unversioned case, however),
how is it possible for someone to inject a 'bad' file with the same SHA1 as
a 'good' file already in the repo?

The only ways I can imagine (and please add more if you see them) are:

* Deconstruct the repo, replace the specific file(s) with the 'bad' one(s)
and reconstruct. But, this would be in the user's local copy, and s/he
would not be able to push those changes to the other side (again, because
the given SHA1 already exists, and the file with the same SHA1 will not be
retransmitted/reloaded). The injection will not propagate beyond the
attacker's machine.

* Know the 'good' file before it's actually committed, prepare a 'bad' same
SHA1 replacement, and commit it before the 'good' has a chance, locking it
out. (Rather impossible even for clairvoyant people -- and even if, it
would most likely be noticed more easily than replacing a dormant file
nobody bothers with!)

* Be the administrator of a site (like chiselapp for example -- I do not
mean to insinuate anything, I simply do not know of another public example)
and go through the deconstruct-replace-reconstruct process replacing good
with bad. This is the only scenario I see which will affect the general
public -- specifically, those cloning the injected repo from scratch.
However, this again (because an artifact whose SHA1 already exists is never
reloaded) will not affect the local copies of the contributors, when
pulling/syncing -- or any of the
clones done before the injection. This is the only one I would worry about
at a theoretical level.

So, unless my assumptions above are incorrect, how urgent is the need to
transition away from SHA1?

Also, the two example PDF files with the same SHA1 still have different MD5
which fossil apparently already uses, and this (MD5) could be used as an
alternate verifier for each artifact without changing anything else. I
believe it will be really-really difficult (for the foreseeable future at
least) for someone to come up with a 'bad' file with both SHA1 and MD5 being
the same. Don't tell me MD5 is broken. One would still need to match both
SHA1 and MD5 to inject -- not easy!

I'm certainly not against transitioning to a more secure hash *eventually*
but I doubt there is such an immediate need (until the Easter deadline, for
example) for making what seems to be a rather serious update that (and this
is my biggest concern) may introduce (an avalanche[?] of) bugs, and possibly
even risk the integrity of our current repos until fully bug-free. (I for
one would be reluctant to try it for actual work until enough other people
have used it for some time without problems.) So, I think it could be done
in a more relaxed timeframe that will also give time to brainstorm the best
possible general solution that will work easily even in the event of another
hash function replacement in the future (e.g., what if SHA3 is already being
prepped by Google for summer announcement?) while maintaining backwards
compatibility to the greatest extent possible. It's also interesting to
take some time to see how others will try to deal with this problem and get
ideas.

As for the proposal, although it sounds OK on first reading, the 'unknowns'
are a bit worrisome, particularly the syncing between different versions --
you can't really get the whole population to switch at the exact same time.

And, I'm not sure it's the minimal possible solution (i.e., the one with the
least chance of new bugs). I believe the example I gave with the MD5 is a safe
enough temporary 'hack' for the foreseeable future with less possibility of bugs as
it will not switch to a new hash, simply use the second one for extra
verification (and it doesn't have to be MD5, you can use SHA3 but in a
similar context -- simply MD5 is already there).
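To make the proposal concrete: the dual-hash idea accepts an artifact only when two independent digests both match. The sketch assumes a per-artifact MD5 is stored next to the SHA1, which is hypothetical; the existing R-card and Z-card MD5 sums cover a whole check-in and a whole manifest respectively, not individual files. `dual_verify` is an illustrative name.

```python
import hashlib

def dual_verify(content: bytes, expected_sha1: str, expected_md5: str) -> bool:
    """Accept content only if BOTH recorded digests match (hypothetical scheme).

    A substitute file would have to collide in SHA1 and MD5 at the same time;
    this code only performs the check, it says nothing about how hard that is.
    """
    return (hashlib.sha1(content).hexdigest() == expected_sha1
            and hashlib.md5(content).hexdigest() == expected_md5)
```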

My 0.01 eurocent!
Richard Hipp
2017-02-27 01:48:42 UTC
Post by Tony Papadimitriou
how urgent is the need to
transition away from SHA1?
From a technical standpoint, it is not very urgent, in my assessment.

However, from a PR standpoint, I think it needs to happen quickly.

It can also be a big PR win if we are able to boast that Fossil
transitioned away from SHA1 painlessly, quickly, and efficiently and
without breaking any legacy.
--
D. Richard Hipp
***@sqlite.org
Ron Aaron
2017-02-27 04:57:51 UTC
I'm happy to see you thinking along those lines.

From a performance standpoint, I would rather see Fossil adopt the
BLAKE2 hash, as it is one of the fastest of the SHA3 finalists, and has
adjustable output hash size.
Post by Richard Hipp
Post by Tony Papadimitriou
how urgent is the need to
transition away from SHA1?
From a technical standpoint, it is not very urgent, in my assessment.
However, from a PR standpoint, I think it needs to happen quickly.
It can also be a big PR win if we are able to boast that Fossil
transitioned away from SHA1 painlessly, quickly, and efficiently and
without breaking any legacy.
*Ron Aaron | * CTO Aaron High-Tech, Ltd <http://8th-dev.com> | +1
425.296.0766 / +972 52.652.5543 | GnuPG Key: 91F92EB8
<https://pgp.mit.edu/pks/lookup?op=get&search=0xC90C1BD191F92EB8>
Richard Hipp
2017-02-27 14:14:14 UTC
Post by Ron Aaron
From a performance standpoint, I would rather see Fossil adopt the
BLAKE2 hash, as it is one of the fastest of the SHA3 finalists, and has
adjustable output hash size.
Please write and check-in code (on the "fossil-2.0" branch) that
implements BLAKE2 in a manner analogous to the code in sha1.c and
sha3.c. I suggest a new file named "blake2.c". I will then make it an
option.

What hash length do you want for BLAKE2? Please make the hash length
for Fossil-BLAKE2 be a multiple of 8 bits and different from 160, 224,
and 256 as those values are already assigned to other hashes.
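Python's hashlib exposes BLAKE2b with a configurable `digest_size` (1 to 64 bytes), which makes the length constraint easy to check. The sketch below assumes the already-assigned lengths are 160 (SHA1), 224 (SHA3-224, 56 hex digits) and 256 bits, and picks BLAKE2b; the helper names and the choice of BLAKE2b over BLAKE2s are illustrative, not something decided in this thread.

```python
import hashlib

# Bit lengths assumed to be taken already.
ASSIGNED_BITS = {160, 224, 256}

def check_blake2_bits(bits: int) -> None:
    """Enforce the constraints from the message on a candidate length."""
    if bits % 8 != 0:
        raise ValueError("length must be a multiple of 8 bits")
    if bits in ASSIGNED_BITS:
        raise ValueError(f"{bits} bits is already assigned to another hash")
    if not 8 <= bits <= 512:
        raise ValueError("BLAKE2b digests are 1 to 64 bytes long")

def fossil_blake2(content: bytes, bits: int) -> str:
    """Hex BLAKE2b digest of the requested length (hypothetical helper)."""
    check_blake2_bits(bits)
    return hashlib.blake2b(content, digest_size=bits // 8).hexdigest()
```

Because the hex length is `bits // 4`, a unique bit length also gives a unique hex length, so length-based algorithm detection keeps working.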
--
D. Richard Hipp
***@sqlite.org
Ron Aaron
2017-02-27 14:23:18 UTC
Warren Young
2017-02-27 20:49:22 UTC
how is it possible for someone to inject a 'bad' file with the same SHA1 as a 'good' file already in the repo?
Your attacker could be MITM’d into the sync stream. I gave an example requiring only the current SHA-1 collision technology in my first reply in the other thread:

https://news.ycombinator.com/item?id=13715887

As for the difficulty of MITM’ing your sync connection, are your Fossil servers all behind strong TLS proxies? If not, your Fossil sync streams are pretty trivially MITM-able. And if you do use strong TLS, are you sure there are no TLS-busting middleboxes[1] or broken antimalware tools[2] in the computer networks at either end of the connection?


[1]: https://en.wikipedia.org/wiki/Middlebox
[2]: https://goo.gl/37H7pN
The only ways I can imagine
That’s because you aren’t a highly motivated, highly resourced, highly trained black hat. But such people do exist.
how urgent is the need to transition away from SHA1?
If we could magically upgrade all of the Fossil clients in the world, we could wait until just before the sky actually begins to fall. That event is likely years out.

If drh actually manages to ship a fix for this by Easter — aggressive, but doable — then we’ll still be seeing Fossil 1.x on a fair number of systems on Easter of 2022, by which time the current attack could be either 5-10x faster or 5-10x cheaper, the factor depending on whether there are further breakthroughs in that time. I think we should count on 10x and be pleasantly surprised if it’s “only” 5x.

Moore’s Law may be running out of steam for general purpose computing, but not for embarrassingly parallel applications like hash collision searching.
I believe it will be really-really difficult (for the foreseeable future at least) for someone to come up with a 'bad' file with both SHA1 and MD5 being the same.
Given that an MD5 collision costs less than 65 cents US to create these days,[3] I think your premise is flawed.

Yes, creating a collision in both MD5 and SHA-1 is probably more expensive than the sum of the two (US $100000.65), but I don’t think you can say that MD5 is adding meaningful security here.

MD5 is broken, broken, broken.[4]


[3]: https://natmchugh.blogspot.com/2015/02/create-your-own-md5-collisions.html
[4]: https://eprint.iacr.org/2013/170.pdf
Don't tell me MD5 is broken.
MD5 is broken. :)
One would still need to match both SHA1 and MD5 to inject -- not easy!
Argument from incredulity.[5]


[5]: http://rationalwiki.org/wiki/Argument_from_incredulity
may introduce (an avalanche[?] of) bugs, and possibly even risk the integrity of our current repos until fully bug-free.
Have avalanches of bugs been a notable hallmark of Fossil and SQLite, in your experience?
(I for one would be reluctant to try it for actual work until enough other people have used it for some time without problems.)
That’s why the SQLite project dogfoods trunk Fossil, and drh encourages us end users to use Fossil from trunk, too. By the time Fossil 2.0 is released, it *will* be well-tested, both formally and informally.
you can't really get the whole population to switch at the exact same time.
We have decades of experience managing such problems. This is known tech. We don’t need to invent any new wheels here.
Tony Papadimitriou
2017-02-28 01:28:04 UTC
-----Original Message-----
From: Warren Young
Post by Warren Young
Post by Tony Papadimitriou
how is it possible for someone to inject a 'bad' file with the same SHA1
as a 'good' file already in the repo?
Your attacker could be MITM’d into the sync stream. I gave an example
requiring only the current SHA-1 collision technology in my first reply in
So now HTTPS is also broken?
Post by Warren Young
Post by Tony Papadimitriou
The only ways I can imagine
That’s because you aren’t a highly motivated, highly resourced, highly
trained black hat. But such people do exist.
I was implying 'practical' ways, not theoretical. Now, if Google wants to
use its billions to break my repo, they may do it, but it would be easier
for them to try to buy it from me for much less!
So, do you actually know of some other practical ways, or are you just
looking to make conversation?
Post by Warren Young
Post by Tony Papadimitriou
I believe it will be really-really difficult (for the foreseeable future
at least) for someone to come up with a 'bad' file with both SHA1 and MD5
being the same.
Given that an MD5 collision costs less than 65 cents US to create these
days,[3] I think your premise is flawed.
You think, I think, who cares? Can you prove your point by providing such a
collision while we're still alive?
Post by Warren Young
Yes, creating a collision in both MD5 and SHA-1 is probably more expensive
than the addition of the two (US $100000.65) but I don’t think you can say
that MD5 is adding meaningful security here.
MD5 is broken, broken, broken.[4]
So, if I give you a random piece of source code you (or your rich friends)
can create a matching paired-SHA1-MD5 alternate version (while we're still
alive)?
You would probably make bigger headlines than Google is making right now.
Post by Warren Young
Post by Tony Papadimitriou
One would still need to match both SHA1 and MD5 to inject -- not easy!
Argument from incredulity.[5]
Ditto! (Or prove how easy it is!)
Post by Warren Young
Post by Tony Papadimitriou
may introduce (an avalanche[?] of) bugs, and possibly even risk the
integrity of our current repos until fully bug-free.
Have avalanches of bugs been a notable hallmark of Fossil and SQLite, in your experience?
Past success rates do not guarantee future ones (slightly modified from a
bank fine print warning).
Post by Warren Young
Post by Tony Papadimitriou
(I for one would be reluctant to try it for actual work until enough
other people have used it for some time without problems.)
That’s why the SQLite project dogfoods trunk Fossil, and drh encourages us
end users to use Fossil from trunk, too. By the time Fossil 2.0 is
released, it *will* be well-tested, both formally and informally.
Actually, it's quite common that trunk fossil uses alpha and beta versions
of SQLite3 for the benefit of test-driving SQLite3. So, regardless of any
assurances to the contrary, it's still a bit risky.
Post by Warren Young
Post by Tony Papadimitriou
you can't really get the whole population to switch at the exact same time.
We have decades of experience managing such problems.
But ... you are so Young. :)

Less arrogant people with decades of experience also created MD5, SHA1, ...
Not a very comforting argument at this point, is it?
Warren Young
2017-02-28 21:56:47 UTC
Post by Tony Papadimitriou
Post by Warren Young
how is it possible for someone to inject a 'bad' file with the same SHA1 as a 'good' file already in the repo?
So now HTTPS is also broken?
Did you not visit the middlebox and antimalware links I provided? Yes, TLS is indeed broken at many sites.

Not theoretically broken. I mean desperately, entirely, in-practice broken.

On top of that, Fossil doesn’t ship with HTTPS built in, and it’s difficult to add after the fact, so many Fossil users aren’t going to be using TLS.

(That’s not a criticism of Fossil, just a fact about it. I don’t *want* a TLS stack built into Fossil; if it were present, I’d bypass it. That’s one of those things best left to the specialists.)
Post by Tony Papadimitriou
Post by Warren Young
The only ways I can imagine
That’s because you aren’t a highly motivated, highly resourced, highly trained black hat. But such people do exist.
I was implying 'practical' ways, not theoretical.
Past is prologue: http://valerieaurora.org/hash.html
Post by Tony Papadimitriou
So, do you actually know of some other practical ways
I’m not a security researcher, just a practitioner who keeps an eye on this sort of thing. I find that the professionals are scarier than I am.
Post by Tony Papadimitriou
Can you prove your point by providing such a collision while we're still alive?
This link was also provided up-thread: https://goo.gl/d7FTbI

Did you not at least read the abstract, or is high-level mathematics not convincing to you?
Post by Tony Papadimitriou
Post by Warren Young
One would still need to match both SHA1 and MD5 to inject -- not easy!
Argument from incredulity.[5]
Ditto! (Or prove how easy it is!)
I’m giving you mathematics, and you’re giving me maybes. I think I win.
Post by Tony Papadimitriou
Post by Warren Young
may introduce (an avalanche[?] of) bugs, and possibly even risk the integrity of our current repos until fully bug-free.
Have avalanches of bugs been a notable hallmark of Fossil and SQLite, in your experience?
Past success rates do not guarantee future ones (slightly modified from a bank fine print warning).
Current imperviousness does not guarantee future imperviousness.
s***@gmail.com
2017-03-01 16:06:14 UTC
All sha's aside:
1. 'Prune' repo to deliver a branch or whatever as a new repo.
Ideally, history preserved from point of prune forward.
2. Unversioned files supported with check in/out.
Current approach is confusing (that may be intentional?).
3. Fossil 2.0+ delivered as dll.
I use the exe for remote repo server, but automate my check-in/out's.
That would be more fluid without parsing CLI text.

Thanks for Fossil(s)!
Jan Danielsson
2017-03-01 21:50:26 UTC
On 03/01/17 17:06, ***@gmail.com wrote:
[---]
Post by s***@gmail.com
3. Fossil 2.0+ delivered as dll.
I use the exe for remote repo server, but automate my check-in/out's.
That would be more fluid without parsing CLI text.
This has been brought up a few times before, and there are no such plans
(not for 2.0, 2.1 or beyond). There's a separate project which is
designed to accomplish what you're looking for:
http://fossil.wanderinghorse.net/repos/libfossil
--
Kind regards,
Jan Danielsson
Richard Hipp
2017-03-01 15:56:25 UTC
I propose that the next release of Fossil be called "Fossil 2.0"....
An alpha version of Fossil 2.0 is now live on the main fossil website:

https://www.fossil-scm.org/

That same Fossil instance also runs SQLite: https://www.sqlite.org/src

This Fossil 2.0 instance is able to understand SHA3 artifact names,
but it does not generate new ones. So it should interact seamlessly
with your current version 1.37 or earlier Fossil client. And all the
web screens should continue to work as they always have. Please try
this out, and if you discover otherwise let me know either on this
mailing list or via private email.

Thanks for your help.
--
D. Richard Hipp
***@sqlite.org
s***@gmail.com
2017-03-01 16:14:02 UTC
Cool!
More 2.0+ requests...
1. 'Prune' repo to deliver a branch or whatever as a new repo.
Ideally, history preserved from point of prune forward.
2. Unversioned files supported with check in/out.
Current approach is confusing (that may be intentional?).
3. Fossil 2.0+ delivered as dll.
I use the exe for remote repo server, but automate my check-in/out's.
That would be more fluid without parsing CLI text.

Thanks for Fossil(s)!
Post by Richard Hipp
I propose that the next release of Fossil be called "Fossil 2.0"....
https://www.fossil-scm.org/
That same Fossil instance also runs SQLite: https://www.sqlite.org/src
This Fossil 2.0 instance is able to understand SHA3 artifact names,
but it does not generate new ones. So it should interact seamlessly
with your current version 1.37 or earlier Fossil client. And all the
web screens should continue to work as they always have. Please try
this out, and if you discover otherwise let me know either on this
mailing list or via private email.
Thanks for your help.
--
D. Richard Hipp
s***@gmail.com
2017-03-01 16:16:53 UTC
Sorry for double post, I got spammed between reply and lost track of what I
deleted. :(
Richard Hipp
2017-03-01 16:36:00 UTC
Post by s***@gmail.com
More 2.0+ requests...
Fossil 2.0 will stay focused on one thing: SHA3
--
D. Richard Hipp
***@sqlite.org
Sean Woods
2017-03-01 16:37:12 UTC
Do you keep updating Fossil 1.x? Will changes to the Fossil 1.x line be
ported to 2.x?
Post by Richard Hipp
Post by s***@gmail.com
More 2.0+ requests...
Fossil 2.0 will stay focused on one thing: SHA3
--
D. Richard Hipp
Richard Hipp
2017-03-01 16:58:35 UTC
Post by Sean Woods
Do you keep updating Fossil 1.x? Will changes to the Fossil 1.x line be
ported to 2.x?
No. Fossil 2.0 is a drop-in replacement for Fossil 1.x. If you find a
problem in historical Fossil 1.x, then the solution is to upgrade to
Fossil 2.0.
--
D. Richard Hipp
***@sqlite.org