Discussion:
SHA1 and security
Eduard
2015-10-29 00:37:16 UTC
Hi,

I wish to discuss the issues surrounding the use of SHA1 in Fossil and
their consequences, as well as propose several possibilities to deal
with them.

I would like to take a moment to define collision resistance and
second-preimage resistance. A hash H is collision-resistant if it is
infeasible to come up with distinct x1 and x2 such that H(x1)=H(x2). A
hash H is second-preimage resistant if, given some x1, it is infeasible
to come up with a different x2 such that H(x1)=H(x2). Of course,
collision resistance implies
second-preimage resistance (but not the other way around).
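
To make the difference concrete, here is a toy Python sketch
(illustration only, nothing to do with Fossil's internals) that finds a
collision on a deliberately truncated 24-bit hash by birthday search in
roughly 2^12 attempts; a second preimage for the same truncation would
instead take on the order of 2^24 attempts:

    # Toy demonstration: collision vs. second-preimage difficulty on a
    # truncated hash. Pure illustration; real SHA1 is 160 bits.
    import hashlib

    def h24(data):
        # First 3 bytes (24 bits) of SHA1 -- deliberately weak.
        return hashlib.sha1(data).digest()[:3]

    def find_collision():
        seen = {}
        i = 0
        while True:
            msg = b"msg-%d" % i
            d = h24(msg)
            if d in seen:
                return seen[d], msg   # x1 != x2, yet h24(x1) == h24(x2)
            seen[d] = msg
            i += 1

    x1, x2 = find_collision()
    print(x1, x2, h24(x1).hex())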

First I propose that the use of SHA1 in Fossil is a serious problem.
Even if no second-preimage attack is ever successful against it, a
collision attack is currently considered possible (although expensive) [1].

How much damage can be done given the capacity to generate collisions?
Suppose the attacker generates two versions of file "main.c" that share
the same SHA1 hash, one which is malicious ("main-malicious.c") and one
which is clean ("main-clean.c"). If the attacker can intercept
communications between the server and a developer, the attacker can push
"main-malicious.c" to the server, intercept the sync between the server
and developer and substitute "main-clean.c" for "main-malicious.c", then
wait until the developer tags and/or signs the change. Moreover, it will
appear as if the developer has PGP signed the malicious version!

If the attacker is in control of the server, then this becomes even
easier; push the clean version, tell the developer to tag/sign/approve
their checkin, then shun the clean version and replace it with the
malicious one.

If a project is hosted on multiple mirrors that periodically sync with
each other and the attacker knows that the main developer tends to use
only one of them, the attacker can push the clean version to the mirror
that is used by the main developer, and simultaneously push the
malicious one to the other servers.

These concerns are only amplified as the price of generating full SHA1
collisions drops (by further cryptanalytic advancement or by
technological improvements in computing).

Hoping that I have convinced you that this is a serious problem, I would
like to discuss the ways to tackle it.

The first solution is to do nothing and just tell users not to sync with
untrusted repositories. Given the distributed nature of software (and
otherwise) development, I believe it is a difficult burden to impose
upon developers that all contributors always be carefully vetted, and
that third-party (web) hosting never be trusted. I feel that this also
breaks the "eternally incorruptible" promise of Fossil.

The second solution is to incompatibly change the Fossil specification
and replace SHA1 hashes with BetterHash (for some value of BetterHash;
discussion below) in the definition of an artifact ID. This is a
*breaking* change, and requires the *modification* of artifacts (which I
believe is frowned upon in the fossil community to say the least). This
would break older hyperlinks (which would be easy to fix automatically
just by replacement when porting the artifacts to the new format), and
most definitely breaks older PGP clearsigned checkins (which would have
remained secure as long as SHA1 second-preimage attacks are infeasible).
The main advantage to this approach is that it is the most elegant and
easy to understand and deal with. The main disadvantage is that porting
artifacts to the new format requires their modification (which breaks
the "artifacts never change" promise; I would like to note that that
promise would also be broken as soon as an attacker inserts an artifact
for which a SHA1 collision is known).

The third solution is to change the Fossil specification to redefine the
artifact ID to be the concatenation of the SHA1 and BetterHash hash
digests, and allow 40 hexadecimal digit IDs as prefixes. One can show
that the preimage- and collision-resistance of this combination is at
least as good as the stronger of the two. The main advantage of this
approach is that it is not a breaking change, and does not require the
modification of older artifacts (hyperlinks stay the same too). The main
disadvantage is that if SHA1 preimages become feasible, an attacker can
definitely go back and mess with the pre-change SHA1-only artifacts (and
thus corrupt repositories, or worse). Another disadvantage is that the
SHA1 part of the ID takes up extra room and extra computing time with no
benefit in security.
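
For concreteness, a sketch of what such a combined ID could look like
(my own assumed formatting, not a proposed specification):

    # Sketch: artifact ID as hex(SHA1) || hex(BLAKE2b-512), with legacy
    # 40-hex-digit SHA1 IDs resolving as prefixes. Assumed formatting.
    import hashlib

    def combined_id(data):
        sha1 = hashlib.sha1(data).hexdigest()     # 40 hex digits
        b2b = hashlib.blake2b(data).hexdigest()   # 128 hex digits
        return sha1 + b2b                         # 168 hex digits total

    def matches(artifact_id, query):
        # A legacy 40-digit SHA1 ID is just a prefix of the combined ID.
        return artifact_id.startswith(query.lower())

    aid = combined_id(b"int main(void){ return 0; }\n")
    print(matches(aid, aid[:40]))   # True: old SHA1 links keep working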

As for the exact value of BetterHash, I would like to nominate
BLAKE2b-512 [2]. It is faster than both MD5 and SHA1, it is based upon
BLAKE which has received a lot of cryptanalytic attention during the
SHA3 competition, and it retains a large security margin (the best
(academic) attack to date is on a reduced version that does only 2.5
rounds instead of 10, and even then only downgrades the security from
512 to 481 bits).
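
To get a rough feel for the relative speeds on your own machine
(absolute numbers will of course vary with the CPU and how your Python
was built):

    # Rough timing of MD5 vs SHA1 vs BLAKE2b-512 over a 64 MiB buffer.
    import hashlib, time

    buf = b"\x00" * (64 * 1024 * 1024)
    for name in ("md5", "sha1", "blake2b"):
        t0 = time.perf_counter()
        hashlib.new(name, buf).hexdigest()
        print(name, "%.3f s" % (time.perf_counter() - t0))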

Please let me know your thoughts on this matter.

Best regards,
Eduard

[1] https://sites.google.com/site/itstheshappening/
[2] https://blake2.net/
Scott Robison
2015-10-29 05:40:31 UTC
Post by Eduard
Hi,
I wish to discuss the issues surrounding the use of SHA1 in Fossil and
their consequences, as well as propose several possibilities to deal
with them.
{whole bunch of snipped stuff}

If fossil didn't say it used SHA1 to generate artifact IDs, I don't think
anyone would care how it generated IDs.

An artifact ID is a way of assigning a fixed-length identifier to an
artifact with good distribution of IDs in the fixed-length space provided.
It is not intended to be cryptographic.

You can't create a collision in advance, because you don't know who is
going to commit what to the repository.

Let's say you do, after the fact, manage to create a collision. If you try
to upload it to the repository it will be ignored because fossil believes
(correctly) it already has the artifact in question.

As you observe, one could in theory mount a MITM attack. At this point what
is to stop them from serving a completely alien repository that they've
specially crafted? No collisions required.

In fact, the "easiest" way of getting people to use malicious software is
to host a compromised repository and convince people to use it instead of
the "blessed" repository.

If you want to change the way fossil does things to limit the possibility
of fraudulent artifacts, that's fine. Perhaps prefixing the blob data with
a length (ala git) might help mitigate the possibility of hash collisions.
Perhaps creating a hash of the complete commit (vs just the manifest) and
storing it in the manifest might help.

Ultimately, one can chase hash algorithms forever trying to create some
ultimately secure ideal. In the case of actual security software, I can see
the point. In this case, it's just an identifier, and the odds of a
non-malicious collision are so close to zero that those odds might as well
be zero.
--
Scott Robison
Eduard
2015-10-29 14:41:01 UTC
Hi Scott,

Thank you for your reply!
Post by Scott Robison
If fossil didn't say it used SHA1 to generate artifact IDs, I don't
think anyone would care how it generated IDs.
An artifact ID is a way of assigning a fixed-length identifier to an
artifact with good distribution of IDs in the fixed-length space
provided. It is not intended to be cryptographic.
(...)
In this case, it's just an identifier, and the odds of a
non-malicious collision are so close to zero that those odds might as
well be zero.
That's the thing, an artifact ID is *not* just an identifier! I don't
know whether this is an intentional feature or not, but a Fossil
repository is structured as a Merkle tree (or a "hash tree"). If I know
that a single commit is genuine (because I wrote it down on a piece of
paper or because it is PGP signed), then that guarantees that all of its
ancestors (and all of their files) have not been tampered with. This is
a very powerful property, assuming that the hash function is
cryptographically secure.

In contrast, suppose that instead artifacts were identified by their
CRC160 (like CRC32, but longer) or by some sort of GUID. Accidental
collisions would be extremely unlikely, but intentional collisions or
preimages would be trivial to forge. Knowing that a particular commit
manifest is genuine would be pointless; anyone could make up a different
ancestor tree that matches the parent ID in the manifest. Worse yet,
that would not even guarantee the 'goodness' of the files in the commit
(anyone would be able to make up a different file with the same artifact
ID).
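
A toy sketch of the property I mean (my own simplified manifest format,
not Fossil's actual syntax): each manifest names its parent's ID, so one
trusted tip ID transitively authenticates every ancestor.

    # Toy Merkle-style verification; toy format, not Fossil's manifests.
    import hashlib

    def aid(blob):
        return hashlib.sha1(blob).hexdigest()

    def verify_chain(trusted_tip, store):
        cur = trusted_tip
        while cur:
            blob = store[cur]            # fetch the claimed artifact
            if aid(blob) != cur:         # any tampering changes the ID
                return False
            # Toy convention: first line is "parent <id>" or "parent -".
            parent = blob.split(b"\n", 1)[0].split()[1].decode()
            cur = None if parent == "-" else parent
        return True

    root = b"parent -\nfile deadbeef\n"
    child = b"parent %s\nfile cafebabe\n" % aid(root).encode()
    store = {aid(root): root, aid(child): child}
    print(verify_chain(aid(child), store))   # True until anything changes
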
Post by Scott Robison
You can't create a collision in advance, because you don't know who is
going to commit what to the repository.
Let's say you do, after the fact, manage to create a collision. If you
try to upload it to the repository it will be ignored because fossil
believes (correctly) it already has the artifact in question.
That's not a collision, that's a second-preimage. My point is rather
that an attacker can serve the clean version of the file to the
auditors, and the malicious version to the users (who trust the auditors
to tell them whether a commit is okay to use). This doesn't require a
second-preimage attack against the hash, only the ability to generate
collisions (which is possible right now, but expensive).
Post by Scott Robison
As you observe, one could in theory mount a MITM attack. At this point
what is to stop them from serving a completely alien repository that
they've specially crafted? No collisions required.
It would then be obvious that the repository is completely alien since
none of the artifact IDs match. If the user knows at least one top-level
genuine artifact ID (for example, because they trust a developer and
that developer signs their commits), then that guarantees the integrity
of all the ancestors.
Post by Scott Robison
In fact, the "easiest" way of getting people to use malicious software
is to host a compromised repository and convince people to use it
instead of the "blessed" repository.
I agree, that's the way it usually works in practice. But no amount of
checking (save for re-hashing every single artifact in a different
database and signing *that*) can protect against SHA1 collision (or
second-preimage) attacks.
Post by Scott Robison
If you want to change the way fossil does things to limit the
possibility of fraudulent artifacts, that's fine. Perhaps prefixing the
blob data with a length (ala git) might help mitigate the possibility of
hash collisions. Perhaps creating a hash of the complete commit (vs just
the manifest) and storing it in the manifest might help.
If the hash function itself is broken, neither of those would work. The
first fix would only work if SHA1 collisions are only possible with
different-length inputs (from what I understand, they're usually on
equal-length inputs), and the second fix is essentially the R-card
(which has been deprecated for efficiency reasons, and because it is
redundant with the files' artifact IDs which (are supposed to) guarantee
their contents' genuineness).
Post by Scott Robison
Ultimately, one can chase hash algorithms forever trying to create some
ultimately secure ideal. In the case of actual security software, I can
see the point.
A DVCS *is* security software. It is perhaps the most important
security software, since the security and integrity of all other
software ultimately depends on it. The integrity of the sqlite
repository (which is probably used by over a billion people right now)
depends on it.
Post by Scott Robison
one can chase hash algorithms forever
I know that it may appear so (I thought so too for a long time), but
it's not a question of choosing better hash algorithms forever either.
From the very beginning SHA1 had a very short digest (n=160 bits), so
the most security it could ever have against a collision attack is
n/2=80 bits (i.e. that's assuming that it is a perfect hash function,
which we now know it isn't). It stands to reason that any significant
reduction in collision security from that point would bring an attack
into the realm of feasibility (just 30 bits would be enough to utterly
break it).
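
In code form, the same arithmetic (including the 512-bit case discussed
next):

    # Generic collision cost is 2^(n/2); a cryptanalytic reduction of r
    # bits leaves ~2^(n/2 - r) operations. For SHA1, n = 160.
    for n, r in ((160, 0), (160, 30), (512, 0), (512, 60)):
        print("n=%d, reduction=%d bits -> ~2^%d work" % (n, r, n // 2 - r))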

Suppose that instead we chose a 512-bit hash. Its maximum collision
security is 256 bits. Suppose that some cryptographic attack eventually
manages to downgrade that to 196 bits (i.e. a 60 bit reduction in
security, which is huge and extremely unlikely). That means it would
take 2^196 operations to generate a collision. Could technological
advancement maybe make that possible someday? I'll just quote Bruce
Schneier's calculation:

-- begin quote --

One of the consequences of the second law of thermodynamics is that
a certain amount of energy is necessary to represent information. To
record a single bit by changing the state of a system requires an amount
of energy no less than kT, where T is the absolute temperature of the
system and k is the Boltzmann constant. (Stick with me; the physics
lesson is almost over.)

Given that k = 1.38×10^-16 erg/°Kelvin, and that the ambient
temperature of the universe is 3.2°Kelvin, an ideal computer running at
3.2°K would consume 4.4×10^-16 ergs every time it set or cleared a bit.
To run a computer any colder than the cosmic background radiation would
require extra energy to run a heat pump.

Now, the annual energy output of our sun is about 1.21×10^41 ergs.
This is enough to power about 2.7×10^56 single bit changes on our ideal
computer; enough state changes to put a 187-bit counter through all its
values. If we built a Dyson sphere around the sun and captured all its
energy for 32 years, without any loss, we could power a computer to
count up to 2^192. Of course, it wouldn't have the energy left over to
perform any useful calculations with this counter.

But that's just one star, and a measly one at that. A typical
supernova releases something like 10^51 ergs. (About a hundred times as
much energy would be released in the form of neutrinos, but let them go
for now.) If all of this energy could be channeled into a single orgy of
computation, a 219-bit counter could be cycled through all of its states.

These numbers have nothing to do with the technology of the devices;
they are the maximums that thermodynamics will allow. And they strongly
imply that brute-force attacks against 256-bit keys will be infeasible
until computers are built from something other than matter and occupy
something other than space.

-- end quote --

Best,
Eduard
Scott Robison
2015-10-29 18:06:12 UTC
Post by Eduard
Hi Scott,
Thank you for your reply!
Post by Scott Robison
If fossil didn't say it used SHA1 to generate artifact IDs, I don't
think anyone would care how it generated IDs.
An artifact ID is a way of assigning a fixed-length identifier to an
artifact with good distribution of IDs in the fixed-length space
provided. It is not intended to be cryptographic.
(...)
In this case, it's just an identifier, and the odds of a
non-malicious collision are so close to zero that those odds might as
well be zero.
That's the thing, an artifact ID is *not* just an identifier! I don't
know whether this is an intentional feature or not, but a Fossil
repository is structured as a Merkle tree (or a "hash tree"). If I know
that a single commit is genuine (because I wrote it down on a piece of
paper or because it is PGP signed), then that guarantees that all of its
ancestors (and all of their files) have not been tampered with. This is
a very powerful property, assuming that the hash function is
cryptographically secure.
In the case of fossil, it is just an identifier. I've never read anything
anywhere that attempted to claim that the artifact ID was used as anything
other than an identifier. I may have missed something (which is possible and
likely). Just because the storage matches something that could have some
property one finds desirable doesn't mean that was the intent.
Post by Eduard
In contrast, suppose that instead artifacts were identified by their
CRC160 (like CRC32, but longer) or by some sort of GUID. Accidental
collisions would be extremely unlikely, but intentional collisions or
preimages would be trivial to forge. Knowing that a particular commit
manifest is genuine would be pointless; anyone could make up a different
ancestor tree that matches the parent ID in the manifest. Worse yet,
that would not even guarantee the 'goodness' of the files in the commit
(anyone would be able to make up a different file with the same artifact
ID).
Had CRC160 (or another less computationally intensive algorithm) been
available at the time, DRH might have used it. SHA1 was convenient and
already had well
tested implementations available with a public domain license.
Post by Eduard
Post by Scott Robison
You can't create a collision in advance, because you don't know who is
going to commit what to the repository.
Let's say you do, after the fact, manage to create a collision. If you
try to upload it to the repository it will be ignored because fossil
believes (correctly) it already has the artifact in question.
That's not a collision, that's a second-preimage. My point is rather
that an attacker can serve the clean version of the file to the
auditors, and the malicious version to the users (who trust the auditors
to tell them whether a commit is okay to use). This doesn't require a
second-preimage attack against the hash, only the ability to generate
collisions (which is possible right now, but expensive).
It is a collision. A collision is simply two different buffers that hash
down to the same value. It doesn't matter how it is found, it's still a
collision (think two cars trying to occupy the same space at the same
time). Note that I'm not necessarily using collision in a cryptographic
sense, I'm using it in the classic sense that it has been used for decades
when discussing hashing algorithms.
Post by Eduard
Post by Scott Robison
As you observe, one could in theory mount a MITM attack. At this point
what is to stop them from serving a completely alien repository that
they've specially crafted? No collisions required.
It would then be obvious that the repository is completely alien since
none of the artifact IDs match. If the user knows at least one top-level
genuine artifact ID (for example, because they trust a developer and
that developer signs their commits), then that guarantees the integrity
of all the ancestors.
It only guarantees integrity until the new hash is compromised, or his
public key is no longer secure, or whatever. "Simply" changing the hash
does not make the problem go away forever, which means that fossil, if
trying to use these hash algorithms in a cryptographically secure manner,
will forever be refining the algorithms used and the database schema, which
is somewhat at odds with the stated objective of fossil being a durable
format intended to last indefinitely.

Alternatively, we can look at this use of SHA1 as an identifier. No more or
less. Security provided by actual security software / features
(certificates & TLS for web access, physically secured machines, ssh with
quality keys for remote access, etc).
Post by Eduard
Post by Scott Robison
In fact, the "easiest" way of getting people to use malicious software
is to host a compromised repository and convince people to use it
instead of the "blessed" repository.
I agree, that's the way it usually works in practice. But no amount of
checking (save for re-hashing every single artifact in a different
database and signing *that*) can protect against SHA1 collision (or
second-preimage) attacks.
Even that can't protect against attacks with 100% certainty. It doesn't make it
easier, but it seems likely that there are two artifacts out there
somewhere that hash to the same SHA1 value *and* hash to the same
AWESOMENEWPERFECTHASH value. Yes, it is a vanishingly small possibility, but
SHA1 is already darn near vanishingly small. ;)
Post by Eduard
Post by Scott Robison
If you want to change the way fossil does things to limit the
possibility of fraudulent artifacts, that's fine. Perhaps prefixing the
blob data with a length (ala git) might help mitigate the possibility of
hash collisions. Perhaps creating a hash of the complete commit (vs just
the manifest) and storing it in the manifest might help.
If the hash function itself is broken, neither of those would work. The
first fix would only work if SHA1 collisions are only possible with
different-length inputs (from what I understand, they're usually on
equal-length inputs), and the second fix is essentially the R-card
(which has been deprecated for efficiency reasons, and because it is
redundant with the files' artifact IDs which (are supposed to) guarantee
their contents' genuineness).
In order for a collision to be useful, the structured data must have some
value beyond "breaking the repository". You want it to be undetected and do
something different than the original artifact intended. At this point
simply finding a collision is inadequate. You need to find some sort of
structured data (and source code *is* structured data, else compilers /
interpreters would not be able to process them) that does the new thing in
the undetectable way (it still compiles, for example). The likelihood that
someone is going to be able to create alternative source code artifacts
that have the same hash value and the same size yet do something nefarious
seems a lot harder than just finding a collision.
Post by Eduard
Post by Scott Robison
Ultimately, one can chase hash algorithms forever trying to create some
ultimately secure ideal. In the case of actual security software, I can
see the point.
A DVCS *is* security software. It is perhaps the most important
security software, since the security and integrity of all other
software ultimately depends on it. The integrity of the sqlite
repository (which is probably used by over a billion people right now)
depends on it.
By this definition you can call *anything* security software. A word
processor. A spreadsheet. A text editor.

I disagree fundamentally that "security" and "integrity" are synonymous.
For example: ethernet frames include a 32-bit CRC to validate the integrity
of the data, not for security. Integrity is about detecting accidental
corruption. Security is about detecting (or preventing) intentional
"corruption".
Post by Eduard
Post by Scott Robison
one can chase hash algorithms forever
I know that it may appear so (I thought so too for a long time), but
it's not a question of choosing better hash algorithms forever either.
From the very beginning SHA1 had a very short digest (n=160 bits), so
the most security it could ever have against a collision attack is
n/2=80 bits (i.e. that's assuming that it is a perfect hash function,
which we now know it isn't). It stands to reason that any significant
reduction in collision security from that point would bring an attack
into the realm of feasibility (just 30 bits would be enough to utterly
break it).
Suppose that instead we chose a 512-bit hash. Its maximum collision
security is 256 bits. Suppose that some cryptographic attack eventually
manages to downgrade that to 196 bits (i.e. a 60 bit reduction in
security, which is huge and extremely unlikely). That means it would
take 2^196 operations to generate a collision. Could technological
advancement maybe make that possible someday? I'll just quote Bruce
Schneier's calculation: (Schneier quote snipped; see above.)
This is only a problem if the only attacks against a hash are brute force.
Hash algorithms are deterministic. With enough time and effort, someone
will find ways of narrowing the search space. For that reason, you'll
always be in a mode of chasing the next better hash.

Also, this doesn't take into consideration non-public compromises of hashes
that a government, for example, might know about and keep secret. The very
types of organizations that would probably have the most to gain from the
sorts of attacks we're discussing.

DRH has talked about fossil 2.0 ideas (
http://fossil-scm.org/xfer/wiki?name=Fossil+2.0), and maybe some of these
ideas can be used there. Making changes to the database structure or format
of fossil 1.0 seems to me to be very unlikely. Not impossible! I don't make
those decisions.

Ultimately, what this comes down to is a difference of philosophy. I doubt
I'm going to convince you that my philosophy is correct. I am convinced of
the need for security, but in this case I don't think of SHA1 as being
security. I think of it as a name / hash generator that is resistant to
accidental collisions. That's what fossil needed. That's what git uses.
That's what mercurial uses. I'm not aware of any plans to change on their
parts, and if they do change, it will create a ton of headaches for every
user of those systems.
--
Scott Robison
Warren Young
2015-10-29 19:07:35 UTC
Post by Scott Robison
If fossil didn't say it used SHA1 to generate artifact IDs, I don't think anyone would care how it generated IDs.
+1. It should just say “artifact ID”, or “checkin ID”.
Post by Scott Robison
In fact, the "easiest" way of getting people to use malicious software is to host a compromised repository and convince people to use it instead of the "blessed" repository.
Anyone who thinks that’s unlikely probably missed the XcodeGhost news:

http://www.macrumors.com/2015/09/20/xcodeghost-chinese-malware-faq/
Post by Scott Robison
Ultimately, one can chase hash algorithms forever trying to create some ultimately secure ideal.
We have 42 years of history — dating from the addition of crypt(3) to Unix V3 in 1973 — telling us that hash algorithms have a finite lifetime, and that we need a way to replace them after their useful years of service life are spent:

http://www.cs.technion.ac.il/~cs236350/Material/unix-password-security-ten.pdf

The argument over whether Fossil is vulnerable today or if not how long it will take before it is vulnerable is a side issue next to the fact that Fossil wasn’t designed to make replacing SHA-1 straightforward.

If it were easy, we could just swap in something better out of an abundance of caution and go on our way.

Those arguing for replacement of SHA-1 usually come from a world where such swaps are easy: /etc/shadow, X.509 certs, web pages reporting binary package sums, etc.
Post by Scott Robison
the odds of a non-malicious collision are so close to zero that those odds might as well be zero.
I’ll bet there are a whole lot of people who would love to get some evil code into pretty much every smartphone in the world by hacking the SQLite code repo.

That’s a powerful motivation. Don’t underestimate it.
Scott Robison
2015-10-29 20:11:02 UTC
Post by Warren Young
Post by Scott Robison
the odds of a non-malicious collision are so close to zero that those
odds might as well be zero.
I’ll bet there are a whole lot of people who would love to get some evil
code into pretty much every smartphone in the world by hacking the SQLite
code repo.
That’s a powerful motivation. Don’t underestimate it.
I don't underestimate it. I'm saying it's not relevant to the discussion of
using SHA1 as a means of generating identifiers (non malicious collisions),
and that "Evil Governments"(TM) are the most likely source of such an
attack (malicious collisions). Well, most likely for now as they would have
the most resources available to mount such an attack.
--
Scott Robison
Richard Hipp
2015-10-29 21:20:52 UTC
Post by Warren Young
I’ll bet there are a whole lot of people who would love to get some evil
code into pretty much every smartphone in the world by hacking the SQLite
code repo.
That’s a powerful motivation. Don’t underestimate it.
That might be difficult.

(1) More is involved than just breaking the SHA1 artifact hashes.
Each check-in manifest also has a hash over all content of all files
in the R card. It's an MD5 hash, but that still means the attacker
would have to find replacement source code that (a) matched both SHA1
and MD5 hashes and (b) was valid C code. Good luck with that. (A toy
sketch of the idea appears below.)

(2) And even if an attacker were able to do this, it wouldn't likely
go undetected. Remember that SQLite uses 100% branch testing. Any
malicious code would also have to preserve all current functionality
and also preserve 100% branch coverage to escape detection.

(3) We also do 100% inspection of all code changes between each
release using "fossil diff --from release --to trunk --tk". You don't
think we would see unauthorized code?
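
(For the curious, the general shape of the R-card idea as a toy sketch;
this is an illustration only, not Fossil's actual R-card encoding:)

    # Toy whole-checkin checksum: one MD5 over every file's name, size,
    # and content, in sorted order. NOT Fossil's actual R-card format.
    import hashlib

    def checkin_checksum(files):          # files: {name: content-bytes}
        m = hashlib.md5()
        for name in sorted(files):
            content = files[name]
            m.update(b"%s %d\n" % (name.encode(), len(content)))
            m.update(content)
        return m.hexdigest()

    print(checkin_checksum({"main.c": b"int main(void){return 0;}\n"}))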

I think if the bad guys wanted to break into phones, they'd probably
go after the Linux kernel first, which has far less testing and is far
more loosey-goosey about configuration management and which uses Git -
also with SHA1 but without the extra MD5 R-card hash.
--
D. Richard Hipp
***@sqlite.org
Scott Robison
2015-10-29 21:33:20 UTC
Post by Richard Hipp
Post by Warren Young
I’ll bet there are a whole lot of people who would love to get some evil
code into pretty much every smartphone in the world by hacking the SQLite
code repo.
That’s a powerful motivation. Don’t underestimate it.
That might be difficult.
(1) More is involved than just breaking the SHA1 artifact hashes.
Each check-in manifest also has a hash over all content of all files
in the R card. It's an MD5 hash, but that still means the attacker
would have to find replacement source code that (a) matched both SHA1
and MD5 hashes and (b) was valid C code. Good luck with that.
Wait, so fossil is already doing what I suggested it could do (hashing the
entire commit). Why is the R card optional?
--
Scott Robison
Richard Hipp
2015-10-29 22:13:23 UTC
Post by Scott Robison
Why is the R card optional?
Because it is expensive to compute on large repos (ex: NetBSD) with
hundreds of megabytes of content. Some projects elect to omit it.
--
D. Richard Hipp
***@sqlite.org
Eduard
2015-10-29 22:22:19 UTC
Hi Richard,

Thanks for replying!
Post by Richard Hipp
Post by Scott Robison
Why is the R card optional?
Because it is expensive to compute on large repos (ex: NetBSD) with
hundreds of megabytes of content. Some projects elect to omit it.
Therefore large projects have to choose between having
order-of-magnitude slower security checks and being liable to SHA1
collision attacks. Moreover, it is precisely those large projects that
suffer significantly from the slowdown that need additional protection
the most (since it is easier to hide a malicious needle in a bigger
haystack).

Best,
Eduard
Warren Young
2015-10-29 21:59:08 UTC
Post by Richard Hipp
Each check-in manifest also has a hash over all content of all files
in the R card. It's an MD5 hash, but that still means the attacker
would have to find replacement source code that (a) matched both SHA1
and MD5 hashes and (b) was valid C code. Good luck with that.
MD5 collisions can be found in about a second on modern hardware:

https://tools.ietf.org/html/rfc6151

With that work to build on, the only remaining tricky bit is working out a perturbation algorithm for C source code that doesn’t introduce so much noise that the code will be flagged as obviously-bad. I mean, you could just put random UTF-8 text into a C comment to force the collision, but that will jump out even to someone casually scanning the code.
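
Here is a toy version of that comment-perturbation idea, matching only a
severely truncated 20-bit MD5 by brute force (a real 128-bit MD5
collision needs the known cryptanalytic attacks, not a search like this):

    # Toy: vary a C comment until a truncated MD5 matches the target.
    # Expect ~2^20 tries for a 5-hex-digit (20-bit) match.
    import hashlib

    target = hashlib.md5(b"int main(void){return 0;}\n").hexdigest()[:5]
    evil = b"int main(void){return 1;}/*%d*/\n"
    i = 0
    while hashlib.md5(evil % i).hexdigest()[:5] != target:
        i += 1
    print(evil % i)   # different code, same truncated-MD5 tag
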
Post by Richard Hipp
Any
malicious code would also have to preserve all current functionality
and also preserve 100% branch coverage to escape detection.
If the attack trigger can rely on a new feature, it won’t be caught by the existing tests. Say, a new SQL function.

The evildoer might instead be doing something like exfiltrating user data over a TCP socket on every VACUUM call. Again, not something the tests are likely to catch.

And realize that I’m not very motivated, nor am I trained to do this. I assume a motivated expert could come up with much better ideas than these.
Post by Richard Hipp
(3) We also do 100% inspection of all code changes between each
release using "fossil diff --from release --to trunk --tk”.
Glad to hear it.
Post by Richard Hipp
You don’t think we would see unauthorized code?
That depends on how well the C perturbation algorithm works, and how clever the attack is.

Have you studied the winners of the Underhanded C Contest? How many of those jump out at you as obviously evil?

http://www.underhanded-c.org/

I can’t quite talk myself into believing these contestants were less motivated than a true black hat. Pride and social standing may be stronger motivations than money and love of country. But still, I think I’m comparing top-10 motives here, not wildly incomparable ones.
Post by Richard Hipp
I think if the bad guys wanted to break into phones, they'd probably
go after the Linux kernel first
Not first. That comes after convincing the local lawmakers that “I sees it so I wants it” and the $5 wrench attack. :)

https://xkcd.com/538/
Scott Doctor
2015-10-29 22:04:33 UTC
I thought this topic was beat to death a couple times already.

------------
Scott Doctor
***@scottdoctor.com
------------------
Scott Robison
2015-10-29 22:22:16 UTC
Post by Warren Young
Post by Richard Hipp
Each check-in manifest also has a hash over all content of all files
in the R card. It's an MD5 hash, but that still means the attacker
would have to find replacement source code that (a) matched both SHA1
and MD5 hashes and (b) was valid C code. Good luck with that.
https://tools.ietf.org/html/rfc6151
With that work to build on, the only remaining tricky bit is working out a
perturbation algorithm for C source code that doesn’t introduce so much
noise that the code will be flagged as obviously-bad. I mean, you could
just put random UTF-8 text into a C comment to force the collision, but
that will jump out even to someone casually scanning the code.
Well, the MD5 collision might be easy to find, but the intersection of
"SHA1(useful evil C source code file) == SHA1(pure C source code file)" and
"MD5(useful evil full commit) == MD5(pure full commit)" and "useful evil C
source code file on the tip of some branch or other location that is likely
to be used and not buried deep in the historical recesses of the
repository" and ("capable of taking over a computer to inject evil
artifact" or "capable of orchestrating man in the middle attack" or
"capable of social engineering to convince people to use evil artifact" or
"something else I can't think of at the moment") seems to be a pretty tiny
intersection.
--
Scott Robison
Stephan Beal
2015-10-29 07:40:17 UTC
Post by Eduard
First I propose that the use of SHA1 in Fossil is a serious problem.
This has been said at least a dozen times, and has not once been
demonstrated. Show me the code. Falsify ONE artifact, and i'll believe
it's a problem.

Post by Eduard
The first solution is to do nothing and just tell users not to sync with
untrusted repositories.
Which is a no-brainer, IMO.
Post by Eduard
Given the distributed nature of software (and
otherwise) development, I believe it is a difficult burden to impose
upon developers that all contributors always be carefully vetted, and
that third-party (web) hosting never be trusted. I feel that this also
breaks the "eternally incorruptible" promise of Fossil.
So far it's held up against everything except purely hypothetical thought
experiments.
Post by Eduard
most definitely breaks older PGP clearsigned checkins (which would have
remained secure as long as SHA1 second-preimage attacks are infeasible).
The main advantage to this approach is that it is the most elegant and
easy to understand and deal with.
i fail to see how changing from hash A to B makes anything more elegant or
easier to understand.
Post by Eduard
The third solution is to change the Fossil specification to redefine the
artifact ID to be the concatenation of the SHA1 and BetterHash hash
digests, and allow 40 hexadecimal digit IDs as prefixes. One can show
that the preimage- and collision-resistance of this combination is at
least as good as the stronger of the two. The main advantage of this
approach is that it is not a breaking change
But it's a heck of a lot of work to solve an as-yet-undemonstrated,
hypothetical problem.
Post by Eduard
Please let me know your thoughts on this matter.
i stubbornly refuse to be convinced until someone demonstrates the problem.
Once it's demonstrated, i'm all ears.
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
Warren Young
2015-10-29 18:46:27 UTC
Post by Eduard
I wish to discuss the issues surrounding the use of SHA1 in Fossil
Have you read the prior discussions on this?

http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg18053.html
http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg05970.html
http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg21423.html
Post by Eduard
First I propose that the use of SHA1 in Fossil is a serious problem.
The known attacks on SHA-1 are still computationally expensive, and will remain so for years. Not impossible, but still very difficult. We have time to move, if we need to.

But also, and much more importantly, most of the attacks on SHA-1 only apply to standalone blob cases such as binary package validation, X.509 certificate signing, etc. In Fossil, most of the SHA-1 checksummed artifacts are chained in some way, so that you can only modify the leaves of branches.

That makes tampering with the tree without detection quite difficult. Anyone paying attention to the timeline will probably notice an attack.

Understand, I’ve been on your side prior to this, worried about SHA-1 purely because Bruce Schneier and Google tell me that SHA-1 should give me stomach upset. But I also have D. Richard Hipp telling me that due to the way SHA-1 is used in Fossil, he isn’t worried about it. That’s a powerful antacid. :)
Post by Eduard
If the attacker can intercept
communications between the server and a developer
…then you did not run Fossil over TLS, like you should if MITM is a legitimate risk in your situation. :)
Post by Eduard
If the attacker is in control of the server
…then he can serve you any content he likes, no matter how good your hash algorithm is.

The correct solution here is something like TLS with certificate pinning, GPG signing, etc.
Post by Eduard
The third solution is to change the Fossil specification to redefine the
artifact ID to be the concatenation of the SHA1 and BetterHash
A fourth solution is to use Modular Crypt Format to declare the hash for each artifact, and for future Fossil versions to tolerate SHA-1 only in existing artifacts, accepting new ones using only known-good algorithms:

https://pythonhosted.org/passlib/modular_crypt_format.html

This could be done without breaking the DB, simply because a 20-byte hash must be SHA-1, since even a 160-bit BetterHash will have the MCF wrapper on it, making it more than 20 bytes.

The SQLite card format parser would have to be made more flexible, to make it understand that if it sees a leading dollar sign, the following hash can be variable-width.

I think the biggest problem with this is that older Fossil clients wouldn’t be able to sync with the repo after the server is upgraded. That’s a well-known problem with a raft of well-understood solutions, so we don’t need to detail that here.
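
A sketch of how tagged IDs might be parsed alongside legacy bare ones
(the "$b2b$" tag name is something I just made up, nothing standardized):

    # Sketch: distinguish legacy bare SHA1 IDs from MCF-style tagged IDs.
    import hashlib

    def make_id(data, algo="b2b"):
        if algo == "sha1":
            return hashlib.sha1(data).hexdigest()   # legacy: bare 40 hex
        return "$b2b$" + hashlib.blake2b(data).hexdigest()

    def parse_artifact_id(s):
        if not s.startswith("$"):
            return ("sha1", s)              # no dollar sign: must be SHA1
        _, algo, digest = s.split("$", 2)   # "$algo$hexdigest"
        return (algo, digest)

    print(parse_artifact_id(make_id(b"hello")))
    print(parse_artifact_id(make_id(b"hello", "sha1")))
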
Eduard
2015-10-29 21:40:08 UTC
Hi Warren,

Thanks for replying!
Post by Warren Young
Post by Eduard
I wish to discuss the issues surrounding the use of SHA1 in Fossil
Have you read the prior discussions on this?
http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg18053.html
http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg05970.html
http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg21423.html
I had read 2/3 of them, yes. Thanks for the third one!
Post by Warren Young
Post by Eduard
First I propose that the use of SHA1 in Fossil is a serious problem.
The known attacks on SHA-1 are still computationally expensive, and will remain so for years. Not impossible, but still very difficult. We have time to move, if we need to.
I agree. I also believe that the best time to think about it is right
now. The number of Fossil users will only increase with time (in fact
I'm about to introduce four new people to Fossil), and so will the
number of people potentially annoyed by a non-backwards compatible
change in the specification.
Post by Warren Young
But also, and much more importantly, most of the attacks on SHA-1 only apply to standalone blob cases such as binary package validation, X.509 certificate signing, etc. In Fossil, most of the SHA-1 checksummed artifacts are chained in some way, so that you can only modify the leaves of branches.
And individual files (that are part of commits). That won't show up in
the timeline.
Post by Warren Young
Post by Eduard
If the attacker can intercept
communications between the server and a developer

then you did not run Fossil over TLS, like you should if MITM is a legitimate risk in your situation. :)
Post by Eduard
If the attacker is in control of the server

then he can serve you any content he likes, no matter how good your hash algorithm is.
True, but he shouldn't be able to convince me that ID "abcdef"
corresponds to something other than the original artifact created with
ID "abcdef". Again, I might know (through some other source, e.g.
PGP-signed email) that artifact "abcdef" is genuine, and it shouldn't
matter where I download it from. If artifact "abcdef" refers to "xyzzy",
trusting the genuineness of "abcdef" should imply trusting that of "xyzzy".

I also don't believe that the users and developers should have to trust
the Fossil server (including mirrors) and its operator; I don't have to
trust my Debian mirror to download packages (and their sources) from it.
That would avoid happenings like the XcodeGhost incident.
Post by Warren Young
The correct solution here is something like TLS with certificate pinning, GPG signing, etc.
That's the thing, GPG signing covers the contents of the manifest, which
itself refers to the files inside it only by their SHA1 hash. If someone
substitutes a file with a malicious file that hashes the same, it won't
change anything in the manifest and the GPG signature will remain valid.
Post by Warren Young
Post by Eduard
The third solution is to change the Fossil specification to redefine the
artifact ID to be the concatenation of the SHA1 and BetterHash
https://pythonhosted.org/passlib/modular_crypt_format.html
This could be done without breaking the DB, simply because a 20-byte hash must be SHA-1, since even a 160-bit BetterHash will have the MCF wrapper on it, making it more than 20 bytes.
The SQLite card format parser would have to be made more flexible, to make it understand that if it sees a leading dollar sign, the following hash can be variable-width.
That is a great (and extensible!) solution! There are a few issues though:
- Every artifact must be hashed by every known algorithm. The database
size grows linearly with the number of hashing algorithms.
- There must be an additional mechanism for upgrading the older hash
version artifacts. Consider a checkin manifest from 3 years ago. It is
very likely that no new checkin/branch will ever refer to it directly,
so nobody will ever refer to it by new-hash. Worse yet, it is likely
nobody will ever refer to the files inside that checkin by new-hash. If
a preimage attack on old-hash becomes possible (or even easy), one could
mess with the artifacts that are only referred to using old-hash.

I don't believe the first issue will ever be a problem, though, since I
personally don't think we'll ever need to go past BetterHash-512.

As for the second issue, one solution is to rehash all of the older
artifacts using new-hash and rewrite all of the control artifacts in
terms of new-hash (this operation is fully deterministic and can be
verified independently). This won't play well at all with shunned
content (since we can't recompute hashes on artifacts we don't have
anymore), and will definitely do very badly if one tries to put back
shunned content (since we've probably put in some sort of placeholder
null value in the manifest). I don't know whether the adding-back
shunned content part is really an issue; we only shun things when we
want them truly gone forever. But there is still the annoying issue that
if two people don't have the same shunning lists, they will end up with
radically different new-hash artifact sets (one checkin will have a
placeholder whereas the other one doesn't, and that will change the
artifact IDs of all of the descendants). So I guess exactly one person
should upgrade the hashes once per project (which I don't believe to be
a really terrible limitation, especially since their work can be
verified independently). This also has the annoying side-effect of
increasing the space taken up by control artifacts (since we're carrying
both the old-hash and the new-hash versions), but I guess one could
purge all of the old-hash control artifacts from the repository after a
few years (once old-hash is no longer trusted at all).
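
The deterministic core of that upgrade is small; roughly (a toy model
that assumes references appear as plain hex strings in artifact bodies):

    # Toy sketch of the deterministic rehash: visit artifacts oldest
    # first, rewrite every old-hash reference to its new-hash
    # equivalent, then rehash. Anyone can re-run this to verify it.
    import hashlib

    def upgrade(artifacts):      # {old_id: bytes}, parents before children
        mapping = {}             # old_id -> new_id
        for old_id, content in artifacts.items():
            for old_ref, new_ref in mapping.items():
                content = content.replace(old_ref.encode(), new_ref.encode())
            mapping[old_id] = hashlib.blake2b(content).hexdigest()
        return mapping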

PGP-clearsigned manifests would probably also need to be re-signed in a
timely manner (in all likelihood the hash function PGP used when signing
the manifest has also been deprecated). (This could be done
after-the-fact using a signed tag.)

There is also the issue that (e.g. URL) references to old-hash artifacts
will be broken. I'm not sure how I feel about that; one could say that
they *should* be broken because we can no longer be certain about what
they point to (assuming that we no longer trust old-hash's security).
Intraproject references can always be fixed (in an automated manner),
but interproject references will likely be much harder to upgrade. It
might be highly useful to write a tool which scans a text file for
artifact-referencing URLs and tries to resolve the hash-upgraded version
automatically assuming that the referenced repository is available locally.

I'm not sure whether in this approach the new-hash control artifacts
should explicitly list the old-hash artifacts as parents (or maybe as
some new card type). It may be useful as a quick way to identify the
old-hash corresponding control artifact (it may even resolve some
ambiguity when verifying the transition from old-hash to new-hash).
Thoughts?

(Also are there any issues on any of the supported platforms with having
dollar signs in filenames (or URLs)? Just a random thought.)

Best,
Eduard
Warren Young
2015-10-29 22:50:52 UTC
Post by Eduard
Post by Warren Young
Post by Eduard
I wish to discuss the issues surrounding the use of SHA1 in Fossil
Have you read the prior discussions on this?
http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg18053.html
http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg05970.html
http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg21423.html
I had read 2/3 of them, yes. Thanks for the third one!
The third one’s the mother lode. Don’t be fooled by the mail-archive.com UI, which presents only 10 or so results at a time. That thread went on and on and on. Hopefully we can avoid retreading some of the same ground in this one.
Post by Eduard
Post by Warren Young
most of the attacks on SHA-1 only apply to standalone blob cases
And individual files (that are part of commits). That won't show up in
the timeline.
Do you mean newly-added files? They’re shown on the Check-in details screen, the most likely thing you’re going to click on the Timeline page.

For a newly-added file to have an effect on a repo, it’ll probably also require modification to an existing file, such as the Makefile.

(Exceptions are files included by wildcard.)
Post by Eduard
Post by Warren Young
Post by Eduard
If the attacker is in control of the server
…then he can serve you any content he likes, no matter how good your hash algorithm is.
True, but he shouldn't be able to convince me that ID "abcdef"
corresponds to something other than the original artifact created with
ID "abcdef”.
How are you going to know that the legitimate file has ID abcdef? Cross-reference to another repo? What if there is only one central repo?

If an evildoer has taken over the central server, they are just providing a pile of artifacts, and you are trusting that those artifacts are legitimate.

Granted, you can’t do such a swap to people with existing checkouts, since that will break the sync algorithm, but an evil Fossil instance could probably be made to detect whether it is being asked for a clean checkout or a sync update of an existing one.
Post by Eduard
I might know (through some other source, e.g.
PGP-signed email) that artifact "abcdef" is genuine, and it shouldn't
matter where I download it from.
How many people will be doing such cross-checking?

Again I bring up the XcodeGhost example. People do foolish things in the name of expediency.
Post by Eduard
I don't have to
trust my Debian mirror to download packages (and their sources) from it.
You’re referring to the fact that DEBs are GPG-signed, I assume?

That works because the Debian gatekeepers can sign the packages on an assumed-secure box. (Such central package repos have been compromised in the past.) The distro includes a copy of the central source’s public key, so if the package signature doesn’t decrypt correctly, it isn’t legitimate.

Where can you put such a root of trust in the Fossil case?

There is no central presumed-secure site with Fossil. Remember, you were just positing that the central repo’s server got rooted.

You also can’t solve it by having people with checkin bits submit a GPG public key to the repo along with their login creds and sign checkins, because those keys live on the same compromised server. The evildoer can just generate a new set of keys, re-sign the compromised artifacts, and store the new keys in the user table instead of the original keys.

It’s the problem with all PKIs: who do you trust?
Post by Eduard
That would avoid happenings like the XcodeGhost incident.
Apple has a code-signing mechanism, too, and packages from Apple are always signed. But, the client-side checker (Gatekeeper) is not mandatory, and developers often turn it off entirely, since it gets in the way of developing software.

Plus, you can bypass Gatekeeper for $99: get a code signing cert from Apple and sign your evil packages with it. It’ll work until Apple catches you and revokes your cert. Almost no one checks *who* signed the package; all they know is that the OS let them install it when they double-clicked it.
Post by Eduard
- Every artifact must be hashed by every known algorithm.
I’m assuming it's possible to change from one algorithm to another mid-stream, as long as the client knows all of the algorithms in use, and is told where the change points occur.

Do you know for a fact that you cannot do this?
Post by Eduard
The database
size grows linearly with the number of hashing algorithms.
If so, it’s only a handful of bytes per artifact, per algorithm.

The real cost would be the computation time.
Post by Eduard
Consider a checkin manifest from 3 years ago. It is
very likely that no new checkin/branch will ever refer to it directly,
so nobody will ever refer to it by new-hash. Worse yet, it is likely
nobody will ever refer to the files inside that checkin by new-hash. If
a preimage attack on old-hash becomes possible (or even easy), one could
mess with the artifacts that are only referred to using old-hash.
Yes, if you want old artifacts to be unassailable, you’d have to recompute all the hashes.

But, I think you’re not realizing that artifact chaining removes the attraction of replacing old artifacts. As I understand it, you can’t replace an artifact 10 checkins back from the tip of the branch without recomputing the 9 other hashes on the way back to the tip. Therefore, an attack that takes a year of CPU time to attack a leaf node takes 10 years to attack a node 10 checkins back from the tip.

This came up in that third-linked thread.
Post by Eduard
I personally don't think we'll ever need to go past BetterHash-512.
I’m not sure if you’re saying that 512 bits will be enough forever, or that we already have the last hash algorithm we will ever need.

History says either is a foolish prediction, and that the best hash algorithms remain state-of-the-art for only about a dozen years.

Maybe the sunsetting of Moore’s Law will break us out of that pattern. But if Fossil is going to go through the pain of replacing SHA-1, it should be done in a way that makes it easier to do again later.
Post by Eduard
Thoughts?
tl;dr. >:)

As I said in my previous email, I don’t see a reason to work out the migration strategy before we work out the *ifs* and *whys*.
Post by Eduard
(Also are there any issues on any of the supported platforms with having
dollar signs in filenames (or URLs)? Just a random thought.)
Why do file names come into it? The MCF tag would be in one of the cards, which live in the DB.

I don’t even see that it has to be reported in the UI, or accepted on the command line, since the chance of two algorithms having a conflicting hash is near zero. Even if you do come across, say, a 10 hex digit prefix of two hashes that are the same under SHA-1 and SHA-512, Fossil already knows how to stop and make you be specific about which one you mean, if it can’t see that one is obviously correct.

Therefore, command line usage will remain unchanged in this scheme: “fossil up EA5D538D23A7C” will most often uniquely identify one artifact, or none, not 2+.
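
A sketch of that prefix resolution (invented helper name, toy data):

    # A prefix resolves iff it names exactly one known artifact;
    # zero or 2+ matches force the user to be more specific.
    def resolve_prefix(prefix, known_ids):
        matches = [i for i in known_ids if i.startswith(prefix.lower())]
        if len(matches) == 1:
            return matches[0]
        raise LookupError("%d artifacts match %r" % (len(matches), prefix))

    ids = ["ea5d538d23a7c000000000000000000000000000",
           "0123456789abcdef0123456789abcdef01234567"]
    print(resolve_prefix("EA5D538D23A7C", ids))
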
Eduard
2015-10-29 23:32:02 UTC
Hi Warren,
Post by Eduard
Post by Warren Young
(...)
I had read 2/3 of them, yes. Thanks for the third one!
The third one’s the mother lode. Don’t be fooled by the mail-archive.com UI, which presents only 10 or so results at a time. That thread went on and on and on. Hopefully we can avoid retreading some of the same ground in this one.
Thanks!
Post by Eduard
Post by Warren Young
most of the attacks on SHA-1 only apply to standalone blob cases
And individual files (that are part of commits). That won't show up in
the timeline.
Do you mean newly-added files? They’re shown on the Check-in details screen, the most likely thing you’re going to click on on the Timeline page.
For a newly-added file to have an effect on a repo, it’ll probably also require modification to an existing file, such as the Makefile.
(Exceptions are files included by wildcard.)
I'm talking about generating collisions for non-control artifacts
(actual files), not control artifacts. This has a higher chance of being
successful if the repository has the R-card disabled (for efficiency
reasons).
Post by Eduard
Post by Warren Young
Post by Eduard
If the attacker is in control of the server

then he can serve you any content he likes, no matter how good your hash algorithm is.
True, but he shouldn't be able to convince me that ID "abcdef"
corresponds to something other than the original artifact created with
ID "abcdef”.
How are you going to know that the legitimate file has ID abcdef? Cross-reference to another repo? What if there is only one central repo?
I'll check the PGP signature of a checkin manifest that has a checkin in
its ancestry that contains this file.
If an evildoer has taken over the central server, they are just providing a pile of artifacts, and you are trusting that those artifacts are legitimate.
But if you do know that some subset of the (control) artifacts are
legitimate (because you checked the PGP signature or because the lead
dev told you the tip artifact ID in-person), it is so extremely useful
to be able to infer that all of their ancestors must also be legitimate.
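A sketch of that inference (same toy single-parent layout as before; the
trusted tip hash arrives out-of-band, e.g. in a PGP-signed mail): walk the
P cards back from the tip, rehashing each manifest as you go.

#!/bin/sh
# Verify ancestry from a trusted tip: every manifest must still hash to the
# name its child (or the out-of-band source) vouched for.
h=$TRUSTED_TIP_HASH
while [ -n "$h" ]; do
  [ "$(sha1sum < "repo/$h" | cut -d' ' -f1)" = "$h" ] || { echo "tampered: $h"; exit 1; }
  h=$(sed -n 's/^P //p' "repo/$h")
done
echo "ancestry intact (assuming SHA-1 preimage resistance holds)"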
Post by Eduard
I might know (through some other source, e.g.
PGP-signed email) that artifact "abcdef" is genuine, and it shouldn't
matter where I download it from.
How many people will be doing such cross-checking?
Again I bring up the XcodeGhost example. People do foolish things in the name of expediency.
Well, I know I will be doing such cross-checking. Hopefully I'm not the
only one. Right? ...right?
Post by Eduard
I don't have to trust my Debian mirror to download packages (and their sources) from it.
You’re referring to the fact that DEBs are GPG-signed, I assume?
That works because the Debian gatekeepers can sign the packages on an assumed-secure box. (Such central package repos have been compromised in the past.) The distro includes a copy of the central source’s public key, so if the package signature doesn’t decrypt correctly, it isn’t legitimate.
You're right; the binary packages are automatically signed on an
assumed-secure box. But the source packages are also signed by the
individual packagers, whose private keys reside only on their computers
(and not on the server).
Where can you put such a root of trust in the Fossil case?
There is no central presumed-secure site with Fossil. Remember, you
were just positing that the central repo’s server got rooted.

There is more than one answer, but one is that the roots of trust are the
PGP private keys on the individual developers' personal computers. The
developers' private keys should never, ever reach the public central repo
server.
You also can’t solve it by having people with checkin bits submit a GPG public key to the repo along with their login creds and sign checkins, because those keys live on the same compromised server. The evildoer can just generate a new set of keys, re-sign the compromised artifacts, and store the new keys in the user table instead of the original keys.
It’s the problem with all PKIs: who do you trust?
That's exactly correct! This is why it's important to set up a web of
trust in PGP (sign the keys of the people you trust, etc).

https://www.gnupg.org/gph/en/manual.html#AEN554
because those keys live on the same compromised server
(Sorry, I just couldn't let this one pass.) The entire point of PGP is
that it's public key cryptography, so (in that scenario) the developers
would only be submitting their *public* keys to the server (and would
keep the private bits, well, private).
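For concreteness, the standard GnuPG flow looks like this (plain gpg
commands, given some file "manifest" to sign; Fossil's clearsign setting, as
I understand it, automates the first step):

# on the developer's machine; the private key never leaves it
gpg --clearsign manifest        # produces manifest.asc

# on anyone else's machine, with only the developer's *public* key imported
gpg --verify manifest.asc       # fails loudly if a single byte was altered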
Post by Eduard
That would avoid happenings like the XcodeGhost incident.
Apple has a code-signing mechanism, too, and packages from Apple are always signed. But, the client-side checker (Gatekeeper) is not mandatory, and developers often turn it off entirely, since it gets in the way of developing software.
Plus, you can bypass Gatekeeper for $99: get a code signing cert from Apple and sign your evil packages with it. It’ll work until Apple catches you and revokes your cert. Almost no one checks *who* signed the package; all they know is that the OS let them install it when they double-clicked it.
That's actually kind of depressing.
Post by Eduard
- Every artifact must be hashed by every known algorithm.
I’m assuming it's possible to change from one algorithm to another mid-stream, as long as the client knows all of the algorithms in use, and is told where the change points occur.
Do you know for a fact that you cannot do this?
I'm not sure what you mean nor why it would be necessary.
(snip)
Post by Eduard
Consider a checkin manifest from 3 years ago. It is
very likely that no new checkin/branch will ever refer to it directly,
so nobody will ever refer to it by new-hash. Worse yet, it is likely
nobody will ever refer to the files inside that checkin by new-hash. If
a preimage attack on old-hash becomes possible (or even easy), one could
mess with the artifacts that are only referred to using old-hash.
Yes, if you want old artifacts to be unassailable, you’d have to recompute all the hashes.
But, I think you’re not realizing that artifact chaining removes the attraction of replacing old artifacts. As I understand it, you can’t replace an artifact 10 checkins back from the tip of the branch without recomputing the 9 other hashes on the way back to the tip. Therefore, an attack that takes a year of CPU time to attack a leaf node takes 10 years to attack a node 10 checkins back from the tip.
That's (probably) true, but I'm mostly referring to colliding on
non-control artifacts (i.e. actual files).
Post by Eduard
I personally don't think we'll ever need to go past BetterHash-512.
I’m not sure if you’re saying that 512 bits will be enough forever, or that we already have the last hash algorithm we will ever need.
History says either is a foolish prediction, and that the best hash algorithms remain state-of-the-art for only about a dozen years.
I wouldn't be so quick to jump to that conclusion, at least for the part
about 512 bits: the output length itself may well be enough forever, even
though cryptanalysis may very well advance sufficiently to break
BetterHash-512 the algorithm. As for the 512-bit security part, please see
the end of
http://www.mail-archive.com/fossil-***@lists.fossil-scm.org/msg21704.html
(starting at "I know that it may appear so").
Maybe the sunsetting of Moore’s Law will break us out of that pattern. But if Fossil is going to go through the pain of replacing SHA-1, it should be done in a way that makes it easier to do again later.
Absolutely agree.
(snip)
Post by Eduard
(Also are there any issues on any of the supported platforms with having
dollar signs in filenames (or URLs)? Just a random thought.)
Why do file names come into it? The MCF tag would be in one of the cards, which live in the DB.
I don’t even see that it has to be reported in the UI, or accepted on the command line, since the chance of two algorithms having a conflicting hash is near zero. Even if you do come across, say, a 10-hex-digit prefix shared by two hashes, one SHA-1 and one SHA-512, Fossil already knows how to stop and make you be specific about which one you mean, if it can’t see that one is obviously correct.
Therefore, command line usage will remain unchanged in this scheme: “fossil up EA5D538D23A7C” will most often uniquely identify one artifact, or none, not 2+.
You're right, never mind.

Best,
Eduard
Warren Young
2015-10-30 00:50:27 UTC
Permalink
Post by Eduard
Post by Warren Young
Post by Eduard
Post by Warren Young
most of the attacks on SHA-1 only apply to standalone blob cases
And individual files (that are part of commits). That won't show up in
the timeline.
Do you mean newly-added files?
I'm talking about generating collisions for non-control artifacts
(actual files), not control artifacts.
Oh, I see what you mean. You’re making the same point Ron W did: If you replace the file blob data in the tip of a branch, you don’t get a timeline entry for that change.

(You can do it farther up the tree, too, but it’s useless unless someone checks out an old version of the software.)

I assume the Fossil sync algorithm won’t allow a remote Fossil to replace an existing artifact. If so, that attack only works if you have control of the server hosting a Fossil repo that others sync from.

This would bypass the problem of not being able to spoof those who already have an existing clone of the repo, since the evil file hashes to the same value as the one they already have.

But by the same token, I don’t see how to get those with existing copies of that file to download the new one. The sync protocol should skip the “unchanged” file, since the client already has an artifact with that ID.

I also wonder what will happen if someone with an existing checkout checks in a diff against the changeling file, and the diffs overlap with the evil bits. I assume the server will try to apply the patch and fail, or the next person to clone the repo will get a clone that fails to open.
Post by Eduard
Post by Warren Young
Where can you put such a root of trust in the Fossil case?
There is no central presumed-secure site with Fossil. Remember, you
were just positing that the central repo’s server got rooted.
There is more than one answer, but one is that the root of trust are the
PGP private keys on the individual developers' personal computers. The
developers' private keys should never ever reach the public central repo
server.
Ah: You’re presupposing the existence of a PGP PKI that everyone’s willing to use.

Observe how PGP email has completely failed to take over the world, even given a quarter of a century.

Yes, I know about keyservers. I also know there’s more than one, and that you get a lot of resistance from most people when you tell them to go get your public key.

TLS works because there’s a financial motivation for people to pay one of the trusted CAs for a database record that costs maybe $1 max over its valid lifetime to generate and store.

Financial arguments will work within a company, but not in an open source project.
Post by Eduard
Post by Warren Young
Plus, you can bypass Gatekeeper for $99: get a code signing cert from Apple and sign your evil packages with it. It’ll work until Apple catches you and revokes your cert. Almost no one checks *who* signed the package; all they know is that the OS let them install it when they double-clicked it.
That's actually kind of depressing.
Every commercial code signing system I’ve used (OS X, Windows, & Adobe Flex/AIR) works this way. They’re basically a variant on the TLS certificate scheme, which is why Verisign is the certificate provider for so many of these schemes:

http://www.symantec.com/products-solutions/families/?fid=code-signing

iOS throws in an additional wrinkle: un-rooted iOS devices won’t install an app that isn’t co-signed by Apple’s signature. In that way, it is more like a Debian package.

The Apple App Store for OS X has the same restriction, but unlike with iOS, there’s nothing in OS X forcing you to get your apps from the App Store.

I believe Android is the same way, except that it has the Gatekeeper-like exception path which lets you install unsigned apps.
Post by Eduard
Post by Warren Young
Post by Eduard
- Every artifact must be hashed by every known algorithm.
I’m assuming it's possible to change from one algorithm to another mid-stream, as long as the client knows all of the algorithms in use, and is told where the change points occur.
Do you know for a fact that you cannot do this?
I'm not sure what you mean nor why it would be necessary.
I mean that I think it’s possible to replace a Fossil server and client pair with new ones that understand two different hash algorithms, and for those two to use the old hash algorithm on old artifacts, and new on new.

At the transition point, you’ll have a manifest containing new-style M cards and old-style P cards. Why can’t that work?
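One hypothetical shape for such a transition manifest, with illustrative
placeholder values (card letters as in the current Fossil file format; the
convention of telling hashes apart by length, 40 hex digits for old and 64
for new, is only one possible reading of the suggestion):

C Merge\sto\strunk
D 2015-11-01T12:00:00.000
F src/main.c 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
P d0dd1f61b620d7e0e82b92f73be9d36e8a01c06f
U eduard
Z 2c18fd59f816d199ad6c30b69d54cfb9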

The only reason to recompute old hashes is to prevent replacement of old file artifacts, which is not very useful.
Post by Eduard
I'm mostly referring to colliding on
non-control artifacts (i.e. actual files).
Oh, I see: by successfully executing a preimage attack, you can replace a file blob without rewriting the manifest that refers to it.

I thought the file blobs were also chained somehow, but I can’t back that up by skimming the Fossil file format wiki article. It looks like only the manifests are chained.
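A sketch of the exposure (hypothetical file names; per the file-format
article, the manifest's F card records the blob's hash, and nothing else
chains file blobs together):

#!/bin/sh
# The only tie between foo.c and its manifest is the hash on the F card.
recorded=$(sed -n 's/^F foo\.c //p' manifest)
actual=$(sha1sum < foo.c | cut -d' ' -f1)
if [ "$actual" = "$recorded" ]; then
  echo "accepted"   # a successful second-preimage forgery lands here too
else
  echo "rejected"
fi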

Unless I’m missing something, that puts me back in the “time to plan Fossil’s SHA-1 exodus” camp.
Post by Eduard
Post by Warren Young
Post by Eduard
I personally don't think we'll ever need to go past BetterHash-512.
I’m not sure if you’re saying that 512 bits will be enough forever, or that we already have the last hash algorithm we will ever need.
Yes, I know about the heat death of the universe arguments.

I’m just saying that you’re assuming that no one can knock BetterHash-512’s complexity down from 2^256 to the 2^dozens range we’ve seen with MD5 and SHA-1.

I feel more sure about such observations when it comes to things like address space sizes. We just need to get to 256-bit addressing so we can store all relevant parameters about every fundamental particle in the universe, and thus have a perfect simulation of the universe. That’ll end the current universe and start the next one. :)
Richard Hipp
2015-10-30 01:12:58 UTC
Permalink
Post by Warren Young
Oh, I see what you mean. You’re making the same point Ron W did: If you
replace the file blob data in the tip of a branch, you don’t get a timeline
entry for that change.
I assume the Fossil sync algorithm won’t allow a remote Fossil to replace an
existing artifact.
Correct
Post by Warren Young
If so, that attack only works if you have control of the
server hosting a Fossil repo that others sync from.
This would bypass the problem of not being able to spoof those who already
have an existing clone of the repo, since the evil file hashes to the same
value as the one they already have.
But by the same token, I don’t see how to get those with existing copies of
that file to download the new one. The sync protocol should skip the
“unchanged” file, since the client already has an artifact with that ID.
Correct. The first instance of a hash to get into the system
suppresses all others.
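In other words, something like this toy insert routine (not the actual sync
code):

#!/bin/sh
# Content-addressed insert: an artifact whose hash is already present is
# ignored, so a later colliding file can never displace the original.
insert() {
  h=$(sha1sum < "$1" | cut -d' ' -f1)
  if [ -e "store/$h" ]; then
    echo "skipping $1: $h already known"
  else
    mkdir -p store && cp "$1" "store/$h"
  fi
}
insert foo.c        # stored
insert foo-evil.c   # ignored whenever H(foo-evil.c) == H(foo.c)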
--
D. Richard Hipp
***@sqlite.org
Scott Robison
2015-10-30 14:11:54 UTC
Permalink
Post by Warren Young
I also wonder what will happen if someone with an existing checkout
checks in a diff against the changeling file, and the diffs overlap with
the evil bits. I assume the server will try to apply the patch and fail,
or the next person to clone the repo will get a clone that fails to open.

I don't think Fossil transfers deltas via the sync protocol, though my
check of the source code was brief. As I think more about it, Fossil can't
really sync delta-encoded artifacts, because artifacts are unordered, so it
would not be possible to use an artifact if you got it before its
dependency.
Richard Hipp
2015-10-30 14:17:00 UTC
Permalink
Post by Scott Robison
I don't think fossil transfers deltas via the sync protocol,
It does. Most artifacts are transmitted as deltas against existing
artifacts that both ends already know about.

Which reminds me - there is a (non-cryptographic) checksum on every
delta that must also match, thus making it even harder to substitute
foo-evil.c for foo.c.
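A rough analogue, with ordinary diff/patch standing in for Fossil's binary
delta encoding (the cksum step plays the role of that per-delta checksum;
the real format differs in detail):

#!/bin/sh
# Sender: delta-encode the new revision and record a cheap checksum of it.
diff foo-old.c foo-new.c > foo.delta
want=$(cksum < foo-new.c | cut -d' ' -f1)

# Receiver: rebuild from the delta, then verify; a substituted foo-evil.c
# would have to fool this checksum on top of the artifact hash.
cp foo-old.c foo-rebuilt.c
patch -s foo-rebuilt.c < foo.delta
got=$(cksum < foo-rebuilt.c | cut -d' ' -f1)
[ "$got" = "$want" ] && echo "delta ok" || echo "delta checksum mismatch"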
--
D. Richard Hipp
***@sqlite.org
Michal Suchanek
2015-10-30 11:30:16 UTC
Permalink
Post by Eduard
Hi Warren,
Post by Warren Young
Post by Eduard
(...)
I had read 2/3 of them, yes. Thanks for the third one!
...
Post by Eduard
Post by Warren Young
Post by Eduard
I might know (through some other source, e.g.
PGP-signed email) that artifact "abcdef" is genuine, and it shouldn't
matter where I download it from.
How many people will be doing such cross-checking?
Again I bring up the XcodeGhost example. People do foolish things in the name of expediency.
Well, I know I will be doing such cross-checking. Hopefully I'm not the
only one. Right? ...right?
Seriously, a large part of the software out there is not signed in any way at all.

For a codebase of non-trivial size (more than 2-3 small files) there is
no way to review all of the code.

It does not suffice to sign the security software. Because of the poor
security design dating back to the original Unix implementation, all
applications are allowed to do anything. There are optional security
extensions like SELinux that by now technically do allow sandboxing
applications, but most applications would fail if run sandboxed, because
these are optional, non-standard extensions. Who would bother to cater to
the people who use them to run their system securely, right?

So you have to trust every single line of code and every makefile you run.
Not just the system tools: *everything* you ever download and execute on
your computer. Even proprietary applications and libraries (how many
vendors sign those?).

So basically any 'security' on a workstation where you actually do
anything useful is just fake.

Thanks

Michal
Ron W
2015-10-29 23:37:08 UTC
Permalink
But, I think you’re not realizing that artifact chaining removes the
attraction of replacing old artifacts. As I understand it, you can’t
replace an artifact 10 checkins back from the tip of the branch without
recomputing the 9 other hashes on the way back to the tip. Therefore, an
attack that takes a year of CPU time to attack a leaf node takes 10 years
to attack a node 10 checkins back from the tip.
Suppose the work on "foo.c" is done. Subsequent commits will refer to the
most recent version of "foo.c". Replacing the contents of "foo.c" with
"foo_evil.c", where H(foo.c) == H(foo_evil.c), now only requires creating a
single colliding "foo_evil.c".

Depending on how long ago "foo.c" was last worked on, the devs might not
notice the changes, especially if they don't diff it against the previous
revision.

Something my team does - though not for security reasons - is to archive
the sources of a release to a separate archive, independent of the VCS
repository. Then, when we make a new release, part of the process is to
review all changes in the proposed new release against the source of the
previous release. (This is in addition to the code reviews we do prior to
integrating additions and fixes.)

While not perfect, our release reviews increase the probability of
detecting a "stealth" change before it gets released.
Christopher M. Fuhrman
2015-10-29 20:26:21 UTC
Permalink
Post by Eduard
Hi,
I wish to discuss the issues surrounding the use of SHA1 in Fossil and
their consequences, as well as propose several possibilities to deal
with them.
<snip-snip>
Post by Eduard
As for the exact value of BetterHash, I would like to nominate
BLAKE2b-512 [2]. It is faster than both MD5 and SHA1, it is based upon
BLAKE which has received a lot of cryptanalytic attention during the
SHA3 competition, and it retains a large security margin (the best
(academic) attack to date is on a reduced version that does only 2.5
rounds instead of 10, and even then only downgrades the security from
512 to 481 bits).
What kind of speed hit would using the BLAKE2b algorithm impose on 32-bit
machines such as i386, VAX, or m68k? Yes, there's the BLAKE2s algorithm
for 8- to 32-bit architectures, but that produces different hashes than
BLAKE2b. Is it even possible to use BLAKE2b on a 32-bit CPU?
Post by Eduard
Please let me know your thoughts on this matter.
Best regards,
Eduard
[1] https://sites.google.com/site/itstheshappening/
[2] https://blake2.net/
--
Christopher M. Fuhrman
***@pobox.com
Eduard
2015-10-29 22:10:57 UTC
Permalink
Post by Christopher M. Fuhrman
What kind of speed hit would using the BLAKE2b algorithm on 32-bit
machines such as i386, vax, or m68k? Yes, there's the BLAKE2s
algorithm for 8-32 bit architectures but that produces different
hashes than BLAKE2b. Is it even possible to use BLAKE2b on a 32-bit
CPU?
Yes, it is possible, and not much slower either! Here's what I get on my
Intel L2300 CPU (which is 32-bit only):

blake2s-256
268435456 bytes (268 MB) copied, 3.8387 s, 69.9 MB/s
blake2b-512
268435456 bytes (268 MB) copied, 3.97249 s, 67.6 MB/s
sha1
268435456 bytes (268 MB) copied, 2.23538 s, 120 MB/s
sha2-256
268435456 bytes (268 MB) copied, 5.03352 s, 53.3 MB/s
sha2-512
268435456 bytes (268 MB) copied, 19.9417 s, 13.5 MB/s

And here's what I get on a 64-bit CPU (Intel i5-3470):

blake2s-256
1073741824 bytes (1.1 GB) copied, 2.09602 s, 512 MB/s
blake2b-512
1073741824 bytes (1.1 GB) copied, 1.6206 s, 663 MB/s
sha1
1073741824 bytes (1.1 GB) copied, 2.35279 s, 456 MB/s
sha2-256
1073741824 bytes (1.1 GB) copied, 5.14392 s, 209 MB/s
sha2-512
1073741824 bytes (1.1 GB) copied, 3.47958 s, 309 MB/s

I also remember trying this on an Allwinner A10 ARM (32 bit) CPU, and
the NEON-optimized version was actually faster than SHA1.

Best,
Eduard


Script:
#!/bin/sh
# Hash n x 16 MiB of zeros with the given command; report dd's throughput.
do_hash() {
  echo "$1"; shift
  { dd if=/dev/zero bs=16M count=$n | "$@" /dev/stdin >/dev/null; } 2>&1 |
    grep 'copied'
}
n=16
do_hash blake2s-256 ./b2sum -a blake2s
do_hash blake2b-512 ./b2sum -a blake2b
do_hash sha1 sha1sum
do_hash sha2-256 sha256sum
do_hash sha2-512 sha512sum