Discussion:
OT: security/entropy (was Re: New Fossil user experiences)
David Mason
2018-07-13 20:22:05 UTC
Permalink
I use a password generator of my own design - basically it takes the userid,
concatenates it with a fairly long secret phrase, runs SHA-1 over that, and
converts the digest to base64, giving a password like:
Acgq75VpCWjdsJaa5abe9JeX3I (don't worry, this isn't a real password to
anything)
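A minimal sketch of that scheme in Python, using hashlib's SHA-1 and standard base64 (the userid, secret phrase, and 26-character truncation below are placeholders, not the real parameters):

```python
import base64
import hashlib

def derive_password(userid: str, secret: str, length: int = 26) -> str:
    """Concatenate userid and secret, SHA-1 the result, base64 the digest."""
    digest = hashlib.sha1((userid + secret).encode("utf-8")).digest()
    # Drop base64 padding and truncate to the desired password length.
    return base64.b64encode(digest).decode("ascii").rstrip("=")[:length]

# Placeholder inputs only; a real secret phrase would of course not be shown.
print(derive_password("dmason", "a fairly long secret phrase"))
```

(A modern version of this would typically use a salted, iterated KDF rather than bare SHA-1, but the sketch mirrors the scheme as described.)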

After Warren's comment about wanting 256 bits of entropy, I fed this
through an online entropy calculator (https://planetcalc.com/2476/ - I
wouldn't feed a real password through anything on the web!), and got 4.29
bits of Shannon entropy (replacing a character with a special character
didn't change the number). Calculating it on a whole web page only gave
5.41.

So I tried:
dd if=/dev/random bs=100 count=1|od -c
and the result only gave 5.00 bits

So I'm guessing this isn't what he meant.

http://rumkin.com/tools/password/passchk.php does a version of this calculation, and it says my
fake password above is 130 bits. The 800 bits of /dev/random output converted to hex
give 779 bits.

So I guess this is what Warren had in mind. Posting this in case it helps
somebody on the list.

Thanks ../Dave
Warren Young
2018-07-13 21:13:36 UTC
Permalink
Acgq75VpCWjdsJaa5abe9JeX3I (don't worry, this isn't a real password to anything)
…I fed this through an online entropy calculator and got 4.29 bits of Shannon entropy
That calculator is giving you bits *per character*.

You can see this several ways:

1. Double the message and the bits per character doesn’t change because the size of the source alphabet doesn’t change.

2. Add a dollar sign to the message, and bpc goes up a bit. (This conflicts with your report that adding a special character didn’t change it, but it did for me.)

3. Turn on the calculator’s case folding option and the bpc value goes down a bit.

One key realization you should get from this calculator is that ASCII text is not 7 or 8 bits of entropy per character. It simply is not, because not all characters in the source text are equally likely. Many code points may never be used in a given corpus.

Another realization is that a random blob of hex noise should asymptotically approach 4 bpc, since each character is 4 bits of data, and the data are supposed to be evenly distributed across the code space.
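That per-character figure is just the Shannon entropy of the character-frequency distribution, which is easy to sketch in Python (a toy version of what such a calculator presumably does):

```python
import math
from collections import Counter

def shannon_bpc(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

print(shannon_bpc("0123456789abcdef"))  # 4.0: all sixteen hex digits equally likely
print(shannon_bpc("C79683189EFBEBEC"))  # below 4: repeated digits skew the frequencies
print(shannon_bpc("ab" * 8) == shannon_bpc("ab" * 16))  # True: doubling changes nothing
```

The last line matches point 1 above: doubling the message leaves every frequency ratio, and therefore the bits-per-character value, unchanged.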

Here’s some noise from grc.com/pass:

C79683189EFBEBEC30A4C1A6D733F0242FB48E2582F3B2E7581D85E91E0A2FA5

The initial value is 3.91, and pasting it in a bunch of times does increase the value towards 4, suggesting it’s got pretty good entropy.

Now paste in an equivalent number of ‘a’ characters, and you get 0 bits of entropy. Strictly speaking, you get 1 bit of entropy for the whole message, but it shows 0 because the calculator is rounding the result off to 3 significant figures.
dd if=/dev/random bs=100 count=1|od -c
and the result only gave 5.00 bits
That’s plausible. With a much larger sample, the result should approach 7, 8, 16, or 21 bits per character, depending on your local character set size. (Respectively: pure ASCII, ISO 8859 or similar, UCS-2, and full Unicode.)
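The ceiling in each case is just log2 of the alphabet size, rounded up to whole bits: a uniform source over N symbols can't exceed log2(N) bits per character. A rough check of those figures (using nominal code-space sizes, and ignoring `od -c` escape sequences and other formatting):

```python
import math

# Ceiling = log2(code-space size), rounded up to whole bits,
# for a uniformly distributed source over each character set.
for name, size in [("pure ASCII", 2**7), ("ISO 8859", 2**8),
                   ("UCS-2", 2**16), ("full Unicode", 0x110000)]:
    print(f"{name}: {math.ceil(math.log2(size))} bits/char")  # 7, 8, 16, 21
```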

Now see if you can guess the asymptotic ideal for this slightly different command:

$ dd if=/dev/random bs=100 count=1 | od


Spoiler below.




………..




3, because the output is restricted to octal, thus 3 bpc.
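That limit is easy to reproduce with a synthetic stream of octal digits (a sketch, not a measurement of real `od` output, whose offsets and spacing would pull the figure down a bit):

```python
import math
import random
from collections import Counter

def shannon_bpc(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

# A uniform stream of octal digits asymptotically approaches log2(8) = 3 bpc.
digits = "".join(str(random.randrange(8)) for _ in range(50_000))
print(round(shannon_bpc(digits), 2))  # very close to 3.0
```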
Warren Young
2018-07-13 21:27:14 UTC
Permalink
Post by Warren Young
Now paste in an equivalent number of ‘a’ characters, and you get 0 bits of entropy. Strictly speaking, you get 1 bit of entropy for the whole message, but it shows 0 because the calculator is rounding the result off to 3 significant figures.
Hmmm…we also need something like a run-length prefix to reconstruct the message, so this calculator is undershooting slightly.

For example, 100 a’s requires a 7-bit run-length plus zero bits for our only code point, so we should get 0.07 bpc, within this calculator’s apparent precision even without dealing with roundoff errors.
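Spelled out, that 0.07 bpc figure is just the length prefix amortized over the message (assuming, as above, that the one-symbol alphabet itself costs nothing):

```python
import math

run_length = 100
# Bits needed for a run-length prefix covering lengths 0..100:
length_bits = math.ceil(math.log2(run_length + 1))
# The single-entry alphabet contributes zero bits per symbol.
print(length_bits, length_bits / run_length)  # 7 bits total -> 0.07 bpc
```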

Still, it’s good enough for our purposes here, which is to make it clear that if you use a hex string as a passphrase, you need 64 characters of it (4 bits each) to fully justify the use of 256-bit symmetric encryption.
Joerg Sonnenberger
2018-07-14 20:18:25 UTC
Permalink
Post by Warren Young
Post by Warren Young
Now paste in an equivalent number of ‘a’ characters, and you get 0 bits of entropy. Strictly speaking, you get 1 bit of entropy for the whole message, but it shows 0 because the calculator is rounding the result off to 3 significant figures.
Hmmm…we also need something like a run-length prefix to reconstruct the message, so this calculator is undershooting slightly.
For example, 100 a’s requires a 7-bit run-length plus zero bits for our
only code point, so we should get 0.07 bpc, within this calculator’s
apparent precision even without dealing with roundoff errors.
You need more than zero bits to encode the original a though. Frankly,
this seems like a bit of pseudo-math to me. The very definition of
entropy depends on the context. So a question of "how much entropy does
this string have" is seriously underdefined. If you can take the output
of any modern CPRNG as hex and don't get 4 bpc, the entropy estimator is
broken.

Joerg
Richard Hipp
2018-07-14 20:23:18 UTC
Permalink
Post by Joerg Sonnenberger
If you can take the output
of any modern CPRNG as hex and don't get 4bpc, the entropy estimator is
broken.
I've always understood the output of entropy estimators to mean "the
entropy is no greater than this", which is somewhat easier to define,
since you get your choice of models.
--
D. Richard Hipp
***@sqlite.org
Warren Young
2018-07-14 23:24:26 UTC
Permalink
Post by Joerg Sonnenberger
Post by Warren Young
For example, 100 a’s requires a 7-bit run-length plus zero bits for our
only code point
You need more than zero bits to encode the original a though.
There’s only one letter in this alphabet, so all we need is a run length to say how many of them there are in our message.

Another way to look at it is that a bit encodes one of two states, but we only have one state in this example, so we don’t need a whole bit to encode it.

If the example were a’s or no-a’s, then you’d have two states, and thus need to encode it with bits. But materially, that’s no different from an a’s or b’s system, which also requires one bit per state.
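That one-bit-per-state point is easy to check numerically: any balanced two-symbol message measures exactly one bit per character, whatever the two symbols happen to be.

```python
import math
from collections import Counter

def shannon_bpc(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

print(shannon_bpc("ab" * 50))  # 1.0
print(shannon_bpc("a." * 50))  # 1.0: only the frequency profile matters, not the symbols
```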
Joerg Sonnenberger
2018-07-15 14:47:41 UTC
Permalink
Post by Warren Young
Post by Joerg Sonnenberger
Post by Warren Young
For example, 100 a’s requires a 7-bit run-length plus zero bits for our
only code point
You need more than zero bits to encode the original a though.
There’s only one letter in this alphabet, so all we need is a run length
to say how many of them there are in our message.
You are kind of making my point for me. You are adding a priori
knowledge of a single letter alphabet in a context where this assumption
makes no natural sense. By that line of reasoning, you can also assume
knowledge that passwords are always a multiple of size n and the entropy
would be near zero...

Joerg
Warren Young
2018-07-16 01:26:48 UTC
Permalink
Post by Joerg Sonnenberger
Post by Warren Young
Post by Joerg Sonnenberger
Post by Warren Young
For example, 100 a’s requires a 7-bit run-length plus zero bits for our
only code point
You need more than zero bits to encode the original a though.
There’s only one letter in this alphabet, so all we need is a run length
to say how many of them there are in our message.
You are kind of making my point for me. You are adding a priori
knowledge of a single letter alphabet in a context where this assumption
makes no natural sense. By that line of reasoning, you can also assume
knowledge that passwords are always a multiple of size n and the entropy
would be near zero…
I’m taking inspiration from Huffman coding here, so you actually need 7 bits for the length prefix + sizeof(dictionary_with_one_entry).
Warren Young
2018-07-13 21:40:04 UTC
Permalink
Post by Warren Young
2. Add a dollar sign to the message, and bpc goes up a bit. (This conflicts with your report that adding a special character didn’t change it, but it did for me.)
I just realized where the discrepancy comes from: you *replaced* one character of the original message with a special character, so the resulting alphabet size didn’t change, whereas I *added* a non-alphanumeric character to the message, which did change my alphabet size.

This calculator doesn’t know about “special characters”; all it knows about is the number of unique input symbols it is given and how that relates to the total message size.
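A quick demonstration of that point, using the fake password from the start of the thread (a small frequency-counting sketch, not the calculator's actual code):

```python
import math
from collections import Counter

def shannon_bpc(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    n = len(s)
    counts = sorted(Counter(s).values())  # sorted for a deterministic sum order
    return -sum((c / n) * math.log2(c / n) for c in counts)

pw = "Acgq75VpCWjdsJaa5abe9JeX3I"
replaced = "$" + pw[1:]  # swap one singleton symbol for another
appended = pw + "$"      # grow both the message and the alphabet

print(round(shannon_bpc(pw), 2))                 # 4.29, matching the figure above
print(shannon_bpc(replaced) == shannon_bpc(pw))  # True: same frequency profile
print(shannon_bpc(appended) > shannon_bpc(pw))   # True: one more unique symbol
```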
Jungle Boogie
2018-07-13 21:49:52 UTC
Permalink
Post by David Mason
So I guess this is what Warren had in mind. Posting this in case it helps
somebody on the list.
Taking this offtopic a little bit more...let's talk about VPNs.

Don't use PPTP and don't get tangled up in ipsec configuration hell.

Be happy with wireguard! https://www.wireguard.com

It runs on everything but Windows (Linux, *BSD, macOS, Android, iPhone). There are
no passwords to share and the public keys are easily, easily distributed.
Post by David Mason
Thanks ../Dave
Joerg Sonnenberger
2018-07-14 20:21:13 UTC
Permalink
Post by Jungle Boogie
Post by David Mason
So I guess this is what Warren had in mind. Posting this in case it helps
somebody on the list.
Taking this offtopic a little bit more...let's talk about VPNs.
Don't use PPTP and don't get tangled up in ipsec configuration hell.
Be happy with wireguard! https://www.wireguard.com
It runs on everything but Windows (Linux, *BSD, macOS, Android, iPhone). There are
no passwords to share and the public keys are easily, easily distributed.
While I know that the underlying cryptographic primitives are sound, a
lot of the wireguard description is overhyped and misguided. Code
doesn't become more secure by moving it into the kernel, and a VPN daemon
does a lot more than just dealing with key exchange. I'd strongly advise
looking at the ingredient list of the Kool-Aid...

Joerg