[john-users] bcrypt cracking on ZTEX 1.15y FPGA boards (bcrypt-ztex)
Solar Designer
2017-06-25 17:07:53 UTC
Hi,

After last year's work on descrypt-ztex:

http://www.openwall.com/lists/john-users/2016/11/06/1

Denis proceeded to work on bcrypt-ztex this year. We had listed this as
planned future work on Katja's project in 2014:

http://www.openwall.com/presentations/Passwords14-Energy-Efficient-Cracking/

but unfortunately didn't resume that project until this year. I guess
better late than never, especially given that the results achieved are
still good even by modern standards (relative to current GPUs), despite
those ZTEX 1.15y boards being rather old by now. As far as I can
tell, Denis' implementation is brand new, not building upon Katja's,
although our past experience was of some indirect help.

We finally got the bcrypt-ztex format into bleeding-jumbo this week.
For technical detail on the implementation, you may read:

https://github.com/magnumripper/JohnTheRipper/commit/4c37300e32c5b8c47e34be3a0b28a94ecd30da2a#diff-af56e15c23e8e70150ed23cb93cbae6fR1

The speed is ~106k c/s at bcrypt cost 5 on ZTEX 1.15y without
overclocking, ~114k with overclocking. It should scale almost linearly
with multiple boards (e.g. Denis reported ~103k c/s/board with 3 boards
on the same host). I can't easily measure the power consumption right
now, but I estimate it's ~20W as both the board (with a large but slowly
rotating cooling fan) and the 12V, 5A power adapter (brick) stay barely
warm to the touch. These used to get much warmer in Bitcoin mining
tests (known to be ~40W).

For comparison, according to Jeremi M Gosney's testing hashcat achieves
~23k c/s at bcrypt cost 5 on GTX 1080 Ti:

https://gist.github.com/epixoip/ace60d09981be09544fdd35005051505

Hashtype: bcrypt $2*$, Blowfish (Unix)

Speed.Dev.#1.....: 23223 H/s (37.63ms)
Speed.Dev.#2.....: 22953 H/s (38.08ms)
Speed.Dev.#3.....: 22958 H/s (38.05ms)
Speed.Dev.#4.....: 22821 H/s (38.30ms)
Speed.Dev.#5.....: 23025 H/s (37.89ms)
Speed.Dev.#6.....: 23266 H/s (37.60ms)
Speed.Dev.#7.....: 23342 H/s (37.41ms)
Speed.Dev.#8.....: 23209 H/s (37.62ms)
Speed.Dev.#*.....: 184.8 kH/s

Thus, these FPGAs from several years back perform slightly faster than
this year's top GPUs at bcrypt, per chip. The four-chip ZTEX 1.15y is
slightly faster at bcrypt than four GTX 1080 Ti cards, while consuming
10+ times less power. (I suspect the GPUs don't reach their peak power
usage on this test, by far, which is why the conservative 10+ figure.)

This doesn't mean these FPGAs are so fast and those GPUs are so slow.
Rather, it means that bcrypt is a better fit for FPGAs than for GPUs.
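
To make the per-chip comparison concrete, here is a quick sketch of the arithmetic in Python. It uses only the figures quoted above; the variable names are mine, and 106k c/s is the rounded non-overclocked board speed.

```python
# Per-device hashcat speeds (H/s) from the GTX 1080 Ti benchmark quoted above.
gpu_speeds = [23223, 22953, 22958, 22821, 23025, 23266, 23342, 23209]

# ZTEX 1.15y: four FPGA chips per board, ~106k c/s per board without overclocking.
board_speed = 106_000
chips_per_board = 4

gpu_total = sum(gpu_speeds)
gpu_mean = gpu_total / len(gpu_speeds)
fpga_per_chip = board_speed / chips_per_board
four_gpus = 4 * gpu_mean

print(f"8 GPUs total:    {gpu_total / 1000:.1f} kH/s")
print(f"per GPU (mean):  {gpu_mean:.0f} H/s")
print(f"per FPGA chip:   {fpga_per_chip:.0f} c/s")
print(f"board vs 4 GPUs: {board_speed / four_gpus:.2f}x")
```

The per-device figures do add up to the 184.8 kH/s total that hashcat reports, and one board comes out somewhat ahead of four of these GPUs.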

Now to the setup and testing:

To build JtR bleeding-jumbo with ZTEX 1.15y board support, install
libusb (e.g., the libusb-devel package on Fedora) in addition to jumbo's
usual dependencies. Then use "./configure --enable-ztex". The rest of
the build is as usual for jumbo.

To access a ZTEX board as non-root (and you shouldn't build or run JtR
as root) on a Linux system with udev, add this:

ATTRS{idVendor}=="221a", ATTRS{idProduct}=="0100", SUBSYSTEMS=="usb", ACTION=="add", MODE="0660", GROUP="ztex"

e.g. to /etc/udev/rules.d/99-local.rules (create this file). Then issue
these commands as root:

groupadd ztex
usermod -a -G ztex user # where "user" is your non-root username
systemctl restart systemd-udevd # or "service udev restart" if without systemd

In order to trigger udev to set the new permissions, (re)connect the
device after this point.
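
For readers less familiar with udev rules: the keys compared with "==" select the device, and the keys set with "=" are applied to it. The small Python sketch below splits the rule above into those two groups. It is only an illustration (the regular expression and function name are mine and handle just this simple form of rule), not a general udev parser.

```python
import re

# The udev rule quoted above, as a single line.
RULE = ('ATTRS{idVendor}=="221a", ATTRS{idProduct}=="0100", '
        'SUBSYSTEMS=="usb", ACTION=="add", MODE="0660", GROUP="ztex"')

# key, operator, quoted value; "==" and "!=" are tried before plain "=".
TOKEN = re.compile(r'([A-Z]+(?:\{[^}]+\})?)\s*(==|!=|\+=|:=|=)\s*"([^"]*)"')

def split_rule(rule):
    """Split a simple udev rule into match conditions and assignments."""
    matches, assignments = {}, {}
    for key, op, value in TOKEN.findall(rule):
        if op in ("==", "!="):
            matches[key] = (op, value)
        else:
            assignments[key] = value
    return matches, assignments

matches, assignments = split_rule(RULE)
print("matches:", matches)
print("assigns:", assignments)
```

Here the four match keys identify the ZTEX board by USB vendor and product ID when it is added, and the two assignments give group "ztex" read/write access to the device node.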

If you use a common Linux distro like Ubuntu or Fedora, the above should
be sufficient. In my case this time, the system is Fedora in a Qubes OS
VM, so I have to use USB passthrough. Moreover, I didn't want to pass
the entire USB controller into the VM, so the data is being proxied
through two userspace processes: one in the VM with JtR, and the other
in sys-usb. It's a setup supported by Qubes. No customizations other
than enabling the passthrough:

https://www.qubes-os.org/doc/usb/#attaching-a-single-usb-device-to-a-qube-usb-passthrough

There's significant CPU load caused in both of these VMs by such
proxying of the candidate passwords stream, and there must be increased
latency too. Speeds would probably be slightly higher if I ran the same
tests without use of VMs. In a way, it's amazing this works at all and
shows decent speeds.

Denis' implementation works around our current synchronous crypt_all()
API by buffering a large number of candidate passwords - many times
larger than the number of cores. The current design has 124 bcrypt
cores per chip, so 496 per board. My tests are with "TargetSetting = 5"
(tuning for bcrypt cost 5) in the "[ZTEX:bcrypt]" section in john.conf,
and this results in:

0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 63488

appearing in john.log. The number 63488 is 496*128. This buffering is
similar to what GPUs commonly require, albeit for different reasons
(greater concurrency and pipelining on GPUs vs. hiding communication
latency to these FPGA boards). Either way, this has usability and
efficiency drawbacks when you interrupt/restore a session (especially
with a large salt count), but it results in a nearly optimal c/s rate
despite the synchronous API and the USB latency (especially in my
testing in a VM).
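
To put numbers on the buffering, here is a rough Python sketch. The 128 keys per core is simply what 63488 = 496*128 implies. I am also assuming that each buffered chunk is tried against every loaded salt before the next chunk is fetched. The speeds and the 3107-salt count are taken from the test runs in this message.

```python
cores_per_chip = 124
chips_per_board = 4
keys_per_core = 128          # implied by 63488 = 496 * 128

cores_per_board = cores_per_chip * chips_per_board
chunk = cores_per_board * keys_per_core

def seconds_per_chunk(chunk, salts, cps):
    """Time to try one buffered chunk against every loaded salt (assumption)."""
    return chunk * salts / cps

print(cores_per_board, chunk)
print(f"1 salt:     {seconds_per_chunk(chunk, 1, 106_600):.2f} s")
print(f"3107 salts: {seconds_per_chunk(chunk, 3107, 105_300) / 60:.1f} min")
```

Under these assumptions, one chunk takes well under a second against a single salt but roughly half an hour against 3107 salts, which is why interrupting and restoring a session is costlier with many salts loaded.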

Here is a test run:

$ ./john -form=bcrypt-ztex -mask='tes?l?l?l?l?l' -u=u2781-bf pw-fake-unix
ZTEX XXXXXXXXXX bus:2 dev:19 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:01 1.18% (ETA: 22:05:58) 0g/s 105720p/s 105720c/s 105720C/s tesaaata..tesaaota
0g 0:00:00:05 4.73% (ETA: 22:06:19) 0g/s 106521p/s 106521c/s 106521C/s tesaaale..tesaaole
0g 0:00:00:17 15.38% (ETA: 22:06:24) 0g/s 106583p/s 106583c/s 106583C/s tesaaaan..tesaaoan
0g 0:00:00:34 30.77% (ETA: 22:06:24) 0g/s 106614p/s 106614c/s 106614C/s tesaaaat..tesaaoat
testtest (u2781-bf)
1g 0:00:00:35 DONE (2017-06-24 22:05) 0.02807g/s 106581p/s 106581c/s 106581C/s testtest..tes###st
Use the "--show" option to display all of the cracked passwords reliably
Session completed

This is at 141 MHz, which per the design tools is guaranteed to work.
As you can see, the speed is about 106.6k c/s.
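
As a sanity check on the progress percentages, the keyspace of this mask can be counted with a few lines of Python. This sketch only understands literal characters and ?l (26 lowercase letters). It is not how JtR itself parses masks.

```python
def mask_keyspace(mask):
    """Count candidates for a mask made of literals and ?l placeholders."""
    count, i = 1, 0
    while i < len(mask):
        if mask[i] == "?":
            if mask[i + 1] != "l":
                raise ValueError("only ?l is handled in this sketch")
            count *= 26
            i += 2
        else:
            i += 1          # literal character: exactly one choice
    return count

keyspace = mask_keyspace("tes?l?l?l?l?l")
speed = 106_600                      # c/s, from the run above
full_sweep = keyspace / speed
print(keyspace, f"{full_sweep:.0f} s for the full mask")
print(f"progress after 34 s: {34 / full_sweep:.1%}")
```

That gives 26^5 candidates and a bit under two minutes for the whole mask, in line with the roughly 31% progress reported at 34 seconds.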

Now hybrid mode, combining mask (in this case simply having it give the
known 3 characters verbatim) with incremental mode (thus, necessarily
feeding the candidate passwords from host):

$ ./john -form=bcrypt-ztex -mask='tes?w' -inc=lower -min-len=8 -max-len=8 -u=u2781-bf pw-fake-unix
ZTEX XXXXXXXXXX bus:2 dev:19 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:02 2.14% (ETA: 22:07:51) 0g/s 102814p/s 102814c/s 102814C/s tesnivfm..tesjrkto
testtest (u2781-bf)
1g 0:00:00:04 DONE (2017-06-24 22:06) 0.2331g/s 103593p/s 103593c/s 103593C/s testtest..tesfedal
Use the "--show" option to display all of the cracked passwords reliably
Session completed

Much quicker running time (4 seconds instead of 35) due to incremental
mode's more optimal ordering of candidate passwords, even though the c/s
rate has dropped to 103.6k c/s (but 4 seconds is too little to measure
this precisely).

Another variation, running against many hashes (and salts) and using
mask mode to double the "words" generated by incremental mode:

$ ./john -form=bcrypt-ztex -mask='?w?w' -inc=lower -min-len=8 -max-len=8 pw-fake-unix
ZTEX XXXXXXXXXX bus:2 dev:19 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:02 0g/s 0p/s 104078c/s 104078C/s lovelove..lvvllvvl
0g 0:00:00:20 0g/s 0p/s 105297c/s 105297C/s lovelove..lvvllvvl
0g 0:00:00:53 0g/s 0p/s 105398c/s 105398C/s lovelove..lvvllvvl
asdfasdf (u915-bf)
1g 0:00:01:22 0.01211g/s 0p/s 105415c/s 105415C/s lovelove..lvvllvvl
1g 0:00:03:18 0.005044g/s 0p/s 105375c/s 105375C/s lovelove..lvvllvvl
1g 0:00:04:40 0.003567g/s 0p/s 105307c/s 105307C/s lovelove..lvvllvvl
Use the "--show" option to display all of the cracked passwords reliably
Session aborted

I interrupted this one, but it does show that 105.3k c/s is possible
even with incremental mode and a mask on top of it.

Now extreme overclocking, setting "Frequency = 163" in the section in
john.conf (it is also possible to set individual frequencies per FPGA -
see the comments in john.conf - but I did not use this here):

$ ./john -form=bcrypt-ztex -mask='tes?l?l?l?l?l' -u=u2781-bf pw-fake-unix
ZTEX XXXXXXXXXX bus:2 dev:19 Frequency:163 163 163 163
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:02 2.37% (ETA: 22:25:57) 0g/s 121213p/s 121213c/s 121213C/s tesaaaka..tesaaoka
0g 0:00:00:08 8.28% (ETA: 22:26:09) 0g/s 122572p/s 122572c/s 122572C/s tesaaani..tesaaoni
0g 0:00:00:18 18.93% (ETA: 22:26:08) 0g/s 122868p/s 122868c/s 122868C/s tesaaaxn..tesaaoxn
0g 0:00:00:26 27.22% (ETA: 22:26:08) 0g/s 122871p/s 122871c/s 122871C/s tesaaais..tesaaois
testtest (u2781-bf)
1g 0:00:00:31 DONE (2017-06-24 22:25) 0.03224g/s 122425p/s 122425c/s 122425C/s testtest..tes###st
Use the "--show" option to display all of the cracked passwords reliably
Session completed

This worked here (and 163 MHz is actually the maximum that does, with
higher values failing even this quick test) achieving 122.4k c/s, but
more thorough testing shows this design and board are unstable at this
high frequency, so I didn't quote it above. The highest that works
reliably for me so far is 152 MHz, where the below tests are supposed to
and do crack all of the 239 short passwords, 7 times in a row:

$ egrep '^([^:]*:){4}[a-z]{4}:' pw-fake-unix > pw-fake-len4
$ for n in `seq 1 7`; do rm john.pot; ./john -form=bcrypt-ztex -mask='?l?l?l?l' -verb=1 pw-fake-len4; done
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:39 N/A 0.5981g/s 1143p/s 114625c/s 114625C/s alex..###q
Session completed
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:40 N/A 0.5972g/s 1141p/s 114458c/s 114458C/s alex..###q
Session completed
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:39 N/A 0.5980g/s 1143p/s 114613c/s 114613C/s alex..###q
Session completed
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:39 N/A 0.5976g/s 1142p/s 114527c/s 114527C/s alex..###q
Session completed
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:39 N/A 0.5976g/s 1142p/s 114542c/s 114542C/s alex..###q
Session completed
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:39 N/A 0.5977g/s 1142p/s 114550c/s 114550C/s alex..###q
Session completed
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:40 N/A 0.5971g/s 1141p/s 114444c/s 114444C/s alex..###q
Session completed

So that's 114.5k c/s at maximum overclocking here. I must admit this
board is 10% overvolted (extra resistors soldered on by the previous
owner), but per testing at Bitcoin mining this only provided a 1%
increase in maximum reasonable clock rates (vs. other non-overvolted
boards), so it's probably similar here. Denis' boards are not
overvolted, but he mentioned getting similar maximum stable clocks and
speeds. YMMV.
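
The speeds above track the clock rate fairly closely, which the following Python sketch illustrates. Note that the 152 MHz runs use a different workload (239 salts) than the single-hash mask runs at 141 and 163 MHz, so this is only a rough comparison.

```python
from statistics import mean

base_mhz, base_cps = 141, 106_581            # stock clock, single-hash mask run
runs_152 = [114625, 114458, 114613, 114527, 114542, 114550, 114444]
measured = {152: mean(runs_152), 163: 122_425}

for mhz, cps in measured.items():
    clock_gain = mhz / base_mhz
    speed_gain = cps / base_cps
    print(f"{mhz} MHz: clock x{clock_gain:.3f}, speed x{speed_gain:.3f}")
```

In both cases the speed gain is within about one percent of the clock gain.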

If you test our *-ztex formats as well, please share your feedback.
In case you'd like to reproduce these results, our pw-fake-unix is
available at:

http://openwall.info/wiki/john/sample-hashes#Sample-password-hash-files

Also see this recent reply on what else we could implement on FPGAs:

http://www.openwall.com/lists/john-users/2017/05/31/2

And this Twitter poll/thread:

https://twitter.com/solardiz/status/876087192573104128

PBKDF2-HMAC-SHA* won, and we'll likely have it in a few months from now.
This means things like WPA and dmg.

Another target we intend to explore is AWS F1, but we don't have
anything ready yet. F1 turned out to be reasonably priced - $1.65/hour
per FPGA, spot price now is ~$0.18/hour (I guess not much demand yet):

https://aws.amazon.com/ec2/instance-types/f1/
https://aws.amazon.com/ec2/pricing/on-demand/
https://aws.amazon.com/ec2/spot/pricing/ (choose N. Virginia)

Alexander
Solar Designer
2017-06-26 06:54:23 UTC
Post by Solar Designer
The speed is roughly ~106k c/s at bcrypt cost 5 on ZTEX 1.15y without
overclocking, ~114k with overclocking. It should scale almost linearly
with multiple boards (e.g. Denis reported ~103k c/s/board with 3 boards
on the same host). I can't easily measure the power consumption right
now, but I estimate it's ~20W as both the board (with a large but slowly
rotating cooling fan) and the 12V, 5A power adapter (brick) stay barely
warm to the touch. These used to get much warmer in Bitcoin mining
tests (known to be ~40W).
For comparison, according to Jeremi M Gosney's testing hashcat achieves
~23k c/s at bcrypt cost 5 on GTX 1080 Ti:
https://gist.github.com/epixoip/ace60d09981be09544fdd35005051505
Hashtype: bcrypt $2*$, Blowfish (Unix)
Speed.Dev.#1.....: 23223 H/s (37.63ms)
Speed.Dev.#2.....: 22953 H/s (38.08ms)
Speed.Dev.#3.....: 22958 H/s (38.05ms)
Speed.Dev.#4.....: 22821 H/s (38.30ms)
Speed.Dev.#5.....: 23025 H/s (37.89ms)
Speed.Dev.#6.....: 23266 H/s (37.60ms)
Speed.Dev.#7.....: 23342 H/s (37.41ms)
Speed.Dev.#8.....: 23209 H/s (37.62ms)
Speed.Dev.#*.....: 184.8 kH/s
Jeremi has just posted more benchmarks for bcrypt on different GPUs:

https://gist.github.com/epixoip/9d9b943fd580ff6bfa80e48a0e77520d

| Maxwell/Pascal bcrypt Benchmarks
|
| Product: Sagitta Invictus-based dev box
|
| Software: Hashcat v3.6.0-39-gc918173, Nvidia driver 381.22
|
| Accelerator: 1x GTX 970 reference, 1x GTX 980 reference, 1x GTX Titan X (Maxwell) reference, 1x GTX 1080 Ti FE
|
| ***@dev:~/hashcat# ./hashcat -w 4 -b -m 3200
| hashcat (v3.6.0-39-gc918173) starting in benchmark mode...
|
| OpenCL Platform #1: NVIDIA Corporation
| ======================================
| * Device #1: GeForce GTX 970, 1008/4034 MB allocatable, 13MCU
| * Device #2: GeForce GTX 980, 1008/4033 MB allocatable, 16MCU
| * Device #3: GeForce GTX TITAN X, 3051/12207 MB allocatable, 24MCU
| * Device #4: GeForce GTX 1080 Ti, 2793/11172 MB allocatable, 28MCU
|
| Hashtype: bcrypt $2*$, Blowfish (Unix)
|
| Speed.Dev.#1.....: 7039 H/s (234.21ms)
| Speed.Dev.#2.....: 8465 H/s (236.41ms)
| Speed.Dev.#3.....: 12313 H/s (243.82ms)
| Speed.Dev.#4.....: 21827 H/s (160.32ms)
| Speed.Dev.#*.....: 49644 H/s

Unfortunately, the power usage figures shown further in that gist are
at idle rather than at bcrypt load.
Alexander
Royce Williams
2017-07-03 15:12:12 UTC
Post by Solar Designer
We finally got the bcrypt-ztex format into bleeding-jumbo this week.
Pretty great work - thanks again to you and Denis and anyone else who
has been working on this.
Post by Solar Designer
The speed is roughly ~106k c/s at bcrypt cost 5 on ZTEX 1.15y without
overclocking, ~114k with overclocking. It should scale almost linearly
with multiple boards (e.g. Denis reported ~103k c/s/board with 3 boards
on the same host). I can't easily measure the power consumption right
now, but I estimate it's ~20W as both the board (with a large but slowly
rotating cooling fan) and the 12V, 5A power adapter (brick) stay barely
warm to the touch. These used to get much warmer in Bitcoin mining
tests (known to be ~40W).
Here are some tests on my cluster, as recently described here:

http://www.openwall.com/lists/john-users/2017/06/30/1

I discovered today that I had a USB power problem with two boards,
which I have fixed. (I had read that these boards require steady power
on the USB side, even though they are independently powered.) They are
still a little finicky, but I can usually coax them into working now.

I now have two more boards for a total of 16, so adjust any
calculations accordingly.
Post by Solar Designer
Denis' implementation works around our current synchronous crypt_all()
API by buffering a large number of candidate passwords - many times
larger than the number of cores. The current design has 124 bcrypt
cores per chip, so 496 per board. My tests are with "TargetSetting = 5"
(tuning for bcrypt cost 5) in the "[ZTEX:bcrypt]" section in john.conf,
0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 63488
I wasn't paying a lot of attention to it at the time, but looking at
john.log, unless I've lost track of something, my value was:

0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 262140

... for values of both 5 and 6 for TargetSetting.


My first tests were with all 16 boards.

The first test used the default john.conf [ZTEX:bcrypt] TargetSetting
= 6 value, with john compiled with the keys_per_crypt *= 2 tweak:

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' pw-fake-unix

Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:01:54 0g/s 0p/s 1609Kc/s 1609KC/s loveaaaa..loveioia
0g 0:00:04:54 0g/s 0p/s 1611Kc/s 1611KC/s loveaaaa..loveioia
0g 0:00:09:53 0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:11:56 0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:12:18 0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:19:32 0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:22:06 0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:24:21 0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:27:30 0g/s 0p/s 1613Kc/s 1613KC/s loveaaaa..loveioia
0g 0:00:32:16 0.00% (ETA: 2030-12-18 00:34) 0g/s 491.5p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:00:43:20 0.00% (ETA: 2035-07-31 03:45) 0g/s 366.0p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:00:51:23 0g/s 308.6p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:00:57:27 0g/s 276.0p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:01:00:56 0g/s 260.3p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:01:13:00 0.00% (ETA: 2032-09-23 00:29) 0g/s 434.5p/s 1613Kc/s 1613KC/s lolaaatn..lolaiocn


That test ran at ~505W / 16 = ~31.6W per board, which includes the
power for the onboard fans. The power consumption actually jumps
around quite a bit between 495W and 515W, but 505W seemed about
average.

The second test was with 16 boards, changing to TargetSetting = 5, and
still with keys_per_crypt *= 2:

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' pw-fake-unix

Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:12 0g/s 0p/s 1625Kc/s 1625KC/s loveaaaa..loveaida
0g 0:00:02:02 0g/s 0p/s 1633Kc/s 1633KC/s loveaaaa..loveaida
0g 0:00:03:14 0g/s 0p/s 1633Kc/s 1633KC/s loveaaaa..loveaida
0g 0:00:08:22 0g/s 0p/s 1633Kc/s 1633KC/s loveaaaa..loveaida
0g 0:00:12:30 0g/s 0p/s 1632Kc/s 1632KC/s loveaaaa..loveaida
0g 0:00:17:57 0g/s 0p/s 1632Kc/s 1632KC/s loveaaaa..loveaida
0g 0:00:21:34 0g/s 0p/s 1632Kc/s 1632KC/s loveaaaa..loveaida
0g 0:00:24:27 0g/s 0p/s 1631Kc/s 1631KC/s loveaaaa..loveaida
0g 0:00:38:52 0.00% (ETA: 2031-03-22 14:56) 0g/s 482.2p/s 1632Kc/s 1632KC/s lovaaaay..lovaaidy
0g 0:00:41:28 0.00% (ETA: 2032-02-20 19:37) 0g/s 452.0p/s 1632Kc/s 1632KC/s lovaaaay..lovaaidy

For that test, I'd say that power was very slightly higher, maybe
averaging 510W, so ~31.9W per board. But this might be normal
variation.

So across the cluster, with known tweaks and settings without
overclocking, I'm getting 1.632Mc/s for 510W.
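
For reference, the per-board and per-watt figures from these two cluster runs work out as in the Python sketch below. The c/s values are the steady-state readings from the logs above, and the wattages are the approximate averages I quoted.

```python
boards = 16
tests = {
    "TargetSetting = 6": (1_613_000, 505),   # (c/s, approx. watts at the wall)
    "TargetSetting = 5": (1_632_000, 510),
}
for name, (cps, watts) in tests.items():
    print(f"{name}: {cps / boards / 1000:.1f}k c/s/board, "
          f"{watts / boards:.1f} W/board, {cps / watts:.0f} c/s/W")
```

So each board contributes about 101k to 102k c/s at cost 5, at roughly 3200 c/s per watt for the cluster as a whole.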

Next, here are single-board versions of both tests, using the same
board. (I did this by disconnecting the other boards. Is there a way
to tell john to only use a specific device?)

First, TargetSetting = 5, keys_per_crypt *= 2:

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' pw-fake-unix
SN XXXXXXXXXX: firmware uploaded
SN XXXXXXXXXX: uploading bitstreams.. ok
ZTEX XXXXXXXXXX bus:1 dev:72 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:14 0g/s 0p/s 106815c/s 106815C/s loveaaaa..loveaaoa
0g 0:00:03:12 0g/s 0p/s 107169c/s 107169C/s loveaaaa..loveaaoa
0g 0:00:05:44 0g/s 0p/s 107173c/s 107173C/s loveaaaa..loveaaoa
0g 0:00:06:51 0g/s 0p/s 107181c/s 107181C/s loveaaaa..loveaaoa
0g 0:00:10:36 0g/s 0p/s 107190c/s 107190C/s loveaaaa..loveaaoa
0g 0:00:15:34 0g/s 0p/s 107197c/s 107197C/s loveaaaa..loveaaoa
0g 0:00:20:13 0g/s 0p/s 107194c/s 107194C/s loveaaaa..loveaaoa
0g 0:00:24:07 0g/s 0p/s 107199c/s 107199C/s loveaaaa..loveaaoa


Then using TargetSetting at the default of 6, keys_per_crypt *= 2
(--progress-every, where have you been all my life?)

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' --progress-every=300 pw-fake-unix
ZTEX XXXXXXXXXX bus:1 dev:72 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:01 0g/s 0p/s 102565c/s 102565C/s loveaaaa..loveomaa
0g 0:00:05:00 0g/s 0p/s 106748c/s 106748C/s loveaaaa..loveomaa
0g 0:00:10:00 0g/s 0p/s 106902c/s 106902C/s loveaaaa..loveomaa
0g 0:00:15:00 0g/s 0p/s 106952c/s 106952C/s loveaaaa..loveomaa
0g 0:00:20:00 0g/s 0p/s 106978c/s 106978C/s loveaaaa..loveomaa
0g 0:00:25:00 0g/s 0p/s 106991c/s 106991C/s loveaaaa..loveomaa
0g 0:00:30:00 0g/s 33.04p/s 107002c/s 107002C/s loveaaco..loveomco
0g 0:00:35:00 0g/s 28.32p/s 107010c/s 107010C/s loveaaco..loveomco
0g 0:00:40:00 0g/s 24.78p/s 107015c/s 107015C/s loveaaco..loveomco
0g 0:00:45:00 0g/s 22.02p/s 107019c/s 107019C/s loveaaco..loveomco
0g 0:00:50:00 0g/s 19.82p/s 107023c/s 107023C/s loveaaco..loveomco
0g 0:00:55:00 0g/s 18.02p/s 107027c/s 107027C/s loveaaco..loveomco
0g 0:01:00:00 0g/s 33.04p/s 107031c/s 107031C/s loveaavl..loveomvl


Then I enabled the full cluster again.

Here are all 16 boards again, with TargetSetting = 5, the
keys_per_crypt *= 2 tweak, and Frequency = 152.

During this test, I was also trying to coax a 17th board into
usability. I include this test anyway because there appears to have
been a slight (temporary?) drop in performance associated with the
attempt to talk to that board (or it might be a coincidence; I will
test further to check this correlation):

Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:01:03 0g/s 0p/s 1654Kc/s 1654KC/s loveaaaa..loveaida
0g 0:00:05:00 0g/s 0p/s 1655Kc/s 1655KC/s loveaaaa..loveaida
SN XXXXXXXXXX: firmware uploaded
SN XXXXXXXXXX: uploading bitstreams.. ok
SN XXXXXXXXXX: device_list_check_bitstreams(): no bitstream or wrong type
SN XXXXXXXXXX: uploading bitstreams.. ok
SN XXXXXXXXXX: device_list_check_bitstreams(): no bitstream or wrong type
SN XXXXXXXXXX: firmware uploaded
SN XXXXXXXXXX: uploading bitstreams.. ok
SN XXXXXXXXXX: device_list_check_bitstreams(): no bitstream or wrong type
SN XXXXXXXXXX: uploading bitstreams.. ok
SN XXXXXXXXXX: device_list_check_bitstreams(): no bitstream or wrong type
0g 0:00:08:19 0g/s 0p/s 1644Kc/s 1644KC/s loveaaaa..loveaida
0g 0:00:10:00 0g/s 0p/s 1645Kc/s 1645KC/s loveaaaa..loveaida
0g 0:00:15:00 0g/s 0p/s 1649Kc/s 1649KC/s loveaaaa..loveaida
0g 0:00:18:03 0g/s 0p/s 1650Kc/s 1650KC/s loveaaaa..loveaida
0g 0:00:20:00 0g/s 0p/s 1650Kc/s 1650KC/s loveaaaa..loveaida
0g 0:00:25:00 0g/s 0p/s 1651Kc/s 1651KC/s loveaaaa..loveaida
0g 0:00:30:00 0g/s 0p/s 1652Kc/s 1652KC/s loveaaaa..loveaida
0g 0:00:35:00 0g/s 0p/s 1652Kc/s 1652KC/s loveaaaa..loveaida
0g 0:00:40:00 0.00% (ETA: 2031-08-16 00:16) 0g/s 468.6p/s 1653Kc/s 1653KC/s lovaaaay..lovaaidy
0g 0:00:45:00 0.00% (ETA: 2033-05-21 14:48) 0g/s 416.5p/s 1653Kc/s 1653KC/s lovaaaay..lovaaidy
0g 0:00:50:00 0.00% (ETA: 2035-02-25 05:21) 0g/s 374.8p/s 1653Kc/s 1653KC/s lovaaaay..lovaaidy


And finally, a more focused example - all 16 boards, a single
artificial hash, with bcrypt work factor 12, with the same tweaks:

$ cat single-bf.hash
$2a$12$S7H1VijH5FFkU/1bWeM98ObKGC6BwfjNnhsPFs3U88yNbYSphoTp.

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' --progress-every=300 single-bf.hash
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:12 0.00% (ETA: 2017-12-16 07:45) 0g/s 14299p/s 14299c/s 14299C/s loveisxm..lovehjfc
0g 0:00:05:00 0.00% (ETA: 2017-12-17 12:58) 0g/s 14422p/s 14422c/s 14422C/s laliawhy..lalidtdh
0g 0:00:10:01 0.00% (ETA: 2017-12-17 19:40) 0g/s 14417p/s 14417c/s 14417C/s bebeapeq..bebednqq
0g 0:00:15:00 0.01% (ETA: 2017-12-17 17:52) 0g/s 14417p/s 14417c/s 14417C/s lalluepc..lallqhbd
0g 0:00:20:00 0.01% (ETA: 2017-12-17 20:20) 0g/s 14414p/s 14414c/s 14414C/s pinaidtw..pinahzrz
0g 0:00:25:00 0.01% (ETA: 2017-12-17 18:51) 0g/s 14413p/s 14413c/s 14413C/s poleiswt..polehjjm
0g 0:00:30:00 0.01% (ETA: 2017-12-17 20:20) 0g/s 14412p/s 14412c/s 14412C/s locakkyv..locaeocf
0g 0:00:35:00 0.01% (ETA: 2017-12-17 21:23) 0g/s 14412p/s 14412c/s 14412C/s beednaol..beedbyas
0g 0:00:40:00 0.02% (ETA: 2017-12-17 20:20) 0g/s 14414p/s 14414c/s 14414C/s popenwkp..popebtuj
0g 0:00:45:00 0.02% (ETA: 2017-12-17 19:31) 0g/s 14416p/s 14416c/s 14416C/s luiznpnr..luizbnil
0g 0:00:50:00 0.02% (ETA: 2017-12-17 18:51) 0g/s 14417p/s 14417c/s 14417C/s boolnupg..boolbakp
0g 0:00:55:00 0.02% (ETA: 2017-12-17 19:40) 0g/s 14418p/s 14418c/s 14418C/s puthpzto..puthocln
0g 0:01:00:00 0.02% (ETA: 2017-12-17 19:06) 0g/s 14419p/s 14419c/s 14419C/s joespjwb..joesolvk
0g 0:01:01:00 0.03% (ETA: 2017-12-17 18:31) 0g/s 14419p/s 14419c/s 14419C/s johoiouh..johohbdu

This pulled about 560W from the wall.
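
Two quick cross-checks on this run, as a Python sketch. I am assuming that the pw-fake-unix hashes are cost 5 (as in Alexander's tests) and that "the same tweaks" means this run also used Frequency = 152. Both are my reading rather than something the logs state.

```python
cost5_cps = 1_653_000      # 16 boards, cost 5, Frequency = 152 (run above)
cost12_cps = 14_419        # 16 boards, cost 12, measured

# bcrypt's cost setting is the base-2 logarithm of the iteration count.
slowdown = 2 ** (12 - 5)
print(f"naive cost-12 estimate: {cost5_cps / slowdown:.0f} c/s "
      f"(measured {cost12_cps})")

keyspace = 26 ** 8         # eight lowercase letters
days = keyspace / cost12_cps / 86_400
print(f"full sweep: {days:.1f} days")
```

The measured rate is somewhat better than simply dividing the cost 5 speed by 128, presumably because fixed per-candidate overhead matters less at higher cost. The full sweep of about 168 days is consistent with the mid-December ETA john printed.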


I tried to compare this to john on my general-purpose GPU system. It
isn't working the way I expect, as it appears to be using only one GPU,
and I'm not sure what I'm doing wrong yet:

$ ./john --format=bcrypt-opencl --device=gpu --fork=6 -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' --progress-every=300 --max-run-time=3660 single-bf.hash
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-opencl [Blowfish OpenCL])
Node numbers 1-6 of 6 (fork)
Device 3: GeForce GTX 1080
Device 0: GeForce GTX 1080
Device 5: GeForce GTX 1080
Device 4: GeForce GTX 1080
Device 1: GeForce GTX 1080
Device 2: GeForce GTX 1080
[ptxas info elided]
Press 'q' or Ctrl-C to abort, almost any other key for status
1 0g 0:00:01:16 0.00% (ETA: 2037-12-19 05:32) 0g/s 53.38p/s 53.38c/s 53.38C/s GPU:34C lilluela..lilleoya

... but maybe all six GPUs together would run at 53.38c/s x 6 = ~320c/s?


I also compared GPU performance with hashcat.

First, with max power throttled down to 150W per card from the default
of 180, which is how I usually run:

$ hashcat -w 4 -a 3 -m 3200 single-bf.hash ?l?l?l?l?l?l?l
hashcat (v3.6.0-44-g21d10215+) starting...

OpenCL Platform #1: NVIDIA Corporation
======================================
* Device #1: GeForce GTX 1080, 2028/8113 MB allocatable, 20MCU
* Device #2: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU
* Device #3: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU
* Device #4: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU
* Device #5: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU
* Device #6: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU

Hashes: 1 digests; 1 unique digests, 1 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates

Applicable optimizers:
* Zero-Byte
* Single-Hash
* Single-Salt
* Brute-Force

Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger disabled.

[s]tatus [p]ause [r]esume [b]ypass [c]heckpoint [q]uit =>

Session..........: hashcat
Status...........: Running
Hash.Type........: bcrypt $2*$, Blowfish (Unix)
Hash.Target......: $2a$12$S7H1VijH5FFkU/1bWeM98ObKGC6BwfjNnhsPFs3U88yN...phoTp.
Time.Started.....: Sun Jul 2 20:51:46 2017 (9 mins, 31 secs)
Time.Estimated...: Thu Nov 2 04:23:40 2017 (122 days, 7 hours)
Guess.Mask.......: ?l?l?l?l?l?l?l [7]
Guess.Queue......: 1/1 (100.00%)
Speed.Dev.#1.....: 128 H/s (154.07ms)
Speed.Dev.#2.....: 125 H/s (157.26ms)
Speed.Dev.#3.....: 127 H/s (154.83ms)
Speed.Dev.#4.....: 127 H/s (154.09ms)
Speed.Dev.#5.....: 128 H/s (154.25ms)
Speed.Dev.#6.....: 126 H/s (155.12ms)
Speed.Dev.#*.....: 760 H/s
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 422400/8031810176 (0.01%)
Rejected.........: 0/422400 (0.00%)
Restore.Point....: 0/308915776 (0.00%)
Candidates.#1....: oarieri -> ombreri
Candidates.#2....: ovhteri -> oibzana
Candidates.#3....: osdyban -> ojkhana
Candidates.#4....: opwzana -> ozanana
Candidates.#5....: oufgeri -> ocwzana
Candidates.#6....: oxckier -> ohydana
HWMon.Dev.#1.....: Temp: 35c Fan:100% Util:100% Core:1911MHz Mem:4513MHz Bus:8
HWMon.Dev.#2.....: Temp: 35c Fan:100% Util:100% Core:1873MHz Mem:4513MHz Bus:4
HWMon.Dev.#3.....: Temp: 39c Fan:100% Util:100% Core:1898MHz Mem:4513MHz Bus:16
HWMon.Dev.#4.....: Temp: 35c Fan:100% Util:100% Core:1898MHz Mem:4513MHz Bus:4
HWMon.Dev.#5.....: Temp: 34c Fan:100% Util:100% Core:1911MHz Mem:4513MHz Bus:1
HWMon.Dev.#6.....: Temp: 33c Fan:100% Util:100% Core:1898MHz Mem:4513MHz Bus:1


Returning the GPUs to their default max power (180W) made no difference
at all for a single $12$ bcrypt hash.

In both cases, the GPU system was pulling 500W from the wall, and the
GPUs hardly broke a sweat, temperature-wise. There may be ways to get
more performance from hashcat for this hash type and work factor, but
that will take some research on my part.
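
As a cross-check of the hashcat estimate above, the seven-letter keyspace and the combined speed give the same order of time. A small Python sketch, using only numbers from the status screen:

```python
keyspace = 26 ** 7                 # mask ?l?l?l?l?l?l?l
speed = 760                        # H/s, all six GPUs combined
seconds = keyspace / speed
days, rem = divmod(seconds, 86_400)
print(keyspace)
print(f"{int(days)} days, {int(rem // 3600)} hours")
```

The keyspace matches the Progress total that hashcat shows, and the duration agrees with its "122 days, 7 hours" estimate.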

So if I'm reading this right, for single-hash bcrypt with work factor
12, just using my own hardware and techniques to compare, the best
performance available to me so far on FPGA (14419c/s) is about 19
times as fast as the best performance I know how to get on my GPU
system (760H/s), at around the same power consumption:

FPGA: 14419c/s / 560W = ~25.75c/s/W
GPU: 760H/s / 500W = 1.52H/s/W

So for a focused, single-hash attack on a modern target using my own
gear, FPGA is ~17 times as efficient as GPU?
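
The comparison above can be reproduced with a few lines of Python. This is only a sketch that re-derives the ratios from the figures quoted in this message; the variable and function names are illustrative, and it treats c/s and H/s as the same unit (candidates tested per second).

```python
# Figures quoted above: single bcrypt hash at work factor 12.
fpga_rate, fpga_watts = 14419, 560   # c/s and watts at the wall (FPGA cluster)
gpu_rate, gpu_watts = 760, 500       # H/s and watts at the wall (GPU system)

def per_watt(rate, watts):
    """Candidates tested per second for each watt drawn at the wall."""
    return rate / watts

speed_ratio = fpga_rate / gpu_rate
fpga_eff = per_watt(fpga_rate, fpga_watts)
gpu_eff = per_watt(gpu_rate, gpu_watts)
eff_ratio = fpga_eff / gpu_eff

print(f"speed ratio:      {speed_ratio:.1f}x")
print(f"FPGA efficiency:  {fpga_eff:.2f} c/s/W")
print(f"GPU efficiency:   {gpu_eff:.2f} H/s/W")
print(f"efficiency ratio: {eff_ratio:.1f}x")
```

The speed ratio (~19x) and the efficiency ratio (~17x) differ because the FPGA cluster drew somewhat more power (560W against 500W).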

I will also do some testing without the keys_per_crypt *= 2 tweak, and
with different keys_per_crypt values, but I wanted to get this posted.

Royce
Solar Designer
2017-07-03 17:14:01 UTC
Permalink
Post by Royce Williams
I now have two more boards for a total of 16, so adjust any
calculations accordingly.
Great. Thank you for providing these benchmarks!
Post by Royce Williams
Post by Solar Designer
Denis' implementation works around our current synchronous crypt_all()
API by buffering a large number of candidate passwords - many times
larger than the number of cores. The current design has 124 bcrypt
cores per chip, so 496 per board. My tests are with "TargetSetting = 5"
(tuning for bcrypt cost 5) in the "[ZTEX:bcrypt]" section in john.conf,
0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 63488
I wasn't paying a lot of attention to it at the time, but looking at
0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 262140
... for values of both 5 and 6 for TargetSetting.
Yes, in your case TargetSetting shouldn't matter, because you have so
many boards that the value is capped anyway. But you could try hacking
this cap in the source, in ztex_bcrypt.c:

262140, // Absolute max. keys/crypt_all_interval for all devices.

Try setting it to 2031616 (as 63488*32), and then TargetSetting will be
making a difference.

Denis - by the way, 262140 isn't even a multiple of 496 (core count per
board) - perhaps that's wrong and should be fixed.
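
The numbers in this part of the reply can be checked with a short Python sketch. It only verifies the arithmetic (divisibility by the per-board core count and the 63488*32 product); it says nothing about whether the driver actually works with a value above the cap. The variable names are illustrative.

```python
cores_per_board = 496      # 124 bcrypt cores per chip, 496 per board (from the text)
chunk_one_board = 63488    # chunk size reported in the log line quoted above
cap = 262140               # "Absolute max. keys/crypt_all_interval" in ztex_bcrypt.c
proposed = 63488 * 32      # value suggested in this reply

def remainder_per_board(keys):
    """Keys left over after dividing evenly among the cores of one board."""
    return keys % cores_per_board

print(chunk_one_board // cores_per_board, remainder_per_board(chunk_one_board))
print(cap // cores_per_board, remainder_per_board(cap))
print(proposed, remainder_per_board(proposed))
```

A zero remainder means the value is a whole multiple of 496; the cap is the only one of the three values that is not.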
Post by Royce Williams
My first tests were with all 16 boards.
The first test used the default john.conf [ZTEX:bcrypt] TargetSetting
The "keys_per_crypt *= 2 tweak" probably didn't matter because of the
cap above.
Post by Royce Williams
$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' pw-fake-unix
Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:01:54 0g/s 0p/s 1609Kc/s 1609KC/s loveaaaa..loveioia
0g 0:01:00:56 0g/s 260.3p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:01:13:00 0.00% (ETA: 2032-09-23 00:29) 0g/s 434.5p/s 1613Kc/s 1613KC/s lolaaatn..lolaiocn
So this is 101k per board, or about 6% lower than we get with 1 board.
Post by Royce Williams
That test ran at ~505W / 16 = ~31.6W per board, which includes the
power for the onboard fans. The power consumption actually jumps
around quite a bit between 495W and 515W, but 505W seemed about
average.
Cool. The fluctuation is probably in part because the FPGAs are at
times left idle, sometimes several of them at once.
Post by Royce Williams
The second test was with 16 boards, changing to TargetSetting = 5, and
0g 0:00:41:28 0.00% (ETA: 2032-02-20 19:37) 0g/s 452.0p/s 1632Kc/s 1632KC/s lovaaaay..lovaaidy
For that test, I'd say that power was very slightly higher, maybe
averaging 510W, so ~31.9W per board. But this might be normal
variation.
Yes, this should be normal variation.
Post by Royce Williams
So across the cluster, with known tweaks and settings without
overclocking, I'm getting 1.632Mc/s for 510W.
That's 102k per board in this test.
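
The per-board figures for the two 16-board runs can be re-derived as follows. This is a rough Python sketch using only the aggregate speeds and wall-power readings quoted above; the run labels and the helper name are illustrative, and the power readings were themselves described as approximate averages.

```python
BOARDS = 16

# (aggregate c/s, approximate watts at the wall) for the two 16-board runs
runs = {
    "first run": (1_613_000, 505),
    "second run": (1_632_000, 510),
}

def per_board(rate, watts, boards=BOARDS):
    """Return (c/s per board, watts per board, c/s per watt)."""
    return rate / boards, watts / boards, rate / watts

results = {name: per_board(rate, watts) for name, (rate, watts) in runs.items()}

for name, (cps, watts, eff) in results.items():
    print(f"{name}: {cps / 1000:.1f}k c/s/board, {watts:.1f} W/board, {eff:.0f} c/s/W")
```

Both runs come out at roughly 3200 c/s per watt at bcrypt cost 5, which is consistent with the difference between them being normal variation.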
Post by Royce Williams
Next, here are single-board versions of both tests, using the same
board. (I did this by disconnecting the other boards. Is there a way
to tell john to only use a specific device?)
I think there's no such way currently. I think we should add that.
Post by Royce Williams
$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' pw-fake-unix
SN XXXXXXXXXX: firmware uploaded
SN XXXXXXXXXX: uploading bitstreams.. ok
ZTEX XXXXXXXXXX bus:1 dev:72 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:14 0g/s 0p/s 106815c/s 106815C/s loveaaaa..loveaaoa
0g 0:00:03:12 0g/s 0p/s 107169c/s 107169C/s loveaaaa..loveaaoa
0g 0:00:24:07 0g/s 0p/s 107199c/s 107199C/s loveaaaa..loveaaoa
OK, this matches my results.
Post by Royce Williams
Then I enabled the full cluster again.
Here are all 16 boards again, with TargetSetting = 5, the
keys_per_crypt *= 2 tweak, and Frequency = 152.
The overclock doesn't appear to have made much of a difference (5%
overclock, but only 1% speedup - and that's with trying to use a 17th
board). Maybe this is because of the keys_per_crypt cap - so please
hack that as above and re-test.

Also, you seem to have excluded the portions of JtR output where it
reports the clock rates - please include those going forward.
Post by Royce Williams
During this test, I was also trying to coax a 17th board into
usability. I include this test anyway because there appears to have
been a slight (temporary?) drop in performance associated with the
attempt to talk to that board (or it might be a coincidence; I will
This is interesting as a test of JtR's ability to recover from board
failure, but for further benchmarks please just use the 16 boards you
have working reliably.
Post by Royce Williams
And finally, a more focused example - all 16 boards, a single
$ cat single-bf.hash
$2a$12$S7H1VijH5FFkU/1bWeM98ObKGC6BwfjNnhsPFs3U88yNbYSphoTp.
$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8 -mask='?w?l?l?l?l' --progress-every=300 single-bf.hash
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:12 0.00% (ETA: 2017-12-16 07:45) 0g/s 14299p/s 14299c/s 14299C/s loveisxm..lovehjfc
0g 0:00:05:00 0.00% (ETA: 2017-12-17 12:58) 0g/s 14422p/s 14422c/s 14422C/s laliawhy..lalidtdh
0g 0:01:01:00 0.03% (ETA: 2017-12-17 18:31) 0g/s 14419p/s 14419c/s 14419C/s johoiouh..johohbdu
This pulled about 560W from the wall.
Looks good. This would be something like 1.85M total or 115k per board
if scaled to bcrypt work factor 5. Is this at standard clocks or o/c?
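
The scaling estimate above follows from how the bcrypt cost parameter works: the cost is the base-2 logarithm of the iteration count, so each step down in cost roughly doubles the speed. A minimal Python sketch of that estimate, assuming the fixed per-hash overhead is negligible (which is why the result is only "something like" the real figure); the function name is illustrative:

```python
def scale_rate(rate, from_cost, to_cost):
    """Estimate the speed at to_cost from a speed measured at from_cost.

    Assumes the work is exactly proportional to 2**cost, ignoring any
    fixed per-hash overhead.
    """
    return rate * 2 ** (from_cost - to_cost)

measured = 14419   # c/s at cost 12 on 16 boards, from the run quoted above
boards = 16

scaled = scale_rate(measured, from_cost=12, to_cost=5)
per_board = scaled / boards

print(f"{scaled / 1e6:.2f}M c/s total, {per_board / 1e3:.0f}k c/s per board")
```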
Post by Royce Williams
I tried to compare this to john on my general-purpose GPU system
(which isn't working the way I expect it to, as it appears to only be
This may very well be broken in JtR right now. We implemented
bcrypt-opencl as an experiment and optimized it a little bit back when
HD 7970 was a current card. We haven't tried tuning it for NVIDIA
Maxwell and Pascal yet - perhaps we should. And bcrypt cost 12 is hard
for GPUs without what we call a split kernel, which I think we lack for
this format. With high bcrypt cost settings, single kernel invocations
may be taking too long, resulting in timeouts. This doesn't explain why
things appear to be working (even if suboptimally) with one of your GPUs
but not with the rest, though - we'll need to debug that and fix it.
Please open a GitHub issue for this. Thanks!
Post by Royce Williams
1 0g 0:00:01:16 0.00% (ETA: 2037-12-19 05:32) 0g/s 53.38p/s 53.38c/s 53.38C/s GPU:34C lilluela..lilleoya
... but maybe all six GPUs might run at 53.38c/s x 6 = 320c/s?
Yes, perhaps. Or more with some tuning, as you see with hashcat. IIRC,
on AMD GCN, JtR's bcrypt-opencl and hashcat's have similar performance,
but on NVIDIA Maxwell and Pascal hashcat's is much faster. We didn't
care much because bcrypt cracking is generally done on CPUs anyway, but
with newer NVIDIA cards showing not so poor bcrypt speeds (compared to
CPUs) perhaps we should revise/tune our code. Please feel free to open
a GitHub issue for that as well.
Post by Royce Williams
I also compared GPU performance with hashcat.
Speed.Dev.#1.....: 128 H/s (154.07ms)
Speed.Dev.#2.....: 125 H/s (157.26ms)
Speed.Dev.#3.....: 127 H/s (154.83ms)
Speed.Dev.#4.....: 127 H/s (154.09ms)
Speed.Dev.#5.....: 128 H/s (154.25ms)
Speed.Dev.#6.....: 126 H/s (155.12ms)
Speed.Dev.#*.....: 760 H/s
Returning the GPUs' default max power (180W) made no difference at all
for a single $12$ bcrypt hash.
In both cases, the GPU system was pulling 500W from the wall, and the
GPUs hardly broke a sweat, temperature-wise. There may be ways to get
more performance from hashcat for this hash type and work factor, but
that will take some research on my part.
So if I'm reading this right, for single-hash bcrypt with work factor
12, just using my own hardware and techniques to compare, the best
performance available to me so far on FPGA (14419c/s) is about 19
times as fast as the best performance I know how to get on my GPU
FPGA: 14419c/s / 560W = ~25.75c/s/W
GPU: 760H/s / 500W = 1.52H/s/W
So for a focused, single-hash attack on a modern target using my own
gear, FPGA is ~17 times as efficient as GPU?
This sounds about right. FPGAs' energy-efficiency advantage at bcrypt
used to be greater than that at the time of Katja's work in 2013-2014,
but with NVIDIA Maxwell and Pascal these GPUs got close to those FPGAs.
Of course, there are also newer FPGAs.
Post by Royce Williams
I will also do some testing without the keys_per_crypt *= 2 tweak, and
with different keys_per_crypt values, but I wanted to get this posted.
Thanks. The keys_per_crypt tweaks shouldn't matter until you lift the
cap, so please do that first.

It would also be nice to see some bcrypt benchmarks that actually crack
passwords - including directly comparable to those I posted in:

http://www.openwall.com/lists/john-users/2017/06/25/1

And you may repeat your descrypt tests with the 16 boards (and keeping
the keys_per_crypt *= 2 tweak). Yours should be on par with Jeremi's
8x1080Ti system at descrypt then. If you do this, please post those to
the descrypt thread.

It would also be interesting to see how the power usage increases with
5% overclocks for both bcrypt and descrypt. The increase from 510W to
560W you report here isn't that - I think it's primarily a result of
going to the higher bcrypt work factor, which reduces the overhead and
keeps the bcrypt cores busier.

Alexander
Solar Designer
2017-07-03 19:25:42 UTC
Permalink
Post by Solar Designer
Post by Royce Williams
0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 262140
... for values of both 5 and 6 for TargetSetting.
Yes, in your case TargetSetting shouldn't matter, because you have so
many boards that the value is capped anyway. But you could try hacking
262140, // Absolute max. keys/crypt_all_interval for all devices.
Try setting it to 2031616 (as 63488*32), and then TargetSetting will be
making a difference.
Denis - by the way, 262140 isn't even a multiple of 496 (core count per
board) - perhaps that's wrong and should be fixed.
Royce, Denis has just explained to me where this value comes from, and
it's correct as-is and isn't to be patched: things are not expected to
work correctly for higher values. So please disregard this part of my
advice. Denis intends to remove this limitation in a future revision.

Alexander
teraflopgroup
2018-04-22 15:04:39 UTC
Permalink
Hello!
Please help me solve this problem.
The board is ztex. 1.15y.
I installed linux mint, compiled JtR, but now I do not pass the test with
the board.

Immediately after the build:
***@intel-desktop ~/src/john/run $ ./john --test=2
Will run 4 OpenMP threads
Benchmarking: descrypt-ztex, traditional crypt(3) [DES ZTEX]...
fpga_progclk_raw(0): LIBUSB_ERROR_PIPE
SN 04A36E1068 error -9 initializing FPGAs.
no valid ZTEX devices found

Trying to run bcrypt:
***@intel-desktop ~/src/john/run $ ./john --test=2 --form=bcrypt-ztex
Benchmarking: bcrypt-ztex [Blowfish ZTEX]... SN 04A36E1068: uploading bitstreams.. ok
SN 04A36E1068: device_list_check_bitstreams(): no bitstream or wrong type
no valid ZTEX devices found

***@intel-desktop ~/src/john/run $ ./john --test=2 --form=bcrypt-ztex
Benchmarking: bcrypt-ztex [Blowfish ZTEX]... SN 04A36E1068: uploading bitstreams.. ok
fpga_progclk_raw(0): LIBUSB_ERROR_PIPE
SN 04A36E1068 error -9 initializing FPGAs.
no valid ZTEX devices found

Trying to run DES:
***@intel-desktop ~/src/john/run $ ./john --test=2 --form=descrypt-ztex
Benchmarking: descrypt-ztex, traditional crypt(3) [DES ZTEX]... SN 04A36E1068: uploading bitstreams.. ok
SN 04A36E1068: device_list_check_bitstreams(): no bitstream or wrong type
no valid ZTEX devices found

***@intel-desktop ~/src/john/run $ ./john --test=2 --form=descrypt-ztex
Benchmarking: descrypt-ztex, traditional crypt(3) [DES ZTEX]... SN 04A36E1068: uploading bitstreams.. ok
fpga_progclk_raw(0): LIBUSB_ERROR_PIPE
SN 04A36E1068 error -9 initializing FPGAs.
no valid ZTEX devices found

The commands were run one after another, and the text was copied from the
terminal. The fan is spinning and the chips are cold. The JtR build (magnum
ripper 1.8.0.13) was successful. I did not install OpenCL.
Running under sudo gives the same results. What can be done? Please advise!

https://translate.yandex.ru/

please help to fix

-teraflopgroup-
Solar Designer
2018-04-22 17:47:58 UTC
Permalink
Hi,
Post by teraflopgroup
The board is ztex. 1.15y.
I installed linux mint, compiled JtR, but now I do not pass the test with
the board.
The problems you report are most likely caused by faulty hardware.
Please start by replacing the USB cable(s). Some old cables were
produced in the USB 1.1 days and wouldn't run at USB 2.0 speeds.

Are you able to use this board with any other software, such as a
Bitcoin miner? Of course, it'll be too slow to be useful these days,
but just as a test you could use cgminer-3.1.1.

Alexander
