Solar Designer
2015-12-06 14:40:44 UTC
Hi,
Most of hashcat's value is in oclHashcat, and I greatly appreciate atom's
generosity in making it open source along with the CPU hashcat. We have
more stuff to learn from there. However, this one posting is about the
CPU hashcat.
What are some reasons why someone may prefer to use hashcat over JtR,
both on CPU? Is it some cracking modes we don't have equivalents for in
JtR? What are those?
hashcat appears to support a subset of hash types that we have in jumbo,
and in my testing today is typically 2 to 3 times slower than JtR, with
few exceptions. (This is consistent with what I heard from others
before. I just didn't test this myself until now.)
The most notable exception, where hashcat is much faster than JtR, is
with its multi-threading support for fast hashes. When using JtR on
fast hashes, currently --fork should be used instead of multiple threads,
and it can be cumbersome (multiple status lines instead of one, the
child processes terminating not exactly at the same time, etc.)
Another exception is bcrypt, where hashcat delivers about the best speed
we can get out of JtR, and in fact better than a default build of JtR
does on our 2x E5-2670 machine (which I am testing this on):
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 3200
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32
Hash type: bcrypt, Blowfish(OpenBSD)
Speed/sec: 16.82k words
JtR is slightly slower by default (built with the same gcc 4.9.1 as
hashcat above):
[***@super src]$ ../run/john -test -form=bcrypt
Will run 32 OpenMP threads
Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 32
Raw: 16128 c/s real, 506 c/s virtual
Its performance on this machine can be improved to 16900 c/s (same as
hashcat) by forcing BF_X2 = 3 in arch.h, but the current logic in jumbo
is to only use that setting on HT-less Intel CPUs (and these Xeons are
HT-capable) as that appears to work slightly better on many other CPUs
(just not on this particular machine).
Another exception I noticed is scrypt, where hashcat is only moderately
slower than JtR:
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 8900
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32
Hash type: scrypt
Speed/sec: 639 words
[***@super src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=scrypt
Will run 32 OpenMP threads
Benchmarking: scrypt (16384, 8, 1) [Salsa20/8 128/128 AVX]... (32xOMP) DONE
Speed for cost 1 (N) of 16384, cost 2 (r) of 8, cost 3 (p) of 1
Raw: 878 c/s real, 27.6 c/s virtual
(BTW, I think this used to be ~960 c/s. Looks like we got a performance
regression we need to look into, or just get the latest yescrypt code in
first and then see.)
hashcat is at 639/878 = 73% of JtR's speed at scrypt here.
Yet another exception is SunMD5, where I am puzzled about what hashcat
is actually benchmarking:
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 3300
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32
Hash type: MD5(Sun)
Speed/sec: 223.64M words
[***@super src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=sunmd5
Will run 32 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw: 10593 c/s real, 332 c/s virtual
223.64M vs. 10.6K?! This can't be right. SunMD5 with typical settings
is known to be slow.
For most other hash types I checked, JtR is a lot faster, e.g.:
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 500
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32
Hash type: md5crypt, MD5(Unix), FreeBSD MD5, Cisco-IOS MD5
Speed/sec: 269.21k words
[***@super src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=md5crypt
Will run 32 OpenMP threads
Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Raw: 729600 c/s real, 22750 c/s virtual
729600/269210 = 2.71 times faster
sha512crypt:
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 1800
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32
Hash type: sha512crypt, SHA512(Unix)
Speed/sec: 5.35k words
[***@super src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=sha512crypt
Will run 32 OpenMP threads
Benchmarking: sha512crypt, crypt(3) $6$ (rounds=5000) [SHA512 128/128 AVX 2x]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw: 11299 c/s real, 354 c/s virtual
11299/5350 = 2.11 times faster
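The ratios quoted here can be reproduced with a small helper. This is only a sketch: `parse_speed` and `speedup_ratio` are hypothetical names, and it assumes that hashcat's `k`/`M` and JtR's `K` suffixes all denote decimal multipliers (1000 and 1000000), which matches the arithmetic used in this posting.

```python
import re

# ASSUMPTION: all suffixes are decimal, as in the arithmetic of this posting.
_SUFFIX = {"": 1, "k": 1_000, "K": 1_000, "M": 1_000_000}


def parse_speed(text):
    """Convert a reported speed such as '269.21k' or '38898K' to a plain number."""
    m = re.fullmatch(r"\s*([0-9]+(?:\.[0-9]+)?)\s*([kKM]?)\s*", text)
    if m is None:
        raise ValueError(f"unrecognized speed: {text!r}")
    return float(m.group(1)) * _SUFFIX[m.group(2)]


def speedup_ratio(jtr, hashcat):
    """How many times faster the JtR figure is than the hashcat figure."""
    return parse_speed(jtr) / parse_speed(hashcat)


# Figures copied from the benchmark output above.
print(round(speedup_ratio("729600", "269.21k"), 2))  # md5crypt -> 2.71
print(round(speedup_ratio("11299", "5.35k"), 2))     # sha512crypt -> 2.11
```

Normalizing the units first avoids mixing up hashcat's "words" per second with JtR's c/s when the suffixes differ.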
Raw MD5:
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 0
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32
Hash type: MD5
Speed/sec: 268.55M words
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 0 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1
Hash type: MD5
Speed/sec: 12.71M words
Good multi-threaded efficiency (unlike JtR's at fast hashes like this),
but poor per-thread speed. JtR's is:
[***@super src]$ ../run/john -test -form=raw-md5
Benchmarking: Raw-MD5 [MD5 128/128 AVX 4x3]... DONE
Raw: 38898K c/s real, 38898K c/s virtual
OpenMP is compile-time disabled for fast hashes (which is the current
default in bleeding-jumbo), so this is for 1 thread (and --fork should
be used - yes, with its drawbacks).
38898/12710 = 3.06 times faster
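To put a number on hashcat's multi-threaded efficiency, the two raw MD5 runs above can be compared directly. This is a rough sketch with a hypothetical helper name. Note that the 32 threads run on 16 physical cores with Hyper-Threading, so perfect 32x scaling would not be expected anyway.

```python
def thread_scaling(single_thread, multi_thread, threads):
    """Return (speedup over one thread, speedup divided by thread count)."""
    speedup = multi_thread / single_thread
    return speedup, speedup / threads


# hashcat raw MD5: 12.71M words/s with -n 1, 268.55M words/s with 32 threads.
speedup, efficiency = thread_scaling(12.71e6, 268.55e6, 32)
print(f"{speedup:.2f}x speedup, {efficiency:.0%} per-thread efficiency")
# -> 21.13x speedup, 66% per-thread efficiency
```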
Raw SHA-1:
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 100 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1
Hash type: SHA1
Speed/sec: 10.12M words
[***@super src]$ ../run/john -test -form=raw-sha1
Benchmarking: Raw-SHA1 [SHA1 128/128 AVX 4x]... DONE
Raw: 19075K c/s real, 19075K c/s virtual
19075/10120 = 1.88 times faster
Not that bad. I guess hashcat has optimizations here that we don't, but
lacks interleaving. Still, I wouldn't use hashcat over john --fork.
NTLM:
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 1000 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1
Hash type: NTLM
Speed/sec: 14.21M words
[***@super src]$ ../run/john -test -form=nt
Benchmarking: NT [MD4 128/128 AVX 4x3]... DONE
Raw: 44687K c/s real, 44687K c/s virtual
44687/14210 = 3.14 times faster
Raw SHA-256:
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 1400 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1
Hash type: SHA256
Speed/sec: 5.10M words
[***@super src]$ OMP_NUM_THREADS=1 ../run/john -test -form=raw-sha256
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Benchmarking: Raw-SHA256 [SHA256 128/128 AVX 4x]... DONE
Raw: 9068K c/s real, 9068K c/s virtual
9068/5100 = 1.78 times faster
We also have OpenMP support enabled by default for raw SHA-256, but it
doesn't scale well to 32 threads:
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 1400
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32
Hash type: SHA256
Speed/sec: 80.85M words
[***@super src]$ ../run/john -test -form=raw-sha256
Will run 32 OpenMP threads
Benchmarking: Raw-SHA256 [SHA256 128/128 AVX 4x]... (32xOMP) DONE
Raw: 39976K c/s real, 3774K c/s virtual
[***@super src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=raw-sha256
Will run 32 OpenMP threads
Benchmarking: Raw-SHA256 [SHA256 128/128 AVX 4x]... (32xOMP) DONE
Raw: 40370K c/s real, 3731K c/s virtual
hashcat is 2 times faster with multi-threading, but JtR --fork would be
faster yet.
Raw SHA-512:
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 1700 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1
Hash type: SHA512
Speed/sec: 1.32M words
[***@super src]$ OMP_NUM_THREADS=1 ../run/john -test -form=raw-sha512
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Benchmarking: Raw-SHA512 [SHA512 128/128 AVX 2x]... DONE
Raw: 3856K c/s real, 3856K c/s virtual
3856/1320 = 2.92 times faster
[***@super hashcat-build]$ ./hashcat-cli64.bin -b -m 1700
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...
Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32
Hash type: SHA512
Speed/sec: 26.80M words
[***@super src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=raw-sha512
Will run 32 OpenMP threads
Benchmarking: Raw-SHA512 [SHA512 128/128 AVX 2x]... (32xOMP) DONE
Raw: 23330K c/s real, 1577K c/s virtual
SHA-512 is slow enough that JtR's (poor) multi-threading support is
almost on par with hashcat's even at 32 threads. Yet --fork would be
2 to 3 times faster than hashcat.
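The scaling figures behind these remarks can be tabulated from the raw SHA-256 and raw SHA-512 runs above. The --fork projection in this sketch is not a measurement. It rests on the loudly stated assumption that --fork would scale from 1 to 32 processes about as well as hashcat's threads do, in which case the projected advantage reduces to the single-thread ratio. All names are hypothetical.

```python
# (hash, tool) -> (1-thread speed, 32-thread speed), copied from the runs above.
# The JtR SHA-256 figure is the GOMP_CPU_AFFINITY=0-31 run.
RUNS = {
    ("SHA-256", "hashcat"): (5.10e6, 80.85e6),
    ("SHA-256", "JtR OpenMP"): (9068e3, 40370e3),
    ("SHA-512", "hashcat"): (1.32e6, 26.80e6),
    ("SHA-512", "JtR OpenMP"): (3856e3, 23330e3),
}


def speedup(single, multi):
    """Speedup of the 32-thread run over the 1-thread run."""
    return multi / single


def projected_fork_advantage(hash_name):
    """Projected JtR --fork speed relative to hashcat at 32 threads.

    ASSUMPTION (not measured): --fork scales like hashcat's threads.
    """
    hc_single, hc_multi = RUNS[(hash_name, "hashcat")]
    jtr_single, _ = RUNS[(hash_name, "JtR OpenMP")]
    projected_fork = jtr_single * speedup(hc_single, hc_multi)
    return projected_fork / hc_multi


for (hash_name, tool), (single, multi) in RUNS.items():
    print(f"{hash_name:8} {tool:11} {speedup(single, multi):5.2f}x at 32 threads")

for hash_name in ("SHA-256", "SHA-512"):
    print(f"{hash_name}: projected --fork advantage "
          f"{projected_fork_advantage(hash_name):.2f}x")
```

Under that assumption the projection for SHA-512 lands near the upper end of the 2 to 3 times range, while an actual --fork run could well differ.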
My JtR benchmarks are with yesterday's bleeding-jumbo. It could be
better to (also) use actual cracking runs to compare the tools - maybe
someone else will.
Alexander