Discussion:
[john-users] NVIDIA Jetson TK1 GPU & John
Leonard Rose
2018-04-12 22:27:38 UTC
Permalink
Has anyone used this development board successfully with JtR? I recently bought one used, and I wanted to try JtR on its 192 CUDA cores (up to 326 GFLOPS!). In the past I built a small MPI cluster of 66 ARM CPUs, but I was hoping to build something useful with these NVIDIA boards. If I can get this working, I can see a lot of fun in the future learning about GPUs and OpenCL.
Solar Designer
2018-05-31 14:50:55 UTC
Permalink
Hi Leonard,
Post by Leonard Rose
Has anyone used this development board successfully with JTR? I recently bought one used that I wanted to try JtR with 192 CUDA cores (up to 326 GFLOPS!) In the past I have built a small cluster using MPI of 66 ARM cpu but was hoping to build something useful with these NVIDIA boards. If I can get this working I can see a lot of fun in the future learning about GPU and OpenCL....
I'm sorry no one seems to have replied to you so far.

Yes, people tried JtR on NVIDIA Jetson TK1 before:

http://www.openwall.com/lists/john-users/2014/07/17/4
http://www.openwall.com/lists/john-users/2015/10/29/1

These threads mention some old build issues, etc., but those are
supposed to be fixed or otherwise irrelevant in current bleeding-jumbo
(and yes, it'll be solely OpenCL now, including on NVIDIA, as we've
dropped CUDA support). So I am referring to the threads not for the way
outdated advice/workarounds given there, but merely to answer your
question. You'll actually want to use bleeding-jumbo, and just try to
build it in the usual way (e.g., "./configure && make -sj4") without any
tweaks first.
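Concretely, the usual way is something like this (just a sketch; the sanity-check commands at the end are typical rather than TK1-tested, and the format name is only an example):

```shell
# Fetch and build current bleeding-jumbo; OpenCL support is auto-detected
# by configure when the OpenCL headers/ICD are present.
git clone -b bleeding-jumbo https://github.com/magnumripper/JohnTheRipper.git
cd JohnTheRipper/src
./configure && make -sj4
# After the build, check that the GPU is visible and benchmark a format:
../run/john --list=opencl-devices
../run/john --test --format=raw-md5-opencl
```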

Please just give this a try and report any issues there might be - or
report success even if there are no issues.

Of course, these boards are actually very slow compared to modern large
GPUs that you'd plug into x86 boxes. But that shouldn't stop you from
having fun.

Alexander
Eric Oyen
2018-05-31 15:14:13 UTC
Permalink
Older GPUs?

OK, I have a couple of older-model desktop machines here, and both have NVIDIA GT-9xxx series video cards with substantial RAM on board. Can John utilize these? It's not like I have been doing anything else with them over the last 10 years.

-eric

PGP fingerprint: 6DFB D6B0 3771 90F1 373E 570C 7EA2 1FF3 6B68 0386
Solar Designer
2018-05-31 20:28:34 UTC
Permalink
Post by Eric Oyen
ok, I have a couple of older model desktop machines here and both have NVidia GT-9xxx series video cards with substantial ram on board. can john utilize these? it's not like I have been doing anything else with them over the last 10 years.
Probably yes, but maybe you'd actually have to go with our obsolete CUDA
branch for that. I don't recall if NVIDIA introduced OpenCL support
before or after dropping support for those GPUs from their newer drivers.

That said, this is off-topic for this thread. Let's use this thread for
discussions around NVIDIA Jetson TK1, as the Subject says.

Alexander
Leonard Rose
2018-05-31 20:38:19 UTC
Permalink
It was my understanding there was no OpenCL support for the Jetson TK1.

Solar Designer
2018-06-01 10:37:19 UTC
Permalink
Post by Leonard Rose
It was my understanding there was no OpenCL support for the Jetson TK1.
Oh. I was unaware there existed non-ancient NVIDIA devices for which
NVIDIA didn't support OpenCL, but you seem to be right. According to a
January 2014 whitepaper by NVIDIA, "Tegra K1 is capable of OpenCL 1.2.
It will be supported based on customer needs." I guess "customer needs"
never happened, in NVIDIA's opinion.

You can possibly get OpenCL going via a third-party project anyway, but
it'd be tricky:

http://portablecl.org/cuda-backend.html
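For the archive, building pocl's CUDA backend looks roughly like this (untested on the TK1; the CMake option name is from pocl's documentation, and LLVM/Clang development packages plus the CUDA toolkit are prerequisites):

```shell
# Build pocl with its experimental CUDA backend and install it as an
# OpenCL ICD, so OpenCL applications can target the NVIDIA GPU through it.
git clone https://github.com/pocl/pocl.git
cd pocl && mkdir build && cd build
cmake -DENABLE_CUDA=ON ..
make -j4
sudo make install
```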

Alexander
Leonard Rose
2018-06-01 15:42:02 UTC
Permalink
Thank you for the link, I will give it a try. It seems promising!


Leonard Rose
2018-05-31 16:00:36 UTC
Permalink
Hi! I was able to get this working easily! I got a link to the CUDA branch, https://github.com/magnumripper/JohnTheRipper/tree/CUDA, from someone at the NVIDIA developer forum and then proceeded to build the code. After installing a few additional packages on the Tegra, OpenMPI and CUDA support both work well using the configure options.

I have been using a cluster of four TK1 Kepler GPUs for about a week now, and I am very pleased with how easy it was. I have been using mpich-2 on another cluster I built (66 ARM A10 CPUs), so I have worked with MPI John for a long time. Building this was straightforward, and it simply works out of the box.

I had some issues with NVIDIA's Tegra packages; namely, they made some mistakes in their shared-library installation that you have to fix. To get the code working with OpenMPI on multiple nodes, you need to add /usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib to the dynamic linker's search path. I share the John run directory via NFS to all nodes in the cluster.
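One way to make that library path known to the dynamic linker on every node is a drop-in ld.so config; a small sketch (the .conf file name is my own choice):

```shell
# Register the CUDA 6.5 Tegra library directory with the dynamic linker.
# The path is the one from this post; the .conf file name is arbitrary.
echo '/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib' | \
  sudo tee /etc/ld.so.conf.d/cuda-6-5-tegra.conf
sudo ldconfig
```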

When I tried to run ldconfig with this new path, it produced an error. After a few minutes of study, you will find that NVIDIA's development package has an error in how some of the libraries were installed.

It turns out that instead of linking two additional names (libcudnn.so.6.5 and libcudnn.so) to libcudnn.so.6.5.48, they made binary copies. All you have to do to resolve this is remove the copies and recreate the proper symlinks.

For example:


***@gpu02:/etc/ld.so.conf.d# ldconfig
/sbin/ldconfig.real: /usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib/libcudnn.so.6.5 is not a symbolic link

***@gpu02:/etc/ld.so.conf.d# cd /usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib
***@gpu02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib# ls -l *cudnn*
-rwxr-xr-x 1 root root 8978224 Apr 26 21:49 libcudnn.so
-rwxr-xr-x 1 root root 8978224 Apr 26 21:49 libcudnn.so.6.5
-rwxr-xr-x 1 root root 8978224 Apr 26 21:49 libcudnn.so.6.5.48
-rwxr-xr-x 1 root root 9308614 Apr 26 21:49 libcudnn_static.a
***@gpu02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib# rm libcudnn.so libcudnn.so.6.5
***@gpu02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib# ln -s libcudnn.so.6.5.48 libcudnn.so.6.5
***@gpu02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib# ln -s libcudnn.so.6.5.48 libcudnn.so
***@gpu02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib# ls -l *cudnn*
lrwxrwxrwx 1 root root 18 May 25 01:02 libcudnn.so -> libcudnn.so.6.5.48
lrwxrwxrwx 1 root root 18 May 25 01:02 libcudnn.so.6.5 -> libcudnn.so.6.5.48
-rwxr-xr-x 1 root root 8978224 Apr 26 21:49 libcudnn.so.6.5.48
-rwxr-xr-x 1 root root 9308614 Apr 26 21:49 libcudnn_static.a
***@gpu02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib# ldconfig
***@gpu02:/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib#
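If more nodes need the same repair, the fix above can be scripted; a hedged sketch (same paths as in the transcript, run as root):

```shell
# Replace the binary copies with proper symlinks to libcudnn.so.6.5.48.
# Idempotent: names that are already symlinks are left alone.
LIBDIR=/usr/local/cuda-6.5/targets/armv7-linux-gnueabihf/lib
for name in libcudnn.so libcudnn.so.6.5; do
  if [ -e "$LIBDIR/$name" ] && [ ! -L "$LIBDIR/$name" ]; then
    rm "$LIBDIR/$name"
    ln -s libcudnn.so.6.5.48 "$LIBDIR/$name"
  fi
done
ldconfig
```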

I am always amazed by the advances from the days of standalone array processors and clunky Fortran to the current day, when we have orders of magnitude more power in tiny wafers of silicon, lit up by the code that drives it all. For me this was one of those epiphanies, when I realized the level of performance I was seeing (even without OpenCL support):

For example:

mpirun: Forwarding signal 10 to job
2 0g 5:09:20:40 3/3 0g/s 844.7p/s 844.7c/s 844.7C/s l0g12421..l0g10674
1 0g 5:09:18:53 3/3 0g/s 847.0p/s 847.0c/s 847.0C/s sitcr2d3..sitatziz
3 0g 5:09:35:26 3/3 0g/s 840.1p/s 840.1c/s 840.1C/s st.gearte..st.grucho
4 0g 5:09:20:16 3/3 0g/s 752.0p/s 752.0c/s 752.0C/s mhbcmlp..mhhurg9

(snip)

1 1:20:13:11 - Switching to distributing words
1 1:20:13:13 Proceeding with "incremental" mode: ASCII
1 1:20:13:13 - Lengths 0 to 13, up to 95 different characters
2 1:22:39:09 - Switching to distributing words
2 1:22:39:10 Proceeding with "incremental" mode: ASCII
2 1:22:39:10 - Lengths 0 to 13, up to 95 different characters
4 2:04:24:32 - Switching to distributing words
4 2:04:24:54 Proceeding with "incremental" mode: ASCII
4 2:04:24:54 - Lengths 0 to 13, up to 95 different characters
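The cluster's aggregate guess rate can be summed from those per-node status lines; a hypothetical helper (field layout as in the output above, where the candidates-per-second figure is the sixth field, e.g. "844.7p/s"):

```shell
# Sum the per-node p/s figures from john's MPI status lines.
sum_pps() {
  awk '{ gsub(/p\/s/, "", $6); total += $6 } END { printf "%.1f\n", total }'
}
# Example with the four status lines from this run:
printf '%s\n' \
  '2 0g 5:09:20:40 3/3 0g/s 844.7p/s 844.7c/s 844.7C/s l0g12421..l0g10674' \
  '1 0g 5:09:18:53 3/3 0g/s 847.0p/s 847.0c/s 847.0C/s sitcr2d3..sitatziz' \
  '3 0g 5:09:35:26 3/3 0g/s 840.1p/s 840.1c/s 840.1C/s st.gearte..st.grucho' \
  '4 0g 5:09:20:16 3/3 0g/s 752.0p/s 752.0c/s 752.0C/s mhbcmlp..mhhurg9' \
  | sum_pps
# prints 3283.8
```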



Thank you to everyone who worked on CUDA and GPU support; it's really incredible what you have done, placing such raw power at our fingertips.

Here is what it looks like as of today [image]; it hasn't quite made it off the bench yet.


Solar Designer
2018-05-31 20:23:53 UTC
Permalink
Post by Leonard Rose
Hi! I was able to get this working easily!
Great!
Post by Leonard Rose
I got a link to the CUDA branch https://github.com/magnumripper/JohnTheRipper/tree/CUDA from someone at the NVIDIA developer's forum and then proceeded to build the code.
Why CUDA? Did you run into any issues with OpenCL? As you can see,
we abandoned the CUDA branch 1.5 years ago, and for good reasons.
Our OpenCL support is far superior, and it works great on NVIDIA too.

There isn't any major issue with CUDA per se, but it just happened that
we could do all we needed so far in OpenCL, and that let us support
devices by different vendors at once. So there was no point in doing
more CUDA-specific work, and over time our CUDA support lagged behind.

At this time (as well as 1.5 years ago when our decision to drop CUDA
was made), we have more JtR formats supported and with better
performance in OpenCL than we ever did in CUDA.
Post by Leonard Rose
With the addition of a few additional packages in Tegra OpenMPI and CUDA support works well using configure options.
OK. I hope you try OpenCL with latest bleeding-jumbo as well. If there
are any issues with that, please report those.

Meanwhile, thank you for including the detail about CUDA and MPI issues.
Maybe someone will find this helpful.
Post by Leonard Rose
mpirun: Forwarding signal 10 to job
2 0g 5:09:20:40 3/3 0g/s 844.7p/s 844.7c/s 844.7C/s l0g12421..l0g10674
1 0g 5:09:18:53 3/3 0g/s 847.0p/s 847.0c/s 847.0C/s sitcr2d3..sitatziz
3 0g 5:09:35:26 3/3 0g/s 840.1p/s 840.1c/s 840.1C/s st.gearte..st.grucho
4 0g 5:09:20:16 3/3 0g/s 752.0p/s 752.0c/s 752.0C/s mhbcmlp..mhhurg9
What hash type (and cost settings, if applicable) is this for?

Alexander