Thanks for the info. After I raised the memory sizes and the space for
temporary files, the sort went well. I was sorting the superwpa wordlist
to find out how many duplicates it contains when character case is
ignored; the original file contains the same words repeated with
modified character casing.
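For reference, one way to get that count with stock GNU tools (a sketch;
superwpa.lst is an assumed file name, and you may want the -S/-T tuning
discussed below):

sort -f superwpa.lst | uniq -di | wc -l

Here "sort -f" folds case while sorting so that case variants of a word
end up adjacent, "uniq -di" prints one line per group of case-insensitive
duplicates, and "wc -l" counts those groups. To count redundant lines
instead, compare "wc -l" on the original file with "sort -uf | wc -l".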
Post by Solar Designer
Hi,
Post by JohnyKrekan
Hello, I would like to ask whether someone has experience with a good
tool for sorting large text files, with capabilities like those of GNU
sort. I use it to sort wordlists, but when I tried to sort an 11 GB
wordlist, it crashed while writing the final output file, after writing
around 7 GB of data, and did not delete some temp files. Sorting a
smaller (2 GB) wordlist took only about 15 minutes, whereas this 11 GB
one took 4.5 hours (Intel Core i7 2.6 GHz, 12 GB RAM, SSD drives).
Most importantly, usually you do not need to "sort" - you just need to
eliminate duplicates. In fact, in many cases you'd prefer to eliminate
duplicates without sorting, namely when your input list is ordered
roughly by non-increasing estimated probability of hitting a real
password -
e.g., if it's produced by concatenating common/leaked password lists
first with other general wordlists next, and/or by pre-applying wordlist
rules (which their authors generally order such that better performing
rules come first).
You can eliminate duplicates without sorting using JtR's bundled
"unique" program. In jumbo and running on a 64-bit platform, it will by
default use a memory buffer of 2 GB (the maximum it can use). It does
not use any temporary files (instead, it reads back the output file it
has written so far whenever its memory buffer fills up). Some example
invocations:
./unique output.lst < input.lst
cat ~/wordlists/* | ./unique output.lst
cat ~/wordlists/common/* ~/wordlists/uncommon/* | ./unique output.lst
./john -w=password.lst --rules=jumbo --stdout | ./unique output.lst
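If you prefer a stock tool and the set of unique lines fits in RAM, awk
can do the same order-preserving deduplication (a sketch; it keeps every
distinct line in a hash, so memory use grows with the number of unique
lines instead of being capped at 2 GB):

awk '!seen[$0]++' input.lst > output.lst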
As to sorting, recent GNU sort from the coreutils package works well.
You'll want to use the "-S" option to let it use more RAM and fewer
temporary files, e.g. "-S 5G". You can also use e.g. "--parallel=8".
As to it running out of space for the temporary files, perhaps you have
your /tmp on tmpfs, so in RAM+swap, and this might be too limiting. If
so, you may use the "-T" option, e.g. "-T /home/user/tmp", to let it use
your SSDs instead. Combine this with e.g. "-S 5G" to also use your RAM.
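Putting these options together, an invocation along these lines should
work (the paths here are only examples):

mkdir -p /home/user/tmp
sort -S 5G --parallel=8 -T /home/user/tmp -o output.lst input.lst

You can check whether your /tmp is on tmpfs with "df -hT /tmp"; if the
type column shows tmpfs, temporary files there compete with RAM and swap.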
As to "it crashed while writing final output file after writing around 7
gb of data", did you possibly put the output file in /tmp as well? Just
don't do that.
I hope this helps.
Alexander