SL
2010-02-03 13:04:25 UTC
I would like to use ./john --rules=Pre-Mangle --stdout | ./unique to
clean up arbitrary (large) "dirty" wordlists.
In other words: I have target-specific generated wordlists (about
2GB in size), which still contain a lot of "unusable junk" like raw MD5
hashes, punctuation, Base64 fragments, QP-encoded fragments, falsely
decoded UTF-8 etc.
My intention is to put together a number of word mangling rules that
help to reduce this chaos and only let through "reasonable" candidates
for future processing with ./john --rules and ./john --rules=Single.
(My currently running "--rules=Single" session on that 2GB list has
got an ETA of mid-November 2010 (salted raw MD5 hashes, JimF's patch).)
Does such a collection of rules already exist? I couldn't find one,
and I must admit that the complexity of http://www.openwall.com/john/doc/RULES.shtml
is a bit too much for me to start from scratch.
What it should accomplish:
* obviously no no-op (:)
* include "dictionary-like" words up to a certain length (I haven't seen
any password longer than 18 chars in my samples, so length 22 should
probably be sufficient)
* shorter alphanumeric "words" might be included as-is, maybe up to 8
or 10 chars
* punctuation should probably be purged (or truncated?)
* words with false transcodings (lots of /(.[ÂÃ])+/) should get
rejected
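As a rough stopgap until proper rules exist, the criteria above could be approximated outside of john with a small prefilter. This is only a sketch, not john rules and not an existing tool. The length limits (22 for letter-only words, 10 for mixed alphanumerics) come from the list above. The MD5 pattern (exactly 32 hex digits), the QP pattern (`=` followed by two uppercase hex digits) and UTF-8 input decoding are my own assumptions. Base64 fragments are not detected at all, and the mojibake regex will also reject legitimate uppercase words that contain Ã or Â.

```python
import re
import sys
from typing import Optional

# Thresholds taken from the criteria above (adjust as needed).
MAX_WORD_LEN = 22    # letter-only "dictionary-like" words
MAX_ALNUM_LEN = 10   # words that mix letters and digits (or are all digits)

# Assumed heuristics, not exhaustive:
MD5_HEX = re.compile(r"[0-9a-fA-F]{32}")   # whole word is 32 hex digits
QP_ESCAPE = re.compile(r"=[0-9A-F]{2}")    # quoted-printable escape like =C3
MOJIBAKE = re.compile(r"(.[ÂÃ])+")         # falsely decoded UTF-8, per the post


def clean(word: str) -> Optional[str]:
    """Return a cleaned candidate, or None if the word should be rejected."""
    if MD5_HEX.fullmatch(word):
        return None  # looks like a raw MD5 hash
    if QP_ESCAPE.search(word) or MOJIBAKE.search(word):
        return None  # QP fragment or false transcoding
    # Purge punctuation and anything else that is not a letter or digit.
    stripped = "".join(ch for ch in word if ch.isalnum())
    if not stripped:
        return None  # nothing left after purging
    if stripped.isalpha():
        return stripped if len(stripped) <= MAX_WORD_LEN else None
    return stripped if len(stripped) <= MAX_ALNUM_LEN else None


def main() -> None:
    # Assumes UTF-8 input; undecodable bytes become U+FFFD, which is not
    # alphanumeric and is therefore purged by clean().
    sys.stdin.reconfigure(encoding="utf-8", errors="replace")
    sys.stdout.reconfigure(encoding="utf-8")
    for line in sys.stdin:
        cleaned = clean(line.rstrip("\r\n"))
        if cleaned is not None:
            print(cleaned)


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as clean_wordlist.py, it would slot into a pipeline like `python3 clean_wordlist.py < dirty.lst | ./unique clean.lst`. Deduplication is left to ./unique because keeping a 2GB list in a Python set would be impractical.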
Could anybody please point me to a reasonable start? I shall follow up
with a patch to john.conf if this idea proves successful.