			Hacking the Crypto.c Algorithm
                                by Jitsu-Disk


Object  
------

This paper presents (briefly) how the Novell one-way hash function operates,
and how the speed of routines may be increased.          

                 Contents : Credits
                            Foreword
                            The one-way hash
                            About optimization
                            Useless and Over-used routines
                            Recursion vs. Iteration
                           *Added : cipher/decipher algorithm of Remote NLM

Credits
-------

Credits go to :   Simple Nomad, for being Simply Elite
                  Itsme, for putting together the first Crypto routine
                  Rich, for teaching me Novell in the first place!
                  TheRuiner ('dreamer'), for remote NLM cipher alorithm

Foreword
--------

Sometimes I wonder why computers are drawing so many people to them, 
people that often have very different jobs and inclinations when it comes to 
leisure or intellectual orientation. Moreover, many of the people who teach 
computer-related stuff or simply design them, come from mathematical fields 
and believe that you just can't understand a thing if you can't solve an
integral. To me this is all bullshit. Computers are built by dreamers and are
incredible tools for expressing your artistic feelings;
Computer science is like a sphere : you can start by exploring any part
of that sphere, and by going deeper you will very soon reach the
core-algorithms. You can then bounce around to any other facet of that sphere,
your learning is not limited by any academic idea of how one should learn and what
vocabulary should be used in this process. By comparison, math and physics
are like a pyramidal : you must learn (painfully) each layer before moving on
to the next one.

Regarding the encryption scheme used by Novell, I want to say that I'm not 
in any way good at CryptoAnalysis and that the Crypto.c routine is being 
approached from an algorithmic standpoint only.

Now lets get dirty...


The one-way hash
----------------

All information provided below is assumed by regarding at the resulting 
crypto.c and only this.

A one-way-hash function is a formula where you input a cleartext word or
phrase and the output is gibberish with two interesting properties : There is
no way to reverse the process (inputting the hash and obtaining the cleartext),
and the odds that two different cleartexts will give the same hash is almost
nonexistent. Therefore a hash is a safe way to store a password, the computer
processes the password when entered using the function and compares it to 
the stored hash.

The one-way-hash scheme we're interested in, here, processes the cleartext 
password in four steps.

0) On a NetWare server you can enter passwords up to 128 characters in
   length, however the process used to generate the password-hash takes up
   to 32 characters in input and gives 16 bytes as the output. Therefore,
   if a password is longer than 32 characters, it is divided in chunks of
   32 and chunk[0] is XORed against chunk[1] the result is XORed against
   chunk[2] and so on until we've reached the end of the password. This
   process IS NOT implemented nor further discussed here. The reason is
   simple : no one holds enough computer power to brute force such long
   passwords (given the algorithm below).

1) The password string is repeated enough times to feed a 32-byte length 
   string (crypt2data[0->31]) and between each sequence of the password in
   crypt2data[] a byte is inserted from a fixed table called crypTab[].

2) Each byte in buf[32] is XORed with the unique IDentification number
   attached to the user who owns that password, giving crypt2data[32].
   note : At end of step_2 the ciphered crypt2data[] is still reversible to 
          give the cleartext by XORing it again with the ID

3) A special non-linear function lengthens crypt2data[32] to crypt2data[64]

4) Finally the Password hash is 16 byte length, assuming Z[] is crypt2data[],
   nybTab a fixed table of nibbles, we have :

   hash[0]= nybTab[Z[64]] OR nybTab[Z[65]<<4]
   hash[1]= nybTab[Z[66]] OR nybTab[Z[67]<<4]
     .
     .
     .
   hash[15]= nybTab[Z[94]] OR nybTab[Z[95]<<4]

   note : Z[65]->Z[95] are processed using the same formula
          as in step_3

In a minute we're gonna go deeper into those four steps, but 2 very ESSENTIAL
points must be stated :

 * The hash is a 16byte word where each byte is built upon a crypt2data[]
   pair because the nybTab is made of bytes with all the 4left-bits set to 0:

 nybTab = {0x7, 0x8, 0x0, 0x8, 0x6 ...}

              nybTab[high]              nybTab[low]
             high=Z[64+2*i]           low=Z[64+2*i+1]

 hash[i]= |____4 left bits_______||____4 right bits______|

 ex : if i=0 and Z[64]=0 and Z[65]=4 then hash[i]=0x76

 * If the non-linear function is reversed, the "nybTab" would render this,
   break-through almost useless: if you study its contents you'll see that a
   set of 16 bytes are repeated various times (about 9 to 20) in a 265 byte
   length table, so that if you tried to reverse the hash starting from this
   table, you obviously had to perform the test at most 32power16 (rough
   average) Therefore, if the user used a password longer than 9-11
   characters (rough average) you'd be better off trying to reverse the hash
   (if you could)... Besides, if the inventors of the Algorithm wanted to
   introduce some "BackDoor" (statistical trend known only to themselves) the
   nybTab would be a very good spot...

   (Note : unless you've got the computing power of the NSA, trying to brute
    force or reverse a password longer than 8 is useless).

Step_1) There isn't much more to say about it, just to note that the purpose
 of this lengthening thing is to provide enough "space" for the formula of 
 step_3 to play with. Besides the introduction of bytes from a fixed table 
 throws in a little more entropy, and finally the longer a password is the 
 less time it is repeated in crypt2data.

Step_2) You may by asking yourself "What if some people with computers
 holding enough RAM (or organizing their data in B-Tree) generated all 
 the hashes possible so that they would only have to compute them once". 
 Step_2 is the reason why this can't happen : XORing with the ID 
 generates a unique hash to this user since ID's are unique. You have to bear
 in mind that ID's are fixed-length (4Bytes) but passwords have variable
 lengths therefore, if you analyze crypto.c you will notice a little formula
 to match the XOR between the correct bytes of crypt2data[] and the ID.

 ex : Say we initialized the brute force with a 5-character length password,
      AAAAA. On the first turn is AAAAA processed to feed the 32 first bytes
      of crypt2data (Z), on the second turn we try AAAAB, therefore we only
      need to process the last byte of the password "B" in every occurrence of
      Z :

   {                        Byte introduced form crypTab (cTb[5], cTb[11]...)
   { AAAA XORed on turn_1   and also XORed on turn_1
   {  ___|___                           |
_1_{ |_|_|_|_|B|x|_|_|_|_|B|x|_|_|_|_|B|x|_|_|_|_|B|x|_|_|_|_|B|x|_|_|
BYT{  0 1.....4...........10..........16..........22..........28....31
   {                     
   { |_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|
BYT{4ID_Bytes 0           2           0           2           0
   {          |                                               |
_2_{  -->   Z[4]="B"XorID[0]................................Z[28]="B"XorID[0]

Step_3) Here comes the fun, the non-linear formula is used to generate a non-
 reversible hash. This formula must have a 'good' property : it must be
 strong against CryptoAnalysis (i.e. you can't use some kind of statistical
 trend against the hash to sufficiently narrow the search upon a limited
 key-space). From the analysis of encryptp here's what comes out :

We have two series U and Z, where U[]=Crypt_2_c_val[]  (64 byte lengh)
                              and Z[]=Crypt2data[]     (96 byte lengh)
We have one table of fix data     D[]=CrypTab[]        (32 byte lengh)

U is defined by U[i]=U[i-1]+Z[i+32]

Z is defined by { let i'=i&0x1f
                { let k =(U[i-33]+i')&0x1f
                {
                { if k<i  Z[i]=(Z[i-i'+k]-D[i']) XOR (Z[i-32]+U[i-33])
                {    else Z[i]=(Z[i-32-i'+k]-D[i']) XOR (Z[i-32]+U[i-33])
                {
                { Note : We have initialized Z[0->31] in step_2

So this boils down to (understanding i&0x1f <=> i modulo 32) :

{ let i be the current iteration, i=0..63
{
{ {  Z[i+32]=( Z(U[i]&0x1f+i) - D[i&0x1f] ) XOR ( Z[i] + U[i] )
{ {
{ {  U[i+1]=U[i]+Z[i+32]
{
{ with    U[0] = Z[32] = (Z[0]-D[0]) XOR Z[0]
{
{
{ Note : We don't need the "memory" provided by the use of a table for the
{        series U, therefore we could replace U[] by a variable U, and
{        U[i+1]=U[i]+Z[i+32] by U += Z[i+32]

A few things have to be said here, first we can see that Z[] depends on U[],
and U[] depends on Z[]. Therefore it is not possible to come up with a formula
that could give us U[i] (and therefore Z[i]) without calculating U[i-1] first.
This idea is reinforced by the use of a XOR in the formula, since the XOR
operation has no "good" arithmetical properties allowing any simplifications.
Second, Z[] is not linear and therefore U[] isn't either, and from some simple
tests performed, it doesnt look like i' or k can be easily guessed. Finally,
the formula relies on a fixed table (CrypTab) to throw in a little more
entropy, so two passwords right next to each other (say AAAA and AAAB) give
a very different hash. It is to be said about this last point, that if the
inventors of this algorithm wanted to introduce some kind of "BackDoor",
the fixed table would certainly be another good spot, this would result in a
thing called "collision" where two different passwords give the same hash.
** Added note : well, a collision exists in this algorithm !!! This was proven
and an exploit coded by P.Semjanov. You can get the source code to this in the 
Novell section of : http://194.85.96.197/psw/crack.html. Yet they are better 
means to defeat the cryptic mechanism : we can use the hash directly, see the 
"PanMount" for Linux on Pandora's download page and details in the "Novell Cries
Pandora" paper.

Step_4) The aim of this last step is to shrink the 4Bits pairs of nybTab[Z[]]
 to give 8Bits words (so it ain't ugly like Unicode ASCII) and to make
 the word "one-way" meaningful...

 Note : It is obvious that there is no need to compare each Byte of the
 hash each turn, in fact, comparing the low 4-Bits of hash[0] with Z[64]
 proves to be a good test (i.e. rejected) 90% of the time.


About Optimization
------------------

Although the aim of optimization is obvious : accelerating the execution of
a program, the path to it is not always so obvious.

First, lets consider a few facts : in your computer the "slowest" device is
your hard-drive, next (not considering the various add-on cards) comes your
RAM, then Cache Memory and finally the processor. Therefore minimizing disk
access is a top priority (the number of attempts the brute-force performs
before saving things to disk was HIGHLY increased) (could someone spread the
word to Micro$oft ?), and under certain conditions it is sometimes preferable
to have the processor do multiple times the same simple operation than
storing the result in a variable and calling the memory stack every time its
needed.

Second, we will assume you're using an Intel Pentium processor, or a RISC
processor. One of the interesting facts about these is that almost every
assembly instruction uses 1 computer cycle to execute (i.e. if you have a 100Mhz
processor, youre computing at most 100 million instructions per seconds (not
considering pipeline)). So the question when you're programming in something
else than assembly isn't much "In what assembly instruction is the compiler
gonna transform this ?", but rather "How many assembly instructions is the
compiler gonna come up with for this ?". Therefore, if you want your program
to run fast, 1) Use a low-level programming language (i.e. close to the actual
assembly code, like the C language) 2) Always turn the compiler optimize
options ON for better results.

Finally, maybe the most important of all is how your program is structured.
I always try to keep two questions in mind, where the first one prevails
on the second (because the answers might go in opposite directions):
1) "Does the program perform only the minimum needed to reach our goal"
2) "Is this the simplest (or easily read) code possible"

One good example of how these two questions can have opposite answers is
found in the XorwithID() function. The function itself is pretty clear but
the trouble is that it performs the XOR thing on all the 32 first Bytes
of Crypt2Data each turn, whereas only a few bytes actually changes each
turn (see the diagram of step_2). So the answer to 1) is "No" because you'd
save time only by changing the bytes that need it, but the answer to 2) is
"Yes" because obviously the XorwithID function doesn't go into the
complications of calculating the position of the bytes that really need to
change.

Another example that answer 1) is the way that the testing of the one-way
hash was approached : previously all bytes were processed and tested against
the original hash; in the modified version each brute-forced byte is
processed independently and you only move to the next if the current one
matches its original hash counter part. This went even further because now
the first byte of the hash is first tested for its 4-last bits (see step_4
for details).

One last thing : during the optimization process, it is wise to start with
optimizing the "deepest" functions, that is to say the ones that are
repeated the most frequently and to end up with the ones that are processed
scarcely.


Useless and Over-used routines
------------------------------

Sometimes during the development process of a program, you come up with
plenty of small routines that you'd thought be more important than they
actually are. These sub-routines should be included directly in the ones
that call them as long as the clarity of the code is preserved and this
doesn't cause the code to be too redundant. Redundancy is ugly, yes,
but it gives faster code because calling a sub-routine is always time
consuming, whatever the size of the sub-routine, and this increases with the
number of local variable in it, and worsen if its a very "deep" sub-routine.

Moreover, these sub-routines prevent from easily seeing two things : two
or more sub-routines could be merged and a sub-routine might even prove
useless.

For instance Shrinkbuf() and Crypt2() were merged within the "for" loop of
encryptp(), preparepw() and XorwithID() where modified and moved to the main()
(so that they were on a less "inner" level) and finally fowardcrypt(),
InitRevcalc() and revcrypt2() were simply removed.

Recursion vs. Iteration
----------------------

Recursion happens when a function is defined by itself. Ex :

function Get_Drunk(where_to_go)
{If Alcohol_Test is positive return home;
 Drink(Same_as(the_guy_next_to_you));
 If (the_guy_next_to_you=drunk)
   then Get_Drunk(another_bar)
   else Get_Drunk(same_place)
}

A recursive function usually needs a "seed" to start with (Get_Drunk(bar)),
but depending on your goal you will be looking for either a solution to a
problem (forward recursion, i.e. : how can I get drunk) or the problem which a
specific solution solves (reverse recursion, i.e. : what steps it took me to
get drunk before I got home).

If either your problem or your solution is unknown to you, and you came up
with a recursive algorithm to solve it, it may just be that there just isn't
any other way to do it. If there is another way that means using an Iterative
process where you can clearly define a beginning and an end to it.

One of the particularities of a recursive algorithm, is that usually you can't
say WHEN it will stop and worse IF it will stop. Moreover, recursive means
a function call each turn, and there's nothing more CPU consuming than a
function call.

The good news is that if you know the recursive algorithm always comes up
with a solution and in a determinable or even better fixed number of turns,
then you're assured there is an Iterative equivalent to your Recursion.

Some will argue that you can always turn a recursive function in an iterative
loop by stack saving the variables each step in the loop, but this doesnt
give good optimization.

Regarding the crypto.c, we had a two-level recursive call between calc_c and
calcElement, but some simples tests proved that they always gave their
solution in a fixed number of turns. Therefore the first thing was to track
down the "seed", by looking at how the algorithm gave U[0] (=Z[32]), and then
crack down one after the other the two functions to their Iterative
equivalent.

Cipher/Decipher Algorithm of Remote NLM, by TheRuiner ('dreamer')
-----------------------------------------------------------------

    'dreamer' cryptographically cracked Novell's  Remote.NLM  password
    encryption  algorithm.   It  is  a  very  weak  algorithm compared
    to what Novell  has  implemented in  NDS, as it is instantaneously
    decryptable.   RConsole   password   encryption  is different from
    Remote.NLM password encryption because:

    1) Encrypted  RConsole  passwords  are  sent across the wire,  via
       RConsole.   Remote.NLM's encrypted  passwords are  generated at
       the server console  by typing REMOTE  ENCRYPT MyPass, and  they
       are optionally stored in SYS:System\LDRemote.NCF.

    2) They use a  different password encryption algorithm.   RConsole
       passwords are encrypted  with information from  the workstation
       that is currently running  RConsole.  Remote.NLM passwords  are
       encrypted with  a time  byte, one  of 255  constants in  a hash
       table,  appended   characters,  some   XORing,  and   bit-order
       separation.

    3) Encrypted RConsole passwords are locally obtained with a packet
       sniffer, but  Remote.NLM passwords  are remotely  accessible to
       anyone with the ability to view SYS:System\LDRemote.NCF.

    The Remote.NLM passwords are decrypted using only five steps.   To
    encrypt,  simply  reverse  the  steps.   The  password  will  look
    something like this:

        AF8CBBF48CA9955F5ADAFDADAA23

    The structure of the password is as follows:

        AF8CBBF48CA99 55F5ADAFDADAA - 23

    The first section contains the low-order bits, and the second, the
    high-order bits.  23 is the time byte, which is decremented by the
    server once every two seconds, from FF to 02, then back up to  FF,
    etc.

    Step 1)  Realign the low-order bits and high-order bits
    =======================================================
    This is extremely simple to do.  The high-order bits are in  order
    from the  first character  to the  last, and  so are the low-order
    bits.  Example:

        Password: AF8CBBF48CA99 - 55F5ADAFDADAA,
        Output: 5A 5F F8 5C AB DB AF F4 D8 AC DA A9 A9

    At this point, ignore 5A 5F F8 5C, or the first four bytes.   They
    are appended somewhere during  encryption, and decrypt to  "%*@$".
    It  was  a  TERRIBLE  idea  for  Novell  to  implement  those four
    characters into  every single  password, as  those are  what helps
    rebuild their hash table from scratch.  Also, if the length of the
    password is 10, the password is automatically decryptable to nul.

    Step 2)
    =======
    Match  each  of  the  password   characters  (group  of  two   hex
    characters) to the hash table below.  Use their position from  the
    beginning of  the table  to determine  the value  of the  pre-hash
    encrypted  password.   Example:  F4,  the  8th  character  of  the
    password, matches the hash table at 95.  This means that 95 is the
    pre-hash  value  of  F4.   Thus  far,  (ignoring  the  first  four
    characters) the password was:

        AB DB AF F4 D8 AC DA A9 A9

    and now the password is:

        98 A0 9B 95 A1 9D A6 9C 9C

    Remote.NLM Hash Table

            00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

        00  5B 58 5E 5F 59 5C 5A 5D-73 70 76 77 71 74 72 75
        10  13 10 16 17 11 14 12 15-7B 78 7E 7F 79 7C 7A 7D
        20  53 50 56 57 51 54 52 55-03 00 06 07 01 04 02 05
        30  1B 18 1E 1F 19 1C 1A 1D-0B 08 0E 0F 09 0C 0A 0D
        40  2B 28 2E 2F 29 2C 2A 2D-63 60 66 67 61 64 62 65
        50  83 80 86 87 81 84 82 85-3B 38 3E 3F 39 3C 3A 3D
        60  8B 88 8E 8F 89 8C 8A 8D-33 30 36 37 31 34 32 35
        70  93 90 96 97 91 94 92 95-6B 68 6E 6F 69 6C 6A 6D
        80  9B 98 9E 9F 99 9C 9A 9D-A3 A0 A6 A7 A1 A4 A2 A5
        90  F3 F0 F6 F7 F1 F4 F2 F5-AB A8 AE AF A9 AC AA AD
        A0  DB D8 DE DF D9 DC DA DD-FB F8 FE FF F9 FC FA FD
        B0  23 20 26 27 21 24 22 25-B3 B0 B6 B7 B1 B4 B2 B5
        C0  CB C8 CE CF C9 CC CA CD-BB B8 BE BF B9 BC BA BD
        D0  C3 C0 C6 C7 C1 C4 C2 C5-D3 D0 D6 D7 D1 D4 D2 D5
        E0  43 40 46 47 41 44 42 45-E3 E0 E6 E7 E1 E4 E2 E5
        F0  4B 48 4E 4F 49 4C 4A 4D-EB E8 EE EF E9 EC EA ED

    Step 3)
    =======
    Subtract  the  length  (the  number  of  groups of hex characters,
    excluding  the  time  character)  of  the  full password from each
    encrypted password character.   Now you have  the ACTUAL  pre-hash
    encrypted  password.   If  the  subtracted  value is negative then
    simply continue from FF down to the negative value.  Example:   if
    the password character is at 04, and the length is 6, the value of
    the password character will be FF.   The length is 13 (D in  hex),
    so the password was:

        98 A0 9B 95 A1 9D A6 9C 9C

    and is now:

        8B 93 8E 88 94 90 99 8F 8F

    Step 4)
    =======
    Get the  time var,  in this  situation 23  (hex), and  subtract it
    from FF.  This new character is for use in Step 5.  Example:

        FF-23=DC.

    Step 5)
    =======
    Finally, XOR  each character  (group of  2 hex  characters) of the
    encrypted password with the new  time character, and you now  have
    the decrypted password!  The password was:

        8B 93 8E 88 94 90 99 8F 8F  (before the XOR)

    Now, the decrypted password is:

        57 4F 52 54 48 4C 45 53 53

    "WORTHLESS"

    The code to this algorithm was added to the Pandora API in Pandora rel.4    

______________________________________________________________________________
I hope all the crap pulled here proves some kind of interest (and criticism).
The result of all this is a speed increase of about 76 times compared to
the first release of the project. Have Fun. Jitsu-Disk (jitsu@nmrc.org).


