BASM for beginners, lesson 1C

http://www.cnpack.org
QQ Group: 130970
Translator: SkyJacker
Version: draft
Status: not proofread
Date: 2007

This is the third lesson in the BASM for beginners series. The first two lessons
introduced a number of integer instructions, and this lesson continues in the same way.
The example function is this:

function StrLen1(const Str: PChar) : Cardinal;
begin
Result := 0;
while (Str[Result] <> #0) do
Inc(Result);
end;
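For readers who know C, StrLen1 corresponds roughly to the sketch below. The name StrLen1C is hypothetical. const char * stands in for PChar, and uint32_t stands in for the 32-bit Cardinal. It is only an illustration and is not part of the original lesson.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C rendering of StrLen1: count characters up to the zero terminator.
   const char * plays the role of PChar, uint32_t the role of Cardinal. */
uint32_t StrLen1C(const char *str)
{
    uint32_t result = 0;              /* Result := 0; */
    while (str[result] != '\0')       /* while (Str[Result] <> #0) do */
        result++;                     /*   Inc(Result); */
    return result;
}
```

StrLen1C("BASM") returns 4, which is the same value the C library's strlen returns for that string.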

StrLen1 does the same as the RTL function StrLen: it searches a PChar string for the
zero terminator and returns the length of the string. As usual, we copy the assembler
code generated by the compiler from the CPU view.
function StrLen2(const Str: PChar) : Cardinal;
begin
Result := 0;
{
xor edx,edx
jmp +$01
}
while (Str[Result] <> #0) do
{
inc edx
cmp byte ptr [eax+edx],$00
jnz -$07
}
Inc(Result);
{
}
{
mov eax,edx
ret
}
end;
At first glance this listing looks confusing. This is because the optimizer has relocated
code. For example, the inc edx instruction, which increments the Result variable, is not
located under the Pascal line Inc(Result), where we would expect it to be.

Let us go through the code line by line and see what it does. The first line looks like this.

Result := 0;

{
xor edx,edx

It clears the edx register by setting all of its bits to zero. Edx is allocated for the Result
variable. A function that returns an integer value returns it in the eax register, so we
would expect eax to be allocated for Result. However, eax is already in use for the input
parameter Str. The compiler therefore temporarily allocated edx, which can be used freely,
for Result. Just before the function exits, edx is copied into eax by the lines
{
mov eax,edx
ret
}
end;
The second line of assembler is a jump instruction that jumps 1 byte forward, counted from
the end of the jmp instruction.
jmp +$01
}

This way the one-byte inc edx instruction, which increments Result, is skipped when the
loop is entered.
while (Str[Result] <> #0) do
{
inc edx
The inc edx instruction comes from the Pascal statement Inc(Result). Had the compiler not
relocated it, it would have looked like this:
Inc(Result);
{
inc edx
}

The while loop is compiled into three lines of assembler code. The inc edx line is the
loop body, and the two remaining lines are the loop control code.
while (Str[Result] <> #0) do
{
inc edx
cmp byte ptr [eax+edx],$00
jnz -$07
}
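The layout the optimizer produced can be mirrored in C with goto. In that layout the counter is cleared, control jumps straight to the test, and inside the loop the increment sits in front of the test. The sketch below is a hypothetical model and not compiler output. The variable edx plays the role of the register, and the function name StrLenRotated is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the compiled StrLen2 loop, one C line per instruction. */
uint32_t StrLenRotated(const char *str)
{
    uint32_t edx = 0;                 /* xor edx,edx : Result := 0 */
    goto L1;                          /* jmp +$01    : skip inc edx on entry */
LoopStart:
    edx++;                            /* inc edx     : Inc(Result) */
L1:
    if (str[edx] != '\0')             /* cmp byte ptr [eax+edx],$00 */
        goto LoopStart;               /* jnz -$07    : back to inc edx */
    return edx;                       /* mov eax,edx ; ret */
}
```

A likely reason for this shape is that each iteration executes only one conditional jump. The unconditional jmp is paid once, on entry to the loop.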

The line
cmp byte ptr [eax+edx],$00

compares a byte of the PChar string with zero. The Pascal expression Str[Result] generates
the operand byte ptr [eax+edx].
Eax points to the beginning of the PChar string. It is the value the function received as the
Str parameter.

Edx is the Result variable. In the first loop iteration it is zero, so the first character of
the string is compared with the immediate value $00, which is simply a complicated way of
writing 0. We want to compare only one character with zero at a time, so we must state that
[eax+edx] is to be understood as a pointer to a byte. The byte ptr prefix does this. A compare
instruction subtracts its second operand from its first, throws the difference away and sets
the flags in the EFLAGS register according to the result. In particular, the zero flag (ZF) is
set when the two operands are equal. The jump instruction

jnz -$07
}

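tests the zero flag and jumps back if the flag is clear (ZF = 0), that is, if the byte just
compared was not zero. Jnz stands for Jump if Not Zero. The displacement -$07 is counted from
the end of the jnz instruction, so the jump lands on inc edx: 2 bytes for the jnz itself, 4 for
the cmp and 1 for the inc. If [eax+edx] is not pointing at the zero terminator, the loop is
iterated once more.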


If we want to translate the function into a pure BASM function, we have to find out where
the two jumps go. This can be done by tracing through the code with the CPU view open. We
also saw earlier that the first jump bypasses the one-byte instruction inc edx, so we need a
label right after that line. Because my imagination was asleep that day, I simply named it
L1, for Label 1 ;-) We can also use our understanding of the code to see that the last jump
goes to the start of the loop, and that the start of the loop is just before the single
loop-body instruction inc edx.
The function then looks like this:

function StrLen3(const Str: PChar) : Cardinal;
asm
//Result := 0;
xor edx,edx
jmp @L1
//while (Str[Result] <> #0) do
//Inc(Result);
@LoopStart :
inc edx
@L1 :
cmp byte ptr [eax+edx],$00
jnz @LoopStart
mov eax,edx
//ret
end;

We can make it look a little nicer by writing zero the simple way and by removing the
commented-out ret instruction.
function StrLen4(const Str: PChar) : Cardinal;
asm
//Result := 0;
xor edx,edx
jmp @L1
//while (Str[Result] <> #0) do
//Inc(Result);
@LoopStart :
inc edx
@L1 :
cmp byte ptr [eax+edx], 0
jnz @LoopStart
mov eax,edx
end;

The compare instruction works on one byte at a time. To make this easier to observe, the line
cmp byte ptr [eax+edx], 0
can be rewritten as
mov cl, byte ptr [eax+edx]
cmp cl, 0
    
Step through the function with the CPU view open and watch how cl, the lowest byte of the ecx
register, holds the ASCII value of the character currently being inspected.
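To see on paper what the CPU view shows, the sketch below runs the same loop in the split mov cl / cmp cl, 0 form and records every value that is loaded into cl. The function trace_cl and its logging parameters are hypothetical. A real CPU view shows the register directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tracer: stores each value loaded into cl in cl_log (at most log_size
   entries) and returns how many were stored. A complete trace ends with the terminator 0. */
size_t trace_cl(const char *str, unsigned char *cl_log, size_t log_size)
{
    size_t edx = 0;                                   /* xor edx,edx */
    size_t n = 0;
    for (;;) {
        unsigned char cl = (unsigned char)str[edx];   /* mov cl, byte ptr [eax+edx] */
        if (n < log_size)
            cl_log[n++] = cl;                         /* value the CPU view shows in cl */
        if (cl == 0)                                  /* cmp cl, 0 ; jnz not taken: exit */
            break;
        edx++;                                        /* inc edx */
    }
    return n;
}
```

For the string "Hi" the log holds 72 ('H'), then 105 ('i'), and finally 0, the terminator that ends the loop.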
The line 
cmp cl, 0

can be coded as
test cl, cl

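This is the simplest form of peephole optimization: replacing one instruction with another
that performs the same logic. test computes the bitwise AND of its two operands, discards the
result and sets the flags. Since cl AND cl equals cl, the zero flag is set exactly when cl is
zero, just as after cmp cl, 0. The test version is also one byte shorter: 2 bytes instead of 3.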
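The equivalence can be checked exhaustively with a small flag model. This hypothetical sketch covers only ZF, SF, CF and OF. In the model, cmp computes a - b, while test computes a AND b and clears CF and OF. For every possible value of cl, cmp cl, 0 and test cl, cl produce the same four flags, so a following jnz behaves identically. PF also agrees. AF is left undefined by test and is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified flag set: only the flags relevant to jz/jnz and the signed/unsigned jumps. */
typedef struct { bool zf, sf, cf, of; } Flags;

/* cmp a, b: computes a - b, discards it and sets the flags. */
Flags flags_cmp8(uint8_t a, uint8_t b)
{
    uint8_t r = (uint8_t)(a - b);
    Flags f;
    f.zf = (r == 0);
    f.sf = (r & 0x80) != 0;
    f.cf = a < b;                                /* unsigned borrow */
    f.of = ((a ^ b) & (a ^ r) & 0x80) != 0;      /* signed overflow */
    return f;
}

/* test a, b: computes a AND b, discards it; CF and OF are always cleared. */
Flags flags_test8(uint8_t a, uint8_t b)
{
    uint8_t r = a & b;
    Flags f = { r == 0, (r & 0x80) != 0, false, false };
    return f;
}
```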
Another example is
xor edx,edx

being "optimized" into
mov edx, 0

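The first one is the preferred way of zeroing a register on the P4, as described on page 103
of the Intel Pentium 4 and Intel Xeon Processor Optimization Reference Manual. It is generally
preferred on other processors too, not least because it is shorter: xor edx,edx encodes in
2 bytes, mov edx, 0 in 5. The two are not fully interchangeable, though: xor changes the
flags, while mov leaves them untouched.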
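A toy model makes the difference concrete. In this hypothetical sketch, State holds only edx and four flags. Both routines leave edx at zero, but only the xor version changes the flags. A peephole rewrite between the two is therefore safe only where no later instruction reads the flags. The byte arrays show one valid encoding of each instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy machine state (hypothetical): edx plus the flags we care about. */
typedef struct { uint32_t edx; bool zf, sf, cf, of; } State;

/* xor edx,edx: edx becomes 0; ZF = 1, SF = 0, CF = 0, OF = 0. */
State zero_by_xor(State s)
{
    s.edx ^= s.edx;
    s.zf = (s.edx == 0);
    s.sf = false;       /* result 0 has a clear sign bit */
    s.cf = false;       /* xor always clears CF */
    s.of = false;       /* xor always clears OF */
    return s;
}

/* mov edx, 0: edx becomes 0; the flags are left unchanged. */
State zero_by_mov(State s)
{
    s.edx = 0;
    return s;
}

/* One valid encoding of each instruction. */
const uint8_t xor_edx_edx[] = { 0x33, 0xD2 };                   /* 2 bytes */
const uint8_t mov_edx_0[]   = { 0xBA, 0x00, 0x00, 0x00, 0x00 }; /* 5 bytes */
```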


What new instructions did we learn? Xor, jmp, inc, cmp, test and jnz. We also learned how to
implement a loop and how to work with one byte of data at a time. The peephole optimization
technique was also introduced.