Google

星期二, 十二月 18, 2007

Learning Assembler with Delphi 4

Project 1 - Matrices
What follows is the definition of a new structure TMatrix. Together with explanations of the design decisions taken, you should find it a useful reference. A 2x2 matrix has been chosen because it is adequate to illustrate the ideas involved, and can be easily expanded to a 4x4 which is far more useful, especially in graphics where homogeneous coordinates are used.

First we must decide what this class is to be used for. 2x2 matrices are most commonly employed in describing basic geometric transformations in a 2-dimensional vector space. For simplicity we shall assume the use of a cartesian coordinate system. Given this, two data structure types are required, one for a vector or point, the other for the matrix. It will also be assumed that dynamic allocation of both vectors and matrices will be required.

Now we have some ground rules, let's consider just one more design point. To implement a vector space with only integer values, transformations will be limited to a subset of translations, reflections and skews. Using real numbers is the ideal, but quite honestly it's painful and there is a speed penalty. What we must realize, is that in a digital computer a 'real' number is really an approximation made using a quotient. Having taken this on board, and given that we are unlikely to use a plotting range in excess of -32767 to +32767 (does your monitor have a resolution in excess of 1600x1280 ?) , we shall store each coordinate as 65536 times its actual value. This gives us a quotient with which we can approximate quite accurately the range of real values we shall use. Moreover we can implement this using the standard 32bit integer type, thus enhancing performance. Recovering the 'integer' part of a value is achieved by dividing it's value by 65536 which is easily done using shr, similarly setting a value requires multiplying by 65536 for which we use shl. If these ideas are new to you, you've just learnt one of the key approaches in code optimization. Important note, when assigning and recovering values in Delphi, we must use multiplication and division, otherwise negative values get rather upset.

The type definitions are consequently straightforward.

type

{remember each stored value is 65536
times its actual value}

PVector = ^TVector;

{pointer type for dynamic instantiation}

TVector = record
X:integer;
Y:integer;
end;

{the vector is of the form X }
{ Y }

PMatrix = ^TMatrix;

TMatrix = record
A:integer;
B:integer;
C:integer;
D:integer;
end;

{the matirx is of the form A,B }
{ C,D }

{to avoid problems with memory allocation
each of the following functions will adjust
the actual value of the first parameter, the
second parameter will be considered a source,
as in assembler and as such remains unaffected}

function VectorAdd(Vdest,Vsrc:PVector):boolean;

{Vdest := Vdest + Vsrc}

function VectorSub(Vdest,Vsrc:PVector):boolean;

{Vdest := Vdest - Vsrc}

function VectorScale(Vdest:PVector;K:integer):boolean;

{Vdest := K * Vdest, Nb. K is 65536 times value again}

function VectorAssign(Vdest,Vsrc:PVector):boolean;

{Vdest := Vsrc}

function MatrixMul(Mdest,Msrc:PMatrix):boolean;

{Mdest := Msrc * Mdest, Nb. the order is important}

function MatrixIdent(Mdest:PMatrix):boolean;

{set Mdest to the identity matrix}

function MatrixScale(Mdest:PMatrix;K:integer):boolean;

{Mdest := K * Mdest, Nb K is 65536 times value again}

function MatrixAssign(Mdest,Msrc:PMatrix):boolean;

{Mdest := Msrc}

function Transform(Vdest:PVector;Msrc:PMatrix):boolean;

{Vdest := Msrc * Vdest}



Before going through the implementation of these functions, we have two more points to discuss. Firstly, error trapping. If any of these functions is sent a null or invalid pointer, a memory exception error will occur unless some form of checking is used. Checking for a null pointer is easy, but discovering whether a pointer is valid, this is nigh impossible. This aside, we should not loose sight of why we are using assembler. Speed is the essence. My own view is that for such routines as these, error checking is not required, as the cause of the error will be self evident in debugging the main delphi program. However, a simple null pointer check will be implemented in the first function as a simple example. Further error checking routines will be implemented in the next project, where identifying the causes of errors will not be so easy.

Secondly, as the reason for using assembler is speed, we should consider how this is best achieved. It is clear that given a generic NxN matrix, a generic algorithm capable of calculating the desired result for any element, nested in two loops, is the natural approach, and would produce a very memory efficient routine. If we were to explicitly write optimized code for each element of the matrix and tag these end to end we would avoid the time taken to execute any loop control code. Such an algorithm would only work for a given matrix size, and with even moderately sized matrices, would consume a far greater chunk memory. It would, however, be quicker. Given these two approaches, we must decide which route to take, and clearly with a 2x2 matrix the latter is preferable. It is also worth considering just how little memory is used by a line of assembler, typically 8 or 12 bytes, how many lines of assembler you are going to write for a routine, more than 1000 ?, and then look at how much memory your machine has, 16Mb+ perhaps ? You will find that in general, using more memory produces faster routines, only experience will tell you where to draw the line.

The implementation of the Vector and Matrix functions.

function VectorAdd
(Vdest,Vsrc:PVector):boolean;assembler;
asm
{as this is a stand alone function
eax holds the pointer Vdest and
edx holds the pointer Vsrc
remember ebx must be saved}

{the error checking I promised}
jmp @errorcheck
@seterrorcode:
mov eax,0
jmp @adddone
{eax holds return value,
0 is equivalent to False}
@errorcheck:
cmp eax,0
je @seterrorcode
cmp edx,0
je @seterrorcode
{error checking done, now start routine}
{Nb. addition and subtraction are unaffected by our
multiplying the values by 65536}
mov ecx,[eax]
{eax holds pointer to first value of
Vdest, so ecx = Vdest.X}
add ecx,[edx]
{ecx = Vdest.X + Vsrc.X}
mov [eax],ecx
{change Vdest.X}
add edx,4
add eax,4

{as the X element of a Vector record is an
integer it consumes four bytes of memory,
thus by adding four to eax and edx, both
now point to the Y element of the Vector
record}
mov ecx,[eax]
add ecx,[edx]
mov [eax],ecx
{update Y element of Vdest}
mov eax,1
{set return value to True}
@adddone:
end;

function VectorSub(Vdest,Vsrc:PVector):boolean;assembler;
asm
{no error checking, speed gain 30+%}
mov ecx,[eax]
sub ecx,[edx]
mov [eax],ecx
add eax,4
add edx,4
mov ecx,[eax]
sub ecx,[edx]
mov [eax],ecx
end;

function VectorScale
(Vdest:PVector;K:integer):boolean;assembler;
asm
{I'll go through this step by step,
and try not to loose you}
mov ecx,edx

{we're going to use the mul operation,
which overwrites the value of edx, since
edx holds the value of K, we move the
value to ecx}
push eax
{mul also changes eax, and we
need to store the pointer Vdest}
mov eax,[eax]
{before we multiply let's
look at what is stored where.
eax = 65536 * Vdest.X
ecx = 65536 * K
and the pointer Vdest is on the stack}
imul ecx
{the 64bit result of the mul operation is
65536 * 65536 * Vdest.X * K
and is stored in registers edx and eax
do the arithmetic and you'll find
edx = K * Vdest.X
eax = the remainder of the quotient}
shl edx,16
{we actually want 65536 * K * Vdest.X
so multiply edx by 65536, the shl operation does this}
shr eax,16

{to maintain accuracy we must add the
remainder of the quotient to edx. As eax
holds the 32bit remainder and we only want a
16bit remainder for edx. We divide eax by
65536 using shr}
xor edx,eax
{we now combine edx and eax using the
binary operation xor, we could use add, but
xor is generally quicker}
pop eax
{restore the value of eax, Vdest}
{At this point it's worth looking
at the register values
eax = Vdest
ebx = unchanged
ecx = 65536 * K
edx = 65536 * K * Vdest.X
and the stack is clear}
mov [eax],edx
{update value of Vdest.X}
add eax,4
{move pointer from Vdest.X to
Vdest.Y and do it all again}
push eax
mov eax,[eax]
imul ecx
shl edx,16
shr eax,16
xor edx,eax
pop eax
mov [eax],edx
end;

function VectorAssign
(Vdest,Vsrc:PVector):boolean;assembler;
asm
{by now you should be able to follow this}
mov ecx,[edx]
mov [eax],ecx
add edx,4
add eax,4
mov ecx,[edx]
mov [eax],ecx
end;

function MatrixMul(Mdest,Msrc:PMatrix):boolean;assembler;
var nA,nB,nC,nD,dest,src:integer;
{there are too many intermediary values to
keep track of, which makes using the stack confusing,
so we'll just define some local values for the
new element values of Mdest and the pointer
values of Mdest and Msrc}
asm
mov dest,eax
mov src,ecx
{save dest and src}
mov eax,[eax].TMatrix.A
mov ecx,[ecx].TMatrix.A
{a little earlier I mentioned index addressing, these two lines
ask the complier to calculate the offset required to address
the 'A' field of a TMatrix record and use the appropriate index
address. For example
[eax].TMatrix.B = [eax].TVector.Y = [eax+4]
or rather the offset to both 'B' and 'Y' is 4 bytes
Otherwise nothing else new in this function because I am
going to assume you know how to multiply two matrices}
imul ecx
shl edx,16
shr eax,16
xor edx,eax
mov nA,edx
mov eax,dest
mov ecx,src
mov eax,[eax].TMatrix.C
mov ecx,[ecx].TMatrix.B
imul ecx
shl edx,16
shr eax,16
xor edx,eax
add edx,nA
mov nA,edx
{new A calculated now start on new B}
mov eax,dest
mov ecx,src
mov eax,[eax].TMatrix.B
mov ecx,[ecx].TMatrix.A
imul ecx
shl edx,16
shr eax,16
xor edx,eax
mov nB,edx
mov eax,dest
mov ecx,src
mov eax,[eax].TMatrix.D
mov ecx,[ecx].TMatrix.B
imul ecx
shl edx,16
shr eax,16
xor edx,eax
add edx,nB
mov nB,edx
{new C}
mov eax,dest
mov ecx,src
mov eax,[eax]
mov ecx,[ecx].TMatrix.C
imul ecx
shl edx,16
shr eax,16
xor edx,eax
mov nC,edx
mov eax,dest
mov ecx,src
mov eax,[eax].TMatrix.C
mov ecx,[ecx].TMatrix.D
imul ecx
shl edx,16
shr eax,16
xor edx,eax
add edx,nC
mov nC,edx
{and finally D}
mov eax,dest
mov ecx,src
mov eax,[eax].TMatrix.B
mov ecx,[ecx].TMatrix.C
imul ecx
shl edx,16
shr eax,16
xor edx,eax
mov nD,edx
mov eax,dest
mov ecx,src
mov eax,[eax].TMatrix.D
mov ecx,[ecx].TMatrix.D
imul ecx
shl edx,16
shr eax,16
xor edx,eax
add edx,nD
{finish by updating Mdest, Nb. as edx holds final value of D
there is little point in updating the local variable nD}
mov eax,dest
mov [eax].TMatrix.D,edx
mov edx,nA
mov [eax].TMatrix.A,edx
mov edx,nB
mov [eax].TMatrix.B,edx
mov edx,nC
mov [eax].TMatrix.C,edx
end;

function MatrixIdent(Mdest:PMatrix):boolean;assembler;
asm
{no comments needed}
mov [eax].TMatrix.A,65536
mov [eax].TMatrix.B,0
mov [eax].TMatrix.C,0
mov [eax].TMatrix.D,65536
end;

function MatrixScale(Mdest:PMatrix;K:integer):boolean;assembler;
asm
{same as VectorScale but twice as long}
mov ecx,edx
push eax
mov eax,[eax].TMatrix.A
imul ecx
shl edx,16
shr eax,16
xor edx,eax
pop eax
mov [eax].TMatrix.A,edx
push eax
mov eax,[eax].TMatrix.B
imul ecx
shl edx,16
shr eax,16
xor edx,eax
pop eax
mov [eax].TMatrix.B,edx
push eax
mov eax,[eax].TMatrix.C
imul ecx
shl edx,16
shr eax,16
xor edx,eax
pop eax
mov [eax].TMatrix.C,edx
push eax
mov eax,[eax].TMatrix.D
imul ecx
shl edx,16
shr eax,16
xor edx,eax
pop eax
mov [eax].TMatrix.D,edx
end;

function MatrixAssign(Mdest,Msrc:PMatrix):boolean;assembler;
asm
{no comments needed}
mov ecx,[edx].TMatrix.A
mov [eax].TMatrix.A,ecx
mov ecx,[edx].TMatrix.B
mov [eax].TMatrix.B,ecx
mov ecx,[edx].TMatrix.C
mov [eax].TMatrix.C,ecx
mov ecx,[edx].TMatrix.D
mov [eax].TMatrix.D,ecx
end;

function Transform(Vdest:PVector;Msrc:PMatrix):boolean;assembler;
var nX,nY,dest,src:integer;
asm
{MatrixMul cut in half really}
mov dest,eax
mov src,edx
mov eax,[eax].TVector.X
mov ecx,[edx].TMatrix.A
imul ecx
shl edx,16
shr eax,16
xor edx,eax
mov nX,edx
mov eax,dest
mov ecx,src
mov eax,[eax].TVector.Y
mov ecx,[ecx].TMatrix.B
imul ecx
shl edx,16
shr eax,16
xor edx,eax
add edx,nX
mov nX,edx
{new X done}
mov eax,dest
mov ecx,src
mov eax,[eax].TVector.X
mov ecx,[ecx].TMatrix.C
imul ecx
shl edx,16
shr eax,16
xor edx,eax
mov nY,edx
mov eax,dest
mov ecx,src
mov eax,[eax].TVector.Y
mov ecx,[ecx].TMatrix.D
imul ecx
shl edx,16
shr eax,16
xor edx,eax
add edx,nY
{new Y done update Vdest}
mov eax,dest
mov [eax].TVector.Y,edx
mov edx,nX
mov [eax].TVector.X,edx
end;



Using the Vector and Matrix functions is very straightforward with few points to remember. Firstly, the functions assume that a pointer is to be passed as the parameter, but you won't forget this as the complier will remind you. This means, that to create vector and matrix variables, you must first define your variable with the appropriate PVector or PMatrix type in the required var section as in the example below.

procedure Example;
var aVector:PVector;aMatrix:PMatrix;
begin
new(aVector);
aVector^.X := 65536*32;
aVector^.Y := 65536*21;
new(aMatrix);
with aMatrix^ do
begin
A := round(65536 * 0.5);
B := round(65536 * cos(pi /3));
C := 3 * 65536;
D := 65536;
end;
Transform(aVector,aMatrix);
dispose(aVector);
dispose(aMatrix);
end;



Pascal's new() function is used to allocate memory for the new variable, this must be done otherwise the pointer will remain null with the consequences discussed earlier. Furthermore you must tidy up after yourself, that is to say, every new() must have an associated dispose() to free the memory you have used. Otherwise dynamically allocated variables are just like any other, except one must use the '^' to access the variable's fields.

It is likely you will use the PVector and PMatrix types with TList's as these deal explicitly with pointers and their manipulation.

You may be asking yourself why the Vector and Matrix were defined as a record rather than as an object as one would expect with Object Pascal. The reason is that each object carries with it the overhead of a virtual method table (see Delphi documentation) thus increasing the memory requirement for data storage. Records store just the data and when one considers that a half serious vector graphics application may deal with 500,000 vectors or more saving just one byte in each record soon makes a difference.

As an exercise you might like to write a function to find the inverse of a Matrix. For those of you with aspirations of writing a 3D graphics engine, you now have all the knowledge of assembler you will need to rewrite the vector and matrix definitions. All that is missing is an understanding of homogeneous equations, a good book on modern algebra will help here, and a method of fast screen access. I shall deal with the latter next.

标签:

0 条评论:

发表评论

<< 主页

辽ICP备05003652号
流风洄雪听天籁,轻云蔽日看落花

Powered by Blogger