Assembly Language Library by David Parker |

Home | Subscribe | List of Site Subscribers | Free Viewer Latest news: 5 Sept 2013 | Update to newest version: 5 Sept 2013 Math Art Gallery | Documentation | Links | Privacy | Contact |

"Tornado" (movie) by David Parker. |

NOTE: Until I get more time to work on this page, I'm just marking broken links instead of fixing them.

Free tools | Documentation | Tips and tricks

I always write my commercially available programs entirely in assembly language. This page contains bits of information that I've found to be useful.

Free
tools |

- Introduction
- Assemblers
- Linkers

Introduction |

Everything you need to program in assembly language for DOS or Windows (and probably for UNIX, Linux, etc.) is available for free on the internet. In this chapter I list how to obtain the newest versions (and sometimes older versions) of the tools that I actually use. The list is heavily slanted toward tools for writing true 32-bit programs for Windows 95/98/NT4 or later, although most of the tools can still create 16-bit programs for DOS or Windows 3.1.

Sometimes a program that works with an older version of a tool stops working
with a newer version. To avoid that problem, I keep different versions of the
tools I use in different subdirectories. These subdirectories all live in a top
level directory called` tools`

. Right now it looks like this:

Volume in drive C has no label. Volume Serial Number is B099-2C6C Directory of C:\tools [.] [..] [alink-1-6] [cvtres-5-00-1736-1] [h2inc-6-12a] [imagedit-3-10-031] [link-2-60-5046] [link-5-12-8078] [link-5-12-8181] [link-5-13] [masm-6-00b] [masm-6-11c] [masm-6-11d] [masm-6-13-7299] [masm-6-14-8444] [nasm-0-97] [pedump-1993] [rc-3-10] [rc-4-00-1367] [win32lib-1998-10-1] [windbg-4-1381] 21 File(s) 0 bytes 48,501,248 bytes free

I use dashes (minus signs) to separate the version information in subdirectory names instead of periods or underscores. I don't use periods because they are also used to separate file types from file names (for example, example-6-c is less confusing than example.6.c, which some systems will interpret as meaning that example.6 is written in c). I don't use underscores because most browsers underline HTML links, which sometimes hides the underscores (for example, masm_6_14_8444).

In the batch files that I use to build my programs I specifically state which version of each tool to use. Instead of a batch file that looks like this:

ml /coff /c /w /Sn /Sl132 /Sp80 /nologo /X project.asm link @link.tmp

I use something that looks like this:

\tools\masm-6-14-8444\ml /coff /c /w /Sn /Sl132 /Sp80 /nologo /X project.asm \tools\link-5-12-8181\link @link.tmp

In parentheses after the name of each tool in this chapter is the name that I use for that tool's subdirectory.

Microsoft Macro Assembler
(masm-6-11d) |

These instructions are nearly the same as those for downloading the Microsoft 32-bit Linker, so you may want to grab a copy of that while you're getting the assembler.

- Click on the following two links to download two files that you'll need.
Make sure that the directory is set to Desktop when your browser prompts you for
a location to save in.
- http://www.microsoft.com/ddk/download/98/BINS_DDK.EXE (broken link) (2.9 MB).
- http://www.microsoft.com/ddk/download/98/98SETUP.EXE (broken link) (2.3 MB).

- To extract the files, double click on each of the two new icons on your
Desktop. When you are prompted for a location for the extracted files, type
in:
`c:\98ddk`

**Warning:**If you already have a directory called`c:\98ddk`

, then use another name because you might be mixing old and new versions of files in your current`c:\98ddk`

directory. The order in which you extract the files doesn't matter. After you have extracted the files you can drag the icons to the Recycle Bin because you won't need them anymore. - To finish installing the files, first click on Start, then click on Run,
then type in:
`c:\98ddk\setup.exe`

then click on OK, and then follow the instructions. - Make a directory for MASM 6.11d (for example
`c:\tools\masm-6-11d`

). Now copy the following files from`c:\98ddk\bin\win98`

to`c:\tools\masm-6-11d`

:`ml.exe`

ml.err - There are other interesting files in the
`c:\98ddk`

directory, but if you don't need any of them, you can uninstall the whole directory. To uninstall, go to Control Panel, go to Add/Remove Programs, select Microsoft 98 DDK in the bottom window, then select Add/Remove.

Microsoft Macro
Assembler (masm-6-14-8444) |

- First, follow the directions above to install MASM 6.11d.
- Create a new directory for MASM 6.14.8444 (for example,
`c:\tools\masm-6-14-8444`

) and copy all of the files from`c:\tools\masm-6-11d`

. - Click on the following link to download the file ML614.exe. When your
browser prompts you for a directory to save it in, enter
`c:\tools\masm-6-14-8444`

.- http://support.microsoft.com/download/support/mslfiles/ML614.exe (broken link) (0.9 MB).

- To extract the patch files, first click on Start, then click on Run, then
type in:
`c:\tools\masm-6-14-8444\ml614.exe`

and then click on OK. If the program leaves a DOS window open, feel free to close it. - To finish patching the Macro Assembler, first click on Start, then click on
Run, then type in:
`c:\tools\masm-6-14-8444\patch.exe`

and then click on OK.

Netwide Assembler
(nasm-2-01) |

NASM 2.01 is the current stable release of the Netwide Assembler. It is available from many sites, for many operating systems. Follow the links from the NASM home page at http://nasm.sourceforge.net/ for more details.

Anthony's Linker
(alink-1-6) |

You can download the latest version of Anthony's Linker (alink-1-6) for Windows 95/98/NT4 or later from http://alink.sourceforge.net/download.html (42 KB).

Anthony's Programming Page at http://alink.sourceforge.net/ has information on other useful programs, including other versions of ALINK and ALINK's source code.

Microsoft 16-bit Linker
(link-5-60-339) |

The latest version of the Microsoft 16-bit Linker (link-5-60-339) for
creating DOS programs is available from
ftp://ftp.microsoft.com/softlib/MSLFILES/LNK563.EXE (broken link)
(281 KB). Create a directory for the linker (for example,```
c:\tools\link-5-60-339
```

), download the file to the directory, and then
run` LNK563.EXE `

to install the linker.

Even though the version number is greater than the 32-bit linker, the 16-bit linker is older.

Microsoft 32-bit Linker
(link-5-12-8181) |

These instructions are nearly the same as those for downloading the Microsoft Macro Assembler, so you may want to grab a copy of that while you're getting the linker.

- Click on the following two links to download two files that you'll need.
Make sure that the directory is set to Desktop when your browser prompts you for
a location to save in.
- http://www.microsoft.com/ddk/download/98/BINS_DDK.EXE (broken link) (2.9 MB).
- http://www.microsoft.com/ddk/download/98/98SETUP.EXE (broken link) (2.3 MB).

- To extract the files, double click on each of the two new icons on your
Desktop. When you are prompted for a location for the extracted files, type
in:
`c:\98ddk`

**Warning:**If you already have a directory called`c:\98ddk`

, then use another name because you might be mixing old and new versions of files in your current`c:\98ddk`

directory. The order in which you extract the files doesn't matter. After you have extracted the files you can drag the icons to the Recycle Bin because you won't need them anymore. - To finish installing the files, first click on Start, then click on Run,
then type in:
`c:\98ddk\setup.exe`

then click on OK, and then follow the instructions. - Make a directory for Link 5.12.8181 (for example
`c:\tools\link-5-12-8181`

). Now copy the following files from`c:\98ddk\bin`

to`c:\tools\link-5-12-8181`

:`link.exe`

mspdb50.dll - There are other interesting files in the
`c:\98ddk`

directory, but if you don't need any of them, you can uninstall the whole directory. To uninstall, go to Control Panel, go to Add/Remove Programs, select Microsoft 98 DDK in the bottom window, then select Add/Remove.

Documentation |

- (http://webster.cs.ucr.edu/Page_asm/ArtOfAsm.html broken link) The Art of Assembly Language Programming, by Randall Hyde
- (http://developer.intel.com/design/ia64/devinfo.htm broken link) IA-64 Application Developer's Architecture Guide
- (http://developer.intel.com/design/pentiumiii/manuals/ broken link) Intel Architecture Software Developer's Manual
- (http://developer.intel.com/design/pentiumiii/manuals/ broken link) Intel Architecture Software Optimization Reference Manual
- Microsoft Developer Network Library
- (http://support.microsoft.com/support/search/c.asp broken link) Microsoft Knowledge Base
- (http://webster.cs.ucr.edu/Page_asm/moreasm/asmstyle.htm broken link) Assembly Language Style Guidelines, by Randall Hyde

Tips and
tricks |

- Divide by a constant
- Pop with Esp as base register gotcha
- Test for unsigned integer power of two
- Double precision unsigned integer division
- Saturated unsigned integer addition
- Absolute value of an integer
- Convert from binary to ASCII hex
- Divide a signed dividend by an unsigned divisor
- Use the parity flag to test three values with one instruction
- Parity flag gotcha

Divide by a
constant |

There are many ways to avoid the expense of an integer multiply instruction
when multiplying by small integer constants. For example, multiplying```
Eax
```

by 2 is as easy as:

; Eax = integer. Add Eax,Eax ; Multiply Eax by 2. ; Eax = 2*integer.

Or, multiplying` Eax `

by 9 is as easy as:

; Eax = integer. Lea Eax,[Eax+8*Eax] ; Multiply Eax by 9. ; Eax = 9*integer.

Unfortunately, there don't seem to be any similar tricks for avoiding the
expense of an integer divide, unless the constant you are dividing by is a power
of 2. For example, to divide an unsigned integer in` Edx `

by 4, you
could use:

; Edx = unsigned integer. Shr Edx,2 ; Divide Edx by 4. ; Edx = Edx/4, rounded toward 0.

or, if` Edx `

is a signed integer:

; Edx = signed integer. Sar Edx,2 ; Divide Edx by 4. ; Edx = Edx/4, rounded toward minus infinity.

The closest trick that I have seen for constants that are not powers of two
is to multiply by 2^32/(integer constant), making sure that 2^32/(integer
constant) is rounded towards infinity. For example, to divide```
Edx
```

by 10, you could use the following:

; Edx = unsigned integer. Mov Eax,(1 Shl 31)/5+1 ; 2^32/10 = 2^31/5, plus 1 for rounding. Mul Edx ; Multiply Edx by 1/10. ; Edx = Edx/10, rounded toward 0, Eax = destroyed.

On many machines the` Mul `

instruction isn't all that speedy, so
this trick is of limited usefulness. On the other hand, if you are using
floating point numbers then multiplying by 1/constant is usually much more
efficient than dividing by the constant.

Pop with Esp as base
register gotcha |

Suppose you want to copy a` DWord `

from somewhere on the stack to
the bottom of the stack. If the` DWord `

is at offset 16 (for
example) from` Esp`

, then the simplest way to copy it to the bottom
of the stack is:

; DWord to copy is at [Esp+16]. Push [Esp+16] ; Push DWord at [Esp+16]. ; New copy is at [Esp], original DWord is now at [Esp+20].

Now suppose our program modifies the copied` DWord `

at```
[Esp]
```

, and we want to copy it back to the original` DWord`

,
which is now at` [Esp+20]`

. The naive approach would be to try
something like:

; Modified DWord is at [Esp], original DWord is at [Esp+20]. Pop [Esp+20] ; Pop DWord from [Esp] to [Esp+20] ; Modified DWord is now at [Esp+16].

Unfortunately, this would be wrong. Unlike any other instruction,```
Pop
```

calculates the address of the operand as though the instruction has
already executed. That means that` Esp `

will be 4 more than
expected. The correct code sequence is:

; Modified DWord is at [Esp], original DWord is at [Esp+20]. Pop [Esp+16] ; Pop DWord from [Esp] to [Esp+20] ; Modified DWord is now at [Esp+16].

Test for unsigned
integer power of two |

You can use a classic assembly language trick to test if a unsigned integer is a power of two. In other words, this trick enables you to test an integer to see if one, and only one, bit is set to 1 (and thus the other bits are all set to 0's).

For example, the integers 00100b and 10000b are unsigned integer powers of two because only one bit in each is set to 1, but the integer 11100b is not an unsigned integer power of two because more than one bit is set to 1. The integer 00000b is not an unsigned integer power of two because no bits are set to 1.

The trick relies on a combination of arithmetic and logical operations, and the peculiarities of two's complement arithmetic. The basic idea is that if you bitwise-and an integer with its own negative, and if the result equals the original integer, then one, and only one, bit in the original integer was set to 1, or else the original integer was 0. For example, 00100b gives (-00100b) And 00100b = 11100b And 00100b = 00100b, so 00100b is an unsigned integer power of 2. For another example, 11100b gives (-11100b) And 11100b = 00100b And 11100b = 00100b, so 11100b is not an unsigned integer power of 2. The result of bitwise-anding an integer with its own negative gives the lowest order 1 bit in the integer, or 0 if the integer was 0.

The following code snippet uses the` Eax `

and```
Ebx
```

registers, but any two integer registers excluding` Esp `

will
work. To sanitize the source code for inclusion in this HTML file, I changed
all "<" characters to "<", and all ">" characters to ">".
If you use this code in your program, you might want to reverse those
transformations.

; Eax contains integer to test, Ebx = scratch register. Xor Ebx,Ebx ; Eax = integer, Ebx = 0. Sub Ebx,Eax ; Eax = integer, Ebx = -integer. Jz Zero ; If -integer = 0, integer = 0. And Ebx,Eax ; Eax = integer, Ebx = (-integer) And integer. Cmp Ebx,Eax ; Eax = integer, Ebx = (-integer) And integer. Jne NotPowerOfTwo ; If integer<>(-integer) And integer, multiple 1 bits. ... ; If integer=(-integer) And integer, only one 1 bit. ; Eax contains integer to test, Ebx = lowest order 1 bit in Eax, else 0.

If you don't need the lowest order 1 bit at the end of the sequence, then you can use a shorter alternative. It is based on the fact that if you bitwise-and an integer with one less than the integer, then the result is zero only if the integer was a power of two, or if the integer was zero.

; Eax contains integer to test, Ebx = scratch register. Lea Ebx,[Eax-1] ; Eax = integer, Ebx = integer-1. And Ebx,Eax ; Eax = integer, Ebx = (integer-1) And integer. Jnz NotPowerOfTwo ; If result is non-zero, integer is not 0 or power of 2. Test Eax,Eax ; Was integer 0? Jz Zero ; Yep. ... ; Integer was a power of 2 (i.e. one and only one 1 bit was set). ; Eax contains integer to test, Ebx = integer minus lowest order 1 bit, else 0.

Double precision
unsigned integer division |

Here's a tricky algorithm I use when I need to find an exact double precision unsigned integer quotient and remainder. The place I seem to use it the most is in 3D edge clipping.

To sanitize the source code for inclusion in this HTML file, I changed all "<" characters to "<", and all ">" characters to ">". If you use this code in your program, you might want to reverse those transformations.

One simple way you might want to customize this code is to check at the beginning if both operands are single precision (right now it only checks to see if the divisor is single precision), and in that case performing only one single precision divide. I didn't do it when I wrote this code because the dividends I was dealing with at the time were almost never single precision.

; Divides the double precision quantity a1*b+a2 by c1*b+c2, where b is the base ; (in this case 2^32), and a1, a2, c1, and c2 are all in the range 0 to b-1, ; inclusive. If both c1 and c2 are zero, then a divide exception occurs. In ; all other cases, a double precision quotient and remainder are returned. ; ; By David Parker, from www.dpgraph.com/assembly.html ; ; Expects on entry: Eax=c2 ; Ebx=a2 ; Ecx=a1 ; Edx=c1 ; ; Returns on exit: Eax=lo dword of quotient. ; Ebx=lo dword of remainder. ; Ecx=hi dword of remainder. ; Edx=hi dword of quotient. ; ; The algorithm handles the division using 3 special cases. ; ; CASE 1. ; ; The first case is if c1=0. ; We want to find numbers q1 and r1 such that ; ; (a1*b+a2) = q1*c2 + r1, where 0 <= r1 < c2. ; ; We note that 0 <= q1 <= (b-1)*b+(b-1). The minimum occurs, for example, ; when a1=a2=0, and c2 = 1. The maximum occurs when a1=a2=(b-1) and c2=1. ; We first divide a1 by c2 to get q2 and r2 such that: ; ; a1 = q2*c2 + r2, where 0 <= r2 < c2. ; ; The bounds on q2 are 0 <= q2 <= (b-1), the minimum occurring at a2=0, ; the maximum at a1=(b-1), c2=1. ; We then divide r2*b+a2 by c2 to get q3 and r3 such that: ; ; r2*b + a2 = q3*c2 + r3, where 0<=r3<c2. ; ; The bounds on q3 are 0 <= q3 <= (b-1), the minimum occurring at r2=a2=0, ; the maximum at r2=(c2-1), a2=(b-1). ; Starting from (a1*b+a2), we then start substituting: ; ; (a1*b+a2) ; = (q2*c2+r2)*b + a2 ; = q2*c2*b + r2*b + a2 ; = q2*c2*b + q3*c2 + r3 ; = (q2*b+q3)*c2 + r3. ; ; Comparing this to the equation for q1 and r1, we see that since 0 <= r3 < c2, ; that q1=(q2*b+q3) and r1=r3. ; ; CASE 2. ; ; The second special case is if 1 <= c1 < (b-1). In this case, we want to find ; q1 and r1 such that: ; ; a1*b+a2 = q1*(c1*b+c2) + r1, where 0 <= r1 < (c1*b+c2). ; ; We note that 0 <= q1 <= (b-1), the minimum occurring if a1=a2=0, c1=c2=(b-1), ; and the maximum occurring when c1=1,c2=0,a1=a2=(b-1). ; First, we divide (a1*b+a2) by (c1+1) to get q2 and r2 such that: ; ; (a1*b+a2) = q2*(c1+1) + r2, where 0 <= r2 < (c1+1). ; ; Q2 may be a double precision quantity. ; Next, we divide (c1*b+c2) by (c1+1) to get q3 and r3 such that: ; ; (c1*b+c2) = q3*(c1+1) - r3, where 0 <= r3 < (c1+1). ; ; We note that 1 <= q3 <= b. ; The minus sign in front of r3 is important, so that later things come out ; positive. ; ; Next, we divide q2 by q3 (which, if q3 = b, means we don't have to divide), ; to get q4 and r4 such that: ; ; q2 = q4*q3 + r4, where 0 <= r4 < q3. ; ; Substituting in this last equation for q2 and q3, and rearranging a little ; gives us: ; ; (a1*b+a2) = q4*(c1*b+c2) + q4*r3 + r2 + r4*(c1+1). ; ; We will call this last remainder r5: r5 = q4*r3 + r2 + r4*(c1+1). ; We will now show that either q4=q1, or q4=q1-1. We note that since all the ; terms in r5 are positive, we have that q4 <= q1. To see how large r5 can ; get, we first note that since r4<q3, then: ; ; r5< q4*r3 + r2 + (c1*b+c2+r3)/(c1+1)*(c1*1), or ; r5< q4*r3 + r2 + (c1*b+c2+r3), or ; r5< (q4+1)*r3 + r2 + (c1*b+c2). ; ; Since r3 and r2 are both < (c1+1), we can substitute c1 for them to get ; ; r5< (q4+1)*c1 + c1 + (c1*b +c2), or ; r5< (q4+2)*c1 + (c1*b+c2). ******************************* ; ; Now, since q4 <= q1 and q1 <= (b-1), we have that q4 <= (b-1), so that ; ; r5 < (b+1)*c1 + (c1*b+c2), or ; r5 < c1*b + c1 + c1*b + c2, or ; r5 < 2*c1*b + c1 + c2. ; ; Finally, since c1*(b-1)+2*c2 > 0, we have ; ; r5 < 2*c1*b + c1 + c2 + c1*(b-1) + 2*c2, or ; r5 < 3*c1*b + 3*c2, or ; r5 < 3*(c1*b+c2). ; ; Thus, we must have that q1-2 <= q4 <= q1. However, we can examine more ; closely the case that q1-2=q4 to show that it never occurs. If we assume ; that q4=q1-2, then we must have that r5 >= 2*(c1*b+c2). Repeating the ; equation with the long line of asterisks, we know that ; ; r5 < (q4+2)*c1 + (c1*b+c2). ; ; If we assume that q4=q1-2, then we have ; ; r5 < q1*c1 + (c1*b+c2). ; ; We know that q1 <= (b-1), so we have ; ; r5 < (b-1)*c1 + (c1*b+c2), or ; r5 < 2*c1*b - c1 + c2. ; ; Since c1+c2 >= 0, we have that ; ; r5 < 2*c1*b - c1 + c2 + c1 + c2, or ; r5 < 2*c1*b + 2*c2, or ; r5 < 2*(c1*b+c2). ; ; This contradicts our assumption that q4=q1-2, so that case can never occur. ; It is fairly easy to construct cases where q4=q1-1, so our final result for ; the case 1 <= c1 < (b-1) is that ; ; q1-1 <= q4 <= q1, and 0 <= r5 < 2*(c1*b+c2). ; ; Thus, once we have calculated q4 and r5, we can calculate q1 and r1 by ; comparing r5 to (c1*b+c2), and if r5 >= (c1*b+c2), we increment q4 by 1 and ; decrement r5 by (c1*b+c2). ; ; There is some trickiness involved in calculating the remainder r5. First, we ; need to show that r5 never exceeds the value that can be stored as a ; double precision quantity. We can start by determining a bound on q1 in ; terms of c1. We have: ; ; q1=(a1*b+a2-r1)/(c1*b+c2) ; ; q1 will be at a maximum when the numerator is as large as possible (so that ; a1=a2=(b-1) and r1=0), and the denominator is as small as possible (so that ; c2=0). Thus the maximal value of q1 for a given value of c1 is ; ; q1=((b-1)*b+(b-1))/(c1*b) ; ; Knowing that the maximal values for r2, r3, and r4 are c1, c1, and (b-1), ; respectively, and that q4<=q1, we can plug these into the expression for r5 ; to get: ; ; r5<=((b-1)*b+(b-1))/(c1*b)*c1 + c1 + (b-1)*(c1+1), or ; r5<=(c1+2)*b-1-1/b ; ; Since the maximum possible value for c1 is (b-2), we then have ; ; r5<=b^2-1-1/b ; ; Since 1/b is positive, we can add it to the right so that ; ; r5<b^2-1 ; ; Noting that the maximum double precision value is (b-1)*b+(b-1)=b^2-1, we ; see that calculating r5 will never overflow double precision. ; ; We can make the calculation of r5 slightly more efficient by noting that the ; term q4*r3 never overflows single precision. We have to start: ; ; q4*r3<=q1*r3 ; ; Using our above limit on q1 in terms of c1, we have ; ; q4*r3<=((b-1)*b+(b-1))/(c1*b)*r3 ; ; Using the fact that 0<=r3<(c1+1), we can substitute c1 for r3 to get: ; ; q4*r3<=((b-1)*b+(b-1))/(c1*b)*c1, or ; q4*r3<=((b-1)*b+(b-1))/b, or ; q4*r3<=b-1/b. ; ; Since 1/b is positive, we have ; ; q4*r3<b, or ; q4*r3<=(b-1) ; ; Since b-1 is the maximum single precision integer, we see that q4*r3 never ; overflows single precision. ; ; CASE 3. ; ; The third case is if c1=(b-1). ; Then, since 0<=a1<=(b-1), we have one of the two possibilities: ; ; If a1=(b-1) and a2>=c2, then q1=1 and r1=a2-c2, otherwise ; ; q1=0, and r1=a1*b+a2. DivDDDD_EcxEbxDDEdxEaxDR_EdxEaxQUEcxEbxRM: Test Edx,Edx ; Is c1 zero? Jnz DivDDDD_Case2 ; Nope, go try something else. Xchg Eax,Ecx ; Yep, Eax=a1,Ebx=a2,Ecx=c2,Edx=0. Div Ecx ; Eax=q2,Ebx=a2,Ecx=c2,Edx=r2. Xchg Eax,Ebx ; Eax=a2,Ebx=q2,Ecx=c2,Edx=r2. Div Ecx ; Eax=q3,Ebx=q2,Ecx=c2,Edx=r3. Xchg Ebx,Edx ; Eax=q3,Ebx=r3,Ecx=c2,Edx=q2. Xor Ecx,Ecx ; Eax=q3,Ebx=r3,Ecx=0,Edx=q2. Retn ; Return to caller. DivDDDD_Case2: Inc Edx ; Eax=c2,Ebx=a2,Ecx=a1,Edx=c1+1. Jz DivDDDD_Case3 ; If c1=65535 (c1+1=65536=0), go to case 3. Push Esi ; Stack Esi Mov Esi,Eax ; Eax=c2,Ebx=a2,Ecx=a1,Edx=c1+1,Esi=c2. Push Edi ; Stack Edi. Mov Edi,Edx ; Eax=c2,Ebx=a2,Ecx=a1,Edx=c1+1,Esi=c2,Edi=c1+1. Push Ebp ; Stack Ebp. Xor Edx,Edx ; Eax=a1,Ebx=a2,Ecx=c2,Edx=0,Esi=c2,Edi=c1+1. Xchg Eax,Ecx ; Eax=a1,Ebx=a2,Ecx=c2,Edx=c1+1,Esi=c2,Edi=c1+1. Div Edi ; Eax=Q2,Ebx=a2,Ecx=c2,Edx=Ecx,Esi=c2,Edi=c1+1. Xchg Eax,Ebx ; Eax=a2,Ebx=Q2,Ecx=c2,Edx=Ecx,Esi=c2,Edi=c1+1. Div Edi ; Eax=q2,Ebx=Q2,Ecx=c2,Edx=r2,Esi=c2,Edi=c1+1. Xchg Eax,Ecx ; Eax=c2,Ebx=Q2,Ecx=q2,Edx=r2,Esi=c2,Edi=c1+1. Mov Ebp,Edx ; Eax=c2,Ebx=Q2,Ecx=q2,Edx=r2,Esi=c2,Edi=c1+1,Ebp=r2. Mov Edx,Edi ; Eax=c2,Ebx=Q2,Ecx=q2,Edx=c1+1,Esi=c2,Edi=c1+1,Ebp=r2. Dec Edx ; Eax=c2,Ebx=Q2,Ecx=q2,Edx=c1,Esi=c2,Edi=c1+1,Ebp=r2. Div Edi ; Eax=tq3,Ebx=Q2,Ecx=q2,Edx=tr3,Esi=c2,Edi=c1+1,Ebp=r2. Xchg Ebx,Edx ; Eax=tq3,Ebx=tr3,Ecx=q2,Edx=Q2,Esi=c2,Edi=c1+1,Ebp=r2. Xchg Ecx,Eax ; Eax=q2,Ebx=tr3,Ecx=tq3,Edx=Q2,Esi=c2,Edi=c1+1,Ebp=r2. Test Ebx,Ebx ; Is r3 zero? Jz DivDDDD_DoDiv ; Yep, no need to invert r3 or increment q3. Sub Ebx,Edi ; Eax=q2,Ebx=-r3,Ecx=tq3,Edx=Q2,Esi=c2,Edi=c1+1,Ebp=r2. Neg Ebx ; Eax=q2,Ebx=r3,Ecx=tq3,Edx=Q2,Esi=c2,Edi=c1+1,Ebp=r2. Inc Ecx ; Eax=q2,Ebx=r3,Ecx=q3,Edx=Q2,Esi=c2,Edi=c1+1,Ebp=r2. Jz DivDDDD_Rem ; If q3=65536, no need to do next division. DivDDDD_DoDiv: Div Ecx ; Eax=q4,Ebx=r3,Ecx=q3,Edx=r4,Esi=c2,Edi=c1+1,Ebp=r2. Xchg Eax,Edx ; Eax=r4,Ebx=r3,Ecx=q3,Edx=q4,Esi=c2,Edi=c1+1,Ebp=r2. DivDDDD_Rem: Xchg Eax,Ebx ; Eax=r3,Ebx=r4,Ecx=q3,Edx=q4,Esi=c2,Edi=c1+1,Ebp=r2. Mov Ecx,Edx ; Eax=r3,Ebx=r4,Ecx=q4,Edx=q4,Esi=c2,Edi=c1+1,Ebp=r2. Mul Edx ; Eax=r3*q4,Ebx=r4,Ecx=q4,Edx=0,Esi=c2,Edi=c1+1,Ebp=r2. Add Eax,Ebp ; Eax=r3*q4+r2,Ebx=r4,Ecx=q4,Edx=0,Esi=c2,Edi=c1+1. Adc Edx,Edx ; Eax=r3*q4+r2,Ebx=r4,Ecx=q4,Edx=Edx*Q4+Ecx,Esi=c2, ; Edi=c1+1. Mov Ebp,Ecx ; Eax=r3*q4+r2,Ebx=r4,Ecx=q4,Edx=Edx*Q4+Ecx,Esi=c2, ; Edi=c1+1,Ebp=q4. Mov Ecx,Edx ; Eax=r3*q4+r2,Ebx=r4,Ecx=Edx*Q4+Ecx,Edx=Edx*Q4+Ecx, ; Esi=c2,Edi=c1+1,Ebp=q4. Xchg Eax,Ebx ; Eax=r4,Ebx=r3*q4+r2,Ecx=Edx*Q4+Ecx,Edx=Edx*Q4+Ecx, ; Esi=c2,Edi=c1+1,Ebp=q4. Mul Edi ; Eax=r4*(c1+1),Ebx=r3*q4+r2,Ecx=Edx*Q4+Ecx, Xchg Eax,Ebp ; Eax=q4,Ebx=r5,Ecx=Edi,Edx=Esi*(C1+1),Esi=c2,Edi=c1+1. ; Edx=Esi*(C1+1),Esi=c2,Edi=c1+1,Ebp=q4. Dec Edi ; Eax=q4,Ebx=r5,Ecx=Edi,Edx=0,Esi=c2,Edi=c1. Add Ebx,Ebp ; Eax=r4*(c1+1),Ebx=r5,Ecx=Edx*Q4+Ecx,Edx=Esi*(C1+1), ; Esi=c2,Edi=c1+1,Ebp=q4. Adc Ecx,Edx ; Eax=r4*(c1+1),Ebx=r5,Ecx=Edi,Edx=Esi*(C1+1),Esi=c2, ; Edi=c1+1,Ebp=q4. Xor Edx,Edx ; Eax=q4,Ebx=r5,Ecx=Edi,Edx=0,Esi=c2,Edi=c1+1. Pop Ebp ; Unstack Ebp. Cmp Ecx,Edi ; Compare Edi to c1. Ja DivDDDD_DecRe ; If Edi>c1, then we need to decrease remainder. Jb DivDDDD_RemOk ; If Edi<c1, then the remainder is okay. Cmp Ebx,Esi ; Compare r5 to c2. Jb DivDDDD_RemOk ; If r5<c2, then the remainder is okay. DivDDDD_DecRe: Inc Eax ; Eax=q4+1,Ebx=r5,Ecx=Edi,Edx=0,Esi=c2,Edi=c1. Sub Ebx,Esi ; Eax=q4+1,Ebx=r5-c2,Ecx=Edi,Edx=0,Edi=c1. Sbb Ecx,Edi ; Eax=q4+1,Ebx=r5-c2,Ecx=Edi-c1,Edx=0. DivDDDD_RemOk: Pop Edi ; Unstack Edi. Pop Esi ; Unstack Esi. Retn ; Return to caller. DivDDDD_Case3: Inc Ecx ; Eax=c2,Ebx=a2,Ecx=a1+1,Edx=0. Jnz DivDDDD_Q02 ; If a1<65535, then quotient is 0. Sub Ebx,Eax ; Eax=c2,Ebx=a2-c2,Ecx=0,Edx=0. Jb DivDDDD_Q01 ; If a2<c2, then quotient is 0. Mov Eax,1 ; Eax=1,Ebx=a2-c2,Ecx=0,Edx=0. Retn ; Return to caller. DivDDDD_Q01: Add Ebx,Eax ; Eax=c2,Ebx=a2,Ecx=a1+1,Edx=0. DivDDDD_Q02: Dec Ecx ; Eax=c2,Ebx=a2,Ecx=a1,Edx=0. Mov Eax,Edx ; Eax=0,Ebx=a2,Ecx=a1,Edx=0. Retn ; Return to caller.

Saturated unsigned
integer addition |

When adding two unsigned integers, if the answer overflows we sometimes want to reset the result back to a maximum value. This is called saturated addition, and it is often used in graphics operations. For example, if we are adding two color intensities, each of which ranges from 0 to 255, and the answer exceeds 255, then we usually want to reset the sum back to 255.

Many of the newer Intel-compatible CPU's support saturated addition in hardware via MMX instructions. However, detecting and invoking the MMX instructions is often more of a hassle than it's worth, and the MMX instructions aren't even supported on many older CPU's, such as Intel's original Pentium.

Fortunately, saturated addition is fairly efficient to implement using standard instructions if the saturation value is 255 in the case of byte operations, 65535 in the case of word operations, or 2^32-1 in the case of dword operations. Here is the code for the byte case; the code is analogous for the word and dword cases:

; Al contains sum, Bl contains value to add. Add Al,Bl ; Increment sum. Sbb Bl,Bl ; Bl = 0 if no overflow (no carry), or 255 if overflow. Or Al,Bl ; Al = sum if no overflow (no carry), or 255 if overflow. ; Al contains sum or 255, Bl contains 0 or 255.

Absolute value of an
integer |

One of the classic assembly language tricks is taking the absolute value of a 2's complement integer. The naive method would be:

; Eax contains integer. Test Eax,Eax ; Is Eax negative? Jns GotAbsoluteValue ; If sign bit is 0, integer is already non-negative. Neg Eax ; Negate a negative integer to get a positive integer. GotAbsoluteValue: ; Eax contains abs(integer).

This method is good for clarity and for using the minimum number of registers, but is bad for speed because of the test and jump. The classic trick is:

; Eax contains integer. Cdq ; Edx = -1 if Eax negative, Edx = 0 otherwise. Xor Eax,Edx ; Eax = -integer-1 if Eax negative, Eax = integer otherwise. Sub Eax,Edx ; Eax = -integer if Eax negative, Eax = integer otherwise. ; Eax contains abs(integer), Edx = -1 if Eax was negative, Edx = 0 otherwise.

This method is bad for clarity (although most experienced assembly
language programmers would recognize it as a common idiom) and for register
usage, but is good for speed because it gets rid of the test and jump. Since
the` Cdq `

instruction works only with the` Eax `

and```
Edx
```

registers, a variation that will work with any pair of registers
is:

; Eax contains integer. Mov Edx,Eax ; Copy integer. Sar Edx,31 ; Edx = -1 if Eax negative, Edx = 0 otherwise. Xor Eax,Edx ; Eax = -integer-1 if Eax negative, Eax = integer otherwise. Sub Eax,Edx ; Eax = -integer if Eax negative, Eax = integer otherwise. ; Eax contains abs(integer), Edx = -1 if Eax was negative, Edx = 0 otherwise.

And as a final thought, sometimes the full absolute value isn't needed. See, for example, Divide a signed dividend by an unsigned divisor. For that tip/trick, if the integer is positive or zero we want to leave the integer unchanged (just as for the absolute value). However, if the integer is negative what we want is the absolute value minus 1. So, we need only the first three instructions above:

; Eax contains integer. Mov Edx,Eax ; Copy integer. Sar Edx,31 ; Edx = -1 if Eax negative, Edx = 0 otherwise. Xor Eax,Edx ; Eax = -integer-1 if Eax negative, Eax = integer otherwise. ; Eax contains result, Edx = -1 if Eax was negative, Edx = 0 otherwise.

Convert from binary to
ASCII hex |

Here's a clever trick that I read about in the
(http://asmjournal.freeservers.com/ broken link) Assembly Programming Journal,
February/March 1999. Suppose you have a binary number from 0h to 0Fh
in` Al`

, and you want to convert it to an ASCII character from "0" to
"9" or "A" to "F". The following sequence of four instructions (six bytes) will
do it:

; Al is from 0h to 0Fh. Add Al,90h ; Al is from 90h to 09Fh. Daa ; Al is from 90h to 99h (carry clear) or 00h to 06h (carry set). Adc Al,40h ; Al is from 0D0h to 0D9h or 41h to 46h. Daa ; Al is from 30h to 39h ("0" to "9") or 41h to 46h ("A" to "F").

Divide a signed
dividend by an unsigned divisor |

Suppose you want to divide a signed dividend (numerator) by a non-zero unsigned divisor (denominator), to get a signed quotient and a non-negative remainder. In other words, given Num and Den where

-2^31 <= Num <= 2^31-1 and 0 < Den <= 2^32-1

you would like to calculate Quo and Rem such that

Num = Quo*Den + Rem and 0 <= Rem < Den.

Unfortunately, the` Idiv `

instruction can't do this kind of
division, because if Num is negative,` Idiv `

will return a negative
or zero remainder. Also, the` Idiv `

instruction would require
narrowing the range of Den to 0 < Den <= 2^31-1.

Here is the algorithm I use. The basic idea is to convert Num to a positive
value if it is negative, perform unsigned division to obtain an intermediate
Quo' and Rem', and then correct Quo' and Rem' if Num was originally negative.
The algorithm involves no jump instructions, and so won't break the pipeline.
It requires two extra registers in addition to` Eax `

and```
Edx
```

. The two extra registers can be any two of` Ebx`

,```
Ecx
```

,` Esi`

,` Edi`

, and` Ebp`

. In the
following code snippet, I use` Esi `

and` Edi`

.

; Eax = Num; Esi = Den. Mov Edi,Eax ; Copy Num. Sar Edi,31 ; If Num < 0 then Edi = -1 else Edi = 0. Xor Edx,Edx ; Zero hi dword of dividend. Xor Eax,Edi ; If Num < 0 then Eax = -Num-1 else Eax = Num. Div Esi ; If Num < 0 then Quo'*Den + Rem' = -Num-1 ; else Quo'*Den + Rem' = Num. Xor Eax,Edi ; If Num < 0 then Eax=Quo=-Quo'-1 else Eax=Quo=Quo'. Xor Edx,Edi ; If Num < 0 then Edx = -Rem'-1 else Edx = Rem'. And Edi,Esi ; If Num < 0 then Edi = Den else Edi = 0. Add Edx,Edi ; If Num < 0 then Edx=Rem=Den-Rem'-1 else Edx=Rem=Rem'. ; Eax = Quo; Edx = Rem; Esi = Den; Edi = Den if Num < 0 else Edi = 0.

You can save a few cycles if you know that -2^15 <= Num <= 2^15-1 and 0
< Den <= 2^16-1 (i.e. that Num is a signed 16-bit integer and Den is a
non-zero unsigned 16-bit integer). If so, change` Div Esi `

to```
Div Si
```

to use faster 16-bit division, and leave the other instructions
unchanged.

To show that the algorithm works as advertised, first note that in the case
of a non-negative Num the algorithm reduces to simple unsigned division so there
is nothing to prove. In the case of a negative Num, we first note that -Num-1
is non-negative if Num is negative. Hence, the result of the```
Div
```

instruction is a pair of non-negative intermediate numbers Quo' and Rem'
which satisfy

-Num-1 = Quo'*Den + Rem' and 0 <= Rem' < Den.

We then calculate the true Quo and Rem using

Quo = -Quo'-1 and Rem = Den-Rem'-1.

Solving for Quo' and Rem' in terms of Quo and Rem, and plugging into the equation for -Num-1 gives

-Num-1 = (-Quo-1)*Den + (Den-Rem-1)

which simplifies to

Num = Quo*Den + Rem

Since 0 <= Rem' < Den, then 0 <= Den-Rem'-1 < Den, and so 0 <= Rem < Den, which shows that the algorithm works as advertised.

Use the parity flag to
test three values with one instruction |

One bit can store two values: 0b or 1b. Two bits can store four values: 00b, 01b, 10b, or 11b. Sometimes, however, you only need to store three values. If so, you can take advantage of the parity flag to check for the three values using a single instruction. If you let the three values be 00b, 11b, and 01b (or 10b), then you can use the Jpo (jump on parity odd) instruction to do something like:

Test Al,11b ; Test bottom two bits of Al. Jpo Value01bOr10b ; 01b or 10b in bottom two bits. Jnz Value11b ; 11b in bottom two bits. ... ; Fall through to here if 00b in bottom two bits.

Note that the order of the jump instructions is important, and also note the
Parity flag gotcha below -
both of the bits you are testing must be in the same byte. Putting the value
into the` Al `

register saves a byte of code, because testing
the` Al `

register takes only two bytes of code, while testing any
other eight-bit register takes three bytes of code. You could have used a```
Test Eax,03h
```

instruction, but that takes five bytes of code, and testing
any other 32-bit register takes six bytes of code.

Parity flag
gotcha |

Instructions that alter the parity flag clear it to 0 if there are an odd
number of bits in the lowest order byte, and set it to 1 if there are an even
number of bits in the lowest order byte. That may seem backwards, but the flag
is technically a "parity even" flag so it gets set if there are an even number
of bits in the lowest order byte. The gotcha, however, is in the words "lowest
order byte". Suppose the` Eax `

register contains the value```
0101h
```

(i.e. 1 in the low order byte, and 1 in the second lowest order
byte). If you test the whole` Eax `

register using the```
Test
Eax,Eax
```

instruction, you might expect that the parity flag would get set
to 1 because there are an even number of bits in the` Eax `

register.
However, the parity flag gets cleared to 0 because the parity flag depends only
on the number of bits in the lowest order byte. This is always true for any
instruction that alters the parity flag; only the bits in the lowest order byte
affect the parity flag. So, when using the parity flag for a test, it is often
best to use eight-bit registers or single byte memory operands.

Copyright © 1997-2013 by David Parker. All rights reserved.