════════════════════════════════════════════════════════════════════════════
Advanced MS-DOS Programming

The Microsoft(R) Guide for Assembly Language and C Programmers

By Ray Duncan
════════════════════════════════════════════════════════════════════════════

PUBLISHED BY
Microsoft Press
A Division of Microsoft Corporation
16011 NE 36th Way, Box 97017, Redmond, Washington 98073-9717

Copyright (C) 1986, 1988 by Ray Duncan
Published 1986. Second edition 1988.
All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means without the written permission of the publisher.

Library of Congress Cataloging in Publication Data
Duncan, Ray, 1952-
Advanced MS-DOS programming.
Rev. ed. of: Advanced MS-DOS. (C)1986.
Includes index.
1. MS-DOS (Computer operating system) 2. Assembler language (Computer program language) 3. C (Computer program language)
I. Duncan, Ray, 1952- Advanced MS-DOS. II. Title.
QA76.76.O63D858 1988 005.4'46 88-1251
ISBN 1-55615-157-8

Printed and bound in the United States of America.
1 2 3 4 5 6 7 8 9 FGFG 3 2 1 0 9 8

Distributed to the book trade in the United States by Harper & Row.
Distributed to the book trade in Canada by General Publishing Company, Ltd.
Penguin Books Ltd., Harmondsworth, Middlesex, England
Penguin Books Australia Ltd., Ringwood, Victoria, Australia
Penguin Books N.Z. Ltd., 182-190 Wairau Road, Auckland 10, New Zealand
British Cataloging in Publication Data available

IBM(R), PC/AT(R), and PS/2(R) are registered trademarks of International Business Machines Corporation. CodeView(R), Microsoft(R), MS-DOS(R), and XENIX(R) are registered trademarks and InPort(TM) is a trademark of Microsoft Corporation.
────────────────────────────────────────────────────────────────────────────
Technical Editor: Mike Halvorson
Production Editor: Mary Ann Jones
────────────────────────────────────────────────────────────────────────────

Dedication

For Carolyn

────────────────────────────────────────────────────────────────────────────

Contents

Road Map to Figures and Tables
Acknowledgments
Introduction

SECTION 1  PROGRAMMING FOR MS-DOS
  Chapter 1   Genealogy of MS-DOS
  Chapter 2   MS-DOS in Operation
  Chapter 3   Structure of MS-DOS Application Programs
  Chapter 4   MS-DOS Programming Tools
  Chapter 5   Keyboard and Mouse Input
  Chapter 6   Video Display
  Chapter 7   Printer and Serial Port
  Chapter 8   File Management
  Chapter 9   Volumes and Directories
  Chapter 10  Disk Internals
  Chapter 11  Memory Management
  Chapter 12  The EXEC Function
  Chapter 13  Interrupt Handlers
  Chapter 14  Installable Device Drivers
  Chapter 15  Filters
  Chapter 16  Compatibility and Portability

SECTION 2  MS-DOS FUNCTIONS REFERENCE
SECTION 3  IBM ROM BIOS AND MOUSE FUNCTIONS REFERENCE
SECTION 4  LOTUS/INTEL/MICROSOFT EMS FUNCTIONS REFERENCE

Index

────────────────────────────────────────────────────────────────────────────

Road Map to Figures and Tables

  MS-DOS versions and release dates
  MS-DOS memory map
  Structure of program segment prefix (PSP)
  Structure of .EXE load module
  Register conditions at program entry
  Segments, groups, and classes
  Macro Assembler switches
  C Compiler switches
  Linker switches
  MAKE switches
  ANSI escape sequences
  Video attributes
  Structure of normal file control block (FCB)
  Structure of extended file control block
  MS-DOS error codes
  Structure of boot sector
  Structure of directory entry
  Structure of fixed-disk master block
  LIM EMS error codes
  Intel 80x86 internal interrupts (faults)
  Intel 80x86, MS-DOS, and ROM BIOS interrupts
  Device-driver attribute word
  Device-driver command codes
  Structure of BIOS parameter block (BPB)
  Media descriptor byte

────────────────────────────────────────────────────────────────────────────
Acknowledgments

My renewed thanks to the outstanding editors and production staff at Microsoft Press, who make beautiful books happen, and to the talented Microsoft developers, who create great programs to write books about. Special thanks to Mike Halvorson, Jeff Hinsch, Mary Ann Jones, Claudette Moore, Dori Shattuck, and Mark Zbikowski; if this book has anything unique to offer, these people deserve most of the credit.

────────────────────────────────────────────────────────────────────────────

Introduction

Advanced MS-DOS Programming is written for the experienced C or assembly-language programmer. It provides all the information you need to write robust, high-performance applications under the MS-DOS operating system. Because I believe that working, well-documented programs are unbeatable learning tools, I have included detailed programming examples throughout, including complete utility programs that you can adapt to your own needs.

This book is both a tutorial and a reference and is divided into four sections, so that you can find information more easily. Section 1 discusses MS-DOS capabilities and services by functional group in the context of common programming issues, such as user input, control of the display, memory management, and file handling. Special classes of programs, such as interrupt handlers, device drivers, and filters, have their own chapters.

Section 2 provides a complete reference guide to MS-DOS function calls, organized so that you can see the calling sequence, results, and version dependencies of each function at a glance. I have also included notes, where relevant, about quirks and special uses of functions as well as cross-references to related functions. An assembly-language example is included for each entry in Section 2.

Sections 3 and 4 are references to IBM ROM BIOS, Microsoft Mouse driver, and Lotus/Intel/Microsoft Expanded Memory Specification functions.
The entries in these two sections have the same form as in Section 2, except that individual programming examples have been omitted.

The programs in this book were written with the marvelous Brief editor from Solution Systems and assembled or compiled with Microsoft Macro Assembler version 5.1 and Microsoft C Compiler version 5.1. They have been tested under MS-DOS versions 2.1, 3.1, 3.3, and 4.0 on an 8088-based IBM PC, an 80286-based IBM PC/AT, and an 80386-based IBM PS/2 Model 80. As far as I am aware, they do not contain any software or hardware dependencies that will prevent them from running properly on any IBM PC-compatible machine running MS-DOS version 2.0 or later.

Changes from the First Edition

Readers who are familiar with the first edition will find many changes in the second edition, but the general structure of the book remains the same. Most of the material comparing MS-DOS to CP/M and UNIX/XENIX has been removed; although these comparisons were helpful a few years ago, MS-DOS has become its own universe and deserves to be considered on its own terms.

The previously monolithic chapter on character devices has been broken into three more manageable chapters focusing on the keyboard and mouse, the display, and the serial port and printer. Hardware-dependent video techniques have been de-emphasized; although this topic is more important than ever, it has grown so complex that it requires a book of its own.

A new chapter discusses compatibility and portability of MS-DOS applications and also contains a brief introduction to Microsoft OS/2, the new multitasking, protected-mode operating system.

A road map to vital figures and tables has been added, following the Table of Contents, to help you quickly locate the layouts of the program segment prefix, file control block, and the like.
The reference sections at the back of the book have been extensively updated and enlarged and are now complete through MS-DOS version 4.0, the IBM PS/2 Model 80 ROM BIOS and the VGA video adapter, the Microsoft Mouse driver version 6.0, and the Lotus/Intel/Microsoft Expanded Memory Specification version 4.0.

In the two years since Advanced MS-DOS Programming was first published, hundreds of readers have been kind enough to send me their comments, and I have tried to incorporate many of their suggestions in this new edition. As before, please feel free to contact me via MCI Mail (user name LMI), CompuServe (user ID 72406,1577), or BIX (user name rduncan).

Ray Duncan
Los Angeles, California
September 1988

────────────────────────────────────────────────────────────────────────────
SECTION 1  PROGRAMMING FOR MS-DOS
────────────────────────────────────────────────────────────────────────────

Chapter 1  Genealogy of MS-DOS

In only seven years, MS-DOS has evolved from a simple program loader into a sophisticated, stable operating system for personal computers that are based on the Intel 8086 family of microprocessors (Figure 1-1). MS-DOS supports networking, graphical user interfaces, and storage devices of every description; it serves as the platform for thousands of application programs; and it has over 10 million licensed users, dwarfing the combined user bases of all of its competitors.

The progenitor of MS-DOS was an operating system called 86-DOS, which was written by Tim Paterson for Seattle Computer Products in mid-1980. At that time, Digital Research's CP/M-80 was the operating system most commonly used on microcomputers based on the Intel 8080 and Zilog Z-80 microprocessors, and a wide range of application software (word processors, database managers, and so forth) was available for use with CP/M-80.
To ease the process of porting 8-bit CP/M-80 applications into the new 16-bit environment, 86-DOS was originally designed to mimic CP/M-80 in both available functions and style of operation. Consequently, the structures of 86-DOS's file control blocks, program segment prefixes, and executable files were nearly identical to those of CP/M-80. Existing CP/M-80 programs could be converted mechanically (by processing their source-code files through a special translator program) and, after conversion, would run under 86-DOS either immediately or with very little hand editing. Because 86-DOS was marketed as a proprietary operating system for Seattle Computer Products' line of S-100 bus, 8086-based microcomputers, it made very little impact on the microcomputer world in general. Other vendors of 8086-based microcomputers were understandably reluctant to adopt a competitor's operating system and continued to wait impatiently for the release of Digital Research's CP/M-86. In October 1980, IBM approached the major microcomputer-software houses in search of an operating system for the new line of personal computers it was designing. Microsoft had no operating system of its own to offer (other than a stand-alone version of Microsoft BASIC) but paid a fee to Seattle Computer Products for the right to sell Paterson's 86-DOS. (At that time, Seattle Computer Products received a license to use and sell Microsoft's languages and all 8086 versions of Microsoft's operating system.) In July 1981, Microsoft purchased all rights to 86-DOS, made substantial alterations to it, and renamed it MS-DOS. When the first IBM PC was released in the fall of 1981, IBM offered MS-DOS (referred to as PC-DOS 1.0) as its primary operating system. IBM also selected Digital Research's CP/M-86 and Softech's P-system as alternative operating systems for the PC. 
However, they were both very slow to appear at IBM PC dealers and suffered the additional disadvantages of higher prices and lack of available programming languages. IBM threw its considerable weight behind PC-DOS by releasing all the IBM-logo PC application software and development tools to run under it. Consequently, most third-party software developers targeted their products for PC-DOS from the start, and CP/M-86 and P-system never became significant factors in the IBM PC-compatible market.

In spite of some superficial similarities to its ancestor CP/M-80, MS-DOS version 1.0 contained a number of improvements over CP/M-80, including the following:

■ An improved disk-directory structure that included information about a file's attributes (such as whether it was a system or a hidden file), its exact size in bytes, and the date that the file was created or last modified
■ A superior disk-space allocation and management method, allowing extremely fast sequential or random record access and program loading
■ An expanded set of operating-system services, including hardware-independent function calls to set or read the date and time, a filename parser, multiple-block record I/O, and variable record sizes
■ An AUTOEXEC.BAT batch file to perform a user-defined series of commands when the system was started or reset

IBM was the only major computer manufacturer (sometimes referred to as OEM, for original equipment manufacturer) to ship MS-DOS version 1.0 (as PC-DOS 1.0) with its products. MS-DOS version 1.25 (equivalent to IBM PC-DOS 1.1) was released in June 1982 to fix a number of bugs and also to support double-sided disks and improved hardware independence in the DOS kernel. This version was shipped by several vendors besides IBM, including Texas Instruments, COMPAQ, and Columbia, who all entered the personal computer market early. Due to rapid decreases in the prices of RAM and fixed disks, MS-DOS version 1 is no longer in common use.
MS-DOS version 2.0 (equivalent to PC-DOS 2.0) was first released in March 1983. It was, in retrospect, a new operating system (though great care was taken to maintain compatibility with MS-DOS version 1). It contained many significant innovations and enhanced features, including the following:

■ Support for both larger-capacity floppy disks and hard disks
■ Many UNIX/XENIX-like features, including a hierarchical file structure, file handles, I/O redirection, pipes, and filters
■ Background printing (print spooling)
■ Volume labels, plus additional file attributes
■ Installable device drivers
■ A user-customizable system-configuration file that controlled the loading of additional device drivers, the number of system disk buffers, and so forth
■ Maintenance of environment blocks that could be used to pass information between programs
■ An optional ANSI display driver that allowed programs to position the cursor and control display characteristics in a hardware-independent manner
■ Support for the dynamic allocation, modification, and release of memory by application programs
■ Support for customized user command interpreters (shells)
■ System tables to assist application software in modifying its currency, time, and date formats (known as international support)

MS-DOS version 2.11 was subsequently released to improve international support (table-driven currency symbols, date formats, decimal-point symbols, currency separators, and so forth), to add support for 16-bit Kanji characters throughout, and to fix a few minor bugs. Version 2.11 rapidly became the base version shipped for 8086/8088-based personal computers by every major OEM, including Hewlett-Packard, Wang, Digital Equipment Corporation, Texas Instruments, COMPAQ, and Tandy.

MS-DOS version 2.25, released in October 1985, was distributed in the Far East but was never shipped by OEMs in the United States and Europe.
In this version, the international support for Japanese and Korean character sets was extended even further, additional bugs were repaired, and many of the system utilities were made compatible with MS-DOS version 3.0.

MS-DOS version 3.0 was introduced by IBM in August 1984 with the release of the 80286-based PC/AT machines. It represented another major rewrite of the entire operating system and included the following important new features:

■ Direct control of the print spooler by application software
■ Further expansion of international support for currency formats
■ Extended error reporting, including a code that suggests a recovery strategy to the application program
■ Support for file and record locking and sharing
■ Support for larger fixed disks

MS-DOS version 3.1, which was released in November 1984, added support for the sharing of files and printers across a network. Beginning with version 3.1, a new operating-system module called the redirector intercepts an application program's requests for I/O and filters out the requests that are directed to network devices, passing these requests to another machine for processing.

Since version 3.1, the changes to MS-DOS have been evolutionary rather than revolutionary. Version 3.2, which appeared in 1986, generalized the definition of device drivers so that new media types (such as 3.5-inch floppy disks) could be supported more easily. Version 3.3 was released in 1987, concurrently with the new IBM line of PS/2 personal computers, and drastically expanded MS-DOS's multilanguage support for keyboard mappings, printer character sets, and display fonts. Version 4.0, delivered in 1988, was enhanced with a visual shell as well as support for very large file systems.

While MS-DOS has been evolving, Microsoft has also put intense efforts into the areas of user interfaces and multitasking operating systems.
Microsoft Windows, first shipped in 1985, provides a multitasking, graphical user "desktop" for MS-DOS systems. Windows has won widespread support among developers of complex graphics applications such as desktop publishing and computer-aided design because it allows their programs to take full advantage of whatever output devices are available without introducing any hardware dependence. Microsoft Operating System/2 (MS OS/2), released in 1987, represents a new standard for application developers: a protected-mode, multitasking, virtual-memory system specifically designed for applications requiring high-performance graphics, networking, and interprocess communications. Although MS OS/2 is a new product and is not a derivative of MS-DOS, its user interface and file system are compatible with MS-DOS and Microsoft Windows, and it offers the ability to run one real-mode (MS-DOS) application alongside MS OS/2 protected-mode applications. This compatibility allows users to move between the MS-DOS and OS/2 environments with a minimum of difficulty. 
MS-DOS 1.0 / PC-DOS 1.0     1981: First operating system on IBM PC
  │
MS-DOS 1.25 / PC-DOS 1.1    Double-sided disk support and bug fixes added;
  │                         widely distributed by OEMs other than IBM
  │
MS-DOS 2.0 / PC-DOS 2.0     1983: Introduced with IBM PC/XT; support for
  │                         UNIX/XENIX-like hierarchical file structure and
  │                         hard disks added
  ├── PC-DOS 2.1            Introduced with PCjr; 2.0 with bug fixes
  │
MS-DOS 2.01                 2.0 with international support
  │
MS-DOS 2.11                 2.01 with bug fixes
  ├── MS-DOS 2.25           1985: Far East OEMs; support for extended
  │                         character sets
  │
MS-DOS 3.0 / PC-DOS 3.0     1984: Introduced with PC/AT; support for 1.2 MB
  │                         floppy disk, larger hard disk added
  │
MS-DOS 3.1 / PC-DOS 3.1     Support for Microsoft Networks added
  │
MS-DOS 3.2 / PC-DOS 3.2     1986: Support for 3.5-inch disks added
  │
MS-DOS 3.3 / PC-DOS 3.3     1987: Introduced with IBM PS/2; generalized
  │                         code-page (font) support
  │
MS-DOS 4.0 / PC-DOS 4.0     1988: Support for logical volumes larger than
                            32 MB; visual shell

Windows 1.0                 1985: Graphical user interface for MS-DOS
  │
Windows 2.0                 1987: Compatibility with OS/2 Presentation Manager

Figure 1-1. The evolution of MS-DOS.

What does the future hold for MS-DOS? Only the long-range planning teams at Microsoft and IBM know for sure.
But it seems safe to assume that MS-DOS, with its relatively small memory requirements, adaptability to diverse hardware configurations, and enormous base of users, will remain important to programmers and software publishers for years to come.

────────────────────────────────────────────────────────────────────────────

Chapter 2  MS-DOS in Operation

It is unlikely that you will ever be called upon to configure the MS-DOS software for a new model of computer. Still, an acquaintance with the general structure of MS-DOS can often be very helpful in understanding the behavior of the system as a whole. In this chapter, we will discuss how MS-DOS is organized and how it is loaded into memory when the computer is turned on.

The Structure of MS-DOS

MS-DOS is partitioned into several layers that serve to isolate the kernel logic of the operating system, and the user's perception of the system, from the hardware it is running on. These layers are

■ The BIOS (Basic Input/Output System)
■ The DOS kernel
■ The command processor (shell)

We'll discuss the functions of each of these layers separately.

The BIOS Module

The BIOS is specific to the individual computer system and is provided by the manufacturer of the system. It contains the default resident hardware-dependent drivers for the following devices:

■ Console display and keyboard (CON)
■ Line printer (PRN)
■ Auxiliary device (AUX)
■ Date and time (CLOCK$)
■ Boot disk device (block device)

The MS-DOS kernel communicates with these device drivers through I/O request packets; the drivers then translate these requests into the proper commands for the various hardware controllers. In many MS-DOS systems, including the IBM PC, the most primitive parts of the hardware drivers are located in read-only memory (ROM) so that they can be used by stand-alone applications, diagnostics, and the system startup program.
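To make the idea of an I/O request packet more concrete, here is a minimal C sketch of the fixed header that begins each packet. The 13-byte layout and the status bits are given as I understand the standard MS-DOS driver interface and should be checked against Chapter 14; the type and function names (request_header, pack_request_header, request_error_code) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed size of the fixed header at the start of every request packet. */
enum { REQ_HEADER_SIZE = 13 };

/* Assumed bits that a driver sets in the status word before returning. */
enum {
    REQ_STATUS_ERROR = 0x8000,  /* bit 15: request failed    */
    REQ_STATUS_BUSY  = 0x0200,  /* bit 9:  device busy       */
    REQ_STATUS_DONE  = 0x0100   /* bit 8:  request completed */
};

/* Hypothetical host-side view of the header fields. */
typedef struct {
    uint8_t  length;   /* total length of the packet in bytes */
    uint8_t  unit;     /* unit number (block devices only)    */
    uint8_t  command;  /* command code; 0 requests driver initialization */
    uint16_t status;   /* filled in by the driver             */
} request_header;

/* Write the header in its in-memory byte order: three single bytes, a
   little-endian status word, then 8 reserved bytes that are left zero. */
static void pack_request_header(const request_header *rh,
                                uint8_t out[REQ_HEADER_SIZE])
{
    assert(rh != NULL && out != NULL);
    memset(out, 0, REQ_HEADER_SIZE);
    out[0] = rh->length;
    out[1] = rh->unit;
    out[2] = rh->command;
    out[3] = (uint8_t)(rh->status & 0xFF);
    out[4] = (uint8_t)(rh->status >> 8);
}

/* Return the driver's error code (low byte of the status word),
   or -1 when the error bit is clear. */
static int request_error_code(uint16_t status)
{
    return (status & REQ_STATUS_ERROR) ? (int)(status & 0x00FF) : -1;
}
```

A caller would fill in a request_header, pack it into a 13-byte buffer, and append any command-specific fields after the header. The sketch models only the data layout; it does not show how the kernel actually invokes a driver.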
The terms resident and installable are used to distinguish between the drivers built into the BIOS and the drivers installed during system initialization by DEVICE commands in the CONFIG.SYS file. (Installable drivers will be discussed in more detail later in this chapter and in Chapter 14.)

The BIOS is read into random-access memory (RAM) during system initialization as part of a file named IO.SYS. (In PC-DOS, the file is called IBMBIO.COM.) This file is marked with the special attributes hidden and system.

The DOS Kernel

The DOS kernel implements MS-DOS as it is seen by application programs. The kernel is a proprietary program supplied by Microsoft Corporation and provides a collection of hardware-independent services called system functions. These functions include the following:

■ File and record management
■ Memory management
■ Character-device input/output
■ Spawning of other programs
■ Access to the real-time clock

Programs can access system functions by loading registers with function-specific parameters and then transferring to the operating system by means of a software interrupt.

The DOS kernel is read into memory during system initialization from the MSDOS.SYS file on the boot disk. (The file is called IBMDOS.COM in PC-DOS.) This file is marked with the attributes hidden and system.

The Command Processor

The command processor, or shell, is the user's interface to the operating system. It is responsible for parsing and carrying out user commands, including the loading and execution of other programs from a disk or other mass-storage device.

The default shell that is provided with MS-DOS is found in a file called COMMAND.COM. Although COMMAND.COM prompts and responses constitute the ordinary user's complete perception of MS-DOS, it is important to realize that COMMAND.COM is not the operating system, but simply a special class of program running under the control of MS-DOS.
COMMAND.COM can be replaced with a shell of the programmer's own design by simply adding a SHELL directive to the system-configuration file (CONFIG.SYS) on the system startup disk. The product COMMAND-PLUS from ESP Systems is an example of such an alternative shell.

More about COMMAND.COM

The default MS-DOS shell, COMMAND.COM, is divided into three parts:

■ A resident portion
■ An initialization section
■ A transient module

The resident portion is loaded in lower memory, above the DOS kernel and its buffers and tables. It contains the routines to process Ctrl-C and Ctrl-Break, critical errors, and the termination (final exit) of other transient programs. This part of COMMAND.COM issues error messages and is responsible for the familiar prompt

    Abort, Retry, Ignore?

The resident portion also contains the code required to reload the transient portion of COMMAND.COM when necessary.

The initialization section of COMMAND.COM is loaded above the resident portion when the system is started. It processes the AUTOEXEC.BAT batch file (the user's list of commands to execute at system startup), if one is present, and is then discarded.

The transient portion of COMMAND.COM is loaded at the high end of memory, and its memory can also be used for other purposes by application programs. The transient module issues the user prompt, reads the commands from the keyboard or batch file, and causes them to be executed. When an application program terminates, the resident portion of COMMAND.COM does a checksum of the transient module to determine whether it has been destroyed and fetches a fresh copy from the disk if necessary.

The user commands that are accepted by COMMAND.COM fall into three categories:

■ Internal commands
■ External commands
■ Batch files

Internal commands, sometimes called intrinsic commands, are those carried out by code embedded in COMMAND.COM itself. Commands in this category include COPY, REN(AME), DIR(ECTORY), and DEL(ETE).
The routines for the internal commands are included in the transient part of COMMAND.COM.

External commands, sometimes called extrinsic commands or transient programs, are the names of programs stored in disk files. Before these programs can be executed, they must be loaded from the disk into the transient program area (TPA) of memory. (See "How MS-DOS Is Loaded" in this chapter.) Familiar examples of external commands are CHKDSK, BACKUP, and RESTORE. As soon as an external command has completed its work, it is discarded from memory; hence, it must be reloaded from disk each time it is invoked.

Batch files are text files that contain lists of other intrinsic, extrinsic, or batch commands. These files are processed by a special interpreter that is built into the transient portion of COMMAND.COM. The interpreter reads the batch file one line at a time and carries out each of the specified operations in order.

In order to interpret a user's command, COMMAND.COM first looks to see if the user typed the name of a built-in (intrinsic) command that it can carry out directly. If not, it searches for an external command (executable program file) or batch file by the same name. The search is carried out first in the current directory of the current disk drive and then in each of the directories specified in the most recent PATH command. In each directory inspected, COMMAND.COM first tries to find a file with the extension .COM, then .EXE, and finally .BAT. If the search fails for all three file types in all of the possible locations, COMMAND.COM displays the familiar message

    Bad command or file name

If a .COM file or a .EXE file is found, COMMAND.COM uses the MS-DOS EXEC function to load and execute it. The EXEC function builds a special data structure called a program segment prefix (PSP) above the resident portion of COMMAND.COM in the transient program area. The PSP contains various linkages and pointers needed by the application program.
Next, the EXEC function loads the program itself, just above the PSP, and performs any relocation that may be necessary. Finally, it sets up the registers appropriately and transfers control to the entry point for the program. (Both the PSP and the EXEC function will be discussed in more detail in Chapters 3 and 12.) When the transient program has finished its job, it calls a special MS-DOS termination function that releases the transient program's memory and returns control to the program that caused the transient program to be loaded (COMMAND.COM, in this case).

A transient program has nearly complete control of the system's resources while it is executing. The only other tasks that are accomplished are those performed by interrupt handlers (such as the keyboard input driver and the real-time clock) and operations that the transient program requests from the operating system. MS-DOS does not support sharing of the central processor among several tasks executing concurrently, nor can it wrest control away from a program when it crashes or executes for too long. Such capabilities are the province of MS OS/2, which is a protected-mode system with preemptive multitasking (time-slicing).

How MS-DOS Is Loaded

When the system is started or reset, program execution begins at address 0FFFF0H. This is a feature of the 8086/8088 family of microprocessors and has nothing to do with MS-DOS. Systems based on these processors are designed so that address 0FFFF0H lies within an area of ROM and contains a jump machine instruction to transfer control to system test code and the ROM bootstrap routine (Figure 2-1).

The ROM bootstrap routine reads the disk bootstrap routine from the first sector of the system startup disk (the boot sector) into memory at some arbitrary address and then transfers control to it (Figure 2-2). (The boot sector also contains a table of information about the disk format.)

The disk bootstrap routine checks to see if the disk contains a copy of MS-DOS.
It does this by reading the first sector of the root directory and determining whether the first two files are IO.SYS and MSDOS.SYS (or IBMBIO.COM and IBMDOS.COM), in that order. If these files are not present, the user is prompted to change disks and strike any key to try again.

        ┌──────────────────────────────┐
        │ ROM bootstrap routine        │
        ├──────────────────────────────┤
        │                              │
        ├──────────────────────────────┤ ◄ Top of RAM
        │                              │
        :                              :
        │                              │
00400H  ├──────────────────────────────┤
        │ Interrupt vectors            │
00000H  └──────────────────────────────┘

Figure 2-1. A typical 8086/8088-based computer system immediately after system startup or reset. Execution begins at location 0FFFF0H, which contains a jump instruction that directs program control to the ROM bootstrap routine.

        ┌──────────────────────────────┐
        │ ROM bootstrap routine        │
        ├──────────────────────────────┤
        │                              │
        ├──────────────────────────────┤ ◄ Top of RAM
        │                              │
        ├──────────────────────────────┤
        │ Disk bootstrap routine       │
        ├──────────────────────────────┤ ◄ Arbitrary load location
        │                              │
        :                              :
        │                              │
00400H  ├──────────────────────────────┤
        │ Interrupt vectors            │
00000H  └──────────────────────────────┘

Figure 2-2. The ROM bootstrap routine loads the disk bootstrap routine into memory from the first sector of the system startup disk and then transfers control to it.

If the two system files are found, the disk bootstrap reads them into memory and transfers control to the initial entry point of IO.SYS (Figure 2-3).
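The disk bootstrap's check for the system files can be modeled in a few lines of C. This is only a sketch under stated assumptions: it takes directory entries to be 32 bytes long with the filename stored in the first 11 bytes as an 8-character name and 3-character extension, both padded with spaces (the directory-entry figure listed in the road map is the authority on that layout). The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Assumed directory-entry geometry: 32-byte entries, 8+3 name field. */
enum { DIR_ENTRY_SIZE = 32, DIR_NAME_LEN = 11 };

/* Return 1 if the first two entries in the root-directory sector carry
   the given 11-character names, in that order; otherwise return 0. */
static int first_two_match(const unsigned char *sector,
                           const char *first, const char *second)
{
    assert(strlen(first) == DIR_NAME_LEN);
    assert(strlen(second) == DIR_NAME_LEN);
    return memcmp(sector, first, DIR_NAME_LEN) == 0 &&
           memcmp(sector + DIR_ENTRY_SIZE, second, DIR_NAME_LEN) == 0;
}

/* Accept either the MS-DOS or the PC-DOS pair of system files. */
static int has_system_files(const unsigned char *sector)
{
    return first_two_match(sector, "IO      SYS", "MSDOS   SYS") ||
           first_two_match(sector, "IBMBIO  COM", "IBMDOS  COM");
}
```

Because the comparison is positional, a disk that holds both files but lists them in the wrong order, or after some other file, fails the check. That matches the "in that order" requirement described in the text.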
(In some implementations, the disk bootstrap reads only IO.SYS into memory, and IO.SYS in turn loads the MSDOS.SYS file.) The IO.SYS file that is loaded from the disk actually consists of two separate modules. The first is the BIOS, which contains the linked set of resident device drivers for the console, auxiliary port, printer, block, and clock devices, plus some hardware-specific initialization code that is run only at system startup. The second module, SYSINIT, is supplied by Microsoft and linked into the IO.SYS file, along with the BIOS, by the computer manufacturer. SYSINIT is called by the manufacturer's BIOS initialization code. It determines the amount of contiguous memory present in the system and then relocates itself to high memory. Then it moves the DOS kernel, MSDOS.SYS, from its original load location to its final memory location, overlaying the original SYSINIT code and any other expendable initialization code that was contained in the IO.SYS file (Figure 2-4). Next, SYSINIT calls the initialization code in MSDOS.SYS. The DOS kernel initializes its internal tables and work areas, sets up the interrupt vectors 20H through 2FH, and traces through the linked list of resident device drivers, calling the initialization function for each. (See Chapter 14.) 
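The addresses that appear in this discussion can be checked with a little arithmetic. On the 8086/8088, a physical address is formed as segment × 16 + offset and wraps at 1 MB, and each interrupt vector is a 4-byte far pointer stored at (vector number × 4) from address 0. The C sketch below uses illustrative function names of my own; it is a model of the address calculation, not code taken from MS-DOS.

```c
#include <assert.h>
#include <stdint.h>

enum { VECTOR_SIZE = 4, VECTOR_COUNT = 256 };

/* Physical address of segment:offset on a processor with a 20-bit
   address bus (8086/8088); the result wraps at 1 MB. */
static uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Address of the interrupt-vector table entry for a given interrupt. */
static uint32_t vector_address(unsigned vector)
{
    assert(vector < VECTOR_COUNT);
    return (uint32_t)vector * VECTOR_SIZE;
}
```

With these definitions, physical_address(0xFFFF, 0x0000) yields 0FFFF0H, the reset address; vector_address(0x20) and vector_address(0x2F) yield 00080H and 000BCH, so the vectors set up by the DOS kernel occupy 00080H through 000BFH; and the whole table of 256 vectors ends just below 00400H, the boundary shown in the memory-map figures.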
        +----------------------------------+
        | ROM bootstrap routine            |
        +----------------------------------+
        |                                  |
        +----------------------------------+ <-- Top of RAM
        |                                  |
        +----------------------------------+
        | Disk bootstrap routine           |
        +----------------------------------+
        |                                  |
        :                                  :
        |                                  |
        +----------------------------------+
        | DOS kernel (from MSDOS.SYS)      |
        +----------------------------------+ <-- In temporary
        | SYSINIT (from IO.SYS)            |     location
        +----------------------------------+
        | BIOS (from IO.SYS)               |
        +----------------------------------+
        |                                  |
 00400H +----------------------------------+
        | Interrupt vectors                |
 00000H +----------------------------------+

Figure 2-3. The disk bootstrap reads the file IO.SYS into memory. This file contains the MS-DOS BIOS (resident device drivers) and the SYSINIT module. Either the disk bootstrap or the BIOS (depending upon the manufacturer's implementation) then reads the DOS kernel into memory from the MSDOS.SYS file.

These driver functions determine the equipment status, perform any necessary hardware initialization, and set up the vectors for any external hardware interrupts the drivers will service.

As part of the initialization sequence, the DOS kernel examines the disk-parameter blocks returned by the resident block-device drivers, determines the largest sector size that will be used in the system, builds some drive-parameter blocks, and allocates a disk sector buffer. Control then returns to SYSINIT.

When the DOS kernel has been initialized and all resident device drivers are available, SYSINIT can call on the normal MS-DOS file services to open the CONFIG.SYS file. This optional file can contain a variety of commands that enable the user to customize the MS-DOS environment.
For instance, the user can specify additional hardware device drivers, the number of disk buffers, the maximum number of files that can be open at one time, and the filename of the command processor (shell).

If CONFIG.SYS is found, the entire file is loaded into memory for processing. All lowercase characters are converted to uppercase, and the file is interpreted one line at a time to process the commands. Memory is allocated for the disk buffer cache and the internal file control blocks used by the handle file and record system functions. (See Chapter 8.)

Any device drivers indicated in the CONFIG.SYS file are sequentially loaded into memory, initialized by calls to their init modules, and linked into the device-driver list. The init function of each driver tells SYSINIT how much memory to reserve for that driver.

        +----------------------------------+
        | ROM bootstrap routine            |
        +----------------------------------+
        |                                  |
        +----------------------------------+ <-- Top of RAM
        | SYSINIT module                   |
        +----------------------------------+
        |                                  |
        :                                  :
        |                                  |
        +----------------------------------+
        | Installable drivers              |
        +----------------------------------+
        | File control blocks              |
        +----------------------------------+
        | Disk buffer cache                |
        +----------------------------------+
        | DOS kernel                       |
        +----------------------------------+ <-- In final
        | BIOS                             |     location
        +----------------------------------+
        |                                  |
 00400H +----------------------------------+
        | Interrupt vectors                |
 00000H +----------------------------------+

Figure 2-4. SYSINIT moves itself to high memory and relocates the DOS kernel, MSDOS.SYS, downward to its final address.
The MS-DOS disk buffer cache and file control block areas are allocated, and then the installable device drivers specified in the CONFIG.SYS file are loaded and linked into the system.

After all installable device drivers have been loaded, SYSINIT closes all file handles and reopens the console (CON), printer (PRN), and auxiliary (AUX) devices as the standard input, standard output, standard error, standard list, and standard auxiliary devices. This allows a user-installed character-device driver to override the BIOS's resident drivers for the standard devices.

Finally, SYSINIT calls the MS-DOS EXEC function to load the command interpreter, or shell. (The default shell is COMMAND.COM, but another shell can be substituted by means of the CONFIG.SYS file.) Once the shell is loaded, it displays a prompt and waits for the user to enter a command. MS-DOS is now ready for business, and the SYSINIT module is discarded (Figure 2-5).

        +----------------------------------+
        | ROM bootstrap routine            |
        +----------------------------------+
        |                                  |
        +----------------------------------+ <-- Top of RAM
        | Transient part of COMMAND.COM    |
        +----------------------------------+
        :                                  :
        | Transient program area           |
        +----------------------------------+
        | Resident part of COMMAND.COM     |
        +----------------------------------+
        | Installable drivers              |
        +----------------------------------+
        | File control blocks              |
        +----------------------------------+
        | Disk buffer cache                |
        +----------------------------------+
        | DOS kernel                       |
        +----------------------------------+
        | BIOS                             |
        +----------------------------------+
        |                                  |
 00400H +----------------------------------+
        | Interrupt vectors                |
 00000H +----------------------------------+

Figure 2-5.
The final result of the MS-DOS startup process for a typical system. The resident portion of COMMAND.COM lies in low memory, above the DOS kernel. The transient portion containing the batch-file interpreter and intrinsic commands is placed in high memory, where it can be overlaid by extrinsic commands and application programs running in the transient program area.

----------------------------------------------------------------------
Chapter 3  Structure of MS-DOS Application Programs

Programs that run under MS-DOS come in two basic flavors: .COM programs, which have a maximum size of approximately 64 KB, and .EXE programs, which can be as large as available memory. In Intel 8086 parlance, .COM programs fit the tiny model, in which all segment registers contain the same value; that is, the code and data are mixed together. In contrast, .EXE programs fit the small, medium, or large model, in which the segment registers contain different values; that is, the code, data, and stack reside in separate segments. .EXE programs can have multiple code and data segments, which are respectively addressed by long calls and by manipulation of the data segment (DS) register.

A .COM-type program resides on the disk as an absolute memory image, in a file with the extension .COM. The file does not have a header or any other internal identifying information. A .EXE program, on the other hand, resides on the disk in a special type of file with a unique header, a relocation map, a checksum, and other information that is (or can be) used by MS-DOS.

Both .COM and .EXE programs are brought into memory for execution by the same mechanism: the EXEC function, which constitutes the MS-DOS loader. EXEC can be called with the filename of a program to be loaded by COMMAND.COM (the normal MS-DOS command interpreter), by other shells or user interfaces, or by another program that was previously loaded by EXEC.
If there is sufficient free memory in the transient program area, EXEC allocates a block of memory to hold the new program, builds the program segment prefix (PSP) at its base, and then reads the program into memory immediately above the PSP. Finally, EXEC sets up the segment registers and the stack and transfers control to the program. When it is invoked, EXEC can be given the addresses of additional information, such as a command tail, file control blocks, and an environment block; if supplied, this information will be passed on to the new program. (The exact procedure for using the EXEC function in your own programs is discussed, with examples, in Chapter 12.)

.COM and .EXE programs are often referred to as transient programs. A transient program "owns" the memory block it has been allocated and has nearly total control of the system's resources while it is executing. When the program terminates, either because it is aborted by the operating system or because it has completed its work and systematically performed a final exit back to MS-DOS, the memory block is then freed (hence the term transient) and can be used by the next program in line to be loaded.

The Program Segment Prefix

A thorough understanding of the program segment prefix is vital to successful programming under MS-DOS. It is a reserved area, 256 bytes long, that is set up by MS-DOS at the base of the memory block allocated to a transient program. The PSP contains some linkages to MS-DOS that can be used by the transient program, some information MS-DOS saves for its own purposes, and some information MS-DOS passes to the transient program, to be used or not, as the program requires (Figure 3-1).
Offset  Contents
------  ----------------------------------------------------------------------
0000H   Int 20H
0002H   Segment, end of allocation block
0004H   Reserved
0005H   Long call to MS-DOS function dispatcher
000AH   Previous contents of termination handler interrupt vector (Int 22H)
000EH   Previous contents of Ctrl-C interrupt vector (Int 23H)
0012H   Previous contents of critical-error handler interrupt vector (Int 24H)
0016H   Reserved
002CH   Segment address of environment block
002EH   Reserved
005CH   Default file control block #1
006CH   Default file control block #2 (overlaid if FCB #1 opened)
0080H   Command tail and default disk transfer area (buffer), through 00FFH

Figure 3-1. The structure of the program segment prefix.

In the first versions of MS-DOS, the PSP was designed to be compatible with a control area that was built beneath transient programs under Digital Research's venerable CP/M operating system, so that programs could be ported to MS-DOS without extensive logical changes.
Although MS-DOS has evolved considerably since those early days, the structure of the PSP is still recognizably similar to its CP/M equivalent. For example, offset 0000H in the PSP contains a linkage to the MS-DOS process-termination handler, which cleans up after the program has finished its job and performs a final exit. Similarly, offset 0005H in the PSP contains a linkage to the MS-DOS function dispatcher, which performs disk operations, console input/output, and other such services at the request of the transient program. Thus, calls to PSP:0000 and PSP:0005 have the same effect as CALL 0000 and CALL 0005 under CP/M. (These linkages are not the "approved" means of obtaining these services, however.) The word at offset 0002H in the PSP contains the segment address of the top of the transient program's allocated memory block. The program can use this value to determine whether it should request more memory to do its job or whether it has extra memory that it can release for use by other processes. Offsets 000AH through 0015H in the PSP contain the previous contents of the interrupt vectors for the termination, Ctrl-C, and critical-error handlers. If the transient program alters these vectors for its own purposes, MS-DOS restores the original values saved in the PSP when the program terminates. The word at PSP offset 002CH holds the segment address of the environment block, which contains a series of ASCIIZ strings (sequences of ASCII characters terminated by a null, or zero, byte). The environment block is inherited from the program that called the EXEC function to load the currently executing program. It contains such information as the current search path used by COMMAND.COM to find executable programs, the location on the disk of COMMAND.COM itself, and the format of the user prompt used by COMMAND.COM. 
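
The PSP fields and the environment block described above can be sketched in C. This is a minimal illustration under stated assumptions, not code from MS-DOS: the PSP and the environment block are modeled as plain byte buffers rather than real segments, the names psp_word and env_lookup and the PSP_* constants are hypothetical, and the sketch assumes that the list of ASCIIZ strings ends with an empty string (one extra zero byte), a detail the text above does not spell out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets within the 256-byte PSP, taken from Figure 3-1. */
enum {
    PSP_INT20     = 0x00,   /* Int 20H instruction                   */
    PSP_MEM_TOP   = 0x02,   /* segment, end of allocation block      */
    PSP_DISPATCH  = 0x05,   /* long call to function dispatcher      */
    PSP_OLD_INT22 = 0x0A,   /* saved termination-handler vector      */
    PSP_OLD_INT23 = 0x0E,   /* saved Ctrl-C vector                   */
    PSP_OLD_INT24 = 0x12,   /* saved critical-error vector           */
    PSP_ENV_SEG   = 0x2C,   /* segment address of environment block  */
    PSP_SIZE      = 0x100
};

/* Read a little-endian 16-bit word from a PSP image (hypothetical helper). */
static uint16_t psp_word(const uint8_t *psp, size_t offset)
{
    return (uint16_t)(psp[offset] | (psp[offset + 1] << 8));
}

/*
 * Search an environment block image for NAME and return a pointer to the
 * text after "NAME=", or NULL if the name is absent or the block is
 * malformed. Assumes the list ends with an empty string (a lone zero byte).
 * The comparison is case-sensitive.
 */
static const char *env_lookup(const char *env, size_t size, const char *name)
{
    size_t name_len = strlen(name);
    size_t pos = 0;

    while (pos < size && env[pos] != '\0') {
        size_t len = 0;

        while (pos + len < size && env[pos + len] != '\0')
            len++;
        if (pos + len == size)
            return NULL;            /* unterminated entry: malformed block */
        if (len > name_len &&
            memcmp(env + pos, name, name_len) == 0 &&
            env[pos + name_len] == '=')
            return env + pos + name_len + 1;
        pos += len + 1;             /* skip the string and its null byte */
    }
    return NULL;
}
```

A real program would obtain the environment segment with something like psp_word(psp, PSP_ENV_SEG) and then address that segment directly; here the block is passed in as an ordinary buffer so the search logic can be read on its own.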
The command tail (the remainder of the command line that invoked the transient program, after the program's name) is copied into the PSP starting at offset 0081H. The length of the command tail, not including the return character at its end, is placed in the byte at offset 0080H. Redirection or piping parameters and their associated filenames do not appear in the portion of the command line (the command tail) that is passed to the transient program, because redirection is transparent to applications.

To provide compatibility with CP/M, MS-DOS parses the first two parameters in the command tail into two default file control blocks (FCBs) at PSP:005CH and PSP:006CH, under the assumption that they may be filenames. However, if the parameters are filenames that include a path specification, only the drive code will be valid in these default FCBs, because FCB-type file- and record-access functions do not support hierarchical file structures. Although the default FCBs were an aid in earlier years, when compatibility with CP/M was more of a concern, they are essentially useless in modern MS-DOS application programs that must provide full path support. (File control blocks are discussed in detail in Chapter 8 and hierarchical file structures are discussed in Chapter 9.)

The 128-byte area from 0080H through 00FFH in the PSP also serves as the default disk transfer area (DTA), which is set by MS-DOS before passing control to the transient program. If the program does not explicitly change the DTA, any file read or write operations requested with the FCB group of function calls automatically use this area as a data buffer. This is rarely useful and is another facet of MS-DOS's handling of the PSP that is present only for compatibility with CP/M.

----------------------------------------------------------------------
WARNING
Programs must not alter any part of the PSP below offset 005CH.
----------------------------------------------------------------------

Introduction to .COM Programs

Programs of the .COM persuasion are stored in disk files that hold an absolute image of the machine instructions to be executed. Because the files contain no relocation information, they are more compact, and are loaded for execution slightly faster, than equivalent .EXE files. Note that MS-DOS does not attempt to ascertain whether a .COM file actually contains executable code (there is no signature or checksum, as in the case of a .EXE file); it simply brings any file with the .COM extension into memory and jumps to it.

Because .COM programs are loaded immediately above the program segment prefix and do not have a header that can specify another entry point, they must always have an origin of 0100H, which is the length of the PSP. Location 0100H must contain an executable instruction. The maximum length of a .COM program is 65,536 bytes, minus the length of the PSP (256 bytes) and a mandatory word of stack (2 bytes).

When control is transferred to the .COM program from MS-DOS, all of the segment registers point to the PSP (Figure 3-2). The stack pointer register contains 0FFFEH if memory allows; otherwise, it is set as high as possible in memory minus 2 bytes. (MS-DOS pushes a zero word on the stack before entry.)

    SS:SP +--------------------------------------------+
          |                                            |
          | Stack grows downward from top of segment   |
          |                                            |
          |                                            |
          | Program code and data                      |
          |                                            |
 CS:0100H +--------------------------------------------+
          | Program segment prefix                     |
 CS:0000H +--------------------------------------------+
 DS:0000H
 ES:0000H
 SS:0000H

Figure 3-2. A memory image of a typical .COM-type program after loading. The contents of the .COM file are brought into memory just above the program segment prefix. Program code and data are mixed together in the same segment, and all segment registers contain the same value.
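
The size limit stated above works out to 65,536 - 256 - 2 = 65,278 bytes. The following C fragment is only a sketch of that arithmetic; the macro names and the com_image_fits helper are hypothetical and are not part of MS-DOS or of any toolchain.

```c
/* Sizes from the text: one 64 KB segment holds the PSP, the program
   image, and at least the zero word that MS-DOS pushes at entry. */
#define SEGMENT_BYTES    65536UL   /* one 8086 segment                */
#define PSP_BYTES          256UL   /* program segment prefix          */
#define MIN_STACK_BYTES      2UL   /* mandatory word of stack         */
#define COM_MAX_BYTES   (SEGMENT_BYTES - PSP_BYTES - MIN_STACK_BYTES)

/* Returns nonzero if a .COM file of file_bytes fits under the limit. */
static int com_image_fits(unsigned long file_bytes)
{
    return file_bytes <= COM_MAX_BYTES;
}
```

In practice a program needs far more than one word of stack, so a usable .COM image is somewhat smaller than this upper bound.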
Although the size of an executable .COM file can't exceed 64 KB, the current versions of MS-DOS allocate all of the transient program area to .COM programs when they are loaded. Because many such programs date from the early days of MS-DOS and are not necessarily "well-behaved" in their approach to memory management, the operating system simply makes the worst-case assumption and gives .COM programs everything that is available. If a .COM program wants to use the EXEC function to invoke another process, it must first shrink down its memory allocation to the minimum memory it needs in order to continue, taking care to protect its stack. (This is discussed in more detail in Chapter 12.)

When a .COM program finishes executing, it can return control to MS-DOS by several means. The preferred method is Int 21H Function 4CH, which allows the program to pass a return code back to the program, shell, or batch file that invoked it. However, if the program is running under MS-DOS version 1, it must exit by means of Int 20H, Int 21H Function 0, or a NEAR RETURN. (Because a word of zero was pushed onto the stack at entry, a NEAR RETURN causes a transfer to PSP:0000, which contains an Int 20H instruction.)

A .COM-type application can be linked together from many separate object modules. All of the modules must use the same code-segment name and class name, and the module with the entry point at offset 0100H within the segment must be linked first. In addition, all of the procedures within a .COM program should have the NEAR attribute, because all executable code resides in one segment.

When linking a .COM program, the linker will display the message

    Warning: no stack segment

This message can be ignored. The linker output is a .EXE file, which must be converted into a .COM file with the MS-DOS EXE2BIN utility before execution. You can then delete the .EXE file. (An example of this process is provided in Chapter 4.)
An Example .COM Program

The HELLO.COM program listed in Figure 3-3 demonstrates the structure of a simple assembly-language program that is destined to become a .COM file. (You may find it helpful to compare this listing with the HELLO.EXE program later in this chapter.) Because this program is so short and simple, a relatively high proportion of the source code is actually assembler directives that do not result in any executable code.

The NAME statement simply provides a module name for use during the linkage process. This aids understanding of the map that the linker produces. In MASM versions 5.0 and later, the module name is always the same as the filename, and the NAME statement is ignored.

The PAGE command, when used with two operands, as in line 2, defines the length and width of the page. These default respectively to 66 lines and 80 characters. If you use the PAGE command without any operands, a formfeed is sent to the printer and a heading is printed. In larger programs, use the PAGE command liberally to place each of your subroutines on separate pages for easy reading.

The TITLE command, in line 3, specifies the text string (limited to 60 characters) that is to be printed at the upper left corner of each page. The TITLE command is optional and cannot be used more than once in each assembly-language source file.

----------------------------------------------------------------------
   1:           name      hello
   2:           page      55,132
   3:           title     HELLO.COM--print hello on terminal
   4:
   5:   ;
   6:   ; HELLO.COM:    demonstrates various components
   7:   ;               of a functional .COM-type assembly-
   8:   ;               language program, and an MS-DOS
   9:   ;               function call.
  10:   ;
  11:   ; Ray Duncan, May 1988
  12:   ;
  13:
  14:   stdin   equ     0               ; standard input handle
  15:   stdout  equ     1               ; standard output handle
  16:   stderr  equ     2               ; standard error handle
  17:
  18:   cr      equ     0dh             ; ASCII carriage return
  19:   lf      equ     0ah             ; ASCII linefeed
  20:
  21:
  22:   _TEXT   segment word public 'CODE'
  23:
  24:           org     100h            ; .COM files always have
  25:                                   ; an origin of 100h
  26:
  27:           assume  cs:_TEXT,ds:_TEXT,es:_TEXT,ss:_TEXT
  28:
  29:   print   proc    near            ; entry point from MS-DOS
  30:
  31:           mov     ah,40h          ; function 40h = write
  32:           mov     bx,stdout       ; handle for standard output
  33:           mov     cx,msg_len      ; length of message
  34:           mov     dx,offset msg   ; address of message
  35:           int     21h             ; transfer to MS-DOS
  36:
  37:           mov     ax,4c00h        ; exit, return code = 0
  38:           int     21h             ; transfer to MS-DOS
  39:
  40:   print   endp
  41:
  42:
  43:   msg     db      cr,lf           ; message to display
  44:           db      'Hello World!',cr,lf
  45:
  46:   msg_len equ     $-msg           ; length of message
  47:
  48:
  49:   _TEXT   ends
  50:
  51:           end     print           ; defines entry point
----------------------------------------------------------------------

Figure 3-3. The HELLO.COM program listing.

Dropping down past a few comments and EQU statements, we come to a declaration of a code segment that begins in line 22 with a SEGMENT command and ends in line 49 with an ENDS command. The label in the leftmost field of line 22 gives the code segment the name _TEXT. The operand fields at the right end of the line give the segment the attributes WORD, PUBLIC, and 'CODE'. (You might find it helpful to read the Microsoft Macro Assembler manual for detailed explanations of each possible segment attribute.)

Because this program is going to be converted into a .COM file, all of its executable code and data areas must lie within one code segment. The program must also have its origin at offset 0100H (immediately above the program segment prefix), which is taken care of by the ORG statement in line 24.

Following the ORG statement, we encounter an ASSUME statement on line 27.
The concept of ASSUME often baffles new assembly-language programmers. In a way, ASSUME doesn't "do" anything; it simply tells the assembler which segment registers you are going to use to point to the various segments of your program, so that the assembler can provide segment overrides when they are necessary. It's important to notice that the ASSUME statement doesn't take care of loading the segment registers with the proper values; it merely notifies the assembler of your intent to do that within the program. (Remember that, in the case of a .COM program, MS-DOS initializes all the segment registers before entry to point to the PSP.) Within the code segment, we come to another type of block declaration that begins with the PROC command on line 29 and closes with ENDP on line 40. These two instructions declare the beginning and end of a procedure, a block of executable code that performs a single distinct function. The label in the leftmost field of the PROC statement (in this case, print) gives the procedure a name. The operand field gives it an attribute. If the procedure carries the NEAR attribute, only other code in the same segment can call it, whereas if it carries the FAR attribute, code located anywhere in the CPU's memory-addressing space can call it. In .COM programs, all procedures carry the NEAR attribute. For the purposes of this example program, I have kept the print procedure ridiculously simple. It calls MS-DOS Int 21H Function 40H to send the message Hello World! to the video screen, and calls Int 21H Function 4CH to terminate the program. The END statement in line 51 tells the assembler that it has reached the end of the source file and also specifies the entry point for the program. If the entry point is not a label located at offset 0100H, the .EXE file resulting from the assembly and linkage of this source program cannot be converted into a .COM file. 
Introduction to .EXE Programs We have just discussed a program that was written in such a way that it could be assembled into a .COM file. Such a program is simple in structure, so a programmer who needs to put together this kind of quick utility can concentrate on the program logic and do a minimum amount of worrying about control of the assembler. However, .COM-type programs have some definite disadvantages, and so most serious assembly-language efforts for MS-DOS are written to be converted into .EXE files. Although .COM programs are effectively restricted to a total size of 64 KB for machine code, data, and stack combined, .EXE programs can be practically unlimited in size (up to the limit of the computer's available memory). .EXE programs also place the code, data, and stack in separate parts of the file. Although the normal MS-DOS program loader does not take advantage of this feature of .EXE files, the ability to load different parts of large programs into several separate memory fragments, as well as the opportunity to designate a "pure" code portion of your program that can be shared by several tasks, is very significant in multitasking environments such as Microsoft Windows. The MS-DOS loader always brings a .EXE program into memory immediately above the program segment prefix, although the order of the code, data, and stack segments may vary (Figure 3-4). The .EXE file has a header, or block of control information, with a characteristic format (Figures 3-5 and 3-6). The size of this header varies according to the number of program instructions that need to be relocated at load time, but it is always a multiple of 512 bytes. Before MS-DOS transfers control to the program, the initial values of the code segment (CS) register and instruction pointer (IP) register are calculated from the entry-point information in the .EXE file header and the program's load address. 
This information derives from an END statement in the source code for one of the program's modules. The data segment (DS) and extra segment (ES) registers are made to point to the PSP so that the program can access the environment-block pointer, command tail, and other useful information contained there.

    SS:SP +--------------------------------------------+
          |                                            |
          | Stack segment:                             |
          | stack grows downward from top of segment   |
          |                                            |
 SS:0000H +--------------------------------------------+
          | Data segment                               |
          +--------------------------------------------+
          | Program code                               |
 CS:0000H +--------------------------------------------+
          | Program segment prefix                     |
 DS:0000H +--------------------------------------------+
 ES:0000H

Figure 3-4. A memory image of a typical .EXE-type program immediately after loading. The contents of the .EXE file are relocated and brought into memory above the program segment prefix. Code, data, and stack reside in separate segments and need not be in the order shown here. The entry point can be anywhere in the code segment and is specified by the END statement in the main module of the program. When the program receives control, the DS (data segment) and ES (extra segment) registers point to the program segment prefix; the program usually saves this value and then resets the DS and ES registers to point to its data area.

The initial contents of the stack segment (SS) and stack pointer (SP) registers come from the header. This information derives from the declaration of a segment with the attribute STACK somewhere in the program's source code. The memory space allocated for the stack may be initialized or uninitialized, depending on the stack-segment definition; many programmers like to initialize the stack memory with a recognizable data pattern so that they can inspect memory dumps and determine how much stack space is actually used by the program.
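
The stack-pattern technique mentioned above can be sketched in C. This is an illustration only: the stack is modeled as an ordinary byte array, the fill value 0A5H is an arbitrary choice, and the names STACK_FILL, stack_paint, and stack_bytes_used are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define STACK_FILL 0xA5   /* arbitrary, easily recognized fill byte */

/* Fill the stack area with the pattern before the program uses it. */
static void stack_paint(unsigned char *stack, size_t size)
{
    memset(stack, STACK_FILL, size);
}

/*
 * The stack grows downward from the top of its segment, so bytes that
 * were never touched remain at the low end of the area. Count them
 * and report everything above the first overwritten byte as used.
 */
static size_t stack_bytes_used(const unsigned char *stack, size_t size)
{
    size_t untouched = 0;

    while (untouched < size && stack[untouched] == STACK_FILL)
        untouched++;
    return size - untouched;
}
```

The result is an estimate: if the deepest value pushed happens to equal the fill byte, the count comes out slightly low, which is one reason to pick an unusual pattern.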
When a .EXE program finishes processing, it should return control to MS-DOS through Int 21H Function 4CH. Other methods are available, but they offer no advantages and are considerably less convenient (because they usually require the CS register to point to the PSP).

Byte
offset  Contents
------  --------------------------------------------------
0000H   First part of .EXE file signature (4DH)
0001H   Second part of .EXE file signature (5AH)
0002H   Length of file MOD 512
0004H   Size of file in 512-byte pages, including header
0006H   Number of relocation-table items
0008H   Size of header in paragraphs (16-byte units)
000AH   Minimum number of paragraphs needed above program
000CH   Maximum number of paragraphs desired above program
000EH   Segment displacement of stack module
0010H   Contents of SP register at entry
0012H   Word checksum
0014H   Contents of IP register at entry
0016H   Segment displacement of code module
0018H   Offset of first relocation item in file
001AH   Overlay number (0 for resident part of program)
001CH   Variable reserved space
        Relocation table
        Variable reserved space
        Program and data segments
        Stack segment

Figure 3-5. The format of a .EXE load module.

The input to the linker for a .EXE-type program can be many separate object modules. Each module can use a unique code-segment name, and the procedures can carry either the NEAR or the FAR attribute, depending on naming conventions and the size of the executable code. The programmer must take care that the modules linked together contain only one segment with the STACK attribute and only one entry point defined with an END assembler directive. The output from the linker is a file with a .EXE extension. This file can be executed immediately.

----------------------------------------------------------------------
C>DUMP HELLO.EXE

        0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000   4D 5A 28 00 02 00 01 00 20 00 09 00 FF FF 03 00   MZ(..... .......
0010   80 00 20 05 00 00 00 00 1E 00 00 00 01 00 01 00   .. .............
0020   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0030   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0040   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0050   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
  .
  .
  .
0200   B8 01 00 8E D8 B4 40 BB 01 00 B9 10 00 90 BA 08   ......@.........
0210   00 CD 21 B8 00 4C CD 21 0D 0A 48 65 6C 6C 6F 20   ..!..L.!..Hello
0220   57 6F 72 6C 64 21 0D 0A                           World!..
----------------------------------------------------------------------

Figure 3-6. A hex dump of the HELLO.EXE program, demonstrating the contents of a simple .EXE load module.
Note the following interesting values: the .EXE signature in bytes 0000H and 0001H, the number of relocation-table items in bytes 0006H and 0007H, the minimum extra memory allocation (MIN_ALLOC) in bytes 000AH and 000BH, the maximum extra memory allocation (MAX_ALLOC) in bytes 000CH and 000DH, and the initial IP (instruction pointer) register value in bytes 0014H and 0015H. See also Figure 3-5.

An Example .EXE Program

The HELLO.EXE program in Figure 3-7 demonstrates the fundamental structure of an assembly-language program that is destined to become a .EXE file. At minimum, it should have a module name, a code segment, a stack segment, and a primary procedure that receives control of the computer from MS-DOS after the program is loaded. The HELLO.EXE program also contains a data segment to provide a more complete example.

The NAME, TITLE, and PAGE directives were covered in the HELLO.COM example program and are used in the same manner here, so we'll move to the first new item of interest. After a few comments and EQU statements, we come to a declaration of a code segment that begins on line 21 with a SEGMENT command and ends on line 41 with an ENDS command. As in the HELLO.COM example program, the label in the leftmost field of the line gives the code segment the name _TEXT. The operand fields at the right end of the line give the attributes WORD, PUBLIC, and 'CODE'.

Following the code-segment declaration, we find an ASSUME statement on line 23. Notice that, unlike the equivalent statement in the HELLO.COM program, the ASSUME statement in this program specifies several different segment names. Again, remember that this statement has no direct effect on the contents of the segment registers but affects only the operation of the assembler itself.
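As a cross-check on the header values called out for Figure 3-6, the following C sketch decodes the first 32 bytes of the HELLO.EXE dump according to the layout in Figure 3-5. It is only an illustration: the names hello_hdr, exe_word, exe_info, and exe_decode are hypothetical, and the file-size rule (a zero remainder is taken to mean that the last 512-byte page is full) is an assumption rather than something stated in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 32 bytes of HELLO.EXE, copied from the hex dump in Figure 3-6. */
static const uint8_t hello_hdr[32] = {
    0x4D, 0x5A, 0x28, 0x00, 0x02, 0x00, 0x01, 0x00,
    0x20, 0x00, 0x09, 0x00, 0xFF, 0xFF, 0x03, 0x00,
    0x80, 0x00, 0x20, 0x05, 0x00, 0x00, 0x00, 0x00,
    0x1E, 0x00, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00
};

/* Header words are stored low byte first (Intel byte order). */
static uint16_t exe_word(const uint8_t *hdr, size_t offset)
{
    assert(hdr != NULL);
    return (uint16_t)(hdr[offset] | (hdr[offset + 1] << 8));
}

/* Hypothetical structure holding the fields named in Figure 3-5. */
struct exe_info {
    int      signature_ok;  /* bytes 0000H-0001H are 4DH, 5AH       */
    uint32_t file_bytes;    /* derived from words at 0002H and 0004H */
    uint16_t reloc_items;   /* word at 0006H */
    uint16_t header_paras;  /* word at 0008H */
    uint16_t min_alloc;     /* word at 000AH */
    uint16_t max_alloc;     /* word at 000CH */
    uint16_t stack_seg;     /* word at 000EH */
    uint16_t initial_sp;    /* word at 0010H */
    uint16_t initial_ip;    /* word at 0014H */
    uint16_t code_seg;      /* word at 0016H */
    uint16_t reloc_offset;  /* word at 0018H */
};

static struct exe_info exe_decode(const uint8_t *hdr)
{
    struct exe_info x;
    uint16_t remainder = exe_word(hdr, 0x02);  /* length of file MOD 512 */
    uint16_t pages     = exe_word(hdr, 0x04);  /* 512-byte pages         */

    x.signature_ok = (hdr[0] == 0x4D && hdr[1] == 0x5A);
    /* Assumption: a zero remainder means the last page is full. */
    x.file_bytes   = remainder ? (uint32_t)(pages - 1) * 512u + remainder
                               : (uint32_t)pages * 512u;
    x.reloc_items  = exe_word(hdr, 0x06);
    x.header_paras = exe_word(hdr, 0x08);
    x.min_alloc    = exe_word(hdr, 0x0A);
    x.max_alloc    = exe_word(hdr, 0x0C);
    x.stack_seg    = exe_word(hdr, 0x0E);
    x.initial_sp   = exe_word(hdr, 0x10);
    x.initial_ip   = exe_word(hdr, 0x14);
    x.code_seg     = exe_word(hdr, 0x16);
    x.reloc_offset = exe_word(hdr, 0x18);
    return x;
}
```

Calling exe_decode(hello_hdr) on the Figure 3-6 bytes gives one relocation item, a header of 20H paragraphs (512 bytes, which is why the code begins at file offset 0200H), a MIN_ALLOC of 0009H, a MAX_ALLOC of 0FFFFH, and an initial IP of 0000H.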
──────────────────────────────────────────────────────────────────────────
 1:          name    hello
 2:          page    55,132
 3:          title   HELLO.EXE--print Hello on terminal
 4: ;
 5: ; HELLO.EXE:     demonstrates various components
 6: ;                of a functional .EXE-type assembly-
 7: ;                language program, use of segments,
 8: ;                and an MS-DOS function call.
 9: ;
10: ; Ray Duncan, May 1988
11: ;
12:
13: stdin    equ     0               ; standard input handle
14: stdout   equ     1               ; standard output handle
15: stderr   equ     2               ; standard error handle
16:
17: cr       equ     0dh             ; ASCII carriage return
18: lf       equ     0ah             ; ASCII linefeed
19:
20:
21: _TEXT    segment word public 'CODE'
22:
23:          assume  cs:_TEXT,ds:_DATA,ss:STACK
24:
25: print    proc    far             ; entry point from MS-DOS
26:
27:          mov     ax,_DATA        ; make our data segment
28:          mov     ds,ax           ; addressable...
29:
30:          mov     ah,40h          ; function 40h = write
31:          mov     bx,stdout       ; standard output handle
32:          mov     cx,msg_len      ; length of message
33:          mov     dx,offset msg   ; address of message
34:          int     21h             ; transfer to MS-DOS
35:
36:          mov     ax,4c00h        ; exit, return code = 0
37:          int     21h             ; transfer to MS-DOS
38:
39: print    endp
40:
41: _TEXT    ends
42:
43:
44: _DATA    segment word public 'DATA'
45:
46: msg      db      cr,lf           ; message to display
47:          db      'Hello World!',cr,lf
48:
49: msg_len  equ     $-msg           ; length of message
50:
51: _DATA    ends
52:
53:
54: STACK    segment para stack 'STACK'
55:
56:          db      128 dup (?)
57:
58: STACK    ends
59:
60:          end     print           ; defines entry point
──────────────────────────────────────────────────────────────────────────

Figure 3-7. The HELLO.EXE program listing.

Within the code segment, the main print procedure is declared by the PROC command on line 25 and closed with ENDP on line 39. Because the procedure resides in a .EXE file, we have given it the FAR attribute as an example, but the attribute is really irrelevant because the program is so small and the procedure is not called by anything else in the same program.
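For readers who think more readily in C, the data and exit conventions of Figure 3-7 can be mirrored roughly as follows. This is a sketch under stated assumptions, not a translation of the listing: hello_msg, HELLO_MSG_LEN, and hello_print are hypothetical names, and the standard C library's fwrite stands in for Int 21H Function 40H.

```c
#include <stdio.h>

/* Same bytes as the msg item on lines 46-47 of Figure 3-7. */
static const char hello_msg[] = "\r\nHello World!\r\n";

/* Counterpart of "msg_len equ $-msg" on line 49: computed at
   compile time; the -1 drops the terminating NUL that C appends. */
enum { HELLO_MSG_LEN = sizeof hello_msg - 1 };

/* Hypothetical helper: writes the message to the given stream and
   returns the value a caller could use as the process return code
   (0 = success, as in lines 36-37 of the listing). */
static int hello_print(FILE *out)
{
    size_t written = fwrite(hello_msg, 1, (size_t)HELLO_MSG_LEN, out);
    return written == (size_t)HELLO_MSG_LEN ? 0 : 1;
}
```

The computed length is 16 bytes, which agrees with the 10H that the assembler placed in the MOV CX instruction visible at offset 020AH of the Figure 3-6 dump. On systems that translate newlines on text-mode streams, the bytes actually written may differ from the array contents.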
The print procedure first initializes the DS register, as indicated in the earlier ASSUME statement, loading it with a value that causes it to point to the base of the data area. (MS-DOS automatically sets up the CS and SS registers.) Next, the procedure uses MS-DOS Int 21H Function 40H to display the message Hello World! on the screen, just as in the HELLO.COM program. Finally, the procedure exits back to MS-DOS with an Int 21H Function 4CH on lines 36 and 37, passing a return code of zero (which by convention means success).

Lines 44 through 51 declare a data segment named _DATA, which contains the variables and constants the program will use. If the various modules of a program contain multiple data segments with the same name, the linker will collect them and place them in the same physical memory segment.

Lines 54 through 58 establish a stack segment; PUSH and POP instructions will access this area of scratch memory. Before MS-DOS transfers control to a .EXE program, it sets up the SS and SP registers according to the declared size and location of the stack segment. Be sure to allow enough room for the maximum stack depth that can occur at runtime, plus a safe number of extra words for registers pushed onto the stack during an MS-DOS service call. If the stack overflows, it may damage your other code and data segments and cause your program to behave strangely or even to crash altogether!

The END statement on line 60 winds up our brief HELLO.EXE program, telling the assembler that it has reached the end of the source file and providing the label of the program's point of entry from MS-DOS.

The differences between .COM and .EXE programs are summarized in Figure 3-8.
                    .COM program                  .EXE program
──────────────────────────────────────────────────────────────────────────
Maximum size        65,536 bytes minus 256        No limit
                    bytes for PSP and 2 bytes
                    for stack
Entry point         PSP:0100H                     Defined by END statement
AL at entry         00H if default FCB #1 has     Same
                    valid drive, 0FFH if
                    invalid drive
AH at entry         00H if default FCB #2 has     Same
                    valid drive, 0FFH if
                    invalid drive
CS at entry         PSP                           Segment containing module
                                                  with entry point
IP at entry         0100H                         Offset of entry point
                                                  within its segment
DS at entry         PSP                           PSP
ES at entry         PSP                           PSP
SS at entry         PSP                           Segment with STACK
                                                  attribute
SP at entry         0FFFEH or top word in         Size of segment defined
                    available memory,             with STACK attribute
                    whichever is lower
Stack at entry      Zero word                     Initialized or
                                                  uninitialized
Stack size          65,536 bytes minus 256        Defined in segment with
                    bytes for PSP and size of     STACK attribute
                    executable code and data
Subroutine calls    Usually NEAR                  NEAR or FAR
Exit method         Int 21H Function 4CH          Int 21H Function 4CH
                    preferred, NEAR RET if        preferred
                    MS-DOS version 1
Size of file        Exact size of program         Size of program plus
                                                  header (multiple of 512
                                                  bytes)
──────────────────────────────────────────────────────────────────────────

Figure 3-8. Summary of the differences between .COM and .EXE programs, including their entry conditions.

More About Assembly-Language Programs

Now that we've looked at working examples of .COM and .EXE assembly-language programs, let's backtrack and discuss their elements a little more formally. The following discussion is based on the Microsoft Macro Assembler, hereafter referred to as MASM. If you are familiar with MASM and are an experienced assembly-language programmer, you may want to skip this section.

MASM programs can be thought of as having three structural levels:

  ■ The module level
  ■ The segment level
  ■ The procedure level

Modules are simply chunks of source code that can be independently maintained and assembled.
Segments are physical groupings of like items (machine code or data) within a program and a corresponding segregation of dissimilar items. Procedures are functional subdivisions of an executable program: routines that carry out a particular task.

Program Modules

Under MS-DOS, the module-level structure consists of files containing the source code for individual routines. Each source file is translated by the assembler into a relocatable object module. An object module can reside alone in an individual file or with many other object modules in an object-module library of frequently used or related routines. The Microsoft Object Linker (LINK) combines object-module files, often with additional object modules extracted from libraries, into an executable program file.

Using modules and object-module libraries reduces the size of your application source files (and vastly increases your productivity), because these files need not contain the source code for routines they have in common with other programs. This technique also allows you to maintain the routines more easily, because you need to alter only one copy of their source code stored in one place, instead of many copies stored in different applications. When you improve (or fix) one of these routines, you can simply reassemble it, put its object module back into the library, relink all of the programs that use the routine, and voilà: instant upgrade.

Program Segments

The term segments refers to two discrete programming concepts: physical segments and logical segments.

Physical segments are 64 KB blocks of memory. The Intel 8086/8088 and 80286 microprocessors have four segment registers, which are essentially used as pointers to these blocks. (The 80386 has six segment registers, which are a superset of those found on the 8086/8088 and 80286.) Each segment register can point to the bottom of a different 64 KB area of memory.
Thus, a program can address any location in memory by appropriate manipulation of the segment registers, but the maximum amount of memory that it can address simultaneously is 256 KB.

As we discussed earlier in the chapter, .COM programs assume that all four segment registers always point to the same place: the bottom of the program. Thus, they are limited to a maximum size of 64 KB. .EXE programs, on the other hand, can address many different physical segments and can reset the segment registers to point to each segment as it is needed. Consequently, the only practical limit on the size of a .EXE program is the amount of available memory. The example programs throughout the remainder of this book focus on .EXE programs.

Logical segments are the program components. A minimum of three logical segments must be declared in any .EXE program: a code segment, a data segment, and a stack segment. Programs with more than 64 KB of code or data have more than one code or data segment. The routines or data that are used most frequently are put into the primary code and data segments for speed, and routines or data that are used less frequently are put into secondary code and data segments.

Segments are declared with the SEGMENT and ENDS directives in the following form:

  name    SEGMENT attributes
          .
          .
          .
  name    ENDS

The attributes of a segment include its align type (BYTE, WORD, or PARA), combine type (PUBLIC, PRIVATE, COMMON, or STACK), and class type. The segment attributes are used by the linker when it is combining logical segments to create the physical segments of an executable program. Most of the time, you can get by just fine using a small selection of attributes in a rather stereotypical way. However, if you want to use the full range of attributes, you might want to read the detailed explanation in the MASM manual.

Programs are classified into one memory model or another based on the number of their code and data segments.
The most commonly used memory model for assembly-language programs is the small model, which has one code and one data segment, but you can also use the medium, compact, and large models (Figure 3-9). (Two additional models exist that we will not discuss further: the tiny model, which consists of intermixed code and data in a single segment, as in a .COM file under MS-DOS; and the huge model, which is supported by the Microsoft C Optimizing Compiler and which allows use of data structures larger than 64 KB.)

Model          Code segments       Data segments
──────────────────────────────────────────────────────────────────────────
Small          One                 One
Medium         Multiple            One
Compact        One                 Multiple
Large          Multiple            Multiple
──────────────────────────────────────────────────────────────────────────

Figure 3-9. Memory models commonly used in assembly-language and C programs.

For each memory model, Microsoft has established certain segment and class names that are used by all its high-level-language compilers (Figure 3-10). Because segment names are arbitrary, you may as well adopt the Microsoft conventions. Their use will make it easier for you to integrate your assembly-language routines into programs written in languages such as C, or to use routines from high-level-language libraries in your assembly-language programs.

Another important Microsoft high-level-language convention is to use the GROUP directive to name the near data segment (the segment the program expects to address with offsets from the DS register) and the stack segment as members of DGROUP (the automatic data group), a special name recognized by the linker and also by the program loaders in Microsoft Windows and Microsoft OS/2. The GROUP directive causes logical segments with different names to be combined into a single physical segment so that they can be addressed using the same segment base address.
In C programs, DGROUP also contains the local heap, which is used by the C runtime library for dynamic allocation of small amounts of memory.

Memory     Segment        Align    Combine    Class       Group
model      name           type     type       type
──────────────────────────────────────────────────────────────────────────
Small      _TEXT          WORD     PUBLIC     CODE
           _DATA          WORD     PUBLIC     DATA        DGROUP
           STACK          PARA     STACK      STACK       DGROUP

Medium     module_TEXT    WORD     PUBLIC     CODE
           .
           .
           .
           _DATA          WORD     PUBLIC     DATA        DGROUP
           STACK          PARA     STACK      STACK       DGROUP

Compact    _TEXT          WORD     PUBLIC     CODE
           data           PARA     PRIVATE    FAR_DATA
           .
           .
           .
           _DATA          WORD     PUBLIC     DATA        DGROUP
           STACK          PARA     STACK      STACK       DGROUP

Large      module_TEXT    WORD     PUBLIC     CODE
           .
           .
           .
           data           PARA     PRIVATE    FAR_DATA
           .
           .
           .
           _DATA          WORD     PUBLIC     DATA        DGROUP
           STACK          PARA     STACK      STACK       DGROUP
──────────────────────────────────────────────────────────────────────────

Figure 3-10. Segments, groups, and classes for the standard memory models as used with assembly-language programs. The Microsoft C Optimizing Compiler and other high-level-language compilers use a superset of these segments and classes.

For pure assembly-language programs that will run under MS-DOS, you can ignore DGROUP. However, if you plan to integrate assembly-language routines and programs written in high-level languages, you'll want to follow the Microsoft DGROUP convention. For example, if you are planning to link routines from a C library into an assembly-language program, you should include the line

  DGROUP  group   _DATA,STACK

near the beginning of the program.

The final Microsoft convention of interest in creating .EXE programs is segment order. The high-level compilers assume that code segments always come first, followed by far data segments, followed by the near data segment, with the stack and heap last. This order won't concern you much until you begin integrating assembly-language code with routines from high-level-language libraries, but it is easiest to learn to use the convention right from the start.
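The arithmetic behind physical segments, as discussed earlier in this section, can be sketched in a few lines of C. The sketch assumes real-mode addressing as used by MS-DOS, in which a segment register holds a paragraph (16-byte) number; the names SEGMENT_BYTES, physical_address, and simultaneous_reach are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SEGMENT_BYTES 0x10000UL   /* one physical segment: 64 KB */

/* Real-mode rule: physical address = segment * 16 + offset. */
static uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Upper bound on the memory reachable without reloading a segment
   register, assuming each register points at a different block
   and the blocks do not overlap. */
static uint32_t simultaneous_reach(unsigned segment_registers)
{
    assert(segment_registers > 0);
    return (uint32_t)(segment_registers * SEGMENT_BYTES);
}
```

With four segment registers, simultaneous_reach(4) is 262,144 bytes, the 256 KB figure quoted in the text. Note also that different segment:offset pairs can name the same byte: physical_address(0x1234, 0x0010) and physical_address(0x1235, 0x0000) are both 12350H, which is why segments combined into a group such as DGROUP can share one base address.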
Program Procedures

The procedure level of program structure is partly real and partly conceptual. Procedures are basically just a fancy guise for subroutines.

Procedures within a program are declared with the PROC and ENDP directives in the following form:

  name    PROC    attribute
          .
          .
          .
          RET
  name    ENDP

The attribute carried by a PROC declaration, which is either NEAR or FAR, tells the assembler what type of call you expect to use to enter the procedure, that is, whether the procedure will be called from other routines in the same segment or from routines in other segments. When the assembler encounters a RET instruction within the procedure, it uses the attribute information to generate the correct opcode for either a near (intra-segment) or far (inter-segment) return.

Each program should have a main procedure that receives control from MS-DOS. You specify the entry point for the program by including the name of the main procedure in the END statement in one of the program's source files. The main procedure's attribute (NEAR or FAR) is really not too important, because the program returns control to MS-DOS with a function call rather than a RET instruction. However, by convention, most programmers assign the main procedure the FAR attribute anyway.

You should break the remainder of the program into procedures in an orderly way, with each procedure performing a well-defined single function, returning its results to its caller, and avoiding actions that have global effects within the program. Ideally, procedures invoke each other only by CALL instructions, have only one entry point and one exit point, and always exit by means of a RET instruction, never by jumping to some other location within the program.

For ease of understanding and maintenance, a procedure should not exceed one page (about 60 lines); if it is longer than a page, it is probably too complex and you should delegate some of its function to one or more subsidiary procedures.
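To see why the NEAR or FAR attribute of a PROC declaration matters, the following C sketch models the bookkeeping involved. It is a simplified model, not assembler output: the enum and function names are hypothetical, while the opcodes C3H (near return) and CBH (far return) are the standard 8086 encodings for a RET without an operand.

```c
#include <assert.h>
#include <stdint.h>

enum proc_attr { PROC_NEAR, PROC_FAR };

/* Opcode generated for a plain RET inside a procedure. */
static uint8_t ret_opcode(enum proc_attr attr)
{
    return (uint8_t)(attr == PROC_FAR ? 0xCB : 0xC3);
}

/* Size of the return address: IP only (near) or IP plus CS (far). */
static unsigned return_address_bytes(enum proc_attr attr)
{
    return attr == PROC_FAR ? 4u : 2u;
}

/* Value of SP after a CALL of one type is undone by a RET of
   another type; the stack balances only when the two match. */
static uint16_t sp_after_call_and_ret(uint16_t sp,
                                      enum proc_attr call_type,
                                      enum proc_attr ret_type)
{
    return (uint16_t)(sp - return_address_bytes(call_type)
                         + return_address_bytes(ret_type));
}
```

In this model a near call matched with a near return leaves SP unchanged, whereas a near call into a procedure that ends with a far return leaves SP two bytes too high, because the return popped a CS value that was never pushed.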
You should preface the source code for each procedure with a detailed comment that states the procedure's calling sequence, results returned, registers affected, and any data items accessed or modified. The effort invested in making your procedures compact, clean, flexible, and well-documented will be repaid many times over when you reuse the procedures in other programs.

────────────────────────────────────────────────────────────────────────────

Chapter 4  MS-DOS Programming Tools

Preparing a new program to run under MS-DOS is an iterative process with four basic steps:

  ■ Use of a text editor to create or modify an ASCII source-code file
  ■ Use of an assembler or high-level-language compiler (such as the Microsoft Macro Assembler or the Microsoft C Optimizing Compiler) to translate the source file into relocatable object code
  ■ Use of a linker to transform the relocatable object code into an executable MS-DOS load module
  ■ Use of a debugger to methodically test and debug the program

Additional utilities the MS-DOS software developer may find necessary or helpful include the following:

  ■ LIB, which creates and maintains object-module libraries
  ■ CREF, which generates a cross-reference listing
  ■ EXE2BIN, which converts .EXE files to .COM files
  ■ MAKE, which compares dates of files and carries out operations based on the result of the comparison

This chapter gives an operational overview of the Microsoft programming tools for MS-DOS, including the assembler, the C compiler, the linker, and the librarian. In general, the information provided here also applies to the IBM programming tools for MS-DOS, which are really the Microsoft products with minor variations and different version numbers. Even if your preferred programming language is not C or assembly language, you will need at least a passing familiarity with these tools because all of the examples in the IBM and Microsoft DOS reference manuals are written in one of these languages.
The survey in this chapter, together with the example programs and reference section elsewhere in the book, should provide the experienced programmer with sufficient information to immediately begin writing useful programs. Readers who do not have a background in C, assembly language, or the Intel 80x86 microprocessor architecture should refer to the tutorial and reference works listed at the end of this chapter.

File Types

The MS-DOS programming tools can create and process many different file types. The following extensions are used by convention for these files:

Extension    File type
──────────────────────────────────────────────────────────────────────────
.ASM         Assembly-language source file
.C           C source file
.COM         MS-DOS executable load module that does not require
             relocation at runtime
.CRF         Cross-reference information file produced by the assembler
             for processing by CREF.EXE
.DEF         Module-definition file describing a program's segment
             behavior (MS OS/2 and Microsoft Windows programs only; not
             relevant to normal MS-DOS applications)
.EXE         MS-DOS executable load module that requires relocation at
             runtime
.H           C header file containing C source code for constants,
             macros, and functions; merged into another C program with
             the #include directive
.INC         Include file for assembly-language programs, typically
             containing macros and/or equates for systemwide values such
             as error codes
.LIB         Object-module library file made up of one or more .OBJ
             files; indexed and manipulated by LIB.EXE
.LST         Program listing, produced by the assembler, that includes
             memory locations, machine code, the original program text,
             and error messages
.MAP         Listing of symbols and their locations within a load
             module; produced by the linker
.OBJ         Relocatable-object-code file produced by an assembler or
             compiler
.REF         Cross-reference listing produced by CREF.EXE from the
             information in a .CRF file
──────────────────────────────────────────────────────────────────────────

The Microsoft Macro Assembler

The Microsoft
Macro Assembler (MASM) is distributed as the file MASM.EXE. When beginning a program translation, MASM needs the following information:

  ■ The name of the file containing the source program
  ■ The filename for the object program to be created
  ■ The destination of the program listing
  ■ The filename for the information that is later processed by the cross-reference utility (CREF.EXE)

You can invoke MASM in two ways. If you enter the name of the assembler alone, it prompts you for the names of each of the various input and output files. The assembler supplies reasonable defaults for all the responses except the source filename, as shown in the following example:

  C>MASM

  Microsoft (R) Macro Assembler Version 5.10
  Copyright (C) Microsoft Corp 1981, 1988. All rights reserved.

  Source filename [.ASM]: HELLO
  Object filename [HELLO.OBJ]:
  Source listing [NUL.LST]:
  Cross-reference [NUL.CRF]:

    49006 Bytes symbol space free

        0 Warning Errors
        0 Severe Errors

  C>

You can use a logical device name (such as PRN or COM1) at any of the MASM prompts to send that output of the assembler to a character device rather than a file. Note that the default for the listing and cross-reference files is the NUL device; that is, no file is created. If you end any response with a semicolon, MASM assumes that the remaining responses are all to be the default.

A more efficient way to use MASM is to supply all parameters in the command line, as follows:

  MASM [options] source,[object],[listing],[crossref]

For example, the following command lines are equivalent to the preceding interactive session:

  C>MASM HELLO,,NUL,NUL

or

  C>MASM HELLO;

These commands use the file HELLO.ASM as the source, generate the object-code file HELLO.OBJ, and send the listing and cross-reference files to the bit bucket.

MASM accepts several optional switches in the command line, to control code generation and output files. Figure 4-1 lists the switches accepted by MASM version 5.1.
As shown in the following example, you can put frequently used options in a MASM environment variable, where they will be found automatically by the assembler:

  C>SET MASM=/T /Zi

The switches in the environment variable will be overridden by any that you enter in the command line.

In other versions of the Microsoft Macro Assembler, additional or fewer switches may be available. For exact instructions, see the manual for the version of MASM that you are using.

Switch       Meaning
──────────────────────────────────────────────────────────────────────────
/A           Arrange segments in alphabetic order.
/Bn          Set size of source-file buffer (in KB).
/C           Force creation of a cross-reference (.CRF) file.
/D           Produce listing on both passes (to find phase errors).
/Dsymbol     Define symbol as a null text string (symbol can be
             referenced by conditional assembly directives in file).
/E           Assemble for 80x87 numeric coprocessor emulator using IEEE
             real-number format.
/Ipath       Set search path for include files.
/L           Force creation of a program-listing file.
/LA          Force listing of all generated code.
/ML          Preserve case sensitivity in all names (uppercase names
             distinct from their lowercase equivalents).
/MX          Preserve lowercase in external names only (names defined
             with PUBLIC or EXTRN directives).
/MU          Convert all lowercase names to uppercase.
/N           Suppress generation of tables of macros, structures,
             records, segments, groups, and symbols at the end of the
             listing.
/P           Check for impure code in 80286/80386 protected mode.
/S           Arrange segments in order of occurrence (default).
/T           "Terse" mode; suppress all messages unless errors are
             encountered during the assembly.
/V           "Verbose" mode; report number of lines and symbols at end
             of assembly.
/Wn          Set error display (warning) level; n=0-2.
/X           Force listing of false conditionals.
/Z           Display source lines containing errors on the screen.
/Zd          Include line-number information in .OBJ file.
/Zi          Include line-number and symbol information in .OBJ file.
──────────────────────────────────────────────────────────────────────────

Figure 4-1. Microsoft Macro Assembler version 5.1 switches.

MASM allows you to override the default extensions on any file, a feature that can be rather dangerous. For example, if in the preceding example you had responded to the Object filename prompt with HELLO.ASM, the assembler would have accepted the entry without comment and destroyed your source file. This is not too likely to happen in the interactive command mode, but you must be very careful with file extensions when MASM is used in a batch file.

The Microsoft C Optimizing Compiler

The Microsoft C Optimizing Compiler consists of three executable files (C1.EXE, C2.EXE, and C3.EXE) that implement the C preprocessor, language translator, code generator, and code optimizer. An additional control program, CL.EXE, executes the three compiler files in order, passing each the necessary information about filenames and compilation options.

Before using the C compiler and the linker, you need to set up four environment variables:

Variable        Action
──────────────────────────────────────────────────────────────────────────
PATH=path       Specifies the location of the three executable C compiler
                files (C1, C2, and C3) if they are not in the current
                directory; used by CL.EXE.
INCLUDE=path    Specifies the location of #include files (default
                extension .H) that are not found in the current
                directory.
LIB=path        Specifies the location(s) for object-code libraries that
                are not found in the current directory.
TMP=path        Specifies the location for temporary working files
                created by the C compiler and linker.
──────────────────────────────────────────────────────────────────────────

CL.EXE does not support an interactive mode or response files. You always invoke it with a command line of the following form:

  CL [options] file [file ...]

You may list any number of files; if a file has a .C extension, it will be compiled into a relocatable-object-module (.OBJ) file.
Ordinarily, if the compiler encounters no errors, it automatically passes all resulting .OBJ files and any additional .OBJ files specified in the command line to the linker, along with the names of the appropriate runtime libraries.

The C compiler has many optional switches controlling its memory models, output files, code generation, and code optimization. These are summarized in Figure 4-2. The C compiler's arcane switch syntax is derived largely from UNIX/XENIX, so don't expect it to make any sense.

Switch             Meaning
──────────────────────────────────────────────────────────────────────────
/Ax                Select memory model:
                     C = compact model
                     H = huge model
                     L = large model
                     M = medium model
                     S = small model (default)
/c                 Compile only; do not invoke linker.
/C                 Do not strip comments.
/D[=text]          Define macro.
/E                 Send preprocessor output to standard output.
/EP                Send preprocessor output to standard output without
                   line numbers.
/F                 Set stack size (in hexadecimal bytes).
/Fa [filename]     Generate assembly listing.
/Fc [filename]     Generate mixed source/object listing.
/Fe [filename]     Force executable filename.
/Fl [filename]     Generate object listing.
/Fm [filename]     Generate map file.
/Fo [filename]     Force object-module filename.
/FPx               Select floating-point control:
                     a   = calls with alternate math library
                     c   = calls with emulator library
                     c87 = calls with 8087 library
                     i   = in-line with emulator (default)
                     i87 = in-line with 8087
/Fs [filename]     Generate source listing.
/Gx                Select code generation:
                     0    = 8086 instructions (default)
                     1    = 186 instructions
                     2    = 286 instructions
                     c    = Pascal style function calls
                     s    = no stack checking
                     t[n] = data size threshold
/H                 Specify external name length.
/I                 Specify additional #include path.
/J                 Specify default char type as unsigned.
/link [options]    Pass switches and library names to linker.
/Ox                Select optimization:
                     a = ignore aliasing
                     d = disable optimizations
                     i = enable intrinsic functions
                     l = enable loop optimizations
                     n = disable "unsafe" optimizations
                     p = enable precision optimizations
                     r = disable in-line return
                     s = optimize for space
                     t = optimize for speed (default)
                     w = ignore aliasing except across function calls
                     x = enable maximum optimization (equivalent to
                         /Oailt /Gs)
/P                 Send preprocessor output to file.
/Sx                Select source-listing control:
                     l = set line width
                     p = set page length
                     s = set subtitle string
                     t = set title string
/Tc                Compile file without .C extension.
/u                 Remove all predefined macros.
/U                 Remove specified predefined macro.
/V                 Set version string.
/W                 Set warning level (0-3).
/X                 Ignore "standard places" for include files.
/Zx                Select miscellaneous compilation control:
                     a = disable extensions
                     c = make Pascal functions case-insensitive
                     d = include line-number information
                     e = enable extensions (default)
                     g = generate declarations
                     i = include symbolic debugging information
                     l = remove default library info
                     p = pack structures on n-byte boundary
                     s = check syntax only
──────────────────────────────────────────────────────────────────────────

Figure 4-2. Microsoft C Optimizing Compiler version 5.1 switches.

The Microsoft Object Linker

The object module produced by MASM from a source file is in a form that contains relocation information and may also contain unresolved references to external locations or subroutines. It is written in a common format that is also produced by the various high-level compilers (such as FORTRAN and C) that run under MS-DOS. The computer cannot execute object modules without further processing.

The Microsoft Object Linker (LINK), distributed as the file LINK.EXE, accepts one or more of these object modules, resolves external references, includes any necessary routines from designated libraries, performs any necessary offset relocations, and writes a file that can be loaded and executed by MS-DOS.
The output of LINK is always in .EXE load-module format. (See Chapter 3.)

As with MASM, you can give LINK its parameters interactively or by entering all the required information in a single command line. If you enter the name of the linker alone, the following type of dialog ensues:

  C>LINK

  Microsoft (R) Overlay Linker Version 3.61
  Copyright (C) Microsoft Corp 1983-1987. All rights reserved.

  Object Modules [.OBJ]: HELLO
  Run File [HELLO.EXE]:
  List File [NUL.MAP]: HELLO
  Libraries [.LIB]:

  C>

If you are using LINK version 4.0 or later, the linker also asks for the name of a module-definition (.DEF) file. Simply press the Enter key in response to such a prompt. Module-definition files are used when building Microsoft Windows or MS OS/2 "new .EXE" executable files but are not relevant in normal MS-DOS applications.

The input file for this example was HELLO.OBJ; the output files were HELLO.EXE (the executable program) and HELLO.MAP (the load map produced by the linker after all references and addresses were resolved). Figure 4-3 shows the load map.

──────────────────────────────────────────────────────────────────────────
 Start  Stop   Length Name                   Class
 00000H 00017H 00018H _TEXT                  CODE
 00018H 00027H 00010H _DATA                  DATA
 00030H 000AFH 00080H STACK                  STACK
 000B0H 000BBH 0000CH $$TYPES                DEBTYP
 000C0H 000D6H 00017H $$SYMBOLS              DEBSYM

  Address         Publics by Name

  Address         Publics by Value

Program entry point at 0000:0000
──────────────────────────────────────────────────────────────────────────

Figure 4-3. Map produced by the Microsoft Object Linker (LINK) during the generation of the HELLO.EXE program from Chapter 3. The program contains one CODE, one DATA, and one STACK segment. The first instruction to be executed lies in the first byte of the CODE segment. The $$TYPES and $$SYMBOLS segments contain information for the CodeView debugger and are not part of the program; these segments are ignored by the normal MS-DOS loader.
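The numbers in the load map of Figure 4-3 can be checked against the align types declared in the HELLO.EXE source (WORD for _TEXT and _DATA, PARA for STACK). The C sketch below does so under the assumption that the linker places each segment at the first suitably aligned address after its predecessor; the structure and function names are hypothetical, and the two CodeView segments are left out because their align types are not given in the text.

```c
#include <assert.h>
#include <stdint.h>

/* One line of the load map, plus the align type from the source. */
struct map_entry {
    uint32_t    start, stop, length;
    uint32_t    align;               /* WORD = 2, PARA = 16 */
    const char *name;
};

/* Program segments copied from Figure 4-3. */
static const struct map_entry hello_map[3] = {
    { 0x00000, 0x00017, 0x00018,  2, "_TEXT" },
    { 0x00018, 0x00027, 0x00010,  2, "_DATA" },
    { 0x00030, 0x000AF, 0x00080, 16, "STACK" },
};

/* Round addr up to the next multiple of align (a power of 2). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Returns 1 if every entry has Length = Stop - Start + 1 and begins
   at the first aligned address after the preceding entry. */
static int map_is_consistent(const struct map_entry *m, int count)
{
    uint32_t next_free = 0;
    for (int i = 0; i < count; i++) {
        if (m[i].stop - m[i].start + 1 != m[i].length)
            return 0;
        if (m[i].start != align_up(next_free, m[i].align))
            return 0;
        next_free = m[i].stop + 1;
    }
    return 1;
}
```

For the three program segments the check succeeds: the 8-byte gap between the end of _DATA (00027H) and the start of STACK (00030H) is the padding needed to reach a paragraph boundary, and the STACK start divided by 16 gives 3, the stack-segment displacement stored in the .EXE header.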
You can obtain the same result more quickly by entering all parameters in the command line, in the following form:

  LINK options objectfile, [exefile], [mapfile], [libraries]

Thus, the command-line equivalent to the preceding interactive session is

  C>LINK HELLO,HELLO,HELLO,,

or

  C>LINK HELLO,,HELLO;

If you enter a semicolon as the last character in the command line, LINK assumes the default values for all further parameters.

A third method of commanding LINK is with a response file. A response file contains lines of text that correspond to the responses you would give the linker interactively. You specify the name of the response file in the command line with a leading @ character, as follows:

  LINK @filename

You can also enter the name of a response file at any prompt. If the response file is not complete, LINK will prompt you for the missing information.

When entering linker commands, you can specify multiple object files with the + operator or with spaces, as in the following example:

  C>LINK HELLO+VMODE+DOSINT,MYPROG,,GRAPHICS;

This command would link the files HELLO.OBJ, VMODE.OBJ, and DOSINT.OBJ, searching the library file GRAPHICS.LIB to resolve any references to symbols not defined in the specified object files, and would produce a file named MYPROG.EXE. LINK uses the current drive and directory when they are not explicitly included in a filename; it will not automatically use the same drive and directory you specified for a previous file in the same command line.

By using the + operator or space characters in the libraries field, you can specify up to 32 library files to be searched. Each high-level-language compiler provides default libraries that are searched automatically during the linkage process if the linker can find them (unless they are explicitly excluded with the /NOD switch).
LINK looks for libraries first in the current directory of the default disk drive, then along any paths that were provided in the command line, and finally along the path(s) specified by the LIB variable if it is present in the environment.

LINK accepts several optional switches as part of the command line or at the end of any interactive prompt. Figure 4-4 lists these switches. The number of switches available and their actions vary among different versions of LINK. See your Microsoft Object Linker instruction manual for detailed information about your particular version.

Switch     Full form                 Meaning
──────────────────────────────────────────────────────────────────────────
/A:n       /ALIGNMENT:n              Set segment sector alignment factor.
                                     n must be a power of 2 (default =
                                     512). Not related to logical-segment
                                     alignment (BYTE, WORD, PARA, PAGE,
                                     and so forth). Relevant to segmented
                                     executable files (Microsoft Windows
                                     and MS OS/2) only.
/B         /BATCH                    Suppress linker prompt if a library
                                     cannot be found in the current
                                     directory or in the locations
                                     specified by the LIB environment
                                     variable.
/CO        /CODEVIEW                 Include symbolic debugging
                                     information in the .EXE file for use
                                     by CodeView.
/CP        /CPARMAXALLOC             Set the field in the .EXE file header
                                     controlling the amount of memory
                                     allocated to the program in addition
                                     to the memory required for the
                                     program's code, stack, and
                                     initialized data.
/DO        /DOSSEG                   Use standard Microsoft segment naming
                                     and ordering conventions.
/DS        /DSALLOCATE               Load data at high end of the data
                                     segment. Relevant to real-mode
                                     programs only.
/E         /EXEPACK                  Pack executable file by removing
                                     sequences of repeated bytes and
                                     optimizing relocation table.
/F         /FARCALLTRANSLATION       Optimize far calls to labels within
                                     the same physical segment for speed
                                     by replacing them with near calls and
                                     NOPs.
/HE        /HELP                     Display information about available
                                     options.
/HI        /HIGH                     Load program as high in memory as
                                     possible.
/I         /INFORMATION              Display information about progress of
                                     linking, including pass numbers and
                                     the names of object files being
                                     linked.
/INC       /INCREMENTAL              Force production of .SYM and .ILK
                                     files for subsequent use by ILINK
                                     (incremental linker). May not be used
                                     with /EXEPACK. Relevant to segmented
                                     executable files (Microsoft Windows
                                     and MS OS/2) only.
/LI        /LINENUMBERS              Write address of the first
                                     instruction that corresponds to each
                                     source-code line to the map file. Has
                                     no effect if the compiler does not
                                     include line-number information in
                                     the object module. Forces creation of
                                     a map file.
/M[:n]     /MAP[:n]                  Force creation of a .MAP file listing
                                     all public symbols, sorted by name
                                     and by location. The optional value n
                                     is the maximum number of symbols that
                                     can be sorted (default = 2048); when
                                     n is supplied, the alphabetically
                                     sorted list is omitted.
/NOD       /NODEFAULTLIBRARYSEARCH   Skip search of any default compiler
                                     libraries specified in the .OBJ file.
/NOE       /NOEXTENDEDDICTSEARCH     Ignore extended library dictionary
                                     (if it is present). The extended
                                     dictionary ordinarily provides the
                                     linker with information about
                                     inter-module dependencies, to speed
                                     up linking.
/NOF       /NOFARCALLTRANSLATION     Disable optimization of far calls to
                                     labels within the same segment.
/NOG       /NOGROUPASSOCIATION       Ignore group associations when
                                     assigning addresses to data and code
                                     items.
/NOI       /NOIGNORECASE             Do not ignore case in names during
                                     linking.
/NON       /NONULLSDOSSEG            Arrange segments as for /DOSSEG but
                                     do not insert 16 null bytes at start
                                     of _TEXT segment.
/NOP       /NOPACKCODE               Do not pack contiguous logical code
                                     segments into a single physical
                                     segment.
/O:n       /OVERLAYINTERRUPT:n       Use interrupt number n with the
                                     overlay manager supplied with some
                                     Microsoft high-level languages.
/PAC[:n]   /PACKCODE[:n]             Pack contiguous logical code segments
                                     into a single physical code segment.
                                     The optional value n is the maximum
                                     size for each packed physical code
                                     segment (default = 65,536 bytes).
                                     Segments in different groups are not
                                     packed.
/PADC:n    /PADCODE:n                Add n filler bytes to end of each
                                     code module so that a larger module
                                     can be inserted later with ILINK.
                                     Relevant to segmented executable
                                     files (Windows and MS OS/2) only.
/PADD:n    /PADDATA:n                Add n filler bytes to end of each
                                     data module so that a larger module
                                     can be inserted later with ILINK.
                                     Relevant to segmented executable
                                     files (Microsoft Windows and MS OS/2)
                                     only.
/PAU       /PAUSE                    Pause during linking, allowing a
                                     change of disks before .EXE file is
                                     written.
/SE:n      /SEGMENTS:n               Set maximum number of segments in
                                     linked program (default = 128).
/ST:n      /STACK:n                  Set stack size of program in bytes;
                                     ignore stack segment size
                                     declarations within object modules
                                     and definition file.
/W         /WARNFIXUP                Display warning messages for offsets
                                     relative to a segment base that is
                                     not the same as the group base.
                                     Relevant to segmented executable
                                     files (Microsoft Windows and MS OS/2)
                                     only.
──────────────────────────────────────────────────────────────────────────
Figure 4-4. Switches accepted by the Microsoft Object Linker (LINK) version 5.0. Earlier versions use a subset of these switches. Note that any abbreviation for a switch is acceptable as long as it is sufficient to specify the switch uniquely.

The EXE2BIN Utility

The EXE2BIN utility (EXE2BIN.EXE) transforms a .EXE file created by LINK into an executable .COM file, if the program meets the following prerequisites:

  ■ It cannot contain more than one declared segment and cannot define a stack.

  ■ It must be less than 64 KB in length.

  ■ It must have an origin at 0100H.

  ■ The first location in the file must be specified as the entry point in the source code's END directive.

Although .COM files are somewhat more compact than .EXE files, you should avoid using them. Programs that use separate segments for code, data, and stack are much easier to port to protected-mode environments such as MS OS/2; in addition, .COM files do not support the symbolic debugging information used by CodeView.
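The four prerequisites for conversion to a .COM file can be restated as a single predicate. The C fragment below is only a sketch of that checklist: the program_facts structure and the meets_com_prerequisites function are hypothetical names invented for this illustration, and the fields are assumed to have been filled in by the programmer from the source code. It does not describe how EXE2BIN itself examines a .EXE file.

```c
#include <assert.h>
#include <stddef.h>

/* Facts about a linked program, as the programmer knows them from the
   source code. Hypothetical summary type, for illustration only. */
struct program_facts {
    unsigned      declared_segments;  /* number of declared segments       */
    int           defines_stack;      /* nonzero if a stack is defined     */
    unsigned long length;             /* program length in bytes           */
    unsigned      origin;             /* origin of the program             */
    unsigned      entry_offset;       /* entry point named in END          */
};

/* Returns 1 if the program meets all four prerequisites listed above. */
int meets_com_prerequisites(const struct program_facts *p)
{
    assert(p != NULL);
    return p->declared_segments <= 1      /* at most one declared segment  */
        && !p->defines_stack              /* no stack defined              */
        && p->length < 65536UL            /* less than 64 KB               */
        && p->origin == 0x0100            /* origin at 0100H               */
        && p->entry_offset == p->origin;  /* entry is the first location   */
}
```

A program such as the three-segment HELLO.EXE from Chapter 3, which defines a stack, fails the first two tests and therefore would not qualify.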
Another use for the EXE2BIN utility is to convert an installable device driver (after it is assembled and linked into a .EXE file) into a memory-image .BIN or .SYS file with an origin of zero. This conversion is required in MS-DOS version 2, which cannot load device drivers as .EXE files. The process of writing an installable device driver is discussed in more detail in Chapter 14.

Unlike most of the other programming utilities, EXE2BIN does not have an interactive mode. It always takes its source and destination filenames, separated by spaces, from the MS-DOS command line, as follows:

  EXE2BIN sourcefile [destinationfile]

If you do not supply the source-file extension, it defaults to .EXE; the destination-file extension defaults to .BIN. If you do not specify a name for the destination file, EXE2BIN gives it the same name as the source file, with a .BIN extension.

For example, to convert the file HELLO.EXE into HELLO.COM, you would use the following command line:

  C>EXE2BIN HELLO.EXE HELLO.COM

The EXE2BIN program also has other capabilities, such as pure binary conversion with segment fixup for creating program images to be placed in ROM; but because these features are rarely used during MS-DOS application development, they will not be discussed here.

The CREF Utility

The CREF cross-reference utility (CREF.EXE) processes a .CRF file produced by MASM, creating an ASCII text file with the default extension .REF. The file contains a cross-reference listing of all symbols declared in the program and the line numbers in which they are referenced. (See Figure 4-5.) Such a listing is very useful when debugging large assembly-language programs with many interdependent procedures and variables.

CREF may be supplied with its parameters interactively or in a single command line.
If you enter the utility name alone, CREF prompts you for the input and output filenames, as shown in the following example:

  C>CREF

  Microsoft (R) Cross-Reference Utility  Version 5.10
  Copyright (C) Microsoft Corp 1981-1985, 1987.  All rights reserved.

  Cross-reference [.CRF]: HELLO
  Listing [HELLO.REF]:

  15 Symbols

  C>

──────────────────────────────────────────────────────────────────────────
Microsoft Cross-Reference  Version 5.10          Thu May 26 11:09:34 1988
HELLO.EXE --- print Hello on terminal

  Symbol Cross-Reference     (# definition, + modification)       Cref-1

@CPU . . . . . . . . . . . . . .   1#
@VERSION . . . . . . . . . . . .   1#

CODE . . . . . . . . . . . . . .  21
CR . . . . . . . . . . . . . . .  17#   46    47

DATA . . . . . . . . . . . . . .  44

LF . . . . . . . . . . . . . . .  18#   46    47

MSG. . . . . . . . . . . . . . .  33    46#
MSG_LEN. . . . . . . . . . . . .  32    49#

PRINT. . . . . . . . . . . . . .  25#   39    60

STACK. . . . . . . . . . . . . .  23    54#   54    58
STDERR . . . . . . . . . . . . .  15#
STDIN. . . . . . . . . . . . . .  13#
STDOUT . . . . . . . . . . . . .  14#   31

_DATA. . . . . . . . . . . . . .  23    27    44#   51
_TEXT. . . . . . . . . . . . . .  21#   23    41

 15 Symbols
──────────────────────────────────────────────────────────────────────────
Figure 4-5. Cross-reference listing HELLO.REF produced by the CREF utility from the file HELLO.CRF, for the HELLO.EXE program example from Chapter 3. The symbols declared in the program are listed on the left in alphabetic order. To the right of each symbol is a list of all the lines where that symbol is referenced. The number with a # sign after it denotes the line where the symbol is declared. Numbers followed by a + sign indicate that the symbol is modified at the specified line. The line numbers given in the cross-reference listing correspond to the line numbers generated by the assembler in the program-listing (.LST) file, not to any physical line count in the original source file.
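The marker convention described in the caption of Figure 4-5 (a trailing # for the defining line, a trailing + for a line that modifies the symbol) is simple enough to decode mechanically. The following C fragment is a minimal sketch of such a decoder for one reference token, for example 17# or 46. The cref_ref structure and parse_cref_ref function are hypothetical names used only for this illustration; they are not part of CREF.

```c
#include <assert.h>
#include <stdlib.h>

/* One line reference from a cross-reference listing.
   Hypothetical type, for illustration only. */
struct cref_ref {
    long line;             /* line number in the .LST file            */
    int  is_definition;    /* nonzero if the token ended with '#'     */
    int  is_modification;  /* nonzero if the token ended with '+'     */
};

/* Decodes a token such as "17#", "46", or "54+".
   Returns 1 on success, 0 if the token is not a valid reference.
   On failure the contents of *out are unspecified. */
int parse_cref_ref(const char *token, struct cref_ref *out)
{
    char *end;
    long  line;

    assert(token != NULL && out != NULL);
    line = strtol(token, &end, 10);
    if (end == token || line <= 0)
        return 0;                       /* no digits, or not a line number */

    out->line = line;
    out->is_definition = (*end == '#');
    out->is_modification = (*end == '+');
    if (*end == '#' || *end == '+')
        end++;                          /* step over the marker            */
    return *end == '\0';                /* nothing may follow the marker   */
}
```

Applied to the PRINT row of the figure, the tokens 25#, 39, and 60 decode to a definition at line 25 and plain references at lines 39 and 60.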
The parameters may also be entered in the command line in the following form:

  CREF CRF_file, listing_file

For example, the command-line equivalent to the preceding interactive session is:

  C>CREF HELLO,HELLO

If CREF cannot find the specified .CRF file, it displays an error message. Otherwise, it leaves the cross-reference listing in the specified file on the disk. You can send the file to the printer with the COPY command, in the following form:

  COPY listing_file PRN:

You can also send the cross-reference listing directly to a character device as it is generated by responding to the Listing prompt with the name of the device.

The Microsoft Library Manager

Although the object modules that are produced by MASM or by high-level-language compilers can be linked directly into executable load modules, they can also be collected into special files called object-module libraries. The modules in a library are indexed by name and by the public symbols they contain, so that they can be extracted by the linker to satisfy external references in a program.

The Microsoft Library Manager (LIB) is distributed as the file LIB.EXE. LIB creates and maintains program libraries, adding, updating, and deleting object files as necessary. LIB can also check a library file for internal consistency or print a table of its contents (Figure 4-6).

LIB follows the command conventions of most other Microsoft programming tools. You must supply it with the name of a library file to work on, one or more operations to perform, the name of a listing file or device, and (optionally) the name of the output library. If you do not specify a name for the output library, LIB gives it the same name as the input library and changes the extension of the input library to .BAK.
The LIB operations are simply the names of object files, with a prefix character that specifies the action to be taken:

Prefix     Meaning
──────────────────────────────────────────────────────────────────────────
-          Delete an object module from the library.
*          Extract a module and place it in a separate .OBJ file.
+          Add an object module or the entire contents of another library
           to the library.
──────────────────────────────────────────────────────────────────────────

You can combine command prefixes. For example, -+ replaces a module, and *- extracts a module into a new file and then deletes it from the library.

──────────────────────────────────────────────────────────────────────────
_abort............abort             _abs..............abs
_access...........access            _asctime..........asctime
_atof.............atof              _atoi.............atoi
_atol.............atol              _bdos.............bdos
_brk..............brk               _brkctl...........brkctl
_bsearch..........bsearch           _calloc...........calloc
_cgets............cgets             _chdir............dir
_chmod............chmod             _chsize...........chsize
  .
  .
  .
_exit             Offset: 00000010H  Code and data size: 44H
  __exit

_filbuf           Offset: 00000160H  Code and data size: BBH
  __filbuf

_file             Offset: 00000300H  Code and data size: CAH
  __iob             __iob2            __lastiob
  .
  .
  .
──────────────────────────────────────────────────────────────────────────
Figure 4-6. Extract from the table-of-contents listing produced by the Microsoft Library Manager (LIB) for the Microsoft C library SLIBC.LIB. The first part of the listing is an alphabetic list of all public names declared in all of the modules in the library. Each name is associated with the object module to which it belongs. The second part of the listing is an alphabetic list of the object-module names in the library, each followed by its offset within the library file and the actual size of the module in bytes. The entry for each module is followed by a summary of the public names that are declared within it.
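The prefix characters for LIB operations (-, *, and +, alone or combined as in -+ and *-) can be illustrated with a small decoder. The C fragment below is a sketch only: the lib_op structure and parse_lib_op function are hypothetical names invented for this example, and the sketch accepts the prefix characters in any order, which is an assumption rather than documented LIB behavior.

```c
#include <assert.h>
#include <stddef.h>

/* A decoded LIB operation. Hypothetical type, for illustration only. */
struct lib_op {
    int         delete_module;   /* '-' prefix present               */
    int         extract_module;  /* '*' prefix present               */
    int         add_module;      /* '+' prefix present               */
    const char *name;            /* module name following the prefix */
};

/* Decodes an operation such as "+VIDEO", "-+VIDEO", or "*-VIDEO".
   Returns 1 if at least one prefix and a non-empty name were found. */
int parse_lib_op(const char *text, struct lib_op *out)
{
    assert(text != NULL && out != NULL);
    out->delete_module = 0;
    out->extract_module = 0;
    out->add_module = 0;

    /* Consume the leading prefix characters, recording each action. */
    for (; *text == '-' || *text == '*' || *text == '+'; text++) {
        if (*text == '-')
            out->delete_module = 1;
        else if (*text == '*')
            out->extract_module = 1;
        else
            out->add_module = 1;
    }
    out->name = text;

    return (out->delete_module || out->extract_module || out->add_module)
        && *text != '\0';
}
```

With this sketch, -+VIDEO decodes to a delete followed by an add (a replacement), and *-VIDEO decodes to an extraction followed by a deletion.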
When you invoke LIB with its name alone, it requests the other information it needs interactively, as shown in the following example:

  C>LIB

  Microsoft (R) Library Manager  Version 3.08
  Copyright (C) Microsoft Corp 1983-1987.  All rights reserved.

  Library name: SLIBC
  Operations: +VIDEO
  List file: SLIBC.LST
  Output library: SLIBC2

  C>

In this example, LIB added the object module VIDEO.OBJ to the library SLIBC.LIB, wrote a library table of contents into the file SLIBC.LST, and named the resulting new library SLIBC2.LIB.

The Library Manager can also be run with a command line of the following form:

  LIB library [commands],[list],[newlibrary]

For example, the following command line is equivalent to the preceding interactive session:

  C>LIB SLIBC +VIDEO,SLIBC.LST,SLIBC2;

As with the other Microsoft utilities, a semicolon at the end of the command line causes LIB to use the default responses for any parameters that are omitted.

Like LINK, LIB can also accept its commands from a response file. The contents of the file are lines of text that correspond exactly to the responses you would give LIB interactively. You specify the name of the response file in the command line with a leading @ character, as follows:

  LIB @filename

LIB has only three switches: /I (/IGNORECASE), /N (/NOIGNORECASE), and /PAGESIZE:number. The /IGNORECASE switch is the default. The /NOIGNORECASE switch causes LIB to regard as distinct any symbols that differ only in the case of their component letters.

You should place the /PAGESIZE switch, which defines the size of a unit of allocation space for a given library, immediately after the library filename. The library page size is in bytes and must be a power of 2 between 16 and 32,768 (16, 32, 64, and so forth); the default is 16 bytes. Because the index to a library is always a fixed number of pages, setting a larger page size allows you to store more object modules in that library; on the other hand, it will result in more wasted space within the file.
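The rule for /PAGESIZE values, and the trade-off between page size and wasted space, can be made concrete with a little arithmetic. The following C fragment is a minimal sketch: lib_page_size_valid and lib_padding_bytes are hypothetical helper names, and the padding calculation assumes that each module occupies a whole number of pages, which is an assumption about the library layout rather than something stated above.

```c
#include <assert.h>

/* Returns 1 if size is a power of 2 between 16 and 32,768 inclusive. */
int lib_page_size_valid(unsigned long size)
{
    return size >= 16UL
        && size <= 32768UL
        && (size & (size - 1)) == 0;   /* exactly one bit set */
}

/* Returns the number of filler bytes needed to round a module of
   module_size bytes up to a whole number of pages (assumed layout). */
unsigned long lib_padding_bytes(unsigned long module_size,
                                unsigned long page_size)
{
    unsigned long remainder;

    assert(lib_page_size_valid(page_size));
    remainder = module_size % page_size;
    return remainder == 0 ? 0 : page_size - remainder;
}
```

Under that assumption, a module of 44H (68) bytes would carry 12 bytes of padding with the default 16-byte page but 444 bytes with a 512-byte page.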
The MAKE Utility

The MAKE utility (MAKE.EXE) compares dates of files and carries out commands based on the result of that comparison. Because of this single, rather basic capability, MAKE can be used to maintain complex programs built from many modules. The dates of source, object, and executable files are simply compared in a logical sequence; the assembler, compiler, linker, and other programming tools are invoked as appropriate.

The MAKE utility processes a plain ASCII text file called, as you might expect, a make file. You start the utility with a command-line entry in the following form:

  MAKE makefile [options]

By convention, a make file has the same name as the executable file that is being maintained, but without an extension. The available MAKE switches are listed in Figure 4-7.

A simple make file contains one or more dependency statements separated by blank lines. Each dependency statement can be followed by a list of MS-DOS commands, in the following form:

  targetfile : sourcefile ...
    command
    command
    .
    .
    .

If the date and time of any source file are later than those of the target file, the accompanying list of commands is carried out. You may use comment lines, which begin with a # character, freely in a make file. MAKE can also process inference rules and macro definitions. For further details on these advanced capabilities, see the Microsoft or IBM documentation.

Switch       Meaning
──────────────────────────────────────────────────────────────────────────
/D           Display last modification date of each file as it is
             processed.
/I           Ignore exit (return) codes returned by commands and programs
             executed as a result of dependency statements.
/N           Display commands that would be executed as a result of
             dependency statements but do not execute those commands.
/S           Do not display commands as they are executed.
/X filename  Direct error messages from MAKE, or any program that MAKE
             runs, to the specified file. If filename is a hyphen (-),
             direct error messages to the standard output.
──────────────────────────────────────────────────────────────────────────
Figure 4-7. Switches for the MAKE utility.

A Complete Example

Let's put together everything we've learned about using the MS-DOS programming tools so far. Figure 4-8 shows a sketch of the overall process of building an executable program.

Assume that we have the source code for the HELLO.EXE program from Chapter 3 in the file HELLO.ASM. To assemble the source program into the relocatable object module HELLO.OBJ with symbolic debugging information included, also producing a program listing in the file HELLO.LST and a cross-reference data file HELLO.CRF, we would enter

  C>MASM /C /L /Zi /T HELLO;

To convert the cross-reference raw-data file HELLO.CRF into a cross-reference listing in the file HELLO.REF, we would enter

  C>CREF HELLO,HELLO

  Input                                  Tool       Output
  MASM source-code file                  MASM       Relocatable object-module file (.OBJ)
  C or other HLL source-code file        Compiler   Relocatable object-module file (.OBJ)
  Relocatable object-module file (.OBJ)  LIB        Object-module libraries (.LIB)
  Object-module files (.OBJ),            LINK       Executable program (.EXE)
    object-module libraries (.LIB),
    and HLL runtime libraries
  Executable program (.EXE)              EXE2BIN    Executable program (.COM)

Figure 4-8. Creation of an MS-DOS application program, from source code to executable file.
To convert the relocatable object file HELLO.OBJ into the executable file HELLO.EXE, creating a load map in the file HELLO.MAP and appending symbolic debugging information to the executable file, we would enter

  C>LINK /MAP /CODEVIEW HELLO;

We could also automate the entire process just described by creating a make file named HELLO (with no extension) and including the following instructions:

  hello.obj : hello.asm
    masm /C /L /Zi /T hello;
    cref hello,hello

  hello.exe : hello.obj
    link /MAP /CODEVIEW hello;

Then, when we have made some change to HELLO.ASM and want to rebuild the executable HELLO.EXE file, we need only enter

  C>MAKE HELLO

Programming Resources and References

The literature on IBM PC-compatible personal computers, the Intel 80x86 microprocessor family, and assembly-language and C programming is vast. The list below contains a selection of those books that I have found to be useful and reliable. The list should not be construed as an endorsement by Microsoft Corporation.

MASM Tutorials

  Assembly Language Primer for the IBM PC and XT, by Robert Lafore. New American Library, New York, NY, 1984. ISBN 0-452-25711-5.

  8086/8088/80286 Assembly Language, by Leo Scanlon. Brady Books, Simon and Schuster, New York, NY, 1988. ISBN 0-13-246919-7.

C Tutorials

  Microsoft C Programming for the IBM, by Robert Lafore. Howard W. Sams & Co., Indianapolis, IN, 1987. ISBN 0-672-22515-8.

  Proficient C, by Augie Hansen. Microsoft Press, Redmond, WA, 1987. ISBN 1-55615-007-5.

Intel 80x86 Microprocessor References

  iAPX 88 Book. Intel Corporation, Literature Department SV3-3, 3065 Bowers Ave., Santa Clara, CA 95051. Order no. 210200.

  iAPX 286 Programmer's Reference Manual. Intel Corporation, Literature Department SV3-3, 3065 Bowers Ave., Santa Clara, CA 95051. Order no. 210498.

  iAPX 386 Programmer's Reference Manual. Intel Corporation, Literature Department SV3-3, 3065 Bowers Ave., Santa Clara, CA 95051. Order no. 230985.
PC, PC/AT, and PS/2 Architecture

  The IBM Personal Computer from the Inside Out (Revised Edition), by Murray Sargent and Richard L. Shoemaker. Addison-Wesley Publishing Company, Reading, MA, 1986. ISBN 0-201-06918-0.

  Programmer's Guide to PC & PS/2 Video Systems, by Richard Wilton. Microsoft Press, Redmond, WA, 1987. ISBN 1-55615-103-9.

  Personal Computer Technical Reference. IBM Corporation, IBM Technical Directory, P. O. Box 2009, Racine, WI 53404. Part no. 6322507.

  Personal Computer AT Technical Reference. IBM Corporation, IBM Technical Directory, P. O. Box 2009, Racine, WI 53404. Part no. 6280070.

  Options and Adapters Technical Reference. IBM Corporation, IBM Technical Directory, P. O. Box 2009, Racine, WI 53404. Part no. 6322509.

  Personal System/2 Model 30 Technical Reference. IBM Corporation, IBM Technical Directory, P. O. Box 2009, Racine, WI 53404. Part no. 68X2201.

  Personal System/2 Model 50/60 Technical Reference. IBM Corporation, IBM Technical Directory, P. O. Box 2009, Racine, WI 53404. Part no. 68X2224.

  Personal System/2 Model 80 Technical Reference. IBM Corporation, IBM Technical Directory, P. O. Box 2009, Racine, WI 53404. Part no. 68X2256.

──────────────────────────────────────────────────────────────────────────

Chapter 5  Keyboard and Mouse Input

The fundamental means of user input under MS-DOS is the keyboard. This follows naturally from the MS-DOS command-line interface, whose lineage can be traced directly to minicomputer operating systems with Teletype consoles. During the first few years of MS-DOS's existence, when 8088/8086-based machines were the norm, nearly every popular application program used key-driven menus and text-mode displays.

However, as high-resolution graphics adapters (and 80286/80386-based machines with enough power to drive them) have become less expensive, programs that support windows and a graphical user interface have steadily grown more popular.
Such programs typically rely on a pointing device such as a mouse, stylus, joystick, or light pen to let the user navigate in a "point-and-shoot" manner, reducing keyboard entry to a minimum. As a result, support for pointing devices has become an important consideration for all software developers.

Keyboard Input Methods

Applications running under MS-DOS on IBM PC-compatible machines can use several methods to obtain keyboard input:

  ■ MS-DOS handle-oriented functions

  ■ MS-DOS traditional character functions

  ■ IBM ROM BIOS keyboard-driver functions

These methods offer different degrees of flexibility, portability, and hardware independence.

The handle, or stream-oriented, functions are philosophically derived from UNIX/XENIX and were first introduced in MS-DOS version 2.0. A program uses these functions by supplying a handle, or token, for the desired device, plus the address and length of a buffer.

When a program begins executing, MS-DOS supplies it with predefined handles for certain commonly used character devices, including the keyboard:

Handle     Device name                      Opened to
──────────────────────────────────────────────────────────────────────────
0          Standard input (stdin)           CON
1          Standard output (stdout)         CON
2          Standard error (stderr)          CON
3          Standard auxiliary (stdaux)      AUX
4          Standard printer (stdprn)        PRN
──────────────────────────────────────────────────────────────────────────

These handles can be used for read and write operations without further preliminaries. A program can also obtain a handle for a character device by explicitly opening the device for input or output using its logical name (as though it were a file). The handle functions support I/O redirection, allowing a program to take its input from another device or file instead of the keyboard, for example. Redirection is discussed in detail in Chapter 15.

The traditional character-input functions are a superset of the character I/O functions that were present in CP/M.
Originally included in MS-DOS simply to facilitate the porting of existing applications from CP/M, they are still widely used. In MS-DOS versions 2.0 and later, most of the traditional functions also support I/O redirection (although not as well as the handle functions do).

Use of the IBM ROM BIOS keyboard functions presupposes that the program is running on an IBM PC-compatible machine. The ROM BIOS keyboard driver operates at a much more primitive level than the MS-DOS functions and allows a program to circumvent I/O redirection or MS-DOS's special handling of certain control characters. Programs that use the ROM BIOS keyboard driver are inherently less portable than those that use the MS-DOS functions and may interfere with the proper operation of other programs; many of the popular terminate-and-stay-resident (TSR) utilities fall into this category.

Keyboard Input with Handles

The principal MS-DOS function for keyboard input using handles is Int 21H Function 3FH (Read File or Device). The parameters for this function are a handle, the segment and offset of a buffer, and the length of the buffer. (For a more detailed explanation of this function, see Section II of this book, "MS-DOS Functions Reference.")

As an example, let's use the predefined standard input handle (0) and Int 21H Function 3FH to read a line from the keyboard:

──────────────────────────────────────────────────────────────────────────
buffer  db      80 dup (?)      ; keyboard input buffer
        .
        .
        .
        mov     ah,3fh          ; function 3fh = read file or device
        mov     bx,0            ; handle for standard input
        mov     cx,80           ; maximum bytes to read
        mov     dx,seg buffer   ; DS:DX = buffer address
        mov     ds,dx
        mov     dx,offset buffer
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if error detected
        .
        .
        .
──────────────────────────────────────────────────────────────────────────

When control returns from Int 21H Function 3FH, the carry flag is clear if the function was successful, and AX contains the number of characters read.
If there was an error, the carry flag is set and AX contains an error code; however, this should never occur when reading the keyboard.

The standard input is redirectable, so the code just shown is not a foolproof way of obtaining input from the keyboard. Depending upon whether a redirection parameter was included in the command line by the user, program input might be coming from the keyboard, a file, another character device, or even the bit bucket (NUL device). To bypass redirection and be absolutely certain where your input is coming from, you can ignore the predefined standard input handle and open the console as though it were a file, using the handle obtained from that open operation to perform your keyboard input, as in the following example:

──────────────────────────────────────────────────────────────────────────
buffer  db      80 dup (?)      ; keyboard input buffer
fname   db      'CON',0         ; keyboard device name
handle  dw      0               ; keyboard device handle
        .
        .
        .
        mov     ah,3dh          ; function 3dh = open
        mov     al,0            ; mode = read
        mov     dx,seg fname    ; DS:DX = device name
        mov     ds,dx
        mov     dx,offset fname
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if open failed
        mov     handle,ax       ; save handle for CON
        .
        .
        .
        mov     ah,3fh          ; function 3fh = read file or device
        mov     bx,handle       ; BX = handle for CON
        mov     cx,80           ; maximum bytes to read
        mov     dx,offset buffer ; DS:DX = buffer address
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if error detected
        .
        .
        .
──────────────────────────────────────────────────────────────────────────

When a programmer uses Int 21H Function 3FH to read from the keyboard, the exact result depends on whether MS-DOS regards the handle to be in ASCII mode or binary mode (sometimes known as cooked mode and raw mode). ASCII mode is the default, although binary mode can be selected with Int 21H Function 44H (IOCTL) when necessary.
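Selecting binary mode with Int 21H Function 44H comes down to changing one bit in the handle's device-information word. The C fragment below is a sketch of that bit manipulation only; it does not issue any MS-DOS calls. The bit assignments (bit 7 marks a character device, bit 5 selects binary mode) and the requirement that the upper byte be zero when the word is written back are taken from the Function 44H description in the MS-DOS functions reference, not from the passage above, and the macro and function names are hypothetical.

```c
#include <assert.h>

#define DEVINFO_ISDEV 0x0080u  /* bit 7: handle refers to a character device */
#define DEVINFO_RAW   0x0020u  /* bit 5: set = binary (raw), clear = ASCII   */

/* Returns 1 if the device-information word describes a character device. */
int is_character_device(unsigned info)
{
    return (info & DEVINFO_ISDEV) != 0;
}

/* Returns 1 if the word describes a character device in binary mode. */
int is_binary_mode(unsigned info)
{
    return is_character_device(info) && (info & DEVINFO_RAW) != 0;
}

/* Builds the value to write back in order to select binary mode:
   the raw bit is set and the upper byte is cleared. */
unsigned binary_mode_request(unsigned info)
{
    assert(is_character_device(info));
    return (info | DEVINFO_RAW) & 0x00FFu;
}

/* Builds the value to write back in order to restore ASCII mode. */
unsigned ascii_mode_request(unsigned info)
{
    assert(is_character_device(info));
    return (info & ~DEVINFO_RAW) & 0x00FFu;
}
```

In a real program the word would be read with subfunction 00H, adjusted as shown, and written back with subfunction 01H; a program that switches a handle to binary mode should normally restore ASCII mode before it exits.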
In ASCII mode, MS-DOS initially places characters obtained from the keyboard in a 128-byte internal buffer, and the user can edit the input with the Backspace key and the special function keys. MS-DOS automatically echoes the characters to the standard output, expanding tab characters to spaces (although they are left as the ASCII code 09H in the buffer). The Ctrl-C, Ctrl-S, and Ctrl-P key combinations receive special handling, and the Enter key is translated to a carriage return-linefeed pair. When the user presses Enter or Ctrl-Z, MS-DOS copies the requested number of characters (or the actual number of characters entered, if less than the number requested) out of the internal buffer into the calling program's buffer.

In binary mode, MS-DOS never echoes input characters. It passes the Ctrl-C, Ctrl-S, Ctrl-P, and Ctrl-Z key combinations and the Enter key through to the application unchanged, and Int 21H Function 3FH does not return control to the application until the exact number of characters requested has been received.

Ctrl-C checking is discussed in more detail at the end of this chapter. For now, simply note that the application programmer can substitute a custom handler for the default MS-DOS Ctrl-C handler and thereby avoid having the application program lose control of the machine when the user enters a Ctrl-C or Ctrl-Break.

Keyboard Input with Traditional Calls

The MS-DOS traditional keyboard functions offer a variety of character and line-oriented services with or without echo and Ctrl-C detection. These functions are summarized in the following table.
Int 21H Function    Action                            Ctrl-C checking
──────────────────────────────────────────────────────────────────────────
01H                 Keyboard input with echo          Yes
06H                 Direct console I/O                No
07H                 Keyboard input without echo       No
08H                 Keyboard input without echo       Yes
0AH                 Buffered keyboard input           Yes
0BH                 Input-status check                Yes
0CH                 Input-buffer reset and input      Varies
──────────────────────────────────────────────────────────────────────────

In MS-DOS versions 2.0 and later, redirection of the standard input affects all these functions. In other words, they act as though they were special cases of an Int 21H Function 3FH call using the predefined standard input handle (0).

The character-input functions (01H, 06H, 07H, and 08H) all return a character in the AL register. For example, the following sequence waits until a key is pressed and then returns it in AL:

──────────────────────────────────────────────────────────────────────────
        mov     ah,1            ; function 01h = read keyboard
        int     21h             ; transfer to MS-DOS
──────────────────────────────────────────────────────────────────────────

The character-input functions differ in whether the input is echoed to the screen and whether they are sensitive to Ctrl-C interrupts. Although MS-DOS provides no pure keyboard-status function that is immune to Ctrl-C, a program can read keyboard status (somewhat circuitously) without interference by using Int 21H Function 06H. Extended keys, such as the IBM PC keyboard's special function keys, require two calls to a character-input function.

As an alternative to single-character input, a program can use buffered-line input (Int 21H Function 0AH) to obtain an entire line from the keyboard in one operation. MS-DOS builds up buffered lines in an internal buffer and does not pass them to the calling program until the user presses the Enter key. While the line is being entered, all the usual editing keys are active and are handled by the MS-DOS keyboard driver.
You use Int 21H Function 0AH as follows:

──────────────────────────────────────────────────────────────────────────
buff    db      81              ; maximum length of input
        db      0               ; actual length (from MS-DOS)
        db      81 dup (0)      ; receives keyboard input
        .
        .
        .
        mov     ah,0ah          ; function 0ah = read buffered line
        mov     dx,seg buff     ; DS:DX = buffer address
        mov     ds,dx
        mov     dx,offset buff
        int     21h             ; transfer to MS-DOS
        .
        .
        .
──────────────────────────────────────────────────────────────────────────

Int 21H Function 0AH differs from Int 21H Function 3FH in several important ways. First, the maximum length is passed in the first byte of the buffer, rather than in the CX register. Second, the actual length is returned in the second byte of the structure, rather than in the AX register. Finally, when the user has entered one less than the specified maximum number of characters, MS-DOS ignores all subsequent characters and sounds a warning beep until the Enter key is pressed.

For detailed information about each of the traditional keyboard-input functions, see Section II of this book, "MS-DOS Functions Reference."

Keyboard Input with ROM BIOS Functions

Programmers writing applications for IBM PC compatibles can bypass the MS-DOS keyboard functions and choose from two hardware-dependent techniques for keyboard input.

The first method is to call the ROM BIOS keyboard driver using Int 16H. For example, the following sequence reads a single character from the keyboard input buffer and returns it in the AL register:

──────────────────────────────────────────────────────────────────────────
        mov     ah,0            ; function 0 = read keyboard
        int     16h             ; transfer to ROM BIOS
──────────────────────────────────────────────────────────────────────────

Int 16H Function 00H also returns the keyboard scan code in the AH register, allowing the program to detect key codes that are not ordinarily returned by MS-DOS.
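The register pair returned by Int 16H Function 00H can be taken apart as in the following rough sketch, which is portable C and makes no ROM BIOS call itself. The structure and function names (bios_key, decode_bios_key) are hypothetical, and the sketch assumes the caller has already placed the returned AX value in an unsigned integer.

```c
/* Hypothetical helper for this sketch; not part of the ROM BIOS
   interface or of any C library. */
struct bios_key {
    unsigned char scan;         /* AH: keyboard scan code            */
    unsigned char ascii;        /* AL: ASCII code, or 0 if none      */
    int extended;               /* nonzero if the key has no ASCII   */
};

/* Splits an AX value, as returned by Int 16H Function 00H,
   into its scan code (high byte) and ASCII code (low byte). */
struct bios_key decode_bios_key(unsigned ax)
{
    struct bios_key k;

    k.scan = (unsigned char)((ax >> 8) & 0xff);
    k.ascii = (unsigned char)(ax & 0xff);
    k.extended = (k.ascii == 0);
    return k;
}
```

For instance, an AX value of 3B00H (the F1 key) decodes to scan code 3BH with no ASCII code, whereas 1E61H decodes to scan code 1EH and the ASCII character a.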
Other Int 16H services return the keyboard status (that is, whether a character is waiting) or the keyboard shift state (from the ROM BIOS data area 0000:0417H). For a more detailed explanation of ROM BIOS keyboard functions, see Section III of this book, "IBM ROM BIOS and Mouse Functions Reference."

You should consider carefully before building ROM BIOS dependence into an application. Although this technique allows you to bypass any I/O redirection that may be in effect, ways exist to do this without introducing dependence on the ROM BIOS. And there are real disadvantages to calling the ROM BIOS keyboard driver:

■ It always bypasses I/O redirection, which sometimes may not be desirable.

■ It is dependent on IBM PC compatibility and does not work correctly, unchanged, on some older machines such as the Hewlett-Packard TouchScreen or the Wang Professional Computer.

■ It may introduce complicated interactions with TSR utilities.

The other and more hardware-dependent method of keyboard input on an IBM PC is to write a new handler for ROM BIOS Int 09H and service the keyboard controller's interrupts directly. This involves translation of scan codes to ASCII characters and maintenance of the type-ahead buffer. In ordinary PC applications, there is no reason to take over keyboard I/O at this level; therefore, I will not discuss this method further here. If you are curious about the techniques that would be required, the best reference is the listing for the ROM BIOS Int 09H handler in the IBM PC or PC/AT technical reference manual.

Ctrl-C and Ctrl-Break Handlers

In the discussion of keyboard input with the MS-DOS handle and traditional functions, I made some passing references to the fact that Ctrl-C entries can interfere with the expected behavior of those functions. Let's look at this subject in more detail now.

During most character I/O operations, MS-DOS checks for a Ctrl-C (ASCII code 03H) waiting at the keyboard and executes an Int 23H if one is detected.
If the system break flag is on, MS-DOS also checks for a Ctrl-C entry during certain other operations (such as file reads and writes). Ordinarily, the Int 23H vector points to a routine that simply terminates the currently active process and returns control to the parent process, usually the MS-DOS command interpreter.

In other words, if your program is executing and you enter a Ctrl-C, accidentally or intentionally, MS-DOS simply aborts the program. Any files the program has opened using file control blocks will not be closed properly, any interrupt vectors it has altered may not be restored correctly, and if it is performing any direct I/O operations (for example, if it contains an interrupt driver for the serial port), all kinds of unexpected events may occur.

Although you can use a number of partially effective methods to defeat Ctrl-C checking, such as performing keyboard input with Int 21H Functions 06H and 07H, placing all character devices into binary mode, or turning off the system break flag with Int 21H Function 33H, none of these is completely foolproof. The simplest and most elegant way to defeat Ctrl-C checking is simply to substitute your own Int 23H handler, which can take some action appropriate to your program. When the program terminates, MS-DOS automatically restores the previous contents of the Int 23H vector from information saved in the program segment prefix. The following example shows how to install your own Ctrl-C handler (which in this case does nothing at all):

──────────────────────────────────────────────────────────────────────────
        push    ds              ; save data segment

                                ; set int 23h vector...
        mov     ax,2523h        ; function 25h = set interrupt
                                ; int 23h = vector for
                                ; Ctrl-C handler
        mov     dx,seg handler  ; DS:DX = handler address
        mov     ds,dx
        mov     dx,offset handler
        int     21h             ; transfer to MS-DOS

        pop     ds              ; restore data segment
        .
        .
        .
handler:                        ; a Ctrl-C handler
        iret                    ; that does nothing
──────────────────────────────────────────────────────────────────────────

The first part of the code (which alters the contents of the Int 23H vector) would be executed in the initialization part of the application. The handler receives control whenever MS-DOS detects a Ctrl-C at the keyboard. (Because this handler consists only of an interrupt return, the Ctrl-C will remain in the keyboard input stream and will be passed to the application when it requests a character from the keyboard, appearing on the screen as ^C.)

When an Int 23H handler is called, MS-DOS is in a stable state. Thus, the handler can call any MS-DOS function. It can also reset the segment registers and the stack pointer and transfer control to some other point in the application without ever returning control to MS-DOS with an IRET.

On IBM PC compatibles, an additional interrupt handler must be taken into consideration. Whenever the ROM BIOS keyboard driver detects the key combination Ctrl-Break, it calls a handler whose address is stored in the vector for Int 1BH. The default ROM BIOS Int 1BH handler does nothing. MS-DOS alters the Int 1BH vector to point to its own handler, which sets a flag and returns; the net effect is to remap the Ctrl-Break into a Ctrl-C that is forced ahead of any other characters waiting in the keyboard buffer.

Taking over the Int 1BH vector in an application is somewhat tricky but extremely useful. Because the keyboard is interrupt driven, a press of Ctrl-Break lets the application regain control under almost any circumstance, often even if the program has crashed or is in an endless loop.

You cannot, in general, use the same handler for Int 1BH that you use for Int 23H. The Int 1BH handler is more limited in what it can do, because it has been called as a result of a hardware interrupt and MS-DOS may have been executing a critical section of code at the time the interrupt was serviced.
Thus, all registers except CS:IP are in an unknown state; they may have to be saved and then modified before your interrupt handler can execute. Similarly, the depth of the stack in use when the Int 1BH handler is called is unknown, and if the handler is to perform stack-intensive operations, it may have to save the stack segment and the stack pointer and switch to a new stack that is known to have sufficient depth. In normal application programs, you should probably avoid retaining control in an Int 1BH handler, rather than performing an IRET. Because of subtle differences among non-IBM ROM BIOSes, it is difficult to predict the state of the keyboard controller and the 8259 Programmable Interrupt Controller (PIC) when the Int 1BH handler begins executing. Also, MS-DOS itself may not be in a stable state at the point of interrupt, a situation that can manifest itself in unexpected critical errors during subsequent I/O operations. Finally, MS-DOS versions 3.2 and later allocate a stack from an internal pool for use by the Int 09H handler. If the Int 1BH handler never returns, the Int 09H handler never returns either, and repeated entries of Ctrl-Break will eventually exhaust the stack pool, halting the system. Because Int 1BH is a ROM BIOS interrupt and not an MS-DOS interrupt, MS-DOS does not restore the previous contents of the Int 1BH vector when a program exits. If your program modifies this vector, it must save the original value and restore it before terminating. Otherwise, the vector will be left pointing to some random area in the next program that runs, and the next time the user presses Ctrl-Break a system crash is the best you can hope for. Ctrl-C and Ctrl-Break Handlers and High-Level Languages Capturing the Ctrl-C and Ctrl-Break interrupts is straightforward when you are programming in assembly language. 
The process is only slightly more difficult with high-level languages, as long as you have enough information about the language's calling conventions that you can link in a small assembly-language routine as part of the program.

The BREAK.ASM listing in Figure 5-1 contains source code for a Ctrl-Break handler that can be linked with small-model Microsoft C programs running on an IBM PC compatible. The short C program in Figure 5-2 demonstrates use of the handler. (This code should be readily portable to other C compilers.)

──────────────────────────────────────────────────────────────────────────
        page    55,132
        title   Ctrl-C & Ctrl-Break Handlers
        name    break
;
; Ctrl-C and Ctrl-Break handler for Microsoft C
; programs running on IBM PC compatibles
;
; by Ray Duncan
;
; Assemble with:  C>MASM /Mx BREAK;
;
; This module allows C programs to retain control
; when the user enters a Ctrl-Break or Ctrl-C.
; It uses Microsoft C parameter-passing conventions
; and assumes the C small memory model.
;
; The procedure _capture is called to install
; a new handler for the Ctrl-C and Ctrl-Break
; interrupts (1bh and 23h).  _capture is passed
; the address of a static variable, which will be
; set to true by the handler whenever a Ctrl-C
; or Ctrl-Break is detected.  The C syntax is:
;
;               static int flag;
;               capture(&flag);
;
; The procedure _release is called by the C program
; to restore the original Ctrl-Break and Ctrl-C
; handler.  The C syntax is:
;               release();
;
; The procedure ctrlbrk is the actual interrupt
; handler.  It receives control when a software
; int 1bh is executed by the ROM BIOS or int 23h
; is executed by MS-DOS.  It simply sets the C
; program's variable to true (1) and returns.
;
args    equ     4               ; stack offset of arguments,
                                ; C small memory model

cr      equ     0dh             ; ASCII carriage return
lf      equ     0ah             ; ASCII linefeed

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  _capture
_capture proc   near            ; take over Ctrl-Break
                                ; and Ctrl-C interrupt vectors

        push    bp              ; set up stack frame
        mov     bp,sp

        push    ds              ; save registers
        push    di
        push    si

                                ; save address of
                                ; calling program's "flag"
        mov     ax,word ptr [bp+args]
        mov     word ptr cs:flag,ax
        mov     word ptr cs:flag+2,ds

                                ; save address of original
        mov     ax,3523h        ; int 23h handler
        int     21h
        mov     word ptr cs:int23,bx
        mov     word ptr cs:int23+2,es

        mov     ax,351bh        ; save address of original
        int     21h             ; int 1bh handler
        mov     word ptr cs:int1b,bx
        mov     word ptr cs:int1b+2,es

        push    cs              ; set DS:DX = address
        pop     ds              ; of new handler
        mov     dx,offset _TEXT:ctrlbrk

        mov     ax,02523h       ; set int 23h vector
        int     21h

        mov     ax,0251bh       ; set int 1bh vector
        int     21h

        pop     si              ; restore registers
        pop     di
        pop     ds

        pop     bp              ; discard stack frame
        ret                     ; and return to caller

_capture endp

        public  _release
_release proc   near            ; restore original Ctrl-C
                                ; and Ctrl-Break handlers

        push    bp              ; save registers
        push    ds
        push    di
        push    si

        lds     dx,cs:int1b     ; get address of previous
                                ; int 1bh handler

        mov     ax,251bh        ; set int 1bh vector
        int     21h

        lds     dx,cs:int23     ; get address of previous
                                ; int 23h handler

        mov     ax,2523h        ; set int 23h vector
        int     21h

        pop     si              ; restore registers
        pop     di              ; and return to caller
        pop     ds
        pop     bp
        ret

_release endp                   ; name must match the proc label

ctrlbrk proc    far             ; Ctrl-C and Ctrl-Break
                                ; interrupt handler

        push    bx              ; save registers
        push    ds

        lds     bx,cs:flag      ; get address of C program's
                                ; "flag variable"

                                ; and set the flag "true"
        mov     word ptr ds:[bx],1

        pop     ds              ; restore registers
        pop     bx

        iret                    ; return from handler

ctrlbrk endp

flag    dd      0               ; far pointer to caller's
                                ; Ctrl-Break or Ctrl-C flag

int23   dd      0               ; address of original
                                ; Ctrl-C handler

int1b   dd      0               ; address of original
                                ; Ctrl-Break handler

_TEXT   ends

        end
──────────────────────────────────────────────────────────────────────────

Figure 5-1.
BREAK.ASM: A Ctrl-C and Ctrl-Break interrupt handler that can be linked with Microsoft C programs.

──────────────────────────────────────────────────────────────────────────
/*
    TRYBREAK.C

    Demo of BREAK.ASM Ctrl-Break and Ctrl-C
    interrupt handler, by Ray Duncan

    To create the executable file TRYBREAK.EXE, enter:

    MASM /Mx BREAK;
    CL TRYBREAK.C BREAK.OBJ
*/

#include <stdio.h>
#include <conio.h>                  /* kbhit, getch, putch */

main(int argc, char *argv[])
{
    int hit = 0;                    /* flag for key press */
    int c = 0;                      /* character from keyboard */
    static int flag = 0;            /* true if Ctrl-Break
                                       or Ctrl-C detected */

    puts("\n*** TRYBREAK.C running ***\n");
    puts("Press Ctrl-C or Ctrl-Break to test handler,");
    puts("Press the Esc key to exit TRYBREAK.\n");

    capture(&flag);                 /* install new Ctrl-C and
                                       Ctrl-Break handler and
                                       pass address of flag */

    puts("TRYBREAK has captured interrupt vectors.\n");

    while(1)
    {
        hit = kbhit();              /* check for key press */
                                    /* (MS-DOS sees Ctrl-C
                                       when keyboard polled) */

        if(flag != 0)               /* if flag is true, an */
        {                           /* interrupt has occurred */
            puts("\nControl-Break detected.\n");
            flag = 0;               /* reset interrupt flag */
        }

        if(hit != 0)                /* if any key waiting */
        {
            c = getch();            /* read key, exit if Esc */
            if( (c & 0x7f) == 0x1b) break;
            putch(c);               /* otherwise display it */
        }
    }

    release();                      /* restore original Ctrl-C
                                       and Ctrl-Break handlers */

    puts("\n\nTRYBREAK has released interrupt vectors.");
}
──────────────────────────────────────────────────────────────────────────

Figure 5-2. TRYBREAK.C: A simple Microsoft C program that demonstrates use of the interrupt handler BREAK.ASM from Figure 5-1.

In the example handler, the procedure named capture is called with the address of an integer variable within the C program. It saves the address of the variable, points the Int 1BH and Int 23H vectors to its own interrupt handler, and then returns.

When MS-DOS detects a Ctrl-C or Ctrl-Break, the interrupt handler sets the integer variable within the C program to true (1) and returns. The C program can then poll this variable at its leisure.
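The same set-a-flag-and-poll pattern can also be sketched with the standard C signal() function. Note that this is a different mechanism from BREAK.ASM, not a replacement for it: the names capture_break, release_break, and break_pending are hypothetical, and although MS-DOS C run-time libraries typically tie SIGINT to the Ctrl-C interrupt, whether Ctrl-Break (Int 1BH) is covered as well depends on the library and should be treated as an assumption to verify.

```c
#include <signal.h>

/* Hypothetical names for this sketch; this uses the standard C
   signal() facility rather than the BREAK.ASM routines. */
static volatile sig_atomic_t break_flag = 0;

static void on_break(int sig)
{
    /* Some libraries reset the handler to the default after each
       signal, so install it again before setting the flag. */
    signal(sig, on_break);
    break_flag = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int capture_break(void)
{
    return signal(SIGINT, on_break) == SIG_ERR ? -1 : 0;
}

/* Restore the default SIGINT action; returns 0 on success. */
int release_break(void)
{
    return signal(SIGINT, SIG_DFL) == SIG_ERR ? -1 : 0;
}

/* Returns 1 if a break was seen since the last call, and
   resets the flag so that the next break can be detected. */
int break_pending(void)
{
    if (break_flag) {
        break_flag = 0;
        return 1;
    }
    return 0;
}
```

A main loop would call capture_break() once, test break_pending() on each pass in the way TRYBREAK.C tests its flag, and call release_break() before exiting.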
Of course, to detect more than one Ctrl-C, the program must reset the variable to zero again.

The procedure named release simply restores the Int 1BH and Int 23H vectors to their original values, thereby disabling the interrupt handler. Although it is not strictly necessary for release to do anything about Int 23H, this action does give the C program the option of restoring the default handler for Int 23H without terminating.

Pointing Devices

Device drivers for pointing devices are supplied by the hardware manufacturer and are loaded with a DEVICE statement in the CONFIG.SYS file. Although the hardware characteristics of the available pointing devices differ greatly, nearly all of their drivers present the same software interface to application programs: the Int 33H protocol used by the Microsoft Mouse driver. Version 6 of the Microsoft Mouse driver (which was current as this was written) offers the following functions:

Function   Meaning
──────────────────────────────────────────────────────────────────────────
00H        Reset mouse and get status.
01H        Show mouse pointer.
02H        Hide mouse pointer.
03H        Get button status and pointer position.
04H        Set pointer position.
05H        Get button-press information.
06H        Get button-release information.
07H        Set horizontal limits for pointer.
08H        Set vertical limits for pointer.
09H        Set graphics pointer type.
0AH        Set text pointer type.
0BH        Read mouse-motion counters.
0CH        Install interrupt handler for mouse events.
0DH        Turn on light pen emulation.
0EH        Turn off light pen emulation.
0FH        Set mickeys to pixel ratio.
10H        Set pointer exclusion area.
13H        Set double-speed threshold.
14H        Swap mouse-event interrupt routines.
15H        Get buffer size for mouse-driver state.
16H        Save mouse-driver state.
17H        Restore mouse-driver state.
18H        Install alternate handler for mouse events.
19H        Get address of alternate handler.
1AH        Set mouse sensitivity.
1BH        Get mouse sensitivity.
1CH        Set mouse interrupt rate.
1DH        Select display page for pointer.
1EH        Get display page for pointer.
1FH        Disable mouse driver.
20H        Enable mouse driver.
21H        Reset mouse driver.
22H        Set language for mouse-driver messages.
23H        Get language number.
24H        Get driver version, mouse type, and IRQ number.
──────────────────────────────────────────────────────────────────────────

Although this list of mouse functions may appear intimidating, the average application will only need a few of them.

A program first calls Int 33H Function 00H to initialize the mouse driver for the current display mode and to check its status. At this point, the mouse is "alive" and the application can obtain its state and position; however, the pointer does not become visible until the process calls Int 33H Function 01H.

The program can then call Int 33H Functions 03H, 05H, and 06H to monitor the mouse position and the status of the mouse buttons. Alternatively, the program can register an interrupt handler for mouse events, using Int 33H Function 0CH. This latter technique eliminates the need to poll the mouse driver; the driver will notify the program by calling the interrupt handler whenever the mouse is moved or a button is pressed or released.

When the application is finished with the mouse, it can call Int 33H Function 02H to hide the mouse pointer. If the program has registered an interrupt handler for mouse events, it should disable further calls to the handler by resetting the mouse driver again with Int 33H Function 00H.

For a complete description of the mouse-driver functions, see Section III of this book, "IBM ROM BIOS and Mouse Functions Reference." Figure 5-3 shows a small demonstration program that polls the mouse continually, to display its position and status.
──────────────────────────────────────────────────────────────────────────
/*
    Simple Demo of Int 33H Mouse Driver
    (C) 1988 Ray Duncan

    Compile with: CL MOUDEMO.C
*/

#include <stdio.h>
#include <stdlib.h>                     /* exit */
#include <dos.h>                        /* union REGS, int86 */

union REGS regs;

void cls(void);                         /* function prototypes */
void gotoxy(int, int);

main(int argc, char *argv[])
{
    int x,y,buttons;                    /* some scratch variables */
                                        /* for the mouse state */

    regs.x.ax = 0;                      /* reset mouse driver */
    int86(0x33, &regs, &regs);          /* and check status */

    if(regs.x.ax == 0)                  /* exit if no mouse */
    {
        printf("\nMouse not available\n");
        exit(1);
    }

    cls();                              /* clear the screen */
    gotoxy(45,0);                       /* and show help info */
    puts("Press Both Mouse Buttons To Exit");

    regs.x.ax = 1;                      /* display mouse cursor */
    int86(0x33, &regs, &regs);

    do
    {
        regs.x.ax = 3;                  /* get mouse position */
        int86(0x33, &regs, &regs);      /* and button status */
        buttons = regs.x.bx & 3;
        x = regs.x.cx;
        y = regs.x.dx;

        gotoxy(0,0);                    /* display mouse position */
        printf("X = %3d Y = %3d", x, y);

    } while(buttons != 3);              /* exit if both buttons down */

    regs.x.ax = 2;                      /* hide mouse cursor */
    int86(0x33, &regs, &regs);

    cls();                              /* display message and exit */
    gotoxy(0,0);
    puts("Have a Mice Day!");
}

/*
    Clear the screen
*/
void cls(void)
{
    regs.x.ax = 0x0600;                 /* ROM BIOS video driver */
    regs.h.bh = 7;                      /* int 10h function 06h */
    regs.x.cx = 0;                      /* initializes a window */
    regs.h.dh = 24;
    regs.h.dl = 79;
    int86(0x10, &regs, &regs);
}

/*
    Position cursor to (x,y)
*/
void gotoxy(int x, int y)
{
    regs.h.dl = x;                      /* ROM BIOS video driver */
    regs.h.dh = y;                      /* int 10h function 02h */
    regs.h.bh = 0;                      /* positions the cursor */
    regs.h.ah = 2;
    int86(0x10, &regs, &regs);
}
──────────────────────────────────────────────────────────────────────────

Figure 5-3. MOUDEMO.C: A simple Microsoft C program that polls the mouse and continually displays the coordinates of the mouse pointer in the upper left corner of the screen. The program uses the ROM BIOS video driver, which is discussed in Chapter 6, to clear the screen and position the text cursor.
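MOUDEMO.C displays the raw values returned by Int 33H Function 03H. A text-mode program usually wants a character row and column instead, and the following sketch shows one way the conversion might be done. It rests on an assumption that this chapter does not state: that in 80-by-25 text mode the driver reports positions on a virtual 640-by-200 screen, so each character cell spans 8 units in each direction. The names mouse_cell, mouse_to_cell, MOUSE_LEFT, MOUSE_RIGHT, and CELL_SIZE are hypothetical.

```c
/* Hypothetical helper for this sketch. Only the register layout
   (BX = buttons, CX = x, DX = y) is taken from Function 03H as it
   is used in MOUDEMO.C. */
#define MOUSE_LEFT   0x01       /* bit 0 of BX: left button down  */
#define MOUSE_RIGHT  0x02       /* bit 1 of BX: right button down */
#define CELL_SIZE    8          /* assumed units per character cell
                                   in 80-by-25 text mode */

struct mouse_cell {
    int col;                    /* 0-based text column */
    int row;                    /* 0-based text row    */
    int left;                   /* nonzero if left button down  */
    int right;                  /* nonzero if right button down */
};

/* Converts the BX, CX, and DX values from Function 03H into
   a text cell and a pair of button flags. */
struct mouse_cell mouse_to_cell(unsigned bx, unsigned cx, unsigned dx)
{
    struct mouse_cell m;

    m.col = (int)(cx / CELL_SIZE);
    m.row = (int)(dx / CELL_SIZE);
    m.left = (bx & MOUSE_LEFT) != 0;
    m.right = (bx & MOUSE_RIGHT) != 0;
    return m;
}
```

Under that assumption, a reported position of (632,192) falls in column 79 of row 24, the lower right corner of the screen. The result could be passed directly to a routine such as gotoxy in Figure 5-3, which also uses 0-based coordinates.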
────────────────────────────────────────────────────────────────────────────

Chapter 6  Video Display

The visual presentation of an application program is one of its most important elements. Users frequently base their conclusions about a program's performance and "polish" on the speed and attractiveness of its displays. Therefore, a feel for the computer system's display facilities and capabilities at all levels, from MS-DOS down to the bare hardware, is important to you as a programmer.

Video Display Adapters

The video display adapters found in IBM PC-compatible computers have a hybrid interface to the central processor. The overall display characteristics, such as vertical and horizontal resolution, background color, and palette, are controlled by values written to I/O ports whose addresses are hardwired on the adapter, whereas the appearance of each individual character or graphics pixel on the display is controlled by a specific location within an area of memory called the regen buffer or refresh buffer. Both the CPU and the video controller access this memory; the software updates the display by simply writing character codes or bit patterns directly into the regen buffer. (This is called memory-mapped I/O.)

The following adapters are in common use as this book is being written:

■ Monochrome/Printer Display Adapter (MDA). Introduced with the original IBM PC in 1981, this adapter supports 80-by-25 text display on a green (monochrome) screen and has no graphics capabilities at all.

■ Color/Graphics Adapter (CGA). Also introduced by IBM in 1981, this adapter supports 40-by-25 and 80-by-25 text modes and 320-by-200, 4-color or 640-by-200, 2-color graphics (all-points-addressable, or APA) modes on composite or digital RGB monitors.

■ Enhanced Graphics Adapter (EGA). Introduced by IBM in 1985 and upwardly compatible from the CGA, this adapter adds support for 640-by-350, 16-color graphics modes on digital RGB monitors. It also supports an MDA-compatible text mode.
■ Multi-Color Graphics Array (MCGA). Introduced by IBM in 1987 with the Personal System/2 (PS/2) models 25 and 30, this adapter is partially compatible with the CGA and EGA and supports 640-by-480, 2-color or 320-by-200, 256-color graphics on analog RGB monitors.

■ Video Graphics Array (VGA). Introduced by IBM in 1987 with the PS/2 models 50, 60, and 80, this adapter is upwardly compatible from the EGA and supports 640-by-480, 16-color or 320-by-200, 256-color graphics on analog RGB monitors. It also supports an MDA-compatible text mode.

■ Hercules Graphics Card, Graphics CardPlus, and InColor Cards. These are upwardly compatible from the MDA for text display but offer graphics capabilities that are incompatible with all of the IBM adapters.

The locations of the regen buffers for the various IBM PC-compatible adapters are shown in Figure 6-1.

┌───────────────────────────────────────────────────────┐
│ ROM BIOS                                              │ FE000H
├───────────────────────────────────────────────────────┤
│ System ROM, Stand-alone BASIC, etc.                   │ F4000H
├───────────────────────────────────────────────────────┤
│ Reserved for BIOS extensions                          │
│ (hard-disk controller, etc.)                          │ C0000H
├───────────────────────────────────────────────────────┤
│ Reserved                                              │ BC000H
├───────────────────────────────────────────────────────┤
│ 16 KB regen buffer for CGA, EGA, MCGA, and VGA        │
│ in text modes and 200-line graphics modes             │ B8000H
├───────────────────────────────────────────────────────┤
│ Reserved                                              │ B1000H
├───────────────────────────────────────────────────────┤
│ 4 KB Monochrome Adapter regen buffer                  │ B0000H
├───────────────────────────────────────────────────────┤
│ Regen buffer area for EGA, MCGA, and VGA              │
│ in 350-line or 480-line graphics modes                │ A0000H
├───────────────────────────────────────────────────────┤
│ Transient part of COMMAND.COM                         │
├───────────────────────────────────────────────────────┤
│ Transient program area                                │ varies
├───────────────────────────────────────────────────────┤
│ MS-DOS and its buffers,                               │
│ tables, and device drivers                            │ 00400H
├───────────────────────────────────────────────────────┤
│ Interrupt vectors                                     │ 00000H
└───────────────────────────────────────────────────────┘

Figure 6-1. Memory diagram of an IBM PC-compatible personal computer, showing the locations of the regen buffers for various adapters.

Support Considerations

MS-DOS offers several functions to transfer text to the display. Version 1 supported only Teletype-like output capabilities; version 2 added an optional ANSI console driver to allow the programmer to clear the screen, position the cursor, and select colors and attributes with standard escape sequences embedded in the output. Programs that use only the MS-DOS functions will operate properly on any computer system that runs MS-DOS, regardless of the level of IBM hardware compatibility.

On IBM PC-compatible machines, the ROM BIOS contains a video driver that programs can invoke directly, bypassing MS-DOS. The ROM BIOS functions allow a program to write text or individual pixels to the screen or to select display modes, video pages, palette, and foreground and background colors.
These functions are relatively efficient (compared with the MS-DOS functions, at least), although the graphics support is primitive.

Unfortunately, the display functions of both MS-DOS and the ROM BIOS were designed around the model of a cursor-addressable terminal and therefore do not fully exploit the capabilities of the memory-mapped, high-bandwidth display adapters used on IBM PC-compatible machines. As a result, nearly every popular interactive application with full-screen displays or graphics capability ignores both MS-DOS and the ROM BIOS and writes directly to the video controller's registers and regen buffer.

Programs that control the hardware directly are sometimes called "ill-behaved," because they are performing operations that are normally reserved for operating-system device drivers. These programs are a severe management problem in multitasking real-mode environments such as DesqView and Microsoft Windows, and they are the main reason why such environments are not used more widely. It could be argued, however, that the blame for such problematic behavior lies not with the application programs but with the failure of MS-DOS and the ROM BIOS (even six years after the first appearance of the IBM PC) to provide display functions of adequate range and power.

MS-DOS Display Functions

Under MS-DOS versions 2.0 and later, the preferred method for sending text to the display is to use handle-based Int 21H Function 40H (Write File or Device). When an application program receives control, MS-DOS has already assigned it handles for the standard output (1) and standard error (2) devices, and these handles can be used immediately. For example, the following sequence writes the message hello to the display using the standard output handle.

──────────────────────────────────────────────────────────────────────────
msg     db      'hello'         ; message to display
msg_len equ     $-msg           ; length of message
        .
        .
        .
        mov     ah,40h          ; function 40h = write file or device
        mov     bx,1            ; BX = standard output handle
        mov     cx,msg_len      ; CX = message length
        mov     dx,seg msg      ; DS:DX = address of message
        mov     ds,dx
        mov     dx,offset msg
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if error detected
        .
        .
        .
──────────────────────────────────────────────────────────────────────────

If there is no error, the function returns the carry flag cleared and the number of characters actually transferred in register AX. Unless a Ctrl-Z is embedded in the text or the standard output is redirected to a disk file and the disk is full, this number should equal the number of characters requested.

As in the case of keyboard input, the user's ability to specify command-line redirection parameters that are invisible to the application means that if you use the predefined standard output handle, you can't always be sure where your output is going. However, to ensure that your output actually goes to the display, you can use the predefined standard error handle, which is always opened to the CON (logical console) device and is not redirectable.

As an alternative to the standard output and standard error handles, you can bypass any output redirection and open a separate channel to CON, using the handle obtained from that open operation for character output. For example, the following code opens the console display for output and then writes the string hello to it:

──────────────────────────────────────────────────────────────────────────
fname   db      'CON',0         ; name of CON device
handle  dw      0               ; handle for CON device
msg     db      'hello'         ; message to display
msg_len equ     $-msg           ; length of message
        .
        .
        .
        mov     ax,3d02h        ; AH = function 3dh = open
                                ; AL = mode = read/write
        mov     dx,seg fname    ; DS:DX = device name
        mov     ds,dx
        mov     dx,offset fname
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if open failed
        mov     handle,ax       ; save handle for CON
        .
        .
        .
        mov     ah,40h          ; function 40h = write
        mov     cx,msg_len      ; CX = message length
        mov     dx,seg msg      ; DS:DX = address of message
        mov     ds,dx
        mov     dx,offset msg
        mov     bx,handle       ; BX = CON device handle
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if error detected
        .
        .
        .
──────────────────────────────────────────────────────────────────────────

As with the keyboard input functions, MS-DOS also supports traditional display functions that are upwardly compatible from the corresponding CP/M output calls:

■ Int 21H Function 02H sends the character in the DL register to the standard output device. It is sensitive to Ctrl-C interrupts, and it handles carriage returns, linefeeds, bell codes, and backspaces appropriately.

■ Int 21H Function 06H transfers the character in the DL register to the standard output device, but it is not sensitive to Ctrl-C interrupts. You must take care when using this function, because it can also be used for input and for status requests.

■ Int 21H Function 09H sends a string to the standard output device. The string is terminated by the $ character.

With MS-DOS version 2 or later, these three traditional functions are converted internally to handle-based writes to the standard output and thus are susceptible to output redirection.

The following sequence sounds a warning beep by sending an ASCII bell code (07H) to the display driver using the traditional character-output call Int 21H Function 02H.

──────────────────────────────────────────────────────────────────────────
        .
        .
        .
        mov     dl,7            ; 07h = ASCII bell code
        mov     ah,2            ; function 02h = display character
        int     21h             ; transfer to MS-DOS
        .
        .
        .
──────────────────────────────────────────────────────────────────────────

The following sequence uses the traditional string-output call Int 21H Function 09H to display a string:

──────────────────────────────────────────────────────────────────────────
msg     db      'hello$'
        .
        .
        .
        mov     dx,seg msg      ; DS:DX = message address
        mov     ds,dx
        mov     dx,offset msg
        mov     ah,9            ; function 09h = write string
        int     21h             ; transfer to MS-DOS
        .
        .
        .
──────────────────────────────────────────────────────────────────────────

Note that MS-DOS detects the $ character as a terminator and does not display it on the screen.

Screen Control with MS-DOS Functions

With version 2.0 or later, if MS-DOS loads the optional device driver ANSI.SYS in response to a DEVICE directive in the CONFIG.SYS file, programs can clear the screen, control the cursor position, and select foreground and background colors by embedding escape sequences in the text output. Escape sequences are so called because they begin with an escape character (1BH), which alerts the driver to intercept and interpret the subsequent characters in the sequence. When the ANSI driver is not loaded, MS-DOS simply passes the escape sequence to the display like any other text, usually resulting in a chaotic screen.

The escape sequences that can be used with the ANSI driver for screen control are a subset of those defined in the ANSI 3.64-1979 Standard. These standard sequences are summarized in Figure 6-2. Note that case is significant for the last character in an escape sequence and that numbers must always be represented as ASCII digit strings, not as their binary values. (A separate set of escape sequences supported by ANSI.SYS, but not compatible with the ANSI standard, may be used for reprogramming and remapping the keyboard.)

Escape sequence   Meaning
──────────────────────────────────────────────────────────────────────────
Esc[2J            Clear screen; place cursor in upper left corner (home
                  position).
Esc[K             Clear from cursor to end of line.
Esc[row;colH      Position cursor. (Row is the y coordinate in the range
                  1-25 and col is the x coordinate in the range 1-80 for
                  80-by-25 text display modes.) Escape sequences
                  terminated with the letter f instead of H have the
                  same effect.
Esc[nA            Move cursor up n rows.
Esc[nB Move cursor down n rows. Esc[nC Move cursor right n columns. Esc[nD Move cursor left n columns. Esc[s Save current cursor position. Esc[u Restore cursor to saved position. Esc[6n Return current cursor position on the standard input handle in the format Esc[row;colR. Esc[nm Select character attributes: 0 = no special attributes 1 = high intensity 2 = low intensity 3 = italic 4 = underline 5 = blink 6 = rapid blink 7 = reverse video 8 = concealed text (no display) 30 = foreground black 31 = foreground red 32 = foreground green 33 = foreground yellow 34 = foreground blue 35 = foreground magenta 36 = foreground cyan 37 = foreground white 40 = background black 41 = background red 42 = background green 43 = background yellow 44 = background blue 45 = background magenta 46 = background cyan 47 = background white Esc[=nh Select display mode: 0 = 40-by-25, 16-color text (color burst off) 1 = 40-by-25, 16-color text 2 = 80-by-25, 16-color text (color burst off) 3 = 80-by-25, 16-color text 4 = 320-by-200, 4-color graphics 5 = 320-by-200, 4-color graphics (color burst off) 6 = 620-by-200, 2-color graphics 14 = 640-by-200, 16-color graphics (EGA and VGA, MS-DOS 4.0) 15 = 640-by-350, 2-color graphics (EGA and VGA, MS-DOS 4.0) 16 = 640-by-350, 16-color graphics (EGA and VGA, MS-DOS 4.0) 17 = 640-by-480, 2-color graphics (MCGA and VGA, MS-DOS 4.0) 18 = 640-by-480, 16-color graphics (VGA, MS-DOS 4.0) 19 = 320-by-200, 256-color graphics (MCGA and VGA, MS-DOS 4.0) Escape sequences terminated with l instead of h have the same effect. Esc[=7h Enable line wrap. Esc[=7l Disable line wrap. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Figure 6-2. The ANSI escape sequences supported by the MS-DOS ANSI.SYS driver. Programs running under MS-DOS 2.0 or later may use these functions, if ANSI.SYS is loaded, to control the appearance of the display in a hardware-independent manner. The symbol Esc indicates an ASCII escape codeÄÄa character with the value 1BH. 
Note that cursor positions in ANSI escape sequences are one-based, unlike the cursor coordinates used by the IBM ROM BIOS, which are zero-based. Numbers embedded in an escape sequence must always be represented as a string of ASCII digits, not as their binary values. Binary Output Mode Under MS-DOS version 2 or later, you can substantially increase display speeds for well-behaved application programs without sacrificing hardware independence by selecting binary (raw) mode for the standard output. In binary mode, MS-DOS does not check between each character it transfers to the output device for a Ctrl-C waiting at the keyboard, nor does it filter the output string for certain characters such as Ctrl-Z. Bit 5 in the device information word associated with a device handle controls binary mode. Programs access the device information word by using Subfunctions 00H and 01H of the MS-DOS IOCTL function (I/O Control, Int 21H Function 44H). For example, the sequence on the following page places the standard output handle into binary mode. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ ; get device information... mov bx,1 ; standard output handle mov ax,4400h ; function 44h subfunction 00h int 21h ; transfer to MS-DOS mov dh,0 ; set upper byte of DX = 0 or dl,20h ; set binary mode bit in DL ; write device information... ; (BX still has handle) mov ax,4401h ; function 44h subfunction 01h int 21h ; transfer to MS-DOS ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Note that if a program changes the mode of any of the standard handles, it should restore those handles to ASCII (cooked) mode before it exits. Otherwise, subsequent application programs may behave in unexpected ways. For more detailed information on the IOCTL function, see Section II of this book, "MS-DOS Functions Reference." 
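The bit manipulation performed on the device information word in the binary-mode sequence above can also be sketched in C. This is only an illustration of the arithmetic: the helper names make_raw, make_cooked, and is_raw are hypothetical, and the sketch does not itself issue the IOCTL call (Int 21H Function 44H), which would require a compiler-specific interrupt interface.

```c
#include <assert.h>

#define DEV_RAW_BIT 0x0020u     /* bit 5 of the device information word */

/* Value to pass in DX to Subfunction 01H to select binary (raw) mode:
   upper byte forced to zero, bit 5 set. (Hypothetical helper.) */
unsigned make_raw(unsigned devinfo)
{
    return (devinfo & 0x00FFu) | DEV_RAW_BIT;
}

/* Value to pass in DX to Subfunction 01H to restore ASCII (cooked)
   mode: upper byte forced to zero, bit 5 cleared. */
unsigned make_cooked(unsigned devinfo)
{
    return (devinfo & 0x00FFu) & ~DEV_RAW_BIT;
}

/* Nonzero if the word returned by Subfunction 00H shows binary mode. */
int is_raw(unsigned devinfo)
{
    return (devinfo & DEV_RAW_BIT) != 0;
}
```

A program that switches a standard handle to binary mode would typically save the word returned by Subfunction 00H and pass make_cooked of that word (or the saved word with its upper byte cleared) back to Subfunction 01H before it exits.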
The ROM BIOS Display Functions You can somewhat improve the display performance of programs that are intended for use only on IBM PCÄcompatible machines by using the ROM BIOS video driver instead of the MS-DOS output functions. Accessed by means of Int 10H, the ROM BIOS driver supports the following functions for all of the currently available IBM display adapters: Function Action ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Display mode control 00H Set display mode. 0FH Get display mode. Cursor control 01H Set cursor size. 02H Set cursor position. 03H Get cursor position and size. Writing to the display 09H Write character and attribute at cursor. 0AH Write character-only at cursor. 0EH Write character in teletype mode. Reading from the display 08H Read character and attribute at cursor. Graphics support 0CH Write pixel. 0DH Read pixel. Scroll or clear display 06H Scroll up or initialize window. 07H Scroll down or initialize window. Miscellaneous 04H Read light pen. 05H Select display page. 0BH Select palette/set border color. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Additional ROM BIOS functions are available on the EGA, MCGA, VGA, and PCjr to support the enhanced features of these adapters, such as programmable palettes and character sets (fonts). Some of the functions are valid only in certain display modes. Each display mode is characterized by the number of colors it can display, its vertical resolution, its horizontal resolution, and whether it supports text or graphics memory mapping. The ROM BIOS identifies it with a unique number. Section III of this book, "IBM ROM BIOS and Mouse Functions Reference," documents all of the ROM BIOS Int 10H functions and display modes. 
As you can see from the preceding list, the ROM BIOS offers several desirable capabilities that are not available from MS-DOS, including initialization or scrolling of selected screen windows, modification of the cursor shape, and reading back the character being displayed at an arbitrary screen location. These functions can be used to isolate your program from the hardware on any IBM PCÄcompatible adapter. However, the ROM BIOS functions do not suffice for the needs of a high-performance, interactive, full-screen program such as a word processor. They do not support the rapid display of character strings at an arbitrary screen position, and they do not implement graphics operations at the level normally required by applications (for example, bit-block transfers and rapid drawing of lines, circles, and filled polygons). And, of course, they are of no use whatsoever in non-IBM display modes such as the monochrome graphics mode of the Hercules Graphics Card. Let's look at a simple example of a call to the ROM BIOS video driver. The following sequence writes the string hello to the screen: ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ msg db 'hello' msg_len equ $-msg . . . mov si,seg msg ; DS:SI = message address mov ds,si mov si,offset msg mov cx,msg_len ; CX = message length cld next: lodsb ; get AL = next character push si ; save message pointer mov ah,0eh ; int 10h function 0eh = write ; character in teletype mode mov bh,0 ; assume video page 0 mov bl,color ; (use in graphics modes only) int 10h ; transfer to ROM BIOS pop si ; restore message pointer loop next ; loop until message done . . . ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ (Note that the SI and DI registers are not necessarily preserved across a call to a ROM BIOS video function.) Memory-mapped Display Techniques Display performance is best when an application program takes over complete control of the video adapter and the refresh buffer. 
Because the display is memory-mapped, the speed at which characters can be put on the screen is limited only by the CPU's ability to copy bytes from one location in memory to another. The trade-off for this performance is that such programs are highly sensitive to hardware compatibility and do not always function properly on "clones" or even on new models of IBM video adapters. Text Mode Direct programming of the IBM PCÄcompatible video adapters in their text display modes (sometimes also called alphanumeric display modes) is straightforward. The character set is the same for all, and the cursor home positionÄÄ(x,y) = (0,0)ÄÄis defined to be the upper left corner of the screen (Figure 6-3). The MDA uses 4 KB of memory starting at segment B000H as a regen buffer, and the various adapters with both text and graphics capabilities (CGA, EGA, MCGA, and VGA) use 16 KB of memory starting at segment B800H. (See Figure 6-1.) In the latter case, the 16 KB is divided into "pages" that can be independently updated and displayed. (0,0)ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿(79,0) ³ ³ ³ ³ ³ ³ ³ ³ ³ ³ ³ ³ ³ ³ (0,24)ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ(79,24) Figure 6-3. Cursor addressing for 80-by-25 text display modes (IBM ROM BIOS modes 2, 3, and 7). Each character-display position is allotted 2 bytes in the regen buffer. The first byte (even address) contains the ASCII code of the character, which is translated by a special hardware character generator into a dot-matrix pattern for the screen. The second byte (odd address) is the attribute byte. Several bit fields in this byte control such features as blinking, intensity (highlighting), and reverse video, depending on the adapter type and display mode (Figures 6-4 and 6-5). Figure 6-6 shows a hex and ASCII dump of part of the video map for the MDA. 
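As a rough illustration of the two-byte layout just described, the following C sketch packs a character and an attribute into the 16-bit word that occupies one display position. The function names are hypothetical. The attribute layout assumed for the color text modes (bit 7 = blink, bits 6-4 = background, bits 3-0 = foreground including the intensity bit) is the conventional one for these adapters and is stated here as an assumption rather than taken from the figures.

```c
#include <assert.h>
#include <stdint.h>

/* Build a color text-mode attribute byte (assumed layout):
   bit 7 = blink, bits 6-4 = background (0-7),
   bits 3-0 = foreground (0-15, bit 3 = intensity). */
uint8_t make_attr(unsigned fg, unsigned bg, int blink)
{
    assert(fg < 16 && bg < 8);
    return (uint8_t)((blink ? 0x80u : 0x00u) | (bg << 4) | fg);
}

/* Pack one display position as a 16-bit word. On the 80x86 the low
   byte lands at the even address (character code) and the high byte
   at the odd address (attribute). */
uint16_t make_cell(uint8_t ch, uint8_t attr)
{
    return (uint16_t)(ch | ((uint16_t)attr << 8));
}
```

With these helpers, make_cell('>', make_attr(7, 0, 0)) yields the word whose two bytes in memory are 3EH followed by 07H, a normal-video ">" character.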
Display Background Foreground ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ No display (black) 000 000 No display (white) VGA only 111 111 Underline 000 001 Normal video 000 111 Reverse video 111 000 ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Figure 6-4. Attribute byte for 80-by-25 monochrome text display mode on the MDA, Hercules cards, EGA, and VGA (IBM ROM BIOS mode 7). Value Color ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ 0 Black 1 Blue 2 Green 3 Cyan 4 Red 5 Magenta 6 Brown 7 White 8 Gray 9 Light blue 10 Light green 11 Light cyan 12 Light red 13 Light magenta 14 Yellow 15 Intense white ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Figure 6-5. Attribute byte for the 40-by-25 and 80-by-25 text display modes on the CGA, EGA, MCGA, and VGA (IBM ROM BIOS modes 0Ä3). The table of color values assumes default palette programming and that the B or I bit controls intensity. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ B000:0000 3e 07 73 07 65 07 6c 07 65 07 63 07 74 07 20 07 B000:0010 74 07 65 07 6d 07 70 07 20 07 20 07 20 07 20 07 B000:0020 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07 B000:0030 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07 B000:0040 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07 B000:0050 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07 B000:0060 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07 B000:0070 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07 B000:0080 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07 B000:0090 20 07 20 07 20 07 20 07 20 07 20 07 20 07 20 07 ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Figure 6-6. Example dump of the first 160 bytes of the MDA's regen buffer. These bytes correspond to the first visible line on the screen. 
Note that ASCII character codes are stored in even bytes and their respective character attributes in odd bytes; all the characters in this example line have the attribute normal video.

You can calculate the memory offset of any character on the display as the line number (y coordinate) times the number of characters per line (80 or 40) times 2 bytes per character, plus the column number (x coordinate) times 2 bytes per character, plus (for the text/graphics adapters) the page number times the size of the page (4 KB per page in 80-by-25 modes; 2 KB per page in 40-by-25 modes). In short, the formula for the offset of the character-attribute pair for a given screen position (x,y) in 80-by-25 text modes is

        offset = ((y * 50H + x) * 2) + (page * 1000H)

In 40-by-25 text modes, where each line holds 28H (40) characters, the formula is

        offset = ((y * 28H + x) * 2) + (page * 0800H)

Of course, the segment register being used to address the video buffer must be set appropriately, depending on the type of display adapter.

As a simple example, assume that the character to be displayed is in the AL register, the desired attribute byte for the character is in the AH register, the x coordinate (column) is in the BX register, and the y coordinate (row) is in the CX register. The following code stores the character and attribute byte into the MDA's video refresh buffer at the proper location:

--------------------------------------------------------------------------
        push    ax              ; save char and attribute
        mov     ax,160
        mul     cx              ; DX:AX = Y * 160
        shl     bx,1            ; multiply X by 2
        add     bx,ax           ; BX = (Y*160) + (X*2)
        mov     ax,0b000h       ; ES = segment of monochrome
        mov     es,ax           ; adapter refresh buffer
        pop     ax              ; restore char and attribute
        mov     es:[bx],ax      ; write them to video buffer
--------------------------------------------------------------------------

More frequently, we wish to move entire strings into the refresh buffer, starting at a given coordinate.
In the next example, assume that the DS:SI registers point to the source string, the ES:DI registers point to the starting position in the video buffer (calculated as shown in the previous example), the AH register contains the attribute byte to be assigned to every character in the string, and the CX register contains the length of the string. The following code moves the entire string into the refresh buffer: ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ xfer: lodsb ; fetch next character stosw ; store char + attribute loop xfer ; until all chars moved ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Of course, the video drivers written for actual application programs must take into account many additional factors, such as checking for special control codes (linefeeds, carriage returns, tabs), line wrap, and scrolling. Programs that write characters directly to the CGA regen buffer in text modes must deal with an additional complicating factorÄÄthey must examine the video controller's status port and access the refresh buffer only during the horizontal retrace or vertical retrace intervals. (A retrace interval is the period when the electron beam that illuminates the screen phosphors is being repositioned to the start of a new scan line.) Otherwise, the contention for memory between the CPU and the video controller is manifest as unsightly "snow" on the display. (If you are writing programs for any of the other IBM PCÄcompatible video adapters, such as the MDA, EGA, MCGA, or VGA, you can ignore the retrace intervals; snow is not a problem with these video controllers.) A program can detect the occurrence of a retrace interval by monitoring certain bits in the video controller's status register. 
For example, assume that the offset for the desired character position has been calculated as in the preceding example and placed in the BX register, the segment for the CGA's refresh buffer is in the ES register, and an ASCII character code to be displayed is in the CL register. The following code waits for the beginning of a new horizontal retrace interval and then writes the character into the buffer: ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ mov dx,03dah ; DX = video controller's ; status port address cli ; disable interrupts ; if retrace is already ; in progress, wait for ; it to end... wait1: in al,dx ; read status port and al,1 ; check if retrace bit on jnz wait1 ; yes, wait ; wait for new retrace ; interval to start... wait2: in al,dx ; read status port and al,1 ; retrace bit on yet? jz wait2 ; jump if not yet on mov es:[bx],cl ; write character to ; the regen buffer sti ; enable interrupts again ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ The first wait loop "synchronizes" the code to the beginning of a horizontal retrace interval. If only the second wait loop were used (that is, if a character were written when a retrace interval was already in progress), the write would occasionally begin so close to the end of a horizontal retrace "window" that it would partially miss the retrace, resulting in scattered snow at the left edge of the display. Notice that the code also disables interrupts during accesses to the video buffer, so that service of a hardware interrupt won't disrupt the synchronization process. Because of the retrace-interval constraints just outlined, the rate at which you can update the CGA in text modes is severely limited when the updating is done one character at a time. 
You can obtain better results by calculating all the relevant addresses and setting up the appropriate registers, disabling the video controller by writing to register 3D8H, moving the entire string to the buffer with a REP MOVSW operation, and then reenabling the video controller. If the string is of reasonable length, the user won't even notice a flicker in the display. Of course, this procedure introduces additional hardware dependence into your code because it requires much greater knowledge of the 6845 controller. Luckily, snow is not a problem in CGA graphics modes. Graphics Mode Graphics-mode memory-mapped programming for IBM PCÄcompatible adapters is considerably more complicated than text-mode programming. Each bit or group of bits in the regen buffer corresponds to an addressable point, or pixel, on the screen. The mapping of bits to pixels differs for each of the available graphics modes, with their differences in resolution and number of supported colors. The newer adapters (EGA, MCGA, and VGA) also use the concept of bit planes, where bits of a pixel are segregated into multiple banks of memory mapped at the same address; you must manipulate these bit planes by a combination of memory-mapped I/O and port addressing. IBM-video-systems graphics programming is a subject large enough for a book of its own, but we can use the 640-by-200, 2-color graphics display mode of the CGA (which is also supported by all subsequent IBM text/graphics adapters) to illustrate a few of the techniques involved. This mode is simple to deal with because each pixel is represented by a single bit. The pixels are assigned (x,y) coordinates in the range (0,0) through (639,199), where x is the horizontal displacement, y is the vertical displacement, and the home position (0,0) is the upper left corner of the display. (See Figure 6-7.) (0,0)ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿(639,0) ³ ³ ³ ³ ³ ³ ³ ³ ³ ³ ³ ³ ³ ³ (0,199)ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ(639,199) Figure 6-7. 
Point addressing for 640-by-200, 2-color graphics modes on the CGA, EGA, MCGA, and VGA (IBM ROM BIOS mode 6). Each successive group of 80 bytes (640 bits) represents one horizontal scan line. Within each byte, the bits map one-for-one onto pixels, with the most significant bit corresponding to the leftmost displayed pixel of a set of eight pixels and the least significant bit corresponding to the rightmost displayed pixel of the set. The memory map is set up so that all the even y coordinates are scanned as a set and all the odd y coordinates are scanned as a set; this mapping is referred to as the memory interlace. To find the regen buffer offset for a particular (x,y) coordinate, you would use the following formula: offset = ((y AND 1) * 2000H) + (y/2 * 50H) + (x/8) The assembly-language implementation of this formula is as follows: ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ ; assume AX = Y, BX = X shr bx,1 ; divide X by 8 shr bx,1 shr bx,1 push ax ; save copy of Y shr ax,1 ; find (Y/2) * 50h mov cx,50h ; with product in DX:AX mul cx add bx,ax ; add product to X/8 pop ax ; add (Y AND 1) * 2000h and ax,1 jz label1 add bx,2000h label1: ; now BX = offset into ; video buffer ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ After calculating the correct byte address, you can use the following formula to calculate the bit position for a given pixel coordinate: bit = 7 - (x MOD 8) where bit 7 is the most significant bit and bit 0 is the least significant bit. 
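The offset and bit-position formulas above can be sketched in C as follows. The function names and the buffer parameter are hypothetical; the buffer stands in for the 16 KB regen buffer, so the sketch can be exercised on an ordinary array without touching video hardware. The mask is produced by shifting 80H right by (x AND 7) places, which selects the same bit as 7 - (x MOD 8).

```c
#include <assert.h>

/* Byte offset of pixel (x,y) in the 640-by-200, 2-color mode:
   odd scan lines live 2000H bytes above even ones (the interlace). */
unsigned pixel_offset(unsigned x, unsigned y)
{
    assert(x < 640 && y < 200);
    return ((y & 1u) * 0x2000u) + ((y / 2u) * 0x50u) + (x / 8u);
}

/* Mask for the bit representing pixel x within its byte:
   the most significant bit is the leftmost pixel. */
unsigned char pixel_mask(unsigned x)
{
    return (unsigned char)(0x80u >> (x & 7u));
}

/* Set (on != 0) or clear (on == 0) a pixel in a caller-supplied
   buffer of at least 4000H bytes laid out like the regen buffer. */
void put_pixel(unsigned char *buf, unsigned x, unsigned y, int on)
{
    unsigned off = pixel_offset(x, y);
    unsigned char mask = pixel_mask(x);

    if (on)
        buf[off] |= mask;
    else
        buf[off] &= (unsigned char)~mask;
}
```

On a real-mode compiler the same put_pixel routine could be pointed at the adapter's regen buffer through a far pointer, but that step is compiler-specific and is not shown here.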
It is easiest to build an 8-byte table, or array of bit masks, and use the operation X AND 7 to extract the appropriate entry from the table: (X AND 7) Bit mask (X AND 7) Bit mask ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ 0 80H 4 08H 1 40H 5 04H 2 20H 6 02H 3 10H 7 01H ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ The assembly-language implementation of this second calculation is as follows: ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ table db 80h ; X AND 7 = offset 0 db 40h ; X AND 7 = offset 1 db 20h ; X AND 7 = offset 2 db 10h ; X AND 7 = offset 3 db 08h ; X AND 7 = offset 4 db 04h ; X AND 7 = offset 5 db 02h ; X AND 7 = offset 6 db 01h ; X AND 7 = offset 7 . . . ; assume BX = X coordinate and bx,7 ; isolate 0Ä7 offset mov al,[bx+table] ; now AL = mask from table . . . ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ The program can then use the mask, together with the byte offset previously calculated, to set or clear the appropriate bit in the video controller's regen buffer. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Chapter 7 Printer and Serial Port MS-DOS supports printers, plotters, modems, and other hard-copy output or communication devices with device drivers for parallel ports and serial ports. Parallel ports are so named because they transfer a byteÄÄ8 bitsÄÄ in parallel to the destination device over eight separate physical paths (plus additional status and handshaking signals). The serial port, on the other hand, communicates with the CPU with bytes but sends data to or receives data from its destination device seriallyÄÄa bit at a timeÄÄover a single physical connection. Parallel ports are typically used for high-speed output devices, such as line printers, over relatively short distances (less than 50 feet). They are rarely used for devices that require two-way communication with the computer. 
Serial ports are used for lower-speed devices, such as modems and terminals, that require two-way communication (although some printers also have serial interfaces). A serial port can drive its device reliably over much greater distances (up to 1000 feet) over as few as three wiresÄÄ transmit, receive, and ground. The most commonly used type of serial interface follows a standard called RS-232. This standard specifies a 25-wire interface with certain electrical characteristics, the use of various handshaking signals, and a standard DB-25 connector. Other serial-interface standards existÄÄfor example, the RS-422, which is capable of considerably higher speeds than the RS-232ÄÄ but these are rarely used in personal computers (except for the Apple Macintosh) at this time. MS-DOS has built-in device drivers for three parallel adapters, and for two serial adapters on the PC or PC/AT and three serial adapters on the PS/2. The logical names for these devices are LPT1, LPT2, LPT3, COM1, COM2, and COM3. The standard printer (PRN) and standard auxiliary (AUX) devices are normally aliased to LPT1 and COM1, but you can redirect PRN to one of the serial ports with the MS-DOS MODE command. As with keyboard and video display I/O, you can manage printer and serial-port I/O at several levels that offer different degrees of flexibility and hardware independence: þ MS-DOS handle-oriented functions þ MS-DOS traditional character functions þ IBM ROM BIOS driver functions In the case of the serial port, direct control of the hardware by application programs is also common. I will discuss each of these I/O methods briefly, with examples, in the following pages. Printer Output The preferred method of printer output is to use the handle write function (Int 21H Function 40H) with the predefined standard printer handle (4). 
For example, you could write the string hello to the printer as follows:

--------------------------------------------------------------------------
msg     db      'hello'         ; message for printer
msg_len equ     $-msg           ; length of message
        .
        .
        .
        mov     ah,40h          ; function 40h = write file or device
        mov     bx,4            ; BX = standard printer handle
        mov     cx,msg_len      ; CX = length of string
        mov     dx,seg msg      ; DS:DX = string address
        mov     ds,dx
        mov     dx,offset msg
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if error
        .
        .
        .
--------------------------------------------------------------------------

If there is no error, the function returns the carry flag cleared and the number of characters actually transferred to the list device in register AX. Under normal circumstances, this number should always be the same as the length requested and the carry flag indicating an error should never be set. However, the output will terminate early if your data contains an end-of-file mark (Ctrl-Z).

You can write independently to several list devices (for example, LPT1, LPT2) by issuing a specific open request (Int 21H Function 3DH) for each device and using the handles returned to access the printers individually with Int 21H Function 40H. You have already seen this general approach in Chapters 5 and 6.

An alternative method of printer output is to use the traditional Int 21H Function 05H, which transfers the character in the DL register to the printer. (This function is sensitive to Ctrl-C interrupts.) For example, the following assembly-language code sequence writes the string hello to the line printer:

--------------------------------------------------------------------------
msg     db      'hello'         ; message for printer
msg_len equ     $-msg           ; length of message
        .
        .
        .
mov bx,seg msg ; DS:BX = string address mov ds,bx mov bx,offset msg mov cx,msg_len ; CX = string length next: mov dl,[bx] ; get next character mov ah,5 ; function 05h = printer output int 21h ; transfer to MS-DOS inc bx ; bump string pointer loop next ; loop until string done . . . ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Programs that run on IBM PCÄcompatible machines can obtain improved printer throughput by bypassing MS-DOS and calling the ROM BIOS printer driver directly by means of Int 17H. Section III of this book, "IBM ROM BIOS and Mouse Functions Reference," documents the Int 17H functions in detail. Use of the ROM BIOS functions also allows your program to test whether the printer is off line or out of paper, a capability that MS-DOS does not offer. For example, the following sequence of instructions calls the ROM BIOS printer driver to send the string hello to the line printer: ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ msg db 'hello' ; message for printer msg_len equ $-msg ; length of message . . . mov bx,seg msg ; DS:BX = string address mov ds,bx mov bx,offset msg mov cx,msg_len ; CX = string length mov dx,0 ; DX = printer number next: mov al,[bx] ; AL = character to print mov ah,0 ; function 00h = printer output int 17h ; transfer to ROM BIOS inc bx ; bump string pointer loop next ; loop until string done . . . ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Note that the printer numbers used by the ROM BIOS are zero-based, whereas the printer numbers in MS-DOS logical-device names are one-based. For example, ROM BIOS printer 0 corresponds to LPT1. Finally, the most hardware-dependent technique of printer output is to access the printer controller directly. Considering the functionality already provided in MS-DOS and the IBM ROM BIOS, as well as the speeds of the devices involved, I cannot see any justification for using direct hardware control in this case. 
The disadvantage of introducing such extreme hardware dependence for such a low-speed device would far outweigh any small performance gains that might be obtained. The Serial Port MS-DOS support for serial ports (often referred to as the auxiliary device in MS-DOS manuals) is weak compared with its keyboard, video-display, and printer support. This is one area where the application programmer is justified in making programs hardware dependent to extract adequate performance. Programs that restrict themselves to MS-DOS functions to ensure portability can use the handle read and write functions (Int 21H Functions 3FH and 40H), with the predefined standard auxiliary handle (3) to access the serial port. For example, the following code writes the string hello to the serial port that is currently defined as the AUX device: ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ msg db 'hello' ; message for serial port msg_len equ $-msg ; length of message . . . mov ah,40h ; function 40h = write file or device mov bx,3 ; BX = standard aux handle mov cx,msg_len ; CX = string length mov dx,seg msg ; DS:DX = string address mov ds,dx mov dx,offset msg int 21h ; transfer to MS-DOS jc error ; jump if error . . . ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ The standard auxiliary handle gives access to only the first serial port (COM1). If you want to read or write COM2 and COM3 using the handle calls, you must issue an open request (Int 21H Function 3DH) for the desired serial port and use the handle returned by that function with Int 21H Functions 3FH and 40H. Some versions of MS-DOS have a bug in character-device handling that manifests itself as follows: If you issue a read request with Int 21H Function 3FH for the exact number of characters that are waiting in the driver's buffer, the length returned in the AX register is the number of characters transferred minus one. 
You can circumvent this problem by always requesting more characters than you expect to receive or by placing the device handle into binary mode using Int 21H Function 44H.

MS-DOS also supports two traditional functions for serial-port I/O. Int 21H Function 03H inputs a character from COM1 and returns it in the AL register; Int 21H Function 04H transmits the character in the DL register to COM1. Like the other traditional calls, these two are direct descendants of the CP/M auxiliary-device functions.

For example, the following code sends the string hello to COM1 using the traditional Int 21H Function 04H:

--------------------------------------------------------------------------
msg     db      'hello'         ; message for serial port
msg_len equ     $-msg           ; length of message
        .
        .
        .
        mov     bx,seg msg      ; DS:BX = string address
        mov     ds,bx
        mov     bx,offset msg
        mov     cx,msg_len      ; CX = length of string
next:   mov     dl,[bx]         ; get next character
        mov     ah,4            ; function 04h = aux output
        int     21h             ; transfer to MS-DOS
        inc     bx              ; bump pointer to string
        loop    next            ; loop until string done
        .
        .
        .
--------------------------------------------------------------------------

MS-DOS translates the traditional auxiliary-device functions into calls on the same device driver used by the handle calls. Therefore, it is generally preferable to use the handle functions in the first place, because they allow very long strings to be read or written in one operation, they give access to serial ports other than COM1, and they are symmetrical with the handle video-display, keyboard, printer, and file I/O methods described elsewhere in this book.

Although the handle or traditional serial-port functions allow you to write programs that are portable to any machine running MS-DOS, they have a number of disadvantages:

■ The built-in MS-DOS serial-port driver is slow and is not interrupt driven.

■ MS-DOS serial-port I/O is not buffered.
■ Determinin[...]rs no standardized function to configure the serial port from within a program.

For programs that are going to run on the IBM PC or compatibles, a more flexible technique for serial-port I/O is to call the IBM ROM BIOS serial-port driver by means of Int 14H. You can use this driver to initialize the serial port to a desired configuration and baud rate, examine the status of the controller, and read or write characters. Section III of this book, "IBM ROM BIOS and Mouse Functions Reference," documents the functions available from the ROM BIOS serial-port driver.

For example, the following sequence sends the character X to the first serial port (COM1):

--------------------------------------------------------------------------
        .
        .
        .
        mov     ah,1            ; function 01h = send character
        mov     al,'X'          ; AL = character to transmit
        mov     dx,0            ; DX = serial-port number
        int     14h             ; transfer to ROM BIOS
        and     ah,80h          ; did transmit fail?
        jnz     error           ; jump if transmit error
        .
        .
        .
--------------------------------------------------------------------------

As with the ROM BIOS printer driver, the serial-port numbers used by the ROM BIOS are zero-based, whereas the serial-port numbers in MS-DOS logical-device names are one-based. In this example, serial port 0 corresponds to COM1.

Unfortunately, like the MS-DOS auxiliary-device driver, the ROM BIOS serial-port driver is not interrupt driven. Although it will support higher transfer speeds than the MS-DOS functions, at rates greater than 2400 baud it may still lose characters. Consequently, most programmers writing high-performance applications that use a serial port (such as telecommunications programs) take complete control of the serial-port controller and provide their own interrupt driver.
The built-in functions provided by MS-DOS, and by the ROM BIOS in the case of the IBM PC, are simply not adequate.

Writing such programs requires a good understanding of the hardware. In the case of the IBM PC, the chips to study are the INS8250 Asynchronous Communications Controller and the Intel 8259A Programmable Interrupt Controller. The IBM technical reference documentation for these chips is a bit disorganized, but most of the necessary information is there if you look for it.

The TALK Program

The simple terminal-emulator program TALK.ASM (Figure 7-1) is an example of a useful program that performs screen, keyboard, and serial-port I/O. This program recapitulates all of the topics discussed in Chapters 5 through 7. TALK uses the IBM PC's ROM BIOS video driver to put characters on the screen, to clear the display, and to position the cursor; it uses the MS-DOS character-input calls to read the keyboard; and it contains its own interrupt driver for the serial-port controller.

──────────────────────────────────────────────────────────────────────────
        name    talk
        page    55,132
        .lfcond                 ; List false conditionals too
        title   TALK--Simple terminal emulator
;
; TALK.ASM--Simple IBM PC terminal emulator
;
; Copyright (c) 1988 Ray Duncan
;
; To assemble and link this program into TALK.EXE:
;
;       C>MASM TALK;
;       C>LINK TALK;
;

stdin   equ     0               ; standard input handle
stdout  equ     1               ; standard output handle
stderr  equ     2               ; standard error handle

cr      equ     0dh             ; ASCII carriage return
lf      equ     0ah             ; ASCII linefeed
bsp     equ     08h             ; ASCII backspace
escape  equ     1bh             ; ASCII escape code

dattr   equ     07h             ; display attribute to use
                                ; while in emulation mode

bufsiz  equ     4096            ; size of serial-port buffer

echo    equ     0               ; 0 = full-duplex, -1 = half-duplex

true    equ     -1
false   equ     0

com1    equ     true            ; use COM1 if nonzero
com2    equ     not com1        ; use COM2 if nonzero

pic_mask equ    21h             ; 8259 interrupt mask port
pic_eoi  equ    20h             ; 8259 EOI port

        if      com1
com_data equ    03f8h           ; port assignments for COM1
com_ier  equ    03f9h
com_mcr  equ    03fch
com_sts  equ    03fdh
com_int  equ    0ch             ; COM1 interrupt number
int_mask equ    10h             ; IRQ4 mask for 8259
        endif

        if      com2
com_data equ    02f8h           ; port assignments for COM2
com_ier  equ    02f9h
com_mcr  equ    02fch
com_sts  equ    02fdh
com_int  equ    0bh             ; COM2 interrupt number
int_mask equ    08h             ; IRQ3 mask for 8259
        endif

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT,ds:_DATA,es:_DATA,ss:STACK

talk    proc    far             ; entry point from MS-DOS

        mov     ax,_DATA        ; make data segment addressable
        mov     ds,ax
        mov     es,ax
                                ; initialize display for
                                ; terminal emulator mode...

        mov     ah,15           ; get display width and
        int     10h             ; current display mode
        dec     ah              ; save display width for use
        mov     columns,ah      ; by the screen-clear routine

        cmp     al,7            ; enforce text display mode
        je      talk2           ; mode 7 ok, proceed

        cmp     al,3
        jbe     talk2           ; modes 0-3 ok, proceed

        mov     dx,offset msg1
        mov     cx,msg1_len
        jmp     talk6           ; print error message and exit

talk2:  mov     bh,dattr        ; clear screen and home cursor
        call    cls

        call    asc_enb         ; capture serial-port interrupt
                                ; vector and enable interrupts

        mov     dx,offset msg2  ; display message
        mov     cx,msg2_len     ; 'terminal emulator running'
        mov     bx,stdout       ; BX = standard output handle
        mov     ah,40h          ; function 40h = write file or device
        int     21h             ; transfer to MS-DOS

talk3:  call    pc_stat         ; keyboard character waiting?
        jz      talk4           ; nothing waiting, jump

        call    pc_in           ; read keyboard character

        cmp     al,0            ; is it a function key?
        jne     talk32          ; not function key, jump

        call    pc_in           ; function key, discard 2nd
                                ; character of sequence
        jmp     talk5           ; then terminate program

talk32:                         ; keyboard character received
        if      echo
        push    ax              ; if half-duplex, echo
        call    pc_out          ; character to PC display
        pop     ax
        endif

        call    com_out         ; write char to serial port

talk4:  call    com_stat        ; serial-port character waiting?
        jz      talk3           ; nothing waiting, jump

        call    com_in          ; read serial-port character

        cmp     al,20h          ; is it control code?
        jae     talk45          ; jump if not

        call    ctrl_code       ; control code, process it

        jmp     talk3           ; check keyboard again

talk45:                         ; noncontrol char received,
        call    pc_out          ; write it to PC display

        jmp     talk4           ; see if any more waiting

talk5:                          ; function key detected,
                                ; prepare to terminate...

        mov     bh,07h          ; clear screen and home cursor
        call    cls

        mov     dx,offset msg3  ; display farewell message
        mov     cx,msg3_len

talk6:  push    dx              ; save message address
        push    cx              ; and message length

        call    asc_dsb         ; disable serial-port interrupts
                                ; and release interrupt vector

        pop     cx              ; restore message length
        pop     dx              ; and address

        mov     bx,stdout       ; handle for standard output
        mov     ah,40h          ; function 40h = write device
        int     21h             ; transfer to MS-DOS

        mov     ax,4c00h        ; terminate program with
        int     21h             ; return code = 0

talk    endp

com_stat proc   near            ; check asynch status; returns
                                ; Z = false if character ready
                                ; Z = true if nothing waiting
        push    ax
        mov     ax,asc_in       ; compare ring buffer pointers
        cmp     ax,asc_out
        pop     ax
        ret                     ; return to caller

com_stat endp

com_in  proc    near            ; get character from serial-
                                ; port buffer; returns
                                ; new character in AL

        push    bx              ; save register BX

com_in1:                        ; if no char waiting, wait
        mov     bx,asc_out      ; until one is received
        cmp     bx,asc_in
        je      com_in1         ; jump, nothing waiting

        mov     al,[bx+asc_buf] ; character is ready,
                                ; extract it from buffer

        inc     bx              ; update buffer pointer
        cmp     bx,bufsiz
        jne     com_in2
        xor     bx,bx           ; reset pointer if wrapped

com_in2:
        mov     asc_out,bx      ; store updated pointer
        pop     bx              ; restore register BX
        ret                     ; and return to caller

com_in  endp

com_out proc    near            ; write character in AL
                                ; to serial port

        push    dx              ; save register DX
        push    ax              ; save character to send
        mov     dx,com_sts      ; DX = status port address

com_out1:                       ; check if transmit buffer
        in      al,dx           ; is empty (TBE bit = set)
        and     al,20h
        jz      com_out1        ; no, must wait

        pop     ax              ; get character to send
        mov     dx,com_data     ; DX = data port address
        out     dx,al           ; transmit the character
        pop     dx              ; restore register DX
        ret                     ; and return to caller

com_out endp

pc_stat proc    near            ; read keyboard status;
                                ; returns
                                ; Z = false if character ready
                                ; Z = true if nothing waiting
                                ; register DX destroyed

        mov     al,in_flag      ; if character already
        or      al,al           ; waiting, return status
        jnz     pc_stat1

        mov     ah,6            ; otherwise call MS-DOS to
        mov     dl,0ffh         ; determine keyboard status
        int     21h
        jz      pc_stat1        ; jump if no key ready

        mov     in_char,al      ; got key, save it for
        mov     in_flag,0ffh    ; "pc_in" routine

pc_stat1:                       ; return to caller with
        ret                     ; Z flag set appropriately

pc_stat endp

pc_in   proc    near            ; read keyboard character,
                                ; return it in AL
                                ; DX may be destroyed

        mov     al,in_flag      ; key already waiting?
        or      al,al
        jnz     pc_in1          ; yes, return it to caller

        call    pc_stat         ; try to read a character
        jmp     pc_in

pc_in1: mov     in_flag,0       ; clear char-waiting flag
        mov     al,in_char      ; and return AL = character
        ret

pc_in   endp

pc_out  proc    near            ; write character in AL
                                ; to the PC's display

        mov     ah,0eh          ; ROM BIOS function 0eh =
                                ; "teletype output"
        push    bx              ; save register BX
        xor     bx,bx           ; assume page 0
        int     10h             ; transfer to ROM BIOS
        pop     bx              ; restore register BX
        ret                     ; and return to caller

pc_out  endp

cls     proc    near            ; clear display using
                                ; char attribute in BH
                                ; registers AX, CX,
                                ; and DX destroyed

        mov     dl,columns      ; set DL,DH = X,Y of
        mov     dh,24           ; lower right corner
        mov     cx,0            ; set CL,CH = X,Y of
                                ; upper left corner
        mov     ax,600h         ; ROM BIOS function 06h =
                                ; "scroll or initialize
                                ; window"
        int     10h             ; transfer to ROM BIOS
        call    home            ; set cursor at (0,0)
        ret                     ; and return to caller

cls     endp

clreol  proc    near            ; clear from cursor to end
                                ; of line using attribute
                                ; in BH, registers AX, CX,
                                ; and DX destroyed

        call    getxy           ; get current cursor position
        mov     cx,dx           ; current position = "upper
                                ; left corner" of window;
        mov     dl,columns      ; "lower right corner" X is
                                ; max columns, Y is same
                                ; as upper left corner
        mov     ax,600h         ; ROM BIOS function 06h =
                                ; "scroll or initialize
                                ; window"
        int     10h             ; transfer to ROM BIOS
        ret                     ; return to caller

clreol  endp

home    proc    near            ; put cursor at home position

        mov     dx,0            ; set (X,Y) = (0,0)
        call    gotoxy          ; position the cursor
        ret                     ; return
                                ; to caller
home    endp

gotoxy  proc    near            ; position the cursor
                                ; call with DL = X, DH = Y

        push    bx              ; save registers
        push    ax

        mov     bh,0            ; assume page 0
        mov     ah,2            ; ROM BIOS function 02h =
                                ; set cursor position
        int     10h             ; transfer to ROM BIOS

        pop     ax              ; restore registers
        pop     bx
        ret                     ; and return to caller

gotoxy  endp

getxy   proc    near            ; get cursor position,
                                ; returns DL = X, DH = Y

        push    ax              ; save registers
        push    bx
        push    cx

        mov     ah,3            ; ROM BIOS function 03h =
                                ; get cursor position
        mov     bh,0            ; assume page 0
        int     10h             ; transfer to ROM BIOS

        pop     cx              ; restore registers
        pop     bx
        pop     ax
        ret                     ; and return to caller

getxy   endp

ctrl_code proc  near            ; process control code
                                ; call with AL = char

        cmp     al,cr           ; if carriage return
        je      ctrl8           ; just send it

        cmp     al,lf           ; if linefeed
        je      ctrl8           ; just send it

        cmp     al,bsp          ; if backspace
        je      ctrl8           ; just send it

        cmp     al,26           ; is it cls control code?
        jne     ctrl7           ; no, jump

        mov     bh,dattr        ; cls control code, clear
        call    cls             ; screen and home cursor

        jmp     ctrl9

ctrl7:  cmp     al,escape       ; is it Escape character?
        jne     ctrl9           ; no, throw it away

        call    esc_seq         ; yes, emulate CRT terminal
        jmp     ctrl9

ctrl8:  call    pc_out          ; send CR, LF, or backspace
                                ; to the display

ctrl9:  ret                     ; return to caller

ctrl_code endp

esc_seq proc    near            ; decode Televideo 950 escape
                                ; sequence for screen control

        call    com_in          ; get next character
        cmp     al,84           ; is it clear to end of line?
        jne     esc_seq1        ; no, jump

        mov     bh,dattr        ; yes, clear to end of line
        call    clreol
        jmp     esc_seq2        ; then exit

esc_seq1:
        cmp     al,61           ; is it cursor positioning?
        jne     esc_seq2        ; no jump

        call    com_in          ; yes, get Y parameter
        sub     al,33           ; and remove offset
        mov     dh,al

        call    com_in          ; get X parameter
        sub     al,33           ; and remove offset
        mov     dl,al
        call    gotoxy          ; position the cursor

esc_seq2:                       ; return to caller
        ret

esc_seq endp

asc_enb proc    near            ; capture serial-port interrupt
                                ; vector and enable interrupt

                                ; save address of previous
                                ; interrupt handler...
        mov     ax,3500h+com_int ; function 35h = get vector
        int     21h             ; transfer to MS-DOS
        mov     word ptr oldvec+2,es
        mov     word ptr oldvec,bx

                                ; now install our handler...
        push    ds              ; save our data segment
        mov     ax,cs           ; set DS:DX = address
        mov     ds,ax           ; of our interrupt handler
        mov     dx,offset asc_int
        mov     ax,2500h+com_int ; function 25h = set vector
        int     21h             ; transfer to MS-DOS
        pop     ds              ; restore data segment

        mov     dx,com_mcr      ; set modem-control register
        mov     al,0bh          ; DTR and OUT2 bits
        out     dx,al

        mov     dx,com_ier      ; set interrupt-enable
        mov     al,1            ; register on serial-
        out     dx,al           ; port controller

        in      al,pic_mask     ; read current 8259 mask
        and     al,not int_mask ; set mask for COM port
        out     pic_mask,al     ; write new 8259 mask

        ret                     ; back to caller

asc_enb endp

asc_dsb proc    near            ; disable interrupt and
                                ; release interrupt vector

        in      al,pic_mask     ; read current 8259 mask
        or      al,int_mask     ; reset mask for COM port
        out     pic_mask,al     ; write new 8259 mask

        push    ds              ; save our data segment
        lds     dx,oldvec       ; load address of
                                ; previous interrupt handler
        mov     ax,2500h+com_int ; function 25h = set vector
        int     21h             ; transfer to MS-DOS
        pop     ds              ; restore data segment

        ret                     ; back to caller

asc_dsb endp

asc_int proc    far             ; interrupt service routine
                                ; for serial port

        sti                     ; turn interrupts back on

        push    ax              ; save registers
        push    bx
        push    dx
        push    ds

        mov     ax,_DATA        ; make our data segment
        mov     ds,ax           ; addressable

        cli                     ; clear interrupts for
                                ; pointer manipulation

        mov     dx,com_data     ; DX = data port address
        in      al,dx           ; read this character

        mov     bx,asc_in       ; get buffer pointer
        mov     [asc_buf+bx],al ; store this character
        inc     bx              ; bump pointer
        cmp     bx,bufsiz       ; time for wrap?
        jne     asc_int1        ; no, jump
        xor     bx,bx           ; yes, reset pointer

asc_int1:                       ; store updated pointer
        mov     asc_in,bx

        sti                     ; turn interrupts back on

        mov     al,20h          ; send EOI to 8259
        out     pic_eoi,al

        pop     ds              ; restore all registers
        pop     dx
        pop     bx
        pop     ax

        iret                    ; return from interrupt

asc_int endp

_TEXT   ends

_DATA   segment word public 'DATA'

in_char db      0               ; PC keyboard input char
in_flag db      0               ; <>0 if char waiting

columns db      0               ; highest numbered column in
                                ; current display mode (39 or 79)

msg1    db      cr,lf
        db      'Display must be text mode.'
        db      cr,lf
msg1_len equ    $-msg1

msg2    db      'Terminal emulator running...'
        db      cr,lf
msg2_len equ    $-msg2

msg3    db      'Exit from terminal emulator.'
        db      cr,lf
msg3_len equ    $-msg3

oldvec  dd      0               ; original contents of serial-
                                ; port interrupt vector

asc_in  dw      0               ; input pointer to ring buffer
asc_out dw      0               ; output pointer to ring buffer

asc_buf db      bufsiz dup (?)  ; communications buffer

_DATA   ends

STACK   segment para stack 'STACK'

        db      128 dup (?)

STACK   ends

        end     talk            ; defines entry point
──────────────────────────────────────────────────────────────────────────

Figure 7-1. TALK.ASM: A simple terminal-emulator program for IBM PC-compatible computers. This program demonstrates use of the MS-DOS and ROM BIOS video and keyboard functions and direct control of the serial-communications adapter.

The TALK program illustrates the methods that an application should use to take over and service interrupts from the serial port without running afoul of MS-DOS conventions.

The program begins with some equates and conditional assembly statements that configure the program for half- or full-duplex and for the desired serial port (COM1 or COM2). At entry from MS-DOS, the main routine of the program (the procedure named talk) verifies that the display is in a text mode, clears the screen, and calls the asc_enb routine to take over the serial-port interrupt vector and enable interrupts.
The talk procedure then enters a loop that reads the keyboard and sends the characters out the serial port and then reads the serial port and puts the characters on the display; in other words, it causes the PC to emulate a simple CRT terminal.

The TALK program intercepts and handles control codes (carriage return, linefeed, and so forth) appropriately. It detects escape sequences and handles them as a subset of the Televideo 950 terminal capabilities. (You can easily modify the program to emulate any other cursor-addressable terminal.) When one of the PC's special function keys is pressed, the program disables serial-port interrupts, releases the serial-port interrupt vector, and exits back to MS-DOS.

Several TALK program procedures are worth your attention because they can easily be incorporated into other programs. These are listed in the following table.

Procedure      Action
──────────────────────────────────────────────────────────────────────────
asc_enb        Takes over the serial-port interrupt vector and enables
               interrupts by writing to the modem-control register of
               the INS8250 and the interrupt-mask register of the 8259A.

asc_dsb        Restores the original state of the serial-port interrupt
               vector and disables interrupts by writing to the
               interrupt-mask register of the 8259A.

asc_int        Services serial-port interrupts, placing received
               characters into a ring buffer.

com_stat       Tests whether characters from the serial port are waiting
               in the ring buffer.

com_in         Removes characters from the interrupt handler's ring
               buffer and increments the buffer pointers appropriately.

com_out        Sends one character to the serial port.

cls            Calls the ROM BIOS video driver to clear the screen.

clreol         Calls the ROM BIOS video driver to clear from the current
               cursor position to the end of the line.

home           Places the cursor in the upper left corner of the screen.

gotoxy         Positions the cursor at the desired position on the
               display.

getxy          Obtains the current cursor position.
pc_out         Sends one character to the PC's display.

pc_stat        Gets status for the PC's keyboard.

pc_in          Returns a character from the PC's keyboard.
──────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────
Chapter 8  File Management

The dual heritage of MS-DOS (CP/M and UNIX/XENIX) is perhaps most clearly demonstrated in its file-management services. In general, MS-DOS provides at least two distinct operating-system calls for each major file or record operation. This chapter breaks this overlapping battery of functions into two groups and explains the usage, advantages, and disadvantages of each.

I will refer to the set of file and record functions that are compatible with CP/M as FCB functions. These functions rely on a data structure called a file control block (hence, FCB) to maintain certain bookkeeping information about open files. This structure resides in the application program's memory space. The FCB functions allow the programmer to create, open, close, and delete files and to read or write records of any size at any record position within such files. These functions do not support the hierarchical (treelike) file structure that was first introduced in MS-DOS version 2.0, so they can be used only to access files in the current subdirectory for a given disk drive.

I will refer to the set of file and record functions that provide compatibility with UNIX/XENIX as the handle functions. These functions allow the programmer to open or create files by passing MS-DOS a null-terminated string that describes the file's location in the hierarchical file structure (the drive and path), the file's name, and its extension. If the open or create operation is successful, MS-DOS returns a 16-bit token, or handle, that is saved by the application program and used to specify the file in subsequent operations.
When you use the handle functions, the operating system maintains the data structures that contain bookkeeping information about the file inside its own memory space, and these structures are not accessible to the application program. The handle functions fully support the hierarchical file structure, allowing the programmer to create, open, close, and delete files in any subdirectory on any disk drive and to read or write records of any size at any byte offset within such files.

Although we are discussing the FCB functions first in this chapter for historical reasons, new MS-DOS applications should always be written using the more powerful handle functions. Avoid the FCB functions in new programs unless compatibility with MS-DOS version 1.0 is needed.

Using the FCB Functions

Understanding the structure of the file control block is the key to success with the FCB family of file and record functions. An FCB is a 37-byte data structure allocated within the application program's memory space; it is divided into many fields (Figure 8-1). Typically, the program initializes an FCB with a drive code, a filename, and an extension (conveniently accomplished with the parse-filename service, Int 21H Function 29H) and then passes the address of the FCB to MS-DOS to open or create the file. If the file is successfully opened or created, MS-DOS fills in certain fields of the FCB with information from the file's entry in the disk directory. This information includes the file's exact size in bytes and the date and time the file was created or last updated. MS-DOS also places certain other information within a reserved area of the FCB; however, this area is used by the operating system for its own purposes and varies among different versions of MS-DOS. Application programs should never modify the reserved area.

For compatibility with CP/M, MS-DOS automatically sets the record-size field of the FCB to 128 bytes.
If the program does not want to use this default record size, it must place the desired size (in bytes) into the record-size field after the open or create operation. Subsequently, when the program needs to read or write records from the file, it must pass the address of the FCB to MS-DOS; MS-DOS, in turn, keeps the FCB updated with information about the current position of the file pointer and the size of the file. Data is always read into or written from the current disk transfer area (DTA), whose address is set with Int 21H Function 1AH. If the application program wants to perform random record access, it must set the record number into the FCB before issuing each function call; when sequential record access is being used, MS-DOS maintains the FCB and no special intervention is needed from the application.

Byte
offset
00H ┌────────────────────────────────────────┐
    │ Drive identification                   │ Note 1
01H ├────────────────────────────────────────┤
    │ Filename (8 characters)                │ Note 2
09H ├────────────────────────────────────────┤
    │ Extension (3 characters)               │ Note 2
0CH ├────────────────────────────────────────┤
    │ Current-block number                   │ Note 9
0EH ├────────────────────────────────────────┤
    │ Record size                            │ Note 10
10H ├────────────────────────────────────────┤
    │ File size (4 bytes)                    │ Notes 3, 6
14H ├────────────────────────────────────────┤
    │ Date created/updated                   │ Note 7
16H ├────────────────────────────────────────┤
    │ Time created/updated                   │ Note 8
18H ├────────────────────────────────────────┤
    │ Reserved                               │
20H ├────────────────────────────────────────┤
    │ Current-record number                  │ Note 9
21H ├────────────────────────────────────────┤
    │ Relative-record number (4 bytes)       │ Note 5
    └────────────────────────────────────────┘

Figure 8-1. Normal file control block. Total length is 37 bytes (25H bytes).
See the notes for Figures 8-1 and 8-3.

In general, MS-DOS functions that use FCBs accept the full address of the FCB in the DS:DX registers and pass back a return code in the AL register (Figure 8-2). For file-management calls (open, close, create, and delete), this return code is zero if the function was successful and 0FFH (255) if the function failed. For the FCB-type record read and write functions, the success code returned in the AL register is again zero, but there are several failure codes. Under MS-DOS version 3.0 or later, more detailed error reporting can be obtained by calling Int 21H Function 59H (Get Extended Error Information) after a failed FCB function call.

When a program is loaded under MS-DOS, the operating system sets up two FCBs in the program segment prefix, at offsets 005CH and 006CH. These are often referred to as the default FCBs, and they are included to provide upward compatibility from CP/M. MS-DOS parses the first two parameters in the command line that invokes the program (excluding any redirection directives) into the default FCBs, under the assumption that they may be file specifications. The application must determine whether they really are filenames or not. In addition, because the default FCBs overlap and are not in a particularly convenient location (especially for .EXE programs), they usually must be copied elsewhere in order to be used safely. (See Chapter 3.)

──────────────────────────────────────────────────────────────────────────
                                ; filename was previously
                                ; parsed into "my_fcb"
        mov     dx,seg my_fcb   ; DS:DX = address of
        mov     ds,dx           ; file control block
        mov     dx,offset my_fcb
        mov     ah,0fh          ; function 0fh = open
        int     21h
        or      al,al           ; was open successful?
        jnz     error           ; no, jump to error routine
        .
        .
        .
my_fcb  db      37 dup (0)      ; file control block
──────────────────────────────────────────────────────────────────────────

Figure 8-2. A typical FCB file operation.
This sequence of code attempts to open the file whose name was previously parsed into the FCB named my_fcb.

Note that the structures of FCBs under CP/M and MS-DOS are not identical. However, the differences lie chiefly in the reserved areas of the FCBs (which should not be manipulated by application programs in any case), so well-behaved CP/M applications should be relatively easy to port to MS-DOS. It seems, however, that few such applications exist. Many of the tricks that were played by clever CP/M programmers to increase performance or circumvent the limitations of that operating system can cause severe problems under MS-DOS, particularly in networking environments. At any rate, much better performance can be achieved by thoroughly rewriting the CP/M applications to take advantage of the superior capabilities of MS-DOS.

You can use a special FCB variant called an extended file control block to create or access files with special attributes (such as hidden or read-only files), volume labels, and subdirectories. An extended FCB has a 7-byte header followed by the 37-byte structure of a normal FCB (Figure 8-3). The first byte contains 0FFH, which could never be a legal drive code and thus indicates to MS-DOS that an extended FCB is being used. The next 5 bytes are reserved and are unused in current versions of MS-DOS. The seventh byte contains the attribute of the special file type that is being accessed. (Attribute bytes are discussed in more detail in Chapter 9.) Any MS-DOS function that uses a normal FCB can also use an extended FCB.
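
The extended FCB layout just described lends itself to a short illustration. The following C sketch, which is not one of the book's listings, fills in the header and name fields of a 44-byte extended FCB held in an ordinary byte array, using the offsets given for the extended FCB. The helper names xfcb_init and put_field are hypothetical, and forcing the name to uppercase is an assumption of this sketch. Issuing the actual Int 21H call is omitted because that step is compiler specific.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Byte offsets within an extended FCB (7-byte header + normal FCB). */
enum {
    XFCB_FLAG  = 0x00,  /* must contain 0FFH                      */
    XFCB_ATTR  = 0x06,  /* attribute byte                         */
    XFCB_DRIVE = 0x07,  /* 0 = default, 1 = A:, 2 = B:, ...       */
    XFCB_NAME  = 0x08,  /* filename, 8 characters, blank padded   */
    XFCB_EXT   = 0x10,  /* extension, 3 characters, blank padded  */
    XFCB_SIZE  = 44     /* total length of an extended FCB        */
};

/* Copy src into a fixed-width field: uppercased (an assumption of
   this sketch), left justified, and padded on the right with blanks. */
static void put_field(unsigned char *dst, const char *src, size_t width)
{
    size_t i = 0;
    for (; i < width && src[i] != '\0'; i++)
        dst[i] = (unsigned char)toupper((unsigned char)src[i]);
    for (; i < width; i++)
        dst[i] = ' ';
}

/* Hypothetical helper: initialize an extended FCB in the caller's
   buffer. All fields not set explicitly are left as zero. */
void xfcb_init(unsigned char fcb[XFCB_SIZE], unsigned char attr,
               unsigned char drive, const char *name, const char *ext)
{
    assert(fcb != NULL && name != NULL && ext != NULL);
    memset(fcb, 0, XFCB_SIZE);      /* zero header and reserved areas */
    fcb[XFCB_FLAG]  = 0xFF;         /* marks the FCB as extended      */
    fcb[XFCB_ATTR]  = attr;
    fcb[XFCB_DRIVE] = drive;
    put_field(fcb + XFCB_NAME, name, 8);
    put_field(fcb + XFCB_EXT,  ext,  3);
}
```

For example, xfcb_init(fcb, 0x02, 0, "myfile", "dat") prepares an extended FCB for a hidden file named MYFILE.DAT on the default drive (02H is the hidden attribute bit). The buffer is addressed by byte offset rather than through a C structure so that compiler padding cannot disturb the layout.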
Byte
offset
00H ┌────────────────────────────────────────┐
    │ 0FFH                                   │ Note 11
01H ├────────────────────────────────────────┤
    │ Reserved (5 bytes, must be zero)       │
06H ├────────────────────────────────────────┤
    │ Attribute byte                         │ Note 12
07H ├────────────────────────────────────────┤
    │ Drive identification                   │ Note 1
08H ├────────────────────────────────────────┤
    │ Filename (8 characters)                │ Note 2
10H ├────────────────────────────────────────┤
    │ Extension (3 characters)               │ Note 2
13H ├────────────────────────────────────────┤
    │ Current-block number                   │ Note 9
15H ├────────────────────────────────────────┤
    │ Record size                            │ Note 10
17H ├────────────────────────────────────────┤
    │ File size (4 bytes)                    │ Notes 3, 6
1BH ├────────────────────────────────────────┤
    │ Date created/updated                   │ Note 7
1DH ├────────────────────────────────────────┤
    │ Time created/updated                   │ Note 8
1FH ├────────────────────────────────────────┤
    │ Reserved                               │
27H ├────────────────────────────────────────┤
    │ Current-record number                  │ Note 9
28H ├────────────────────────────────────────┤
    │ Relative-record number (4 bytes)       │ Note 5
    └────────────────────────────────────────┘

Figure 8-3. Extended file control block. Total length is 44 bytes (2CH bytes). See the notes for Figures 8-1 and 8-3.

The FCB file- and record-management functions may be gathered into the following broad classifications:

Function    Action
──────────────────────────────────────────────────────────────────────────
Common FCB file operations
0FH         Open file.
10H         Close file.
16H         Create file.

Common FCB record operations
14H         Perform sequential read.
15H         Perform sequential write.
21H         Perform random read.
22H         Perform random write.
27H         Perform random block read.
28H         Perform random block write.

Other vital FCB operations
1AH         Set disk transfer address.
29H         Parse filename.

Less commonly used FCB file operations
13H         Delete file.
17H         Rename file.

Less commonly used FCB record operations
23H         Obtain file size.
24H         Set relative-record number.
──────────────────────────────────────────────────────────────────────────

Several of these functions have special properties. For example, Int 21H Functions 27H (Random Block Read) and 28H (Random Block Write) allow reading and writing of multiple records of any size and also update the random-record field automatically (unlike Int 21H Functions 21H and 22H). Int 21H Function 28H can truncate a file to any desired size, and Int 21H Function 17H used with an extended FCB can alter a volume label or rename a subdirectory.

Section 2 of this book, "MS-DOS Functions Reference," gives detailed specifications for each of the FCB file and record functions, along with assembly-language examples. It is also instructive to compare the preceding groups with the corresponding groups of handle-type functions listed later in this chapter.

──────────────────────────────────────────────────────────────────────────
Notes for Figures 8-1 and 8-3

1. The drive identification is a binary number: 00=default drive, 01=drive A:, 02=drive B:, and so on. If the application program supplies the drive code as zero (default drive), MS-DOS fills in the code for the actual current disk drive after a successful open or create call.

2. File and extension names must be left justified and padded with blanks.

3. The file size, date, time, and reserved fields should not be modified by applications.

4. All word fields are stored with the least significant byte at the lower address.

5. The relative-record field is treated as 4 bytes if the record size is less than 64 bytes; otherwise, only the first 3 bytes of this field are used.

6. The file-size field is in the same format as in the directory, with the less significant word at the lower address.

7.
The date field is mapped as in the directory. Viewed as a 16-bit word (as it would appear in a register), the field is broken down as follows:

  F   E   D   C   B   A   9   8   7   6   5   4   3   2   1   0
┌───────────────────────────┬───────────────┬───────────────────┐
│           Year            │     Month     │        Day        │
└───────────────────────────┴───────────────┴───────────────────┘

Bits        Contents
────────────────────────────────────────────────────────────────────────
00H-04H     Day (1-31)
05H-08H     Month (1-12)
09H-0FH     Year, relative to 1980
────────────────────────────────────────────────────────────────────────

8. The time field is mapped as in the directory. Viewed as a 16-bit word (as it would appear in a register), the field is broken down as follows:

  F   E   D   C   B   A   9   8   7   6   5   4   3   2   1   0
┌───────────────────┬───────────────────────┬───────────────────┐
│       Hours       │        Minutes        │2-second increments│
└───────────────────┴───────────────────────┴───────────────────┘

Bits        Contents
────────────────────────────────────────────────────────────────────────
00H-04H     2-second increments (0-29)
05H-0AH     Minutes (0-59)
0BH-0FH     Hours (0-23)
────────────────────────────────────────────────────────────────────────

9. The current-block and current-record numbers are used together on sequential reads and writes. This simulates the behavior of CP/M.

10. The Int 21H open (0FH) and create (16H) functions set the record-size field to 128 bytes, to provide compatibility with CP/M. If you use another record size, you must fill it in after the open or create operation.

11. An 0FFH (255) in the first byte of the structure signifies that it is an extended file control block. You can use extended FCBs with any of the functions that accept an ordinary FCB. (See also note 12.)

12. The attribute byte in an extended FCB allows access to files with the special characteristics hidden, system, or read-only. You can also use extended FCBs to read volume labels and the contents of special subdirectory files.
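
Notes 7 and 8 can be illustrated with a few lines of C. This sketch, which is not part of the original listings, unpacks the date and time words of an FCB into ordinary integers; the structure and function names are hypothetical. The word_at helper assembles a 16-bit value from two bytes stored least significant byte first, as note 4 specifies.

```c
#include <assert.h>
#include <stdint.h>

struct dos_date { unsigned year, month, day; };
struct dos_time { unsigned hours, minutes, seconds; };

/* Build a 16-bit word from two bytes, least significant byte first. */
uint16_t word_at(const unsigned char *p)
{
    assert(p != 0);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Date word: bits 09H-0FH = year - 1980, 05H-08H = month, 00H-04H = day. */
struct dos_date dos_date_decode(uint16_t w)
{
    struct dos_date d;
    d.year  = 1980u + ((w >> 9) & 0x7Fu);
    d.month = (w >> 5) & 0x0Fu;
    d.day   = w & 0x1Fu;
    return d;
}

/* Time word: bits 0BH-0FH = hours, 05H-0AH = minutes,
   00H-04H = seconds divided by 2. */
struct dos_time dos_time_decode(uint16_t w)
{
    struct dos_time t;
    t.hours   = (w >> 11) & 0x1Fu;
    t.minutes = (w >> 5) & 0x3Fu;
    t.seconds = (w & 0x1Fu) * 2u;
    return t;
}
```

Applied to the date bytes 43H 0BH and the time bytes A1H 52H that appear in the "after open" column of Figure 8-5, these routines yield October 3, 1985, and 10:21:02.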
──────────────────────────────────────────────────────────────────────────

FCB File-Access Skeleton

The following is a typical program sequence to access a file using the FCB, or traditional, functions (Figure 8-4):

1. Zero out the prospective FCB.

2. Obtain the filename from the user, from the default FCBs, or from the command tail in the PSP.

3. If the filename was not obtained from one of the default FCBs, parse the filename into the new FCB using Int 21H Function 29H.

4. Open the file (Int 21H Function 0FH) or, if writing new data only, create the file or truncate any existing file of the same name to zero length (Int 21H Function 16H).

5. Set the record-size field in the FCB, unless you are using the default record size. Recall that it is important to do this after a successful open or create operation. (See Figure 8-5.)

6. Set the relative-record field in the FCB if you are performing random record I/O.

7. Set the disk transfer area address using Int 21H Function 1AH, unless the buffer address has not been changed since the last call to this function. If the application never performs a set DTA, the DTA address defaults to offset 0080H in the PSP.

8. Request the needed read- or write-record operation (Int 21H Function 14H, Sequential Read; 15H, Sequential Write; 21H, Random Read; 22H, Random Write; 27H, Random Block Read; 28H, Random Block Write).

9. If the program is not finished processing the file, go to step 6; otherwise, close the file (Int 21H Function 10H). If the file was used for reading only, you can skip the close operation under early versions of MS-DOS. However, this shortcut can cause problems under MS-DOS versions 3.0 and later, especially when the files are being accessed across a network.

──────────────────────────────────────────────────────────────────────────
recsize equ     1024            ; file record size
        .
        .
        .
        mov     ah,29h          ; parse input filename
        mov     al,1            ; skip leading blanks
        mov     si,offset fname1 ; address of filename
        mov     di,offset fcb1  ; address of FCB
        int     21h
        or      al,al           ; jump if name
        jnz     name_err        ; was bad
        .
        .
        .
        mov     ah,29h          ; parse output filename
        mov     al,1            ; skip leading blanks
        mov     si,offset fname2 ; address of filename
        mov     di,offset fcb2  ; address of FCB
        int     21h
        or      al,al           ; jump if name
        jnz     name_err        ; was bad
        .
        .
        .
        mov     ah,0fh          ; open input file
        mov     dx,offset fcb1
        int     21h
        or      al,al           ; open successful?
        jnz     no_file         ; no, jump
        .
        .
        .
        mov     ah,16h          ; create and open
        mov     dx,offset fcb2  ; output file
        int     21h
        or      al,al           ; create successful?
        jnz     disk_full       ; no, jump
        .
        .
        .
                                ; set record sizes
        mov     word ptr fcb1+0eh,recsize
        mov     word ptr fcb2+0eh,recsize
        .
        .
        .
        mov     ah,1ah          ; set disk transfer
        mov     dx,offset buffer ; address for reads
        int     21h             ; and writes
        .
next:   .                       ; process next record
        .
        mov     ah,14h          ; sequential read from
        mov     dx,offset fcb1  ; input file
        int     21h
        cmp     al,01           ; check for end of file
        je      file_end        ; jump if end of file
        cmp     al,03
        je      file_end        ; jump if end of file
        or      al,al           ; other read fault?
        jnz     bad_read        ; jump if bad read
        .
        .
        .
        mov     ah,15h          ; sequential write to
        mov     dx,offset fcb2  ; output file
        int     21h
        or      al,al           ; write successful?
        jnz     bad_write       ; jump if write failed
        .
        .
        .
        jmp     next            ; process next record
        .
file_end:
        .                       ; reached end of input
        .
        mov     ah,10h          ; close input file
        mov     dx,offset fcb1
        int     21h
        .
        .
        .
        mov     ah,10h          ; close output file
        mov     dx,offset fcb2
        int     21h
        .
        .
        .
        mov     ax,4c00h        ; exit with return
        int     21h             ; code of zero
        .
        .
        .
fname1  db      'OLDFILE.DAT',0 ; name of input file
fname2  db      'NEWFILE.DAT',0 ; name of output file
fcb1    db      37 dup (0)      ; FCB for input file
fcb2    db      37 dup (0)      ; FCB for output file
buffer  db      recsize dup (?) ; buffer for file I/O
──────────────────────────────────────────────────────────────────────────

Figure 8-4. Skeleton of an assembly-language program that performs file
and record I/O using the FCB family of functions.
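Step 5 of the skeleton depends on the record-size field at offset 0EH, and MS-DOS fills in the file-size field at offset 10H when the open succeeds. As a rough illustration of how a program might inspect those fields, the following C sketch reads them from a 37-byte image of an FCB and works out how many records the file occupies. The names FCB_SIZE, FCB_RECSIZE, FCB_FILESIZE, fcb_field, and fcb_record_count are hypothetical; the sketch assumes the fields are stored low byte first, as is usual on Intel processors.

```c
#include <assert.h>
#include <stddef.h>

#define FCB_SIZE      37     /* length of a normal FCB */
#define FCB_RECSIZE   0x0E   /* offset of record-size field (2 bytes) */
#define FCB_FILESIZE  0x10   /* offset of file-size field (4 bytes) */

/* Read a little-endian field of 'len' bytes at offset 'off' in an FCB image. */
static unsigned long fcb_field(const unsigned char *fcb, size_t off, size_t len)
{
    unsigned long v = 0;
    assert(off + len <= FCB_SIZE);
    while (len-- > 0)                 /* start with the most significant byte */
        v = (v << 8) | fcb[off + len];
    return v;
}

/* Number of records needed to hold the file; the last one may be partial. */
static unsigned long fcb_record_count(const unsigned char *fcb)
{
    unsigned long recsize = fcb_field(fcb, FCB_RECSIZE, 2);
    unsigned long size    = fcb_field(fcb, FCB_FILESIZE, 4);
    assert(recsize != 0);
    return (size + recsize - 1) / recsize;
}
```

With the "after open" bytes of Figure 8-5 (record size 0080H, file size 00003D80H), the file holds 15,744 bytes in 123 records of 128 bytes. If the program then stores 1024 in the record-size field, as Figure 8-4 does, the same file spans 16 records, the last of them partial.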
    Byte      FCB before                           FCB after
    offset    open           FCB contents          open
             ┌────────────┬────────────────────┬────────────┐
    00H      │     00     │ Drive              │     03     │
             ├────────────┼────────────────────┼────────────┤
    01H      │     4D     │                    │     4D     │
    02H      │     59     │                    │     59     │
    03H      │     46     │                    │     46     │
    04H      │     49     │ Filename           │     49     │
    05H      │     4C     │                    │     4C     │
    06H      │     45     │                    │     45     │
    07H      │     20     │                    │     20     │
    08H      │     20     │                    │     20     │
             ├────────────┼────────────────────┼────────────┤
    09H      │     44     │                    │     44     │
    0AH      │     41     │ Extension          │     41     │
    0BH      │     54     │                    │     54     │
             ├────────────┼────────────────────┼────────────┤
    0CH      │     00     │                    │     00     │
    0DH      │     00     │ Current block      │     00     │
             ├────────────┼────────────────────┼────────────┤
    0EH      │     00     │                    │     80     │
    0FH      │     00     │ Record size        │     00     │
             ├────────────┼────────────────────┼────────────┤
    10H      │     00     │                    │     80     │
    11H      │     00     │                    │     3D     │
    12H      │     00     │ File size          │     00     │
    13H      │     00     │                    │     00     │
             ├────────────┼────────────────────┼────────────┤
    14H      │     00     │                    │     43     │
    15H      │     00     │ File date          │     0B     │
             ├────────────┼────────────────────┼────────────┤
    16H      │     00     │                    │     A1     │
    17H      │     00     │ File time          │     52     │
             ├────────────┼────────────────────┼────────────┤
    18H      │     00     │                    │     03     │
    19H      │     00     │                    │     02     │
    1AH      │     00     │                    │     42     │
    1BH      │     00     │                    │     73     │
    1CH      │     00     │ Reserved           │     00     │
    1DH      │     00     │                    │     01     │
    1EH      │     00     │                    │     35     │
    1FH      │     00     │                    │     0F     │
             ├────────────┼────────────────────┼────────────┤
    20H      │     00     │ Current record     │     00     │
             ├────────────┼────────────────────┼────────────┤
    21H      │     00     │                    │     00     │
    22H      │     00     │ Relative-record    │     00     │
    23H      │     00     │ number             │     00     │
    24H      │     00     │                    │     00     │
             └────────────┴────────────────────┴────────────┘

Figure 8-5. A typical file control block before and after a successful
open call (Int 21H Function 0FH).

Points to Remember

Here is a summary of the pros and cons of using the FCB-related file and
record functions in your programs.

Advantages:

■  Under MS-DOS versions 1 and 2, the number of files that can be open
   concurrently when using FCBs is unlimited.
   (This is not true under MS-DOS versions 3.0 and later, especially if
   networking software is running.)

■  File-access methods using FCBs are familiar to programmers with a
   CP/M background, and well-behaved CP/M applications require little
   change in logical flow to run under MS-DOS.

■  MS-DOS supplies the size, time, and date for a file to its FCB after
   the file is opened. The calling program can inspect this information.

Disadvantages:

■  FCBs take up room in the application program's memory space.

■  FCBs offer no support for the hierarchical file structure (no access
   to files outside the current directory).

■  FCBs provide no support for file locking/sharing or record locking in
   networking environments.

■  In addition to the read or write call itself, file reads or writes
   using FCBs require manipulation of the FCB to set record size and
   record number, plus a previous call to a separate MS-DOS function to
   set the DTA address.

■  Random record I/O using FCBs for a file containing variable-length
   records is very clumsy and inconvenient.

■  You must use extended FCBs, which are incompatible with CP/M anyway,
   to access or create files with special attributes such as hidden,
   read-only, or system.

■  The FCB file functions have poor error reporting. This situation has
   been improved somewhat in MS-DOS version 3 because a program can call
   the added Int 21H Function 59H (Get Extended Error Information) after
   a failed FCB function to obtain additional information.

■  Microsoft discourages use of FCBs. FCBs will make your program more
   difficult to port to MS OS/2 later because MS OS/2 does not support
   FCBs in protected mode at all.

Using the Handle Functions

The handle file- and record-management functions access files in a
fashion similar to that used under the UNIX/XENIX operating system. Files
are designated by an ASCIIZ string (an ASCII character string terminated
by a null, or zero, byte) that can contain a drive designator, path,
filename, and extension.
For example, the file specification

    C:\SYSTEM\COMMAND.COM

would appear in memory as the following sequence of bytes:

    43 3A 5C 53 59 53 54 45 4D 5C 43 4F 4D 4D 41 4E 44 2E 43 4F 4D 00

When a program wishes to open or create a file, it passes the address of
the ASCIIZ string specifying the file to MS-DOS in the DS:DX registers
(Figure 8-6). If the operation is successful, MS-DOS returns a 16-bit
handle to the program in the AX register. The program must save this
handle for further reference.

──────────────────────────────────────────────────────────────────────────
        mov     ah,3dh          ; function 3dh = open
        mov     al,2            ; mode 2 = read/write
        mov     dx,seg filename ; address of ASCIIZ
        mov     ds,dx           ; file specification
        mov     dx,offset filename
        int     21h             ; request open from DOS
        jc      error           ; jump if open failed
        mov     handle,ax       ; save file handle
        .
        .
        .
filename db     'C:\MYDIR\MYFILE.DAT',0 ; filename
handle  dw      0               ; file handle
──────────────────────────────────────────────────────────────────────────

Figure 8-6. A typical handle file operation. This sequence of code
attempts to open the file designated in the ASCIIZ string whose address
is passed to MS-DOS in the DS:DX registers.

When the program requests subsequent operations on the file, it usually
places the handle in the BX register before the call to MS-DOS. All the
handle functions return with the CPU's carry flag cleared if the
operation was successful, or set if the operation failed; in the latter
case, the AX register contains a code describing the failure.

MS-DOS restricts the number of handles that can be active at any one time
(that is, the number of files and devices that can be open concurrently
when using the handle family of functions) in two different ways:

■  The maximum number of concurrently open files in the system, for all
   active processes combined, is specified by the entry

       FILES=nn

   in the CONFIG.SYS file.
   This entry determines the number of entries to be allocated in the
   system file table; under MS-DOS version 3, the default value is 8 and
   the maximum is 255. After MS-DOS is booted and running, you cannot
   expand this table to increase the total number of files that can be
   open. You must use an editor to modify the CONFIG.SYS file and then
   restart the system.

■  The maximum number of concurrently open files for a single process is
   20, assuming that sufficient entries are also available in the system
   file table. When a program is loaded, MS-DOS preassigns 5 of its
   potential 20 handles to the standard devices. Each time the process
   issues an open or create call, MS-DOS assigns a handle from the
   process's private allocation of 20, until all the handles are used up
   or the system file table is full. In MS-DOS versions 3.3 and later,
   you can expand the per-process limit of 20 handles with a call to
   Int 21H Function 67H (Set Handle Count).

The handle file- and record-management calls may be gathered into the
following broad classifications for study:

Function    Action
──────────────────────────────────────────────────────────────────────────
Common handle file operations
3CH         Create file (requires ASCIIZ string).
3DH         Open file (requires ASCIIZ string).
3EH         Close file.

Common handle record operations
42H         Set file pointer (also used to find file size).
3FH         Read file.
40H         Write file.

Less commonly used handle operations
41H         Delete file.
43H         Get or modify file attributes.
44H         IOCTL (I/O Control).
45H         Duplicate handle.
46H         Redirect handle.
56H         Rename file.
57H         Get or set file date and time.
5AH         Create temporary file (versions 3.0 and later).
5BH         Create file (fails if file already exists; versions 3.0 and
            later).
5CH         Lock or unlock file region (versions 3.0 and later).
67H         Set handle count (versions 3.3 and later).
68H         Commit file (versions 3.3 and later).
6CH         Extended open file (version 4).
──────────────────────────────────────────────────────────────────────────

Compare the groups of handle-type functions in the preceding table with
the groups of FCB functions outlined earlier, noting the degree of
functional overlap. Section 2 of this book, "MS-DOS Functions Reference,"
gives detailed specifications for each of the handle functions, along
with assembly-language examples.

Handle File-Access Skeleton

The following is a typical program sequence to access a file using the
handle family of functions (Figure 8-7):

1.  Get the filename from the user by means of the buffered input
    service (Int 21H Function 0AH) or from the command tail supplied by
    MS-DOS in the PSP.

2.  Put a zero at the end of the file specification in order to create
    an ASCIIZ string.

3.  Open the file using Int 21H Function 3DH and mode 2 (read/write
    access), or create the file using Int 21H Function 3CH. (Be sure to
    set the CX register to zero, so that you don't accidentally make a
    file with special attributes.) Save the handle that is returned.

4.  Set the file pointer using Int 21H Function 42H. You may set the
    file-pointer position relative to one of three different locations:
    the start of the file, the current pointer position, or the end of
    the file. If you are performing sequential record I/O, you can
    usually skip this step because MS-DOS will maintain the file pointer
    for you automatically.

5.  Read from the file (Int 21H Function 3FH) or write to the file
    (Int 21H Function 40H). Both of these functions require that the BX
    register contain the file's handle, the CX register contain the
    length of the record, and the DS:DX registers point to the data
    being transferred. Both return the actual number of bytes
    transferred in the AX register.

    In a read operation, if the number of bytes read is less than the
    number requested, the end of the file has been reached. In a write
    operation, if the number of bytes written is less than the number
    requested, the disk containing the file is full.
    Neither of these conditions is returned as an error code; that is,
    the carry flag is not set.

6.  If the program is not finished processing the file, go to step 4;
    otherwise, close the file (Int 21H Function 3EH). Any normal exit
    from the program will also close all active handles.

──────────────────────────────────────────────────────────────────────────
recsize equ     1024            ; file record size
        .
        .
        .
        mov     ah,3dh          ; open input file
        mov     al,0            ; mode = read only
        mov     dx,offset fname1 ; name of input file
        int     21h
        jc      no_file         ; jump if no file
        mov     handle1,ax      ; save token for file
        .
        .
        .
        mov     ah,3ch          ; create output file
        mov     cx,0            ; attribute = normal
        mov     dx,offset fname2 ; name of output file
        int     21h
        jc      disk_full       ; jump if create fails
        mov     handle2,ax      ; save token for file
        .
next:   .                       ; process next record
        .
        mov     ah,3fh          ; sequential read from
        mov     bx,handle1      ; input file
        mov     cx,recsize
        mov     dx,offset buffer
        int     21h
        jc      bad_read        ; jump if read error
        or      ax,ax           ; check bytes transferred
        jz      file_end        ; jump if end of file
        .
        .
        .
        mov     ah,40h          ; sequential write to
        mov     bx,handle2      ; output file
        mov     cx,recsize
        mov     dx,offset buffer
        int     21h
        jc      bad_write       ; jump if write error
        cmp     ax,recsize      ; whole record written?
        jne     disk_full       ; jump if disk is full
        .
        .
        .
        jmp     next            ; process next record
        .
file_end:
        .                       ; reached end of input
        .
        mov     ah,3eh          ; close input file
        mov     bx,handle1
        int     21h
        .
        .
        .
        mov     ah,3eh          ; close output file
        mov     bx,handle2
        int     21h
        .
        .
        .
        mov     ax,4c00h        ; exit with return
        int     21h             ; code of zero
        .
        .
        .
fname1  db      'OLDFILE.DAT',0 ; name of input file
fname2  db      'NEWFILE.DAT',0 ; name of output file
handle1 dw      0               ; token for input file
handle2 dw      0               ; token for output file
buffer  db      recsize dup (?) ; buffer for file I/O
──────────────────────────────────────────────────────────────────────────

Figure 8-7. Skeleton of an assembly-language program that performs
sequential processing on an input file and writes the results to an
output file using the handle file and record functions.
This code assumes that the DS and ES registers have already been set to
point to the segment containing the buffers and filenames.

Points to Remember

Here is a summary of the pros and cons of using the handle file and
record operations in your program. Compare this list with the one given
earlier in the chapter for the FCB family of functions.

Advantages:

■  The handle calls provide direct support for I/O redirection and pipes
   with the standard input and output devices in a manner functionally
   similar to that used by UNIX/XENIX.

■  The handle functions provide direct support for directories (the
   hierarchical file structure) and special file attributes.

■  The handle calls support file sharing/locking and record locking in
   networking environments.

■  Using the handle functions, the programmer can open channels to
   character devices and treat them as files.

■  The handle calls make the use of random record access extremely easy.
   The current file pointer can be moved to any byte offset relative to
   the start of the file, the end of the file, or the current pointer
   position. Records of any length, up to an entire segment (65,535
   bytes), can be read to any memory address in one operation.

■  The handle functions have relatively good error reporting in MS-DOS
   version 2, and error reporting has been enhanced even further in
   MS-DOS versions 3.0 and later.

■  Microsoft strongly encourages use of the handle family of functions
   in order to provide upward compatibility with MS OS/2.

Disadvantages:

■  There is a limit per program of 20 concurrently open files and
   devices using handles in MS-DOS versions 2.0 through 3.2.

■  Minor gaps still exist in the implementation of the handle functions.
   For example, you must still use extended FCBs to change volume labels
   and to access the contents of the special files that implement
   directories.
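The random-access advantage listed above comes down to simple arithmetic: the byte offset of a fixed-length record is its record number multiplied by the record length, and one seek followed by one read fetches it. The C sketch below illustrates the idea with the standard library calls fseek, ftell, and fread rather than with Int 21H Functions 42H and 3FH directly; on an MS-DOS compiler the runtime library would presumably map these calls onto the handle functions, but that mapping is an assumption here. RECSIZE, read_record, and file_size are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

#define RECSIZE 16      /* illustrative record length; an assumption */

/* Read zero-based record 'recno' into 'buf' (at least RECSIZE bytes).
   Returns the number of bytes read; a short count signals end of file,
   just as a short count in AX does for Int 21H Function 3FH. */
static size_t read_record(FILE *fp, long recno, unsigned char *buf)
{
    assert(recno >= 0);
    if (fseek(fp, recno * (long)RECSIZE, SEEK_SET) != 0)
        return 0;                       /* seek failed */
    return fread(buf, 1, RECSIZE, fp);
}

/* Find the file size by seeking to the end of the file, the same trick
   the function table notes for Function 42H. Returns -1 on failure.
   The caller's file position is left at the end of the file. */
static long file_size(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    return ftell(fp);
}
```

Because the offset is computed rather than stored in a control block, the same routine works for any record length; only RECSIZE changes.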
MS-DOS Error Codes

When one of the handle file functions fails with the carry flag set, or
when a program calls Int 21H Function 59H (Get Extended Error
Information) following a failed FCB function or other system service, one
of the following error codes may be returned:

Value            Meaning
──────────────────────────────────────────────────────────────────────────
MS-DOS version 2 error codes
01H              Function number invalid
02H              File not found
03H              Path not found
04H              Too many open files
05H              Access denied
06H              Handle invalid
07H              Memory control blocks destroyed
08H              Insufficient memory
09H              Memory block address invalid
0AH (10)         Environment invalid
0BH (11)         Format invalid
0CH (12)         Access code invalid
0DH (13)         Data invalid
0EH (14)         Unknown unit
0FH (15)         Disk drive invalid
10H (16)         Attempted to remove current directory
11H (17)         Not same device
12H (18)         No more files

Mappings to critical-error codes
13H (19)         Write-protected disk
14H (20)         Unknown unit
15H (21)         Drive not ready
16H (22)         Unknown command
17H (23)         Data error (CRC)
18H (24)         Bad request-structure length
19H (25)         Seek error
1AH (26)         Unknown media type
1BH (27)         Sector not found
1CH (28)         Printer out of paper
1DH (29)         Write fault
1EH (30)         Read fault
1FH (31)         General failure

MS-DOS version 3 and later extended error codes
20H (32)         Sharing violation
21H (33)         File-lock violation
22H (34)         Disk change invalid
23H (35)         FCB unavailable
24H (36)         Sharing buffer exceeded
25H-31H (37-49)  Reserved
32H (50)         Unsupported network request
33H (51)         Remote machine not listening
34H (52)         Duplicate name on network
35H (53)         Network name not found
36H (54)         Network busy
37H (55)         Device no longer exists on network
38H (56)         NetBIOS command limit exceeded
39H (57)         Error in network adapter hardware
3AH (58)         Incorrect response from network
3BH (59)         Unexpected network error
3CH (60)         Remote adapter incompatible
3DH (61)         Print queue full
3EH (62)         Not enough room for print file
3FH (63)         Print file was deleted
40H (64)         Network name deleted
41H (65)         Network access denied
42H (66)         Incorrect network device type
43H (67)         Network name not found
44H (68)         Network name limit exceeded
45H (69)         NetBIOS session limit exceeded
46H (70)         Temporary pause
47H (71)         Network request not accepted
48H (72)         Print or disk redirection paused
49H-4FH (73-79)  Reserved
50H (80)         File already exists
51H (81)         Reserved
52H (82)         Cannot make directory
53H (83)         Fail on Int 24H (critical error)
54H (84)         Too many redirections
55H (85)         Duplicate redirection
56H (86)         Invalid password
57H (87)         Invalid parameter
58H (88)         Net write fault
──────────────────────────────────────────────────────────────────────────

Under MS-DOS versions 3.0 and later, you can also use Int 21H Function
59H to obtain other information about the error, such as the error locus
and the recommended recovery action.

Critical-Error Handlers

In Chapter 5, we discussed how an application program can take over the
Ctrl-C handler vector (Int 23H) and replace the MS-DOS default handler,
to avoid losing control of the computer when the user enters a Ctrl-C or
Ctrl-Break at the keyboard. Similarly, MS-DOS provides a
critical-error-handler vector (Int 24H) that defines the routine to be
called when unrecoverable hardware faults occur. The default MS-DOS
critical-error handler is the routine that displays a message describing
the error type and the cue

    Abort, Retry, Ignore?

This message appears after such actions as the following:

■  Attempting to open a file on a disk drive that doesn't contain a
   floppy disk or whose door isn't closed

■  Trying to read a disk sector that contains a CRC error

■  Trying to print when the printer is off line

The unpleasant thing about MS-DOS's default critical-error handler is, of
course, that if the user enters an A for Abort, the application that is
currently executing is terminated abruptly and never has a chance to
clean up and make a graceful exit.
Intermediate files may be left on the disk, files that have been extended
using FCBs are not properly closed (so the directory is never updated),
interrupt vectors may be left pointing into the transient program area,
and so forth.

To write a truly bombproof MS-DOS application, you must take over the
critical-error-handler vector and point it to your own routine, so that
your program intercepts all catastrophic hardware errors and handles them
appropriately. You can use MS-DOS Int 21H Function 25H to alter the
Int 24H vector in a well-behaved manner. When your application exits,
MS-DOS will automatically restore the previous contents of the Int 24H
vector from information saved in the program segment prefix.

MS-DOS calls the critical-error handler for two general classes of
errors, disk-related and non-disk-related, and passes different
information to the handler in the registers for each of these classes.

For disk-related errors, MS-DOS sets the registers as shown in the
following table. (Bits 3-5 of the AH register are relevant only in MS-DOS
versions 3.1 and later.)

Register    Bit(s)    Significance
──────────────────────────────────────────────────────────────────────────
AH          7         0, to signify disk error
            6         Reserved
            5         0 = ignore response not allowed
                      1 = ignore response allowed
            4         0 = retry response not allowed
                      1 = retry response allowed
            3         0 = fail response not allowed
                      1 = fail response allowed
            1-2       Area where disk error occurred
                      00 = MS-DOS area
                      01 = file allocation table
                      10 = root directory
                      11 = files area
            0         0 = read operation
                      1 = write operation
AL          0-7       Drive code (0 = A, 1 = B, and so forth)
DI          0-7       Driver error code
            8-15      Not used
BP:SI                 Segment:offset of device-driver header
──────────────────────────────────────────────────────────────────────────

For non-disk-related errors, the interrupt was generated either as the
result of a character-device error or because a corrupted memory image of
the file allocation table was detected.
In this case, MS-DOS sets the registers as follows:

Register    Bit(s)    Significance
──────────────────────────────────────────────────────────────────────────
AH          7         1, to signify a non-disk error
DI          0-7       Driver error code
            8-15      Not used
BP:SI                 Segment:offset of device-driver header
──────────────────────────────────────────────────────────────────────────

To determine whether the critical error was caused by a character device,
use the address in the BP:SI registers to examine the device attribute
word at offset 0004H in the presumed device-driver header. If bit 15 is
set, then the error was indeed caused by a character device, and the
program can inspect the name field of the driver's header to determine
the device.

At entry to a critical-error handler, MS-DOS has already disabled
interrupts and set up the stack as shown in Figure 8-8. A critical-error
handler cannot use any MS-DOS services except Int 21H Functions 01H
through 0CH (Traditional Character I/O), Int 21H Function 30H (Get MS-DOS
Version), and Int 21H Function 59H (Get Extended Error Information).
These functions use a special stack so that the context of the original
function (which generated the critical error) will not be lost.

        ┌───────┐─┐
        │ Flags │ │
        ├───────┤ │  Flags and CS:IP pushed
        │  CS   │ ├─ on stack by original
        ├───────┤ │  Int 21H call
        │  IP   │ │
        ├───────┤═╡─ SS:SP on entry to
        │  ES   │ │  Int 21H handler
        ├───────┤ │
        │  DS   │ │
        ├───────┤ │
        │  BP   │ │
        ├───────┤ │
        │  DI   │ │
        ├───────┤ ├─ Registers at point of
        │  SI   │ │  original Int 21H call
        ├───────┤ │
        │  DX   │ │
        ├───────┤ │
        │  CX   │ │
        ├───────┤ │
        │  BX   │ │
        ├───────┤ │
        │  AX   │ │
        ├───────┤═╡
        │ Flags │ │
        ├───────┤ │
        │  CS   │ ├─ Return address for
        ├───────┤ │  Int 24H handler
        │  IP   │ │
        └───────┘─┘
            └───── SS:SP on entry to Int 24H handler

Figure 8-8. The stack at entry to a critical-error handler.
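The AH bit assignments tabulated earlier for disk-related errors, and the bit-15 test on the device attribute word, are easy to get wrong by one position. The following C sketch shows one way the fields might be decoded once the register values have been captured; it is only an illustration of the bit layout, not code that could run inside an Int 24H handler as it stands. All of the names (ce_area, ce_info, decode_ah, is_char_device) are hypothetical.

```c
#include <assert.h>

/* Area of the disk where the error occurred (AH bits 1-2). */
enum ce_area { CE_DOS_AREA = 0, CE_FAT = 1, CE_ROOT_DIR = 2, CE_FILES_AREA = 3 };

/* Decoded view of AH at entry to a critical-error handler. Every field
   except is_disk_error is meaningful only when is_disk_error is nonzero;
   the three "allowed" flags apply only to MS-DOS 3.1 and later. */
struct ce_info {
    int is_disk_error;      /* bit 7 clear */
    int ignore_allowed;     /* bit 5 */
    int retry_allowed;      /* bit 4 */
    int fail_allowed;       /* bit 3 */
    enum ce_area area;      /* bits 1-2 */
    int is_write;           /* bit 0: 0 = read, 1 = write */
};

static struct ce_info decode_ah(unsigned ah)
{
    struct ce_info ci;
    assert(ah <= 0xFFu);                    /* AH is an 8-bit register */
    ci.is_disk_error  = (ah & 0x80u) == 0;
    ci.ignore_allowed = (ah >> 5) & 1;
    ci.retry_allowed  = (ah >> 4) & 1;
    ci.fail_allowed   = (ah >> 3) & 1;
    ci.area           = (enum ce_area)((ah >> 1) & 3);
    ci.is_write       = ah & 1;
    return ci;
}

/* Test bit 15 of the device attribute word found at offset 0004H of the
   device-driver header addressed by BP:SI. */
static int is_char_device(unsigned attr)
{
    return (attr & 0x8000u) != 0;
}
```

For example, an AH value of 1AH describes a read error in the file allocation table for which Retry and Fail are permitted but Ignore is not.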
The critical-error handler should return to MS-DOS by executing an IRET,
passing one of the following action codes in the AL register:

Code    Meaning
──────────────────────────────────────────────────────────────────────────
0       Ignore the error (MS-DOS acts as though the original function
        call had succeeded).
1       Retry the operation.
2       Terminate the process that encountered the error.
3       Fail the function (an error code is returned to the requesting
        process). Versions 3.1 and later only.
──────────────────────────────────────────────────────────────────────────

The critical-error handler should preserve all other registers and must
not modify the device-driver header pointed to by BP:SI. A skeleton
example of a critical-error handler is shown in Figure 8-9.

──────────────────────────────────────────────────────────────────────────
                                ; prompt message used by
                                ; critical-error handler
prompt  db      cr,lf,'Critical Error Occurred: '
        db      'Abort, Retry, Ignore, Fail? $'

keys    db      'aArRiIfF'      ; possible user response keys
keys_len equ    $-keys          ; (both cases of each allowed)

codes   db      2,2,1,1,0,0,3,3 ; codes returned to MS-DOS kernel
                                ; for corresponding response keys

;
; This code is executed during program's initialization
; to install the new critical-error handler.
;
        .
        .
        .
        push    ds              ; save our data segment
        mov     dx,seg int24    ; DS:DX = handler address
        mov     ds,dx
        mov     dx,offset int24
        mov     ax,2524h        ; function 25h = set vector
        int     21h             ; transfer to MS-DOS
        pop     ds              ; restore data segment
        .
        .
        .

;
; This is the replacement critical-error handler. It
; prompts the user for Abort, Retry, Ignore, or Fail, and
; returns the appropriate code to the MS-DOS kernel.
;
int24   proc    far             ; entered from MS-DOS kernel

        push    bx              ; save registers
        push    cx
        push    dx
        push    si
        push    di
        push    bp
        push    ds
        push    es

int24a: mov     ax,seg prompt   ; display prompt for user
        mov     ds,ax           ; using function 9 (print string
        mov     es,ax           ; terminated by $ character)
        mov     dx,offset prompt
        mov     ah,9
        int     21h

        mov     ah,1            ; get user's response
        int     21h             ; function 1 = read one character

        mov     di,offset keys  ; look up code for response key
        mov     cx,keys_len
        cld
        repne scasb
        jnz     int24a          ; prompt again if bad response

                                ; set AL = action code for MS-DOS
                                ; according to key that was entered:
                                ; 0 = ignore, 1 = retry, 2 = abort,
                                ; 3 = fail
        mov     al,[di+keys_len-1]

        pop     es              ; restore registers
        pop     ds
        pop     bp
        pop     di
        pop     si
        pop     dx
        pop     cx
        pop     bx
        iret                    ; exit critical-error handler

int24   endp
──────────────────────────────────────────────────────────────────────────

Figure 8-9. A skeleton example of a replacement critical-error handler.

Example Programs: DUMP.ASM and DUMP.C

The programs DUMP.ASM (Figure 8-10) and DUMP.C (Figure 8-11) are parallel
examples of the use of the handle file and record functions. The
assembly-language version, in particular, illustrates features of a
well-behaved MS-DOS utility:

■  The program checks the version of MS-DOS to ensure that all the
   functions it is going to use are really available.

■  The program parses the drive, path, and filename from the command
   tail in the program segment prefix.

■  The program uses buffered I/O for speed.

■  The program sends error messages to the standard error device.

■  The program sends normal program output to the standard output
   device, so that the dump output appears by default on the system
   console but can be redirected to other character devices (such as the
   line printer) or to a file.

The same features are incorporated into the C version of the program, but
some of them are taken care of behind the scenes by the C runtime
library.
──────────────────────────────────────────────────────────────────────────
        name    dump
        page    55,132
        title   DUMP--display file contents

;
; DUMP--Display contents of file in hex and ASCII
;
; Build:  C>MASM DUMP;
;         C>LINK DUMP;
;
; Usage:  C>DUMP unit:\path\filename.exe [ >device ]
;
; Copyright (C) 1988 Ray Duncan
;

cr      equ     0dh             ; ASCII carriage return
lf      equ     0ah             ; ASCII line feed
tab     equ     09h             ; ASCII tab code
blank   equ     20h             ; ASCII space code

cmd     equ     80h             ; buffer for command tail

blksize equ     16              ; input file record size

stdin   equ     0               ; standard input handle
stdout  equ     1               ; standard output handle
stderr  equ     2               ; standard error handle

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT,ds:_DATA,es:_DATA,ss:STACK

dump    proc    far             ; entry point from MS-DOS

        push    ds              ; save DS:0000 for final
        xor     ax,ax           ; return to MS-DOS, in case
        push    ax              ; function 4ch can't be used

        mov     ax,_DATA        ; make our data segment
        mov     ds,ax           ; addressable via DS register

                                ; check MS-DOS version
        mov     ax,3000h        ; function 30h = get version
        int     21h             ; transfer to MS-DOS
        cmp     al,2            ; major version 2 or later?
        jae     dump1           ; yes, proceed

                                ; if MS-DOS 1.x, display
                                ; error message and exit
        mov     dx,offset msg3  ; DS:DX = message address
        mov     ah,9            ; function 9 = print string
        int     21h             ; transfer to MS-DOS
        ret                     ; then exit the old way

dump1:                          ; check if filename present
        mov     bx,offset cmd   ; ES:BX = command tail
        call    argc            ; count command arguments
        cmp     ax,2            ; are there 2 arguments?
        je      dump2           ; yes, proceed

                                ; missing filename, display
                                ; error message and exit
        mov     dx,offset msg2  ; DS:DX = message address
        mov     cx,msg2_len     ; CX = message length
        jmp     dump9           ; go display it

dump2:                          ; get address of filename
        mov     ax,1            ; AX = argument number
                                ; ES:BX still = command tail
        call    argv            ; returns ES:BX = address,
                                ; and AX = length

        mov     di,offset fname ; copy filename to buffer
        mov     cx,ax           ; CX = length

dump3:  mov     al,es:[bx]      ; copy one byte
        mov     [di],al
        inc     bx              ; bump string pointers
        inc     di
        loop    dump3           ; loop until string done
        mov     byte ptr [di],0 ; add terminal null byte

        mov     ax,ds           ; make our data segment
        mov     es,ax           ; addressable by ES too

                                ; now open the file
        mov     ax,3d00h        ; function 3dh = open file
                                ; mode 0 = read only
        mov     dx,offset fname ; DS:DX = filename
        int     21h             ; transfer to MS-DOS
        jnc     dump4           ; jump, open successful

                                ; open failed, display
                                ; error message and exit
        mov     dx,offset msg1  ; DS:DX = message address
        mov     cx,msg1_len     ; CX = message length
        jmp     dump9           ; go display it

dump4:  mov     fhandle,ax      ; save file handle

dump5:                          ; read block of file data
        mov     bx,fhandle      ; BX = file handle
        mov     cx,blksize      ; CX = record length
        mov     dx,offset fbuff ; DS:DX = buffer
        mov     ah,3fh          ; function 3fh = read
        int     21h             ; transfer to MS-DOS

        mov     flen,ax         ; save actual length
        cmp     ax,0            ; end of file reached?
        jne     dump6           ; no, proceed

        cmp     word ptr fptr,0 ; was this the first read?
        jne     dump8           ; no, exit normally

                                ; display empty file
                                ; message and exit
        mov     dx,offset msg4  ; DS:DX = message address
        mov     cx,msg4_len     ; CX = length
        jmp     dump9           ; go display it

dump6:                          ; display heading at
                                ; each 128-byte boundary
        test    fptr,07fh       ; time for a heading?
        jnz     dump7           ; no, proceed

                                ; display a heading
        mov     dx,offset hdg   ; DS:DX = heading address
        mov     cx,hdg_len      ; CX = heading length
        mov     bx,stdout       ; BX = standard output
        mov     ah,40h          ; function 40h = write
        int     21h             ; transfer to MS-DOS

dump7:  call    conv            ; convert binary record
                                ; to formatted ASCII

                                ; display formatted output
        mov     dx,offset fout  ; DS:DX = output address
        mov     cx,fout_len     ; CX = output length
        mov     bx,stdout       ; BX = standard output
        mov     ah,40h          ; function 40h = write
        int     21h             ; transfer to MS-DOS

        jmp     dump5           ; go get another record

dump8:                          ; close input file
        mov     bx,fhandle      ; BX = file handle
        mov     ah,3eh          ; function 3eh = close
        int     21h             ; transfer to MS-DOS

        mov     ax,4c00h        ; function 4ch = terminate,
                                ; return code = 0
        int     21h             ; transfer to MS-DOS

dump9:                          ; display message on
                                ; standard error device
                                ; DS:DX = message address
                                ; CX = message length
        mov     bx,stderr       ; standard error handle
        mov     ah,40h          ; function 40h = write
        int     21h             ; transfer to MS-DOS

        mov     ax,4c01h        ; function 4ch = terminate,
                                ; return code = 1
        int     21h             ; transfer to MS-DOS

dump    endp


conv    proc    near            ; convert block of data
                                ; from input file

        mov     di,offset fout  ; clear output format
        mov     cx,fout_len-2   ; area to blanks
        mov     al,blank
        rep stosb

        mov     di,offset fout  ; convert file offset
        mov     ax,fptr         ; to ASCII for output
        call    w2a

        mov     bx,0            ; init buffer pointer

conv1:  mov     al,[fbuff+bx]   ; fetch byte from buffer
        mov     di,offset foutb ; point to output area

                                ; format ASCII part...
                                ; store '.' as default
        mov     byte ptr [di+bx],'.'

        cmp     al,blank        ; in range 20h-7eh?
        jb      conv2           ; jump, not alphanumeric

        cmp     al,7eh          ; in range 20h-7eh?
        ja      conv2           ; jump, not alphanumeric

        mov     [di+bx],al      ; store ASCII character

conv2:                          ; format hex part...
        mov     di,offset fouta ; point to output area
        add     di,bx           ; base addr + (offset*3)
        add     di,bx
        add     di,bx
        call    b2a             ; convert byte to hex

        inc     bx              ; advance through record
        cmp     bx,flen         ; entire record converted?
        jne     conv1           ; no, get another byte

                                ; update file pointer
        add     word ptr fptr,blksize

        ret

conv    endp


w2a     proc    near            ; convert word to hex ASCII
                                ; call with AX = value
                                ;           DI = addr for string
                                ; returns AX, DI, CX destroyed

        push    ax              ; save copy of value
        mov     al,ah
        call    b2a             ; convert upper byte

        pop     ax              ; get back copy
        call    b2a             ; convert lower byte
        ret

w2a     endp


b2a     proc    near            ; convert byte to hex ASCII
                                ; call with AL = binary value
                                ;           DI = addr for string
                                ; returns AX, DI, CX modified

        sub     ah,ah           ; clear upper byte
        mov     cl,16
        div     cl              ; divide byte by 16
        call    ascii           ; quotient becomes the first
        stosb                   ; ASCII character
        mov     al,ah
        call    ascii           ; remainder becomes the
        stosb                   ; second ASCII character
        ret

b2a     endp


ascii   proc    near            ; convert value 0-0fh in AL
                                ; into "hex ASCII" character

        add     al,'0'          ; offset to range 0-9
        cmp     al,'9'          ; is it > 9?
        jle     ascii2          ; no, jump
        add     al,'A'-'9'-1    ; offset to range A-F,

ascii2: ret                     ; return AL = ASCII char

ascii   endp


argc    proc    near            ; count command-line arguments
                                ; call with ES:BX = command line
                                ; returns AX = argument count

        push    bx              ; save original BX and CX
        push    cx              ; for later
        mov     ax,1            ; force count >= 1

argc1:  mov     cx,-1           ; set flag = outside argument

argc2:  inc     bx              ; point to next character
        cmp     byte ptr es:[bx],cr
        je      argc3           ; exit if carriage return
        cmp     byte ptr es:[bx],blank
        je      argc1           ; outside argument if ASCII blank
        cmp     byte ptr es:[bx],tab
        je      argc1           ; outside argument if ASCII tab

                                ; otherwise not blank or tab,
        jcxz    argc2           ; jump if already inside argument

        inc     ax              ; else found argument, count it
        not     cx              ; set flag = inside argument
        jmp     argc2           ; and look at next character

argc3:  pop     cx              ; restore original BX and CX
        pop     bx
        ret                     ; return AX = argument count

argc    endp


argv    proc    near            ; get address & length of
                                ; command line argument
                                ; call with ES:BX = command line
                                ;           AX = argument #
                                ; returns ES:BX = address
                                ;         AX = length

        push    cx              ; save original CX and DI
        push    di

        or      ax,ax           ; is it argument 0?
        jz      argv8           ; yes, jump to get program name

        xor     ah,ah           ; initialize argument counter

argv1:  mov     cx,-1           ; set flag = outside argument

argv2:  inc     bx              ; point to next character
        cmp     byte ptr es:[bx],cr
        je      argv7           ; exit if carriage return
        cmp     byte ptr es:[bx],blank
        je      argv1           ; outside argument if ASCII blank
        cmp     byte ptr es:[bx],tab
        je      argv1           ; outside argument if ASCII tab

                                ; if not blank or tab...
        jcxz    argv2           ; jump if already inside argument

        inc     ah              ; else count arguments found
        cmp     ah,al           ; is this the one we're looking for?
        je      argv4           ; yes, go find its length
        not     cx              ; no, set flag = inside argument
        jmp     argv2           ; and look at next character

argv4:                          ; found desired argument, now
                                ; determine its length...
        mov     ax,bx           ; save param starting address

argv5:  inc     bx              ; point to next character
        cmp     byte ptr es:[bx],cr
        je      argv6           ; found end if carriage return
        cmp     byte ptr es:[bx],blank
        je      argv6           ; found end if ASCII blank
        cmp     byte ptr es:[bx],tab
        jne     argv5           ; not a tab either, keep scanning
                                ; (falls through if ASCII tab)

argv6:  xchg    bx,ax           ; set ES:BX = argument address
        sub     ax,bx           ; and AX = argument length
        jmp     argvx           ; return to caller

argv7:  xor     ax,ax           ; set AX = 0, argument not found
        jmp     argvx           ; return to caller

argv8:                          ; special handling for argv = 0
        mov     ax,3000h        ; check if DOS 3.0 or later
        int     21h             ; (force AL = 0 in case DOS 1)
        cmp     al,3
        jb      argv7           ; DOS 1 or 2, return null param
        mov     es,es:[2ch]     ; get environment segment from PSP
        xor     di,di           ; find the program name by
        xor     al,al           ; first skipping over all the
        mov     cx,-1           ; environment variables...
        cld

argv9:  repne scasb             ; scan for double null (can't use
        scasb                   ; SCASW since might be odd addr)
        jne     argv9           ; loop if it was a single null
        add     di,2            ; skip count word in environment
        mov     bx,di           ; save program name address
        mov     cx,-1           ; now find its length...
        repne scasb             ; scan for another null byte
        not     cx              ; convert CX to length
        dec     cx
        mov     ax,cx           ; return length in AX

argvx:                          ; common exit point
        pop     di              ; restore original CX and DI
        pop     cx
        ret                     ; return to caller

argv    endp

_TEXT   ends

_DATA   segment word public 'DATA'

fname   db      64 dup (0)      ; buffer for input filespec

fhandle dw      0               ; token from PCDOS for input file

flen    dw      0               ; actual length read

fptr    dw      0               ; relative address in file

fbuff   db      blksize dup (?) ; data from input file

fout    db      'nnnn'          ; formatted output area
        db      blank,blank
fouta   db      16 dup ('nn',blank)
        db      blank
foutb   db      16 dup (blank),cr,lf
fout_len equ    $-fout

hdg     db      cr,lf           ; heading for each 128 bytes
        db      7 dup (blank)   ; of formatted output
        db      '0  1  2  3  4  5  6  7  '
        db      '8  9  A  B  C  D  E  F',cr,lf
hdg_len equ     $-hdg

msg1    db      cr,lf
        db      'dump: file not found'
        db      cr,lf
msg1_len equ    $-msg1

msg2    db      cr,lf
        db      'dump: missing file name'
        db      cr,lf
msg2_len equ    $-msg2

msg3    db      cr,lf
        db      'dump: wrong MS-DOS version'
        db      cr,lf,'$'

msg4    db      cr,lf
        db      'dump: empty file'
        db      cr,lf
msg4_len equ    $-msg4

_DATA   ends

STACK   segment para stack 'STACK'

        db      64 dup (?)

STACK   ends

        end     dump

──────────────────────────────────────────────────────────────────────────
Figure 8-10. The assembly-language version: DUMP.ASM.

──────────────────────────────────────────────────────────────────────────
/*
    DUMP.C      Displays the binary contents of a file in
                hex and ASCII on the standard output device.
    Compile:    C>CL DUMP.C

    Usage:      C>DUMP unit:path\filename.ext

    Copyright (C) 1988 Ray Duncan
*/

#include <stdio.h>
#include <io.h>
#include <fcntl.h>

#define REC_SIZE 16             /* input file record size */

main(int argc, char *argv[])
{
    int fd;                     /* input file handle */
    int status = 0;             /* status from file read */
    long fileptr = 0L;          /* current file byte offset */
    char filebuf[REC_SIZE];     /* data from file */

    if(argc != 2)               /* abort if missing filename */
    {
        fprintf(stderr,"\ndump: wrong number of parameters\n");
        exit(1);
    }

                                /* open file in binary mode,
                                   abort if open fails */
    if((fd = open(argv[1],O_RDONLY | O_BINARY) ) == -1)
    {
        fprintf(stderr, "\ndump: can't find file %s \n", argv[1]);
        exit(1);
    }

                                /* read and dump records until
                                   end of file or read error */
    while((status = read(fd,filebuf,REC_SIZE) ) > 0)
    {
        dump_rec(filebuf, fileptr, status);
        fileptr += REC_SIZE;
    }

    close(fd);                  /* close input file */
    exit(0);                    /* return success code */
}

/*
    Display record (16 bytes) in hex and ASCII on standard output
*/

dump_rec(char *filebuf, long fileptr, int length)
{
    int i;                      /* index to current record */

    if(fileptr % 128 == 0)      /* display heading if needed */
        printf("\n\n       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F");

    printf("\n%04lX ",fileptr); /* display file offset */

                                /* display hex equivalent of
                                   each byte from file */
    for(i = 0; i < length; i++)
        printf(" %02X", (unsigned char) filebuf[i]);

    if(length != 16)            /* spaces if partial record */
        for (i=0; i<(16-length); i++)
            printf("   ");

                                /* display ASCII equivalent of
                                   each byte from file */
    printf(" ");
    for(i = 0; i < length; i++)
    {
        if(filebuf[i] < 32 || filebuf[i] > 126)
            putchar('.');
        else
            putchar(filebuf[i]);
    }
}

──────────────────────────────────────────────────────────────────────────
Figure 8-11. The C version: DUMP.C.

The assembly-language version of the DUMP program contains a number of subroutines that you may find useful in your own programming efforts.
These include the following:

Subroutine    Action
──────────────────────────────────────────────────────────────────────────
argc          Returns the number of command-line arguments.
argv          Returns the address and length of a particular
              command-line argument.
w2a           Converts a binary word (16 bits) into hex ASCII for output.
b2a           Converts a binary byte (8 bits) into hex ASCII for output.
ascii         Converts 4 bits into a single hex ASCII character.
──────────────────────────────────────────────────────────────────────────

It is interesting to compare these two equivalent programs. The C program contains only 77 lines, whereas the assembly-language program has 436 lines. Clearly, the C source code is less complex and easier to maintain. On the other hand, if size and efficiency are important, the DUMP.EXE file generated by the C compiler is 8563 bytes, whereas the assembly-language DUMP.EXE file is only 1294 bytes and runs twice as fast as the C program.

────────────────────────────────────────────────────────────────────────────

Chapter 9  Volumes and Directories

Each file in an MS-DOS system is uniquely identified by its name and its location. The location, in turn, has two components: the logical drive that contains the file and the directory on that drive where the filename can be found.

Logical drives are specified by a single letter followed by a colon (for example, A:). The number of logical drives in a system is not necessarily the same as the number of physical drives; for example, it is common for large fixed-disk drives to be divided into two or more logical drives. The key aspect of a logical drive is that it contains a self-sufficient file system; that is, it contains one or more directories, zero or more complete files, and all the information needed to locate the files and directories and to determine which disk space is free and which is already in use.

Directories are simply lists or catalogs.
Each entry in a directory consists of the name, size, starting location, attributes, and last modification date and time of a file or another directory that the disk contains. The detailed information about the location of every block of data assigned to a file or directory is in a separate control area on the disk called the file allocation table (FAT). (See Chapter 10 for a detailed discussion of the internal format of directories and the FAT.)

Every disk potentially has two distinct kinds of directories: the root directory and all other directories. The root directory is always present and has a maximum number of entries, determined when the disk is formatted; this number cannot be changed. The subdirectories of the root directory, which may or may not be present on a given disk, can be nested to any level and can grow to any size (Figure 9-1). This is the hierarchical, or tree, directory structure referred to in earlier chapters. Every directory has a name, except for the root directory, which is designated by a single backslash (\) character.

MS-DOS keeps track of a "current drive" for the system and uses this drive when a file specification does not include an explicit drive code. Similarly, MS-DOS maintains a "current directory" for each logical drive. You can select any particular directory on a drive by naming in order (either from the root directory or relative to the current directory) the directories that lead to its location in the tree structure. Such a list of directories, separated by backslash delimiters, is called a path. When a complete path from the root directory is prefixed by a logical drive code and followed by a filename and extension, the resulting string is a fully qualified filename and unambiguously specifies a file.
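As a small illustration of how the pieces of a fully qualified filename fit together, the following sketch joins a drive letter, a directory path, and a filename. The function name make_qualified_name and its parameters are hypothetical, not part of MS-DOS or of the book's listings, and the sketch assumes the directory path is supplied without a leading or trailing backslash. It uses snprintf from modern standard C rather than anything available in 1988-era compilers.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (an illustration only): build a string of the
 * form "D:\DIR\SUBDIR\NAME.EXT".
 *
 *   drive     drive letter, for example 'C'
 *   dir_path  path from the root WITHOUT a leading or trailing
 *             backslash; an empty string means the root directory
 *   filename  filename and extension
 *
 * Returns the length of the result, or -1 if dest is too small.
 */
int make_qualified_name(char *dest, size_t dest_size, char drive,
                        const char *dir_path, const char *filename)
{
    int n;

    if (dir_path[0] == '\0')    /* file lives in the root directory */
        n = snprintf(dest, dest_size, "%c:\\%s", drive, filename);
    else
        n = snprintf(dest, dest_size, "%c:\\%s\\%s",
                     drive, dir_path, filename);

    if (n < 0 || (size_t)n >= dest_size)
        return -1;              /* output was truncated or failed */
    return n;
}
```

Called with 'C', "SOURCE\\ASM", and "DUMP.ASM", the helper produces C:\SOURCE\ASM\DUMP.ASM; with an empty directory path it produces a name in the root directory, such as A:\README.TXT.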
[Figure 9-1 diagram: a tree with the drive identifier at the top and the root directory (volume label) beneath it. The second level holds File A, File B, File C, and two directories; the third level holds a nested directory and Files D, E, and F.]

Figure 9-1. An MS-DOS file-system structure.

Drive and Directory Control

You can examine, select, create, and delete disk directories interactively with the DIR, CHDIR (CD), MKDIR (MD), and RMDIR (RD) commands. You can select a new current drive by entering the letter of the desired drive, followed by a colon. MS-DOS provides the following Int 21H functions to give application programs similar control over drives and directories:

Function    Action
──────────────────────────────────────────────────────────────────────────
0EH         Select current drive.
19H         Get current drive.
39H         Create directory.
3AH         Remove directory.
3BH         Select current directory.
47H         Get current directory.
──────────────────────────────────────────────────────────────────────────

The two functions that deal with disk drives accept or return a binary drive code: 0 represents drive A, 1 represents drive B, and so on. This differs from most other MS-DOS functions, which use 0 to indicate the current drive, 1 for drive A, and so on.

The first three directory functions in the preceding list require an ASCIIZ string that describes the path to the desired directory. As with the handle-based file open and create functions, the address of the ASCIIZ string is passed in the DS:DX registers. On return, the carry flag is clear if the function succeeded; if it failed, the carry flag is set and the AX register contains an error code.
The directory functions can fail for a variety of reasons, but the most common cause of an error is that some element of the indicated path does not exist.

The last function in the preceding list, Int 21H Function 47H, allows you to obtain an ASCIIZ path for the current directory on the specified or default drive. MS-DOS supplies the path string without the drive identifier or a leading backslash. Int 21H Function 47H is most commonly used with Int 21H Function 19H to build fully qualified filenames. Such filenames are desirable because they remain valid if the user changes the current drive or directory.

Section 2 of this book, "MS-DOS Functions Reference," gives detailed information on the drive and directory control functions.

Searching Directories

When you request an open operation on a file, you are implicitly performing a search of a directory. MS-DOS examines each entry of the directory to find a match for the filename you have given as an argument; if the file is found, MS-DOS copies certain information from the directory into a data structure that it can use to control subsequent read or write operations to the file. Thus, if you wish to test for the existence of a specific file, you need only perform an open operation and observe whether it is successful. (If it is, you should, of course, perform a subsequent close operation to avoid needless expenditure of handles.)

Sometimes you may need to perform more elaborate searches of a disk directory. Perhaps you wish to find all the files with a certain extension, a file with a particular attribute, or the names of the subdirectories of a certain directory.
Although the locations of a disk's directories and the specifics of the entries that are found in them are of necessity hardware dependent (for example, interpretation of the field describing the starting location of a file depends upon the physical disk format), MS-DOS does provide functions that will allow examination of a disk directory in a hardware-independent fashion.

In order to search a disk directory successfully, you must understand two types of MS-DOS search services. The first type is the "search for first" function, which accepts a file specification (possibly including wildcard characters) and looks for the first matching file in the directory of interest. If it finds a match, the function fills a buffer owned by the requesting program with information about the file; if it does not find a match, it returns an error flag.

A program can call the second type of search service, called "search for next," only after a successful "search for first." If the file specification that was originally passed to "search for first" included wildcard characters and at least one matching file was present, the program can call "search for next" as many times as necessary to find all additional matching files. Like "search for first," "search for next" returns information about the matched files in a buffer designated by the requesting program. When it can find no more matching files, "search for next" returns an error flag.

As with nearly every other operation, MS-DOS provides two parallel sets of directory-searching services:

Action              FCB function    Handle function
──────────────────────────────────────────────────────────────────────────
Search for first    11H             4EH
Search for next     12H             4FH
──────────────────────────────────────────────────────────────────────────

The FCB directory functions allow searches to match a filename and extension, both possibly containing wildcard characters, within the current directory for the specified or current drive.
The handle directory functions, on the other hand, allow a program to perform searches within any directory on any drive, regardless of the current directory.

Searches that use normal FCBs find only normal files. Searches that use extended FCBs, or the handle-type functions, can be qualified with file attributes. The attribute bits relevant to searches are as follows:

Bit    Significance
──────────────────────────────────────────────────────────────────────────
0      Read-only file
1      Hidden file
2      System file
3      Volume label
4      Directory
5      Archive needed (set when file modified)
──────────────────────────────────────────────────────────────────────────

The remaining bits of a search function's attribute parameter should be zero. When any of the preceding attribute bits are set, the search function returns all normal files plus any files with the specified attributes, except in the case of the volume-label attribute bit, which receives special treatment as described later in this chapter. Note that by setting bit 4 you can include directories in a search, exactly as though they were files.

Both the FCB and handle directory-searching functions require that the disk transfer area address be set (with Int 21H Function 1AH), before the call to "search for first," to point to a working buffer for use by MS-DOS. The DTA address should not be changed between calls to "search for first" and "search for next." When it finds a matching file, MS-DOS places the information about the file in the buffer and then inspects the buffer on the next "search for next" call, to determine where to resume the search. The format of the data returned in the buffer is different for the FCB and handle functions, so read the detailed descriptions in Section 2 of this book, "MS-DOS Functions Reference," before attempting to interpret the buffer contents.
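The attribute rule described above can be modeled in a few lines of C. The sketch below is only a model of the rule as this chapter states it, not MS-DOS code: the names ATTR_* and search_would_match are hypothetical, and it assumes that the read-only and archive bits never exclude an entry, so that only the hidden, system, directory, and volume-label bits affect the outcome.

```c
#include <stdint.h>

/* Attribute bits from the table above (names are illustrative). */
#define ATTR_READ_ONLY 0x01     /* bit 0 */
#define ATTR_HIDDEN    0x02     /* bit 1 */
#define ATTR_SYSTEM    0x04     /* bit 2 */
#define ATTR_VOLUME    0x08     /* bit 3 */
#define ATTR_DIRECTORY 0x10     /* bit 4 */
#define ATTR_ARCHIVE   0x20     /* bit 5 */

/*
 * Model of the search rule: returns 1 if a directory entry with
 * attribute byte entry_attr would be reported by a search that was
 * started with attribute parameter search_attr, else 0.
 */
int search_would_match(uint8_t search_attr, uint8_t entry_attr)
{
    const uint8_t special = ATTR_HIDDEN | ATTR_SYSTEM | ATTR_DIRECTORY;

    /* Volume-label searches are exclusive: only labels are returned. */
    if (search_attr & ATTR_VOLUME)
        return (entry_attr & ATTR_VOLUME) != 0;

    /* Other searches never report a volume label. */
    if (entry_attr & ATTR_VOLUME)
        return 0;

    /* Normal files always match; an entry with hidden, system, or
       directory bits matches only if the search asked for every one
       of those bits that the entry has set. */
    return ((entry_attr & special) & ~search_attr) == 0;
}
```

Under this model a search attribute of 0 reports normal files only, and a search attribute of 10H reports normal files plus directories, which matches the behavior described for bit 4.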
Figures 9-2 and 9-3 provide equivalent examples of searches for all files in a given directory that have the .ASM extension, one example using the FCB directory functions (Int 21H Functions 11H and 12H) and the other using the handle functions (Int 21H Functions 4EH and 4FH). (Both programs use the handle write function with the standard output handle to display the matched filenames, to avoid introducing tangential differences in the listings.)

──────────────────────────────────────────────────────────────────────────
start:                          ; set DTA address for buffer
                                ; used by search functions
        mov     dx,seg buff     ; DS:DX = buffer address
        mov     ds,dx
        mov     dx,offset buff
        mov     ah,1ah          ; function 1ah = set DTA
        int     21h             ; transfer to MS-DOS

                                ; search for first match...
        mov     dx,offset fcb   ; DS:DX = FCB address
        mov     ah,11h          ; function 11h = search for first
        int     21h             ; transfer to MS-DOS
        or      al,al           ; any matches at all?
        jnz     exit            ; no, quit

disp:                           ; go to a new line...
        mov     dx,offset crlf  ; DS:DX = CR-LF string
        mov     cx,2            ; CX = string length
        mov     bx,1            ; BX = standard output handle
        mov     ah,40h          ; function 40h = write
        int     21h             ; transfer to MS-DOS

                                ; display matching file
        mov     dx,offset buff+1 ; DS:DX = filename
        mov     cx,11           ; CX = length
        mov     bx,1            ; BX = standard output handle
        mov     ah,40h          ; function 40h = write
        int     21h             ; transfer to MS-DOS

                                ; search for next match...
        mov     dx,offset fcb   ; DS:DX = FCB address
        mov     ah,12h          ; function 12h = search for next
        int     21h             ; transfer to MS-DOS
        or      al,al           ; any more matches?
        jz      disp            ; yes, go show filename

exit:                           ; final exit point
        mov     ax,4c00h        ; function 4ch = terminate,
                                ; return code = 0
        int     21h             ; transfer to MS-DOS
        .
        .
        .
crlf    db      0dh,0ah         ; ASCII carriage return-
                                ; linefeed string

fcb     db      0               ; drive = current
        db      8 dup ('?')     ; filename = wildcard
        db      'ASM'           ; extension = ASM
        db      25 dup (0)      ; remainder of FCB = zero

buff    db      64 dup (0)      ; receives search results
──────────────────────────────────────────────────────────────────────────
Figure 9-2.
Example of an FCB-type directory search using Int 21H Functions 11H and 12H. This routine displays the names of all files in the current directory that have the .ASM extension.

──────────────────────────────────────────────────────────────────────────
start:                          ; set DTA address for buffer
                                ; used by search functions
        mov     dx,seg buff     ; DS:DX = buffer address
        mov     ds,dx
        mov     dx,offset buff
        mov     ah,1ah          ; function 1ah = set DTA
        int     21h             ; transfer to MS-DOS

                                ; search for first match...
        mov     dx,offset fname ; DS:DX = wildcard filename
        mov     cx,0            ; CX = normal file attribute
        mov     ah,4eh          ; function 4eh = search for first
        int     21h             ; transfer to MS-DOS
        jc      exit            ; quit if no matches at all

disp:                           ; go to a new line...
        mov     dx,offset crlf  ; DS:DX = CR-LF string
        mov     cx,2            ; CX = string length
        mov     bx,1            ; BX = standard output handle
        mov     ah,40h          ; function 40h = write
        int     21h             ; transfer to MS-DOS

                                ; find length of filename...
        mov     cx,0            ; CX will be char count
                                ; DS:SI = start of name
        mov     si,offset buff+30

disp1:  lodsb                   ; get next character
        or      al,al           ; is it null character?
        jz      disp2           ; yes, found end of string
        inc     cx              ; else count characters
        jmp     disp1           ; and get another

disp2:                          ; display matching file...
                                ; CX already contains length
                                ; DS:DX = filename
        mov     dx,offset buff+30
        mov     bx,1            ; BX = standard output handle
        mov     ah,40h          ; function 40h = write
        int     21h             ; transfer to MS-DOS

                                ; find next matching file...
        mov     ah,4fh          ; function 4fh = search for next
        int     21h             ; transfer to MS-DOS
        jnc     disp            ; jump if another match found

exit:                           ; final exit point
        mov     ax,4c00h        ; function 4ch = terminate,
                                ; return code = 0
        int     21h             ; transfer to MS-DOS
        .
        .
        .
crlf    db      0dh,0ah         ; ASCII carriage return-
                                ; linefeed string

fname   db      '*.ASM',0       ; ASCIIZ filename to
                                ; be matched

buff    db      64 dup (0)      ; receives search results
──────────────────────────────────────────────────────────────────────────
Figure 9-3. Example of a handle-type directory search using Int 21H Functions 4EH and 4FH.
This routine also displays the names of all files in the current directory that have a .ASM extension.

Moving Files

The rename file function that was added in MS-DOS version 2.0, Int 21H Function 56H, has the little-advertised capability to move a file from one directory to another. The function has two ASCIIZ parameters: the "old" and "new" names for the file. If the old and new paths differ, MS-DOS moves the file; if the filename or extension components differ, MS-DOS renames the file. MS-DOS can carry out both of these actions in the same function call.

Of course, the old and new directories must be on the same drive, because the file's actual data is not moved at all; only the information that describes the file is removed from one directory and placed in another directory. Function 56H fails if the two ASCIIZ strings include different logical-drive codes, if the file is read-only, or if a file with the same name and location as the "new" filename already exists.

The FCB-based rename file service, Int 21H Function 17H, works only on the current directory and cannot be used to move files.

Volume Labels

Support for volume labels was first added to MS-DOS in version 2.0. A volume label is an optional name of 1 to 11 characters that the user assigns to a disk during a FORMAT operation. You can display a volume label with the DIR, TREE, CHKDSK, or VOL command. Beginning with MS-DOS version 3.0, you can use the LABEL command to add, display, or alter the label after formatting. In MS-DOS version 4, the FORMAT program also assigns a semi-random 32-bit binary ID to each disk it formats; you can display this value, but you cannot change it.

The distinction between volumes and drives is important. A volume label is associated with a specific storage medium. A drive identifier (such as A) is associated with a physical device that a storage medium can be mounted on.
In the case of fixed-disk drives, the medium associated with a drive identifier does not change (hence the name). In the case of floppy disks or other removable media, the disk accessed with a given drive identifier might have any volume label or none at all.

Hence, volume labels do not take the place of the logical-drive identifier and cannot be used as part of a pathname to identify a file. In fact, in MS-DOS version 2, the system does not use volume labels internally at all. In MS-DOS versions 3.0 and later, a disk driver can use volume labels to detect whether the user has replaced a disk while a file is open; this use is optional, however, and is not implemented in all systems.

MS-DOS volume labels are implemented as a special type of entry in a disk's root directory. The entry contains a time-and-date stamp and has an attribute value of 8 (i.e., bit 3 set). Except for the attribute, a volume label is identical to the directory entry for a file that was created but never had any data written into it, and you can manipulate volume labels with Int 21H functions much as you manipulate files. However, a volume label receives special handling at several levels:

■ When you create a volume label after a disk is formatted, MS-DOS always places it in the root directory, regardless of the current directory.

■ A disk can contain only one volume label; attempts to create additional volume labels (even with different names) will fail.

■ MS-DOS always carries out searches for volume labels in the root directory, regardless of the current directory, and does not also return all normal files.

In MS-DOS version 2, support for volume labels is not completely integrated into the handle file functions, and you must use extended FCBs instead to manipulate volume labels. For example, the code in Figure 9-4 searches for the volume label in the root directory of the current drive.
You can also change volume labels with extended FCBs and the rename file function (Int 21H Function 17H), but you should not attempt to remove an existing volume label with Int 21H Function 13H under MS-DOS version 2, because this operation can damage the disk's FAT in an unpredictable manner.

In MS-DOS versions 3.0 and later, you can create a volume label in the expected manner, using Int 21H Function 3CH and an attribute of 8, and you can use the handle-type "search for first" function (4EH) to obtain an existing volume label for a logical drive (Figure 9-5). However, you still must use extended FCBs to change a volume label.

──────────────────────────────────────────────────────────────────────────
buff    db      64 dup (?)      ; receives search results

xfcb    db      0ffh            ; flag signifying extended FCB
        db      5 dup (0)       ; reserved
        db      8               ; volume attribute byte
        db      0               ; drive code (0 = current)
        db      11 dup ('?')    ; wildcard filename and extension
        db      25 dup (0)      ; remainder of FCB (not used)
        .
        .
        .
                                ; set DTA address for buffer
                                ; used by search functions
        mov     dx,seg buff     ; DS:DX = buffer address
        mov     ds,dx
        mov     dx,offset buff
        mov     ah,1ah          ; function 1ah = set DTA
        int     21h             ; transfer to MS-DOS

                                ; now search for label...
                                ; DS:DX = extended FCB
        mov     dx,offset xfcb
        mov     ah,11h          ; function 11h = search for first
        int     21h             ; transfer to MS-DOS
        cmp     al,0ffh         ; search successful?
        je      no_label        ; jump if no volume label
        .
        .
        .
──────────────────────────────────────────────────────────────────────────
Figure 9-4. A volume-label search under MS-DOS version 2, using an extended file control block. If the search is successful, the volume label is returned in buff, formatted in the filename and extension fields of an extended FCB.

──────────────────────────────────────────────────────────────────────────
buff    db      64 dup (?)      ; receives search results

wildcd  db      '*.*',0         ; wildcard ASCIIZ filename
        .
        .
        .
                                ; set DTA address for buffer
                                ; used by search functions
        mov     dx,seg buff     ; DS:DX = buffer address
        mov     ds,dx
        mov     dx,offset buff
        mov     ah,1ah          ; function 1ah = set DTA
        int     21h             ; transfer to MS-DOS

                                ; now search for label...
                                ; DS:DX = ASCIIZ string
        mov     dx,offset wildcd
        mov     cx,8            ; CX = volume attribute
        mov     ah,4eh          ; function 4eh = search for first
        int     21h             ; transfer to MS-DOS
        jc      no_label        ; jump if no volume label
        .
        .
        .
──────────────────────────────────────────────────────────────────────────
Figure 9-5. A volume-label search under MS-DOS version 3, using the handle-type file functions. If the search is successful (carry flag returned clear), the volume name is placed at location buff+1EH in the form of an ASCIIZ string.

────────────────────────────────────────────────────────────────────────────

Chapter 10  Disk Internals

MS-DOS disks are organized according to a rather rigid scheme that is easily understood and therefore easily manipulated. Although you will probably never need to access the special control areas of a disk directly, an understanding of their internal structure leads to a better understanding of the behavior and performance of MS-DOS as a whole.

From the application programmer's viewpoint, MS-DOS presents disk devices as logical volumes that are associated with a drive code (A, B, C, and so on) and that have a volume name (optional), a root directory, and from zero to many additional directories and files. MS-DOS shields the programmer from the physical characteristics of the medium by providing a battery of disk services through Int 21H. Using these services, the programmer can create, open, read, write, close, and delete files in a uniform way, regardless of the disk drive's size, speed, number of read/write heads, number of tracks, and so forth.

Requests from an application program for file operations actually go through two levels of translation before resulting in the physical transfer of data between the disk device and random-access memory:

1. Beneath the surface, MS-DOS views each logical volume, whether it is an entire physical unit such as a floppy disk or only a part of a fixed disk, as a continuous sequence of logical sectors, starting at sector 0. (A logical disk volume can also be implemented on other types of storage. For example, RAM disks map a disk structure onto an area of random-access memory.) MS-DOS translates an application program's Int 21H file-management requests into requests for transfers of logical sectors, using the information found in the volume's directories and allocation tables. (For those rare situations where it is appropriate, programs can also access logical sectors directly with Int 25H and Int 26H.)

2. MS-DOS then passes the requests for logical sectors to the disk device's driver, which maps them onto actual physical addresses (head, track, and sector). Disk drivers are extremely hardware dependent and are always written in assembly language for maximum speed. In most versions of MS-DOS, a driver for IBM-compatible floppy- and fixed-disk drives is built into the MS-DOS BIOS module (IO.SYS) and is always loaded during system initialization; you can install additional drivers for non-IBM-compatible disk devices by including the appropriate DEVICE directives in the CONFIG.SYS file.

Each MS-DOS logical volume is divided into several fixed-size control areas and a files area (Figure 10-1). The size of each control area depends on several factors (the size of the volume and the version of FORMAT used to initialize the volume, for example), but all of the information needed to interpret the structure of a particular logical volume can be found on the volume itself in the boot sector.
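The second translation step in the list above, from a logical sector number to a head, track, and sector, can be sketched in C. This is a commonly used mapping shown only as an illustration; the arithmetic inside any particular disk driver may differ. The structure and function names are hypothetical, and the sketch assumes that the sectors-per-track, heads, and hidden-sectors values have already been read from the volume's boot sector and that physical sectors are numbered from 1 within a track.

```c
#include <stdint.h>

/* Hypothetical result type: a physical disk address. */
struct chs {
    unsigned track;             /* cylinder number, starting at 0 */
    unsigned head;              /* head number, starting at 0     */
    unsigned sector;            /* sector on the track, from 1    */
};

/*
 * Map a logical sector of a volume onto a physical address.
 * hidden_sectors is the number of sectors that precede the
 * volume on the physical unit (0 on a floppy disk).
 */
struct chs logical_to_chs(uint32_t logical_sector, uint32_t hidden_sectors,
                          unsigned sectors_per_track, unsigned heads)
{
    struct chs p;
    uint32_t absolute = logical_sector + hidden_sectors;

    p.sector = (unsigned)(absolute % sectors_per_track) + 1;
    p.head   = (unsigned)((absolute / sectors_per_track) % heads);
    p.track  = (unsigned)(absolute /
                          ((uint32_t)sectors_per_track * heads));
    return p;
}
```

For a floppy disk with 9 sectors per track and 2 heads, logical sector 0 maps to track 0, head 0, sector 1, and logical sector 9 is the first sector under head 1 of the same track.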
┌──────────────────────────────────────┐
│ Boot sector                          │
│ Reserved area                        │
├──────────────────────────────────────┤
│ File allocation table #1             │
├──────────────────────────────────────┤
│ Possible additional copies of FAT    │
├──────────────────────────────────────┤
│ Root directory                       │
├──────────────────────────────────────┤
│                                      │
│ Files area                           │
│                                      │
└──────────────────────────────────────┘

Figure 10-1. Map of a typical MS-DOS logical volume. The boot sector (logical sector 0) contains the OEM identification, BIOS parameter block (BPB), and disk bootstrap. The remaining sectors are divided among an optional reserved area, one or more copies of the file allocation table, the root directory, and the files area.

The Boot Sector

Logical sector 0, known as the boot sector, contains all of the critical information regarding the disk medium's characteristics (Figure 10-2). The first byte in the sector is always an 80x86 jump instruction: either a normal intrasegment JMP (opcode 0E9H) followed by a 16-bit displacement or a "short" JMP (opcode 0EBH) followed by an 8-bit displacement and then by an NOP (opcode 90H). If neither of these two JMP opcodes is present, the disk has not been formatted or was not formatted for use with MS-DOS. (Of course, the presence of the JMP opcode does not in itself ensure that the disk has an MS-DOS format.)

Following the initial JMP instruction is an 8-byte field that is reserved by Microsoft for OEM identification. The disk-formatting program, which is specialized for each brand of computer, disk controller, and medium, fills in this area with the name of the computer manufacturer and the manufacturer's internal MS-DOS version number.
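The two checks just described, the JMP opcode test and the OEM identification field, can be expressed as a short C sketch. It assumes the boot sector has already been read into a byte buffer by some other means; the function names has_dos_jump and copy_oem_name are hypothetical. As the text notes, passing the opcode test does not prove that the disk has an MS-DOS format.

```c
#include <string.h>

/*
 * Returns 1 if the sector begins with one of the two jump forms
 * described above: E9 xx xx, or EB xx 90. Returns 0 otherwise.
 */
int has_dos_jump(const unsigned char *sector)
{
    if (sector[0] == 0xE9)                      /* intrasegment JMP  */
        return 1;
    if (sector[0] == 0xEB && sector[2] == 0x90) /* short JMP, NOP    */
        return 1;
    return 0;
}

/*
 * Copies the 8-byte OEM identification field (offset 03H) into
 * out and adds a terminating null. out must hold 9 bytes.
 */
void copy_oem_name(const unsigned char *sector, char out[9])
{
    memcpy(out, sector + 3, 8);
    out[8] = '\0';
}
```

Applied to the first bytes of the PC-DOS 3.3 boot sector dumped later in this chapter (EB 34 90 followed by the OEM field), has_dos_jump returns 1 and copy_oem_name yields the string "IBM  3.3".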
Offset  Field                                         Part of
──────────────────────────────────────────────────────────────────────────
00H     E9 XX XX or EB XX 90
03H     OEM name and version (8 bytes)
0BH     Bytes per sector (2 bytes)                    BPB, MS-DOS version 2.0
0DH     Sectors per allocation unit (1 byte)          BPB, MS-DOS version 2.0
0EH     Reserved sectors, starting at 0 (2 bytes)     BPB, MS-DOS version 2.0
10H     Number of FATs (1 byte)                       BPB, MS-DOS version 2.0
11H     Number of root-directory entries (2 bytes)    BPB, MS-DOS version 2.0
13H     Total sectors in logical volume (2 bytes)     BPB, MS-DOS version 2.0
15H     Media descriptor byte                         BPB, MS-DOS version 2.0
16H     Number of sectors per FAT (2 bytes)           BPB, MS-DOS version 2.0
18H     Sectors per track (2 bytes)                   BPB, MS-DOS version 3.0
1AH     Number of heads (2 bytes)                     BPB, MS-DOS version 3.0
1CH     Number of hidden sectors (4 bytes)            BPB, MS-DOS version 3.0
20H     Total sectors in logical volume               BPB, MS-DOS version 4.0
        (MS-DOS 4.0 and volume size >32 MB)
24H     Physical drive number                         BPB, additional MS-DOS 4.0 information
25H     Reserved                                      BPB, additional MS-DOS 4.0 information
26H     Extended boot signature record (29H)          BPB, additional MS-DOS 4.0 information
27H     32-bit binary volume ID                       BPB, additional MS-DOS 4.0 information
2BH     Volume label (11 bytes)                       BPB, additional MS-DOS 4.0 information
36H     Reserved (8 bytes)                            BPB, additional MS-DOS 4.0 information
3EH     Bootstrap
──────────────────────────────────────────────────────────────────────────

Figure 10-2. Map of the boot sector of an MS-DOS disk. Note the JMP at offset 0, the OEM identification field, the MS-DOS version 2 compatible BIOS parameter block (bytes 0BH-17H), the three additional WORD fields for MS-DOS version 3, the double-word number-of-sectors field and 32-bit binary volume ID for MS-DOS version 4.0, and the bootstrap code.

The third major component of the boot sector is the BIOS parameter block (BPB) in bytes 0BH through 17H. (Additional fields are present in MS-DOS versions 3.0 and later.) This data structure describes the physical disk characteristics and allows the device driver to calculate the proper physical disk address for a given logical-sector number; it also contains information that is used by MS-DOS and various system utilities to calculate the address and size of each of the disk control areas (file allocation tables and root directory).

The final element of the boot sector is the disk bootstrap routine. The disk bootstrap is usually read into memory by the ROM bootstrap, which is executed automatically when the computer is turned on. The ROM bootstrap is usually just smart enough to home the head of the disk drive (move it to track 0), read the first physical sector into RAM at a predetermined location, and jump to it. The disk bootstrap is more sophisticated. It calculates the physical disk address of the beginning of the files area, reads the files containing the operating system into memory, and transfers control to the BIOS module at location 0070:0000H. (See Chapter 2.)

Figures 10-3 and 10-4 show a partial hex dump and disassembly of a PC-DOS 3.3 floppy-disk boot sector.

──────────────────────────────────────────────────────────────────────────
       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000  EB 34 90 49 42 4D 20 20 33 2E 33 00 02 02 01 00  .4.IBM  3.3.....
0010  02 70 00 D0 02 FD 02 00 09 00 02 00 00 00 00 00  .p..............
0020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 12  ................
0030  00 00 00 00 01 00 FA 33 C0 8E D0 BC 00 7C 16 07  .......3.....|..
  .
  .
  .
01C0  0D 0A 44 69 73 6B 20 42 6F 6F 74 20 66 61 69 6C  ..Disk Boot fail
01D0  75 72 65 0D 0A 00 49 42 4D 42 49 4F 20 20 43 4F  ure...IBMBIO  CO
01E0  4D 49 42 4D 44 4F 53 20 20 43 4F 4D 00 00 00 00  MIBMDOS  COM....
01F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 AA  ..............U.
──────────────────────────────────────────────────────────────────────────

Figure 10-3. Partial hex dump of the boot sector (track 0, head 0, sector 1) of a PC-DOS version 3.3 floppy disk. This sector contains the OEM identification, a copy of the BIOS parameter block describing the medium, and the bootstrap routine that reads the BIOS into memory and transfers control to it. See also Figures 10-2 and 10-4.

──────────────────────────────────────────────────────────────────────────
        jmp     $+54            ; jump to bootstrap
        nop
        db      'IBM  3.3'      ; OEM identification

                                ; BIOS parameter block
        dw      512             ; bytes per sector
        db      2               ; sectors per cluster
        dw      1               ; reserved sectors
        db      2               ; number of FATs
        dw      112             ; root directory entries
        dw      720             ; total sectors
        db      0fdh            ; media descriptor byte
        dw      2               ; sectors per FAT
        dw      9               ; sectors per track
        dw      2               ; number of heads
        dd      0               ; hidden sectors
        .
        .
        .
──────────────────────────────────────────────────────────────────────────

Figure 10-4. Partial disassembly of the boot sector shown in Figure 10-3.

The Reserved Area

The boot sector is actually part of a reserved area that can span from one to several sectors. The reserved-sectors word in the BPB, at offset 0EH in the boot sector, describes the size of this area. Remember that the number in the BPB field includes the boot sector itself, so if the value is 1 (as it is on IBM PC floppy disks), there are no reserved sectors beyond the boot sector.
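To make the field offsets in Figure 10-2 concrete, the following C sketch extracts the MS-DOS version 2.0 and 3.0 BPB fields from a boot sector image. The structure bpb and the function parse_bpb are hypothetical names chosen for this illustration; the sample bytes are the first 32 bytes of the dump in Figure 10-3, and all multibyte fields are read low-order byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of the BPB fields shown in Figure 10-2. */
struct bpb {
    uint16_t bytes_per_sector;     /* offset 0BH */
    uint8_t  sectors_per_cluster;  /* offset 0DH */
    uint16_t reserved_sectors;     /* offset 0EH, includes boot sector */
    uint8_t  fat_count;            /* offset 10H */
    uint16_t root_entries;         /* offset 11H */
    uint16_t total_sectors;        /* offset 13H */
    uint8_t  media_descriptor;     /* offset 15H */
    uint16_t sectors_per_fat;      /* offset 16H */
    uint16_t sectors_per_track;    /* offset 18H */
    uint16_t heads;                /* offset 1AH */
    uint32_t hidden_sectors;       /* offset 1CH */
};

/* First 32 bytes of the PC-DOS 3.3 boot sector in Figure 10-3. */
const unsigned char fig_10_3[32] = {
    0xEB, 0x34, 0x90, 0x49, 0x42, 0x4D, 0x20, 0x20,
    0x33, 0x2E, 0x33, 0x00, 0x02, 0x02, 0x01, 0x00,
    0x02, 0x70, 0x00, 0xD0, 0x02, 0xFD, 0x02, 0x00,
    0x09, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Read a 16-bit value stored low-order byte first. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit value stored low-order byte first. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: fill *out from a boot sector image that is at
   least 32 bytes long. */
void parse_bpb(const unsigned char *sector, struct bpb *out)
{
    assert(sector != NULL && out != NULL);
    out->bytes_per_sector    = le16(sector + 0x0B);
    out->sectors_per_cluster = sector[0x0D];
    out->reserved_sectors    = le16(sector + 0x0E);
    out->fat_count           = sector[0x10];
    out->root_entries        = le16(sector + 0x11);
    out->total_sectors       = le16(sector + 0x13);
    out->media_descriptor    = sector[0x15];
    out->sectors_per_fat     = le16(sector + 0x16);
    out->sectors_per_track   = le16(sector + 0x18);
    out->heads               = le16(sector + 0x1A);
    out->hidden_sectors      = le32(sector + 0x1C);
}
```

Applied to fig_10_3, the sketch should reproduce the values listed in the disassembly in Figure 10-4. Reading the fields byte by byte, rather than overlaying a C structure on the sector, avoids any dependence on the compiler's structure packing.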
The File Allocation Table

When a file is created or extended, MS-DOS assigns it groups of disk sectors from the files area in powers of 2. These are known as allocation units or clusters. The number of sectors per cluster for a given medium is defined in the BPB and can be found at offset 0DH in the disk's boot sector. Below are some example cluster sizes:

Disk type                     Power of 2     Sectors/cluster
──────────────────────────────────────────────────────────────────────────
5.25" 180 KB floppy disk      0              1
5.25" 360 KB floppy disk      1              2
PC/AT fixed disk              2              4
PC/XT fixed disk              3              8
──────────────────────────────────────────────────────────────────────────

The file allocation table (FAT) is divided into fields that correspond directly to the assignable clusters on the disk. These fields are 12 bits in MS-DOS versions 1 and 2 and may be either 12 bits or 16 bits in versions 3.0 and later, depending on the size of the medium (12 bits if the disk contains fewer than 4087 clusters, 16 bits otherwise).

The first two fields in the FAT are always reserved. On IBM-compatible media, the first 8 bits of the first reserved FAT entry contain a copy of the media descriptor byte, which is also found in the BPB in the boot sector. The second, third, and (if applicable) fourth bytes, which constitute the remainder of the first two reserved FAT fields, always contain 0FFH.
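The two rules just stated (entry width chosen by cluster count, and the fixed contents of the two reserved fields) can be expressed in a few lines of C. This is a sketch under stated assumptions: the function names are hypothetical, the width rule applies only to MS-DOS versions 3.0 and later, and the sample bytes are the first three bytes of the FAT dumped in Figure 10-9.

```c
#include <assert.h>
#include <stddef.h>

/* First three bytes of the 12-bit FAT shown in Figure 10-9. */
const unsigned char fig_10_9_start[3] = { 0xFD, 0xFF, 0xFF };

/* Hypothetical helper: width of a FAT field under MS-DOS 3.0 and
   later, using the 4087-cluster threshold given in the text. */
int fat_entry_bits(unsigned long cluster_count)
{
    return cluster_count < 4087UL ? 12 : 16;
}

/* Hypothetical helper: returns 1 if the two reserved fields at the
   start of the FAT hold the media descriptor followed by 0FFH bytes
   (3 bytes in all for 12-bit fields, 4 bytes for 16-bit fields). */
int fat_reserved_fields_ok(const unsigned char *fat, int bits,
                           unsigned char media)
{
    size_t i, n;

    assert(fat != NULL && (bits == 12 || bits == 16));
    n = (bits == 12) ? 3 : 4;
    if (fat[0] != media)
        return 0;
    for (i = 1; i < n; i++)
        if (fat[i] != 0xFF)
            return 0;
    return 1;
}
```

A utility might use a check like fat_reserved_fields_ok as a quick sanity test before trusting the rest of a FAT it has read from disk.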
The currently defined IBM-format media descriptor bytes are as follows:

                                                         MS-DOS version
                                                         where first
Descriptor   Medium                                      supported
──────────────────────────────────────────────────────────────────────────
0F0H         3.5" floppy disk, 2-sided, 18-sector        3.3
0F8H         Fixed disk                                  2.0
0F9H         5.25" floppy disk, 2-sided, 15-sector       3.0
             3.5" floppy disk, 2-sided, 9-sector         3.2
0FCH         5.25" floppy disk, 1-sided, 9-sector        2.0
0FDH         5.25" floppy disk, 2-sided, 9-sector        2.0
             8" floppy disk, 1-sided, single-density
0FEH         5.25" floppy disk, 1-sided, 8-sector        1.0
             8" floppy disk, 1-sided, single-density
             8" floppy disk, 2-sided, double-density
0FFH         5.25" floppy disk, 2-sided, 8-sector        1.1
──────────────────────────────────────────────────────────────────────────

The remainder of the FAT entries describe the use of their corresponding disk clusters. The contents of the FAT fields are interpreted as follows:

Value               Meaning
──────────────────────────────────────────────────────────────────────────
(0)000H             Cluster available
(F)FF0-(F)FF6H      Reserved cluster
(F)FF7H             Bad cluster, if not part of chain
(F)FF8-(F)FFFH      Last cluster of file
(X)XXX              Next cluster in file
──────────────────────────────────────────────────────────────────────────

Each file's entry in a directory contains the number of the first cluster assigned to that file, which is used as an entry point into the FAT. From the entry point on, each FAT slot contains the cluster number of the next cluster in the file, until a last-cluster mark is encountered.

At the computer manufacturer's option, MS-DOS can maintain two or more identical copies of the FAT on each volume. MS-DOS updates all copies simultaneously whenever files are extended or the directory is modified. If access to a sector in a FAT fails due to a read error, MS-DOS tries the other copies until a successful disk read is obtained or all copies are exhausted.
Thus, if one copy of the FAT becomes unreadable due to wear or a software accident, the other copies may still make it possible to salvage the files on the disk. As part of its procedure for checking the integrity of a disk, the CHKDSK program compares the multiple copies (usually two) of the FAT to make sure they are all readable and consistent.

The Root Directory

Following the file allocation tables is an area known in MS-DOS versions 2.0 and later as the root directory. (Under MS-DOS version 1, it was the only directory on the disk.) The root directory contains 32-byte entries that describe files, other directories, and the optional volume label (Figure 10-5). An entry beginning with the byte value E5H is available for reuse; it represents a file or directory that has been erased. An entry beginning with a null (zero) byte is the logical end-of-directory; that entry and all subsequent entries have never been used.

00H ┌──────────────────────────────┐
    │ Filename                     │  Note 1
08H ├──────────────────────────────┤
    │ Extension                    │
0BH ├──────────────────────────────┤
    │ File attribute               │  Note 2
0CH ├──────────────────────────────┤
    │ Reserved                     │
16H ├──────────────────────────────┤
    │ Time created or last updated │  Note 3
18H ├──────────────────────────────┤
    │ Date created or last updated │  Note 4
1AH ├──────────────────────────────┤
    │ Starting cluster             │
1CH ├──────────────────────────────┤
    │ File size, 4 bytes           │  Note 5
20H └──────────────────────────────┘

Figure 10-5. Format of a single entry in a disk directory. Total length is 32 bytes (20H bytes).

──────────────────────────────────────────────────────────────────────────
Notes for Figure 10-5

1. The first byte of the filename field of a directory entry may contain the following special information:

   Value     Meaning
   ────────────────────────────────────────────────────────────────────────
   00H       Directory entry has never been used; end of occupied portion
             of directory.
   05H       First character of filename is actually E5H.
   2EH       Entry is an alias for the current or parent directory. If the
             next byte is also 2EH, the cluster field contains the cluster
             number of the parent directory (zero if the parent directory
             is the root directory).
   E5H       File has been erased.
   ────────────────────────────────────────────────────────────────────────

2. The attribute byte of the directory entry is mapped as follows:

   Bit       Meaning
   ────────────────────────────────────────────────────────────────────────
   0         Read-only; attempts to open file for write or to delete file
             will fail.
   1         Hidden file; excluded from normal searches.
   2         System file; excluded from normal searches.
   3         Volume label; can exist only in root directory.
   4         Directory; excluded from normal searches.
   5         Archive bit; set whenever file is modified.
   6         Reserved.
   7         Reserved.
   ────────────────────────────────────────────────────────────────────────

3. The time field is encoded as follows:

   Bits      Contents
   ────────────────────────────────────────────────────────────────────────
   00H-04H   Binary number of 2-second increments (0-29, corresponding to
             0-58 seconds)
   05H-0AH   Binary number of minutes (0-59)
   0BH-0FH   Binary number of hours (0-23)
   ────────────────────────────────────────────────────────────────────────

4. The date field is encoded as follows:

   Bits      Contents
   ────────────────────────────────────────────────────────────────────────
   00H-04H   Day of month (1-31)
   05H-08H   Month (1-12)
   09H-0FH   Year (relative to 1980)
   ────────────────────────────────────────────────────────────────────────

5. The file-size field is interpreted as a 4-byte integer, with the low-order 2 bytes of the number stored first.
──────────────────────────────────────────────────────────────────────────

The root directory has a number of special properties. Its size and position are fixed and are determined by the FORMAT program when a disk is initialized. This information can be obtained from the boot sector's BPB.
If the disk is bootable, the first two entries in the root directory always describe the files containing the MS-DOS BIOS and the MS-DOS kernel. The disk bootstrap routine uses these entries to bring the operating system into memory and start it up.

Figure 10-6 shows a partial hex dump of the first sector of the root directory on a bootable PC-DOS 3.3 floppy disk.

──────────────────────────────────────────────────────────────────────────
       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000  49 42 4D 42 49 4F 20 20 43 4F 4D 27 00 00 00 00  IBMBIO  COM'....
0010  00 00 00 00 00 00 00 60 72 0E 02 00 54 56 00 00  .......`r...TV..
0020  49 42 4D 44 4F 53 20 20 43 4F 4D 27 00 00 00 00  IBMDOS  COM'....
0030  00 00 00 00 00 00 00 60 71 0E 18 00 CF 75 00 00  .......`q....u..
0040  43 4F 4D 4D 41 4E 44 20 43 4F 4D 20 00 00 00 00  COMMAND COM ....
0050  00 00 00 00 00 00 00 60 71 0E 36 00 DB 62 00 00  .......`q.6..b..
0060  42 4F 4F 54 44 49 53 4B 20 20 20 28 00 00 00 00  BOOTDISK   (....
0070  00 00 00 00 00 00 A1 00 21 00 00 00 00 00 00 00  ........!.......
0080  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0090  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  .
  .
  .
──────────────────────────────────────────────────────────────────────────

Figure 10-6. Partial hex dump of the first sector of the root directory for a PC-DOS 3.3 disk containing the three system files and a volume label.

The Files Area

The remainder of the volume after the root directory is known as the files area. MS-DOS views the sectors in this area as a pool of clusters, each containing one or more logical sectors, depending on the disk format. Each cluster has a corresponding entry in the FAT that describes its current use: available, reserved, assigned to a file, or unusable (because of defects in the medium). Because the first two fields of the FAT are reserved, the first cluster in the files area is assigned the number 2.
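As a check on the directory-entry layout in Figure 10-5 and the time and date encodings in its notes, the sketch below decodes one 32-byte entry in C. The names dir_info and decode_dir_entry are hypothetical, and the sample bytes are the IBMBIO.COM entry (offsets 0000H-001FH) from Figure 10-6. The sketch formats the name as NAME.EXT and therefore does not treat volume-label entries specially.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of one 32-byte directory entry. */
struct dir_info {
    char     name[13];          /* "NAME.EXT", padding removed */
    uint8_t  attributes;        /* attribute byte, offset 0BH  */
    unsigned hour, minute, second;
    unsigned year, month, day;
    uint16_t first_cluster;     /* offset 1AH */
    uint32_t size;              /* offset 1CH */
};

/* The IBMBIO.COM entry from Figure 10-6. */
const unsigned char fig_10_6_entry[32] = {
    0x49, 0x42, 0x4D, 0x42, 0x49, 0x4F, 0x20, 0x20,
    0x43, 0x4F, 0x4D, 0x27, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x60,
    0x72, 0x0E, 0x02, 0x00, 0x54, 0x56, 0x00, 0x00
};

/* Hypothetical helper: decode the entry at e into *out. */
void decode_dir_entry(const unsigned char *e, struct dir_info *out)
{
    unsigned time, date;
    int i, n = 0;

    assert(e != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    for (i = 0; i < 8 && e[i] != ' '; i++)      /* filename  */
        out->name[n++] = (char)e[i];
    if (e[8] != ' ') {                          /* extension */
        out->name[n++] = '.';
        for (i = 8; i < 11 && e[i] != ' '; i++)
            out->name[n++] = (char)e[i];
    }
    out->name[n] = '\0';
    if (e[0] == 0x05)               /* 05H stands for a real E5H */
        out->name[0] = (char)0xE5;

    out->attributes = e[0x0B];

    time = e[0x16] | (e[0x17] << 8);
    out->second = (time & 0x1F) * 2;     /* bits 0-4, 2-second units */
    out->minute = (time >> 5) & 0x3F;    /* bits 5-10  */
    out->hour   = (time >> 11) & 0x1F;   /* bits 11-15 */

    date = e[0x18] | (e[0x19] << 8);
    out->day   = date & 0x1F;                   /* bits 0-4  */
    out->month = (date >> 5) & 0x0F;            /* bits 5-8  */
    out->year  = 1980 + ((date >> 9) & 0x7F);   /* bits 9-15 */

    out->first_cluster = (uint16_t)(e[0x1A] | (e[0x1B] << 8));
    out->size = (uint32_t)e[0x1C] | ((uint32_t)e[0x1D] << 8) |
                ((uint32_t)e[0x1E] << 16) | ((uint32_t)e[0x1F] << 24);
}
```

A caller would normally test the first byte of the entry for 00H or E5H before decoding, because those values mark unused and erased entries.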
When a file is extended under versions 1 and 2, MS-DOS searches the FAT from the beginning until it finds a free cluster (designated by a zero FAT field); it then changes that FAT field to a last-cluster mark and updates the previous last cluster of the file's chain to point to the new last cluster. Under versions 3.0 and later, however, MS-DOS searches the FAT from the most recently allocated cluster; this reduces file fragmentation and improves overall access times.

Directories other than the root directory are simply a special type of file. Their storage is allocated from the files area, and their contents are 32-byte entries (in the same format as those used in the root directory) that describe files or other directories. Directory entries that describe other directories contain an attribute byte with bit 4 set, zero in the file-length field, and the date and time that the directory was created (Figure 10-7). The first cluster field points, of course, to the first cluster in the files area that belongs to the directory. (The directory's other clusters can be found only by tracing through the FAT.)

All directories except the root directory contain two special directory entries named "." and "..". MS-DOS puts these entries in place when it creates a directory, and they cannot be deleted. The "." entry is an alias for the current directory; its cluster field points to the cluster in which it is found. The ".." entry is an alias for the directory's parent (the directory immediately above it in the tree structure); its cluster field points to the first cluster of the parent directory. If the parent is the root directory, the cluster field of the ".." entry contains zero (Figure 10-8).

──────────────────────────────────────────────────────────────────────────
  .
  .
  .
0080  4D 59 44 49 52 20 20 20 20 20 20 10 00 00 00 00  MYDIR      .....
0090  00 00 00 00 00 00 87 9A 9B 0A 2A 00 00 00 00 00  ..........*.....
  .
  .
  .
──────────────────────────────────────────────────────────────────────────

Figure 10-7. Extract from the root directory of an MS-DOS disk, showing the entry for a subdirectory named MYDIR. Bit 4 in the attribute byte is set, the cluster field points to the first cluster of the subdirectory file, the date and time stamps are valid, but the file length is zero.

──────────────────────────────────────────────────────────────────────────
       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000  2E 20 20 20 20 20 20 20 20 20 20 10 00 00 00 00  .          .....
0010  00 00 00 00 00 00 87 9A 9B 0A 2A 00 00 00 00 00  ..........*.....
0020  2E 2E 20 20 20 20 20 20 20 20 20 10 00 00 00 00  ..         .....
0030  00 00 00 00 00 00 87 9A 9B 0A 00 00 00 00 00 00  ................
0040  4D 59 46 49 4C 45 20 20 44 41 54 20 00 00 00 00  MYFILE  DAT ....
0050  00 00 00 00 00 00 98 9A 9B 0A 2B 00 15 00 00 00  ..........+.....
0060  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  .
  .
  .
──────────────────────────────────────────────────────────────────────────

Figure 10-8. Hex dump of the first block of the directory MYDIR. Note the . and .. entries. This directory contains exactly one file, MYFILE.DAT.

Interpreting the File Allocation Table

Now that we understand how the disk is structured, let's see how we can use this knowledge to find a FAT position from a cluster number.

If the FAT has 12-bit entries, use the following procedure:

1. Use the directory entry to find the starting cluster of the file in question.

2. Multiply the cluster number by 1.5.

3. Use the integral part of the product as the offset into the FAT and move the word at that offset into a register. Remember that a FAT position can span a physical disk-sector boundary.

4. If the product is a whole number, AND the register with 0FFFH.

5. Otherwise, "logical shift" the register right 4 bits.

6. If the result is a value from 0FF8H through 0FFFH, the file has no more clusters.
   Otherwise, the result is the number of the next cluster in the file.

On disks with at least 4087 clusters formatted under MS-DOS version 3.0 or later, the FAT entries use 16 bits, and the extraction of a cluster number from the table is much simpler:

1. Use the directory entry to find the starting cluster of the file in question.

2. Multiply the cluster number by 2.

3. Use the product as the offset into the FAT and move the word at that offset into a register.

4. If the result is a value from 0FFF8H through 0FFFFH, the file has no more clusters. Otherwise, the result is the number of the next cluster in the file.

To convert cluster numbers to logical sectors, subtract 2, multiply the result by the number of sectors per cluster, and add the logical-sector number of the beginning of the data area (this can be calculated from the information in the BPB).

As an example, let's work out the disk location of the file IBMBIO.COM, which is the first entry in the directory shown in Figure 10-6. First, we need some information from the BPB, which is in the boot sector of the medium. (See Figures 10-3 and 10-4.) The BPB tells us that there are

■ 512 bytes per sector

■ 2 sectors per cluster

■ 2 sectors per FAT

■ 2 FATs

■ 112 entries in the root directory

From the BPB information, we can calculate the starting logical-sector number of each of the disk's control areas and the files area by constructing a table, as follows:

                                          Length        Sector
Area                                      (sectors)     numbers
──────────────────────────────────────────────────────────────────────────
Boot sector                               1             00H
2 FATs * 2 sectors/FAT                    4             01H-04H
112 directory entries                     7             05H-0BH
  * 32 bytes/entry / 512 bytes/sector
Total sectors occupied by bootstrap,      12
  FATs, and root directory
──────────────────────────────────────────────────────────────────────────

Therefore, the first sector of the files area is 12 (0CH).

The word at offset 01AH in the directory entry for IBMBIO.COM gives us the starting cluster number for that file: cluster 2.
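The layout table, the cluster-to-sector conversion, and the 12-bit extraction procedure can all be sketched in C. The function and structure names here are hypothetical; the sample bytes are the first 36 bytes of the FAT dumped in Figure 10-9, which is enough to cover clusters 0 through 23. Rounding the root-directory length up to a whole sector and the loop guard in fat12_chain_length are assumptions of this sketch, and, like the procedure in Figure 10-10, it assumes the whole FAT is in memory.

```c
#include <assert.h>
#include <stddef.h>

/* First 36 bytes of the 12-bit FAT in Figure 10-9. */
const unsigned char fig_10_9[36] = {
    0xFD, 0xFF, 0xFF, 0x03, 0x40, 0x00, 0x05, 0x60, 0x00, 0x07, 0x80, 0x00,
    0x09, 0xA0, 0x00, 0x0B, 0xC0, 0x00, 0x0D, 0xE0, 0x00, 0x0F, 0x00, 0x01,
    0x11, 0x20, 0x01, 0x13, 0x40, 0x01, 0x15, 0x60, 0x01, 0x17, 0xF0, 0xFF
};

/* Hypothetical summary of where each area of the volume begins. */
struct volume_layout {
    unsigned long fat_start;     /* first sector of FAT #1         */
    unsigned long root_start;    /* first sector of root directory */
    unsigned long root_sectors;  /* length of root directory       */
    unsigned long data_start;    /* first sector of the files area */
};

struct volume_layout compute_layout(unsigned reserved, unsigned fats,
                                    unsigned sectors_per_fat,
                                    unsigned root_entries,
                                    unsigned bytes_per_sector)
{
    struct volume_layout v;

    assert(bytes_per_sector != 0);
    v.fat_start    = reserved;
    v.root_start   = v.fat_start + (unsigned long)fats * sectors_per_fat;
    v.root_sectors = ((unsigned long)root_entries * 32 +
                      bytes_per_sector - 1) / bytes_per_sector;
    v.data_start   = v.root_start + v.root_sectors;
    return v;
}

/* Subtract 2, multiply by sectors per cluster, add start of files area. */
unsigned long cluster_to_sector(unsigned cluster,
                                unsigned sectors_per_cluster,
                                unsigned long data_start)
{
    assert(cluster >= 2);
    return (unsigned long)(cluster - 2) * sectors_per_cluster + data_start;
}

/* 12-bit procedure: the offset is the integral part of cluster * 1.5;
   the product is a whole number exactly when the cluster is even. */
unsigned fat12_next(const unsigned char *fat, unsigned cluster)
{
    unsigned offset = cluster + cluster / 2;
    unsigned word = fat[offset] | (fat[offset + 1] << 8);

    return (cluster & 1) ? (word >> 4) : (word & 0x0FFF);
}

/* Count the clusters in a chain and report the last one. Stops at a
   last-cluster mark, a reserved or bad mark, or a free (zero) field;
   the 4087 limit guards against a corrupted, circular chain. */
unsigned fat12_chain_length(const unsigned char *fat, unsigned first,
                            unsigned *last)
{
    unsigned count = 0, cluster = first;

    assert(fat != NULL && last != NULL);
    while (cluster >= 2 && cluster < 0xFF0 && count < 4087) {
        *last = cluster;
        count++;
        cluster = fat12_next(fat, cluster);
    }
    return count;
}
```

With the BPB values listed above, compute_layout(1, 2, 2, 112, 512) should give the same sector numbers as the table, and fat12_chain_length(fig_10_9, 2, &last) follows the chain that starts at cluster 2.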
To find the logical-sector number of the first block in the file, we can follow the procedure given earlier:

1. Cluster number - 2 = 2 - 2 = 0.

2. Multiply by sectors per cluster = 0 * 2 = 0.

3. Add logical-sector number of start of the files area = 0 + 0CH = 0CH.

So the calculated sector number of the beginning of the file IBMBIO.COM is 0CH, which is exactly what we expect knowing that the FORMAT program always places the system files in contiguous sectors at the beginning of the data area.

Now let's trace IBMBIO.COM's chain through the file allocation table (Figures 10-9 and 10-10). This will be a little tedious, but a detailed understanding of the process is crucial. In an actual program, we would first read the boot sector using Int 25H, then calculate the address of the FAT from the contents of the BPB, and finally read the FAT into memory, again using Int 25H.

From IBMBIO.COM's directory entry, we already know that the first cluster in the file is cluster 2. To examine that cluster's entry in the FAT, we multiply the cluster number by 1.5, which gives 0003H as the FAT offset, and fetch the word at that offset (which contains 4003H). Because the product of the cluster and 1.5 is a whole number, we AND the word from the FAT with 0FFFH, yielding the number 3, which is the number of the second cluster assigned to the file.

──────────────────────────────────────────────────────────────────────────
       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000  FD FF FF 03 40 00 05 60 00 07 80 00 09 A0 00 0B  ....@..`........
0010  C0 00 0D E0 00 0F 00 01 11 20 01 13 40 01 15 60  ......... ..@..`
0020  01 17 F0 FF 19 A0 01 1B C0 01 1D E0 01 1F 00 02  ................
0030  21 20 02 23 40 02 25 60 02 27 80 02 29 A0 02 2B  ! .#@.%`.'..)..+
  .
  .
  .
──────────────────────────────────────────────────────────────────────────

Figure 10-9. Hex dump of the first block of the file allocation table (track 0, head 0, sector 2) for the PC-DOS 3.3 disk whose root directory is shown in Figure 10-6.
Notice that the first byte of the FAT contains the media descriptor byte for a 5.25-inch, 2-sided, 9-sector floppy disk.

──────────────────────────────────────────────────────────────────────────
getfat  proc    near            ; extracts the FAT field
                                ; for a given cluster
                                ; call    AX = cluster #
                                ;         DS:BX = addr of FAT
                                ; returns AX = FAT field
                                ; other registers unchanged

        push    bx              ; save affected registers
        push    cx
        mov     cx,ax
        shl     ax,1            ; cluster * 2
        add     ax,cx           ; cluster * 3
        test    ax,1
        pushf                   ; save remainder in Z flag
        shr     ax,1            ; cluster * 1.5
        add     bx,ax
        mov     ax,[bx]
        popf                    ; was cluster * 1.5 whole number?
        jnz     getfat1         ; no, jump
        and     ax,0fffh        ; yes, isolate bottom 12 bits
        jmp     getfat2
getfat1: mov    cx,4            ; shift word right 4 bits
        shr     ax,cx
getfat2: pop    cx              ; restore registers and exit
        pop     bx
        ret
getfat  endp
──────────────────────────────────────────────────────────────────────────

Figure 10-10. Assembly-language procedure to access the file allocation table (assumes 12-bit FAT fields). Given a cluster number, the procedure returns the contents of that cluster's FAT entry in the AX register. This simple example ignores the fact that FAT entries can span sector boundaries.

To examine cluster 3's entry in the FAT, we multiply 3 by 1.5, which gives 4.5, and fetch the word at offset 0004H (which contains 0040H). Because the product of 3 and 1.5 is not a whole number, we shift the word right 4 bits, yielding the number 4, which is the number of the third cluster assigned to IBMBIO.COM.

In this manner, we can follow the chain through the FAT until we come to a cluster (number 23, in this case) whose FAT entry contains the value 0FFFH, which is an end-of-file marker in FATs with 12-bit entries.

We have now established that the file IBMBIO.COM contains clusters 2 through 23 (02H-17H), from which we can calculate that logical sectors 0CH through 37H are assigned to the file (22 clusters of 2 sectors each, beginning at sector 0CH).
Of course, the last cluster may be only partially filled with actual data; the number of bytes used in the last cluster is the remainder when the file's size in bytes (found in the directory entry) is divided by the number of bytes per cluster. (A remainder of zero means that the last cluster is full.) For IBMBIO.COM, whose directory entry in Figure 10-6 gives a size of 5654H (22,100) bytes, the 1024-byte clusters leave 596 bytes in use in the last cluster.

Fixed-Disk Partitions

Fixed disks have another layer of organization beyond the logical volume structure already discussed: partitions. The FDISK utility divides a fixed disk into one or more partitions consisting of an integral number of cylinders. Each partition can contain an independent file system and, for that matter, its own copy of an operating system.

The first physical sector on a fixed disk (track 0, head 0, sector 1) contains the master boot record, which is laid out as follows:

Bytes        Contents
──────────────────────────────────────────────────────────────────────────
000-1BDH     Reserved
1BE-1CDH     Partition #1 descriptor
1CE-1DDH     Partition #2 descriptor
1DE-1EDH     Partition #3 descriptor
1EE-1FDH     Partition #4 descriptor
1FE-1FFH     Signature word (AA55H)
──────────────────────────────────────────────────────────────────────────

The partition descriptors in the master boot record define the size, location, and type of each partition, as follows:

Byte(s)      Contents
──────────────────────────────────────────────────────────────────────────
00H          Active flag (0 = not bootable, 80H = bootable)
01H          Starting head
02H-03H      Starting cylinder/sector
04H          Partition type
               00H   not used
               01H   FAT file system, 12-bit FAT entries
               04H   FAT file system, 16-bit FAT entries
               05H   extended partition
               06H   "huge partition" (MS-DOS versions 4.0 and later)
05H          Ending head
06H-07H      Ending cylinder/sector
08H-0BH      Starting sector for partition, relative to beginning of disk
0CH-0FH      Partition length in sectors
──────────────────────────────────────────────────────────────────────────

The active flag, which indicates that the partition is bootable, can be set on only one partition at a time.

MS-DOS treats partition types 1, 4, and 6 as normal logical volumes and assigns them their own drive identifiers during the system boot process. Partition type 5 can contain multiple logical volumes and has a special extended boot record that describes each volume. The FORMAT utility initializes MS-DOS fixed-disk partitions, creating the file system within the partition (boot record, file allocation table, root directory, and files area) and optionally placing a bootable copy of the operating system in the file system.

Figure 10-11 contains a partial hex dump of a master block from a fixed disk formatted under PC-DOS version 3.3. This dump illustrates the partition descriptors for a normal partition with a 16-bit FAT and an extended partition.

──────────────────────────────────────────────────────────────────────────
0000  .
      .
      .
0180  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0190  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01A0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01B0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 01
01C0  01 00 04 04 D1 02 11 00 00 00 EE FF 00 00 00 00
01D0  C1 04 05 04 D1 FD 54 00 01 00 02 53 00 00 00 00
01E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 AA
──────────────────────────────────────────────────────────────────────────

Figure 10-11. A partial hex dump of a master block from a fixed disk formatted under PC-DOS version 3.3. This disk contains two partitions. The first partition has a 16-bit FAT and is marked "active" to indicate that it contains a bootable copy of PC-DOS. The second partition is an "extended" partition. The third and fourth partition entries are not used in this example.

──────────────────────────────────────────────────────────────────────────

Chapter 11  Memory Management

Current versions of MS-DOS can manage as much as 1 megabyte of contiguous random-access memory.
On IBM PCs and compatibles, the memory occupied by MS-DOS and other programs starts at address 0000H and may reach as high as address 09FFFFH; this 640 KB area of RAM is sometimes referred to as conventional memory. Memory above this address is reserved for ROM hardware drivers, video refresh buffers, and the like. Computers that are not IBM compatible may use other memory layouts.

The RAM area under the control of MS-DOS is divided into two major sections:

■ The operating-system area

■ The transient-program area

The operating-system area starts at address 0000H; that is, it occupies the lowest portion of RAM. It holds the interrupt vector table, the operating system proper and its tables and buffers, any additional installable drivers specified in the CONFIG.SYS file, and the resident part of the COMMAND.COM command interpreter. The amount of memory occupied by the operating-system area varies with the version of MS-DOS used, the number of disk buffers, the size of installed device drivers, and so forth.

The transient-program area (TPA), sometimes called the memory arena, is the remainder of memory above the operating-system area. The memory arena is dynamically allocated in blocks called arena entries. Each arena entry has a special control structure called an arena header, and all of the arena headers are chained together.

Three MS-DOS Int 21H functions allow programs to allocate, release, and resize blocks of memory from the TPA:

Function     Action
──────────────────────────────────────────────────────────────────────────
48H          Allocate memory block.
49H          Release memory block.
4AH          Resize memory block.
──────────────────────────────────────────────────────────────────────────

MS-DOS itself uses these functions when loading a program from disk at the request of COMMAND.COM or another program.
The EXEC function, which is the MS-DOS program loader, calls Int 21H Function 48H to allocate a memory block for the loaded program's environment and another for the program itself and its program segment prefix. It then reads the program from the disk into the assigned memory area. When the program terminates, MS-DOS calls Int 21H Function 49H to release all memory owned by the program.

Transient programs can also employ the MS-DOS memory-management functions to dynamically manage the memory available in the TPA. Proper use of these functions is one of the most important criteria of whether a program is well behaved under MS-DOS. Well-behaved programs are most likely to be portable to future versions of the operating system and least likely to cause interference with other processes under multitasking user interfaces such as Microsoft Windows.

Using the Memory-Allocation Functions

The memory-allocation functions have two common uses:

■ To shrink a program's initial memory allocation so that there is enough room to load and execute another program under its control.

■ To dynamically allocate additional memory required by the program and to release the same memory when it is no longer needed.

Shrinking the Initial Memory Allocation

Although many MS-DOS application programs simply assume they own all memory, this assumption is a relic of MS-DOS version 1 (and CP/M), which could support only one active process at a time. Well-behaved MS-DOS programs take pains to modify only memory that they actually own and to release any memory that they don't need.

Unfortunately, under current versions of MS-DOS, the amount of memory that a program will own is not easily predicted in advance.
It turns out that the amount of memory allocated to a program when it is first loaded depends upon two factors:

■ The type of file the program is loaded from

■ The amount of memory available in the TPA

MS-DOS always allocates all of the largest available memory block in the TPA to programs loaded from .COM (memory-image) files. Because .COM programs contain no file header that can pass segment and memory-use information to MS-DOS, MS-DOS simply assumes the worst case and gives such a program everything. MS-DOS will load the program as long as there is an available memory block as large as the size of the file plus 256 bytes for the PSP and 2 bytes for the stack. The .COM program, when it receives control, must determine whether enough memory is available to carry out its functions.

MS-DOS uses more complicated rules to allocate memory to programs loaded from .EXE files. First, of course, a memory block large enough to hold the declared code, data, and stack segments must be available in the TPA. In addition, the linker sets two fields in a .EXE file's header to inform MS-DOS about the program's memory requirements. The first field, MIN_ALLOC, defines the minimum number of paragraphs required by the program, in addition to those for the code, data, and stack segments. The second, MAX_ALLOC, defines the maximum number of paragraphs of additional memory the program would use if they were available.

When loading a .EXE file, MS-DOS first attempts to allocate the number of paragraphs in MAX_ALLOC plus the number of paragraphs required by the program itself. If that much memory is not available, MS-DOS assigns all of the largest available block to the program, provided that this is at least the amount specified by MIN_ALLOC plus the size of the program image. If that condition is not satisfied, the program cannot be executed.
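The loading rules just described can be modeled in a few lines of C. This is a simplified model of the rules as stated in the text, not a description of the actual MS-DOS loader code: the function names are hypothetical, all sizes except file_bytes are in 16-byte paragraphs, the return value 0 stands for "cannot be loaded," and rounding the .COM requirement up to a whole paragraph is an assumption of this sketch.

```c
#include <assert.h>

/* Paragraphs assigned to a .EXE program, or 0 if it cannot be loaded.
   image_paras  = paragraphs required by the program itself
   min_alloc    = MIN_ALLOC field from the .EXE header
   max_alloc    = MAX_ALLOC field from the .EXE header
   largest_free = size of the largest available block in the TPA */
unsigned long exe_allocation(unsigned long image_paras,
                             unsigned min_alloc, unsigned max_alloc,
                             unsigned long largest_free)
{
    unsigned long wanted   = image_paras + max_alloc;
    unsigned long required = image_paras + min_alloc;

    assert(image_paras != 0);
    if (largest_free >= wanted)
        return wanted;          /* full MAX_ALLOC request satisfied */
    if (largest_free >= required)
        return largest_free;    /* program gets the whole block */
    return 0;                   /* not even MIN_ALLOC is available */
}

/* Paragraphs assigned to a .COM program, or 0 if it cannot be loaded.
   The program needs the file size plus 256 bytes for the PSP and
   2 bytes for the stack; if that fits, it receives the entire block. */
unsigned long com_allocation(unsigned long file_bytes,
                             unsigned long largest_free)
{
    unsigned long required = (file_bytes + 256 + 2 + 15) / 16;

    return largest_free >= required ? largest_free : 0;
}
```

For example, a .EXE file whose MAX_ALLOC field is 0FFFFH can never have its full request satisfied in a 640 KB system, so in this model such a program always receives the entire largest block, just as a .COM program does.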
After a .COM or .EXE program is loaded and running, it can use Int 21H Function 4AH (Resize Memory Block) to release all the memory it does not immediately need. This is conveniently done right after the program receives control from MS-DOS, by calling the resize function with the segment of the program's PSP in the ES register and the number of paragraphs that the program requires to run in the BX register (Figure 11-1).

──────────────────────────────────────────────────────────────────────────
        .
        .
        .
        org     100h

main    proc    near            ; entry point from MS-DOS
                                ; DS, ES = PSP address

        mov     sp,offset stk   ; COM program must move
                                ; stack to safe area

                                ; release extra memory...
        mov     ah,4ah          ; function 4Ah =
                                ; resize memory block
                                ; BX = paragraphs to keep
        mov     bx,(offset stk - offset main + 10FH) / 16
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if resize failed
        .
        .
        .
main    endp
        .
        .
        .
        dw      64 dup (?)      ; new stack area
stk     equ     $               ; new base of stack

        end     main            ; defines entry point
──────────────────────────────────────────────────────────────────────────

Figure 11-1. An example of a .COM program releasing excess memory after it receives control from MS-DOS. Int 21H Function 4AH is called with ES pointing to the program's PSP and BX containing the number of paragraphs that the program needs to execute. In this case, the new size for the program's memory block is calculated as the program image size plus the size of the PSP (256 bytes), rounded up to the next paragraph. .EXE programs use similar code.

Dynamic Allocation of Additional Memory

When a well-behaved program needs additional memory space (for an I/O buffer or an array of intermediate results, for example), it can call Int 21H Function 48H (Allocate Memory Block) with the desired number of paragraphs. If a sufficiently large block of unallocated memory is available, MS-DOS returns the segment address of the base of the assigned area and clears the carry flag (0), indicating that the function was successful.
If no unallocated block of sufficient size is available, MS-DOS sets the carry flag (1), returns an error code in the AX register, and returns the size (in paragraphs) of the largest block available in the BX register (Figure 11-2). In this case, no memory has yet been allocated. The program can use the value returned in the BX register to determine whether it can continue in a "degraded" fashion, with less memory. If it can, it must call Int 21H Function 48H again to allocate the smaller memory block.

When the MS-DOS memory manager is searching the chain of arena headers to satisfy a memory-allocation request, it can use one of the following strategies:

  ■ First fit: Use the arena entry at the lowest address that is large enough to satisfy the request.

  ■ Best fit: Use the smallest arena entry that will satisfy the request, regardless of its location.

  ■ Last fit: Use the arena entry at the highest address that is large enough to satisfy the request.

──────────────────────────────────────────────────────────────────────────
        .
        .
        .
        mov     ah,48h          ; function 48h = allocate mem block
        mov     bx,0800h        ; 800h paragraphs = 32 KB
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if allocation failed
        mov     buff_seg,ax     ; save segment of allocated block
        .
        .
        .
        mov     es,buff_seg     ; ES:DI = address of block
        xor     di,di
        mov     cx,08000h       ; store 32,768 bytes
        mov     al,0ffh         ; fill buffer with -1s
        cld
        rep stosb               ; now perform fast fill
        .
        .
        .
        mov     cx,08000h       ; length to write, bytes
        mov     bx,handle       ; handle for prev opened file
        push    ds              ; save our data segment
        mov     ds,buff_seg     ; let DS:DX = buffer address
        mov     dx,0
        mov     ah,40h          ; function 40h = write
        int     21h             ; transfer to MS-DOS
        pop     ds              ; restore our data segment
        jc      error           ; jump if write failed
        .
        .
        .
        mov     es,buff_seg     ; ES = seg of prev allocated block
        mov     ah,49h          ; function 49h = release mem block
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if release failed
        .
error:  .
        .
handle  dw      0               ; file handle
buff_seg dw     0               ; segment of allocated block
        .
        .
        .
──────────────────────────────────────────────────────────────────────────

Figure 11-2. Example of dynamic memory allocation. The program requests a 32 KB memory block from MS-DOS, fills it with -1s, writes it to disk, and then releases it.

If the arena entry selected is larger than the size requested, MS-DOS divides it into two parts: one block of the size requested, which is assigned to the program that called Int 21H Function 48H, and an unowned block containing the remaining memory.

The default MS-DOS allocation strategy is first fit. However, under MS-DOS versions 3.0 and later, an application program can change the strategy with Int 21H Function 58H.

When a program is through with an allocated memory block, it should use Int 21H Function 49H to release the block. If it does not, MS-DOS will automatically release all memory allocations for the program when it terminates.

Arena Headers

Microsoft has not officially documented the internal structure of arena headers for the outside world at present. This is probably to deter programmers from trying to manipulate their memory allocations directly instead of through the MS-DOS functions provided for that purpose.

Arena headers have identical structures in MS-DOS versions 2 and 3. They are 16 bytes (one paragraph) and are located immediately before the memory area that they control (Figure 11-3). An arena header contains the following information:

  ■ A byte signifying whether the header is a member or the last entry in the entire chain of such headers

  ■ A word indicating whether the area it controls is available or whether it already belongs to a program (if the latter, the word points to the program's PSP)

  ■ A word indicating the size (in paragraphs) of the controlled memory area (arena entry)

MS-DOS inspects the chain of arena headers whenever the program requests a memory-block allocation, modification, or release function, or when a program is EXEC'd or terminated.
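To make the idea of a chain concrete, the following C sketch walks a chain of arena headers held in an ordinary byte array and totals the unowned paragraphs. It is an illustration only. The field positions are assumptions that are not given in the text above (a signature byte at offset 0 holding 'M' for a member or 'Z' for the last entry, the owner word at offset 1, and the size word at offset 3, both low byte first), the function and type names are hypothetical, and the code operates on a simulated memory image rather than on real MS-DOS memory.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int      is_last;      /* nonzero if this header ends the chain  */
    uint16_t owner_psp;    /* 0 = unowned, else segment of owner PSP */
    uint16_t paragraphs;   /* size of the controlled memory area     */
} ArenaInfo;

/* Decode the header at paragraph 'seg' of the simulated image.
   Returns 0 on success, -1 if the header looks corrupted. */
int read_arena(const uint8_t *mem, size_t mem_paras, size_t seg,
               ArenaInfo *out)
{
    const uint8_t *h;

    if (seg >= mem_paras)
        return -1;                         /* ran off the image      */
    h = mem + seg * 16;
    if (h[0] != 'M' && h[0] != 'Z')
        return -1;                         /* bad signature byte     */
    out->is_last    = (h[0] == 'Z');
    out->owner_psp  = (uint16_t)(h[1] | (h[2] << 8));
    out->paragraphs = (uint16_t)(h[3] | (h[4] << 8));
    return 0;
}

/* Follow the chain from 'first_seg' and return the total number of
   unowned paragraphs, or -1 if the chain is broken. */
long count_free_paragraphs(const uint8_t *mem, size_t mem_paras,
                           size_t first_seg)
{
    long   free_paras = 0;
    size_t seg = first_seg;

    for (;;) {
        ArenaInfo a;

        if (read_arena(mem, mem_paras, seg, &a) != 0)
            return -1;
        if (a.owner_psp == 0)
            free_paras += a.paragraphs;
        if (a.is_last)
            return free_paras;
        seg += (size_t)a.paragraphs + 1;   /* next header follows area */
    }
}
```

Each header is followed immediately by the area it controls, so the next header is found one paragraph past the end of that area; a signature byte that is neither value is what this sketch treats as a broken chain.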
If any of the blocks appear to be corrupted or if the chain is broken, MS-DOS displays the dreaded message

  Memory allocation error

and halts the system.

In the example illustrated in Figure 11-3, COMMAND.COM originally loaded PROGRAM1.COM into the TPA and, because it was a .COM file, COMMAND.COM allocated it all of the TPA, controlled by arena header #1. PROGRAM1.COM then used Int 21H Function 4AH (Resize Memory Block) to shrink its memory allocation to the amount it actually needed to run and loaded and executed PROGRAM2.EXE with the EXEC function (Int 21H Function 4BH). The EXEC function obtained a suitable amount of memory, controlled by arena header #2, and loaded PROGRAM2.EXE into it. PROGRAM2.EXE, in turn, needed some additional memory to store some intermediate results, so it called Int 21H Function 48H (Allocate Memory Block) to obtain the area controlled by arena header #3. The highest arena header (#4) controls all of the remaining TPA that has not been allocated to any program.

┌─────────────────────────────────────────────────┐ Top of RAM
│ Unowned RAM controlled by header #4             │ controlled by MS-DOS
├─────────────────────────────────────────────────┤
│ Arena header #4                                 │
├─────────────────────────────────────────────────┤
│ Memory area controlled by header #3; additional │
│ storage dynamically allocated by PROGRAM2.EXE   │
├─────────────────────────────────────────────────┤
│ Arena header #3                                 │
├─────────────────────────────────────────────────┤
│ Memory area controlled by header #2,            │
│ containing PROGRAM2.EXE                         │
├─────────────────────────────────────────────────┤
│ Arena header #2                                 │
├─────────────────────────────────────────────────┤
│ Memory area controlled by header #1,            │
│ containing PROGRAM1.COM                         │
├─────────────────────────────────────────────────┤
│ Arena header #1                                 │
└─────────────────────────────────────────────────┘ Bottom of transient-
                                                    program area

Figure 11-3. An example diagram of MS-DOS arena headers and the transient-program area.
The environment blocks and their associated headers have been omitted from this figure to increase its clarity.

Lotus/Intel/Microsoft Expanded Memory

When the IBM Personal Computer and MS-DOS were first released, the 640 KB limit that IBM placed on the amount of RAM that could be directly managed by MS-DOS seemed almost unimaginably huge. But as MS-DOS has grown in both size and capabilities and the popular applications have become more powerful, that 640 KB has begun to seem a bit crowded. Although personal computers based on the 80286 and 80386 have the potential to manage up to 16 megabytes of RAM under operating systems such as MS OS/2 and XENIX, this is little comfort to the millions of users of 8086/8088-based computers and MS-DOS.

At the spring COMDEX in 1985, Lotus Development Corporation and Intel Corporation jointly announced the Expanded Memory Specification 3.0 (EMS), which was designed to head off rapid obsolescence of the older PCs because of limited memory. Shortly afterward, Microsoft announced that it would support the EMS and would enhance Microsoft Windows to use the memory made available by EMS hardware and software. EMS versions 3.2 and 4.0, released in fall 1985 and summer 1987, expanded support for multitasking operating systems.

The LIM EMS (as it is usually known) has been an enormous success. EMS memory boards are available from scores of manufacturers, and "EMS-aware" software (especially spreadsheets, disk caches, and terminate-and-stay-resident utilities) has become the rule rather than the exception.

What Is Expanded Memory?

The Lotus/Intel/Microsoft Expanded Memory Specification is a functional definition of a bank-switched memory-expansion subsystem. It consists of hardware expansion modules and a resident driver program specific to those modules.
In EMS versions 3.0 and 3.2, the expanded memory is made available to application software as 16 KB pages mapped into a contiguous 64 KB area called the page frame, somewhere above the main memory area used by MS-DOS/PC-DOS (0-640 KB). The exact location of the page frame is user configurable, so it need not conflict with other hardware options. In EMS version 4.0, the pages may be mapped anywhere in memory and can have sizes other than 16 KB.

The EMS provides a uniform means for applications to access as much as 8 megabytes of memory (32 megabytes in EMS 4.0). The supporting software, which is called the Expanded Memory Manager (EMM), provides a hardware-independent interface between application software and the expanded memory board(s). The EMM is supplied in the form of an installable device driver that you link into the MS-DOS/PC-DOS system by adding a line to the CONFIG.SYS file on the system boot disk.

Internally, the Expanded Memory Manager consists of two major portions, which may be referred to as the driver and the manager. The driver portion mimics some of the actions of a genuine installable device driver, in that it includes initialization and output status functions and a valid device header. The second, and major, portion of the EMM is the true interface between application software and the expanded-memory hardware. Several classes of services are provided:

  ■ Verification of functionality of hardware and software modules

  ■ Allocation of expanded-memory pages

  ■ Mapping of logical pages into the physical page frame

  ■ Deallocation of expanded-memory pages

  ■ Support for multitasking operating systems

Application programs communicate with the EMM directly, by means of software Int 67H. MS-DOS versions 3.3 and earlier take no part in (and in fact are completely oblivious to) any expanded-memory manipulations that may occur. MS-DOS version 4.0 and Microsoft Windows, on the other hand, are "EMS-aware" and can use the EMS memory when it is available.
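The page-frame arrangement of EMS versions 3.0 and 3.2 reduces to simple arithmetic: four 16 KB physical pages lie side by side in the 64 KB frame, so each begins 400H paragraphs after the one before it. The C sketch below shows that calculation. The function names are hypothetical, and the frame segment D000H used in the example is only an assumed value, because the real location is user configurable and must be obtained from the EMM.

```c
#include <stdint.h>

#define EMS_PAGE_BYTES   16384u   /* 16 KB per page (EMS 3.0 and 3.2) */
#define EMS_FRAME_PAGES  4u       /* 64 KB frame = 4 physical pages   */

/* Segment at which physical page 'phys_page' (0-3) appears, given the
   segment of the page frame. Returns 0 for an illegal page number. */
uint16_t ems_physical_page_segment(uint16_t frame_seg, unsigned phys_page)
{
    if (phys_page >= EMS_FRAME_PAGES)
        return 0;
    return (uint16_t)(frame_seg + phys_page * (EMS_PAGE_BYTES / 16u));
}

/* Bytes of expanded memory represented by a number of 16 KB pages. */
uint32_t ems_pages_to_bytes(uint32_t pages)
{
    return pages * EMS_PAGE_BYTES;
}
```

With a frame at D000H, physical page 3 appears at segment DC00H, and the 8 megabytes supported by the earlier EMS versions correspond to 512 pages.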
Expanded memory should not be confused with extended memory. Extended memory is the term used by IBM to refer to the memory at physical addresses above 1 megabyte that can be accessed by an 80286 or 80386 CPU in protected mode. Current versions of MS-DOS run the 80286 and 80386 in real mode (8086-emulation mode), and extended memory is therefore not directly accessible.

Checking for Expanded Memory

An application program can use either of two methods to test for the existence of the Expanded Memory Manager:

  ■ Issue an open request (Int 21H Function 3DH) using the guaranteed device name of the EMM driver: EMMXXXX0. If the open function succeeds, either the driver is present or a file with the same name coincidentally exists on the default disk drive. To rule out the latter, the application can use IOCTL (Int 21H Function 44H) subfunctions 00H and 07H to ensure that EMM is present. In either case, the application should then use Int 21H Function 3EH to close the handle that was obtained from the open function, so that the handle can be reused for another file or device.

  ■ Use the address that is found in the Int 67H vector to inspect the device header of the presumed EMM. Interrupt handlers and device drivers must use this method. If the EMM is present, the name field at offset 0AH of the device header contains the string EMMXXXX0. This approach is nearly foolproof and avoids the relatively high overhead of an MS-DOS open function. However, it is somewhat less well behaved because it involves inspection of memory that does not belong to the application.

These two methods of testing for the existence of the Expanded Memory Manager are illustrated in Figures 11-4 and 11-5.

──────────────────────────────────────────────────────────────────────────
        .
        .
        .
                                ; attempt to "open" EMM...
        mov     dx,seg emm_name ; DS:DX = address of name
        mov     ds,dx           ; of Expanded Memory Manager
        mov     dx,offset emm_name
        mov     ax,3d00h        ; function 3dh, mode = 00h
                                ; = open, read only
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if open failed

                                ; open succeeded, be sure
                                ; it was not a file...
        mov     bx,ax           ; BX = handle from open
        mov     ax,4400h        ; function 44h subfunction 00h
                                ; = IOCTL get device information
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if IOCTL call failed
        and     dx,80h          ; bit 7 = 1 if character device
        jz      error           ; jump if it was a file

                                ; EMM is present, be sure
                                ; it is available...
                                ; (BX still contains handle)
        mov     ax,4407h        ; function 44h subfunction 07h
                                ; = IOCTL get output status
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if IOCTL call failed
        or      al,al           ; test device status
        jz      error           ; if AL = 0 EMM is not available

                                ; now close handle ...
                                ; (BX still contains handle)
        mov     ah,3eh          ; function 3eh = close
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if close failed
        .
        .
        .
emm_name db     'EMMXXXX0',0    ; guaranteed device name for
                                ; Expanded Memory Manager
──────────────────────────────────────────────────────────────────────────

Figure 11-4. Testing for the Expanded Memory Manager by means of the MS-DOS open and IOCTL functions.

──────────────────────────────────────────────────────────────────────────
emm_int equ     67h             ; Expanded Memory Manager
                                ; software interrupt
        .
        .
        .
                                ; first fetch contents of
                                ; EMM interrupt vector...
        mov     al,emm_int      ; AL = EMM int number
        mov     ah,35h          ; function 35h = get vector
        int     21h             ; transfer to MS-DOS
                                ; now ES:BX = handler address

                                ; assume ES:0000 points
                                ; to base of the EMM...
        mov     di,10           ; ES:DI = address of name
                                ; field in device header
                                ; DS:SI = EMM driver name
        mov     si,seg emm_name
        mov     ds,si
        mov     si,offset emm_name
        mov     cx,8            ; length of name field
        cld
        repz cmpsb              ; compare names...
        jnz     error           ; jump if driver absent
        .
        .
        .
emm_name db     'EMMXXXX0'      ; guaranteed device name for
                                ; Expanded Memory Manager
──────────────────────────────────────────────────────────────────────────

Figure 11-5. Testing for the Expanded Memory Manager by inspection of the name field in the driver's device header.

Using Expanded Memory

After establishing that the memory-manager software is present, the application program communicates with it directly by means of the "user interrupt" 67H, bypassing MS-DOS/PC-DOS. The calling sequence for the EMM is as follows:

──────────────────────────────────────────────────────────────────────────
        mov     ah,function     ; AH determines service type
        .                       ; load other registers with
        .                       ; values specific to the
        .                       ; requested service
        int     67h
──────────────────────────────────────────────────────────────────────────

In general, AH contains the EMM function number, AL holds the subfunction number (if any), BX holds a number of pages (if applicable), and DX contains an EMM handle. Registers DS:SI and ES:DI are used to pass the addresses of arrays or buffers. Section 4 of this book, "Lotus/Intel/Microsoft EMS Functions Reference," details each of the expanded memory functions.

Upon return from an EMM function, the AH register contains zero if the function was successful; otherwise, it contains an error code with the most significant bit set (Figures 11-6 and 11-7). Other values are typically returned in the AL and BX registers or in a user-specified buffer.

Error code    Meaning
──────────────────────────────────────────────────────────────────────────
00H           Function successful.
80H           Internal error in Expanded Memory Manager software (could
              be caused by corrupted memory image of driver).
81H           Malfunction in expanded-memory hardware.
82H           Memory manager busy.
83H           Invalid handle.
84H           Function requested by application not defined.
85H           No more handles available.
86H           Error in save or restore of mapping context.
87H           Allocation request specified more logical pages than
              physically available in system; no pages allocated.
88H           Allocation request specified more logical pages than
              currently available in system (request does not exceed
              physical pages that exist, but some are already allocated
              to other handles); no pages allocated.
89H           Zero pages; cannot be allocated.
8AH           Logical page requested to be mapped located outside range
              of logical pages assigned to handle.
8BH           Illegal physical page number in mapping request (not in
              range 0-3).
8CH           Page-mapping hardware-state save area full.
8DH           Save of mapping context failed; save area already contains
              context associated with requested handle.
8EH           Restore of mapping context failed; save area does not
              contain context for requested handle.
8FH           Subfunction parameter not defined.
──────────────────────────────────────────────────────────────────────────

Figure 11-6. Expanded Memory Manager error codes common to EMS versions 3.0, 3.2, and 4.0. After a call to EMM, the AH register contains zero if the function was successful or an error code in the range 80H through 8FH if the function failed.

Error code    Meaning
──────────────────────────────────────────────────────────────────────────
90H           Attribute type not defined.
91H           Feature not supported.
92H           Source and destination memory regions have same handle and
              overlap; requested move was performed, but part of source
              region was overwritten.
93H           Specified length for source or destination memory region
              is longer than actual allocated length.
94H           Conventional-memory region and expanded-memory region
              overlap.
95H           Specified offset is outside logical page.
96H           Region length exceeds 1 MB.
97H           Source and destination memory regions have same handle and
              overlap; exchange cannot be performed.
98H           Memory source and destination types undefined.
99H           This error code currently unused.
9AH           Alternate map or DMA register sets supported, but the
              alternate register set specified is not supported.
9BH           Alternate map or DMA register sets supported, but all
              alternate register sets currently allocated.
9CH           Alternate map or DMA register sets not supported, and
              specified alternate register set not zero.
9DH           Alternate map or DMA register sets supported, but
              alternate register set specified is either not defined or
              not allocated.
9EH           Dedicated DMA channels not supported.
9FH           Dedicated DMA channels supported, but specified DMA
              channel not supported.
A0H           No handle found for specified name.
A1H           Handle with this name already exists.
A2H           Memory address wrap; sum of the source or destination
              region base address and length exceeds 1 MB.
A3H           Invalid pointer passed to function, or contents of source
              array corrupted.
A4H           Access to function denied by operating system.
──────────────────────────────────────────────────────────────────────────

Figure 11-7. Expanded Memory Manager error codes unique to EMS version 4.0. Most of these errors are related to the EMS functions for use by operating systems and would not normally be encountered by application programs.

An application program that uses expanded memory should regard that memory as a system resource, like a file or a device, and employ only the documented EMM services to allocate, access, and release expanded-memory pages. Such a program can use the following general strategy:

1. Establish the presence of the Expanded Memory Manager by one of the two methods demonstrated in Figures 11-4 and 11-5.

2. After the driver is known to be present, check its operational status with EMS Function 40H.

3. Check the version number of EMM with EMS Function 46H, to ensure that all services the application will request are available.

4. Obtain the segment of the page frame used by EMM with EMS Function 41H.

5. Allocate the desired number of expanded-memory pages with EMS Function 43H. If the allocation is successful, EMM returns a handle that the application can use to refer to the expanded-memory pages that it owns.
   This step is exactly analogous to opening a file and using the handle obtained from the open function for read/write operations on the file.

6. If the requested number of pages is not available, the application can query EMM for the actual number of pages available (EMS Function 42H) and determine whether it can continue.

7. After the application has successfully allocated the needed number of expanded-memory pages, it uses EMS Function 44H to map logical pages in and out of the physical page frame in order to store and retrieve data in expanded memory.

8. When the program finishes using its expanded-memory pages, it must release them by calling EMS Function 45H. Otherwise, the pages will be lost to use by other programs until the system is restarted.

Figure 11-8 shows a skeleton program that illustrates this general approach.

An interrupt handler or device driver that uses EMS follows the same general procedure outlined in steps 1 through 8, with a few minor variations. It may need to acquire an EMS handle and allocate pages before the operating system is fully functional; in particular, you cannot assume that the MS-DOS Open File or Device, IOCTL, and Get Interrupt Vector functions are available. Thus, such a handler or driver must use a modified version of the "get interrupt vector" technique (Figure 11-5) to test for the existence of EMM, fetching the contents of the Int 67H vector directly.

A device driver or interrupt handler typically owns its expanded-memory pages permanently (until the system is restarted) and never deallocates them. Such a program must also take care to save and restore EMM's page-mapping context (EMS Functions 47H and 48H) whenever it accesses expanded memory, so that use of EMS by a foreground program will not be disturbed.

The EMM relies on the good behavior of application software to avoid the corruption of expanded memory.
If several applications that use expanded memory are running under a multitasking manager such as Microsoft Windows and one or more of them does not abide strictly by EMM conventions, the data of some or all of the applications may be destroyed.

──────────────────────────────────────────────────────────────────────────
        .
        .
        .
        mov     ah,40h          ; test EMM status
        int     67h
        or      ah,ah
        jnz     error           ; jump if bad status from EMM

        mov     ah,46h          ; check EMM version
        int     67h
        or      ah,ah
        jnz     error           ; jump if couldn't get version
        cmp     al,030h         ; make sure at least ver 3.0
        jb      error           ; jump if wrong EMM version

        mov     ah,41h          ; get page frame segment
        int     67h
        or      ah,ah
        jnz     error           ; jump if failed to get frame
        mov     page_frame,bx   ; save segment of page frame

        mov     ah,42h          ; get number of available pages
        int     67h
        or      ah,ah
        jnz     error           ; jump if get pages error
        mov     total_pages,dx  ; save total EMM pages
        mov     avail_pages,bx  ; save available EMM pages
        or      bx,bx
        jz      error           ; abort if no pages available

        mov     ah,43h          ; try to allocate EMM pages
        mov     bx,needed_pages
        int     67h             ; if allocation is successful
        or      ah,ah
        jnz     error           ; jump if allocation failed

        mov     emm_handle,dx   ; save handle for allocated pages
        .
        .                       ; now we are ready for other
        .                       ; processing using EMM pages
        .
                                ; map in EMS memory page...
        mov     bx,log_page     ; BX <- EMS logical page number
        mov     al,phys_page    ; AL <- EMS physical page (0-3)
        mov     dx,emm_handle   ; EMM handle for our pages
        mov     ah,44h          ; function 44h = map EMS page
        int     67h
        or      ah,ah
        jnz     error           ; jump if mapping error
        .
        .
        .
                                ; program ready to terminate,
                                ; give up allocated EMM pages...
        mov     dx,emm_handle   ; handle for our pages
        mov     ah,45h          ; EMS function 45h = release pages
        int     67h
        or      ah,ah
        jnz     error           ; jump if release failed
        .
        .
        .
──────────────────────────────────────────────────────────────────────────

Figure 11-8. A program illustrating the general strategy for using expanded memory.
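EMS Function 44H in Figure 11-8 takes a logical page number, so a program that treats its expanded-memory allocation as one long array must first split a byte position into a logical page and an offset within that page. The C sketch below shows that bookkeeping, assuming the 16 KB pages of EMS versions 3.0 and 3.2. The names EmsLocation, ems_locate, ems_pages_needed, and ems_call_failed are hypothetical and are not part of the EMS; the sketch performs no Int 67H calls itself.

```c
#include <stdint.h>

#define EMS_PAGE_BYTES 16384u     /* 16 KB logical page */

typedef struct {
    uint16_t logical_page;        /* value to load into BX for 44H  */
    uint16_t offset;              /* offset within the mapped page  */
} EmsLocation;

/* Split a byte position within an allocation into page and offset. */
EmsLocation ems_locate(uint32_t byte_position)
{
    EmsLocation loc;

    loc.logical_page = (uint16_t)(byte_position / EMS_PAGE_BYTES);
    loc.offset       = (uint16_t)(byte_position % EMS_PAGE_BYTES);
    return loc;
}

/* Number of pages to request from Function 43H for 'bytes' of data. */
uint16_t ems_pages_needed(uint32_t bytes)
{
    return (uint16_t)((bytes + EMS_PAGE_BYTES - 1u) / EMS_PAGE_BYTES);
}

/* An EMM call succeeded only if AH came back as zero; any error
   code has its most significant bit set. */
int ems_call_failed(uint8_t ah)
{
    return ah != 0;
}
```

A record at byte position 40,000, for instance, lies in logical page 2 at offset 7232; once that page is mapped to a physical page, the data is addressed at that offset from the physical page's segment.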
Extended Memory

Extended memory is RAM storage at addresses above 1 megabyte (100000H) that can be accessed by an 80286 or 80386 processor running in protected mode. IBM PC/AT- and PS/2-compatible machines can (theoretically) have as much as 15 MB of extended memory installed, in addition to the usual 1 MB of conventional memory.

Protected-mode operating systems such as Microsoft XENIX or MS OS/2 can use extended memory for execution of programs. MS-DOS, on the other hand, runs in real mode on an 80286 or 80386, and programs running under its control cannot ordinarily execute from extended memory or even address that memory for storage of data. However, the ROM BIOS contains two routines that allow real-mode programs restricted access to extended memory:

ROM BIOS function        Action
──────────────────────────────────────────────────────────────────────────
Int 15H Function 87H     Move extended-memory block.
Int 15H Function 88H     Get extended-memory size.
──────────────────────────────────────────────────────────────────────────

These routines can be used by electronic disks (RAMdisks) and by other programs that want to use extended memory for fast storage and retrieval of information that would otherwise have to be written to a slower physical disk drive. Section 3 of this book, "IBM ROM BIOS and Mouse Functions Reference," documents both of these functions.

You should use these ROM BIOS routines with caution. Data stored in extended memory is, of course, volatile; it is lost if the machine is turned off. The transfer of data to or from extended memory involves a switch from real mode to protected mode and back, which is a relatively slow process on 80286-based machines; in some cases it is only marginally faster than actually reading the data from a fixed disk. In addition, programs that use the ROM BIOS extended-memory functions are not compatible with the MS-DOS compatibility mode of MS OS/2.
Finally, a major shortcoming of these ROM BIOS functions is that they do not make any attempt to arbitrate between two or more programs or drivers that are using extended memory for temporary storage. For example, if an application program and an installed RAMdisk driver attempt to put data in the same area of extended memory, no error will be returned to either program, but the data of one or both may be destroyed.

Figure 11-9 shows an example of the code necessary to transfer data to and from extended memory.

──────────────────────────────────────────────────────────────────────────
bmdt    db      30h dup (0)     ; block move descriptor table

buff1   db      80h dup ('?')   ; source buffer
buff2   db      80h dup (0)     ; destination buffer
        .
        .
        .
                                ; copy 'buff1' to extended-
                                ; memory address 100000h
        mov     dx,10h          ; DX:AX = destination
        mov     ax,0            ; extended-memory address
        mov     bx,seg buff1    ; DS:BX = source conventional-
        mov     ds,bx           ; memory address
        mov     bx,offset buff1
        mov     cx,80h          ; CX = bytes to move
        mov     si,seg bmdt     ; ES:SI = block move
        mov     es,si           ; descriptor table
        mov     si,offset bmdt
        call    putblk          ; request transfer

                                ; fill buff2 from extended-
                                ; memory address 100000h
        mov     dx,10h          ; DX:AX = source extended-
        mov     ax,0            ; memory address
        mov     bx,seg buff2    ; DS:BX = destination
        mov     ds,bx           ; conventional-memory address
        mov     bx,offset buff2
        mov     cx,80h          ; CX = bytes to move
        mov     si,seg bmdt     ; ES:SI = block move
        mov     es,si           ; descriptor table
        mov     si,offset bmdt
        call    getblk          ; request transfer
        .
        .
        .
getblk  proc    near            ; transfer block from extended
                                ; memory to real memory
                                ; call with
                                ; DX:AX = source linear 32-bit
                                ;         extended-memory address
                                ; DS:BX = segment and offset
                                ;         destination address
                                ; CX    = length in bytes
                                ; ES:SI = block move descriptor
                                ;         table
                                ; returns
                                ; AH    = 0 if transfer OK

        mov     es:[si+10h],cx  ; store length into descriptors
        mov     es:[si+18h],cx

                                ; store access rights bytes
        mov     byte ptr es:[si+15h],93h
        mov     byte ptr es:[si+1dh],93h

        mov     es:[si+12h],ax  ; source extended-memory address
        mov     es:[si+14h],dl

                                ; convert destination segment
                                ; and offset to linear address
        mov     ax,ds           ; segment * 16
        mov     dx,16
        mul     dx
        add     ax,bx           ; + offset -> linear address
        adc     dx,0

        mov     es:[si+1ah],ax  ; store destination address
        mov     es:[si+1ch],dl

        shr     cx,1            ; convert length to words
        mov     ah,87h          ; int 15h function 87h = block move
        int     15h             ; transfer to ROM BIOS

        ret                     ; back to caller

getblk  endp


putblk  proc    near            ; transfer block from real
                                ; memory to extended memory
                                ; call with
                                ; DX:AX = dest linear 32-bit
                                ;         extended-memory address
                                ; DS:BX = segment and offset
                                ;         source address
                                ; CX    = length in bytes
                                ; ES:SI = block move descriptor
                                ;         table
                                ; returns
                                ; AH    = 0 if transfer OK

        mov     es:[si+10h],cx  ; store length into descriptors
        mov     es:[si+18h],cx

                                ; store access rights bytes
        mov     byte ptr es:[si+15h],93h
        mov     byte ptr es:[si+1dh],93h

        mov     es:[si+1ah],ax  ; store destination extended-
        mov     es:[si+1ch],dl  ; memory address

                                ; convert source segment and
                                ; offset to linear address
        mov     ax,ds           ; segment * 16
        mov     dx,16
        mul     dx
        add     ax,bx           ; + offset -> linear address
        adc     dx,0

        mov     es:[si+12h],ax  ; store source address
        mov     es:[si+14h],dl

        shr     cx,1            ; convert length to words
        mov     ah,87h          ; int 15h function 87h = block move
        int     15h             ; transfer to ROM BIOS

        ret                     ; back to caller

putblk  endp
──────────────────────────────────────────────────────────────────────────

Figure 11-9. Moving blocks of data between conventional memory and extended memory, using the ROM BIOS extended-memory functions.
For additional information on the format of the block move descriptor table, see the entry for Int 15H Function 87H in Section 3 of this book, "IBM ROM BIOS and Mouse Functions Reference." Note that you must specify the extended-memory address as a 32-bit linear address rather than as a segment and offset.

────────────────────────────────────────────────────────────────────────────

Chapter 12  The EXEC Function

The MS-DOS EXEC function (Int 21H Function 4BH) allows a program (called the parent) to load any other program (called the child) from a storage device, execute it, and then regain control when the child program is finished.

A parent program can pass information to the child in a command line, in default file control blocks, and by means of a set of strings called the environment block (discussed later in this chapter). All files or devices that the parent opened using the handle file-management functions are duplicated in the newly created child task; that is, the child inherits all the active handles of the parent task. Any file operations on those handles by the child, such as seeks or file I/O, also affect the file pointers associated with the parent's handles.

MS-DOS suspends execution of the parent program until the child program terminates. When the child program finishes its work, it can pass an exit code back to the parent, indicating whether it encountered any errors. It can also, in turn, load other programs, and so on through many levels of control, until the system runs out of memory.

The MS-DOS command interpreter, COMMAND.COM, uses the EXEC function to run its external commands and other application programs. Many popular commercial programs, such as database managers and word processors, use EXEC to run other programs (spelling checkers, for example) or to load a second copy of COMMAND.COM, thereby allowing the user to list directories or copy and rename files without closing all the application files and stopping the main work in progress.
EXEC can also be used to load program overlay segments, although this use is uncommon.

Making Memory Available

In order for a parent program to use the EXEC function to load a child program, sufficient unallocated memory must be available in the transient program area. When the parent itself was loaded, MS-DOS allocated it a variable amount of memory, depending upon its original file type (.COM or .EXE) and any other information that was available to the loader. (See Chapter 11 for further details.) Because the operating system has no foolproof way of predicting how much memory any given program will require, it generally allocates far more memory to a program than is really necessary.

Therefore, a prospective parent program's first action should be to use Int 21H Function 4AH (Resize Memory Block) to release any excess memory allocation of its own to MS-DOS. In this case, the program should call Int 21H Function 4AH with the ES register pointing to the program segment prefix of the program releasing memory and the BX register containing the number of paragraphs of memory to retain for that program. (See Figure 11-1 for an example.)

──────────────────────────────────────────────────────────────────────────
WARNING
  A .COM program must move its stack to a safe area if it is reducing its memory allocation to less than 64 KB.
──────────────────────────────────────────────────────────────────────────

Requesting the EXEC Function

To load and execute a child program, the parent must execute an Int 21H with the registers set up as follows:

  AH    = 4BH
  AL    = 00H (subfunction to load child program)
  DS:DX = segment:offset of pathname for child program
  ES:BX = segment:offset of parameter block

The parameter block, in turn, contains addresses of other information needed by the EXEC function.
The Program Name

The name of the program to be run, which the calling program provides to the EXEC function, must be an unambiguous file specification (no wildcard characters) and must include an explicit .COM or .EXE extension. If the path and disk drive are not supplied in the program name, MS-DOS uses the current directory and default disk drive. (The sequential search for .COM, .EXE, and .BAT files in all the locations listed in the PATH variable is not a function of EXEC, but rather of the internal logic of COMMAND.COM.)

You cannot EXEC a batch file directly; instead, you must EXEC a copy of COMMAND.COM and pass the name of the batch file in the command tail, along with the /C switch.

The Parameter Block

The parameter block contains the addresses of four data objects:

  ■ The environment block
  ■ The command tail
  ■ Two default file control blocks

The space reserved in the parameter block for the address of the environment block is only 2 bytes and holds a segment address. The remaining three addresses are all double-word addresses; that is, they are 4 bytes, with the offset in the first 2 bytes and the segment address in the last 2 bytes.

The Environment Block

Each program that the EXEC function loads inherits a data structure called an environment block from its parent. The pointer to the segment of the block is at offset 002CH in the PSP. The environment block holds certain information used by the system's command interpreter (usually COMMAND.COM) and may also hold information to be used by transient programs. It has no effect on the operation of the operating system proper.

If the environment-block pointer in the EXEC parameter block contains zero, the child program acquires a copy of the parent program's environment block. Alternatively, the parent program can provide a segment pointer to a different or expanded environment.
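The 14-byte layout of the parameter block described above (a 2-byte environment segment followed by three 4-byte pointers, each stored offset first) can be sketched in portable C as a byte array. This is an illustration only: build_exec_pars and its arguments are hypothetical names, and the segment and offset values in a real program would come from the assembler or compiler rather than from constants.

```c
#include <assert.h>
#include <stddef.h>

#define EXEC_PARS_SIZE 14   /* 2-byte segment + three 4-byte far pointers */

/* Store a 16-bit value low byte first, as the 80x86 does. */
static void put_word(unsigned char *p, unsigned value)
{
    p[0] = (unsigned char)(value & 0xFFu);
    p[1] = (unsigned char)((value >> 8) & 0xFFu);
}

/* Store a double-word address: offset in the first 2 bytes,
   segment in the last 2 bytes.                              */
static void put_far(unsigned char *p, unsigned seg, unsigned off)
{
    put_word(p, off);
    put_word(p + 2, seg);
}

/* Hypothetical helper: fill a buffer laid out like the parameter
   block for EXEC subfunction 00H (load and execute program).     */
void build_exec_pars(unsigned char pars[EXEC_PARS_SIZE],
                     unsigned env_seg,
                     unsigned tail_seg, unsigned tail_off,
                     unsigned fcb1_seg, unsigned fcb1_off,
                     unsigned fcb2_seg, unsigned fcb2_off)
{
    assert(pars != NULL);
    put_word(pars, env_seg);                /* bytes 0-1; 0 = copy parent's */
    put_far(pars + 2,  tail_seg, tail_off); /* bytes 2-5:   command tail    */
    put_far(pars + 6,  fcb1_seg, fcb1_off); /* bytes 6-9:   first FCB       */
    put_far(pars + 10, fcb2_seg, fcb2_off); /* bytes 10-13: second FCB      */
}
```

Passing 0 as env_seg corresponds to the case in which the child receives a copy of the parent's environment block.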
The maximum size of the environment block is 32 KB, so very large chunks of information can be passed between programs by this mechanism.

The environment block for any given program is static, implying that if more than one generation of child programs is resident in RAM, each one will have a distinct and separate copy of the environment block. Furthermore, the environment block for a program that terminates and stays resident is not updated by subsequent PATH and SET commands.

You will find more details about the environment block later in this chapter.

The Command Tail

MS-DOS copies the command tail into the child program's PSP at offset 0080H, as described in Chapter 3. The information takes the form of a count byte, followed by a string of ASCII characters, terminated by a carriage return; the carriage return is not included in the count.

The command tail can include filenames, switches, or other parameters. From the child program's point of view, the command tail should provide the same information that would be present if the program had been run by a direct user command at the MS-DOS prompt. EXEC ignores any I/O-redirection parameters placed in the command tail; the parent program must provide for redirection of the standard devices before the EXEC call is made.

The Default File Control Blocks

MS-DOS copies the two default file control blocks pointed to by the EXEC parameter block into the child program's PSP at offsets 005CH and 006CH. To emulate the function of COMMAND.COM from the child program's point of view, the parent program should use Int 21H Function 29H (the system parse-filename service) to parse the first two parameters of the command tail into the default file control blocks before invoking the EXEC function.
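To make the parsing step above more concrete, the following sketch shows the kind of 11-byte name field (8-character name plus 3-character extension, blank padded) that ends up in a file control block. It is a simplified stand-in written for illustration, not a reimplementation of Int 21H Function 29H: it ignores drive letters, separator handling, and the function's control flags, and the name fcb_name_field is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Fill one field (name or extension) of the given width from src.
   Stops at '.', a blank, or the end of the string and returns a
   pointer to the character that stopped the scan.                 */
static const char *fill_field(char *field, int width, const char *src)
{
    int i = 0;

    while (*src != '\0' && *src != '.' && *src != ' ') {
        if (*src == '*') {              /* '*' fills the rest with '?' */
            while (i < width)
                field[i++] = '?';
        } else if (i < width) {         /* extra characters are dropped */
            field[i++] = (char)toupper((unsigned char)*src);
        }
        src++;
    }
    while (i < width)                   /* pad short fields with blanks */
        field[i++] = ' ';
    return src;
}

/* Hypothetical helper: build the 11-byte name field of an FCB
   from a simple "NAME.EXT" token. No terminator is written.    */
void fcb_name_field(char field[11], const char *token)
{
    assert(field != NULL && token != NULL);
    token = fill_field(field, 8, token);
    if (*token == '.')
        token++;
    fill_field(field + 8, 3, token);
}
```

Under these assumptions the token *.* produces eleven question marks and an empty token produces eleven blanks, which are the two name fields that appear in the file control blocks of Figure 12-2.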
File control blocks are not much use under MS-DOS versions 2 and 3, because they do not support the hierarchical file structure, but some application programs do inspect them as a quick way to get at the first two switches or other parameters in the command tail. Chapter 8 discusses file control blocks in more detail.

Returning from the EXEC Function

In MS-DOS version 2, the EXEC function destroys the contents of all registers except the code segment (CS) and instruction pointer (IP). Therefore, before making the EXEC call, the parent program must push the contents of any other registers that are important onto the stack and then save the stack segment (SS) and stack pointer (SP) registers in variables. Upon return from a successful EXEC call (that is, the child program has finished executing), the parent program should reload SS and SP from the variables where they were saved and then pop the other saved registers off the stack. In MS-DOS versions 3.0 and later, the stack and other registers are preserved across the EXEC call in the usual fashion.

Finally, the parent can use Int 21H Function 4DH to obtain the termination type and return code of the child program.

The EXEC function will fail under the following conditions:

  ■ Not enough unallocated memory is available to load and execute the requested program file.
  ■ The requested program can't be found on the disk.
  ■ The transient portion of COMMAND.COM in highest RAM (which contains the actual loader) has been destroyed and not enough free memory is available to reload it (PC-DOS version 2 only).

Figure 12-1 summarizes the calling convention for function 4BH. Figure 12-2 shows a skeleton of a typical EXEC call. This particular example uses the EXEC function to load and run the MS-DOS utility CHKDSK.COM. The SHELL.ASM program listing later in this chapter (Figure 12-5) presents a more complete example that includes the use of Int 21H Function 4AH to free unneeded memory.
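Int 21H Function 4DH, mentioned above, returns both pieces of information in a single register: AH holds the termination type and AL holds the child's return code. A minimal sketch of unpacking that word follows; the function and constant names are illustrative, and the code assumes the caller has already copied the value of AX into an unsigned variable.

```c
#include <assert.h>

/* Termination types reported in AH by Int 21H Function 4DH. */
enum {
    TERM_NORMAL   = 0x00,   /* normal termination            */
    TERM_CTRL_C   = 0x01,   /* terminated by Ctrl-C          */
    TERM_CRIT_ERR = 0x02,   /* critical-error abort          */
    TERM_TSR      = 0x03    /* terminated and stayed resident */
};

/* High byte of AX: how the child ended. */
unsigned termination_type(unsigned ax)
{
    assert(ax <= 0xFFFFu);
    return (ax >> 8) & 0xFFu;
}

/* Low byte of AX: the return code the child passed back. */
unsigned child_return_code(unsigned ax)
{
    assert(ax <= 0xFFFFu);
    return ax & 0xFFu;
}

/* Hypothetical helper: describe the termination type in words. */
const char *termination_name(unsigned ax)
{
    switch (termination_type(ax)) {
    case TERM_NORMAL:   return "normal termination";
    case TERM_CTRL_C:   return "Ctrl-C termination";
    case TERM_CRIT_ERR: return "critical-error abort";
    case TERM_TSR:      return "terminate and stay resident";
    default:            return "unknown";
    }
}
```

For instance, a child that ends by calling Int 21H Function 4CH with a return code of 1 would be reported to its parent as AX = 0001H, that is, a normal termination with return code 1.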
──────────────────────────────────────────────────────────────────────────
Called with:

  AH      = 4BH
  AL      = function type
            00 = load and execute program
            03 = load overlay
  ES:BX   = segment:offset of parameter block
  DS:DX   = segment:offset of program specification

Returns:

  If call succeeded

    Carry flag clear. In MS-DOS version 2, all registers except for CS:IP may be destroyed. In MS-DOS versions 3.0 and later, registers are preserved in the usual fashion.

  If call failed

    Carry flag set and AX = error code.

Parameter block format:

  If AL = 0 (load and execute program)

    Bytes 0-1    = segment pointer, environment block
    Bytes 2-3    = offset of command-line tail
    Bytes 4-5    = segment of command-line tail
    Bytes 6-7    = offset of first file control block to be copied into new PSP + 5CH
    Bytes 8-9    = segment of first file control block
    Bytes 10-11  = offset of second file control block to be copied into new PSP + 6CH
    Bytes 12-13  = segment of second file control block

  If AL = 3 (load overlay)

    Bytes 0-1    = segment address where file will be loaded
    Bytes 2-3    = relocation factor to apply to loaded image
──────────────────────────────────────────────────────────────────────────
Figure 12-1. Calling convention for the EXEC function (Int 21H Function 4BH).

──────────────────────────────────────────────────────────────────────────
cr      equ     0dh             ; ASCII carriage return
        .
        .
        .
        mov     stkseg,ss       ; save stack pointer
        mov     stkptr,sp

        mov     dx,offset pname ; DS:DX = program name
        mov     bx,offset pars  ; ES:BX = param block
        mov     ax,4b00h        ; function 4bh, subfunction 00h
        int     21h             ; transfer to MS-DOS

        mov     ax,_DATA        ; make our data segment
        mov     ds,ax           ; addressable again
        mov     es,ax

        cli                     ; (for bug in some 8088s)
        mov     ss,stkseg       ; restore stack pointer
        mov     sp,stkptr
        sti                     ; (for bug in some 8088s)

        jc      error           ; jump if EXEC failed
        .
        .
        .
stkseg  dw      0               ; original SS contents
stkptr  dw      0               ; original SP contents

pname   db      '\CHKDSK.COM',0 ; pathname of child program

pars    dw      envir           ; environment segment
        dd      cmdline         ; command line for child
        dd      fcb1            ; file control block #1
        dd      fcb2            ; file control block #2

cmdline db      4,' *.*',cr     ; command line for child

fcb1    db      0               ; file control block #1
        db      11 dup ('?')
        db      25 dup (0)

fcb2    db      0               ; file control block #2
        db      11 dup (' ')
        db      25 dup (0)

envir   segment para 'ENVIR'    ; environment segment

        db      'PATH=',0       ; empty search path
                                ; location of COMMAND.COM
        db      'COMSPEC=A:\COMMAND.COM',0
        db      0               ; end of environment

envir   ends
──────────────────────────────────────────────────────────────────────────
Figure 12-2. A brief example of the use of the MS-DOS EXEC call, with all necessary variables and command blocks. Note the protection of the registers for MS-DOS version 2 and the masking of interrupts during loading of SS:SP to circumvent a bug in some early 8088 CPUs.

More About the Environment Block

The environment block is always paragraph aligned (starts at an address that is a multiple of 16 bytes) and contains a series of ASCIIZ strings. Each of the strings takes the following form:

  NAME=PARAMETER

An additional zero byte (Figure 12-3) indicates the end of the entire set of strings. Under MS-DOS version 3, the block of environment strings and the extra zero byte are followed by a word count and the complete drive, path, filename, and extension used by EXEC to load the program.
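The layout just described (ASCIIZ strings, an extra zero byte, a word count, and then the program's own pathname) can be walked with a few lines of C. The sketch below works on an ordinary byte array rather than on a real environment segment, treats a count of zero as "no pathname present," and uses the hypothetical name env_program_path; it reflects the MS-DOS version 3 layout only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the pathname that follows
   the environment strings, or NULL if the block is too short or holds
   no trailing pathname. env is a copy of the block; size is its length. */
const char *env_program_path(const unsigned char *env, size_t size)
{
    size_t i = 0;
    unsigned count;

    assert(env != NULL);

    /* Skip each NAME=PARAMETER string; a zero byte where a string
       should start marks the end of the set.                      */
    while (i < size && env[i] != 0) {
        while (i < size && env[i] != 0)
            i++;
        i++;                            /* step over string terminator */
    }
    i++;                                /* step over the extra zero byte */

    if (i + 2 > size)                   /* no room for the word count */
        return NULL;
    count = env[i] | ((unsigned)env[i + 1] << 8);   /* low byte first */
    if (count == 0)
        return NULL;
    i += 2;

    /* The pathname must be terminated inside the block. */
    if (memchr(env + i, 0, size - i) == NULL)
        return NULL;
    return (const char *)(env + i);
}
```

Applied to the block dumped in Figure 12-3, a routine of this kind would skip the three strings, pass over the zero byte and the word 0001H, and return the pathname that begins at offset 0077H.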
──────────────────────────────────────────────────────────────────────────
       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F   0123456789ABCDEF
0000  43 4F 4D 53 50 45 43 3D 43 3A 5C 43 4F 4D 4D 41   COMSPEC=C:\COMMA
0010  4E 44 2E 43 4F 4D 00 50 52 4F 4D 50 54 3D 24 70   ND.COM.PROMPT=$p
0020  24 5F 24 64 20 20 20 24 74 24 68 24 68 24 68 24   $_$d   $t$h$h$h$
0030  68 24 68 24 68 20 24 71 24 71 24 67 00 50 41 54   h$h$h $q$q$g.PAT
0040  48 3D 43 3A 5C 53 59 53 54 45 4D 3B 43 3A 5C 41   H=C:\SYSTEM;C:\A
0050  53 4D 3B 43 3A 5C 57 53 3B 43 3A 5C 45 54 48 45   SM;C:\WS;C:\ETHE
0060  52 4E 45 54 3B 43 3A 5C 46 4F 52 54 48 5C 50 43   RNET;C:\FORTH\PC
0070  33 31 3B 00 00 01 00 43 3A 5C 46 4F 52 54 48 5C   31;....C:\FORTH\
0080  50 43 33 31 5C 46 4F 52 54 48 2E 43 4F 4D 00 20   PC31\FORTH.COM.
──────────────────────────────────────────────────────────────────────────
Figure 12-3. Dump of a typical environment block under MS-DOS version 3. This particular example contains the default COMSPEC parameter and two relatively complex PATH and PROMPT control strings that were set up by entries in the user's AUTOEXEC file. Note the path and file specification of the executing program following the double zeros at offset 0073H that denote the end of the environment block.

Under normal conditions, the environment block inherited by a program will contain at least three strings:

  COMSPEC=variable
  PATH=variable
  PROMPT=variable

MS-DOS places these three strings into the environment block at system initialization, during the interpretation of SHELL, PATH, and PROMPT directives in the CONFIG.SYS and AUTOEXEC.BAT files. The strings tell the MS-DOS command interpreter, COMMAND.COM, the location of its executable file (to enable it to reload the transient portion), where to search for executable external commands or program files, and the format of the user prompt.

You can add other strings to the environment block, either interactively or in batch files, with the SET command.
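Finding one of these strings is a matter of stepping from one ASCIIZ string to the next until the name matches or the empty string that ends the block is reached. The following sketch does this on an in-memory copy of the block; env_lookup is a hypothetical name, and unlike the assembly-language search routine shown later in this chapter it takes the variable name without the trailing equal sign and checks for that character itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: search a block of NAME=PARAMETER strings
   (terminated by an empty string) for the given name. Returns a
   pointer to the text after '=' or NULL if the name is absent.   */
const char *env_lookup(const char *env, const char *name)
{
    size_t n;

    assert(env != NULL && name != NULL);
    n = strlen(name);

    while (*env != '\0') {                  /* empty string = end of block */
        if (strncmp(env, name, n) == 0 && env[n] == '=')
            return env + n + 1;             /* skip "NAME=" */
        env += strlen(env) + 1;             /* advance to next string */
    }
    return NULL;
}
```

The test for '=' immediately after the name keeps a short name such as COM from matching the start of COMSPEC. In a Microsoft C program the library function getenv performs the equivalent search on the program's own environment.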
Transient programs can use these strings for informational purposes. For example, the Microsoft C Compiler looks in the environment block for INCLUDE, LIB, and TMP strings to tell it where to find its #include files and library files and where to build its temporary working files.

Example Programs: SHELL.C and SHELL.ASM

As a practical example of use of the MS-DOS EXEC function, I have included a small command interpreter called SHELL, with equivalent Microsoft C (Figure 12-4) and Microsoft Macro Assembler (Figure 12-5) source code. The source code for the assembly-language version is considerably more complex than the code for the C version, but the names and functionality of the various procedures are quite parallel.

──────────────────────────────────────────────────────────────────────────
/*
    SHELL.C     Simple extendable command interpreter
                for MS-DOS versions 2.0 and later

    Copyright 1988 Ray Duncan

    Compile:    C>CL SHELL.C

    Usage:      C>SHELL
*/

#include <stdio.h>
#include <process.h>
#include <stdlib.h>
#include <signal.h>

                                /* macro to return number of
                                   elements in an array */
#define dim(x) (sizeof(x) / sizeof(x[0]))

unsigned intrinsic(char *);     /* function prototypes */
void extrinsic(char *);
void get_cmd(char *);
void get_comspec(char *);
void break_handler(void);
void cls_cmd(void);
void dos_cmd(void);
void exit_cmd(void);

struct cmd_table {              /* intrinsic commands table */
                   char *cmd_name;
                   int  (*cmd_fxn)();
                 }   commands[] =

                 { "CLS",   cls_cmd,
                   "DOS",   dos_cmd,
                   "EXIT",  exit_cmd, };

static char com_spec[64];       /* COMMAND.COM filespec */

main(int argc, char *argv[])
{
    char inp_buf[80];           /* keyboard input buffer */

    get_comspec(com_spec);      /* get COMMAND.COM filespec */

                                /* register new handler
                                   for Ctrl-C interrupts */
    if(signal(SIGINT, break_handler) == (int(*)()) -1)
    {
        fputs("Can't capture Control-C Interrupt", stderr);
        exit(1);
    }

    while(1)                    /* main interpreter loop */
    {
        get_cmd(inp_buf);       /* get a command */
        if (!
            intrinsic(inp_buf) )    /* if it's intrinsic,
                                       run its subroutine */
            extrinsic(inp_buf);     /* else pass to COMMAND.COM */
    }
}

/*
    Try to match user's command with intrinsic command
    table. If a match is found, run the associated routine
    and return true; else return false.
*/

unsigned intrinsic(char *input_string)
{
    int i, j;                   /* some scratch variables */

                                /* scan off leading blanks */
    while(*input_string == '\x20') input_string++ ;

                                /* search command table */
    for(i=0; i < dim(commands); i++)
    {
        j = strcmp(commands[i].cmd_name, input_string);

        if(j == 0)              /* if match, run routine */
        {
            (*commands[i].cmd_fxn)();
            return(1);          /* and return true */
        }
    }
    return(0);                  /* no match, return false */
}

/*
    Process an extrinsic command by passing it
    to an EXEC'd copy of COMMAND.COM.
*/

void extrinsic(char *input_string)
{
    int status;

    status = system(input_string);      /* call EXEC function */

    if(status)                          /* if failed, display
                                           error message */
        fputs("\nEXEC of COMMAND.COM failed\n", stderr);
}

/*
    Issue prompt, get user's command from standard input,
    fold it to uppercase.
*/

void get_cmd(char *buffer)
{
    printf("\nsh: ");                   /* display prompt */
    gets(buffer);                       /* get keyboard entry */
    strupr(buffer);                     /* fold to uppercase */
}

/*
    Get the full path and file specification for COMMAND.COM
    from the COMSPEC variable in the environment.
*/

void get_comspec(char *buffer)
{
    char *p;

    p = getenv("COMSPEC");              /* NULL if COMSPEC absent */

    if(p == NULL || *p == '\0')
    {
        fputs("\nNo COMSPEC in environment\n", stderr);
        exit(1);
    }
    strcpy(buffer, p);
}

/*
    This Ctrl-C handler keeps SHELL from losing control.
    It just reissues the prompt and returns.
*/

void break_handler(void)
{
    signal(SIGINT, break_handler);      /* reset handler */
    printf("\nsh: ");                   /* display prompt */
}

/*
    These are the subroutines for the intrinsic commands.
*/

void cls_cmd(void)                      /* CLS command */
{
    printf("\033[2J");                  /* ANSI escape sequence */
}                                       /* to clear screen */

void dos_cmd(void)                      /* DOS command */
{
    int status;
                                        /* run COMMAND.COM */
    status = spawnlp(P_WAIT, com_spec, com_spec, NULL);

    if (status)
        fputs("\nEXEC of COMMAND.COM failed\n",stderr);
}

void exit_cmd(void)                     /* EXIT command */
{
    exit(0);                            /* terminate SHELL */
}
──────────────────────────────────────────────────────────────────────────
Figure 12-4. SHELL.C: A table-driven command interpreter written in Microsoft C.

──────────────────────────────────────────────────────────────────────────
        name    shell
        page    55,132
        title   SHELL.ASM--simple MS-DOS shell
;
; SHELL.ASM     Simple extendable command interpreter
;               for MS-DOS versions 2.0 and later
;
; Copyright 1988 by Ray Duncan
;
; Build:        C>MASM SHELL;
;               C>LINK SHELL;
;
; Usage:        C>SHELL;
;

stdin   equ     0               ; standard input handle
stdout  equ     1               ; standard output handle
stderr  equ     2               ; standard error handle

cr      equ     0dh             ; ASCII carriage return
lf      equ     0ah             ; ASCII linefeed
blank   equ     20h             ; ASCII blank code
escape  equ     01bh            ; ASCII escape code

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT,ds:_DATA,ss:STACK

shell   proc    far             ; at entry DS = ES = PSP

        mov     ax,_DATA        ; make our data segment
        mov     ds,ax           ; addressable

        mov     ax,es:[002ch]   ; get environment segment
        mov     env_seg,ax      ; from PSP and save it

                                ; release unneeded memory...
                                ; ES already = PSP segment
        mov     bx,100h         ; BX = paragraphs needed
        mov     ah,4ah          ; function 4ah = resize block
        int     21h             ; transfer to MS-DOS
        jnc     shell1          ; jump if resize OK

        mov     dx,offset msg1  ; resize failed, display
        mov     cx,msg1_length  ; error message and exit
        jmp     shell4

shell1: call    get_comspec     ; get COMMAND.COM filespec
        jnc     shell2          ; jump if it was found

        mov     dx,offset msg3  ; COMSPEC not found in
        mov     cx,msg3_length  ; environment, display error
        jmp     shell4          ; message and exit

shell2: mov     dx,offset shell3 ; set Ctrl-C vector (int 23h)
        mov     ax,cs           ; for this program's handler
        mov     ds,ax           ; DS:DX = handler address
        mov     ax,2523h        ; function 25h = set vector
        int     21h             ; transfer to MS-DOS

        mov     ax,_DATA        ; make our data segment
        mov     ds,ax           ; addressable again
        mov     es,ax

shell3:                         ; main interpreter loop

        call    get_cmd         ; get a command from user

        call    intrinsic       ; check if intrinsic function
        jnc     shell3          ; yes, it was processed

        call    extrinsic       ; no, pass it to COMMAND.COM
        jmp     shell3          ; then get another command

shell4:                         ; come here if error detected
                                ; DS:DX = message address
                                ; CX = message length
        mov     bx,stderr       ; BX = standard error handle
        mov     ah,40h          ; function 40h = write
        int     21h             ; transfer to MS-DOS

        mov     ax,4c01h        ; function 4ch = terminate with
                                ; return code = 1
        int     21h             ; transfer to MS-DOS

shell   endp

intrinsic proc  near            ; decode user entry against
                                ; the table "COMMANDS"
                                ; if match, run the routine,
                                ; and return carry = false
                                ; if no match,
                                ; return carry = true

        mov     si,offset commands ; DS:SI = command table

intr1:  cmp     byte ptr [si],0 ; end of table?
        je      intr7           ; jump, end of table found

        mov     di,offset inp_buf ; no, let DI = addr of user input

intr2:  cmp     byte ptr [di],blank ; scan off any leading blanks
        jne     intr3

        inc     di              ; found blank, go past it
        jmp     intr2

intr3:  mov     al,[si]         ; next character from table

        or      al,al           ; end of string?
        jz      intr4           ; jump, entire string matched

        cmp     al,[di]         ; compare to input character
        jnz     intr6           ; jump, found mismatch

        inc     si              ; advance string pointers
        inc     di
        jmp     intr3

intr4:  cmp     byte ptr [di],cr ; be sure user's entry
        je      intr5           ; is the same length...
        cmp     byte ptr [di],blank ; next character in entry
        jne     intr6           ; must be blank or return

intr5:  call    word ptr [si+1] ; run the command routine

        clc                     ; return carry flag = false
        ret                     ; as success flag

intr6:  lodsb                   ; look for end of this
        or      al,al           ; command string (null byte)
        jnz     intr6           ; not end yet, loop

        add     si,2            ; skip over routine address
        jmp     intr1           ; try to match next command

intr7:  stc                     ; command not matched, exit
        ret                     ; with carry = true

intrinsic endp

extrinsic proc  near            ; process extrinsic command
                                ; by passing it to
                                ; COMMAND.COM with a
                                ; " /C " command tail

        mov     al,cr           ; find length of command
        mov     cx,cmd_tail_length ; by scanning for carriage
        mov     di,offset cmd_tail+1 ; return
        cld
        repnz scasb

        mov     ax,di           ; calculate command-tail
        sub     ax,offset cmd_tail+2 ; length without carriage
        mov     cmd_tail,al     ; return, and store it

                                ; set command-tail address
        mov     word ptr par_cmd,offset cmd_tail
        call    exec            ; and run COMMAND.COM
        ret

extrinsic endp

get_cmd proc    near            ; prompt user, get command

                                ; display the shell prompt
        mov     dx,offset prompt ; DS:DX = message address
        mov     cx,prompt_length ; CX = message length
        mov     bx,stdout       ; BX = standard output handle
        mov     ah,40h          ; function 40h = write
        int     21h             ; transfer to MS-DOS

                                ; get entry from user
        mov     dx,offset inp_buf ; DS:DX = input buffer
        mov     cx,inp_buf_length ; CX = max length to read
        mov     bx,stdin        ; BX = standard input handle
        mov     ah,3fh          ; function 3fh = read
        int     21h             ; transfer to MS-DOS

        mov     si,offset inp_buf ; fold lowercase characters
        mov     cx,inp_buf_length ; in entry to uppercase

gcmd1:  cmp     byte ptr [si],'a' ; check if 'a-z'
        jb      gcmd2           ; jump, not in range
        cmp     byte ptr [si],'z' ; check if 'a-z'
        ja      gcmd2           ; jump, not in range
        sub     byte ptr [si],'a'-'A' ; convert to uppercase

gcmd2:  inc     si              ; advance through entry
        loop    gcmd1
        ret                     ; back to caller

get_cmd endp

get_comspec proc near           ; get location of COMMAND.COM
                                ; from environment "COMSPEC="
                                ; returns carry = false
                                ; if COMSPEC found
                                ; returns carry = true
                                ; if no COMSPEC

        mov     si,offset com_var ; DS:SI = string to match...
        call    get_env         ; search environment block
        jc      gcsp2           ; jump if COMSPEC not found

                                ; ES:DI points past "="
        mov     si,offset com_spec ; DS:SI = local buffer

gcsp1:  mov     al,es:[di]      ; copy COMSPEC variable
        mov     [si],al         ; to local buffer
        inc     si
        inc     di
        or      al,al           ; null char? (turns off carry)
        jnz     gcsp1           ; no, get next character

gcsp2:  ret                     ; back to caller

get_comspec endp

get_env proc    near            ; search environment
                                ; call DS:SI = "NAME="
                                ; uses contents of "ENV_SEG"
                                ; returns carry = false and ES:DI
                                ; pointing to parameter if found,
                                ; returns carry = true if no match

        mov     es,env_seg      ; get environment segment
        xor     di,di           ; initialize env offset

genv1:  mov     bx,si           ; initialize pointer to name
        cmp     byte ptr es:[di],0 ; end of environment?
        jne     genv2           ; jump, end not found

        stc                     ; no match, return carry set
        ret

genv2:  mov     al,[bx]         ; get character from name
        or      al,al           ; end of name?
                                ; (turns off carry)
        jz      genv3           ; yes, name matched

        cmp     al,es:[di]      ; compare to environment
        jne     genv4           ; jump if match failed

        inc     bx              ; advance environment
        inc     di              ; and name pointers
        jmp     genv2

genv3:                          ; match found, carry = clear,
        ret                     ; ES:DI = variable

genv4:  xor     al,al           ; scan forward in environment
        mov     cx,-1           ; for zero byte
        cld
        repnz scasb
        jmp     genv1           ; go compare next string

get_env endp

exec    proc    near            ; call MS-DOS EXEC function
                                ; to run COMMAND.COM

        mov     stkseg,ss       ; save stack pointer
        mov     stkptr,sp

                                ; now run COMMAND.COM
        mov     dx,offset com_spec ; DS:DX = filename
        mov     bx,offset par_blk ; ES:BX = parameter block
        mov     ax,4b00h        ; function 4bh = EXEC
                                ; subfunction 0 =
                                ; load and execute
        int     21h             ; transfer to MS-DOS

        mov     ax,_DATA        ; make data segment
        mov     ds,ax           ; addressable again
        mov     es,ax

        cli                     ; (for bug in some 8088s)
        mov     ss,stkseg       ; restore stack pointer
        mov     sp,stkptr
        sti                     ; (for bug in some 8088s)

        jnc     exec1           ; jump if no errors

                                ; display error message
        mov     dx,offset msg2  ; DS:DX = message address
        mov     cx,msg2_length  ; CX = message length
        mov     bx,stderr       ; BX = standard error handle
        mov     ah,40h          ; function 40h = write
        int     21h             ; transfer to MS-DOS

exec1:  ret                     ; back to caller

exec    endp

cls_cmd proc    near            ; intrinsic CLS command

        mov     dx,offset cls_str ; send the ANSI escape
        mov     cx,cls_str_length ; sequence to clear
        mov     bx,stdout       ; the screen
        mov     ah,40h
        int     21h
        ret

cls_cmd endp

dos_cmd proc    near            ; intrinsic DOS command

                                ; set null command tail
        mov     word ptr par_cmd,offset nultail
        call    exec            ; and run COMMAND.COM
        ret

dos_cmd endp

exit_cmd proc   near            ; intrinsic EXIT command

        mov     ax,4c00h        ; call MS-DOS terminate
        int     21h             ; function with
                                ; return code of zero
exit_cmd endp

_TEXT   ends

STACK   segment para stack 'STACK' ; declare stack segment

        dw      64 dup (?)
STACK   ends

_DATA   segment word public 'DATA'

commands equ $                  ; "intrinsic" commands table
                                ; each entry is ASCIIZ string
                                ; followed by the offset
                                ; of the procedure to be
                                ; executed for that command
        db      'CLS',0
        dw      cls_cmd

        db      'DOS',0
        dw      dos_cmd

        db      'EXIT',0
        dw      exit_cmd

        db      0               ; end of table

com_var db      'COMSPEC=',0    ; environment variable

                                ; COMMAND.COM filespec
com_spec db     80 dup (0)      ; from environment COMSPEC=

nultail db      0,cr            ; null command tail for
                                ; invoking COMMAND.COM
                                ; as another shell

cmd_tail db     0,' /C '        ; command tail for invoking
                                ; COMMAND.COM as a transient

inp_buf db      80 dup (0)      ; command line from standard input

inp_buf_length equ $-inp_buf
cmd_tail_length equ $-cmd_tail-1

prompt  db      cr,lf,'sh: '    ; SHELL's user prompt
prompt_length equ $-prompt

env_seg dw      0               ; segment of environment block

msg1    db      cr,lf
        db      'Unable to release memory.'
        db      cr,lf
msg1_length equ $-msg1

msg2    db      cr,lf
        db      'EXEC of COMMAND.COM failed.'
        db      cr,lf
msg2_length equ $-msg2

msg3    db      cr,lf
        db      'No COMSPEC variable in environment.'
        db      cr,lf
msg3_length equ $-msg3

cls_str db      escape,'[2J'    ; ANSI escape sequence
cls_str_length equ $-cls_str    ; to clear the screen

                                ; EXEC parameter block
par_blk dw      0               ; environment segment
par_cmd dd      cmd_tail        ; command line
        dd      fcb1            ; file control block #1
        dd      fcb2            ; file control block #2

fcb1    db      0               ; file control block #1
        db      11 dup (' ')
        db      25 dup (0)

fcb2    db      0               ; file control block #2
        db      11 dup (' ')
        db      25 dup (0)

stkseg  dw      0               ; original SS contents
stkptr  dw      0               ; original SP contents

_DATA   ends

        end     shell
──────────────────────────────────────────────────────────────────────────
Figure 12-5. SHELL.ASM: A simple table-driven command interpreter written in Microsoft Macro Assembler.

The SHELL program is table driven and can easily be extended to provide a powerful customized user interface for almost any application. When SHELL takes control of the system, it displays the prompt

  sh:

and waits for input from the user.
After the user types a line terminated by a carriage return, SHELL tries to match the first token in the line against its table of internal (intrinsic) commands. If it finds a match, it calls the appropriate subroutine. If it does not find a match, it calls the MS-DOS EXEC function and passes the user's input to COMMAND.COM with the /C switch, essentially using COMMAND.COM as a transient command processor under its own control.

As supplied in these listings, SHELL "knows" exactly three internal commands:

Command     Action
──────────────────────────────────────────────────────────────────────────
CLS         Uses the ANSI standard control sequence to clear the display screen and home the cursor.
DOS         Runs a copy of COMMAND.COM.
EXIT        Exits SHELL, returning control of the system to the next lower command interpreter.
──────────────────────────────────────────────────────────────────────────

You can quickly add new intrinsic commands to either the C version or the assembly-language version of SHELL. Simply code a procedure with the appropriate action and insert the name of that procedure, along with the text string that defines the command, into the table COMMANDS. In addition, you can easily prevent SHELL from passing certain "dangerous" commands (such as MKDIR or ERASE) to COMMAND.COM simply by putting the names of the commands to be screened out into the intrinsic command table with the address of a subroutine that prints an error message.

To summarize, the basic flow of both versions of the SHELL program is as follows:

  1. The program calls MS-DOS Int 21H Function 4AH (Resize Memory Block) to shrink its memory allocation, so that the maximum possible space will be available for COMMAND.COM when it is loaded as a child program. (This is explicit in the assembly-language version only.
     To keep the example code simple, the number of paragraphs to be reserved is coded as a generous literal value, rather than being figured out at runtime from the size and location of the various program segments.)

  2. The program searches the environment for the COMSPEC variable, which defines the location of an executable copy of COMMAND.COM. If it can't find the COMSPEC variable, it prints an error message and exits.

  3. The program puts the address of its own handler in the Ctrl-C vector (Int 23H) so that it won't lose control if the user enters a Ctrl-C or a Ctrl-Break.

  4. The program issues a prompt to the standard output device.

  5. The program reads a buffered line from the standard input device to get the user's command.

  6. The program matches the first blank-delimited token in the line against its table of intrinsic commands. If it finds a match, it executes the associated procedure.

  7. If the program does not find a match in the table of intrinsic commands, it synthesizes a command-line tail by appending the user's input to the /C switch and then EXECs a copy of COMMAND.COM, passing the address of the synthesized command tail in the EXEC parameter block.

  8. The program repeats steps 4 through 7 until the user enters the command EXIT, which is one of the intrinsic commands, and which causes SHELL to terminate execution.

In its present form, SHELL allows COMMAND.COM to inherit a full copy of the current environment. However, in some applications it may be helpful, or safer, to pass a modified copy of the environment block so that the secondary copy of COMMAND.COM will not have access to certain information.

Using EXEC to Load Overlays

Loading overlays with the EXEC function is much less complex than using EXEC to run another program. The overlay can be constructed as either a memory image (.COM) or relocatable (.EXE) file and need not be the same type as the program that loads it.
The main program, called the root segment, must carry out the following steps to load and execute an overlay:

  1. Make a memory block available to receive the overlay. The program that calls EXEC must own the memory block for the overlay.

  2. Set up the overlay parameter block to be passed to the EXEC function. This block contains the segment address of the block that will receive the overlay, plus a segment relocation value to be applied to the contents of the overlay file (if it is a .EXE file). These are normally the same value.

  3. Call the MS-DOS EXEC function to load the overlay by issuing an Int 21H with the registers set up as follows:

       AH    = 4BH
       AL    = 03H (EXEC subfunction to load overlay)
       DS:DX = segment:offset of overlay file pathname
       ES:BX = segment:offset of overlay parameter block

     Upon return from the EXEC function, the carry flag is clear if the overlay was found and loaded. The carry flag is set if the file could not be found or if some other error occurred.

  4. Execute the code within the overlay by transferring to it with a far call. The overlay should be designed so that either the entry point or a pointer to the entry point is at the beginning of the module after it is loaded. This technique allows you to maintain the root and overlay modules separately, because the root module does not contain any "magical" knowledge of addresses within the overlay segment.

To prevent users from inadvertently running an overlay directly from the command line, you should assign overlay files an extension other than .COM or .EXE. It is most convenient to relate overlays to their root segment by assigning them the same filename but a different extension, such as .OVL or .OV1, .OV2, and so on.

Figure 12-6 shows the use of EXEC to load and execute an overlay.

──────────────────────────────────────────────────────────────────────────
        .
        .
        .
                                ; allocate memory for overlay
        mov     bx,1000h        ; get 64 KB (4096 paragraphs)
        mov     ah,48h          ; function 48h = allocate block
        int     21h             ; transfer to MS-DOS
        jc      error           ; jump if allocation failed

        mov     pars,ax         ; set load address for overlay
        mov     pars+2,ax       ; set relocation segment for overlay
                                ; set segment of entry point
        mov     word ptr entry+2,ax

        mov     stkseg,ss       ; save root's stack pointer
        mov     stkptr,sp

        mov     ax,ds           ; set ES = DS
        mov     es,ax

        mov     dx,offset oname ; DS:DX = overlay pathname
        mov     bx,offset pars  ; ES:BX = parameter block
        mov     ax,4b03h        ; function 4bh, subfunction 03h
        int     21h             ; transfer to MS-DOS

        mov     ax,_DATA        ; make our data segment
        mov     ds,ax           ; addressable again
        mov     es,ax

        cli                     ; (for bug in some early 8088s)
        mov     ss,stkseg       ; restore stack pointer
        mov     sp,stkptr
        sti                     ; (for bug in some early 8088s)

        jc      error           ; jump if EXEC failed

                                ; otherwise EXEC succeeded...
        push    ds              ; save our data segment
        call    dword ptr entry ; now call the overlay
        pop     ds              ; restore our data segment
        .
        .
        .
oname   db      'OVERLAY.OVL',0 ; pathname of overlay file

pars    dw      0               ; load address (segment) for file
        dw      0               ; relocation (segment) for file

entry   dd      0               ; entry point for overlay

stkseg  dw      0               ; save SS register
stkptr  dw      0               ; save SP register
──────────────────────────────────────────────────────────────────────────
Figure 12-6. A code skeleton for loading and executing an overlay with the EXEC function. The overlay file may be in either .COM or .EXE format.

──────────────────────────────────────────────────────────────────────────

Chapter 13  Interrupt Handlers

Interrupts are signals that cause the computer's central processing unit to suspend what it is doing and transfer to a program called an interrupt handler. Special hardware mechanisms that are designed for maximum speed force the transfer. The interrupt handler determines the cause of the interrupt, takes the appropriate action, and then returns control to the original process that was suspended.
Interrupts are typically caused by events external to the central processor that require immediate attention, such as the following:

  ■ Completion of an I/O operation
  ■ Detection of a hardware failure
  ■ "Catastrophes" (power failures, for example)

In order to service interrupts more efficiently, most modern processors support multiple interrupt types, or levels. Each type usually has a reserved location in memory, called an interrupt vector, that specifies where the interrupt-handler program for that interrupt type is located. This design speeds processing of an interrupt because the computer can transfer control directly to the appropriate routine; it does not need a central routine that wastes precious machine cycles determining the cause of the interrupt. The concept of interrupt types also allows interrupts to be prioritized, so that if several interrupts occur simultaneously, the most important one can be processed first.

CPUs that support interrupts must also have the capability to block interrupts while they are executing critical sections of code. Sometimes the CPU can block interrupt levels selectively, but more frequently the effect is global. While an interrupt is being serviced, the CPU masks all other interrupts of the same or lower priority until the active handler has completed its execution; similarly, it can preempt the execution of a handler if a different interrupt with higher priority requires service. Some CPUs can even draw a distinction between selectively masking interrupts (they are recognized, but their processing is deferred) and simply disabling them (the interrupt is thrown away).

The creation of interrupt handlers has traditionally been considered one of the most arcane of programming tasks, suitable only for the elite cadre of system hackers. In reality, writing an interrupt handler is, in itself, straightforward.
Although the exact procedure must, of course, be customized for the characteristics of the particular CPU and operating system, the following guidelines apply to almost any computer system.

A program preparing to handle interrupts must do the following:

  1. Disable interrupts, if they were previously enabled, to prevent them from occurring while interrupt vectors are being modified.

  2. Initialize the vector for the interrupt of interest to point to the program's interrupt handler.

  3. Ensure that, if interrupts were previously disabled, all other vectors point to some valid handler routine.

  4. Enable interrupts again.

The interrupt handler itself must follow a simple but rigid sequence of steps:

  1. Save the system context (registers, flags, and anything else that the handler will modify and that wasn't saved automatically by the CPU).

  2. Block any interrupts that might cause interference if they were allowed to occur during this handler's processing. (This is often done automatically by the computer hardware.)

  3. Enable any interrupts that should still be allowed to occur during this handler's processing.

  4. Determine the cause of the interrupt.

  5. Take the appropriate action for the interrupt: receive and store data from the serial port, set a flag to indicate the completion of a disk-sector transfer, and so forth.

  6. Restore the system context.

  7. Reenable any interrupt levels that were blocked during this handler's execution.

  8. Resume execution of the interrupted process.

As in writing any other program, the key to success in writing an interrupt handler is to program defensively and cover all the bases. The main reason interrupt handlers have acquired such a mystical reputation is that they are so difficult to debug when they contain obscure errors.
Because interrupts can occur asynchronously (that is, because they can be caused by external events without regard to the state of the currently executing process), bugs in interrupt handlers can cause the system as a whole to behave quite unpredictably.

Interrupts and the Intel 80x86 Family

The Intel 80x86 family of microprocessors supports 256 levels of prioritized interrupts, which can be triggered by three types of events:

  ■  Internal hardware interrupts
  ■  External hardware interrupts
  ■  Software interrupts

Internal Hardware Interrupts

Internal hardware interrupts, sometimes called faults, are generated by certain events encountered during program execution, such as an attempt to divide by zero. The assignment of such events to certain interrupt numbers is wired into the processor and is not modifiable (Figure 13-1).

Interrupt  Vector     Interrupt                 8086/88   80286   80386
level      address    trigger
──────────────────────────────────────────────────────────────────────────
00H        00H-03H    Divide-by-zero            x         x       x
01H        04H-07H    Single step               x         x       x
02H        08H-0BH    Nonmaskable               x         x       x
                      interrupt (NMI)
03H        0CH-0FH    Breakpoint                x         x       x
04H        10H-13H    Overflow                  x         x       x
05H        14H-17H    BOUND exceeded                      x       x
06H        18H-1BH    Invalid opcode                      x       x
07H        1CH-1FH    Processor extension                 x       x
                      not available
08H        20H-23H    Double fault                        x       x
09H        24H-27H    Segment overrun                     x       x
0AH        28H-2BH    Invalid task-state                  x       x
                      segment
0BH        2CH-2FH    Segment not present                 x       x
0CH        30H-33H    Stack segment                       x       x
                      overrun
0DH        34H-37H    General protection                  x       x
                      fault
0EH        38H-3BH    Page fault                                  x
0FH        3CH-3FH    Reserved
10H        40H-43H    Numeric coprocessor                 x       x
                      error
11H-1FH    44H-7FH    Reserved
──────────────────────────────────────────────────────────────────────────
Figure 13-1. Internal interrupts (faults) on the Intel 8086/88, 80286, and 80386 microprocessors.

External Hardware Interrupts

External hardware interrupts are triggered by peripheral device controllers or by coprocessors such as the 8087/80287.
These can be tied to either the CPU's nonmaskable-interrupt (NMI) pin or its maskable-interrupt (INTR) pin. The NMI line is usually reserved for interrupts caused by such catastrophic events as a memory parity error or a power failure.

Instead of being wired directly to the CPU, the interrupts from external devices can be channeled through a device called the Intel 8259A Programmable Interrupt Controller (PIC). The CPU controls the PIC through a set of I/O ports, and the PIC, in turn, signals the CPU through the INTR pin. The PIC allows the interrupts from specific devices to be enabled and disabled, and their priorities to be adjusted, under program control.

A single PIC can handle only eight levels of interrupts. However, PICs can be cascaded together in a treelike structure to handle as many levels as desired. For example, 80286- and 80386-based machines with a PC/AT-compatible architecture use two PICs wired together to obtain 16 individually configurable levels of interrupts.

INTR interrupts can be globally enabled and disabled with the CPU's STI and CLI instructions. As you would expect, these instructions have no effect on interrupts received on the CPU's NMI pin.

The manufacturer of the computer system and/or the manufacturer of the peripheral device assigns external devices to specific 8259A PIC interrupt levels. These assignments are realized as physical electrical connections and cannot be modified by software.

Software Interrupts

Any program can trigger software interrupts synchronously simply by executing an INT instruction. MS-DOS uses Interrupts 20H through 3FH to communicate with its modules and with application programs. (For instance, the MS-DOS function dispatcher is reached by executing an Int 21H.) The IBM PC ROM BIOS and application software use other interrupts, with either higher or lower numbers, for various purposes (Figure 13-2). These assignments are simply conventions and are not wired into the hardware in any way.

Interrupt  Usage                                   Machine
──────────────────────────────────────────────────────────────────────────
00H        Divide-by-zero                          PC, AT, PS/2
01H        Single step                             PC, AT, PS/2
02H        NMI                                     PC, AT, PS/2
03H        Breakpoint                              PC, AT, PS/2
04H        Overflow                                PC, AT, PS/2
05H        ROM BIOS PrintScreen                    PC, AT, PS/2
           BOUND exceeded                          AT, PS/2
06H        Reserved                                PC
           Invalid opcode                          AT, PS/2
07H        Reserved                                PC
           80287/80387 not present                 AT, PS/2
08H        IRQ0 timer tick                         PC, AT, PS/2
           Double fault                            AT, PS/2
09H        IRQ1 keyboard                           PC, AT, PS/2
           80287/80387 segment overrun             AT, PS/2
0AH        IRQ2 reserved                           PC
           IRQ2 cascade from slave 8259A PIC       AT, PS/2
           Invalid task-state segment (TSS)        AT, PS/2
0BH        IRQ3 serial communications (COM2)       PC, AT, PS/2
           Segment not present                     AT, PS/2
0CH        IRQ4 serial communications (COM1)       PC, AT, PS/2
           Stack segment overflow                  AT, PS/2
0DH        IRQ5 fixed disk                         PC
           IRQ5 parallel printer (LPT2)            AT
           Reserved                                PS/2
           General protection fault                AT, PS/2
0EH        IRQ6 floppy disk                        PC, AT, PS/2
           Page fault                              AT, PS/2
0FH        IRQ7 parallel printer (LPT1)            PC, AT, PS/2
10H        ROM BIOS video driver                   PC, AT, PS/2
           Numeric coprocessor fault               AT, PS/2
11H        ROM BIOS equipment check                PC, AT, PS/2
12H        ROM BIOS conventional-memory size       PC, AT, PS/2
13H        ROM BIOS disk driver                    PC, AT, PS/2
14H        ROM BIOS communications driver          PC, AT, PS/2
15H        ROM BIOS cassette driver                PC
           ROM BIOS I/O system extensions          AT, PS/2
16H        ROM BIOS keyboard driver                PC, AT, PS/2
17H        ROM BIOS printer driver                 PC, AT, PS/2
18H        ROM BASIC                               PC, AT, PS/2
19H        ROM BIOS bootstrap                      PC, AT, PS/2
1AH        ROM BIOS time of day                    AT, PS/2
1BH        ROM BIOS Ctrl-Break                     PC, AT, PS/2
1CH        ROM BIOS timer tick                     PC, AT, PS/2
1DH        ROM BIOS video parameter table          PC, AT, PS/2
1EH        ROM BIOS floppy-disk parameters         PC, AT, PS/2
1FH        ROM BIOS font (characters 80H-FFH)      PC, AT, PS/2
20H        MS-DOS terminate process
21H        MS-DOS function dispatcher
22H        MS-DOS terminate address
23H        MS-DOS Ctrl-C handler address
24H        MS-DOS critical-error handler address
25H        MS-DOS absolute disk read
26H        MS-DOS absolute disk write
27H        MS-DOS terminate and stay resident
28H        MS-DOS idle interrupt
29H        MS-DOS reserved
2AH        MS-DOS network redirector
2BH-2EH    MS-DOS reserved
2FH        MS-DOS multiplex interrupt
30H-3FH    MS-DOS reserved
40H        ROM BIOS floppy-disk driver (if         PC, AT, PS/2
           fixed disk installed)
41H        ROM BIOS fixed-disk parameters          PC
           ROM BIOS fixed-disk parameters          AT, PS/2
           (drive 0)
42H        ROM BIOS default video driver (if       PC, AT, PS/2
           EGA installed)
43H        EGA, MCGA, VGA character table          PC, AT, PS/2
44H        ROM BIOS font (characters 00H-7FH)      PCjr
46H        ROM BIOS fixed-disk parameters          AT, PS/2
           (drive 1)
4AH        ROM BIOS alarm handler                  AT, PS/2
5AH        Cluster adapter                         PC, AT
5BH        Used by cluster program                 PC, AT
60H-66H    User interrupts                         PC, AT, PS/2
67H        LIM EMS driver                          PC, AT, PS/2
68H-6FH    Unassigned
70H        IRQ8 CMOS real-time clock               AT, PS/2
71H        IRQ9 software diverted to IRQ2          AT, PS/2
72H        IRQ10 reserved                          AT, PS/2
73H        IRQ11 reserved                          AT, PS/2
74H        IRQ12 reserved                          AT
           IRQ12 mouse                             PS/2
75H        IRQ13 numeric coprocessor               AT, PS/2
76H        IRQ14 fixed-disk controller             AT, PS/2
77H        IRQ15 reserved                          AT, PS/2
78H-7FH    Unassigned
80H-F0H    BASIC                                   PC, AT, PS/2
F1H-FFH    Not used                                PC, AT, PS/2
──────────────────────────────────────────────────────────────────────────
Figure 13-2. Interrupts with special significance on the IBM PC, PC/AT, and PS/2 and compatible computers. Note that the IBM ROM BIOS uses several interrupts in the range 00H-1FH, even though they were reserved by Intel for CPU faults. IRQ numbers refer to Intel 8259A PIC priority levels.

The Interrupt-Vector Table

The bottom 1024 bytes of system memory are called the interrupt-vector table. Each 4-byte position in the table corresponds to an interrupt type (0 through 0FFH) and contains the segment and offset of the interrupt handler for that level. Interrupts 0 through 1FH (the lowest levels) are used for internal hardware interrupts; MS-DOS uses Interrupts 20H through 3FH; all the other interrupts are available for use by either external hardware devices or system drivers and application software.

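As a rough illustration of the table layout just described, the following C fragment computes where a vector lives and decodes its contents. It is only a sketch: the type and function names are invented for this example, and it works on an ordinary byte array holding a copy of the table rather than on real low memory. It assumes the 80x86 storage order, in which each vector holds the offset word first and the segment word second, low byte first. The IRQ mapping follows the PC/AT assignments listed in Figure 13-2.

```c
#include <assert.h>
#include <stdint.h>

/* A far pointer as stored in one interrupt vector (hypothetical type name). */
typedef struct {
    uint16_t offset;
    uint16_t segment;
} FarPtr;

/* Byte offset of a vector within the 1024-byte table: 4 bytes per type. */
uint16_t vector_offset(uint8_t int_type)
{
    return (uint16_t)(int_type * 4u);
}

/* Decode the vector for int_type from a 1024-byte copy of the table. */
FarPtr read_vector(const uint8_t *table, uint8_t int_type)
{
    const uint8_t *v = table + vector_offset(int_type);
    FarPtr p;
    p.offset  = (uint16_t)(v[0] | (v[1] << 8));   /* first word: offset   */
    p.segment = (uint16_t)(v[2] | (v[3] << 8));   /* second word: segment */
    return p;
}

/* Real-mode physical address of a segment:offset pair. */
uint32_t linear_address(FarPtr p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}

/* Interrupt type for an IRQ on a PC/AT-compatible machine (Figure 13-2):
   IRQ0-IRQ7 use types 08H-0FH, and IRQ8-IRQ15 use types 70H-77H. */
uint8_t irq_to_int_type(unsigned irq)
{
    assert(irq < 16);
    return (uint8_t)(irq < 8 ? 0x08 + irq : 0x70 + (irq - 8));
}
```

For example, vector_offset(0x21) is 84H, so the vector for the MS-DOS function dispatcher occupies bytes 84H through 87H of the table.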
When an 8259A PIC or other device interrupts the CPU by means of the INTR pin, it must also place the interrupt type as an 8-bit number (0 through 0FFH) on the system bus, where the CPU can find it. The CPU then multiplies this number by 4 to find the memory address of the interrupt vector to be used.

Servicing an Interrupt

When the CPU senses an interrupt, it pushes the program status word (which defines the various CPU flags), the code segment (CS) register, and the instruction pointer (IP) onto the machine stack and disables the interrupt system. It then uses the 8-bit number that was jammed onto the system bus by the interrupting device to fetch the address of the handler from the vector table and resumes execution at that address.

Usually the handler immediately reenables the interrupt system (to allow higher-priority interrupts to occur), saves any registers it is going to use, and then processes the interrupt as quickly as possible. Some external devices also require a special acknowledgment signal so that they will know the interrupt has been recognized.

If the interrupt was funneled through an 8259A PIC, the handler must send a special code called end of interrupt (EOI) to the PIC through its control port to tell it when interrupt processing is completed. (The EOI has no effect on the CPU itself.) Finally, the handler executes the special IRET (INTERRUPT RETURN) instruction that restores the original state of the CPU flags, the CS register, and the instruction pointer (Figure 13-3).

Whether an interrupt was triggered by an external device or forced by software execution of an INT instruction, there is no discernible difference in the system state at the time the interrupt handler receives control. This fact is convenient when you are writing and testing external interrupt handlers because you can debug them to a large extent simply by invoking them with software drivers.

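The entry and exit sequence described in this section (push the flags, CS, and IP; disable interrupts; fetch the handler address from the vector table; later undo it all with IRET) can be modeled in a few lines of C. This is a toy simulation, not code that runs under MS-DOS: the Cpu structure, the memory array, and every function name are assumptions made for this sketch. It also clears the trap flag on entry, which the 80x86 does at the same time as it disables interrupts.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_TF 0x0100u   /* trap (single-step) flag */
#define FLAG_IF 0x0200u   /* interrupt-enable flag   */

/* Only the registers that take part in interrupt entry and exit. */
typedef struct {
    uint16_t flags, cs, ip, ss, sp;
} Cpu;

uint8_t memory[1L << 20];   /* simulated 1 MB real-mode address space */

/* Physical address of seg:off, wrapped to 20 bits. */
uint32_t linear(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

uint16_t read16(uint32_t addr)
{
    assert(addr <= 0xFFFFFu);
    return (uint16_t)(memory[addr] | (memory[(addr + 1) & 0xFFFFFu] << 8));
}

void write16(uint32_t addr, uint16_t value)
{
    assert(addr <= 0xFFFFFu);
    memory[addr] = (uint8_t)(value & 0xFF);
    memory[(addr + 1) & 0xFFFFFu] = (uint8_t)(value >> 8);
}

static void push16(Cpu *cpu, uint16_t value)
{
    cpu->sp = (uint16_t)(cpu->sp - 2);
    write16(linear(cpu->ss, cpu->sp), value);
}

static uint16_t pop16(Cpu *cpu)
{
    uint16_t value = read16(linear(cpu->ss, cpu->sp));
    cpu->sp = (uint16_t)(cpu->sp + 2);
    return value;
}

/* What the CPU does when it accepts an interrupt of the given type. */
void enter_interrupt(Cpu *cpu, uint8_t int_type)
{
    uint32_t vector = (uint32_t)int_type * 4u;   /* type x 4 = vector address */

    push16(cpu, cpu->flags);                     /* save flags, CS, and IP */
    push16(cpu, cpu->cs);
    push16(cpu, cpu->ip);
    cpu->flags = (uint16_t)(cpu->flags & ~(FLAG_IF | FLAG_TF));
    cpu->ip = read16(vector);                    /* offset word first */
    cpu->cs = read16(vector + 2);                /* then segment word */
}

/* What IRET does: restore IP, CS, and flags in the reverse order. */
void return_from_interrupt(Cpu *cpu)
{
    cpu->ip = pop16(cpu);
    cpu->cs = pop16(cpu);
    cpu->flags = pop16(cpu);
}
```

Because the handler address comes from the vector table in both cases, the same model applies whether the interrupt type was supplied by a PIC or by an INT instruction.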
──────────────────────────────────────────────────────────────────────────
  pic_ctl equ     20h             ; control port for 8259A
                                  ; interrupt controller
          .
          .
          .
          sti                     ; turn interrupts back on,
          push    ax              ; save registers
          push    bx
          push    cx
          push    dx
          push    si
          push    di
          push    bp
          push    ds
          push    es

          mov     ax,cs           ; make local data addressable
          mov     ds,ax

          .                       ; do some stuff appropriate
          .                       ; for this interrupt here
          .

          mov     al,20h          ; send EOI to 8259A PIC
          mov     dx,pic_ctl
          out     dx,al

          pop     es              ; restore registers
          pop     ds
          pop     bp
          pop     di
          pop     si
          pop     dx
          pop     cx
          pop     bx
          pop     ax
          iret                    ; resume previous processing
──────────────────────────────────────────────────────────────────────────
Figure 13-3. Typical handler