
   Miscellaneous OMF notes
   =======================

 This is an attempt to put together some of the harder-to-find information
concerning the standard (if that is the appropriate word) Object Module
Format, or OMF. It is assumed that the reader is acquainted with the OMF
structure and basic records.

 The best "official" source of information is a file called omf.pdf,
available from several places on the Internet. Its title is:

   Tools Interface Standards (TIS)
  Relocatable Object Module Format
        (OMF) Specification
           Version 1.1

 It contains a lot of detail on Microsoft-generated OMF files, but it
doesn't help a whole lot when dealing with object files generated by
tools from IBM, Borland, or even Watcom for that matter. The specification
is readily available at

 http://www.openwatcom.org/ftp/devel/docs/ 

 Since our tools should work with the widest possible variety of OMF files,
we need to understand some of the vendor specific information, especially as
it pertains to commonly used OMF records.


  LINNUM records (94h or 95h)
  ===========================

 LINNUM/LINNUM386 records are common and our tools try to parse them. Sadly,
their contents are not standardized. In particular, the format depends on
the COMENT record with class 0A1h, which identifies the style of the debug
information. omf.pdf describes three possibilities:

 - CV - Microsoft CodeView style; somewhat documented in omf.pdf
 - DX - AIX style - no information, very exotic
 - HL - IBM HLL style; generated by IBM's CSet/2, CSet++ and VisualAge C++
        compilers.
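 Since everything below depends on the style, a tool has to classify the
0A1h COMENT record first. The following is a minimal sketch, not taken from
any existing tool. It ASSUMES that the comment text (after the comment type
and class bytes, excluding the checksum) is a one-byte version number
followed by the two ASCII characters "CV", "DX" or "HL"; check this against
omf.pdf and real dumps. The names dbg_style and classify_debug_style are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Debug information styles listed in omf.pdf for COMENT class 0A1h. */
typedef enum {
    DBG_STYLE_UNKNOWN = 0,
    DBG_STYLE_CV,   /* Microsoft CodeView */
    DBG_STYLE_DX,   /* AIX */
    DBG_STYLE_HL    /* IBM HLL */
} dbg_style;

/* Hypothetical helper: classify the comment text of a COMENT 0A1h record.
 * 'text' points just past the comment type and class bytes; 'len' must not
 * include the record checksum. ASSUMED layout: one version byte followed
 * by two ASCII characters naming the style. */
dbg_style classify_debug_style(const unsigned char *text, size_t len,
                               unsigned *version)
{
    assert(text != NULL || len == 0);
    if (len < 3)
        return DBG_STYLE_UNKNOWN;      /* too short to hold version + tag */
    if (version != NULL)
        *version = text[0];
    if (memcmp(text + 1, "CV", 2) == 0)
        return DBG_STYLE_CV;
    if (memcmp(text + 1, "DX", 2) == 0)
        return DBG_STYLE_DX;
    if (memcmp(text + 1, "HL", 2) == 0)
        return DBG_STYLE_HL;
    return DBG_STYLE_UNKNOWN;
}
```

 A tool following these notes would only hand LINNUM386 data to an HLL
parser when this returns DBG_STYLE_HL, and would skip the records otherwise.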

 The last type is important when dealing with IBM's OMF files, which are
very common on 32-bit OS/2. No documentation appears to be publicly available,
it is possible to glean its structure from file dumps coupled with the output
of IBM's LINK386. The IBM HLL style is only known to occur in 32-bit objects
(because for 16-bit development IBM always relied on Microsoft tools).

 IBM uses three types of LINNUM386 records: one lists all source files,
another (rarely used?) describes the path of flow through the program, and
the third lists actual lines and how they correspond to offsets in the
compiled code. Tables that would be larger than 1024 bytes are split into
multiple LINNUM386 records.

 The first type, the file names table, lists the source files. Its structure
is as follows:

 DWORD    - first displayable char in listing line (unused by debugger)
 DWORD    - number of displayable chars in listing line (unused by debugger)
 DWORD    - number of source/listing file names that follow
 <string> - file name(s), each stored as a length byte followed by the
            characters; the listing file, if provided, is always last in the table
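 Walking the table is straightforward once the three DWORDs are skipped.
The sketch below ASSUMES the whole table is already in one contiguous buffer
(i.e. any continuation records from the 1024-byte split have been glued
together) and that integers are little-endian, as elsewhere in OMF. The
names hll_fname and hll_parse_fname_table are hypothetical. The returned
views point into the buffer and are not NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian DWORD. */
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* One file name: points into the record buffer, not NUL-terminated. */
typedef struct {
    const unsigned char *name;
    unsigned             len;
} hll_fname;

/* Parse an HLL file names table. Stores at most max_out names in out[]
 * and returns how many were stored, or -1 if the table is truncated. */
int hll_parse_fname_table(const unsigned char *buf, size_t len,
                          hll_fname *out, int max_out)
{
    size_t   pos = 12;        /* skip the three DWORD header fields */
    uint32_t count, i;
    int      stored = 0;

    if (len < 12)
        return -1;
    count = rd32(buf + 8);    /* number of source/listing file names */
    for (i = 0; i < count; i++) {
        unsigned n;
        if (pos >= len)
            return -1;        /* missing length byte */
        n = buf[pos++];       /* length byte */
        if (n > len - pos)
            return -1;        /* name runs past the end of the buffer */
        if (stored < max_out) {
            out[stored].name = buf + pos;
            out[stored].len  = n;
            stored++;
        }
        pos += n;
    }
    return stored;
}
```

 Per the layout above, when a listing file is present it is the last entry
in out[], so a debugger-oriented tool can simply ignore it.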

 The paths are typically relative, not absolute. This often poses problems
for the debugger; the IBM debugger is known to be awkward to use if a project
contains multiple source files with identical names.
 
 The second type, the path table, is only present in objects with
source/listing information and is not used by the IBM debugger. The format is:

 DWORD  - offset into segment
 WORD   - path code; its meaning is unknown
 WORD   - source file index (refers to file names table)
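 Each path table entry is therefore a fixed 8 bytes. A minimal decoding
sketch, assuming little-endian fields and a buffer that holds only the
entries (no first entry); hll_path_entry and hll_parse_path_table are
hypothetical names. The path code is kept as an opaque WORD because its
meaning is unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t offset;      /* offset into segment */
    uint16_t path_code;   /* meaning unknown, kept as-is */
    uint16_t file_index;  /* refers to the file names table */
} hll_path_entry;

enum { HLL_PATH_ENTRY_SIZE = 8 };

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode up to max_out complete entries; a trailing partial entry is
 * ignored. Returns the number of entries decoded. */
size_t hll_parse_path_table(const unsigned char *buf, size_t len,
                            hll_path_entry *out, size_t max_out)
{
    size_t n = 0;
    while (n < max_out && (n + 1) * HLL_PATH_ENTRY_SIZE <= len) {
        const unsigned char *p = buf + n * HLL_PATH_ENTRY_SIZE;
        out[n].offset     = rd32(p);
        out[n].path_code  = rd16(p + 4);
        out[n].file_index = rd16(p + 6);
        n++;
    }
    return n;
}
```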

 The third LINNUM386 record type, the line number table, starts with a
special first entry followed by a variable number of additional entries. The
format of the first entry is:

 WORD   - line number; zero marks the first entry
 BYTE   - entry type (see below)
 BYTE   - reserved
 WORD   - number of entries that follow
 WORD   - segment number
 DWORD  - size of file names table or address of logical segments (meaning
          depends on entry type; only present for types 0, 1, 2, and 3)
 
 The entry type can hold the following values:

 0 - source/offset information only
 1 - listing/offset information only
 2 - source and listing information
 3 - file names table follows
 4 - path table follows
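 Because the trailing DWORD is optional, the first entry is either 8 or 12
bytes long, and a parser must look at the entry type before it knows where
the following entries begin. A minimal sketch, assuming little-endian
fields; hll_first_entry and hll_parse_first_entry are hypothetical names,
and rejecting types above 4 is a defensive choice rather than documented
behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  type;         /* entry type, 0..4 */
    uint16_t num_entries;  /* number of entries that follow */
    uint16_t segment;      /* segment number */
    uint32_t extra;        /* names table size / logical segment address */
    int      has_extra;    /* nonzero when the DWORD was present */
    size_t   header_size;  /* bytes consumed: 8 or 12 */
} hll_first_entry;

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the data is not a valid first entry. */
int hll_parse_first_entry(const unsigned char *buf, size_t len,
                          hll_first_entry *fe)
{
    if (len < 8 || rd16(buf) != 0)   /* line number must be zero */
        return -1;
    fe->type        = buf[2];
    /* buf[3] is reserved */
    fe->num_entries = rd16(buf + 4);
    fe->segment     = rd16(buf + 6);
    if (fe->type > 4)
        return -1;                   /* unknown entry type */
    fe->has_extra = fe->type <= 3;   /* DWORD only for types 0..3 */
    if (fe->has_extra) {
        if (len < 12)
            return -1;
        fe->extra       = rd32(buf + 8);
        fe->header_size = 12;
    } else {
        fe->extra       = 0;
        fe->header_size = 8;
    }
    return 0;
}
```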
 
 The IBM debugger doesn't appear to use either listing information or the
path table (NB - this refers to execution path, not pathnames).

 Line number entries follow the first entry and come in the following
flavours:

 Type 0 - source lines
 
 WORD  - line number
 WORD  - source file index
 DWORD - offset within segment
 
 Since the line number is only a WORD, there is no way to represent line
information for source files with more than 65535 lines (and yes, there are
people crazy enough to have source files that big).
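 A type 0 entry is a fixed 8 bytes. The sketch below decodes one entry and,
to make the 65535 limit concrete, also encodes one and refuses line numbers
that do not fit in the WORD field. Little-endian fields are assumed;
hll_src_line, hll_decode_type0 and hll_encode_type0 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t line;        /* a WORD, so at most 65535 */
    uint16_t file_index;  /* refers to the file names table */
    uint32_t offset;      /* offset within segment */
} hll_src_line;

enum { HLL_TYPE0_ENTRY_SIZE = 8 };

/* Decode one 8-byte type 0 entry. */
void hll_decode_type0(const unsigned char *p, hll_src_line *e)
{
    e->line       = (uint16_t)(p[0] | (p[1] << 8));
    e->file_index = (uint16_t)(p[2] | (p[3] << 8));
    e->offset     = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
                    ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
}

/* Encode one type 0 entry into p[0..7]. Returns -1 when the line number
 * cannot be represented in the WORD field, 0 otherwise. */
int hll_encode_type0(unsigned long line, uint16_t file_index,
                     uint32_t offset, unsigned char *p)
{
    if (line > 0xFFFFUL)
        return -1;                     /* not representable in type 0 */
    p[0] = (unsigned char)(line & 0xFF);
    p[1] = (unsigned char)(line >> 8);
    p[2] = (unsigned char)(file_index & 0xFF);
    p[3] = (unsigned char)(file_index >> 8);
    p[4] = (unsigned char)(offset & 0xFF);
    p[5] = (unsigned char)((offset >> 8) & 0xFF);
    p[6] = (unsigned char)((offset >> 16) & 0xFF);
    p[7] = (unsigned char)(offset >> 24);
    return 0;
}
```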

 Type 1 - listing lines

 DWORD - line number
 DWORD - statement number
 DWORD - offset within segment
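 Type 1 entries are 12 bytes of DWORDs, so listing line numbers do not share
the 65535 limit of type 0. There is no file index; per the file names table
layout, the listing file is the last name in that table. A minimal decoding
sketch assuming little-endian fields; hll_lst_line and hll_decode_type1 are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t line;       /* listing line number */
    uint32_t statement;  /* listing statement number */
    uint32_t offset;     /* offset within segment */
} hll_lst_line;

enum { HLL_TYPE1_ENTRY_SIZE = 12 };

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one 12-byte type 1 entry. */
void hll_decode_type1(const unsigned char *p, hll_lst_line *e)
{
    e->line      = rd32(p);
    e->statement = rd32(p + 4);
    e->offset    = rd32(p + 8);
}
```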
 
 Type 2 - source and listing lines

 WORD  - source line number
 WORD  - source file index
 DWORD - listing line number
 DWORD - listing statement number
 DWORD - offset within segment

 It is apparent that type 2 is an amalgamation of types 0 and 1. Types 1 and
2 are not commonly used.
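 Since each entry type has a fixed size (8, 12 and 16 bytes for types 0, 1
and 2), a parser can step through the entries once it has read the entry
type from the first entry. The sketch below shows that size lookup plus a
type 2 decoder, assuming little-endian fields; hll_line_entry_size,
hll_src_lst_line and hll_decode_type2 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t src_line;       /* source line number (WORD) */
    uint16_t file_index;     /* refers to the file names table */
    uint32_t lst_line;       /* listing line number */
    uint32_t lst_statement;  /* listing statement number */
    uint32_t offset;         /* offset within segment */
} hll_src_lst_line;

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Size in bytes of one line number entry for the given entry type, or 0
 * for types that do not carry line entries (3 and 4). */
size_t hll_line_entry_size(unsigned type)
{
    switch (type) {
    case 0:  return 8;    /* WORD + WORD + DWORD */
    case 1:  return 12;   /* 3 x DWORD */
    case 2:  return 16;   /* WORD + WORD + 3 x DWORD */
    default: return 0;
    }
}

/* Decode one 16-byte type 2 entry: the type 0 fields, then type 1. */
void hll_decode_type2(const unsigned char *p, hll_src_lst_line *e)
{
    e->src_line      = rd16(p);
    e->file_index    = rd16(p + 2);
    e->lst_line      = rd32(p + 4);
    e->lst_statement = rd32(p + 8);
    e->offset        = rd32(p + 12);
}
```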

 Coupled with the file names table from the preceding LINNUM386 record, the
source file index and line number uniquely identify lines in particular
source files.
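 Resolving a line entry to a file name is then a table lookup. These notes
do not say whether the source file index is 0- or 1-based; the sketch below
ASSUMES 1-based indexing via the hypothetical HLL_FILE_INDEX_BASE macro,
which should be checked against file dumps and LINK386 output. The names
fname_view, hll_lookup_file and hll_describe_line are hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* ASSUMPTION: index 1 refers to the first name in the file names table. */
#define HLL_FILE_INDEX_BASE 1u

typedef struct {
    const char *name;   /* not necessarily NUL-terminated */
    unsigned    len;
} fname_view;

/* Return the file name for an index, or NULL if it is out of range. */
const fname_view *hll_lookup_file(const fname_view *names, size_t count,
                                  unsigned file_index)
{
    if (file_index < HLL_FILE_INDEX_BASE ||
        file_index - HLL_FILE_INDEX_BASE >= count)
        return NULL;
    return &names[file_index - HLL_FILE_INDEX_BASE];
}

/* Format "file(line) -> offset" into out. Returns the snprintf result,
 * or -1 if the file index does not resolve. */
int hll_describe_line(const fname_view *names, size_t count, unsigned line,
                      unsigned file_index, uint32_t offset,
                      char *out, size_t out_size)
{
    const fname_view *f = hll_lookup_file(names, count, file_index);
    if (f == NULL)
        return -1;
    return snprintf(out, out_size, "%.*s(%u) -> %08lX",
                    (int)f->len, f->name, line, (unsigned long)offset);
}
```

 Treating an out-of-range index as an error instead of clamping it is in
keeping with the advice below: data that does not match the expected format
should be rejected, not guessed at.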

 Some of our code parses the LINNUM records. It is vital that we only parse
this data when we know what format it is in; otherwise we're sure to get
errors or crashes.
