Borchard M.-Upgrading Older Fortran Programs To Current Fortran-95 Versions-Computer Program Service (2001)
April 2001
Abstract
This report describes the results of a study that the Data Bank has undertaken to determine the difficulties encountered and the effort required to port codes written in Fortran-IV, Fortran-77, and platform-specific implementations thereof to Fortran-95 compiler environments. The objective is to determine how well older software survives the evolution of languages and systems. The study is based on computer codes that had been released to the OECD/NEA and that are in the public domain. Samples that appeared representative of the problems typically encountered in such conversion efforts were taken from codes developed for diverse platforms. First the problems are classified, then the differences between the standards are discussed and the computer / compiler dependencies are identified; finally, specific solutions are proposed, together with tools that can help the transformation work. The study does not cover all possible problems, which often depend on tricks that programmers used in the past to bypass machine and compiler limitations. One conclusion is that programs that largely complied with the language standards of their time have a high survival rate and can be used today with no or only slight changes. Programs written in a tricky programming style will require considerable effort to maintain in the future.
Contents
1 Classification of problems
2 Different standards
3 Computer / compiler dependency
3.1 General
3.2 Compiler - dependency
3.3 Computer / processor dependency
4 References
5 Appendix Non Standard Function Replacement
6 Appendix - The Perils of Floating Point
6.1.1 Binary Floating-Point
6.1.2 Inexactness
6.1.3 Insignificant Digits
6.1.4 Crazy Conversions
6.1.5 Too Many Digits
6.1.6 Too Much Precision
6.1.7 Safe Comparisons
6.1.8 Programming with the Perils
6.2
6.2.1 Bibliography
1 Classification of problems
This chapter classifies the sources of problems that arise when upgrading to new language standards or new computer systems. Problems arise from:

a) Differences in language definitions. New Fortran language definitions are designed to stay compatible with previous versions as far as possible. The Fortran-95 standard, however, deleted some ancient features of FORTRAN-66 or FORTRAN IV that were still available in FORTRAN-77 and FORTRAN-90. The Fortran-2000 standard will go on removing old features. Most Fortran-95 compilers still accept these features as legal elements of the language, but they should be replaced by up-to-date Fortran elements, since their behaviour is not always well defined.

b) Differences in computer or processor architecture. These differences result in data representation mismatches, which lead to two sources of problems:
   1) Different accuracy or alignment of data in memory. Accuracy and alignment problems can be resolved by using an appropriate compiler option.
   2) Mismatch of archived binary data, i.e. binary libraries or executable programs. Although binary libraries can be transformed if a precise format description exists, doing so takes much time and effort. Binary libraries without a precise format description cannot be used any more. Executable programs cannot be used on a different architecture and cannot be transformed.

c) Differences in formatted data. Some compilers read and write formatted data in a way that is incompatible with other compilers. This happens especially with line ends, for which two incompatible formats exist. Thus it might be necessary to convert formatted data. The NEA Data Bank has tools to perform such conversion.

d) Usage of compiler- or computer-specific intrinsic functions or subroutines. These functions or subroutines have to be replaced by up-to-date standard Fortran elements.

e) Job-control-language statements in the source code. Job-control-language statements can be removed from the source code. Sometimes the input data definition resides among these statements; it has to be preserved in a separate file.

The code-restructuring tool SPAG (http://www.polyhedron.co.uk/pf/plusfort.html) is able to resolve some of the resulting problems automatically. It also generates a more readable source that is easier to translate to up-to-date Fortran.
2 Different standards
The definition of FORTRAN changed repeatedly during its development. The ANSI standard FORTRAN 77 ([1]) superseded FORTRAN IV / FORTRAN 66, and the FORTRAN-90 standard later replaced FORTRAN 77. FORTRAN-90 is a superset of FORTRAN-77, so any code that fulfils the requirements of ANSI FORTRAN-77 is syntactically correct FORTRAN-90 code. Some features of FORTRAN 77 were, however, marked obsolescent: they were kept only for compatibility with older programs and their use was no longer recommended. The Fortran-95 standard deleted several of these features and marked further features as obsolescent. Most Fortran-95 compilers nevertheless still recognise the obsolescent as well as the deleted features, which are listed below.

Variables used in DO-loops may no longer be REAL or DOUBLE PRECISION variables, but only INTEGER variables. The use of REAL and DOUBLE PRECISION variables could result in computer-dependent behaviour of the corresponding code; thus this feature was removed. To upgrade a program using this feature, such a loop may be replaced by a DO WHILE loop; e.g. a loop that steps a REAL variable R from 1.0 to 17.3 in increments of 0.1
could be replaced by
      R=1.0
      DO WHILE (R<=17.3)
         PRINT *,R
         R=R+0.1
      END DO
Be aware of differences in REAL number representation on different computers. These differences may lead to a different number of loop cycles on two different computers ([5]). For some DO-loops a straightforward conversion may be possible, i.e. changing the type of the loop index to INTEGER will solve the problem; e.g. a loop whose REAL index R takes only the whole-number values 1 to 17
could be replaced by
      INTEGER R
      DO 100 R=1,17
         PRINT *,R
  100 CONTINUE
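When the loop variable must keep non-integer values, a further option is to count the iterations with an INTEGER and to compute the REAL value from the counter. The following is only a sketch under that assumption; the program name and the variables N and NTRIPS are illustrative and do not come from any of the studied codes.

```fortran
program real_loop_sketch
  implicit none
  integer :: n, ntrips
  real :: r

  ! Number of values in 1.0, 1.1, ..., 17.3, fixed once as an integer.
  ntrips = nint((17.3 - 1.0) / 0.1) + 1

  do n = 0, ntrips - 1
     r = 1.0 + 0.1 * real(n)   ! recomputed from n, so rounding errors do not accumulate
     print *, r
  end do
end program real_loop_sketch
```

Because the trip count is an INTEGER, the number of loop cycles no longer depends on how the REAL increments round on a particular machine.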
The PAUSE statement was removed. PAUSE stopped the program execution until an operator entered a command to continue program execution. This function was used to change tapes or input cards etc. If a PAUSE is desired it could be replaced by waiting for a keystroke e.g.:
PAUSE
could be replaced by
READ *
which reads a line from standard input (usually the keyboard).

The ASSIGN statement was removed. ASSIGN enabled the assignment of a statement label to an INTEGER variable. This INTEGER variable could be used in a subsequent GOTO statement or FORMAT specification and thus changed the program behaviour depending on its contents. An existing ASSIGN can be replaced by placing the code segment selected by the corresponding GOTO statement or FORMAT specification directly at the point of use, e.g.:
      IF (J.EQ.1) ASSIGN 100 TO I
      IF (J.EQ.2) ASSIGN 200 TO I
      GOTO I
  100 PRINT *, 100
      GOTO 300
  200 PRINT *, 200
  300 PRINT *, 'END'
could be replaced by
      IF (J.EQ.1) THEN
         PRINT *, 100
      END IF
      IF (J.EQ.2) THEN
         PRINT *, 200
      END IF
      PRINT *, 'END'
In a replacement like this J must not be changed in the first IF-THEN-END IF structure. If a modification of J is unavoidable the code segment must be modified accordingly e.g.
      IF (J.EQ.1) THEN
         PRINT *, 100
         J=2
      ELSE IF (J.EQ.2) THEN
         PRINT *, 200
         J=J+1
      END IF
      PRINT *, J, 'END'
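Where the ASSIGN logic should stay visible, the label can instead be kept as an ordinary INTEGER value and dispatched with a SELECT CASE construct. The sketch below assumes the two-label example above; the variable ITARGET and the start value of J are hypothetical.

```fortran
program assign_sketch
  implicit none
  integer :: j, itarget

  j = 2                      ! illustrative input value

  ! ITARGET plays the role of the label variable I of the ASSIGN example.
  itarget = 0
  if (j == 1) itarget = 100
  if (j == 2) itarget = 200

  select case (itarget)
  case (100)
     print *, 100
  case (200)
     print *, 200
  end select
  print *, 'END'
end program assign_sketch
```

Unlike an assigned GOTO with an unset label variable, the SELECT CASE simply falls through when J is neither 1 nor 2.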
The last GOTO is an assigned GOTO supported by CDC FORTRAN. It checked K against the numbers in the brackets. The compiler printed an error message if K was different from all of these numbers. The correct replacement replaces the ASSIGN-statements as follows:
  104 DO 108 I=1,239
      JJJ=A*10.
  105 RHO0=1./(D*12.56637062)
      JJJ=A*10.
  106 RHO1(I)=3.*RHO0*D/(X1(I)**3)
      JJJ=A*10.
  107 RHO2(I)=3.*RHO0*D/(X2(I)**3-X1(I)**3)
  108 E3(I)=E*(RHO3(I)**(2./3.))
This replacement is possible only after examining the program flow. Another replacement could use a variable that holds the assigned value and an IF or CASE structure to GOTO the selected statement.

The H edit descriptor was deleted. The H (Hollerith) edit descriptor was used for text constants in FORMAT statements. It has to be replaced by a quoted character constant in the FORMAT statement, or by an A edit descriptor with the text supplied as an output item. The first character following the H edit descriptor had a special meaning on some machines ([2]). Usually this character controlled the layout of the output (e.g. a 1 meant new page before printing the text). The SPAG source-code restructuring tool (http://www.polyhedron.co.uk/pf/plusfort.html) performs a replacement of H edit descriptors automatically. For example, a FORMAT statement containing the Hollerith constant 15H CROSS-SECTIONS
could be replaced by

      WRITE (6,280)
  280 FORMAT (' CROSS-SECTIONS')
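Two standard-conforming ways of writing such a text constant are sketched below, assuming the text is CROSS-SECTIONS preceded by one blank. The variable HEADING, the label 290 and the use of the default output unit are illustrative choices.

```fortran
program hollerith_sketch
  implicit none
  character(len=15) :: heading

  ! Variant 1: the text is a quoted constant inside the FORMAT statement.
  write (*, 280)
280 format (' CROSS-SECTIONS')

  ! Variant 2: the text is an output item, edited by an A descriptor.
  heading = ' CROSS-SECTIONS'
  write (*, 290) heading
290 format (A15)
end program hollerith_sketch
```

Note that an A descriptor only produces output when the WRITE statement supplies a matching item; with an empty output list, format processing stops at the first data edit descriptor.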
A common way to read a title for the output was a labelled FORMAT-line with a dummy H-edit descriptor. A read with the label of this line led to a modification of this line. Thus a subsequent write could use this line to print the updated text. The following example demonstrates this behaviour:
      WRITE (6, 110)
  100 READ (7, 110)
  110 FORMAT (24HTO BE MODIFIED          )
      WRITE (6, 110)
This way of using FORMAT specifications is illegal; therefore the above code segment has to be modified. A correct replacement is stated below:

      CHARACTER TITLE*24
      TITLE='TO BE MODIFIED'
      WRITE (6, 110) TITLE
  100 READ (7, 110) TITLE
  110 FORMAT (A24)
      WRITE (6, 110) TITLE
Branching to an ENDIF-statement from outside the IF-THEN-ELSE structure is no longer allowed. Inserting a labelled CONTINUE-statement will solve such incompatibilities:
      IF (I .EQ. 0) GOTO 3
      IF (A .GE. 0.0) THEN
         WRITE (6, *) A
      ELSE
         WRITE (6, *) A**2
    3 ENDIF
could be replaced by
      IF (I .EQ. 0) GOTO 3
      IF (A .GE. 0.0) THEN
         WRITE (6, *) A
      ELSE
         WRITE (6, *) A**2
      ENDIF
    3 CONTINUE
Remark: All tested compilers accepted the deleted features, except reading into an H-edit descriptor format line (stated above as illegal). Therefore no source-code modification was necessary due to different language definitions.
3 Computer / compiler dependency
3.1 General
A general problem arises from the different character sets that were used to write the programs. Some programs were not converted correctly from BCD (Binary Coded Decimal) to EBCDIC (Extended Binary Coded Decimal Interchange Code); in the current ASCII format these programs contain some strange characters. During the tests some packages that had not been converted from BCD to EBCDIC were found and modified accordingly (necessary replacements: % by (, < by ), # by =, & by + and @ by ').

File handling can be a source of problems, some of which are discussed in the next section. Usually non-standard edit descriptors, different implementations of error status variables, differences in file formats and different behaviour on errors result in incompatibilities.

Another general problem is uninitialised variables. Some compilers guaranteed that variables contain 0 on their first use, but this is not standard behaviour. A variable may contain anything on its first use and therefore must be initialised. Sometimes an uninitialised variable hints at a typing error (see section 3.2).

Some codes were implemented as spaghetti code because computer power and memory were scarce in the early days of computing. These codes are not easy to understand. Inspecting the code after it has been restructured with the SPAG tool (http://www.polyhedron.co.uk/pf/plusfort.html) can help in understanding it.
3.2 Compiler - dependency
The usage of non-standard compiler features or language extensions is a major problem when trying to compile older FORTRAN programs with a Fortran-95 compiler. Some widely used compiler extensions are still supported by Lahey lf95 for compatibility with old machines and programs. This can result in problems, because the compiler extensions differed between computer systems. To avoid problems with these extensions they have to be replaced by up-to-date elements of Fortran-95. E.g. non-standard extensions like CTIME and ITIME accessed the built-in computer timer(s); they can be replaced by the Fortran-90/95 standard DATE_AND_TIME routine. A compiler supporting the extensions may still be useful to avoid source code modification, but it must be guaranteed that this compiler supports the extensions correctly. If no such compiler is available, the extensions may, depending on their purpose, be deleted from the source code or must be replaced by standard routines. In any case a replacement of non-standard extensions by standard functions is desirable.

During the program tests the following problems arose and were solved by the stated solutions:

Statement functions had to be declared by using the DEFINE keyword in CDC FORTRAN IV programs; e.g.:
DEFINE VV(I) = V(I)
The DEFINE keyword has to be removed to conform to standard Fortran. The statement function feature is, however, marked obsolescent in the Fortran-95 standard and might be deleted in future language standards. Ordinary FUNCTION subprograms can be used to replace statement functions.
READ and WRITE statements can cause a wide range of problems. Some are listed below:

Mismatch of data size on WRITE and subsequent READ. Skipping a value that was written unformatted by reading it into a dummy variable works only if the sizes of the written variable and the dummy variable match. E.g. the following code segment works only if REAL and INTEGER variables are equal in size:
      OPEN (UNIT=2, FORM='UNFORMATTED', STATUS='NEW')
      I=2
      WRITE (2) I
      R=REAL(I+2)
      WRITE (2) R
      CLOSE (2)
      OPEN (UNIT=2, FORM='UNFORMATTED', STATUS='OLD')
      READ (2) DUMMY
      READ (2) R
      CLOSE (2)
      PRINT *, 'I was ', INT (R)-2, ' and R is ', R
But if the REAL variables DUMMY and R differ in size from the INTEGER variable I, it will print something else. With Lahey lf95 you will obtain a different result if you use the -dbl compiler option, and with DEC-ALPHA f95 when using the compiler options -r8 -i4.

Mismatch of the number of arguments for READ and the number of items in a formatted input. Some compilers perform special processing if the number of items in a READ statement does not match the number of data available from the input stream. E.g. suppose the input
1 2 3 4 5.0 6.0
The values of I, J, K, R, S and T will usually be (and are expected to be) I=1, J=2, K=3, R=5.0, S=6.0 and T undefined/unchanged. But for some compilers the code may stop with an error condition, and for some others the values may be I=1, J=2, K=3, R=4.0, S=5.0 and T=6.0.

Mismatch of status codes for IOSTAT (I/O status). On successful completion of an I/O operation the IOSTAT value will be 0, but in case of an error the IOSTAT value is not well defined. Thus a program relying on a special error value may not work with a different compiler/computer. To resolve such problems the code could be modified to accept only a successful termination of an I/O operation, i.e. an IOSTAT value not equal to 0 will result in program termination with the corresponding error code.
Non-standard edit descriptors. Some compilers introduced edit descriptors for special purposes. E.g. IBM VS FORTRAN used the Q edit descriptor to format extended precision (real) data. Such format descriptors have to be replaced by standard descriptors (e.g. Q by D).
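The IOSTAT handling recommended above can be sketched as follows: only the value 0 is treated as success and no particular non-zero value is assumed. The unit number 10, the scratch file and the messages are illustrative assumptions.

```fortran
program iostat_sketch
  implicit none
  integer :: ios, n

  open (unit=10, status='scratch', form='formatted', iostat=ios)
  if (ios /= 0) stop 'OPEN failed'

  write (10, '(I5)') 42
  rewind (10)

  read (10, '(I5)', iostat=ios) n
  if (ios /= 0) stop 'first READ failed'
  print *, 'read ', n

  ! The file holds one record only, so this READ cannot succeed.
  ! Only "not zero" is tested, never a compiler-specific code.
  read (10, '(I5)', iostat=ios) n
  if (ios /= 0) then
     print *, 'second READ did not succeed'
  end if

  close (10)
end program iostat_sketch
```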
DO loop index variables were sometimes used outside the range of the DO-loop. This is not recommended, although the Fortran standard defines the value of the loop index after the loop. Some ancient compilers disallowed this usage by making the scope of the loop index local to the loop. Some codes tried to bypass this by assigning the loop index variable to a global variable of the same name. This does not work any more. An example of such code is stated below:
      DO 100 I=1,10
         I=I
  100 CONTINUE
      PRINT *, I
This code segment solved the local definition problem of I, but it is illegal in up-to-date Fortran. The expected output of this code segment is 10. Deleting the line I=I would result in Fortran-95 conformance, but the output would be 11. Thus a correct replacement would be:
      DO 100 I=1,10
         IOUT=I
  100 CONTINUE
      PRINT *, IOUT
Old Fortran compilers did not provide character manipulation. Therefore some source codes used INTEGER or REAL variables to store text. But it is not always possible to store the same number of characters in one INTEGER or REAL variable, because the length of those variables differs between computers. The INTEGER and REAL variables holding character data can be replaced by CHARACTER variables, since the Fortran-95 standard, like the Fortran-77 standard, provides character manipulation routines. An example is listed below:
      DIMENSION TEXT(4)
      DATA TEXT / 'INTEGRAL AVERAGE' /
TEXT is a REAL variable that is used to store text. The REAL format of Lahey lf95 uses 4 bytes to store the REAL numbers. Thus an array of four REAL numbers (TEXT) provides 16 bytes storage space. But the desired text contains 24 characters and needs therefore 24 bytes storage space. A replacement for this code segment could be:
      CHARACTER TEXT*24
      DATA TEXT / 'INTEGRAL AVERAGE' /
Beside the general problem of not initialised variables, stated above, some not initialised variables give a hint to a typing error in the source code. E.g.:
      DO 11 I=1,K
         M1=4*K+3+I
         M2=M1+K
         M3=M2+K
         M4=M3+K
         M5=M4+K
         E=(T(M1)+T(M5)-4.*(T(M2)+T(M4))+6.*T(M6))/(12.*T(M5))
         WRITE(6,*) E
   11 CONTINUE
In the formula for E, the variable M6 was undefined, whereas the variable M3 was not used at all. The examination of the calculation results and a discussion with the author resulted in the replacement of M6 by M3. Thus the formula for E was modified to:
E=(T(M1)+T(M5)-4.*(T(M2)+T(M4))+6.*T(M3))/(12.*T(M5))
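Typing errors of this kind can be detected at compile time with IMPLICIT NONE, which has been part of the standard since Fortran-90. The sketch below wraps the corrected loop in a small program; the value of K and the contents of T are arbitrary test data and not taken from the original code.

```fortran
program implicit_none_sketch
  implicit none            ! every name must be declared explicitly
  integer, parameter :: k = 2
  integer :: i, m1, m2, m3, m4, m5
  real :: t(9*k + 3), e

  ! Arbitrary test data, chosen only so that T(M5) is never zero.
  do i = 1, size(t)
     t(i) = real(i)
  end do

  do i = 1, k
     m1 = 4*k + 3 + i
     m2 = m1 + k
     m3 = m2 + k
     m4 = m3 + k
     m5 = m4 + k
     ! Writing T(M6) here would be rejected by the compiler,
     ! because M6 has not been declared.
     e = (t(m1) + t(m5) - 4.*(t(m2) + t(m4)) + 6.*t(m3)) / (12.*t(m5))
     write (*, *) e
  end do
end program implicit_none_sketch
```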
Date and time routines are a further source of incompatibilities, since no Fortran standard before Fortran-90 defined such routines and their input and output. Usually the compiler-/computer-specific date and time routines were used to add time stamps to the output. An easy way to resolve problems with non-standard date and time routines is to map these routines to the standard DATE_AND_TIME subroutine. An example is listed below. First the current date (IDAY) and time (TIME) are saved in two variables (IDATE and KLOCK):
      CALL IDAY (IDATE)
      CALL TIME (KLOCK)
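One possible mapping of these two calls to DATE_AND_TIME is sketched below. The argument formats of the original IDAY and TIME routines are not known here, so the sketch assumes that both return plain integers (YYYYMMDD and HHMMSS); these formats and the replacement subroutines themselves are hypothetical stand-ins.

```fortran
! Hypothetical stand-in for the non-standard IDAY routine.
! Assumption: the date is wanted as the integer YYYYMMDD.
subroutine iday (idate)
  implicit none
  integer, intent(out) :: idate
  integer :: v(8)
  call date_and_time (values=v)
  idate = v(1)*10000 + v(2)*100 + v(3)
end subroutine iday

! Hypothetical stand-in for the non-standard TIME routine.
! Assumption: the time is wanted as the integer HHMMSS.
subroutine time (klock)
  implicit none
  integer, intent(out) :: klock
  integer :: v(8)
  call date_and_time (values=v)
  klock = v(5)*10000 + v(6)*100 + v(7)
end subroutine time

program stamp_sketch
  implicit none
  external iday, time      ! use the routines above, not compiler extensions
  integer :: idate, klock
  call iday (idate)
  call time (klock)
  print *, 'date ', idate, ' time ', klock
end program stamp_sketch
```

With such wrappers the calling code can stay unchanged, and only the wrappers need to be adapted if the original formats turn out to be different.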
3.3 Computer / processor dependency
Some problems result not from compiler dependency of the source code but from its computer or even processor dependency. Reference [5] contains a good explanation of the problems that arise in connection with real-number calculations and processor dependency. A list of computer / processor dependency problems is given below:

OVERLAY PROGRAM <name>, inserted due to lack of memory, must be converted to SUBROUTINE <name>. With OVERLAY programs it was possible to load a needed piece of program into the memory space of a piece of program that was no longer needed. Today OVERLAY is no longer useful and it is not an element of standard Fortran. Current computers support virtual memory and are equipped with random access memories big enough to hold all subroutines of a program at the same time. An example of the necessary replacement is listed below:
      NTLIB=10
      CALL OVERLAY(6LTERMOS,1,0)
      END
      OVERLAY(TERMOS,1,0)
      PROGRAM TERMOS
      COMMON /B100/ T(13,13,43),TT(1544)
has to be replaced by
      NTLIB=10
      CALL TERMOS
      END
      SUBROUTINE TERMOS
      COMMON /B100/ T(13,13,43),TT(1544)
Job Control Language (JCL) files usually contain statements to start the compiler, to prepare input data, to load and replace overlay modules and to execute the program. Most of these statements can be ignored or deleted safely, but the preparation of input data has to be checked carefully. The JCL statement DD was often used to assign input and output files to Fortran units. An example is listed below:
//GO.FT06F001 DD UNIT=L92,DISP=(NEW,PASS),LABEL=(1,SL),
// VOLUME=(PRIVATE,SER=EN1109),DSNAME=ENEA1109
// DCB=(,RECFM=F,BLKSIZE=133,TRTCH=ET,DEN=2)
//GO.FT09F001 DD UNIT=L91,DISP=(NEW,PASS),LABEL=(1,SL),
// VOLUME=(PRIVATE,SER=EN2068),DSNAME=ENEA2068
//GO.SYSIN DD *
50 2 2068 2 1 1 0 0 0 1
50 0.14 0.31 0.42 0.55 0.60 0.69
The first two DD statements (five lines) define the output destinations for unit 6 and unit 9. All supplied hardware information can be ignored safely. The files fort.6 and fort.9 will be used for the output. On some systems unit 6 (= fort.6) is predefined as standard output; these systems will display all information sent to unit 6 on the screen. The last DD statement defines the standard input (usually unit 5), and the two lines following it are used as standard input data. Such DD statements have to be translated: in the example above, the two data lines (starting with 50) have to be copied into a file named fort.5 to be used as standard input. The other two DD statements can be ignored.

Usually the easiest way is to use Fortran unit files (fort.1 for Fortran unit 1, fort.2 for Fortran unit 2, fort.3 for Fortran unit 3, etc.), i.e. to copy the input files to the corresponding unit files. Sometimes it may be better to use generic file names and to add OPEN statements to the source code. And sometimes the code has to be modified more thoroughly, because special features of JCL were used. An example is listed below:
//GO.FT06F001 DD SYSOUT=A,DCB=(RECFM=FBA,LRECL=133,BLKSIZE=3458)
//GO.FT09F001 DD UNIT=TAPE9,VOL=SER=09,DISP=(OLD,PASS),
// LABEL=(1,NL),
// DCB=(RECFM=FB,LRECL=220,BLKSIZE=6600)
//GO.FT09F002 DD VOL=REF=*.FT09F001,DISP=(OLD,PASS),
// LABEL=(2,NL),
// DCB=(RECFM=FB,LRECL=220,BLKSIZE=6600)
The first DD statement above defines the destination for output on unit 6. The next three DD statements concatenate three library files. The result is a data stream available on unit 9. An end-of-file mark is added at the end of each file of the stream, i.e. unit 9 contains three end-of-file marks. The last DD statement defines the input data provided on unit 5. The corresponding program, Pepin, used the end-of-file marks to read the three different files from the unit 9 stream. For present-day files the end-of-file mark no longer exists; the end of file is the physical end of the file on the storage medium (usually a disk). Thus no file contains more than one end of file. Therefore the Pepin source code had to be modified so that it reads three different library files from three different Fortran units. A short segment of the corresponding source code is listed below:
  397 READ(IB1,554,END=4)
      DO 396 I=1,M
  396 READ(IB1,556)
      GO TO 397
    4 DO 15 I=1,M
    8 READ(IB1,96)
      DO 22 I=1,M
   22 READ(IB1,555)
    7 READ(IB1,96)
This code segment reads from unit IB1 until the end condition in the statement labelled 397 is fulfilled, i.e. until the end-of-file mark has been reached. Expecting the next file to be available on unit IB1, it then continues reading from unit IB1 at the statement labelled 8. This code segment had to be modified so that it reads from two different units/files (IB1, IB2):
  397 READ(IB1,554,END=4)
      DO 396 I=1,M
  396 READ(IB1,556)
      GO TO 397
    4 DO 15 I=1,M
    8 READ(IB2,96)
      DO 22 I=1,M
   22 READ(IB2,555)
    7 READ(IB2,96)
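Instead of relying on fort.N unit files, the units can also be connected with explicit OPEN statements. The sketch below assumes two library files; the file names lib1.dat and lib2.dat and the unit numbers are hypothetical, and the sketch first creates small stand-in files so that it can run on its own.

```fortran
program open_units_sketch
  implicit none
  integer, parameter :: ib1 = 9, ib2 = 10   ! unit numbers assumed here
  character(len=20) :: line
  integer :: ios

  ! Create two small stand-in library files so that the sketch is runnable.
  open (unit=ib1, file='lib1.dat', status='replace', form='formatted')
  write (ib1, '(A)') 'first library'
  close (ib1)
  open (unit=ib2, file='lib2.dat', status='replace', form='formatted')
  write (ib2, '(A)') 'second library'
  close (ib2)

  ! One explicit OPEN per file replaces the concatenating DD statements.
  open (unit=ib1, file='lib1.dat', status='old', form='formatted')
  open (unit=ib2, file='lib2.dat', status='old', form='formatted')

  ! Each file is read up to its own end of file.
  do
     read (ib1, '(A)', iostat=ios) line
     if (ios /= 0) exit
     print *, trim(line)
  end do
  do
     read (ib2, '(A)', iostat=ios) line
     if (ios /= 0) exit
     print *, trim(line)
  end do

  close (ib1, status='delete')
  close (ib2, status='delete')
end program open_units_sketch
```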
Assumptions about the size of INTEGER and REAL values may lead to problems. THERMOS-OTA used dummy REAL variables to skip some binary integer values while reading a binary library. This works only if REAL and INTEGER variables are equal in size. The resulting error message was an end-of-file error while reading, but a code that does not read until the end of file may just produce a different result. To resolve such problems, either change the variable type or use a compiler option that equalises the sizes of REAL and INTEGER variables. This error is difficult to find, because the structure of the binary data file has to be determined from other parts of the source code.

The REAL format is system-dependent, as are all other data formats. A result of this difference may be a mismatch in the computation results between two systems ([5]). If a program assumes a 16-bit format for an INTEGER value and a cyclic increment (0, 1, 2, ..., 65534, 65535, 0, 1, ...), a 32-bit INTEGER format on a different computer may completely destroy the program flow. Such a problem did not arise during the program tests. To avoid such effects most compilers provide an option to select the default data representation format. Usually these compiler options will solve problems resulting from machine-dependent data formats. E.g. Lahey lf95 supports default double precision through the -dbl option, and DEC-ALPHA f95 through the -r8 option.
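The first remedy named above, changing the type of the dummy variable, can be sketched as follows. The program is a reduced stand-in and not the THERMOS-OTA code; the variable IDUMMY and the scratch file are illustrative assumptions.

```fortran
program dummy_type_sketch
  implicit none
  integer :: i, idummy
  real :: r

  open (unit=2, form='unformatted', status='scratch')
  i = 2
  r = real(i + 2)
  write (2) i
  write (2) r
  rewind (2)

  ! The value to be skipped was written as an INTEGER, so it is read into
  ! an INTEGER dummy; the sizes then agree whatever the compiler options.
  read (2) idummy
  read (2) r
  close (2)

  print *, 'skipped integer ', idummy, ' and R is ', r
end program dummy_type_sketch
```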
COMMON block size mismatches may hint at alignment or variable size problems. E.g. Pepin (NEA1339) used the following common blocks in three routines:
      PROGRAM MAIN
      REAL*8 XNS
      ..
      COMMON/C1/XNS(650),XN0(650),XND(650,59)
      ..
      END
      SUBROUTINE A
      REAL*8 XNS
      ..
      COMMON/C1/XNS(650)
      ..
      END
      SUBROUTINE B
      ..
      COMMON/C1/EBM(650),EGM(650),SX(650),XND(650,59)
      ..
      END
The usage of /C1/ in SUBROUTINE B assumed that REAL variables are half the size of REAL*8/DOUBLE PRECISION variables. For some compilers this does not apply. To correct the resulting output error, SUBROUTINE B had to be modified. No data was passed through /C1/XNS(650), i.e. through /C1/EBM(650), EGM(650); the modified version therefore looks like:
      SUBROUTINE B
      REAL*8 EBM
      ..
      COMMON/C1/EBM(650), SX(650),XND(650,59)
      DIMENSION EGM(650)
      ..
      END
4 References
[1] http://www.isi.edu/~iko/pl/hw3_fortran.html, by In-Young Ko
[2] Enrico Sartori, NEA Data Bank
[3] News group: comp.lang.fortran
[4] http://www.fortran.com/FAQ, by Keith Bierman
[5] The Perils of Floating Point, http://www.lahey.com/float.htm, by Bruce M. Bush
FORTRAN VERSION 5 MANUAL, CDC OPERATING SYSTEM, NEA Data Bank
UNIVAC 1100 FORTRAN V Programmer Reference, NEA Data Bank
IBM VS FORTRAN Language Reference, NEA Data Bank
ERRSET

Depends; usually it can be removed; use compiler option to select behaviour on division by zero
Depends; usually it can be removed; compiler option may select special error treatment
Remove
Remove; use compiler option
STOP
Replace by GOTO m if no standard conformance is desired; otherwise refer to chap. 2 (ASSIGN)
Map to DATE_AND_TIME
Replace by IOSTAT-parameter of OPEN/READ/WRITE
Remove
Depends; usually it can be removed; use compiler option to select overflow behaviour
Output to a punchcard printer | Replace by WRITE
Generate random numbers; probably between 0.0 and 1.0 | Map to RANDOM_NUMBER
Retrieve the number of CPU seconds used so far | Map to CPU_TIME
Direct access to storage media; first parameter is the unit number, second parameter is the address; UNIVAC FORTRAN V | Use direct access for the corresponding unit number
Set status lights of the old computers; UNIVAC FORTRAN V | Remove or transform to a status variable
Get status of the status lights; UNIVAC FORTRAN V | Remove or transform to a status variable
Get status of an external computer switch; UNIVAC FORTRAN V | Remove or transform to an input statement
Current time of day as CHARACTER*10 | Map to DATE_AND_TIME
6.1.2 Inexactness
Floating-point arithmetic on digital computers is inherently inexact. The 24 bits (including the hidden bit) of mantissa in a 32-bit floating-point number represent approximately 7 significant decimal digits. Unlike the real number system, which is continuous, a floating-point system has gaps between adjacent numbers. If a number is not exactly representable, then it must be approximated by one of the nearest representable values. Because the same number of bits is used to represent all normalized numbers, the smaller the exponent, the greater the density of representable numbers. For example, there are approximately 8,388,607 single-precision numbers between 1.0 and 2.0, while there are only about 16,383 between 1023.0 and 1024.0.

On any computer, mathematically equivalent expressions can produce different values using floating-point arithmetic. In the following example, Z and Z1 will typically have different values because (1/Y) or 1/7 is not exactly representable in binary floating-point:

      REAL X, Y, Y1, Z, Z1
      DATA X/77777/, Y/7/
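A minimal sketch of how such a comparison might be completed is given below. It assumes that Y1 holds the reciprocal 1/Y, that Z is X/Y and that Z1 is X*Y1; these three statements and the printed message are assumptions and not part of the listing above.

```fortran
program inexact_sketch
  implicit none
  real :: x, y, y1, z, z1
  data x /77777/, y /7/

  y1 = 1 / y        ! 1/7 is not exactly representable in binary
  z  = x / y
  z1 = x * y1       ! mathematically the same as X/Y
  if (z /= z1) print *, 'Z and Z1 differ'
  print *, x, y, y1, z, z1
end program inexact_sketch
```

Whether the two results actually differ depends on the compiler, its options and the processor.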
should help a lot:
1. Only about 7 decimal digits are representable in single-precision IEEE format, and about 16 in double-precision IEEE format.
2. Every time numbers are transferred from external decimal to internal binary or vice versa, precision can be lost.
3. Always use safe comparisons.
4. Beware of additions and subtractions that can quickly erode the true significance in a result. The computer doesn't know what bits are truly significant.
5. Conversions between data types can be tricky. Conversions to double precision don't increase the number of truly significant bits. Conversions to integer always truncate toward zero, even if the floating-point number is printed as a larger integer.
6. Don't expect identical results from two different floating-point implementations.

I hope that I have given you a little more awareness of what is happening in the internals of floating-point arithmetic, and that some of the strange results you have seen make a little more sense. While some of the "perils" can be avoided, many just need to be understood and accepted.
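Item 3 can be illustrated with a comparison that uses a relative tolerance instead of a test for exact equality. This is only one possible form of a safe comparison; the factor 4.0 and the variable TOL are arbitrary choices made for this sketch.

```fortran
program safe_compare_sketch
  implicit none
  real :: x, y, y1, z, z1, tol

  x = 77777.0
  y = 7.0
  y1 = 1.0 / y
  z  = x / y
  z1 = x * y1

  ! Tolerance of a few units in the last place, scaled to the operands.
  tol = 4.0 * epsilon(z) * max(abs(z), abs(z1))

  if (abs(z - z1) <= tol) then
     print *, 'equal within tolerance'
  else
     print *, 'different'
  end if
end program safe_compare_sketch
```

A suitable tolerance depends on how much rounding error the preceding calculation can accumulate, so the factor has to be chosen per application.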
6.2
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) has defined standards for floating-point representations and computational results (IEEE Std 754-1985). This section is an overview of the IEEE standard for representing floating-point numbers. The data contained herein helps explain some of the details in the rest of the article, but is not required for understanding the basic concepts.

Most binary floating-point numbers can be represented as 1.ffffff x 2^n, where the 1 is the integer bit, the f's are the fractional bits, and the n is the exponent. The combination of the integer bit and the fractional bits is called the mantissa (or significand). Because most numbers can have their exponent adjusted so that there is a 1 in the integer bit (a process called normalizing), the 1 does not need to be stored, effectively allowing for an extra bit of precision. This bit is called a hidden bit. Numbers are represented as sign-magnitude, so that a negative number has the same mantissa as a positive number of the same magnitude, but with a sign bit of 1. A constant, called a bias, is added to the exponent so that all exponents are positive.

The value 0.0, represented by a zero exponent and zero mantissa, can have a negative sign. Negative zeros have some subtle properties that will not be evident in most programs. A zero exponent with a nonzero mantissa is a "denormal." A denormal is a number whose magnitude is too small to be represented with an integer bit of 1 and can have as few as one significant bit. Exponent fields of all ones (largest exponent) represent special numeric results. A mantissa of zero represents infinity (positive or negative); a nonzero mantissa represents a NAN (not-a-number). NANs, which occur as a result of invalid numeric operations, are not discussed further in this article.

The IEEE Standard defines 32-bit and 64-bit floating-point representations. The 32-bit (single-precision) format is, from high-order to low-order, a sign bit, an 8-bit exponent with a bias of 127, and 23 bits of mantissa. The 64-bit (double-precision) format is a sign bit, an 11-bit exponent with a bias of 1023, and 52 bits of mantissa. With the hidden bit, normalized numbers have an effective precision of 24 and 53 bits, respectively.

Single-precision format
   Bit 31: S    Bits 30-23: Exponent    Bits 22-0: Significand
Double-precision format
   Bit 63: S    Bits 62-52: Exponent    Bits 51-0: Significand
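The properties of the formats actually used by a given compiler can be queried with the standard numeric inquiry functions of Fortran-90/95. The following sketch prints a few of them; which format the default REAL and DOUBLE PRECISION types map to is decided by the compiler and its options, so no particular output is assumed here.

```fortran
program ieee_sketch
  implicit none
  real :: s
  double precision :: d

  s = 1.0
  d = 1.0d0

  ! DIGITS counts the mantissa bits including the hidden bit.
  print *, 'mantissa bits (single, double):  ', digits(s), digits(d)
  ! PRECISION gives the guaranteed number of decimal digits.
  print *, 'decimal digits (single, double): ', precision(s), precision(d)
  ! SPACING gives the gap to the next representable number.
  print *, 'gap near 1.0 and near 1023.0:    ', spacing(1.0), spacing(1023.0)
end program ieee_sketch
```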
6.2.1 Bibliography
American National Standards Institute (1978), "American National Standard, Programming Language FORTRAN", ANSI X3.9-1978, ISO 1539-1980 (E).
IEEE Computer Society (1985), "IEEE Standard for Binary Floating-Point Arithmetic", IEEE Std 754-1985.