File Handling With Little RAM


To meet the requirements of the Mastery of Aspects for Higher Level [IB Computer Science], direct manipulation of files must be carried out in such a way that the whole file does not need to be read into RAM.

It is important to know how to update files without using much RAM.

In the "old days" before the average nerd or even businesses could afford gigabytes of RAM for a personal computer, large quantities of data had to be stored, compared and processed using small amounts of RAM and relatively larger amounts of secondary storage, typically tape and later disk storage.

Commodore 8032

For example, the Commodore 8032 was a very popular SOHO (Small Office Home Office) computer in the early 1980's. The "80" stood for "80 columns, which was twice the number of columns of characters that most other personal computers of the day placed on the screen. The "32" represented the number of KB (KiloBytes) of RAM that came with the machine, which was about twice the memory that came with most other personal computers of the day.

File-handling techniques that used little RAM were essential processing a real-life database with 32 KB of RAM. Such techniques are still useful when the amount of data to be processed exceeds the available RAM capacity.

It happens that the International Baccalaureate requires that candidates in their higher level Computer Science program be able to search a data file and add, delete or change data in a single record without simultaneously storing copious numbers of records into RAM.

In February of 2000 IBO issued a circular on Clarifications of Internal Assessment Details. One page of clarifications concerned Direct Manipulation of Files.

At Higher Level, the Mastery of Aspects part of the Internal Assessment requires candidates to be able to manipulate files directly.

Candidates must attempt to show mastery of the following aspects:

To satisfy these requirements, candidates must carry out file manipulation in such a way that files do not need to be read into RAM for manipulation.

For example, the record to be deleted must be found directly by searching the file in its actual location (by, for example, using a binary or linear search, reading a record, or small block or records, at once). The record could then be directly deleted (if the programming language supports this), or marked for deletion by using a flag field, or by using a rogue value in the key field.

Provided the file size alters, or a record marked for deletion is actually overwrittren during addition of new records, this manipulation would meet the requirement.

However, it is not acceptable, for example, to delete a record from a file, to read the whole file into a linked list or tree, to delete the node, and then to write the data back to the file. (Howefver, this manipulation would satisfy the requirement of being able to delete a data item from a linked list or tree.) Similarly, when directly adding new records to a file, it is not accpetable to read the file into RAM, add the data, and rewrite the file.

Merging Sequential Files

File-handling with little RAM is a basic requirment of exercises 16.7, 16.8 and 16.9 in Deitel's JAVA HOW TO PROGRAM. Those exercises are reproduced below for your inspection. First, however, the preferred strategy for answering those questions is given in blue. This strategy is preferred over loading all records into RAM at one time for at least two reasons:

  1. You will learn and practice processing data with little RAM but relatively larger secondary storage. It defeats the purpose of the exercises to simply load all the records into a structure such as a vector, linked list or tree, then manipulate the data in RAM, then save all the data back to file after it has been processed. The point is to learn to process data through file-handling without copious amounts of RAM.

  2. The International Baccalaureate Higher Level Dossier program must process records without copying the entire file into RAM.

Here the basic idea is to read the first record of the old master file into RAM, then read all of the records from the transaction file into RAM one record at a time, comparing each and every transaction record one record at a time with that first record from the old master file, updating the single master file record when there is a match. After completing all comparisons between that first record from the old master file and all the records from the transaction file, save the updated single record to the new master file.

Now repeat the process with the second record from the old master file, comparing it with all records from the transaction file one record at a time until it too is updated. Then save the updated second record to the new master file.

Repeat this process until the last record in the old master file has been updated and saved (after being updated) to the new master file.

Then delete the old master file and rename the new master file to the name for the old master file.

Most file-handling in Java such as deleting, renaming, checking the existance of, and changing properties of a file, may be improved by using the Java File class defined in the java.io package. Java's standard File Class is introduced and covered in Chapter 87 - The File Class from Bradley Kjell's Introduction to Computer Science using Java.

16.7

Exercise 16.3 asked the reader to write a series of single statements. Actually, these statements form the core of an important type of file-processing program, namely, a file-matching program. In commercial data processing, it is common to have several files in each application system. In an accounts receivable system, for example, there is generally a master file containing detailed information about each customer, such as the customer's name, address, telephone number, outstanding balance, credit limit, discount terms, contract arrangements and possibly a condensed history of recent purchases and cash payments.

  1. As transactions occur (i.e., sales are made and cash payments arrive in the mail), they are entered into a file. At the end of each business period (i.e., a month for some companies, a week for others, and a day in some case the file of transactions (called "trans.dat" in Exercise 16.3) is applied to the master file (called "oldmast.dat" in Exercise 16.3), thus updating each account's record of purchases and payments. During an updating run, the master file is rewritten as a new file ("newmast.dat"), which is then used at the end of the next business period to begin the updating process again.

  2. File-matching programs must deal with certain problems that do not exist in single-file programs. For example, a match does not always occur. A customer on the master file might not have made any purchases or cash payments in the current business period; therefore, no record for this customer will appear on the transaction file. Similarly, a customer who did make some purchases or cash payments could have just moved to this community, and the company might not have had a chance to create a master record for this customer.

  3. Use the statements in Exercise 16.3 as a basis for writing a complete file-matching accounts receivable program. Use the account number on each file as the record key for matching purposes. Assume that each file is a sequential file with records stored in increasing account-number order.

  4. When a match occurs (i.e., records with the same account number appear on both the master file and the transaction file) add the dollar amount on the transaction file to the current balance on the master file, and write the "newmast.dat" record. (Assume that purchases are indicated by positive amounts on the transaction file, payments by negative amounts.) When there is a master record for a particular account but no corresponding transaction record, merely write the master record to "newmast.dat". When there is a transaction record but no corresponding master record, print the message "Unmatched transaction record for account number ..." (fill in the account number from the transaction record).

16.8

After writing the program of Exercise 16.7, write a simple program to create some test data for checking out the program. Use the sample account data in Fig. 16.23 and Fig. 16.24. Run the program of Exercise 16.7, using the files of test data created in this exercise. Print the new master file. Check that the accounts have been updated correctly.

16.9

It is possible (actually common) to have several transaction records with the same record key. This occurs because a particular customer might make several purchases and cash payments during a business period. Rewrite your accounts receivable file-matching program of Exercise 16.7 to provide for the possibility of handling several transaction records with the same record key. Modify the test data of Exercise 16.8 to include the additional transaction records in Fig. 16.25.

Master file
Account number

Name

Balance
100Alan Jones348.17
300Mary Smith27.19
500Sam Sharp0.00
700Sizu Green-14.22
Fig. 16.23 Sample data for master file.

Transaction file
Account number
Transaction
Amount
10027.14
30067.11
400100.56
90082.17
Fig. 16.24 Sample data for transaction file.

Account numberDollar Amount
30083.89
70080.78
7001.53
Fig. 16.25 Additional transaction records.