|
|  |
 |
 |
Understanding Referential Integrity
TC057
by Dave Sander
Improving Referential Integrity
When database designers specify set linking fields, many people go the
simple route and use data items such as last and first name for linking data.
While this method has a certain simplicity and usually works, database
professionals often place a counter or foreign key in a separate character
field for linking their tables together in sets. Why do they use this extra
complication? By using this method they eliminate Referential Integrity
problems and provide their databases with significant simplifications and
performance enhancements.
Lets examine the problem. Suppose that you use last, first, and middle
names to provide for set linking between your tables. If you have a small
name list and few users, nothing may ever happen. You can assign Alpha Field
Rules to preserve Referential Integrity in Set links. This assures that when
a parent linking value is changed, the child linking value is changed as well
to match so that the linked row of data isn't orphaned by the changed value
linking it to the parent table. Name changes need to be permitted in real
life since people might marry, have name entry errors, or may change their
names, all of which require revising the linking values if name is a linking
value.
For multiple user systems and larger tables the simple linking method
begins to have problems. There is a definite performance hit to engaging
Referential Integrity, especially for networks with record locking. If you
change a name in the linking value in a one-to-many link, the computer has to
go and change all those records. The Referential Integrity process also
locks the child record to prevent other users from changing them. This
activity means more work is done on the network and other users are locked
out of all the child records when you change or read them. Any default field
rules and posting rules are locked out at the same time too.
The list of problems possible with the use of the Referential Integrity is
significant. Alpha will lock the parent and all the child records in a set
if you have Referential Integrity turned on and are in change or enter mode.
Using this process creates extra network traffic and a definite performance
drop. The performance problem is made worse if you are using modeless data
entry and have large numbers of users in the database or users who view the
same records frequently. These users may find themselves unable to update
records at times and Lookup() functions or Posting Field rules may fail to work on occasion. Alpha's Referential Integrity process does not work with the complex links produced from using multiple linking fields such as
Lastname+Firstname+MI link values. Yet these complex field combinations are often valuable for verifying individual record identities in the database and should be well controlled.
You can still work with a single concatenated link field in the following way:
Add a calculated field to the parent table, which will be a concatenated field holding results similar to Lastname+Firstname+MI. You can now link child records and assign Referential Integrity to this calculated field. Changes to the concatenated parent fields will be reflected in the calculated field value and, as a result, the Referential Integrity updates to the child links. Updates made to child linking field values won't be able to force changes to the parent linking field values, however, since this is a calculated field. This condition may or may not be an advantage. You should also place a field rule for a parent link field lookup in the child table linking field so users can avoid the data errors of hand entry for the child table linking value.
The advantage of using the link counter or a foreign key is that it is not
a data item and usually never requires revision. This separation from data
values allows users to revise data freely without concern for most set
Integrity problems. Record deletes can still create some Integrity conflicts
as described in the web links listed below.
Good link counters need a bit of
design work. They have to be able to count high enough to handle any
prospective quantity of data entries. Usually I allow a high enough count
for ten years of data plus any estimated growth. A six-character field
allows linking for 106 or 1,000,000 records if you use numbers and 266 or
308,915,776 records if you count using the letters of the alphabet. Alpha
Five has its own convenient incrementing process in field rules, but some
serious users have designed their own auto-increment code for even better
process control (see Issues 3, 4, and 5 for more info).
Other convenient counters that can be used would be any unique and
unchanging alphanumeric counter. Serial numbers are good. Record numbers can be used for one-time tasks if you have to number large amounts of
existing data, but are not good otherwise since deleting records would create
duplicate numbers. You can put record number values in a counter and
increment them after your counter field is running.
If people are using counters for regular data entry processes, it's a good
idea to add a couple extra characters and to select a random and unique
number. If you have 1000 employee ID's, have them use a five-digit counter
able to go to 999,999. The extra counting space will mean that if an incorrect data entry occurs, it will probably fall upon an unused number 999 time out of 1000. Since the computer can verify the existence of each number in another table
by field rules, you will have very few incorrect employee ID's getting into
the data. By comparison, this would not be the case if just 1000 numbers
were used as a bad number entry would fall on another active user ID.
How to avoid Referential Integrity in linking fields
You should begin by examining the current links in the set for various
problems. On the ONE side of the one-to-many relation, put your linking keys
in order and examine them for duplicate entries and resolve these to fit the
newest data. As a test, if a unique index of the linking holds fewer records
than the table does, you have duplicate linking keys present. Also look for
non-duplicate duplicates which would be items like misspelled personal or
corporate names which may be present in unique indexes. Other features in
the data, such as addresses, may often be examined to uncover these problems.
Next examine the MANY side of the relation. You can run an EXIST()
function in a query to find any records that don't have a matching link value
on the ONE side table. You may need to select or create an index to do this
query. A Set browse can also compare parent field values to those in the
child to check for missing values if you create a set with the child and
parent tables reversed.
Backup, Backup, Backup!
When you set off to make such changes to your sets and tables, it's a very
good idea to save the data directory in case you wreck your data by accident.
Saving a copy of the database directory to another hard drive, directory, or
floppy disk is fine; just be sure your floppies have fresh formats and are
known to be good and readable in another computer. To get the current Alpha
path to your database, run the command ?:A5.Get_Path() in the interactive window.
The next chore is to go to the linked tables and add the required counter
field. Go to the Control Panel, select Tables/Sets, and right click for Edit
Structure. Add the new fields of equal length to each table.
Next on the ONE side table, right click for Operations, select Define New
Operation and select Update Records from the Genie, and Finish. Have the
genie update the new field by record number using the expression:
right("00000000"+ltrim(str(recno(),8)),8)
The expression gives the values 00000001, 00000002, 00000003, and so on to update the link field with unique
values. Adjust the length 8 and the zeros string to suit your field length.
For the MANY side table, run a similar operation and change the
expression. The new expression will closely match the following example.
Update the new link field with:
Lookupc("L",Lname+Fname,"Cust_ID","Customer","Namex")
The linking key expression is LNAME+FNAME and CUST_ID is the new linking field in table
Customer. The function will return the new Cust_ID number you just created
in the ONE side table update and will populate the Many side with the correct
linking field values. Getting the Many update to work completely right may
take a couple of trials so you might want to save the operation for later
editing.
After the two updates, view the browse for the set to see if the updates
have been generally successful and correct. Create a new index in the parent
and child tables using the new field name. The index on the ONE side should
be unique to prevent duplicate values. On large tables you should use these
indexes to search for blank or absent values where the update process may
have failed. Queries run against the set can find rows of data where the new
linking fields in each table don't match correctly. Having a few of these
mismatches may be a good thing if some of the old linking values were
incorrect but are now corrected.
The tables are now ready to be re-linked using the new fields. From the
Control Panel right click on the particular Set and go to the Set editor.
Right-click on the child table and then select Edit Link; revise the parent
and child linking fields to use the new field names and change the
Referential Integrity value to None. Save these changes. After you see the Set is updated and working, you can go into the MANY side table and delete
the old linking fields which are no longer being used.
You will now have a set free of nearly all Referential Integrity
requirements. As an added bonus, the fields that required referential
integrity tend to be name fields which consume a lot of space. You might
replace 35 to 50 characters in name-based links with a six-character ID
number. This size reduction makes both the tables and indexes physically
smaller and this tends to enhance performance in the database.
It is a good idea to protect your child linking fields by having the child
field rules check for the existence of a valid Parent link value with the
Exist() function before accepting the child link value. This step requires users to enter a parent record before attempting any children entries as an extra safety feature. Since the correct linking value is critical, add any other field rules for length and value to reject data entry errors. Another valuable control is to create a unique index on the original linking fields so you can warn users when they try to enter the same name or household address a second time. Well considered field rules improve the quality of your data and reduce time spent on maintaining your data.
You can find extended examples of Referential Integrity operations at
Improving Referential Integrity by Peter Wayne which presents detailed
controls of deleted records. Peter also wrote "Write your own auto-increment
rule" and "Generate Unique Random Numbers" at
Autoincrement
David Sander is a database designer and systems analyst consultant who has
developed applications using Alpha databases for eight years. He supports small and medium corporations in the retail services, timber, logging, and publication industries. A former mill equipment designer, he can be contacted by e-mail as A5Dave@aol.com or voice at (412) 835-0782.
[PRINTER FRIENDLY VERSION]
|
|
|