24/7/17

Best practices for using Simple Lookup Tables

By Nick Harrison

Simple lookup tables are just one type of logic table that you may find useful. This article explores what simple lookup tables look like, discusses when they may be applicable, and steps through some of the details for proper implementations.

The One True Lookup Table – why not?

Lookup tables are often overlooked as a repository for business logic. Often they are viewed as nothing more than a simple location for pulling the values for a drop-down list. In many cases, they are given so little respect that the design falls into the trap of the “One True Lookup Table”—OTLT.

OTLT is an anti-pattern of database design. The basic justification for them goes something like this:

I have all these lookup values
Each lookup list is rather small
We are likely to have lots of different lookup types
I don’t want to have to change the data model every time I add a new lookup type
Rather than pollute the data model with lots of little tables, let’s put all of these lookup values in one table

On the surface, this seems perfectly reasonable and sounds like it should lower the total cost of maintenance since you don’t have to create a new table every time you add a new lookup type. We are, after all, always looking for ways to streamline maintenance. However, there are hidden costs to data quality.

In its simplest form, the OTLT might look like Figure 1:

blog_pathak_AI_cover

Figure 1. The basic structure of the One True Lookup Table

There are a couple of problems with this design.

Lookup tables need to be incorporated into your data model. You’ll want to incorporate them in a big way. By keeping each lookup category defined in a separate table, we can easily define all of the constraints implied by the lookup values. This will do wonders for data integrity, but would be very difficult in an OTLT.

Without proper constraints, there is nothing to prevent someone from specifying that the state is “North East” or that the occupancy for a mortgage is “Full Doc”. The constraints will have to be enforced somewhere or you will always have dirty data.

Another problem is the size of the columns. Because you don’t know what type of values will be stored ahead of time and you have to explicitly design them to store anything, the Value column is loosely typed and all of the fields are much bigger than they need to be. Some implementations may even resort to NVARCHAR(MAX) for the value or description. The flexibility designed into this solution means that you cannot put a reasonable maximum size for these fields. With such large sizes for these pivotal fields, you will eventually contain dirty data unless you add even more complexity to the already complex constraint checks on this table.

A Classic Lookup Table

Let’s consider a classic role for a lookup table. We know that in various places we will prompt for the state in an address. We don’t want the users to be able to type the state in directly. We want to force the users to select from a list of valid state options. Our first pass at a lookup table may look like the following (Figure 2):

blog_pathak_AI_cover

Figure 2. Example of a lookup table specifying the state name and abbreviation for the valid states

We would use the lookup table in our data model, as shown here (Figure 3):

blog_pathak_AI_cover

Figure 3. Using the lookup table with transaction data

With this data model, we can easily define the constraints to limit states to only the states that have been defined. In fact, we get those constraints without any work on our part beyond simply defining the foreign key relationship. This means that we will never have to worry about dirty data in this field.

Business Logic makes it Better

So far so good, but this doesn’t really have any business logic (beyond keeping track of which states you do business in). However, if you look through your source code, you might find business logic like that shown in this listing, with repeated switch statements which often may represent a design problem.

^{switch (state)}

^{case "AL":}

^{// Sales Tax for Alabama}

^break;

^{case "FL":}

^{// Sales Tax for Florida}

^break;

^{case "GA":}

^{// Sales Tax for Georgia}

^break;

^{case "KY":}

^{// Sales Tax for Kentucky}

^break;

^{case "NC":}

^{// Sales Tax for North Carolina}

^break;

A common best practice for eliminating switch statements like this is to replace them with an object hierarchy and let polymorphism work through the differences. Here you might have an object for each state and expose a property for the sales tax. Combine this with a factory to return the correct object and simply reference the SalesTax property. This object hierarchy is represented Figure 4

blog_pathak_AI_cover

Figure 4. Sample object hierarchy to replace the case statement

Another option is to incorporate this logic into the lookup table. You can easily extend the data model to include sales tax (Figure 5):

blog_pathak_AI_cover

Figure 5. Extending the lookup table to include limited business logic

Now you can replace the business logic in the switch statement by simply retrieving the SalesTax value from the corresponding record in the lookup table.

About the Author

Nick Harrison is a software architect and .NET advocate in Columbia, SC. Nick has over 20 years’ experience in software development, starting with Unix system programming and then progressing to the .NET platform.

Want more? This article is excerpted from Code Generation with Roslyn. Get your copy today and learn about other types of logic tables, and how wholesale code generation can speed up time to market and reduce turnaround time.