Ten Common Database Design Mistakes

Tulisan ini copas dari : http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/

sebagai materi belajar untuk keperluan sendiri. terimakasih untuk Louis Davidson
sebagai penulis
artikel ini.

No list of mistakes is ever going to be exhaustive. People (myself included) do a lot of really stupid things, at times, in the name of "getting it done." This list simply reflects the database design mistakes that are currently on my mind, or in some cases, constantly on my mind. I have done this topic two times before. If you're interested in hearing the podcast version, visit Greg Low's super-excellent SQL Down Under. I also presented a boiled down, ten-minute version at PASS for the Simple-Talk booth. Originally there were ten, then six, and today back to ten. And these aren't exactly the same ten that I started with; these are ten that stand out to me as of today.

Poor design/planning
Ignoring normalization
Poor naming standards
Lack of documentation
One table to hold all domain values
Using identity/guid columns as y
our only key
Not using SQL facilities to protect data integrity
Not using stored procedures to access data
Trying to build generic objects
Lack of testing
Poor design/planning

"If you don't know where you are going, any road will take you there" – George Harrison

Prophetic words for all parts of life and a description of the type of issues that plague many projects these days.
Let me ask you: would you hire a contractor to build a house and then demand that they start pouring a foundation the very next day? Even worse, would you demand that it be done without blueprints or house plans? Hopefully, you answered "no" to both of these. A design is needed make sure that the house you want gets built, and that the land you are building it on will not sink into some underground cavern. If you answered yes, I am not sure if anything I can say will help you.
Like a house, a good database is built with forethought, and with proper care and attention given to the needs of thedata that will inhabit it; it cannot be tossed together in some sort of reverse implosion.

Since the database is the cornerstone of pretty much every business project, if you don't take the time to map out the needs of the project and how the database is going to meet them, then the chances are that the whole project will veer of

f course and lose direction. Furthermore, if you don't take the time at the start to get the database design right, then you'll find that any substantial changesin the database structures that you need to make further down the line could have a huge impact on the whole project, and greatly increase the likelihood of the project timeline slipping.

Far too often, a proper planning phase is ignore

Before I start with the list, let me be honest for a minute. I used to have a preacher who made sure to tell us before some sermons that he was preaching to himself as much as he was to the congregation. When I speak, or when I write an article, I have to listen to that tiny little voice in my headthat helps filter out my own bad habits, to make sure that I am teaching only the best practices. Hopefully, after reading this article, the little voice in your head will talk to you when you start to stray from what is right in terms of database design practices.

So, the list:

d in favor of just "getting it done". The project heads off in a certain direction and when problems inevitably arise – due to the lack of pro

per designing and planning – there is "no time" to go back and fix them properly, using proper techniques. That's when the "hacking" starts, with the veiled promise to go back and fix things later, something that happens very rarely indeed.

Admittedly it is impossible to predict every need that your design will have to fulfill and every issue that is likely to arise, but it is important to mitigate against potential problems as much as possible, by careful planning.

Ignoring Normalization

Normalization defines a set of methods to break down tables to their constituent parts until each table represents one and only one "thing", and its columns serve to fully describe only the one "thing" that the table represents.

The concept of normalization has been around for 30 years and is the basis on which SQL and relational databases are imple

mented. In other words, SQL was created to work with normalized data structures. Normalization is not just some plot by database programmers to annoy application programmers (that is merely a satisfying side effect!)

SQL is very additive in nature in that, if you have bits and pieces of data, it is easy to build up a set of values or results. In the FROM clause, you take a set of data (a table) and add (JOIN) it to another table. You can add as many sets of data together as you like, to produce the final set you need.

This additive nature is extremely important, not only for ease of development, but also for performance. Indexes are most effective when they can work with the entire key value. Whenever you have to use SUBSTRING, CHARINDEX, LIKE, and so on, to parse out a value that is combined with other values in a single column (for example, to split the last name of a person out of a full name column) the SQL paradigm starts to break down and data becomes become less and less searchable.

So normalizing your data is essential to good performance, and ease of development, but the question always comes up: "How normalized is normalized enough?" If you have read any books about normalization, then you will have heard many times that 3rd Normal Form is essential, but 4th and 5th Normal Forms are really useful and, once you get a handle on them, quite easy to follow and wellworth the time required to implement them.

In reality, however, it is quite common that not even the first Normal Form is implemented correctly.

Whenever I see a table with repeating column names appended with numbers, I cringe in horror. And Icringe in horror quite often.

Consider the following example Customer table:

Are there always 12 payments? Is the order of payments significant? Does a NULL value for a payment mean UNKNOWN (not filled in yet), or a missed payment? And when was the payment made?!?

A payment does not describe aCustomer and should not be stored in the Customer table. Details of payments should be stored in a Payment table, in which you could also record extra information about the payment, like when the payment was made, and what the payment was for:

n this second design, each column stores a single unit of information about a single "thing" (a payment), and each row represents a specific instance of a payment.

This second design is going to require a bit more code early in the process but, it is far more likely that you will be able to figure ou

t what is going on in the system without having to hunt down the original programmer and kick their butt…sorry… figure out what they were thinking

Poor naming standards

"That which we call a rose, by any other name would smell as sweet"

This quote from Romeo and Juliet by William Shakespeare sounds nice, and it is tru

e from one angle. If everyone agreed that, from now on, a rose was going to be called dung, then we could get over it and it would smell just as sweet. The problem is that if, when building a database for a florist, the designer calls it dung and the client calls it a rose, then you are going to have some meetings that sound far more like an Abbott and Costello routine than a serious conversation about storing information about horticulture products.

Names, while a personal choice, are the first and most important line of documentation for your application. I will not get into all of the details of how best to name things here– it is a large and messy topic. What I want to stress in this article is the need for consistency. The names you choose are not just to enable you to identify the purpose of an object, but to allow all future programmers, users, and so on to quickly and easily understand how a component part of your database was intended to be used, and what data it stores.

No future user of your design should need to wade through a 500 page document to determine the meaning of some wacky name.

Consider, for example, a column named, X304_DSCR. What the heck does that mean? You might decide, after some head scratching, that it means "X304 description". Possibly it does, but maybe DSCR means discriminator, or discretizator?

Unless you have established DSCR as a corporate standard abbreviation for description, then X304_DESCRIPTION is a much better name, and one leaves nothing to the imagination.

That just leaves you to figure out what the X304 part of the name means. On first inspection, to me, X304 sounds like more like it should be

data in a column rather than a column name. If I subsequently found that, in the organization, there was also an X305 and X306 then I would flag that as an issue with the database design. For maximum flexibility, data is stored in columns, not in column names.

Along these same lines, resist the temptation to include "metadata" in an object's name. A name such as tblCustomer or colVarcharAddress might seem useful from a development perspective, but to the end user it is just confusing. As a developer, you should rely on being able to determine that a table name is a table name by context in the code or tool, and present to the users clear, simple, descriptive names, such as Customer and Address.

A practice I strongly advise against is the use

of spaces and quoted identifiers in object names. You should avoid column names such as "Part Number" or, in Microsoft style, [Part Number], therefore requiring you users to include these spaces and identifiers in their code. It is annoying and simply unnecessary.

Acceptable alternatives would be part_number, partNumber or PartNumber. Again, consistency is key. If you choose PartNumber then that's fine – as long as the column containing invoice numbers is called InvoiceNumber, and not one of the other possible variations.

Lack of documentation

I hinted in the intro that, in some cases, I am writing for myself as much as you. This is the topic where that is most true. By carefully nam

ing your objects, columns, and so on, you can make it clear to anyone what it is that your database is modeling. However, this is only step one in the documentation battle. The unfortunate reality is, though, that "step one" is all too often the only step.

Not only will a well-designed data model adhere to a solid naming standard, it will also contain definitions on its tables, columns, relationships, and even default and check constraints, so that it is clear to everyone how they are intended to be used. In many cases, you may want to include sample values, where the need arose for the object, and anything else that you may want to know in a year or two when "future you" has to go back and make changes to the code.

NOTE:
Where this documentation is stored is largely a m

atter of corporate standards and/or convenience to the developer and end users. It could be stored in the database itself, using extended properties. Alternatively, it might be in maintained in the data modeling tools. It could even be in a separate data store, such as Excel or another relational database. My company maintains a metadata repository database, which we developed in order to present this data to end users in a searchable, linkable format. Format and usability is important, but the primary battle is to have the information available and up to date.

Your goal should be to provide enough information that when you turn the database over to a support programmer, they can figure out your minor bugs and fix them (yes, we all make bugs in our code!). I know there is an old joke that poorly documented code is a synonym for "job security." While there is a hint of truth to this, it is also a way to be hated by your coworkers and never get a raise. And no good p

rogrammer I know of wants to go back and rework their own code years later. It is best if the bugs in the code can be managed by a junior support programmer while you create the next new thing. Job security along with raises is achieved by being the go-to person for new challenges.

One table to hold all domain values

"One Ring to rule them all and in the darkness bind them"

This is all well and good for fantasy lore, but it's not so good when applied to database design, in the form of a "ruling" domain table. Relational databases are based on the fundamental idea that every object represents one and only one th

ing. There should never be any doubt as to what a piece of data refers to. By tracing through the relationships, from column name, to table name, to primary key, it should be easy to examine the relationships and know exactly what a piece of data means.

The big myth perpetrated by architects who don't really understand relational database architecture (me included early in my career) is that the more tables there are, the more complex the design will be. So, conversely, shouldn't condensing multiple tables into a single "catch-all" table simplify the design? It does sound like a good idea, but at one time giving Pauly Shore the lead in a movie sounded like a good idea too.

For example, consider the following model snippet

where I needed domain values for:

Customer CreditStatus
Customer Type
Invoice Status
Invoice Line Item BackOrder Status
Invoice Line Item Ship Via Carrier

On the face of it that would be five domain tables…but why not just use one generic domain table, like this?

This may seem a very clean and natural way to design a table for all but the problem is that it is just not very natural to work with in SQL. Say we just want the domain values for the Customer table:

SELECT *
FROM Customer
JOIN GenericDomain as CustomerType
ON Customer.CustomerTypeId = CustomerType.GenericDomainId
and CustomerType.RelatedToTable = 'Customer'
and  CustomerType.RelatedToColumn = 'CustomerTypeId'
JOIN GenericDomain as CreditStatus
ON  Customer.CreditStatusId = CreditStatus.GenericDomainId
and CreditStatus.RelatedToTable = 'Customer'
and CreditStatus.RelatedToColumn = ' CreditStatusId'

As you can see, this is far from being a natural join. It comes down to the problem of mixing apples with oranges. At first glance, domain tables are just an abstract concept of a container that holds text. And from an implementation centric standpoint, this is quite true, but it is not the correct way to build a database. In a database, the process of normalization, as a means of breaking down and isolating data, takes every table to the point where one row represents one thing. And each domain of values is a distinctly different thing from all of the other domains (unless it is not, in which case the one table will suffice.). So what you do, in essence, is normalize the data on each usage, spreading the work out over time, rather than doing the task once and getting it over with.

So instead of the single table for all domains, you might model it as:

disini gambar

After Death Comes Life

Search This Blog