I just want to remind my basic knowledge about database design. I live and work in custom solution environment. In order to deliver a good solution to my client, I think it is good to relearn again what I've been studied before. Now, to make sure everything is right, I summarize an article taken from Microsoft.
Database Design Basics
A properly designed database provides you with access to up-to-date, accurate information. Because a correct design is essential to achieving your goals in working with a database, investing the time required to learn the principles of good design makes sense. In the end, you are much more likely to end up with a database that meets your needs and can easily accommodate change.
In a simple database, you might have only one table. For most databases you will need more than one. Each row is also called a record, and each column, is also called a field.
- A record is a meaningful and consistent way to combine information about something.
- A field is a single item of information — an item type that appears in every record.
#What is good database design?
Certain principles guide the database design process.
- The first principle is that duplicate information (also called redundant data) is bad, because it wastes space and increases the likelihood of errors and inconsistencies.
- The second principle is that the correctness and completeness of information is important. If your database contains incorrect information, any reports that pull information from the database will also contain incorrect information. As a result, any decisions you make that are based on those reports will then be misinformed.
A good database design is, therefore, one that:
- Divides your information into subject-based tables to reduce redundant data.
- Provides Access with the information it requires to join the information in the tables together as needed.
- Helps support and ensure the accuracy and integrity of your information.
- Accommodates your data processing and reporting needs.
#The design process
The design process consists of the following steps:
Determining the purpose of your database
It is a good idea to write down the purpose of the database on paper — its purpose, how you expect to use it, and who will use it. The idea is to have a well developed mission statement that can be referred to throughout the design process. Having such a statement helps you focus on your goals when you make decisions.
Finding and organizing the required information
To find and organize the information required, start with your existing information and forms. If you don't have any existing forms, imagine instead that you have to design a form to record the information. What information would you put on the form? What fill-in boxes would you create? Identify and list each of these items.
As you prepare this list, don’t worry about getting it perfect at first. Instead, list each item that comes to mind. If someone else will be using the database, ask for their ideas, too. You can fine-tune the list later.
Next, consider the types of reports or mailings you might want to produce from the database. Design the report in your mind, and imagine what it would look like. What information would you place on the report? List each item. Do the same for the form letter and for any other report you anticipate creating.
Giving thought to the reports and mailings you might want to create helps you identify items you will need in your database.A key point to remember is that you should break each piece of information into its smallest useful parts. In general, if you want to sort, search, calculate, or report based on an item of information, you should put that item in its own field.
Dividing the information into tables
To divide the information into tables, choose the major entities, or subjects. When you design your database, always try to record each fact just once. If you find yourself repeating the same information in more than one place, place that information in a separate table. Once you have chosen the subject that is represented by a table, columns in that table should store facts only about the subject.
Turning information items into columns
To determine the columns in a table, decide what information you need to track about the subject recorded in the table. Once you have determined the initial set of columns for each table, you can further refine the columns. If you want to perform a search, filter or sort operation by a column, you need the information stored in a separate column.
The following list shows a few tips for determining your columns.
- Don’t include calculated data. In most cases, you should not store the result of calculations in tables. Instead, you can have the Applications perform the calculations when you want to see the result.
- Store information in its smallest logical parts. You may be tempted to have a single field for full names, or for product names along with product descriptions. If you combine more than one kind of information in a field, it is difficult to retrieve individual facts later. Try to break down information into logical parts; for example, create separate fields for first and last name, or for product name, category, and description.
Specifying primary keys
Each table should include a column or set of columns that uniquely identifies each row stored in the table. This is often a unique identification number, such as an employee ID number or a serial number. In database terminology, this information is called the primary key of the table. A primary key must always have a value. If a column's value can become unassigned or unknown (a missing value) at some point, it can't be used as a component in a primary key.
You should always choose a primary key whose value will not change. In a database that uses more than one table, a table’s primary key can be used as a reference in other tables. If the primary key changes, the change must also be applied everywhere the key is referenced. Using a primary key that will not change reduces the chance that the primary key might become out of sync with other tables that reference it.
Often, an arbitrary unique number is used as the primary key. For example, you might assign each order a unique order number. The order number's only purpose is to identify an order. Once assigned, it never changes.
If you don’t have in mind a column or set of columns that might make a good primary key, consider using a column that has the AutoNumber data type. When you use the AutoNumber data type, System automatically assigns a value for you. Such an identifier is factless; it contains no factual information describing the row that it represents. Factless identifiers are ideal for use as a primary key because they do not change.
Now that you have divided your information into tables, you need a way to bring the information together again in meaningful ways. In a relational database, you divide your information into separate, subject-based tables. You then use table relationships to bring the information together as needed.
- Creating a one-to-many relationship
To represent a one-to-many relationship in your database design, take the primary key on the "one" side of the relationship and add it as an additional column or columns to the table on the "many" side of the relationship.
- Creating a many-to-many relationship
Note that to detect many-to-many relationships between your tables, it is important that you consider both sides of the relationship.
The answer of many-to-many relationship is to create a third table, often called a junction table, that breaks down the many-to-many relationship into two one-to-many relationships. You insert the primary key from each of the two tables into the third table. As a result, the third table records each occurrence or instance of the relationship.
- Creating a one-to-one relationship
When you detect the need for a one-to-one relationship in your database, consider whether you can put the information from the two tables together in one table. If you don’t want to do that for some reason, perhaps because it would result in a lot of empty space, the following list shows how you would represent the relationship in your design:
- If the two tables have the same subject, you can probably set up the relationship by using the same primary key in both tables.
- If the two tables have different subjects with different primary keys, choose one of the tables (either one) and insert its primary key in the other table as a foreign key.
Determining the relationships between tables helps you ensure that you have the right tables and columns. When a one-to-one or one-to-many relationship exists, the tables involved need to share a common column or columns. When a many-to-many relationship exists, a third table is needed to represent the relationship.
#Refining the design
Once you have the tables, fields, and relationships you need, you should create and populate your tables with sample data and try working with the information: creating queries, adding new records, and so on. Doing this helps highlight potential problems — for example, you might need to add a column that you forgot to insert during your design phase, or you may have a table that you should split into two tables to remove duplication.
See if you can use the database to get the answers you want. Create rough drafts of your forms and reports and see if they show the data you expect. Look for unnecessary duplication of data and, when you find any, alter your design to eliminate it.
As you try out your initial database, you will probably discover room for improvement. Here are a few things to check for:
- Did you forget any columns? If so, does the information belong in the existing tables? If it is information about something else, you may need to create another table. Create a column for every information item you need to track. If the information can’t be calculated from other columns, it is likely that you will need a new column for it.
- Are any columns unnecessary because they can be calculated from existing fields? If an information item can be calculated from other existing columns — a discounted price calculated from the retail price, for example — it is usually better to do just that, and avoid creating new column.
- Are you repeatedly entering duplicate information in one of your tables? If so, you probably need to divide the table into two tables that have a one-to-many relationship.
- Do you have tables with many fields, a limited number of records, and many empty fields in individual records? If so, think about redesigning the table so it has fewer fields and more records.
- Has each information item been broken into its smallest useful parts? If you need to report, sort, search, or calculate on an item of information, put that item in its own column.
- Does each column contain a fact about the table's subject? If a column does not contain information about the table's subject, it belongs in a different table.
- Are all relationships between tables represented, either by common fields or by a third table? One-to-one and one-to- many relationships require common columns. Many-to-many relationships require a third table.
#Applying the normalization rules
You can apply the data normalization rules (sometimes just called normalization rules) as the next step in your design. You use these rules to see if your tables are structured correctly. The process of applying the rules to your database design is called normalizing the database, or just normalization.
Normalization is most useful after you have represented all of the information items and have arrived at a preliminary design. The idea is to help you ensure that you have divided your information items into the appropriate tables. What normalization cannot do is ensure that you have all the correct data items to begin with.
You apply the rules in succession, at each step ensuring that your design arrives at one of what is known as the "normal forms." Five normal forms are widely accepted — the first normal form through the fifth normal form. This article expands on the first three, because they are all that is required for the majority of database designs.
1) First normal form
First normal form states that at every row and column intersection in the table there, exists a single value, and never a list of values. For example, you cannot have a field named Price in which you place more than one Price. If you think of each intersection of rows and columns as a cell, each cell can hold only one value.
2) Second normal form
Second normal form requires that each non-key column be fully dependent on the entire primary key, not on just part of the key. This rule applies when you have a primary key that consists of more than one column. For example, suppose you have a table containing the following columns, where Order ID and Product ID form the primary key:
Order ID (primary key)
Product ID (primary key)
Product Name
This design violates second normal form, because Product Name is dependent on Product ID, but not on Order ID, so it is not dependent on the entire primary key. You must remove Product Name from the table. It belongs in a different table (Products).
3) Third normal form
Third normal form requires that not only every non-key column be dependent on the entire primary key, but that non-key columns be independent of each other.
Another way of saying this is that each non-key column must be dependent on the primary key and nothing but the primary key. For example, suppose you have a table containing the following columns:
ProductID (primary key)
Name
SRP
Discount
Assume that Discount depends on the suggested retail price (SRP). This table violates third normal form because a non-key column, Discount, depends on another non-key column, SRP. Column independence means that you should be able to change any non-key column without affecting any other column. If you change a value in the SRP field, the Discount would change accordingly, thus violating that rule. In this case Discount should be moved to another table that is keyed on SRP.
...
Please note that the article in this blog contain some modification.
...