Skip to content

Data Modeling and Schema Design

Easy

Normalization: What is database normalization? Briefly explain the purpose of 1NF, 2NF, and 3NF.

  • Database normalization is the process of organizing data in a database to reduce redundancy (duplicate data) and improve data integrity (accuracy and consistency)

  • First Normal Form:

    • Purpose: Eliminate repeating groups and ensure atomicity.
    • Instead of storing multiple phone numbers in one column, create separate rows for each number.
  • Second Normal Form:

    • Purpose: Remove partial dependency.
    • A partial dependency happens when a non-key attribute depends on part of a composite primary key (not the whole key).
    • All non-key attributes must depend on the whole primary key, not just part of it.
    • If a table’s primary key is (StudentID, CourseID), then attributes like StudentName (which depends only on StudentID) should be moved to a separate table.
  • Third Normal Form:

    • Purpose: Remove transitive dependency.
    • A transitive dependency happens when a non-key attribute depends on another non-key attribute, instead of depending directly on the primary key.
    • Non-key attributes should depend only on the primary key, not on other non-key attributes.
    • If Students table has columns (StudentID, DepartmentID, DepartmentName), then DepartmentName depends on DepartmentID, not directly on StudentID. So, split Department details into a separate table.

Surrogate vs. Natural Key: What is the difference between a surrogate key and a natural key?

  • Natural Key

    • A natural key is a primary key that comes directly from the business data itself.
    • It’s an attribute (or set of attributes) that naturally identifies a row.
    • Examples:
      • Social Security Number (SSN) for a person.
      • ISBN for a book.
      • Email address for a user.
  • Surrogate Key

    • A surrogate key is an artificial key introduced by the system (usually an auto-increment integer or UUID).
    • It has no business meaning and only exists to uniquely identify a row.
    • Examples:
      • customer_id = 101 (auto-increment integer).
      • order_id = UUID.