Sunday, April 19, 2009

The Relational Model


If you look at any system that was born on and for the
internet, like Ruby on Rails, or the PHP language, you find
an immense wealth of resources on the internet itself, in
endless product web sites, blogs, and forums. But when
you look for the same comprehensive information on products
or ideas that matured before the web you find it is not there.
Relational databases stand out as a product family that matured
before the internet, and so their representation in cyberspace
is very different from the newer stuff.



The Math Stuff



You may have heard relational theorists argue that the
strength of relational databases comes from their solid
mathematical foundations. Perhaps you have wondered,
what does that mean? And why is that good?



To understand this, we have to begin with
>Edsger W. Dijkstra, a pioneer in the area of computer
science with many accomplishments to his name. Dijkstra
believed that the best way to develop a system or program
was to begin with a mathematical description of the system,
and then refine that system into a working program. When
the program completely implemented the math, you were
finished.



There is a really huge advantage to this approach. If you
start out with a mathematical theory of some sort, which
presumably has well known behaviors, then the working program
will have all of those behaviors and, put simply, everybody
will know what to expect of it.



This approach also reduces time wasted on creative efforts
to work out how the program should behave. All those
decisions collapse intot he simple drive to make the program
mimic the math.



A Particular Bit of Math



It so happens that there is a particular body of math
known as Relational Theory, which it seemd to
>E. F. Codd would be a very nice fit for storing
business information. In his landmark 1970 paper
>A Relational Model of Data for Large Shared Data Banks
(pdf)
he sets out to show how these mathematical
things called "relations" have behaviors that would be
ideal for storing business models.



If we take the Dijkstra philosophy seriously, which is to
build systems based on well-known mathematical theories,
and we take Codd's claim that "Relations" match well to
business record-keeping needs, the obvious conclusion is
that we should build some kind of "Relational" datastore,
and so we get the Relational Database systems of today.



So there in a nutshell is why relational theorists are
so certain of the virtues of the relational model, it's
behaviors are well-known, and if you can build something
that matches them, you will have a very predictable
system.



They are Still Talking About It



If you want to know more about the actual mathematics,
check out the "http://groups.google.com/group/comp.databases.theory/topics"
>comp.databases.theory
Usenet group, or check out
Wikipedia's articles on "http://en.wikipedia.org/wiki/Relational_algebra"
>Relational Algebra
and "http://en.wikipedia.org/wiki/Relational_calculus"
>Relational Calculus
.



A Practical Downside



The downside to all of this comes whenever the mathematical
model describes behaviors that are contrary to human goals
or simply irrelevant to them. Examples are not hard to
find.



When the web exploded in popularity, many programmers found
that their greatest data storage needs centered on documents
like web pages rather than collections of atomic values
like a customer's discount code or credit terms. They found
that relational databases were just not that good at storing
documents, which only stands to reason because they were never
intended to. In theory the model could be stretched,
(if the programmer stretched as well), but the programmers
could feel in their bones that the fit was not right, and they
began searching for something new.



Another example is that of calculated values. If you have
shopping cart, you probably have some field "TOTAL" somewhere
that stores the final amount due for the customer. It so
happens that such a thing violates relational theory, and there
are some very bright theorists who will refuse all requests
for assistance in getting that value to work, because you
have violated their theory. This is probably the most shameful
behavior that relational theorists exhibit - a complete
refusal to consider extending the model to better reflect
real world needs.



The Irony: There are No Relational Databases



The irony of it all is that when programmers set out to build
relational systems, they ran into quite a few practical
downsides and a sort of consensus was reached to break the
model and create the SQL-based databases we have today.
In a truly relational system a table would have quite
a few more rules on it than we have in our SQL/TABLE based
systems of today. But these rules must have seemed
impractical or too difficult to implement, and they were
scratched.



There is at least one product out there that claims to
be truly relational, that is "http://en.wikipedia.org/wiki/Dataphor">Dataphor.

The Weird Optional Part



Probably the grandest irony in the so-called relational
database management systems is that any programmer can
completely break the relational model by making bad
table designs. If your tables are not normalized, you
lose much of the benefits of the relational model,
and you are completely free to make lots of
non-normalized and de-normalized tables.



Conclusion



I have to admit I have always found the strength of
relational databases to be their simplicy and power,
and not so much their foundations (even if shaky) in
mathematical theory. A modern database is very good
at storing data in tabular form, and if you know how
to design the tables, you've got a great foundation for
a solid application. Going further, I've always found
relational theorists to be unhelpful in the extreme in
the edge cases where overall application needs are not
fully met by the underlying mathematical model. The
good news is that the products themselves have all of
the power we need, so I left the relational theorists
to their debates years ago.

No comments:

Post a Comment