Thursday, December 16, 2010

Historical Perspective of ORM and Alternatives

A couple of years ago I broke my basic rule of sticking
to practical how-to and general programming philosophy
and wrote Why I Do Not Use ORM. It sure got a lot of hits,
and is read every day
by people searching such things as "orm bad" or "why use orm".
But I have never been
satisfied with that post, and so I decided to take
another stab from another angle. There are legitimate
problems that led to ORM, and those problems need to
be looked at even if we cannot quite agree on what they
are or if ORM is the answer.



UPDATE: In response to comments below and on reddit.com,
I have a new post that gives a detailed
analysis of an algorithm implemented as a sproc, in app
code with embedded SQL, and in ORM.





Here then, is one man's short history of commercial
database application programming, from long before
the ORM system, right up to the present.



This blog has two tables of contents, the
Topical Table of Contents and the list
of
Database Skills.



The Way Back Machine



When I began my career the world was a different place.
No Web, no Java, and Object Orientation had not yet
entered the mainstream. My first
application was written on a timeshare system (a microVAX)
and writing LAN applications made me a good living for
a while before I graduated to client/server.



In those days there were three things a programmer
(We were not "software engineers" yet, just
programmers) had to know. Every programmer I knew
wanted to master all of these skills. They were:



  • How to design a database schema for correctness
    and efficiency.
  • How to code an application that could process
    data from the database, correctly and efficiently.
  • How to make a good UI, which came down to
    hotkeys and stuffing the screen with as much info
    as possible.


In this essay we are going to look at those first two.



My own experience may be somewhat peculiar in that I
have never worked on a team where the programmers
were separated from the database. (OK, one exception, in
my current assignment there is an iron curtain between
the two, but happily it is not my problem from where
I sit).
Coders made tables, and "tablers" wrote
code. So this focus on being a good developer by
developing both skills may be rare, enjoyed by those who
have the same ecumenical background that I enjoyed.



Some Changes That Did Not Matter



Things changed rapidly, but most of those changes
did not really affect application development.



When Windows 95 came out, being "almost as good as
a Mac", we recoded our DOS apps into Windows apps
without too much trouble and life went on as before.



Laser printers replaced dot-matrix for most office use,
CPUs kept getting faster (and Windows kept getting
slower), each year there were more colors on the
screen, disks got bigger and RAM got cheaper.



Only the internet and the new stateless programming
required any real adjustment, but it was easy for a
database guy because good practice had always been to
keep your transactions as short as possible. The stateless
thing just kind of tuned that to a sharp edge.



Finally, with the internet, the RDBMS lost its
place as sole king of the datastore realm, but those new
datastores will have to wait for another day, lest we
get bogged down.



Enter Object Orientation



Arguably nothing changed programming more than
Object Orientation. Certainly not Windows 95, faster
graphics or any of those other Moore's Law consequences.
I would go so far as to say that even
the explosion of the web just produced more programming,
and of different kinds of apps, and even that did not
come close to the impact of Object Orientation.
Disagree if you like, but as it came in, it was
new, it was strange, it was beautiful, and we were
in love.



Now here is something you may not believe. The biggest
question for those of us already successfully developing
large applications was: What is it good for? What does
it give me that I do not already have? Sure, it's
beautiful, but what does it do?



User interfaces were for me the easiest first place to
see the benefits. When the widgets became classes and objects,
and we employed encapsulation, inheritance and
composition, the world
changed and I don't know anybody who ever looked back.



OOP, Data, and Data Structures



But in the matter of processing data, things were not
so clear cut. The biggest reason may have been that
all languages back then had specialized data structures
that were highly tuned to handling relational data.
These worked so well that nobody at first envisioned
anything like ActiveRecord because
we just did not need it.



With these structures you could write applications
that ran processes involving dozens of tables, lasting
hours, and never wonder, "Gosh, how do I map this data
to my language of choice?" You chose the language you
were using precisely because it knew how to handle
data!


I would like to throw in just one example to show how
OOP was not relevant to getting work done back then.
I was once asked to optimize something called "ERP
Allocation" that ran once/day, but was taking 26 hours
at the largest customer site, obviously a big problem.
It turned out there was a call to the database inside of
a tightly nested loop, and when I moved the query outside
of the loop the results were dramatic. The programmers
got the idea and they took over from there. The main
point of course is that it was all about how to
efficiently use a database. The language was OOP, and
the code was in a class, but that had nothing to do
with the problem or the solution. Going further,
coding a process so data intensive as this one
using ActiveRecord
was prima facie absurd to anybody who knew about data
and code.
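
To make the shape of that fix concrete, here is a minimal sketch in Python
with SQLite. It is not the original ERP code, which is not shown here; the
table names, columns and function names are all hypothetical. The first
function asks the database a question on every pass through the loop, the
second asks once, before the loop starts.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical tables and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER);
        CREATE TABLE stock  (sku TEXT PRIMARY KEY, on_hand INTEGER);
        INSERT INTO orders (sku, qty) VALUES ('A', 5), ('B', 2), ('A', 1), ('C', 4);
        INSERT INTO stock VALUES ('A', 10), ('B', 0), ('C', 3);
    """)
    return conn


def on_hand_query_inside_loop(conn):
    """Slow shape: one extra trip to the database for every order row."""
    result = {}
    orders = conn.execute("SELECT order_id, sku FROM orders").fetchall()
    for order_id, sku in orders:
        row = conn.execute(
            "SELECT on_hand FROM stock WHERE sku = ?", (sku,)
        ).fetchone()
        result[order_id] = row[0]
    return result


def on_hand_query_outside_loop(conn):
    """Same answer: the stock query runs once, before the loop."""
    stock = dict(conn.execute("SELECT sku, on_hand FROM stock"))
    result = {}
    orders = conn.execute("SELECT order_id, sku FROM orders").fetchall()
    for order_id, sku in orders:
        result[order_id] = stock[sku]
    return result
```

Neither function needs a class, and wrapping them in one would not change
which of the two is slow. The difference is entirely in how many times the
database is asked a question.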



Java and the Languages of The Internet



But the web had another impact that was far
more important than just switching to stateless
programming. This was the introduction
of an entirely new family of languages that took
over the application space, listed here in no
particular order: Perl, PHP,
Python, Ruby, and the king of them all: Java.



All of these languages have one thing in common
that positively jumps out at a veteran: a complete
lack of data structures specialized for handling
relational data.

So as these languages exploded in popularity
with their dismal offerings in data handling,
the need to provide something better in that
area became rapidly clear.



Java has a special role to play because it was
pure OOP from the ground up. Even the whitespace
is an object! The impact of Java is very important
here because Object Orientation was now the One True
Faith, and languages with a more
flexible approach were gradually demoted
to mere 'scripting' languages. (Of course, proponents
will quickly point out that 1/12 of the
world's population is now using a single application
written in one of those 'scripting' languages.)



So the explosion of languages without decent
data handling abilities, coupled with a rise in
OOP-uber-alles thinking led us quite naturally to:



The First Premise of ORM: The Design Mismatch



The first premise of ORM is that there is a design
mismatch between OOP and Relational, which must be resolved
before any meaningful work can be done.



This view is easy to sympathize with, even if you
disagree, when you consider the points raised in the
above sections, that the languages in play lack any real
specialized data structures, and that a certain
exclusive truthiness to OOP has arisen that is blind
to entire classes of solutions.



So we must grant the ORM crowd their first
premise, in modified form. It is not that there
is a design mismatch, it is that there is something
missing, something that was in older systems that
is just not there in the newer languages. Granting
that this missing feature is an actual mismatch
requires a belief in the Exclusive Truth of OOP,
which I do not grant. OOP is like the computer
itself, of which Commander Spock said, "Computers
make excellent servants, but I have no wish to be
servant to a computer."



But anyway, getting back to the story, the race
was on to replace what had been lost, and to do it
in an OOPy way.



The Second Premise of ORM: Persistence



Fast forward and we soon have an entire family
of tools known as Object-Relational-Mappers,
or ORM. With them came an old idea: persistence.



The idea has always been around that databases
exist to persist the work of the programmer.
I thought that myself when I was, oh, about 25 or
so. I learned fast that my view of reality was,
*cough*, lacking,
and that in fact there are two things
that are truly real for a developer:



  • The users, who create the paycheck, and
  • The data, which those users seemed to think
    was supposed to be correct 100% of the time.


From this perspective, the application code suddenly
becomes a go-between, the necessary appliance that
gets data from the db to the user (who creates the
paycheck), and takes instructions back from the user
and puts them in the database (correctly, thank you,
and don't make the user wait). No matter how
beautiful the code was, the user would only ever see
the screen (or page nowadays) and you only heard about
it if it was wrong. Nobody cares about my code, nobody
cares about yours.



However, in the ORM world the idea of a database as the
persistence layer now sits on a throne reserved for
axiomatic truth. Those who disagree with me on this
may say that I have the mistaken perspective of an outsider,
to which I could say only that it is this very idea that
keeps me an outsider.



But we should not paint the world with a broad brush.
Chris Wong writes an excellent blog where he occasionally
details how to respect the database while using Hibernate, in
this post
and this post.



An Alternative World View



There are plenty of alternatives to ORM, but I would
contend that they begin with a different world view.
Good business recognizes the infinite value of the
users as the generators of the Almighty Paycheck, and
the database as the permanent record of a job well
done.



This worldview forces us into a humble position with
respect to our own application code, which is that it
is little more than a waiter, carrying orders to the
kitchen and food back to the patrons. When we see it
this way, the goal becomes to write code that can
efficiently get data back and forth. A small handful
of library routines can trap SQL injection, validate
types, and ship data off to the database. Another
set can generate HTML, or, can simply pass JSON
data up to those nifty browser client libraries
like ExtJS (now
"Sencha" for some reason).
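
As a rough illustration of how small such routines can be, here is a sketch
in Python with SQLite. Everything named in it (the COLUMNS description, the
customers table, insert_row and rows_as_json) is a hypothetical stand-in,
not code from any particular library. Table and column names are checked
against a known list, values are type-checked, and the values themselves
travel to the database as bound parameters so they are never spliced into
the SQL text.

```python
import json
import sqlite3

# Hypothetical description of the columns each table accepts.
# A real application would load this from its own schema definition.
COLUMNS = {"customers": {"name": str, "credit_limit": int}}


def insert_row(conn, table, values):
    """Validate names and types, then insert using bound parameters."""
    if table not in COLUMNS:
        raise ValueError(f"unknown table: {table}")
    spec = COLUMNS[table]
    for col, val in values.items():
        if col not in spec:
            raise ValueError(f"unknown column: {col}")
        if not isinstance(val, spec[col]):
            raise TypeError(f"{col} must be {spec[col].__name__}")
    cols = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    # Identifiers come only from COLUMNS; user data goes in as parameters.
    conn.execute(
        f"INSERT INTO {table} ({cols}) VALUES ({marks})",
        tuple(values.values()),
    )


def rows_as_json(conn, table):
    """Return every row of a known table as a JSON array of objects."""
    if table not in COLUMNS:
        raise ValueError(f"unknown table: {table}")
    cur = conn.execute(f"SELECT * FROM {table}")
    names = [d[0] for d in cur.description]
    return json.dumps([dict(zip(names, row)) for row in cur])
```

A hostile string passed as a name is stored as plain data, and a
credit_limit of "lots" is rejected before it ever reaches the database.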



This covers a huge amount of what an application
does, if you do not have much in the way of
business logic.



But How Do You Handle Business Logic?



I have an entire essay on this about half-written,
but in short, it comes down to understanding what
business logic really is. Update: This post is now available at
http://database-programmer.blogspot.com/2011/01/business-logic-from-working-definition.html



The tables themselves are the bottom layer of
business logic. The table design itself implements
the foundation for all of the business rules.
This is why it is so important to get it right.
The tables are organized using normalization to
have a place for everything and everything in its
place, and after that the application code mostly
writes itself.



The application code then falls into two areas:
value-add and no value-add. There is no value-add
when the application simply ships data off to the
user or executes a user request to update the
database. Those kinds of things should be handled
with the lightest possible library that gets the
job done.



But the value-add stuff is different, where a
user's request requires lookups, possibly computations
and so forth. The problem here is that a naive
analysis of requirements (particularly the
transliteration error; scroll down to "The
Customer Does Not Design Tables")
will tend to generate many cases of perceived need for
value-add where a simpler design can reduce these
cases to no value-add. But even when the database has
been simplified to pristine perfection, there are jobs
that require loops, multiple passes and so forth,
which must be made idempotent and robust, which
will always require some extra coding. But if you know
what you are doing, these always turn out to be the
ERP Allocation example given above: they are a lot more
about the data than the classes.



Another huge factor is where you come down on the
normalization debate, particularly on the inclusion of
derived values. If you keep derived values out of the database,
which is technically correct from a limited perspective,
then suddenly the value-add code is much more important
because without it your data is incomplete. If
you elect to put derived values into your database then
value-add code is only required when writing to the
database, so huge abstractions meant to handle any
read/write situation are unnecessary. (And of course,
it is extremely important to
keep denormalized values correct.)
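
One possible way to store a derived value and keep it correct on every
write is a pair of database triggers. The sketch below uses Python with
SQLite; the order_lines table, its columns and the trigger names are
assumptions made up for illustration, and a trigger is only one of several
mechanisms that could do this job.

```python
import sqlite3


def make_order_db():
    """Build a demo table whose 'extended' column is derived on write."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE order_lines (
            line_id  INTEGER PRIMARY KEY,
            qty      INTEGER NOT NULL,
            price    INTEGER NOT NULL,  -- in cents
            extended INTEGER            -- derived: qty * price
        );

        -- Fill in the derived value when a row is first written.
        CREATE TRIGGER order_lines_ins AFTER INSERT ON order_lines
        BEGIN
            UPDATE order_lines
               SET extended = NEW.qty * NEW.price
             WHERE line_id = NEW.line_id;
        END;

        -- Recompute it whenever one of its inputs changes.
        CREATE TRIGGER order_lines_upd AFTER UPDATE OF qty, price ON order_lines
        BEGIN
            UPDATE order_lines
               SET extended = NEW.qty * NEW.price
             WHERE line_id = NEW.line_id;
        END;
    """)
    return conn
```

Code that reads order_lines can simply select extended without knowing how
it was computed. Note that this sketch does not stop a caller from
overwriting extended directly; guarding against that would take more work.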



And the Rest of It



This essay hardly covers the entirety of
making code and data work together. You still have
to synchronize schema changes to code, and I still
think a data dictionary is the best D-R-Y way to
do that.
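
A data dictionary can be sketched in a few lines, assuming it is nothing
more than a plain description of tables and columns that both the schema
builder and the application code read from. The dictionary contents and
the function names below are hypothetical, and a real one would carry much
more (keys, constraints, changes to existing tables).

```python
import sqlite3

# Hypothetical data dictionary: the single place a column is described.
DATA_DICTIONARY = {
    "customers": {
        "customer_id": "INTEGER PRIMARY KEY",
        "name": "TEXT NOT NULL",
        "credit_limit": "INTEGER",
    },
}


def build_ddl(dictionary):
    """Generate one CREATE TABLE statement per table in the dictionary."""
    statements = []
    for table, columns in dictionary.items():
        body = ", ".join(
            f"{name} {definition}" for name, definition in columns.items()
        )
        statements.append(f"CREATE TABLE {table} ({body})")
    return statements


def column_names(dictionary, table):
    """Application code asks the same dictionary which columns exist."""
    return list(dictionary[table])
```

Because the schema and the code both come from DATA_DICTIONARY, adding a
column there changes both at once, which is the D-R-Y point.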



I hope this essay shows something of why many programmers
are so down on ORM, but much more importantly that there
are coherent philosophies out there that begin with a
different worldview and deliver what we were all doing
before ORM and what we will all still be doing after
ORM: delivering data back and forth between user and
database.
