Thursday, January 27, 2011

SQL Rally abstracts submitted

I've submitted two abstracts for the SQLRally.

If you're interested in learning PowerShell by example or want to hear how I automated our login provisioning and compliance reporting, vote for my sessions.

For attendees, it's a win-win situation in the PowerShell track no matter who you vote for.

Another great bunch of abstracts in the DBA track.

Get out and vote.

Saturday, January 22, 2011

Maintaining One Code Base with Possibly Conflicting Custom Features

Today's essay deals with the tricky issue of custom features
for individual customers who are running instances of your
software.



The question comes by way of a regular reader who prefers to
remain anonymous, but asks this:




... I work on a large (to me, anyway) application that serves as a client database, ticket system, time-tracking, billing, asset-tracking system. We have some customers using their own instances of the software. Often, those customers want additional fields put in different places (e.g., a priority column on tickets). This results in having multiple branches to account for versions with slight changes in code and in the database. This makes things painful and time-consuming in the long run: applying commits from master to the other branches requires testing on every branch; same with database migrate scripts, which frequently have to be modified.




Is there an easier way? I have thought about the possibility of making things "optional" in the database, such as a column on a table, and hiding its existence in the code when it's not "enabled." This would have the benefit of a single code set and a single database schema, but I think it might lead to more dependence on the code and less on the database -- for example, it might mean constraints and keys couldn't be used in certain cases.




Restating the Question



Our reader asks, is it better to have different code branches
or to try to keep a lot of potentially conflicting and optional
items mixed in together?



Well, the wisdom of the ages is to maintain a single code branch,
including the database schema. I tried exactly once, very early
in my career, to fork my own code, and gave up almost within days.
When I went to work in larger shops I always arrived in a situation
where the decision had already been made to maintain a single
branch. Funny thing, since most programmers cannot agree on the
color of the sky when they're staring out the window, this is
the only decision I have ever seen maintained with absolute
unanimity no matter how many difficulties came out of it.



There is some simple arithmetic as to why this is so. If you have a
single feature for a customer that is giving you a headache, and
you fork the code, you now have to update both code branches for
every change plus regression test them both, including the feature
that caused the headache. But if you keep them combined you only
have the one headache feature to deal with. That's why people
keep them together.



Two Steps



Making custom features work smoothly is a two-step process.
The first step is arguably more difficult than the second,
but the second step is absolutely crucial if you have
business logic tied to the feature.



Most programmers when confronted with this situation
will attempt to make various features optional. I
consider this to be a mistake because it complicates
code, especially when we get to step 2. By far the
better solution is to make features ignorable
by anybody who does not want them.



The wonderful thing about ignorable features is
they tend to eliminate the problems with apparently
conflicting features. If you can rig the features
so anybody can use either or both, you've eliminated
the conflict.



Step 1: The Schema



As mentioned above, the first step is arguably more
difficult than the second, because it may involve
casting requirements differently than they are
presented.



For example,
our reader asks about a priority column on tickets,
asked for by only one customer. This may seem like
a conflict because nobody else wants it, but we
can dissolve the conflict when we make the feature
ignorable. The first step involves doing this at
the database or schema level.



But first we should mention that the UI is the easy part: we might
have a control panel where we can make fields invisible, or maybe
our users just ignore the fields they are not interested in.
Either way works.



The problem is in the database.
If the values for priority come
from a lookup table, which they should,
then we have a foreign key, and
we have a problem if we try to ignore it:



  • We can allow nulls in the foreign key, which is
    fine for the people ignoring it, but
  • This means the people who require it can end
    up with tickets that have no priority because it does
    not prevent a user from leaving it blank.


A simple answer here is to pre-populate your priority
lookup table with a value of "Not applicable", perhaps
with a hardcoded id of zero. Then we set the default
value of TICKET.priority to zero. This means people
can safely ignore it because it will always be valid.
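
In SQL terms the whole trick is just a pre-loaded row and a column
default. Here is a minimal sketch; the table and column names are
invented for illustration:

CREATE TABLE priorities (
    priority    INT PRIMARY KEY,
    description VARCHAR(40) NOT NULL
);

-- ship the system with the "ignore me" row already in place
INSERT INTO priorities (priority, description)
VALUES (0, 'Not applicable');

CREATE TABLE tickets (
    ticket_id INT PRIMARY KEY,
    -- zero is always valid, so anybody ignoring the feature
    -- never trips over the foreign key
    priority  INT NOT NULL DEFAULT 0 REFERENCES priorities (priority)
    -- ...other ticket columns...
);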



Then, for the customer who paid for it, we just go in
after the install and delete the default entry. It's
a one-time operation, not even worth writing a script
for, and it forces them to create a set of priorities
before using the system. Further, by leaving the
default of zero in there, it forces valid answers
because users will be dinged with an FK violation if
they do not provide a real priority.



For this particular example, there is no step 2, because
the problem is completely solved at the schema level.
To see how to work with step 2, I will make up an
example of my own.



Step 2: Unconditional Business Logic



To illustrate step 2, I'm going to make up an
example that is not really appropriate to our
reader's question, frankly because I cannot think
of one for that situation.



Let's say we have an eCommerce system, and one
of our sites wants customer-level discounts based
on customer groups, while another wants discounts
based on volume of order -- the more you buy, the
deeper the discount. At this point most programmers
start shouting in the meeting, "We'll make them
optional!" Big mistake, because it makes for lots
of work. Instead we will make them ignorable.



Step 1 is to make ignorable features in the schema.
Our common code base contains a table of customer
groups with a discount percent, and in the customers
table we make a nullable foreign key to the customer
groups table. If anybody wants to use it, great, and
if they want to ignore it, that's also fine. We do
the same thing with a table of discount amounts: an empty
table that lists threshold amounts and discount percents.
If anybody wants to use it they fill it in; everybody else
leaves it blank.
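
In SQL terms, step 1 might look something like this; all names are
invented for illustration:

CREATE TABLE customer_groups (
    group_id         INT PRIMARY KEY,
    discount_percent DECIMAL(5,2) NOT NULL
);

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    -- nullable foreign key: leave it null to ignore group discounts
    group_id    INT NULL REFERENCES customer_groups (group_id)
);

-- ships empty; filling it in is what turns the feature on
CREATE TABLE volume_discounts (
    threshold_amount DECIMAL(12,2) PRIMARY KEY,
    discount_percent DECIMAL(5,2) NOT NULL
);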



Now for the business logic, the calculations of
these two discounts. The crucial idea here is
not to make up conditional logic that tries to
figure out whether or not to apply the discounts.

It is vastly easier to always apply both
discounts, with the discounts coming out zero for
those users who have ignored the features.



So for the customer discount, if the customer's
entry for customer group is null, it will not match
to any discount, and you treat this as zero.
Same for the sale amount discount: the lookup to see which
discount tier the sale qualifies for finds nothing because
the table is empty, so it is treated as zero.
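
Here is a sketch of that unconditional calculation in SQL, assuming
an orders table and the invented names from above. Both discounts are
always computed; an ignored feature simply comes out zero:

SELECT o.order_id,
       o.order_amount,
       -- a null group_id matches nothing, so this is zero
       COALESCE(cg.discount_percent, 0) AS group_discount,
       -- an empty table finds nothing, so this is zero too
       COALESCE((SELECT MAX(vd.discount_percent)
                   FROM volume_discounts vd
                  WHERE vd.threshold_amount <= o.order_amount),
                0) AS volume_discount
  FROM orders o
  JOIN customers c ON c.customer_id = o.customer_id
  LEFT JOIN customer_groups cg ON cg.group_id = c.group_id;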



So the real trick at the business logic level is
not to figure out which feature to use, which leads
to complicated conditionals that always end up
conflicting with each other, but to always use
all features and code them so they have no effect
when they are being ignored.



Conclusion



Once upon a time almost everybody coding for a living
dealt with these situations -- we all wrote code that
was going to ship off to live at our customer's site.
Nowadays this is less common, but for those of us
dealing with it it is a big deal.



The wisdom of the ages is to maintain a common code
base. The method suggested here takes that idea
to its most complete implementation, a totally common
code base in which all features are active all of
the time, with no conditionals or optional features
(except perhaps in the UI and on printed reports),
and with schema and business logic set up so that
features that are being ignored simply have no
effect on the user.

Friday, January 21, 2011

Opinion: SQL Rally Abstract Process

I'm really enjoying the process that the SQL Rally is using to select speakers.

I've submitted abstracts to several SQL Saturdays and the PASS Summit but this has been the most interesting process so far. What I really liked about the SQL Rally submission process:

  • Blind submission

  • Two abstract limit

  • Community votes for the final slate of speakers.

The blind submission process eliminates being concerned about what others will do. Submit what you have; if it's worthy, it will be selected. If not, move on.

The two abstract limit levels the playing field for less experienced speakers who haven't compiled as much content but have gained enough speaking experience at the local level to move up to the regional level.  I think this limit has also produced better abstracts: candidates spent more time sharpening them because they only had two chances to get a slot.

Finally, letting the PASS community vote was a great idea, but stretching the voting out over a month is brilliant from a marketing perspective.  The organizers have developed a very community-focused method of creating maximum interest in this event.  Well done!



I'd like to see all or parts of this process used for future PASS events.



Tuesday, January 18, 2011

NEVER CHANGE RESOLUTION BEFORE A PRESENTATION!

I had the honor of speaking at SQL Saturday #62 this past weekend in Tampa.

It's only the second time I've presented using my Windows 7 laptop and the first time I presented using the Presenter View in PowerPoint 2010.  We still run Windows XP at work, so I don't use Windows 7 on a daily basis.  I arrived early to set up for my presentation, connected the projector, and booted the laptop.  The laptop auto-set the resolution to 800x600. I didn't like the setting, so I increased the resolution to a higher level, started up the presentation, and went into Presenter View.  Cue foreboding music...





By this time, the first attendees started rolling in.  I looked at the first few slides to check that everything was OK and then went back to the first slide in the deck.  I started my presentation, got to the fourth slide in the deck, and the picture was missing!  The next slide was blank!  WTF?  The slides looked good in the ribbon at the bottom of Presenter View but weren't rendering correctly on the screen or in Presenter View's left pane.





Needless to say, this really threw me off.  Thankfully, one of the attendees suggested I switch to the editor view and present from that.  So, I muddled through the rest of the presentation and my demos.  Despite my PowerPoint woes, I did receive decent reviews via Twitter, but I was not happy. I had to figure out what went wrong. I kept thinking and talking about it with other people for the next couple of hours.





Finally, I thought changing the resolution might have been the crux of the issue.  I went back in the room during the lunch break and plugged my laptop into the projector.  This time, I did not touch the resolution.  The deck displayed correctly.  Head slap!





So, the sole purpose of this post is to save some other newbie speaker from repeating this mistake.





Thanks to @BrentO for his post on Presenter View and @mrdenny and @brianwmitchell for their input while I was trying to figure this out.







Friday, January 7, 2011

Can You Really Create A Business Logic Layer?

The past three posts of this little mini-series
have gone from a working definition of business logic
to a rigorous definition of business logic
and on to some theorems about business logic.
To wrap things up, I'd like to ask the question:
is it possible to isolate business logic into
a single tier?



Related Reading



There are plenty of opinions out there.
For a pretty thorough explanation of how to put
everything into the DBMS, check out
>Toon Koppelaar's description. Mr.
Koppelaars has some good material, but you do
need to read through his earlier posts to get
the definitions of some of his terms. You can also
follow his links through to some high quality
discussions elsewhere.



Contrasting Mr. Koppelaars's opinion is a piece
which does not have nearly the same impact, IMHO,
because in
Dude, Where's My Business Logic? we get some solid
history mixed with normative assertions based on
either anecdote or nothing at all. I'm a big believer
in anecdote, but when I read a sentence that
says, "The database should not have any knowledge of what a customer is, but only of the elements that are used to store a customer," then
I figure I'm dealing with somebody who needs to see
a bit more of the world.




Starting At the Top: The User Interface



First, let's review that our rigorous definition of business logic
includes schema (types and constraints),
derived values (timestamps, userstamps, calculations,
histories), non-algorithmic compound operations
(like batch billing) and algorithmic compound
operations, those that require looping in their
code. This encompasses everything we might do,
from the simplest passive things like a constraint
that prevents discounts from being over 100% to
the most complex hours-long business process,
with everything in between accounted for.



Now I want to start out by using that definition
to see a little bit about what is going on in
the User Interface. This is not the presentation
layer as it is often called but the interaction
layer and even the command layer.




Consider an admin interface to
a database, where the user is entering or modifying
prices for the price list. Now, if the user could
enter "Kim Stanley Robinson" as the price, that would be
kind of silly, so of course the numeric inputs
only allow numeric values. Same goes for dates.



So the foundation of usability for a UI
is, at the very least,
knowledge and enforcement of types in
the UI layer. Don't be scared off that I am
claiming the UI is enforcing anything; we'll
get to that a little lower down.



Now consider the case where the user is
typing in a discount rate for this or that,
and a discount is not allowed to be over 100%.
The UI really ought to enforce this;
otherwise the user's time is wasted when she
enters an invalid value, finishes the entire form,
and only then gets an error when she tries to
save. In the database world we call this
a constraint, so the UI needs to know about
constraints to better serve the user.
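
For reference, the same rule at the database level can be a one-line
CHECK constraint. This is only a sketch with made-up names, but it
shows how little it costs to state the rule where it is final:

ALTER TABLE order_lines
  ADD CONSTRAINT chk_discount_max
  CHECK (discount >= 0 AND discount <= 1.0);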



Now this same user is typing a form where there
is an entry for US State. The allowed values are
in a table in the database, and it would be nice
if the user had a drop-down list, and one that
was auto-suggesting as the user typed. Of course
the easiest way to do something like this is just
make sure the UI form "knows" that this field is
a foreign key to the STATES table, so it can generate
the list using some generic library function that
grabs a couple of columns out of the STATES
table. Of course, this kind of lookup thing will
be happening all over the place, so it would work
well if the UI knew about and enforced foreign
keys during entry.



And I suppose the user might at some point be
entering a purchase order. The purchase order is
automatically stamped with today's date. The
user might see it, but not be able to change it,
so now our UI knows about system-generated values.



Is this user allowed to delete a customer?
If not, the button should either be grayed out or not
be there at all. The UI needs to know about
and enforce some security.



More About Knowing and Enforcing



So in fact the UI layer not only knows the logic
but is enforcing it. It is enforcing it for
two reasons: to improve the user experience with
date pickers, lists, and so forth, and to prevent the user
from entering invalid data and wasting round trips.



And yet, because we cannot trust what comes in
to the web server over the wire, we have to
enforce every single rule a second time when
we commit the data.



You usually do not hear people say that the UI
enforces business logic. They usually say the
opposite. But the UI does enforce business logic.
The problem is, everything the UI enforces has
to be enforced again. That may be why we often
overlook the fact that it is doing so.



The Application and The Database



Now let's go through the stuff the UI is
enforcing, and see
what happens in the application and the database.



With respect to type, a strongly typed language
will throw an error if the type is wrong, and a weakly
typed language is wise to put in a type check anyway.
The DBMS is going to allow only correctly typed
values, so, including the UI,
type is enforced three times.



With respect to lookups like US state, in
a SQL database we always let the server do that
with a foreign key, if we know what is good for
us. That makes double enforcement for lookups.



So we can see where this is going. As we look at
constraints and security and anything else that
must be right, we find it will be enforced at least
twice, and as much as three times.



You Cannot Isolate What Must be Duplicated



By defining First Order Business Logic, the simplest
foundation layer, as including things like types
and keys and constraints, we find that the enforcement
of this First Order stuff is done 2 or 3 times, but
never only once.



This more or less leaves in tatters the idea of a
"Business Logic Layer" that is in any way capable of
handling all business logic all by its lonesome.
The UI layer is completely useless unless it is
also enforcing as much logic as possible, and
even when we leave the Database Server as the
final enforcer of First Order Business Logic
(types, constraints, keys), it is still often good
engineering to do some checks to prevent
expensive wasted trips to the server.



So we are wasting time if we sit around trying to figure
out how to get the Business Logic
"where it belongs", because it "belongs" in at
least two places and sometimes three. Herding
the cats into a single pen is a fool's errand; it
is at once unnecessary, undesirable, and impossible.



Update: Regular reader Dean Thrasher of Infovark summarizes
most of what I'm saying here using an apt industry
standard term: Business Logic is a cross-cutting concern.



Some Real Questions



Only when we have smashed the concept that Business
Logic can exist in serene isolation in its own layer
can we start to ask the questions that would actually
speed up development and make for better engineering.



Freed of the illusion of a separate layer, when we
look at the higher Third and Fourth Order Business
Logic, which always require coding, we can decide where
they go based either on engineering or on the
availability of qualified
programmers in particular technologies,
but we should not make
the mistake of believing they are going where they
go because the gods would have it so.



But the real pressing question, if we are seeking
to create efficient manageable large systems, is
this: how do we distribute
the same business logic into 2 or 3 (or more)
different places so that it is enforced
consistently everywhere? Because a smaller code
base is always easier to manage than a large one,
and because configuration is always easier than
coding, this comes down to meta-data, or if you
prefer, a data dictionary. That's the trick that
always worked for me.
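
To make that concrete, here is a toy sketch of a data dictionary
stored as a table. The names and columns are my own invention, and a
real dictionary carries far more, but the principle is one definition
driving every tier:

CREATE TABLE data_dictionary (
    table_name  VARCHAR(64) NOT NULL,
    column_name VARCHAR(64) NOT NULL,
    col_type    VARCHAR(32) NOT NULL,
    min_value   DECIMAL(12,2) NULL,
    max_value   DECIMAL(12,2) NULL,
    PRIMARY KEY (table_name, column_name)
);

-- a single row like this can drive the generated DDL constraint,
-- the server-side check, and the UI-side check
INSERT INTO data_dictionary
VALUES ('order_lines', 'discount', 'decimal(5,4)', 0, 1.0);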



Is This Only A Matter of Definitions?



Anybody who disagrees with the thesis here has
only to say, "Ken, those things are not business
logic just because you wrote a blog that says they
are. In my world business logic is about code
baby!" Well sure, have it your way.
After all, the nice thing about definitions is that we
can all pick the ones we like.



But these definitions, the theorems I derived on
Tuesday, and the multiple-enforcement thesis presented
here today should make sense to anybody struggling
with where to put the business logic. That struggle
and its frustrations come from the mistake of
imposing abstract
conceptual responsibilities
on each tier instead
of using the tiers as each is able to get the
job done.
Databases are wonderful for type,
entity integrity (uniqueness), referential integrity,
ACID compliance, and many other things. Use them!
Code is often better when the problem at hand cannot
be solved with a combination of keys and constraints
(Fourth Order Business Logic), but even that code can
be put into the DB or in the application.



So beware of paradigms that assign responsibility
without compromise to this or that tier. It cannot
be done. Don't be afraid to use code for doing things
that require structured imperative step-wise operations,
and don't be afraid to use the database for what it is
good for, and leave the arguments about "where everything
belongs" to those with too much time on their hands.

Tuesday, January 4, 2011

Theorems Regarding Business Logic

In yesterday's Rigorous Definition of Business Logic, we saw that
business logic can be defined in four orders:



  • First Order Business Logic is entities and
    attributes that users (or other agents) can save,
    and the security rules that govern read/write
    access to the entities and attributes.
  • Second Order Business Logic is entities
    and attributes derived by rules and formulas,
    such as calculated values and history tables.
  • Third Order Business Logic is non-algorithmic
    compound operations (no structure or looping is
    required in expressing the solution), such as
    a month-end batch billing or, for the old-timers
    out there, a year-end general ledger
    roll-up.
  • Fourth Order Business Logic is algorithmic
    compound operations. These occur when the action
    of one step affects the input to future steps.
    One example is ERP Allocation.


A Case Study



The best way to see if these have any value is to
cook up some theorems and examine them with an
example. We will take
a vastly simplified time billing system, in which
employees enter time which is billed once/month to
customers. We'll work out some details a little below.



Theorem 1: 1st and 2nd Order, Analysis



The first theorem we can derive from these definitions
is that we should look at First and Second Order Schemas
together during analysis. This is because:



  • First Order Business Logic is about entities and attributes
  • Second Order Business Logic is about entities and attributes
  • Second Order Business Logic is about values
    generated from First Order values and, possibly,
    other Second Order values
  • Therefore, Second Order values are always
    expressed ultimately in terms of First Order
    values
  • Therefore, they should be analyzed together


To give the devil his due, ORM does this easily, because
it ignores so much database theory (paying a large price
in performance for doing so) and
considers an entire row, with its first order and
second order values together, as being part of one class.
This is likely the foundation for the claims of ORM
users that they experience productivity gains when
using ORM. Since I usually do nothing but bash ORM,
I hope this statement will be taken as utterly sincere.



Going the other way, database theorists and evangelists
who adhere to full normalization can hobble an
analysis effort by refusing to consider
2nd Order values because they denormalize the database;
sometimes the worst of my own crowd will stall
analysis by trying to keep these out of the conversation.
So, assuming I have not pissed off my own friends,
let's keep going.



So let's look at our case study of the time billing
system. By theorem 1, our analysis of entities and
attributes should include both 1st and 2nd order
schema, something like this:



 
INVOICES
-----------
invoiceid     2nd Order, a generated unique value
date          2nd Order if it always takes the date of the batch run
customer      2nd Order, a consequence of this being an
              aggregation of INVOICE_LINES
total_amount  2nd Order, a sum from INVOICE_LINES

INVOICE_LINES
---------------
invoiceid     2nd Order, copied from INVOICES
customer  +-  All three are 2nd Order, a consequence
employee  |   of this being an aggregation of
activity  +-  employee time entries
rate          2nd Order, taken from ACTIVITIES table
              (not depicted)
hours         2nd Order, summed from time entries
amount        2nd Order, rate * hours

TIME_ENTRIES
--------------
employeeid    2nd Order, assuming the system forces this
              value to be the employee making the entry
date          1st Order, entered by employee
customer      1st Order, entered by employee
activity      1st Order, entered by employee
hours         1st Order, entered by employee


Now, considering how much of that is 2nd order, which
is almost all of it, the theorem is not only supported
by the definition, but ought to line up squarely
with our experience. Who would want to try to analyze
this and claim that all the 2nd order stuff should
not be there?



Theorem 2: 1st and 2nd Order, Implementation



The second theorem we can derive from these definitions
is that First and Second Order Business logic require
separate implementation techniques. This is because:



  • First Order Business Logic is about user-supplied values
  • Second Order Business Logic is about generated values
  • Therefore, unlike things cannot be implemented with
    like tools.


Going back to the time entry example, let's zoom in on
the lowest table, the TIME_ENTRIES. The employee
entering her time must supply customer, date, activity, and
hours, while the system forces the value of employeeid.
This means that customer and activity must be validated
in their respective tables, and hours must be checked
for something like <= 24. But for employeeid the
system provides the value out of its context.
So the two kinds of values are processed in very
unlike ways. It seems reasonable that our code would
be simpler if it did not try to force both kinds of
values down the same validation pipe.
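
Sketched as DDL (names invented), the split is plain to see: the
user-supplied columns get declarative validation, while employeeid
is simply forced by the system:

CREATE TABLE time_entries (
    -- system-supplied: forced from session context,
    -- never taken from user input
    employeeid INT          NOT NULL,
    -- user-supplied: validated with keys and constraints
    date       DATE         NOT NULL,
    customer   INT          NOT NULL REFERENCES customers (customer_id),
    activity   INT          NOT NULL REFERENCES activities (activity_id),
    hours      DECIMAL(4,2) NOT NULL CHECK (hours > 0 AND hours <= 24)
);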



Theorem 3: 2nd and 3rd Order, Conservation of Action



This theorem states that
the sum of Second and Third Order
Business Logic is fixed:



  • Second Order Business Logic is about generating
    entities and attributes by rules or formulas
  • Third Order Business Logic is coded
    compound creation of entities and attributes
  • Given that a particular set of requirements
    resolves to a finite set of actions that generate
    entities and values, then
  • The sum of Second Order and Third Order Business
    Logic is fixed.


In plain English, this means that the more Business
Logic you can implement through 2nd Order
declarative rules and formulas, the fewer
processing routines you have to code. Or, if you
prefer, the more processes you code, the fewer
declarative rules about entities and
attributes you will have.



This theorem may be hard to compare to experience
for verification
because most of us are so used to thinking in
terms of batch billing as a process that we cannot imagine it
being implemented any other way: how exactly am I
supposed to implement batch billing declaratively?



Let's go back to the schema above, where we can
realize upon examination that the entirety of the batch
billing "process" has been detailed in a 2nd Order
Schema. If we could somehow add these facts to our
CREATE TABLE commands the way we add keys, types,
and constraints, batch billing would occur
without the batch part.



Consider this. Imagine that a user enters a
TIME_ENTRY. The system
checks for a matching EMPLOYEE/CUSTOMER/ACTIVITY
row in INVOICE_DETAIL, and when it finds the row
it updates the totals. But if it does not find
one then it creates one! Creation
of the INVOICE_DETAIL record causes the system to
check for the existence of an invoice for that
customer, and when it does not find one it creates
it and initializes the totals. Subsequent time entries
not only update the INVOICE_DETAIL rows but the
INVOICE rows as well. If this were happening, there would be no
batch billing at the end of the month because the
invoices would all be sitting there ready to go
when the last time entry was made.
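
A rough sketch of that cascade as a hand-written trigger appears
below. The syntax is MySQL-flavored and every name is assumed;
invoice numbering and month-end closing are left out. (As the next
paragraph explains, the real goal is to generate this behavior from
the schema rather than hand-code it.)

DELIMITER //
CREATE TRIGGER trg_time_entries_ai
AFTER INSERT ON time_entries
FOR EACH ROW
BEGIN
  DECLARE v_rate DECIMAL(8,2);
  SELECT rate INTO v_rate
    FROM activities WHERE activity_id = NEW.activity;

  -- header first, to satisfy referential integrity
  INSERT INTO invoices (customer, total_amount)
  SELECT NEW.customer, 0 FROM DUAL
   WHERE NOT EXISTS (SELECT 1 FROM invoices
                      WHERE customer = NEW.customer);

  -- then the detail row for this employee/customer/activity
  INSERT INTO invoice_lines (customer, employee, activity,
                             rate, hours, amount)
  SELECT NEW.customer, NEW.employeeid, NEW.activity, v_rate, 0, 0
    FROM DUAL
   WHERE NOT EXISTS (SELECT 1 FROM invoice_lines
                      WHERE customer = NEW.customer
                        AND employee = NEW.employeeid
                        AND activity = NEW.activity);

  -- roll the new hours into the detail and header totals
  UPDATE invoice_lines
     SET hours  = hours + NEW.hours,
         amount = amount + NEW.hours * v_rate
   WHERE customer = NEW.customer
     AND employee = NEW.employeeid
     AND activity = NEW.activity;

  UPDATE invoices
     SET total_amount = total_amount + NEW.hours * v_rate
   WHERE customer = NEW.customer;
END//
DELIMITER ;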



By the way, I coded something that does this in a
pretty straightforward way a few years ago, meaning
you could skip the batch billing process and add a few
details to a schema that would cause the database to
behave exactly as described above. Although the
format for specifying these extra features
was easy enough (so it seemed to me as the author),
the conceptual shift in thinking
that it required of people was far larger than I
initially and naively imagined. Nevertheless,
I toil forward, and that is
the core idea behind my Triangulum project.



Observation: There Will Be Code



This is not so much a theorem as an observation.
This observation is that if your application
requires Fourth Order Business Logic then somebody
is going to code something somewhere.



An anonymous reader pointed out in the comments
to Part 2 that Oracle's MODEL clause may work
in some cases. I would assume so, but I would also
assume that reality can create complicated Fourth
Order cases faster than SQL can evolve. Maybe.




But anyway, the real observation here is that
no modern language, either app
level or SQL flavor, can express an algorithm
declaratively. In other words, no combination
of keys, constraints, calculations and derivations,
and no known combination of advanced SQL functions
and clauses
will express an ERP Allocation routine or a
Magazine Regulation routine. So you have to code it.
This may not always be true, but I think it is
true now.



This is in contrast to the example given in the
previous section about the fixed total of
2nd and 3rd Order Logic. Unlike that example,
you cannot provide enough
2nd order wizardry to eliminate fourth order.
(Well, OK, maybe you can,
but I haven't figured it
out yet myself and have never heard that anybody
else is even trying. The trick would be to have
a table that you truncate and insert a single row
into; a trigger would fire that would know how
to generate the
next INSERT, generating a cascade. Of course, since
this happens in a transaction, if you end up
generating 100,000 inserts this might be a bad
idea, ha ha.)



Theorem 5: Second Order Tools Reduce Code



This theorem rests on the acceptance of an observation,
that using meta-data repositories, or data dictionaries,
is easier than coding. If that does not hold true,
then this theorem does not hold true. But if that
observation (my own observation, admittedly) does
hold true, then:



  • By Theorem 3, the sum of 2nd and 3rd order
    logic is fixed
  • By observation, using meta-data that manages
    schema requires less time than coding,
  • By Theorem 1, 2nd order is analyzed and specified
    as schema
  • Then it is desirable to specify as much business
    logic as possible as 2nd order schema, reducing
    and possibly eliminating manual coding of Third
    Order programs.


Again we go back to the batch billing example.
Is it possible to convert it all to 2nd Order as
described above? Well, yes it is, because I've done
it. The trick is an extremely counter-intuitive
modification to a foreign key that causes a
failure to actually generate the parent row that
would let the key succeed. To find out more about
this, check out Triangulum (not ready for prime time as of this
writing).



Conclusions



The major conclusion in all of this is that analysis
and design should begin with First and Second Order
Business Logic, which means working out schemas, both
the user-supplied values and the system-supplied
values.



When that is done, what we often call "processes"
are layered on top of this.



Tomorrow we will see part 4 of 4, examining the
business logic layer, asking, is it possible to
create a pure business logic layer that gathers
all business logic unto itself?

SQL PASS Summit Helium Talks

I've started listening to the PASS Summit DVD sessions during my commute.  By accident, I discovered that my Sansa Clip+ mp3 player has a playback speed option while listening to the Lightning Talks. 





Buck Woody, Brent Ozar, and Adam Machanic are great speakers, but the fun begins when you change the playback speed option to FAST. It really sounds like helium has been introduced into the room. It's really entertaining; you can still understand the speakers, and you get through the content quicker.





One final note, Michelle Ufford must have been talking REALLY fast.   

  







Monday, January 3, 2011

SQL Saturday #62

I'll be speaking at SQL Saturday #62 in Tampa, FL on January 15, 2011.





I'm presenting a session called "Why Learn PowerShell" because there was a request for a beginning PowerShell session.  There were tons of speaker submissions for this SQL Saturday, so the organizers gave most speakers one slot to give the maximum number of people the chance to speak.  They are still trying to secure additional space to expand the schedule.

   

If you have any interest in learning PowerShell, I encourage you to attend this session.

I'll explain why it's worth learning and provide REAL WORLD examples that I use regularly.

To further convince you that SQL Saturday is worth your time consider the following:

  • The variety of topics is pretty amazing for a FREE event.

  • Local, regional, and national speakers will present.

  • Several 2010 PASS Summit presenters will speak.

  • The Microsoft Scripting Guy and The Scripting Wife will be attending

  • A local Microsoft Certified Master will be presenting on the Parallel Data Warehouse.

  • The best lunch of any SQL Saturday will be served.

I've also registered for the Day of Data session "Virtualization and Storage for the DBA" by Denny Cherry being held the day before at the Italian Club in Ybor City.

I encourage all IT Pros to attend this event.

Sunday, January 2, 2011

Business Logic: From Working Definition to Rigorous Definition

This is part 2 of a 4 part mini-series that began
before the holidays with A Working Definition of Business Logic. Today we proceed
to a rigorous definition, tomorrow we will see some theorems
(http://database-programmer.blogspot.com/2011/01/theorems-regarding-business-logic.html),
and the series will wrap up with a post on the "business layer."



In the first post, the working definition said that
business logic includes at least:

  • The Schema
  • Calculations
  • Processes


None of these was very rigorously defined, kind of an
"I'll know it when I see it" type of thing, and we did
not talk at all about security. Now the task becomes
tightening this up into a rigorous definition.



Similar Reading



Toon Koppelaars has some excellent material along
these same lines, and a good place to start is his
Helsinki Declaration (IT Version).
The articles have a different focus than this series,
so they make great contrasting reading. I consider
the time I spent reading through it very well spent.




Definitions, Proofs, and Experience



What I propose below is a definition in four parts.
As definitions, they are not supposed
to prove anything, but they are definitely supposed
to ring true to the experience of any developer
who has created or worked on
a non-trivial business application. This effort
would be a success if we reach some consensus that
"at least it's all in there", even if we go
on to argue bitterly about which components
should be included in which layers.



Also, while I claim the definitions below are
rigorous, they are not yet formal. My
instinct is that formal definitions can be
developed using First Order Logic, which would allow the
theorems we will see tomorrow to move from
"yeah that sounds about right" to being
formally provable.



As for their practical benefit, inasmuch as
"the truth shall make you free", we ought to be
able to improve our architectures if we can settle
at very least what we are talking about
when we use the vague term "business logic."



The Whole Picture



What we commonly call "business logic", by
which we vaguely mean, "That stuff I have
to code up",
can in fact be rigorously defined
as having four parts, which I believe are
best termed orders, as there is a definite
precedence to their discovery, analysis and implementation.



  • First Order: Schema
  • Second Order: Derivations
  • Third Order: Non-algorithmic compound operations
  • Fourth Order: Algorithmic compound operations


Now we examine each order in detail.



A Word About Schema and NoSQL



Even "schema-less" databases have a schema, they
simply do not enforce it in the database server.
Consider: an eCommerce site using MongoDB is not
going to be tracking the local zoo's animal
feeding schedule, because that is out of scope.
No, the code
is limited to dealing with orders, order lines,
customers, items and stuff like that.



It is in the very act of expressing scope as
"the data values we will handle" that a schema is
developed.
This holds true regardless of whether
the datastore will be a filesystem, an RDBMS, a
new NoSQL database, or anything else.



Because all applications have a schema, whether the
database server enforces it or whether the
application enforces it, we need a vocabulary
to discuss the schema. Here we have an embarrassment
of choices: we can talk about entities and attributes,
classes and properties, documents and values, or
columns and tables. The choice of "entities and
attributes" is likely best because it is as close as
possible to an implementation-agnostic language.



First Order Business Logic: Schema



We can define schema, including security, as:



that body of entities and
their attributes whose relationships and
values will be managed by the
application stack, including the authorization of
roles to read or write to entities and properties.



Schema in this definition does not include derived
values of any kind or the processes that may operate
on the schema values; those are higher orders of
business logic. This means that the schema
actually defines the entire body of values that
the application will accept from outside sources
(users and other programs) and commit to the
datastore.
Restating again into even more
practical terms, the schema is the stuff users
can save themselves.



With all of that said, let's enumerate the properties
of a schema.



Type is required for every attribute.



Constraints are limits to the values allowed
for an attribute beyond its type. We may have a
discount percent that may not exceed 1.0 or 100%.



Entity Integrity is usually thought of
in terms of primary keys
and the vague statement "you can't have duplicates."
We cannot have a list of US States where "NY" is
listed 4 times.



Referential Integrity means that when one
entity links or refers to another entity, it must
always refer to an existing entity.
We cannot have some script kiddie flooding our
site with sales of
items "EAT_ME" and "F***_YOU", because those are
not valid items.



The general term 'validation' is not included
because any particular validation rule
is a combination of any or all of type, constraints,
and integrity rules.
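
All four of these properties can be seen in one small DDL sketch
(names invented for illustration):

CREATE TABLE items (
    sku   VARCHAR(20)   PRIMARY KEY,  -- entity integrity: no duplicates
    price DECIMAL(10,2) NOT NULL      -- type
);

CREATE TABLE order_lines (
    order_id INT          NOT NULL,
    sku      VARCHAR(20)  NOT NULL
             REFERENCES items (sku),  -- referential integrity
    qty      INT          NOT NULL,   -- type
    discount DECIMAL(5,4) NOT NULL DEFAULT 0
             CHECK (discount <= 1.0)  -- constraint: at most 100%
);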



Second Order Business Logic: Derived values



When we speak of derived values, we usually
mean calculated values, but some derivations
are not arithmetic, so the more general term
"derived" is better. Derivations are:



A complete entity or an attribute
of an entity generated from other entities
or attributes according to a formula or rule.



The definition is sufficiently general that
a "formula or rule" can include conditional
logic.



Simple arithmetic derived values include things
like calculating price * qty, or summing an
order total.



Simple non-arithmetic derivations include
things like
fetching the price of an item to use on an
order line. The price in the order is defined
as being a copy of the item's price at the
time of purchase.
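
Sketched in SQL with invented names (whether these run in triggers
or in application code is an implementation choice):

-- non-arithmetic derivation: copy the item's price onto the
-- order line at the time of purchase
UPDATE order_lines
   SET price = (SELECT price FROM items
                 WHERE items.sku = order_lines.sku)
 WHERE order_id = 1234;

-- arithmetic derivations: extend the line, then sum the order
UPDATE order_lines
   SET extended_price = price * qty
 WHERE order_id = 1234;

UPDATE orders
   SET total = (SELECT SUM(extended_price) FROM order_lines
                 WHERE order_lines.order_id = orders.order_id)
 WHERE order_id = 1234;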



An example of a complete entity being derived
is a history table that tracks changes
in some other table.
This can also be implemented
in NoSQL as a set of documents tracking the
changes to some original document.



Security also applies to generated values
only insofar as who can see them. But security
is not an issue for writing these values
because by definition they are generated from
formulas and rules, and so no outside user
can ever attempt to explicitly specify the
value of a derived entity or property.



One final point about Second Order Business
Logic is that it can be expressed declaratively,
if we have the tools, which we do not, at
least not in common use. I wrote one myself some
years ago and am re-releasing it as Triangulum
(http://code.google.com/p/triangulum-db/), but that is a post for another day.



Sorting out First and Second Order



The definitions of First and Second Order Business Logic
have the
advantage of being agnostic to what kind of
datastore you are using, and being agnostic
to whether or not the derived values are
materialized. (In relational terms, derivations
are almost always denormalizing if
materialized, so in a fully normalized database
they will not be there, and you have to go through
the application to get them.)



Nevertheless, these two definitions can right away
bring some confusion to the term "schema."
Example: a history table is absolutely in a database schema,
but I have called First Order Business Logic "schema" and
Second Order Business Logic is, well, something else.
The best solution here is to simply use the
terms First Order Schema and Second Order Schema.
An order_lines table is First Order schema, and
the table holding its history is Second Order Schema.



The now ubiquitous auto-incremented surrogate primary
keys pose another stumbling block. Because they are
used so often (and so often because of seriously faulty
reasoning, see A Sane Approach To Choosing Primary Keys) they
would automatically be considered schema -- one of the
very basic values of a sales order, check, etc. But
they are system-generated so they must be Second Order, no?
Isn't the orderid a very basic part of the schema and
therefore First Order? No. In fact, by these
definitions, very little if any of an order header
is First Order, the tiny fragments that are first order
might be the shipping address, the user's choice of
shipping method, and payment details provided by the
user. The other information that is system-generated,
like Date, OrderId, and order total are all Second
Order.



Third Order Business Logic



Before defining Third Order Business Logic
I would like to offer a simple example:
Batch Billing. A consulting
company bills by the hour. Employees enter time
tickets throughout the day. At the end of the
month the billing agent runs a program that, in
SQL terms:



  • Inserts a row into INVOICES for each
    customer with any time entries
  • Inserts a row into INVOICE_LINES that
    aggregates the time for each employee/customer
    combination.
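
In rough SQL, with names assumed and details like invoice numbering
and month filtering left out, the whole run is just two set-oriented
statements:

-- headers first, because of referential integrity
INSERT INTO invoices (customer)
SELECT DISTINCT customer
  FROM time_entries;

-- then one line per employee/customer combination
INSERT INTO invoice_lines (customer, employee, hours)
SELECT customer, employeeid, SUM(hours)
  FROM time_entries
 GROUP BY customer, employeeid;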


This example ought to make clear what I mean by
defining Third Order Business Logic as:



A non-algorithmic compound operation.



The "non-algorithmic" part comes from the fact that
none of the individual documents, an INVOICE
row and its INVOICE_LINES, is dependent on any other.
There is no case in which the
invoice for one customer will influence the value
of the invoice for another. You do not need an
algorithm to do the job, just one or more steps
that may have to go in a certain order.



Put another way, it is a one-pass set-oriented
operation. The fact that it must be executed in
two steps is an artifact of how database
servers deal with referential integrity, which is
that you need the headers before you can put in
the detail. In fact,
when using a NoSQL database, it may be possible to
insert the complete set of documents in one
command, since the lines can be nested directly
into the invoices.



Put yet a third way, in more practical terms,
there is no conditional or looping logic required
to specify the operation. This does not
mean there will be no looping logic in the final
implementation, because performance concerns and
locking concerns may cause it to be implemented
with 'chunking' or other strategies, but the
important point is that the specification
does not include loops or step-wise operations
because the individual invoices are all
functionally independent of each other.



I do not want to get side-tracked here, but I
have had a working hypothesis in my mind for
almost 7 years that Third Order Business Logic,
even before I called it that, is an artifact,
which appears necessary because of the limitations
of our tools. In future posts I would like to
show how a fully developed understanding and
implementation of Second Order Business Logic
can dissolve many cases of Third Order.



Fourth Order Business Logic



We now come to the upper bound of complexity
for business logic, Fourth Order, which
we label "algorithmic compound operations",
and define a particular Fourth Order Business
Logic process as:



Any operation where it
is possible or certain that
there will be at least
two steps, X and Y, such that the result
of Step X modifies the inputs available to
Step Y.



In comparison to Third Order:



  • In Third Order the results are
    independent of one another, in Fourth Order
    they are not.
  • In Third Order no conditional or branching
    is required to express the solution, while in
    Fourth Order conditional, looping, or branching
    logic will be present in the expression of the
    solution.


Let's look at the example of ERP Allocation.
In the interest of brevity, I am going to skip most
of the explanation of the ERP Allocation algorithm
and stick to this basic review: a company has a list
of sales orders (demand) and a list of purchase
orders (supply). Sales orders come in through EDI,
and at least once/day the purchasing department
must match supply to demand to find out what they
need to order. Here is an unrealistically simple
example of the supply and demand they might be facing:




*** DEMAND ***          *** SUPPLY ***

   DATE     | QTY         DATE     | QTY
------------+-----     ------------+-----
 3/ 1/2011  |   5       3/ 1/2011  |   3
 3/15/2011  |  15       3/ 3/2011  |   6
 4/ 1/2011  |  10       3/15/2011  |  20
 4/ 3/2011  |   7


The desired output of the ERP Allocation
might look like this:




   *** DEMAND ***       *** SUPPLY ***
   DATE     | QTY |  DATE_IN   | QTY  | FINAL
------------+-----+------------+------+-------
 3/ 1/2011  |   5 | 3/ 1/2011  |    3 | no
            |     | 3/ 3/2011  |    2 | Yes
 3/15/2011  |  15 | 3/ 3/2011  |    4 | no
            |     | 3/15/2011  |   11 | Yes
 4/ 1/2011  |  10 | 3/15/2011  |    9 | no
 4/ 3/2011  |   7 | null       | null | no


From this the purchasing agents know that the
Sales Order that ships on 3/1 will be two days
late, and the Sales Orders that will ship on
4/1 and 4/3 cannot be filled completely. They
have to order more stuff.



Now for the killer question: Can the desired
output be generated in a single SQL query?
The answer is no, not even with Common
Table Expressions or other recursive constructs.
The reason is that each match-up of a purchase
order to a sales order modifies the supply
available to the next sales order.
Or, to use the definition of Fourth Order Business
Logic, each iteration will consume some supply
and so will affect the inputs available to
the next step.



We can see this most clearly if we look at some
pseudo-code:




for each sales order by date {
    while sales order demand not met {
        get earliest purchase order w/ qty avail > 0
        break if none

        make entry in matching table
        // This is the write operation that
        // means we have Fourth Order Business Logic
        reduce available qty of purchase order
    }
    break if no more purchase orders
}


Conclusions



As stated in the beginning, it is my belief
that these four orders should "ring true" with
any developer who has experience with non-trivial
business applications. Though we may dispute
terminology and argue over edge cases, the
recognition and naming of the Four Orders should
be of immediate benefit during analysis, design,
coding, and refactoring. They rigorously
establish both the minimum and maximum bounds of
complexity while also filling in the two kinds of
actions we all take between those bounds.
They are datamodel agnostic,
and even agnostic to implementation strategies
within data models (like the normalize/denormalize
debate in relational).



But their true power is in providing a framework
of thought for the process of synthesizing
requirements into a specification and from there
an implementation.



Tomorrow we will see some theorems that we can
derive from these definitions.