In yesterday's Rigorous Definition of Business Logic, we saw that
   business logic can be defined in four orders:
- First Order Business Logic is entities and
attributes that users (or other agents) can save,
and the security rules that govern read/write
access to those entities and attributes.
 - Second Order Business Logic is entities
and attributes derived by rules and formulas,
such as calculated values and history tables.
 - Third Order Business Logic is non-algorithmic
compound operations (no control structure or looping
is required to express the solution), such as
month-end batch billing or, for the old-timers
out there, a year-end general ledger
roll-up.
 - Fourth Order Business Logic is algorithmic
compound operations. These occur when the action
of one step affects the input to future steps.
One example is ERP Allocation.
A Case Study
The best way to see if these have any value is to
   cook up some theorems and examine them with an
   example.  We will take
   a vastly simplified time billing system, in which
   employees enter time which is billed once/month to
   customers.  We'll work out some details a little below.
Theorem 1: 1st and 2nd Order, Analysis
The first theorem we can derive from these definitions
   is that we should look at First and Second Order Schemas
   together during analysis.  This is because:
- First Order Business Logic is about entities and attributes
 - Second Order Business Logic is about entities and attributes
 - Second Order Business Logic is about values
generated from First Order values and, possibly,
other Second Order values
 - Therefore, Second Order values are always 
expressed ultimately in terms of First Order
values
 - Therefore, they should be analyzed together
 
To give the devil his due, ORM does this easily, because
   it ignores so much database theory (paying a large price
   in performance for doing so) and 
   considers an entire row, with its first order and
   second order values together, as being part of one class.
   This is likely the foundation for the claims of ORM
   users that they experience productivity gains when
   using ORM.  Since I usually do nothing but bash ORM,
   I hope this statement will be taken as utterly sincere.
Going the other way, database theorists and evangelists
   who adhere to full normalization can hobble an
   analysis effort by refusing to consider
   2nd order values because they denormalize the database;
   sometimes the worst of my own crowd will prevent
   analysis by trying to keep these out of the conversation.
   So, assuming I have not pissed off my own friends,
   let's keep going.
So let's look at our case study of the time billing
   system.  By theorem 1, our analysis of entities and
   attributes should include both 1st and 2nd order
   schema, something like this:
INVOICES
-----------
invoiceid     2nd Order, a generated unique value
date          2nd Order, if it always takes the date of the batch run
customer      2nd Order, a consequence of this being an
              aggregation of INVOICE_LINES
total_amount  2nd Order, a sum from INVOICE_LINES

INVOICE_LINES
---------------
invoiceid   2nd Order, copied from INVOICES
customer    +- All three are 2nd Order, a consequence
employee    |  of this being an aggregation of
activity    +- employee time entries
rate        2nd Order, taken from ACTIVITIES table
            (not depicted)
hours       2nd Order, summed from time entries
amount      2nd Order, rate * hours

TIME_ENTRIES
--------------
employeeid  2nd Order, assuming the system forces this
            value to be the employee making the entry
date        1st Order, entered by employee
customer    1st Order, entered by employee
activity    1st Order, entered by employee
hours       1st Order, entered by employee
Now, considering how much of that is 2nd order, which
   is almost all of it, the theorem is not only supported
   by the definition, but ought to line up squarely
   with our experience.  Who would want to try to analyze
   this and claim that all the 2nd order stuff should
   not be there?
Theorem 2: 1st and 2nd Order, Implementation
The second theorem we can derive from these definitions
   is that First and Second Order Business logic require
   separate implementation techniques.  This is because:
- First Order Business Logic is about user-supplied values
 - Second Order Business Logic is about generated values
 - Therefore, unlike things cannot be implemented with
like tools. 
Going back to the time entry example, let's zoom in on
   the lowest table, TIME_ENTRIES.  The employee
   entering her time must supply customer, date, activity, and
   hours, while the system forces the value of employeeid.
   This means that customer and activity must be validated
   against their respective tables, and hours must be checked
   for something like <= 24.  But for employeeid the
   system provides the value from its own context.
   So the two kinds of values are processed in very
   unlike ways.  It seems reasonable that our code would
   be simpler if it did not try to force both kinds of
   values down the same validation pipe.
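That split can be sketched as two small routines rather than one shared validation pipe. This is only an illustration with invented names: an in-memory SQLite database stands in for the real lookup tables, and the <= 24 check and the context lookup follow the text above.

```python
import sqlite3

# Stand-ins for the real lookup tables (names invented for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers  (customer TEXT PRIMARY KEY);
CREATE TABLE activities (activity TEXT PRIMARY KEY);
INSERT INTO customers  VALUES ('acme');
INSERT INTO activities VALUES ('consulting');
""")

def validate_first_order(row):
    """1st Order values are user-supplied, so they must be checked."""
    if conn.execute("SELECT 1 FROM customers WHERE customer = ?",
                    (row["customer"],)).fetchone() is None:
        raise ValueError("unknown customer")
    if conn.execute("SELECT 1 FROM activities WHERE activity = ?",
                    (row["activity"],)).fetchone() is None:
        raise ValueError("unknown activity")
    if not 0 < row["hours"] <= 24:
        raise ValueError("hours out of range")

def supply_second_order(row, context):
    """2nd Order values are generated, so they are supplied, not validated."""
    row["employeeid"] = context["current_user"]
    return row

row = {"customer": "acme", "activity": "consulting",
       "date": "2024-01-05", "hours": 3}
validate_first_order(row)
row = supply_second_order(row, {"current_user": "alice"})
print(row["employeeid"])  # alice
```

Neither routine ever inspects the other kind of value, which is the point: two unlike pipelines, each simpler than one pipe trying to do both.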
Theorem 3: 2nd and 3rd Order, Conservation of Action
This theorem states that
   the sum of Second and Third Order
   Business Logic is fixed:
- Second Order Business Logic is about generating
entities and attributes by rules or formulas
 - Third Order Business Logic is coded
compound creation of entities and attributes
 - Given that a particular set of requirements
resolves to a finite set of actions that generate
entities and values, then
 - The sum of Second Order and Third Order Business
Logic is fixed. 
In plain English, this means that the more Business
   Logic you can implement through 2nd Order
   declarative rules and formulas, the fewer
   processing routines you have to code.  Or, if you
   prefer, the more processes you code, the fewer 
declarative rules about entities and 
   attributes you will have.
This theorem may be hard to verify against experience
   because most of us are so used to thinking of
   batch billing as a process that we cannot imagine it
   being implemented any other way: how exactly am I
   supposed to implement batch billing declaratively?
Let's go back to the schema above, where we can
   see upon examination that the entirety of the batch
   billing "process" has already been detailed in a 2nd Order
   schema.  If we could somehow add these facts to our
   CREATE TABLE commands the way we add keys, types,
   and constraints, batch billing would occur
   without the batch part.
Consider this.  Imagine that a user enters
   a TIME_ENTRY.  The system
   checks for a matching EMPLOYEE/CUSTOMER/ACTIVITY
   row in INVOICE_DETAIL, and when it finds the row
   it updates the totals.  But if it does not find 
   one then it creates one!  Creation
   of the INVOICE_DETAIL record causes the system to
   check for the existence of an invoice for that
   customer, and when it does not find one it creates
   it and initializes the totals.  Subsequent time entries
   not only update the INVOICE_DETAIL rows but the
   INVOICE rows as well.  If this were happening, there would be no
   batch billing at the end of the month because the
   invoices would all be sitting there ready to go
   when the last time entry was made.
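A toy version of that behavior can be sketched with ordinary triggers. To be clear, this is not the author's actual mechanism, just an illustration in SQLite; the flat rate of 100 is an invented stand-in for the ACTIVITIES lookup, and the invoiceid columns are omitted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_entries (
    employeeid TEXT, date TEXT, customer TEXT, activity TEXT, hours REAL
);
CREATE TABLE invoice_lines (
    customer TEXT, employee TEXT, activity TEXT,
    rate REAL, hours REAL, amount REAL,
    PRIMARY KEY (customer, employee, activity)
);
CREATE TABLE invoices (customer TEXT PRIMARY KEY, total_amount REAL);

-- A time entry creates its detail row on first sight, then rolls up into it.
CREATE TRIGGER te_rollup AFTER INSERT ON time_entries
BEGIN
    INSERT INTO invoice_lines (customer, employee, activity, rate, hours, amount)
    SELECT NEW.customer, NEW.employeeid, NEW.activity, 100.0, 0, 0
    WHERE NOT EXISTS (SELECT 1 FROM invoice_lines
                      WHERE customer = NEW.customer
                        AND employee = NEW.employeeid
                        AND activity = NEW.activity);
    UPDATE invoice_lines
       SET hours  = hours + NEW.hours,
           amount = (hours + NEW.hours) * rate
     WHERE customer = NEW.customer
       AND employee = NEW.employeeid
       AND activity = NEW.activity;
END;

-- Creating a detail row creates the invoice header if it is missing...
CREATE TRIGGER il_create AFTER INSERT ON invoice_lines
BEGIN
    INSERT INTO invoices (customer, total_amount)
    SELECT NEW.customer, 0
    WHERE NOT EXISTS (SELECT 1 FROM invoices WHERE customer = NEW.customer);
END;

-- ...and every change to a detail amount rolls up into the header.
CREATE TRIGGER il_rollup AFTER UPDATE OF amount ON invoice_lines
BEGIN
    UPDATE invoices
       SET total_amount = total_amount + (NEW.amount - OLD.amount)
     WHERE customer = NEW.customer;
END;
""")

conn.execute("INSERT INTO time_entries VALUES "
             "('alice','2024-01-05','acme','consulting',3)")
conn.execute("INSERT INTO time_entries VALUES "
             "('alice','2024-01-06','acme','consulting',2)")
total = conn.execute(
    "SELECT total_amount FROM invoices WHERE customer = 'acme'").fetchone()[0]
print(total)  # 500.0
```

After the second entry the invoice already totals 500: the "process" has dissolved into the schema, and there is nothing left for a month-end batch to do.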
By the way, I coded something that does this in a
   pretty straightforward way a few years ago, meaning
   you could skip the batch billing process and add a few
   details to a schema that would cause the database to
   behave exactly as described above.  Although the
   format for specifying these extra features
   was easy enough (so it seemed to me as the author),
   the conceptual shift of thinking
   that it required of people was far larger than I
   initially and naively imagined.  Nevertheless,
   I toil forward, and that is
   the core idea behind my Triangulum project.
   
   
Observation: There Will Be Code
This is not so much a theorem as an observation:
   if your application requires Fourth Order Business
   Logic, then somebody is going to code something
   somewhere.
An anonymous reader pointed out in the comments
   to Part 2 that Oracle's MODEL clause may work
   in some cases.  I would assume so, but I would also
   assume that reality can create complicated Fourth
   Order cases faster than SQL can evolve.  Maybe.
But anyway, the real observation here is that
   no modern language, either app 
   level or SQL flavor, can express an algorithm
   declaratively.  In other words, no combination
   of keys, constraints, calculations and derivations,
   and no known combination of advanced SQL functions
   and clauses
   will express an ERP Allocation routine or a
   Magazine Regulation routine.  So you have to code it.
   This may not always be true, but I think it is
   true now.
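To make the distinction concrete, here is a drastically simplified allocation sketch (invoice IDs and amounts invented): a payment is applied to open invoices oldest-first, and the running balance means each step's input depends on the previous step's action, which is exactly what keys, constraints, and derivations cannot express.

```python
def allocate(payment, open_invoices):
    """Apply a payment across open invoices, oldest first.

    Fourth Order trait: 'remaining' is consumed step by step,
    so the portion applied at step n+1 depends on what was
    applied at step n.
    """
    remaining = payment
    applied = []
    for inv_id, balance in open_invoices:
        if remaining <= 0:
            break
        portion = min(balance, remaining)  # step consumes the running balance
        applied.append((inv_id, portion))
        remaining -= portion
    return applied, remaining

applied, remaining = allocate(250, [("i1", 100), ("i2", 100), ("i3", 100)])
print(applied, remaining)  # [('i1', 100), ('i2', 100), ('i3', 50)] 0
```

Trivial as a loop, yet there is no declarative statement of it: the loop *is* the specification.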
This is in contrast to the example given in the
   previous section about the fixed total of
   2nd and 3rd Order Logic.  Unlike that example,
   you cannot provide enough
   2nd order wizardry to eliminate fourth order.
(Well, OK, maybe you can,
    but I haven't figured it
    out yet myself and have never heard that anybody
    else is even trying.  The trick would be to have
    a table that you truncate and insert a single row
    into; a trigger would fire that would know how
    to generate the
    next INSERT, generating a cascade.  Of course, since
    this happens in a transaction, if you end up
    generating 100,000 inserts this might be a bad
    idea ha ha.)
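For what it is worth, the cascade half of that trick can be demonstrated in SQLite, which lets a trigger re-fire itself once recursive triggers are switched on. This is a toy countdown table, nothing like a real ERP Allocation, and the whole cascade does indeed run inside one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA recursive_triggers = ON")  # off by default in SQLite
conn.execute("CREATE TABLE steps (n INTEGER)")
conn.execute("""
CREATE TRIGGER next_step AFTER INSERT ON steps
WHEN NEW.n > 0
BEGIN
    -- each insert knows how to generate the next one
    INSERT INTO steps VALUES (NEW.n - 1);
END
""")

# one seed row cascades into the whole series
conn.execute("INSERT INTO steps VALUES (10)")
count = conn.execute("SELECT COUNT(*) FROM steps").fetchone()[0]
print(count)  # 11
```

SQLite caps the recursion depth (1000 by default), which is the polite version of the 100,000-inserts problem mentioned above.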
Theorem 5: Second Order Tools Reduce Code
This theorem rests on the acceptance of an observation,
   that using meta-data repositories, or data dictionaries,
   is easier than coding.  If that does not hold true,
   then this theorem does not hold true.  But if that 
   observation (my own observation, admittedly) does
   hold true, then:
- By Theorem 3, the sum of 2nd and 3rd order
logic is fixed
 - By observation, using meta-data that manages
schema requires less time than coding,
 - By Theorem 1, 2nd order is analyzed and specified
as schema
 - Then it is desirable to specify as much business
logic as possible as 2nd order schema, reducing
and possibly eliminating manual coding of Third
Order programs. 
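As an illustration of that observation (the dictionary format below is invented for the sketch, not Triangulum's): one generic routine driven by meta-data replaces a hand-coded per-table program, so adding a rule means adding an entry, not writing code.

```python
# Hypothetical data dictionary: declarative entries instead of routines.
DICTIONARY = {
    "time_entries": {
        "customer":   {"order": 1, "lookup": {"acme", "zenith"}},
        "activity":   {"order": 1, "lookup": {"consulting", "travel"}},
        "hours":      {"order": 1, "max": 24},
        "employeeid": {"order": 2, "context": "current_user"},
    },
}

def process(table, row, context):
    """One generic engine: checks 1st Order values, supplies
    2nd Order values, driven entirely by the dictionary."""
    for col, rule in DICTIONARY[table].items():
        if rule["order"] == 2:
            row[col] = context[rule["context"]]   # generated, not validated
            continue
        if "lookup" in rule and row[col] not in rule["lookup"]:
            raise ValueError(f"bad {col}: {row[col]}")
        if "max" in rule and row[col] > rule["max"]:
            raise ValueError(f"{col} exceeds {rule['max']}")
    return row

row = process("time_entries",
              {"customer": "acme", "activity": "travel", "hours": 6},
              {"current_user": "alice"})
print(row["employeeid"])  # alice
```

Every rule moved into the dictionary is a rule nobody has to code, test, or debug as a routine, which is the trade Theorem 3 says is always available.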
Again we go back to the batch billing example.
   Is it possible to convert it all to 2nd Order as
   described above?  Well, yes it is, because I've done
   it.  The trick is an extremely counter-intuitive
   modification to a foreign key that causes a
   failure to actually generate the parent row that
   would let the key succeed.  To find out more about
   this, check out Triangulum (not ready for prime time as of this
   writing).
Conclusions
The major conclusion in all of this is that analysis
   and design should begin with First and Second Order
   Business Logic, which means working out schemas, both
   the user-supplied values and the system-supplied
   values.
When that is done, what we often call "processes" 
   are layered on top of this.
Tomorrow we will see part 4 of 4, examining the
   business logic layer, asking, is it possible to
   create a pure business logic layer that gathers
   all business logic unto itself?