informatique -solutions series: Business Logic: From Working Definition to Rigorous Definition

Sunday, January 2, 2011

Business Logic: From Working Definition to Rigorous Definition

This is part 2 of a 4 part mini-series that began
before the holidays with A Working Definition of Business Logic. Today we proceed
to a rigorous definition, tomorrow we will see some theorems
(http://database-programmer.blogspot.com/2011/01/theorems-regarding-business-logic.html),
and the series will wrap up with a post on the "business layer."

In the first post, the working definition said that
business logic includes at least:

The Schema
Calculations
Processes

None of these was very rigorously defined. It was more of an
"I'll know it when I see it" type of thing, and we did
not talk at all about security. Now the task becomes
tightening this up into a rigorous definition.

Definitions, Proofs, and Experience

What I propose below is a definition in four parts.
As definitions, they are not supposed
to prove anything, but they are definitely supposed
to ring true to the experience of any developer
who has created or worked on
a non-trivial business application. This effort
would be a success if we reach some consensus that
"at least it's all in there", even if we go
on to argue bitterly about which components
should be included in which layers.

Also, while I claim the definitions below are
rigorous, they are not yet formal. My
instinct is that formal definitions can be
developed using First Order Logic, which would allow the
theorems we will see tomorrow to move from
"yeah that sounds about right" to being
formally provable.

As for their practical benefit, inasmuch as
"the truth shall make you free", we ought to be
able to improve our architectures if we can settle
at the very least what we are talking about
when we use the vague term "business logic."

The Whole Picture

What we commonly call "business logic", by
which we vaguely mean, "That stuff I have
to code up",
can in fact be rigorously defined
as having four parts, which I believe are
best termed orders, as there is a definite
precedence to their discovery, analysis and implementation.

First Order: Schema
Second Order: Derivations
Third Order: Non-algorithmic compound operations
Fourth Order: Algorithmic compound operations

Now we examine each order in detail.

A Word About Schema and NoSQL

Even "schema-less" databases have a schema; they
simply do not enforce it in the database server.
Consider: an eCommerce site using MongoDB is not
going to be tracking the local zoo's animal
feeding schedule, because that is out of scope.
No, the code
is limited to dealing with orders, order lines,
customers, items and stuff like that.

It is in the very act of expressing scope as
"the data values we will handle" that a schema is
developed. This holds true regardless of whether
the datastore will be a filesystem, an RDBMS, a
new NoSQL database, or anything else.

Because all applications have a schema, whether the
database server enforces it or whether the
application enforces it, we need a vocabulary
to discuss the schema. Here we have an embarrassment
of choices: we can talk about entities and attributes,
classes and properties, documents and values, or
columns and tables. The choice of "entities and
attributes" is likely best because it is as close as
possible to an implementation-agnostic language.

First Order Business Logic: Schema

We can define schema, including security, as:

that body of entities and
their attributes whose relationships and
values will be managed by the
application stack, including the authorization of
roles to read or write to entities and attributes.

Schema in this definition does not include derived
values of any kind or the processes that may operate
on the schema values; those are higher orders of
business logic. This means that the schema
actually defines the entire body of values that
the application will accept from outside sources
(users and other programs) and commit to the
datastore. Restating again into even more
practical terms, the schema is the stuff users
can save themselves.

With all of that said, let's enumerate the properties
of a schema.

Type is required for every attribute.

Constraints are limits to the values allowed
for an attribute beyond its type. We may have a
discount percent that may not exceed 1.0 or 100%.

Entity Integrity is usually thought of
in terms of primary keys
and the vague statement "you can't have duplicates."
We cannot have a list of US States where "NY" is
listed 4 times.

Referential Integrity means that when one
entity links or refers to another entity, it must
always refer to an existing entity.
We cannot have some script kiddie flooding our
site with sales of
items "EAT_ME" and "F***_YOU", because those are
not valid items.

The general term 'validation' is not included
because any particular validation rule
is a combination of any or all of type, constraints,
and integrity rules.
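
To make the four properties concrete, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the
ITEMS and STATES collections, the "sku" and "discount" attribute
names, and the two function names are all hypothetical, and a real
application might enforce the same rules in the database server
instead.

```python
# Hypothetical data for illustration; none of these names come from a real system.
ITEMS = {"WIDGET", "GADGET"}     # existing items that order lines may refer to
STATES = {"NY": "New York"}      # keyed by state code


def add_state(code, name):
    """Entity integrity: a state code may appear only once."""
    if code in STATES:
        raise ValueError("duplicate state code: " + code)
    STATES[code] = name


def check_order_line(line):
    """Return a list of schema violations for one order line (a dict)."""
    errors = []
    sku = line.get("sku")
    discount = line.get("discount")
    # Type, then referential integrity, for the item reference.
    if not isinstance(sku, str):
        errors.append("type: sku must be a string")
    elif sku not in ITEMS:
        errors.append("referential integrity: no such item " + sku)
    # Type, then a constraint beyond the type, for the discount.
    if not isinstance(discount, float):
        errors.append("type: discount must be a float")
    elif not 0.0 <= discount <= 1.0:
        errors.append("constraint: discount must be between 0.0 and 1.0")
    return errors
```

In this sketch a "validation" of an order line is nothing more than
the combined result of the type, constraint, and integrity checks,
which is why validation does not get its own entry in the list.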

Second Order Business Logic: Derived values

When we speak of derived values, we usually
mean calculated values, but some derivations
are not arithmetic, so the more general term
"derived" is better. Derivations are:

A complete entity or an attribute
of an entity generated from other entities
or attributes according to a formula or rule.

The definition is sufficiently general that
a "formula or rule" can include conditional
logic.

Simple arithmetic derived values include things
like calculating price * qty, or summing an
order total.

Simple non-arithmetic derivations include
things like
fetching the price of an item to use on an
order line. The price in the order is defined
as being a copy of the item's price at the
time of purchase.

An example of a complete entity being derived
is a history table that tracks changes
in some other table.
This can also be implemented
in NoSQL as a set of documents tracking the
changes to some original document.
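
The three kinds of derivation above can be sketched in a few lines of
Python. This is only an illustration under assumed names: ITEM_PRICES,
the dictionary layout of an order line, and the three function names
are hypothetical and do not come from any particular tool.

```python
# Hypothetical item table: the First Order data the derivations read from.
ITEM_PRICES = {"WIDGET": 2.50, "GADGET": 4.00}


def derive_order_line(sku, qty):
    """Derive the generated attributes of an order line."""
    price = ITEM_PRICES[sku]       # non-arithmetic: copy the price at time of purchase
    extended = price * qty         # arithmetic: price * qty
    return {"sku": sku, "qty": qty, "price": price, "extended": extended}


def derive_order_total(lines):
    """Arithmetic derivation: sum the order total from its lines."""
    return sum(line["extended"] for line in lines)


def derive_history(old, new):
    """Derive a complete entity: one history row per changed attribute."""
    return [
        {"attribute": key, "old": old.get(key), "new": new[key]}
        for key in new
        if old.get(key) != new[key]
    ]
```

Because the price is copied rather than looked up on every read, a
later change to ITEM_PRICES leaves existing order lines untouched,
which is exactly the "at the time of purchase" rule stated above.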

Security also applies to generated values
only insofar as who can see them. But security
is not an issue for writing these values
because by definition they are generated from
formulas and rules, and so no outside user
can ever attempt to explicitly specify the
value of a derived entity or property.

One final point about Second Order Business
Logic is that it can be expressed declaratively,
if we have the tools, which we do not, at
least not in common use. I wrote one myself some
years ago and am re-releasing it as Triangulum
(http://code.google.com/p/triangulum-db/), but that is a post for another day.

Sorting out First and Second Order

The definitions of First and Second Order Business Logic
have the
advantage of being agnostic to what kind of
datastore you are using, and being agnostic
to whether or not the derived values are
materialized. (In relational terms, derivations
are almost always denormalizing if
materialized, so in a fully normalized database
they will not be there, and you have to go through
the application to get them.)

Nevertheless, these two definitions can right off
bring some confusion to the term "schema."
Example: a history table is absolutely in a database schema,
but I have called First Order Business Logic "schema" and
Second Order Business Logic is, well, something else.
The best solution here is to simply use the
terms First Order Schema and Second Order Schema.
An order_lines table is First Order Schema, and
the table holding its history is Second Order Schema.

The now ubiquitous auto-incremented surrogate primary
keys pose another stumbling block. Because they are
used so often (and so often because of seriously faulty
reasoning, see A Sane Approach To Choosing Primary Keys) they
would automatically be considered schema: one of the
very basic values of a sales order, check, etc. Isn't the
OrderId a very basic part of the schema and
therefore First Order? No. It is system-generated, so it
must be Second Order. In fact, by these
definitions, very little if any of an order header
is First Order. The tiny fragments that are First Order
might be the shipping address, the user's choice of
shipping method, and payment details provided by the
user. The other information that is system-generated,
like Date, OrderId, and order total, is all Second
Order.

Third Order Business Logic

Before defining Third Order Business Logic
I would like to offer a simple example:
Batch Billing. A consulting
company bills by the hour. Employees enter time
tickets throughout the day. At the end of the
month the billing agent runs a program that, in
SQL terms:

Inserts a row into INVOICES for each
customer with any time entries
Inserts a row into INVOICE_LINES that
aggregates the time for each employee/customer
combination.

This example ought to make clear what I mean by
defining Third Order Business Logic as:

A non-algorithmic compound
operation.

The "non-algorithmic" part comes from the fact that
none of the individual documents, an INVOICE
row and its INVOICE_LINES, is dependent on any other.
There is no case in which the
invoice for one customer will influence the value
of the invoice for another. You do not need an
algorithm to do the job, just one or more steps
that may have to go in a certain order.

Put another way, it is a one-pass set-oriented
operation. The fact that it must be executed in
two steps is an artifact of how database
servers deal with referential integrity, which is
that you need the headers before you can put in
the detail. In fact,
when using a NoSQL database, it may be possible to
insert the complete set of documents in one
command, since the lines can be nested directly
into the invoices.

Put yet a third way, in more practical terms,
there is no conditional or looping logic required
to specify the operation. This does not
mean there will be no looping logic in the final
implementation, because performance concerns and
locking concerns may cause it to be implemented
with 'chunking' or other strategies, but the
important point is that the specification
does not include loops or step-wise operations
because the individual invoices are all
functionally independent of each other.
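
As a rough sketch of the batch billing example, the two steps can be
written as two set-oriented INSERT ... SELECT statements with no loops
or conditionals. The sketch uses Python's standard sqlite3 module; the
table layouts, column names, and sample rows are assumptions made for
illustration, and it ignores billing periods and previously billed
time entries.

```python
import sqlite3


def run_batch_billing(conn):
    """Two set-oriented steps; the order matters only because of referential integrity."""
    # Step 1: one INVOICES row for each customer with any time entries.
    conn.execute("""
        INSERT INTO invoices (customer)
        SELECT DISTINCT customer FROM time_entries
    """)
    # Step 2: one INVOICE_LINES row per employee/customer combination.
    conn.execute("""
        INSERT INTO invoice_lines (invoice_id, employee, hours)
        SELECT i.invoice_id, t.employee, SUM(t.hours)
        FROM time_entries t
        JOIN invoices i ON i.customer = t.customer
        GROUP BY i.invoice_id, t.employee
    """)


# Hypothetical tables and sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_entries (customer TEXT, employee TEXT, hours REAL);
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE invoice_lines (
        invoice_id INTEGER REFERENCES invoices (invoice_id),
        employee TEXT,
        hours REAL
    );
""")
conn.executemany(
    "INSERT INTO time_entries VALUES (?, ?, ?)",
    [("acme", "ann", 2.0), ("acme", "ann", 3.0),
     ("acme", "bob", 1.0), ("zenith", "ann", 4.0)],
)
run_batch_billing(conn)
```

Neither statement reads anything the other customer's invoice wrote,
which is the sense in which the invoices are functionally independent.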

I do not want to get side-tracked here, but I
have had a working hypothesis in my mind for
almost 7 years that Third Order Business Logic,
even before I called it that, is an artifact,
which appears necessary because of the limitations
of our tools. In future posts I would like to
show how a fully developed understanding and
implementation of Second Order Business Logic
can dissolve many cases of Third Order.

Fourth Order Business Logic

We now come to the upper bound of complexity
for business logic, Fourth Order, which
we label "algorithmic compound operations",
and define a particular Fourth Order Business
Logic process as:

Any operation where it
is possible or certain that
there will be at least
two steps, X and Y, such that the result
of Step X modifies the inputs available to
Step Y.

In comparison to Third Order:

In Third Order the results are
independent of one another, in Fourth Order
they are not.
In Third Order no conditional or branching logic
is required to express the solution, while in
Fourth Order conditional, looping, or branching
logic will be present in the expression of the
solution.

Let's look at the example of ERP Allocation.
In the interest of brevity, I am going to skip most
of the explanation of the ERP Allocation algorithm
and stick to this basic review: a company has a list
of sales orders (demand) and a list of purchase
orders (supply). Sales orders come in through EDI,
and at least once per day the purchasing department
must match supply to demand to find out what they
need to order. Here is an unrealistically simple
example of the supply and demand they might be facing:


  *** DEMAND ***          *** SUPPLY ***

    DATE    | QTY           DATE    | QTY
------------+-----      ------------+----- 
  3/ 1/2011 |  5          3/ 1/2011 |  3
  3/15/2011 | 15          3/ 3/2011 |  6
  4/ 1/2011 | 10          3/15/2011 | 20
  4/ 3/2011 |  7

The desired output of the ERP Allocation
might look like this:


 *** DEMAND ***      *** SUPPLY ****
    DATE    | QTY |  DATE_IN   | QTY  | FINAL 
------------+-----+------------+------+-------
  3/ 1/2011 |  5  |  3/ 1/2011 |  3   |  no
                  |  3/ 3/2011 |  2   | Yes 
  3/15/2011 | 15  |  3/ 3/2011 |  4   |  no
                  |  3/15/2011 | 11   | Yes
  4/ 1/2011 | 10  |  3/15/2011 |  9   |  no
  4/ 3/2011 |  7  |    null    | null |  no

From this the purchasing agents know that the
Sales Order that ships on 3/1 will be two days
late, and the Sales Orders that will ship on
4/1 and 4/3 cannot be filled completely. They
have to order more stuff.

Now for the killer question: Can the desired
output be generated in a single SQL query?
The answer is no, not even with Common
Table Expressions or other recursive constructs.
The reason is that each match-up of a purchase
order to a sales order modifies the supply
available to the next sales order. Or,
to use the definition of Fourth Order Business
Logic, each iteration will consume some supply
and so will affect the inputs available to
the next step.

We can see this most clearly if we look at some
pseudo-code:


for each sales order by date {
   while sales order demand not met {
      get earliest purchase order w/qty avail > 0
         break if none
      make entry in matching table
      // This is the write operation that 
      // means we have Fourth Order Business Logic
      reduce available qty of purchase order
   }
   break if no more purchase orders
}
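
For readers who want to run it, here is one possible Python rendering
of the pseudo-code, fed with the sample supply and demand from the
tables above. It is a sketch, not the allocation algorithm of any real
ERP system: the function name allocate, the tuple layout of the rows,
and the use of None for the null columns are all assumptions. Unlike
the pseudo-code, it keeps going after supply runs out so that the
unfilled sales orders still appear in the output.

```python
from datetime import date


def allocate(demand, supply):
    """Match supply to demand, earliest first.

    demand and supply are lists of (date, qty) tuples. Returns rows of
    (demand_date, demand_qty, supply_date, qty_taken, final).
    """
    available = [[d, q] for d, q in sorted(supply)]  # mutable working copy
    rows = []
    s = 0  # index of the earliest purchase order that may still have supply
    for d_date, d_qty in sorted(demand):
        needed = d_qty
        matched = False
        while needed > 0:
            # Get the earliest purchase order with available qty > 0.
            while s < len(available) and available[s][1] == 0:
                s += 1
            if s == len(available):
                break  # no supply left for this or any later sales order
            take = min(needed, available[s][1])
            needed -= take
            # This write is what makes it Fourth Order: it changes the
            # inputs available to every later step.
            available[s][1] -= take
            rows.append((d_date, d_qty, available[s][0], take, needed == 0))
            matched = True
        if not matched:
            rows.append((d_date, d_qty, None, None, False))
    return rows


demand = [(date(2011, 3, 1), 5), (date(2011, 3, 15), 15),
          (date(2011, 4, 1), 10), (date(2011, 4, 3), 7)]
supply = [(date(2011, 3, 1), 3), (date(2011, 3, 3), 6), (date(2011, 3, 15), 20)]
result = allocate(demand, supply)
```

With the sample data, result holds six rows corresponding to the six
lines of the desired output table, with the demand date and quantity
repeated on every row instead of left blank.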

Conclusions

As stated in the beginning, it is my belief
that these four orders should "ring true" with
any developer who has experience with non-trivial
business applications. Though we may dispute
terminology and argue over edge cases, the
recognition and naming of the Four Orders should
be of immediate benefit during analysis, design,
coding, and refactoring. They rigorously
establish both the minimum and maximum bounds of
complexity while also filling in the two kinds of
actions we all take between those bounds.
They are data-model agnostic,
and even agnostic to implementation strategies
within data models (like the normalize/denormalize
debate in relational).

But their true power is in providing a framework
of thought for the process of synthesizing
requirements into a specification and from there
an implementation.

Tomorrow we will see some theorems that we can
derive from these definitions.
