This is part 2 of a 4 part mini-series that began
   before the holidays with A Working Definition Business Logic.  Today we proceed
   to a rigorous definition, tomorrow we will see some theorems
   (http://database-programmer.blogspot.com/2011/01/theorems-regarding-business-logic.html),
   and the series will wrap up with a post on the "business layer."
In the first post, the working definition said that
   business logic includes at least:
- The Schema
 
- Calculations
 
- Processes
None of these was very rigorously defined; it was more of an
   "I'll know it when I see it" kind of thing, and we did
   not talk at all about security.  Now the task becomes
   tightening this up into a rigorous definition.
Similar Reading
Toon Koppelaars has some excellent material along
   these same lines, and a good place to start is his
   Helsinki Declaration (IT Version).
   The articles have a different focus than this series,
   so they make great contrasting reading.  I consider
   my time spent reading through it very well spent.
Definitions, Proofs, and Experience
What I propose below is a definition in four parts.
   As definitions, they are not supposed
   to prove anything, but they are definitely supposed
   to ring true to the experience of any developer
   who has created or worked on
   a non-trivial business application.  This effort
   would be a success if we reach some consensus that
   "at least it's all in there", even if we go
   on to argue bitterly about which components
   should be included in which layers.
Also, while I claim the definitions below are
   rigorous, they are not yet formal.  My
   instinct is that formal definitions can be
   developed using First Order Logic, which would allow the
   theorems we will see tomorrow to move from
   "yeah that sounds about right" to being
   formally provable.
As for their practical benefit, inasmuch as
   "the truth shall make you free", we ought to be
   able to improve our architectures if we can settle
   at the very least what we are talking about
   when we use the vague term "business logic."
The Whole Picture
What we commonly call "business logic", by
   which we vaguely mean, "That stuff I have
   to code up",
   can in fact be rigorously defined
   as having four parts, which I believe are
   best termed orders, as there is a definite
   precedence to their discovery, analysis and implementation.
- First Order: Schema 
 
- Second Order: Derivations
 
- Third Order: Non-algorithmic compound operations
 
- Fourth Order: Algorithmic compound operations
Now we examine each order in detail.
A Word About Schema and NoSQL
Even "schema-less" databases have a schema; they
   simply do not enforce it in the database server.
   Consider: an eCommerce site using MongoDB is not
   going to be tracking the local zoo's animal
   feeding schedule, because that is out of scope.
   No, the code
   is limited to dealing with orders, order lines,
   customers, items and stuff like that.
It is in the very act of expressing scope as
   "the data values we will handle" that a schema is
   developed.  This holds true regardless of whether
   the datastore will be a filesystem, an RDBMS, a 
   new NoSQL database, or anything else.
Because all applications have a schema, whether the
   database server enforces it or whether the 
   application enforces it, we need a vocabulary
   to discuss the schema.  Here we have an embarrassment
   of choices: we can talk about entities and attributes,
   classes and properties, documents and values, or
   columns and tables.  The choice of "entities and
   attributes" is likely best because it is as close as
   possible to an implementation-agnostic language.
First Order Business Logic: Schema
We can define schema, including security, as:
that body of entities and 
   their attributes whose relationships and
   values will be managed by the
   application stack, including the authorization of
   roles to read or write to entities and properties.
Schema in this definition does not include derived
   values of any kind or the processes that may operate
   on the schema values; those belong to the higher orders of
   business logic.  This means that the schema
   actually defines the entire body of values that
   the application will accept from outside sources
   (users and other programs) and commit to the
   datastore. Restating again into even more
   practical terms, the schema is the stuff users
   can save themselves.
With all of that said, let's enumerate the properties
   of a schema.  
Type is required for every attribute.
Constraints are limits to the values allowed
   for an attribute beyond its type.  We may have a
   discount percent that may not exceed 1.0 or 100%.
Entity Integrity is usually thought of 
   in terms of primary keys
   and the vague statement "you can't have duplicates."
   We cannot have a list of US States where "NY" is
   listed 4 times.  
Referential Integrity means that when one
   entity links or refers to another entity, it must
   always refer to an existing entity.
   We cannot have some script kiddie flooding our 
   site with sales of 
   items "EAT_ME" and "F***_YOU", because those are
   not valid items.
The general term 'validation' is not included 
   because any particular validation rule
   is a combination of any or all of type, constraints,
   and integrity rules.
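The four properties above can be sketched in a few lines. This is only an illustration, assuming SQLite (through Python's standard sqlite3 module) as the datastore; the tables `items` and `order_lines`, their columns, and the helper functions `open_db` and `try_insert` are hypothetical names invented for the example, and the same rules could just as well be enforced in application code.

```python
import sqlite3

# Hypothetical tables and columns, invented for this sketch.
DDL = """
CREATE TABLE items (
    sku   TEXT PRIMARY KEY,                  -- entity integrity
    price REAL NOT NULL CHECK (price >= 0)   -- type plus constraint
);
CREATE TABLE order_lines (
    order_id INTEGER NOT NULL,
    sku      TEXT    NOT NULL REFERENCES items (sku),   -- referential integrity
    qty      INTEGER NOT NULL CHECK (qty > 0),
    discount REAL    NOT NULL DEFAULT 0
             CHECK (discount BETWEEN 0 AND 1.0),        -- may not exceed 100%
    PRIMARY KEY (order_id, sku)                         -- entity integrity
);
"""


def open_db():
    """Create an in-memory database holding one valid item."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(DDL)
    conn.execute("INSERT INTO items VALUES ('WIDGET', 9.99)")
    conn.commit()
    return conn


def try_insert(conn, sql, params):
    """Return True if the schema accepts the row, False if it rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, an order line for the item "EAT_ME" is refused by referential integrity, a discount of 1.5 is refused by the constraint, and a second line for the same order and item is refused by entity integrity.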
Second Order Business Logic: Derived Values
When we speak of derived values, we usually 
   mean calculated values, but some derivations
   are not arithmetic, so the more general term
   "derived" is better.  Derivations are:
A complete entity or an attribute
   of an entity generated from other entities
   or attributes according to a formula or rule.
The definition is sufficiently general that
   a "formula or rule" can include conditional
   logic.
Simple arithmetic derived values include things
   like calculating price * qty, or summing an
   order total.
Simple non-arithmetic derivations include
   things like
   fetching the price of an item to use on an
   order line.  The price in the order is defined
   as being a copy of the item's price at the
   time of purchase.
An example of a complete entity being derived
   is a history table that tracks changes
   in some other table.
   This can also be implemented
   in NoSQL as a set of documents tracking the
   changes to some original document.
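To make the first two kinds of derivation concrete, here is a minimal sketch in Python. The function names `derive_line` and `order_total`, the dictionary layout, and the price list passed in are all assumptions made for illustration; nothing here is tied to a particular datastore.

```python
from decimal import Decimal


def derive_line(sku, qty, item_prices):
    """Fill in the derived attributes of an order line.

    sku and qty are First Order values supplied by the user; price and
    extended are Second Order values generated by rule.
    """
    price = item_prices[sku]  # non-arithmetic: copy of the item's price right now
    return {"sku": sku, "qty": qty, "price": price, "extended": price * qty}


def order_total(lines):
    """Arithmetic derivation: the sum of the extended prices."""
    return sum((line["extended"] for line in lines), Decimal("0"))
```

Because the price is copied when the line is derived, a later change to the item's price does not alter existing order lines, which is the "at the time of purchase" rule described above.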
Security also applies to generated values
   only insofar as who can see them.  But security
   is not an issue for writing these values
   because by definition they are generated from
   formulas and rules, and so no outside user 
   can ever attempt to explicitly specify the
   value of a derived entity or property.
One final point about Second Order Business
   Logic is that it can be expressed declaratively,
   if we have the tools, which we do not, at
   least not in common use.  I wrote one myself some
   years ago and am re-releasing it as Triangulum
   (http://code.google.com/p/triangulum-db/), but that is a post for another day.
Sorting out First and Second Order
The definitions of First and Second Order Business Logic
   have the
   advantage of being agnostic to what kind of
   datastore you are using, and being agnostic
   to whether or not the derived values are
   materialized.  (In relational terms, derivations
   are almost always denormalizing if
   materialized, so in a fully normalized database
   they will not be there, and you have to go through
   the application to get them.)
Nevertheless, these two definitions can right off
   bring some confusion to the term "schema." 
   Example: a history table is absolutely in a database schema,
   but I have called First Order Business Logic "schema" and
   Second Order Business Logic is, well, something else.
   The best solution here is to simply use the
   terms First Order Schema and Second Order Schema.
   An order_lines table is First Order schema, and 
   the table holding its history is Second Order Schema.
The now ubiquitous auto-incremented surrogate primary
   keys pose another stumbling block.  Because they are
   used so often (and so often because of seriously faulty
   reasoning, see A Sane Approach To Choosing Primary Keys) they
   would automatically be considered schema -- one of the
   very basic values of a sales order, check, etc.  But
   they are system-generated, so they must be Second Order, no?
   Isn't the orderid a very basic part of the schema and
   therefore First Order?  No.  In fact, by these 
   definitions, very little if any of an order header 
   is First Order.  The tiny fragments that are First Order
   might be the shipping address, the user's choice of
   shipping method, and payment details provided by the
   user.  The other information that is system-generated,
   like Date, OrderId, and order total, is all Second
   Order.
Third Order Business Logic
Before defining Third Order Business Logic
   I would like to offer a simple example:
   Batch Billing.  A consulting
   company bills by the hour.  Employees enter time
   tickets throughout the day.  At the end of the
   month the billing agent runs a program that, in
   SQL terms:
- Inserts a row into INVOICES for each
 customer with any time entries
 
- Inserts a row into INVOICE_LINES that
 aggregates the time for each employee/customer
 combination.
This example ought to make clear what I mean by
   defining Third Order Business Logic as:
A non-algorithmic compound 
   operation.
The "non-algorithmic" part comes from the fact that
   none of the individual documents, an INVOICE
   row and its INVOICE_LINES, is dependent on any other.
   There is no case in which the
   invoice for one customer will influence the value
   of the invoice for another.   You do not need an
   algorithm to do the job, just one or more steps
   that may have to go in a certain order.
Put another way, it is a one-pass set-oriented
   operation.  The fact that it must be executed in
   two steps is an artifact of how database
   servers deal with referential integrity, which is
   that you need the headers before you can put in
   the detail.  In fact,
   when using a NoSQL database, it may be possible to 
   insert the complete set of documents in one 
   command, since the lines can be nested directly
   into the invoices.
Put yet a third way, in more practical terms,
   there is no conditional or looping logic required
   to specify the operation.  This does not
   mean there will be no looping logic in the final
   implementation, because performance concerns and
   locking concerns may cause it to be implemented
   with 'chunking' or other strategies, but the
   important point is that the specification
   does not include loops or step-wise operations
   because the individual invoices are all 
   functionally independent of each other.
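The batch billing example can be sketched as two set-oriented statements with no loops or conditionals in the specification. This is an illustration only, assuming SQLite through Python's sqlite3 module; the table layouts, the sample rows, and the function names `make_db` and `run_batch_billing` are hypothetical, and a real billing run would also restrict the time entries to the month being billed.

```python
import sqlite3


def make_db():
    """Create hypothetical tables and a few sample time entries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE time_entries (customer TEXT, employee TEXT, hours REAL);
        CREATE TABLE invoices (
            invoice_id INTEGER PRIMARY KEY,
            customer   TEXT NOT NULL UNIQUE
        );
        CREATE TABLE invoice_lines (
            invoice_id INTEGER NOT NULL REFERENCES invoices (invoice_id),
            employee   TEXT NOT NULL,
            hours      REAL NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO time_entries VALUES (?, ?, ?)",
        [("ACME", "ann", 3.0), ("ACME", "ann", 2.0),
         ("ACME", "bob", 2.0), ("ZENITH", "ann", 1.5)],
    )
    conn.commit()
    return conn


def run_batch_billing(conn):
    """Two steps, each a single set-oriented statement."""
    with conn:
        # Step 1: one header per customer with any time entries.
        conn.execute(
            "INSERT INTO invoices (customer) "
            "SELECT DISTINCT customer FROM time_entries"
        )
        # Step 2: one line per employee/customer combination.
        conn.execute(
            "INSERT INTO invoice_lines (invoice_id, employee, hours) "
            "SELECT i.invoice_id, t.employee, SUM(t.hours) "
            "FROM time_entries t JOIN invoices i ON i.customer = t.customer "
            "GROUP BY i.invoice_id, t.employee"
        )
```

The only ordering requirement is that the headers exist before the lines are inserted; no invoice reads or changes the data of any other invoice.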
I do not want to get side-tracked here, but I
   have had a working hypothesis in my mind for
   almost 7 years that Third Order Business Logic,
   even before I called it that, is an artifact,
   which appears necessary because of the limitations
   of our tools.  In future posts I would like to
   show how a fully developed understanding and
   implementation of Second Order Business Logic 
   can dissolve many cases of Third Order.
Fourth Order Business Logic
We now come to the upper bound of complexity
   for business logic, Fourth Order, which
   we label "algorithmic compound operations",
   and define a particular Fourth Order Business
   Logic process as:
Any operation where it
   is possible or certain that
   there will be at least
   two steps, X and Y, such that the result
   of Step X modifies the inputs available to
   Step Y.
In comparison to Third Order:
- In Third Order the results are 
 independent of one another, in Fourth Order
 they are not.
 
- In Third Order no conditional or branching logic
 is required to express the solution, while in
 Fourth Order conditional, looping, or branching
 logic will be present in the expression of the
 solution.
Let's look at the example of ERP Allocation.
   In the interest of brevity, I am going to skip most
   of the explanation of the ERP Allocation algorithm
   and stick to this basic review: a company has a list
   of sales orders (demand) and a list of purchase
   orders (supply).  Sales orders come in through EDI,
   and at least once a day the purchasing department
   must match supply to demand to find out what they
   need to order.  Here is an unrealistically simple
   example of the supply and demand they might be facing:
    *** DEMAND ***            *** SUPPLY ***
      DATE      | QTY           DATE      | QTY
    ------------+-----        ------------+-----
     3/ 1/2011  |   5          3/ 1/2011  |   3
     3/15/2011  |  15          3/ 3/2011  |   6
     4/ 1/2011  |  10          3/15/2011  |  20
     4/ 3/2011  |   7
The desired output of the ERP Allocation
   might look like this:
    *** DEMAND ***             *** SUPPLY ***
      DATE      | QTY |   DATE_IN   | QTY  | FINAL
    ------------+-----+-------------+------+-------
     3/ 1/2011  |   5 |  3/ 1/2011  |    3 | no
                |     |  3/ 3/2011  |    2 | Yes
     3/15/2011  |  15 |  3/ 3/2011  |    4 | no
                |     |  3/15/2011  |   11 | Yes
     4/ 1/2011  |  10 |  3/15/2011  |    9 | no
     4/ 3/2011  |   7 |  null       | null | no
From this the purchasing agents know that the
   Sales Order that ships on 3/1 will be two days
   late, and the Sales Orders that will ship on
   4/1 and 4/3 cannot be filled completely.  They
   have to order more stuff.
Now for the killer question: Can the desired
   output be generated in a single SQL query?
   The answer is no, not even with Common
   Table Expressions or other recursive constructs.
   The reason is that each match-up of a purchase
   order to a sales order modifies the supply
   available to the next sales order.  Or,
   to use the definition of Fourth Order Business
   Logic, each iteration will consume some supply
   and so will affect the inputs available to
   the next step.
We can see this most clearly if we look at some
   pseudo-code:
    for each sales order by date {
        while sales order demand not met {
            get earliest purchase order w/qty avail > 0
            break if none
            make entry in matching table
            // This is the write operation that
            // means we have Fourth Order Business Logic
            reduce available qty of purchase order
        }
        break if no more purchase orders
    }
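The pseudo-code above can be turned into a small runnable sketch. The version below is one possible reading of it in Python, not a real ERP allocation routine: the function name `allocate` and the tuple layout are assumptions, and unlike the pseudo-code it keeps going after supply runs out so that unfilled sales orders appear with null values, as in the desired output shown earlier.

```python
from datetime import date


def allocate(demand, supply):
    """Match each sales order to the earliest remaining supply, in date order.

    demand and supply are lists of (date, qty) tuples. Returns rows of
    (demand_date, demand_qty, supply_date, allocated_qty, final), where
    final is True on the row that completes a sales order.
    """
    remaining = [[d, q] for d, q in sorted(supply)]  # mutable copy of the supply
    rows = []
    for demand_date, demand_qty in sorted(demand):
        needed = demand_qty
        matched = False
        for po in remaining:
            if needed == 0:
                break
            if po[1] == 0:
                continue  # this purchase order is already used up
            take = min(needed, po[1])
            # The write that makes this Fourth Order: it changes the
            # supply that every later step will see.
            po[1] -= take
            needed -= take
            rows.append((demand_date, demand_qty, po[0], take, needed == 0))
            matched = True
        if not matched:
            rows.append((demand_date, demand_qty, None, None, False))
    return rows


DEMAND = [(date(2011, 3, 1), 5), (date(2011, 3, 15), 15),
          (date(2011, 4, 1), 10), (date(2011, 4, 3), 7)]
SUPPLY = [(date(2011, 3, 1), 3), (date(2011, 3, 3), 6),
          (date(2011, 3, 15), 20)]
```

Calling `allocate(DEMAND, SUPPLY)` on the sample data yields six rows that correspond line for line to the desired output table. The statement `po[1] -= take` is the step whose result modifies the inputs of the next step.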
Conclusions
As stated in the beginning, it is my belief
   that these four orders should "ring true" with 
   any developer who has experience with non-trivial
   business applications.  Though we may dispute
   terminology and argue over edge cases, the 
   recognition and naming of the Four Orders should 
   be of immediate benefit during analysis, design,
   coding, and refactoring.  They rigorously
   establish both the minimum and maximum bounds of
   complexity while also filling in the two kinds of
   actions we all take between those bounds. 
   They are data-model agnostic,
   and even agnostic to implementation strategies
   within data models (like the normalize/denormalize
   debate in relational). 
But their true power is in providing a framework
   of thought for the process of synthesizing 
   requirements into a specification and from there
   an implementation.
Tomorrow we will see some theorems that we can
   derive from these definitions.
 
 