informatique -solutions series: June 2009

Tuesday, June 30, 2009

Approaches to "UPSERT"

This week in the Database Programmer we look at something
called an "UPSERT", the strange trick where an insert
command may magically convert itself into an update if
a row already exists with the provided key. This trick
is very useful in a variety of cases. This week we will
see its basic use, and next week we will see how the same
idea can be used to materialize summary tables efficiently.

An UPSERT or ON DUPLICATE KEY...

The idea behind an UPSERT is simple. The client issues
an INSERT command. If a row already exists with the
given primary key, then instead of throwing a key
violation error, it takes the non-key values and updates
the row.

This is one of those strange (and very unusual) cases
where MySQL actually supports something you will not
find in all of the other more mature databases. So if you
are using MySQL, you do not need to do anything special
to make an UPSERT. You just add the term "ON DUPLICATE
KEY UPDATE" to the INSERT statement:


insert into table (a,c,b) values (1,2,3)
    on duplicate key update
     b = 2,
     c = 3

The MySQL command gives you the flexibility to specify
different operation on UPDATE versus INSERT, but with
that flexibility comes the requirement that the UPDATE
clause completely restates the operation.

With the MySQL command there are also various considerations
for AUTO_INCREMENT columns and multiple unique keys.
You can read more at the MySQL page for the
"http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html"
>INSERT ... ON DUPLICATE KEY UPDATE feature.

A Note About MS SQL Server 2008

MS SQL Server introduced something like UPSERT in
SQL Server 2008. It uses the MERGE command, which is
a bit hairy, check it out in this
"http://www.databasejournal.com/features/mssql/article.php/3739131/UPSERT-Functionality-in-SQL-Server-2008.htm"
>nice tutorial.

Coding a Simpler UPSERT

Let us say that we want a simpler UPSERT, where you do not
have to mess with SQL Server's MERGE or rewrite the entire
command as in MySQL. This can be done with triggers.

To illustrate, consider a shopping cart with a natural key
of ORDER_ID and SKU. I want simple application code that
does not have to figure out if it needs to do an INSERT or
UPDATE, and can always happily do INSERTs, knowing they will
be converted to updates if the line is already there.
In other words, I want simple application code that just keeps
issuing commands like this:


INSERT INTO ORDERLINES
       (order_id,sku,qty)
VALUES 
       (1234,'ABC',5)

We can accomplish this by a trigger. The trigger must occur
before the action, and it must redirect the action to an
UPDATE if necessary. Let us look at examples for MySQL,
Postgres, and SQL Server.

A MySQL Trigger

Alas, MySQL giveth, and MySQL taketh away. You cannot code
your own UPSERT in MySQL because of an extremely severe
limitation in MySQL trigger rules. A MySQL trigger may not
affect a row in a table different from the row originally
affected by the command that fired the trigger. A MySQL
trigger attempting to create a new row may not affect
a different row.

Note: I may be wrong about this. This limitation has bitten
me on several features that I would like to provide for MySQL.
I am actually hoping this limitation will not
apply for UPSERTs because the new row does not yet exist, but
I have not had a chance yet to try.

A Postgres Trigger

The Postgres trigger example is pretty simple, hopefully the
logic is self-explanatory. As with all code samples, I did
this off the top of my head, you may need to fix a syntax
error or two.


CREATE OR REPLACE FUNCTION orderlines_insert_before_F()
RETURNS TRIGGER
 AS $BODY$
DECLARE
    result INTEGER; 
BEGIN
    SET SEARCH_PATH TO PUBLIC;
    
    -- Find out if there is a row
    result = (select count(*) from orderlines
                where order_id = new.order_id
                  and sku      = new.sku
               )

    -- On the update branch, perform the update
    -- and then return NULL to prevent the 
    -- original insert from occurring
    IF result = 1 THEN
        UPDATE orderlines 
           SET qty = new.qty
         WHERE order_id = new.order_id
           AND sku      = new.sku;
           
        RETURN null;
    END IF;
    
    -- The default branch is to return "NEW" which
    -- causes the original INSERT to go forward
    RETURN new;

END; $BODY$
LANGUAGE 'plpgsql' SECURITY DEFINER;

-- That extremely annoying second command you always
-- need for Postgres triggers.
CREATE TRIGGER orderlines_insert_before_T
   before insert
   ON ORDERLINES
   FOR EACH ROW
   EXECUTE PROCEDURE orderlines_insert_before_F();

A SQL Server Trigger

SQL Server BEFORE INSERT triggers are significantly different
from Postgres triggers. First of all, they operate at the
statement level, so that you have a set of new rows instead
of just one. Secondly, the trigger must itself contain an
explicit INSERT command, or the INSERT never happens. All of this
means our SQL Server example is quite a bit more verbose.

The basic logic of the SQL Server example is the same as the
Postgres, with two additional complications. First, we must use
a CURSOR to loop through the incoming rows. Second, we must
explicitly code the INSERT operation for the case where it
occurs. But if you can see past the cruft we get for all of that,
the SQL Server exmple is doing the same thing:


CREATE TRIGGER upsource_insert_before
ON orderlines
INSTEAD OF insert
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @new_order_id int;
    DECLARE @new_sku      varchar(15);
    DECLARE @new_qty      int;
    DECLARE @result       int;

    DECLARE trig_ins_orderlines CURSOR FOR 
            SELECT * FROM inserted;
    OPEN trig_ins_orderlines;

    FETCH NEXT FROM trig_ins_orderlines
     INTO @new_order_id
         ,@new_sku
         ,@new_qty;

    WHILE @@Fetch_status = 0 
    BEGIN
        -- Find out if there is a row now
        SET @result = (SELECT count(*) from orderlines
                        WHERE order_id = @new_order_id
                          AND sku      = @new_sku
                      )
    
        IF @result = 1 
        BEGIN
            -- Since there is already a row, do an
            -- update
            UPDATE orderlines
               SET qty = @new_qty
             WHERE order_id = @new_order_id
               AND sku      = @new_sku;
        END
        ELSE
        BEGIN
            -- When there is no row, we insert it
            INSERT INTO orderlines 
                  (order_id,sku,qty)
            VALUES
                  (@new_order_id,@new_sku,@new_qty)
            UPDATE orderlines

        -- Pull the next row
        FETCH NEXT FROM trig_ins_orderlines
         INTO @new_order_id
             ,@new_sku
             ,@new_qty;

    END  -- Cursor iteration

    CLOSE trig_ins_orderlines;
    DEALLOCATE trig_ins_orderlines;

END

A Vague Uneasy Feeling

While the examples above are definitely cool and nifty,
they ought to leave a certain nagging doubt in many
programmers' minds. This doubt comes from the fact that
an insert is not necessarily an insert anymore,
which can lead to confusion. Just imagine the new programmer
who has joined the team an is banging his head on his desk
because he cannot figure out why his INSERTS are not
working!

We can add a refinement to the process by making the
function optional. Here is how we do it.

First, add a column to the ORDERLINES table called
_UPSERT that is a char(1). Then modify the trigger so that
the UPSERT behavior only occurs if the this column holds
'Y'. It is also extremely import to always set this value
back to 'N' or NULL in the trigger, otherwise it will appear
as 'Y' on subsequent INSERTS and it won't work properly.

So our new modified explicit upsert requires a SQL statement
like this:


INSERT INTO ORDERLINES
       (_upsert,order_id,sku,qty)
VALUES
       ('Y',1234,'ABC',5)

Our trigger code needs only a very slight modification.
Here is the Postgres example, the SQL Server example should
be very easy to update as well:


   ...trigger declration and definition above
   IF new._upsert = 'Y'
      result = (SELECT.....);
      _upsert = 'N';
   ELSE
      result = 0;
   END IF;
   
   ...rest of trigger is the same

Conclusion

The UPSERT feature gives us simplified code and fewer
round trips to the server. Without the UPSERT there are
times when the application may have to query the server to
find out if a row exists, and then issue either an UPDATE
or an INSERT. With the UPSERT, one round trip is eliminated,
and the check occurs much more efficiently inside of the
server itself.

The downside to UPSERTs is that they can be confusing if
some type of explicit control is not put onto them such as
the _UPSERT column.

Next week we will see a concept similar to UPSERT used
to efficiently create summary tables.

Monday, June 15, 2009

[Publi-Info] D�couvrez et prenez part au projet Moblin

Comme le savent ce qui suivent ce blog depuis un moment, il n'est pas dans mes habitudes de sponsoriser mon contenu et de vous parler de machines � caf�, de plantes vertes ou de gastronomie sur ce blog dans le but de me mettre un peu d'argent en poche.

J'ai choisi de vous parler de Moblin parce que j'ai moi-m�me r�cement install� Ubuntu sur mon ordinateur, et fait en m�me temps mes premiers pas dans le monde de libre et de Linux, qui m'enthousiasme par la richesse de sa communaut� et par les perspectives nouvelles qu'il ouvre sur l'informatique, enfin en tout cas pour moi qui m'�tait jusque l� cantonn� � Windows.

Moblin est un OS open source destin� aux netbook, t�l�phones portables et MIDs.

Bas� sur GNU/Linux, le projet a �t� lanc� en 2007 par Intel dans le but de devenir la plateforme Linux pour mobile la plus aboutie.

Je pourrais m'�tendre un peu plus sur les objectifs et les caract�ristiques de projet, mais je ne r�siste pas � l'envie de vous d�voiler � quoi ressemble un ordinateur tournant sous Moblin :

On peut remarquer que l'interface prend la forme d'onglets regroupants les principaux logiciels et fonctions de syst�me, ce qui n'est pas sans rappeller l'interface de Xandros sur les eeePC.

Voila cette m�me interface en fonctionnement :

Sympa non ?

Personellement je trouve cette interface tr�s ergonomique, dans le sens ou on peut acc�der rapidement aux fonctions essentielles pour l'utilisateur lambda, telles que le navigateur, le lecteur audio ...

Alors bien s�r cela parait tr�s limit� pour l'utilisateur averti qui ne peut se contenter des fonctions de base de son odinateur, mais cette interface convient on ne peut mieux � des personnes qui ont besoin d'un OS simple et rapide pour effectuer des t�ches basiques dans un environnement graphique chaleureux.

De plus, l'optimisation de Moblin pour les appareils mobiles rend sa consommation en �nergie moindre, vous assurant une automie cons�quente.

Moblin est aujourd'hui en plein d�veloppement par une communaut� grandissante, que vous pouvez rejoindre en vous rendant sur Moblin Zone ou Moblin.org.

Vous pourrez ainsi, si vous �tes d�veloppeur, participer � l'�laboration de Moblin. Et si vos comp�tences en code se limitent comme les miennes � pas grand-chose, vous pouvez t�l�charger Moblin, le tester sur votre propre machine et participer � son d�veloppement en reportant vos impression et en aidant les d�veloppeurs � faire de Moblin "la plateforme Linux pour mobile la plus aboutie".

Article sponsoris�