Bulk Inserts and the Performance Holy Grial (Part I)

Hello,

I've been asked to improve a part of a model class which is causing some database issues. We use PHP and Oracle, so our legacy code is built using the OCI (Oracle Client Interface) built-in library to take advantage as much as possible of the driver speed instead of using either the CodeIgniter abstracted database class or, simply, PDO (PDO for Oracle).

Since we have to deal with bulk transactions under high load scenarios, database access optimization is a must. Massive rows have to be inserted prior to another batch of updates on them lately.

Binding parameters in advance is a way of getting a performance improvement (read more on this topic here: https://blogs.oracle.com/sql/entry/improve_sql_query_performance_by), in addition to avoid SQL injection attacks.

So, here's is a snippet of what the bulk insert method looked like before any change:

class TxnUser extends CI_Model 
{
    const OPEN   = 1;
    const CLOSED = 0;
 
    [...]

    // Legacy method to insert multiple user transactions
    public function insert_user_bulk(array $common, array $data) : bool 
    {

        $sql = 'INSERT INTO "' . $this->tablename . '"
        ("txn", "service", "code", "userid", "pricecode", "amount", "status")
        VALUES (:txn, :service, :code, :userid, :pricecode, :amount, ' . self::OPEN . ')';

        $stmt = oci_parse($this->db->connid, $sql);

        oci_bind_by_name($stmt, ':txn', $txn, 32);
        oci_bind_by_name($stmt, ':service', $service, 32);
        oci_bind_by_name($stmt, ':code', $code, 32);
        oci_bind_by_name($stmt, ':userid', $userid, 32);
        oci_bind_by_name($stmt, ':pricecode', $pricecode, 32);
        oci_bind_by_name($stmt, ':amount', $amount, 32);
        
        // $common = ['txn' => 1, 'service' => 'Music', 
        //            'pricecode' => 'VAT20', $amount => 7.3]
        extract($common);
        foreach ($data as $user) {
            // $user = ['userid' => 123, 'customer' => 1, 'usercode' => 53452345]
            extract($user);
            $code = ($customer) ? $usercode : 'N/A'; 

            $c = oci_execute($stmt, OCI_NO_AUTO_COMMIT);
        }

        return oci_commit($this->db->connid);
    }

    [...]
}


To clarify to non-PHP programmers, the PHP built-in function extract, takes an associative array and creates as many local variables into the scope as keys are found:

Eg, the array ['hello' => 'world'] would become a variable called $hello with value "world" in it.

On the other hand, if you may have noticed, non initialised variables are being used for all the oci_bind_by_name calls. This is really ugly, but it is something PHP allows, the language is loosely typed and variables can be used without previous initialisation.

The point here is that, because of how oci_bind_by_name works, we need to create an accessible space in memory where to put all the values we will use later on. Note the ampersand in the third parameter of the method signature:


bool oci_bind_by_name ( resource $statement , string $bv_name , 
                         mixed &$variable [, int $maxlength = -1 [, int $type = SQLT_CHR ]] )
Source php.net: http://php.net/manual/en/function.oci-bind-by-name.php

So, this reference, at the moment of the oci_bind_by_name call, contains rubbish, however, once the extract methods create the local variables into the scope, all those memory references are populated with the right value. Yes, I can guess what you're thinking ... this is PHP, joy !

As for the forth parameter, $maxlength, in the snippet there is a hard-coded value (32) which represents the maximum length in bytes for each field. And this is another tricky point. Default value is -1, and, as the php.net manual states:

For IN binds it is recommended to set the maxlength length if the statement is re-executed multiple times with different values for the PHP variable. Otherwise Oracle may truncate data to the length of the initial PHP variable value. If you don't know what the maximum length will be, then re-call oci_bind_by_name() with the current data size prior to each oci_execute() call. Binding an unnecessarily large length will have an impact on process memory in the database.

Note the last sentence: "Binding an unnecessarily large length will have an impact on process memory in the database." So, in order to avoid performance impacts the max length has to be handled carefully. But, why not using -1? it's just the exact length of the field. Well, it may happen then that the length for each field will be the first one when doing the oci_execute (and that is what the statement "Otherwise Oracle may truncate data to the length of the initial PHP variable value." stands for), unless you put all the binding calls into the loop, this way:

 
    [...]

        // $common = ['txn' => 1, 'service' => 'Music', 
        //            'pricecode' => 'VAT20', $amount => 7.3]
        extract($common);
        foreach ($data as $user) {
            // $user = ['userid' => 123, 'customer' => 1, 'usercode' => 53452345]
            extract($user);
            $code = ($customer) ? $usercode : 'N/A'; 
            
            oci_bind_by_name($stmt, ':txn', $txn); 
                        // equivalent to oci_bind_by_name($stmt, ':txn', $txn, -1);
            oci_bind_by_name($stmt, ':service', $service);
            oci_bind_by_name($stmt, ':code', $code);
            oci_bind_by_name($stmt, ':userid', $userid);
            oci_bind_by_name($stmt, ':pricecode', $pricecode);
            oci_bind_by_name($stmt, ':amount', $amount);

            $c = oci_execute($stmt, OCI_NO_AUTO_COMMIT);
            oci_commit($this->db->connid);
        }

    [...]


Unfortunately, we need to commit (or switch oci_execute to auto-commit) in order to ensure that variable references are not being overwritten at each iteration. By doing this way, we increase our complexity to linear due to the loop: N rows, N executes and N commits, what is not our goal.

Our goal is to have N rows, M executes (N > M), 1 commit.

Go to Part II

Comments

Popular posts from this blog

Bulk Inserts and the Performance Holy Grial (Part II)

Legacy code concepts II: the Seam Model