[Lazarus] SQLdb: Pseudo Transaction?

Tue Jun 30 16:49:06 CEST 2020

>
> Ryan,
>
> your answer helps a lot; thank you.
>
> Another (maybe stupid) question: What exactly is the purpose of a 
> 'transaction'? Do I need it for each SQL statement to be executed?

That's an excellent question in fact - lots of confusion out there.

Two main reasons, which I will quickly show - 1 - ACID guarantees 
(especially Isolation and Consistency, the others [Atomicity and 
Durability] you get anyway), and 2 - Speed.
(https://en.wikipedia.org/wiki/ACID)

In most DB engines (SQLite definitely) most executed commands gets their 
own little wrapped transaction if you did not start one explicitly - 
just to make the nuts and bolts of the engine function correctly. This 
might be a "lesser" form of transaction, but it has to at a minimum 
prepare the table, lock tables/rows/whatever, do the reading/writing, 
and then release those locks.

This also means that if you issue four commands, let's use an example 
(based loosely on a contacts-list table), say you execute these three in 
order:

SELECT MAX(ID) FROM contacts;
(The result of the previous statement is put into a variable, say: X)
INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ....) VALUES 
(X+1, 'Joe', 'Soap', Now, ...);
INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ....) VALUES 
(X+2, 'Joe', 'Jones', Now, ...);
INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ....) VALUES 
(X+3, 'Joe', 'Smith', Now, ...);

Now inside the DB Engine, it is wrapped in pseudo-code like this 
(Obviously there is more going on, but I'm just mentioning the 
interesting bits to our example):

tt = Start_Transaction;
     qq = prepare query( SELECT MAX(ID) FROM contacts; ) :
         if (qq is READ Query)
             obtain_read_lock(tt, qq); // Executed
         ELSE
             obtain_write_lock(tt, qq);// Skipped
         run_query(qq);
         output_results;
         drop_locks(tt, qq);
     release_prepared_resources(qq);
     IF ERRORS > 0
         roll_back(tt);
     ELSE
         commit(tt);
release_transaction(tt);
tt = Start_Transaction;
     qq = prepare query( INSERT INTO contacts(ID, FirstName, LastName, 
TimeAdded, ...) VALUES (X+1, 'Joe', 'Soap', Now, ...); ) :
         if (qq is READ Query)
             obtain_read_lock(tt, qq);// Skipped
         ELSE
             obtain_write_lock(tt, qq); // Executed
         run_query(qq);
         output_results;
         drop_locks(tt, qq);
     release_prepared_resources(qq);
     IF ERRORS > 0
         roll_back(tt);
     ELSE
         commit(tt);
release_transaction(tt);
tt = Start_Transaction;
     qq = prepare query(INSERT INTO contacts(ID, FirstName, LastName, 
TimeAdded, ...) VALUES (X+2, 'Joe', 'Jones', Now, ...); ) :
         if (qq is READ Query)
             obtain_read_lock(tt, qq); // Skipped
         ELSE
             obtain_write_lock(tt, qq); // Executed
         run_query(qq);
         output_results;
         drop_locks(tt, qq);
     release_prepared_resources(qq);
     IF ERRORS > 0
         roll_back(tt);
     ELSE
         commit(tt);
release_transaction(tt);
tt = Start_Transaction;
     qq = prepare query(INSERT INTO contacts(ID, FirstName, LastName, 
TimeAdded, ...) VALUES (X+3, 'Joe', 'Smith', Now, ...); ) :
         if (qq is READ Query)
             obtain_read_lock(tt, qq);   // Skipped
         ELSE
             obtain_write_lock(tt, qq); // Executed
         run_query(qq);
         output_results;
         drop_locks(tt, qq);
     release_prepared_resources(qq);
     IF ERRORS > 0
         roll_back(tt);
     ELSE
         commit(tt);
release_transaction(tt);

The "output_results;" command might be a no-op for INSERT queries, 
though some DBs do return values.

Now note some things:
Towards Point 1 above:  If someone else also were trying to insert to 
this database at the same time, they might get an X that is in-between 
your inserts, and indeed insert values in between. This is called a 
concurrancy problem and is fixed by the "isolation" in an ACID transaction.
   (I know that behaviour can well be fixed by declaring the ID column 
with AUTO_INCREMENT, but let's imagine it wasn't in this example, you 
may well have other columns that must be Unique).

Note also that the inserted values for "Now()" would be slightly 
different every time since it's just a littlebit later when the next 
INSERT happens.  Inside a transaction, the TimeAdded values will all be 
the same.
Basically, while the transaction is in progress, and for the duration of 
it, no other connection to the same database can mess with those values, 
plus those values will remain consistent for the duration of the 
transaction. This is a very important property to many people.
There are a few other considerations to get technical with, like 
Serializability modes etc, allowing Dirty-reads, Read-uncommitted data 
(they don't fall in the scope of this discussion, but you can look them 
up if interested) - i'll only say here that not everyone wants their 
transactions to be closed off completely so Database Engines have 
switches/ways to allow breaking of the ACID model.

Point 2: Also note that the speed magnificently increases when you 
explicitly START the transaction, because the above script and 
pseudo-code changes to this:

BEGIN TRANSACTION;
SELECT MAX(ID) FROM contacts;
INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ....) VALUES 
(X+1, 'Joe', 'Soap', Now, ...);
INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ....) VALUES 
(X+2, 'Joe', 'Jones', Now, ...);
INSERT INTO contacts(ID, FirstName, LastName, TimeAdded, ....) VALUES 
(X+3, 'Joe', 'Smith', Now, ...);
END TRANSACTION;

tt = Start_Transaction;
     obtain_read_lock(tt);
     qq = prepare query( SELECT MAX(ID) FROM contacts; ) :
         if (qq is WRITE Query)escalate_to_write_lock(tt, qq); // Skipped
         run_query(qq);
         output_results;**
     release_prepared_resources(qq);
     qq = prepare query( INSERT INTO contacts(ID, FirstName, LastName, 
TimeAdded, ...) VALUES (X+1, 'Joe', 'Soap', Now, ...); ) :
         if (qq is WRITE Query) if (qq is WRITE 
Query)escalate_to_write_lock(tt, qq); // Executed
         run_query(qq);
     release_prepared_resources(qq);
     qq = prepare query( INSERT INTO contacts(ID, FirstName, LastName, 
TimeAdded, ...) VALUES (X+2, 'Joe', 'Jones', Now, ...); ) :
if (qq is WRITE Query) if (qq is WRITE Query)escalate_to_write_lock(tt, 
qq); // No-op because write-lock is already held.
         run_query(qq);
     release_prepared_resources(qq);
     qq = prepare query( INSERT INTO contacts(ID, FirstName, LastName, 
TimeAdded, ...) VALUES (X+3, 'Joe', 'Smith', Now, ...); ) :
if (qq is WRITE Query) if (qq is WRITE Query)escalate_to_write_lock(tt, 
qq); // No-op because write-lock is already held.
         run_query(qq);
     release_prepared_resources(qq);
     IF ERRORS > 0
        roll_back(tt);
     ELSE
        commit(tt);
     output_results;**
release_transaction(tt);

** - The exact times and places when results are output in queries 
differ from Engine to Engine and can be controlled with switches in 
some. Typically inline after selects, but sometimes you only want to see 
the results that actually made it into the database (i.e. at the END of 
the transaction).

You can of course get even faster when using multi-line inserts and 
prepared statements - but that's also another discussion.

Note also that only transactions with explicit errors will be rolled 
back by the Engines typically. If there is an error (a thing that you 
don't want to commit to the data) but it isn't a problem to commit it 
for the DB engine, then you have to explicitly tell it to roll back. (In 
sqlite this is just querying the command: ROLLBACK;)

There are also conflict handlers which you can specify on how you want 
to proceed when a conflict (error) occurs, but again, that's another 
topic, this one is too long already. :)

Cheers,
Ryan