How to Resolve In-Doubt Transactions

What is an “In-Doubt Transaction”?

The two-phase commit mechanism ensures that all nodes either commit or perform a rollback together. What happens if any of the three phases fails because of a system or network error? The transaction becomes in-doubt.

Distributed transactions can become in-doubt in the following ways:

A server system running Oracle Database software crashes
A network connection between two or more Oracle Databases involved in distributed processing is disconnected
An unhandled software error occurs

The RECO process automatically resolves in-doubt transactions when the system, network, or software problem is resolved. Until RECO can resolve the transaction, the data is locked for both reads and writes. The database blocks reads because it cannot determine which version of the data to display for a query.

https://docs.oracle.com/html/E25494_01/ds_txns004.htm

Distributed transactions perform DML on multiple databases which is a bit more complicated task because the database must coordinate the consistency in those separate or even perhaps between different DBMSs (like Oracle – MS SQL). To ensure the transaction atomicity, Oracle implements a 2-phase commit mechanism through which the distributed transactions undergo some phases like prepare, commit, forget, etc. These phases constitute the hand-shake mechanism of the distributed transaction.

However, sometimes things may go wrong (due to some network, system problem or even a reconfiguration of the underlying objects) and one of the phases fails while others are ok. Here, we say that the transaction becomes in-doubt. Normally this problem should be handled by the RECO process itself, but in some cases, this cannot be performed.

Why RECO cannot perform in some cases?

One of the databases involved in the distributed transaction might be unreachable (network, system issues etc.) while the RECO was trying to resolve the problem (even when retrying to recover). (UNSTUCK)

The lookup tables of the “2-phase commit” mechanism might become inconsistent with the transaction itself. (STUCK)

What is a “Two-Phase Commit” Mechanism

Two-phase commit protocol. … (2PC)….. It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort (rollback) the transaction (it is a specialized type of consensus protocol).

2 phase commit protocol is an atomic commitment protocol for distributed systems. This protocol as its name implies consists of two phases. The first one is the commit-request phase in which transaction manager coordinates all of the transaction resources to commit or abort.

The two-phase commit protocol is a distributed algorithm which lets all sites in a distributed system agree to commit a transaction. The protocol results in either all nodes committing the transaction or aborting, even in the case of site failures and message losses.

The database ensures the integrity of data in a distributed transaction using the two-phase commit mechanism.

2PC has three phases:

1) In the prepare phase, the initiating node in the transaction asks the other participating nodes to promise to commit or roll back the transaction.

2) During the commit phase, the initiating node asks all participating nodes to commit the transaction. If this outcome is not possible, then all nodes are asked to roll back.

3) Forget phase – The global coordinator forgets about the transaction.

see more at https://docs.oracle.com/html/E25494_01/ds_txns003.htm

What is a Distributed Environment / Transaction

Distributed computing is a field of computer science that studies distributed systems. A distributed system is a model in which components located on networked

In network computing, DCE (Distributed Computing Environment) is an industry-standard software technology for setting up and managing computing and data exchange in a system of distributed computers. Using DCE, application users can use applications and data at remote servers.

A distributed transaction is a database transaction in which two or more network hosts are involved. Usually, hosts provide transactional resources, while the transaction manager is responsible for creating and managing a global transaction that encompasses all operations against such resources.

A distributed transaction is a transaction that updates data on two or more networked computer systems.

A distributed transaction includes one or more statements that, individually or as a group, update data on two or more distinct nodes of a distributed database.

Note:

In my case, an Oracle database on Linux is connecting to MSSQL database on Windows machine using a DG4MSQL (Database Gateway for MS SQL Server) installed on the Windows server and a Database Link created in the Oracle DB. So this is a Distributed system

The Dictionary tables involved are:

To see the waiting transactions -> DBA_2PC_PENDING view.

To identify the remote DB query -> DBA_2PC_NEIGHBORS view

Handling UnStuck Transactions

Hopefully, there is no inconsistency between the lookup tables and the transaction and the following code resolves the problem:

1. Get pending transaction’s Local_transaction_id:

To see the waiting or pending transactions -> DBA_2PC_PENDING view.

SQL> select local_tran_id,global_tran_id, state,mixed, commit# from dba_2pc_pending;

97.33.166765 ORCL.781a8889.97.33.166765 prepared no 60787107482

Here, ‘97.33.166765’ is the transaction id of the distributed transaction, which will be used in the following commands.

2. Rollback or Commit pending transaction:

Note: Oracle recommends to wait for the RECO to recover pending transactions and does not recommend to perform this procedure unless:

* Total loss of the remote database

* Reconfiguration in software resulting in loss of two-phase commit capability

* Loss of information from an external transaction coordinator such as a TPMonitor

There are cases when the distributed transaction either committed or rollbacked on any of the nodes.

To determined how to resolve the pending transaction you need to identify the remote DB (query the DBA_2PC_NEIGHBORS)

and on every node check if any of the transactions have committed (check the COMMIT# column won’t be null on the DBA_2PC_PENDING view).

select * from DBA_2PC_NEIGHBORS;

If any of the nodes have performed a commit then before executing the DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY procedure you should force a commit on all the participating nodes.

SQL> COMMIT FORCE ‘73.24.102527’;

If the state of the transaction is “prepared” and there is no inconsistency, the transaction can be forced to rollback, or maybe if the underlying problem which caused the in-doubt transaction is resolved the transaction can be forced to commit as follows:

SQL> ROLLBACK FORCE ‘97.33.166765’ /* ->replace with your own trx_id */

SQL> ROLLBACK ‘97.33.166765’

SQL> COMMIT FORCE ‘97.33.166765’ /* ->replace with ur own trx_id */

SQL> COMMIT ‘97.33.166765’

Note: If the command hangs, go to the “Handling Stuck DBA_2PC_PENDING” section.

If the state of the transaction is “collecting” and you execute the above command, you may see an error like:

ERROR at line 1:

ORA-02058: no prepared transaction found with ID 97.33.166765

You may also see an error like below which is resolved by not including the FORCE: COMMIT ‘97.33.166765’ ;

SQL Error: ORA-02043: must end current transaction before executing ROLLBACK FORCE

02043. 00000 – “must end current transaction before executing %s”

*Cause: a transaction is in progress and one of the following commands

command is issued: COMMIT FORCE, ROLLBACK FORCE, or

ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY in single process mode.

*Action: COMMIT or ROLLBACK the current transaction and resubmit

command.

3. Purge/clean pending transaction:

Execute the following command to purge/clean the transaction (as sysdba)

SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY(‘97.33.166765’); /* ->replace with ur own trx_id */

PL/SQL procedure successfully completed.

If you get above error, do

commit;

alter session set “_smu_debug_mode” = 4;

EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY(‘97.33.166765’); /* ->replace with ur own trx_id */

PL/SQL procedure successfully completed.

Repeat above statements until queries below have no rows.

Test to confirm that the transaction has gone

SELECT * FROM DBA_2PC_PENDING;

select local_tran_id,global_tran_id, state,mixed, commit# from dba_2pc_pending;

select * from DBA_2PC_NEIGHBORS;

No rows returned.

How to Resolve In-Doubt Transactions

What is an “In-Doubt Transaction”?

What is a Distributed Environment / Transaction

How to Resolve In-Doubt Transactions

Search Our Blog

Recent Posts

Recent Comments

Archives

Blog Categories