Notes on Advanced Database Management System (ADBMS)


Que 1. Define Distributed Databases .

Ans:  A distributed database system consists of loosely coupled sites that share no physical components. The data may be stored on same physical location or on different location. All computers are interconnected with network. Each site may participate in the execution of transactions that access data at one site or several sites. The main difference between centralized and distributed database system is that the data is stored on single location in centralized system whereas in the distributed system, the data resides in several locations.

Que 2. Define Distributed Data Storage.

Ans: There are several approaches to storing this relation in the distributed databases .

Replication : In this approach, the system maintains several identical copies of the relation. Each relation is stored at different site.
Fragmentation : In this approach, the relation is partitioned into several fragments. Each fragment is stored at a different site.
Replication and fragmentation : The relation is portioned into several fragments. The system maintains several copies of each fragment.

Que 3. What do you mean by Distributed Transactions?

Ans: A distributed transaction includes one or more statements, that update data individually or as a group, on two or more nodes of a distributed database. There are two types of transactions :

Local Transactions & Global Transactions.

Local transactions access and modify data in only one local database.
Global transaction access and modify data in several databases.
Like other transactions, distributed transactions must have all four ACID properties. Each site has its ADMS 15 own Local Transaction Manager whose function is to ensure the ACID properties of those transactions that execute at that site. The various transaction involves in execution of global transaction.

Que 4. Define Commit Protocol.

Ans: All statements in a transaction (distributed or non distributed) must commit or rollback as a unit .The effects of an ongoing transaction should be invisible to all other transactions at all nodes. Two-phase commit allows groups of transactions across several nodes to be treated as a unit, means either all transactions commit, or they all get rolled back.

2 Phase Commit (2 PC):

The two-phase commit consists of two phases – Prepare Phase & Execute Phase

Prepare Phase :
The coordinator sends a message “prepare for commit” to each node to get ready for committing the transaction.
All database who are participating receive the message will force-write all the transaction details to disk and then send a “ready to commit” or “OK” signal to the coordinator.
If the force-writing to the disk fails or if the local transaction cannot commit for some reason, the participating database sends a “cannot commit” or “Not OK” signal to the coordinator.

Commit Phase:
If all participating databases reply “OK”, the coordinator signals “OK”. This means that the transaction is successful.
The coordinator sends a “commit” signal for the transaction to all the participating databases.
All participating databases completes the transaction by permanently modifying the database.
If any databases has given a “Not OK” signal, then the communicator also signals a “Not OK”.
The “rollback” or undo message will send by coordinator to the local effect of the transaction to each participating database.

3 Phase Commit (3 PC):
3 Phase Commit is a non blocking protocol.
It was developed to avoid the failures that occur in two-phase commit transactions.
The 3PC also has a coordinator who initiates and coordinates the transaction
3PC has an additional phase called Pre-Commit phase which will remove the uncertainty period for participants that have committed and are waiting for the global abort or commit message from the coordinator.
When receiving a pre-commit message, participants know that all others have voted to commit. If a pre-commit message has not been received the participant will abort and release any blocked resources.

Que 5. Define Concurrency Control in Distributed Database.

Ans:  To control Concurrency in distributed database, we have 2 options :

Locking Protocols : The various locking protocols used in a distributed environment are:

§ Single-Lock Manager Approach

The system maintains a single lock manager that resides in a single site. When a transaction needs to lock a data item, it sends a lock request to that site. The lock manager decides that the lock can be granted immediately or not. If the lock can be granted, the lock manager sends a message to the site at which the lock request was initiated. Otherwise, the request is delayed until it can be granted, at which time a message is sent to the site at which the lock request was initiated.

§ Multiple Coordinators –

In this approach, the lock manager function is distributed over several sites. Each lock manager watch and check the lock and unlock requests for a subset of data items. Each lock manager resides in a different site. ADMS this is approach complicates deadlock handling, because the lock and unlock requests are not made at a single site.

§ Majority Protocol –

In this protocol, each site maintains a local lock manager whose task is to administer the lock and unlock requests for those data items that are stored in that site. This scheme deals with replicated data in a decentralized manner, thus avoiding the drawbacks of central control. This protocol is more complicated for implementation. Deadlock handling is also an complicated task because lock and unlock requests are not made at one site.

§ Biased Protocol –

This protocol is same as Majority Protocol but the difference is that requests for shared locks treated more effectively than requests for exclusive locks. The system maintains a lock manager at each site. Each manager manages the locks for all the data items stored at that site. Shared and Exclusive locks are handled differently.
In this protocol overheads on read operations are less than majority protocol, but has additional overheads on writes. Deadlock handling is complex here like majority protocol.

§ Primary Copy –

In the case of data replication, one of the replicas (copies) designated as a primary copy. Concurrency control for replicated data handled in a manner similar to that of unreplicated data. Implementation , but if primary site fails, the data item is unavailable, even though other sites may have a replica.

Timestamping : Each transaction is given a unique timestamp that the system uses in deciding the serialization order. There are 2 primary methods for generating unique timestamps: first is centralized and second is distributed in the centralized scheme, a single site is chosen for distributing the timestamps. The site can use a logical counter or its own local clock for this purpose.

In the distributed scheme, each site generates a unique local timestamp using either a logical counter or the local clock. We obtain the unique global timestamp by concatenating the unique local timestamp with the site identified, which also must be unique. The order of concatenation is important.