Que 1. Define Distributed Databases .
Ans:
A distributed
database system consists of loosely coupled sites that share no physical
components. The data may be stored on same physical location or on different
location. All computers are interconnected with network. Each site may
participate in the execution of transactions that access data at one site or
several sites. The main difference between centralized and distributed database
system is that the data is stored on single location in centralized system
whereas in the distributed system, the data resides in several locations.
Que 2. Define Distributed Data Storage.
Ans: There are several approaches to storing this relation
in the distributed databases .
Replication : In this approach, the system maintains
several identical copies of the relation. Each relation is stored at different
site.
Fragmentation : In this approach, the relation is
partitioned into several fragments. Each fragment is stored at a different
site.
Replication and fragmentation : The relation is portioned into several
fragments. The system maintains several copies of each fragment.
Que 3. What do you mean by Distributed
Transactions?
Ans: A distributed transaction includes one or more
statements, that update data individually or as a group, on two or more nodes
of a distributed database. There are two types of transactions :
Local
Transactions & Global Transactions.
Local
transactions access and
modify data in only one local database.
Global
transaction access and
modify data in several databases.
Like other transactions, distributed
transactions must have all four ACID properties. Each site has its ADMS 15 own Local Transaction Manager whose function is to
ensure the ACID properties of those transactions that execute at that site. The
various transaction involves in execution of global transaction.
Que 4. Define Commit Protocol.
Ans: All statements in a transaction (distributed or non
distributed) must commit or rollback as a unit .The effects of an ongoing
transaction should be invisible to all other transactions at all nodes.
Two-phase commit allows groups of transactions across several nodes to be
treated as a unit, means either all transactions commit, or they all get rolled
back.
2 Phase Commit (2 PC):
The two-phase commit consists of two phases
– Prepare Phase & Execute Phase
Prepare Phase :
The coordinator
sends a message “prepare for commit” to each node to get ready for committing
the transaction.
All database who are
participating receive the message will force-write all the transaction details
to disk and then send a “ready to commit” or “OK” signal to the coordinator.
If the force-writing to the disk fails or
if the local transaction cannot commit for some reason, the participating
database sends a “cannot commit” or “Not OK” signal to the coordinator.
Commit Phase:
If all participating
databases reply “OK”, the coordinator signals “OK”. This means that the
transaction is successful.
The coordinator
sends a “commit” signal for the transaction to all the participating databases.
All participating
databases completes the transaction by permanently modifying the database.
If any databases has
given a “Not OK” signal, then the communicator also signals a “Not OK”.
The “rollback” or undo message will send by
coordinator to the local effect of the transaction to each participating
database.
3 Phase Commit (3
PC):
3 Phase Commit is a
non blocking protocol.
It was developed to
avoid the failures that occur in two-phase commit transactions.
The 3PC also has a
coordinator who initiates and coordinates the transaction
3PC has an
additional phase called Pre-Commit phase which will remove the uncertainty
period for participants that have committed and are waiting for the global
abort or commit message from the coordinator.
When receiving a pre-commit message,
participants know that all others have voted to commit. If a pre-commit message
has not been received the participant will abort and release any blocked
resources.
Que 5. Define Concurrency Control in
Distributed Database.
Ans: To control Concurrency in distributed database, we have 2 options :
Locking Protocols : The various locking protocols used in a
distributed environment are:
§ Single-Lock
Manager Approach –
The system maintains a single lock manager
that resides in a single site. When a transaction needs to lock a data item, it
sends a lock request to that site. The lock manager decides that the lock can
be granted immediately or not. If the lock can be granted, the lock manager
sends a message to the site at which the lock request was initiated. Otherwise,
the request is delayed until it can be granted, at which time a message is sent
to the site at which the lock request was initiated.
§ Multiple
Coordinators –
In this approach, the lock manager function
is distributed over several sites. Each lock manager watch and check the lock
and unlock requests for a subset of data items. Each lock manager resides in a
different site. ADMS this is approach complicates deadlock handling,
because the lock and unlock requests are not made at a single site.
§ Majority
Protocol –
In this protocol, each site maintains a
local lock manager whose task is to administer the lock and unlock requests for
those data items that are stored in that site. This scheme deals with
replicated data in a decentralized manner, thus avoiding the drawbacks of
central control. This protocol is more complicated for implementation. Deadlock
handling is also an complicated task because lock and unlock requests are not
made at one site.
§ Biased
Protocol –
This protocol is same as Majority Protocol
but the difference is that requests for shared locks treated more effectively
than requests for exclusive locks. The system maintains a lock manager at each
site. Each manager manages the locks for all the data items stored at that
site. Shared and Exclusive locks are handled differently.
In this protocol overheads on read
operations are less than majority protocol, but has additional overheads on
writes. Deadlock handling is complex here like majority protocol.
§ Primary Copy –
In the case of data replication, one of the
replicas (copies) designated as a primary copy. Concurrency control for
replicated data handled in a manner similar to that of unreplicated data.
Implementation , but if primary site fails, the data item is unavailable, even
though other sites may have a replica.
Timestamping : Each transaction is given a unique timestamp that the
system uses in deciding the serialization order. There are 2 primary methods
for generating unique timestamps: first is centralized and second is
distributed in the centralized scheme, a single site is chosen for distributing
the timestamps. The site can use a logical counter or its own local clock for
this purpose.
In the distributed scheme,
each site generates a unique local timestamp using either a logical counter or
the local clock. We obtain the unique global timestamp by concatenating the
unique local timestamp with the site identified, which also must be unique. The
order of concatenation is important.