BASE DE DATOS DISTRIBUIDAS

June 16, 2017 | Autor: Oscar Silva | Categoria: Computer Science, Database Systems, Computer Networks, Databases

Descrição do Produto

BASE DE DATOS DISTRIBUIDAS SQL SERVER 2008

EA

INTRODUCCIÓN • MS SQL Server 2008 distributes data to remote or mobile users by replication either over the Internet or over the local area network (LAN). • Two methods of distributing data exist in SQL Server: • Distributed transactions • Replication

• It is possible to maintain multiple copies of current data in the same environment using both of these methods.

DISTRIBUCIÓN DE DATOS • Companies nowadays have multiple copies of data distributed across various locales. This data can be in different formats, employ different relational database management systems (RDBMSs), and often reside on multiple servers. • The purpose in all these cases is to bring the data closer, to allow autonomy of data, and to reduce conflicts for the users.

CONFIGURACIÓN DE UN SISTEMA DISTRIBUIDO • The terms distributed databases and distributed processing are often used interchangeably. Distributed processing involves applications that share the resources among the members of the network. The client-server system would be an example of distributed processing.

• The distribution of all these heterogeneous sources, like processors, networks, operating systems, and data, opens up new opportunities in scalability as well as creating challenges in security, transactional consistency, and transparency. Distributed computing systems are systems in which several autonomous processes interconnected in a computer network cooperate to perform a specific task. Distributed databases are parts of this much bigger distributed computing picture. • Consider the challenges of a distributed system such as a distributed data environment. In a distributed data environment, like a replication environment, the various sites are autonomous and can be online when needed. As such, scalability is allowed. Transactions are atomic, meaning that either the whole of a transaction is committed or all of it is rolled back. Although latency is an integral factor in replication, the consistency of the transactions is still maintained.

• In a distributed environment, each of the components is independent of each other, so they can also function independently. • SQL Server provides access to heterogeneous databases by using the ODBC or OLE DB drivers. • Microsoft OLE DB Provider for ODBC • Microsoft OLE DB Provider for Jet • Microsoft OLE DB Provider for Oracle

DISTRIBUTED DATA TRANSACTIONS • MS DTC is a transaction manager that coordinates transaction-processing requests from databases, message queues, and file systems either on a local machine or distributed across the network.

REPLICACIÓN • Replication is a “set of technologies” that can move data and database objects from one database to another and across different platforms and geographic locales. This allows users to work with a local copy of the database, and any changes they make are transferred to one or more remote servers or mobile users across the network. The consistency of the database is maintained by the process of synchronization.

• There are two kinds of replication: eager and lazy replication

EAGER REPLICATION • Eager replication is also known as synchronous replication. In this method, an application can update a local replica of a table, and within the same transaction it can also update other replicas of the same table. No concurrency anomalies occur, since synchronous replication gives serializable execution. Any anomaly in concurrency is detected by the locking method.

• The atomicity of the transactions is guaranteed by the employment of the 2PC method.

• Eager replication consists of the following steps: execute, transmit, notify, and either commit or rollback. An executed transaction is transmitted to different nodes, and in the event of failure in one node, the transaction is rolled back and all the other nodes are notified of the failure. The transaction is then aborted in all nodes. If replication is successful in all the nodes, a commit is broadcast and a copy of the committed transaction is then sent to all the nodes.

• Eager replication is not a good choice for a remote or mobile environment, since it reduces update performance

LAZY REPLICATION • Lazy replication is also known as asynchronous replication. In this case, if the transactions are committed, they are sent to the different sites for the updates to occur. However, if they are rolled back, the changes will not be transmitted to the different sites. Thus, the very nature of asynchronous replication allows the updates of committed transactions to be sent to disconnected sites, as in the case of handheld sets or mobile devices.

1. With this type of replication, it is possible for two different sites to update the same data on the same destination site. This will lead to a conflict in the updating of the data. Such update conflicts need to be resolved in lazy replication, and this is done by associating timestamps with each of the transaction objects. 2. Each object carries the timestamp associated with the previous update of that data. So when a transaction is sent to the destination site, it first checks to see whether the timestamp associated with the local copy of the replicated data matches the incoming transaction’s old timestamp for that data. 3. Only if they match will the changes, including the transaction’s new timestamp, be applied. 4. If the timestamps do not match at the initial stage, the updated transaction is rejected.

REPLICATION IN SQL SERVER • SQL Server follows asynchronous (lazy) replication. It permits three different kinds of asynchronous replication: snapshot, transactional, and merge replication. • Snapshot replication makes a copy of the data and propagates changes for the whole set of data rather than individual transactions, thereby making it a discontinuous process and entailing a higher degree of latency. • Transactional replication allows incremental changes to data to be transferred either continuously or at specific time intervals. Transactional replication is normally used where there is a high volume of inserts, updates, and deletes. This type of replication is usually used in a server-to-server environment. • Merge replication permits a higher degree of autonomy. It allows the subscribing servers to make changes and then it propagates those changes to the publishing servers, which in turn transfer the changes to other subscriber servers

Following are some of the likely situations in which the dynamic nature of replication can be used:

• Snapshot replication can be employed to archive data that can then be used for data warehousing purposes in the future. • Transactional replication can be used to transfer real-time data to facilitate the generation of real-time reports. • Peer-to-peer replication can be used for disaster recovery purposes. Such a process can enable IT systems to have a warm standby of databases, although this does not necessarily substitute having a sound backup and recovery strategy for the organization. • Merge replication can be used for sales force automation. Salespeople can take advantage of the web synchronization for merge replication to download data from the Web, while any conflicts in DML operations can be resolved in the background.

GRADO DE AUTONOMÍA Y LATENCIA EN LOS DIFERENTES MÉTODOS DE DISTRIBUCIÓN DE DATOS

MODELO PUBLICADOR - SUBSCRIPTOR • The Publisher-Subscriber model is based on a metaphor from the publishing industry. This metaphor is a logical representation of the architecture the software industry has followed in database replication.

• Un libro es una Publicación y cada capítulo es un artículo.

COMPONENTES DE LA REPLICACIÓN • Distribuidor • Publicador • Subscriptor • Publicación

• Artículo • Subscripciones • Agentes

DISTRIBUIDOR • The Distributor server is the common link that enables all the components involved in replication to interact with each other. It contains the distribution database, and it is responsible for the smooth passage of data between the Publisher servers and the Subscriber servers. • If the Distributor server is located on the same machine as the Publisher server, it is known as the local Distributor server, but if it is on a separate machine from the Publisher server, it is called the remote Distributor server. In large-scale replication, it is better to house the Distributor server on a remote server. This will not only improve performance, but also reduce I/O processing and reduce the impact of replication on the Publisher server.

• The role of the Distributor server varies depending on the type of replication:

• In snapshot and transactional replication, the distribution database in the Distributor server stores the replicated transactions temporarily and also stores the metadata and the job history. The replication agents are also stored in the Distributor server, except in cases where the agents are configured remotely or pull subscriptions are used. (A pull subscription is one in which the Subscriber server asks for periodic updates of all changes made at the publishing server.) • In merge replication, unlike in snapshot and transactional replication, the distribution database in the Distributor server stores the metadata and the history of the synchronization. It also contains the Snapshot Agent and the Merge Agent for push subscriptions.

• A push subscription is a subscription in which the Publisher server propagates the changes to the subscribing servers without any specific request from the subscribing server.

• The distribution database is a system database that is created when the Distributor server is configured. You should not drop the distribution database unless you want to disable it. It stores information about not only replication, but also the metadata, job history, and transactions.

PUBLICADOR • While the Distributor server manages the data flow, the Publisher server ensures that data is available for replication to other servers. The Publisher is the server that contains the data to be replicated. It can also identify and maintain changes in data. • Depending on the type of replication, changes in data are identified and periodically time-stamped

SUBSCRIPTOR • The Subscriber server stores replicas and receives updates from the Publisher server. Periodic updates made on the Subscriber server can then be sent back to the Publisher server. It may also be necessary for the Subscriber server to act as a Publisher server and republish the data to other subscribing servers.

PUBLICACIÓN • The Publisher server contains a collection of articles in the publication database. This database tells the Publisher server which data needs to be sent to other servers or to the subscribing servers. In other words, the publication database acts as the data source for replication.

• The database that is published can contain one or more publications. A publication is a unit that contains one or more articles that are sent to the subscribing servers.

ARTÍCULO • An article is any grouping of data to be replicated; it is a component of a publication. It may contain a set of tables or a subset of tables. Articles can also contain a set of columns (vertical filtering), a set of rows (horizontal filtering), stored procedures, views, indexed views, or user defined functions (UDFs).

SUBSCRIPCIONES • Subscriber servers must define their subscriptions for a particular set of publications in order to receive the snapshot from the Publisher server. • For all three types of replication, snapshot files are made of the schema and initial data files of the publication and are stored in the snapshot folder. Subsequent changes to the data or the schema are transferred from the Publisher server to the Subscriber server. This process is known as synchronization. • Subscriptions need to be synchronized within a specific period of time, which depends on the replication and subscription types used. • There are two methods by which data changes made on the publication can be sent to subscriptions in SQL Server: anonymous subscriptions and named subscriptions.

• There are two kinds of named subscriptions: push subscriptions and pull subscriptions. (In fact, anonymous subscription is a kind of pull subscription.)

• Push subscriptions are created at the Publisher server. The Publisher server retains control of the subscriptions and can propagate the changes either on demand, or continuously, or at scheduled intervals. However, synchronization in push subscriptions is typically transmitted continuously, whenever changes occur in the publication, without waiting for the Subscriber server to make a request.

AGENTES • The agents collate all the changes and perform the necessary jobs in distributing the data. • There are five different types of agents: • Snapshot Agent • Log Reader Agent • Distribution Agent

• Merge Agent • Queue Reader Agent • Consulta

MODELOS DE REPLICACIÓN FISICA • Publisher/Distributor–Subscriber model • Central Publisher–Multiple Subscribers model • Central Subscriber–Multiple Publishers model • Multiple Publishers–Multiple Subscribers model

PUBLISHER/DISTRIBUTOR–SUBSCRIBER MODEL

CENTRAL PUBLISHER– MULTIPLE SUBSCRIBERS MODEL

Typically this model is used when you want the Subscribers to have read-only permission

CENTRAL PUBLISHER– MULTIPLE SUBSCRIBERS MODEL WITH REMOTE DISTRIBUTOR The replication process might be disrupted if there is only one remote Distributor server in the network and it fails.

CENTRAL SUBSCRIBER–MULTIPLE PUBLISHERS MODEL

Data warehousing is a typical example

MULTIPLE PUBLISHERS– MULTIPLE SUBSCRIBERS MODEL

INSTALLING SQL SERVER REPLICATION • Práctica

Lihat lebih banyak...

BASE DE DATOS DISTRIBUIDAS

Descrição do Produto

Comentários