Parallel Maintenance of Materialized Views on Personal Computer Clusters
Weifa Liang
Department of Computer Science
Australian National University
Canberra, ACT 0200, Australia
email: [email protected]

Jeffrey X. Yu
Dept. Systems Eng. and Eng. Management
Chinese University of Hong Kong
Shatin, N.T., Hong Kong
email: [email protected]

ABSTRACT
A data warehouse is a repository of integrated information that collects and maintains a large amount of data from multiple distributed, autonomous, and possibly heterogeneous data sources. The data is often stored in the form of materialized views in order to provide fast access to the integrated data. Keeping the warehouse data completely consistent with the remote source data is a challenging issue, and transactions containing multiple updates at one or more sources further complicate it. Because a data warehouse usually contains a very large amount of data whose processing is time consuming, introducing parallelism to data warehousing becomes inevitable. The popularity and cost-effective parallelism of the Personal Computer (PC) cluster make it a promising platform for this purpose. In this paper the complete consistency maintenance of select-project-join (SPJ) materialized views is considered. Based on a PC cluster, several parallel maintenance algorithms for materialized views are presented. The key to the proposed algorithms is how to distribute the work load among the PCs and how to balance the communication cost among the PCs as well as between the PC cluster and the remote sources.

KEY WORDS
Materialized view incremental maintenance, data warehousing, partitioning, parallel algorithms, PC cluster

1 Introduction

A data warehouse mainly consists of materialized views, which can be used as an integrated and uniform basis for decision-making support, data mining, data analysis, and ad-hoc querying across the source data. The maintenance problem of materialized views has received increasing attention in the past few years due to its application to data warehousing. View maintenance aims to keep the content of a materialized view at a certain level of consistency with the remote source data, in addition to refreshing the content of the view as fast as possible when an update commits at one of the sources. It is well known that the data stored in a data warehouse is usually a very large amount of historical, consolidated data. To respond to user queries quickly, it is inevitable to introduce parallelism to speed up the data processing in data warehousing, because the analysis of such a large volume of data is painstaking and time consuming. Thus, parallel database engines are essential for large-scale data warehouses. With the popularity and cost-effectiveness brought by the Personal Computer (PC) cluster, it has become one of the most promising platforms for data-intensive applications such as large-scale data warehousing.

Many incremental maintenance algorithms for materialized views have been introduced for centralized database systems [2, 6, 7, 4]. A number of similar studies have also been conducted in distributed source environments [3, 8, 15]. These previous works form a spectrum of solutions ranging from a fully virtual approach at one end, where no data is materialized and all user queries are answered by interrogating the source data [8], to full replication at the other end, where the whole databases at the sources are copied to the warehouse so that view maintenance can be handled in the warehouse locally [5, 8]. The two extreme solutions are inefficient in terms of communication and query response time in the former case, and storage space in the latter case. A more efficient solution is to materialize the relevant subsets of the source data in the warehouse (usually the query answer). Then only the relevant source updates are propagated to the warehouse, and the warehouse refreshes the materialized data incrementally against these updates [9, 10]. However, in a distributed source environment this approach may require the warehouse to contact the sources for many rounds to obtain additional information that ensures the correctness of the update result [15, 3, 1, 14].

To keep a materialized view in a data warehouse at a certain level of consistency with its remote source data, extensive studies have been conducted in the past. To the best of our knowledge, all previously known algorithms are sequential. In this paper we focus on devising parallel algorithms for materialized view maintenance on a PC cluster. Specifically, the complete consistency maintenance of select-project-join (SPJ) materialized views is considered. Three parallel maintenance algorithms for materialized views on a PC cluster are presented.
The simple algorithm delivers complete consistency maintenance of a materialized view without using any auxiliary view. To improve the maintenance time of materialized views, the other two algorithms use auxiliary views: one is the equal partition-based algorithm, and the other is the frequency partition-based algorithm. They improve the view maintenance time dramatically compared with the simple algorithm, at the expense of extra warehouse space to accommodate the auxiliary data. The key to devising these algorithms is to exploit shared data, to distribute the work load among the PCs, and to balance the communication overheads among the PCs and between the PC cluster and the remote sources in a parallel computational platform.

The rest of the paper is organized as follows. Section 2 introduces the computational model and the four levels of consistency of materialized views. Section 3 presents a simple, complete consistency maintenance algorithm without the use of any auxiliary views. Section 4 devises a complete consistency algorithm based on equal partitioning of the sources in order to improve the view maintenance time. Section 5 presents another complete consistency algorithm based on update frequency partitioning of the sources, taking into account both the source update frequencies and the aggregate space needed for the auxiliary views. Section 6 concludes the paper.

2 Preliminaries

Computational model. A Personal Computer (PC) cluster consists of k (k > 1) PCs, interconnected through a local high-speed network. Each PC in the cluster has its own main memory and disk; no shared memory among the PCs in the cluster exists.
The communications among the PCs are implemented through message passing. This parallel computational model is also called the shared-nothing MIMD model.

In this paper the PC cluster defined above serves as the platform for a data warehouse. Since a data warehouse consists mainly of materialized views, the materialized views are stored on the disks of the PCs. For convenience, we consider only relational views. It is well known that there are several ways to store a materialized view on an MIMD machine. One popular way is to partition the materialized view horizontally (or vertically) into disjoint fragments and store each fragment on one of the PCs. In this paper, however, we do not fragment a view and distribute its fragments over the PCs; rather, we assume that a materialized view is stored entirely on the disk of one PC, called the home of the view. The reason is that the content of a materialized view is consolidated, integrated data that is used for answering users' queries for decision-making purposes, and this data is quite different from the data in operational databases. Note that a PC usually contains multiple materialized views. Following [15, 1], the update logs of the sources (relations) in the definition of a view V are sent to the data warehouse and stored in an update message queue (UMQ) for V, denoted by UMQ_V.

View consistency. Assume that there are m materialized views in the warehouse and n remote data sources. A warehouse state ws represents the content of the data warehouse at a given moment; it is a vector of m components, each of which is the content of one materialized view at that moment. The warehouse state changes whenever one of the materialized views in it is updated. A source state ss represents the content of the sources at a given moment; it is a vector of n components, where the i-th component represents the content of source R_i at that moment. Let ws_0, ws_1, ..., ws_f be the warehouse state sequence after a series of source states ss_0, ss_1, ..., ss_f. Consider a view V derived from the n sources. Let V(ws_i) be the content of V at warehouse state ws_i, V(ss_j) the content of V evaluated over source state ss_j, and ss_f the final source state. Furthermore, assume that source updates are executed in a serializable fashion across the sources and that V is initially synchronized with the source data, i.e., V(ws_0) = V(ss_0). The following four levels of consistency between a materialized view and its remote sources have been defined in [15].

1. Convergence. For all finite executions, V(ws_f) = V(ss_f), where ws_f is the final warehouse state. That is, the content of V is eventually consistent with the source data after the last update is processed and all activities have ceased.

2. Weak consistency. Convergence holds, and for every warehouse state ws_i there exists a source state ss_j such that V(ws_i) = V(ss_j). Furthermore, for each source there exists a serial schedule of transactions, with a locally serializable schedule at that source achieving the state.

3. Strong consistency. Convergence holds, and there exist a serial schedule S and a mapping m from warehouse states to source states with the following properties: (i) serial schedule S is equivalent to the actual execution of transactions at the sources; (ii) for every warehouse state ws_i, V(ws_i) = V(m(ws_i)); and (iii) if ws_i < ws_j, then m(ws_i) ≤ m(ws_j), where < is the precedence relation.

4. Completeness. The view in the warehouse is strongly consistent with the source data, and for every source state ss_j defined by the serial schedule S there is a warehouse state ws_i such that V(ws_i) = V(ss_j). That is, there is a complete order-preserving mapping between the warehouse states and the source states.
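To make the strongest level concrete, the following minimal sketch (ours, not from the paper) checks convergence and the existence of a complete order-preserving mapping on finite traces; representing each state by the view's content is an assumption made purely for illustration.

```python
def is_complete(view_at_ws, view_at_ss):
    """True iff the warehouse trace converges to the final source state and
    every source state is matched, in order, by some warehouse state."""
    if not view_at_ws or not view_at_ss:
        return False
    if view_at_ws[-1] != view_at_ss[-1]:      # convergence fails
        return False
    i = 0
    for ss_content in view_at_ss:             # order-preserving scan
        while i < len(view_at_ws) and view_at_ws[i] != ss_content:
            i += 1
        if i == len(view_at_ws):              # some ss_j is never reflected
            return False
    return True

# The warehouse below skipped the state after the second source update, so
# the mapping is not complete (the trace may still be strongly consistent).
print(is_complete(["s0", "s2"], ["s0", "s1", "s2"]))        # False
print(is_complete(["s0", "s1", "s2"], ["s0", "s1", "s2"]))  # True
```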
Maintenance of materialized views. Let V be an SPJ-type view derived from n relations R_1, R_2, ..., R_n, where R_i is located at a remote source s_i, 1 ≤ i ≤ n. V is defined as V = π_A σ_C (R_1 ⋈ R_2 ⋈ ... ⋈ R_n), where A is the set of projection attributes and C is the selection condition, a conjunction of clauses of the form R_i.A op R_j.B or R_i.A op c, in which A and B are attributes of R_i and R_j, respectively, and c is a constant. Updates to the source data are assumed to be either tuple inserts or tuple deletes; a modify operation is treated as a delete followed by an insert. All views in the warehouse are based on bag semantics, which means there is a count field for each tuple in a table, and the value of the count may be positive or zero.

To keep V at a certain level of consistency with its remote source data, several sequential algorithms have been proposed [15, 1, 14]. In this paper we dedicate ourselves to developing parallel maintenance algorithms in a distributed data warehouse environment whose platform is a PC cluster of k PCs, and we focus on the complete consistency maintenance of SPJ materialized views. For the sake of completeness, we briefly restate the SWEEP algorithm [1], which will be used later. SWEEP is chosen because it is the best algorithm for complete consistency maintenance so far; it is also optimal [12].

The SWEEP algorithm consists of two steps for the maintenance of an SPJ materialized view V. In step one, it evaluates the update change ΔV to V due to the current source update U. Since further source updates may occur during the evaluation of the current update, the UMQ is used to offset the effects of these later updates on the current update result. In step two, the update result ΔV is merged with the content of V, and V is updated. It is easy to see that step one is the dominant step: it queries the remote sources and performs the evaluation. Since the data manipulated in this step are the content of the UMQ and the remote source data, it is totally independent of the content of V. Step two is a minimum-cost step which merges ΔV into V locally in the data warehouse.
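The following schematic Python sketch restates the two steps under bag semantics with signed counts. It illustrates the control flow only, not the authors' implementation: the callables query_source and local_join are hypothetical stand-ins for the remote semijoin and the local compensating join.

```python
from collections import Counter

def bag_add(a, b):
    """Add signed tuple counts; drop tuples whose count becomes zero."""
    out = Counter(a)
    for t, c in b.items():
        out[t] += c
    return Counter({t: c for t, c in out.items() if c})

def sweep(view, delta, i, n, query_source, local_join, umq):
    """One maintenance round: delta is the (signed) update to R_i."""
    dV = Counter(delta)
    # Step one: visit the other n-1 sources one by one to build dV.
    for j in (x for x in range(n) if x != i):
        answer = query_source(j, dV)   # dV joined with R_j, at source j
        # Offset updates to R_j that committed during the evaluation: their
        # logged deltas are in umq[j], and their contribution is subtracted.
        for d_Rj in umq.get(j, ()):
            comp = local_join(dV, d_Rj)
            answer = bag_add(answer, Counter({t: -c for t, c in comp.items()}))
        dV = answer
    # Step two: merge dV into the view's content locally (the cheap step).
    return bag_add(view, dV)
```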
The proposed I +M 8 I step which merges to in the data warehouse locally. parallel algorithm proceeds as follows. For each source update, , of the first updates in 3 A Simple Parallel Algorithm the queue , it is assigned to one of the PCs in parallel (if the total number of updates in is less In this section we introduce a simple maintenance algo- than , then each update is assigned to one of the PCs I 8 08 rithm for materialized views distributed on a PC cluster. randomly, in the end some PCs are idle), so is . First of all we introduce the following naive algorithm. +M 8 will be used to offset the effect of later updates 5 Let be a materialized view with home at . to the current update result derived from . Each PC will take care of the maintenance of and keep the update then evaluates the view update to respond the source up- message queue for . The sequential mainte- date assigned to it. During the view update evaluation, I+M 8 ! SJ! I nance algorithm SWEEP will be run on for the main- once a source update related to is received by the data tenance of . The performance of this naive algorithm warehouse, the source update will be sent to and 8 8 reaches the optimal system performance if the material- for all , . ized views in the data warehouse assigned to each PC have Let be a source update in assigned to , 8 equal aggregate update frequencies. Otherwise, if there are . is responsible to evaluate the view update , materialized views at some PCs which have much higher to , using the sequential algorithm SWEEP. After the eval- update frequencies than the others, then, the PCs hosting these materialized views will become very busy while the uation is finished, sends the result of . When the home PC of receives an update result, to the home other PCs may be idle during the whole maintenance pe- riod. Thus, the entire system performance will be deteri- source update at the head of 5 it first checks whether the update result is derived from the . If yes, it merges orated due to the work load heavily imbalance among the PCs. Above the all, this algorithm is not completely con- the result with the content of update from the head of and removes the source . Otherwise, it waits materialized views and * gO sistent, illustrated by the following example. Consider two which are located at two in front of the current update in I until all the update results derived from the source updates have been re-
As a result, we have the following lemma.

Lemma 1. The simple maintenance algorithm is completely consistent.

Proof. Consider an update U_{i,j} in UMQ_i, and distinguish the following two cases: (i) U_{i,j} is the head of UMQ_i; (ii) U_{i,j} is one of the first k updates in UMQ_i but not its head.

Consider case (i). Assume that the source update U_{i,j} is assigned to PC_l; then UMQ_i(U_{i,j}), which in this case is UMQ_i itself, is also assigned to PC_l by the initial assumption. PC_l evaluates the view update ΔV_{i,j} due to U_{i,j} using the SWEEP algorithm. Note that to evaluate ΔV_{i,j}, the data needed is related only to the source data, UMQ_i(U_{i,j}), and the partial result of ΔV_{i,j} obtained so far; initially the partial result is empty. In other words, the evaluation of ΔV_{i,j} is independent of the content of V_i. Once the evaluation is done, the result is sent back to the home PC of V_i. In this case the result is merged into the content of V_i immediately, because U_{i,j} is the head of UMQ_i. Thus, the content of V_i after the merge is completely consistent with the source data, because its behavior is exactly the same as that of the SWEEP algorithm.

We now deal with case (ii). Assume that U_{i,j} is assigned to PC_l, along with the partial update message queue UMQ_i(U_{i,j}). Following the argument in case (i), PC_l is responsible for evaluating ΔV_{i,j} due to U_{i,j}, and this evaluation can be done using the source data, UMQ_i(U_{i,j}), and the partial update result of ΔV_{i,j} so far. Once the evaluation is done, the result is sent back to the home PC of V_i. If U_{i,j} has now become the head of UMQ_i, the result can be merged with the current content of V_i, and the merged result is completely consistent with the source data, following the SWEEP algorithm. Otherwise, if the view update results due to the source updates in front of U_{i,j} in UMQ_i have not yet been merged with the content of V_i, then V_i is still in some old state; to maintain the complete consistency of V_i, the result ΔV_{i,j} cannot be merged into V_i until U_{i,j} becomes the head of UMQ_i. Therefore, the lemma follows.

The proposed algorithm keeps the work load of all PCs even, because in a given time interval each PC deals with one source update of a given materialized view. However, a partial copy UMQ_i(U_{i,j}) of UMQ_i needs to be distributed to the PCs, so extra space is required to accommodate these queues. Compared with its sequential counterpart, the speed-up obtained by this simple parallel algorithm is almost k in the ideal case where every PC is busy evaluating a source update; the communication cost among the PCs is negligible because only the incremental update results are sent back to the home PC of the materialized view, while the data transfer from remote sites and the query evaluation at remote sites take much longer.
Other- auxiliary view for each group is defined as follows, +M7 8 I , m QG b c Q e f Q ZM.Q * hIj7M&QROhj -.-&-2hIj7M Q * wise, if the view update results due to the source updates in front of in have not been merged with the , +M8 (1) lJl T content of , then, is still in some old state, to main- tain complete consistency of , derived by cannot where is an attribute set in which an attribute is ei- IT M&Q * '.-&-. -.'3M Q * be merged to until it becomes the head of . Therefore, the lemma follows. ther in or in such a clause of that the attribute comes from the relations in and is M .Q * M Q * The advantage of the proposed algorithm keeps the a maximal subset of clauses of in which the attributes lJIT T work load of all PCs evenly because at a given time in- of each clause come only from to . Note terval, each PC deals with a source update of a given a I 8 M .Q * '&-.-&-/'0M Q * that the attributes in only come from rela- * materialized view . However, a partial copy tions in only. The last group , of is needed to be distributed to all the PCs, can be defined similarly. Thus, can then be rewrit- FG bdc7egf mg%ohj m7 * hIjS-.-&-2hIj7m7 O hIj7m7 */ therefore, the extra space is needed to accommodate these ten equivalently in terms of the auxiliary views defined, queues. Compared with its sequential counterpart, the . speed-up obtained by this simple parallel algorithm is al- most in an ideal case where every PC is busy for evalu- ating a source update and the communications cost among 4.2 Parallel algorithm the PCs is ineligible because only the incremental update results are sent back to the home PC of the materialized view , while the data transfer from remote sites and the query evaluation at remote sites take much longer time. In the following we show how to implement the equal partition-based algorithm in a cluster of PCs by proposing a parallel maintenance algorithm.
4.2 Parallel algorithm

In the following we show how the equal partition-based algorithm is implemented on a cluster of PCs.

Given an SPJ-type view V, assume that its p auxiliary views AV_1, ..., AV_p have been derived. The maintenance of V is implemented through the maintenance of its auxiliary views. Let PC_0 be the home of V. Initially, the p auxiliary views are assigned to p PCs in the cluster; assume that auxiliary view AV_l is assigned to PC_{i_l} and materialized at that PC too. Following the initial assumption, there is an update message queue UMQ_{AV_l} for AV_l at PC_{i_l}, in addition to UMQ_V for V at PC_0. During the update evaluation, once a new source update arrives, the home PC of V immediately forwards it to UMQ_{AV_l} if the update comes from a source used in the definition of AV_l.

Consider a source update U which is the head element of UMQ_V, and assume that its source is used in the definition of AV_l, located at PC_{i_l}. To respond to U, the view update evaluation of ΔAV_l is carried out at PC_{i_l} by applying the sequential algorithm SWEEP. Once the evaluation is finished, the result is not merged into the content of AV_l immediately, in order to keep AV_l completely consistent with the remote source data. Instead, PC_{i_l} sends ΔAV_l, a partial result of ΔV, together with a token, to the PC holding the neighboring auxiliary view AV_{l+1}. When the PC of an auxiliary view receives the token and the partial result, it joins the partial result with its auxiliary view to produce a new partial update result of ΔV, and then passes the joined result and the token to its next neighbor, and so on. This procedure continues until the initial sender PC_{i_l} receives the partial update result and the token it initially sent; by Eq. (2) below, the received result is actually the final result

ΔV = π_A σ_{C'} (AV_1 ⋈ ... ⋈ ΔAV_l ⋈ ... ⋈ AV_p).   (2)

PC_{i_l} then sends the result to the home PC of V, which merges ΔV with the content of V and removes U from the head of UMQ_V. At the same time, it informs PC_{i_l} to merge ΔAV_l with the content of AV_l. Obviously, the current content of V is completely consistent with the source data, because all the data used in the evaluation is at the state in which the warehouse started to deal with the view update due to U.

Compared with the simple maintenance algorithm, the equal partition-based parallel algorithm dramatically reduces the size of the partial update message queues held at PCs other than the home of the view: the home PC of an auxiliary view AV_l holds only the update message queue of AV_l, which contains only the update logs of the relations used in the definition of AV_l, rather than of all the relations used in the definition of V. Meanwhile, to obtain the view update evaluation result, the number of accesses to remote sites is reduced from n-1 to q-1, which reduces the view maintenance time and thereby ultimately improves the system performance. It must be mentioned that this is obtained at the expense of more space for accommodating the auxiliary views and extra time for maintaining them.
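The ring traversal can be sketched as below (hypothetical names); each hop corresponds to one local join at the PC holding the next auxiliary view, and the originator ends up with ΔV.

```python
def token_round(aux_views, l, delta_l, join):
    """aux_views: current contents of AV_1..AV_p (0-indexed); the update hit
    AV_l. Returns the view change dV after one full traversal of the ring."""
    p = len(aux_views)
    partial = delta_l                 # token starts at the PC holding AV_l
    for hop in range(1, p):           # visit the other p-1 auxiliary views
        j = (l + hop) % p
        partial = join(partial, aux_views[j])   # local join at that PC
    return partial                    # back at the originator: dV
```

Only after the home PC of V has merged dV is ΔAV_l folded into AV_l; this deferred merge is exactly what the consistency argument of the next lemma relies on.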
Lemma 2. The equal partition-based maintenance algorithm is completely consistent.

Proof. Consider a source update U, and assume that its source is used in the definition of AV_l, which is assigned to PC_{i_l}. The view update evaluation proceeds as follows. The view update ΔAV_l is first evaluated by PC_{i_l}. To keep the view completely consistent with the source data, the result is not merged with the content of AV_l immediately, because the view update evaluations for the other source updates after U in UMQ_{AV_l} may use the content of AV_l for their evaluations. Note that ΔAV_l is completely consistent with the source data, which is guaranteed by the SWEEP algorithm. We then proceed with the evaluation of ΔV due to U. Having obtained ΔAV_l, PC_{i_l} sends it, as a partial result of ΔV, together with the token around the ring of auxiliary views; each PC on the ring joins the partial result with its auxiliary view, whose content is still at the state in which the warehouse started to deal with U. Finally, PC_{i_l} receives the final result and forwards it to the home PC of V, which merges it with the content of V; at the same time PC_{i_l} merges ΔAV_l with the content of AV_l. By Eq. (2), the correctness of the proposed algorithm follows.

5 Frequency Partition-Based Maintenance

The performance of the equal partition-based algorithm deteriorates when the aggregate update frequencies of some auxiliary views are extremely high. As a result, the work loads of the home PCs of these auxiliary views will be heavier while the work loads of the other PCs will be lighter during the view maintenance period, because the home PC of a materialized (auxiliary) view is also responsible for merging the update results into its content, in addition to handling the update evaluations for the auxiliary view on it, like any other PC. In this section we assume that not every source has an identical update frequency. To balance the work load among the PCs in the cluster, it is required that the auxiliary views of V have equal aggregate update frequencies; however, finding such auxiliary views derived from the definition of V has been shown to be NP-hard in general. Instead, two approximate solutions have been given, based on the minimum spanning tree and edge-contraction approaches [11]. Here we use one of those algorithms to find the auxiliary views.

5.1 Frequency partition-based algorithm

Let f_i be the update frequency of source R_i, 1 ≤ i ≤ n. Given an SPJ view V derived from R_1, R_2, ..., R_n and an integer p, the problem is to find p auxiliary views such that (i) the total space occupied by the auxiliary views is minimized; and (ii) the absolute difference of aggregate update frequencies, | Σ_{R_i ∈ G_s} f_i − Σ_{R_j ∈ G_t} f_j |, is minimized for any two groups of relations G_s and G_t, s ≠ t, i.e., the sum of the source update frequencies in each group is roughly equal.
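To illustrate why the two objectives must be traded off, the following toy computation (all numbers assumed for illustration) evaluates both objectives for two candidate partitions of four relations.

```python
def imbalance(groups, freq):
    """Largest pairwise difference of aggregate update frequencies."""
    sums = [sum(freq[r] for r in g) for g in groups]
    return max(sums) - min(sums)

freq = {"R1": 10, "R2": 1, "R3": 4, "R4": 5}    # updates per unit time
space = {("R1", "R2"): 300, ("R3", "R4"): 30,   # assumed sizes of the
         ("R1", "R3"): 60, ("R2", "R4"): 45}    # candidate auxiliary views

for g1, g2 in [(("R1", "R2"), ("R3", "R4")), (("R1", "R3"), ("R2", "R4"))]:
    print([g1, g2], "space:", space[g1] + space[g2],
          "imbalance:", imbalance([g1, g2], freq))
# First partition: space 330, imbalance 2; second: space 105, imbalance 8.
# Each partition wins on one objective and loses on the other.
```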
Clearly, this is an optimization problem with two objectives to be met simultaneously. The first objective is to minimize the extra warehouse space needed to accommodate the auxiliary views; the second is to balance the sources' update load. The optimization problem is NP-hard; instead, a feasible solution for it is given below.

An undirected weighted graph G = (N, E, w_1, w_2) is constructed, where each relation used in the definition of V is a vertex in N. Associated with each vertex is a weight w_1, the update frequency of the corresponding relation. There is an edge between vertices u and v if and only if there is a conditional clause in C containing attributes from those two relational tables only, where C is the selection condition in the definition of V; the weight w_2 associated with the edge is the size of the table resulting from joining the two tables. Having G, an MST-based approximation algorithm for the problem is presented as follows [11].

Appro_Partition(G, p, w_1, w_2)
/* w_1 and w_2 are the weight functions of vertices and edges */
1. Find a minimum spanning tree T of G;
2. Find a max-min p-partition of T by the algorithm in [13];
3. The vertices in each subtree form a group, and a p-vertex partition of G is obtained.

The p-vertex partition of G is obtained by running algorithm Appro_Partition. The p auxiliary views can then be derived from the definition of V, each from one group of relations. Note that the auxiliary views obtained have roughly equal aggregate update frequencies.
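A schematic rendering of Appro_Partition is given below. It assumes the networkx library for the MST step, and the greedy balanced-cut loop is our stand-in for the max-min tree partitioning algorithm of [13], not a faithful reproduction of it.

```python
import networkx as nx  # assumed available; used only for the MST step

def appro_partition(vertices, freq, edges, p):
    """vertices: relation names; freq: update frequency per relation;
    edges: {(u, v): join_size}; returns p groups of relations."""
    g = nx.Graph()
    g.add_nodes_from(vertices)
    for (u, v), size in edges.items():
        g.add_edge(u, v, weight=size)       # edge weight = join result size
    mst = nx.minimum_spanning_tree(g)
    # Stand-in for max-min partitioning: repeatedly cut the tree edge whose
    # removal best balances aggregate frequencies, until p components remain.
    work = mst.copy()
    while nx.number_connected_components(work) < p:
        best = None
        for u, v in list(work.edges()):
            work.remove_edge(u, v)
            sums = [sum(freq[x] for x in c)
                    for c in nx.connected_components(work)]
            score = max(sums) - min(sums)
            if best is None or score < best[0]:
                best = (score, (u, v))
            work.add_edge(u, v)             # restore before trying next edge
        work.remove_edge(*best[1])
    return [sorted(c) for c in nx.connected_components(work)]
```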
5.2 Parallel algorithm

For a given SPJ-type view V, assume that the p auxiliary views defined above have been found by applying the Appro_Partition algorithm. We then assign each of the p auxiliary views to one of the PCs in the cluster. The remaining processing is exactly the same as that in the equal partition-based maintenance algorithm and is omitted. Therefore, we have the following lemma.

Lemma 3. The frequency partition-based maintenance algorithm is completely consistent.

Proof. The proof is similar to that of Lemma 2 and is omitted.

6 Conclusions

In this paper several parallel algorithms for materialized view maintenance have been proposed, based on a PC cluster. The proposed algorithms keep the content of a materialized view completely consistent with its remote source data. The key to devising these algorithms is to exploit shared data, to distribute the work load among the PCs, and to balance the communication overhead between the PC cluster and the remote sources and among the PCs in a parallel computational environment.

Acknowledgment: The work was partially supported by both a small grant (F00025) from the Australian Research Council and the Research Grants Council of the Hong Kong Special Administrative Region (Project No. CUHK4198/00E).

References
[1] D. Agrawal et al. Efficient view maintenance at data warehouses. Proc. of ACM-SIGMOD Conf., 1997, 417–427.
[2] J. A. Blakeley et al. Efficiently updating materialized views. Proc. of ACM-SIGMOD Conf., 1986, 61–71.
[3] L. Colby et al. Algorithms for deferred view maintenance. Proc. of ACM-SIGMOD Conf., 1996, 469–480.
[4] T. Griffin and L. Libkin. Incremental maintenance of views with duplicates. Proc. of ACM-SIGMOD Conf., 1995, 328–339.
[5] A. Gupta et al. Data integration using self-maintainable views. Proc. 4th Int'l Conf. on Extending Database Technology, 1996, 140–146.
[6] A. Gupta and I. Mumick. Maintenance of materialized views: problems, techniques, and applications. IEEE Data Engineering Bulletin, 18(2), 1995, 3–18.
[7] A. Gupta et al. Maintaining views incrementally. Proc. of ACM-SIGMOD Conf., 1993, 157–166.
[8] R. Hull and G. Zhou. Towards the study of performance trade-offs between materialized and virtual integrated views. Proc. of Workshop on Materialized Views: Tech. & Appl., 1996, 91–102.
[9] N. Huyn. Efficient view self-maintenance. Proc. of the 23rd VLDB Conf., Athens, Greece, 1997, 26–35.
[10] W. Liang et al. Making multiple views self-maintainable in a data warehouse. Data and Knowledge Engineering, 30(2), 1999, 121–134.
[11] W. Liang et al. Maintaining materialized views for data warehouses with multiple remote source environments. Proc. of 1st Int'l Conf. on WAIM, LNCS, Vol. 1846, 2000, 299–310.
[12] W. Liang and J. X. Yu. Revisit on view maintenance in data warehouses. Proc. of 2nd Int'l Conf. on WAIM, LNCS, Vol. 2118, 2001, 203–211.
[13] Y. Perl and S. R. Schach. Max-min tree partitioning. J. ACM, 28(1), 1981, 5–15.
[14] H. Wang et al. Efficient refreshment of materialized views with multiple sources. Proc. of 8th ACM-CIKM, 1999, 375–382.
[15] Y. Zhuge et al. View maintenance in a warehousing environment. Proc. of ACM-SIGMOD Conf., 1995, 316–327.