Zusammenfassung |
Materialized views (MV) can significantly improve the query
performance of relational databases. In this paper, we con-
sider MVs to optimize complex scenarios where many het-
erogeneous nodes with different resource constraints (e.g.,
CPU, IO and network bandwidth) query and update nu-
merous tables on different nodes. Such problems are typical
for large enterprises, e.g., global retailers storing thousands
of relations on hundreds of nodes at different subsidiaries.
Choosing which views to materialize in a distributed, com-
plex scenario is NP-hard. Furthermore, the solution space
is huge, and the large number of input factors results in
non-monotonic cost models. This prohibits the straightfor-
ward use of brute-force algorithms, greedy approaches or
proposals from organic computing. For the same reason, all
solutions for choosing MVs we are aware of do not consider
either distributed settings or update costs.
In this paper we describe an algorithmic framework which
restricts the sets of considered MVs so that a genetic algo-
rithm can be applied. In order to let the genetic algorithm
converge quickly, we generate initial populations based on
knowledge on database tuning, and devise a selection func-
tion which restricts the solution space by taking the simi-
larity of MV configurations into account. We evaluate our
approach both with artificial settings and a real-world RFID
scenario from retail. For a small setting consisting of 24 ta-
bles distributed over 9 nodes, an exhaustive search needs
10 hours processing time. Our approach derives a compa-
rable set of MVs within 30 seconds. Our approach scales
well: Within 15 minutes it chooses a set of MVs for a real-
world scenario consisting of 1,000 relations, 400 hosts, and
a workload of 3,000 queries and updates. |