Data-parallel systems like MapReduce or
Stratosphere usually offer only low-level language interfaces, requiring
the user both to know a complex programming language and to be
experienced, to some extent, with the underlying data flows. In addition,
obtaining well-performing plans requires extensive knowledge of the data.
Compared to classical relational database systems, this
makes such systems considerably harder to use.
We propose an additional layer on top of Stratosphere that processes queries in an SQL-like language. It applies optimizations known from classical database systems to determine join order and select operators. The implemented prototype allows convenient generation of complex PACT plans with dozens of operators in less than a second. Even with few statistics available, the optimizer finds fairly good plans.