In this work, we focus on the development of big data systems that react to external events and change their internal state accordingly. The main motivation is that developing such systems is commonly a difficult task with current database technologies and programming languages.
First, current storage technologies for big data follow, in most cases, a pull-based approach to gather new information. This approach wastes resources by polling the data store even when no new data is available. Other approaches, such as cloud store services (e.g. Firebase) and NoSQL databases (e.g. RethinkDB), follow a push-based model. However, cloud store services usually provide more limited query capabilities than traditional databases. Database solutions like RethinkDB provide reactive capabilities called change feeds, but these are limited to a single table and/or a single document. This means that complex reactive queries involving joins are not supported. Second, these systems need to keep track, continuously and in real time, of the status of one or more external agents. However, the status and the control flow of these applications are no longer determined by the internal structure of the code. Instead, the control flow is driven by events and values produced by external agents (e.g. a user, values produced by IO actions or by other programs hosted on other machines, etc.). Dealing with this problem using conventional programming languages (e.g. Java, C#, Python, etc.) and push-based storage approaches leads to code that consists of an overwhelming number of callback functions that are invoked whenever a change occurs. To sum up, applications based on these technologies suffer from a large amount of boilerplate code and accidental complexity. The main contributions of our work are:
An extension of a NoSQL store that uses a rule engine to obtain reactive database behaviour. Our aim is to provide a push-based database system that notifies applications (the client layer) about changes in the database. Our reactive database includes three main components:
- a NoSQL store based on Riak KV.
- a Rule Engine to support complex reactive queries (e.g. involving joins). The queries are represented by directed acyclic graphs, and they share intermediate results to obtain better performance. In other words, our reactive database does not need to run the reactive queries over the full database every time new data arrives. This feature is well suited to applications that need to handle large amounts of data. Our rule engine uses a RETE-based pattern matching algorithm to select the rules (queries) to fire. The engine is implemented using an actor-based approach to improve its performance, and it was integrated as a library dependency of Riak KV.
- a Notification Manager to notify the client side about changes in the database.
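The idea of reusing intermediate results instead of re-running a join over the full database can be illustrated with a minimal sketch. It is not the actual rule engine: the class `IncrementalJoin`, its methods, and the record fields (`id`, `user_id`, `name`, `item`) are hypothetical names chosen for illustration, and the sketch covers only a single two-input join node in the spirit of a RETE network, without actors, rule selection, or deletions.

```python
from collections import defaultdict


class IncrementalJoin:
    """Hypothetical two-input join node that remembers what it has seen.

    Each side keeps a memory indexed by join key, so a new record is only
    compared against records that share its key, never the full data set.
    """

    def __init__(self, left_key, right_key, on_match):
        self.left_key = left_key
        self.right_key = right_key
        self.on_match = on_match              # called once per new joined pair
        self.left_memory = defaultdict(list)  # join key -> left records
        self.right_memory = defaultdict(list)  # join key -> right records

    def insert_left(self, record):
        key = record[self.left_key]
        self.left_memory[key].append(record)
        # Only the stored right-hand records with the same key are visited.
        for other in self.right_memory.get(key, []):
            self.on_match(record, other)

    def insert_right(self, record):
        key = record[self.right_key]
        self.right_memory[key].append(record)
        for other in self.left_memory.get(key, []):
            self.on_match(other, record)


# Usage: the callback stands in for a notification pushed to a client.
notifications = []
join = IncrementalJoin(
    "id", "user_id",
    lambda user, order: notifications.append((user["name"], order["item"])),
)
join.insert_left({"id": 1, "name": "ada"})
join.insert_right({"user_id": 1, "item": "book"})  # matches ada immediately
join.insert_right({"user_id": 2, "item": "pen"})   # no user 2 yet, stored only
join.insert_left({"id": 2, "name": "bob"})         # matches the stored pen order
```

Each insertion touches only the memory bucket for its own key, which is the property that lets a reactive query avoid a full rescan when new data arrives.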
A composable query language to provide reactive abstractions for the database layer. Our reactive query abstractions follow the monadic notions adopted by LINQ to make them familiar to developers. A Graph Dependency Manager will handle the notifications of changes in the database and re-compute the affected part of the application. The query abstractions will be seamlessly integrated into a reactive language. This integration pursues the following goals:
- a Tierless model that unifies the three tiers (client, server, database) of the reactive application in one. This model will allow programmers to use the same programming language for all the tiers. This feature eliminates the impedance mismatch between client-side and server-side code by providing a common type definition for the data they exchange.
- a Stackless model that provides a single data abstraction as a bridge between the front end and back end of a reactive system. The goal is to strive for a flat technology stack in order to avoid issues such as “object-relational impedance mismatch” or “object-event impedance mismatch”.
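To make the notion of LINQ-style composable queries more concrete, the following is a minimal sketch under stated assumptions. The `Query` class and its method names (`source`, `where`, `select`, `select_many`) are hypothetical and merely mirror LINQ's operator names; they are not the query language proposed here. Re-running the query after a change is a naive stand-in for what a dependency manager would do selectively.

```python
class Query:
    """Hypothetical lazy query; operators return new queries and compose."""

    def __init__(self, run):
        self._run = run  # zero-argument function returning an iterator

    @staticmethod
    def source(rows):
        # The list is read each time the query runs, so later appends are seen.
        return Query(lambda: iter(rows))

    def where(self, predicate):
        return Query(lambda: (r for r in self._run() if predicate(r)))

    def select(self, fn):
        return Query(lambda: (fn(r) for r in self._run()))

    def select_many(self, fn):
        # Monadic bind: fn maps each row to another Query, results are flattened.
        return Query(lambda: (x for r in self._run() for x in fn(r)._run()))

    def to_list(self):
        return list(self._run())


users = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]
orders = [{"user_id": 1, "item": "book"}]

# A join expressed by composing select_many, where, and select.
user_orders = Query.source(users).select_many(
    lambda u: Query.source(orders)
    .where(lambda o: o["user_id"] == u["id"])
    .select(lambda o: (u["name"], o["item"]))
)

before = user_orders.to_list()
orders.append({"user_id": 2, "item": "pen"})  # simulated change in the store
after = user_orders.to_list()                 # naive full re-evaluation
```

Because every operator returns a new `Query`, larger queries are built from smaller ones without executing anything until `to_list` is called; a reactive runtime could use the same composed structure to decide which parts to re-compute.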