Real-time Data Warehousing

In Real-Time Data Warehousing (RTDW), a fast input stream of customer sales transactions from Operational Data Sources (ODSs) typically needs to be joined with disk-based Master Data (MD) before it is loaded into the data warehouse. The semi-stream join approaches that perform this join typically work with a limited main-memory partition, which is generally not large enough to hold the whole MD.

In this project we present novel approaches for caching and load shedding in semi-stream one-to-many equijoins.

Caching: We combine our front-stage caching component, which works at tuple granularity, with a number of well-known semi-stream join algorithms and study its effect on performance.
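As a rough illustration of a tuple-level cache placed in front of a disk-based join, the following Python sketch probes the cache for each stream tuple and falls back to a disk lookup only on a miss. It is a minimal sketch under stated assumptions, not the project's implementation: the names FrontStageCache, join_with_cache, disk_lookup and product_id are hypothetical, and the LRU eviction policy is chosen only for brevity.

```python
from collections import OrderedDict


class FrontStageCache:
    """Tuple-level cache of master-data rows, keyed by join key.

    LRU eviction is an assumption made for this sketch.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = OrderedDict()
        self.hits = 0
        self.misses = 0

    def probe(self, key):
        """Return the cached master-data row for key, or None on a miss."""
        row = self.rows.get(key)
        if row is None:
            self.misses += 1
            return None
        self.rows.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return row

    def admit(self, key, row):
        """Insert a row and evict the least recently used one if over capacity."""
        self.rows[key] = row
        self.rows.move_to_end(key)
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)


def join_with_cache(stream, cache, disk_lookup):
    """Equijoin stream tuples with master data, trying the cache first."""
    for tup in stream:
        key = tup["product_id"]  # hypothetical join attribute
        md_row = cache.probe(key)
        if md_row is None:
            md_row = disk_lookup(key)  # stands in for the disk-based join stage
            if md_row is None:
                continue  # no matching master-data row
            cache.admit(key, md_row)
        yield {**tup, **md_row}


# Usage with an in-memory dict standing in for disk-based MD.
master = {1: {"name": "pen"}, 2: {"name": "ink"}}
stream = [{"product_id": k, "qty": 1} for k in (1, 1, 2, 1)]
cache = FrontStageCache(capacity=1)
joined = list(join_with_cache(stream, cache, master.get))
```

Every cache hit avoids a disk access for that stream tuple, so the hit and miss counters give a simple way to observe how the cache size affects the work left for the disk-based join.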

Intelligent Load Shedding: We present a load shedding approach that sheds the tuples that are most expensive to process, thus increasing the service rate. We measure the service rate under load shedding and compare it with that of related approaches.
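The idea of shedding the most expensive tuples first can be sketched in Python as a bounded buffer that, when full, drops the tuple with the highest estimated processing cost. This is only an illustration under assumptions: the class SheddingBuffer and the cost function are hypothetical, and the cost model used here (the inverse of how often a join key occurs in the stream) is a placeholder rather than the cost estimate used in the project.

```python
import heapq
import itertools


class SheddingBuffer:
    """Bounded buffer that sheds the highest-cost tuple when it overflows."""

    def __init__(self, capacity, cost):
        self.capacity = capacity
        self.cost = cost                  # caller-supplied cost estimate
        self._heap = []                   # max-heap on cost via negated values
        self._arrival = itertools.count() # arrival order, also breaks ties
        self.shed = []                    # tuples dropped so far

    def offer(self, tup):
        """Add a tuple; if the buffer overflows, drop the most expensive one."""
        entry = (-self.cost(tup), next(self._arrival), tup)
        heapq.heappush(self._heap, entry)
        if len(self._heap) > self.capacity:
            _, _, victim = heapq.heappop(self._heap)
            self.shed.append(victim)

    def drain(self):
        """Return the surviving tuples in arrival order and empty the buffer."""
        survivors = sorted(self._heap, key=lambda entry: entry[1])
        self._heap = []
        return [tup for _, _, tup in survivors]


# Usage with hypothetical stream statistics: rarer keys are treated as costlier.
key_frequency = {"a": 10, "b": 1, "c": 5}
buf = SheddingBuffer(capacity=2, cost=lambda t: 1.0 / key_frequency[t["key"]])
for k in ("a", "b", "c"):
    buf.offer({"key": k})
kept = buf.drain()
```

Keeping the buffer as a heap makes each shedding decision logarithmic in the buffer size, which matters when the decision has to be made at stream speed.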

Project team

  • Muhammad Asif Naeem