OLTP + OLAP = HTAP (Hybrid Transaction/Analytical Processing)

This will make EL on ETL or ELT Irrelevant

Faisal Malik Widya Prasetya
3 min readJun 24, 2022

Back story

Like most super hero, I will start from the back story. From the first day until now I’m being a data engineer, my job is always around ETL, ELT, Data Pipeline, moving data from sources to destination for analytical purposes. One must have source is an OLTP database. Usually a company doing their operations and process their data in OLTP database like MySQL, Postgres, or any traditional RDBMS. Which is make sense as it is ACID and having row store for optimized writing processes. As OLTP database optimized for writing, it’s not a good idea to perform analytics directly on the OLTP database, so they have to move the data to OLAP database which usually columnar store for optimized reading. Unfortunately, this can cause latency on the OLAP data availability. This has been a common practice for decades and a lot of improvements are made to handle the latency such as using CDC on stream processing so that the data will be available on OLAP database in near real-time. Of Course this cost a lot as we have to capture the data stream from all changes on OLTP database, transform it into a format that is acceptable by the OLAP database and then write it into OLAP database. Doing this on day to day basis during all my working experiences for 4 years makes me think, what if we combine OLTP and OLAP, but it’ll be hard as OLTP optimized for writing with its row store while OLAP optimized for reading with its columnar store.

Solution

I was thinking if we combine both it’ll be diagonal store which doesn’t sounds right, until GCP release their new service called AlloyDB. AlloyDB is PostgreSQL compatible database service that optimize both transactional and analytical by separating compute and storage and have columnar in-memory AI assisted data. So, it’s not diagonal store, instead it provides both row and columnar simultaneously so that it optimize both writing and reading into and from the database. After some google search, I also found earlier implementation of HTAP called TiDB (Ti stands for Titanium). TiDB is MySQL compatible, so I think it make sense for Google to make AlloyDB PostgreSQL compatible. The idea of TiDB is to make a coordinator which distribute the data coming in to two different type of engines. The Coprocessor and MPP Engine. Coprocessor stores the data in row while MPP Engine store in column. With this implementation, TiDB can perform OLTP on the coprocessor and OLAP on MPP Engine.

With HTAP becomes real, does it mean Data Engineer will no longer needed?

Back again to my day to day job as a data engineer. Without ETL or ELT necessary, I think Data Engineer job will be more Data Modelling for analytics. Even though, this HTAP solution doesn’t remove the necessity to perform ETL or ELT for external data used by the company, this external data issues might be tackled by Airbyte. Now ETL is somehow gone from my day to day job, I think reverse ETL might be tackled shortly by Airbyte as they’re just acquire grouparoo, a reverse ETL open source company. So,

  • ETL or ELT from OLTP to OLAP is tackled by HTAP
  • ETL or ELT from external sources is tackled by Airbyte
  • Reverse ETL is tackled by Grouparoo

Data modelling and data warehousing maybe the only thing left for data engineer to do, but DataOps or Data Platform Engineer might be a good thing to consider as above solutions are still need to be deployed and we also need to architect the whole process, build the infrastructure and running the operations with monitoring. Those solutions are tools for us to focus more on the things that are more critical for us to focus on.

Takeaways

  • If your company currently using MySQL as the main database, just migrate to TiDB. It will take a while but you don’t need to change a lot of database connection as TiDB compatible with MySQL.
  • If your company currently using PostgreSQL as the main database, just migrate to AlloyDB with the same reason as above.
  • If you’re a data engineer, try to learn Data Modelling (Star, Snowflake, Activity Schema, etc.) DataOps and Data Platform Engineering. As ETL, ELT and Reverse ETL seems to be replaced by standardized solutions.
  • Other options for data engineer might be learning internal mechanism of Airbyte and try to contribute to the code base. It might useful if you want to deploy or customize the tool to suit the business use cases.

--

--

Faisal Malik Widya Prasetya

Data Engineer with experiences on various startups and consulting.