Building a modern data platform with Databricks on Google Cloud
24th March 2023
Ed Ball, Head of Data Management at Oakbrook, shares the highlights from a recent talk he gave at Google's office in Central London
I recently travelled down to the Google Academy in London Victoria to speak to a group of data experts about how Oakbrook have built their new analytical data platform using Databricks on Google Cloud. The event also included speakers from three of our strategic suppliers - Databricks, Google and Fivetran - and an interactive lab to demonstrate how the technologies work together.
In my keynote, I spoke about the challenges Oakbrook faced in governing a complex data environment, and how we've evolved to create a modern, scalable platform designed for big data processing and advanced analytics.
How we use data and analytics has always been a big part of our success, enabling us to move quickly and make better lending decisions. We use a variety of different data sources to gain new insights, develop our strategies and improve our models. These range from highly structured data like consumer credit files, through semi-structured data like bank transactions from our Open Banking providers, to unstructured data such as audio files of call recordings. Not to mention the 9 million events published by our bespoke O6K software platform every day.
When we started assessing the options for creating our new platform there were several key criteria for us. We knew we wanted scalable compute separate from storage. We knew we wanted a platform which would natively support our data science and advanced analytics. We knew we wanted a flexible and open platform which could support different languages and integrate easily with different services. Obviously, we wanted it to be cost-effective and we knew we couldn’t compromise on security.
Our solution
Our solution uses Google Cloud Storage to host our Delta Lake, and Cloud Functions to take Pub/Sub events from O6K and land them in a Bronze layer ready for further processing. Our data engineers write Databricks notebooks in PySpark to take that raw data and produce clean Silver tables for our analysts. We also have a Gold layer of more curated data, used for Power BI reporting dashboards and downstream applications. More recently, we have introduced Fivetran to extract previously siloed data from operational systems such as Zendesk and Mailchimp and load it into our 'Lakehouse' to help our analysts gain even more insights.
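To give a flavour of the landing step, here is a minimal sketch in Python of how a Pub/Sub message might be decoded and given a Bronze object name. It is an illustration under stated assumptions, not our production code: the function names, the `bronze/events/date=...` path layout, the record fields and the sample event are all hypothetical. The only detail taken from Pub/Sub itself is that the message payload arrives base64-encoded in the `data` field. The actual write to Google Cloud Storage is left out.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Tuple


def decode_pubsub_event(event: dict) -> dict:
    """Decode the base64-encoded JSON payload carried in a Pub/Sub message."""
    raw = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(raw)


def bronze_object_path(event_id: str, published_at: datetime) -> str:
    """Build a date-partitioned object name (hypothetical layout) for raw events."""
    day = published_at.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"bronze/events/date={day}/{event_id}.json"


def land_event(event: dict, event_id: str, published_at: datetime) -> Tuple[str, str]:
    """Return the (object name, JSON body) that a landing function would write.

    The payload is kept unmodified so that later Silver-layer jobs can
    always reprocess the original event.
    """
    payload = decode_pubsub_event(event)
    record = {"event_id": event_id, "source": "pubsub", "payload": payload}
    return bronze_object_path(event_id, published_at), json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical sample event, encoded the way Pub/Sub delivers it.
    sample = {"type": "loan_application", "amount": 1500}
    message = {"data": base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")}
    path, body = land_event(
        message, "evt-001", datetime(2023, 3, 24, 9, 30, tzinfo=timezone.utc)
    )
    print(path)
    print(body)
```

Keeping the raw payload untouched at this stage, and partitioning by date, is one common way to make the Bronze layer cheap to write and easy to replay; the cleaning and typing then happens in the notebooks that build the Silver tables.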
As well as the projected cost savings, the combination of these technologies has enabled us to further embed advanced analytics into our core platform. We’ve been able to deliver changes in days which would have previously taken weeks or months. With only a small team of engineers, in the last 12 months we’ve helped the business launch two new products and implement a fully automated collections strategy, and we are now a few weeks away from enabling a new decision engine for customers wanting to borrow more.
Alongside all this, we have a platform which is faster, more reliable, and more secure. Our modern data platform will not only help to drive the next big innovations in our lending and collections strategies, but also provide a great experience for our data engineers and data scientists.
Oakbrook's story, and the evolution of our data strategy, is still only just beginning. But by working closely with our key technology partners and connecting with the data community at events like this one at the Google Academy, we’re pursuing our mission to "innovate together". And by continuing to develop our data and analytics platform, we’ll keep working towards our purpose to "simplify and personalise borrowing".
If you're interested in a career at Oakbrook then check out our latest vacancies here.