How TrueLayer Uses Datalore for Secure Collaboration

https://www.jetbrains.com/company/customers/experience/truelayer/

TrueLayer is a global open banking platform that makes it easy for anyone to build better financial experiences. Businesses of every size, from startups to large enterprises, use TrueLayer to power their payments, access financial insights, and onboard customers across the UK and Europe. Founded in 2016, TrueLayer is trusted by millions of consumers and businesses around the world. Their vision is to create a financial system that works for everyone.

About TrueLayer

Could you please introduce yourself?

I’m Moreno Raimondo Vendra, Senior Machine Learning engineer at TrueLayer. Our ML team supports other teams in our organization who have data-intensive needs. We help them handle large volumes of data, produce data insights, and create machine learning models out of that data. We mostly contribute to the core TrueLayer product use cases, but sometimes our work also includes research projects.

What kinds of projects is TrueLayer involved in?

TrueLayer is a FinTech company and an open banking provider, so we primarily work with financial data. We allow our customers to access open banking data, ensuring GDPR compliance. One of the projects my team is part of is enriching user transactions with additional merchant information.

Problems to solve

What made you look for Datalore or alternative solutions? What challenges did you face?

Working with financial data is not a trivial task, as you can’t just access a production database or a data lake, download the data, and work on it. You have to ensure secure access to the data and produce insights that are easy to share as well.

In the past, we had a standalone AWS EC2 machine, which was hard to log into because of multiple VPNs and temporary personal credentials that would often expire. We couldn’t easily upgrade the size of the instance to work with a larger volume of data. And of course onboarding was a pain for new team members.

“Datalore enabled our team to ergonomically access our data while meeting the security requirements, which was a game changer for us. As a result, we could collaborate much more easily both within our Machine Learning team and with our stakeholders.”

— Moreno Raimondo Vendra, Senior Machine Learning engineer, TrueLayer

The Datalore experience

Who uses Datalore in your team?

We use Datalore heavily in our Machine Learning team of three, and two more stakeholders elsewhere in the company use it as well.

What kind of data do you work with?

The data we work with is usually produced in operational databases, but we then store part of that data in our data lake on AWS S3. The main type of data that we work with is depersonalized metadata on open banking transactions. We usually access it through the Python client for S3.
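As an editorial illustration of that kind of access (not TrueLayer's actual code), here is a minimal sketch. It assumes the "Python client for S3" is boto3 and that objects are stored as newline-delimited JSON; the bucket name, key, and the helper `read_json_lines` are hypothetical.

```python
import json


def read_json_lines(s3_client, bucket: str, key: str) -> list[dict]:
    """Download one S3 object and parse it as newline-delimited JSON.

    `s3_client` is expected to behave like a boto3 S3 client, i.e. expose
    get_object(Bucket=..., Key=...) returning a dict with a readable "Body".
    The storage format (JSON lines) is an assumption made for this sketch.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    body = response["Body"].read().decode("utf-8")
    # Skip blank lines so trailing newlines do not break parsing.
    return [json.loads(line) for line in body.splitlines() if line.strip()]


# In a notebook with AWS credentials configured, the client would typically
# come from boto3 (hypothetical bucket and key shown):
#   import boto3
#   records = read_json_lines(
#       boto3.client("s3"), "example-data-lake", "transactions/part-0000.jsonl"
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object, which is convenient when the real data cannot leave a secured environment.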

We also work with data that is produced by our own services, such as logs and metrics. With Datalore we were able to debug complex issues that required retrieving hundreds of gigabytes of data, as well as to identify patterns, visualize data, and share our insights.

How do you explore data in Datalore?

We mostly use pandas, and we frequently rely on the Visualize tab, which is really intuitive. It makes exploring data much quicker and a much better experience.

It is also something we work on collaboratively. Someone can pull the data and share the notebook so the team can edit it together, and then someone else will pick it up and continue working on it later. We always try to make every notebook a report. After the analysis is done, we always add a conclusion and apply storytelling practices to make it a meaningful piece of work.

Datalore allows us to do this data storytelling very well, since we have one place where we pull the data, do complex manipulations with Python (we can dig as deep as we want), create visualizations, and export the results into a format that is friendly to business consumers. We can do it all in one place, without having to interact with multiple tools. We can produce PDF and static reports and even schedule them to run on a regular basis, so we can keep track of how features and metrics change over time. Being able to access the history of these runs has been extremely useful for us.

“Data exploration and reports made a very compelling use case for us. But we also use Datalore in areas like model prototyping and training, where we found that having easy data access enabled us to experiment faster.”

Now that we can organize notebooks in workspaces, it is easy to keep track of what each team member has been working on for specific projects and topics. This problem has already been solved for software engineering, thanks to GitHub, GitLab, and other Git-based platforms. But for data science and notebook collaboration, it is still not trivial for organizations.

“Datalore has made collaboration a lot easier and we now have a place to keep all of that valuable work together and organized.”

Could you give an example of how your team collaborates?

At TrueLayer, we are a team of ML engineers and our most common practice is to get together on a notebook and do pair programming.

We also use PyCharm and the Code With Me plugin for code development. We love that the interface and experience of editing code in real time are similar across different tools. For example, while running a training script for an ML model, we were able to collaborate on a Python script in Attached files in real time. It allowed us to be in the code together, rather than on a video chat, which made spotting and fixing issues easier and faster. Having a place to organize work in workspaces while keeping track of the history improved the team’s productivity.

How do you combine PyCharm, Code With Me, and Datalore in your projects?

We have model servers deployed on our clusters, which are essentially Python APIs. We will usually have a training notebook in Datalore, train the model, produce the model artifact (an archive), and then deploy it to our cluster. We then use PyCharm and Code With Me to develop the model server APIs. Having familiar UIs across the various JetBrains tools has made this process very convenient for the team.
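As an editorial illustration of the "model artifact (an archive)" step, and not TrueLayer's actual pipeline, the sketch below bundles a pickled model and a small metadata file into a single `.tar.gz` using only the Python standard library. The file names, the metadata fields, and the helpers `package_model` and `load_model` are all hypothetical.

```python
import json
import pickle
import tarfile
import tempfile
from pathlib import Path


def package_model(model, metadata: dict, archive_path: Path) -> Path:
    """Bundle a pickled model and a JSON metadata file into one .tar.gz archive."""
    with tempfile.TemporaryDirectory() as staging:
        staging_dir = Path(staging)
        # File names inside the archive are assumptions made for this sketch.
        (staging_dir / "model.pkl").write_bytes(pickle.dumps(model))
        (staging_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
        with tarfile.open(archive_path, "w:gz") as archive:
            for item in sorted(staging_dir.iterdir()):
                # arcname keeps paths inside the archive flat and portable.
                archive.add(item, arcname=item.name)
    return archive_path


def load_model(archive_path: Path):
    """Read the model and metadata back, as a model server might at start-up."""
    with tarfile.open(archive_path, "r:gz") as archive:
        model = pickle.loads(archive.extractfile("model.pkl").read())
        metadata = json.loads(archive.extractfile("metadata.json").read())
    return model, metadata


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # A plain dict stands in for a trained model object.
        path = package_model(
            {"weights": [0.1, 0.2]},
            {"version": "0.1"},
            Path(workdir) / "model.tar.gz",
        )
        restored, info = load_model(path)
        print(restored, info)
```

Keeping metadata next to the serialized model lets the serving side check which version it loaded. Note that pickle should only be used to load archives from a trusted source.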

What’s next?

Recently, there has been a lot of interest from our software and data engineers, who are data savvy and want to access their data products in a much easier way. At this point, a lot of engineers know what a Jupyter notebook is, but being able to easily provide data connections through Datalore would really help lower the barrier to entry for software engineers.

Contacts

truelayer.com
