Problem
A financial institution faced the challenge of automating data quality control and enhancing its internal Feature Store. Previously, the process was manual: SQL code blocks had to be run and confirmed by the team before internal tables were calculated, and there was no visual dashboard for monitoring results. The goal was to automate the entire process using GCP services.
Solution
Closer created a DataOps pipeline built on GCP services: Cloud Source Repositories for code versioning, Vertex AI for pipeline orchestration, and BigQuery routines for stored procedures. Dry-run SQL syntax checks were implemented, CI/CD was managed with Cloud Build, and recurring processes were triggered with Cloud Scheduler. This setup ensured seamless integration between GCP services, including BigQuery, Cloud Functions, and Pub/Sub.
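As an illustration of the dry-run checks, a Cloud Build step can validate SQL syntax against BigQuery without executing the query or incurring cost. This is a minimal sketch, not the institution's actual configuration; the `sql/checks.sql` path and repository layout are assumptions:

```yaml
# cloudbuild.yaml -- hypothetical layout; sql/checks.sql is an assumed path
steps:
  # Dry run: BigQuery parses and plans the query but does not execute it,
  # so syntax errors fail the build before anything is deployed.
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: bash
    args:
      - -c
      - bq query --use_legacy_sql=false --dry_run < sql/checks.sql
```

A step like this can sit at the front of the pipeline so that broken SQL never reaches the scheduled jobs.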
Results
The automation enhanced monthly data quality control, sending email alerts for issues such as duplicates, nulls, and count deviations. Internal SuperTables were built automatically, feeding the Feature Store efficiently. Real-time process visualizations in Looker provided status updates on tables and control columns, resulting in better data quality for models and increased overall efficiency.
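The kinds of checks behind those alerts can be sketched in plain Python. This is an illustrative sketch only: the function name, threshold, and row format are assumptions, and in the described setup the equivalent logic ran as BigQuery routines over the real tables:

```python
from collections import Counter

def quality_report(rows, key, expected_count, tolerance=0.05):
    """Flag duplicates, nulls, and count deviations in a batch of rows.

    rows: list of dicts (e.g. results fetched from BigQuery).
    key: column expected to be unique and non-null.
    expected_count: reference row count (e.g. last month's load).
    tolerance: allowed relative deviation before alerting.
    Returns a list of human-readable issue descriptions (empty if clean).
    """
    issues = []

    # Duplicate keys: count occurrences of each non-null key value
    counts = Counter(r[key] for r in rows if r[key] is not None)
    dupes = sorted(k for k, c in counts.items() if c > 1)
    if dupes:
        issues.append(f"duplicates in '{key}': {dupes}")

    # Null keys
    nulls = sum(1 for r in rows if r[key] is None)
    if nulls:
        issues.append(f"{nulls} null value(s) in '{key}'")

    # Count deviation against the reference count
    deviation = abs(len(rows) - expected_count) / expected_count
    if deviation > tolerance:
        issues.append(f"row count {len(rows)} deviates "
                      f"{deviation:.0%} from expected {expected_count}")
    return issues

# Example: two duplicate ids, one null, and a count far below expectation
rows = [{"id": 1}, {"id": 1}, {"id": None}]
print(quality_report(rows, "id", expected_count=10))
```

A non-empty report would then be formatted into the alert email, while an empty list lets the downstream table build proceed.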