We are looking for a Lead Data Engineer to design, develop, and optimize our data infrastructure on Google Cloud Platform (GCP). You will architect scalable pipelines using BigQuery, Google Cloud Storage, Apache Airflow, dbt, Dataflow, and Pub/Sub, ensuring high availability and performance across our ETL/ELT processes. Additionally, you will leverage RabbitMQ for event-driven communication, Terraform for Infrastructure as Code (IaC), and Great Expectations to enforce data quality standards. The role also involves building our Data Mart (Data Mach) environment, containerizing services with Docker and Kubernetes (K8s), and implementing CI/CD best practices.
A successful candidate has extensive knowledge of cloud-native data solutions, strong proficiency with ETL/ELT frameworks (including dbt), and a passion for building robust, cost-effective pipelines. You will lead a team of data engineers, mentor junior members, and define best practices for data engineering across the organization, ensuring all solutions align with our business goals and industry standards.
A top mobile app developer and publisher, specializing in iOS applications in the utility, productivity, and entertainment categories, with a special focus on generative AI applications.
Define and implement the overall data architecture on GCP, including data warehousing in BigQuery, data lake patterns in Google Cloud Storage, and Data Mart (Data Mach) solutions.
Integrate Terraform for Infrastructure as Code to provision and manage cloud resources efficiently.
Establish both batch and real-time data processing frameworks to ensure reliability, scalability, and cost efficiency.
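For illustration, real-time ingestion on GCP typically begins with publishing events to Pub/Sub. The minimal sketch below uses the google-cloud-pubsub Python client; the project ID, topic name, and payload are hypothetical placeholders, not a prescribed setup.

    # Publish an app event to a Pub/Sub topic for downstream streaming ingestion.
    # Project, topic, and payload below are illustrative placeholders.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-gcp-project", "app-events")

    # Message payloads are raw bytes; extra keyword arguments become string attributes.
    future = publisher.publish(topic_path, data=b'{"event": "app_open"}', source="ios")
    print(future.result())  # message ID once the broker acknowledges the publish

Downstream, a Dataflow job can subscribe to the same topic and stream the events into BigQuery for analysis.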
Design, build, and optimize ETL/ELT pipelines using Apache Airflow for workflow orchestration.
Implement dbt (Data Build Tool) transformations to maintain version-controlled data models in BigQuery, ensuring consistency and reliability across the data pipeline (see the sketch below).
Use Google Dataflow (based on Apache Beam) and Pub/Sub for large-scale streaming/batch data processing and ingestion.
Automate job scheduling and data transformations to deliver timely insights for analytics, machine learning, and reporting.
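To give a concrete flavor of the orchestration described above, here is a minimal Airflow 2.x DAG sketch that builds and tests dbt models in BigQuery on a daily schedule; the DAG id, schedule, and dbt project path are illustrative assumptions rather than a prescribed layout.

    # Daily ELT orchestration: run dbt models, then dbt tests, against BigQuery.
    # DAG id, schedule, and project path are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_elt",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Build the version-controlled dbt models in the production BigQuery project.
        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command="dbt run --project-dir /opt/dbt/analytics --target prod",
        )

        # Test the freshly built models so bad data fails loudly instead of
        # propagating to reporting tables.
        dbt_test = BashOperator(
            task_id="dbt_test",
            bash_command="dbt test --project-dir /opt/dbt/analytics --target prod",
        )

        dbt_run >> dbt_test

Running dbt test immediately after dbt run keeps inconsistent models from reaching analytics consumers.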
Leverage RabbitMQ to implement event-driven or asynchronous data workflows between microservices (see the sketch below).
Employ Docker and Kubernetes (K8s) for containerization and orchestration, enabling flexible and efficient microservices-based data workflows.
Implement CI/CD pipelines for streamlined development, testing, and deployment of data engineering components.
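As a small example of the event-driven workflows mentioned above, the sketch below publishes a pipeline-completion event to RabbitMQ with the pika client; the hostname, queue name, and payload are placeholders.

    # Emit a completion event so downstream microservices can react asynchronously.
    # Hostname, queue name, and payload are illustrative placeholders.
    import json

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
    channel = connection.channel()

    # Durable queue so events survive a broker restart.
    channel.queue_declare(queue="ingestion.events", durable=True)

    event = {"dataset": "app_installs", "date": "2024-01-01", "status": "loaded"}
    channel.basic_publish(
        exchange="",
        routing_key="ingestion.events",
        body=json.dumps(event).encode("utf-8"),
        properties=pika.BasicProperties(delivery_mode=2),  # persistent message
    )

    connection.close()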
Enforce data quality standards using Great Expectations or similar frameworks, defining and validating expectations for critical datasets (see the sketch below).
Define and uphold metadata management, data lineage, and auditing standards to ensure trustworthy datasets.
Implement security best practices, including encryption at rest and in transit, Identity and Access Management (IAM), and compliance with GDPR or CCPA where applicable.
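The data quality responsibility can be pictured with a minimal Great Expectations check. This sketch assumes the classic pandas-backed API (pre-1.0); the dataset, column names, and thresholds are illustrative only.

    # Validate critical columns before a dataset is promoted downstream.
    # Columns and thresholds are placeholders for real expectations.
    import great_expectations as ge
    import pandas as pd

    df = ge.from_pandas(
        pd.DataFrame(
            {
                "user_id": [1, 2, 3],
                "revenue_usd": [0.99, 4.99, 9.99],
            }
        )
    )

    # Critical columns must be populated and within plausible ranges.
    df.expect_column_values_to_not_be_null("user_id")
    df.expect_column_values_to_be_between("revenue_usd", min_value=0, max_value=10_000)

    results = df.validate()
    if not results["success"]:
        raise ValueError(f"Data quality checks failed: {results}")

In practice such expectation suites would run inside the orchestration pipeline so that failed checks block downstream loads.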
Integrate with Looker (or similar BI tools) to provide data consumers with intuitive dashboards and real-time insights.
Collaborate with Data Science, Analytics, and Product teams to ensure the data infrastructure supports advanced analytics, including machine learning initiatives.
Maintain Data Mart (Data Mach) environments that cater to specific business domains, optimizing access and performance for key stakeholders.
Manage, mentor, and develop a team of data engineers, fostering a culture of innovation, continuous learning, and excellence.
Partner with stakeholders to gather requirements, communicate roadmap updates, and deliver high-impact data solutions.
Present project status, key metrics, and potential risks to senior leadership and relevant teams.
Experience:
- 2+ years of hands-on experience designing, building, and scaling production data pipelines, ideally on GCP. You’re comfortable with both batch and real-time workloads and their distinct design trade-offs.
Technical Expertise with GCP Stack:
- Proven track record building and maintaining BigQuery environments and Google Cloud Storage-based data lakes.
- Deep knowledge of Apache Airflow for scheduling/orchestration and ETL/ELT design.
- Experience implementing dbt for data transformations, RabbitMQ for event-driven workflows, and Pub/Sub + Dataflow for streaming/batch data pipelines.
- Familiarity with designing and implementing Data Mart (Data Mach) solutions, as well as using Terraform for IaC.
Programming & Containerization
- Strong coding capabilities in Python, Java, or Scala, plus scripting for automation.
- Experience with Docker and Kubernetes (K8s) for containerizing data-related services.
- Hands-on with CI/CD pipelines and DevOps tools (e.g., Terraform, Ansible, Jenkins, GitLab CI) to manage infrastructure and deployments.
Data Quality & Governance
- Proficiency in Great Expectations (or similar) to define and enforce data quality standards.
- Expertise in designing systems for data lineage, metadata management, and compliance (GDPR, CCPA).
- Strong understanding of OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) systems.
Leadership & Communication
- Demonstrated ability to manage cross-functional teams and deliver complex projects on schedule.
- Excellent communication skills for both technical and non-technical audiences.
- High level of organization, self-motivation, and problem-solving aptitude.
Machine Learning (ML) Integration
- Familiarity with end-to-end ML workflows and model deployment on GCP (e.g., Vertex AI).
Advanced Observability
- Experience with Prometheus, Grafana, Datadog, or New Relic for system health and performance monitoring.
Security & Compliance
- Advanced knowledge of compliance frameworks such as HIPAA, SOC 2, or relevant regulations.
Real-Time Data Architectures
- Additional proficiency in Kafka, Spark Streaming, or other streaming solutions.
Certifications
- GCP-specific certifications (e.g., Google Professional Data Engineer) are highly desirable.
Strategic Influence
- Opportunity to shape the entire GCP-based data infrastructure and define best practices for a high-growth organization.
Competitive Compensation
- Aligned with your leadership experience and market benchmarks.
Professional Growth
- Access to training, conferences, and certification programs.
Innovative Environment
- Collaborate with forward-thinking professionals who value data-driven insights and modern engineering practices.
Flexible Work Arrangements
- Remote and hybrid options to support work-life balance.
If you are eager to lead a data engineering team in designing scalable GCP data solutions and have a passion for architecting high-performance pipelines with Airflow, BigQuery, dbt, Terraform, RabbitMQ, and more, we invite you to join our team!