Senior Data Engineer - Data Lakehouse Infrastructure
About the Role
We are seeking a Senior Data Engineer to join our team at TRM Labs, a blockchain intelligence company dedicated to building a safer world. In this remote position, you will play a crucial role in architecting and scaling our data lakehouse infrastructure, which supports complex analytical workloads and real-time data pipelines.
What You'll Do
- Design and implement core components of our lakehouse architecture, ensuring high performance and scalability.
- Optimize query performance and manage metadata using tools such as Apache Spark, Trino, and Snowflake.
- Collaborate with cross-functional teams, including data scientists and product managers, to deliver impactful data solutions.
- Develop and orchestrate ETL/ELT pipelines using Apache Airflow and GCP-native tools (a brief illustrative sketch follows this list).
- Architect a high-performance data lakehouse on GCP, leveraging technologies like BigQuery and Kafka.
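For illustration only, here is a minimal sketch of the kind of Airflow DAG this role would develop and orchestrate. The DAG id, schedule, and task logic are hypothetical assumptions, not a description of TRM's actual pipelines.

```python
# Minimal Airflow 2.x DAG sketch; names and task bodies are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_events(**context):
    # Placeholder extract step; in practice this could read from Kafka or GCS.
    print("extracting events for", context["ds"])


def load_to_bigquery(**context):
    # Placeholder load step; in practice this could use a BigQuery operator
    # from the Google provider package instead of raw Python.
    print("loading partition", context["ds"])


with DAG(
    dag_id="example_lakehouse_elt",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # "schedule" is the Airflow 2.4+ parameter name
    catchup=False,
):
    extract = PythonOperator(task_id="extract_events", python_callable=extract_events)
    load = PythonOperator(task_id="load_to_bigquery", python_callable=load_to_bigquery)

    extract >> load
```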
Requirements
- 5+ years of experience in data or software engineering, focusing on distributed data systems.
- Proven experience building and scaling data platforms on GCP.
- Strong command of query engines such as Trino, Spark, or Snowflake.
- Exceptional programming skills in Python and SQL.
- Hands-on experience with Airflow for workflow orchestration.
Nice to Have
- Experience with modern table formats like Apache Hudi or Iceberg (see the sketch after this list).
- Familiarity with data governance frameworks.
- Knowledge of streaming and batch processing methodologies.
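As a hedged illustration of the table-format item above, the following PySpark sketch writes a DataFrame to an Apache Iceberg table. The catalog name, warehouse path, and table name are assumptions, and the Iceberg Spark runtime jar would need to be on the Spark classpath.

```python
# Minimal PySpark + Iceberg sketch; all names and paths are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-example")
    # Register an Iceberg catalog named "demo" backed by a warehouse path.
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "gs://example-bucket/warehouse")
    .getOrCreate()
)

df = spark.createDataFrame([(1, "btc"), (2, "eth")], ["id", "asset"])

# createOrReplace() writes the data plus Iceberg's snapshot metadata, which is
# what enables time travel and safe schema evolution on the table.
df.writeTo("demo.analytics.assets").createOrReplace()
```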
What We Offer
- Competitive salary range of $190,000 - $220,000 per year.
- Opportunity to participate in TRM’s equity plan.
- Flexible remote work environment with async communication.
- Collaborative team culture focused on innovation and problem-solving.
- Professional development opportunities and mentorship.
This role offers a unique opportunity to work on impactful projects in a remote-first environment with competitive compensation.
Who Will Succeed Here
- Proficient in Python and SQL, with a strong understanding of data modeling and ETL processes and the ability to design efficient data pipelines using Apache Spark and Trino for real-time analytics.
- Self-motivated and disciplined, able to work independently in a remote environment while managing multiple projects and deadlines.
- Deep understanding of cloud platforms, specifically GCP, with experience in data orchestration tools like Apache Airflow and a drive to keep optimizing the data lakehouse architecture.