Job Summary
We are seeking a skilled Data Engineer with 3 to 5 years of experience building scalable data pipelines and solutions, and strong hands-on expertise in Databricks. The ideal candidate is proficient with large-scale data processing frameworks and has a solid understanding of Delta Lake, PySpark, and cloud-based data platforms.
Key Responsibilities:
- Design, build, and maintain robust ETL/ELT pipelines using Databricks (PySpark/SQL); a brief illustrative sketch follows this list.
- Develop and optimize data workflows and pipelines on Delta Lake and the Databricks Lakehouse architecture.
- Integrate data from multiple sources, ensuring data quality, reliability, and performance.
- Collaborate with data scientists, analysts, and business stakeholders to translate requirements into scalable data solutions.
- Monitor and troubleshoot production data pipelines; ensure performance and cost optimization.
- Work with DevOps teams for CI/CD integration and automation of Databricks jobs and notebooks.
- Maintain metadata, documentation, and versioning for data pipelines and assets.
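For context, the snippet below is a minimal, purely illustrative sketch of the kind of batch ETL work described above: reading raw data, applying a light data-quality step, and writing the result to a Delta table. It assumes a Databricks (or other Delta-enabled Spark) environment; all paths, column names, and table names (e.g., /mnt/raw/events, analytics.events) are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

# On Databricks a SparkSession is already provided as `spark`;
# this line simply reuses it when run as a standalone script.
spark = SparkSession.builder.appName("events_etl").getOrCreate()

# Read raw landing-zone data (hypothetical path).
raw = spark.read.format("json").load("/mnt/raw/events")

# Basic data-quality and enrichment steps (hypothetical columns).
cleaned = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_ts"))
)

# Persist as a partitioned Delta table (hypothetical schema/table).
(
    cleaned.write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("analytics.events")
)
```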
Required Skills:
- 3–5 years of experience in data engineering or big data development.
- Strong hands-on experience with Databricks (Notebooks, Jobs, Workflows).
- Proficiency in PySpark, Spark SQL, and Delta Lake.
- Experience working with Azure or AWS (preferably Azure services such as Data Lake Storage, Blob Storage, and Synapse).
- Strong SQL skills for data manipulation and analysis.
- Familiarity with Git, CI/CD pipelines, and job orchestration tools (e.g., Airflow, Databricks Workflows); see the orchestration sketch after this list.
- Understanding of data modeling, data warehousing, and data governance best practices.
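As a point of reference for the orchestration and CI/CD items above, the sketch below shows one common pattern: an Airflow DAG triggering an existing Databricks job. It is only an illustration and assumes Airflow 2.4+ with the Databricks provider installed; the DAG name, connection id, and job id are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

# Hypothetical daily DAG that triggers an existing Databricks job.
with DAG(
    dag_id="daily_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # assumes Airflow 2.4+ (older versions use schedule_interval)
    catchup=False,
) as dag:
    run_events_job = DatabricksRunNowOperator(
        task_id="run_events_job",
        databricks_conn_id="databricks_default",  # hypothetical Airflow connection
        job_id=12345,                             # hypothetical Databricks job id
    )
```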
Preferred Qualifications:
- Databricks certification (e.g., Databricks Certified Data Engineer Associate).
- Experience with Power BI, Snowflake, or Synapse Analytics is a plus.
- Exposure to streaming data pipelines (e.g., Kafka, Spark Structured Streaming); a minimal streaming sketch follows this list.
- Understanding of cost optimization and performance tuning in Databricks.
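For the streaming item above, the following is a minimal Structured Streaming sketch (Kafka to Delta). It is illustrative only and assumes a Databricks cluster (or a Spark build with the Kafka connector and Delta Lake); the broker address, topic, checkpoint path, and table name are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events_stream").getOrCreate()

# Read a Kafka topic as a stream (hypothetical broker and topic).
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka delivers key/value as binary; cast to strings for downstream use.
parsed = stream.select(
    F.col("key").cast("string").alias("event_key"),
    F.col("value").cast("string").alias("payload"),
    "timestamp",
)

# Continuously append to a Delta table with checkpointing (hypothetical names).
(
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events")
    .outputMode("append")
    .toTable("analytics.events_stream")
)
```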