Job Summary:
We are seeking a skilled Data Engineer with 3 to 5 years of experience building scalable data pipelines and solutions, and strong hands-on expertise in Databricks. The ideal candidate is proficient with large-scale data processing frameworks and has a solid understanding of Delta Lake, PySpark, and cloud-based data platforms.

Key Responsibilities:
  • Design, build, and maintain robust ETL/ELT pipelines using Databricks (PySpark/SQL); a minimal illustrative sketch follows this list.
  • Develop and optimize data workflows and pipelines on Delta Lake and the Databricks Lakehouse architecture.
  • Integrate data from multiple sources, ensuring data quality, reliability, and performance.
  • Collaborate with data scientists, analysts, and business stakeholders to translate requirements into scalable data solutions.
  • Monitor and troubleshoot production data pipelines; ensure performance and cost optimization.
  • Work with DevOps teams for CI/CD integration and automation of Databricks jobs and notebooks.
  • Maintain metadata, documentation, and versioning for data pipelines and assets.
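
To give candidates a flavor of the pipeline work described above, here is a minimal PySpark/Delta Lake sketch that reads raw files, applies simple quality checks, and appends to a Delta table. All paths, table names, and columns are hypothetical and are shown only to illustrate the style of work, not an existing codebase.

    # Minimal PySpark/Delta Lake ETL sketch (illustrative only; the storage
    # path, table name, and columns below are hypothetical).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    # Extract: read raw JSON landed in cloud storage (hypothetical ADLS path).
    raw = spark.read.json("abfss://landing@account.dfs.core.windows.net/orders/")

    # Transform: basic data-quality filtering and type normalization.
    clean = (
        raw.filter(F.col("order_id").isNotNull())
           .dropDuplicates(["order_id"])
           .withColumn("order_ts", F.to_timestamp("order_ts"))
           .withColumn("order_date", F.to_date("order_ts"))
    )

    # Load: append to a Delta table, partitioned for downstream query performance.
    (
        clean.write.format("delta")
             .mode("append")
             .partitionBy("order_date")
             .saveAsTable("analytics.orders_clean")
    )

In practice, pipelines of this kind are scheduled and monitored through Databricks Jobs/Workflows and version-controlled alongside the rest of the codebase.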

Required Skills:
  • 3–5 years of experience in data engineering or big data development.
  • Strong hands-on experience with Databricks (Notebooks, Jobs, Workflows).
  • Proficiency in PySpark, Spark SQL, and Delta Lake.
  • Experience working with Azure or AWS (preferably Azure Data Lake, Blob Storage, Synapse, etc.).
  • Strong SQL skills for data manipulation and analysis.
  • Familiarity with Git, CI/CD pipelines, and job orchestration tools (e.g., Airflow, Databricks Workflows).
  • Understanding of data modeling, data warehousing, and data governance best practices.

Preferred Qualifications:
  • Databricks certification (e.g., Databricks Certified Data Engineer Associate).
  • Experience with Power BI, Snowflake, or Synapse Analytics is a plus.
  • Exposure to streaming data pipelines (e.g., using Kafka, Structured Streaming); see the streaming sketch after this list.
  • Understanding of cost optimization and performance tuning in Databricks.
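
For the streaming exposure noted above, a minimal Structured Streaming sketch that reads from a Kafka topic and appends to a Delta table with checkpointing. The broker address, topic, checkpoint path, and table name are hypothetical placeholders.

    # Minimal Structured Streaming sketch (illustrative only). Broker, topic,
    # checkpoint path, and table name are hypothetical; the Kafka connector is
    # assumed to be available on the cluster (it ships with the Databricks runtime).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("events_stream").getOrCreate()

    # Source: read a stream of events from Kafka; values arrive as raw bytes.
    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "events")
             .load()
             .select(
                 F.col("value").cast("string").alias("payload"),
                 F.col("timestamp").alias("event_ts"),
             )
    )

    # Sink: append to a Delta table; the checkpoint tracks progress so the
    # stream can restart without reprocessing or losing data.
    query = (
        events.writeStream.format("delta")
              .option("checkpointLocation", "/tmp/checkpoints/events")
              .outputMode("append")
              .toTable("analytics.raw_events")
    )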