Relevant Experience: 4+ years
Tech Stack:
Programming: Strong proficiency in Python and advanced SQL, plus familiarity with Scala or Java.
Data Engineering Tools: Hands-on production experience with Apache Spark, Airflow, and dbt.
AI & GenAI Frameworks: Proven experience with LangChain, LlamaIndex, and HuggingFace Transformers.
Cloud Platforms: Solid hands-on experience with AWS, GCP, or Azure data services in production.
Data Warehousing: Proficiency in designing and optimizing Snowflake, BigQuery, or Redshift solutions.
Databases: Deep knowledge of relational databases such as PostgreSQL and MySQL, and of NoSQL systems such as MongoDB, Elasticsearch, or Cassandra.
Vector Search & Embeddings: Production-level optimization of Pinecone, Weaviate, and pgvector.
MLOps & Governance: Management of ML metadata and experiment tracking using MLflow and Weights & Biases.
Infrastructure & DevOps: Proficiency in Docker, Kubernetes, Terraform, and CI/CD tools such as GitHub Actions or Jenkins.
Data Architecture: Experience with modern architectures including Delta Lake and Iceberg.
Must Have:
Experience: 4–5 years in Data Engineering with deep expertise in Data Science and Generative AI.
Data Pipeline Skills: Ability to architect and own scalable ETL/ELT pipelines for structured and unstructured data.
GenAI Expertise: Hands-on production experience with RAG pipelines, LLM fine-tuning, prompt engineering, and embedding pipelines.
MLOps: Experience managing model deployment, experiment tracking, and model registries.
Education: Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related quantitative field.
Leadership: Ability to act as a technical lead, mentor junior engineers, and conduct code reviews.
Good to Have:
Languages: Familiarity with Scala or Java.
Advanced AI: Experience with multimodal AI systems or agentic AI frameworks such as AutoGen or CrewAI.
Streaming: Knowledge of Apache Kafka or Amazon Kinesis.
Governance: Experience with data mesh, data contract frameworks, or federated data governance.
Certifications: Cloud platform or AI/ML certifications (e.g., Google Cloud Professional Machine Learning Engineer, DeepLearning.AI).
Contributions: Published technical writing, conference talks, or open-source contributions in the Data/AI space.