Job Description
Roles:
- Work closely with the backend and analytics teams to design, build, and maintain efficient data pipelines.
- Collect, clean, and process large datasets from multiple sources.
- Assist in designing ETL (Extract, Transform, Load) processes for structured and unstructured data.
- Implement and optimize database schemas for scalability and performance.
- Collaborate with data scientists and developers to support analytics and ML workloads.
- Ensure data integrity, security, and accessibility across systems.
Responsibilities:
- Build and manage data workflows using open-source technologies (e.g., Apache Airflow, Apache Spark, dbt).
- Develop and optimize SQL queries for data extraction and transformation.
- Support the integration of APIs and external data sources into the central data warehouse.
- Create automated scripts for data validation, monitoring, and error handling.
- Participate in code reviews, documentation, and testing of data systems.
- Troubleshoot and resolve performance or data quality issues in production pipelines.
Requirements:
- Pursuing a degree in Computer Science, Information Technology, or a related field.
- Strong understanding of SQL and relational databases (PostgreSQL preferred).
- Familiarity with Python for data processing (Pandas, NumPy).
- Basic knowledge of ETL concepts, data warehousing, and APIs.
- Experience with Linux, Git, and command-line tools.
- Knowledge of cloud platforms (AWS, GCP, or Azure) is a plus.
- Strong problem-solving mindset, with attention to detail and an eye for scalability.