Data Engineer

Involve Asia

AI-generated summary

This is a Data Engineer role at Involve Asia. You might like this job because you’ll work with large datasets and build systems that help teams find useful insights, using tools such as Python, EMR, and Kubernetes.

Undisclosed

Menara MBMR, Kuala Lumpur

Full-Time

Job Description

We’re looking for a Junior/Mid Level Data Engineer who is curious, technically hands-on, and eager to grow into building reliable production-grade data systems.

You’ll work with senior engineers, analysts, product teams, and business stakeholders to build, maintain, and improve data pipelines and datasets used for analytics, reporting, and operational decision-making.

This role suits someone who enjoys going beyond surface-level fixes: tracing data issues across systems, validating assumptions, reading pipeline logic, improving data quality checks, and learning how to operate data systems at scale.

You do not need to know everything from day one, but you should be comfortable learning quickly, asking good questions, writing clean SQL and Python, and taking ownership of assigned work.

WHAT YOU’LL DO

Build and maintain data pipelines

Design, build, and maintain data pipelines that ingest, transform, validate, and serve data for analytics and business use cases.
Work with batch data processing workflows, and gradually learn streaming or CDC-based patterns where applicable.
Support end-to-end pipeline development from source ingestion to data lake storage, transformation, modeling, and serving layers.
Write SQL and Python code for data transformation, automation, validation, and pipeline support.
Help improve pipeline reliability, performance, and maintainability over time.

Support data infrastructure and orchestration

Work with workflow orchestration tools such as Apache Airflow to schedule, monitor, and troubleshoot data jobs.
Support workloads running on compute clusters and data lake environments.
Help maintain datasets, tables, partitions, schemas, and transformation logic used by analytics and reporting teams.
Assist in improving data pipeline documentation, runbooks, and operational playbooks.

Data quality, reliability, and troubleshooting

Investigate pipeline failures, data discrepancies, freshness issues, and unexpected metric changes.
Perform root-cause analysis by checking source data, transformation logic, job logs, SQL queries, and downstream reports.
Build or improve data quality checks for completeness, freshness, accuracy, duplication, and anomaly detection.
Work with analysts, product teams, and engineers to clarify expected data behavior and resolve issues.
Help implement fixes that prevent recurring problems, not just temporary patches.

Engineering practices

Write clean, readable, modular, and maintainable code.
Participate in code reviews and learn good engineering practices such as testing, version control, dependency management, and CI/CD.
Follow team standards for naming, documentation, data modeling, and pipeline development.
Contribute to technical documentation, including data flow notes, pipeline logic, data contracts, and troubleshooting guides.

Collaboration and communication

Partner with Data Analysts, BI users, Product, Engineering, and Operations teams to understand data needs and translate them into reliable datasets and pipelines.
Explain data issues, pipeline behavior, and trade-offs clearly to both technical and non-technical stakeholders.
Raise risks early when data quality, pipeline stability, or delivery timelines may be affected.

Job Requirements

WHAT WE’RE LOOKING FOR

Technical Competencies — Junior Level

Comfortable writing SQL queries involving joins, CTEs, aggregations, and filtering, with basic awareness of query performance.
Able to write Python scripts for data transformation, automation, validation, or analysis.
Basic understanding of data pipelines, ETL/ELT concepts, and data warehousing or data lake concepts.
Familiarity with version control, preferably Git.
Basic understanding of data quality concepts such as freshness, completeness, accuracy, duplication, and anomaly checks.
Willingness to learn orchestration tools such as Apache Airflow and distributed processing concepts.

Technical Competencies — Mid Level

Solid experience building or maintaining production data pipelines.
Strong SQL skills, including query optimization awareness and data modeling considerations.
Good Python coding ability with attention to clean, reusable, and testable code.
Hands-on experience with workflow orchestration tools, preferably Apache Airflow.
Experience working with data lakes, warehouses, or large-scale analytical datasets.
Understanding of data modeling concepts such as OLTP vs OLAP, partitioning, fact/dimension tables, and how models affect usability and performance.
Able to troubleshoot pipeline failures, performance issues, and data quality problems with minimal supervision.
Familiarity with observability concepts such as logs, metrics, alerts, SLAs/SLOs, and pipeline monitoring.

Behavioural Competencies

Strong curiosity and willingness to learn deeply.
Enjoys solving ambiguous technical problems and tracing issues to root cause.
Strong sense of ownership over assigned pipelines, tasks, and data quality.
Analytical thinking and attention to detail.
Clear communication with both technical and non-technical stakeholders.
Able to work under guidance while progressively taking more independent ownership.
Comfortable asking questions, receiving feedback, and improving through code reviews.
Reliable, structured, and proactive in following through on issues.

QUALIFICATIONS

Junior Level

0–2 years of experience in data engineering, software engineering, analytics engineering, BI engineering, or a strong portfolio of data projects.
Bachelor’s degree in Computer Science, Software Engineering, Data Science, Statistics, Mathematics, Engineering, Economics, or equivalent practical experience.
Internships, academic projects, freelance work, or side projects involving SQL, Python, pipelines, automation, or data processing are welcome.

Mid Level

2–5 years of experience in data engineering, analytics engineering, software engineering, or production data systems.
Experience building, maintaining, or operating data pipelines in a production or business-critical environment.
Strong practical experience with SQL, Python, orchestration, and data quality practices.

Skills

Java (Programming Language)

Python (Programming Language)

SQL (Programming Language)

Data System

Company Benefits

Others

Unlimited snacks and beverages provided!

Health

You will enjoy inpatient, outpatient, optical, dental, health screening, and physiotherapy benefits!

Education

You will get the chance to be sponsored for your education and professional membership fees!

Lifestyle

One day off on your birthday! A laptop allowance is provided if you use your own laptop!

Family Focus

Maternity leave of up to 100 days and paternity leave of up to 10 days. Marriage leave is provided too!

Flexibility

Work your way! Enjoy flexible working hours, casual dress code, and a day of WFH each week!

Additional Info

Experience Level

1 - 3 Years of Experience

Career Level

Junior Executive

Job Specialisation

Data Science & Analytics

Company Profile

Involve Asia

Open and transparent culture
Good opportunities for learning and working directly with the founders
Flexible working hours
Free snacks and beverages in the office
Dental and optical benefits (upon confirmation)
Laptop allowance
Insurance for inpatient boarding
3 days marriage leave entitlement
Maternity leave up to 100 days
Paternity leave up to 10 days
Annual...