Senior Data Engineer - EC

Taipei · LINE Taiwan · Engineering · Server-side · Full-time

We are seeking a strong and passionate data engineer to work closely with key stakeholders, data scientists, machine learning engineers and platform engineers on large-scale system implementation. With a focus on complex data pipelines for e-commerce services at LINE, you should be able to design and drive large projects (e.g., a cross-domain recommender system) from scratch to production. In addition to extracting and transforming data, you will be expected to use your expertise to build extensible data models and to provide workable, efficient strategies to partner data scientists for data cleansing, performance enhancement and the development of data products.

 

Responsibilities

  • Collaborate with engineers, program managers and data scientists to understand data needs.
  • Design, build and launch (schedule) efficient and robust data pipelines to move and transform data.
  • Deploy and monitor comprehensive data quality checks, in collaboration with data scientists, to ensure statistically high data quality.
  • Troubleshoot and tune the performance of data pipelines.
  • Create data sources for BI tool use.

 

Basic Qualifications

  • BS/MS in Computer Science or a related technical field.
  • Expertise in scripting languages such as Python and shell script.
  • Extensive experience with OOP in Python.
  • A minimum of 5 years of experience with SQL and relational databases.
  • A minimum of 3 years of experience in custom ETL design, implementation and maintenance.
  • A minimum of 3 years of experience with workflow management engines (e.g., Airflow).
  • A minimum of 3 years of experience with the Hadoop ecosystem (e.g., Hive, Spark, YARN).
  • Experience with SQL performance tuning and end-to-end process optimization.
  • Experience with cloud big data platforms (e.g., AWS Redshift, Google BigQuery).
  • Experience with Kubernetes and CI/CD tools.

 

Preferred Qualifications

  • Strong understanding of Spark, with a specific focus on optimization, data manipulation (RDDs, Spark SQL queries) and persistence (HDFS, Parquet).
  • Experience handling ETL processes for e-commerce data.
  • Familiarity with NoSQL databases (e.g., MongoDB, Redis).
  • Familiarity with querying massive datasets using Spark, Presto or Hive.
  • Familiarity with scrum-based development.
  • Experience with streaming data using tools such as Spark or Kafka.
  • Experience with data cleansing, especially anomaly/outlier detection.
  • Experience with migrating data warehouses and data processing workloads.
