Site Reliability Engineer
The LINE Taiwan Site Reliability Engineering (SRE) team builds and provides the infrastructure that fuels LINE's services. We are the foundation on which LINE's software developers build their services. We are looking for passionate and hardworking Site Reliability Engineers to combine software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Familiarity with microservices architecture and container orchestration with Kubernetes.
- Expertise in DevOps engineering, including version control systems (Git), automated build and testing (Drone), and GitOps-based cluster management and application delivery (ArgoCD).
- Experience in managing and scaling distributed systems in a public, private, or hybrid cloud environment.
- Experience with monitoring tools such as Prometheus, log aggregation tools such as Loki, and distributed request tracing tools such as OpenTelemetry and Jaeger.
- Programming experience with one or more of the following: Go, Node.js, Python, Java.
- Experience with deploying, supporting and supervising new and existing services, platforms, and application stacks.
- Excellent troubleshooting and problem-solving skills.
- Experience with scale testing, disaster recovery, and capacity planning.
- Solid knowledge of the operating system networking stack, including TCP and UDP.
- Strong hands-on knowledge of Unix/Linux operating systems.