Site Reliability Engineer (f/m/d) – Observability & Internal Tools
We usually respond within three days
Remote in our day-to-day work. On-site when it matters.
We work remotely by default – focused, efficient, and with full ownership. For larger features, architectural decisions, and real brainstorming sessions, we come together in Berlin or Cologne – fast, hands-on, and without unnecessary meeting overhead.
We use AI to accelerate – not to replace thinking.
We design the system, steer the output, and take responsibility for what we ship.
Fast where it makes sense. Careful where it matters.
Your Mission
Take full ownership of smartclip’s internal utility and platform tooling. Focus your energy on the intersection of observability, automation, and developer infrastructure. Don't just maintain existing systems – evolve them, research cutting-edge open-source alternatives, and implement them.
Forget expensive enterprise SaaS. Invest in deep in-house expertise. Understand our systems end-to-end, maintain total flexibility, and contribute back to the open-source ecosystem we depend on.
Face these challenges:
Build & Evolve: Operate and advance our observability and internal tooling stack (including Prometheus, Grafana, and Forgejo).
Go Open Source First: Replace "buy" decisions with robust "build & maintain" strategies.
Engineer the Platform: Design observability as a platform capability. Define SLOs and create actionable alerting to stop incidents before they start.
Secure the Stack: Embed security engineering into the delivery process. Find vulnerabilities before the pen tests do.
Master the Infrastructure: Navigate Linux systems and distributed tooling. Balance bold exploration with production stability.
Your Skills
Be motivated by systems thinking and deep technical curiosity. Stop being a consumer – start being a builder.
Must-haves:
Apply an Observability Mindset: Implement a clear strategy for metrics, logs, and traces. Transform "noisy alerts" into "actionable insights."
Embrace Ownership: Live the "you build it, you run it" philosophy. Stop the ticket ping-pong and end the excuses.
Nice-to-haves:
Design and evolve production-grade setups on GCP or AWS.
Show us your contributions to open-source projects.
Turn your passion for root-cause analysis into blameless post-mortems.
Why you’ll love working with us
Ownership over tickets: You’re trusted with real responsibility, not just tasks. No unnecessary bureaucracy, no micromanagement – we rely on you to take things forward.
Build > Talk: We test what works – not what sounds good. Fail fast, learn faster.
High standards, low ego: We take our work seriously, but not ourselves. Direct feedback, honest collaboration, no drama.
Stay sharp: Hackathons, conferences, community – we invest in your growth and keep you at the cutting edge.
Remote flexibility, in person when it matters: You work remotely and flexibly, with a connection to our Berlin or Cologne locations, home to our TV Labs, where we experiment, build, and learn together.
And yes – the fundamentals are covered too: 30 days of vacation + Dec 24 & 31 off, Smart Fridays (4-day week possible), mobility (Deutschlandticket & JobRad), sports & health offerings, mental health support, corporate benefits, RTL+ access, and more.
Your CV is just the starting point.
What matters more to us than your resume: a portfolio, a side project, a demo repo – anything that shows you don’t just talk about code, you ship it. Production-ready. Thought through. Done.
smartclip is committed to creating a diverse and inclusive environment. All qualified applicants will receive consideration for employment without regard to race, ethnicity, nationality, age, gender, gender identity, religion, sexual orientation, disability, or any other protected characteristic.
- Department: System Operations
- Role: Site Reliability Engineer
- Locations: Berlin
- Remote status: Fully Remote