I'm Vítek Urbanec, and I help engineering teams turn operational chaos into coordinated response. From incident management to operational readiness, I bring 13+ years of experience building resilient systems at scale.
Specialized support to transform your engineering operations
With Focus on Incident Response & Operational Resilience
Before your service goes live, ensure your team is ready when (not if) things go wrong. Comprehensive assessment of your production readiness—covering traditional services and AI systems—with deep focus on incident response capabilities.
Fixed Price: €15,000 | 4-week engagement
Optimize cloud spend, improve governance, and build sustainable infrastructure practices
Currently developing these offerings. Interested? Let's talk.
Since 2011, I've been building production readiness and operational resilience into some of the world's most critical systems. From incident response processes to operational standards, I've seen what works when systems fail and pressure is high.
Customer Engineer managing SAN/NAS storage incidents for major banks and financial institutions where downtime wasn't an option.
Managed incident response for Rackspace's largest accounts across Europe from London—handling data-related incidents where every minute of downtime mattered. Ensured capacity planning and operational resiliency of the hosted solutions.
Handled incident response for one of Europe's largest online betting platforms—systems that needed to stay up during major sporting events with millions of users. Developed observability and resiliency for the betting exchange and private OpenStack cloud.
Designed the incident management process from the ground up for Unity in Finland, taking them from ad-hoc firefighting to a structured, scalable approach.
Created standards for service owner teams on-call and operational readiness, establishing frameworks that enabled dozens of teams to own their services effectively at scale.
Co-founder of the SRE Finland meetup group, regularly presenting on incident management and post-mortem practices.
Practice incident response and improve operational readiness
Practical steps to improve your CDN resilience without expensive multi-CDN setups. Lessons from the Cloudflare outage.
Open Worksheet
Your guide to managing extreme load and operational stress during high-traffic events.
Read the Playbook
Navigate realistic incident scenarios and make decisions under pressure without real-world consequences.
Play the Game
Let's discuss how I can help your team build better systems and processes