inkl

Technology

Gerald Miller

Igor Taldenko: "A Website Outage Today Is an Event That Affects People's Lives"

An engineer with over 12 years of experience working with high-load systems explains why a website outage is no longer just a bug — it's an event capable of derailing transactions, delaying payments, and even impacting people's health — and how to build architecture that minimizes these risks.

According to recent European studies on digital behavior, the internet for most users is primarily a tool for handling important tasks. Around 94% of users regularly rely on online banking or fintech services — making it one of the most widespread digital habits, second only to email. Meanwhile, approximately 68% of people search for medical information online as their first step when making health-related decisions.

Against this backdrop, the cost of digital system failures has grown significantly. When it comes to entertainment — watching videos, browsing memes, or chatting — outages tend to be seen as annoying but not critical: you can simply close the tab and find another source. But when money, health, or other vital services are involved, even a few minutes of downtime can have real consequences — a missed payment or a decision someone didn't get to make in time.

We spoke with Igor Taldenko, Technical Lead at Wayfinder-ux and Senior Software Engineer at Velocity Global LLC, about how the engineer's role is changing in this environment and why architecture is becoming an ethical zone of responsibility. A specialist with over 12 years of experience in high-load and distributed systems, he is a member of AITEX, a professional IT community that brings together engineers and experts from around the world. Igor has worked with the international pharmaceutical companies Pfizer and Johnson & Johnson, and is also involved in the development of Two Homes, a co-parenting platform for families navigating life after divorce.

In this interview, he discusses how architectural mistakes can quietly affect users' psychological wellbeing, who bears responsibility when a bug in the code scales to millions of users simultaneously, and what approaches in his work help reduce these risks.

- Igor, you've spent over 12 years working with the architecture of different types of systems — your experience spans e-commerce, where architecture is primarily about business metrics, and systems with high social significance, where the person, their safety and comfort, come first. What's the main difference in how you approach their architecture?

- The architecture of these systems differs primarily in priorities. In e-commerce, a certain degree of functional degradation is acceptable: if the recommendation or personalization engine temporarily goes down, it's unpleasant, but the core business process — placing an order — keeps working.

In systems with high social significance, that's no longer an option. Here it's critical that the main user scenarios remain accurate, predictable, and available under any conditions — because when failures happen, it's not the business that suffers. It's people.

- You've worked with major international pharmaceutical companies, including Pfizer and Johnson & Johnson, where system failures can have consequences for people's health and lives. Are there additional layers of testing or architectural review in projects of that scale?

- Yes — beyond standard functional testing, scenario-based and load testing are widely used, along with failure simulation and testing how the system behaves under partial component failures. In engineering terms, several things become paramount: data integrity — guaranteeing that information in the system stays accurate and doesn't break during processing; strict consistency — ensuring data across all parts of the system matches and doesn't diverge; idempotency of operations — making sure the same action doesn't cause errors or duplicate effects even when repeated; and full auditability — the ability to precisely reconstruct who made what change, and when.
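Of the properties listed above, auditability is perhaps the easiest to picture in code. The following is a minimal illustrative sketch, not taken from any system discussed in the interview: the class names `AuditEntry` and `AuditedRecord` and the in-memory storage are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    """One immutable record of a change: who changed what, and when."""
    actor: str
    field: str
    old_value: Optional[str]
    new_value: str
    at: datetime


class AuditedRecord:
    """Hypothetical key-value record that logs every change it accepts."""

    def __init__(self) -> None:
        self._values = {}
        self._log = []

    def set(self, actor: str, field: str, value: str) -> None:
        entry = AuditEntry(actor, field, self._values.get(field), value,
                           datetime.now(timezone.utc))
        # The log is append-only: entries are added, never edited or removed.
        self._log.append(entry)
        self._values[field] = value

    def history(self, field: str) -> list:
        """Return every change made to one field, oldest first."""
        return [e for e in self._log if e.field == field]
```

Calling `history("status")` on such a record returns the full chain of changes, so the question "who made what change, and when" can be answered from the log alone. A production system would typically also persist the log durably and protect it from tampering, which this sketch does not attempt.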

Over time I've come to believe that more rigorous pre-launch testing is less about a specific industry and more about overall approach. System reliability isn't just a technical property — it's a reflection of how its creators regard the user: whether they respect their time and trust, or don't. Users don't see this directly, but they always feel it.

- Could you say more about what architectural decisions you'd call "engineering respect" for the user?

- Say you're paying for a purchase and accidentally double-tap the button — but your card is only charged once. Behind that is idempotency of operations: the system is built so that the same action can't accidentally lead to errors like duplicate charges.
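One common way to get this behavior is an idempotency key: the client attaches the same unique key to every retry of one payment attempt, and the server returns the stored result instead of charging again. The sketch below is a simplified in-memory illustration under that assumption; `PaymentService`, `charge`, and the key format are hypothetical names, not an API described in the interview.

```python
import uuid


class PaymentService:
    """Toy in-memory payment service that deduplicates by idempotency key."""

    def __init__(self) -> None:
        self._results = {}      # idempotency key -> result of the first call
        self.charges_made = 0   # how many real charges were performed

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A key we have already seen means a retry or a double-tap:
        # return the original result and do not charge a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": str(uuid.uuid4()), "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result
```

With this in place, two calls to `charge("checkout-123", 1999)` produce one charge and two identical responses. In a real service the check and the write would need to be atomic, for example through a unique constraint in the database, since two requests can arrive at the same instant.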

Another example: you're filling out a long form, you submit something, and the page crashes mid-transaction. When you come back, your data is still there and you can pick up right where you left off. That's proper error handling with state preservation — the system "remembers" which step you were on and doesn't make you start over.
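State preservation of this kind can be sketched as a small draft store that saves the current step and the entered fields after each change. This is an illustrative example only: `FormDraftStore`, the JSON file layout, and the user identifier are assumptions, and a real product would more likely keep drafts in a database or in browser storage.

```python
import json
import tempfile
from pathlib import Path


class FormDraftStore:
    """Saves a multi-step form's progress so it survives a crash or reload."""

    def __init__(self, directory: Path) -> None:
        self._dir = directory

    def _path(self, user_id: str) -> Path:
        return self._dir / f"{user_id}.json"

    def save(self, user_id: str, step: int, fields: dict) -> None:
        # Write to a temporary file and then rename it, so a crash in the
        # middle of a write cannot leave a half-written draft behind.
        tmp = self._path(user_id).with_suffix(".tmp")
        tmp.write_text(json.dumps({"step": step, "fields": fields}))
        tmp.replace(self._path(user_id))

    def load(self, user_id: str) -> dict:
        path = self._path(user_id)
        if not path.exists():
            return {"step": 1, "fields": {}}  # no draft yet: start at step 1
        return json.loads(path.read_text())


# Usage: the second store instance stands in for the page after a reload.
with tempfile.TemporaryDirectory() as d:
    FormDraftStore(Path(d)).save("user-1", step=3, fields={"name": "Ada"})
    restored = FormDraftStore(Path(d)).load("user-1")
```

After the simulated reload, `restored` holds step 3 and the saved fields, so the user can continue from where they stopped rather than from the beginning.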

All of this ultimately comes down to one thing: even when something goes wrong in the system, the user shouldn't have to pay for it with their time, their money, or extra stress. And as a rule, this becomes apparent through experience. Everyone has had a moment when an important service stopped working at the worst possible time — and it threw them completely off balance.

- You've worked on the architecture of high-load systems. Have you encountered situations where a technical failure caused users not just inconvenience but an emotional reaction, such as anxiety or tension, and where this showed up in indirect business metrics?

- You can't directly measure those states, but you can estimate them fairly accurately through behavioral and product signals. This involves baseline metrics — latency and error rate — as well as user perception indexes like Apdex. On top of that, RUM metrics (Real User Monitoring) are used, which reflect users' actual experience with the product.

Let me put that in plain terms: say a site is technically up, but pages load slowly. That's latency, the delay before the system responds. Or you click the payment button and it seems to register, but nothing happens. That's a failed request, and the share of such failures is the error rate. RUM metrics are when we look at the product through real users' eyes: where they have to wait for a page to load, at what point they give up and close it, and at which step they weren't able to complete an action.

These technical data points are then mapped against business metrics — conversion, retention, engagement — and behavioral patterns. This lets you see a direct connection between degrading technical indicators and a worsening user experience.
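For readers who want the metrics above in concrete form, here is a minimal sketch of how Apdex and error rate can be computed from raw samples. The Apdex formula follows the standard definition (satisfied requests count fully, tolerating requests count half, frustrated requests count zero). The function names, the 0.5-second threshold, and the sample values are illustrative assumptions.

```python
def apdex(response_times_s: list, threshold_s: float) -> float:
    """Apdex = (satisfied + tolerating / 2) / total samples.

    satisfied:  response time <= T
    tolerating: T < response time <= 4T
    frustrated: response time > 4T (contributes nothing)
    """
    if not response_times_s:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    tolerating = sum(1 for t in response_times_s
                     if threshold_s < t <= 4 * threshold_s)
    return (satisfied + tolerating / 2) / len(response_times_s)


def error_rate(status_codes: list) -> float:
    """Share of requests that ended in a server error (HTTP 5xx)."""
    if not status_codes:
        raise ValueError("need at least one request")
    failed = sum(1 for code in status_codes if 500 <= code <= 599)
    return failed / len(status_codes)


# Five hypothetical page loads measured in seconds, with T = 0.5 s:
# two satisfied (0.2, 0.4), two tolerating (0.9, 1.5), one frustrated (3.0).
score = apdex([0.2, 0.4, 0.9, 1.5, 3.0], threshold_s=0.5)  # (2 + 2/2) / 5 = 0.6
```

A score of 1.0 means every user was served within the target time, and a falling score is one of the signals that can be lined up against conversion or retention over the same period.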

This is especially important now, in an era of rapid AI development and automation. These technologies accelerate development, but they also accelerate the scaling of errors: where a bug used to affect one user or one scenario, it can now be instantly reproduced across thousands or millions of cases through identical logic or a shared model.

- Following on from that — where is the line between automation that improves efficiency and automation that becomes a source of new risks? How do you manage that balance in architecture?

- The line is where the system starts making irreversible or poorly controlled decisions without sufficient verification and observability. To manage that balance, I build architecture around the principle of controlled automation. For instance, changes are rolled out gradually rather than to all users at once, certain features can be quickly switched off without taking down the entire system, and critical components are isolated from each other so that a single error doesn't break everything at once.
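Two of the techniques mentioned here, gradual rollout and a quick off switch, are often combined in a feature flag. The sketch below shows one possible way to do that, assuming users are assigned to stable buckets by hashing their ID. `FeatureFlag`, `rollout_percent`, and `kill` are hypothetical names chosen for the example, not the tooling used in the projects discussed.

```python
import hashlib


class FeatureFlag:
    """Percentage-based rollout with a manual kill switch."""

    def __init__(self, name: str, rollout_percent: int) -> None:
        self.name = name
        self.rollout_percent = rollout_percent  # 0 to 100
        self.killed = False

    def kill(self) -> None:
        """Switch the feature off for everyone without a redeploy."""
        self.killed = True

    def is_enabled_for(self, user_id: str) -> bool:
        if self.killed:
            return False
        # Hash the flag name together with the user ID so each user lands
        # in a stable bucket from 0 to 99. Raising the percentage only adds
        # users; nobody who already has the feature loses it.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < self.rollout_percent
```

A rollout might start at 1 percent, move to 10 and then 50 as the metrics stay healthy, and call `kill()` the moment they do not. The kill switch is the manual override: a person can stop the feature at any point regardless of what the automated rollout is doing.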

The underlying idea is simple: automation shouldn't be unconditional. A human must always have a way to intervene and stop the process if something goes wrong — because in some systems this isn't just a bug, it's real consequences for real people. Velocity Global is a clear example: a logic failure there can affect employee onboarding, payroll calculations, or legal documents.

- You're involved in building the architecture for Two Homes, a co-parenting platform for families after divorce. Why does this topic matter to you personally, and what drew you to it as an engineer?

- For me, products like this always involve a heightened concentration of meaning in the details. With Two Homes — where the goal is to reduce tension between parents and create a clear structure for co-parenting — any imprecision in the logic, notifications, or user flows can amplify that tension rather than ease it. In moments like that, you feel the engineer's responsibility very sharply.

Over time I've also developed a simple internal understanding: every specialist, in whatever field, is working with the consequences of their work in some form. And we always have a choice — either to make those consequences more complicated, or to make them more predictable and safer. For me, being conscious of that responsibility and bringing it into my work is what matters.
