Available for SRE Roles

Aarju Raj Arya

Senior Incident Management Engineer | SRE

I Resolve Critical Production Incidents Before They Impact Business

Handling P1/P2 incidents for enterprise clients (Walmart, Titan Group, Hilti), leading war rooms, reducing MTTR, and restoring production systems under strict SLA pressure.

Incident Console

Production Reliability View

Systems stable
War rooms: P1/P2 coordination
Observability: Grafana + New Relic
Escalation: Structured triage flow
Focus: Restore service fast

Distributed systems response

Cross-team incident command

Production reliability ownership

Incidents managed

11,500+

SLA compliance

98.5%

MTTR reduction

25%

Uptime maintained

99.9%

Operational Impact

Incident ownership built for enterprise production pressure.

Execution built for enterprise production systems where downtime directly affects business operations.

Managed 11,500+ production incidents across enterprise SaaS systems.

Led 100+ monthly P1/P2 war rooms across distributed environments.

Reduced MTTR by 25% through structured triage and escalation workflows.

Maintained 99.9% uptime across critical business systems.

Achieved 98.5% SLA compliance across high-volume support queues.

Built Grafana dashboards improving detection and response speed.

Troubleshot API failures with Postman and traced root causes through log analysis.

Fast detection -> Controlled response -> Reliable recovery

Experience Timeline

Execution under pressure, written as results.

Short, scan-friendly bullets built from enterprise production work.

Incident Management Engineer

FarEye Technologies

2021 - Present

Noida, India

Owned incident response for enterprise production systems handling high-volume traffic and critical business operations.

Managed enterprise clients: Walmart, Titan Group, and Hilti.

Led P1/P2 war rooms across distributed SaaS systems.

Coordinated Dev, QA, Infra, and Product teams during outages.

Reduced MTTR by 25% through structured escalation workflows.

Maintained 99.9% uptime through proactive monitoring.

Built Grafana dashboards improving detection speed.

Created RCA reports and drove incidents to closure.

Conducted Monthly Service Reviews (MSR).

WordPress Developer

Freelancer

2019 - 2021

Remote

Delivered and deployed responsive client websites.

Handled production defects and release issues on live sites.

Managed hosting, migrations, and uptime monitoring.

Case Studies

Production incidents structured into problem, action, result.

Open each case study for a deeper incident breakdown covering detection, response, and resolution.


Contact

Hiring for SRE / Incident Management?

I help teams reduce downtime, improve SLA performance, and handle critical production incidents under pressure.

Available for immediate opportunities.

What I Bring

Incident ownership across enterprise production systems.

SLA-focused communication during high-pressure response.

Monitoring, triage, RCA, and recovery coordination.