Date: 30/04/2024
Authors: André Chaves, Eduardo Milhomen, Douglas Ferreira
Status: Solved
Impact: Partial latency on document creation
The latency was primarily caused by requests being held open for extended periods without being processed in a timely manner.
The issue was first detected through a DataDog alert that reported excessive latency on the Itau endpoints.
To resolve the issue, we scaled up the number of pods handling the traffic and implemented a strict time limit on the endpoint. Now, no request is held for more than 60 seconds. This adjustment has stabilized the document creation process and resolved the latency issues.
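As an illustration of the time limit described above, the following is a minimal sketch and not the actual implementation used in the fix. It assumes an asyncio-based Python service; the names `with_time_limit` and `REQUEST_TIMEOUT_SECONDS` and the 504 status returned on timeout are hypothetical choices made for the example. Only the 60-second limit comes from this report.

```python
import asyncio

# 60 seconds is the limit stated in this report; the constant name is hypothetical.
REQUEST_TIMEOUT_SECONDS = 60


async def with_time_limit(handler, *args, timeout=REQUEST_TIMEOUT_SECONDS):
    """Run an async request handler and stop waiting once the time limit passes.

    Returns a (status, body) tuple. Using 504 for a timed-out request is an
    assumption made for this sketch, not something the report specifies.
    """
    try:
        # wait_for cancels the handler coroutine if it exceeds the timeout.
        body = await asyncio.wait_for(handler(*args), timeout=timeout)
        return 200, body
    except asyncio.TimeoutError:
        return 504, "request exceeded time limit"
```

Wrapping a handler this way means a slow request is cancelled and answered with an error once the limit is reached, instead of occupying a worker indefinitely.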
The monitoring application alerted us to the latency promptly.
It took us longer than expected to identify the root cause of the problem.
Despite the significant latency, the impact was confined to partial delays on document creation rather than a complete system outage.