SDK-199 Auth Retry policy completely bypassed #987

joaodordio · 2025-12-15T17:21:57Z

🔹 Jira Ticket(s)

SDK-199

✏️ Description

Fixes auth retry for expired JWT refresh: scheduled token refresh no longer bypasses pauseAuthRetries or authRetryPolicy.maxRetry, preventing runaway/infinite onAuthTokenRequested loops.

Added unit tests to lock pause + maxRetry behavior on scheduled refresh, and reset retry count after a successful refresh is queued.

codecov · 2025-12-15T17:38:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.48%. Comparing base (f896dc7) to head (4a72ae5).

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #987      +/-   ##
==========================================
+ Coverage   69.45%   69.48%   +0.03%     
==========================================
  Files         109      109              
  Lines        8944     8953       +9     
==========================================
+ Hits         6212     6221       +9     
  Misses       2732     2732

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Ayyanchira

Made some comments

Ayyanchira · 2025-12-16T13:08:00Z

swift-sdk/Internal/AuthManager.swift

        let timeIntervalToRefresh = TimeInterval(expirationDate) - dateProvider.currentDate.timeIntervalSince1970 - expirationRefreshPeriod
        if timeIntervalToRefresh > 0 {
            scheduleAuthTokenRefreshTimer(interval: timeIntervalToRefresh, isScheduledRefresh: true, successCallback: onSuccess)
+            resetRetryCount()


Schedule wont necessarily guarantees successful retrieval of token. Should we reset the count here?

My understanding is that since the schedule is fired, we can reset the retries. regardless of its success.

swift-sdk/Internal/AuthManager.swift

Ayyanchira · 2025-12-16T13:10:43Z

swift-sdk/Internal/AuthManager.swift

+            if self.localStorage.email != nil || self.localStorage.userId != nil {
+                self.isInScheduledRefreshCallback = isScheduledRefresh
+                self.requestNewAuthToken(hasFailedPriorAuth: false, onSuccess: successCallback, shouldIgnoreRetryPolicy: isScheduledRefresh)
+                self.isInScheduledRefreshCallback = false


This part should be well tested manually as well

I'm leaning on these tests to cover this part:

Refresh is queued and callback fires (scheduled refresh path hits the timer closure and requests a token)

Skip refresh when there’s no email and no userId

Scheduled refresh respects retry limits (this is specifically exercising isInScheduledRefreshCallback + shouldPauseRetry logic invoked inside the timer closure when isScheduledRefresh == true)

Pausing retries blocks the refresh (this hits shouldSkipTokenRefresh before scheduling the timer)

# Conflicts: # swift-sdk/Internal/AuthManager.swift

sumeruchat · 2025-12-18T16:26:10Z

Pull Request Review Comment

Please Consider Refactoring to Option 1 from the Ticket

The current fix works, but I strongly recommend simplifying to the Option 1 approach that was outlined in the ticket description. Here's why:

The Core Problem

This bug exists because shouldIgnoreRetryPolicy: true bypasses safety limits (pauseAuthRetry and maxRetry). The ticket correctly identifies these as critical controls that prevent infinite loops and battery drain.

The question is: should we ever allow these safety limits to be bypassed?

Current Implementation: Selective Bypass

Your fix introduces a flag isInScheduledRefreshCallback to detect when we're in a scheduled refresh, then enforces safety limits only in that specific case:

private func shouldPauseRetry(_ shouldIgnoreRetryPolicy: Bool) -> Bool {
    if pauseAuthRetry {
        return true
    }
    
    // Only enforces maxRetry for scheduled refreshes
    if isInScheduledRefreshCallback {
        return retryCount >= authRetryPolicy.maxRetry
    }
    
    // Still allows bypass in other cases
    return retryCount >= authRetryPolicy.maxRetry && !shouldIgnoreRetryPolicy
}

This means: Any code path that calls requestNewAuthToken(shouldIgnoreRetryPolicy: true) outside of scheduled refreshes can still bypass maxRetry. You've fixed the symptom (scheduled refresh infinite loop) but left the backdoor open.

Option 1: Always Enforce Safety Limits

private func shouldPauseRetry(_ shouldIgnoreRetryPolicy: Bool) -> Bool {
    if pauseAuthRetry {
        return true
    }
    
    if retryCount >= authRetryPolicy.maxRetry {
        return true
    }
    
    return false
}

This means: No code path can bypass safety limits, ever. Period.

Why This Is Better

1. Prevents ALL infinite loops, not just scheduled refresh loops

If tomorrow someone adds code that calls requestNewAuthToken(shouldIgnoreRetryPolicy: true) in a different context (error handler, manual retry, etc.), your current fix won't protect them. Option 1 would.

2. No state management complexity

Your approach requires:

A new instance variable
Setting the flag before the call
Clearing the flag after the call
Future developers understanding this flag exists and what it means

Option 1 requires: none of that.

3. Thread-safe by default

Flag management patterns can be racy if methods get called from multiple threads. Option 1 has no such risk.

4. Matches the stated intent

The ticket says: "pauseAuthRetry and maxRetry are safety controls that should ALWAYS be respected."

Your implementation respects them sometimes. Option 1 respects them always.

5. Fewer lines of code

Current: 12 lines in shouldPauseRetry() + flag declaration + flag management
Option 1: 8 lines total

Simple code is maintainable code.

What About `shouldIgnoreRetryPolicy`?

I think there's confusion about what this parameter is supposed to do. Based on the name and ticket context, it seems like it was meant to bypass retry backoff delays, not safety limits.

If the intent is "try again immediately without waiting," that's fine. But that's different from "ignore maxRetry and keep trying forever."

Proposal: Safety limits (pause + maxRetry) should be absolute. If we need to bypass backoff timing, that's a separate concern and should be named/handled differently.

The Root Cause Matters

The ticket identifies the root cause perfectly:

"The shouldIgnoreRetryPolicy parameter was likely introduced to allow immediate retries in certain scenarios (like manual user-triggered refreshes), but inadvertently became part of the automatic token expiration flow, bypassing critical safety controls."

The fix shouldn't be to add a special case that detects when we're in the problematic flow. The fix should be to remove the ability to bypass safety controls because that's fundamentally dangerous regardless of the flow.

What If We Need to Bypass?

If there's a legitimate use case where we need to retry beyond maxRetry, that should be:

Explicitly documented in code comments
A separate code path with a different method name
Something we consciously choose, not accidentally enable via a boolean parameter

Right now, shouldIgnoreRetryPolicy: true is a loaded gun sitting in the codebase. Your fix adds a safety on one trigger, but the gun is still loaded.

Bottom Line

Option 1 is:

Simpler (less code, no state management)
Safer (no infinite loop scenarios, thread-safe)
Clearer (safety limits are always enforced)
More maintainable (future developers can't bypass accidentally)

The only downside is it changes behavior for all callers of shouldIgnoreRetryPolicy: true. But that's actually a feature because bypassing safety limits was the bug in the first place.

Please consider refactoring to Option 1. If there's a specific scenario that requires bypassing maxRetry, let's discuss it explicitly so we can design a proper solution.

sumeruchat

Agree with Claude that we should take the simpler approach @joaodordio

joaodordio · 2025-12-19T10:24:12Z

Agree with Claude that we should take the simpler approach @joaodordio

@sumeruchat We need to have a bigger discussion about this.

The solution you are recommending based on Claude, removes the shouldIgnoreRetryPolicy flag from the implementation.

We need to further discuss offline why we need that flag and what the purpose of it is.

joaodordio self-assigned this Dec 15, 2025

joaodordio marked this pull request as ready for review December 15, 2025 23:37

Ayyanchira reviewed Dec 16, 2025

View reviewed changes

joaodordio requested a review from Ayyanchira December 16, 2025 15:37

joaodordio force-pushed the fix/SDK-199-Auth-Retry-Policy-Completely-Bypassed branch from 5fd4de2 to 4de526e Compare December 17, 2025 12:17

🔧 Initial fix to respect retry policy

935883e

# Conflicts: # swift-sdk/Internal/AuthManager.swift

joaodordio force-pushed the fix/SDK-199-Auth-Retry-Policy-Completely-Bypassed branch from 4de526e to 935883e Compare December 17, 2025 15:44

🔀 Resolved conflicts

4a72ae5

sumeruchat self-requested a review December 18, 2025 16:26

sumeruchat requested changes Dec 18, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

SDK-199 Auth Retry policy completely bypassed #987

SDK-199 Auth Retry policy completely bypassed #987

Uh oh!

joaodordio commented Dec 15, 2025 •

edited by atlassian bot

Loading

Uh oh!

codecov bot commented Dec 15, 2025 •

edited

Loading

Uh oh!

Ayyanchira left a comment

Uh oh!

Ayyanchira Dec 16, 2025

Uh oh!

joaodordio Dec 16, 2025

Uh oh!

Uh oh!

Ayyanchira Dec 16, 2025

Uh oh!

joaodordio Dec 16, 2025

Uh oh!

sumeruchat commented Dec 18, 2025

Uh oh!

sumeruchat left a comment

Uh oh!

joaodordio commented Dec 19, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

SDK-199 Auth Retry policy completely bypassed #987

Are you sure you want to change the base?

SDK-199 Auth Retry policy completely bypassed #987

Uh oh!

Conversation

joaodordio commented Dec 15, 2025 • edited by atlassian bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔹 Jira Ticket(s)

✏️ Description

Uh oh!

codecov bot commented Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Ayyanchira left a comment

Choose a reason for hiding this comment

Uh oh!

Ayyanchira Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

joaodordio Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Ayyanchira Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

joaodordio Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

sumeruchat commented Dec 18, 2025

Pull Request Review Comment

Please Consider Refactoring to Option 1 from the Ticket

The Core Problem

Current Implementation: Selective Bypass

Option 1: Always Enforce Safety Limits

Why This Is Better

What About shouldIgnoreRetryPolicy?

The Root Cause Matters

What If We Need to Bypass?

Bottom Line

Uh oh!

sumeruchat left a comment

Choose a reason for hiding this comment

Uh oh!

joaodordio commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

joaodordio commented Dec 15, 2025 •

edited by atlassian bot

Loading

codecov bot commented Dec 15, 2025 •

edited

Loading

What About `shouldIgnoreRetryPolicy`?

joaodordio commented Dec 19, 2025 •

edited

Loading