Skip to content

Conversation

@joaodordio
Copy link
Member

@joaodordio joaodordio commented Dec 15, 2025

🔹 Jira Ticket(s)

✏️ Description

Fixes auth retry for expired JWT refresh: scheduled token refresh no longer bypasses pauseAuthRetries or authRetryPolicy.maxRetry, preventing runaway/infinite onAuthTokenRequested loops.

Added unit tests to lock pause + maxRetry behavior on scheduled refresh, and reset retry count after a successful refresh is queued.

@joaodordio joaodordio self-assigned this Dec 15, 2025
@codecov
Copy link

codecov bot commented Dec 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.48%. Comparing base (f896dc7) to head (4a72ae5).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #987      +/-   ##
==========================================
+ Coverage   69.45%   69.48%   +0.03%     
==========================================
  Files         109      109              
  Lines        8944     8953       +9     
==========================================
+ Hits         6212     6221       +9     
  Misses       2732     2732              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@joaodordio joaodordio marked this pull request as ready for review December 15, 2025 23:37
Copy link
Member

@Ayyanchira Ayyanchira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some comments

let timeIntervalToRefresh = TimeInterval(expirationDate) - dateProvider.currentDate.timeIntervalSince1970 - expirationRefreshPeriod
if timeIntervalToRefresh > 0 {
scheduleAuthTokenRefreshTimer(interval: timeIntervalToRefresh, isScheduledRefresh: true, successCallback: onSuccess)
resetRetryCount()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schedule wont necessarily guarantees successful retrieval of token. Should we reset the count here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that since the schedule is fired, we can reset the retries. regardless of its success.

Comment on lines 229 to 248
if self.localStorage.email != nil || self.localStorage.userId != nil {
self.isInScheduledRefreshCallback = isScheduledRefresh
self.requestNewAuthToken(hasFailedPriorAuth: false, onSuccess: successCallback, shouldIgnoreRetryPolicy: isScheduledRefresh)
self.isInScheduledRefreshCallback = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part should be well tested manually as well

Copy link
Member Author

@joaodordio joaodordio force-pushed the fix/SDK-199-Auth-Retry-Policy-Completely-Bypassed branch from 5fd4de2 to 4de526e Compare December 17, 2025 12:17
# Conflicts:
#	swift-sdk/Internal/AuthManager.swift
@joaodordio joaodordio force-pushed the fix/SDK-199-Auth-Retry-Policy-Completely-Bypassed branch from 4de526e to 935883e Compare December 17, 2025 15:44
@sumeruchat
Copy link
Contributor

Pull Request Review Comment

Please Consider Refactoring to Option 1 from the Ticket

The current fix works, but I strongly recommend simplifying to the Option 1 approach that was outlined in the ticket description. Here's why:

The Core Problem

This bug exists because shouldIgnoreRetryPolicy: true bypasses safety limits (pauseAuthRetry and maxRetry). The ticket correctly identifies these as critical controls that prevent infinite loops and battery drain.

The question is: should we ever allow these safety limits to be bypassed?

Current Implementation: Selective Bypass

Your fix introduces a flag isInScheduledRefreshCallback to detect when we're in a scheduled refresh, then enforces safety limits only in that specific case:

private func shouldPauseRetry(_ shouldIgnoreRetryPolicy: Bool) -> Bool {
    if pauseAuthRetry {
        return true
    }
    
    // Only enforces maxRetry for scheduled refreshes
    if isInScheduledRefreshCallback {
        return retryCount >= authRetryPolicy.maxRetry
    }
    
    // Still allows bypass in other cases
    return retryCount >= authRetryPolicy.maxRetry && !shouldIgnoreRetryPolicy
}

This means: Any code path that calls requestNewAuthToken(shouldIgnoreRetryPolicy: true) outside of scheduled refreshes can still bypass maxRetry. You've fixed the symptom (scheduled refresh infinite loop) but left the backdoor open.

Option 1: Always Enforce Safety Limits

private func shouldPauseRetry(_ shouldIgnoreRetryPolicy: Bool) -> Bool {
    if pauseAuthRetry {
        return true
    }
    
    if retryCount >= authRetryPolicy.maxRetry {
        return true
    }
    
    return false
}

This means: No code path can bypass safety limits, ever. Period.

Why This Is Better

1. Prevents ALL infinite loops, not just scheduled refresh loops

If tomorrow someone adds code that calls requestNewAuthToken(shouldIgnoreRetryPolicy: true) in a different context (error handler, manual retry, etc.), your current fix won't protect them. Option 1 would.

2. No state management complexity

Your approach requires:

  • A new instance variable
  • Setting the flag before the call
  • Clearing the flag after the call
  • Future developers understanding this flag exists and what it means

Option 1 requires: none of that.

3. Thread-safe by default

Flag management patterns can be racy if methods get called from multiple threads. Option 1 has no such risk.

4. Matches the stated intent

The ticket says: "pauseAuthRetry and maxRetry are safety controls that should ALWAYS be respected."

Your implementation respects them sometimes. Option 1 respects them always.

5. Fewer lines of code

Current: 12 lines in shouldPauseRetry() + flag declaration + flag management
Option 1: 8 lines total

Simple code is maintainable code.

What About shouldIgnoreRetryPolicy?

I think there's confusion about what this parameter is supposed to do. Based on the name and ticket context, it seems like it was meant to bypass retry backoff delays, not safety limits.

If the intent is "try again immediately without waiting," that's fine. But that's different from "ignore maxRetry and keep trying forever."

Proposal: Safety limits (pause + maxRetry) should be absolute. If we need to bypass backoff timing, that's a separate concern and should be named/handled differently.

The Root Cause Matters

The ticket identifies the root cause perfectly:

"The shouldIgnoreRetryPolicy parameter was likely introduced to allow immediate retries in certain scenarios (like manual user-triggered refreshes), but inadvertently became part of the automatic token expiration flow, bypassing critical safety controls."

The fix shouldn't be to add a special case that detects when we're in the problematic flow. The fix should be to remove the ability to bypass safety controls because that's fundamentally dangerous regardless of the flow.

What If We Need to Bypass?

If there's a legitimate use case where we need to retry beyond maxRetry, that should be:

  1. Explicitly documented in code comments
  2. A separate code path with a different method name
  3. Something we consciously choose, not accidentally enable via a boolean parameter

Right now, shouldIgnoreRetryPolicy: true is a loaded gun sitting in the codebase. Your fix adds a safety on one trigger, but the gun is still loaded.

Bottom Line

Option 1 is:

  • Simpler (less code, no state management)
  • Safer (no infinite loop scenarios, thread-safe)
  • Clearer (safety limits are always enforced)
  • More maintainable (future developers can't bypass accidentally)

The only downside is it changes behavior for all callers of shouldIgnoreRetryPolicy: true. But that's actually a feature because bypassing safety limits was the bug in the first place.

Please consider refactoring to Option 1. If there's a specific scenario that requires bypassing maxRetry, let's discuss it explicitly so we can design a proper solution.

@sumeruchat sumeruchat self-requested a review December 18, 2025 16:26
Copy link
Contributor

@sumeruchat sumeruchat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Claude that we should take the simpler approach @joaodordio

@joaodordio
Copy link
Member Author

joaodordio commented Dec 19, 2025

Agree with Claude that we should take the simpler approach @joaodordio

@sumeruchat We need to have a bigger discussion about this.

The solution you are recommending based on Claude, removes the shouldIgnoreRetryPolicy flag from the implementation.

We need to further discuss offline why we need that flag and what the purpose of it is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants