[FIX] Fix Redis Sentinel crash in MetricsMixin#1856
Conversation
MetricsMixin was creating a raw StrictRedis client using REDIS_HOST/PORT directly, which in Sentinel HA mode points to the Sentinel process (port 26379). Sentinel doesn't support the SELECT command, causing ResponseError on every /answer-prompt call. - Replace StrictRedis with create_redis_client() from unstract-core which handles Sentinel discovery via master_for() - Add unstract-core as sdk1 dependency (all services using sdk1 already depend on core, so zero new transitive deps) - Add error handling around set_start_time() and collect_metrics() so Redis failures degrade gracefully instead of crashing requests - Add missing COPY for unstract/core in tool Dockerfiles (classifier, structure, text_extractor) since sdk1 now depends on core Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary by CodeRabbit
WalkthroughAdds unstract-core to builds and editable tool requirements; updates SDK pyproject to depend on unstract-core; refactors Redis client creation to a centralized factory with SSL support and defensive error handling, and updates worker/cache initialization to use the factory. Changes
Sequence Diagram(s)sequenceDiagram
participant Metrics as MetricsMixin
participant Factory as create_redis_client
participant Redis as RedisServer
participant Logger as Logger
Metrics->>Factory: request redis client (db=1)
Factory-->>Metrics: Redis client (or None)
alt client available
Metrics->>Redis: GET/SET/LPUSH (metric ops)
Redis-->>Metrics: responses
Metrics->>Logger: log success
else client None or error
Metrics->>Logger: log error and nullify client
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR fixes a Redis Sentinel crash in Key changes:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant AP as /answer-prompt
participant MM as MetricsMixin.__init__
participant CRC as create_redis_client(db=1)
participant S as Redis Sentinel :26379
participant RM as Redis Master :6379
AP->>MM: instantiate(run_id)
MM->>CRC: create_redis_client(db=1)
alt Sentinel mode (REDIS_SENTINEL_MODE=true)
CRC->>S: discover master
S-->>CRC: master address
CRC->>RM: PING (validates connection)
RM-->>CRC: PONG
CRC-->>MM: Sentinel-backed redis.Redis client
else Standalone mode
CRC->>RM: connect (lazy)
CRC-->>MM: redis.Redis client
end
MM->>RM: SET metrics:run_id:op_id time.time() EX 86400
RM-->>MM: OK
Note over MM: (previously: SET issued to Sentinel port → SELECT 1 error)
AP->>MM: collect_metrics()
MM->>RM: EXISTS / GET / DELETE metrics:run_id:op_id
RM-->>MM: time_taken
MM-->>AP: {time_taken(s): X.XXX}
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
unstract/sdk1/src/unstract/sdk1/utils/metrics_mixin.py (1)
54-60: Avoidexists()+get()race and extra Redis roundtrip.
exists()followed byget()can fail if the key expires/deletes between calls; reading once and handlingNoneis simpler and quieter.♻️ Proposed change
- if not self.redis_client.exists(self.redis_key): - return {self.TIME_TAKEN_KEY: None} - - start_time = float(self.redis_client.get(self.redis_key)) - time_taken = round(time.time() - start_time, 3) - - self.redis_client.delete(self.redis_key) + start_time_raw = self.redis_client.get(self.redis_key) + if start_time_raw is None: + return {self.TIME_TAKEN_KEY: None} + + time_taken = round(time.time() - float(start_time_raw), 3) + self.redis_client.delete(self.redis_key)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/utils/metrics_mixin.py` around lines 54 - 60, Replace the exists()+get() pattern with a single read and a None check: call self.redis_client.get(self.redis_key) once, return {self.TIME_TAKEN_KEY: None} if the result is None, otherwise convert the value to float, compute time_taken = round(time.time() - start_time, 3), then delete the key with self.redis_client.delete(self.redis_key); reference the existing symbols self.redis_client, self.redis_key, and self.TIME_TAKEN_KEY when making this change to avoid the race and extra Redis roundtrip.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@unstract/sdk1/pyproject.toml`:
- Around line 79-80: The pyproject override [tool.uv.sources] only affects
uv-based installs and pip-based builds will fail to resolve the local
unstract-core dependency; fix by adding an explicit editable local install of
the core package to the pip requirements used by the failing Docker builds: add
"-e file:/unstract/core" to tools/text_extractor/requirements.txt and
tools/classifier/requirements.txt (or alternatively change those Dockerfiles to
use uv pip install like tools/structure/Dockerfile does), ensuring the
dependency referenced in pyproject.toml (unstract-core) is available to
pip-based installs.
---
Nitpick comments:
In `@unstract/sdk1/src/unstract/sdk1/utils/metrics_mixin.py`:
- Around line 54-60: Replace the exists()+get() pattern with a single read and a
None check: call self.redis_client.get(self.redis_key) once, return
{self.TIME_TAKEN_KEY: None} if the result is None, otherwise convert the value
to float, compute time_taken = round(time.time() - start_time, 3), then delete
the key with self.redis_client.delete(self.redis_key); reference the existing
symbols self.redis_client, self.redis_key, and self.TIME_TAKEN_KEY when making
this change to avoid the race and extra Redis roundtrip.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 037505e2-002b-4d73-9b91-10bf2ad8c741
⛔ Files ignored due to path filters (5)
backend/uv.lockis excluded by!**/*.lockplatform-service/uv.lockis excluded by!**/*.lockprompt-service/uv.lockis excluded by!**/*.lockunstract/filesystem/uv.lockis excluded by!**/*.lockworkers/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
tools/classifier/Dockerfiletools/structure/Dockerfiletools/text_extractor/Dockerfileunstract/sdk1/pyproject.tomlunstract/sdk1/src/unstract/sdk1/utils/metrics_mixin.py
Tool Dockerfiles (classifier, text_extractor, structure) use pip, not uv. Since sdk1 now depends on unstract-core (which isn't on PyPI), pip can't resolve it via [tool.uv.sources]. Adding explicit editable install of core before sdk1 ensures pip finds it locally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Worker cache modules (cache_backends.py, cache_utils.py) duplicated
sentinel/standalone branching logic for Redis client creation. Both
now use create_redis_client() from unstract-core as the single entry
point, which already handles mode detection internally.
- Add SSL support (ssl, ssl_cert_reqs) to create_redis_client() via
env vars ({prefix}SSL, {prefix}SSL_CERT_REQS), disabled by default
- Remove if-sentinel-else-standalone branching in cache_backends.py
and cache_utils.py — replaced with single create_redis_client() call
- Remove unused os and redis imports from simplified modules
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test ResultsSummary
Runner Tests - Full Report
SDK1 Tests - Full Report
|
|
There was a problem hiding this comment.
🧹 Nitpick comments (2)
unstract/core/src/unstract/core/cache/redis_client.py (1)
300-308: Consider adding an__init__method for safer instantiation.The class declares
redis_client: redis.Redisas a type hint without a default, and relies onfrom_env()or manual__new__+ attribute assignment. This works but is fragile—calling methods on an improperly constructed instance will raiseAttributeError.Adding an explicit
__init__would make the expected construction pattern clearer and prevent misuse.💡 Optional: Add __init__ for explicit construction
class RedisClient: """Wrapper around redis.Redis providing a consistent interface. Always instantiate via RedisClient.from_env(), which delegates to create_redis_client() and handles both Sentinel and standalone modes. """ - redis_client: redis.Redis + def __init__(self, redis_client: redis.Redis): + """Initialize with an existing redis.Redis client. + + Prefer using from_env() for environment-based construction. + """ + self.redis_client = redis_client # ... existing methods ... `@classmethod` def from_env(cls, env_prefix: str = "REDIS_") -> "RedisClient": """Create client from environment variables.""" - instance = cls.__new__(cls) - instance.redis_client = create_redis_client( + return cls(create_redis_client( env_prefix=env_prefix, decode_responses=True - ) - return instance + ))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/core/src/unstract/core/cache/redis_client.py` around lines 300 - 308, Add an explicit __init__ to RedisClient that accepts a redis_client: redis.Redis (or Optional[redis.Redis] with validation) and assigns it to self.redis_client, and raise a clear ValueError if None or not an instance of redis.Redis; update the class docstring to note the preferred construction via RedisClient.from_env() or RedisClient(redis_client=...) so callers don't accidentally create instances without the required attribute. Ensure references to create_redis_client() and from_env() remain compatible with the new constructor.workers/shared/cache/cache_backends.py (1)
102-111: Consider usingRedisClient.from_env()or a proper constructor.The manual
__new__+ attribute assignment pattern works but bypasses any future__init__logic. Since this code needs custom connection parameters thatfrom_env()doesn't support, consider either:
- Extending
from_env()to accept additional kwargs, or- Adding a proper
__init__toRedisClient(as suggested in the other file)For now, adding a comment explaining why this pattern is necessary would help maintainability.
💡 Option 1: If RedisClient gets an __init__, use it directly
raw_client = create_redis_client( env_prefix="CACHE_REDIS_", decode_responses=True, socket_timeout=5, socket_connect_timeout=5, health_check_interval=30, ) - # Wrap in RedisClient for interface compatibility - self.redis_client = RedisClient.__new__(RedisClient) - self.redis_client.redis_client = raw_client + # Wrap in RedisClient for interface compatibility + self.redis_client = RedisClient(raw_client)💡 Option 2: If keeping __new__ pattern, add explanatory comment
raw_client = create_redis_client( env_prefix="CACHE_REDIS_", decode_responses=True, socket_timeout=5, socket_connect_timeout=5, health_check_interval=30, ) - # Wrap in RedisClient for interface compatibility + # Wrap raw client in RedisClient for interface compatibility. + # Using __new__ because RedisClient has no __init__ and from_env() + # doesn't support custom connection parameters. self.redis_client = RedisClient.__new__(RedisClient) self.redis_client.redis_client = raw_client🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@workers/shared/cache/cache_backends.py` around lines 102 - 111, The current code instantiates RedisClient via RedisClient.__new__ and assigns redis_client directly after create_redis_client; update this to either extend RedisClient.from_env to accept the custom kwargs from create_redis_client or add a proper RedisClient.__init__ that accepts (and uses) decode_responses/socket_timeout/socket_connect_timeout/health_check_interval and call that instead of using __new__; if you choose to keep the __new__ pattern for now, add a clear explanatory comment above the RedisClient.__new__ usage referencing create_redis_client and why __init__ is bypassed (mention RedisClient, RedisClient.from_env, and the redis_client attribute) so future maintainers know this is intentional and should be revisited when RedisClient gains an __init__ or extended from_env.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@unstract/core/src/unstract/core/cache/redis_client.py`:
- Around line 300-308: Add an explicit __init__ to RedisClient that accepts a
redis_client: redis.Redis (or Optional[redis.Redis] with validation) and assigns
it to self.redis_client, and raise a clear ValueError if None or not an instance
of redis.Redis; update the class docstring to note the preferred construction
via RedisClient.from_env() or RedisClient(redis_client=...) so callers don't
accidentally create instances without the required attribute. Ensure references
to create_redis_client() and from_env() remain compatible with the new
constructor.
In `@workers/shared/cache/cache_backends.py`:
- Around line 102-111: The current code instantiates RedisClient via
RedisClient.__new__ and assigns redis_client directly after create_redis_client;
update this to either extend RedisClient.from_env to accept the custom kwargs
from create_redis_client or add a proper RedisClient.__init__ that accepts (and
uses)
decode_responses/socket_timeout/socket_connect_timeout/health_check_interval and
call that instead of using __new__; if you choose to keep the __new__ pattern
for now, add a clear explanatory comment above the RedisClient.__new__ usage
referencing create_redis_client and why __init__ is bypassed (mention
RedisClient, RedisClient.from_env, and the redis_client attribute) so future
maintainers know this is intentional and should be revisited when RedisClient
gains an __init__ or extended from_env.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cad729f8-4faf-4da5-8f3f-de5d6e768698
📒 Files selected for processing (3)
unstract/core/src/unstract/core/cache/redis_client.pyworkers/shared/cache/cache_backends.pyworkers/shared/infrastructure/caching/cache_utils.py
|
All reviewer agents comments are addressed, hence proceeding for now. |
|
Skipped reviews on this due to urgency, can be done and addressed later. This PR also includes changes from a minor refactor in #1857 |



What
MetricsMixinin sdk1 to use Sentinel-aware Redis client factory instead of rawStrictRedisunstract-coreas a dependency ofsdk1COPY unstract/corein tool DockerfilesWhy
REDIS_HOST/PORTpoint to the Sentinel process (port 26379), not a Redis masterMetricsMixinwas connecting directly to Sentinel and issuingSELECT 1, which Sentinel doesn't supportredis.exceptions.ResponseError: unknown command 'SELECT'→ 500 errors on/answer-promptin onprem-haHow
StrictRedis(host, port, db=1)withcreate_redis_client(db=1)fromunstract.core.cache.redis_client, which handles Sentinel discovery viamaster_for()unstract-coreto sdk1'spyproject.tomldependencies (all services using sdk1 already depend on core — zero new transitive deps)set_start_time()andcollect_metrics()in try/except so Redis failures degrade gracefully (time_taken=None) instead of crashing requestsCOPY ${BUILD_PACKAGES_PATH}/core /unstract/coreto classifier, structure, and text_extractor Dockerfiles since sdk1 now depends on core at build timeCan this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
create_redis_client()already handles both standalone and Sentinel modes transparently. In standalone mode, behavior is identical to the oldStrictRedisconstructor. The added error handling only makes metrics more resilient — worst case,time_takenreportsNoneinstead of crashing.Database Migrations
Env Config
REDIS_SENTINEL_MODE,REDIS_HOST,REDIS_PORT,REDIS_SENTINEL_MASTER_NAMERelevant Docs
Related Issues or PRs
Dependencies Versions
unstract-coreas dependency ofunstract-sdk1(internal package, no new external deps)Notes on Testing
uv.lockfiles get auto-updated by CI for all affected services/answer-promptshould work without SELECT errorsScreenshots
N/A
Checklist
I have read and understood the Contribution Guidelines.