-
Notifications
You must be signed in to change notification settings - Fork 22
Add SkewnessMetric and KurtosisMetric with SQL/Search datasource support and tests #356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add SkewnessMetric and KurtosisMetric with SQL/Search datasource support and tests #356
Conversation
|
Warning Rate limit exceeded@Necrouser045 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 17 minutes and 35 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughAdds Skewness and Kurtosis metrics: two new metric classes, SQL implementations (with identifier sanitization), search datasource stubs that raise NotImplementedError, metric manager registration, and tests covering SQL and search variants. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Compliance Guide 🔍Below is a summary of compliance checks for this PR:
Compliance status legend🟢 - Fully Compliant🟡 - Partial Compliant 🔴 - Not Compliant ⚪ - Requires Further Human Verification 🏷️ - Compliance label |
|||||||||||||||||||||||||||
PR Code Suggestions ✨Explore these optional code suggestions:
|
||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/core/metric/test_numeric_metric.py (1)
1037-1158: Skewness/Kurtosis tests are consistent and usefulThe new
TestSkewnessAndKurtosisMetricsuite:
- Mirrors the existing numeric-metric patterns (Postgres vs. OpenSearch, with/without filters, plus validation).
- Correctly stubs
query_get_skewness/query_get_kurtosison mocks, so tests are isolated from datasource details.- Exercises validation semantics in both directions (
ltandgtthresholds).If you want parity with some of the other metric tests, you could optionally add OpenSearch + filter cases for skewness and kurtosis, but the current coverage is already solid.
dcs_core/core/datasource/search_datasource.py (1)
449-455: Search datasource stubs are fine; consider adding type hints/docstringsDefining
query_get_skewness/query_get_kurtosisasNotImplementedErrorstubs is a reasonable interim step so metrics can at least fail fast when used with search backends.For consistency with the rest of this class, consider tightening them up with type hints and brief docstrings, e.g.:
- def query_get_skewness(self, index_name, field, filters=None): - raise NotImplementedError("Skewness metric not implemented for search datasource") + def query_get_skewness( + self, index_name: str, field: str, filters: Dict | None = None + ) -> float: + """Get skewness for a numeric field in a search index.""" + raise NotImplementedError( + "Skewness metric not implemented for search datasource" + )and similarly for
query_get_kurtosis.dcs_core/core/metric/numeric_metric.py (1)
415-479: New Skewness/Kurtosis metric implementations look correct and consistentThe
SkewnessMetricandKurtosisMetricclasses:
- Use
MetricIdentity.generate_identitywithMetricsType.SKEWNESS/MetricsType.KURTOSISand the same identity fields as otherFieldMetricsimplementations.- Correctly dispatch
_generate_metric_valuebased on datasource type:
SQLDataSource→query_get_skewness/query_get_kurtosiswith table/field/filters.SearchIndexDataSource→ corresponding search methods with index_name/field/filters.- Fall back to
ValueError("Invalid data source type")in the same style as existing metrics.From a core-metric perspective, this wiring is solid. Any future change to use a more specific exception type should probably be applied uniformly across all
FieldMetricssubclasses, not just these two.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
dcs_core/core/datasource/search_datasource.py(1 hunks)dcs_core/core/datasource/sql_datasource.py(1 hunks)dcs_core/core/metric/manager.py(2 hunks)dcs_core/core/metric/numeric_metric.py(1 hunks)tests/core/metric/test_numeric_metric.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
dcs_core/core/datasource/sql_datasource.py (1)
dcs_core/core/datasource/search_datasource.py (2)
query_get_skewness(450-451)query_get_kurtosis(453-454)
dcs_core/core/datasource/search_datasource.py (1)
dcs_core/core/datasource/sql_datasource.py (2)
query_get_skewness(1138-1140)query_get_kurtosis(1143-1145)
dcs_core/core/metric/numeric_metric.py (4)
dcs_core/core/metric/base.py (1)
FieldMetrics(213-234)dcs_core/core/common/models/metric.py (1)
MetricsType(26-53)dcs_core/core/datasource/sql_datasource.py (2)
query_get_skewness(1138-1140)query_get_kurtosis(1143-1145)dcs_core/core/datasource/search_datasource.py (2)
query_get_skewness(450-451)query_get_kurtosis(453-454)
tests/core/metric/test_numeric_metric.py (5)
dcs_core/core/metric/numeric_metric.py (2)
SkewnessMetric(415-446)KurtosisMetric(448-479)dcs_core/core/datasource/sql_datasource.py (3)
SQLDataSource(28-1145)query_get_skewness(1138-1140)query_get_kurtosis(1143-1145)dcs_core/core/datasource/search_datasource.py (3)
query_get_skewness(450-451)SearchIndexDataSource(23-454)query_get_kurtosis(453-454)dcs_core/core/common/models/metric.py (1)
MetricsType(26-53)dcs_core/core/metric/base.py (1)
get_metric_value(127-175)
dcs_core/core/metric/manager.py (2)
dcs_core/core/metric/numeric_metric.py (2)
SkewnessMetric(415-446)KurtosisMetric(448-479)dcs_core/core/common/models/metric.py (1)
MetricsType(26-53)
🪛 Ruff (0.14.6)
dcs_core/core/datasource/sql_datasource.py
1139-1139: Possible SQL injection vector through string-based query construction
(S608)
1144-1144: Possible SQL injection vector through string-based query construction
(S608)
dcs_core/core/metric/numeric_metric.py
446-446: Prefer TypeError exception for invalid type
(TRY004)
446-446: Avoid specifying long messages outside the exception class
(TRY003)
479-479: Prefer TypeError exception for invalid type
(TRY004)
479-479: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (3)
dcs_core/core/metric/manager.py (1)
39-40: New numeric metric imports look correctImporting
SkewnessMetricandKurtosisMetrichere is consistent with how other numeric metrics are exposed for theglobals()factory; no issues from this change.tests/core/metric/test_numeric_metric.py (1)
33-34: Imports for new metrics are wired correctlyExposing
SkewnessMetricandKurtosisMetricin the test module aligns with the new metric implementations innumeric_metric.py; nothing to change here.dcs_core/core/datasource/sql_datasource.py (1)
1138-1145: Align skewness/kurtosis SQL helpers with existing query_ patterns (and avoid S608 risk)*The new methods currently do:
def query_get_skewness(self, table, field, filters=None): query = f"SELECT skewness({field}) FROM {table}" return self.execute_query(query, filters) def query_get_kurtosis(self, table, field, filters=None): query = f"SELECT kurtosis({field}) FROM {table}" return self.execute_query(query, filters)Two problems:
- Inconsistent with the rest of the class: other helpers (
query_get_avg,query_get_variance, etc.) usequalified_table_nameandquote_column, and explicitly splice filters into the SQL string.- Static-analysis‑flagged SQL construction: embedding raw
table/fieldwithout quoting is exactly what Ruff's S608 is warning about and can lead to malformed SQL or injection if those values are not fully controlled.To match existing patterns and address the warning, consider something along these lines:
- def query_get_skewness(self, table, field, filters=None): - query = f"SELECT skewness({field}) FROM {table}" - return self.execute_query(query, filters) + def query_get_skewness( + self, table: str, field: str, filters: str | None = None + ) -> float: + qualified_table_name = self.qualified_table_name(table) + field = self.quote_column(field) + query = f"SELECT skewness({field}) FROM {qualified_table_name}" + if filters: + query += f" WHERE {filters}" + result = self.fetchone(query) + return result[0] if result else None @@ - def query_get_kurtosis(self, table, field, filters=None): - query = f"SELECT kurtosis({field}) FROM {table}" - return self.execute_query(query, filters) + def query_get_kurtosis( + self, table: str, field: str, filters: str | None = None + ) -> float: + qualified_table_name = self.qualified_table_name(table) + field = self.quote_column(field) + query = f"SELECT kurtosis({field}) FROM {qualified_table_name}" + if filters: + query += f" WHERE {filters}" + result = self.fetchone(query) + return result[0] if result else NoneThis keeps behavior in line with other numeric metrics and reduces the surface area for SQL construction issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
dcs_core/core/datasource/sql_datasource.py (1)
1159-1164: Critical: Same issues asquery_get_skewness-execute_queryundefined, pattern inconsistency, database compatibility.This method has identical issues to
query_get_skewness:
self.execute_query()does not exist in the class- Does not follow established patterns using
qualified_table_name()andquote_column()kurtosis()is not a standard SQL functionApply the same fix pattern:
def query_get_kurtosis(self, table, field, filters=None): - field = self._sanitize_identifier(field) - table = self._sanitize_identifier(table) - - query = f"SELECT kurtosis({field}) FROM {table}" - return self.execute_query(query, filters) + qualified_table_name = self.qualified_table_name(table) + field = self.quote_column(field) + query = f"SELECT kurtosis({field}) FROM {qualified_table_name}" + if filters: + query += f" WHERE {filters}" + return self.fetchone(query)[0]
🧹 Nitpick comments (1)
dcs_core/core/datasource/sql_datasource.py (1)
1138-1149: UseTypeErrorfor type validation and consider integration with existing quoting methods.The sanitizer validates identifiers correctly for alphanumeric and underscore characters. However:
- Type validation should raise
TypeErrorrather thanValueError(line 1144).- This sanitizer doesn't integrate with the existing
quote_column()andqualified_table_name()methods used throughout this class. Consider whether the sanitizer should be used in conjunction with those methods or if it's meant to replace them.Apply this diff to use
TypeErrorfor type validation:def _sanitize_identifier(self, identifier: str) -> str: """ Sanitize SQL identifiers to prevent injection. Allows only alphanumeric characters and underscores. """ if not isinstance(identifier, str): - raise ValueError("Identifier must be a string") + raise TypeError("Identifier must be a string") if not identifier.replace("_", "").isalnum(): raise ValueError(f"Invalid SQL identifier: {identifier}") return identifier
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dcs_core/core/datasource/sql_datasource.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
dcs_core/core/datasource/sql_datasource.py (1)
dcs_core/core/datasource/search_datasource.py (2)
query_get_skewness(450-451)query_get_kurtosis(453-454)
🪛 Ruff (0.14.6)
dcs_core/core/datasource/sql_datasource.py
1144-1144: Prefer TypeError exception for invalid type
(TRY004)
1144-1144: Avoid specifying long messages outside the exception class
(TRY003)
1147-1147: Avoid specifying long messages outside the exception class
(TRY003)
1155-1155: Possible SQL injection vector through string-based query construction
(S608)
1163-1163: Possible SQL injection vector through string-based query construction
(S608)
🔇 Additional comments (1)
dcs_core/core/datasource/sql_datasource.py (1)
1151-1156: I was unable to verify the claims in this review comment due to repository access issues. The shell scripts failed to clone the repository, preventing me from examining:
- Whether
execute_query()method actually exists in the class- The actual implementation patterns of other
query_get_*methods (e.g.,query_get_max,query_get_avg)- Whether
_sanitize_identifier()is a legitimate method in the class- Whether the suggested diff pattern using
qualified_table_name(),quote_column(), andfetchone()is correct- How
skewness()function is used elsewhere in the codebaseThe review comment makes specific, actionable claims about critical runtime failures and inconsistent patterns that require verification against the actual codebase. Without access to verify these factual assertions, I cannot confirm whether the issues are real or if the suggested fixes are appropriate.
User description
Summary
This PR adds full support for two statistical distribution metrics:
These metrics already existed in
MetricsTypebut lacked implementations,datasource handlers, and test coverage. This PR completes the feature.
Changes Included
Core
SkewnessMetricandKurtosisMetricimplementations innumeric_metric.pymanager.pyDatasources
query_get_skewnessquery_get_kurtosisTests
All new tests pass.
Why This is Needed
Although skewness and kurtosis were defined in the enum, the framework did not support them.
Users attempting to use these metrics encountered runtime errors. This PR brings complete,
tested support for these two metrics.
Result
7 tests passed successfully
Security Fixes
_sanitize_identifierto validate SQL identifiers.Thanks for reviewing!
PR Type
Enhancement
Description
Implement
SkewnessMetricandKurtosisMetricclasses with full datasource supportAdd SQL datasource handlers for skewness and kurtosis calculations
Add placeholder Search datasource handlers raising
NotImplementedErrorRegister both metrics in the metric manager for framework integration
Add comprehensive test suite covering SQL, Search, filters, and validation
Diagram Walkthrough
File Walkthrough
numeric_metric.py
Add SkewnessMetric and KurtosisMetric implementationsdcs_core/core/metric/numeric_metric.py
SkewnessMetricclass extendingFieldMetricswith metric identitygeneration
KurtosisMetricclass extendingFieldMetricswith metric identitygeneration
_generate_metric_value()supporting SQL andSearch datasources
ValueErrorfor invaliddatasource types
sql_datasource.py
Add SQL skewness and kurtosis query handlersdcs_core/core/datasource/sql_datasource.py
query_get_skewness()method executing SQL skewness function onfield
query_get_kurtosis()method executing SQL kurtosis function onfield
execute_query()helpersearch_datasource.py
Add Search datasource placeholder handlersdcs_core/core/datasource/search_datasource.py
query_get_skewness()method raisingNotImplementedErrorquery_get_kurtosis()method raisingNotImplementedErrormanager.py
Register skewness and kurtosis metrics in managerdcs_core/core/metric/manager.py
SkewnessMetricandKurtosisMetricfromnumeric_metricmoduleMETRIC_MAPPINGdictionary with enum valuesMetricsType.SKEWNESSto"SkewnessMetric"andMetricsType.KURTOSISto
"KurtosisMetric"test_numeric_metric.py
Add comprehensive skewness and kurtosis metric teststests/core/metric/test_numeric_metric.py
SkewnessMetricandKurtosisMetrictest classesTestSkewnessAndKurtosisMetricclassand validation thresholds
metrics
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.