Hot-reload procedure completed-cleaner & disk space warning threshold (+ fix evict-TTL unit)#18091
Open
CRZbulabula wants to merge 3 commits into
Open
Hot-reload procedure completed-cleaner & disk space warning threshold (+ fix evict-TTL unit)#18091CRZbulabula wants to merge 3 commits into
CRZbulabula wants to merge 3 commits into
Conversation
…eshold Config template (iotdb-system.properties.template): - Hide (remove from template, still parsed with config-class defaults): enable_auto_leader_balance_for_ratis_consensus, enable_topology_probing, topology_probing_base_interval_in_ms, topology_probing_timeout_ratio. - Change effectiveMode restart -> hot_reload for procedure_completed_evict_ttl, procedure_completed_clean_interval and disk_space_warning_threshold. procedure_completed_evict_ttl / procedure_completed_clean_interval hot reload: ProcedureExecutor keeps a reference to the CompletedProcedureRecycler and adds restartCompletedCleaner() to re-schedule it with the new interval/TTL (both are captured at construction). ProcedureManager.updateCompletedProcedureCleaner() re-applies on the running (leader) executor; ConfigManager.setConfiguration captures the previous values and re-applies via handleProcedureCleanerHotReload. The recycler reference is volatile and both writers are synchronized. disk_space_warning_threshold hot reload: CommonDescriptor.loadHotModifiedDiskSpaceWarningThreshold() parses, validates ([0,1)) and applies the value; it is shared by the ConfigNode path (so SHOW VARIABLES / cluster-parameter consistency reflect the change) and the DataNode path, which additionally refreshes the JVMCommonUtils static copy that the ReadOnly disk guard consumes.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18091 +/- ##
============================================
+ Coverage 41.57% 41.66% +0.09%
Complexity 318 318
============================================
Files 5294 5296 +2
Lines 371424 371732 +308
Branches 48061 48094 +33
============================================
+ Hits 154410 154881 +471
+ Misses 217014 216851 -163 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
CompletedProcedureRecycler converted the clean interval from seconds to milliseconds but stored the evict TTL as raw seconds, while CompletedProcedureContainer.isExpired compares it against a System.currentTimeMillis() delta. As a result completed procedures were evicted after ~evictTTL milliseconds instead of seconds (e.g. ~60 ms for the default 60 s), making the results effectively un-queryable. Convert the evict TTL to milliseconds at construction (renamed to evictTTLInMs for clarity). Tests: - CompletedProcedureRecyclerTest: the evict TTL is interpreted in seconds; a freshly completed procedure survives a large TTL while a stale one is evicted (regression guard for the seconds/millis bug). - TestProcedureExecutor.testRestartCompletedCleanerAppliesNewEvictTtl: hot reload replaces the recycler with one carrying the new evict TTL. - CommonDescriptorDiskSpaceWarningThresholdTest: disk_space_warning_threshold hot reload applies a valid value, keeps the current value when absent, and rejects out-of-range values without mutating the config.
…leaner ref The topology-probing tunables (enable_topology_probing, topology_probing_base_interval_in_ms, topology_probing_timeout_ratio) and enable_auto_leader_balance_for_ratis_consensus were removed from the config template, but PR #17933 (PingCode V2-995) deliberately exposed the three topology keys so 'set configuration' can toggle them at runtime, guarded by IoTDBSetConfigurationIT.testSetTopologyProbingConfiguration. Removing them reverted that fix and broke the test with '301: ignored config items ... immutable or undefined'. Restore all four keys in the template; this PR now only flips procedure_completed_evict_ttl / procedure_completed_clean_interval / disk_space_warning_threshold to hot_reload plus the hot-reload plumbing. Also address SonarCloud java:S3077: the completedProcedureRecycler field was 'volatile', but Sonar flags volatile on a mutable object reference as insufficient. All accesses already occur under this instance's monitor (startCompletedCleaner / restartCompletedCleaner are synchronized), so drop volatile and synchronize the @testonly getter, guarding the field purely by the monitor.
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Motivation
While tidying the ConfigNode cluster-operations tunables, several parameters were documented as
restart-only despite being safely applicable at runtime. This PR:effectiveMode: restart→hot_reload, with the runtime plumbing to actually apply the new value):procedure_completed_evict_ttlprocedure_completed_clean_intervaldisk_space_warning_thresholdChanges
Procedure completed-cleaner hot reload
procedure_completed_evict_ttlandprocedure_completed_clean_intervalare captured byCompletedProcedureRecyclerat construction, so applying new values requires rescheduling it:ProcedureExecutorkeeps a reference to the recycler and addsrestartCompletedCleaner()(remove old + schedule fresh). All accesses to the reference field occur under the executor's monitor (startCompletedCleaner/restartCompletedCleaneraresynchronized), so the field is guarded bysynchronizedfor both mutual exclusion and visibility.ProcedureManager.updateCompletedProcedureCleaner()re-applies on the running (leader) executor; followers are no-ops and pick up the value on the next leader switch.ConfigManager.setConfigurationcaptures the previous values and re-applies viahandleProcedureCleanerHotReload, mirroring the existinghandleHeartbeatIntervalHotReload/handleTopologyProbingHotReloadpattern. Values are validated (> 0), consistent with existing hot-reload loaders.disk_space_warning_thresholdhot reloadCommonDescriptor.loadHotModifiedDiskSpaceWarningThreshold()parses, validates ([0, 1)) and applies the value.SHOW VARIABLES/ cluster-parameter consistency checks reflect the change, and by the DataNode path, which additionally refreshes theJVMCommonUtilsstatic copy that the ReadOnly disk guard actually consumes.Bug fix: completed-procedure evict TTL unit
CompletedProcedureRecyclerconverted the clean interval from seconds to milliseconds but stored the evict TTL as raw seconds, whileCompletedProcedureContainer.isExpiredcompares it against aSystem.currentTimeMillis()delta. Completed procedures were therefore evicted after ~evictTTLmilliseconds instead of seconds (≈60 ms for the default 60 s), making completed results effectively un-queryable. The TTL is now converted to milliseconds at construction (evictTTLInMs).effectiveMode/ templateiotdb-system.properties.templateupdated accordingly (the three keys above flipped tohot_reload).Testing
mvn compilepasses fornode-commons,confignode,datanode;spotless:checkclean.CompletedProcedureRecyclerTest— evict TTL is interpreted in seconds; a freshly completed procedure survives a large TTL while a stale one is evicted (regression guard for the seconds/millis fix).TestProcedureExecutor#testRestartCompletedCleanerAppliesNewEvictTtl— hot reload swaps the recycler to one carrying the new TTL.CommonDescriptorDiskSpaceWarningThresholdTest— disk threshold hot reload applies a valid value, keeps the current value when absent, and rejects out-of-range values without mutating the config.