Improve engine health monitoring and wakeup scheduling by lvhan028 · Pull Request #4645 · InternLM/lmdeploy

lvhan028 · 2026-06-04T08:21:22Z

Motivation

Health checks were too aggressive for PyTorch MP/Ray deployments: short probe timeouts and overlapping polls caused false unhealthy results while the engine was busy or recovering from sleep/wakeup.
Synchronous POST /wakeup also blocked the FastAPI event loop during long warmup, starving background health probes.
The scheduler tick advanced only after a full forward pass completed, so long prefills looked stalled to the health logic.

Modification

lmdeploy/serve/core/health.py

Add env overrides: LMDEPLOY_HEALTH_POLL_INTERVAL, LMDEPLOY_HEALTH_PROBE_TIMEOUT, LMDEPLOY_HEALTH_UNHEALTHY_AFTER.
Change defaults to 12s poll interval, 10s probe timeout, 90s unhealthy/stale threshold.
Warn if poll_interval <= probe_timeout.
On pending probe result: log and skip snapshot update (keep last successful state).
Expire stale probes only when last snapshot was healthy (not sleeping).
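
The configuration rules listed above can be sketched as follows. This is a minimal illustration, not the code in `health.py`: the `HealthConfig` dataclass and the `load_health_config` and `_env_float` helpers are hypothetical names. Only the three environment variable names, the 12s/10s/90s defaults, and the `poll_interval <= probe_timeout` warning come from this description.

```python
import logging
import os
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class HealthConfig:
    """Hypothetical container for the three tunables named in this PR."""
    poll_interval: float = 12.0
    probe_timeout: float = 10.0
    unhealthy_after: float = 90.0


def _env_float(name: str, default: float) -> float:
    """Return the env value parsed as float, or ``default`` if unset or invalid."""
    raw = os.getenv(name)
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        return default


def load_health_config() -> HealthConfig:
    """Resolve settings from the environment and warn on a risky combination."""
    cfg = HealthConfig(
        poll_interval=_env_float('LMDEPLOY_HEALTH_POLL_INTERVAL', 12.0),
        probe_timeout=_env_float('LMDEPLOY_HEALTH_PROBE_TIMEOUT', 10.0),
        unhealthy_after=_env_float('LMDEPLOY_HEALTH_UNHEALTHY_AFTER', 90.0),
    )
    if cfg.poll_interval <= cfg.probe_timeout:
        # A poll can start before the previous probe has timed out.
        logger.warning('poll_interval (%s) <= probe_timeout (%s); probes may overlap',
                       cfg.poll_interval, cfg.probe_timeout)
    return cfg
```

Keeping the poll interval above the probe timeout means a new poll normally starts only after the previous probe has either answered or timed out.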

lmdeploy/serve/core/async_engine.py

Return status='pending' instead of unhealthy when a previous health probe is still running.
Make wakeup() async and run engine.wakeup() via asyncio.to_thread() to avoid blocking the API event loop.
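
The overlapping-probe behavior described above might look roughly like the sketch below. `ProbeGuard` and `demo` are hypothetical names used only for illustration; the actual logic in `async_engine.py` may track in-flight probes differently. The point is that a second check arriving while the first is still running reports `pending` rather than treating the overlap as a failure.

```python
import asyncio


class ProbeGuard:
    """Hypothetical helper: reports 'pending' while a probe is in flight."""

    def __init__(self, probe):
        self._probe = probe        # async callable returning a status string
        self._in_flight = False

    async def check(self) -> str:
        if self._in_flight:
            # A previous probe has not finished; that is not evidence of failure.
            return 'pending'
        self._in_flight = True
        try:
            return await self._probe()
        finally:
            self._in_flight = False


async def demo() -> list[str]:
    async def slow_probe() -> str:
        await asyncio.sleep(0.05)  # stands in for a busy engine
        return 'healthy'

    guard = ProbeGuard(slow_probe)
    first = asyncio.create_task(guard.check())
    await asyncio.sleep(0)         # let the first probe start
    second = await guard.check()   # overlaps with the first probe
    return [await first, second]
```

A caller that receives `pending` can log it and keep the last successful snapshot, which matches the `health.py` change listed earlier.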

lmdeploy/serve/openai/api_server.py

await async wakeup() in the /wakeup handler.

lmdeploy/pytorch/engine/inputs_maker.py
lmdeploy/pytorch/engine/engine_loop.py
lmdeploy/pytorch/paging/scheduler.py

Call scheduler.tick() after each forward_async() (one tick per forward dispatch).
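
A toy model of the per-dispatch tick is shown below. `TickCounter`, the placeholder `forward_async`, and `run_prefill` are hypothetical stand-ins, and the sketch assumes a long prefill is issued as several forward dispatches; it only illustrates that the counter advances once per dispatch rather than once at the end.

```python
import asyncio


class TickCounter:
    """Hypothetical stand-in for the scheduler's progress counter."""

    def __init__(self) -> None:
        self.ticks = 0

    def tick(self) -> None:
        self.ticks += 1


async def forward_async(chunk: str) -> None:
    """Placeholder dispatch; a real one would enqueue model work."""
    await asyncio.sleep(0)


async def run_prefill(scheduler: TickCounter, chunks: list[str]) -> list[int]:
    """Dispatch each chunk and record the tick value seen after it."""
    observed = []
    for chunk in chunks:
        await forward_async(chunk)
        scheduler.tick()           # one tick per forward dispatch
        observed.append(scheduler.ticks)
    return observed
```

A health check that compares the current tick against the last value it saw would observe progress after every dispatch in this model.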

Copilot

Pull request overview

This PR tunes engine health monitoring and scheduling progress reporting to reduce false “unhealthy” states during long/overlapping probes and long prefills, and makes the /wakeup endpoint non-blocking for the FastAPI event loop.

Changes:

Add environment-variable overrides and updated defaults for health monitor polling/timeout/staleness behavior, including “pending” probe handling.
Make AsyncEngine.wakeup() asynchronous by offloading the blocking backend wakeup to a worker thread; update the OpenAI-compatible /wakeup route to await it.
Advance scheduler_tick once per forward_async() dispatch (moved to inputs dispatch path) so health logic sees progress even during long prefills.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
lmdeploy/serve/openai/api_server.py	Await the now-async engine `wakeup()` in the `/wakeup` handler.
lmdeploy/serve/core/health.py	Add env overrides and adjust probe/poll/staleness logic, including skipping snapshot updates on pending probes.
lmdeploy/serve/core/async_engine.py	Return `pending` when probes overlap; make `wakeup()` async via `asyncio.to_thread()`.
lmdeploy/pytorch/paging/scheduler.py	Clarify `tick()` semantics as one step per forward dispatch.
lmdeploy/pytorch/engine/inputs_maker.py	Call `scheduler.tick()` after each `forward_async()` dispatch.
lmdeploy/pytorch/engine/engine_loop.py	Remove the old `scheduler.tick()` location to avoid double-counting / wrong timing.

+def _env_override_float(env_var: str, value: float) -> float:
+    """Return ``value`` unless ``env_var`` is set, then parse and return it."""
+    env_value = os.getenv(env_var)
+    if env_value is None:
+        return value
+    try:
+        return float(env_value)
+    except ValueError:
+        return value


        await self.engine.sleep(level)

-    def wakeup(self, tags: list[str] | None = None):
+    async def wakeup(self, tags: list[str] | None = None):


RunningLeon · 2026-06-08T10:35:54Z

            logger.warning(f'some tag in {tags} not in sleeping tags {self.sleeping_tags}')
            return
-        self.engine.wakeup(tags)
+        await asyncio.to_thread(self.engine.wakeup, tags)


quote:

await asyncio.to_thread(self.engine.wakeup, tags) runs PyTorch Engine.wakeup() off the event-loop thread. That path calls EngineLoop.resume_from_sleep() (lmdeploy/pytorch/engine/engine.py:553), which sets loop-owned asyncio.Events (lmdeploy/pytorch/engine/engine_loop.py:181-188). This is not thread-safe and can leave the engine loop stuck after /wakeup, or fail under asyncio debug. I’d split the blocking backend wakeup/warmup from the event-loop resume, or resume via loop.call_soon_threadsafe.
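
The second option in the comment above can be sketched with the standard library alone. `blocking_wakeup` and this `wakeup` coroutine are hypothetical stand-ins, not lmdeploy code: the blocking work runs in a worker thread, and the `asyncio.Event` is set by scheduling `Event.set` on the owning loop through `loop.call_soon_threadsafe` instead of calling it from the thread.

```python
import asyncio
import time


def blocking_wakeup(loop: asyncio.AbstractEventLoop, resumed: asyncio.Event) -> None:
    """Runs in a worker thread; stands in for a blocking backend wakeup."""
    time.sleep(0.05)  # simulated warmup work
    # asyncio.Event is not thread-safe, so hand set() back to the loop thread.
    loop.call_soon_threadsafe(resumed.set)


async def wakeup() -> bool:
    loop = asyncio.get_running_loop()
    resumed = asyncio.Event()
    # The event loop stays free to serve other coroutines during the warmup.
    await asyncio.to_thread(blocking_wakeup, loop, resumed)
    await asyncio.wait_for(resumed.wait(), timeout=1.0)
    return resumed.is_set()
```

Running `asyncio.run(wakeup())` returns `True` once the loop has processed the scheduled `set()` call.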

Copilot AI review requested due to automatic review settings June 4, 2026 08:21

Copilot started reviewing on behalf of lvhan028 June 4, 2026 08:21

lvhan028 requested a review from RunningLeon June 4, 2026 08:21

lvhan028 added the Bug:P0 label Jun 4, 2026

Copilot AI reviewed Jun 4, 2026

Improve engine health monitoring and wakeup scheduling

ecebe47

lvhan028 force-pushed the fix-health branch from 525cfb6 to ecebe47 Compare June 8, 2026 04:03

RunningLeon reviewed Jun 8, 2026

lvhan028 wants to merge 1 commit into InternLM:main from lvhan028:fix-health
