Add SwipeActionWrapper for gesture-level swipe actions #351

Open
kavyabhand wants to merge 4 commits into google-deepmind:main from kavyabhand:add-swipe-action-wrapper

@kavyabhand

Summary

Adds SwipeActionWrapper, a higher-level action wrapper that converts a swipe (start → end position) into a sequence of interpolated TOUCH steps followed by a LIFT at the end position.

This complements the existing TapActionWrapper and supports the documented use case of hard-coding gesture skills for RL studies where agents need swipe/scroll primitives without manually chaining raw touch/lift actions.

Changes

  • android_env/wrappers/swipe_action_wrapper.py: new wrapper with:
    • Action spec: start_position and end_position (BoundedArray shape (2,) in [0, 1])
    • Configurable num_steps (default 10) for interpolation granularity
    • Reward accumulation across sub-steps
    • Early return on StepType.LAST (same behavior as TapActionWrapper)
    • env_steps tracking in stats()
  • android_env/wrappers/swipe_action_wrapper_test.py: unit tests for interpolation, single-step edge case, reward accumulation, early termination, and specs

Example usage

from android_env.wrappers import swipe_action_wrapper
import numpy as np

env = swipe_action_wrapper.SwipeActionWrapper(env, num_steps=10)
env.step({
    'start_position': np.array([0.5, 0.8], dtype=np.float32),
    'end_position': np.array([0.5, 0.2], dtype=np.float32),
})
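To make the expansion concrete, here is a minimal sketch of how one swipe action might unroll into raw sub-actions. It is not the PR's implementation: the helper name `swipe_to_sub_actions`, the `TOUCH`/`LIFT` constants, and the `action_type`/`touch_position` keys are assumptions chosen for illustration, as is the choice to append the LIFT as one extra step after the `num_steps` TOUCH steps.

```python
import numpy as np

# Assumed stand-ins for the raw action types; illustrative values only.
TOUCH = 0
LIFT = 1


def swipe_to_sub_actions(action, num_steps=10):
  """Expands a swipe into num_steps TOUCH actions plus a final LIFT (hypothetical helper)."""
  start = np.asarray(action['start_position'], dtype=np.float32)
  end = np.asarray(action['end_position'], dtype=np.float32)
  # Evenly spaced positions from start to end, both endpoints included.
  positions = np.linspace(start, end, num=num_steps, dtype=np.float32)
  sub_actions = [
      {'action_type': TOUCH, 'touch_position': position}
      for position in positions
  ]
  # Release the finger at the end position.
  sub_actions.append({'action_type': LIFT, 'touch_position': end})
  return sub_actions


sub_actions = swipe_to_sub_actions({
    'start_position': np.array([0.5, 0.8], dtype=np.float32),
    'end_position': np.array([0.5, 0.2], dtype=np.float32),
}, num_steps=10)
```

Under these assumptions, the call in the example above would issue 11 environment steps: 10 TOUCH steps moving up the screen, then one LIFT.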

Introduces a wrapper that converts start/end positions into interpolated
TOUCH steps followed by LIFT, matching TapActionWrapper patterns for
reward accumulation and early episode termination.
    position = start
else:
    alpha = i / (self._num_steps - 1)
    position = start * (1.0 - alpha) + end * alpha

Collaborator

Can we simplify this with np.linspace() or something like that?
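A rough sketch of what that simplification could look like, comparing the manual interpolation with `np.linspace`. The function names are hypothetical, and the `num_steps == 1` guard in the manual version is an assumption about the branch that the diff fragment above only partially shows.

```python
import numpy as np


def interpolate_manual(start, end, num_steps):
  """Manual linear interpolation, with an assumed special case for one step."""
  positions = []
  for i in range(num_steps):
    if num_steps == 1:
      position = start
    else:
      alpha = i / (num_steps - 1)
      position = start * (1.0 - alpha) + end * alpha
    positions.append(position)
  return np.stack(positions)


def interpolate_linspace(start, end, num_steps):
  """Same positions via np.linspace; array endpoints are interpolated along axis 0."""
  return np.linspace(start, end, num=num_steps)


start = np.array([0.5, 0.8])
end = np.array([0.5, 0.2])
manual = interpolate_manual(start, end, 10)
vectorized = interpolate_linspace(start, end, 10)
```

With `num=1`, `np.linspace` returns only the start value, so the single-step branch is no longer needed.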

step_type, reward, discount, observation = self._env.step(sub_action)
self._env_steps += 1
if reward is not None:
    total_reward += reward

Collaborator

I believe the total reward should be discounted; otherwise, rewards arriving later would have the same value as earlier rewards.
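One way to read this suggestion, sketched under assumptions: weight each sub-step reward by the product of the discounts returned by the preceding sub-steps, so the k-th reward is scaled by discount_1 × … × discount_(k-1). The helper name `accumulate_discounted` and the treatment of `None` values are illustrative choices, not taken from the PR.

```python
def accumulate_discounted(rewards_and_discounts):
  """Sums sub-step rewards, each weighted by the running product of earlier discounts.

  Args:
    rewards_and_discounts: iterable of (reward, discount) pairs, one per
      sub-step. Either value may be None (for example on a first timestep).

  Returns:
    A (total_reward, running_discount) tuple.
  """
  total_reward = 0.0
  running_discount = 1.0
  for reward, discount in rewards_and_discounts:
    if reward is not None:
      total_reward += running_discount * reward
    if discount is not None:
      # Later rewards are scaled by every discount seen so far.
      running_discount *= discount
  return total_reward, running_discount


total, final_discount = accumulate_discounted([
    (1.0, 0.5),
    (1.0, 0.5),
    (2.0, 1.0),
])
```

Here the three rewards contribute 1.0, 0.5, and 0.25 × 2.0, and the running discount could be returned as the wrapper's discount for the whole swipe.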

@kenjitoyama kenjitoyama left a comment

Collaborator

Hi, thanks for submitting this. I thought we had this wrapper, but somehow we don't. I'm pretty sure we used something just like this in https://arxiv.org/abs/2204.10374, but I guess we forgot to open source it.

This is the first time that I'm trying to merge a PR into Google's internal AndroidEnv version, so we might face a few hiccups. Hopefully it'll be fine.

@kavyabhand

Author

@kenjitoyama
Thank you so much for reviewing! I have implemented the changes you suggested. Could you please review them and let me know if there is anything else I can contribute?

TL;DR:

  1. Replaced the manual interpolation loop with np.linspace (also removes the num_steps == 1 special case).
  2. Sub-step rewards are now accumulated with discounting (reward_discount *= discount per sub-step), with a test for the discounted case.

2 participants