Add SwipeActionWrapper for gesture-level swipe actions #351
Introduces `SwipeActionWrapper`, which converts a swipe's start and end positions into a sequence of interpolated `TOUCH` steps followed by a `LIFT`, following the `TapActionWrapper` pattern for reward accumulation and early episode termination.
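The following minimal sketch makes the expansion concrete by turning a swipe into a list of sub-actions. It assumes AndroidEnv-style action dicts with `action_type` and `touch_position` keys. The `ActionType` enum is a local stand-in with illustrative values, and `swipe_to_actions` is a hypothetical helper, not the code in this PR. Treating `num_steps == 1` as the single-touch case is also an assumption.

```python
import enum

import numpy as np


class ActionType(enum.IntEnum):
  """Local stand-in for AndroidEnv's action types (values assumed, illustration only)."""
  TOUCH = 0
  LIFT = 1


def swipe_to_actions(start, end, num_steps=10):
  """Hypothetical helper: expands a swipe into `num_steps` TOUCHes plus one LIFT.

  `start` and `end` are normalized (x, y) positions in [0, 1].
  """
  start = np.asarray(start, dtype=np.float32)
  end = np.asarray(end, dtype=np.float32)
  actions = []
  for i in range(num_steps):
    if num_steps == 1:
      # Assumed edge case: a single touch at the start avoids dividing by zero.
      position = start
    else:
      alpha = i / (num_steps - 1)  # 0.0 on the first step, 1.0 on the last.
      position = start * (1.0 - alpha) + end * alpha
    actions.append({'action_type': np.array(ActionType.TOUCH, dtype=np.int32),
                    'touch_position': position})
  # Release the finger at the end position.
  actions.append({'action_type': np.array(ActionType.LIFT, dtype=np.int32),
                  'touch_position': end})
  return actions
```

With `num_steps=5` from `(0.1, 0.5)` to `(0.9, 0.5)`, this yields five `TOUCH` actions at x = 0.1, 0.3, 0.5, 0.7, 0.9 and a final `LIFT` at `(0.9, 0.5)`.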
```python
  position = start
else:
  alpha = i / (self._num_steps - 1)
  position = start * (1.0 - alpha) + end * alpha
```
Can we simplify this with np.linspace() or something like that?
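Here is one possible answer, sketched under the assumption that `start` and `end` are length-2 arrays. `np.linspace` accepts array endpoints and returns evenly spaced rows that include both endpoints. With `num=1` it returns only `start`, which matches the `position = start` branch in the excerpt above without a separate case. `interpolate_positions` is a hypothetical helper name.

```python
import numpy as np


def interpolate_positions(start, end, num_steps):
  """Hypothetical helper: returns a (num_steps, 2) array of positions from start to end.

  np.linspace includes both endpoints by default; with num_steps == 1 it returns
  just `start`, so the single-step swipe needs no special case.
  """
  return np.linspace(np.asarray(start, dtype=np.float32),
                     np.asarray(end, dtype=np.float32),
                     num=num_steps, dtype=np.float32)
```

The wrapper's loop could then iterate over the rows directly, e.g. `for position in interpolate_positions(start, end, self._num_steps):`, which removes the manual `alpha` bookkeeping.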
```python
step_type, reward, discount, observation = self._env.step(sub_action)
self._env_steps += 1
if reward is not None:
  total_reward += reward
```
I believe the total reward should be discounted; otherwise, rewards that arrive later in the swipe would count as much as earlier ones.
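Below is a sketch of the discounted version, under stated assumptions. `StepType` and `TimeStep` are local stand-ins that mirror `dm_env`'s values and field order so the snippet runs without AndroidEnv, and `run_sub_actions` is a hypothetical helper. Each reward is weighted by the product of the discounts returned before it, the usual dm_env return convention, and the loop stops as soon as a `LAST` step arrives.

```python
import enum
from typing import Any, NamedTuple, Optional


class StepType(enum.IntEnum):
  """Stand-in for dm_env.StepType."""
  FIRST = 0
  MID = 1
  LAST = 2


class TimeStep(NamedTuple):
  """Stand-in for dm_env.TimeStep, with the same field order."""
  step_type: StepType
  reward: Optional[float]
  discount: Optional[float]
  observation: Any


def run_sub_actions(step_fn, sub_actions):
  """Hypothetical helper returning (last_timestep, discounted_return, env_steps).

  Each reward is scaled by the product of all discounts seen before it, so a
  reward arriving later in the swipe counts for less. Stops early if the
  episode ends partway through the swipe.
  """
  total_reward = 0.0
  cumulative_discount = 1.0
  env_steps = 0
  timestep = None
  for sub_action in sub_actions:
    timestep = step_fn(sub_action)
    env_steps += 1
    if timestep.reward is not None:
      total_reward += cumulative_discount * timestep.reward
    if timestep.discount is not None:
      cumulative_discount *= timestep.discount
    if timestep.step_type == StepType.LAST:
      break  # Episode ended mid-swipe; skip the remaining sub-actions.
  return timestep, total_reward, env_steps
```

One design option (not something this PR specifies) is to also report `cumulative_discount` as the discount of the wrapper's single macro-step. An agent acting on whole swipes would then compute the same return as one acting on the raw touches.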
kenjitoyama left a comment
Hi, thanks for submitting this. I thought we had this wrapper, but somehow we don't. I'm pretty sure we used something just like this in https://arxiv.org/abs/2204.10374, but I guess we forgot to open-source it.
This is the first time that I'm trying to merge a PR into Google's internal AndroidEnv version, so we might face a few hiccups. Hopefully it'll be fine.
@kenjitoyama tl;dr:
Summary

Adds `SwipeActionWrapper`, a higher-level action wrapper that converts a swipe (start → end position) into a sequence of interpolated `TOUCH` steps followed by a `LIFT` at the end position.

This complements the existing `TapActionWrapper` and supports the documented use case of hard-coding gesture skills for RL studies, where agents need swipe/scroll primitives without manually chaining raw touch/lift actions.

Changes

- `android_env/wrappers/swipe_action_wrapper.py`: new wrapper with:
  - `start_position` and `end_position` (`BoundedArray`, shape `(2,)`, values in `[0, 1]`)
  - `num_steps` (default `10`) for interpolation granularity
  - Early termination on `StepType.LAST` (same behavior as `TapActionWrapper`)
  - `env_steps` tracking in `stats()`
- `android_env/wrappers/swipe_action_wrapper_test.py`: unit tests for interpolation, the single-step edge case, reward accumulation, early termination, and specs

Example usage