Haptic-ACT - Pseudo Oocyte Manipulation by a Robot Using Multimodal Information and Action Chunking with Transformers

Pedro Miguel Uriguen Eljuri1*, Hironobu Shibata2, Katsuyoshi Maeyama2, Yuanyuan Jia1, Tadahiro Taniguchi1,2
1Kyoto University, 2Ritsumeikan University
Under Review

*Corresponding Author

Abstract

In this paper, we introduce Haptic-ACT, an advanced robotic system for pseudo-oocyte manipulation that integrates multimodal information and Action Chunking with Transformers (ACT). Traditional automation methods for oocyte transfer rely heavily on visual perception and often require human supervision due to biological variability and environmental disturbances. Haptic-ACT enhances ACT by incorporating haptic feedback, enabling real-time grasp failure detection and adaptive correction. Additionally, we introduce a 3D-printed TPU soft gripper to facilitate delicate manipulation. Experimental results demonstrate that Haptic-ACT improves task success rate, robustness, and adaptability compared to conventional ACT, particularly in dynamic environments. These findings highlight the potential of multimodal learning in robotics for biomedical automation.

Overview

overview image

Haptic-ACT is a multimodal learning framework for robotic manipulation that enables delicate, contact-rich tasks by combining vision, robot proprioception, and haptic feedback. Our method extends Action Chunking with Transformers (ACT) to incorporate force sensing, allowing the robot not only to plan ahead but also to detect and recover from grasp failures autonomously. We apply Haptic-ACT to the task of pseudo-oocyte transfer, where the robot must grasp and place fragile biological objects safely. Unlike vision-only methods, our system can detect failed grasps in real time and retry, improving robustness significantly. Experiments in both trained and unseen environments show that Haptic-ACT improves manipulation success rates.
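
To illustrate why force sensing matters here, the sketch below shows a simple force-threshold check wrapped in a retry loop. This is illustrative only: in Haptic-ACT the recovery behaviour is learned from demonstrations rather than hard-coded, and the threshold and the read_obs/execute_chunk helpers are hypothetical.

import numpy as np

# Illustrative only: in Haptic-ACT recovery is learned from demonstrations,
# not hard-coded. The threshold and helper functions are hypothetical.
GRASP_FORCE_THRESHOLD = 0.05  # [N] assumed minimum contact force for a held object

def grasp_succeeded(force_xyz: np.ndarray) -> bool:
    """Heuristic: a non-trivial wrist force magnitude suggests the object is held."""
    return float(np.linalg.norm(force_xyz)) > GRASP_FORCE_THRESHOLD

def pick_with_retries(policy, read_obs, execute_chunk, max_attempts=3):
    """Run the policy and retry the grasp while the force reading indicates failure."""
    for _ in range(max_attempts):
        obs = read_obs()                          # images, joint states, force readings
        execute_chunk(policy(obs))                # execute the predicted action chunk
        if grasp_succeeded(read_obs()["force"]):  # check haptics after the grasp
            return True                           # object held; proceed to placement
    return False                                  # give up after max_attempts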

Contribution

The contribution of our work is three-fold: 1) We introduce Haptic-ACT, an enhancement of ACT that incorporates haptic feedback to improve grasp failure detection in contact-rich manipulation tasks. 2) We include failure examples and their recovery behaviours in our training data, together with successful expert demonstrations. 3) We developed a novel gripper design 3D-printed from soft TPU material; its flexibility allows the robot to grasp the objects without damaging their soft tissue.

Model Architecture

architecture image

The Haptic-ACT architecture extends the original ACT framework by integrating multimodal sensory inputs: vision (camera images), robot proprioception (joint angles, gripper position), and haptic feedback (force sensor readings).

These three inputs are encoded into feature vectors and fused into a single multimodal representation. This fused input is passed to a Transformer-based model, which predicts a chunk of future actions.
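
A minimal sketch of this fusion, written in PyTorch, is shown below. The backbone, token layout, dimensions, and chunk size are illustrative assumptions, not the exact Haptic-ACT implementation.

import torch
import torch.nn as nn
import torchvision.models as models

class HapticACTSketch(nn.Module):
    """Sketch: encode vision, proprioception, and haptics, fuse them,
    and decode a chunk of future actions with a Transformer."""

    def __init__(self, d_model=512, num_joints=6, force_dim=3,
                 chunk_size=20, action_dim=7):
        super().__init__()
        # Vision: a CNN backbone turns a camera image into a 512-d feature.
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()
        self.vision_encoder = backbone
        self.vision_proj = nn.Linear(512, d_model)
        # Proprioception: joint angles plus gripper position.
        self.proprio_encoder = nn.Linear(num_joints + 1, d_model)
        # Haptics: 3-axis force sensor reading.
        self.haptic_encoder = nn.Linear(force_dim, d_model)
        # Transformer decoder attends over the fused tokens and predicts
        # a chunk of future actions from learned query embeddings.
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.action_queries = nn.Parameter(torch.randn(chunk_size, d_model))
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, image, proprio, force):
        tokens = torch.stack([
            self.vision_proj(self.vision_encoder(image)),  # (B, d_model)
            self.proprio_encoder(proprio),
            self.haptic_encoder(force),
        ], dim=1)                                          # (B, 3, d_model) fused multimodal tokens
        queries = self.action_queries.unsqueeze(0).expand(image.size(0), -1, -1)
        fused = self.decoder(tgt=queries, memory=tokens)   # (B, chunk_size, d_model)
        return self.action_head(fused)                     # (B, chunk_size, action_dim)

In a full implementation each of the three cameras would contribute its own vision token, and ACT additionally conditions on a CVAE-style latent during training; both are omitted here for brevity.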

Data Collection

To train Haptic-ACT, we collected a diverse set of expert demonstrations by teleoperating the robot with a Touch-X haptic device.

In total, we collected 40 successful demonstrations of grasping the pseudo-oocyte and placing it in a test tube, and 10 recovery demonstrations in which the robot retries after a failed grasp.
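
As an illustration, each episode can be stored as per-timestep observations and teleoperation actions, for example in an HDF5 file as sketched below. The field names and file layout are assumptions for illustration, not the exact dataset format used for training.

import h5py
import numpy as np

def save_demonstration(path, frames, recovery=False):
    """Sketch: store one teleoperated episode as per-timestep arrays.
    `frames` is a list of dicts with image, joints, force, and action entries."""
    with h5py.File(path, "w") as f:
        f.create_dataset("images", data=np.stack([x["image"] for x in frames]))    # camera frames
        f.create_dataset("joints", data=np.stack([x["joints"] for x in frames]))   # proprioception
        f.create_dataset("force", data=np.stack([x["force"] for x in frames]))     # 3-axis haptics
        f.create_dataset("actions", data=np.stack([x["action"] for x in frames]))  # teleop commands
        f.attrs["recovery"] = recovery  # flag episodes that demonstrate retrying after a failed grasp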

Hardware Environment

We use a Denso Cobotta robot arm to execute the task. To provide visual and haptic information, we use three Intel RealSense D435 cameras (two subjective views and one objective view) and a WACOH DynPick 3-axis force sensor. We designed and 3D-printed a soft TPU gripper so that it does not over-compress the target pseudo-oocyte and can adapt to variations in its size.
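
The snippet below is a rough sketch of the sensing side, reading one RealSense colour stream with pyrealsense2. In the actual setup each of the three D435 cameras is selected by its serial number, and read_dynpick is a hypothetical placeholder for the force sensor's own serial protocol.

import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# config.enable_device("<camera serial number>")  # pick one of the three D435 cameras
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

def read_color_frame() -> np.ndarray:
    """Return the latest colour image as an H x W x 3 array."""
    frames = pipeline.wait_for_frames()
    return np.asanyarray(frames.get_color_frame().get_data())

def read_dynpick() -> np.ndarray:
    """Hypothetical stand-in for polling the WACOH DynPick 3-axis force sensor."""
    raise NotImplementedError("replace with the sensor's serial read-out")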

Results

results image

We conducted experiments comparing Haptic-ACT against an ACT baseline, with models trained both with and without the recovery-from-failure data. In all cases, Haptic-ACT outperforms ACT. Additionally, we tested Haptic-ACT with target objects the model had not been trained to handle; in most cases the robot could perform the task with a success rate of at least 40%. The lowest success rate occurs for the one target object that is not spherical like the other items: when the robot attempts to pick it up, it pushes the object away.

Video Introduction

BibTeX


@inproceedings{uriguen2025hapticact,
  author = {Uriguen Eljuri, Pedro Miguel and Shibata, Hironobu and Maeyama, Katsuyoshi and Jia, Yuanyuan and Taniguchi, Tadahiro},
  title  = {Haptic-ACT - Pseudo Oocyte Manipulation by a Robot Using Multimodal Information and Action Chunking with Transformers},
  year   = {2025},
  note   = {Under review}
}
      

Laboratory Information

Funding

This work was supported by the Japan Science and Technology Agency (JST) Moonshot Research & Development Program, Grant Number JPMJMS2033.