MoDex: Planning High-Dimensional Dexterous Control via Learning Neural Hand Models

1Tsinghua-Berkeley Shenzhen Institute, Shenzhen International Graduate School, Tsinghua University
*Equal contribution

Overview

Overview of MoDex

Abstract

Controlling hands in high-dimensional action spaces has long posed a challenge, yet humans effortlessly perform dexterous tasks with their hands. In this paper, we draw inspiration from the ontological cognition of biological organisms and reconsider dexterous hands as learnable systems. Specifically, we introduce MoDex, a framework that employs a neural internal model to capture the dynamic characteristics of hand movements. Leveraging this model, we propose a bidirectional planning method that is efficient in both training and inference. Furthermore, we integrate our approach with large language models to generate a variety of gestures, such as "ThumbUp" and "Rock&Roll". We also find that decomposing the system dynamics into a hand model and an external model improves data efficiency, as supported by both theoretical analysis and empirical experiments.


Method

Method of MoDex

  • We propose to model dexterous hands with neural networks (NNs). A hand model is composed of a forward model that captures the forward dynamics and an inverse model that provides decision proposals. We further propose a bidirectional framework for efficient control, which integrates the hand model with CEM planning (see the first sketch after this list).
  • We propose to combine the learned hand model with a large language model (LLM) to generate gestures. We link these two independent modules by prompting the LLM to produce cost functions for planning (second sketch below).
  • We propose to realize data-efficient in-hand manipulation by learning decomposed system dynamics models (third sketch below).
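
The first sketch below is a minimal illustration, not the authors' code, of how a learned hand model can drive bidirectional planning: the inverse model proposes an action toward the goal to warm-start the CEM sampling distribution, and the forward model rolls candidate action sequences out to score them. All names (the MLP models, cem_plan, cost_fn) and dimensions are assumptions made for illustration.

```python
# Minimal sketch (assumed, not the official MoDex implementation) of bidirectional
# planning with a learned hand model: the inverse model warm-starts CEM, and the
# forward model scores sampled action sequences.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HORIZON = 24, 16, 5  # assumed dimensions


def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))


forward_model = mlp(STATE_DIM + ACTION_DIM, STATE_DIM)  # s_{t+1} = f(s_t, a_t)
inverse_model = mlp(2 * STATE_DIM, ACTION_DIM)          # a_t = g(s_t, s_goal)


def cost_fn(states, goal):
    # Squared distance of predicted states to the goal, summed over the horizon.
    return ((states - goal) ** 2).sum(dim=(-2, -1))


@torch.no_grad()
def cem_plan(state, goal, samples=256, elites=32, iters=4, init_std=0.3):
    # Backward direction: the inverse model proposes an action toward the goal,
    # which initializes the mean of the CEM sampling distribution.
    proposal = inverse_model(torch.cat([state, goal], dim=-1))
    mean = proposal.repeat(HORIZON, 1)                   # (H, A)
    std = torch.full_like(mean, init_std)
    for _ in range(iters):
        actions = mean + std * torch.randn(samples, HORIZON, ACTION_DIM)
        s, rollout = state.expand(samples, -1), []
        for t in range(HORIZON):                         # forward rollout
            s = forward_model(torch.cat([s, actions[:, t]], dim=-1))
            rollout.append(s)
        costs = cost_fn(torch.stack(rollout, dim=1), goal)
        elite = actions[costs.topk(elites, largest=False).indices]
        mean, std = elite.mean(0), elite.std(0) + 1e-6   # refit the distribution
    return mean[0]                                       # execute the first action


# Example usage with dummy state and goal vectors:
# first_action = cem_plan(torch.zeros(STATE_DIM), torch.ones(STATE_DIM))
```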
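
The second sketch shows one way the LLM could be linked to the planner, assuming a simple prompt-and-exec pipeline; `query_llm` is a hypothetical stand-in for any chat-completion API, and the joint names are made up for illustration. The returned function plays the role of the planning cost used above.

```python
# Minimal sketch (assumed, not the official pipeline) of prompting an LLM for a
# gesture cost function that the planner can then minimize.
import numpy as np

JOINT_NAMES = ["thumb_mcp", "thumb_ip", "index_mcp", "index_pip"]  # assumed subset

PROMPT_TEMPLATE = """You control a dexterous hand with joints: {joints}.
Write a Python function `cost(q)` that takes a NumPy array of joint angles
(radians, in the order above) and returns a scalar cost that is minimized when
the hand shows a '{gesture}' gesture. Return only the code."""


def query_llm(prompt: str) -> str:
    # Hypothetical placeholder: call your preferred chat-completion API here.
    raise NotImplementedError


def gesture_cost_from_llm(gesture: str):
    prompt = PROMPT_TEMPLATE.format(joints=", ".join(JOINT_NAMES), gesture=gesture)
    code = query_llm(prompt)
    namespace = {"np": np}
    exec(code, namespace)        # executes untrusted code; sandbox it in practice
    return namespace["cost"]     # callable used as the planner's cost function


# Example usage: cost = gesture_cost_from_llm("ThumbUp")
```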
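
The third sketch illustrates the decomposition under assumed state dimensions: one network predicts only the hand sub-state from the hand state and action, and a separate external model predicts the object sub-state conditioned on the hand's motion, so each network faces a lower-dimensional learning problem.

```python
# Minimal sketch (assumed architecture) of decomposed system dynamics:
# a hand model plus an external object model, composed for planning.
import torch
import torch.nn as nn

HAND_DIM, OBJ_DIM, ACTION_DIM = 16, 7, 16  # assumed: joint angles; object pose

hand_model = nn.Sequential(nn.Linear(HAND_DIM + ACTION_DIM, 256), nn.ReLU(),
                           nn.Linear(256, HAND_DIM))        # hand-only dynamics
object_model = nn.Sequential(nn.Linear(HAND_DIM + OBJ_DIM, 256), nn.ReLU(),
                             nn.Linear(256, OBJ_DIM))       # object given hand motion


def composed_step(hand_q, obj_pose, action):
    # The hand transition depends only on the hand state and action; the object
    # transition conditions on the predicted hand state, so the two models can
    # be trained separately and composed at planning time.
    next_hand = hand_model(torch.cat([hand_q, action], dim=-1))
    next_obj = object_model(torch.cat([next_hand, obj_pose], dim=-1))
    return next_hand, next_obj
```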

Main Results

Hand Tips Reach

Hand tips reach results in the quasi-static and sequential settings.



Gesture Generation

Generated gestures on the Allegro, Shadowhand, and Myohand.

Object Orientation

Object Orientation Tasks