Controlling dexterous hands in high-dimensional action spaces has long been a challenge, yet humans perform dexterous tasks with their hands effortlessly. In this paper, we draw inspiration from the ontological cognition of biological organisms, i.e., the internal understanding they develop of their own bodies, and reconsider dexterous hands as learnable systems. Specifically, we introduce MoDex, a framework that employs a neural internal model to capture the dynamic characteristics of hand movements. Leveraging this model, we propose a bidirectional planning method that is efficient in both training and inference. Furthermore, we integrate our approach with large language models to generate a variety of gestures, such as "ThumbUp" and "Rock&Roll". We also find that decomposing the system dynamics into a hand model and an external model improves data efficiency, a finding supported by both theoretical analysis and empirical experiments.
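To make the decomposition concrete, the sketch below illustrates one plausible way to split full-system dynamics into a learned hand (internal) model and an external model that is conditioned on the predicted hand state, and to compose them for rollouts. The class names, network architectures, and state representations here are illustrative assumptions, not the exact MoDex implementation.

```python
import torch
import torch.nn as nn

class HandModel(nn.Module):
    """Internal model of hand dynamics: predicts the next hand state
    from the current hand state and action (illustrative sketch)."""
    def __init__(self, hand_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hand_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hand_dim),
        )

    def forward(self, hand_state, action):
        # Residual prediction of the next hand state.
        return hand_state + self.net(torch.cat([hand_state, action], dim=-1))

class ExternalModel(nn.Module):
    """External model: predicts the next object state from the current
    object state and the (predicted) hand state."""
    def __init__(self, obj_dim, hand_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obj_dim + hand_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obj_dim),
        )

    def forward(self, obj_state, hand_state):
        return obj_state + self.net(torch.cat([obj_state, hand_state], dim=-1))

def rollout(hand_model, ext_model, hand_state, obj_state, actions):
    """Compose the two models to roll out full-system dynamics
    over a sequence of actions of shape (T, action_dim)."""
    for a in actions:
        hand_state = hand_model(hand_state, a)
        obj_state = ext_model(obj_state, hand_state)
    return hand_state, obj_state
```

Under this factorization, the hand model can in principle be trained once from hand-only interaction data and reused across tasks, while only the (typically smaller) external model must be fit per task, which is one intuitive reading of the data-efficiency claim above.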