Humans use rich facial expressions to indicate unpleasant emotions, such as pain. Automatic pain intensity estimation is useful in a variety of social and medical applications. However, existing pain intensity estimation approaches are limited to either classifying discrete pain intensity levels or estimating continuous pain intensities without considering the key-frame. The former suffer from abnormal fluctuations in the estimated intensity levels, while the latter suffer from low prediction capability. Hence, in this paper, we propose a deep hybrid network-based approach that automatically estimates continuous pain intensities by incorporating spatiotemporal information. Our approach consists of two key components: a key-frame analyser and a temporal analyser. We design the key-frame analyser with one conventional convolutional neural network and the temporal analyser with two recurrent convolutional neural networks. Evaluation on a benchmark dataset shows that our model estimates continuous emotions better than existing state-of-the-art methods.