Detecting the emotional states of humans from videos is essential for automating the profiling of human behaviour, which has applications in a variety of domains such as social, medical and behavioural science. Considerable research has been carried out on the binary classification of emotions from facial expressions. However, automating the feature extraction process to recognise the various intensities, or levels, of emotions remains a challenge. Intensity information is essential for tasks such as sentiment analysis. In this work, we propose a metric-based intensity estimation mechanism for primary emotions, and a deep hybrid convolutional neural network-based approach to recognise the defined intensities of the primary emotions from spontaneous and posed sequences. Further, we extend the intensity estimation approach to detect the basic emotions. Frame-level Facial Action Coding System (FACS) annotations and the intensities of the action units associated with each primary emotion are used to derive the various intensity levels of emotions. Evaluation on benchmark datasets demonstrates that the proposed approach correctly classifies the various intensity levels of emotions as well as detecting the emotions themselves.
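To illustrate the general idea of deriving emotion-intensity labels from action-unit annotations, the sketch below maps frame-level AU intensity codes to a coarse intensity level by averaging the intensities of the AUs associated with an emotion. This is a minimal illustration only: the specific AU sets, the averaging metric and the thresholds are assumptions made here for clarity, not the metric defined in this work.

```python
"""Illustrative sketch: derive a coarse emotion-intensity label from
frame-level FACS action-unit (AU) intensities. AU sets and thresholds
are assumed for illustration, not taken from the proposed metric."""

from statistics import mean

# Hypothetical mapping of primary emotions to prototypical action units
# (EMFACS-style associations; the actual AU sets may differ).
EMOTION_AUS = {
    "happiness": [6, 12],
    "surprise": [1, 2, 5, 26],
    "anger": [4, 5, 7, 23],
}

def intensity_level(emotion: str, au_intensities: dict) -> str:
    """Map AU intensity codes (0 = absent .. 5 = maximum, i.e. FACS
    levels A-E) to a coarse intensity label by averaging the codes of
    the AUs associated with the given emotion."""
    relevant = [au_intensities.get(au, 0) for au in EMOTION_AUS[emotion]]
    score = mean(relevant)
    if score < 1.0:
        return "neutral"
    if score < 2.5:
        return "low"
    if score < 4.0:
        return "medium"
    return "high"

# Example: a frame annotated with AU6 at level 3 and AU12 at level 4.
print(intensity_level("happiness", {6: 3, 12: 4}))  # -> "medium"
```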