Assignment Task Introduction
Emotional expressions are a very important part of human interaction. As technology has spread throughout our society and taken on coaching roles in education, emotion recognition from facial expressions has become an important part of human-computer interaction. However, human facial expressions vary so subtly that automatic facial expression recognition has always been a challenging task. In this work, we propose a deep learning approach based on an attentional convolutional network, which is able to focus on salient parts of the face and achieves significant improvement over previous models on multiple datasets, including FER-2013, CK+, FERG, and JAFFE. We also use a visualization technique that is able to find the salient face regions for detecting each emotion, based on the classifier's output. Through experimental results, we show that different emotions appear to be sensitive to different parts of the face.
1.1 Convolutional Neural Networks
Artificial Intelligence has been witnessing monumental growth in bridging the gap between the capabilities of humans and those of machines. Researchers work on numerous aspects of the field to make amazing things happen. One such area is the domain of Computer Vision.
The goal of this field is to enable machines to view the world as humans do, perceive it in a similar manner, and even use that knowledge for a multitude of tasks such as image and video recognition, image analysis and classification, media recreation, recommendation systems, natural language processing, etc. The advancements in Computer Vision with Deep Learning have been constructed and perfected over time, primarily around one particular algorithm: the Convolutional Neural Network.
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects or objects in the image, and differentiate one from the other. The preprocessing required in a ConvNet is much lower compared to other classification algorithms. While in basic methods filters are hand-engineered, with enough training, ConvNets are able to learn these filters themselves.
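As a minimal sketch of the convolution operation described above, the snippet below applies a hand-engineered vertical-edge filter to a toy image; in a real ConvNet the kernel values would be learned rather than fixed (all names and values here are illustrative):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation) of a single-channel image."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-engineered vertical-edge filter; a trained CNN learns such kernels.
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)

image = np.zeros((8, 8))
image[:, 4:] = 1.0            # bright right half -> vertical edge at column 4
feature_map = conv2d(image, edge_kernel)
```

The feature map responds strongly only where the kernel's pattern (a vertical intensity edge) is present, which is exactly the kind of localized feature detection the text describes.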
The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. A collection of such fields overlaps to cover the entire visual area.
CNNs, like other neural networks, are made up of neurons with learnable weights and biases. Each neuron receives several inputs, takes a weighted sum over them, passes it through an activation function, and responds with an output. The whole network has a loss function, and all the techniques developed for ordinary ANNs still apply to CNNs.
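The per-neuron computation just described can be sketched in a few lines; the weights, bias, and inputs below are illustrative values (in practice they are learned), and ReLU stands in for the activation function:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a ReLU activation."""
    z = np.dot(inputs, weights) + bias   # weighted sum plus bias
    return max(0.0, z)                   # ReLU: output 0 for negative sums

x = np.array([0.5, -1.0, 2.0])           # example inputs
w = np.array([0.4, 0.3, 0.1])            # illustrative weights, normally learned
b = 0.1
y = neuron(x, w, b)                      # 0.2 - 0.3 + 0.2 + 0.1 = 0.2
```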
Benefits of CNN:
The usage of CNNs is motivated by the fact that they can capture the relevant features from an image or video at different levels of abstraction, similar to a human brain. This is called feature learning, and it is something conventional fully connected neural networks cannot do.
Even for a completely new task, CNNs are very good feature extractors. This means that you can take a CNN already trained on another task, feed your own data through it, extract the useful attributes it produces at each level, and then tune the CNN slightly for the specific task, e.g. by adding a classifier after the last layer with labels specific to the task. This is also called pre-training, and CNNs are very efficient at it compared with models trained from scratch. Another advantage of this pre-training is that we avoid training the full CNN, saving memory and time; the only thing left to train is the classifier at the end.
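The pre-training idea above can be illustrated with a self-contained toy: a fixed random projection with a ReLU stands in for a frozen, pre-trained CNN backbone (a hypothetical stand-in, not a real trained network), and only a small logistic-regression head on top is trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pre-trained backbone: a fixed random
# projection followed by a ReLU. Its weights are never updated.
proj = rng.normal(size=(4, 8))
def frozen_features(x):
    return np.maximum(0.0, x @ proj)

# Toy data: the label depends on the first input dimension.
X = rng.normal(size=(64, 4))
y = (X[:, 0] > 0).astype(float)
F = frozen_features(X)                  # extracted features, backbone frozen

def loss(w, b):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

# Train only the small classifier head by gradient descent.
w, b = np.zeros(8), 0.0
loss_before = loss(w, b)
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    w -= 0.1 * (F.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)
loss_after = loss(w, b)
```

Because only the head's eight weights and one bias are updated, the training step is cheap; in practice one would do the same with a real pre-trained CNN by freezing its layers and training a new final classifier.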
CNN takes advantage of local spatial coherence in the input (often images), which allows them to have fewer weights as some parameters are shared. This process, taking the form of convolutions, makes them especially well-suited to extract relevant information at a low computational cost.
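The weight-sharing claim can be made concrete with a back-of-the-envelope parameter count, here for a hypothetical 32x32 grayscale input:

```python
# Parameter count for one layer on a 32x32 grayscale image.
H = W = 32

# Fully connected: every input pixel connects to every output unit
# (one output unit per pixel position, weights only, no biases).
dense_params = (H * W) * (H * W)

# Convolutional: a single 3x3 kernel shared across all spatial
# positions, producing one feature map of the same size (with padding).
conv_params = 3 * 3
```

The dense layer needs over a million weights where the convolutional layer needs nine, because the same small kernel is reused at every spatial location.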
Limitations of CNN:
Convolutional neural networks are computationally costly, like any other neural network model. But this is more of an inconvenience than a weakness.
With better computer hardware, such as Graphics Processing Units (GPUs) and neuromorphic processors, this can be mitigated.
So, what are the drawbacks of CNN?
They lack an underlying theory, memory, and the ability to learn without supervision. A theory that explains why and how a given architecture achieves its results is still missing. For example, in understanding how much data or how many layers are needed to obtain a certain result, such a theory would be important.
Literature Survey
Effective semantic features for facial expressions recognition using SVM
Abstract
Most traditional facial expression-recognition systems track facial components such as the eyes, eyebrows, and mouth for feature extraction. Though some of these features can provide solutions for expression recognition, other, finer changes of the facial expressions can also be exploited for classifying various facial expressions. This study locates facial components with an active shape model to extract seven dynamic face regions (eyebrows, nose wrinkle, two nasolabial folds, lips, and mouth). The proposed semantic facial features are then acquired using directional gradient operators such as Gabor filters and the Laplacian of Gaussian. A support vector machine (SVM) was trained to classify six facial expressions (neutral, happiness, surprise, anger, disgust, and fear). The method was tested on the popular Cohn–Kanade database, and the average recognition rate is 94.7%.
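The feature-extraction step of this pipeline can be sketched as follows. The snippet builds a discrete Laplacian-of-Gaussian (LoG) kernel, filters a hypothetical face-region crop (a synthetic stand-in, not real data), and flattens the responses into a feature vector; the final SVM training step (e.g. with `sklearn.svm.SVC`) is omitted, and the kernel size and sigma are assumed values:

```python
import numpy as np

def log_kernel(size=9, sigma=1.4):
    """Discrete Laplacian-of-Gaussian kernel, shifted to zero mean."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    k = (r2 / (2 * sigma**2) - 1) * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()                  # zero response on flat regions

def conv2d_valid(image, kernel):
    """Valid-mode 2D convolution (cross-correlation)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 16x16 "mouth region" crop with an intensity edge
# in its lower part; flat areas give exactly zero LoG response.
region = np.zeros((16, 16))
region[12:, :] = 1.0
response = conv2d_valid(region, log_kernel())
features = response.flatten()            # feature vector for the SVM
```

In the paper's pipeline, such feature vectors from each of the seven face regions would be concatenated and passed to the SVM classifier.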
Facial expressions analysis and proposed semantic facial features
According to research on facial expressions, the facial action coding system (FACS) [14], which is a standard method for describing a facial expression, was used in this study. Facial expression in the FACS comprises variations of the upper face (i.e., the forehead, eyebrows, and eyes) and those of the lower face (i.e., the mouth and nose). These varying facial components, such as the eyebrows stretching upward or the eyes opening wide, are called action units, of which 44 have been found. Figure 2 shows how these action units can be combined to describe a variety of facial expressions.