IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 1, NO. 3, SEPTEMBER 2006
Face Identification Using Novel Frequency-Domain
Representation of Facial Asymmetry
Sinjini Mitra, Marios Savvides, Member, IEEE, and B. V. K. Vijaya Kumar, Senior Member, IEEE
Abstract—Face recognition is a challenging task. This paper introduces a novel set of biometrics, defined in the frequency domain and representing a form of "facial asymmetry." A comparison with existing spatial asymmetry measures suggests that the frequency-domain representation provides an efficient approach for performing human identification in the presence of severe expressions and for expression classification. Error rates of less than 5% are observed for human identification and around 25% for expression classification on a database of 55 individuals. Feature analysis indicates that asymmetry of the different face parts helps in these two apparently conflicting classification problems. An interesting connection between asymmetry and the Fourier-domain phase spectra is then established. Finally, a compact one-bit frequency-domain representation of asymmetry is introduced, and a simple Hamming distance classifier is shown to be more efficient than traditional classifiers from the storage and computation points of view, while producing equivalent human identification results. In addition, the application of these compact measures to verification and a statistical analysis are presented.
Index Terms—Asymmetry, efficiency, expression, face, features, frequency domain, identification, one-bit code, phase.
Manuscript received October 24, 2005; revised May 19, 2006. This work was supported by a grant from the Army Research Office to CyLab, Carnegie Mellon University. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Anil Jain.
S. Mitra is with the Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292 USA (e-mail: mitra@isi.edu).
M. Savvides and B. V. K. Vijaya Kumar are with the Electrical and Computer Engineering Department and CyLab, Carnegie Mellon University, Pittsburgh, PA 15213 USA (e-mail: msavvid@cs.cmu.edu; kumar@ece.cmu.edu).
Digital Object Identifier 10.1109/TIFS.2006.879301

I. INTRODUCTION

IN THE MODERN world, the ever-growing need to ensure a system's security has spurred the growth of the newly emerging technology of biometric-based identification. Of all the biometrics in use today, the face is the most acceptable because it is one of the most common methods humans use in visual interaction and perception. In fact, facial recognition is an important human ability—an infant innately responds to face shapes at birth and can discriminate his or her mother's face from a stranger's at the tender age of 45 h [1]. In addition, acquiring face images with digital cameras is nonintrusive. However, face-based identification poses many challenges. Several images of a single person may be dramatically different because of changes in viewpoint, color, and illumination, or simply because the person's face looks different from day to day due to makeup, facial hair, glasses, etc. Faces are rich in information about individual identity, mood, and mental state, and the positional relationships between face parts such as the eyes, nose, mouth, and chin, as well as their shapes and sizes, are widely used as discriminative features for identification. One family of features that has only recently come into use in face recognition problems is facial asymmetry.
Facial asymmetry can be caused either by external factors, such as expression changes, viewing orientation, and lighting direction, or by internal factors, such as growth, injury, and age-related changes. The latter is more interesting, being directly related to the individual face structure, whereas the former can be controlled to a large extent and even removed with the help of suitable image normalization. Psychologists have long been interested in the relationship between facial asymmetry and attractiveness and its role in identification. It has been observed that the more asymmetric a face is, the less attractive it is [2]; furthermore, the less attractive a face is, the more recognizable it is [3]. A commonly accepted notion in computer vision is that human faces are bilaterally symmetric [4], and Gutta et al. [5] reported no difference whatsoever in recognition rates when using only the right and left halves of the face. However, it is well known that manifesting expressions causes a considerable amount of facial asymmetry, expressions being more intense on the left side of the face [6]. Differences were indeed found in recognition rates for the two halves of the face under a given facial expression [7]. All of this indicates the potential significance of asymmetry in automatic human face recognition, particularly in the presence of expressions. Identifying twins from face images has defied the best facial recognition systems, and [8] reported statistically significant differences among facial asymmetry parameters of monozygotic twins. This shows the potential of facial asymmetry for producing efficient identification tools.
Despite extensive studies of facial asymmetry in other fields, its use in human identification began in computer vision only in 2001, with the seminal work of Liu [9], who first showed that certain facial asymmetry measures are efficient human identification tools under expression variations. This was followed by more in-depth studies [10], [11], which further investigated the role of asymmetry measures for both human and expression classification. The goal of this paper is to study an alternative representation of facial asymmetry in the frequency domain, which, to the best of our knowledge, has not been explored before. We wish to establish its efficacy in various recognition tasks, and its other properties, by exploiting the characteristics of the frequency domain.

The rest of the paper is organized as follows. Section II contains a brief description of the database, and Section III introduces the new frequency-domain asymmetry biometrics. Section IV presents a feature analysis, and the classification results appear in Section V. Section VI explores the connection between asymmetry and the Fourier-domain phase spectra, while Section VII introduces a computationally efficient one-bit code. We conclude with a discussion in Section VIII.
II. DATA

We use a part of the "Cohn–Kanade AU-Coded Facial Expression Database" [12], consisting of images of 55 individuals expressing three different emotions: joy, anger, and disgust. The data thus consist of video clips of people showing an emotion, beginning with a neutral expression and gradually evolving into its peak form. Each clip is broken down into several frames, and the raw images are normalized using an affine transformation based on a combination of scaling and rotation, as employed in [13]. Each normalized image is of size 128 × 128 and has a face midline determined so that every point on one side of the face has a corresponding point on the other. We omit the details owing to space constraints; the interested reader is referred to [13] for the alignment procedure. Some normalized images from our dataset are shown in Fig. 1.

Fig. 1. Sample images from our database. (Courtesy Liu et al. [13]).

To the best of our knowledge, this is the only database that allows a thorough investigation of the role of facial asymmetry in identification in the presence of extreme expression variations, since the images were carefully captured under controlled background lighting. We use this small subset as the initial testbed for our experiments and hope to extend to a bigger database in the near future. This also facilitates a fair comparison of our results with those in [9]–[11], which were based on the same subset.

III. FREQUENCY DOMAIN

Many signal-processing applications involve the frequency-domain representation of signals. The frequency spectrum consists of two components at each frequency: magnitude and phase. In two-dimensional (2-D) images particularly, the phase component captures more of the image intelligibility than the magnitude and, hence, is very significant for performing image reconstruction [14]. Savvides et al. [15] showed that correlation filters built in the frequency domain can be used for efficient face-based recognition. Recently, the significance of phase has also been exploited in biometric authentication. Savvides et al. [16] proposed correlation filters based only on the phase component of an image, which performed as well as the original filters, and [17] demonstrated that performing principal component analysis (PCA) in the frequency domain by eliminating the magnitude spectrum and retaining only the phase not only outperformed spatial-domain PCA but also exhibited attractive properties such as illumination tolerance. All of this shows that frequency-domain features possess the potential for improving classification results.

Symmetry properties of the Fourier transform are often very useful. According to [18], any sequence $x[n]$ can be expressed as the sum of an even part (the symmetry part) $x_e[n]$ and an odd part (the asymmetry part) $x_o[n]$. Specifically,

$$x[n] = x_e[n] + x_o[n]$$

where $x_e[n] = \frac{1}{2}(x[n] + x[-n])$ and $x_o[n] = \frac{1}{2}(x[n] - x[-n])$. When a Fourier transform is applied to a real sequence $x[n]$, the even part $x_e[n]$ transforms to the real part of the Fourier transform and the odd part $x_o[n]$ transforms to its imaginary part (the Fourier transform of an arbitrary sequence is generally complex-valued). In the context of a face image, the even part corresponds to the symmetry of a face (in the left–right direction, across the face midline); hence, the more asymmetric a face region is, the larger the corresponding odd part, and vice versa. This implies that spatial asymmetry of the face corresponds to the imaginary part of the Fourier transform and the symmetry part corresponds to the real part, and this correspondence lays the ground for developing asymmetry features in the frequency domain. However, these relations hold for one-dimensional (1-D) sequences alone and, hence, we define our asymmetry features based on the Fourier transforms of row slices of the images (either singly, or averaging over a certain number of rows at a time, as described in the next section).
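To make the real/imaginary correspondence concrete, the following NumPy sketch (our illustration, not code from the paper) decomposes a real row slice into even and odd parts and checks where each lands in the spectrum. Note that the discrete version of the property uses circular reflection $x[(N-n) \bmod N]$, which is what the roll below implements.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(128)        # stand-in for one 1-D row slice of a face

# Circular reflection x[(N - n) mod N], the DFT analogue of x[-n].
x_ref = np.roll(x[::-1], 1)
x_even = 0.5 * (x + x_ref)          # symmetry part
x_odd = 0.5 * (x - x_ref)           # asymmetry part

X_even = np.fft.fft(x_even)
X_odd = np.fft.fft(x_odd)

# For a real sequence, the even part maps to the real part of the Fourier
# transform and the odd part maps to the imaginary part.
assert np.allclose(X_even.imag, 0, atol=1e-9)
assert np.allclose(X_odd.real, 0, atol=1e-9)
assert np.allclose(np.fft.fft(x), X_even + X_odd)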
A. Asymmetry Biometrics

Following the notion presented above, we define three asymmetry biometrics in the frequency domain based on the imaginary components of the Fourier transform (a code sketch follows the list):
• I-face: frequency-wise imaginary components of the Fourier transforms of each row slice—a 128 × 128 matrix of features, of which only half are needed owing to the symmetry properties of the Fourier transform (128 × 64 features);
• Ave I-face: frequency-wise imaginary components of the Fourier transforms of averages of two-row slices of the face—a 64 × 64 matrix of features;
• E-face: energy of the imaginary components of the Fourier transforms of averages of two-row slices of the face—a feature vector of length 64.
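A minimal sketch of the three feature extractors, assuming a 128 × 128 normalized image with the midline vertical (the function name and the use of NumPy are ours):

import numpy as np

def asymmetry_faces(img):
    """Sketch of the three frequency-domain asymmetry features."""
    img = np.asarray(img, dtype=float)
    n_rows, n_cols = img.shape                                  # 128 x 128

    # I-face: imaginary parts of the 1-D FFT of every row slice; only half
    # the frequencies are kept (the imaginary part is odd-symmetric).
    I_face = np.fft.fft(img, axis=1).imag[:, : n_cols // 2]     # 128 x 64

    # Ave I-face: same features after averaging non-overlapping two-row blocks.
    two_row = img.reshape(n_rows // 2, 2, n_cols).mean(axis=1)  # 64 x 128
    ave_I_face = np.fft.fft(two_row, axis=1).imag[:, : n_cols // 2]  # 64 x 64

    # E-face: energy of the imaginary components per two-row slice.
    E_face = (np.fft.fft(two_row, axis=1).imag ** 2).sum(axis=1)     # length 64

    return I_face, ave_I_face, E_face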
For all three sets of features, the higher the value, the greater the amount of asymmetry, and vice versa. The averaging over rows is done to smooth out noise in the image, which is otherwise likely to create artificial asymmetry artifacts and give misleading results. Averaging over more rows, on the other hand, can lead to oversmoothing and a loss of relevant information; the two-row blocks were selected as optimal after some experimentation. We also wish to compare the performances of the three feature sets, especially to explore whether the higher-dimensional I-faces are justified over the E-faces in terms of better classification performance. Note that all of these features are simple in essence, yet the goal is to show that they are capable of forming effective identification tools. To the best of our knowledge, these frequency-based features representing facial asymmetry are novel in computer vision and pattern recognition problems.
IV. FEATURE ANALYSIS

The exploratory feature analyses presented in this section are aimed at providing a preliminary idea about the nature of the frequency-domain asymmetry metrics and their utility in classification. For the first set of feature analyses, we use the E-faces due to their low dimensionality. Fig. 2 shows the images of two individuals with four different expressions, and Fig. 3 shows how asymmetry varies among them.

Fig. 2. Two people showing four different facial expressions.

Fig. 3. Asymmetry of the different facial features for the four expressions of two people. The horizontal axis represents the different frequencies at which the asymmetry biometrics were computed for each row slice of the face from the forehead to the chin. (a) Person 1. (b) Person 2.

For instance, for Person 1, joy produces the greatest degree of asymmetry and the neutral expression the lowest, whereas for Person 2, joy and the neutral expression show maximum asymmetry, followed by anger and disgust. Moreover, Person 1 has a greater amount of asymmetry over the whole face for joy (forehead, nose, and mouth regions) but only in the forehead region for the other three emotions. For Person 2, on the other hand, the mouth region appears to have the maximum asymmetry for all four emotions. Therefore, although we looked at only two people in the database, these analyses give a preliminary indication that people may tend to express different emotions differently, which, in turn, suggests that these measures may be helpful in automatic face recognition in the presence of expression variations, as well as in identifying expressions.

We next study the distribution of the asymmetry metric for all 55 people for certain facial regions that not only contain valuable discriminating information but also are prone to significant changes when expressing emotions: the eyes and the mouth.
We select a few rows in those regions, average them, and compute the energy of the imaginary parts of the Fourier transform as a measure of asymmetry. Fig. 4 shows these energy values for the four expressions for all 55 people in our dataset. The multiple peaks in these figures reveal that there exist significant differences in asymmetry among the different people and also among the four expressions.

Fig. 4. Distributions of the asymmetry metric for 55 people and for the four different expressions. The top panel shows the distributions of the eye region and the bottom panel corresponds to the mouth region.
A. Discriminative Feature Sets

Next, we study the discriminative power of the asymmetry measures to determine the specific facial regions that contribute to the identification process, both for humans and for expressions. Ideally, features which contribute to interclass differences should have large variation between classes and small variation within the same class. Hence, a measure of discrimination can be provided by a variance-ratio-type quantity; in particular, we use the augmented variance ratio (AVR), following along the lines of [13]. AVR compares within-class and between-class variances and, at the same time, penalizes features whose class means are too close to one another. For a feature $F$ with values $S_F$ in a data set with $C$ total classes, AVR is calculated as

$$\mathrm{AVR}(S_F) = \frac{\mathrm{Var}(S_F)}{\frac{1}{C}\sum_{k=1}^{C}\frac{\mathrm{Var}(S_{F_k})}{\min_{i \neq k}\left|\mathrm{mean}(S_{F_k}) - \mathrm{mean}(S_{F_i})\right|}}$$

where $\mathrm{mean}(S_{F_k})$ is the mean of the subset of values from feature $F$ belonging to class $k$. The higher the AVR value of a feature, the more discriminative it is for classification. For human identification, the 55 subjects form the classes; for expression classification, the classes are the three emotions.
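A direct transcription of this formula (our own sketch; the guard against coincident class means is our addition):

import numpy as np

def avr(values, labels):
    """Augmented variance ratio for one feature, per the formula above."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.array([values[labels == c].mean() for c in classes])
    vars_ = np.array([values[labels == c].var() for c in classes])

    penalized = []
    for k in range(len(classes)):
        # Penalize features whose class means lie too close together.
        gaps = np.abs(means[k] - np.delete(means, k))
        penalized.append(vars_[k] / max(gaps.min(), 1e-12))
    return values.var() / np.mean(penalized)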
Fig. 5 shows the E-face AVR values for human and expression classifications, calculated over all 55 individuals in the database.

Fig. 5. AVR values for E-faces. Human identification: (a) all features, (b) all features except the top three, (c) nose bridge with the highest AVR value. Expression classification: (d) all features and (e) the mouth region with the highest AVR value. Features 0–64 represent the regions from the forehead to the chin of a face.

Looking carefully at the human AVR values, we discover that a few subjects in the database have some artificial asymmetry in the forehead region arising either from falling hair or from edge artifacts introduced by the normalization procedure (Fig. 6). This is highly undesirable and spuriously raises the first few AVR values in Fig. 5(a).

Fig. 6. Images with artificial asymmetry in the forehead. (a) Hair. (b) Edge artifacts.

Fig. 5(b) shows the AVR plot without the top three features, and this clearly shows that the nose bridge contains the most discriminative information pertaining to recognition of individuals under different expressions [marked in Fig. 5(c)]. Fig. 5(d)–(e) shows that the region around a person's mouth is most discriminative across different expressions (no edge-artifact problem arises here). We thus conclude that the asymmetry of different face regions drives these two apparently conflicting classification problems and, hence, may be effective for both. Moreover, these results are consistent with similar feature analysis results in [11], based on spatial asymmetry measures.
V. CLASSIFICATION RESULTS

We tried several classification methods: Fisherfaces (FF) [19], support vector machines (SVMs) [20], linear discriminant analysis (LDA) [21], and individual principal component analysis (IPCA) [22]. The IPCA method differs from the global PCA approach [23], where a single subspace is computed from all of the images regardless of identity. In individual PCA, a separate subspace with basis $\Phi_k$ and mean $m_k$ is computed for each person $k$, and each test image $x$ is projected onto each individual subspace as $y_k = \Phi_k^T(x - m_k)$. The image is then reconstructed as $\hat{x}_k = \Phi_k y_k + m_k$ and the reconstruction error computed as $e_k = \|x - \hat{x}_k\|^2$. The final classification chooses the subspace with the smallest $e_k$. Of these four classifiers, LDA did not perform well, so we omit its results and report those from the other three. LDA is known to be effective for spatial- or image-domain features computed from pixel intensity values, which may be a probable cause of its failure on our frequency-based features.
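A compact sketch of the IPCA decision rule under these definitions (class and variable names are ours; the number of components per subject is an assumption):

import numpy as np

class IndividualPCA:
    """One subspace per person; a test image is assigned to the subspace
    that reconstructs it with the smallest error."""

    def __init__(self, n_components=3):
        self.n_components = n_components
        self.models = {}                          # person -> (mean, basis)

    def fit(self, images_by_person):
        for person, imgs in images_by_person.items():
            X = np.stack([im.ravel() for im in imgs]).T        # d x n
            mean = X.mean(axis=1, keepdims=True)
            U, _, _ = np.linalg.svd(X - mean, full_matrices=False)
            self.models[person] = (mean, U[:, : self.n_components])

    def predict(self, img):
        x = img.ravel()[:, None]
        errors = {}
        for person, (mean, Phi) in self.models.items():
            y = Phi.T @ (x - mean)                # project: y = Phi^T (x - m_k)
            x_hat = Phi @ y + mean                # reconstruct
            errors[person] = float(np.sum((x - x_hat) ** 2))
        return min(errors, key=errors.get)        # smallest reconstruction error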
A. Human Identification

For human identification, we train on the neutral frames of the three emotions (joy, anger, and disgust) from all 55 individuals and test on the peak frames of the three emotions from all of the people. We thus use three frames per person for training (165 total) and three frames per person for testing (165 total). This represents an expression-invariant human identification problem, similar to the one reported in [13], which uses a spatial asymmetry measure called the difference face (D-face), defined as $D(x, y) = I(x, y) - I'(x, y)$, where $I$ denotes a (normalized) face image and $I'$ is its reflected version along the face midline; LDA was used as the classifier in their case. Note that [13] reported classification results on five different experiments: training on frames from two emotions and testing on the third, training on neutral frames and testing on peak ones, and vice versa. However, we use only one of these experimental setups (training on neutral and testing on peak) and, hence, compare our results to the corresponding cases for the spatial measures.
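For reference, the spatial D-face is a one-liner (a sketch under the convention that the face midline is the vertical image axis):

import numpy as np

def d_face(img):
    # Difference face: the normalized image minus its reflection about the
    # vertical face midline, following the definition in [13].
    return img - np.fliplr(img)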
Table I shows the misclassification rates for human identification (the percentage of cases that are wrongly classified) based on our frequency-domain asymmetry biometrics under the different classifiers.

TABLE I. Misclassification Rates for Human Identification Using Frequency-Domain Asymmetry Measures.

The results show that IPCA performed best, closely followed by linear-kernel SVM (in fact, the SVM I-face and Ave I-face results are almost identical to the IPCA ones). Moreover, I-faces proved significantly better than E-faces, which shows that the feature reduction from summing over each row in constructing the E-faces destroyed features crucial for discrimination and, hence, hurt performance. We henceforth work with the I-faces alone.

We next compared our best results, obtained with IPCA, with those obtained with D-faces [13], shown in Table II.

TABLE II. Misclassification Rates for Expression-Invariant Human Identification Based on Spatial D-Face Measures.

We compare to D-face alone (omitting the variants in [13] with significantly worse results) because, by construction, the I-faces and E-faces are analogous to it (following the reasoning presented in Section III). The results indicate that our proposed frequency-domain measures are significantly better than D-face and have no statistically significant differences from the D-face principal components (PCs) at the 1% level. These are adjudged using "p-values" (probability values), a quantity commonly used in statistical inference to test the significance of a proposed hypothesis; the lower the p-value of a test, the more evidence the data exhibit against the hypothesis. (Casella [24] contains details on the procedure of statistical hypothesis testing.) For both I-faces and E-faces, we test the hypothesis that the error rates from the spatial-domain and the frequency-domain measures are the same. The comparison with D-face yields p-values small enough to reject this hypothesis, while the comparison with the D-face PCs does not, suggesting that the frequency-domain measures are at least as robust to intrapersonal variations caused by expression changes as their spatial-domain counterparts.
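The paper does not name the exact test behind these p-values; a two-sided two-proportion z-test on the error rates is one standard choice and is sketched below purely for illustration:

import math

def two_proportion_p_value(err1, n1, err2, n2):
    """Two-sided two-proportion z-test comparing two error rates measured
    on n1 and n2 test cases (illustrative choice, not the paper's)."""
    pooled = (err1 * n1 + err2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 1.0                                # degenerate: identical rates
    z = (err1 - err2) / se
    return math.erfc(abs(z) / math.sqrt(2.0))     # 2 * (1 - Phi(|z|))

# Example with hypothetical rates on 165 test images each:
# two_proportion_p_value(0.05, 165, 0.15, 165)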
B. Expression Classification

A person's expression helps identify his or her mood and mental state, and is often an individualized characteristic. Different people express different emotions differently, which reflects human behavior and often helps in the identification of a particular individual. In fact, [7] showed convincing results that face-recognition rates depend on the type of facial expression.

Our dataset has images with three different expressions: joy, anger, and disgust. We follow the same experimental setup as in [11]: train on peak frames from all three expressions for a randomly selected subset of 30 individuals (out of 55), and test on peak frames of the three expressions from the remaining 25 individuals. This random division of the subjects into training and test sets was repeated 20 times (to remove selection bias), and the final error rates are obtained by averaging over these 20 repetitions.
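The split protocol can be sketched as follows (classify_fn is a hypothetical callback returning an error rate for a given subject split):

import numpy as np

def repeated_splits(subject_ids, classify_fn, n_train=30, n_reps=20, seed=0):
    """30 training subjects, the rest for testing, averaged over 20 runs."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_reps):
        perm = rng.permutation(subject_ids)
        train, test = perm[:n_train], perm[n_train:]
        errors.append(classify_fn(train, test))   # error rate for this split
    return float(np.mean(errors)), float(np.std(errors))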
The results tabulated in Table III show that, unlike the case of human identification, the frequency-domain features have lower misclassification rates than the D-face measures (again, comparing only with D-face), with significant improvements of over 10% with IPCA (the p-values indicate statistical significance in all cases).

TABLE III. Misclassification Rates for Expression Classification. The Figures in the Parentheses Denote the Standard Deviations Over the 20 Repetitions.

SVM, however, proved not as efficient for expression classification as it was for human identification, although its results were not significantly worse than those based on D-face (but significantly worse than IPCA). The FF results are considerably poorer and, thus, have been omitted. Note that all of these results, although not very satisfactory, are at least significantly better than pure random guessing (probability of error: 66.67% for a three-class problem).
C. "Edge"-Based Features

Edges are known to contain information valuable for discriminating among individuals and, hence, it seems natural to consider identification features based on them. Mitra [11] and Liu et al. [13] used a set of spatial measures called the symmetry face (S-face), computed from the edged image of a face and its vertically reflected version; S-faces were designed to emphasize symmetry instead of asymmetry (higher values indicate more symmetry; see [13] for the exact definition). Analogously, we construct a set of frequency-domain symmetry measures using the real parts of the Fourier transforms of the one-dimensional (1-D) row slices of the edged images. Recall from Section III that the symmetric part of any sequence transforms to the real part of its Fourier transform. We call these R-faces: the higher the R-face value for a feature, the more symmetric (less asymmetric) the corresponding facial region is, and vice versa.
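A sketch of the R-face computation; the Sobel gradient magnitude stands in for the edge operator, whose exact choice the text defers to [13]:

import numpy as np
from scipy import ndimage

def r_face(img):
    """Real parts of row-wise FFTs of an edged image (edge operator assumed)."""
    edge = np.hypot(ndimage.sobel(img, axis=0), ndimage.sobel(img, axis=1))
    return np.fft.fft(edge, axis=1).real[:, : img.shape[1] // 2]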
The same experimental setups for human identification and expression classification as with the I-faces are followed here. The results in Table IV (only the best ones, with IPCA) show that the R-face yields lower error rates than the spatial S-face for human identification (we compare only with the spatial S-face results, owing to the correspondence, and omit the variants in [13] with quite poor results).

TABLE IV. Misclassification Rates for Human Identification and Expression Classification Using R-Faces. Standard Deviations for the Latter Appear in Parentheses.

These error rates are, however, higher than the I-face ones (Table I), which is consistent with what [13] observed (D-face results better than S-face results). For expression classification, on the other hand, the error rates are poorer than those obtained with the S-face but almost identical to those obtained with the I-face; note that the spatial S-face features were more efficient than the D-face features for expression classification [11]. We thus conclude that R-faces are useful for identifying people in the presence of expressions, but less so for classifying expressions when compared with the corresponding spatial measures.
D. Combining R-Faces With I-Faces

To investigate whether the I-face and R-face feature sets complement each other, we concatenate the two sets of features to yield a two-dimensional feature vector per frequency and perform both human identification and expression classification using the same setups as before. We use the Ave I-faces for human identification and the I-faces for expression classification, since they give the best results for the respective problems, with the IPCA classifier. The results from the combined feature sets in Table V indicate that performance improves for both classification tasks.

TABLE V. Error Rates for Human Identification and Expression Classification Using I-Face + R-Face Combination Features. Standard Deviations for the Expression Classifications Are Computed Over 20 Repetitions.
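The concatenation itself is straightforward (a sketch; in the experiments the Ave I-face or I-face variant replaces i_face as appropriate, and edge_img is the edged image feeding the R-face):

import numpy as np

def combined_features(img, edge_img):
    # One 2-D feature (imaginary and real responses) per frequency.
    half = img.shape[1] // 2
    i_face = np.fft.fft(img, axis=1).imag[:, :half]
    r_face = np.fft.fft(edge_img, axis=1).real[:, :half]
    return np.stack([i_face, r_face], axis=-1)    # rows x frequencies x 2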