In many applications such as speech processing, we are interested in the
frequency content of a signal locally in time. That is, the signal
parameters (frequency content, etc.) evolve over time. Such signals are
called non-stationary. For a non-stationary signal $x(t)$, the
standard Fourier Transform is not useful for analyzing the signal.
Information which is localized in time, such as spikes and high-frequency
bursts, cannot be easily detected from the Fourier Transform.
Time-localization can be achieved by first windowing the signal so as to
cut off only a well-localized slice of $x(t)$ and then taking its
Fourier Transform. This gives rise to the Short Time Fourier Transform
(STFT), or Windowed Fourier Transform. The magnitude of the STFT is called
the spectrogram. By restricting to a discrete set of frequencies
and times we can obtain an orthogonal basis of functions.
The Short Time Fourier Transform of a signal $x(t)$ using a window
function $w(t)$ is defined as follows:

$$X(\tau, f) = \int_{-\infty}^{\infty} x(t)\, w(t-\tau)\, e^{-j2\pi f t}\, dt.$$

Think of the window $w(t)$ as sliding along the signal $x(t)$: for each
shift $\tau$ we compute the usual Fourier Transform of the product
function $x(t)\,w(t-\tau)$. For example, the case where $w(t)$ is the box of width
1/2 is illustrated by the Matlab m-file fig1.m.
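The definition can be tried out numerically. The sketch below is a minimal illustration, assuming uniformly sampled data and a box window; it approximates $X(\tau, f) = \int x(t)\,w(t-\tau)\,e^{-j2\pi f t}\,dt$ by a Riemann sum over the samples inside the window. The function name `stft_sampled` and all parameter values are hypothetical, and this is not a transcription of fig1.m.

```python
import cmath
import math

def stft_sampled(x, fs, tau, T, freqs):
    """Riemann-sum approximation of X(tau, f) for a box window of width T.

    x     : samples x(m / fs), m = 0, 1, ..., len(x) - 1
    fs    : sampling rate in Hz
    tau   : window shift in seconds
    T     : box width in seconds (w(t) = 1 on [-T/2, T/2), 0 elsewhere)
    freqs : frequencies f in Hz at which to evaluate X(tau, f)
    """
    dt = 1.0 / fs
    # Keep only the samples where the shifted window w(t - tau) equals 1.
    support = [(m * dt, v) for m, v in enumerate(x) if -T / 2 <= m * dt - tau < T / 2]
    # Fourier integral of the windowed product x(t) w(t - tau), one value per f.
    return [sum(v * cmath.exp(-2j * math.pi * f * t) for t, v in support) * dt
            for f in freqs]

fs = 1000                                                        # hypothetical rate (Hz)
x = [cmath.exp(2j * math.pi * 50 * m / fs) for m in range(fs)]   # 1 s, 50 Hz complex sinusoid
# tau is offset by half a sample so no sample falls exactly on a window edge.
X = stft_sampled(x, fs, tau=0.5005, T=0.1, freqs=[50.0, 60.0])
print([round(abs(v), 6) for v in X])                             # [0.1, 0.0]
```

With a 100 ms box, the 50 Hz sinusoid gives $|X(\tau, 50)| \approx T = 0.1$, while at 60 Hz exactly one cycle of the 10 Hz difference frequency fits in the window, so the sum cancels.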
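In the frequency domain we can use the convolution theorem to recognize
$X(\tau, f)$, as a function of $f$, as the convolution of $X(f)$ with the Fourier
transform of $w(t-\tau)$ (which is $W(f)\,e^{-j2\pi f\tau}$).
That is, we have the Fourier Transform pair

$$x(t)\,w(t-\tau) \;\longleftrightarrow\; X(f) * \left[ W(f)\, e^{-j2\pi f \tau} \right],$$

so that

$$X(\tau, f) = \int_{-\infty}^{\infty} X(\nu)\, W(f-\nu)\, e^{-j2\pi (f-\nu)\tau}\, d\nu.$$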
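In the case where $w(t)$ is a box of width $T$, that is,

$$w(t) = \begin{cases} 1, & -T/2 \le t < T/2, \\ 0, & \text{otherwise}, \end{cases}$$

then

$$W(f) = \frac{\sin(\pi f T)}{\pi f}.$$

That is, the nulls of $W(f)$ are at the nonzero multiples of $1/T$.
See the figure below for an example of a box window of width $T$ and its transform.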
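As a quick check of the box transform and the location of its nulls, the sketch below compares the closed form $W(f) = \sin(\pi f T)/(\pi f)$ with a midpoint-rule evaluation of $\int_{-T/2}^{T/2} e^{-j2\pi f t}\,dt$, and evaluates $W$ at $f = k/T$. The function names and the width $T = 0.25$ s are illustrative assumptions.

```python
import cmath
import math

def box_transform(f, T):
    """Closed form W(f) = sin(pi f T) / (pi f) for the box of width T; W(0) = T."""
    return T if f == 0 else math.sin(math.pi * f * T) / (math.pi * f)

def box_transform_numeric(f, T, n=10_000):
    """Midpoint-rule value of the integral of exp(-j 2 pi f t) over [-T/2, T/2)."""
    h = T / n
    return sum(cmath.exp(-2j * math.pi * f * (-T / 2 + (i + 0.5) * h))
               for i in range(n)) * h

T = 0.25                                                                    # hypothetical width (s)
print(abs(box_transform_numeric(1.3, T) - box_transform(1.3, T)) < 1e-6)    # True
print([abs(box_transform(k / T, T)) < 1e-12 for k in (1, 2, 3)])            # [True, True, True]
```

The second line confirms the nulls at 4, 8 and 12 Hz, i.e. at multiples of $1/T$.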
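In the case where the signal is a pure sinusoid of frequency $f_0$,
$x(t) = e^{j2\pi f_0 t}$, we have $X(f) = \delta(f - f_0)$, so the
windowed transform is the sinc function shifted by $f_0$:

$$|X(\tau, f)| = |W(f - f_0)|.$$

(A real sinusoid $\cos(2\pi f_0 t)$ gives two such copies, each scaled by 1/2, at $f_0$ and $-f_0$.)
In the figure below the box has width $T$
and the sinusoid has frequency $f_0$ Hz.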
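For a complex sinusoid $x(t) = e^{j2\pi f_0 t}$ and the box window of width $T$, the integral can be done in closed form: $X(\tau, f) = W(f - f_0)\,e^{-j2\pi (f - f_0)\tau}$ with $W(f) = \sin(\pi f T)/(\pi f)$. The sketch below, a minimal illustration with hypothetical values $f_0 = 50$ Hz, $T = 0.1$ s and $\tau = 0.37$ s, checks this against a direct midpoint-rule evaluation of the STFT integral. The helper names are illustrative.

```python
import cmath
import math

def box_transform(f, T):
    """W(f) = sin(pi f T) / (pi f) for the box of width T; W(0) = T."""
    return T if f == 0 else math.sin(math.pi * f * T) / (math.pi * f)

def stft_point(x, tau, f, T, n=4000):
    """Midpoint-rule value of X(tau, f) for a box window of width T.

    x is a callable signal; only the window support [tau - T/2, tau + T/2)
    contributes, so only that interval is integrated.
    """
    h = T / n
    total = 0j
    for i in range(n):
        t = tau - T / 2 + (i + 0.5) * h
        total += x(t) * cmath.exp(-2j * math.pi * f * t) * h
    return total

f0, T, tau = 50.0, 0.1, 0.37          # hypothetical frequency, box width, shift

def x(t):
    return cmath.exp(2j * math.pi * f0 * t)

for f in (45.0, 50.0, 55.0, 60.0, 72.5):
    predicted = box_transform(f - f0, T) * cmath.exp(-2j * math.pi * (f - f0) * tau)
    # The difference is only the midpoint-rule discretization error.
    print(f, abs(stft_point(x, tau, f, T) - predicted))
```

At $f = 60$ Hz $= f_0 + 1/T$ both values vanish, which is the first null of the shifted sinc.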
In the case where the signal consists of two sinusoids of frequencies
$f_1$ and $f_2$, the windowed transform will be the
superposition of two shifted sinc functions. The individual frequencies cannot
be resolved unless $|f_1 - f_2| > 1/T$. In fact, for adequate separation
we should have $|f_1 - f_2| \ge 2/T$, so that the two main lobes (each of width $2/T$) do not overlap.
That is, the ``frequency resolution'' of this analysis is on the order of $1/T$.
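The resolution statements can be checked with the closed form. Assuming complex sinusoids at $f_1$ and $f_2$, a box of width $T$ and $\tau = 0$, the windowed transform is $W(f - f_1) + W(f - f_2)$; the sketch below counts the local maxima of its magnitude over a grid that covers both main lobes. The helper names and the values ($T = 0.1$ s, so $1/T = 10$ Hz) are hypothetical. In this setting a separation of $1/T$ still gives a single peak, because the two in-phase sinc functions add up between the frequencies, whereas $2/T$ gives two peaks.

```python
import math

def box_transform(f, T):
    """W(f) = sin(pi f T) / (pi f) for the box of width T; W(0) = T."""
    return T if f == 0 else math.sin(math.pi * f * T) / (math.pi * f)

def count_peaks(f1, f2, T, n=2001):
    """Local maxima of |W(f - f1) + W(f - f2)| on [f1 - 0.5/T, f2 + 0.5/T]."""
    lo, hi = f1 - 0.5 / T, f2 + 0.5 / T
    freqs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    mag = [abs(box_transform(f - f1, T) + box_transform(f - f2, T)) for f in freqs]
    # ">=" on the right so a two-point plateau at the top is counted once.
    return sum(1 for i in range(1, n - 1) if mag[i - 1] < mag[i] >= mag[i + 1])

T = 0.1                                  # hypothetical window width (s)
for df in (5.0, 10.0, 20.0):             # separations of 0.5/T, 1/T and 2/T
    print(df, count_peaks(100.0, 100.0 + df, T))
```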
In the following figure the signal is the sum of two sinusoids with
frequencies $f_1$ and $f_2$ (in Hz), and the window
size is $T$. We get two distinct peaks in the magnitude of the windowed
transform (see fig2.m).
In the case where the signal consists of two spikes close together
in time, we can resolve the spikes if the window size $T$ is
smaller than the time difference between the spikes.
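For two impulses $x(t) = \delta(t - t_1) + \delta(t - t_2)$ and a box window, the sifting property gives $X(\tau, 0) = w(t_1 - \tau) + w(t_2 - \tau)$, which simply counts the impulses inside the window. The sketch below slides the box across hypothetical impulses at 500 ms and 520 ms and counts the separate bumps in that profile; times are integers in milliseconds so the box edges are evaluated exactly. The function names are illustrative.

```python
def box(t, T):
    """Box window of width T: 1 on [-T/2, T/2), 0 elsewhere."""
    return 1 if -T / 2 <= t < T / 2 else 0

def spike_profile(t1, t2, T, taus):
    """X(tau, 0) for x(t) = delta(t - t1) + delta(t - t2), one value per shift tau."""
    return [box(t1 - tau, T) + box(t2 - tau, T) for tau in taus]

def count_bumps(profile):
    """Number of separate runs of nonzero values in the profile."""
    bumps, inside = 0, False
    for v in profile:
        if v > 0 and not inside:
            bumps += 1
        inside = v > 0
    return bumps

taus = range(400, 621)                 # window shifts in ms
for T in (10, 20, 40):                 # hypothetical window widths in ms
    print(T, count_bumps(spike_profile(500, 520, T, taus)))
```

Only the 10 ms window, which is shorter than the 20 ms spacing, leaves a gap between the two bumps; at 20 ms and 40 ms they merge into one.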
This analysis shows the ``trade-off'' between time resolution and
frequency resolution: if we use a window of length $T$, then we
have a ``time-resolution'' of $T$ but our frequency resolution is $1/T$.
The magnitude of the Short Time Fourier Transform is called the
spectrogram. We can make two-dimensional plots of the spectrogram with time
on the horizontal axis, frequency on the vertical axis and amplitude given
by a gray-scale colour. Alternatively, we can make three-dimensional plots where
we plot amplitude on the third axis. The Matlab command specgram
can be used to generate these plots (newer releases of the Signal Processing
Toolbox provide spectrogram instead).
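A spectrogram is $|X(\tau, f)|$ sampled on a grid of shifts and frequencies. The sketch below, assuming non-overlapping box windows of `n_win` samples and a plain DFT per frame (no toolbox routine), builds that grid for a hypothetical test signal: two sinusoids at 100 Hz and 240 Hz plus impulses at 310 ms and 730 ms, sampled at 1 kHz. Row `m` of the result is one time slice and column `k` is frequency `k * fs / n_win`; displaying the rows as an image gives the two-dimensional plot described above. The function name `spectrogram` here is illustrative and is not the Matlab command.

```python
import cmath
import math

def spectrogram(x, n_win):
    """|X(tau, f)| on a grid: box windows of n_win samples, hop of n_win samples.

    Row m covers samples m*n_win .. (m+1)*n_win - 1 (tau at its midpoint);
    column k is frequency k * fs / n_win. Phase is measured from the start of
    each frame, which multiplies X(tau, f) by a unit-modulus factor only, so
    the magnitudes are unchanged.
    """
    rows = []
    for start in range(0, len(x) - n_win + 1, n_win):
        seg = x[start:start + n_win]
        rows.append([abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_win)
                             for n, v in enumerate(seg)))
                     for k in range(n_win)])
    return rows

fs = 1000                                          # hypothetical sampling rate (Hz)
x = [math.cos(2 * math.pi * 100 * n / fs) + math.cos(2 * math.pi * 240 * n / fs)
     for n in range(fs)]                           # 1 s of two sinusoids
x[310] += 50.0                                     # hypothetical impulses at 310 ms and 730 ms
x[730] += 50.0
S = spectrogram(x, 50)                             # 50 ms window: 20 rows, 20 Hz bins

# Strongest non-negative-frequency bins in a frame without an impulse.
print(sorted(sorted(range(26), key=lambda k: S[0][k])[-2:]))   # [5, 12], i.e. 100 Hz and 240 Hz
```

The frames containing the impulses (rows 6 and 14) have large magnitude in every bin, which appears as a bright vertical stripe in the image.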
In the following example (see fig3.m),
a signal $x(t)$ is the sum of two sinusoids of
frequencies $f_1$ and $f_2$ and two impulses at times $t_1$ and $t_2$
(in ms). We use a window of width $T$, so the resolution in frequency is $1/T$
and the time resolution is $T$. As the plots show, we can resolve both the sinusoids and
the impulses.
Now suppose that we move the two frequencies closer together. Let's use a
signal $x(t)$ which is the sum of two sinusoids of closer frequencies $f_1$ and $f_2$
and two impulses at times $t_1$ and $t_2$, with a window of width $T$
(see fig4.m).
As the spectrograms now show, we cannot resolve the frequencies, but we can still resolve the spikes.
Now suppose that we increase the window size. As the
spectrograms below show, we can now resolve the frequencies but not the spikes
(see fig4cd.m).
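The trade-off can be made quantitative under the rules of thumb used in this section: a box of width $T$ separates two impulses $\Delta t$ apart when $T < \Delta t$, and separates two sinusoids $\Delta f$ apart adequately when $\Delta f \ge 2/T$, i.e. $T \ge 2/\Delta f$. Both hold for some $T$ only when $2/\Delta f < \Delta t$, that is $\Delta f\,\Delta t > 2$. The helper below is a hypothetical sketch of that check, and the example numbers are made up.

```python
def window_range(df, dt):
    """Range [t_min, t_max) of box widths T with T >= 2/df and T < dt.

    df: frequency separation of the sinusoids (Hz)
    dt: time separation of the impulses (s)
    Returns None when no single box width satisfies both rules of thumb.
    """
    t_min, t_max = 2.0 / df, dt
    return (t_min, t_max) if t_min < t_max else None

print(window_range(100.0, 0.05))   # (0.02, 0.05): widths from 20 ms up to (not including) 50 ms
print(window_range(10.0, 0.05))    # None: would need T >= 200 ms and T < 50 ms
```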
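We can obtain an orthogonal basis of functions related to the Short Time
Fourier Transform when the window function $w(t)$ is the box of width $T$,
as follows. Instead of computing $X(\tau, f)$ for all frequencies $f$ and
all time shifts $\tau$, we restrict the calculation to $f = k/T$ and
$\tau = nT$, where $k$ and $n$ are integers. To see that this corresponds to orthogonal functions,
define

$$\phi_{n,k}(t) = w(t - nT)\, e^{j2\pi k t/T}.$$

Then, since $w$ is real, we have

$$X(nT, k/T) = \int_{-\infty}^{\infty} x(t)\, \overline{\phi_{n,k}(t)}\, dt = \langle x, \phi_{n,k} \rangle.$$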
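Since $\phi_{n,k}(t)$ is non-zero only for $nT - T/2 \le t < nT + T/2$,
functions with different $n$ have disjoint supports. For the same $n$, the integral
of $e^{j2\pi (k-l) t/T}$ over an interval of length $T$ is $T$ when $k = l$ and 0 otherwise.
It follows that these are orthogonal functions:

$$\langle \phi_{n,k}, \phi_{m,l} \rangle = T\,\delta_{nm}\,\delta_{kl},$$

so dividing by $\sqrt{T}$ makes them orthonormal.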
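The orthogonality can also be checked numerically. The sketch below assumes $\phi_{n,k}(t) = w(t - nT)\,e^{j2\pi k t/T}$ with $w$ the box on $[-T/2, T/2)$ and approximates $\langle \phi_{n,k}, \phi_{m,l}\rangle = \int \phi_{n,k}(t)\,\overline{\phi_{m,l}(t)}\,dt$ with a midpoint rule; the function names and $T = 0.5$ are illustrative. With equally spaced points the discrete sums inherit the same orthogonality (as long as $|k - l|$ is smaller than the number of points per block), so the results are $T$ for identical indices and zero otherwise, up to rounding.

```python
import cmath
import math

def phi(n, k, t, T):
    """phi_{n,k}(t) = w(t - nT) exp(j 2 pi k t / T), with w the box on [-T/2, T/2)."""
    if -T / 2 <= t - n * T < T / 2:
        return cmath.exp(2j * math.pi * k * t / T)
    return 0j

def inner(n1, k1, n2, k2, T, blocks=range(-2, 3), pts=64):
    """Midpoint-rule approximation of <phi_{n1,k1}, phi_{n2,k2}> over the listed blocks."""
    h = T / pts
    total = 0j
    for b in blocks:
        for i in range(pts):
            t = b * T - T / 2 + (i + 0.5) * h        # midpoints never hit a block edge
            total += phi(n1, k1, t, T) * phi(n2, k2, t, T).conjugate() * h
    return total

T = 0.5
print(abs(inner(0, 3, 0, 3, T)))   # same indices: T
print(abs(inner(0, 3, 0, 5, T)))   # same block, different frequency: ~0
print(abs(inner(0, 3, 1, 3, T)))   # different blocks: 0 (disjoint supports)
```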
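Because we have analysis and synthesis on each interval from $nT - T/2$
to $nT + T/2$ (an ordinary Fourier series on an interval of length $T$, with
coefficients $X(nT, k/T)/T$), it
follows that we have analysis and synthesis in general. That is:

$$x(t) = \frac{1}{T} \sum_{n} \sum_{k} X(nT, k/T)\, \phi_{n,k}(t).$$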
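A discrete analogue of this analysis/synthesis pair is sketched below, under the assumption that $x$ is sampled at rate `fs` and split into blocks of `N` samples, so $T = N/f_s$. For simplicity the blocks start at $nT$ instead of being placed symmetrically around it; the shift does not affect the argument. Analysis approximates $X(nT, k/T)$ by a Riemann sum, and synthesis applies $x(t) = \frac{1}{T}\sum_n\sum_k X(nT, k/T)\,\phi_{n,k}(t)$ using only the $N$ frequencies $k = 0, \dots, N-1$ that a block of $N$ samples can carry. The function names are hypothetical.

```python
import cmath
import math

def analyze(x, N, fs):
    """Riemann-sum approximation of X(nT, k/T), k = 0..N-1, on each block of N samples."""
    dt = 1.0 / fs
    coeffs = []
    for n in range(len(x) // N):                 # trailing samples (if any) are ignored
        block = range(n * N, (n + 1) * N)
        coeffs.append([sum(x[m] * cmath.exp(-2j * math.pi * k * m / N) for m in block) * dt
                       for k in range(N)])
    return coeffs

def synthesize(coeffs, N, fs):
    """Rebuild the samples via x(t) = (1/T) sum_n sum_k X(nT, k/T) phi_{n,k}(t)."""
    T = N / fs
    out = []
    for n, row in enumerate(coeffs):
        for m in range(n * N, (n + 1) * N):
            out.append(sum(c * cmath.exp(2j * math.pi * k * m / N)
                           for k, c in enumerate(row)) / T)
    return out

fs, N = 100, 16                                  # hypothetical: T = 0.16 s
x = [math.sin(0.3 * m) + (m % 5) for m in range(64)]
y = synthesize(analyze(x, N, fs), N, fs)
print(max(abs(a - b) for a, b in zip(x, y)) < 1e-9)   # True: perfect reconstruction
```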
In summary, if we restrict the STFT calculation to a discrete set
of frequencies and times, we can regard the STFT values as the
coordinates of our signal $x(t)$
with respect to an orthogonal basis.
Hence we can recover our signal $x(t)$
from these STFT values.