Detection Basics

# Introduction

In a general communication system, the modulator at the transmitter maps the information sequence into a signal waveform.These waveforms are transmitted over the channel and corrupted version of them is received at the receiver. This leads to errors in our system. An optimal design of the receiver might reduce the errors and increase the reliability of our communication system. In this chapter we study the design and performance of optimal receivers.

In this section we analyse the details of optimal detection. We consider a vector channel and quantify probability of error precisely. In many cases evaluating the exact probability of error might involve complex computation. For such situations we resort to bounds on probability of error.It must be noted, initially we develop an general rule for optimal detection and later confine to a specific case of AWGN channel i.e the channel law is governed by Gaussian distribution.

Problem setup:
Consider a signal set s1 , s2 , ……….. , sm(vector channel). We transmit one of the M possible vectors s1 , s2 , ……….. , sm and observe y at the receiver. It is also assumed that the behavior of the channel is known before hand i.e channel transition probabilities(channel law), (P(r| sm) for all m, r). Now the problem is to choose one of the M signals after observing y in-order to improve the reliability of the system.

# M-ary hypothesis testing

Consider a signal set with M signals. Mathematically, the detector can be perceived as a function g(r) that maps r to one of the M possible values, where r is the received vector . In the below figure we observe that the space of r (vector space of all n-tuples, n is the order of the basis corresponding to the signal set) is divided into a set of M regions. Each region corresponds to signal sm and is denoted as Decision region Dm of sm.

Question: What is the optimal (with respect to prob of error) function g(r)? Equivalently,how to partition the space of r into decision regions?

## probability of error:

Suppose sm is transmitted through the channel. Error occurs in a detection scheme when the received vector r is not in the decision region Dm. Therefore the Symbol error probability of a receiver with Dm , m={1,2,3….M} is therefore given by

(1)
\begin{eqnarray} P_{\mathbf{e}|m} &=& P (\textbf{r} \not\in D_m | \mathbf{s_m} \mbox{ sent })\\ &=& \int_{\textbf{r} \not\in D_m} P(\mathbf{r}| \mathbf{s_m}) \,d \mathbf{r}\\ &=& \sum_{m \not= m'} \int_{\textbf{r} \not\in D_m'} P(\mathbf{r}| \mathbf{s_m'}) \,d \mathbf{r} \end{eqnarray}
(2)
\begin{align} P_\mathb{e} = \sum_{m=1}^{M} P_m P_{\textbf{e}|m} \end{align}

Pe denotes the probability of symbol error. Then Pe is bounded by the following inequality,

(3)
\begin{align} P_{b} \leq P_{e} \leq k P_{b} \end{align}

# Intuitive Rules

## MAP( Maximum aposteriori)

Choose the signal sm that maximizes P(sm sent | r received),

(4)
\begin{align} P(\mathbf{s_m} | \mathbf{r}) = \frac {P(\mathbf{r}|\mathbf{s_m}) P(\mathbf{s_m})} {P(\mathbf{r})} \end{align}

P(r) is independent of hypothesis.So, MAP rule can be stated as follows:

(5)
\begin{eqnarray} g(\mathbf{r}) &=& \mbox{ arg } \max_m P(\mathbf{s_m} | \mathbf{r})\\ &=& \mbox{ arg } \max_m P(\mathbf{r} | \mathbf{s_m}) P(\mathbf{s_m}) \end{eqnarray}

P(sm) is the apriori probability.

## Maximum Likelihood Rule

ML rule: choose the signal sm that maximizes P(r received | sm sent)

(6)
\begin{align} g(\mathbf{r})= \mbox{ arg } \max_m P(\mathbf{r} | \mathbf{s_m}) \end{align}
 MAP Rule $g(\mathbf{r})= \mbox{ arg } \max_m P(\mathbf{r} | \mathbf{s_m}) P(\mathbf{s_m})$ ML Rule $g(\mathbf{r})= \mbox{ arg } \max_m P(\mathbf{r} | \mathbf{s_m})$

# Optimal Detection in terms of probability of error

Other intuitive rules are tempting such that use a min distance rule or maximum correlation rule. While these rules are optional under some conditions, they are not optimal in general.

MAP rule is optimal in terms of prob of error

The MAP detector minimizes the probability of error under a general setting . We will even allow for the detector function to be probabilistic, i.e for a given r we say g(r) = sm with probability h(sm , r).A deterministic function is simply one for which h(sm , r) = 1 for one m.
let probability of error given sm be p(e|sm) and let P(e) be the average probability of error.Also let P(c|sm) be the probability that the decision g(r) is correct.

(7)
\begin{eqnarray} P(\mathbf{e}) &=& \sum_{m} P(\mathbf{e} | \mathbf{s_m}) P(\mathbf{s_m})\\ &=& \sum_{m} (1 - P(\mathbf{c} | \mathbf{s_m})) P(\mathbf{s_m})\\ &=& 1- \sum_{m} P(\mathbf{e} | \mathbf{s_m}) P(\mathbf{s_m}) \end{eqnarray}
(8)
\begin{align} 1-P(\mathbf{e}) &=& \sum_{m} P(\mathbf{e} | \mathbf{s_m}) P(\mathbf{s_m})\\ &=& \sum_{m} \int_\mathbf{r} P(\mathbf{r} | \mathbf{s_m}) P(\mathbf{s_m}) h(\mathbf{s_m,r}) \,d\mathbf{r} \\ &=& \int_\mathbf{r} \sum_{m} P(\mathbf{r} | \mathbf{s_m}) P(\mathbf{s_m}) h(\mathbf{s_m,r}) \,d\mathbf{r} \end{align}

since $0 \leq h(\mathbf{r,s_m}) \leq 1$ and $\sum_{m} h(\mathbf{r,s_m}) = 1$

(9)
\begin{align} \sum_{m} P(\mathbf{r} | \mathbf{s_m}) P(\mathbf{s_m}) h(\mathbf{s_m,r}) \leq \max P(\mathbf{r} | \mathbf{s_m}) P(\mathbf{s_m}) \end{align}

This can be achieved by choosing h(sm , r) =1 for the m that maximizes $P(\mathbf{r} | \mathbf{s_m}) P(\mathbf{s_m})$ which is the MAP rule.Now we can define optimal decision regions as,

(10)
\begin{align} D_m= \{ \mathbf{r} : P(\mathbf{s_m} | \mathbf{r}) \geq \max P(\mathbf{s_i} | \mathbf{r}) \hspace{2mm} \forall i \not= m\} \end{align}

let probability of correct decision be P(c)

(11)
\begin{align} P(\mathbf{c})= \sum_{m=1}^M \int_\mathbf{D_m} P(\mathbf{s_m} | \mathbf{r}) P(\mathbf{r}) \,d\mathbf{r} \end{align}

Example:
Consider a vector channel with two message signals and noise distribution given by,

(12)
\begin{align} \mathbf{s_1}=(0,0) \hspace{3mm} \mathbf{s_2}=(1,1) \end{align}
(13)
\begin{align} P_{\underline{n}}(n_1,n_2) = e^{-n_1} e^{-n_2} \hspace{4mm} n_1 \geq 0 , n_2 \geq 0 \end{align}

Let the received vector be $\mathbf{r}= \mathbf{s_m} + \mathbf{n}$.

The channel transition probabilities can be expressed as,

(14)
\begin{eqnarray} P(\mathbf{r}|\mathbf{s_1})&=& e^{-r_1} e^{-r_2}\\ P(\mathbf{r}|\mathbf{s_2})&=& e^{-r_1+1} e^{-r_2+1} \hspace{3mm} r_1>1 , r_2>1 \end{eqnarray}

According to the above described rules we conclude that, if r1>1 and r2>1 then P(r|s2) > P(r|s1).

## Sufficient Statistic

General Criteria:

If there exists a black box such that the same aposteriori probability are produced when the input is T(r) instead of r then T(r) is a sufficient statistic.
Mathematically, if

(15)
\begin{eqnarray} P(\mathbf{r}|\mathbf{s_m}) &=& f(T(\mathbf{r}),\mathbf{s_m} h(\mathbf{r})\\ &=& f_m(T(\mathbf{r}))h(\mathbf{r}) \end{eqnarray}

then only T(r) is a sufficient statistic.

## Markov chain

$X\longleftrightarrow Y\longleftrightarrow Z$ for a markov chain if ,

(16)
$$P(x,y,z)=P(x)P(y|x)P(z|y), P(z|xy)=P(z|y), P(x,z|y)=P(x|y)P(z|y)$$

Markov condition: $\mathbf{s_m} \longleftrightarrow T(\mathbf{r})\longleftrightarrow\mathbf{r}$

(17)
\begin{eqnarray} P(\mathbf{s_m} | T(\mathbf{r}),\mathbf{r})&=&P(\mathbf{s_m} | T(\mathbf{r}))\\ P(T(\mathbf{r}),\mathbf{r} | \mathbf{s_m})&=& P(T(\mathbf{r})| \mathbf{s_m}) P(\mathbf{r} | T(\mathbf{r}),\mathbf{s_m})\\ &=& P(T(\mathbf{r})| \mathbf{s_m})P(\mathbf{r} | T(\mathbf{r})) \end{eqnarray}

only T(r) matters

## Reversibility

Any reversible transformation does not affect the performance the performance of the optimal detector
Example:
$\mathbf{r}=\mathbf{s_m} + \mathbf{n} , \mathbf{n} \sim N(0,\mathbf{C_n}), \mathbf{C_n}\not=\sigma^2I$
Suppose there exists a W such that V=Wn is white then, $W\mathbf{r}= W\mathbf{s_m} + W\mathbf{n}$. This can be simplified as $\mathbf{r'}=\mathbf{s'_m} + \mathbf{n'}$
Key: We must perform optimal detection.The performance of the suboptimal detectors can depend on the preprocessing.

## Irrelavance

Suppose we can break up r=(r1 , r2). r2 is irrelevant if

(18)
\begin{eqnarray} P(\mathbf{s_m} | \mathbf{r_1},\mathbf{r_2})&=& P(\mathbf{s_m} | \mathbf{r_1})\\ P(\mathbf{r_1},\mathbf{r_2}| \mathbf{s_m}) P(\mathbf{s_m}) &=&P(\mathbf{r_1} | \mathbf{s_m}) P(\mathbf{r_2} | \mathbf{r_1},\mathbf{s_m}) P(\mathbf{s_m}) \end{eqnarray}

If $P(\mathbf{r_2} | \mathbf{r_1},\mathbf{s_m}) =P(\mathbf{r_2} | \mathbf{r_1})$ then r2 is irrelavent in the computation of $P(\mathbf{s_m} | \mathbf{r})$

Example 1:

(19)
\begin{align} P(\mathbf{\underline{r}_2,\underline{r}_1|\underline{s}_m})= P(\mathbf{\underline{r_1}|s_m})P(\mathbf{\underline{r}_2|\underline{r}_1}) \end{align}

So r2 is irrelevant. Notice however that if r1 is not known we cannot ignore r2 !

Example2 :

(20)
\begin{eqnarray} P(\mathbf{r_1},\mathbf{r_2}| \mathbf{s_m}) &=& P(\mathbf{r_1} | \mathbf{s_m}) P(\mathbf{r_2} | \mathbf{s_m},\mathbf{r_1})\\ &=& P(\mathbf{r_1}-\mathbf{s_m}) P(\mathbf{r_2}-(\mathbf{r_1}-\mathbf{s_m}))\\ \end{eqnarray}

Therefore we cannot ignore $\mathbf{r_2}$.

# Maximum Likelihood Decision Region for the vector AWGN channel

When P(Sm) is the same for all m and the noise is Gaussian,then Rm has a simpler description

(21)
\begin{align} \mathbf{R_m} = \{ \mathbf{r} \colon \frac{1}{\pi ^n det(C)} e^{\frac{-1}{2 \sigma^2}(\mathbf{r}-\mathbf{s_m})^H C^{-1} (\mathbf{r}-\mathbf{s_m})} \} \end{align}
(22)
\begin{align} \mathbf{R_m} = \{ \mathbf{r} \colon \frac{1}{(\sqrt{2\pi \sigma^2})^N} e^{\frac{-1}{2 \sigma^2} \|\mathbf{r}-\mathbf{s_m}\|^2 } \} \end{align}

We can maximize $\ln \left( \frac{1}{(\sqrt{2\pi \sigma^2})^N} e^{\frac{-1}{2 \sigma^2} \|\mathbf{r}-\mathbf{s_m}\|^2 \right)$ instead and throw away the $\ln \frac{1}{(\sqrt{2\pi \sigma^2})^N}$ term.Finally giving,

(23)
\begin{align} \mathbf{R_m} = \{\mathbf{r} \colon \| \mathbf{r} - \mathbf{s_m}\|^2 \leq \| \mathbf{r} - \mathbf{s_i}\|^2 \forall i \not= m \end{align}

i.e. Rm is the region of r that is closer to Sm than to any other Si . Notice that this may not be true for other noise distribution or channels.

Can be simplified into

(24)
\begin{align} \|\mathbf{r}\|^2 - 2 \Re <\mathbf{r},\mathbf{s_m}> + \|\mathbf{s_m}\|^2 < \|\mathbf{r}\|^2 -2\Re <\mathbf{r},\mathbf{s_i}>+\|\mathbf{s_i}\|^2 \end{align}
(25)
\begin{align} \Re <\mathbf{r},\mathbf{s_m}> - \frac{1}{2}\|\mathbf{s_m}\|^2 < \Re <\mathbf{r},\mathbf{s_i}> - \frac{1}{2}\|\mathbf{s_i}\|^2 \end{align}

If $\|\mathbf{s_i}\|$ is same for all $i$ then,

(26)
\begin{align} \Re <\mathbf{r},\mathbf{s_m}> < \Re <\mathbf{r} , \mathbf{s_i}> \end{align}

# Optimum detection for the waveform AWGN channel

A new symbol sm(t) is transmitted every Ts seconds. However, let us not worry about the sequence of symbols transmitted and let us focus on one of these sm(t) 's. { sm(t) , m=1,…..,M} transmitted with prob {p1…….pm}. let r(t)= sm(t) + n(t) , where n(t) is a zero mean Gaussian RP Snn(f)=No/2

(27)
$$r(t)= s_{m}(t) + n_{1}(t) + n_{2}(t)$$

n_{1}(t)- component of noise that lies in the subspace where the signals lie.

(28)
\begin{align} s_{m}(t) = \sum_{k=1}^N s_{mk} f_{k}(t) \longleftrightarrow \mathfb{s_m}= [s_{m1} ........... s_{mN}] \end{align}
(29)
\begin{align} n(t)= \sum_{k=1}^N n_k \phi_k(t) + \sum_{k=1}^\infty {n_k'} \psi_{k}(t) \end{align}
(30)
\begin{align} r(t)= \sum_{k=1}^N r_1\phi_k(t) + \sum_{k=1}^\infty {r_k'} \psi_k(t) \end{align}

$\{ \phi_1(t),\phi_2(t),\phi_3(t).........\phi_N(t),\psi_1(t),\psi_2(t),........\}$ is a complete basis for r(t). There are some mathematical intricacies involved here. These can be shown to be not a problem in engineering. One reason for this is that we are interested only in r(t) observed over a finite time interval r(t) $0\leqt\leqT$, However large T is if r(t) restricted to $o\leqt\leqT$lies in L2 , then all these problems can be properly addressed. See Galleger's book for a detailed description.
Prelude to theorem of irrelavance: we will discuss this theorem in detail later on. The gist is that n'(t) and therefore projection of r(t) onto any basis outside of
$\phi_1(t),\phi_2(t),\phi_3(t).........\phi_N(t)$ is irrelevent to our decision problem.
We will look at two examples of Bandlimited sm(t) and Time limited sm(t). however this theorem is applicable to many more situations.

Structure of demodulator:

We want to compute P(r|sm). To do this, we need to understand the statistics of nks.

(31)
\begin{align} n_k= \int n(t) \phi_k(t) dt \end{align}
(32)
\begin{align} E[n_k]=\int E[n(t)]\phi_k(t) dt = 0 \end{align}
(33)
\begin{eqnarray} E[n_k n_l]&=& E \left[ \int n(t)\phi_k(t)dt \int n(u) \phi_l(u)du \right] \\ &=& E \left[ \int \int n(t)n(u) \phi_k (t) \phi_l(u)dt du \right] \\ &=&\int\intE[n(t)n(u)] \phi_k(t) \phi_l(u) dt du\\ &=& \frac{N_0}{2}\delta(k-l) \end{eqnarray}

nk 's are Gaussian and uncorrelated and hence independent . Therefore they are iid random variavles

(34)
\begin{eqnarray} E[n_2(t) n_k]&=& E[ (n(t)- n_1(t)) n_k]\\ &=& E \left[ n(t) \int n(s) \phi_k (s) ds \right]- E[n_1(t)n_k] \\ &=& \int \frac{N_o}{2} \delta(s-t) \phi_k(s) ds] - E \left[ \left( \sum n_i \phi_i(t) \right)n_k \right] \\ &=& \frac{N_o}{2}\phi_k(t) - \sum E[n_i n_k]\phi_i(t) = 0 \end{eqnarray}

## Correlator Demodulator

We can also decompose a signal according to

(35)
\begin{align} \Delta(s_m) = \Re <\mathbf{r},\mathbf{s_m}> - \frac{1}{2} \|\mathbf{s_m}\|^2 \end{align}

By noting that $\Re <\mathbf{r},\mathbf{s_m}> = \Re \int r(t) s_m^*(t) dt$

## Matched Filters

Correlation operations can be performed through filtering.For example, to compute

(36)
\begin{align} Z_m = \int_0^T r(t) s_m(t) dt \end{align}
(37)
\begin{align} Z(t_o) =\int_0^T r(u) h(t_o - u) du \Rightarrow h(t_o-u)=s_m(u) \Rightarrow h(t) = s_m(t_o-t) \end{align}

To compute $Z_m = \int_0^T r(t) s_m^*(t_o-t) dt$ choose$h(t) = s_m^*(t_o - t)$.The sampling time is chosen such that h(t) is causal.

### Properties of Matched Filter

Suppose we have $r(t) = s(t) + n(t) 0\leq t \leq T$

(38)
\begin{align} SNR_o =\frac{|y_s(T)|^2}{E[y_n^2(T)]} \end{align}

What is the filter h(t) that maxmizes SNRo ?

(39)
\begin{align} y_s(T) = \int_0^T s(\tau)h(T-\tau) d\tau \end{align}
(40)
\begin{align} |y_s(T)|^2 = \left|\int_0^Ts(\tau) h(T-\tau) d\tau \right| ^2 \end{align}
(41)
\begin{align} y_n(T) = \int n(\tau) h(t-\tau) d\tau \end{align}
(42)
\begin{align} E[y_n(T) y_n^*(T)] = \frac{N_o}{2} \int_0^T h^*(T-\tau)h(T-\tau) d\tau = \frac{N_o}{2} \int_0^T|h(T-\tau)|^2d\tau \end{align}

Let

(43)
\begin{align} g(t) = h^*(t) SNR_o = \frac{|\int_0^T s(\tau) g*(T-\tau)d\tau|^2}{\frac{N_o}{2}\int_0^T |g(T-\tau)|^2 d\tau} \end{align}
(44)
\begin{align} Numerator \leq \int_0^T|s(\tau)|^2d\tau \int_0^T|g(t-\tau)|^2d\tau \end{align}

Cauchy-Schwartz with equality if $g(t-\tau)= c s(\tau)$.It is easy to see that c=1

(45)
$$h(t)=s^*(T-t)$$
(46)
\begin{align} \max SNR_o = \frac{2E}{N_o} \end{align}

### Frequency Domain Interpretation

(47)
\begin{align} s(t)\longleftrightarrow s(f) \end{align}
(48)
\begin{align} h(t) = s^*(T-t)\longleftrightarrow s(f) e^{-j2 \pi f T} \end{align}

If$s(f) = |s(f)| e^{j\phi(f)}$ then $H(f) = |s(f)| e^{-j(2\pi f T + \phi(f)}$

Remarks: Gain of filter is proportional to frequency content of the signal.Automatically rejects out of band noise.Consider what happens to the individual frequency components of s(t), i.e $s(t) = \int s(f) e^{j2 \pi f t} df$,

(49)
\begin{align} s(f) = |s(f)|e^{j \phi(f)} e^{j 2 \pi f t} \end{align}

at sampling instant T=t we get same phase for all frequency components and "cohere" them.

page revision: 100, last edited: 07 Jun 2010 18:00