Elements of Information Theory
The historical notes that follow each chapter recap the main points. Now current and enhanced, the Second Edition of Elements of Information Theory remains the ideal textbook for upper-level undergraduate and graduate courses in electrical engineering, statistics, and telecommunications. A recipient of the IEEE Claude E. Shannon Award, Dr. Cover authored numerous technical papers and is coeditor of Open Problems in Communication and Computation. After receiving his PhD at Stanford, Dr. Joy A. Thomas spent more than nine years at the IBM T. J. Watson Research Center.
Its extensions can be found in [59]. This is an easy exercise on trees: a path from the root to a node defines a codeword.
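To make the tree–code correspondence concrete, here is a minimal sketch (my own construction, not from the text): labelling left edges 0 and right edges 1, every root-to-leaf path spells out a codeword, and the codeword lengths of the resulting prefix code satisfy Kraft's inequality. The example tree is hypothetical.

```python
def leaf_codewords(tree, prefix=""):
    """tree is either a symbol (a leaf) or a pair (left, right); the path to a
    leaf, with '0' for left edges and '1' for right edges, is its codeword."""
    if not isinstance(tree, tuple):              # leaf: emit the accumulated path
        return {tree: prefix}
    left, right = tree
    codes = leaf_codewords(left, prefix + "0")
    codes.update(leaf_codewords(right, prefix + "1"))
    return codes

# Hypothetical tree: 'a' at depth 1, 'b' at depth 2, 'c' and 'd' at depth 3.
tree = ("a", ("b", ("c", "d")))
codes = leaf_codewords(tree)
print(codes)                                     # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
kraft_sum = sum(2 ** -len(w) for w in codes.values())
print("Kraft sum:", kraft_sum)                   # 1.0 <= 1, as any prefix code must satisfy
```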
The converse part is easy to prove, and we omit it here. In other words, there is a limit on how much we can compress on average (cf. Theorem 6.). We shall return to the code redundancy in Section 8. To formulate a universal source coding problem, we start by introducing the bit rate $R$ of a code $C^n$. Hereafter, we only deal with extended codes for $X_1, \ldots, X_n$ generated by a source with the underlying probability measure $P$.
Let $M$ be the number of codewords of $C^n$. Observe that $R$ represents the number of bits per source symbol, and it can be viewed as the reciprocal of the compression ratio.
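As a small numeric sketch (mine, not from the text), assume the bit rate of a block code $C^n$ with $M$ codewords is $R = (\log_2 M)/n$ bits per source symbol, consistent with the description above; the block length, codebook size, and source parameter below are hypothetical.

```python
import math

def bit_rate(M: int, n: int) -> float:
    """Bits per source symbol of a block code with M codewords for blocks of length n."""
    return math.log2(M) / n

def h2(p: float) -> float:
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, M, p = 20, 2 ** 12, 0.11          # hypothetical block length, codebook size, source parameter
R = bit_rate(M, n)
print(f"R = {R:.3f} bits/symbol, compression ratio = {1 / R:.2f}")   # binary source: 1 raw bit/symbol
print(f"entropy h(p) = {h2(p):.3f} bits/symbol")                     # benchmark for the best achievable R
```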
In other words, on average the best bit rate is equal to the entropy of the source. We assign codes that are reproduced correctly to the elements of the set $G_n^\varepsilon$; clearly, the error $P_E$ is then bounded by $\varepsilon$. Let $C'$ be the subset of all codewords of $C$ that are encoded without an error. This proves the theorem.
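The following small computation (my own illustration, not the book's proof) assumes $G_n^\varepsilon$ is the standard set of $\varepsilon$-typical sequences of a memoryless Bernoulli($p$) source, i.e., those $x_1^n$ with $|-(1/n)\log_2 P(x_1^n) - h(p)| \le \varepsilon$; it checks numerically that this set is small (about $2^{nh}$ out of $2^n$ sequences) yet carries almost all of the probability.

```python
import math

p, n, eps = 0.3, 500, 0.05
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)            # source entropy h(p) in bits

size, prob = 0, 0.0
for k in range(n + 1):                                        # k = number of ones in x_1^n
    logp = k * math.log2(p) + (n - k) * math.log2(1 - p)      # log2 P(x) for any x with k ones
    if abs(-logp / n - h) <= eps:                             # eps-typical?
        size += math.comb(n, k)
        prob += math.comb(n, k) * 2.0 ** logp

print(f"h(p) = {h:.4f}")
print(f"log2 |G_n^eps| = {math.log2(size):.1f}  vs  n(h + eps) = {n * (h + eps):.1f}")
print(f"P(G_n^eps) = {prob:.4f}")                             # approaches 1 as n grows
```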
We now turn to the transmission of a message $W$ over a noisy channel. Such a communication is performed by sending a signal $X(W)$ that is received as a signal $Y$ depending on $X$, hence on the message $W$. This communication process is illustrated in Figure 6. As before, we shall write $P$ for the probability measure characterizing the source and the channel. The point to observe is that we consider a noisy channel that can alter the signal, so that the received signal is not necessarily equal to the original one.
In a sense, channel coding is the problem opposite to source coding. We shall prove that the channel capacity is the best achievable rate of channel codes. To see this intuitively, observe that there are approximately $2^{nH(Y)}$ typical $Y$-sequences.
Each typical input sequence gives rise to about $2^{nH(Y|X)}$ typical output sequences, and for reliable decoding these output sets must be essentially disjoint. Hence, we can send at most $2^{nH(Y)} / 2^{nH(Y|X)} = 2^{nI(X;Y)}$ distinguishable sequences of length $n$. Below we make this statement more rigorous. In our presentation, we shall follow quite closely the excellent exposition of Cover and Thomas [59].
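As a concrete numeric sketch (not from the text), the snippet below evaluates $I(X;Y)$ for a binary symmetric channel and maximizes it over input distributions by a crude grid search; the crossover probability is a hypothetical value, and the closed form $C = 1 - h(\varepsilon)$ is used only as a sanity check.

```python
import math

def h2(q: float) -> float:
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def mutual_information(q: float, eps: float) -> float:
    """I(X;Y) = H(Y) - H(Y|X) for X ~ Bernoulli(q) over a BSC with crossover eps."""
    py1 = q * (1 - eps) + (1 - q) * eps       # P(Y = 1)
    return h2(py1) - h2(eps)                  # H(Y|X) = h(eps) for every input symbol

eps = 0.11                                    # hypothetical crossover probability
C_search = max(mutual_information(q / 1000, eps) for q in range(1001))
print(f"capacity by grid search: {C_search:.4f}")
print(f"1 - h(eps)             : {1 - h2(eps):.4f}")   # the two should agree
```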
Then we formulate and complete the proof of the Shannon coding theorem (cf. Exercise 4). This fact is known as Fano's inequality, and we derive it next. In view of the above, we know that the rate of reliable transmission (i.e., transmission with asymptotically vanishing error probability) cannot exceed the channel capacity.
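Before proceeding, here is a small numeric check (my own, under the standard statement of Fano's inequality) that for any joint distribution of $X$ and an estimate $\hat X$ over an alphabet of size $m$, $H(X \mid \hat X) \le h(P_e) + P_e \log_2(m-1)$ with $P_e = P(X \ne \hat X)$; random joint distributions are sampled and the bound is verified to hold in every trial.

```python
import math, random

def h2(q: float) -> float:
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

m, trials = 4, 1000
random.seed(0)
for _ in range(trials):
    joint = [[random.random() for _ in range(m)] for _ in range(m)]
    total = sum(map(sum, joint))
    joint = [[v / total for v in row] for row in joint]          # random p(x, xhat)

    pe = sum(joint[x][y] for x in range(m) for y in range(m) if x != y)
    h_cond = 0.0                                                 # H(X | Xhat)
    for y in range(m):
        py = sum(joint[x][y] for x in range(m))
        for x in range(m):
            h_cond -= joint[x][y] * math.log2(joint[x][y] / py)

    assert h_cond <= h2(pe) + pe * math.log2(m - 1) + 1e-9       # Fano's inequality
print(f"Fano's inequality held in all {trials} random trials")
```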
But can we achieve this capacity? We formulate the answer in the following channel coding theorem. The converse part was already proved above, so we now concentrate on the achievability part. The proof of this part is a classic example of the random coding technique, combined with the joint AEP of Theorem 6.
Then, by Theorem 6., to complete the proof we choose the source distribution $P$ to be the distribution $P^*$ that achieves the capacity $C$. In many situations the source cannot, or need not, be reproduced exactly at the receiver: the former situation arises when dealing with continuous signals, while the latter arises in lossy data compression, when one agrees to lose some information in order to obtain better compression. Rate distortion theory deals with such problems and can be succinctly described as follows: given a source distribution and a distortion measure, what is the minimum rate required to achieve a particular distortion?
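As a concrete instance of this question (a standard example, not worked out in the excerpt above), for a Bernoulli($p$) source under Hamming distortion the rate distortion function is $R(D) = h(p) - h(D)$ for $0 \le D \le \min(p, 1-p)$ and $0$ otherwise; the sketch below simply tabulates it.

```python
import math

def h2(q: float) -> float:
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def rate_distortion_binary(p: float, D: float) -> float:
    """R(D) for a Bernoulli(p) source with Hamming distortion."""
    return max(h2(p) - h2(D), 0.0) if D < min(p, 1 - p) else 0.0

p = 0.4                                   # hypothetical source parameter
for D in (0.0, 0.05, 0.1, 0.2, 0.4):
    print(f"D = {D:.2f}  ->  R(D) = {rate_distortion_binary(p, D):.3f} bits/symbol")
```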
We now introduce a distortion measure that was already mentioned in Section 6. The pair $(R, D)$ is called an achievable (resp. $\varepsilon$-achievable) rate-distortion pair. We show that (6.) holds; observe that it implies (6.), and therefore we shall use either form freely. This is the main result of this section, and it was proved by Shannon. The main result of this section can be stated as follows. We start with the converse part, which is easier. Index these codewords by $w \in \{1, 2, \ldots, 2^{nR}\}$. We now estimate $P_E$.
Let $S_n(C)$ be the set of source sequences $x_1^n$ for which there is at least one codeword in $C$ that is distortion typical with $x_1^n$. Using the distortion AEP (Theorem 6.), we now continue with (6.).
Before we leave this section, we formulate yet another representation of the rate distortion function $R(D)$ that captures quite well the main thrust of $R(D)$ and is easy to understand intuitively. As for the converse, let us consider $\varepsilon$-achievable codes $(2^{nR}, n, D)$, and let $S_n$ be the set of $x_1^n$ for which (6.) holds. The other $x_1^n$ are encoded arbitrarily.
We have just sketched the derivation of the following important result. This turns out to be equivalent to the computation of the phrase length in the Wyner-Ziv [] compression scheme, as discussed in Section 1. We follow this path and extend the analysis to the lossy Lempel-Ziv scheme. The next problem we tackle is of more interest to DNA recombination than to data compression. We prove that these algorithms are asymptotically optimal. Let us assume that a mixing source (MX) (cf. Section 2) generates the underlying sequence. We establish separately an upper bound and a lower bound for $D_n$.
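The sketch below (mine, and for the lossless case only) illustrates the phrase length $D_n$ just mentioned: the length of the longest prefix of the upcoming symbols that occurs as a substring of the database $X_1^n$; the lossy variant would accept matches within distortion $D$ rather than exact matches, and the source parameters are hypothetical.

```python
import math, random

def phrase_length(database: str, future: str) -> int:
    """Longest k such that the first k upcoming symbols appear somewhere in the database."""
    k = 0
    while k < len(future) and future[: k + 1] in database:
        k += 1
    return k

random.seed(1)
p, n = 0.3, 4000                                       # hypothetical Bernoulli source and database size
bits = "".join("1" if random.random() < p else "0" for _ in range(n + 60))
database, future = bits[:n], bits[n:]

h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # entropy in bits
print("D_n       =", phrase_length(database, future))
print("log2(n)/h =", round(math.log2(n) / h, 1))
# For a memoryless source, D_n is known to grow like (log n)/h,
# with the entropy and the logarithm taken in the same base.
```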
The quantity $D_n(M)$ can be treated in a similar manner. Throughout, we use the lossy AEP formulated in Theorem 6., together with tools from Chapter 3. Surprisingly, there is a certain detail we have to take care of to get it right. This completes the proof of the upper bound. For the lower bound, we use the second moment method (cf. Section 3). Observe the rate of convergence in Theorem 6.
We recall, however, the results of Section 4. But the height $H_n$ is a nondecreasing function of $n$, and this allows us to extend convergence in probability to almost sure convergence. Actually, this was an open problem posed by Wyner and Ziv in [] for the lossless case. It was answered in the negative by the author in []; we shall prove this fact below. Regarding $F_n$, we establish the following generalization of the result presented in Exercise 10 of Chapter 4.
We start with an upper bound, which is quite simple in this case. By Theorem 6., we show that the desired bound holds with probability tending to 1 as $n \to \infty$. The height $H_n$ is harder to handle. First of all, we formulate the result for the lossless case. Its proof follows the idea already presented in Section 4. A generalization of Theorem 6. is expected: one expects that (6.) holds in greater generality, but this fact was proved only for memoryless and Markovian sources by Arratia and Waterman [22]. The reader is asked to prove it in Exercise 20 for memoryless sources.
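A small simulation (my own) of the trie height $H_n$ for $n$ independent binary strings from a memoryless source is given below; it only illustrates the logarithmic growth of $H_n$ in $n$ and does not reproduce the precise constants discussed in the theorems above. The source parameter and string length are hypothetical.

```python
import random

def trie_height(strings) -> int:
    """Height of a trie in which each string is inserted just far enough to be
    distinguished from the others: one plus the longest pairwise common prefix,
    which is always attained by two lexicographic neighbours."""
    strings = sorted(strings)
    best = 0
    for a, b in zip(strings, strings[1:]):
        lcp = 0
        while lcp < min(len(a), len(b)) and a[lcp] == b[lcp]:
            lcp += 1
        best = max(best, lcp)
    return best + 1

random.seed(2)
p, length = 0.3, 200                       # hypothetical source parameter and string length
for n in (100, 1000, 10000):
    strings = ["".join("1" if random.random() < p else "0" for _ in range(length))
               for _ in range(n)]
    print(f"n = {n:6d}   H_n = {trie_height(strings)}")
```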
Now we are able to answer the question posed just after the proof of Theorem 6. We consider only the lossless case to simplify the presentation. In a similar manner we can prove the lim inf part by using $F_n$ and Theorem 6. Can (6.) be extended? These strings are generated by $n$ independent memoryless sources (cf. Section 1). We compare $O_n^{\mathrm{opt}}$ to the overlap $O_n^{\mathrm{gr}}$ obtained by applying a greedy algorithm to construct a superstring.
A class of greedy algorithms was also discussed in Section 1. We prove a surprising result showing that both $O_n^{\mathrm{opt}}$ and $O_n^{\mathrm{gr}}$ asymptotically grow like $\frac{1}{h}\, n \log n$, where $h$ is the entropy rate of the source. As explained in Section 1, we continue this process until we exhaust the set of available strings. This, of course, does not construct an optimal superstring, which would require examining all $n!$ orderings of the strings.
In fact, the problem is NP-hard (see Section 1). And so on; that is, at every step we choose a remaining string that has the largest overlap with the superstring built so far. Those $n^\varepsilon$ longer strings cannot have depths bigger than $H_n$; hence their total contribution to the sum above is bounded by $O(n^\varepsilon \ln n)$. In words, $N_t(y_1^k)$ denotes the number of positions in $y_1^k$ that are equal to the $t$th symbol of the alphabet.
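The following is a simplified sketch (mine, for illustration only) of the sequential greedy heuristic described above: start from one string and, at every step, append the remaining string whose prefix overlaps the current superstring's suffix the most. The fragment strings are hypothetical, and no attempt is made at an efficient implementation.

```python
def overlap(s: str, t: str) -> int:
    """Largest k such that the last k characters of s equal the first k characters of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0

def greedy_superstring(strings):
    remaining = list(strings)
    superstring = remaining.pop(0)
    while remaining:
        k, best = max((overlap(superstring, t), t) for t in remaining)
        remaining.remove(best)
        superstring += best[k:]            # append only the non-overlapping tail
    return superstring

fragments = ["catta", "ttagg", "aggcat", "gcatta"]     # hypothetical input strings
s = greedy_superstring(fragments)
print(s, len(s))
assert all(f in s for f in fragments)                  # every fragment is a substring
```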
In other words, "2 k! We now make the trie Tn a dynamic tree, by allowing random deletions leading to tries Tn 1; : : : T1. We argue next that whp 6. In passing we should mention that the Shortest Common Superstring is not a good com- pression algorithm. We start with a description of the compression code based on the Shortest Common Superstring.
The reader is referred to Section 1. Below we only sketch the model. The source sequence $X_1^M$ is also emitted by a Markovian source with distribution $P$. We assume that the database sequence and the source sequence are independent, but one can extend this analysis to mixing dependent sequences.
The average bit rate attains its asymptotic limit as $n \to \infty$. We start with an upper bound. Let $L_M$ and $S_M$ be the sets of long and short phrases, respectively. Below we shall write $\tilde{D}_n^1$ for a match phrase that can occur in any position. We now deal with the lower bound (cf. Section 4). Nevertheless, the reader is asked to prove (6.); hence (6.) follows. What distribution $p$ minimizes the entropy $h(p)$? Furthermore, prove the following representation for the divergence rate $D(P \,\|\, \tilde{P})$. Using Lemma 6., prove that ... (Wolfowitz) (i) Consider the channel coding problem, and let $N(n, \varepsilon)$ be the number of messages that can be sent reliably, i.e., with error probability at most $\varepsilon$.
Extend Theorem 6. (Frieze and Szpankowski) Extend Theorem 6. As before, we write $B_D(X_1^n)$ for the distortion ball of all strings of length $n$ that are within distortion $D$ of the center $X_1^n$.