
A User's Guide to Measure Theoretic Probability PDF

Sunday, April 14, 2019 admin

This book grew from a one-semester course offered for many years to a mixed audience of graduate and undergraduate students who have not had the luxury of ... Pollard, David. A user's guide to measure theoretic probability / David Pollard. A very concise book that contains the essentials is Probability Theory by S. R. S. I suggest David Pollard's A User's Guide to Measure Theoretic Probability.


Author: KRYSTEN SILVERBERG
Language: English, Spanish, French
Country: Macedonia
Genre: Fiction & Literature
Pages: 456
Published (Last): 04.11.2015
ISBN: 359-1-79932-736-3

A User's Guide to Measure Theoretic Probability (Cambridge Series in Statistical and Probabilistic Mathematics), by David Pollard. Cambridge Core subjects: Abstract Analysis; General Statistics and Probability; Probability Theory and Stochastic Processes. Contents: Why bother with measure theory? The cost and benefit of rigor. Where to start: probabilities or expectations? The de Finetti notation. Fair prices.

The methods of Section 6 extend easily to higher dimensional Euclidean spaces. I had every intention of making my little stack of notes into a little book.

ℬ-measurable, we can carry μ over to 𝒴. ... X∞ for each n. Actually, the operation is more one of carrying the sets back to the measure than of carrying the measure over to the sets.

Pollard D. A User's Guide to Measure Theoretic Probability

The following two conditions are equivalent: (i) the sequence is uniformly integrable and it converges in probability to a random variable X∞ ... If T is ... Image measures and distributions. Suppose μ is a measure on a sigma-field 𝒜 of subsets of 𝒳 and T is a map from 𝒳 into a set 𝒴. Take expectations. Suppose (i) holds. With M fixed, ... The third and fourth forms ... If X is a random variable, ...

We have gained a theorem with almost no extra work. And so on. Don't confuse distribution ... Several familiar probabilistic objects are just image measures.

... Y(ω)) from Ω into ℝ². It is called the image measure of μ under T. ... ℬ by νg := μ(g∘T). The distribution function has the following properties. A modicum of measure theory. It is easy to check that ν is a measure on ℬ. Its distribution function (also known as a cumulative distribution function) is defined by F_P(x) := P(−∞, x]. In the last sentence I used the qualifier "at least." The small circle symbol ∘ denotes the composition of functions: ... Image measures also figure in a construction that is discussed nonrigorously in many introductory textbooks.

In general there are many plausible ... The simplest form of such an argument has three steps: ... The image measure P ... In probability theory the construction often goes by the name of quantile transformation. Let m denote Lebesgue measure restricted to the Borel sigma-field on (0, 1). To construct such a P, ... Property (b) also follows from Dominated Convergence. The result is often restated as: ... Except in introductory textbooks, ...
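The quantile transformation can be checked on a small example. This is a minimal sketch, assuming the usual definition q(u) := inf{x : F(x) ≥ u} for 0 < u < 1 (the display in the text is elided, so this is my reading of it). The three-point distribution, the N midpoints that stand in for Lebesgue measure m on (0, 1), and the helper names F, q and image_cdf are all hypothetical. The check rests on the equivalence q(u) ≤ x if and only if u ≤ F(x).

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate

# Hypothetical three-point distribution (atoms and probabilities chosen for illustration).
atoms = [-1.0, 0.0, 2.0]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
cum = list(accumulate(probs))  # F evaluated at each atom: 1/4, 3/4, 1


def F(x):
    """Right-continuous distribution function F(x) = P(-inf, x]."""
    return sum((p for a, p in zip(atoms, probs) if a <= x), Fraction(0))


def q(u):
    """Quantile function q(u) = inf{x : F(x) >= u} for 0 < u < 1."""
    # The infimum is attained at the first atom whose cumulative mass reaches u.
    return atoms[bisect_left(cum, u)]


# N equally spaced midpoints of (0, 1) stand in for Lebesgue measure m.
N = 1000
grid = [Fraction(2 * j + 1, 2 * N) for j in range(N)]


def image_cdf(x):
    """Fraction of grid points u with q(u) <= x, approximating m{u : q(u) <= x}."""
    return Fraction(sum(1 for u in grid if q(u) <= x), N)


for x in [-2.0, -1.0, -0.5, 0.0, 1.0, 2.0, 3.0]:
    print(x, F(x), image_cdf(x))
```

The agreement is exact here only because every value of F is a multiple of 1/N; for a general F the grid proportion is within 1/N of F(x).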

Suppose F is a right-continuous ... Then one deduces that ... There is a converse to the assertions (a) and (b) about distribution functions. Then there exists a probability measure P such that P(−∞, x] = F(x) for every real x.

Generating classes of sets

To prove that all sets in a sigma-field 𝒜 have some property one often resorts to a generating-class argument. By right continuity of the increasing function F, ... Property (a) follows from the fact that the integral is an increasing functional.

D₀ is the smallest λ-system ... Notice that a λ-system is also a sigma-field if and only if it is stable under finite intersections. For some properties, ... Instead we need to work with a possibly smaller λ-system D₀. Some authors start from a slightly different definition. The choice of D₀ is easy. Many authors, including me, ... The change in definition would have little effect on the role played by λ-systems. It would be enough to show that D is a sigma-field.

I leave it to you to check the easy details that prove D₀ to be a λ-system. It will then follow that D₀ is a sigma-field. The trick is to work one component at a time. I think that the letter λ ...

Define D₁: ... Let D₀ equal the intersection of all these D_α. The π stands for product. See the Notes at the end of this Chapter. It is easy to check that the class D: ... The Theorem ...

The next Example shows what can happen if you forget about the stability under finite intersections. Just write D₁ instead of D. ... ℬ(ℝ) with the same distribution function. Consider a set 𝒳 consisting of four points. When you employ a λ-system ... One first invokes the Theorem to establish a property for sets in a sigma-field. The proof of the last Theorem is typical of many generating class arguments. You will be getting plenty of practice at filling in the details behind frequent assertions of "a generating class argument shows that ..."
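A four-point illustration of why stability under finite intersections matters. The generating class E and the two measures P and Q below are my own hypothetical choices, not necessarily those used in the book's Example. P and Q agree on every set in E, yet they differ somewhere in the generated sigma-field, because E is not stable under intersections.

```python
from fractions import Fraction

X = frozenset({1, 2, 3, 4})
# Generating class that is NOT stable under finite intersections:
# {1, 2} and {2, 3} belong to E, but their intersection {2} does not.
E = [frozenset({1, 2}), frozenset({2, 3})]


def generated_sigma_field(gen, space):
    """Smallest class containing gen that is closed under complements and unions.

    On a finite space, finite unions suffice, so iterating to a fixed point works.
    """
    sets = {frozenset(), space} | set(gen)
    while True:
        new = {space - a for a in sets} | {a | b for a in sets for b in sets}
        if new <= sets:
            return sets
        sets |= new


# Two probability measures given by point masses (hypothetical choice).
P = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(1, 4)}
Q = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2), 4: Fraction(0)}


def measure(mu, A):
    return sum((mu[x] for x in A), Fraction(0))


sigma_E = generated_sigma_field(E, X)
agree_on_E = all(measure(P, A) == measure(Q, A) for A in E)
disagree = sorted(sorted(A) for A in sigma_E if measure(P, A) != measure(Q, A))
print(agree_on_E, len(sigma_E), disagree)
```

Here the generated sigma-field is the full power set of the four points, and the singleton {2} already separates P from Q.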

Put another way (this step is the only subtlety in the proof), we can assert that the class D₂: ... Sometimes it is simpler to invoke an analog of the λ-system ... By a trivial generalization of Problem [25], ... The sigma-field properties of λ-cones are slightly harder to establish than their λ-system analogs.

I also toyed with the name cdl-cone. The function fₙ h: ... The name λ-cone is not standard. By virtue of properties (i) and (ii) of λ-cones, ... I found it hard to come up with a name that was both suggestive of the defining properties and analogous to the name for the corresponding classes of sets. For suppose hₙ ... From Monotone Convergence, ...

See the Notes. For a while I used the term Dynkin-cones but abandoned it for historical reasons. ... λ-cone if: ... First we need an analog of the fact that a λ-system that is stable under finite intersections is also a sigma-field. The functions kₙ increase monotonically to k, which consequently also belongs to ...

The sigma-field σ(𝒞) coincides with the Borel sigma-field. ... L¹ norm. Proof. Let ℋ₁ be the smallest λ-cone ... From the previous Lemma, ... See Problem [26] for the extension of the approximation result to infinite measures.

For a fixed h and 𝒞 ... Trivially it contains 𝒞. Show that the two convergence assumptions also hold for the sequence Aₘ. Be careful about ... The next Problem establishes a similar result under weaker assumptions. Show that g has the desired property. It is called the sigma-field generated by the map T.


It is often denoted by σ(F). For the argument in one direction, ... For each positive integer n, ... Show that the F in the definition of inner regularity can then be assumed compact. Deduce that every Borel set is inner regular. Show that μ is tight: ... Deduce via Dominated Convergence that Xₙ converges in probability to X. Follow these steps to prove Minkowski's inequality: ... This result is called the Hölder inequality. First dispose of the trivial case where one of the factors on the right-hand side is 0 or ∞.
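A numerical spot check (not a proof) of the Hölder and Minkowski inequalities. It takes μ to be counting measure on ten points, so that integrals become finite sums; the exponent p = 3, the random seed and the helper lp_norm are arbitrary illustrative choices.

```python
import random


def lp_norm(values, p):
    """L^p norm with respect to counting measure: (sum |v|^p)^(1/p)."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


rng = random.Random(0)
p = 3.0
q_exp = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q_exp = 1
slack = 1 + 1e-12      # tolerance for floating-point rounding

for _ in range(1000):
    f = [rng.uniform(-5, 5) for _ in range(10)]
    g = [rng.uniform(-5, 5) for _ in range(10)]
    # Hölder: sum |f g| <= ||f||_p * ||g||_q
    assert sum(abs(a * b) for a, b in zip(f, g)) <= lp_norm(f, p) * lp_norm(g, q_exp) * slack
    # Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
    assert lp_norm([a + b for a, b in zip(f, g)], p) <= (lp_norm(f, p) + lp_norm(g, p)) * slack

print("no violations in 1000 random trials")
```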

P, define X∞: ... Choose countable Tₙ such that P sup ... Part (b) shows that it is. Show that P sup ... Show that the function G: ... M and the triangle inequality holds: ... Let ... denote its inverse function. You may assume these elementary facts: ... Write μ_B for the restriction of μ to B. Consider approximations ght for i large enough.

Notes

I recommend Royden as a good source for measure theory. The books of Ash and Dudley are also excellent references, for both measure theory and probability.

Dudley's book contains particularly interesting historical notes. See Hawkins, Chapter 4, to appreciate the subtlety of the idea of a negligible set. The result from Problem [10] is often attributed to Pratt, but, as he noted in his Acknowledgement of Priority, it is actually much older.

Ash, R.
Dellacherie, C.
Dudley, R.
Dynkin, E.
Hawkins, T.
Hoffmann-Jørgensen, J. ... I, Chapman and Hall, New York.
Oxtoby, J.
Pratt, J. Acknowledgement of priority, same journal, vol. 37, page
Protter, P.
Royden, H.
Sierpiński, W.
... With Applications to Statistics, Springer-Verlag.

Densities and derivatives

SECTION 1 explains why the traditional split of introductory probability courses into two segments (the study of discrete distributions, and the study of "continuous" distributions) is unnecessary in a measure theoretic treatment.

Absolute continuity of one measure with respect to another measure is defined. A simple case of the Radon-Nikodym theorem is proved. SECTION 4 explains the connection between the classical concept of absolute continuity and its measure theoretic generalization.

Densities and absolute continuity

Nonnegative measurable functions create new measures from old. The word derivative suggests a limit of a ratio of ν and μ measures of "small" sets. Section 4 explains the one-dimensional case.

Chapter 6 will give another interpretation via martingales. The qualifier "joint" sometimes creeps into the description of densities with respect to Lebesgue measure on ℝᵏ. From a measure theoretic point of view the qualifier is superfluous, but it is a comforting probabilistic tradition perhaps worthy of preservation. Existence of a density is a property that depends on two measures. Even measures that don't fit the traditional idea of a continuous distribution can be specified by densities, as in the case of measures dominated by a counting measure.

Some introductory texts use the technically correct term density in that case, much to the confusion of students who have come to think that densities have something to do with continuous functions. Densities are useful because they allow integrals with respect to one measure to be reexpressed as integrals with respect to a different measure.

Let m denote Lebesgue measure on [0, 1]. The map T(x): ... The image measure ν: ... The distribution function F(x): ... The distinction between a continuous function and a function expressible as an integral was recognized early in the history of measure theory, with the name "absolute continuity" being used to denote the stronger property. The original definition of absolute continuity (Section 4) is now a special case of a streamlined characterization that applies not just to measures on the real line.

Probability measures dominated by Lebesgue measure correspond to the continuous distributions of introductory courses, although the correct term is distributions absolutely continuous with respect to Lebesgue measure. By extension, random variables whose distributions are dominated by Lebesgue measure are sometimes called "continuous random variables," which I regard as a harmful abuse of terminology.

Indeed, there need be no topology on Ω; the very concept of continuity for the function might be void. Many a student of probability has been misled into assuming topological properties for "continuous random variables." If the measure ν is finite, there is an equivalent formulation of absolute continuity that looks more like a continuity property. Define A: ... The equivalence can fail if ν is not a finite measure.

The functions fₙ(x): ... Existence of a density and absolute continuity are equivalent properties if we exclude some pathological examples, such as those presented in Problems [1] and [2].

Radon-Nikodym Theorem. Then every sigma-finite measure that is absolutely continuous with respect to μ has a density, which is unique up to μ-equivalence.

The Theorem is a special case of the slightly more general result known as the Lebesgue decomposition, which is proved in Section 2 using projections in Hilbert spaces. Most of the ideas needed to prove the general version of the Theorem appear in simpler form in the following proof of a special case. The linear subspace ℋ₀: ... From Section 2, ...


With C: Perhaps it would be better to say that the two measures are mutually singular, to emphasize the symmetry of the relationship. For example, discrete measures—those that concentrate on countable subsets—are singular with respect to Lebesgue measure on the real line. Avoidance of all probability measures except those dominated by a counting measure or a Lebesgue measure—as in introductory probability courses—imposes awkward constraints on what one can achieve with probability theory.

The restriction becomes particularly tedious for functions of more than a single random variable, for then one is limited to smooth transformations for which image measures and densities with respect to Lebesgue measure can be calculated by means of Jacobians. The unfortunate effects of an artificially restricted theory permeate much of the statistical literature, where sometimes inappropriate and unnecessary requirements are imposed merely to accommodate a lack of an appropriate measure theoretic foundation.

Why should absolute continuity with respect to Lebesgue measure or counting measure play such a central role in introductory probability theory? I believe the answer is just a matter of definition, or rather, a lack of definition. For a probability measure P concentrated on a countable set of points, expectations Pg(X) become countable sums, which can be handled by elementary methods.

For general probability measures the definition of Pg(X) is typically not a matter of elementary calculation. The last integral has the familiar look of a Riemann integral, which is the subject of elementary Calculus courses.

Seldom would A or g be complicated enough to require the interpretation as a Lebesgue integral—one stays away from such functions when teaching an introductory course. From the measure theoretic viewpoint, densities are not just a crutch for support of an inadequate integration theory; they become a useful tool for exploiting absolute continuity for pairs of measures.

In much statistical theory, the actual choice of dominating measure matters little. The following result, which is often called Scheffé's lemma, is typical. Consider first the question of existence. With no loss of generality we may assume that both ν and μ are finite measures. Let μ be a sigma-finite measure on a space 𝒳.
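Scheffé's lemma is usually stated as follows: if probability densities pₙ converge μ-almost everywhere to a probability density p, then ∫|pₙ − p| dμ → 0. The sketch below illustrates that statement (my reading; the book's exact formulation is not visible in this excerpt) with counting measure on {0, 1, 2, ...} as the dominating measure. Binomial(n, λ/n) densities converge pointwise to the Poisson(λ) density, and the L¹ distance shrinks accordingly; λ = 2 and the helper names are arbitrary choices.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def binomial_pmf(k, n, p):
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + k * math.log(p) + (n - k) * math.log1p(-p))


def l1_distance(n, lam=2.0):
    """sum_k |b_n(k) - pi(k)| for densities w.r.t. counting measure on {0, 1, 2, ...}."""
    p = lam / n
    head = sum(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))
    # For k > n the binomial density is 0, so only the Poisson tail mass remains.
    tail = 1.0 - sum(poisson_pmf(k, lam) for k in range(n + 1))
    return head + max(tail, 0.0)


for n in (10, 100, 1000):
    print(n, l1_distance(n))
```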

Define A ...

The Lebesgue decomposition

Absolute continuity and singularity represent the two extremes for the relationship between two measures on the same space. The general decomposition would follow by piecing together the results for countably many disjoint subsets of 𝒳. Lebesgue decomposition. The restriction ν_abs of ν to Nᶜ is called the part of ν that is absolutely continuous with respect to μ.
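On a finite space the Lebesgue decomposition and the Radon-Nikodym density can be written down by hand. In this minimal sketch the point masses for μ and ν are hypothetical. The part of ν sitting on the μ-negligible set N = {μ = 0} is the singular part; the rest has density ν{x}/μ{x} with respect to μ.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical point masses on a five-point space.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0), "e": Fraction(0)}
nu = {"a": Fraction(1, 5), "b": Fraction(0), "c": Fraction(2, 5), "d": Fraction(1, 10), "e": Fraction(3, 10)}
points = sorted(mu)

# N = {mu = 0} is mu-negligible; nu's mass on N is the part that mu cannot see.
N = {x for x in points if mu[x] == 0}
nu_abs = {x: (Fraction(0) if x in N else nu[x]) for x in points}
nu_sing = {x: (nu[x] if x in N else Fraction(0)) for x in points}

# Density of nu_abs with respect to mu; its value on N is irrelevant (set to 0 here).
density = {x: (Fraction(0) if x in N else nu[x] / mu[x]) for x in points}


def measure(m, A):
    return sum((m[x] for x in A), Fraction(0))


def integral_over(f, m, A):
    """Integral of f over A with respect to the point-mass measure m."""
    return sum((f[x] * m[x] for x in A), Fraction(0))


all_sets = [set(c) for r in range(len(points) + 1) for c in combinations(points, r)]
for A in all_sets:
    assert measure(nu, A) == measure(nu_abs, A) + measure(nu_sing, A)
    assert measure(nu_abs, A) == integral_over(density, mu, A)  # nu_abs has density w.r.t. mu
# nu_sing is singular with respect to mu: it lives on N, and mu N = 0.
assert measure(mu, N) == 0 and measure(nu_sing, set(points) - N) == 0
print(density, measure(nu_sing, N))
```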

Together the two decompositions partition the underlying space into four measurable sets: ... Define K: ... In fact every real-valued ... The proof of uniqueness of the representation ... The total variation distance between two signed measures is the norm of their difference. Throughout the section, ...

Problem [9] will step you through the details. Several easily proved facts about these distances have important application in mathematical statistics. 𝓜_bdd will denote the space of all bounded ...

Equality is achieved for a partition with two sets: ... To prove that it is the largest such measure, ...

For each bounded ... The L¹ norm does not depend on the particular choice of dominating measure. Without loss of generality, ... To each pair of finite ...
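For measures on a finite set, with counting measure as the dominating measure, the L¹ norm of P − Q is a finite sum. The supremum of |PA − QA| over sets A equals half of it and is attained at A = {p > q}, so the two-set partition {p > q}, {p ≤ q} achieves equality. The brute-force check below runs over all subsets; the densities p and q are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical densities of P and Q with respect to counting measure on {0, 1, 2, 3}.
p = [Fraction(1, 10), Fraction(2, 5), Fraction(3, 10), Fraction(1, 5)]
q = [Fraction(1, 4)] * 4
points = range(len(p))

l1 = sum(abs(a - b) for a, b in zip(p, q))  # ||P - Q||_1


def diff(A):
    """PA - QA for a subset A of the points."""
    return sum((p[i] - q[i] for i in A), Fraction(0))


sup_over_sets = max(abs(diff(A)) for r in range(len(p) + 1) for A in combinations(points, r))
A_plus = [i for i in points if p[i] > q[i]]  # the set {p > q}
print(l1, sup_over_sets, A_plus, diff(A_plus))
```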

For probability measures, ... By a similar argument, ... The Hellinger distance defines a bounded metric on the space of all probability measures on 𝒜.

Relative entropy

Let P and Q be two probability measures with densities p and q with respect to some dominating measure k. Again the distance does not depend on the choice of dominating measure (Problem [13]). For nonnegative measures, ... Convergence in that metric is equivalent to convergence in L¹ norm. The square roots of the densities ... The Cauchy-Schwarz inequality gives a useful lower bound: ... Some authors prefer to have an upper bound of 1 for the Hellinger distance.

The relative entropy (also known as the Kullback-Leibler "distance") ... Equality at √2 occurs when the Hellinger affinity is zero.

Hellinger distance between probability measures

Let P and Q be probability measures with densities p and q with respect to a dominating measure k.
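A sketch for densities with respect to counting measure on a finite set, with hypothetical P and Q. It checks the identity H²(P, Q) = 2 − 2∫√(pq), the value √2 for H when the affinity is zero, and the pair of inequalities H² ≤ ‖P − Q‖₁ ≤ H(4 − H²)^{1/2}, which follow from |√p − √q|² ≤ |p − q| and from Cauchy-Schwarz. I am assuming H denotes the unnormalized Hellinger distance (∫(√p − √q)²)^{1/2}, which matches the √2 upper bound mentioned in the text.

```python
import math


def hellinger_sq(p, q):
    """H^2(P, Q) = sum (sqrt(p) - sqrt(q))^2 for densities w.r.t. counting measure."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def affinity(p, q):
    """Hellinger affinity: sum sqrt(p q)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


P = [0.1, 0.4, 0.3, 0.2]       # hypothetical densities
Q = [0.25, 0.25, 0.25, 0.25]
H2 = hellinger_sq(P, Q)
print(H2, 2 - 2 * affinity(P, Q))               # the two expressions agree
print(H2, l1(P, Q), math.sqrt(H2 * (4 - H2)))   # H^2 <= ||P - Q||_1 <= H (4 - H^2)^(1/2)

# Mutually singular measures: zero affinity, so H^2 = 2 and H = sqrt(2).
S1, S2 = [0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]
print(hellinger_sq(S1, S2), affinity(S1, S2))
```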

Nonnegativity would also follow via Jensen's inequality. At first sight, ... It can also be infinite even if P and Q are mutually absolutely continuous (Problem [15]). In a similar vein, ... The relative entropy is well defined and nonnegative.
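A sketch of the relative entropy D(P‖Q) = ∫ p log(p/q) for densities with respect to counting measure, with hypothetical P and Q. It shows nonnegativity, the lack of symmetry, the value +∞ when P is not absolutely continuous with respect to Q, and the lower bound D(P‖Q) ≥ H²(P, Q), which follows from the elementary inequality log x ≤ 2(√x − 1).

```python
import math


def relative_entropy(p, q):
    """D(P||Q) = sum over {p > 0} of p log(p/q); infinite unless P << Q."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue            # convention: 0 log(0/q) = 0
        if b == 0:
            return math.inf     # P charges a point that Q does not
        total += a * math.log(a / b)
    return total


def hellinger_sq(p, q):
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


P = [0.1, 0.4, 0.3, 0.2]       # hypothetical densities w.r.t. counting measure
Q = [0.25, 0.25, 0.25, 0.25]
print(relative_entropy(P, Q), relative_entropy(Q, P))      # not symmetric
print(relative_entropy(P, Q) >= hellinger_sq(P, Q))        # D(P||Q) >= H^2(P, Q)
print(relative_entropy([0.5, 0.5, 0.0], [1.0, 0.0, 0.0]))  # inf: P is not << Q
```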

This inequality is trivially true unless P is absolutely continuous with respect to Q. As with the L¹ and Hellinger distances, ... A Taylor expansion comes to the rescue: ... For that case, ... Let P_θ denote the N(θ, 1) distribution. Each of the three distances between P₀ and P_θ can be calculated in closed form: ... I will proceed heuristically.
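A numerical cross-check of closed forms for the normal location family, assuming P_θ = N(θ, 1) with Lebesgue measure as the dominating measure. The closed forms used below are the standard ones: ‖P₀ − P_θ‖₁ = 2(2Φ(|θ|/2) − 1), H²(P₀, P_θ) = 2 − 2exp(−θ²/8) and D(P₀‖P_θ) = θ²/2; they are my statement of the results, not a transcription of the elided display. The integration range and step count are arbitrary choices.

```python
import math


def log_density(x, theta=0.0):
    """log of the N(theta, 1) density with respect to Lebesgue measure."""
    return -0.5 * (x - theta) ** 2 - 0.5 * math.log(2 * math.pi)


def density(x, theta=0.0):
    return math.exp(log_density(x, theta))


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def integrate(f, lo=-15.0, hi=17.0, n=32000):
    """Midpoint rule; the integrands below are negligible outside [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def distances_numeric(theta):
    l1 = integrate(lambda x: abs(density(x) - density(x, theta)))
    h2 = integrate(lambda x: (math.sqrt(density(x)) - math.sqrt(density(x, theta))) ** 2)
    kl = integrate(lambda x: density(x) * (log_density(x) - log_density(x, theta)))
    return l1, h2, kl


def distances_closed(theta):
    """L1 norm, squared Hellinger distance and relative entropy D(P_0 || P_theta)."""
    t = abs(theta)
    return 2 * (2 * Phi(t / 2) - 1), 2 - 2 * math.exp(-theta ** 2 / 8), theta ** 2 / 2


for theta in (0.5, 2.0):
    print(theta, distances_numeric(theta), distances_closed(theta))
```

For θ near zero the closed forms give H² ≈ θ²/4 and D = θ²/2, the kind of quadratic behavior the heuristic Taylor expansion predicts.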

For θ near zero, ... The method from the previous Example can be extended to other families of densities. Taylor expansion gives H²(P₀, P_θ) ... The last two assertions can be made rigorous by domination assumptions. And finally, ... Write g′ and g″ for the first and second derivatives of the function g(x). Continuing the heuristic, ... Suppose P_θ has a density exp g ...

Readers familiar with the information inequality from theoretical statistics will recognize the two ways of representing the information function for the family of densities. There can be no improvement in the constants in the last two displayed inequalities.

From the Taylor approximation, ... of convergence and integrability. Of course it is actually immaterial for (ii) and (iii) how H′ is defined on the Lebesgue negligible set of points at which the derivative does not exist.

It is one of the most celebrated results of classical analysis. A real-valued function H defined on an interval [a, b] ... In the Definition, ... Which functions on the real line can be expressed as integrals of their derivatives? The proof in Section 6 provides two other natural choices. We may fruitfully think of the Fundamental Theorem as making two separate assertions: ... Their intersection has zero Lebesgue measure.

If H is absolutely continuous then it is differentiable almost everywhere. The connection between absolute continuity of functions and integration of derivatives was established by Lebesgue. The function h is unique up to Lebesgue almost sure equivalence.

The classical concept of absolute continuity

The fundamental problem of Calculus concerns the interpretation of differentiation and integration.

See Problem [21] for an outline of the proof. More precisely, ... As a pointwise limit of continuous functions, ... As shown at the end of this Section, ... The Theorem justifies the usual treatment in introductory Calculus courses of integration as an inverse operation to differentiation. It is therefore quite surprising that existence of even a one-sided ... The function F is not absolutely continuous.
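The function F in the text is not identified in this excerpt. As a standard illustration (my choice, not necessarily the book's), the Cantor function is continuous and increasing with derivative zero almost everywhere, yet it is not absolutely continuous. After n steps of the middle-thirds construction, 2ⁿ intervals of total length (2/3)ⁿ carry the whole increase of 1. So no δ can force Σ|F(bᵢ) − F(aᵢ)| below, say, 1/2 whenever Σ(bᵢ − aᵢ) < δ. The helper names are hypothetical; exact rational arithmetic avoids rounding.

```python
from fractions import Fraction


def cantor(x, depth=64):
    """Cantor function at a rational x in [0, 1], via its ternary expansion."""
    if x >= 1:
        return Fraction(1)
    result, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        d = int(x)          # next ternary digit: 0, 1 or 2
        x -= d
        if d == 1:          # x is in a removed middle third (or its left end): F is constant there
            return result + scale
        result += scale * (d // 2)  # ternary digit 2 becomes binary digit 1
        scale /= 2
    return result


def cantor_intervals(n):
    """The 2**n closed intervals of length 3**-n left after n removal steps."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt += [(a, a + third), (b - third, b)]
        intervals = nxt
    return intervals


for n in (1, 5, 10):
    ivs = cantor_intervals(n)
    length = sum(b - a for a, b in ivs)
    increment = sum(cantor(b) - cantor(a) for a, b in ivs)
    print(n, float(length), increment)  # length shrinks like (2/3)**n, increment stays 1
```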

By splitting [a, b] ... Notice what the Fundamental Theorem does not assert: ... Assertion FT1 can be recovered from the Radon-Nikodym theorem for measures. The proof of Assertion FT2 (Section 6) ... It follows that H is absolutely continuous as a function on [a, b]. Let H be a continuous function defined on an interval [a, b]. Close inspection of the arguments in Section 6 would reveal that functions of bounded variation have a derivative almost everywhere.

Without absolute continuity, ... The uniqueness of the representing h can be established by another λ-class generating argument. ... sense. Such a function is said to have bounded variation on the interval [a, b].

We didn't really need absolute continuity to break H into a difference of two increasing functions. ... That is, ... As shown in Section 2, ... In the limit we get H(x) ... By the Radon-Nikodym Theorem, ... The proof works via repeated reduction of mD by removal of unions of finite families of disjoint sets from 𝒱. More refined versions of the Theorem, such as the results presented by Saks, ... The method for the first step sets the pattern.
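A discrete sketch of writing a function of bounded variation as a difference of two increasing functions (the Jordan decomposition). On a grid, let V(xₖ) be the accumulated variation Σ|H(xᵢ₊₁) − H(xᵢ)| up to xₖ. Then H = (V + H)/2 − (V − H)/2, and both pieces are nondecreasing along the grid. The choice H = sin on [0, 2π], with total variation 4, and the 1001-point grid are hypothetical illustrations; only grid-level monotonicity is checked.

```python
import math


def jordan_decomposition(values):
    """Split H(x_0), ..., H(x_n) into two nondecreasing sequences whose difference is H."""
    variation = [0.0]
    for prev, cur in zip(values, values[1:]):
        variation.append(variation[-1] + abs(cur - prev))  # V(x_k) along the grid
    up = [(v + h) / 2 for v, h in zip(variation, values)]      # increments (|dH| + dH)/2 >= 0
    down = [(v - h) / 2 for v, h in zip(variation, values)]    # increments (|dH| - dH)/2 >= 0
    return variation, up, down


a, b, n = 0.0, 2 * math.pi, 1000
xs = [a + (b - a) * k / n for k in range(n + 1)]
H = [math.sin(x) for x in xs]
V, up, down = jordan_decomposition(H)
print(V[-1])  # close to 4, the total variation of sin on [0, 2*pi]
```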

Of course "skinny" is not a technical term. {F ∈ 𝒱 : F ⊆ G} is also a Vitali covering for E: ... Call a collection 𝒱 of closed subsets of ℝᵈ a γ-regular Vitali covering of a set E if each member of 𝒱 is γ-regular and if ... We will sometimes need to write B_F as B(x, r) ... Section 7. For the application to FT2, ...

There are many different results presented in texts as the Vitali theorem. Put another way, ... Section IV. Notice that if G is an open set with G ⊇ E, ... There are various ways in which we may approximate D by simpler sets.

The result is trivial if mD is zero. The extra generality is useful. As you will see later. We will have no need for the more general versions. Discard all those members of V that are not subsets of G. The second step is analogous.

It has a finite subcover. I will reuse the symbol 𝒱. As the proof proceeds, ... What remains is still a Vitali covering of D. Rather than invent a new symbol for the class of sets that remains after each discard, ... Repeat the argument from the previous paragraphs to find disjoint sets from 𝒱 whose union ... The ordering of the radii ensures that B(xᵢ, 3rᵢ) ... 𝒱 will denote different things at different stages of the proof. Continuity of H at the endpoints of E lets us expand E slightly. In the limit we have the countable disjoint family whose existence is asserted by the Lemma.

In the limit as first δ tends to zero and then n tends to infinity we have ... It suffices if we consider the case where h is nonnegative. For nonnegative h, write ν for the measure on ...

Show that k has no density with respect to μ. To complete the proof of FT2, ... Take the infimum over G to obtain the assertion from (i). For (i), ... Cast out countably many negligible sets.

Suppose there exists a ... Let G be an open set and K be a compact set. The argument for (ii) is similar: ... Reverse the direction of the inequality in the definition of 𝒱.

Decompose 𝒳 into countably many sets {𝒳ᵢ}. ... Use the properties of fields to argue that the Eₙ sets can be chosen disjoint.

... H with respect to μ. ... A₂ are both Lebesgue decompositions for a finite measure ν with respect to a finite measure μ. Suppose ν has Lebesgue decomposition ... Deduce via Problem [8] that there exists a strictly positive ... What value does AiHj take on Hi? Argue similarly for the companion equality. ... A₀ and N₂. Find the Lebesgue decomposition of μ with respect to ν. Where exactly would it fail? I feel uneasy if it is not clear how a convention is disposing of awkward cases.

My advice: ... Subtle errors are easy to miss when concealed within a convention. Then the set function defined on the sigma-field 𝒜 by (i) is a countably additive, nonnegative measure, with μ the functional that it generates. It is even possible to use the equality, or something very similar, as the basis for a direct construction of the integral, from which properties (i) through (v) are then derived, as you will see from Section 4.

In summary: ... Accordingly, we should have no qualms about denoting it by the same symbol. In elementary algebra we rely on parentheses, or precedence, to make our meaning clear. With traditional notation, the ∫ and the dμ act like parentheses, enclosing the integrand and separating it from following terms. With linear functional notation, we sometimes need explicit parentheses to make the meaning unambiguous.

As a way of eliminating some parentheses, I often work with the convention that integration has lower precedence than exponentiation, multiplication, and division, but higher precedence than addition or subtraction.

Some of the traditional notations also remove ambiguity when functions of several variables appear in the integrand. For example, in ∫ f(x, y) μ(dx) the y variable is held fixed while the μ operates on the first argument of the function.

When a similar ambiguity might arise with linear functional notation, I will append a superscript, as in μˣ f(x, y), to make clear which variable is involved in the integration.

Let P be a probability measure, and X be an integrable random variable. Choose ℋ₀: ...

Integrals with respect to Lebesgue measure

Lebesgue measure m on ℬ(ℝ) corresponds to length: ... Don't worry about confusing the Lebesgue integral with the Riemann integral over finite intervals.

Whenever the Riemann integral is well defined, so is the Lebesgue integral, and the two sorts of integral have the same value. The Lebesgue integral is a more general concept. Indeed, facts about the Riemann integral are often established by an appeal to theorems about the Lebesgue integral. You do not have to abandon what you already know about integration over finite intervals. For example, if s: ... More generally, if s: ... Thus we can uniquely define μs for a simple function s: ... Here you will see why measurability is needed.

For simplicity of notation I will assume s to be very simple: ... Define simple functions u: ... 𝒜-measurability of all the sets entering into the definitions of u and v ... Finally, note that the simple functions were chosen so that u ... Proof of the Monotone Convergence property. Define approximating simple functions sₙ: ... Two direct consequences of this limit property have important applications throughout probability theory.
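A small numerical picture of the approximating simple functions, assuming the common choice sₙ := min(n, 2⁻ⁿ⌊2ⁿf⌋) (the book's display is elided, so this definition is my assumption). The example f(x) = x² on [0, 1] with Lebesgue measure is hypothetical. Writing sₙ = Σₖ 2⁻ⁿ 1{f ≥ k2⁻ⁿ} turns m(sₙ) into a finite sum of measures of level sets, and the values increase to m(f) = 1/3, as Monotone Convergence requires.

```python
import math


def level_set_measure(t):
    """m{x in [0, 1] : x**2 >= t}, worked out by hand for f(x) = x**2."""
    if t <= 0:
        return 1.0
    if t >= 1:
        return 0.0  # for t == 1 only the single point x = 1 remains
    return 1.0 - math.sqrt(t)


def integral_of_s(n):
    """m(s_n) for s_n = min(n, 2**-n * floor(2**n * f)).

    s_n = sum over k = 1, ..., n * 2**n of 2**-n * 1{f >= k 2**-n},
    so its integral is a finite sum of measures of level sets.
    """
    step = 2.0 ** -n
    return step * sum(level_set_measure(k * step) for k in range(1, n * 2 ** n + 1))


values = [integral_of_s(n) for n in range(1, 13)]
for n, v in enumerate(values, start=1):
    print(n, v, 1 / 3 - v)
```

Because 0 ≤ f − sₙ ≤ 2⁻ⁿ wherever f < n, the gap m(f) − m(sₙ) is at most 2⁻ⁿ here.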

The first, Fatou's Lemma, asserts a weaker limit property for nonnegative functions when the convergence and monotonicity assumptions are dropped. I have slowly realized over the years that many simple probabilistic results can be established by Dominated Convergence arguments.

Variations on the following example form the basis for many counterexamples. The function fₙ(x): ... for n even, ... for n odd. Remember what a liminf means. Define gₙ: ... For dominated sequences of functions, a splicing together of two Fatou Lemma assertions gives two liminf consequences that combine to produce a limit result.
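The example in the text is garbled in this copy, so here is a standard counterexample of the same flavor (my choice, not necessarily the book's): fₙ = n·1{(0, 1/n)} on (0, 1) with Lebesgue measure. Each fₙ(x) is eventually 0, yet every integral equals 1, so limits and integrals cannot be interchanged. Dominated Convergence does not apply because the envelope maxₙ fₙ, which is roughly 1/x, is not integrable: its integral up to N is the harmonic number 1 + 1/2 + ... + 1/N. Function names are hypothetical.

```python
from fractions import Fraction


def f(n, x):
    """f_n = n times the indicator of the open interval (0, 1/n)."""
    return n if 0 < x < Fraction(1, n) else 0


def integral_f(n):
    """Lebesgue integral of f_n: height n times length 1/n."""
    return n * Fraction(1, n)


def envelope_integral(N):
    """Integral over (0, 1) of max_{n <= N} f_n.

    The envelope equals n on [1/(n+1), 1/n) for n < N, and N on (0, 1/N).
    """
    total = Fraction(0)
    for n in range(1, N + 1):
        lower = Fraction(1, n + 1) if n < N else Fraction(0)
        total += n * (Fraction(1, n) - lower)
    return total


x = Fraction(1, 1000)
print([f(n, x) for n in (10, 999, 1000, 5000)])                # eventually 0 at a fixed x > 0
print([integral_f(n) for n in (1, 10, 100)])                   # but every integral equals 1
print([float(envelope_integral(N)) for N in (10, 100, 1000)])  # grows like log N
```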

See Problem [10] for a generalization. Dominated Convergence. Simplify, using the fact that a liminf is the same as a lim for convergent sequences.

Notice that we cannot yet assert that the liminf on the right-hand side is actually a limit. The negative sign turns a liminf into a lim sup. The convergence assertion follows.

You might well object to some of the steps in the proof on ∞ − ∞ grounds. It is a common mistake amongst students new to the result to allow F to depend on n. Dominated Convergence turns up in many situations that you might not at first recognize as examples of an interchange in the order of two limit procedures. The neatest justification uses a Dominated Convergence argument.

Domination of the partial derivative will suffice. Please make sure that you understand why continuous limits can be replaced by sequential limits in this way. It is a common simplification. An appeal to Dominated Convergence completes the argument. As the name suggests, we can usually ignore bad things that happen only on a negligible set.

There are several useful facts about negligible sets that are easy to prove and exceedingly useful to have formally stated. They depend on countable additivity, via its Monotone Convergence generalization.

For (i): ... For (ii): ... Reverse the roles of g and h to get the reverse inequality. For (iii): ... For (iv): Put Nₙ: ... Notice the appeals to countable additivity, via the Monotone Convergence property, in the proofs.

Results such as (iv) fail without countable additivity, which might trouble those brave souls who would want to develop a probability theory using only finite additivity.

Property (iii) can be restated as: ... Actually we can drop the assumption that A ∈ 𝒜 if we enlarge the sigma-field slightly. The Lebesgue sigma-field on the real line is the completion of the Borel sigma-field with respect to Lebesgue measure. Here is one of the standard methods for proving that some measurable set A has zero μ measure.

Many limit theorems in probability theory assert facts about sequences that hold only almost everywhere. Problem [2] establishes an even stronger converse, replacing independence by a weaker limit property. The Borel-Cantelli argument often takes the following form when invoked to establish almost sure convergence. You should make sure you understand the method, because the details are usually omitted in the literature.

A sum of integers converges if and only if the summands are eventually zero. We are allowed to neglect only countable unions of negligible sets. Define N: ... Many theorems have trivial modifications with equalities replaced by almost sure equalities, and convergence replaced by almost sure convergence, and so on. For example, Dominated Convergence holds in a slightly strengthened form: ... Then ... Most practitioners of probability learn to ignore negligible sets and then suffer slightly when they come to some stochastic process arguments where the handling of uncountable families of negligible sets requires more delicacy.

For example, if I could show that a sequence {fₙ} converges almost everywhere I would hardly hesitate to write: ... What happens at those x where fₙ(x) does not converge? If hard pressed I might write: ... You might then wonder if the function so defined were measurable (it is), or if the set where the limit exists is measurable (it is).

A sneakier solution would be to write: ... Define f(x): ... It doesn't much matter what happens on the negligible set where the limsup is not equal to the liminf, which happens only when the limit does not exist.

We avoid the first complication by restricting attention to the vector space 𝓛ᵖ: ... We could avoid the second complication by working with the vector space Lᵖ: ... Again, few authors are careful about maintaining the distinction between 𝓛ᵖ and Lᵖ.

Problem [19] shows that the norm defines a complete pseudometric on 𝓛ᵖ and a complete metric on Lᵖ. For our purposes, the case where p equals 2 will be the most important. The inner product has the properties: ... A vector space equipped with an inner product whose corresponding norm defines a complete metric is called a Hilbert space, a generalization of ordinary Euclidean space.

Arguments involving Hilbert spaces look similar to their analogs for Euclidean space, with an occasional precaution against possible difficulties with infinite dimensionality. Many results in Probability and Statistics rely on Hilbert space methods: Some of the basic theory for Hilbert space is established in Appendix B.

For the next several Chapters, the following two Hilbert space results, specialized to L² spaces, will suffice. 𝒦₀ must equal the union of all equivalence classes in ℋ₀. For us the most important subspaces of L²(𝒳, 𝒜, μ) will be defined by the sub-sigma-fields 𝒜₀ of 𝒜.

... Hilbert space in its own right, and therefore it is a closed subspace of L²(𝒳, 𝒜, μ). ... 𝒜₀-measurable; the subspace ℋ₀ need not be closed. The converse is not true: ... At least when we deal with finite measures, there is an elegant circle of equivalences, involving a concept (convergence in measure) slightly weaker than almost sure convergence and a concept (uniform integrability) slightly weaker than domination.
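On a finite probability space where the sub-sigma-field 𝒜₀ is generated by a partition, the orthogonal projection in L² onto the 𝒜₀-measurable functions is just averaging over each atom. This minimal sketch assumes every atom has positive probability; the space, the random variable X and the partition are hypothetical. It checks that the residual X − PX is orthogonal to each atom's indicator, and that PX is closer to X than another 𝒜₀-measurable candidate.

```python
from fractions import Fraction

# Hypothetical finite probability space and a partition generating the sub-sigma-field A0.
prob = {1: Fraction(1, 8), 2: Fraction(1, 8), 3: Fraction(1, 4), 4: Fraction(1, 4), 5: Fraction(1, 4)}
X = {1: 3, 2: -1, 3: 2, 4: 0, 5: 6}
partition = [{1, 2}, {3}, {4, 5}]


def project(X, prob, partition):
    """Orthogonal projection of X onto the A0-measurable functions: average X over each atom."""
    PX = {}
    for atom in partition:
        mass = sum(prob[w] for w in atom)  # assumed > 0
        avg = sum(prob[w] * X[w] for w in atom) / mass
        for w in atom:
            PX[w] = avg
    return PX


def inner(f, g):
    """L2(P) inner product on the finite space."""
    return sum(prob[w] * f[w] * g[w] for w in prob)


PX = project(X, prob, partition)
residual = {w: X[w] - PX[w] for w in prob}
for atom in partition:
    indicator = {w: (1 if w in atom else 0) for w in prob}
    assert inner(residual, indicator) == 0  # X - PX is orthogonal to the subspace
print(PX)
```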

Problem [14] guides you through the proofs of the following facts. Uniform integrability requires that the convergence holds uniformly over a class of random variables. Very roughly speaking, it lets us act almost as if all the random variables were bounded by a constant M, at least as far as L¹ arguments are concerned.

It is sometimes slightly more convenient to check for uniform integrability by means of an ε-δ characterization. The diagram summarizes the interconnections between the various convergence concepts, with each arrow denoting an implication. The relationship between almost sure convergence and convergence in probability corresponds to results (a) and (b) noted above.

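A family $\{Z_t : t \in T\}$ of real random variables is said to be uniformly integrable if $\sup_{t \in T} \mathbb{P}|Z_t|\{|Z_t| > M\} \to 0$ as $M \to \infty$.

Theorem. Let $\{X_n\}$ be a sequence of random variables with $\mathbb{P}|X_n| < \infty$ for each $n$. The following two conditions are equivalent: (i) The sequence is uniformly integrable and it converges in probability to a random variable $X_\infty$, which is necessarily integrable. (ii) The sequence converges in $\mathcal{L}^1$ norm, $\mathbb{P}|X_n - X_\infty| \to 0$, to an integrable random variable $X_\infty$.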

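Proof. Suppose (i) holds. …

Image measures and distributions

Suppose $\mu$ is a measure on a sigma-field $\mathcal{A}$ of subsets of $X$ and $T$ is a map from $X$ into a set $\mathcal{Y}$, equipped with a sigma-field $\mathcal{B}$. If $T$ is $\mathcal{A}\backslash\mathcal{B}$-measurable we can carry $\mu$ over to $\mathcal{Y}$ by defining $\nu B := \mu(T^{-1}B)$ for each $B \in \mathcal{B}$. Equivalently, we can define a functional on $\mathcal{M}^+(\mathcal{Y}, \mathcal{B})$ by $\nu g := \mu(g \circ T)$. The small circle symbol $\circ$ denotes the composition of functions: $(g \circ T)(x) := g(T(x))$. We have gained a theorem with almost no extra work, by starting with the linear functional as the definition of the image measure. For example, by splitting into positive and negative parts then subtracting, we could extend the equality to functions in $\mathcal{L}^1(\nu)$.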
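For a measure on a finite set, the change-of-variables identity $\nu g = \mu(g \circ T)$ can be checked by direct computation. The Python sketch below is a hypothetical illustration, not taken from the text: the helper names `image_measure` and `integrate` and the two-dice example are assumptions. It builds $\nu\{y\} = \mu(T^{-1}\{y\})$ point by point and compares both sides of the identity for one function $g$.

```python
from fractions import Fraction

def image_measure(mu, T):
    """Image measure nu = T(mu) of a measure mu on a finite set.

    mu: dict mapping each point x to its mass mu{x}
    T:  function from points of X to points of Y
    nu{y} is the total mu-mass of the points x with T(x) == y, i.e. mu(T^{-1}{y}).
    """
    nu = {}
    for x, mass in mu.items():
        y = T(x)
        nu[y] = nu.get(y, 0) + mass
    return nu

def integrate(measure, g):
    """Integral of g with respect to a measure on a finite set."""
    return sum(mass * g(x) for x, mass in measure.items())

# Hypothetical example: mu is uniform on the 36 outcomes of two dice,
# and T maps an outcome to the total shown.
mu = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}
T = lambda pair: pair[0] + pair[1]
nu = image_measure(mu, T)

g = lambda y: y * y
# Change of variables: nu(g) equals mu(g o T).
assert integrate(nu, g) == integrate(mu, lambda x: g(T(x)))
```

With exact `Fraction` arithmetic the comparison is an equality rather than a floating-point approximation; here $\nu\{7\} = 1/6$ and both sides equal $329/6$.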

And so on. Several familiar probabilistic objects are just image measures. The image measure $T\mathbb{P}$ on $\mathcal{B}(\mathbb{R}^2)$, where $T(\omega) := (X(\omega), Y(\omega))$, is called the joint distribution of $X$ and $Y$, and is often denoted by $\mathbb{P}_{X,Y}$. Similar terminology applies for larger collections of random variables. Image measures also figure in a construction that is discussed nonrigorously in many introductory textbooks. Let $P$ be a probability measure on $\mathcal{B}(\mathbb{R})$. Its distribution function (also known as a cumulative distribution function) is defined by $F_P(x) := P(-\infty, x]$ for $x \in \mathbb{R}$. Don't confuse distribution, as a synonym for probability measure, with distribution function, which is a function derived from the measures of a particular collection of sets.

The distribution function has the following properties: it is increasing, it is continuous from the right, and $F_P(x)$ tends to $0$ as $x \to -\infty$ and to $1$ as $x \to +\infty$. Except in introductory textbooks, and in works dealing with the order properties of the real line (such as the study of ranks and order statistics), distribution functions have a reduced role to play in modern probability theory, mostly in connection with the following method for building measures on $\mathbb{R}$ as images of Lebesgue measure.

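In probability theory the construction often goes by the name of quantile transformation. There is a converse to the assertions (a) and (b) about distribution functions. … For a distribution function $F$, define the quantile function $q(u) := \inf\{x \in \mathbb{R} : F(x) \ge u\}$ for $0 < u < 1$. In general $q$ need not be an inverse of $F$. However, if $F$ is continuous and strictly increasing, then $q$ is just the inverse function of $F$, and the plausible equalities hold. Let $m$ denote Lebesgue measure restricted to the Borel sigma-field on $(0,1)$. The image measure $P$ of $m$ under $q$, defined by $PB := m\{u : q(u) \in B\}$, has distribution function $F$. The result is often restated as: if $U$ is uniformly distributed on $(0,1)$, then $q(U)$ has distribution function $F$.

Generating classes of sets

To prove that all sets in a sigma-field $\mathcal{A}$ have some property one often resorts to a generating-class argument.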
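Returning briefly to the quantile transformation described just before the discussion of generating classes: for a discrete distribution the claim that the image of Lebesgue measure under $q$ has distribution function $F$ can be checked exactly. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and Lebesgue measure on $(0,1)$ is replaced by equal masses at $n$ midpoints, a device that is exact only when every value of $F$ is a multiple of $1/n$. A continuous, strictly increasing example (the exponential distribution) shows $q$ acting as an ordinary inverse.

```python
import bisect
import math
from fractions import Fraction

def make_quantile(support, probs):
    """Return q(u) = inf{x : F(x) >= u} for a discrete distribution.

    support: sorted list of distinct points; probs: their masses (summing to 1).
    """
    cumulative = []
    running = Fraction(0)
    for p in probs:
        running += p
        cumulative.append(running)  # cumulative[i] = F(support[i])

    def q(u):
        # bisect_left finds the first index i with F(support[i]) >= u
        return support[bisect.bisect_left(cumulative, u)]

    return q

def image_of_uniform_grid(q, n):
    """Masses of the image of m under q, with Lebesgue measure m on (0,1)
    replaced by mass 1/n at each midpoint (2j+1)/(2n). This is exact when
    every value of F is a multiple of 1/n, an approximation otherwise."""
    masses = {}
    for j in range(n):
        x = q(Fraction(2 * j + 1, 2 * n))
        masses[x] = masses.get(x, 0) + Fraction(1, n)
    return masses

# Hypothetical discrete example: masses 1/5, 1/2, 3/10 at the points 0, 1, 2.
support = [0, 1, 2]
probs = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
q = make_quantile(support, probs)
assert image_of_uniform_grid(q, 10) == dict(zip(support, probs))

# Continuous, strictly increasing case (exponential): q is the ordinary inverse of F.
F_exp = lambda x: 1 - math.exp(-x)
q_exp = lambda u: -math.log(1 - u)
assert all(abs(F_exp(q_exp(u)) - u) < 1e-12 for u in (0.1, 0.5, 0.9))
```

The infimum in the definition of $q$ is what makes the discrete case work: $q$ is constant on each interval $(F(x_{i-1}), F(x_i)]$, so the image measure gives $x_i$ exactly the length of that interval.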

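The simplest form of such an argument has three steps: (i) show that all members of a subclass $\mathcal{E}$ of $\mathcal{A}$ have the property; (ii) show that $\mathcal{A}$ is generated by $\mathcal{E}$; (iii) show that the class $\mathcal{A}_0$ of sets in $\mathcal{A}$ having the property forms a sigma-field. Then one deduces that $\mathcal{A}_0 = \sigma(\mathcal{A}_0) \supseteq \sigma(\mathcal{E}) = \mathcal{A}$. That is, the property holds for all sets in $\mathcal{A}$.

For some properties, direct verification of all the sigma-field requirements for $\mathcal{A}_0$ proves too difficult. In such situations an indirect argument sometimes succeeds if $\mathcal{E}$ has some extra structure. For example, if it is possible to establish that $\mathcal{A}_0$ is a $\lambda$-system of sets, then one needs only check one extra requirement for $\mathcal{E}$, namely stability under finite intersections, in order to produce a successful generating-class argument.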

The change in definition would have little effect on the role played by $\lambda$-systems. Many authors (including me, until recently) use the name Dynkin class instead of $\lambda$-system, but the name Sierpiński class would be more appropriate. See the Notes at the end of this Chapter. Notice that a $\lambda$-system is also a sigma-field if and only if it is stable under finite intersections.

This stability property can be inherited from a subclass $\mathcal{E}$, as in the next Theorem, which is sometimes referred to as the $\pi$-$\lambda$ theorem. The $\pi$ stands for product, an indirect reference to the stability of the subclass $\mathcal{E}$ under finite intersections (products). I think that the letter $\lambda$ …

Theorem. If $\mathcal{E}$ is stable under finite intersections, and if $\mathcal{D}$ is a $\lambda$-system with $\mathcal{D} \supseteq \mathcal{E}$, then $\mathcal{D} \supseteq \sigma(\mathcal{E})$.

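Proof. It would be enough to show that $\mathcal{D}$ is a sigma-field, by establishing that it is stable under finite intersections, but that is a little more than I know how to do. Instead we need to work with a (possibly smaller) $\lambda$-system $\mathcal{D}_0$, with $\mathcal{D} \supseteq \mathcal{D}_0 \supseteq \mathcal{E}$, for which generating class arguments can extend the assumption of stability under finite intersections from $\mathcal{E}$ to $\mathcal{D}_0$.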

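The choice of $\mathcal{D}_0$ is easy. Let $\{\mathcal{D}_\alpha\}$ be the collection of all $\lambda$-systems with $\mathcal{D}_\alpha \supseteq \mathcal{E}$ (one of them is $\mathcal{D}$), and let $\mathcal{D}_0$ equal the intersection of all these $\mathcal{D}_\alpha$. That is, let $\mathcal{D}_0$ consist of all sets $D$ for which $D \in \mathcal{D}_\alpha$ for each $\alpha$. I leave it to you to check the easy details that prove $\mathcal{D}_0$ to be a $\lambda$-system.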

In other words, $\mathcal{D}_0$ is the smallest $\lambda$-system containing $\mathcal{E}$. Define $\mathcal{D}_1 := …$ Actually, the assertion that $\mathcal{D}_1$ is … Put another way (this step is the only subtlety in the proof), we can assert that the class $\mathcal{D}_2 := …$ Just write $D_1$ instead of $D$, and $E_1$ instead of $B$, in the definition to see that it is only a matter of switching the order of the sets. The proof of the last Theorem is typical of many generating class arguments, in that it is trivial once one knows what one has to check. The Theorem, or its analog for classes of functions (see the next Section), will be my main method for establishing sigma-field properties.

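You will be getting plenty of practice at filling in the details behind frequent assertions of "a generating class argument shows that …".

Example. Suppose $P$ and $Q$ are probability measures on $\mathcal{B}(\mathbb{R})$ with the same distribution function. Write $\mathcal{E}$ for the class of all intervals $(-\infty, t]$, with $t \in \mathbb{R}$. Clearly $\mathcal{E}$ is stable under finite intersections. It is easy to check that the class $\mathcal{D} := \{B \in \mathcal{B}(\mathbb{R}) : PB = QB\}$ is a $\lambda$-system containing $\mathcal{E}$. The $\pi$-$\lambda$ theorem then gives $\mathcal{D} \supseteq \sigma(\mathcal{E}) = \mathcal{B}(\mathbb{R})$, and the equality of the two Borel measures is established.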

When you employ a $\pi$-$\lambda$ argument, be sure to verify every one of its hypotheses. The next Example shows what can happen if you forget about the stability under finite intersections.
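The failure that the next Example warns about can be seen by brute force on a four-point space. The generating class and the two measures below are assumptions chosen for illustration, since the text's own choices are not recoverable here: $\mathcal{E}$ consists of the "north" set {nw, ne} and the "west" set {nw, sw}, one measure splits its mass between nw and se, the other between ne and sw. The class of sets on which the two measures agree turns out to be a $\lambda$-system containing $\mathcal{E}$, yet it is not all of $\sigma(\mathcal{E})$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

X = frozenset({"nw", "ne", "sw", "se"})
north = frozenset({"nw", "ne"})
west = frozenset({"nw", "sw"})
E = [north, west]  # assumed generating class: not stable under intersections

# Assumed measures: mu splits its mass between nw and se, nu between ne and sw.
mu = {"nw": Fraction(1, 2), "se": Fraction(1, 2)}
nu = {"ne": Fraction(1, 2), "sw": Fraction(1, 2)}

def measure(m, A):
    return sum((m.get(x, 0) for x in A), Fraction(0))

def all_subsets(S):
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def is_lambda_system(D, X):
    # On a finite X an increasing sequence of sets is eventually constant, so
    # only "X in D" and closure under proper differences need checking.
    return X in D and all(A - B in D for A in D for B in D if B <= A)

def generated_sigma_field(E, X):
    # Close up under complements and pairwise unions until nothing new appears.
    S = {frozenset(), X} | set(E)
    while True:
        new = {X - A for A in S} | {A | B for A in S for B in S}
        if new <= S:
            return S
        S |= new

# D = class of sets on which the two measures agree.
D = {A for A in all_subsets(X) if measure(mu, A) == measure(nu, A)}

assert all(A in D for A in E)                   # mu and nu agree on E
assert is_lambda_system(D, X)                   # D is a lambda-system containing E
assert len(generated_sigma_field(E, X)) == 16   # E generates all subsets of X
assert (north & west) not in D                  # yet mu{nw} = 1/2 while nu{nw} = 0
```

Only six of the sixteen subsets belong to $\mathcal{D}$, so the conclusion $\mathcal{D} \supseteq \sigma(\mathcal{E})$ fails exactly because $\mathcal{E}$ lacks stability under intersections.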

Example. Consider a set $X$ consisting of four points, labelled nw, ne, sw, and se. … Notice that $\mathcal{E}$ generates the sigma-field of all subsets of $X$, but it is not stable under finite intersections.

Sometimes it is simpler to invoke an analog of the $\pi$-$\lambda$ theorem for classes of functions, based on what I will call a $\lambda$-cone. The name $\lambda$-cone is not standard.

I found it hard to come up with a name that was both suggestive of the defining properties and analogous to the name for the corresponding classes of sets. For a while I used the term Dynkin-cones but abandoned it for historical reasons. See the Notes. I also toyed with the name cdl-cone, as a reminder that the cone contains the positive constant functions and that it is stable under proper differences and monotone increasing limits of uniformly bounded sequences.

The sigma-field properties of $\lambda$-cones are slightly harder to establish than their $\lambda$-system analogs, but the reward of more streamlined proofs will make the extra, one-time effort worthwhile.

First we need an analog of the fact that a $\lambda$-system that is stable under finite intersections is also a sigma-field. … The function $f_n$ … That is, $\mathcal{D}$ is a $\lambda$-system, stable under finite intersections, and containing … It is a sigma-field containing … The functions $k_n$ increase monotonely to $k$, which consequently also belongs to …

Proof. Let $\mathcal{H}_1$ be the smallest $\lambda$-cone …

Similarly, the class … That is, show that $\mathcal{C}_0$ is dense in $\mathcal{L}^1(\mu)$ under the $\mathcal{L}^1$ norm. Trivially it contains $\mathcal{C}_0^+$, the class of nonnegative members of $\mathcal{C}_0$. The sigma-field $\sigma(\mathcal{C}_0^+)$ coincides with the Borel sigma-field. See Problem [26] for the extension of the approximation result to infinite measures. The next Problem establishes a similar result under weaker assumptions. Suppose $T$ is a function from a set $X$ into a set $\mathcal{Y}$, and suppose that $\mathcal{Y}$ is equipped with a sigma-field $\mathcal{B}$.

The collection $\{T^{-1}B : B \in \mathcal{B}\}$ is a sigma-field on $X$. It is called the sigma-field generated by the map $T$, and it is often denoted by $\sigma(T)$. … Show that $g$ has the desired property. Suppose a class of sets $\mathcal{E}$ cannot separate a particular pair of points $x, y$: that is, for every $E \in \mathcal{E}$, either both points belong to $E$ or neither does. Show that $\sigma(\mathcal{E})$ also cannot separate the pair. A collection of sets $\mathcal{F}_0$ that is stable under finite unions, finite intersections, and complements is called a field. Suppose $\nu$ … A collection of sets is called a monotone class if it is stable under unions of increasing sequences and intersections of decreasing sequences.

Deduce that every Borel set is inner regular. Show that the $F$ in the definition of inner regularity can then be assumed compact. Show that $\mu$ is tight: … For each positive integer $n$, show that the space $X$ is a countable …

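Deduce via Dominated Convergence that $X_n$ converges in probability to $X$. First dispose of the trivial case where one of the factors on the right-hand side is $0$ or $\infty$. Then, without loss of generality (why?) … This result is called the Hölder inequality. For each random variable $X$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ define $\|X\|_\infty := \inf\{c \in [0, \infty] : |X| \le c \text{ almost surely}\}$. Let $L^\infty$ denote the set of equivalence classes of random variables with $\|X\|_\infty < \infty$. Show that $\|\cdot\|_\infty$ is a norm on $L^\infty$, which is a vector space, complete under the metric defined by $\|X - Y\|_\infty$. … It is denoted by $\operatorname{ess\,sup}_{t \in T} X_t$. Part (b) shows that it is unique up to an almost sure equivalence.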
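For a measure on a finite set, both the Hölder inequality and the norm $\|\cdot\|_\infty$ reduce to finite computations. The sketch below uses hypothetical helper names (`lp_norm`, `integral`) and an illustrative measure with one point of zero mass; it shows that the essential supremum ignores that point, and it checks Hölder for a few conjugate pairs. It is a numerical sanity check under these assumptions, not a proof.

```python
import math

def lp_norm(mu, f, p):
    """L^p(mu) norm of f for a measure mu on a finite set (dict point -> mass).

    For p = math.inf this is the essential supremum: points of zero mass are ignored.
    """
    if p == math.inf:
        return max((abs(f(x)) for x, m in mu.items() if m > 0), default=0.0)
    return sum(m * abs(f(x)) ** p for x, m in mu.items()) ** (1.0 / p)

def integral(mu, h):
    """Integral of h with respect to a measure on a finite set."""
    return sum(m * h(x) for x, m in mu.items())

# Hypothetical measure: the point 3 carries zero mass.
mu = {0: 0.5, 1: 0.25, 2: 0.25, 3: 0.0}
f = lambda x: x + 1.0
g = lambda x: 100.0 if x == 3 else float(x)   # huge only on a mu-null set

assert lp_norm(mu, g, math.inf) == 2.0         # the value 100 at the null point is ignored

# Hoelder: mu|fg| <= ||f||_p ||g||_q whenever 1/p + 1/q = 1 (q = inf when p = 1).
for p in (1.0, 1.5, 2.0, 3.0):
    q = math.inf if p == 1.0 else p / (p - 1.0)
    lhs = integral(mu, lambda x: abs(f(x) * g(x)))
    assert lhs <= lp_norm(mu, f, p) * lp_norm(mu, g, q) + 1e-12
```

The small tolerance only absorbs floating-point rounding; the inequality itself is exact.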

For each … ($\mathcal{A}$, $\mu$) and each real $t$, prove the following assertions. … Apply Dominated Convergence to … Show that the function $G$ … Let … denote its inverse function.

You may assume these elementary facts: … Write $\mu_B$ for the restriction of $\mu$ to $B$. Consider approximations … for $i$ large enough.

Notes

I recommend Royden as a good source for measure theory. The books of Ash and Dudley are also excellent references, for both measure theory and probability. Dudley's book contains particularly interesting historical notes.

See Hawkins (Chapter 4) to appreciate the subtlety of the idea of a negligible set. The result from Problem [10] is often attributed to Pratt, but, as he noted in his Acknowledgment of Priority, it is actually much older.

References

Ash, R. …
Dellacherie, C. …
Dudley, R. …
Dynkin, E. …
Hoffmann-Jørgensen, J. … I, Chapman and Hall, New York.
Oxtoby, J. …
Pratt, J. … Acknowledgement of priority, same journal, vol 37, page …
Protter, P. …
Royden, H. …
Sierpiński, W. …
… With Applications to Statistics, Springer-Verlag.

Densities and derivatives

SECTION 1 explains why the traditional split of introductory probability courses into two segments (the study of discrete distributions, and the study of "continuous" distributions) is unnecessary in a measure theoretic treatment.

Absolute continuity of one measure with respect to another measure is defined. A simple case of the Radon-Nikodym theorem is proved.


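SECTION 4 explains the connection between the classical concept of absolute continuity and its measure theoretic generalization.

Densities and absolute continuity

Nonnegative measurable functions create new measures from old: if $\Delta \in \mathcal{M}^+(X, \mathcal{A})$, then $\nu f := \mu(f\Delta)$ defines a measure $\nu$ on $\mathcal{A}$, and $\Delta$ is called a density of $\nu$ with respect to $\mu$, sometimes written $d\nu/d\mu$. The word derivative suggests a limit of a ratio of $\nu$ and $\mu$ measures of "small" sets.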

Section 4 explains the one-dimensional case. Chapter 6 will give another interpretation via martingales. The qualifier joint sometimes creeps into the description of densities with respect to Lebesgue measure on $\mathcal{B}(\mathbb{R}^k)$. From a measure theoretic point of view the qualifier is superfluous, but it is a comforting probabilistic tradition perhaps worthy of preservation. Existence of a density is a property that depends on two measures.

Even measures that don't fit the traditional idea of a continuous distribution can be specified by densities, as in the case of measures dominated by a counting measure. Some introductory texts use the technically correct term density in that case, much to the confusion of students who have come to think that densities have something to do with continuous functions.
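A density with respect to counting measure is just a probability mass function, and the same probability measure can have different densities with respect to different dominating measures. The sketch below assumes, for illustration only, the Binomial(4, 1/2) distribution on {0, …, 4} and the uniform probability on those points as a second dominating measure; the helper name `integrate` is hypothetical. Integrating $f$ against the density with respect to either dominating measure gives the same value.

```python
from fractions import Fraction
from math import comb

points = range(5)
counting = {x: 1 for x in points}               # counting measure on {0, ..., 4}
uniform = {x: Fraction(1, 5) for x in points}   # a second dominating measure

# Assumed example: Binomial(4, 1/2). Its density w.r.t. counting measure is the pmf.
dens_counting = {x: Fraction(comb(4, x), 16) for x in points}
# Its density w.r.t. the uniform measure is the pmf divided by the uniform mass 1/5.
dens_uniform = {x: dens_counting[x] / uniform[x] for x in points}

def integrate(measure, h):
    """Integral of h with respect to a measure on a finite set."""
    return sum(mass * h(x) for x, mass in measure.items())

f = lambda x: x * x
via_counting = integrate(counting, lambda x: f(x) * dens_counting[x])
via_uniform = integrate(uniform, lambda x: f(x) * dens_uniform[x])
assert via_counting == via_uniform == 5   # the second moment of Binomial(4, 1/2)
```

Nothing here is "continuous", yet both functions are densities in exactly the measure theoretic sense.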

Densities are useful because they allow integrals with respect to one measure to be reexpressed as integrals with respect to a different measure.

Let $m$ denote Lebesgue measure on $[0,1)$. The map $T(x) := …$ The image measure $\nu := …$ The distribution function $F(x) := …$

The distinction between a continuous function and a function expressible as an integral was recognized early in the history of measure theory, with the name absolute continuity being used to denote the stronger property.

The original definition of absolute continuity (Section 4) is now a special case of a streamlined characterization that applies not just to measures on the real line: a measure $\nu$ on $\mathcal{A}$ is said to be absolutely continuous with respect to a measure $\mu$ on $\mathcal{A}$, or dominated by $\mu$, written $\nu \ll \mu$, if $\nu A = 0$ for every $A \in \mathcal{A}$ with $\mu A = 0$. Probability measures dominated by Lebesgue measure correspond to the continuous distributions of introductory courses, although the correct term is distributions absolutely continuous with respect to Lebesgue measure. By extension, random variables whose distributions are dominated by Lebesgue measure are sometimes called "continuous random variables," which I regard as a harmful abuse of terminology.

There need be nothing continuous about a "continuous random variable" as a function from a set $\Omega$ into $\mathbb{R}$. Indeed, there need be no topology on $\Omega$; the very concept of continuity for the function might be void.

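Many a student of probability has been misled into assuming topological properties for "continuous random variables." If the measure $\nu$ is finite, there is an equivalent formulation of absolute continuity that looks more like a continuity property: for each $\epsilon > 0$ there exists a $\delta > 0$ such that $\nu A < \epsilon$ for every $A \in \mathcal{A}$ with $\mu A < \delta$.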

Define $A := …$ In other words, the $\epsilon$-$\delta$ property is equivalent to absolute continuity, at least when $\nu$ is a finite measure. The equivalence can fail if $\nu$ is not a finite measure.

However, even if $\nu$ … The functions $f_n(x) := …$ Existence of a density and absolute continuity are equivalent properties if we exclude some pathological examples, such as those presented in Problems [1] and [2].

Radon-Nikodym Theorem.

Let $\mu$ be a sigma-finite measure on a space $(X, \mathcal{A})$. Then every sigma-finite measure that is absolutely continuous with respect to $\mu$ has a density, which is unique up to $\mu$-equivalence. The Theorem is a special case of the slightly more general result known as the Lebesgue decomposition, which is proved in Section 2 using projections in Hilbert spaces.

Most of the ideas needed to prove the general version of the Theorem appear in simpler form in the following proof of a special case. The linear subspace $\mathcal{H}_0 := …$ From Section 2 …

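With $C := …$ A measure $\nu$ is said to be singular with respect to $\mu$ if there is a set $A \in \mathcal{A}$ with $\mu A = 0 = \nu A^c$. That is, the two measures concentrate on disjoint parts of $X$, a situation denoted by writing $\nu \perp \mu$. Perhaps it would be better to say that the two measures are mutually singular, to emphasize the symmetry of the relationship.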

For example, discrete measures (those that concentrate on countable subsets) are singular with respect to Lebesgue measure on the real line. Avoidance of all probability measures except those dominated by a counting measure or a Lebesgue measure (as in introductory probability courses) imposes awkward constraints on what one can achieve with probability theory. The restriction becomes particularly tedious for functions of more than a single random variable, for then one is limited to smooth transformations for which image measures and densities with respect to Lebesgue measure can be calculated by means of Jacobians.

The unfortunate effects of an artificially restricted theory permeate much of the statistical literature, where sometimes inappropriate and unnecessary requirements are imposed merely to accommodate a lack of an appropriate measure theoretic foundation. Why should absolute continuity with respect to Lebesgue measure or counting measure play such a central role in introductory probability theory?

I believe the answer is just a matter of definition, or rather, a lack of definition. For a probability measure $P$ concentrated on a countable set of points, expectations $\mathbb{P}g(X)$ become countable sums, which can be handled by elementary methods.


For general probability measures the definition of $\mathbb{P}g(X)$ is typically not a matter of elementary calculation. However, if the distribution of $X$ has a density $\Delta$ with respect to Lebesgue measure, then $\mathbb{P}g(X) = \int g(x)\Delta(x)\,dx$. The last integral has the familiar look of a Riemann integral, which is the subject of elementary Calculus courses. Seldom would $\Delta$ or $g$ be complicated enough to require the interpretation as a Lebesgue integral; one stays away from such functions when teaching an introductory course.

From the measure theoretic viewpoint, densities are not just a crutch for support of an inadequate integration theory; they become a useful tool for exploiting absolute continuity for pairs of measures.


In much statistical theory, the actual choice of dominating measure matters little. The following result, which is often called Scheffé's lemma, is typical. …

The Lebesgue decomposition

Absolute continuity and singularity represent the two extremes for the relationship between two measures on the same space.

Lebesgue decomposition. Let $\mu$ be a sigma-finite measure on a space $(X, \mathcal{A})$. … The restriction $\nu_{\text{abs}}$ of $\nu$ to $N^c$ is called the part of $\nu$ that is absolutely continuous with respect to $\mu$. Together the two decompositions partition the underlying space into four measurable sets: …

Proof. Consider first the question of existence. With no loss of generality we may assume that both $\nu$ and $\mu$ are finite measures.

The general decomposition would follow by piecing together the results for countably many disjoint subsets of $X$. Define $\lambda := …$ Define $\mathcal{K} := …$ The proof of uniqueness of the representation, up to various almost sure equivalences, follows a similar style of argument. Problem [9] will step you through the details. In fact every real valued, countably additive set function defined on a sigma-field can be represented in that way, but we shall not be needing the more basic characterization.
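On a finite set every measure is sigma-finite and the Lebesgue decomposition can be read off point by point: the singular part lives on the points that $\mu$ ignores, and the absolutely continuous part has the ratio of point masses as its density. The sketch below (hypothetical function name, illustrative measures) is only meant to make the objects concrete; the general existence proof still needs the Hilbert space argument.

```python
from fractions import Fraction

def lebesgue_decomposition(nu, mu):
    """Decompose nu relative to mu, for measures on a finite set (dicts point -> mass).

    Returns (nu_abs, nu_sing, density): nu_sing is nu restricted to the mu-null set
    N = {x : mu{x} = 0}, nu_abs is nu restricted to N^c, and density = d(nu_abs)/d(mu).
    """
    points = set(nu) | set(mu)
    null_set = {x for x in points if mu.get(x, 0) == 0}
    nu_sing = {x: nu.get(x, 0) for x in null_set}
    nu_abs = {x: nu.get(x, 0) for x in points - null_set}
    density = {x: Fraction(nu.get(x, 0)) / mu[x] for x in points - null_set}
    return nu_abs, nu_sing, density

# Illustrative measures: nu puts mass 1/2 on the point c, which mu ignores.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
nu = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
nu_abs, nu_sing, density = lebesgue_decomposition(nu, mu)

# nu_abs has a density with respect to mu, and nu = nu_abs + nu_sing.
assert all(nu_abs[x] == density[x] * mu[x] for x in nu_abs)
assert all(nu.get(x, 0) == nu_abs.get(x, 0) + nu_sing.get(x, 0) for x in set(nu) | set(mu))
```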

Throughout the section, $\mathcal{M}_{\text{bdd}}$ will denote the space of all bounded, real-valued, $\mathcal{A}$-measurable functions on $X$, and $\mathcal{M}^+_{\text{bdd}}$ will denote the cone of nonnegative functions in $\mathcal{M}_{\text{bdd}}$. There are a number of closely related distances between the measures, all of which involve calculations with densities.

Several easily proved facts about these distances have important application in mathematical statistics. The total variation distance between two signed measures is the norm of their difference. … Equality is achieved for a partition with two sets: … In particular, the total variation distance between two measures equals the $\mathcal{L}^1$ distance between their densities; in fact, total variation distance is often referred to as $\mathcal{L}^1$ distance.
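The claim that the norm of $P - Q$ equals the $\mathcal{L}^1$ distance between densities, with equality attained by a two-set partition, can be checked by brute force on a small space. The sketch assumes counting measure as the dominating measure, illustrative distributions, and takes the norm of a signed measure $\sigma$ to be $\sup \sum_i |\sigma A_i|$ over finite partitions $\{A_i\}$ (the standard definition, assumed here because the text's display is lost). Function names are hypothetical.

```python
from fractions import Fraction

def set_partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                      # `first` in a block of its own
        for i in range(len(part)):                  # or added to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def variation_norm(sigma):
    """sup over finite partitions {A_i} of sum_i |sigma A_i|, for a signed measure on a finite set."""
    return max(sum(abs(sum(sigma[x] for x in block)) for block in part)
               for part in set_partitions(sorted(sigma)))

P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
Q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(0), "d": Fraction(1, 2)}
diff = {x: P[x] - Q[x] for x in P}

l1_distance = sum(abs(v) for v in diff.values())   # lambda|p - q| with lambda = counting
plus = [x for x in diff if diff[x] >= 0]           # the set {p >= q}
minus = [x for x in diff if diff[x] < 0]           # the set {p < q}
two_set = abs(sum(diff[x] for x in plus)) + abs(sum(diff[x] for x in minus))

assert variation_norm(diff) == l1_distance == two_set == 1
```

Enumerating all partitions is exponential and only sensible for tiny examples; the point is that no partition beats the two-set one.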

The $\mathcal{L}^1$ norm does not depend on the particular choice of dominating measure. Without loss of generality, we may assume $\mu$ has a density $m$ with respect to $\lambda$.

Hellinger distance between probability measures

Let $P$ and $Q$ be probability measures with densities $p$ and $q$ with respect to a dominating measure $\lambda$.

The Hellinger distance between the two measures is defined as the $\mathcal{L}^2$ distance between the square roots of their densities, $H(P,Q)^2 := \lambda\big(\sqrt{p} - \sqrt{q}\big)^2$. Again the distance does not depend on the choice of dominating measure (Problem [13]).

The Cauchy-Schwarz inequality gives a useful lower bound: … The Hellinger distance defines a bounded metric on the space of all probability measures on $\mathcal{A}$; indeed $H(P,Q)^2 = \lambda p + \lambda q - 2\lambda\sqrt{pq} = 2 - 2\lambda\sqrt{pq}$ lies between $0$ and $2$. For example, discrete distributions concentrated on a countable set are always at the maximum Hellinger distance, $H = \sqrt{2}$, from nonatomic distributions (zero mass at each point). Some authors prefer to have an upper bound of 1 for the Hellinger distance; they include an extra factor of a half in the definition of $H(P,Q)^2$.
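The identity $H^2 = 2 - 2\lambda\sqrt{pq}$ and the extreme value $H^2 = 2$ for mutually singular measures are easy to check numerically. The sketch below assumes counting measure as $\lambda$ and three illustrative distributions; the helper names are hypothetical. It also checks two standard comparisons with the $\mathcal{L}^1$ distance, $H^2 \le \|P - Q\|_1 \le H\sqrt{4 - H^2}$ (the second follows from Cauchy-Schwarz), which are not necessarily the inequality whose display is lost above.

```python
import math

def hellinger_sq(p, q):
    """H(P,Q)^2 = lambda(sqrt(p) - sqrt(q))^2 with lambda = counting measure."""
    points = set(p) | set(q)
    return sum((math.sqrt(p.get(x, 0.0)) - math.sqrt(q.get(x, 0.0))) ** 2 for x in points)

def l1_distance(p, q):
    points = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in points)

P = {0: 0.5, 1: 0.5}
Q = {1: 0.5, 2: 0.5}
R = {0: 0.25, 1: 0.25, 2: 0.5}

h2 = hellinger_sq(P, Q)
affinity = sum(math.sqrt(P.get(x, 0.0) * Q.get(x, 0.0)) for x in set(P) | set(Q))
assert abs(h2 - (2 - 2 * affinity)) < 1e-12   # H^2 = 2 - 2 lambda sqrt(pq)

# Two standard comparisons with the L1 distance:
d1 = l1_distance(P, Q)
assert h2 <= d1 + 1e-12 and d1 <= math.sqrt(h2 * (4 - h2)) + 1e-12

# Mutually singular distributions sit at the maximum, H^2 = 2.
assert hellinger_sq({0: 1.0}, {1: 1.0}) == 2.0

# H itself satisfies the triangle inequality.
H = lambda a, b: math.sqrt(hellinger_sq(a, b))
assert H(P, Q) <= H(P, R) + H(R, Q) + 1e-12
```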

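Relative entropy

Let $P$ and $Q$ be two probability measures with densities $p$ and $q$ with respect to some dominating measure $\lambda$. The relative entropy (also known as the Kullback-Leibler distance) of $P$ with respect to $Q$ is defined as $D(P\|Q) := \lambda\big(p \log(p/q)\big)$. At first sight, it is not obvious that the definition cannot suffer from the $\infty - \infty$ problem. A Taylor expansion comes to the rescue: … The relative entropy is well defined and nonnegative.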

Nonnegativity would also follow via Jensen's inequality. … That is, the relative entropy is infinite unless $P$ is absolutely continuous with respect to $Q$. It can also be infinite even if $P$ and $Q$ are mutually absolutely continuous (Problem [15]). As with the $\mathcal{L}^1$ and Hellinger distances, the relative entropy does not depend on the choice of the dominating measure $\lambda$ (Problem [14]).
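For distributions on a finite set the definition $D(P\|Q) = \lambda\big(p\log(p/q)\big)$ can be evaluated directly, with the usual convention $0 \log 0 = 0$. The sketch below assumes counting measure as $\lambda$ and uses illustrative distributions; the function name is hypothetical. It shows the infinite value when $P$ is not absolutely continuous with respect to $Q$, and the lack of symmetry.

```python
import math

def relative_entropy(p, q):
    """D(P||Q) = lambda(p log(p/q)), lambda = counting measure, with 0 log 0 = 0."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue                  # points where p = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf           # P charges a Q-null point, so P is not << Q
        total += px * math.log(px / qx)
    return total

P = {0: 0.5, 1: 0.5}
Q = {0: 0.25, 1: 0.75}
point_mass = {0: 1.0}

assert relative_entropy(P, P) == 0.0
assert abs(relative_entropy(P, Q) - 0.5 * math.log(4 / 3)) < 1e-12
assert relative_entropy(P, point_mass) == math.inf                  # P is not << point_mass
assert abs(relative_entropy(point_mass, P) - math.log(2)) < 1e-12   # not symmetric
```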

This inequality is trivially true unless $P$ is absolutely continuous with respect to $Q$, in which case we can take $\lambda$ equal to $Q$. … Each of the three distances between $P_0$ and $P_\theta$ can be calculated in closed form: … The method from the previous Example can be extended to other families of densities, providing a better indication of how much improvement might be possible in the three inequalities.
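The family used in the text is not recoverable here. As an assumed example, take the normal location family $P_\theta = N(\theta, 1)$ with Lebesgue measure as $\lambda$. Standard calculations give $\|P_\theta - P_0\|_1 = 2\big(2\Phi(|\theta|/2) - 1\big)$, $H(P_\theta, P_0)^2 = 2\big(1 - e^{-\theta^2/8}\big)$ and $D(P_\theta\|P_0) = \theta^2/2$, where $\Phi$ is the standard normal distribution function. The sketch below (hypothetical helper names) compares these closed forms with midpoint-rule approximations of the defining integrals.

```python
import math

def normal_pdf(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def closed_forms(theta):
    """Assumed family P_theta = N(theta, 1): L1, squared Hellinger, relative entropy vs P_0."""
    l1 = 2 * (2 * Phi(abs(theta) / 2) - 1)
    hellinger_sq = 2 * (1 - math.exp(-theta ** 2 / 8))
    rel_entropy = theta ** 2 / 2
    return l1, hellinger_sq, rel_entropy

def numerical(theta, lo=-10.0, hi=10.0, n=20000):
    """Midpoint-rule approximations of the same three integrals over [lo, hi]."""
    h = (hi - lo) / n
    l1 = hell = kl = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        p, q = normal_pdf(x, theta), normal_pdf(x, 0.0)
        l1 += abs(p - q) * h
        hell += (math.sqrt(p) - math.sqrt(q)) ** 2 * h
        kl += p * math.log(p / q) * h
    return l1, hell, kl

for theta in (0.5, 1.0, 2.0):
    exact, approx = closed_forms(theta), numerical(theta)
    assert all(abs(a - b) < 1e-5 for a, b in zip(exact, approx))
```

Truncating the integrals to $[-10, 10]$ discards only negligible tail mass for these values of $\theta$.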

I will proceed heuristically, ignoring questions.
