== Distributions and tempered distributions ==

Proof: Let <math>\mathcal{T}</math> be a tempered distribution, and let <math>O \subseteq \mathbb{R}^d</math> be open.

1. We show that <math>\mathcal{T}(\varphi)</math> has a well-defined value for <math>\varphi \in \mathcal{D}(O)</math>.

Due to theorem 3.9, every bump function is a Schwartz function, which is why the expression <math>\mathcal{T}(\varphi)</math> makes sense for every <math>\varphi \in \mathcal{D}(O)</math>.

2. We show that the restriction is linear.

Let <math>a, b \in \mathbb{R}</math> and <math>\varphi, \vartheta \in \mathcal{D}(O)</math>. Since due to theorem 3.9 <math>\varphi</math> and <math>\vartheta</math> are Schwartz functions as well, we have

:<math>\forall a, b \in \mathbb{R}, \varphi, \vartheta \in \mathcal{D}(O) : \mathcal{T}(a\varphi + b\vartheta) = a\mathcal{T}(\varphi) + b\mathcal{T}(\vartheta)</math>

due to the linearity of <math>\mathcal{T}</math> on all Schwartz functions. Thus <math>\mathcal{T}</math> is also linear on bump functions.

3. We show that the restriction of <math>\mathcal{T}</math> to <math>\mathcal{D}(O)</math> is sequentially continuous.

Let <math>\varphi_l \to \varphi</math> in the notion of convergence of bump functions. Due to theorem 3.11, <math>\varphi_l \to \varphi</math> in the notion of convergence of Schwartz functions. Since <math>\mathcal{T}</math>, as a tempered distribution, is sequentially continuous, <math>\mathcal{T}(\varphi_l) \to \mathcal{T}(\varphi)</math>. <math>\Box</math>

== The convolution ==

The convolution of two functions need not always exist, but there are sufficient conditions for its existence:

Theorem 4.5: Let <math>p, q \in [1, \infty]</math> such that <math>\tfrac{1}{p} + \tfrac{1}{q} = 1</math>, and let <math>f \in L^p(\mathbb{R}^d)</math> and <math>g \in L^q(\mathbb{R}^d)</math>. Then for all <math>y \in \mathbb{R}^d</math>, the integral

:<math>\int_{\mathbb{R}^d} f(x) g(y - x) \, dx</math>

has a well-defined real value.

Proof: Due to Hölder's inequality,

:<math>\int_{\mathbb{R}^d} |f(x) g(y-x)| \, dx \leq \left( \int_{\mathbb{R}^d} |f(x)|^p \, dx \right)^{1/p} \left( \int_{\mathbb{R}^d} |g(y-x)|^q \, dx \right)^{1/q} < \infty. \, \Box</math>

We shall now prove that the convolution is commutative, i.e. <math>f * g = g * f</math>.

Proof: We apply multi-dimensional integration by substitution, using the diffeomorphism <math>x \mapsto y - x</math>, to obtain

:<math>(f * g)(y) = \int_{\mathbb{R}^d} f(x) g(y-x) \, dx = \int_{\mathbb{R}^d} f(y-x) g(x) \, dx = (g * f)(y). \, \Box</math>
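Commutativity lends itself to a quick numerical check. The following sketch is our own illustration (assuming NumPy; the grid, the sample functions <math>f \in L^1(\mathbb{R})</math> and <math>g \in L^\infty(\mathbb{R})</math>, and the Riemann-sum quadrature are choices made for this example, not part of the text); it approximates <math>(f*g)(y)</math> and <math>(g*f)(y)</math> at a few points and compares them.

<syntaxhighlight lang="python">
import numpy as np

# Sample grid; the chosen f and g decay fast enough that truncating the
# integrals over R to [-10, 10] is a good approximation.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

f = np.exp(-x**2)        # lies in L^1(R) (indeed in every L^p)
g = 1.0 / (1.0 + x**2)   # lies in L^infinity(R)

def convolve_at(u, v, y):
    """Approximate (u*v)(y) = int u(t) v(y-t) dt by a Riemann sum."""
    # v(y - t) sampled on the grid; outside the grid we treat v as 0,
    # which the decay of the chosen functions justifies.
    v_shifted = np.interp(y - x, x, v, left=0.0, right=0.0)
    return np.sum(u * v_shifted) * dx

for y in (-1.0, 0.0, 2.5):
    print(y, convolve_at(f, g, y), convolve_at(g, f, y))
# The last two columns agree up to quadrature error: f*g = g*f.
</syntaxhighlight>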
Proof: Let <math>\alpha \in \mathbb{N}_0^d</math> be arbitrary. Since for all <math>y \in \mathbb{R}^d</math>

:<math>\int_{\mathbb{R}^d} |f(x) \partial_\alpha \eta_\delta(y-x)| \, dx \leq \|\partial_\alpha \eta_\delta\|_\infty \int_{\mathbb{R}^d} |f(x)| \, dx < \infty,</math>

and further, for all <math>x \in \mathbb{R}^d</math>, <math>|f(x) \partial_\alpha \eta_\delta(y-x)| \leq \|\partial_\alpha \eta_\delta\|_\infty |f(x)|</math> is an integrable majorant independent of <math>y</math>, Leibniz' integral rule (theorem 2.2) is applicable, and by repeated application of Leibniz' integral rule we obtain

:<math>\partial_\alpha (f * \eta_\delta) = f * \partial_\alpha \eta_\delta. \, \Box</math>
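The identity <math>\partial_\alpha(f * \eta_\delta) = f * \partial_\alpha \eta_\delta</math> can also be observed numerically. The sketch below is our own one-dimensional illustration (assuming NumPy; the grid, the normalization of the mollifier and the discontinuous sample <math>f</math> are example choices): it mollifies a step function and checks that differentiating the smoothed function agrees with convolving against the differentiated mollifier.

<syntaxhighlight lang="python">
import numpy as np

x = np.linspace(-3.0, 3.0, 3001)
dx = x[1] - x[0]
delta = 0.5

def eta(t):
    """A bump mollifier supported in (-delta, delta), normalized to integral ~1."""
    out = np.zeros_like(t)
    m = np.abs(t) < delta
    out[m] = np.exp(-1.0 / (1.0 - (t[m] / delta)**2))
    return out / (np.sum(out) * dx)

f = (np.abs(x) < 1.0).astype(float)   # integrable, but not differentiable

ker = eta(x)
dker = np.gradient(ker, dx)           # derivative of the smooth mollifier

conv = lambda u, v: np.convolve(u, v, mode='same') * dx

lhs = np.gradient(conv(f, ker), dx)   # d/dx (f * eta_delta)
rhs = conv(f, dker)                   # f * (d/dx eta_delta)
print(np.max(np.abs(lhs - rhs)))      # small: the derivative passes to the mollifier
</syntaxhighlight>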
== Regular distributions ==

In this section, we briefly study a class of distributions which we call regular distributions. In particular, we will see that for certain kinds of functions there exist corresponding distributions.

Two questions related to this definition could be asked: Given a function <math>f : \mathbb{R}^d \to \mathbb{R}</math>, is <math>\mathcal{T}_f : \mathcal{D}(O) \to \mathbb{R}</math>, for <math>O \subseteq \mathbb{R}^d</math> open, given by

:<math>\mathcal{T}_f(\varphi) := \int_O f(x) \varphi(x) \, dx,</math>

well-defined and a distribution? And is <math>\mathcal{T}_f : \mathcal{S}(\mathbb{R}^d) \to \mathbb{R}</math>, given by

:<math>\mathcal{T}_f(\phi) := \int_{\mathbb{R}^d} f(x) \phi(x) \, dx,</math>

well-defined and a tempered distribution?

In general, the answer to these two questions is no, but both questions can be answered with yes if the respective function <math>f</math> has the right properties, as the following two theorems show. Before we state the first theorem, we have to define what local integrability means, because in the case of bump functions, local integrability will be exactly the property which <math>f</math> needs in order to define a corresponding regular distribution.

Now we are ready to give some sufficient conditions on <math>f</math> to define a corresponding regular distribution or regular tempered distribution by way of

:<math>\mathcal{T}_f : \mathcal{D}(O) \to \mathbb{R}, \quad \mathcal{T}_f(\varphi) := \int_O f(x) \varphi(x) \, dx</math>

or

:<math>\mathcal{T}_f : \mathcal{S}(\mathbb{R}^d) \to \mathbb{R}, \quad \mathcal{T}_f(\phi) := \int_{\mathbb{R}^d} f(x) \phi(x) \, dx.</math>

Proof:

1. We show that if <math>f \in L^1_{\text{loc}}(O)</math>, then <math>\mathcal{T}_f : \mathcal{D}(O) \to \mathbb{R}</math> is a distribution.

Well-definedness follows from the triangle inequality for integrals and the monotonicity of the integral:

:<math>\begin{aligned} \left| \int_O \varphi(x) f(x) \, dx \right| &\leq \int_O |\varphi(x) f(x)| \, dx = \int_{\operatorname{supp} \varphi} |\varphi(x) f(x)| \, dx \\ &\leq \int_{\operatorname{supp} \varphi} \|\varphi\|_\infty |f(x)| \, dx = \|\varphi\|_\infty \int_{\operatorname{supp} \varphi} |f(x)| \, dx < \infty \end{aligned}</math>

In order to have an absolute value strictly less than infinity, the first integral must have a well-defined value in the first place. Therefore, <math>\mathcal{T}_f</math> really maps to <math>\mathbb{R}</math>, and well-definedness is proven.

Continuity follows similarly, due to

:<math>|\mathcal{T}_f \varphi_l - \mathcal{T}_f \varphi| = \left| \int_K (\varphi_l - \varphi)(x) f(x) \, dx \right| \leq \|\varphi_l - \varphi\|_\infty \underbrace{\int_K |f(x)| \, dx}_{\text{independent of } l} \to 0, \quad l \to \infty,</math>

where <math>K</math> is the compact set in which all the supports of the <math>\varphi_l, l \in \mathbb{N}</math>, and of <math>\varphi</math> are contained (remember: the existence of a compact set containing the supports of all the <math>\varphi_l, l \in \mathbb{N}</math>, is part of the definition of convergence in <math>\mathcal{D}(O)</math>; see the last chapter. As in the proof of theorem 3.11, we also conclude that the support of <math>\varphi</math> is contained in <math>K</math>).

Linearity follows from the linearity of the integral.
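To see a regular distribution in action, the following sketch (our own, assuming NumPy; the unbounded but locally integrable <math>f</math> and the bump function are example choices) approximates <math>\mathcal{T}_f(\varphi) = \int f(x) \varphi(x) \, dx</math> and obtains a finite value, as part 1 of the proof guarantees.

<syntaxhighlight lang="python">
import numpy as np

x = np.linspace(-2.0, 2.0, 4001)
dx = x[1] - x[0]

# f(x) = |x|^(-1/2) is locally integrable on R even though it is unbounded.
# Clipping |x| at the grid spacing keeps the single grid point at the
# singularity from distorting the Riemann sum.
f = 1.0 / np.sqrt(np.maximum(np.abs(x), dx))

def bump(t):
    """A bump function supported in (-1, 1)."""
    out = np.zeros_like(t)
    m = np.abs(t) < 1.0
    out[m] = np.exp(-1.0 / (1.0 - t[m]**2))
    return out

phi = bump(x)

# T_f(phi) = int f(x) phi(x) dx, approximated by a Riemann sum.
print(np.sum(f * phi) * dx)  # a finite real number, as the theorem asserts
</syntaxhighlight>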
2. We show that if <math>\mathcal{T}_f</math> is a distribution, then <math>f \in L^1_{\text{loc}}(O)</math>. In fact, we even show that if <math>\mathcal{T}_f(\varphi)</math> has a well-defined real value for every <math>\varphi \in \mathcal{D}(O)</math>, then <math>f \in L^1_{\text{loc}}(O)</math>; together with part 1 of this proof, which showed that <math>f \in L^1_{\text{loc}}(O)</math> implies that <math>\mathcal{T}_f</math> is a distribution in <math>\mathcal{D}^*(O)</math>, this gives that if <math>\mathcal{T}_f(\varphi)</math> is a well-defined real number for every <math>\varphi \in \mathcal{D}(O)</math>, then <math>\mathcal{T}_f</math> is a distribution in <math>\mathcal{D}^*(O)</math>.

Let <math>K \subset O</math> be an arbitrary compact set. We define

:<math>\mu : K \to \mathbb{R}, \quad \mu(\xi) := \inf_{x \in \mathbb{R}^d \setminus O} \|\xi - x\|.</math>

<math>\mu</math> is continuous, even Lipschitz continuous with Lipschitz constant <math>1</math>: Let <math>\xi, \iota \in \mathbb{R}^d</math>. Due to the triangle inequality, both

:<math>\forall x, y \in \mathbb{R}^d : \|\xi - x\| \leq \|\xi - \iota\| + \|\iota - y\| + \|y - x\| \quad (*)</math>

and

:<math>\forall x, y \in \mathbb{R}^d : \|\iota - y\| \leq \|\iota - \xi\| + \|\xi - x\| + \|x - y\| \quad (**)</math>

hold, which can be seen by applying the triangle inequality twice. We choose sequences <math>(x_l)_{l \in \mathbb{N}}</math> and <math>(y_m)_{m \in \mathbb{N}}</math> in <math>\mathbb{R}^d \setminus O</math> such that <math>\lim_{l \to \infty} \|\xi - x_l\| = \mu(\xi)</math> and <math>\lim_{m \to \infty} \|\iota - y_m\| = \mu(\iota)</math>, and consider two cases.

First, we consider what happens if <math>\mu(\xi) \geq \mu(\iota)</math>. Then we have

:<math>\begin{aligned} |\mu(\xi) - \mu(\iota)| &= \mu(\xi) - \mu(\iota) & \\ &= \inf_{x \in \mathbb{R}^d \setminus O} \|\xi - x\| - \inf_{y \in \mathbb{R}^d \setminus O} \|\iota - y\| & \\ &= \inf_{x \in \mathbb{R}^d \setminus O} \|\xi - x\| - \lim_{m \to \infty} \|\iota - y_m\| & \\ &= \lim_{m \to \infty} \inf_{x \in \mathbb{R}^d \setminus O} \left( \|\xi - x\| - \|\iota - y_m\| \right) & \\ &\leq \lim_{m \to \infty} \inf_{x \in \mathbb{R}^d \setminus O} \left( \|\xi - \iota\| + \|x - y_m\| \right) & (*) \text{ with } y = y_m \\ &= \|\xi - \iota\|. & \end{aligned}</math>

Second, we consider what happens if <math>\mu(\xi) \leq \mu(\iota)</math>:

:<math>\begin{aligned} |\mu(\xi) - \mu(\iota)| &= \mu(\iota) - \mu(\xi) & \\ &= \inf_{y \in \mathbb{R}^d \setminus O} \|\iota - y\| - \inf_{x \in \mathbb{R}^d \setminus O} \|\xi - x\| & \\ &= \inf_{y \in \mathbb{R}^d \setminus O} \|\iota - y\| - \lim_{l \to \infty} \|\xi - x_l\| & \\ &= \lim_{l \to \infty} \inf_{y \in \mathbb{R}^d \setminus O} \left( \|\iota - y\| - \|\xi - x_l\| \right) & \\ &\leq \lim_{l \to \infty} \inf_{y \in \mathbb{R}^d \setminus O} \left( \|\xi - \iota\| + \|y - x_l\| \right) & (**) \text{ with } x = x_l \\ &= \|\xi - \iota\|. & \end{aligned}</math>

Since always either <math>\mu(\xi) \geq \mu(\iota)</math> or <math>\mu(\xi) \leq \mu(\iota)</math>, we have proven Lipschitz continuity and thus continuity. By the extreme value theorem, <math>\mu</math> therefore attains a minimum at some <math>\kappa \in K</math>.

Since <math>\mu(\kappa) = 0</math> would mean that <math>\|\kappa - x_l\| \to 0, l \to \infty,</math> for a sequence <math>(x_l)_{l \in \mathbb{N}}</math> in <math>\mathbb{R}^d \setminus O</math>, which is a contradiction as <math>\mathbb{R}^d \setminus O</math> is closed and <math>\kappa \in K \subset O</math>, we have <math>\mu(\kappa) > 0</math>. Hence, if we define <math>\delta := \mu(\kappa)</math>, then <math>\delta > 0</math>.

Further, the function

:<math>\vartheta : \mathbb{R}^d \to \mathbb{R}, \quad \vartheta(x) := (\chi_{K + B_{\delta/4}(0)} * \eta_{\delta/4})(x) = \int_{\mathbb{R}^d} \eta_{\delta/4}(y) \chi_{K + B_{\delta/4}(0)}(x-y) \, dy = \int_{B_{\delta/4}(0)} \eta_{\delta/4}(y) \chi_{K + B_{\delta/4}(0)}(x-y) \, dy</math>

has support contained in <math>O</math>, is equal to <math>1</math> on <math>K</math>, and is contained in <math>\mathcal{C}^\infty(\mathbb{R}^d)</math> due to lemma 4.7. Hence, it is also contained in <math>\mathcal{D}(O)</math>. Since therefore, by the monotonicity of the integral,

:<math>\int_K |f(x)| \, dx = \int_O |f(x)| \chi_K(x) \, dx \leq \int_{\mathbb{R}^d} |f(x)| \vartheta(x) \, dx < \infty,</math>

where the right-hand side is finite because <math>\mathcal{T}_f(\vartheta)</math> has a well-defined real value, <math>f</math> is indeed locally integrable. <math>\Box</math>
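The distance function <math>\mu</math> and its Lipschitz bound can be probed numerically. In the sketch below (our own, assuming NumPy), <math>O</math> is taken to be the open unit disc in <math>\mathbb{R}^2</math>, for which <math>\mu(p) = 1 - \|p\|</math> holds in closed form for <math>p</math> inside the disc, and the bound <math>|\mu(\xi) - \mu(\iota)| \leq \|\xi - \iota\|</math> is tested on random pairs.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def mu(p):
    """Distance from p to the complement of the open unit disc O."""
    return max(0.0, 1.0 - np.linalg.norm(p))

worst = 0.0
for _ in range(10000):
    # Random points in a compact square K inside O.
    xi, iota = rng.uniform(-0.7, 0.7, size=(2, 2))
    ratio = abs(mu(xi) - mu(iota)) / (np.linalg.norm(xi - iota) + 1e-15)
    worst = max(worst, ratio)
print(worst)  # stays <= 1, consistent with Lipschitz constant 1
</syntaxhighlight>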
Proof: From Hölder's inequality we obtain

:<math>\int_{\mathbb{R}^d} |\phi(x)| |f(x)| \, dx \leq \|\phi\|_{L^2} \|f\|_{L^2} < \infty.</math>

Hence, <math>\mathcal{T}_f</math> is well-defined. Due to the triangle inequality for integrals and Hölder's inequality, we have

:<math>|\mathcal{T}_f(\phi_l) - \mathcal{T}_f(\phi)| \leq \int_{\mathbb{R}^d} |(\phi_l - \phi)(x)| |f(x)| \, dx \leq \|\phi_l - \phi\|_{L^2} \|f\|_{L^2}.</math>

Furthermore,

:<math>\begin{aligned} \|\phi_l - \phi\|_{L^2}^2 &\leq \|\phi_l - \phi\|_\infty \int_{\mathbb{R}^d} |(\phi_l - \phi)(x)| \, dx \\ &= \|\phi_l - \phi\|_\infty \int_{\mathbb{R}^d} \prod_{j=1}^d (1 + x_j^2) |(\phi_l - \phi)(x)| \frac{1}{\prod_{j=1}^d (1 + x_j^2)} \, dx \\ &\leq \|\phi_l - \phi\|_\infty \left\| \prod_{j=1}^d (1 + x_j^2) (\phi_l - \phi) \right\|_\infty \underbrace{\int_{\mathbb{R}^d} \frac{1}{\prod_{j=1}^d (1 + x_j^2)} \, dx}_{= \pi^d}. \end{aligned}</math>

If <math>\phi_l \to \phi</math> in the notion of convergence of the Schwartz function space, this expression goes to zero. Therefore, continuity is verified. Linearity follows from the linearity of the integral. <math>\Box</math>
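The constant <math>\pi^d</math> under the brace comes from the one-dimensional integral <math>\int_{\mathbb{R}} \frac{1}{1 + t^2} \, dt = \pi</math>, since the <math>d</math>-dimensional integral factorizes into a product of <math>d</math> such integrals. A quick check of the one-dimensional value (our own, assuming SciPy is available):

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad

# int_R dt / (1 + t^2) = arctan evaluated from -inf to inf = pi.
val, err = quad(lambda t: 1.0 / (1.0 + t**2), -np.inf, np.inf)
print(val, np.pi)  # both ~ 3.14159...; in d dimensions the value is pi**d
</syntaxhighlight>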
== Equicontinuity ==

We now introduce the concept of equicontinuity.

So equicontinuity is in fact defined for sets of continuous functions mapping from <math>X</math> (a set in a metric space) to the real numbers <math>\mathbb{R}</math>.

Proof: In order to prove uniform convergence, by definition we must prove that for all <math>\epsilon > 0</math>, there exists an <math>N \in \mathbb{N}</math> such that

:<math>\forall l \geq N : \forall x \in Q : |f_l(x) - f(x)| < \epsilon.</math>

So let's assume the contrary, which by negating the logical statement equals

:<math>\exists \epsilon > 0 : \forall N \in \mathbb{N} : \exists l \geq N : \exists x \in Q : |f_l(x) - f(x)| \geq \epsilon.</math>

We choose a sequence <math>(x_m)_{m \in \mathbb{N}}</math> in <math>Q</math> as follows: we take <math>x_1 \in Q</math> such that <math>|f_{l_1}(x_1) - f(x_1)| \geq \epsilon</math> for a suitable <math>l_1 \in \mathbb{N}</math>, which exists by the negated statement, and if we have already chosen <math>x_k</math> and <math>l_k</math> for all <math>k \in \{1, \ldots, m\}</math>, we choose <math>x_{m+1}</math> such that <math>|f_{l_{m+1}}(x_{m+1}) - f(x_{m+1})| \geq \epsilon</math>, where <math>l_{m+1} > l_m</math> (such <math>l_{m+1}</math> and <math>x_{m+1}</math> exist by the negated statement with <math>N = l_m + 1</math>).

As <math>Q</math> is sequentially compact, there is a convergent subsequence <math>(x_{m_j})_{j \in \mathbb{N}}</math> of <math>(x_m)_{m \in \mathbb{N}}</math>. Let us call the limit of that subsequence <math>x</math>.

As <math>\mathcal{Q}</math> is equicontinuous, we can choose <math>\delta \in \mathbb{R}_{>0}</math> such that

:<math>\|x - y\| < \delta \Rightarrow \forall f \in \mathcal{Q} : |f(x) - f(y)| < \frac{\epsilon}{4}.</math>

Further, since <math>x_{m_j} \to x</math> as <math>j \to \infty</math>, we may choose <math>J \in \mathbb{N}</math> such that

:<math>\forall j \geq J : \|x_{m_j} - x\| < \delta.</math>

But then, for <math>j \geq J</math>, the reverse triangle inequality gives

:<math>|f_{l_{m_j}}(x) - f(x)| \geq \left| |f_{l_{m_j}}(x) - f(x_{m_j})| - |f(x_{m_j}) - f(x)| \right|.</math>

Since we had <math>|f(x_{m_j}) - f(x)| < \frac{\epsilon}{4}</math>, and since by the reverse triangle inequality and the choice of the <math>x_{m_j}</math> and <math>l_{m_j}</math>

:<math>|f_{l_{m_j}}(x) - f(x_{m_j})| \geq \left| |f_{l_{m_j}}(x_{m_j}) - f(x_{m_j})| - |f_{l_{m_j}}(x) - f_{l_{m_j}}(x_{m_j})| \right| \geq \epsilon - \frac{\epsilon}{4},</math>

we obtain:

:<math>\begin{aligned} |f_{l_{m_j}}(x) - f(x)| &\geq \left| |f_{l_{m_j}}(x) - f(x_{m_j})| - |f(x_{m_j}) - f(x)| \right| \\ &= |f_{l_{m_j}}(x) - f(x_{m_j})| - |f(x_{m_j}) - f(x)| \\ &\geq \epsilon - \frac{\epsilon}{4} - \frac{\epsilon}{4} \\ &\geq \frac{\epsilon}{2}. \end{aligned}</math>
Thus we have a contradiction to <math>f_l(x) \to f(x)</math>. <math>\Box</math>

Proof: We have to prove equicontinuity, so we have to prove

:<math>\forall \epsilon \in \mathbb{R}_{>0} : \forall x \in X : \exists \delta \in \mathbb{R}_{>0} : \forall y \in X : \|x - y\| < \delta \Rightarrow \forall f \in \mathcal{Q} : |f(x) - f(y)| < \epsilon.</math>

Let <math>\epsilon > 0</math> and <math>x \in X</math> be arbitrary. We choose <math>\delta := \frac{\epsilon}{b}</math>. Let <math>y \in X</math> such that <math>\|x - y\| < \delta</math>, and let <math>f \in \mathcal{Q}</math> be arbitrary. By the mean value theorem in multiple dimensions, there exists a <math>\lambda \in [0, 1]</math> such that

:<math>f(x) - f(y) = \nabla f(\lambda x + (1 - \lambda) y) \cdot (x - y).</math>

The point <math>\lambda x + (1 - \lambda) y</math> lies in <math>X</math>, because <math>X</math> is convex. From the Cauchy-Schwarz inequality then follows:

:<math>|f(x) - f(y)| = |\nabla f(\lambda x + (1 - \lambda) y) \cdot (x - y)| \leq \|\nabla f(\lambda x + (1 - \lambda) y)\| \, \|x - y\| < b \delta = b \frac{\epsilon}{b} = \epsilon. \, \Box</math>
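The proof yields a concrete recipe: for a family whose gradients are uniformly bounded by <math>b</math>, the single choice <math>\delta = \epsilon / b</math> works for every member. The sketch below (our own, assuming NumPy; the family <math>\{x \mapsto \sin(cx) : |c| \leq b\}</math> on the convex set <math>[0, 1]</math> is an example choice) searches for a counterexample to the resulting Lipschitz bound and finds none.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
b = 3.0  # uniform bound on the gradients over the whole family

# Every member f(x) = sin(c*x) with |c| <= b satisfies |f'(x)| <= b on [0, 1].
for _ in range(10000):
    c = rng.uniform(-b, b)
    xp, yp = rng.uniform(0.0, 1.0, size=2)
    assert abs(np.sin(c * xp) - np.sin(c * yp)) <= b * abs(xp - yp) + 1e-12

# Hence, for a given eps, the single delta = eps / b works for every member,
# which is exactly the equicontinuity proved above.
print("no counterexample found")
</syntaxhighlight>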
Proof: We prove the formula by induction on <math>|\alpha|</math>. For <math>|\alpha| = 0</math> there is nothing to prove, and for <math>|\alpha| = 1</math> the formula is the ordinary product rule. Let now the formula be true for all multi-indices of absolute value <math>n</math>, and let <math>|\alpha| = n + 1</math>. We choose <math>k \in \{1, \ldots, d\}</math> such that <math>\alpha_k > 0</math> (we may do this because <math>|\alpha| = n + 1 > 0</math>). We define again <math>e_k = (0, \ldots, 0, 1, 0, \ldots, 0)</math>, where the <math>1</math> is at the <math>k</math>-th place. Due to Schwarz' theorem and the ordinary product rule, we have

:<math>\partial_\alpha (fg) = \partial_{\alpha - e_k} \left( \partial_{x_k} (fg) \right) = \partial_{\alpha - e_k} \left( \partial_{x_k} f \, g + f \, \partial_{x_k} g \right).</math>

By linearity of derivatives and the induction hypothesis, we have

:<math>\begin{aligned} \partial_{\alpha - e_k} \left( \partial_{x_k} f \, g + f \, \partial_{x_k} g \right) &= \partial_{\alpha - e_k} \left( \partial_{x_k} f \, g \right) + \partial_{\alpha - e_k} \left( f \, \partial_{x_k} g \right) \\ &= \sum_{\varsigma \leq \alpha - e_k} \binom{\alpha - e_k}{\varsigma} \partial_\varsigma \partial_{x_k} f \, \partial_{\alpha - e_k - \varsigma} g + \sum_{\varsigma \leq \alpha - e_k} \binom{\alpha - e_k}{\varsigma} \partial_\varsigma f \, \partial_{\alpha - e_k - \varsigma} \partial_{x_k} g. \end{aligned}</math>

Since <math>\partial_{\alpha - e_k - \varsigma} = \partial_{\alpha - (\varsigma + e_k)}</math> and

:<math>\{\varsigma \in \mathbb{N}_0^d \mid 0 \leq \varsigma \leq \alpha - e_k\} = \{\varsigma - e_k \in \mathbb{N}_0^d \mid e_k \leq \varsigma \leq \alpha\},</math>

we are allowed to shift indices in the first of the two above sums, and furthermore we have by definition <math>\partial_\varsigma \partial_{x_k} = \partial_{\varsigma + e_k}</math>. With this, we obtain

:<math>\sum_{\varsigma \leq \alpha - e_k} \binom{\alpha - e_k}{\varsigma} \partial_\varsigma \partial_{x_k} f \, \partial_{\alpha - e_k - \varsigma} g + \sum_{\varsigma \leq \alpha - e_k} \binom{\alpha - e_k}{\varsigma} \partial_\varsigma f \, \partial_{\alpha - e_k - \varsigma} \partial_{x_k} g = \sum_{e_k \leq \varsigma \leq \alpha} \binom{\alpha - e_k}{\varsigma - e_k} \partial_\varsigma f \, \partial_{\alpha - \varsigma} g + \sum_{\varsigma \leq \alpha - e_k} \binom{\alpha - e_k}{\varsigma} \partial_\varsigma f \, \partial_{\alpha - \varsigma} g.</math>

Due to lemma 4.18,

:<math>\binom{\alpha - e_k}{\beta - e_k} + \binom{\alpha - e_k}{\beta} = \binom{\alpha}{\beta}.</math>

Further, we have

:<math>\binom{\alpha - e_k}{0} = \binom{\alpha}{0} = 1, \text{ where } 0 = (0, \ldots, 0) \in \mathbb{N}_0^d,</math>

and

:<math>\binom{\alpha - e_k}{\alpha - e_k} = \binom{\alpha}{\alpha} = 1</math>

(these two rules may be checked from the definition of <math>\binom{\alpha}{\beta}</math>). If we additionally read a binomial coefficient <math>\binom{\gamma}{\varsigma}</math> as <math>0</math> whenever <math>\varsigma \not\geq 0</math> or <math>\varsigma \not\leq \gamma</math>, both sums may be taken over all <math>\varsigma \leq \alpha</math>, and it follows that

:<math>\begin{aligned} \partial_\alpha (fg) &= \sum_{e_k \leq \varsigma \leq \alpha} \binom{\alpha - e_k}{\varsigma - e_k} \partial_\varsigma f \, \partial_{\alpha - \varsigma} g + \sum_{\varsigma \leq \alpha - e_k} \binom{\alpha - e_k}{\varsigma} \partial_\varsigma f \, \partial_{\alpha - \varsigma} g \\ &= \sum_{\varsigma \leq \alpha} \left[ \binom{\alpha - e_k}{\varsigma - e_k} + \binom{\alpha - e_k}{\varsigma} \right] \partial_\varsigma f \, \partial_{\alpha - \varsigma} g \\ &= \sum_{\varsigma \leq \alpha} \binom{\alpha}{\varsigma} \partial_\varsigma f \, \partial_{\alpha - \varsigma} g. \end{aligned} \, \Box</math>
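The multi-index Leibniz rule just proven can be verified symbolically for concrete multi-indices. The sketch below (our own, assuming SymPy; the functions <math>f, g</math> and the multi-index <math>\alpha = (2, 1)</math> are example choices) compares both sides of the formula in two variables.

<syntaxhighlight lang="python">
import sympy as sp
from itertools import product
from math import comb

x, y = sp.symbols('x y')
f = sp.exp(x * y)      # any smooth sample functions work here
g = sp.sin(x) * y**3

alpha = (2, 1)         # the multi-index alpha, i.e. d^3 / dx^2 dy

def d(h, beta):
    """Apply the multi-index derivative partial_beta to h."""
    return sp.diff(sp.diff(h, x, beta[0]), y, beta[1])

lhs = d(f * g, alpha)

rhs = 0
for s in product(range(alpha[0] + 1), range(alpha[1] + 1)):  # all sigma <= alpha
    coeff = comb(alpha[0], s[0]) * comb(alpha[1], s[1])      # binom(alpha, sigma)
    rhs += coeff * d(f, s) * d(g, (alpha[0] - s[0], alpha[1] - s[1]))

print(sp.simplify(lhs - rhs))  # prints 0: both sides of the Leibniz rule agree
</syntaxhighlight>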
== Operations on Distributions ==

For <math>\varphi, \vartheta \in \mathcal{D}(\mathbb{R}^d)</math> there are operations such as the differentiation of <math>\varphi</math>, the convolution of <math>\varphi</math> and <math>\vartheta</math>, and the multiplication of <math>\varphi</math> and <math>\vartheta</math>. In the following section, we want to define these three operations (differentiation, convolution with <math>\vartheta</math>, and multiplication with <math>\vartheta</math>) for a distribution <math>\mathcal{T}</math> instead of <math>\varphi</math>.

Proof: We have to prove two claims: first, that the function <math>\varphi \mapsto \mathcal{T}(\mathcal{L}(\varphi))</math> is a distribution, and second, that <math>\Lambda</math> as defined above has the property

:<math>\forall \varphi \in \mathcal{D}(O) : \Lambda(\mathcal{T}_\varphi) = \mathcal{T}_{L(\varphi)}.</math>

1. We show that the function <math>\varphi \mapsto \mathcal{T}(\mathcal{L}(\varphi))</math> is a distribution.

<math>\mathcal{T}(\mathcal{L}(\varphi))</math> has a well-defined value in <math>\mathbb{R}</math>, as <math>\mathcal{L}</math> maps to <math>\mathcal{D}(O)</math>, which is exactly the domain of <math>\mathcal{T}</math>. The function <math>\varphi \mapsto \mathcal{T}(\mathcal{L}(\varphi))</math> is continuous, since it is the composition of two continuous functions, and it is linear for the same reason (see exercise 2).

2. We show that <math>\Lambda</math> has the property <math>\forall \varphi \in \mathcal{D}(O) : \Lambda(\mathcal{T}_\varphi) = \mathcal{T}_{L(\varphi)}</math>.

For every <math>\vartheta \in \mathcal{D}(U)</math>, we have

:<math>\Lambda(\mathcal{T}_\varphi)(\vartheta) := (\mathcal{T}_\varphi \circ \mathcal{L})(\vartheta) := \int_O \varphi(x) \mathcal{L}(\vartheta)(x) \, dx \overset{\text{by assumption}}{=} \int_U L(\varphi)(x) \vartheta(x) \, dx =: \mathcal{T}_{L(\varphi)}(\vartheta).</math>

Since equality of two functions is equivalent to equality of these two functions evaluated at every point, this shows the desired property. <math>\Box</math>

We also have a similar lemma for Schwartz distributions; the proof is word-for-word the same as the one for lemma 4.20.

Noting that multiplication, differentiation and convolution are linear, we will define these operations for distributions by taking <math>L</math> in the two above lemmas to be the respective one of these three operations.
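Lemma 4.20 is the blueprint for, among other things, the distributional derivative: taking <math>L = \partial_{x_k}</math> and <math>\mathcal{L}(\vartheta) = -\partial_{x_k} \vartheta</math> (integration by parts supplies the identity the lemma requires), <math>\Lambda(\mathcal{T}) = \mathcal{T} \circ \mathcal{L}</math> differentiates a distribution. The sketch below (our own, assuming NumPy; the grid, the bump function and the finite-difference derivative are example choices) differentiates the regular distribution of the Heaviside function this way and observes delta-like behaviour.

<syntaxhighlight lang="python">
import numpy as np

x = np.linspace(-5.0, 5.0, 5001)
dx = x[1] - x[0]

def bump(t, width=2.0):
    """A bump function supported in (-width, width)."""
    u = t / width
    out = np.zeros_like(t)
    m = np.abs(u) < 1.0
    out[m] = np.exp(-1.0 / (1.0 - u[m]**2))
    return out

# Regular distribution induced by the Heaviside function H.
H = (x > 0).astype(float)
T = lambda phi: np.sum(H * phi) * dx

# curlyL(phi) = -phi'; lemma 4.20 then defines the derivative of T.
curlyL = lambda phi: -np.gradient(phi, dx)
dT = lambda phi: T(curlyL(phi))

phi = bump(x)
print(dT(phi), phi[np.argmin(np.abs(x))])
# Both values are ~ phi(0): the derivative of T_H acts like the delta distribution.
</syntaxhighlight>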
Proof: The product of two <math>\mathcal{C}^\infty</math> functions is again <math>\mathcal{C}^\infty</math>, and further, if <math>\varphi(x) = 0</math>, then also <math>(f\varphi)(x) = f(x)\varphi(x) = 0</math>, so that <math>\operatorname{supp}(f\varphi) \subseteq \operatorname{supp} \varphi</math>. Hence, <math>f\varphi \in \mathcal{D}(O)</math>.

Also, if <math>\varphi_l \to \varphi</math> in the sense of bump functions and <math>K \subset \mathbb{R}^d</math> is a compact set such that <math>\operatorname{supp} \varphi_l \subseteq K</math> for all <math>l \in \mathbb{N}</math>, then

:<math>\begin{aligned} \|\partial_\alpha (f(\varphi_l - \varphi))\|_\infty &= \left\| \sum_{\varsigma \leq \alpha} \binom{\alpha}{\varsigma} \partial_\varsigma f \, \partial_{\alpha - \varsigma} (\varphi_l - \varphi) \right\|_\infty \\ &\leq \sum_{\varsigma \leq \alpha} \binom{\alpha}{\varsigma} \|\partial_\varsigma f \, \partial_{\alpha - \varsigma} (\varphi_l - \varphi)\|_\infty \\ &\leq \sum_{\varsigma \leq \alpha} \binom{\alpha}{\varsigma} \max_{x \in K} |\partial_\varsigma f(x)| \, \|\partial_{\alpha - \varsigma} (\varphi_l - \varphi)\|_\infty \to 0, \quad l \to \infty. \end{aligned}</math>

Hence, <math>f\varphi_l \to f\varphi</math> in the sense of bump functions.

Further, also <math>f\phi \in \mathcal{C}^\infty(\mathbb{R}^d)</math>. Let <math>\alpha, \beta \in \mathbb{N}_0^d</math> be arbitrary. Then

:<math>\partial_\beta (f\phi) = \sum_{\varsigma \leq \beta} \binom{\beta}{\varsigma} \partial_\varsigma f \, \partial_{\beta - \varsigma} \phi.</math>

Since all the derivatives of <math>f</math> are bounded by polynomials, by the definition of this property we obtain

:<math>\forall x \in \mathbb{R}^d : |\partial_\varsigma f(x)| \leq |p_\varsigma(x)|,</math>

where the <math>p_\varsigma, \varsigma \in \mathbb{N}_0^d,</math> are polynomials. Hence,

:<math>\|x^\alpha \partial_\beta (f\phi)\|_\infty \leq \sum_{\varsigma \leq \beta} \binom{\beta}{\varsigma} \|x^\alpha p_\varsigma \partial_{\beta - \varsigma} \phi\|_\infty < \infty.</math>

Similarly, if <math>\phi_l \to \phi</math> in the sense of Schwartz functions, then by exercise 3.6

:<math>\|x^\alpha \partial_\beta (f(\phi - \phi_l))\|_\infty \leq \sum_{\varsigma \leq \beta} \binom{\beta}{\varsigma} \|x^\alpha p_\varsigma \partial_{\beta - \varsigma} (\phi - \phi_l)\|_\infty \to 0, \quad l \to \infty,</math>

and hence <math>f\phi_l \to f\phi</math> in the sense of Schwartz functions. If we define <math>L(\varphi) := \mathcal{L}(\varphi) := f\varphi</math>, the other claims follow from lemmas 4.20 and 4.21. <math>\Box</math>

Proof: We want to apply lemmas 4.20 and 4.21; hence, we prove that the requirements of these lemmas are met. Since the derivatives of bump functions are again bump functions and the derivatives of Schwartz functions are again Schwartz functions (see exercise 3.3 for both), and because of theorem 4.22, we have that <math>L</math> and <math>\mathcal{L}</math> map <math>\mathcal{D}(O)</math> to <math>\mathcal{D}(O)</math>, and, if further all the <math>a_\alpha</math> and all their derivatives are bounded by polynomials, that <math>L</math> and <math>\mathcal{L}</math> map <math>\mathcal{S}(\mathbb{R}^d)</math> to <math>\mathcal{S}(\mathbb{R}^d)</math>. The sequential continuity of <math>\mathcal{L}</math> follows from theorem 4.22.
Further, for all <math>\phi, \theta \in \mathcal{S}(\mathbb{R}^d)</math>,

:<math>\int_{\mathbb{R}^d} \phi(x) \mathcal{L}(\theta)(x) \, dx = \sum_{\alpha \in \mathbb{N}_0^d} (-1)^{|\alpha|} \int_{\mathbb{R}^d} \phi(x) \partial_\alpha (a_\alpha \theta)(x) \, dx.</math>

If we single out an <math>\alpha \in \mathbb{N}_0^d</math>, by Fubini's theorem and integration by parts we obtain

:<math>\begin{aligned} \int_{\mathbb{R}^d} \phi(x) \partial_\alpha (a_\alpha \theta)(x) \, dx &= \int_{\mathbb{R}^{d-1}} \int_{\mathbb{R}} \phi(x) \partial_\alpha (a_\alpha \theta)(x) \, dx_1 \, d(x_2, \ldots, x_d) \\ &= \int_{\mathbb{R}^{d-1}} (-1)^{\alpha_1} \int_{\mathbb{R}} \partial_{(\alpha_1, 0, \ldots, 0)} \phi(x) \, \partial_{\alpha - (\alpha_1, 0, \ldots, 0)} (a_\alpha \theta)(x) \, dx_1 \, d(x_2, \ldots, x_d) \\ &= \cdots = (-1)^{|\alpha|} \int_{\mathbb{R}^d} \partial_\alpha \phi(x) \, a_\alpha(x) \theta(x) \, dx. \end{aligned}</math>

Hence,

:<math>\int_{\mathbb{R}^d} \phi(x) \mathcal{L}(\theta)(x) \, dx = \int_{\mathbb{R}^d} L(\phi)(x) \theta(x) \, dx,</math>

and the lemmas are applicable. <math>\Box</math>
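The adjoint identity <math>\int \phi \, \mathcal{L}(\theta) \, dx = \int L(\phi) \, \theta \, dx</math> just derived can be checked numerically for a small differential operator. In the sketch below (our own, assuming NumPy; the operator <math>L(\phi) = a_0 \phi + a_1 \phi'</math> in one dimension and its coefficients are example choices), both sides agree up to discretization error.

<syntaxhighlight lang="python">
import numpy as np

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

a0 = x**2        # polynomially bounded smooth coefficients
a1 = np.sin(x)

phi = np.exp(-x**2)               # rapidly decaying test functions
theta = x * np.exp(-x**2 / 2.0)

d = lambda u: np.gradient(u, dx)  # finite-difference derivative

L_phi = a0 * phi + a1 * d(phi)             # L(phi) = sum_alpha a_alpha d_alpha phi
curlyL_theta = a0 * theta - d(a1 * theta)  # curlyL(theta) = sum (-1)^|alpha| d_alpha(a_alpha theta)

print(np.sum(phi * curlyL_theta) * dx,     # int phi * curlyL(theta) dx
      np.sum(L_phi * theta) * dx)          # int L(phi) * theta dx
# The two printed values agree up to discretization error.
</syntaxhighlight>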
Proof:

1. Let <math>x \in \mathbb{R}^d</math> be arbitrary, let <math>(x_l)_{l \in \mathbb{N}}</math> be a sequence converging to <math>x</math>, and let <math>N \in \mathbb{N}</math> be such that <math>\forall n \geq N : \|x_n - x\| \leq 1</math>. Then

:<math>K := \overline{\bigcup_{n \geq N} \operatorname{supp} \varphi(x_n - \cdot) \cup \bigcup_{n < N} \operatorname{supp} \varphi(x_n - \cdot)}</math>

is compact: each <math>\operatorname{supp} \varphi(x_n - \cdot) = x_n - \operatorname{supp} \varphi</math> is a translate of the compact set <math>\operatorname{supp} \varphi</math>, the translation vectors with <math>n \geq N</math> lie in the closed ball <math>\overline{B_1(x)}</math>, and only finitely many further translates occur, so <math>K</math> is closed and bounded. All the functions <math>\varphi(x_n - \cdot)</math> have their supports contained in <math>K</math>, and for every multi-index <math>\alpha \in \mathbb{N}_0^d</math>, the uniform continuity of <math>\partial_\alpha \varphi</math> implies <math>\partial_\alpha \varphi(x_l - \cdot) \to \partial_\alpha \varphi(x - \cdot)</math> uniformly. Hence <math>\varphi(x_l - \cdot) \to \varphi(x - \cdot)</math> in the notion of convergence of bump functions, and the sequential continuity of <math>\mathcal{T}</math> gives

:<math>(\mathcal{T} * \varphi)(x_l) = \mathcal{T}(\varphi(x_l - \cdot)) \to \mathcal{T}(\varphi(x - \cdot)) = (\mathcal{T} * \varphi)(x),</math>

so <math>\mathcal{T} * \varphi</math> is continuous.

2. We prove <math>\partial_\beta (\mathcal{T} * \varphi) = \mathcal{T} * (\partial_\beta \varphi)</math> by induction on <math>|\beta|</math>. For <math>|\beta| = 0</math> there is nothing to prove. Let the claim be true for all multi-indices of absolute value <math>n</math>, and let <math>|\beta| = n + 1</math>. We choose <math>k \in \{1, \ldots, d\}</math> such that <math>\beta_k > 0</math> (this is possible, since otherwise <math>\beta = \mathbf{0}</math>). Further, we define

:<math>e_k := (0, \ldots, 0, \overbrace{1}^{k\text{th place}}, 0, \ldots, 0).</math>

Then <math>|\beta - e_k| = n</math>, and hence <math>\partial_{\beta - e_k} (\mathcal{T} * \varphi) = \mathcal{T} * (\partial_{\beta - e_k} \varphi)</math>.

Furthermore, for all <math>\vartheta \in \mathcal{D}(\mathbb{R}^d)</math>,

:<math>\lim_{\lambda \to 0} \frac{(\mathcal{T} * \vartheta)(x + \lambda e_k) - (\mathcal{T} * \vartheta)(x)}{\lambda} = \lim_{\lambda \to 0} \mathcal{T} \left( \frac{\vartheta(x + \lambda e_k - \cdot) - \vartheta(x - \cdot)}{\lambda} \right).</math>

But due to Schwarz' theorem,

:<math>\frac{\vartheta(x + \lambda e_k - \cdot) - \vartheta(x - \cdot)}{\lambda} \to (\partial_{x_k} \vartheta)(x - \cdot), \quad \lambda \to 0,</math>

in the sense of bump functions, and thus

:<math>\lim_{\lambda \to 0} \mathcal{T} \left( \frac{\vartheta(x + \lambda e_k - \cdot) - \vartheta(x - \cdot)}{\lambda} \right) = \mathcal{T} \left( (\partial_{x_k} \vartheta)(x - \cdot) \right) = (\mathcal{T} * \partial_{x_k} \vartheta)(x).</math>

Applying this with <math>\vartheta = \partial_{\beta - e_k} \varphi</math>, which is a bump function (see exercise 3.3), we obtain

:<math>\partial_\beta (\mathcal{T} * \varphi) = \partial_{e_k} \left( \mathcal{T} * (\partial_{\beta - e_k} \varphi) \right) = \mathcal{T} * (\partial_\beta \varphi).</math>

3. This follows from 1. and 2., since <math>\partial_\beta \varphi</math> is a bump function for all <math>\beta \in \mathbb{N}_0^d</math> (see exercise 3.3). <math>\Box</math>

== Exercises ==

Exercise 1: Let <math>\mathcal{T}_1, \ldots, \mathcal{T}_n</math> be (tempered) distributions and let <math>c_1, \ldots, c_n \in \mathbb{R}</math>. Prove that <math>\sum_{j=1}^n c_j \mathcal{T}_j</math> is also a (tempered) distribution.

Exercise 2: Let <math>f : \mathbb{R}^d \to \mathbb{R}</math> be essentially bounded. Prove that <math>\mathcal{T}_f</math> is a tempered distribution.

Exercise 3: Prove that if <math>\mathcal{Q}</math> is a set of differentiable functions from <math>[0, 1]^d</math> to <math>\mathbb{R}</math> such that there exists a <math>c \in \mathbb{R}_{>0}</math> with <math>\|\nabla g(x)\| \leq c</math> for all <math>g \in \mathcal{Q}</math> and all <math>x \in [0, 1]^d</math>, then <math>\mathcal{Q}</math> is equicontinuous.