A regularity class for the roots of non-negative functions

We investigate the regularity of the positive roots of a non-negative function of one variable. A modified H\"older space $\mathcal{F}^\beta$ is introduced such that if $f\in \mathcal{F}^\beta$ then $f^\alpha \in C^{\alpha \beta}$. This provides sufficient conditions to overcome the usual limitation in the square root case ($\alpha = 1/2$) for H\"older functions that $f^{1/2}$ need be no more than $C^1$ in general. We also derive bounds on the wavelet coefficients of $f^\alpha$, which provide a finer understanding of its local regularity.


Introduction
We study the smoothness properties of roots of non-negative Hölder continuous functions on $[0,1]$. More precisely, we derive sufficient flatness-type conditions such that if $f$ is in a Hölder space with index $\beta$, then $f^\alpha$ has Hölder regularity $\alpha\beta$ for $0 < \alpha \le 1$.
This question was first studied in the square-root case ($\alpha = 1/2$) by Glaeser [11] and Dieudonné [7], who showed that any non-negative, twice continuously differentiable function on $\mathbb{R}$ admits a continuously differentiable square root. This result is sharp in general, in the sense that there exist infinitely differentiable functions such that no admissible square root has Hölder index larger than one (Theorem 2.1 in [1]). Flatness conditions thus become necessary for $\beta > 2$ in order to ensure that a $\beta$-smooth function has an admissible square root with Hölder index greater than one. Some extensions of Glaeser's result can be found in Lengyel [13] and Mandai [15]. Macchia [14], Reichard [23], Rainer [21] and Colombini et al. [6] studied the regularity of $f^{1/r}$ in a neighbourhood of a zero of $f$, provided that enough derivatives of $f$ vanish at that point.
The key property for additional regularity of roots is flatness, in the sense that whenever $f$ is small, so are its derivatives. In a series of recent papers, Bony et al. [1,2,3] consider flatness conditions for admissible square roots. Recall that $g$ is an admissible square root of $f$ if $f = g^2$ (allowing $g$ to change sign). The additional freedom to pick an admissible square root does not always help, however. Suppose $f$ possesses a converging sequence of local minima $(x_n)_n$ such that $f(x_n)$ is strictly positive for all $n$ and $f(x_n) \to 0$ as $n \to \infty$. The non-negative square root is then the most regular admissible square root, and there exist infinitely differentiable functions of this form with no admissible square root having Hölder index larger than one [1]. In this article we restrict attention to roots of non-negative functions that do not change sign, which we take, without loss of generality, to be positive.
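Theorem 1. For $\alpha \in (0,1]$, $\beta > 0$ and all non-negative $f \in \mathcal{F}^\beta([0,1])$,
$$\|f^\alpha\|_{\mathcal{F}^{\alpha\beta}} \le C(\alpha,\beta)\, \|f\|_{\mathcal{F}^\beta}^{\alpha},$$
where the constant $C(\alpha,\beta)$ depends only on $\alpha$ and $\beta$. In particular, $f^\alpha \in C^{\alpha\beta}$.

In particular, we see that $f \in \mathcal{F}^\beta$ is a sufficient condition for $\sqrt{f} \in C^{\beta/2}$, thereby overcoming the usual limitation that $\sqrt{f}$ need be no more than $C^1$ in general. For $\beta \le 1$, bounds on the Hölder or Sobolev norms of $f^{1/r}$ can be found in Glaeser [11] ($r = 2$), Colombini and Lerner [5] and Ghisi and Gobbino [9]. For bounds on the roots of more general polynomials, see the recent work of Parusiński and Rainer [17,18,19].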
The condition $f \in \mathcal{F}^\beta$ requires a notion of "flatness" of the derivatives of $f$ that differs from the one considered in [1,2,3]. In particular, $\mathcal{F}^\beta$ contains functions not covered by their results to which Theorem 1 nonetheless applies (see (2.2) for a simple example). In Section 2 we study the Hölder cone of non-negative functions satisfying such a flatness constraint and compare it to existing notions of flatness in the literature. We show that the present flatness constraints are only non-trivial for Hölder indices larger than two (Theorem 4).
We also derive bounds on the wavelet coefficients of $f^\alpha$ that provide additional insight into its local regularity (Proposition 1). Note that if $f \in C^\beta$ is uniformly bounded away from zero, then $f^\alpha \in C^\beta$ (see Lemma 2). This is substantially better than the regularity provided by Theorem 1, which is driven by small function values. The wavelet bounds reflect the local size of the function and account for both regimes, thereby giving a finer description of the regularity of $f^\alpha$.
An application of this approach arises in nonparametric statistics, where one aims to reconstruct a Hölder function $f$ from a noisy observation of $\sqrt{f}$ (cf. [16]). In this problem one must restrict to positive square roots, and it is not enough to find flatness conditions ensuring that $\sqrt{f}$ has some regularity: control of its Hölder norm is also required [22].
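For $f \in C^\beta$, define $|f|_{\mathcal{F}^\beta}$ to be the infimum over all non-negative real numbers $\kappa$ satisfying
$$|f^{(j)}(x)|^\beta \le \kappa^j\, |f(x)|^{\beta - j} \quad \text{for all } x \in [0,1] \text{ and all integers } 1 \le j < \beta, \qquad (2.1)$$
with $|f|_{\mathcal{F}^\beta} = 0$ for $\beta \le 1$. This can be written concisely as
$$|f|_{\mathcal{F}^\beta} = \max_{1 \le j < \beta}\, \sup_{x \in [0,1]} \left( \frac{|f^{(j)}(x)|^\beta}{|f(x)|^{\beta - j}} \right)^{1/j}.$$
The quantity $|f|_{\mathcal{F}^\beta}$ measures the flatness of a function near zero. In particular, if $f$ vanishes at some point, then so do all its derivatives of order $j < \beta$. Define
$$\mathcal{F}^\beta := \big\{ f \in C^\beta : f \ge 0,\ \|f\|_{\mathcal{F}^\beta} := \|f\|_{C^\beta} + |f|_{\mathcal{F}^\beta} < \infty \big\}.$$
This space contains, for example, the constant functions, any $C^\beta$-function bounded away from 0, and functions of the form $f(x) = (x - x_0)^\beta g(x)$ with $g \ge \varepsilon > 0$ infinitely differentiable.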
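To make the flatness seminorm concrete, here is a minimal Python sketch that approximates $|f|_{\mathcal{F}^\beta}$ by evaluating $\max_{1 \le j < \beta} \sup_x (|f^{(j)}(x)|^\beta / f(x)^{\beta-j})^{1/j}$ on a uniform grid of $[0,1]$. The function name `flatness_seminorm`, the grid size and the convention of passing $f, f', \dots$ as Python callables are illustrative assumptions, not part of the paper. A grid maximum only bounds the supremum from below, and $f \ge 0$ is assumed throughout.

```python
import math
from typing import Callable, Sequence


def flatness_seminorm(
    derivs: Sequence[Callable[[float], float]], beta: float, n_grid: int = 1000
) -> float:
    """Grid approximation (a lower bound) of the flatness seminorm |f|_{F^beta} on [0, 1].

    derivs[j] evaluates f^{(j)} for j = 0, ..., ceil(beta) - 1 (f = derivs[0] >= 0 assumed).
    Only the orders 1 <= j < beta enter, so the result is 0 for beta <= 1.
    Returns math.inf if f vanishes at a grid point where some f^{(j)} does not.
    """
    grid = [i / n_grid for i in range(n_grid + 1)]
    kappa = 0.0
    for j in range(1, math.ceil(beta)):  # exactly the integers 1 <= j < beta
        for x in grid:
            fx, dj = derivs[0](x), abs(derivs[j](x))
            if fx == 0.0:
                if dj > 0.0:
                    return math.inf  # the condition forces f^{(j)}(x) = 0 wherever f(x) = 0
                continue  # 0 <= kappa^j * 0 holds for every kappa
            kappa = max(kappa, (dj**beta / fx ** (beta - j)) ** (1.0 / j))
    return kappa


cube = [lambda x: x**3, lambda x: 3 * x**2, lambda x: 6 * x, lambda x: 6.0]
print(flatness_seminorm(cube, 3.0))  # f(x) = x^3 with beta = 3
print(flatness_seminorm([lambda x: x + 0.5, lambda x: 1.0, lambda x: 0.0], 3.0))
```

For $f(x) = x^3$ and $\beta = 3$ the $j = 1$ ratio equals $27$ at every $x > 0$ and dominates the $j = 2$ ratio $\sqrt{216}$, so the sketch returns $27$. For $f(x) = x + 1/2$ and $\beta = 3$ it returns $4 = (1/2)^{1-3}$, attained at $x = 0$.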
Let us compare condition (2.1) with the flatness conditions considered in the literature on admissible square roots. For a given function $f$, suppose that there exists a continuous function $\gamma$, vanishing on the set of flat points of $f$, such that $f''(x_0) \le \gamma(x_0) f(x_0)^{1/2}$ for any positive minimum $x_0$ of $f$. Theorem 3.5 of Bony et al. [1] establishes that this is a necessary and sufficient condition for a four times continuously differentiable function to have a twice continuously differentiable admissible square root. This is comparable to the flatness seminorm (2.1), which for $\beta = 4$ also requires $f''$ to be bounded by a multiple of $f^{1/2}$. Nevertheless, in order to obtain bounds on the norm of $\sqrt{f}$, constraints on the third derivative of $f$ must also be imposed in our framework. One can exploit extra regularity by assuming that $f$ and its derivatives vanish at all local minima (as in [1,2,3] for admissible square roots). We take a different approach, however, and permit functions to take small non-zero values. In fact, as mentioned in [1], the obstacle preventing the existence of a twice continuously differentiable square root for a general non-negative four times continuously differentiable function is a converging sequence of small non-zero minima. A sufficient condition for an arbitrary even integer $\beta \ge 4$ was found in Theorem 2.2 of [3]: a $\beta$-times continuously differentiable function $f$ has a $\beta/2$-times continuously differentiable square root if, at each local minimum of $f$, the function and its derivatives up to order $\beta - 4$ vanish. While this constraint need only hold at the local minima of $f$, forcing all local minima to be exactly zero can be overly restrictive. Indeed, a converging sequence of small non-zero minima can be allowed, provided the minima tend to zero in a controlled way. We now give an example which is not covered by [3], but satisfies (2.1) and hence the conditions of Theorem 1.
Let $K$ be a smooth non-negative function supported on $[0,1]$ and positive on $(0,1)$, for Using the upper bound for $\varepsilon$, it can be checked that $f(x) \ge 2^{-1-(j+1)\beta} > 0$. Using this inequality, it holds that for $k < \beta$, We now show that $f$ has an infinite sequence of non-zero local minima tending to zero. Since $f'$ is continuous and Differentiating $f$ and rearranging the desired inequality shows that a sufficient condition for the existence of such a point is $\beta < \varepsilon\, 2^{-(\beta-1)} \sigma^{-1} \|K'\|_\infty$, which is equivalent to the lower bound for $\varepsilon$.
In conclusion, we have shown that $f$ possesses (infinitely many) non-zero local minima, so that the conditions of Theorem 2.2 of [3] are not satisfied, but that $f \in \mathcal{F}^\beta$, so that Theorem 1 nonetheless applies.

Basic properties.
In a slight abuse of terminology, we say that $\|\cdot\|$ is a norm on a convex cone $A$ if $\|v\| \ge 0$ with $\|v\| = 0$ if and only if $v = 0$, $\|\lambda v\| = \lambda \|v\|$ for all $\lambda > 0$, and $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in A$. We have thus replaced absolute homogeneity by positive homogeneity. Similarly, we say that $|\cdot|$ is a seminorm on $A$ if $|v| \ge 0$ for all $v \in A$ and both positive homogeneity and the triangle inequality hold on $A$.
Proof. For the first part, it is enough to prove that $|f|_{\mathcal{F}^\beta}$ is a seminorm. Observe that We may assume that , and applying Jensen's inequality For the second statement, we establish that each term in the norm $\|fg\|_{\mathcal{F}^\beta}$ is bounded by a constant times $\|f\|_{\mathcal{F}^\beta} \|g\|_{\mathcal{F}^\beta}$. For $\beta \in (0,1]$ the result follows immediately, so assume that $\beta > 1$. Using that By the triangle inequality and arguing similarly to (3.5) for each term in the sum below,

To see the difference between the classical Hölder space $C^\beta$ and $\mathcal{F}^\beta$, consider the functions $f_\gamma(x) = x^\gamma$ on $[0,1]$ with $\gamma > 0$. It is well known that $f_\gamma \in C^\beta$ if and only if either $\gamma \notin \mathbb{N}$ and $\beta \le \gamma$, or $\gamma \in \mathbb{N}$ and $\beta > 0$. The function $f_\gamma$ thus has arbitrarily large Hölder smoothness if it is a polynomial. By contrast, one can easily check that $f_\gamma \in \mathcal{F}^\beta$ if and only if $\beta \le \gamma$. We next show that the Hölder cones $\mathcal{F}^\beta$ are nested.
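The claim that $f_\gamma(x) = x^\gamma$ (with $\gamma > 0$) lies in $\mathcal{F}^\beta$ exactly when $\beta \le \gamma$ can be checked directly from the supremum form of the flatness seminorm, $\max_{1 \le j < \beta} \sup_x (|f^{(j)}(x)|^\beta / f(x)^{\beta-j})^{1/j}$. Writing $c_{\gamma,j} := \gamma(\gamma-1)\cdots(\gamma-j+1)$, one has for $x \in (0,1]$ and integers $1 \le j < \beta$:

```latex
\frac{|f_\gamma^{(j)}(x)|^\beta}{f_\gamma(x)^{\beta-j}}
  = \frac{|c_{\gamma,j}|^\beta \, x^{(\gamma-j)\beta}}{x^{\gamma(\beta-j)}}
  = |c_{\gamma,j}|^\beta \, x^{\, j(\gamma-\beta)} .
```

For $\beta > 1$ the term $j = 1$ has $c_{\gamma,1} = \gamma \neq 0$, so the ratio blows up as $x \to 0$ whenever $\beta > \gamma$. Conversely, if $\beta \le \gamma$, every exponent $j(\gamma - \beta)$ is non-negative, and at $x = 0$ both $f_\gamma$ and $f_\gamma^{(j)}$ vanish because $j < \beta \le \gamma$; together with $f_\gamma \in C^\beta$ for $\beta \le \gamma$, this gives $f_\gamma \in \mathcal{F}^\beta$. For $\beta \le 1$ the seminorm vanishes and the claim reduces to the $C^\beta$ statement.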
Part (i) of the previous theorem shows that there are two regimes. If the flatness seminorm dominates the $L^\infty$-norm, then $|\cdot|_{\mathcal{F}^\beta}$ increases in $\beta$, which matches the intuition that the seminorms should become stronger for larger $\beta$. For functions that are very flat, in the sense that $|f|_{\mathcal{F}^\beta} \le \|f\|_\infty$, this need not be true. Consider the function $f(x) = x + q$ on $[0,1]$ with $q > 0$. For $\beta > 1$, $|f|_{\mathcal{F}^\beta} = q^{1-\beta}$, which decreases in $\beta$ when $q > 1$; thus, for functions of low flatness, the seminorm can also decrease in $\beta$. The full flatness norms $\|\cdot\|_{\mathcal{F}^\beta}$ are, however, unaffected by this, since they also involve $\|f\|_\infty$. Statement (ii) of the previous theorem says that the low-flatness phenomenon only occurs if the function is bounded away from zero, and in this case (iii) shows that the flatness seminorm is always finite.
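The value $q^{1-\beta}$ follows from the supremum form of the seminorm: since $f' \equiv 1$ and $f^{(j)} \equiv 0$ for $j \ge 2$, only $j = 1$ contributes, and the supremum is attained at $x = 0$:

```latex
|f|_{\mathcal{F}^\beta}
  = \sup_{x \in [0,1]} \frac{|f'(x)|^\beta}{f(x)^{\beta-1}}
  = \sup_{x \in [0,1]} \frac{1}{(x+q)^{\beta-1}}
  = q^{1-\beta}, \qquad \beta > 1 .
```

Hence $\beta \mapsto |f|_{\mathcal{F}^\beta}$ is increasing for $q < 1$ and decreasing for $q > 1$; for instance, $q = 2$ gives $1/2$ at $\beta = 2$ and $1/4$ at $\beta = 3$.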
For $\beta \in (0,2]$, the additional derivative constraint is in fact always satisfied, and $\mathcal{F}^\beta$ contains all non-negative functions that can be extended to a non-negative $\beta$-Hölder function on $\mathbb{R}$.
Up to smoothness $\beta = 2$, flatness can thus be defined directly, without the need to resort to a quantity such as (2.1). This was used in the recent statistical literature, see [20]. To further illustrate the previous result, consider the linear function $f_1(x) = x$. While $f_1 \in C^\beta([0,1])$ for any $\beta > 0$, it has regularity one with respect to the Hölder cones, that is, $f_1 \in \mathcal{F}^\beta$ only for $\beta \le 1$. Indeed, $f_1$ cannot be extended to a non-negative function on $\mathbb{R}$ that is smoother than Lipschitz, since the non-negativity constraint induces a kink at zero.
The previous theorem cannot be extended to smoothness indices above $\beta = 2$. Indeed, the function $f(x) = (x - 1/2)^2$ belongs to $\mathcal{F}^\beta([0,1])$ only for $\beta \le 2$, but it can be extended to a non-negative function in $C^\beta(\mathbb{R})$ for any $\beta > 0$. The reason is that for smoothness $\beta \in (1,2]$, any violation of (2.1) must occur at the boundary, since the smoothness and non-negativity constraints together imply (2.1) at interior points. For smoothness indices $\beta > 2$, the example above shows that (2.1) can fail at points in the interior of $[0,1]$.
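For $f(x) = (x - 1/2)^2$ the $j = 1$ term of the flatness seminorm can be computed explicitly, which shows where the flatness condition breaks down. For $x \neq 1/2$,

```latex
\frac{|f'(x)|^\beta}{f(x)^{\beta-1}}
  = \frac{2^\beta \, |x - 1/2|^\beta}{|x - 1/2|^{2(\beta-1)}}
  = 2^\beta \, |x - 1/2|^{\, 2-\beta} .
```

For $\beta \in (1,2]$ only $j = 1$ enters, and this ratio is bounded on $[0,1]$ by $2^{2\beta-2}$, its value at $x \in \{0, 1\}$. For $\beta > 2$ it is unbounded as $x \to 1/2$, so the violation occurs at an interior point.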
If $f \in \mathcal{F}^\beta$, then $f'$ is not necessarily in $\mathcal{F}^{\beta-1}$, but the following theorem shows that integration, viewed as an operator from $\mathcal{F}^\beta$ to $\mathcal{F}^{\beta+1}$, is bounded.
$\int_0^x f(u)\,du$ for the antiderivative. Then there exists a constant $C(\beta)$ such that In particular, $F \in \mathcal{F}^{\beta+1}$.
Recall that if $f \in C^\beta$ is bounded away from zero, then $f^\alpha \in C^\beta$, whereas in general only $f^\alpha \in C^{\alpha\beta}$ holds. We obtain two bounds on the wavelet coefficients $|\langle f^\alpha, \psi_{j,k} \rangle|$ that reflect these two regimes. The first holds for all $(j,k)$ and is most useful for those $\psi_{j,k}$ on whose support the function $f$ is small. The second bound depends explicitly on the local function values and becomes useful when $f$ is large, in which case $f^\alpha$ typically has more regularity than $\alpha\beta$ by Lemma 2.

Proposition 1. Suppose that $\alpha \in (0,1]$, $\psi$ is $S$-regular and that $f \in \mathcal{F}^\beta$ for $0 < \beta < S$. For $x_0 \in [0,1]$, let $j(x_0)$ be the smallest integer satisfying where $a = a(\beta)$ is the constant in Lemma 1. Then for any wavelet $\psi_{j,k}$ with $j \ge j(x_0)$ and $x_0 \in \operatorname{supp}(\psi_{j,k})$,

The decay of the wavelet coefficients $|\langle f^\alpha, \psi_{j,k} \rangle|$ characterizes the Besov norms $\|f^\alpha\|_{B^\beta_{pq}}$ (see Chapter 4 of [10] for a full definition), and one could use Proposition 1 to prove Theorem 1. However, while $B^\beta_{\infty\infty} = C^\beta$ for non-integer $\beta$, for integer $\beta$ the space $B^\beta_{\infty\infty}$ equals the slightly larger Hölder-Zygmund space, resulting in a slight suboptimality when $\alpha\beta \in \mathbb{N}$. While Theorem 1 provides a more concise statement, the extra local information provided by the above wavelet bounds can be crucial to obtain sharp results, for example in certain non-linear statistical inverse problems [22]. Recall that $f \in C^\gamma$ if and only if $|\langle f, \psi_{j,k} \rangle| \lesssim 2^{-\frac{j}{2}(2\gamma+1)}$ for all $(j,k)$ (with a slight correction for $\gamma \in \mathbb{N}$). From Proposition 1 we can therefore conclude that on low resolution levels, that is, for small $j$, the wavelet coefficients have the decay of a $C^{\alpha\beta}$-function. The result also shows that on high resolution levels the function behaves like a $C^\beta$-function, and it quantifies the resolution level, which depends on the function value, at which the transition between these two regimes occurs.
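As an illustration of the rate $|\langle f, \psi_{j,k} \rangle| \lesssim 2^{-\frac{j}{2}(2\gamma+1)}$, the following minimal sketch computes exact Haar coefficients of $x^\gamma$ on $[0,1]$. The Haar wavelet is an assumption made for simplicity: it has a single vanishing moment, so it only resolves Hölder exponents below one, and it is not the $S$-regular wavelet of Proposition 1. The example takes $f(x) = x^{3/2}$ and $\alpha = 1/2$, so $f^\alpha(x) = x^{3/4}$. At the zero of $f$ ($k = 0$) the coefficient equals $-2^{-j(\gamma+1/2)}(1 - 2^{-\gamma})/(\gamma+1)$ with $\gamma = \alpha\beta$, so it decays exactly at the $C^{\alpha\beta}$ rate. Near $x = 1/2$, where $f$ is bounded away from zero, the rescaled coefficients shrink with $j$. The helper name `haar_coeff_power` is hypothetical.

```python
import math


def haar_coeff_power(gamma: float, j: int, k: int) -> float:
    """Exact Haar coefficient <x^gamma, psi_{j,k}> on [0, 1].

    psi_{j,k}(x) = 2^{j/2} psi(2^j x - k), with psi = 1 on [0, 1/2) and -1 on [1/2, 1).
    The integrals use the antiderivative x^{gamma + 1} / (gamma + 1).
    """
    def antiderivative(x: float) -> float:
        return x ** (gamma + 1) / (gamma + 1)

    left, mid, right = k / 2**j, (k + 0.5) / 2**j, (k + 1) / 2**j
    positive_part = antiderivative(mid) - antiderivative(left)
    negative_part = antiderivative(right) - antiderivative(mid)
    return 2 ** (j / 2) * (positive_part - negative_part)


alpha, beta = 0.5, 1.5  # f(x) = x^beta, so f^alpha(x) = x^(alpha * beta)
gamma = alpha * beta    # Hölder exponent of f^alpha at the zero of f
levels = range(1, 15)

# Coefficients at the zero of f (k = 0), rescaled by the C^gamma rate 2^{j(gamma + 1/2)}.
boundary = [abs(haar_coeff_power(gamma, j, 0)) * 2 ** (j * (gamma + 0.5)) for j in levels]
# Coefficients starting at x = 1/2, where f is bounded away from zero, same rescaling.
interior = [abs(haar_coeff_power(gamma, j, 2 ** (j - 1))) * 2 ** (j * (gamma + 0.5)) for j in levels]

predicted = (1 - 2 ** (-gamma)) / (gamma + 1)  # closed form of the rescaled k = 0 coefficient
print(boundary[0], boundary[-1], predicted)
print(interior[0], interior[-1])
```

The constant rescaled values at $k = 0$ reflect the $C^{\alpha\beta}$ behaviour at the zero of $f$. The decreasing interior values show the faster decay away from it, where this particular wavelet, rather than $f$, caps the rate.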

Proof of Theorem 1
The proof is based on two technical lemmas. The first lemma identifies a local neighbourhood of each point on which the function is relatively constant.

Lemma 1. Suppose that $f \in \mathcal{F}^\beta$ with $\beta > 0$ and let $a = a(\beta) > 0$ be any constant satisfying $(e^a - 1) + a^\beta / \lfloor\beta\rfloor! \le 1/2$. Then for

Proof. Without loss of generality, we may assume that $\|f\|_{\mathcal{F}^\beta} = 1$. By a Taylor expansion and the definition of $\mathcal{F}^\beta$, there exists $\xi$ between $x$ and $x + h$ such that

For $f \in \mathcal{F}^\beta$, the function $f^\alpha$ satisfies a Hölder-type condition with exponent $\beta$ and a locally varying Hölder constant. The following is the key technical result for establishing the smoothness of $f^\alpha$ and hence the decay of $|\langle f^\alpha, \psi_{j,k} \rangle|$. The main ingredient in the proof is a careful analysis of Faà di Bruno's formula, which generalizes the chain rule to higher derivatives [8]:
$$(h \circ f)^{(k)}(x) = \sum_{(m_1,\dots,m_k) \in M_k} \frac{k!}{m_1! \cdots m_k!}\, h^{(m_1 + \dots + m_k)}(f(x)) \prod_{j=1}^k \left( \frac{f^{(j)}(x)}{j!} \right)^{m_j}, \qquad (3.1)$$
where $M_k$ is the set of all $k$-tuples $(m_1,\dots,m_k)$ of non-negative integers satisfying $\sum_{j=1}^k j m_j = k$. Note that for $h(x) = x^\alpha$, we have $h^{(r)}(x) = C_{\alpha,r}\, x^{\alpha - r}$ for some $C_{\alpha,r} \neq 0$ (except in the trivial case $\alpha = 1$). We can relate the derivatives appearing in (3.1) to $f$ using the seminorm $|\cdot|_{\mathcal{F}^\beta}$.
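Faà di Bruno's formula (3.1) with $h(t) = t^\alpha$ can be checked numerically. The minimal sketch below enumerates $M_k$, evaluates $\sum_{M_k} \frac{k!}{m_1! \cdots m_k!} C_{\alpha,M} f(x)^{\alpha - M} \prod_j (f^{(j)}(x)/j!)^{m_j}$ at a single point with $M = m_1 + \dots + m_k$ and $C_{\alpha,M} = \alpha(\alpha-1)\cdots(\alpha-M+1)$, and assumes $f(x) > 0$ with the values $f^{(j)}(x)$ supplied. The helper names are hypothetical. For $f = \exp$ all derivatives coincide and $f^\alpha(x) = e^{\alpha x}$, so the $k$-th derivative must equal $\alpha^k e^{\alpha x}$.

```python
import math
from typing import Iterator, List, Sequence


def tuples_M(k: int) -> Iterator[List[int]]:
    """Yield every k-tuple (m_1, ..., m_k) of non-negative integers with sum_j j * m_j = k."""
    def extend(j: int, remaining: int, acc: List[int]) -> Iterator[List[int]]:
        if j > k:
            if remaining == 0:
                yield list(acc)
            return
        for m in range(remaining // j + 1):
            acc.append(m)
            yield from extend(j + 1, remaining - j * m, acc)
            acc.pop()

    yield from extend(1, k, [])


def falling_factorial(alpha: float, r: int) -> float:
    """C_{alpha,r} = alpha (alpha - 1) ... (alpha - r + 1), so (d/dt)^r t^alpha = C_{alpha,r} t^(alpha - r)."""
    c = 1.0
    for i in range(r):
        c *= alpha - i
    return c


def faa_di_bruno_power(alpha: float, k: int, f_derivs: Sequence[float]) -> float:
    """k-th derivative of f^alpha at a point x via Faa di Bruno's formula with h(t) = t^alpha.

    f_derivs[j] must hold f^{(j)}(x) for j = 0, ..., k, and f_derivs[0] = f(x) must be positive.
    """
    total = 0.0
    for m in tuples_M(k):
        big_m = sum(m)  # M = m_1 + ... + m_k, the order of the derivative of h
        term = math.factorial(k) * falling_factorial(alpha, big_m) * f_derivs[0] ** (alpha - big_m)
        for j, m_j in enumerate(m, start=1):
            term *= (f_derivs[j] / math.factorial(j)) ** m_j / math.factorial(m_j)
        total += term
    return total


# Consistency check with f = exp at x = 0.3: the k-th derivative of e^{alpha x} is alpha^k e^{alpha x}.
x, alpha = 0.3, 0.5
for k in range(1, 7):
    print(k, faa_di_bruno_power(alpha, k, [math.exp(x)] * (k + 1)), alpha**k * math.exp(alpha * x))
```

For $\alpha = 1$ every coefficient $C_{1,M}$ with $M \ge 2$ vanishes, and the sum collapses to the single term $f^{(k)}(x)$, matching the trivial case excluded in the text.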
Lemma 2. For $\alpha \in (0,1]$ and $\beta > 0$, there exists a constant $C(\alpha,\beta)$ such that for all $f \in \mathcal{F}^\beta$, $0 \le k < \beta$ and $x, y \in [0,1]$,

Proof. Without loss of generality assume $f(y) \le f(x)$ and $|f|_{C^\beta} + |f|_{\mathcal{F}^\beta} = 1$. For $\beta \in (0,1]$, by the mean value theorem,

Consider now $\beta > 1$ and write $k = \lfloor\beta\rfloor$ (the following also holds for all $k \le \lfloor\beta\rfloor$, with certain simplifications). We must consider separately the cases where $|x - y|$ is small and large. Let $C(\alpha,\beta)$ be a generic constant, which may change from line to line. Suppose first that $|x - y| \le a f(x)^{1/\beta}$ with $a$ as in Lemma 1. By Lemma 1 we have $f(y)/2 \le f(x) \le 3f(y)/2$, which will be used freely without mention in the following. The proof is based on a careful analysis of Faà di Bruno's formula (3.1).
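The neighbourhood $|x - y| \le a f(x)^{1/\beta}$ used in this case split grows with $a$, and Lemma 1 only requires $(e^a - 1) + a^\beta / \lfloor\beta\rfloor! \le 1/2$. The left-hand side is continuous, strictly increasing in $a \ge 0$ and vanishes at $a = 0$, so the largest admissible constant can be located by bisection. A minimal sketch follows. It assumes that $\lfloor\beta\rfloor$ denotes the largest integer strictly smaller than $\beta$, which matches the use of $\beta - \lfloor\beta\rfloor \in (0,1]$ in the proof of Lemma 2; for non-integer $\beta$ this is the usual floor. The function names are hypothetical.

```python
import math


def floor_strict(beta: float) -> int:
    """Largest integer strictly smaller than beta (assumed meaning of the floor of beta here)."""
    return math.ceil(beta) - 1


def lemma1_lhs(a: float, beta: float) -> float:
    """Left-hand side (e^a - 1) + a^beta / floor(beta)! of the condition in Lemma 1."""
    return math.expm1(a) + a**beta / math.factorial(floor_strict(beta))


def largest_admissible_a(beta: float, tol: float = 1e-12) -> float:
    """Largest a > 0 with lemma1_lhs(a, beta) <= 1/2, located by bisection.

    lemma1_lhs is continuous and strictly increasing in a >= 0 with value 0 at a = 0,
    and lemma1_lhs(1, beta) >= e - 1 > 1/2, so the threshold lies in (0, 1).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lemma1_lhs(mid, beta) <= 0.5:
            lo = mid  # mid is admissible: raise the lower end
        else:
            hi = mid  # mid violates the condition: lower the upper end
    return lo  # always admissible, and within tol of the threshold


for beta in (0.5, 1.0, 2.0, 2.5, 4.0):
    print(beta, largest_admissible_a(beta))
```

For $\beta = 2$ the condition reads $(e^a - 1) + a^2 \le 1/2$ and the sketch returns $a \approx 0.330$. Any smaller positive $a$ is also admissible, which is all Lemma 1 needs.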
We shall establish the result by proving a Hölder bound for each of the summands in (3.1) individually. Fix a $k$-tuple $(m_1,\dots,m_k) \in M_k$ and write $M := \sum_{j=1}^k m_j$. By the triangle inequality

Before bounding the terms in (3.3), we require some additional estimates. Firstly, by the definition of $\mathcal{F}^\beta$, Secondly, for any function $g$ and integer $r \ge 1$, we have by the mean value theorem $g(x)^r - g(y)^r = r\, g(\xi)^{r-1} g'(\xi)(x - y)$ for some $\xi$ between $x$ and $y$. Noting that $f(\xi) \approx f(x) \approx f(y)$ and that $\beta - k \in (0,1]$, and applying the above to $g(x) = f^{(j^*)}(x)$ with $r = m_{j^*}$ and $j^* \in \{1,\dots,k-1\}$, yields (3.5).

Strictly speaking, we cannot invoke (3.5) for $j^* = k$, since $f$ is only $k$ times differentiable. However, noting that $m_k$ can only take the values 0 or 1, we see directly from the Hölder continuity of $f^{(k)}$ that the conclusion of (3.5) also holds for $j^* = k$, since we have the bound $|x - y|^{\beta - k}$. By the same argument as (3.5),

For the second term in (3.3), we repeatedly apply the triangle inequality, each time changing the variable in a single derivative. Fix $j^*$ and define vectors $(z^{j^*}_j)_{j=1}^k$ and $(\tilde z^{j^*}_j)_{j=1}^k$ whose entries each equal $x$ or $y$ and which differ only in the $j^*$-th coordinate, where $z^{j^*}_{j^*} = x$ and $\tilde z^{j^*}_{j^*} = y$. Using (3.5), where in the last line we have used that $\sum_j j m_j = k$. By repeatedly applying the triangle inequality and using (3.7), we can bound the second term in (3.3) by thereby completing the proof in the case $|x - y| \le a f(x)^{1/\beta}$.

Applying Faà di Bruno's formula (3.1) and (3.4) yields For $|x - y| > a f(x)^{1/\beta}$ we thus have as required. This completes the proof of the first statement. Note that (3.2) follows directly from (3.8), since this last expression also holds for all $0 \le k < \beta$. Suppose now $f \ge \varepsilon > 0$. It follows immediately from the results that have just been ,

Proof of Theorem 1. By rescaling, we may assume that $\|f\|_{\mathcal{F}^\beta} = 1$.
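The reduction to $\|f\|_{\mathcal{F}^\beta} = 1$ uses only positive homogeneity of the norm on the cone. For $f \neq 0$, set $g := f / \|f\|_{\mathcal{F}^\beta}$, so that $\|g\|_{\mathcal{F}^\beta} = 1$ and $f^\alpha = \|f\|_{\mathcal{F}^\beta}^\alpha\, g^\alpha$. If $\|g^\alpha\|_{\mathcal{F}^{\alpha\beta}} \le C$ holds for every such normalized $g$, with $C$ depending only on $\alpha$ and $\beta$, then:

```latex
\|f^\alpha\|_{\mathcal{F}^{\alpha\beta}}
  = \big\| \, \|f\|_{\mathcal{F}^\beta}^{\alpha} \, g^\alpha \big\|_{\mathcal{F}^{\alpha\beta}}
  = \|f\|_{\mathcal{F}^\beta}^{\alpha} \, \|g^\alpha\|_{\mathcal{F}^{\alpha\beta}}
  \le C \, \|f\|_{\mathcal{F}^\beta}^{\alpha} .
```

The case $f = 0$ is trivial.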
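Throughout the proof, $C$ denotes a generic constant which only depends on $\alpha$ and $\beta$ and may change from line to line. It is enough to prove $\|f^\alpha\|_{\mathcal{F}^{\alpha\beta}} \le C$. With (3.2), we find that $\|f^\alpha\|_\infty + \|(f^\alpha)^{(\lfloor\alpha\beta\rfloor)}\|_\infty + |f^\alpha|_{\mathcal{F}^{\alpha\beta}} \le C$. It thus remains to establish $|f^\alpha|_{C^{\alpha\beta}} \le C$.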