Towards the fast scrambling conjecture

Many proposed quantum mechanical models of black holes include highly nonlocal interactions. The time required for thermalization to occur in such models should reflect the relaxation times associated with classical black holes in general relativity. Moreover, the time required for a particularly strong form of thermalization to occur, sometimes known as scrambling, determines the time scale on which black holes should start to release information. It has been conjectured that black holes scramble in a time logarithmic in their entropy, and that no system in nature can scramble faster. In this article, we address the conjecture from two directions. First, we exhibit two examples of systems that do indeed scramble in logarithmic time: Brownian quantum circuits and the antiferromagnetic Ising model on a sparse random graph. Unfortunately, both fail to be truly ideal fast scramblers for reasons we discuss. Second, we use Lieb-Robinson techniques to prove a logarithmic lower bound on the scrambling time of systems with finite norm terms in their Hamiltonian. The bound holds in spite of any nonlocal structure in the Hamiltonian, which might permit every degree of freedom to interact directly with every other one.


Introduction
There is a growing consensus based on evidence from string theory and gauge-gravity correspondences that black holes do not destroy information when they evaporate. Roughly, the argument is that black holes can be realized in string theory in a manner that accounts for their entropy [1][2][3][4][5][6][7], and that certain string theories are equivalent to manifestly unitary systems [8][9][10]. For a recent review, see [11].
Instead of being lost, information about the microscopic state of the black hole leaks out with the hole's Hawking radiation, much as it would for any other radiating object. Early estimates for the amount of time it would take to recover a bit from a black hole, however, suggested that no information would leak out for an amount of time proportional to the black hole lifetime [12][13][14]. Since astrophysical black holes have lifetimes many orders of magnitude longer than the age of the universe, that is tantamount to the information being lost forever. More specifically, such a long delay before the escape of information provided a plausible resolution to some of the conceptual conundrums of quantum gravity, most notably the apparent inconsistency of information release with the quantum no-cloning principle [14].
More recent estimates using techniques from quantum information theory, on the other hand, suggest that information could be released from black holes much more quickly [15]. Those calculations indicate that the relevant time scale is not the amount of time it takes for the black hole to evaporate but, instead, the amount of time the dynamics takes to "scramble" the black hole's microscopic degrees of freedom in such a way that initially localized perturbations become undetectable by observables that fail to probe a significant fraction of all the degrees of freedom. While a direct calculation of this scrambling time remains out of reach, the relaxation timescales associated with classical black holes are incredibly fast. So fast, in fact, that if they also govern the scrambling time, then the black hole complementarity principle, one of the guiding principles for many researchers in quantum gravity [14,16,17], is only just saved from inconsistency: faster scrambling would lead to a paradox.
Motivated by these considerations, as well as the implications of the existence of fast scramblers for the underlying structure of the degrees of freedom of quantum gravity, Sekino and Susskind elaborated on the speculations of [15] to formulate the following three-part fast scrambling conjecture [18,19]:
1. The most rapid scramblers take a time logarithmic in the number of degrees of freedom.
2. Matrix quantum mechanics (systems whose degrees of freedom are n by n matrices) saturate the bound.
3. Black holes are the fastest scramblers in nature.
The purpose of this article is to explore the validity of the conjecture, focusing primarily on the first part. While the conjecture implicitly refers to the most rapid scramblers in nature, we allow ourselves the freedom to investigate the most rapid scramblers in quantum mechanics (and even slightly beyond) without worrying if our models are physically realizable. Thanks to earlier research in quantum computation by Dankert et al., it is already known how to define a time-dependent Hamiltonian which will scramble in logarithmic time with high probability [20]. The scrambler, however, is a very carefully engineered quantum circuit, so that it is difficult to ascribe the fast scrambling specifically to interactions between the constituents as opposed to clever tuning of their external knobs. Ideally, therefore, we would like to exhibit a fast scrambler described by a simple time-independent Hamiltonian. To that end, we present two examples:
• Brownian quantum circuits. The scrambler of [20] was a highly structured quantum circuit. Other work has studied circuits composed of random gates [21][22][23][24][25], but a rigorous proof that they scramble in logarithmic time remains to be found. Instead, we present a continuous-time analog of a quantum circuit in which the Hamiltonian is a stochastically varying two-body interaction, and prove that it scrambles in logarithmic time.
• Ising model. We consider scrambling by the antiferromagnetic Ising interaction on a general graph with an external field parallel to the spin quantization axis. Despite its triviality, this model nonetheless exhibits a form of weak scrambling in logarithmic time on some graphs.
The careful reader will have observed that neither of these examples meets all of our criteria for a convincing scrambler: the Brownian quantum circuits are time-dependent, if not structured, and the Ising model fails to scramble fully. Nonetheless, we feel that, taken together, the examples provide substantial evidence that quantum systems with simple time-independent Hamiltonians can scramble in logarithmic time.
The fast scrambling conjecture not only states that logarithmic-time scramblers exist, but also asserts that it is impossible to scramble faster. It might seem hopeless to address this question without invoking additional physical assumptions beyond just the validity of quantum mechanics. After all, scrambling is a form of information propagation, and limits on information propagation normally depend on locality. A Hamiltonian allowing all degrees of freedom to interact directly has no locality to speak of. Nonetheless, using bounds of Lieb-Robinson type [26][27][28] to rigorously control a mean-field approximation, we are able to show the following:
• Subject to some nontrivial norm assumptions on the terms in the Hamiltonian, no physical system described by a Hamiltonian with dense two-body interactions can scramble in time faster than $O(\log n)$, where $n$ is the number of degrees of freedom. "Dense" here means that the number of interacting pairs of degrees of freedom scales like $O(n^2)$.
• The bound extends to certain four-body Hamiltonians similar to the BFSS matrix model [8].
• With more sparsely interacting systems, there is a lower bound of $O(\sqrt{\log n})$ on the scrambling time.
While the norm assumptions are unfortunately too stringent to allow us to apply the results rigorously to the matrix model Hamiltonian and, thereby, to black hole physics, these results are strong evidence that scrambling in less than logarithmic time is impossible. (A related obstacle is our focus on distinguishable degrees of freedom; bosonic degrees of freedom naturally lead to unbounded operators.)

Related work
Asplund, Berenstein and Trancanelli [29] have numerically investigated relaxation in matrix models. Their approach is to look at the classical dynamics of the system, with initial states selected stochastically in such a way as to enforce the uncertainty principle. They do indeed find what appears to be very rapid relaxation of the system to an attractor state, but their article only considers a fixed-sized and relatively small system, so it cannot directly address the scaling of relaxation time with system size. The relationship between this classical relaxation time and quantum mechanical scrambling is also an interesting and currently unexplored question.
Barbon and Magan [30] have approached the conjecture from a different direction. They suggest that the logarithmic factor in the black hole scrambling time arises from the hyperbolic geometry of the so-called "optical metric" $ds^2/g_{00}$ associated to a simple coordinatization of Rindler space. Specifically, they argue that the Lyapunov time for a classical billiards game on such a geometry agrees with the scrambling time.
More indirectly, while most work prior to [15] argued that black holes held information for an amount of time comparable to the black hole lifetime, if not forever, occasional hints were found that information might leak out faster [31]. Reversing the reasoning, one could interpret such arguments as evidence in favour of the fast scrambling conjecture.
The seemingly paradoxical idea that a closed quantum system undergoing unitary dynamics can exhibit equilibration or thermalization is an old one dating back, at least, to von Neumann [32]; the apparent contradiction with the fact that the global state is pure and never equilibrates is resolved by noticing that any small subregion in an interacting closed quantum system generically becomes entangled with the rest and may appear, at least locally, thermal. For large systems the recurrence time is extremely long so, for all intents and purposes, it is meaningful to say that the system has become (locally) thermalized. There is now an enormous literature on this topic (see e.g., [33] for a textbook treatment). Recently these old questions have received new impetus from quantum chaos, quantum information theory, and many-body physics, all of which have brought new tools to bear [34][35][36][37][38][39][40][41][42][43][44][45] leading to an emerging understanding of the general conditions under which a closed quantum system will exhibit (local) thermalization.

Scrambling: definition and properties
Scrambling is nothing other than a strong form of thermalization applicable to closed system evolution. A closed system never forgets its initial state, but over time it might become impossible to distinguish different initial states without measuring a large fraction of all the system degrees of freedom. The minimum time required for the information about the initial state to be lost is called the scrambling time.
In general, the scrambling time depends on the nature of the set of initial states. For example, small perturbations of an equilibrium configuration will generally get scrambled more rapidly than will a pair of metastable configurations. Likewise, it could be easier to scramble a discrete set of states than all possible superpositions of those states. In this article, we will focus on product initial states, but a slightly different formulation will likely be necessary in order to study black hole physics. In particular, energy conservation will usually prohibit the strong form of scrambling we demand here.
Suppose that we have a system with $n$ distinguishable degrees of freedom and a Hamiltonian $H = \sum_{\langle x,y\rangle} H_{\langle x,y\rangle}$ acting on a Hilbert space $\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2 \otimes \cdots \otimes \mathcal{H}_n$, where the sum ranges over pairs $\langle x,y\rangle$ of degrees of freedom. An initial state $|\Psi(0)\rangle$ evolves to a state $|\Psi(t)\rangle = \exp(-iHt)|\Psi(0)\rangle$. For $S \subseteq \{1, 2, \ldots, n\}$ a subset of the degrees of freedom and $S^c$ the complement, let $\Psi_S(t) = \mathrm{tr}_{S^c}\, |\Psi(t)\rangle\langle\Psi(t)|$.
Ideally, a scrambler will delocalize any information initially localized with respect to the factorization of $\mathcal{H}$ into subsystems. We therefore define the scrambling time $t_*$ to be the smallest time $t$ such that $\Psi_S(t) \approx \Phi_S(t)$ for all $S$ such that $|S| < \kappa n$ for some $0 < \kappa < 1/2$, and for all initial states $|\Psi(0)\rangle$ and $|\Phi(0)\rangle$ that factorize into the form $|\omega_1\rangle \otimes |\omega_2\rangle \otimes \cdots \otimes |\omega_n\rangle$. For concreteness, we will fix $\kappa = 1/3$, but its specific value will not affect our conclusions.
The scrambling time obviously depends on the normalization of the Hamiltonian. In Sekino and Susskind's original formulation, the fast scrambling conjecture was that $t_*/\beta \geq C(\beta) \log n$, where $\beta$ is the inverse temperature and $C$ is an unspecified function. In much of what follows, we will work either far from equilibrium, where $\beta$ is not well-defined, or near infinite temperature, where it doesn't accurately reflect the energy per degree of freedom (which stays finite as $\beta \to 0$ in the spin models we consider). This leaves a couple of alternatives for a dimensionless measure of scrambling time:
• One can consider the ratio of the amount of time it takes to scramble systems of different sizes, hopefully cancelling the temperature dependence. Let $t_*^{(k)}$ be the scrambling time for subsystems of size $|S| \leq k$ and set $\tau_* = t_*^{(\kappa n)}/t_*^{(1)}$. The revised conjecture is then that $\tau_* = \Omega(\log n)$.
• The Hamiltonians we consider do not have their interactions arranged in a lattice structure. Instead, each subsystem $S$ generally participates in a number of interactions growing with $n$. As a second option, one can require that the energy scales extensively with the system size $n$, thereby selecting a normalization for the Hamiltonian which, while coarse, is sufficient to determine the scaling of $t_*$ with $n$.
The final step in formalizing the notion of scrambling time is to clarify the meaning of $\Psi_S(t) \approx \Phi_S(t)$. The trace distance provides a notion of statistical distinguishability that meshes well with the quantum information theoretic applications of scrambling. Specifically, one should demand that $\|\Psi_S(t) - \Phi_S(t)\|_1 < \epsilon$, where $\|X\|_1 = \mathrm{tr}\sqrt{X^\dagger X}$. (See, e.g., [46] for a discussion of the statistical interpretation of the norm.)
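As a minimal sketch of the distinguishability criterion above, the following computes the trace norm of the difference of two single-qubit density operators. The restriction to 2x2 Hermitian matrices and the function names (`trace_norm_2x2`, `trace_norm_difference`, `looks_scrambled`) are illustrative assumptions, not part of the original analysis.

```python
import math


def trace_norm_2x2(x):
    """Trace norm tr sqrt(X^dagger X) of a 2x2 Hermitian matrix X.

    For Hermitian X the trace norm is the sum of the absolute values of the
    eigenvalues, which for a 2x2 matrix follow from the trace and determinant.
    """
    (a, b), (c, d) = x
    tr = complex(a + d).real
    det = complex(a * d - b * c).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return abs((tr + disc) / 2.0) + abs((tr - disc) / 2.0)


def trace_norm_difference(rho, sigma):
    """Return || rho - sigma ||_1 for two single-qubit density operators."""
    diff = [[rho[i][j] - sigma[i][j] for j in range(2)] for i in range(2)]
    return trace_norm_2x2(diff)


def looks_scrambled(rho, sigma, eps):
    """True if the two reduced states are within eps in trace norm."""
    return trace_norm_difference(rho, sigma) < eps


# Orthogonal pure states are perfectly distinguishable (norm 2), whereas two
# copies of the maximally mixed state cannot be told apart (norm 0).
zero = [[1.0, 0.0], [0.0, 0.0]]
one = [[0.0, 0.0], [0.0, 1.0]]
mixed = [[0.5, 0.0], [0.0, 0.5]]
```

In this toy setting, a pair of reduced states satisfies the scrambling criterion only if `looks_scrambled` holds for every pair of product initial states, which the sketch does not attempt to enumerate.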

Scrambling as entanglement generation
Scrambling information is by definition just storing that information in complicated correlations between many subsystems, which means that scrambling is intimately related to the production of entanglement. In fact, the concepts are essentially one and the same. Intuitively, the reason is that if the restriction $\Psi_S(t_*)$ of a scrambled state is not highly mixed, then there won't be enough room in the Hilbert space $\mathcal{H}$ at time $t_*$ to accommodate all the scrambled states, which contain a basis for $\mathcal{H}$. (The relationship is simplest when $\mathcal{H}$ is finite dimensional, which we will assume here but not elsewhere in the article.)
Formalizing that intuition is a simple exercise in quantum information theory. Recall that the von Neumann entropy of a density operator $\rho$ restricted to subsystem $A$ is $H(A)_\rho = H(\rho_A) = -\mathrm{tr}\, \rho_A \log \rho_A$, and that the mutual information between subsystems $A$ and $B$ for $\rho$ is defined as
$$I(A:B)_\rho = H(A)_\rho + H(B)_\rho - H(AB)_\rho.$$
Fix an orthonormal product basis $\{|\psi_{x_1}\rangle |\psi_{x_2}\rangle \cdots |\psi_{x_n}\rangle\}$ for $\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2 \otimes \cdots \otimes \mathcal{H}_n$. After time $t_*$, all of these product states will be scrambled, so consider $|\Psi^{(x_1,\ldots,x_n)}\rangle = \exp(-iHt_*)\,|\psi_{x_1}\rangle \otimes \cdots \otimes |\psi_{x_n}\rangle$. It is convenient to introduce an auxiliary Hilbert space $X$ and consider the following density operator on the combined $X\mathcal{H}$ system:
$$\rho = \frac{1}{\dim \mathcal{H}} \sum_{x_1,\ldots,x_n} |x_1,\ldots,x_n\rangle\langle x_1,\ldots,x_n|_X \otimes |\Psi^{(x_1,\ldots,x_n)}\rangle\langle\Psi^{(x_1,\ldots,x_n)}|_{\mathcal{H}}.$$
The system $X$ records in an orthonormal basis which state describes $\mathcal{H}$, and the overall state is an equal mixture over choice of $x_1, \ldots, x_n$. Because subsystem $S$ is scrambled, all of the states $\Psi_S^{(x_1,\ldots,x_n)} = \mathrm{tr}_{S^c}\, \Psi^{(x_1,\ldots,x_n)}$ will be essentially indistinguishable, so there can't be any significant correlations between $X$ and $S$. A quantitative way of expressing that fact is that the mutual information $I(X:S)_\rho$ will be small, say less than $\delta$. (A standard continuity result implies that $\delta$ can be chosen to be $3\epsilon \log \dim \mathcal{H} + f(\epsilon)$, where $f(\epsilon)$ goes to zero with $\epsilon$ and is independent of $n$ [47].)
On the other hand, the states $|\Psi^{(x_1,\ldots,x_n)}\rangle$ form an orthonormal basis for $\mathcal{H}$, so their equal mixture is just the maximally mixed state on $\mathcal{H}$. The state $\rho_{\mathcal{H}}$ is by construction precisely that equal mixture. It follows that $\rho_S$ is also maximally mixed and, therefore, that $H(S)_\rho = \log \dim \mathcal{H}_S$.
Substituting into the inequality $I(X:S)_\rho < \delta$ then gives
$$\log \dim \mathcal{H}_S - \delta < H(XS)_\rho - H(X)_\rho.$$
The quantity on the righthand side, $H(XS)_\rho - H(X)_\rho$, is known as the conditional entropy $H(S|X)_\rho$ of $S$ given $X$. It can be interpreted as the uncertainty remaining in $S$ once $X$ is known and evaluates in this case to
$$H(S|X)_\rho = \frac{1}{\dim \mathcal{H}} \sum_{x_1,\ldots,x_n} H\big(\Psi_S^{(x_1,\ldots,x_n)}\big).$$
The entropy of a mixed state on $S$ measures how much entanglement there is between $S$ and $S^c$ in the corresponding pure state. Good scrambling can therefore only be achieved by a time evolution that produces nearly maximal entanglement, and vice versa.
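The link between entanglement and the entropy of a reduced state can be illustrated on the smallest possible example. The sketch below, restricted by assumption to two qubits with $S$ the first qubit, computes the entropy in bits of the reduced state of a pure state; the helper names are hypothetical.

```python
import math


def reduced_first_qubit(amps):
    """Reduced density matrix of qubit 1 for a two-qubit pure state.

    amps lists the amplitudes of |00>, |01>, |10>, |11> in that order.
    """
    a = [[amps[0], amps[1]], [amps[2], amps[3]]]
    return [
        [sum(a[i][k] * a[j][k].conjugate() for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def entropy_bits(rho):
    """Von Neumann entropy (base 2) of a 2x2 density matrix."""
    tr = complex(rho[0][0] + rho[1][1]).real
    det = complex(rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    eigenvalues = [(tr + disc) / 2.0, (tr - disc) / 2.0]
    # Eigenvalues at or below the cutoff contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in eigenvalues if p > 1e-15)


# A product state leaves the first qubit pure; a maximally entangled state
# leaves it maximally mixed, with entropy log2(dim) = 1 bit.
product_state = [1.0, 0.0, 0.0, 0.0]
bell_state = [2 ** -0.5, 0.0, 0.0, 2 ** -0.5]
```

For the maximally entangled input the reduced entropy reaches $\log \dim \mathcal{H}_S$, which is the saturation that the inequality above demands, up to $\delta$, of every scrambled product basis state.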

Brownian quantum circuits
A quantum circuit is an idealized model of the time evolution of a quantum computer, which is generally assumed to consist of a number of qubits. At a given discrete time step, a collection of "gates" is applied to the state, where a gate is a unitary transformation involving one or two qubits. Each qubit participates in at most one gate per time step.
As mentioned earlier, Dankert et al. found a quantum circuit that scrambles n qubits after O(log n) time steps [20]. Their circuit, however, is quite an intricate construction that doesn't plausibly model any naturally occurring interactions. Other researchers have studied random quantum circuits, establishing that they are scramblers, but the question of whether they scramble in time O(log n) remains open [21][22][23][24][25].
In this section, we study a continuous-time analog of a random quantum circuit, which provably does scramble in time $O(\log n)$. Consider $n$ qubits interacting according to a stochastically varying Hamiltonian. Time is subdivided into steps of length $\Delta t$ and during a given time step, the interaction between each pair of qubits is given by a random Wigner matrix. More formally, the Hamiltonian from time $t_r = r\Delta t$ to $t_{r+1} = (r+1)\Delta t$ is given by
$$H_r = \sum_{j<k} \sum_{\alpha_j,\alpha_k=0}^{3} \sigma_j^{\alpha_j} \otimes \sigma_k^{\alpha_k}\, \Delta B_{r,j,k,\alpha_j,\alpha_k},$$
where the $\Delta B_{r,j,k,\alpha_j,\alpha_k}$ are independent and identically chosen real Gaussians $\mathcal{N}(0, \epsilon^2)$ with zero mean and variance $\epsilon^2$. The operator $\sigma_j^{\alpha_j}$ represents the Pauli operator $\sigma^{\alpha_j}$ acting on qubit $j$, with $\sigma^0$ the identity matrix.
The time evolution from $t_0$ to $t_r$ is given by
$$U(t_r) = e^{-iH_{r-1}\Delta t}\, e^{-iH_{r-2}\Delta t} \cdots e^{-iH_0 \Delta t}.$$
For this process to have a well-defined and nontrivial limit as $\Delta t \to 0$, one must choose $\epsilon^2 \propto (\Delta t)^{-1}$ [48]. That is, the strength of the interactions must increase as the size of the time steps decreases. This requirement makes it problematic to interpret $t_*$ in units of energy. Instead, we show that the ratio $\tau_* = t_*^{(\kappa n)}/t_*^{(1)}$ is $O(\log n)$.
The limiting dynamics of the random Hamiltonian evolution is given by $U(0) = I$ and $U(t + dt) = \exp(i\, dG(t))\, U(t)$ for
$$dG(t) = \sqrt{\frac{1}{8n(n-1)}} \sum_{j<k} \sum_{\alpha_j,\alpha_k=0}^{3} \sigma_j^{\alpha_j} \otimes \sigma_k^{\alpha_k}\, dB_{j,k,\alpha_j,\alpha_k}(t),$$
where the $dB_{j,k,\alpha_j,\alpha_k}(t)$ are independent Brownian motions with unit variance per unit time. Since we are only interested in $\tau_*$, the normalization factor is of no real consequence; it is chosen such that $\|dG(t)\|_2^2 = dt$. Calculating using the Ito calculus (see [49] for an accessible introduction) leads to the following stochastic differential equation for $U(t)$:
$$dU(t) = -\frac{1}{2} U(t)\, dt + i \sqrt{\frac{1}{8n(n-1)}} \sum_{j<k} \sum_{\alpha_j,\alpha_k=0}^{3} \sigma_j^{\alpha_j} \otimes \sigma_k^{\alpha_k} \otimes I_{\setminus\{j,k\}}\, U(t)\, dB_{\alpha_j,\alpha_k}(t).$$
(In a slight abuse of notation, we henceforth write $dB_{\alpha_j,\alpha_k}(t) := dB_{j,k,\alpha_j,\alpha_k}(t)$. $I_{\setminus\{j,k\}}$ denotes the identity on all sites except for $j$ and $k$.)
Suppose we have some initial state $|\Psi(0)\rangle$. Then the state at time $t$ is $|\Psi(t)\rangle = U(t)|\Psi(0)\rangle$. The time evolution will have scrambled subsystem $S$ once $\Psi_S(t)$ is independent of the initial state, as measured by the trace distance as discussed in Section 2. Equivalently, $\Psi_S(t)$ should approach a fixed state independent of $\Psi(0)$. In the case of Brownian circuits, that fixed state is close to maximally mixed provided $S$ is not too large. Rather than calculating $\|\Psi_S(t) - I_S/\dim \mathcal{H}_S\|_1$ directly, it is much easier to evaluate
$$\|\Psi_S(t) - I_S/\dim \mathcal{H}_S\|_2^2 = \mathrm{tr}\, \Psi_S(t)^2 - \frac{1}{\dim \mathcal{H}_S}.$$
We therefore introduce the purity of a subsystem $S$:
$$h_S(t) = \mathrm{tr}\, \Psi_S(t)^2.$$
The equation of motion for the purity $h_S(t)$ is given by [...] After some algebra, it is shown in Appendix A that (3.8) gives the following dynamics for the purity averaged over realizations of the Brownian motion, [...] Here $|A|$ means $\log \dim A$.
If the initial configuration $\Psi(0)$ consists of a pure product state, then $h_S$ depends only on $|S| = k$, so the system of ODEs collapses to a tridiagonal system and can be written in the form [...] The rough features of the system (3.10) are sketched in Figure 1 and the system's behavior is studied in Appendix B, with the conclusion that the ratio $\tau_* = t_*^{(\kappa n)}/t_*^{(1)}$ is $O(\log n)$.
Figure 1. Schematic plot of the decay of the average purity $h_k(t)$ of a subsystem $S$ of size $k$. When the initial state is a pure product state all purities begin equal to one. The scrambling time for a system of size $k$ is defined as the amount of time required before the purity of subsystems of size $k$ becomes less than $(1 + \delta)2^{-k}$; a purity of exactly $2^{-k}$ corresponds to the maximally mixed state. For subsystems of size smaller than $n/2$, the dynamics ensures that larger systems have smaller purities, a property not necessarily true of general entangled states.
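The purity criterion in the caption of Figure 1 can be made concrete with a small sketch. It computes $\mathrm{tr}\,\Psi_S^2$ for a subsystem of an $n$-qubit pure state and compares it with the threshold $(1+\delta)2^{-k}$. The state-vector representation, the qubit ordering (qubit 0 as the most significant bit) and the function names are assumptions made for illustration; the sketch does not simulate the Brownian dynamics itself.

```python
def split_index(index, n, subset):
    """Split a basis-state index into (bits on S, bits on the complement)."""
    s = e = 0
    for q in range(n):
        bit = (index >> (n - 1 - q)) & 1
        if q in subset:
            s = (s << 1) | bit
        else:
            e = (e << 1) | bit
    return s, e


def subsystem_purity(amps, n, subset):
    """Purity tr[Psi_S^2] of the reduced state on the qubits in `subset`."""
    k = len(subset)
    rows = [[0j] * (1 << (n - k)) for _ in range(1 << k)]
    for index, amp in enumerate(amps):
        s, e = split_index(index, n, subset)
        rows[s][e] = amp
    # (Psi_S)_{s,s'} is the overlap of rows s and s'; the purity is the sum
    # of the squared magnitudes of all entries of Psi_S.
    purity = 0.0
    for r1 in rows:
        for r2 in rows:
            overlap = sum(x * y.conjugate() for x, y in zip(r1, r2))
            purity += abs(overlap) ** 2
    return purity


def below_threshold(purity, k, delta):
    """Figure 1 criterion: purity less than (1 + delta) * 2^(-k)."""
    return purity < (1.0 + delta) * 2.0 ** (-k)


# Three-qubit GHZ state: single qubits are maximally mixed, but pairs of
# qubits still have purity 1/2, well above the k = 2 threshold of about 1/4.
a = 2 ** -0.5
ghz = [a, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, a]
product = [1.0] + [0.0] * 7
```

The GHZ example shows why the criterion has to be checked for all subsystem sizes up to $\kappa n$: a state can pass the test for $k = 1$ and still fail it for $k = 2$.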

Ising interaction on random graphs
There is an inherent difficulty in searching for fast scramblers: the intuition that a given system will rapidly scramble information is usually based on a sense that the dynamics is complicated, which is almost invariably an obstacle to studying the details of the system's time evolution. Complexity is not an absolute requirement, however. In this section, we will see that one of the simplest conceivable quantum mechanical systems has lessons to teach us about scrambling time.
Let $G = (V, E)$ be an undirected graph. Assign a spin-1/2 to each vertex $v \in V$ and allow spins adjacent with respect to the edge set $E$ to interact via the antiferromagnetic Ising Hamiltonian
$$H = \frac{|V|}{4|E|} \sum_{\langle u,v\rangle \in E} \left(\sigma^z_u \sigma^z_v - \sigma^z_u - \sigma^z_v + I\right) \qquad (4.1)$$
as illustrated in Figure 2. The normalization factor $|V|/|E|$ is chosen to ensure that the energy scales extensively with the system size, $n = |V|$, as discussed in Section 2. Choosing $|0_z\rangle$ and $|1_z\rangle$ to be the $+1$ and $-1$ eigenstates of $\sigma^z$, the Hamiltonian can be written more simply as
$$H = \frac{|V|}{|E|} \sum_{\langle u,v\rangle \in E} |1_z\rangle\langle 1_z|_u \otimes |1_z\rangle\langle 1_z|_v. \qquad (4.2)$$
The system obviously can't scramble because any product state of the form $|i^z_1\rangle |i^z_2\rangle \cdots |i^z_n\rangle$ is an eigenstate of $H$. Local information encoded in that basis remains locally accessible for all times. On the other hand, information in the conjugate basis of $\sigma^x$ eigenstates, $|0_x\rangle$ and $|1_x\rangle$, potentially has more interesting behavior. Suppose then that the initial state is $|\Psi(0)\rangle = |i^x_1\rangle |i^x_2\rangle \cdots |i^x_n\rangle$. The system is periodic with period $2\pi|E|/|V|$ and the state $|\Psi(t)\rangle$ is most entangled at time $t_{\rm ent} = \pi|E|/|V|$. The state $|\Psi(t_{\rm ent})\rangle$ is known as a graph state in quantum computation, where it plays a central role in the measurement-based quantum computing architecture [50,51].
Figure 2. Antiferromagnetic Ising interaction on an undirected graph $G = (V, E)$. There is a term $H_{\langle u,v\rangle}$ in the Hamiltonian for each edge $\langle u,v\rangle \in E$ of the graph. Generic sparse graphs with average vertex degree roughly $\log |V|$ will quickly scramble information stored in the simultaneous $\{\sigma^x_v\}$ eigenbasis.
For a subset $S \subseteq V$ of spins, the entanglement entropy of the density operator $\Psi_S(t_{\rm ent}) = \mathrm{tr}_{S^c}\, |\Psi(t_{\rm ent})\rangle\langle\Psi(t_{\rm ent})|$ has a simple formula in terms of the submatrix $\mathrm{Adj}_S$ of the adjacency matrix of $G$ that selects the rows of $S$ and the columns of $S^c$ [52]:
$$H\big(\Psi_S(t_{\rm ent})\big) = \mathrm{rank}_{\mathbb{Z}_2}(\mathrm{Adj}_S), \qquad (4.3)$$
where the entropy is measured in bits. It follows that if $\mathrm{Adj}_S$ has full rank as a matrix over $\mathbb{Z}_2$, then the entanglement is $|S|$ bits. The only density operator with $|S|$ bits of entropy on $|S|$ qubits, however, is the maximally mixed density operator. Therefore, if $\mathrm{Adj}_S$ has rank $|S|$, the final density operator on $S$ will be independent of the choice of initial state $|\Psi(0)\rangle = |i^x_1\rangle \cdots |i^x_n\rangle$. That is, the system will have scrambled the $\sigma^x$ eigenstates.
Each edge from $S$ to $S^c$ contributes a nonzero entry to $\mathrm{Adj}_S$, but formula (4.3) implies that too many connections can reduce entanglement. For example, for the fully connected graph, every row of $\mathrm{Adj}_S$ is just a sequence of ones, so there is never more than one bit of entanglement entropy. To maximize the entanglement between $S$ and $S^c$, one needs the matrix $\mathrm{Adj}_S$ to have full rank for all $|S| \leq n/2$. This is generically the case for appropriate random graphs in which edges are included randomly and independently in $G$ according to the rule $\Pr[(u, v) \in E] = p$.
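The rank criterion is easy to evaluate directly. The following sketch computes the rank over $\mathbb{Z}_2$ of the submatrix of the adjacency matrix with rows in $S$ and columns in $S^c$, encoding each row as an integer bitmask; the function names and the edge-list representation are illustrative choices rather than anything prescribed by the text.

```python
def gf2_rank(rows):
    """Rank over Z_2 of a matrix whose rows are given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        # Eliminate that bit from every remaining row (addition mod 2 is XOR).
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def adj_submatrix(edges, subset, n):
    """Rows of Adj_S: one bitmask per vertex in S, one bit per vertex in S^c."""
    subset = set(subset)
    complement = [v for v in range(n) if v not in subset]
    column = {v: i for i, v in enumerate(complement)}
    rows = {u: 0 for u in subset}
    for u, v in edges:
        if u in subset and v not in subset:
            rows[u] |= 1 << column[v]
        elif v in subset and u not in subset:
            rows[v] |= 1 << column[u]
    return list(rows.values())


def entanglement_bits(edges, subset, n):
    """Entanglement entropy in bits of S at t_ent, via the rank formula."""
    return gf2_rank(adj_submatrix(edges, subset, n))


# Fully connected graph on 6 vertices: every row of Adj_S is all ones.
complete = [(u, v) for u in range(6) for v in range(u + 1, 6)]
# A perfect matching between S = {0, 1, 2} and its complement.
matching = [(0, 3), (1, 4), (2, 5)]
```

With $S = \{0, 1, 2\}$ the complete graph yields a single bit of entanglement while the much sparser matching yields the maximal three bits, which is the sense in which too many connections can reduce entanglement.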
Since $t_{\rm ent} = \pi|E|/n$, minimizing $t_{\rm ent}$ requires minimizing the expected number of edges in the graph, which is $\binom{n}{2}p \approx n^2 p/2$, subject to the constraint that the rank of $\mathrm{Adj}_S$ be maximal for all $|S| \leq n/2$. As $n$ goes to infinity and $|S|/|S^c|$ goes to any constant $\alpha$, the rank defect of the matrix is Poisson distributed with parameter $\alpha e^{-\gamma}$ provided $(\log n + \gamma)/n \leq p \leq 1 - (\log n + \gamma)/n$ [53]. Therefore, $\mathrm{Adj}_S$ will be full rank with probability at least $1 - e^{-\gamma}$. Thus, the minimal value of $t_{\rm ent}$ is approximately $\frac{\pi}{2}(\log n + \gamma)$, where $\gamma$ can be regarded as a constant. Even though the system doesn't scramble fully in the sense of making all local information locally inaccessible, it does scramble the basis of $\sigma^x$ eigenstates and does so in time logarithmic in $n$, as required of a fast scrambler.
For the sake of comparison with the Brownian circuit model, it is also instructive to consider the analog of $\tau_*$, the ratio of the amount of time to scramble systems of size $\kappa n$ to the time required to scramble a single qubit. Since the system is exactly solvable, it is straightforward to establish by direct calculation that for $S = \{j\}$ a singleton, the Hamiltonian (4.2) and initial state $|\Psi(0)\rangle$ imply
$$\mathrm{tr}\, \Psi_S(t)^2 = \frac{1}{2}\left[1 + \cos^{2d_j}\!\left(\frac{|V|\, t}{2|E|}\right)\right], \qquad (4.4)$$
where $d_j$ is the number of graph neighbors of site $j$. The expected number of neighbors per site is $p(n - 1)$. Requiring that (4.4) be close to minimal, i.e. $\frac{1}{2}$, gives the 1-scrambling time as $O(\sqrt{\log n})$. The ratio of the times required for scrambling $\sigma^x$ eigenstates therefore scales like $O(\sqrt{\log n})$. This hints at the possibility that for systems that do scramble all product states, unlike this Ising model, $\tau_*$ might also fail to obey an $\Omega(\log n)$ lower bound as required by the fast scrambling conjecture.
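Because the Ising Hamiltonian is diagonal in the $\sigma^z$ basis, the single-site purity can be checked by brute force on small graphs. The sketch below rests on two assumptions that should be read as such: that the Hamiltonian is $(|V|/|E|)$ times a sum over edges of projectors onto both spins being in $|1_z\rangle$ (a form consistent with the stated period $2\pi|E|/|V|$), and that the purity of site $j$ then takes the closed form $\frac{1}{2}[1 + \cos^{2d_j}(|V|t/2|E|)]$. The function names are hypothetical.

```python
import cmath
import math


def evolve_plus_state(n, edges, t):
    """Evolve the all-|0_x> state under the assumed diagonal Ising Hamiltonian.

    Each z-basis configuration acquires a phase proportional to the number of
    edges whose two endpoints are both in |1_z>.
    """
    scale = n / len(edges)
    amp0 = 2.0 ** (-n / 2.0)
    amps = []
    for index in range(1 << n):
        both = sum(1 for u, v in edges if (index >> u) & 1 and (index >> v) & 1)
        amps.append(amp0 * cmath.exp(-1j * t * scale * both))
    return amps


def single_site_purity(amps, j):
    """Purity of the reduced state of qubit j (bit j of the basis index)."""
    p0 = p1 = 0.0
    off = 0j
    for index, a in enumerate(amps):
        if (index >> j) & 1 == 0:
            b = amps[index | (1 << j)]
            p0 += abs(a) ** 2
            p1 += abs(b) ** 2
            off += a * b.conjugate()
    return p0 ** 2 + p1 ** 2 + 2.0 * abs(off) ** 2


def closed_form_purity(degree, n, num_edges, t):
    """Assumed closed form: (1 + cos^(2d)(|V| t / (2|E|))) / 2."""
    return 0.5 * (1.0 + math.cos(n * t / (2.0 * num_edges)) ** (2 * degree))


# Star graph: vertex 0 has four neighbors, each leaf has one.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
```

On the star graph the hub loses purity much faster than the leaves at early times, reflecting the dependence on the degree $d_j$; at $t_{\rm ent} = \pi|E|/|V|$ every site with at least one neighbor reaches the minimal purity $\frac{1}{2}$.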
Regardless, the Ising model provides an example of a system capable of producing large scale multipartite entanglement sufficient to scramble all information stored locally in a fixed basis on a time scale no more than logarithmic with the number of degrees of freedom.

Lower bounds on the scrambling time
One way to prove lower bounds on the scrambling time is to exploit the connection between scrambling and signalling. In particular, scrambling a subsystem $S$ implies the ability to signal to the complementary subsystem $S^c$. The main task of this section is therefore to prove signalling bounds, but we must do so without relying on relativity or, more generally, any underlying geometry in the organization of the degrees of freedom. Our technique goes back to Lieb and Robinson [26], who proved bounds on commutators $[O_A(t), O_B]$ for observables $O_A$ and $O_B$ localized on subsystems $A$ and $B$ of lattice spin systems. To signal reliably from $B$ to $A$, there must be normalized observables for which the norm of the commutator is $O(1)$. Hastings improved the original Lieb-Robinson technique so as to produce dimension-independent bounds [54] and Nachtergaele-Sims showed how to adapt it to general graphs [27]. The version we start from combines both features and is due to Hastings and Koma [28].
As we will see, the Lieb-Robinson technique gives lower bounds on the time required to signal from $B$ to $A$ provided $A$ and $B$ are both constant-sized subsystems. The definition of scrambling used in this paper, however, only implies signalling from a constant-sized $B$ to the complementary subsystem $S^c$, and $S^c$ will generally involve at least half the degrees of freedom in the whole system. To deal with this large $S^c$, we use the Lieb-Robinson bound to show that a mean-field approximation to the time evolution remains reasonably good for sufficiently short times, provided the initial state has product form. For as long as the mean-field approximation holds, the dynamics cannot generate any significant entanglement, which prohibits signalling to $S^c$ and, of course, scrambling.

Scrambling implies signalling
Any information initially stored as a state on $\mathcal{H}_1$ will have become inaccessible to measurements on $S$ alone once scrambling has occurred. One way of phrasing this mathematically is by introducing a "reference" system $N$ that does not participate in the interaction and will initially be entangled with system $\mathcal{H}_1$. The scrambling condition ensures that if the initial state has the form $|\Psi(0)\rangle = |\psi_1\rangle_{N\mathcal{H}_1} \otimes |\psi_2\rangle_{\mathcal{H}_2} \otimes \cdots \otimes |\psi_n\rangle_{\mathcal{H}_n}$, then the time evolution destroys any entanglement between $N$ and $\mathcal{H}_1$ in the sense that [...] (See, e.g., Lemma 19 of [55].) To study signalling of a single bit's worth of information, it suffices to let $|\psi_1\rangle_{N\mathcal{H}_1} = \frac{1}{\sqrt{2}}\left(|0\rangle_N |0\rangle_{\mathcal{H}_1} + |1\rangle_N |1\rangle_{\mathcal{H}_1}\right)$ for some orthonormal states $|0\rangle$ and $|1\rangle$. As discussed in [15,56], inequality (5.1) implies that the entanglement with $N$ can be recovered without use of the degrees of freedom of $S$. That means there is a unitary transformation $V$ on $S^c$ and a qubit subsystem $M$ of $S^c$ such that [...] for the maximally entangled state $|\Phi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. The ability to send entanglement to $S^c$ in this way is at least as strong as mere signalling, however. Working from (5.2), standard manipulations imply that if $\mathcal{H}_1$ were prepared in one of two orthogonal initial states $|0\rangle_{\mathcal{H}_1}$ and $|1\rangle_{\mathcal{H}_1}$ then there are orthogonal projectors $\Pi_0$ and $\Pi_1$ on $S^c$ such that [...] where $|\Psi^{(j)}(0)\rangle = |j\rangle_{\mathcal{H}_1} \otimes |\psi_2\rangle_{\mathcal{H}_2} \otimes \cdots \otimes |\psi_n\rangle_{\mathcal{H}_n}$. That is, the signal has been transmitted from $\mathcal{H}_1$ to $S^c$ with an average probability of error in the decoding of at most $4\epsilon$. These conclusions are illustrated in Figure 3.

Lieb-Robinson bounds for nonlocal interactions
As has been the case throughout the paper, the state space will have the form $\mathcal{H} = \mathcal{H}_1 \otimes \cdots \otimes \mathcal{H}_n$. Suppose that the Hamiltonian has the two-body form $H = \sum_{\langle x,y\rangle} H_{\langle x,y\rangle}$, where the sum is over unordered pairs of sites $\langle x,y\rangle$ and each of $x$ and $y$ ranges from 1 to $n$. Each term $H_{\langle x,y\rangle}$ acts only on $\mathcal{H}_x \otimes \mathcal{H}_y$. We can associate to such a Hamiltonian an interaction graph $G = (V, E)$ with $n$ vertices representing Hilbert spaces $\mathcal{H}_1, \ldots, \mathcal{H}_n$ and edges connecting vertices $x$ and $y$ if the two-body interaction term $H_{\langle x,y\rangle}$ is nonzero. The antiferromagnetic Ising interactions discussed in Section 4 are a special case, and the graph of Figure 2 is, of course, the interaction graph. Denote by $D$ the maximum degree of any vertex in the interaction graph. Let us further impose the constraint $\|H_{\langle x,y\rangle}\| \leq c/D$ on the strength of pairwise interactions for some constant $c$. Physically, this constraint ensures that the energy per degree of freedom will remain finite for all states even in the limit $n \to \infty$.
For $X \subseteq \{1, 2, \ldots, n\}$, denote by $\mathcal{A}_X$ the algebra of bounded norm operators acting on $\mathcal{H}_X$. We start by discretizing time into steps of size $\epsilon = t/N$ for some large integer $N$ and let $t_j = j\epsilon$. Then, [...]
Figure 3. Scrambling implies signalling. Site 1 is prepared in one of two orthogonal states $|j\rangle$ for $j$ either 0 or 1. All other sites are prepared in states that are independent of $j$. After the scrambling time $t_*$ any subsystem $S$ of size at most $\kappa n$ will be essentially independent of $j$, but the reduced states $\Psi^{(j)}_{S^c}(t_*)$ on the complementary subsystem $S^c$ will be nearly orthogonal. Scrambling therefore implies signalling from the first site to the complementary system $S^c$.

The observable $O_A$ evolves after time $\epsilon$ to [...]
The norm of each term of the sum in (5.4) can be bounded from above using [...] Hence, we have [...] (5.5) In the limit $\epsilon \to 0$, the above expression becomes the inequality [...] We now specialize to the case where $B$ is the singleton set $\{y\}$. Fixing attention on a particular $O_B \in \mathcal{A}_y$, define [...]
Figure 4. [...] There is therefore a path with the following edges: $\langle x, z\rangle$, $\langle z, z_1\rangle$, $\langle z, z_2\rangle$, $\langle z_2, y\rangle$ and $\langle y, z_4\rangle$.
If the subsystem $X$ in the inequality (5.6) is $A$, we have [...] whereas for $X = \langle x, z\rangle$ we obtain [...] (5.9) By using the above bound iteratively in (5.8), we find [...] On a graph of maximum vertex degree $D$, the $i$th sum in the right hand side of (5.11) has at most $4(4D)^{i-1}$ terms, which can be seen by a simple combinatorial argument. The sums that appear have the form [...] (5.12) One can think of terms in the above sum as paths made from edges that connect $y$ and $x \in A$, as illustrated in Figure 4. A path is made by identifying a vertex in each pair $\langle z_j, z'_j\rangle$ with a vertex in $\langle z_{j+1}, z'_{j+1}\rangle$. Once a vertex $z_j$ is identified with some $z_{j+1}$, there is a maximum of $D$ different choices for $z'_{j+1}$ because the interaction graph has maximum degree $D$. The path starts either at $x$ or $z$ and ends either at $z_{i-1}$ or $y$. For each of these four cases, it is not hard to see that the number of paths of length $i$ is less than $(4D)^{i-1}$. Therefore, the overall number of terms in the sum (5.12) is always less than $4(4D)^{i-1}$.
Moreover, from the constraint ‖H_⟨z,z'⟩‖ ≤ c/D on the strength of two-body interactions, it follows that each term is bounded above by (c/D)^i. Therefore, In the case of a fully connected graph, D = n − 1, which would seem to force logarithmic scaling of the signalling and, therefore, of the scrambling time. Unfortunately, as discussed in Section 5.1, scrambling only implies signalling to S^c so we must take A = S^c, and systems of size larger than n/2 don't scramble, so |S^c| ≥ n/2. Naïve substitution into (5.14) then yields no bound at all on the scrambling time so further analysis will be necessary.
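The path counting above can be checked numerically. The following is a minimal sketch, not a reproduction of the paper's inequality: it assumes that the i-th order term of the iterated bound carries the usual Dyson-series weight t^i/i!, and multiplies it by the path count 4(4D)^(i−1) and the per-path weight (c/D)^i quoted in the text. Under that assumption the series resums to (e^{4ct} − 1)/D, so the time needed for the bound to reach a fixed threshold grows like log D. The function names, the threshold `delta` and the value of `c` are illustrative choices.

```python
import math

def lr_series(t, D, c=1.0, max_order=200):
    """Sum over i >= 1 of [4 (4D)^(i-1)] * (c/D)^i * t^i / i!  (assumed form, t > 0)."""
    total = 0.0
    for i in range(1, max_order + 1):
        log_paths = math.log(4.0) + (i - 1) * math.log(4.0 * D)  # path count
        log_weight = i * math.log(c / D)                          # norm of each path
        log_time = i * math.log(t) - math.lgamma(i + 1)           # assumed t^i / i!
        total += math.exp(log_paths + log_weight + log_time)
    return total

def lr_closed_form(t, D, c=1.0):
    """Resummed series: (exp(4 c t) - 1) / D."""
    return math.expm1(4.0 * c * t) / D

def signalling_time(D, delta=0.1, c=1.0):
    """Time at which the resummed bound first reaches delta."""
    return math.log1p(delta * D) / (4.0 * c)

if __name__ == "__main__":
    for D in (10, 10**3, 10**6):
        print(D, lr_series(0.5, D), lr_closed_form(0.5, D), signalling_time(D))
```

Working in logarithms keeps the large path counts from overflowing; the cancellation between D^(i−1) from the counting and D^(−i) from the coupling strength is what leaves the overall factor of 1/D.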

Scrambling highly mixed initial states
It's interesting to note that (5.14) does yield a logarithmic lower bound for the type of scrambling relevant to information retrieval from highly entangled black holes. This paper has thus far focused exclusively on pure initial states for H. Replacing |Ψ(0)⟩ with a state pure on H_1 and maximally mixed on H_2 through H_n corresponds to a different communication scenario. The retrieval of the information stored in H_1 would need to make use of some degrees of freedom S^c ⊆ {1, 2, . . . , n} supplemented by the environmental degrees of freedom required to "purify" the initial state. When the initial state is so highly mixed, however, it is possible to scramble many more degrees of freedom than when the initial state is pure, leading to a much smaller S^c. The resulting signalling scenario is illustrated in Figure 5. Brownian quantum circuits, for example, will scramble subsystems S of size n − O(1), leaving a constant-sized complementary system S^c with |S^c| = O(1). Because the environmental degrees of freedom don't participate in the interaction, one can take |A| = |S^c| = O(1) and recover the logarithmic lower bound on scrambling from (5.14). Moreover, it is necessary to consider these larger systems: numerical investigations show that it is possible to scramble any constant fraction of the degrees of freedom in constant time if the initial state is highly mixed.

Figure 5. Scrambling implies signalling for mixed initial states. Site 1 is prepared in one of two orthogonal states |j⟩ for j either 0 or 1, and all other sites are prepared in states that are independent of j and highly mixed. These mixed states can be viewed as parts of pure states that are entangled with environmental degrees of freedom E_2 through E_n. When the initial states are maximally mixed, it is possible to scramble subsystems S of size n − O(1). This leads to signalling to the complementary degrees of freedom S^c, adjoined with the environmental degrees of freedom E = E_2 · · · E_n. That is, the states Ψ^(j)_{S^c E}(t_*) are nearly orthogonal to each other. Because S^c can be taken to be constant-sized, the Lieb-Robinson bound provides nontrivial lower bounds on the signalling, and hence scrambling, time in this setting without the need for additional argument.

Controlled mean-field approximation via Lieb-Robinson
Having proven the Lieb-Robinson bound, we now prove that up to times of order log(D), the reduced density matrix on each site x is close to a pure state. Since scrambling requires entanglement, this will provide the desired lower bound on the scrambling time. Since D is the maximum vertex degree, this evaluates to an order log(n) lower bound for Hamiltonians in which every degree of freedom interacts with a constant fraction of all the others.
A slightly subtle point is that all of a system's single site density operators can in principle be close to pure even if the wavefunction of the whole system is not. The issue is that the number of sites, n, is large, and the overlap of the true wavefunction with the mean-field pure product state can easily be a factor exponentially smaller in n than the corresponding single-site overlap. The analysis of this section will therefore not imply that the wavefunction of the whole system is product up to times of order log(D).
We begin by defining a time-dependent "mean-field" Hamiltonian where each operator H^MF_x is supported on site x. We define the operators H^MF_x self-consistently as follows. Let Ψ^MF_x(t) be the reduced density matrix on site x at time t assuming that the state is initialized to a product state |Ψ(0)⟩ = |ψ_1⟩_{H_1} ⊗ · · · ⊗ |ψ_n⟩_{H_n} at time t = 0 and evolves under Hamiltonian x,y H x,y again assuming that the state is initialized to the product state |Ψ(0)⟩ at time t = 0. We now prove that, for t small compared to log(n), Ψ_x(t) is close (in trace norm distance) to Ψ^MF_x(t). For notational convenience, we will write ⟨O⟩ to indicate ⟨Ψ(0)|O|Ψ(0)⟩. Further, we define a unitary U^MF_x(t) to define the mean-field evolution on site x by and That is, U^MF_x(t, s) describes mean-field evolution from time s to time t. Since the mean-field time evolution on all of H for time t has the form ⊗_{x=1}^n U^MF_x(t), it can never generate any entanglement between different sites. For as long as it remains a decent approximation to the true time evolution, scrambling will be impossible.
Similarly, we define U (t) to be the unitary describing evolution under H, with U (0) = I, (5.22) and In proving the Lieb-Robinson bound above in Section 5.2, we used the Heisenberg notation for operator evolution: O(t) denoted U (t) † OU (t). In this section, we will not use this Heisenberg notation, and we will instead explicitly write out U (t) or U (t) † to describe evolution of operators or states. The reason for this is that we are going to evaluate the expectation values of operators whose time-dependence is not necessarily given by conjugation by U (t), so that the parenthetical (t) could be ambiguous if we were to use it to denote Heisenberg evolution.
Consider any operator O x supported on site x. For any two times, t i and t f , we have This equation can be proven by differentiating the right-hand side with respect to t i and verifying that the result is equal to the right-hand side multiplied by i and commuted with H. Call the first and second terms on the right-hand side T 1 and T 2 respectively. When T 1 is differentiated with respect to We will apply this equation to the specific case of the time-dependent operator O x = 1 − Ψ M F x (t), using it to compute the expectation value and R x,y (s) is defined by Eq. (5.28). Then, Ψ M F y (s)L x,y (s) = 0 and similarly R x,y (s)Ψ M F y (s) = 0.
Taking into account Eq. (5.27) as well as the definitions of L and R, we can replace H − H^MF_x(s) in Eq. (5.25) with a sum over y of L_{x,y}(s) + R_{x,y}(s). This gives a sum over y of a sum of two terms (the L and R terms). Consider an L term for given y, s. This is Similarly, for an R term, we write For an L term, we apply Eq. We proceed iteratively in this fashion, getting an infinite series of terms. Each term in the series at a given order, say the k-th order, involves a k-fold integral over s_1, s_2, . . . , s_k, with 0 ≤ s_1 ≤ · · · ≤ s_k ≤ t. Further, each term in the series has a sum over k different sites y_1, y_2, . . . , y_k and finally each term has a sum over k different choices of L or R terms. Our goal is to bound the expectation of the sum of terms at k-th order. Each such term will have one operator 1 − Ψ_{y_k}(s_k) in it. This operator may be in the middle of a sequence of terms. Suppose the last term was an L term. Then we have some expectation value for some operators P, Q. We commute U^MF(s)† (1 − Ψ^MF_{y_k}(s)) U^MF(s) through P using the Lieb-Robinson bounds above. Note that the reason that we choose to commute through P rather than through Q is that whenever the last term is an L term, one of the operators in Q is L_{y_{k−1},y_k}. We would not be able to bound the associated commutator since Q has support on y_k. Conversely, if the last term was an R term, we commute to the right through Q instead. Note that for any operator S. Therefore, the expectation value Eq. (5.32) is bounded by the commutator in the case that the last term was an L term. (Similarly, it is bounded by a commutator with Q in the case of an R term.) To bound this commutator, we consider two different cases. First, there is the case that y_k ≠ y_i for all 1 ≤ i < k. In this case, we can bound the commutator by (const./D)^k × (k/D) × exp(const. × t) using the Lieb-Robinson bound from Section 5.2, which contributes a factor of const. × (1/D) × exp(const. × t).
The factor of k appears because P is a product of up to k different operators while the final factor of (const./D)^k comes from the fact that the norms of all of the operators L_{x,y} and R_{x,y} are bounded above by const./D. The case when y_k = y_i for some 1 ≤ i < k might seem to be more problematic because the Lieb-Robinson bound doesn't apply, but we will see below that this bad case happens infrequently enough to not affect the final conclusion.
To bound the sum over terms in the series at given order, we note that the sum over choices of y_1, . . . , y_k decomposes into these same two cases. The sum in the first case, when y_k ≠ y_i for all 1 ≤ i < k, is bounded by where the factor of (k/D) exp(const. × t) is due to the commutator bound, with the factor of 1/D^k that was present there cancelled by a D^k in the numerator arising from the sum over y_1, . . . , y_k. The factor of t^k/k! in Eq. (5.34) arises from integrating over the k different times 0 ≤ s_1 ≤ · · · ≤ s_k ≤ t. Summing over the different choices of L or R contributes an extra factor of 2^k which can be absorbed into the constant raised to the power k. In the second case, when y_k = y_i for at least one 1 ≤ i < k, the sum over y_i is bounded by const.
where the factor of k/D arises because any of the k − 1 different y_i for 1 ≤ i < k may be equal to y_k. (By constraining the choice of y_i we reduce the number of different choices for y_i in the sum.) So, the sum over all orders k is bounded by Recall that this is an upper bound on the quantity 1 − ⟨Ψ^MF_x(t)|Ψ_x(t)|Ψ^MF_x(t)⟩, the deviation of Ψ_x(t) from being a pure state. If the deviation is small at time t, the continuity of the von Neumann entropy implies that H(Ψ_x(t)) ≤ δ log dim H_x for some universal δ going to zero with the deviation [47]. The subadditivity property of H then implies that As discussed in Section 2.1, scrambling requires that H(Ψ_S(t)) be close to its maximal value of log dim H_S, which can only occur if the deviation of each Ψ_x(t) is significant. For this to happen, (5.35) requires that t be of order log(D), which is the desired lower bound on the scrambling time provided D ∼ n. (Note that it is equally possible, if slightly more technical, to supply a dimension-independent argument.)
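The order-by-order structure described above can be made concrete with a small numerical sketch. It is an illustration under stated assumptions rather than the paper's bound (5.35): every unspecified "const." is set to a single hypothetical constant `c`, only the first-case contribution (c t)^k/k! × (k/D) × exp(c t) is summed, and each site is taken to be a qubit so that a deviation d ≤ 1/2 caps the single-site entropy at the binary entropy of d. Subadditivity then caps the entropy per site of any subsystem by the same number. All function names are illustrative.

```python
import math

def deviation_bound(t, D, c=1.0, max_order=300):
    """Sum over k >= 1 of (c t)^k / k! * (k / D) * exp(c t)  (assumed form)."""
    total = 0.0
    time_weight = 1.0  # running value of (c t)^k / k!
    for k in range(1, max_order + 1):
        time_weight *= c * t / k
        total += time_weight * (k / D) * math.exp(c * t)
    return total

def deviation_closed_form(t, D, c=1.0):
    """Resummed series: (c t / D) * exp(2 c t)."""
    return (c * t / D) * math.exp(2.0 * c * t)

def binary_entropy(p):
    """Shannon entropy of the distribution (p, 1 - p), in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def entropy_per_site_bound(t, D, c=1.0):
    """Cap on H(Psi_S)/|S| in bits for qubit sites, via subadditivity."""
    d = min(deviation_closed_form(t, D, c), 0.5)  # the cap is vacuous beyond 1/2
    return binary_entropy(d)

def time_to_reach(delta, D, c=1.0):
    """Bisection for the time at which the resummed deviation equals delta."""
    lo, hi = 0.0, 1.0
    while deviation_closed_form(hi, D, c) < delta:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if deviation_closed_form(mid, D, c) < delta:
            lo = mid
        else:
            hi = mid
    return hi

if __name__ == "__main__":
    for D in (10**2, 10**4, 10**6):
        t_dev = time_to_reach(0.1, D)
        print(D, t_dev, math.log(D) / 2.0, entropy_per_site_bound(0.5 * t_dev, D))
```

In this toy version the time at which the deviation becomes appreciable tracks (1/2c) log D up to a slowly varying correction, which is the logarithmic behaviour the argument relies on.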

Sparse graphs
If the degree D is constant or even scaling sublinearly with n, then (5.35) might not be a useful bound. For sufficiently slowly growing D, however, it is possible to substitute the more traditional Lieb-Robinson bound for the version proved in Section 5.2. Specifically, the version of the bound proved in [28] ensures that for some positive constants v and ξ. The function d(A, B) measures the distance from A to B in the interaction graph so the interpretation of (5.37) is that there is a maximum effective velocity v of information propagation between degrees of freedom. For complete graphs, the bound is trivial, but not for graphs of lower connectivity.
In particular, there can be at most D^l vertices at distance exactly l from any fixed vertex. It follows that at most a fraction α of all pairs of vertices x and y can satisfy d(x, y) ≤ log(αn)/ log D. Therefore, most x and y satisfy d(x, y) ≥ O(log n/ log D). Substituting into (5.37) and comparing with (5.13) implies that the signalling time between x and y must satisfy For regular graphs, in which every vertex has degree D, this reasoning can even be extended to the scrambling time t_*. From the mean-field argument, we already know that t_* ≥ O(log D). A direct application of Lieb-Robinson, however, requires that t_* ≥ O(log n/ log D). To see this, fix x and let S be the set of all sites y such that d(x, y) ≤ log n/ log(D − 1) + const. This will be a constant fraction of all the sites. Different initial states at site x are eigenstates of rank one projectors acting on that site. By a standard argument [54], (5.37) ensures that for times t < d(x, S^c)/v − const., the time-evolved projectors will be well-approximated by operators acting only on S, in which case the different initial states can be distinguished by measurements on S alone, which is inconsistent with scrambling. Optimizing over D as in (5.38) yields t_* ≥ O(√(log n)).
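The optimization over D can be illustrated with a short scan. This is a sketch with every unspecified constant set to 1, which is an assumption made only for illustration: it combines the mean-field bound log D with the distance bound log n / log D, takes the larger of the two for each degree, and then looks for the degree at which this combined bound is weakest. The function names and the grid resolution are hypothetical.

```python
import math

def combined_lower_bound(log_D, log_n):
    """Larger of the mean-field bound and the graph-distance bound (constants set to 1)."""
    return max(log_D, log_n / log_D)

def weakest_case(log_n, grid=10_000):
    """Scan 0 < log D <= log n and return (log D, bound) where the bound is smallest."""
    best_log_D, best_bound = None, float("inf")
    for step in range(1, grid + 1):
        log_D = log_n * step / grid
        bound = combined_lower_bound(log_D, log_n)
        if bound < best_bound:
            best_log_D, best_bound = log_D, bound
    return best_log_D, best_bound

if __name__ == "__main__":
    for log_n in (25.0, 100.0, 400.0):
        log_D, bound = weakest_case(log_n)
        print(log_n, log_D, bound, math.sqrt(log_n))
```

The two bounds cross at log D = √(log n), so the weakest guaranteed scrambling time over all degrees scales as √(log n), matching the statement in the text.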

Conclusions
We have explored two aspects of the fast scrambling conjecture, both of which are implicit in the statement that the most rapid scramblers take a time logarithmic in the number of degrees of freedom.
For the statement to be true, there must exist systems scrambling quickly enough to saturate the bound. Conversely, no system should be capable of scrambling in time faster than logarithmic. We demonstrated that Brownian quantum circuits and the Ising model on sparse random graphs both scramble information in logarithmic time. Each example, however, has its own deficiencies, not quite meeting the objective of finding a time-independent Hamiltonian that scrambles all locally available information in logarithmic time. Namely, Brownian quantum circuits are not actually described by a time-independent Hamiltonian, while the Ising model only scrambles information in one basis, leaving the conjugate basis invariant. Nonetheless, the examples illustrate that the entanglement creation required for scrambling can indeed be accomplished in logarithmic time without the need for an intricately structured Hamiltonian. Finding a completely fast scrambling time-independent Hamiltonian remains an open problem. While it's simple enough to write down plausible candidates, analyzing them is a challenge.
To find limits on scrambling, we used Lieb-Robinson techniques to prove a general lower bound on the scrambling time of arbitrary quantum systems with two-body interactions. The strategy was to estimate the amount of time required to signal in such systems, which in turn bounds the amount of time required to scramble. Mathematically, we used a modified Lieb-Robinson bound to argue that for sufficiently small times, a mean-field approximation to the single-site evolution is a good approximation. If most pairs of systems interact with terms of comparable norm in the Hamiltonian, the result is a logarithmic lower bound on the scrambling time. The same bound applies to four-body Hamiltonians with structure similar to the BFSS matrix model. However, our argument does contain a loophole: in the general case of graphs with lower connectivity, we could only prove a requirement that the scrambling time be at least O(√(log n)), although we strongly suspect that this is only a reflection of the limitations of our technique.
One of the lessons of this investigation is that some plausible mathematical formulations of the conjecture are false. In the case of the Ising model, for example, the scrambling time ratio τ_* = t_*/t_*^(1), which a priori one might have thought should also grow at least logarithmically with the number of degrees of freedom, is parametrically smaller. More subtly, the fast scrambling conjecture is formulated in terms of pure initial states and scrambling sets S of size |S| = κn for constant κ. The argument for rapid release of information from highly entangled black holes, however, requires starting from a mixed initial state and studying larger scrambling sets S of size n − O(1) instead of just κn. We have found logarithmic lower bounds on the scrambling time in both cases but not using identical reasoning. The pure state scenario, perhaps surprisingly, was more difficult to analyze.
The understanding gained here should ultimately be helpful in properly formulating and evaluating the scrambling time of matrix quantum mechanics or other models of black holes. The correct analog of the simple decomposition into subsystems used here already poses a bit of a puzzle. Likewise, since some initial configurations are known not to scramble quickly, care is required in identifying the set of states that are rendered locally indistinguishable by the dynamics. The correct analog of "local information" should be physically well-motivated and basis-independent. The reward for resolving these issues will be great: a microscopic description of information leakage from black holes and, more generally, a deeper understanding of how nonlocal degrees of freedom in quantum gravity can be reconciled with the causal nature of semiclassical physics.
N0001480811249. DS was supported by the United States NSF under the GRF program, as well as grant 0756174.

A Equations of motion for Brownian quantum circuits
In this appendix we describe in detail the dynamics of the purity of the subsystem S as it evolves according to a Brownian quantum circuit. Our starting point is the equation of motion for Ψ S (t). This can be found by tracing out the degrees of freedom in S c in (3.5): The right hand side of this equation of motion consists of a noisy part and a noiseless part We'll deal with both of these terms in turn. First, the noisy part ( †) can be reduced to and we have omitted tensor products with the identity to make the expressions more compact. The noiseless part ( † †) can be rewritten as which expands to a form that distinguishes different contributions: Reassembling the pieces yields the final equation of motion for Ψ S (t): By another application of Ito's rule, the equation of motion for the purity h S (t) can be derived from the relation dh S (t) = 2 tr(Ψ S (t)dΨ S (t)) + tr((dΨ S (t)) 2 ).
Because of the number of terms, it will be necessary to work with the equation of motion in pieces, as we did for Ψ S (t): dh S (t) = ( * ) + ( * * ) + ( * * * ), (A. 10) where ( * ) and ( * * ) are, respectively, the noisy and noiseless parts coming from the first term in (A.9), and ( * * * ) is the contribution of the second term. Firstly, ( * ) is given by There is no need to simplify this term any further because it will average to zero when we consider h S . The second term is more important for what follows: Finally, ( * * * ) is just tr((dΨ S (t)) 2 ): which simplifies to After straightforward manipulations the expression further reduces to Combining ( * ), ( * * ) and ( * * * ) then averaging over the realizations of the Brownian motion yields the following system of coupled ODE's:

B Solutions of the purity ODE system
This appendix discusses solutions of the system of ODEs We have investigated these equations numerically with initial conditions h_k = 1, and found a logarithmic behavior in the ratio of scrambling times τ_* = t_*^{κn}/t_*^1 ∼ log n. Here, we will give a heuristic a) where 2F1 is the Gaussian hypergeometric function. These eigenvectors blow up in the limit k → ∞ unless λ = −3j with j a non-negative integer. The general solution to (B.2) in the limit n → ∞ is given by At late time, the largest contribution comes from the zero eigenfunction, which selects a_0 = 1. We can get a sense for the relaxation time by examining the eigenfunction corresponding to the second eigenvalue, namely the term with j = 1. Direct evaluation of the hypergeometric function (which reduces to a polynomial in the above case) shows that the contribution of the j = 1 eigenvalue is proportional to 2^{−k} k a_1 e^{−3τ}. Provided that the first correction qualitatively reflects the higher order corrections (which it does if a_j decreases appropriately with j), we find t_*^k ∼ log k, so that τ_* ∼ log n.
Next, we turn to a numerical study of the eigenvectors for subsystems of larger k/n. Similarly, the solutions will have the general form h_k(t) = Σ_{j=1}^n a_j e^{λ_j(n) t} A_j(k, n), (B.5) where the λ_j(n)'s are eigenvalues of the matrix B (and therefore k-independent), and the A_j(k, n) are the corresponding eigenvectors. It is only the largest nonzero eigenvalue and eigenvector that are important for the scrambling time. As can be seen in Figure 6, numerical results suggest that the largest nonzero eigenvalue is λ_1 ≈ −3/n and that its corresponding eigenvector is A_1(k) ∼ 2^{−k} k^α for α ∼ O(1).

C r-body interactions and the BFSS matrix model
Here, we revisit the Lieb-Robinson argument presented above for systems with r-body nonlocal interactions. The Hamiltonian for such systems has the form H = Σ_X H_X, where the sum is over subsets of maximum size r and H_X acts on ⊗_{x∈X} H_x. We will restrict our analysis to systems where r is a constant, not a function of n. In analogy with the interaction graphs introduced in Section 5. where the indices a and b range from 1 to 9 and the M^a are n by n traceless Hermitian matrices. The degrees of freedom M^a_ij are indexed by triples (a, i, j) with i ≤ j. The operators in the Hamiltonian have unbounded norm, so strictly speaking the Lieb-Robinson approach cannot be used. In this section we nonetheless proceed formally as if the operators had bounded norm in order to determine whether the counting is consistent with a logarithmic signalling time.
The kinetic term Ṁ^a_ij Ṁ^a_ji in (C.1) is a single-body interaction, whereas the potential term is comprised of 4-body interactions of the form Repeating the same arguments as in the case of two-body interactions, for hypergraphs we find the inequality: 3 where Z is any multiset 4 of one or four degrees of freedom that has a nonzero contribution H_Z to the Hamiltonian. C_B(Z, t) itself is bounded from above by The contribution of each degree of freedom (a, i, j) to the energy is bounded by

Figure 7. The interaction hypergraph of the BFSS matrix model includes hyperedges that contain one or four vertices. The Lieb-Robinson bound in (C.5) is found by summing over a set of hyperedges that contain a path between y and A. This figure illustrates a typical path connecting y and A with seven hyperedges.
Next, we focus on counting the number of terms in the sum on the right hand side of (C.8). Denote this number by P_{i−k}. If p_(j,j+1) is the number of ways X_j can intersect X_{j+1}, then P_{i−k} ≤ p_(A,1) p_(1,2) p_(2,3) · · · p_(i−k−1,i−k). (C.9) Notice that each four-body interaction term M^{a_1}_ij M^{a_2}_jk M^{a_3}_kl M^{a_4}_li in the Hamiltonian has four indices i, j, k and l that run from 1 to n. Fixing one degree of freedom fixes two of these indices, while fixing a second degree of freedom leaves only one index. Therefore, p_(j,j+1) is order n^2 if y ∉ X_{j+1} and is order n if y ∈ X_{j+1}. Since y has to belong to X_j for some j, there are a maximum of P = O(n^{2(i−k)−1}) nonzero terms in the sum (C.8). Plugging this result back in (C.8) gives for some positive constant c . Now from (C.5) we find the inequality This finishes the "proof" that in the BFSS matrix model, signalling takes time at least t_signal ≥ O(log n). Of course, we have really just proved the weaker statement that a logarithmic lower bound holds for a related system with bounded operators in its Hamiltonian. It is therefore conceivable that this proof could be adapted to hold for the real BFSS Hamiltonian for all states in a low energy subspace. Alternatively, Lieb-Robinson bounds for lattice systems have been proved for some Hamiltonians containing unbounded operators [57]. Similar techniques might be applicable to the matrix model.