Counting the “uncountable”: what granular materials, machine learning and string theory have in common

A recent work I am familiar with has attracted quite a lot of attention in the media and I am very happy about it. Yet, a bitter taste remains eventually as most news reporting about it (due to lack of time and space most probably) did not do justice to the history of that work, why it is actually difficult and why it may be important in physics.

The work, published in Physics Review E, has been performed by an A-team of Cambridge researchers comprising S. Martiniani, J. Schrenk, J. Stevenson, D. Whales and D. Frenkel. One of the paper’s achievements is that it gives for the first time an estimate to the total number of different rigid structures that can be made out of N=128 “tennis balls” (from now on, we shall call them spheres) confined in a three dimensional box with prescribed dimensions. And the number is big: it is 10²⁵⁰ or, in other words, 1 followed by 250 zeros. This number is so big that it is one hundred billions of billions of billions of billions times bigger than the estimated number of atoms in the observable universe.

A non-trivial problem about counting

Now, the actual feat is not in the size of the number per se — physicists and mathematicians have been manipulating bigger numbers than that for a long time now — but lies in the ability to “count” up to such a big number for the particular problem the researchers were looking at. In fact, there does exist methods that enable one to estimate in a matter of seconds the number of all possible arrangements of spheres in a prescribed box but these arrangements won’t necessarily be mechanically rigid (cf. Fig. 1).

martiniani — Fig. 1: Example of a rigid structure made from spheres (taken from the preprint version of Martiniani et al. ‘s paper)

The difficulty is thus to count only those arrangements which have the right set of properties and this is a much tougher problem if one does not know a priori where or how to find them. The difference in the difficulty is quite like the difference between having to estimate the total area spanned by all the pages of a book and having to estimate how many times the word “obvious” appears in it. In the latter case, one needs a set of criteria to separate the target word from anything else in the book (the other words, spacings, figures etc…) and somehow devise a strategy to enumerate them all.

In technical jargon, searching for elements that satisfy a specified set of constraints is called a constraint satisfaction problem and the counting problems we are talking about (both for sphere packs and target words) belong to that class. However, in the case of the word-search problem the difficulty is said to be P-hard (“P” as “Polynomial” because doubling the number of pages of the book roughly doubles the search time) while the sphere packs problem is said to be NP-hard (roughly meaning that the time increases exponentially with the number of balls).

There are thus two aspects to the difficulty of counting the number of packs: 1) one needs to search for specific and yet-undetermined elements in a set and 2) the time for a complete search, albeit finite, grows exponentially with the number of spheres and the number of spatial dimensions.

The proposed solution

For various reasons, that will be subsequently addressed, physicists and mathematicians alike have been seriously interested in counting the number of stable sphere packs since at least the 1960s’. However, because of the great number of constraints required to get mechanically rigid arrangements, the first direct estimates were only obtained 40 years later with the use of extensive computer calculations. Even so, the exhaustive enumeration strategy used by these calculations was so tedious that in 2011 it could only give quite reliable estimates of the number of sphere packs in a box with prescribed dimensions for a number of spheres up to N=16 only and in 2D!

EnergyLandscape — Fig. 2: 3d graphical representation of a simple function of 2 variables (x and y) with hills and valleys: the z-axis is used to plot the value of the function for various values of x and y

To bypass this huge technical issue, a research project, initiated by Cambridge professor D. Frenkel in 2008, sought a different route to get tractable estimates of the number of sphere packs for N>16. The new strategy relied on the fact that most algorithms would look for stable mechanical packs of spheres by actually identifying them to minima of a particular potential energy function (whose details do not matter much) that depends on the positions of all the spheres. In this picture the function would graphically look like a landscape (in a very high dimensional space) with many valleys (the stable packs) separated by hills (see Fig. 2 for an example in 3d).

Without entering into too much details, this strategy is alright provided the spheres only interact via so-called conservative forces. With this picture in mind, Frenkel’s idea was that each minimum can be attributed a basin of attraction, a region of the space of spheres arrangements that ultimately lead the spheres into a rigid structure under a dynamics that forces the system to only go “downhill” in the landscape. In the example of Fig. 2, that would be akin to imagine it representing a real-life landscape in which a dropped ball would roll down under gravity to the bottom of one of the valleys.

LandscapeTiling — Fig. 3: Top view of the landscape of Fig. 2 with a color code representing the value of the function f(x,y): the reder the colour, the higher the value. (a) shows the lines of identical value of the function f and (b) shows the 13 visible basins of attraction with blue-ish centres

In Fig. 3(a) is represented a top view of the landscape of Fig. 2 with a color code instead of the z-coordinate to visualise the value of f(x,y). This representation is more suitable to draw the various basins of attraction existing in Fig. 2 as we can see in Fig. 3(b). Basically, upon following the prescription of only going downhill, any point within a region delimited by a dark line will inexorably go towards the bluest point of that region corresponding to a rigid structure configuration. Moreover, one notices that by stiching the basins together, they cover an area equal to the total area available in the landscape. Putting this property in a simple equation form gives (for our example with 13 basins):

A_total = a₁ + a₂ +..+ a₁₃

Now, if we divide the above equation by 13 we get:

A_total/13 = (a₁ + a₂ +..+ a₁₃)/13

and by introducing the mean basin area a_m = (a₁ + a₂ +..+ a₁₃)/13, we get the simple relation:

13 = A_total/a_m

which carries the essence of Frenkel’s idea: A_total is easy to calulate and we don’t necessarilly need to find all the basins to get a reliable estimate of a_m, so we can get an estimate of the total number of sphere packs without enumerating all of them. To see why not all minima are required, think about it this way: if we imagine an ideal case where all the basins would have the same area, then estimating the area of a single basin of attraction is enough to get a reliable estimate of the mean basin area and hence of the total number of basins/rigid structures. So, the narrower the basin area distribution, the smaller the number of basin areas we need to evaluate in practice.

The basis of this strategy as well as the principal tool to estimate the area of a basin of attraction were developped and eventually published in 2011 by N. Xu and collaborators and the first full implementation to estimate the total number of sphere rigid arrangements for N=128 spheres in 2D was carried out in a study by D. Asenjo, D. Frenkel and myself (with also an important help from the now independent Principia Marsupia‘s blogger Alberto Sicilia) published in 2014. After that, much work remained to be done to make these techniques able to estimate the number of sphere arrangements for “real” systems of spheres in 3D and to do so –with basically the same computing power more or less– required the use of better inference methods and a very efficient algorithmic design.

And this is exactly, among many other things, what S. Martiniani and his coworkers just realised in their recent work: by optimising the algorithmic performance and adding Machine Learning techniques to an already elaborate set of tools, they were able to get a 100 fold speedup for the “counting” strategy devised by Asenjo’s team, thus enabling them to probe for the first time more real-life system sizes up to N=128 spheres in 3D and getting the reliable estimate of 10²⁵⁰ rigid structures out of it.

Why does it matter?

I see you coming. You may wonder why does counting how many rigid structures can be made out of a bunch of balls in a box matter? There are at least three trains of thoughts to support the idea that it is an important result.

The problem of optimal packing of grains and the search for a granular “thermodynamics”

The practical problem of finding an optimal arrangement of grains certainly dates back much before the 17th century as it is logistically more efficient to transport materials which take the smallest volume possible. However, the first well documented occurence of a scientific inquiry related to the best way of arranging spheres so that they occupy the least amount of space appears in the work of Johannes Kepler in 1611 where he conjectured — meaning without proof — that the best arrangements for equally-sized spheres were the so-called hexagonally closed-packed and face-centred-cubic packings. This became known as the Kepler Conjecture and it puzzled mathematicians for a good 400 hundred years before a formal mathematical proof of the conjecture was announced by Hales and coworkers in 2014. Part of the difficulty of the proof was due, again, to the NP-hard character of the highest density problem which required to run an automated proof checking program for 10 years to reach completion of the proof.

Far from being an end point, solving the Kepler Conjecture is more of a starting point. Research interests about optimal packings now branch in two quite opposite directions: one consists in finding the best optimal filling of space for other geometrical figures such as tetrahedrons and much has to be done there, and the other concerns the problem of the existence of an optimal random packing, in other word, a density for a rigid arrangement of spheres such that any arrangement with a higher density will be less random.

Incidentally, the two above problems can be related to Martiniani’s work via an ongoing and much debated research program in theoretical physics that aims at devising thermodynamic-like laws for granular materials. As a matter of fact, the great many particles comprising a granular material make it behave sometimes like a fluid and some other times like a solid but not quite exactly in the same way as their molecular counterparts. Since the 1990s’, it has thus been considered a goal to find general laws governing the behaviour of granular systems in the same way that the laws of thermodynamics govern the behaviour of molecular matter. In particular, a heated discussion — which has yet to reach a consensus — has arisen regarding what should be the “right” definition of entropy for a granular material. In fact, it is precisely to discuss this problem that Martiniani and his coworkers estimated the accessible number of rigid sphere packings in various ways. Thus, regardless of the debates, choosing a definition for a granular entropy (and a way to calculate it) will set a standard to measure its randomness which, in turn, may enable one to give a non ambiguous answer to the optimal random packing problem. Likewise, if one is looking for the densest possible arrangement of arbitrary shapes, then knowing some granular entropy for all possible rigid pack densities can prove useful to frame the possibly highest packing value as the entropy may be expected to be the smallest at that point.

Machine Learning and global optimisation problems

The field of artificial intelligence has two major trends: automated reasoning based on predicate logic (used for example in language processing and automated proof checking) and machine learning whose goal is to infer a trend from a given data set (used for example for subjects as various as personalised advertisments, self driving vehicles or even medical diagnosis).

In the case of so-called supervised learning, the goal is to make a computer learn from a data set containing both input and outputs and once this is done see how the computer infers from it yet-unknown outputs for new data inputs on which it did not train before. Here, we are solely interested in the “learning” process. To illustrate how it works, the best example is that of finding the slope of a straight line that best matches a set of data points.

ScatterPoints — Fig. 4: examples of candidate curves to match some data points

Like in any teaching task, the teaching program supervising the learning process of the fitting program ought to rely on a marking scheme that is such that, in the case of Fig. 4, case (b) deserves a good mark and case (a) a bad one. In practice, this is often done by a penalty function which estimates some measure of the “distance” between the straight line and the points. If one were to plot this penalty function for all possible values of the curve slope, it would give a graph with at least one minimum corresponding to the curve that best matches the data. The learning process could simply try all the slope values and pick the one that gives the highest mark (lowest penalty) but this is not very efficient because in that case the teaching program simply returns the mark and nothing else. A better way to proceed is for the teaching program to provide feedback to the fitting program in addition to the mark so that the next try can only be better. The learning process then stops when the teaching program does not find a way to improve the score of the fitting program. Incidentally, this feedback strategy is exactly equivalent to forcing the fitting program to only go downhill along the penalty function so that it is bound to end up at the bottom of a valley.
For situations more complicated than the one depicted in Fig. 4, for instance if there are two inputs n₁ and n₂ and the data points do not have a simple linear trend, then if not chosen carefully the penalty function will look like the landscape of Fig. 2 with many hills and valleys. What it means is that the learning process will stop at the bottom of a valley but not necessarilly in the one with the highest score. In this context, seeking the lowest valley is called a global optimisation problem. Unless one is lucky enough for the penalty landscape to have a “funnel” shape, looking for the global minimum of the landscape requires to have an idea of how many valleys it possesses and this is where Martiniani’s work comes handy and is even a first in the field of optimisation problems as the method is very general and exact (in the sense that no particular assumptions are made along the way).

String theory and cosmology

From the ancient greece atomists to the popular view of atoms, they are often represented as tiny ping-pong balls with a handful of mechanical properties like size, mass, rigidity and “stickyness”. This type of fixed description has been very successful to explain most of the physics observed on Earth and is extensively used even today to perform Newtonian mechanics-based simulations of the phases of matter. Yet, everybody now learns in school that atoms are not the most elementary objects there is and that they are in fact made of a nucleus, itself comprising protons and neutrons, about which wonder much tinier objects called electrons. This has the profound consequence that the properties of a given atomic element are not set in stone but instead depend on the experimental circumstances (for the record, atoms can be put in so-called Rydberg states where their size can be of the order of a millimeter!). The success of the fixed atomic properties described above is then explained by the fact that they are just somewhat typical values corresponding to the circumstances under which we explore natural phenomena on Earth. With the Standard Model of particle physics, the story doesn’t stop there: proton and neutron are made of a quark-gluon mixture and a whole zoo of additional elementary particles have to be accounted for, each with their own fixed properties. In spite of its great success, some unsatifactory aspects of the Standard Model have pushed some physicists to undertake the development of a new theory where the fundamental building blocks of Nature are tiny wiggling strings. In this String Theory, particles of the Standard Model, as well as their parameters, are expected to emerge as being particular modes of oscillations of these strings. The twist is that String Theory needs 9 dimensions of space to be physically consistent. This sounds preposterous (and it somehow is) because all our experiences of the world solely account for 3 spatial dimensions. The usual solution is for 6 dimensions of space to be wrapped on themselves in such a way that they become effectively invisible. The problem is that there are many such viable compactifications (among many other background properties of the theory) and each one of them corresponds to a given universe with its own set of parameter values for the emerging particles of the Standard Model.

reality_landscape — Fig. 5: Sketch of the string landscape for 4 compactified spatial dimensions (picture taken from here). Each valley corresponds to a possible universe.

When expressed in terms of continuous parameters, this gives rise to a landscape in the space of the string theory background parameters where stable universes are at the bottom of the valleys in that landscape (as represented in Fig. 5). This begs the following question: how “typical” are the masses and other charges of the particles that we observe in our universe? This is to answer this type of questions that being able to count how many valleys there are in the landscape (and people expect a great many of them) or how many are there with a given set of properties is of primary interest for some arguments put forward to answer cosmological puzzles such as the Fine Tuning problem, the Cosmological Constant problem or even the evolution of the laws of physics themselves over cosmological times. Some practice steps forward for this counting have been made recently and I am confident that the general tools developed in Frenkel’s team and improved by Martiniani and coworkers in their most recent work will prove useful to get further insight into the number of possible universes within string theory.

As a conclusion, I hope I have convinced a bit the reader brave enough to reach this last sentence that there is more to the “tennis balls” packs counting problem (and proposed solution in a recent paper) than meets the eye as this problem bridges with many disciplines within physics and beyond it.

Fabien Paillusson