Many aspects of modern applied research rely on a crucial algorithm called gradient descent. This is a procedure typically used to find the largest or smallest values of a particular mathematical function, a process known as optimizing the function. It can be used to calculate anything from the most profitable way to manufacture a product to the best way to assign workers to teams.
Yet despite this widespread usefulness, researchers have never fully understood which situations the algorithm struggles with most. Now new work explains it, establishing that gradient descent, at heart, tackles a fundamentally difficult computational problem. The new result places limits on the kind of performance researchers can expect from the technique in particular applications.
“There is a kind of worst-case hardness that deserves to be known,” said Paul Goldberg of the University of Oxford, co-author of the work with John Fearnley and Rahul Savani of the University of Liverpool and Alexandros Hollender of Oxford. The result received a Best Paper Award in June at the annual Symposium on Theory of Computing.
You can imagine a function as a landscape, where the elevation of the land equals the value of the function (the “benefit”) at that particular location. Gradient descent searches for a local minimum of the function by finding the direction of steepest ascent at a given location and moving downhill, away from it. The slope of the landscape is called the gradient, hence the name gradient descent.
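The downhill walk described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bowl-shaped landscape `f`, its gradient `grad_f`, the step size, and the stopping tolerance are all hypothetical choices made for the example, not anything taken from the research.

```python
import math


def f(x, y):
    """Hypothetical landscape: a bowl whose lowest point is at (1, -2)."""
    return (x - 1.0) ** 2 + 2.0 * (y + 2.0) ** 2


def grad_f(x, y):
    """Gradient of f: it points in the direction of steepest ascent."""
    return 2.0 * (x - 1.0), 4.0 * (y + 2.0)


def gradient_descent(grad, start, step=0.1, tol=1e-9, max_iters=10_000):
    """Repeatedly step against the gradient until the slope is nearly flat."""
    x, y = start
    for _ in range(max_iters):
        gx, gy = grad(x, y)
        if math.hypot(gx, gy) < tol:  # almost no slope left: a local minimum
            break
        x, y = x - step * gx, y - step * gy  # move downhill
    return x, y


x_min, y_min = gradient_descent(grad_f, start=(5.0, 5.0))
print(f"approximate minimum at ({x_min:.4f}, {y_min:.4f})")
```

On a smooth bowl like this one the walk settles quickly. The step size is a design choice: too large and the walk can overshoot, too small and it crawls.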
Gradient descent is an essential tool in modern applied research, but there are many common problems for which it does not work well. Before this research, there was no comprehensive understanding of exactly what makes gradient descent struggle, and when. Those are questions that another area of computer science, known as computational complexity theory, has helped answer.
“Much of the work on gradient descent did not engage with complexity theory,” said Costis Daskalakis of the Massachusetts Institute of Technology.
Computational complexity is the study of the resources, often computing time, required to solve various computational problems or to verify their solutions. Researchers sort problems into different classes, with all problems in the same class sharing some basic computational characteristics.
To take an example that is relevant to the new paper, imagine a town where there are more people than houses and everyone lives in a house. You are given a phone book with the names and addresses of everyone in town, and you are asked to find two people who live in the same house. You know you can find an answer because there are more people than houses, but it might take a bit of searching (especially if they don’t share a last name).
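As a minimal sketch, the phone-book search can be written as a single pass that remembers who has been seen at each address. The function names `find_housemates` and `is_valid_answer` and the sample entries are hypothetical. With an explicit list like this the scan is quick; this example only illustrates why an answer must exist and how easily a proposed answer can be checked.

```python
def find_housemates(phone_book):
    """Return two different entries that share an address, or None.

    phone_book is a list of (name, address) pairs. If there are more
    people than houses, some address must repeat, so an answer exists.
    """
    first_seen = {}  # address -> first name listed at that address
    for name, address in phone_book:
        if address in first_seen:
            return first_seen[address], name
        first_seen[address] = name
    return None


def is_valid_answer(phone_book, pair):
    """Check a proposed answer: two different people at one address."""
    addresses = dict(phone_book)
    a, b = pair
    return a != b and addresses[a] == addresses[b]


# Hypothetical town: three people, two houses.
book = [("Ana", "1 Elm St"), ("Ben", "2 Oak Ave"), ("Cy", "1 Elm St")]
pair = find_housemates(book)
print(pair, is_valid_answer(book, pair))
```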
This question belongs to a complexity class called TFNP, short for “total function nondeterministic polynomial.” It is the set of all computational problems that are guaranteed to have solutions and whose solutions can be quickly checked for correctness. The researchers focused on the intersection of two subsets of problems within TFNP.
The first subset is called PLS (polynomial local search). This is a set of problems that involve finding the minimum or maximum value of a function in a particular region. These problems are sure to have answers that can be found by relatively simple reasoning.
One problem that falls under the PLS category is the task of planning a route that lets you visit a fixed number of cities with the shortest possible travel distance, given that you can only ever change the trip by switching the order of a pair of consecutive cities in the tour. It’s easy to calculate the length of any proposed route, and with a limit on how you can change the route, it’s easy to see which changes shorten the trip. You are guaranteed to eventually find a route that you cannot improve with an allowed move: a local minimum.
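The local search described above can be sketched as follows, assuming cities are points in the plane, the tour is a closed loop, and the only allowed move is swapping two consecutive cities. The four-city layout and the names `tour_length` and `local_search` are hypothetical, chosen only for illustration.

```python
import math


def tour_length(tour, coords):
    """Total length of the closed tour, returning to the starting city."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))


def local_search(tour, coords):
    """Swap consecutive cities while any such swap shortens the tour."""
    tour = list(tour)
    n = len(tour)
    best = tour_length(tour, coords)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            j = (i + 1) % n
            tour[i], tour[j] = tour[j], tour[i]  # try the allowed move
            length = tour_length(tour, coords)
            if length < best - 1e-12:
                best = length  # keep the shorter tour
                improved = True
            else:
                tour[i], tour[j] = tour[j], tour[i]  # undo the swap
    return tour, best


# Hypothetical cities at the corners of a unit square, visited in a bad order.
coords = {0: (0, 0), 1: (1, 1), 2: (1, 0), 3: (0, 1)}
start = [0, 1, 2, 3]
final_tour, final_length = local_search(start, coords)
print(tour_length(start, coords), final_tour, final_length)
```

Every accepted swap strictly shortens the tour and there are only finitely many tours, so the loop has to stop. The result is a local minimum under this move. On larger examples it need not be the shortest tour overall.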
The second subset of problems is PPAD (polynomial parity arguments on directed graphs). These problems have solutions that emerge from a more complicated argument, Brouwer’s fixed point theorem. The theorem says that any continuous function that maps a suitable region, such as a disk or a ball, back into itself is guaranteed to have a point that the function leaves unchanged, known as a fixed point. This holds in everyday life. If you stir a glass of water, the theorem guarantees that some particle of water must end up in the same place it started from.
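In one dimension the fixed-point guarantee is easy to see in code. The sketch below assumes a continuous function `f` that maps the interval [0, 1] into itself, and it uses bisection on f(x) - x. That is a swapped-in technique that works only in this one-dimensional special case, not a general method for the problems in PPAD. The choice of `math.cos` as the example function is hypothetical.

```python
import math


def fixed_point_1d(f, lo=0.0, hi=1.0, tol=1e-10):
    """Approximate a fixed point of a continuous f mapping [lo, hi] to itself.

    Because f(lo) >= lo and f(hi) <= hi, the quantity f(x) - x changes
    sign somewhere on the interval, and bisection narrows in on that spot.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) - mid >= 0.0:
            lo = mid  # a fixed point lies at or to the right of mid
        else:
            hi = mid  # a fixed point lies to the left of mid
    return (lo + hi) / 2.0


# math.cos maps [0, 1] into itself, so it must leave some point unchanged.
x_star = fixed_point_1d(math.cos)
print(x_star, math.cos(x_star))
```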