Tech and AIThis New Algorithm for Sorting Books or Files Is...

This New Algorithm for Sorting Books or Files Is Close to Perfection

-


The original version of this story appeared in Quanta Magazine.

Computer scientists often deal with abstract problems that are hard to comprehend, but an exciting new algorithm matters to anyone who owns books and at least one shelf. The algorithm addresses something called the library sorting problem (more formally, the “list labeling” problem). The challenge is to devise a strategy for organizing books in some kind of sorted order—alphabetically, for instance—that minimizes how long it takes to place a new book on the shelf.

Imagine, for example, that you keep your books clumped together, leaving empty space on the far right of the shelf. Then, if you add a book by Isabel Allende to your collection, you might have to move every book on the shelf to make room for it. That would be a time-consuming operation. And if you then get a book by Douglas Adams, you’ll have to do it all over again. A better arrangement would leave unoccupied spaces distributed throughout the shelf—but how, exactly, should they be distributed?

This problem was introduced in a 1981 paper, and it goes beyond simply providing librarians with organizational guidance. That’s because the problem also applies to the arrangement of files on hard drives and in databases, where the items to be arranged could number in the billions. An inefficient system means significant wait times and major computational expense. Researchers have invented some efficient methods for storing items, but they’ve long wanted to determine the best possible way.

Last year, in a study that was presented at the Foundations of Computer Science conference in Chicago, a team of seven researchers described a way to organize items that comes tantalizingly close to the theoretical ideal. The new approach combines a little knowledge of the bookshelf’s past contents with the surprising power of randomness.

“It’s a very important problem,” said Seth Pettie, a computer scientist at the University of Michigan, because many of the data structures we rely upon today store information sequentially. He called the new work “extremely inspired [and] easily one of my top three favorite papers of the year.”

Narrowing Bounds

So how does one measure a well-sorted bookshelf? A common way is to see how long it takes to insert an individual item. Naturally, that depends on how many items there are in the first place, a value typically denoted by n. In the Isabel Allende example, when all the books have to move to accommodate a new one, the time it takes is proportional to n. The bigger the n, the longer it takes. That makes this an “upper bound” to the problem: It will never take longer than a time proportional to n to add one book to the shelf.

The authors of the 1981 paper that ushered in this problem wanted to know if it was possible to design an algorithm with an average insertion time much less than n. And indeed, they proved that one could do better. They created an algorithm that was guaranteed to achieve an average insertion time proportional to (log n)2. This algorithm had two properties: It was “deterministic,” meaning that its decisions did not depend on any randomness, and it was also “smooth,” meaning that the books must be spread evenly within subsections of the shelf where insertions (or deletions) are made. The authors left open the question of whether the upper bound could be improved even further. For over four decades, no one managed to do so.

However, the intervening years did see improvements to the lower bound. While the upper bound specifies the maximum possible time needed to insert a book, the lower bound gives the fastest possible insertion time. To find a definitive solution to a problem, researchers strive to narrow the gap between the upper and lower bounds, ideally until they coincide. When that happens, the algorithm is deemed optimal—inexorably bounded from above and below, leaving no room for further refinement.



Source link

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest news

Crypto․com Pushes Through 70 Billion CRO Re-Mint Despite Community Backlash

Crypto.com has greenlit the controversial re-minting of 70 billion CRO tokens, overriding widespread community opposition with a last-minute...

Canada Crypto Fund Are Exploding: Is Canada Bitcoin Reserve Next After Trudeau?

Canada crypto markets are heating up as interest in digital assets grows. Talks of a Canada Bitcoin reserve...

Advertisement

Binance memecoin platform Four Meme exploited again — this time for $130K

Four Meme suspended its launch function and conducted an emergency probe after suffering an “attack” today. Source link

How to Protect Your Cats (and Backyard Chickens) From Bird Flu

For cats who enjoy spending time outside, Feah says that leashed walks are a good option. She also...

Must read

You might also likeRELATED
Recommended to you