Tag: sorting algorithm

Idea #4 – Information Technology – Binary Sort

I happen to have two (slightly functionally different) descriptions of this idea stored away, so I’ll just copy both.


Description #1:

My (novel?) idea for a sorting algorithm. It doesn’t do any comparisons, and it works in O(n) time.

We do not compare any elements. We create a binary tree, and when navigating the tree we use the binary pattern of the element we’re currently sorting as the path. When we come to a 1 in the path and there’s no 1 branch, or a 0 in the path and there’s no 0 branch, we create branches until we get to our number. A node which contains 1 or 2 branches might also represent the endpoint of an element, as not all elements will necessarily have the same number of bits (for example, if we’re sorting strings), so some element’s bits may have ended at that point. Each node would store a count of how many times its value (i.e. what sorted element it encodes when taking with all its above nodes, if it’s taken as an end node) occurs in the list. That’s how we could have duplicate values and how a node with 1 or to branches can also be an element.

This has a worst-case scenario of O(n*l) where n is the length of the list and l is the average length of a member in bits, or O(t) where n is the total length of all the members in bits. An O(n log n) sort algorithm actually has to compare member values byte by byte, so for comparison it’s really O(n l/8 * log n), or O(n * l/16 * log n) if they’re comparing words, or O(n * l/32 * log n) if they’re comparing double-words. (Constants don’t matter in big-O notation, of course, but I want to illustrate a sense in which normal sorting algorithms are at a disadvantage relative to this algorithm.)

It might be more advantageous than other sorting methods to implement this one in assembly.

The algorithm could alternatively use separate binary trees for each number of bits in the given elements, but I don’t know how that would have any benefits.


Description #2:

A sorting algorithm that sorts in constant amortized time, but which is proportional to the average length, in binary, of the elements being sorted

Let’s say we have a function that returns a list of {0, 1} values which is a given element’s binary representation. Now we make an array of the size of our input list, which we will use as a binary tree. For each element in our list to be sorted, we navigate this tree using the element’s binary sequence as the path, creating new branches as necessary. When we get to the end, we insert a reference to the original object into the beginning of a linked list of references to objects belonging at that node (i.e., equivalent elements). (We put it at the beginning because that way we don’t have to traverse the whole linked list to find the endpoint.) Then, at the end of our sorting, we simply walk the tree and traverse its linked lists to return our sorted list. If we want equivalent objects to appear in the same order in which they appeared in the original list, we can append them to the end of the linked lists instead and always have the node store a pointer to the current last linked list item (so that we don’t have to calculate its location by traversing the linked list each time). Or, of course, we could just reverse the order of the linked list for each endpoint, because if we’re always inserting equivalent elements to the beginnings of the lists then they’ll be stored in reverse order.

This is only the naive version, though… we don’t really have to make the full path for every element. Instead of having a function that returns an object’s binary representation, we’ll have a function that returns an iterator for it. Then, instead of navigating/creating nodes to the end of the path, we navigate until we find a node that simply stores a pointer to an iterator that gets from there to an element that’s somewhere underneath it. Now, since a node can only contain one such iterator pointer, when you come to such a node you must poll values from both iterators (your current element’s and the one pointed to by that node) and create a path until the values diverge, at which point you create two children and put the pointer to the iterator that gets to your element and the pointer to your element in one child, and put the pointer to the iterator that gets to the element you found above and a pointer to its element in the other child. Then you’d nullify the element and iterator pointers in the node you originally found. The point of doing this is that we stop making the path early. If we don’t come across any element later that forces us to further bifurcate the path, we still know exactly where the elements pointed to belong in the sorted list. When you traverse the binary tree to actually sort the list, just insert elements wherever you find them pointed to by nodes.

If your iterator exhausts itself before you get to a node that points to another iterator and element, you delete the iterator object, nullify the iterator pointer and start the linked list of equivalent elements for that node.

Alternatively to the linked lists we could simply have a pointer to the first instance of an equivalent member and a count value that specifies how many times it repeats, but that only works with members that don’t have any data that’s not part of the binary representation we’re using to navigate the tree. Plain old strings or numbers would work just fine that way. Sorting person objects by their last names, for example, wouldn’t.

This algorithm isn’t generally suitable for mixed-type collections or any type that uses a special comparison function. Strings, ints, floats and fixed decimals should work fine. The above is optimized for strings. For a numerical type, all iterators that don’t stop short are exhausted at the same branching level (8, 16, 32, 64, etc.—whichever bit depth your numbers are), although that doesn’t change the algorithm a whole lot—maybe it provides for some minor optimizations. But because comparing two numbers might be less expensive than grabbing a value from a binary iterator, we could forgo the binary representations altogether for numbers. Instead of a node storing a pointer to an iterator and an element, it would simply store a number. That could be a huge improvement. And even if we still used binary representations, most numeric values probably start with lots of 0’s, so we could speed up the process by performing a BSF assembly command on it to determine the number of leading zeros, then jump directly to a node, skipping the first portion of the path constituting zeros, based on a table of pointers for numbers of leading zeros up to 8, 16, 32, 64, or whatever.


Idea #4B – Linked List Sort

Make a linked list of elements. For each element in your list to be sorted, traverse the linked list until you find the first element that’s bigger than it. Then insert it into that spot in the linked list. Of course, if you’ve gotten to the end of the linked list and no item was bigger than it, then just add it at the end. Then to gather the sorted results, just traverse the linked list.  This is basically an insertion sort but much more efficient because you don’t have to shift half the array for each insertion.