Data and Intuition

January 22, 2023

Memory is foundational to computation. Without memory, can you describe to a computer the computation you want it to perform?

Each of us could organize memory in a different way—for example, I could represent a number using five physical memory units, and you could decide to do so using four units. If we all organized memory differently, it would be difficult for us to share programs. Therefore it is best we agree upon a standard on how we structure memory. This is one of the reasons why the field of data structures exists.

In our shared goal of standardization, we could begin by agreeing on how to represent numbers in memory. That is, how do we structure numbers in memory? The way we represent numbers in memory is one of the ground-zero data structures. We could also define other "primitive" data structures such as a character and a string. (While a string is far from being a "primitive" data structure, let us abstract away the details for now.)

Now, we have structure to memory. I could tell the computer that, at memory location 0x02, I have a number, and it will know how exactly this number is structured. But is manually using memory addresses to keep track of numbers, and values in general, the best way to write programs? I could have a number at 0x04 that holds my weight, and a number at 0xFF that holds my pet's weight. In such a program, 0x04 and 0xFF do not lend themselves, at all, to any level of readability. Other programmers might ask me questions such as "What is the number you have stored at 0xFF all about? What does it represent?" In addition, manually dealing with memory locations, is, self-evidently, not suitable for higher-level programs. This is why we created "variables", or descriptive names that reference an address in memory, but that conceal all the low-level memory dealings under a layer of abstraction. I can now say myWeight = 70 and petWeight = 3, and this makes my program more readable and less prone to errors. Essentially, we created variables to provide us with a layer of abstraction over memory.

Going back to numbers, it is important to note that a "number" in itself is not a data structure--rather, it is an abstract data type, whose operations include storing and accessing numbers. In any case, we build on these primitive abstract data types and variables to create ever more abstract types that unlock higher-order thinking. To illustrate this point, let us examine how we think about, say, an "article". When we think about an article, we do not just think about it in isolation. Rather, its attributes, such as its title, the date on which it was published, and its author are pulled in into our minds. That is, an article does not exist in isolation—rather, it is a "collection of attributes", and very often, these attributes are primitive abstract data types. This is why we created "struct"s, which are abstract data types that are really a collection of several primitive types, all of which serve to describe a higher-order entity. Here's an example:

struct article {
	title
	author
	nWords
	. . .
}

myArticle = article{title: "Data Structures", author: "Ajay", nWords: "1000"}

Now, when I write a subroutine that computes an article's rating, I could pass it an "instance" of "article", which in this case, is myArticle: computeRating(myArticle). If I passed to this subroutine the attributes of an article separately, it wouldn't appeal to our higher-order intuition as much.

Now consider our having to process a hundred articles in our program. We can create a hundred instances of article, but how do we address each of these one hundred articles? Only imagine the tedium of creating those many variables! article1, article2, . . ., article100. Wouldn't this hurt readability of the program, in addition to the patience of the reader? Wouldn't it be splendid to store all these articles in a single variable? This is where "arrays" come in: they allow us to store a collection of values, without our having to name each one of them: articles = [100]article. articles allows us to reference each of its articles by using an index. I could refer to the first article by using articles[0], in a zero-indexed data structure, or articles[1] in a one-indexed data structure. By allowing us to access values by their index, arrays display their ordered nature. Arrays' inherent order unlocks operations like sorting values within an array. In our case, we could sort articles by their ratings, for example, that we obtained from the computeRating subroutine. We could then publish this sorted array, which now has the best articles on the top (or the bottom, depending on how it is implemented), on a website, and people can choose to read the best articles first!

See? As we move to higher and higher levels of thinking, we allow newer intuitions to form and unlock more possibilities of what computation can do. In other words, the way we think about data informs the algorithms that operate on them. Isn't that a thought-provoking idea? It means that if we want to devise a new way of solving a problem, we should perhaps think of the problem's substance differently, and move up and down on the scale of abstraction, unlocking new intuitions along the journey!