On Attention

Part 1: What is Attention?

Jul 16, 2024

Attention is one of those crucial concepts that remain a familiar unknown: it is the family member you always hear about but never meet in person. This is probably true of most concepts, including well-established scientific ones, but attention is additionally opaque because it is largely based on our subjective perception of the environment, which makes it a first-person object. Following Searle [7], attention is a subjective phenomenon that can nevertheless be studied objectively, a hopeless (but hopefully true) belief. In this sense it is familiar to us and, at the same time, a black box. Below, I lay out some observations regarding attention. Of course, the ‘seriousness’ of this text is not at the level of a scientific article, so I claim neither that these properties are universally agreed upon nor that they are exhaustive.

Attention requires complexity.

Attention and complexity are related. I usually find it hard to spot a particular object in a very complicated Picasso painting, but the same task is much easier with a medieval painting. Our daily activities involve interacting with an environment containing millions of objects: people, buildings, cars, trains, lights, shops, flags, trash bins, sunlight, somebody playing a double bass in a jazz club near your house (which, obviously, you cannot hear). Many things happen at once, yet we pay attention to only a very small subset of our environment. This subset is what forms the memories with which we try to model the world, and these memories in turn change how attention shifts over time.

What would happen if the environment were too simple compared to our capabilities? I expect the result would be something different from what we call consciousness. Everything in the environment could be processed at a sufficiently small cognitive cost, so there would be no need to construct a model that reduces the dimension of the incoming information. Once the environment exceeds some minimum level of complexity, the decision-maker (henceforth DM) engages in pre-attentive filtering: the environment is searched according to its salient features before attention is consciously applied. This reduces the complexity of the environment by ignoring the areas not selected by the filter. Thus, the salient features of the environment induce a probability distribution over the features one searches for, and over the areas the DM will later examine in more detail. Each feature induces a certain level of complexity of the environment, which in turn affects the complexity of the problem the DM tries to solve at the conscious level. On the other hand, the goals related to the specific decision problem also affect pre-attentive filtering. In the literature, the former is usually associated with bottom-up attention, driven mainly by the features of the environment, while the latter is called top-down attention, determined by the goals of the DM in the corresponding decision problem. Top-down attention reflects the observation that individuals pay more attention to products that are more valuable to them. Although how these values are established in the first place is a problematic issue, it is theoretically possible: for instance, an unconscious attention process could detect the more valuable objects in the environment through pre-attentive filtering. These observations show that:
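A minimal sketch of pre-attentive filtering, assuming (purely for illustration) that bottom-up salience and top-down goal relevance are added into one score per location and turned into a probability distribution with a softmax. The location names, weights, `beta`, and the threshold rule are all hypothetical; this is not a model taken from the text or from Itti and Koch [6].

```python
import math


def attention_distribution(salience, goal_weight, beta=1.0):
    """Softmax over locations of bottom-up salience plus top-down goal weight.

    The additive combination and the softmax are illustrative assumptions;
    beta controls how sharply attention concentrates on high scores.
    """
    scores = {loc: beta * (salience[loc] + goal_weight.get(loc, 0.0)) for loc in salience}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {loc: math.exp(s - top) for loc, s in scores.items()}
    total = sum(weights.values())
    return {loc: w / total for loc, w in weights.items()}


def preattentive_filter(dist, threshold):
    """Keep only the locations whose attention probability reaches the threshold."""
    return {loc for loc, p in dist.items() if p >= threshold}


# Hypothetical scene: a shopper looking for sports shoes.
salience = {"flashing_sign": 1.0, "grey_wall": 0.1, "shoe_display": 0.6}
goal_weight = {"shoe_display": 1.0}  # top-down: the goal is buying shoes
dist = attention_distribution(salience, goal_weight)
selected = preattentive_filter(dist, threshold=0.2)
```

In this toy scene the salient sign survives the filter even though it is irrelevant to the goal, while the grey wall is dropped; raising the threshold (to 0.4, say) leaves only the goal-relevant shoe display.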

Attention is context-dependent.

By reducing the dimension of the information uploaded from the environment, attention mitigates complexity. However, this is not the only way it does so. Attention also helps form memories (I would go further and say it is necessary for them). For something to be stored, it must first be picked up. Attention is the process that picks things up from the totality of the world, and some of the things picked up are encoded in memory for future use. Hence:

Attention mitigates complexity both by itself and also by forming memory.

Furthermore:

Attention is a selection.

In its essence, attention is a selection. This is what inspired one of the prominent approaches to modeling attention: consideration sets. Long ago, Simon [8] envisioned the human as a boundedly rational agent whose bounds are the agent's cognitive (internal) constraints. Rooted in the marketing and psychology literatures, consideration sets formalize the idea that a DM considers only a subset of the available alternatives. Note that a consideration set can be used in any setting where a choice is made: the setting can be a strategic game, with the consideration set defined on the set of strategies, or a binary choice problem where you are told to hit a button when you see something particular (for instance: hit the button when you see a red object on the screen). The first definition states what I mean by a decision problem in traditional choice theory.

Definition 1 (Standard Choice Data): X denotes the finite set of choice alternatives, with cardinality |X| = n. The set of all subsets of X, including the empty set, is denoted by $\mathcal{X}$. A decision problem is a member M of $\mathcal{X}$.
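As a small illustration of Definition 1, the sketch below enumerates every decision problem for a three-element X; the alternative labels are hypothetical. Since each alternative is either in a subset or not, there are 2^n decision problems, the empty set included.

```python
from itertools import combinations


def decision_problems(X):
    """Return every subset of X (empty set included) as a frozenset."""
    items = sorted(X)  # fixed order so the enumeration is reproducible
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


X = {"a", "b", "c"}
problems = decision_problems(X)
print(len(problems))  # 8 == 2 ** 3
```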

Given the definition for a decision problem, a definition for attention mapping (or consideration mapping) can be provided:

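Definition 2 (Attention Mapping): An attention mapping is a conditional probability distribution

$$\theta : \mathcal{X} \times \mathcal{X} \to [0,1], \qquad (S, M) \mapsto \theta(S \mid M),$$

such that, for every decision problem $M \in \mathcal{X}$:

$$\sum_{S \subseteq M} \theta(S \mid M) = 1 \qquad \text{and} \qquad \theta(S \mid M) = 0 \ \text{whenever } S \not\subseteq M.$$

Here θ(S | M) is the probability that the DM pays attention to exactly the set S when facing the decision problem M.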

The definition provided here does not account for the variation in M. This variation can, for instance, be determined by the equilibrium play of firms, as in Demirkan [5], or it can be deliberately induced by an experimenter. Alternatively, one can define another attention mapping measuring the probability that a particular menu attracts her attention. Attention itself can also be a dynamic process in which you gradually shrink the set you consider. As an example, consider a DM who is asked to find the hidden cat in a picture. She will look first at the ground rather than the sky, immediately eliminating places where a cat seems unlikely to be (which overlooks the fact that, for a cat, everything is possible). Then she may eliminate any sea near the ground, perhaps looking at the tops of trees or at places where birds are flying. The "real decision-making effort" would be spent only after she stops narrowing down. Still, this process can be captured in the language of conditional probabilities. When the mapping is deterministic, the consumer is endowed with a deterministic attention mapping, which I write as θ(M), with θ(M) ⊆ M for all M.
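One way to make Definition 2 concrete is to store θ as a table from each menu M to a distribution over considered sets, reading the definition as requiring θ(·|M) to be a probability distribution supported on subsets of M. The sketch below checks those conditions and, when every θ(·|M) puts all its mass on a single set, recovers the deterministic mapping θ(M). The dictionary encoding and the helper names are hypothetical.

```python
def is_attention_mapping(theta, tol=1e-9):
    """Check that each theta(.|M) is a probability distribution over subsets of M."""
    for menu, dist in theta.items():
        if any(p < 0 for p in dist.values()):
            return False
        if any(p > 0 and not considered <= menu for considered, p in dist.items()):
            return False  # positive mass on a set that is not a subset of M
        if abs(sum(dist.values()) - 1.0) > tol:
            return False
    return True


def deterministic_attention(theta):
    """Return the map M -> theta(M) if every theta(.|M) is degenerate, otherwise None."""
    result = {}
    for menu, dist in theta.items():
        support = [considered for considered, p in dist.items() if p > 0]
        if len(support) != 1:
            return None
        result[menu] = support[0]
    return result


M = frozenset({"x", "y", "z"})
random_theta = {M: {frozenset({"x", "y"}): 0.7, frozenset({"x"}): 0.3}}
fixed_theta = {M: {frozenset({"x", "y"}): 1.0}}
```

Both tables pass the check, but only `fixed_theta` is deterministic: `deterministic_attention(fixed_theta)` maps M to {x, y}, while `random_theta` returns `None`.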

Note that this definition ignores the time dimension of attention: narrowing down might happen in steps. An immediate remedy is to view the attention mapping θ as a function of time t, written θ_t. The information over time can be aggregated into a single attention mapping θ, but not without cost: patterns in choice due to changes in attention over time can no longer be captured. With time-dependent attention, one can look for transition rules. For example, Markov attention would be an attention mapping whose distribution in each period depends only on the previous period's realization of attention.
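The sketch below simulates time-dependent attention θ_t with the Markov property: the considered set in period t+1 is drawn from a kernel that depends only on the set in period t. The particular kernel (drop one currently considered area uniformly at random until one remains), the area names borrowed from the hidden-cat example, and the function names are all hypothetical choices for illustration.

```python
import random


def drop_one(current):
    """Hypothetical kernel: discard one considered area uniformly at random."""
    if len(current) <= 1:
        return {current: 1.0}  # nothing left to narrow down
    return {current - {area}: 1.0 / len(current) for area in current}


def simulate_markov_attention(start, kernel, steps, rng):
    """Return [theta_0, ..., theta_steps]; each draw uses only the previous set."""
    path = [start]
    for _ in range(steps):
        options = kernel(path[-1])
        sets = list(options)
        path.append(rng.choices(sets, weights=[options[s] for s in sets], k=1)[0])
    return path


scene = frozenset({"sky", "sea", "ground", "tree"})
path = simulate_markov_attention(scene, drop_one, steps=3, rng=random.Random(0))
```

Keeping only the final set of such a path collapses it into a single θ, which is exactly the aggregation that discards the time pattern.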

Attention happens in levels.

What are we paying attention to? Do we pay attention to the features of objects, or to objects directly, without the pre-attentive selection of features described by Itti and Koch [6]? How come I notice the beautiful woman sitting at the bar, but not the Nobel Prize winner sitting next to me? I think the easiest answer is that attention is multidimensional. Various features of the environment can affect where we look and the patterns of attention within the attended location. As Itti and Koch point out, attention consists mainly of two events:

• Directing the attention towards a certain location (directed attention)

• Movements/patterns inside the attended location (attention shift)

Let us go together to one of the most common places in Istanbul: a shopping mall. Imagine that your goal is to buy a pair of sports shoes. You know that the mall has a floor dedicated to sports shoes, and you go directly to that floor. Once you are on that floor, the first level of attention is applied to the scene as a whole. Then you can zoom in further on the various shops selling sports shoes. For instance, if the floor seems endless and you guess there are at least a hundred stores, you can simply go to the store where you previously bought your shoes. If, on the other hand, there are only two stores, it is not very costly to visit both and check some of their salient products. So one of the reasons stores care about their showcases is to attract the attention of consumers who have decided to try out several stores but do not know which one to make the final purchase from. To sum up, different levels of attention are at work, which can be classified as follows for this example:

1. Zeroth category: The whole.

2. First category: Shops on the floor.

3. Second category: Products in the shops.

4. Third category: Attributes of the products in the shops.
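The levels in the list above can be read as selection operators applied one after another. The following sketch encodes the mall example under hypothetical assumptions: the zeroth category is simply the whole `floor` passed in; the shopper visits every shop when there are at most `max_visits` of them and otherwise returns to the shop she used before; she looks at the `top_k` most salient products in each visited shop; and she attends to a single attribute of each. The data, thresholds, and names are illustrative only.

```python
def attend(floor, previous_shop, max_visits=2, top_k=1, attribute="price"):
    """Apply the first, second, and third categories of attention in sequence."""
    # First category: shops on the floor (visit all if few, else the familiar one).
    shops = sorted(floor) if len(floor) <= max_visits else [previous_shop]
    # Second category: the top_k most salient products in each visited shop.
    considered = []
    for shop in shops:
        products = floor[shop]
        ranked = sorted(products, key=lambda name: products[name]["salience"], reverse=True)
        considered.extend((shop, name) for name in ranked[:top_k])
    # Third category: one attended attribute of each considered product.
    return {(shop, name): floor[shop][name][attribute] for shop, name in considered}


floor = {
    "ShopA": {"runner": {"salience": 0.9, "price": 80}, "trail": {"salience": 0.4, "price": 95}},
    "ShopB": {"court": {"salience": 0.7, "price": 60}},
}
crowded_floor = dict(floor, ShopC={"slide": {"salience": 0.5, "price": 30}})
```

With two shops the shopper sees the most salient product of each; adding a third shop tips her into returning to `ShopA` only.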

The zeroth category is the same across contexts: it is the primary attentive process responsible for our conscious perception of the environment (meaning both perception and active self-reflection). The subsequent categories depend on the specific context. There can also be further categories, since attention is an operator that can be applied to any object, whether concrete or abstract. One can pay attention to the train arriving at the station, and also to the statement that ‘one can pay attention to the train arriving at the station’, and so on. Although attention can be applied recursively in this way, one cannot pay attention (at a conscious level) to multiple objects at the same time. This is related to the observation that attention is a selection, strengthened to the following for the case of conscious attention:

Conscious attention is a selection which is unique at a fixed point of the world (in our world: at a single point of time).

Let me define what a stochastic choice function is:

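Definition 3 (Stochastic Choice Function): A stochastic choice function is a mapping

$$p : X \times \mathcal{X} \to [0,1], \qquad (x, M) \mapsto p(x \mid M),$$

such that, for every nonempty decision problem $M \in \mathcal{X}$:

$$\sum_{x \in M} p(x \mid M) = 1 \qquad \text{and} \qquad p(x \mid M) = 0 \ \text{whenever } x \notin M.$$

Here p(x | M) is the probability that the DM chooses x from M.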

When p is degenerate, so that it is deterministic, it becomes a choice correspondence, which I denote by c[p], with c[p](M) ⊆ M for all M. Another approach is to construct a deterministic choice correspondence consisting of the alternatives that are "rationalized" by the stochastic choice data. For such an approach, I refer the reader to Balakrishnan et al. [2] and Ok and Tserenjigmid [9].
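The conditions of Definition 3, read as "each p(·|M) is a probability distribution over the alternatives of a nonempty M", can be checked mechanically. The sketch also summarizes p by its support, the alternatives chosen with positive probability; this support rule is only one crude, hypothetical way to build a correspondence c[p] and is not the inference method of Balakrishnan et al. [2] or Ok and Tserenjigmid [9].

```python
def is_stochastic_choice(p, tol=1e-9):
    """Check that p(.|M) is a probability distribution on M for every nonempty menu M."""
    for menu, dist in p.items():
        if not menu:
            continue  # no probability distribution can live on the empty menu
        if any(q < 0 for q in dist.values()):
            return False
        if any(q > 0 and x not in menu for x, q in dist.items()):
            return False  # positive probability on an unavailable alternative
        if abs(sum(dist.values()) - 1.0) > tol:
            return False
    return True


def support_correspondence(p):
    """Hypothetical summary c[p](M): alternatives chosen from M with positive probability."""
    return {menu: frozenset(x for x, q in dist.items() if q > 0) for menu, dist in p.items()}


M = frozenset({"x", "y", "z"})
p = {M: {"x": 0.6, "y": 0.4, "z": 0.0}}
```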

Imagine that you have such a correspondence, denoted c[p], which summarizes the choices described by the distribution. Given c[p](M) for any decision problem M, the consideration set approach claims that these alternatives are chosen from the set of alternatives considered in M. Interpreting the attention mapping deterministically as well, this says the following:

c[p](M) ⊆ θ(M)

for every M. This in turn creates many problems for revealed preference methods, since one needs further information about what the DM pays attention to. This inclusion is part of the longer chain:

c[p](M) ⊆ θ(M) ⊆ M

where one can view the first narrowing-down (from M to θ(M)) as an unconscious act resulting from covert attention, and the second narrowing-down (from θ(M) to c[p](M)) as the deliberative selection represented by the choice correspondence. This view lays the foundation of my approach: two selection operators are applied sequentially to M, where θ models the result of a covert or unconscious attention process and c models the selection that happens overtly or consciously. Note that this is also what boundedly rational choice theory does, but that literature is usually silent about how θ is formed, even though it lays out the procedures in which θ is used. Observe that this separation implicitly assumes that giving a definition of something is not the same as pointing out how it is formed. Indeed, a better theory of attention should put much more emphasis on processes, as Arthur [1] emphasizes in the context of economic theory. These points can be summarized as:
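The two-stage structure c[p](M) ⊆ θ(M) ⊆ M can be sketched as two selection operators composed in sequence. Purely for illustration, the sketch assumes that the unconscious stage θ keeps the two most salient alternatives and that the conscious stage picks the utility-maximizing alternatives among them; the salience and utility numbers are hypothetical. It also shows the revealed-preference problem mentioned above: an observer who ignores θ would wrongly conclude that "b" is preferred to "c".

```python
salience = {"a": 3, "b": 2, "c": 1}  # hypothetical drivers of covert attention
utility = {"a": 1, "b": 5, "c": 10}  # hypothetical conscious preferences


def theta(menu, size=2):
    """Unconscious stage: keep the `size` most salient alternatives of the menu."""
    return frozenset(sorted(menu, key=lambda x: salience[x], reverse=True)[:size])


def choose(menu):
    """Conscious stage applied to theta(menu); assumes the menu is nonempty."""
    considered = theta(menu)
    best = max(utility[x] for x in considered)
    chosen = frozenset(x for x in considered if utility[x] == best)
    return chosen, considered


M = frozenset({"a", "b", "c"})
chosen, considered = choose(M)
```

From {a, b, c} the DM chooses "b" because "c" never enters θ(M), yet from {b, c} she chooses "c".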

An unconscious selection function is an attention mapping, and the conscious selection function applied to it is a choice mapping.

This claim does not (yet) separate my view from the literature on rational inattention. The core idea of rational inattention is that a DM chooses how to perceive her environment optimally, balancing the benefits of what she pays attention to against the associated costs (such as cognitive costs). My claim includes rational inattention as a special case: if the first stage of attention formation happens on rational grounds, we are in rational inattention territory (for a review, see Caplin [3]).
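To make "balancing benefits against costs" concrete, here is a deliberately crude stand-in, not the rational inattention model itself: that literature typically prices information with a mutual-information (Shannon) cost, whereas this sketch assumes a hypothetical linear cost `kappa` per attended alternative. Each alternative's value is an independent lottery revealed only if attended; the DM then takes the best attended value (or a hypothetical outside option worth 0), and the first-stage consideration set is chosen by brute force to maximize expected value minus attention cost.

```python
from itertools import combinations, product


def expected_best(considered, lotteries):
    """E[max(0, best revealed value)] when only the considered alternatives are inspected."""
    items = sorted(considered)
    total = 0.0
    for outcome in product(*(lotteries[x] for x in items)):
        prob, best = 1.0, 0.0  # 0.0 is the hypothetical outside option
        for value, q in outcome:
            prob *= q
            best = max(best, value)
        total += prob * best
    return total


def optimal_consideration_set(menu, lotteries, kappa):
    """Brute-force the subset S of the menu maximizing expected_best(S) - kappa * |S|."""
    items = sorted(menu)
    subsets = [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]
    return max(subsets, key=lambda s: expected_best(s, lotteries) - kappa * len(s))


lotteries = {
    "a": [(10.0, 0.5), (0.0, 0.5)],  # risky alternative
    "b": [(4.0, 1.0)],               # safe alternative
    "c": [(1.0, 1.0)],               # poor alternative
}
menu = frozenset(lotteries)
```

As `kappa` rises from 1 to 3 to 6, the optimal consideration set shrinks from {a, b} to {a} to the empty set.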

This claim connects attention to the concept of intention: a DM who reports the intention of choosing the best alternative with respect to some criterion can be unconsciously and systematically affected by covert or unconscious attention. Again, the crucial word here is systematic: if covert attention has no systematic impact, it may be impossible to detect, at least empirically. I believe this is not the case, and even if it were, our conscious perception would systematize it anyway. I conclude with an example.

Example 1 (Attention in Markets). Consumers are typically active in many markets: they purchase their daily needs at supermarkets, decide what to drink during the lunch break and where to get it, buy tickets for the weekend activity they are looking forward to, invest in stocks or other financial instruments to make use of their savings, and so on. That these are different markets is important. Each market has a different level of importance for the consumer, and also a different level of complexity. Furthermore, the consumer's knowledge is also market-specific: most consumers (medical professionals aside) do not know much about medicines and the pharmaceutical industry, so their knowledge level is fairly low, yet this market is of crucial importance and has a certain level of complexity due to the many brands offering the same formula under different labels.

Using the attention mapping θ, one can model the consumer's attention to these different markets. For simplicity, consider only two markets: the market for groceries and the pharmaceutical market. There are three important levels at which attention in these markets should be analyzed: the first level consists of the places that sell products in the corresponding market, the second level is the menu of products offered in the places the consumer visits, and the third is the level of observable properties of the products considered from the menu. Of course, there are other levels, such as paying attention to the environment when you visit a shopping mall (consisting of different places that sell groceries and medical supplies), or the level of attention necessary for you to be conscious (setting aside the debate over whether attention is necessary for consciousness; see De Brigard and Prinz [4]).

Let θ(S[k]|M[1],M[2]) denote the attention paid to the products in S[k] (a subset of the products offered in market k, regardless of which firm offers them) given that the markets consist of the products in M[1] and M[2]. Note that M[k] consists of all products offered in any store or place relevant to market k, for k ∈ {1,2}. Consider the first market, the market for groceries. Then M[1] is the set of all products offered in any grocery store, while S[1] could be the set of products offered in the grocery stores in your local neighborhood. Why is M[2] important in determining the attention to S[1]? Imagine that you got Covid and want to buy some vitamin supplements, which are sold in pharmacies. The closest pharmacy is not in your local neighborhood, so you have to go somewhere else. This can increase the attention paid to groceries close to the pharmacy, which is an example of attention complementarity. An easier-to-digest example is firms using this as a strategy: a firm that wants to increase its brand awareness in the music streaming industry can boost that awareness by offering a high-quality pair of Bluetooth earbuds (or vice versa).
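A minimal sketch of attention complementarity, assuming a hypothetical two-option form for θ(S[1] | M[1], M[2]): the consumer attends either to her local grocery stores or to the groceries near the pharmacy, and the probability of the latter rises by a fixed `spillover` whenever the pharmacy market M[2] contains something she needs. M[1] is held fixed and therefore left out; the product names, `base`, and `spillover` are illustrative assumptions.

```python
def grocery_attention(M2, needs, base=0.1, spillover=0.5):
    """Hypothetical theta(S1 | M1, M2) over two candidate grocery consideration sets."""
    trip_to_pharmacy = bool(M2 & needs)  # does the pharmacy market hold something needed?
    p_near = min(1.0, base + (spillover if trip_to_pharmacy else 0.0))
    return {"local_groceries": 1.0 - p_near, "groceries_near_pharmacy": p_near}


needs = frozenset({"vitamin_c"})
without_vitamins = grocery_attention(frozenset({"aspirin"}), needs)
with_vitamins = grocery_attention(frozenset({"aspirin", "vitamin_c"}), needs)
```

Holding the consumer's needs fixed, only the composition of M[2] changes between the two calls, and attention to the groceries near the pharmacy rises from 0.1 to 0.6.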

The next chapter of this “thread” will discuss the types of data that can be collected to measure attention.

References

[1] Arthur, W. B. (2022). Mathematical language shapes how we understand the economy.

[2] Balakrishnan, N., Ok, E. A., and Ortoleva, P. (2022). Inference of Choice Correspondences.

[3] Caplin, A. (2016). Measuring and modeling attention. Annual Review of Economics, 8, 379-403.

[4] De Brigard, F., and Prinz, J. (2010). Attention and consciousness. Wiley Interdisciplinary Reviews: Cognitive Science, 1(1), 51-59.

[5] Demirkan, Y. (2022). Market Power and Limited Attentive Consumers. Mimeo.

[6] Itti, L., and Koch, C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2(3), 194-203. doi:10.1038/35058500.

[7] Searle, J. R. (1989). Consciousness, unconsciousness, and intentionality. Philosophical topics, 17(1), 193-209.

[8] Simon, H. A. (1997). Administrative Behavior: A Study of Decision-Making Processes in Administrative Organizations (4th ed.). Free Press.

[9] Ok, E. A., and Tserenjigmid, G. (2020). Comparative rationality of stochastic choice rules. Mimeo, New York University.

This X is meant to be:

|apeiron|
