Human Cognitive Architecture and Cognitive Load Theory
Discuss the cognitive structures and processes that shape human cognitive architecture. How can this knowledge assist people to teach and communicate?
Note. This is an advanced paper that we recommend to people who have had extensive training in NLP or the cognitive sciences. In this essay, human learning capacity will be considered with reference to the Modal Model of human cognitive architecture, which draws on empirical research for support and exists within an evolutionary framework. This description of human cognitive architecture, Baddeley (1968, 1992, 2001), Ericsson and Kintsch (1995), provides the framework used by Sweller, Van Merrienboer and Paas (1998) to develop and research the cognitive load theory of learning and its applications to the design of effective learning programs.
Human Cognitive Architecture
Architecture is the description of the structures and processes that produce reliable results to a specification. Cognitive architecture applies this idea to a thinking or computing device that accepts data input, applies processes to the data, stores the data and produces output, all within a problem solving frame. Human cognitive architecture describes the necessary and sufficient conditions for a human to input data, process the data, store the processed data, which now becomes information, and output the result. Thus, it is a system for processing information. In humans, this is achieved via the brain and the rest of the body, using sensory input, mental processes and memory to generate demonstrable changes in behaviour.
There are two major directions for models of human cognitive architecture, either of which would lend itself to formulating a theory of effective learning. One direction, which first arose with Maudsley (1876) and James (1890), proposed that memory had two elements, in which a short term memory enabled part of its contents to be stored permanently in a long term memory. Little change occurred during the next 50 years, as Behaviourism became the dominant psychological theory. Whatever took place between input and output could not be studied, as it resided in a ‘black box’; only observable behaviour was available for study.
The Magic Number Seven, Plus or Minus Two, Miller (1956), along with the Macy Conferences on cybernetics and cognition, started the Cognitive Revolution and brought the attention of researchers back to the investigation of memory structures and learning. James’ (1890) original two part model was revived and subjected to further work, in due course becoming the Modal Model of memory in three parts. This will be discussed below, as it provides the intellectual underpinning for Cognitive Load Theory.
The other direction of memory structure and learning research leans towards a recent connectionist model involving neural nets based on brain function, McClelland and Rogers (2003). In this model, parallel distributed processors act on information, which resides in the relationships between different parts of the system.
A related network model is the Adaptive Character of Thought-Rational, or ACT-R model, Anderson and Matessa (1997), which includes elements of connectionism. This model also concentrates on how information processing takes place, with reference to stored declarative and procedural knowledge. In this model, declarative knowledge is equivalent to fact or content and procedural knowledge is equivalent to action taken, or process. ACT-R also has elements in common with the Modal Model of memory, in that it seeks to provide a unifying theory of human cognitive architecture and proposes a serial memory process with static storage elements for declarative knowledge, Anderson and Matessa (1997).
The Modal Model of Information Processing
The Modal Model of Information Processing has influenced memory structure theory through the second half of the 20th century. It proposes a serial processing, static storage system of memory based on three stages: input (or acquisition), storage and subsequent retrieval of information. The Modal Model in its three stage form was articulated by several researchers, including Newell and Simon (1972), who approached memory and information processing from a computing stance via human processing models. Atkinson and Shiffrin (1968) provide a representative description of the Modal Model in human processing. They proposed that input entered via sensory memory, was subject to processing in short term memory and proceeded to storage in long term memory. Output then returned from long term memory, via short term memory.
In Atkinson and Shiffrin’s (1968) model of memory structures, short term memory is subdivided into a temporary store and a control processing function. This was changed later by Baddeley and Hitch (1974) and developed further by Baddeley (1992), to become known as Working Memory and to have a processing and retrieval function in relationship to long term memory.
Sensory memory describes perception of incoming data via sensory registers, Bruning et al. (2004). These are specific to each sense and hold data for a very short time. Sperling (1960), established that the maximum useful life of a four letter datum in a visual register is 500 milliseconds and Darwin, Turvey and Crowder, (1972) established that auditory input decays after three to four seconds. The function of sensory memory is to perceive data input and make it available to short term memory.
Short Term or Working Memory
Short term memory, as described by Miller (1956), has the capacity to hold seven plus or minus two chunks of information at any given time. Miller did not specify whether the chunks of information were novel or familiar, interrelated or discrete; simply that a chunk is a unit of knowledge. Although Miller excluded relationships from his count of units of knowledge, for the purposes of this essay, references to seven plus or minus two chunks will be used to include relata as chunks. The term ‘Working Memory’ was adopted in place of ‘Short Term Memory’ when Baddeley and Hitch (1974) proposed a more comprehensive model than Atkinson and Shiffrin’s (1968) two part temporary store and control process centre.
Baddeley and Hitch (1974) proposed that working memory, as it will be known hereafter, comprised a visuospatial sketch pad for recording images, a phonological loop like a tape loop for recording sounds and a central executive to direct attention and manage processing in the two slave systems. The life of new information in working memory is three to four seconds unless the information is repeated.
Baddeley and Hitch (1974) observed that attention is usually limited to a maximum of four chunks of information when working memory is engaged in a task. This applies where there are relationships between units of information, which have to be considered simultaneously with that information. However, Ericsson and Kintsch (1995) established a link between working memory and long term memory to account for those occasions where working memory appears to hold many chunks of information simultaneously.
Ericsson and Kintsch (1995), and Baddeley (2001), distinguish between the capacity of working memory when it is processing new information and new relationships compared with processing prior knowledge drawn from long term memory. In this model, the idea of a central executive is rejected in favour of using relevant information recalled from long term memory to guide the flow of process in working memory. Ericsson and Kintsch (1995), describe this function as ‘Long Term Working Memory’.
Long Term Memory
Long term memory is a permanent store of experience, knowledge and process, all of which is held outside conscious awareness until specific knowledge structures are recalled into conscious awareness in working memory. In contrast with working memory, which Baddeley (1986 p. 34), describes as “The temporary storage of information that is being processed in any of a range of cognitive tasks”, long term memory is not only permanent, but also highly structured. It functions in a flexible manner that enables continuing refinements to accommodate cross-referencing and increasing levels of domain specific expertise.
Long term memory, according to Baddeley (2001), does not have an executive function. While increasing knowledge is reclassified in larger chunks or more refined and automated schemata in long term memory, information retrieval is initiated via working memory. Thus, conscious attention is brought to bear on a topic using working memory, and associated information is recalled from long term memory. It is as if working memory provides long term memory with an initial set of criteria from which to search and the results of the search modify the search criteria.
Components of Long Term Memory
Bruning et al. (2004) propose a model of long term memory that draws on research from Anderson (1993) and Tulving (2002), among others. In outline, long term memory (implicit and explicit) divides into declarative knowledge, with semantic memory and episodic memory as its sub-classes, procedural knowledge and conditional knowledge, each discussed below.
Declarative knowledge denotes factual information, such as Pythagoras’ Theorem: “For any right angled triangle, the square of the length of the hypotenuse is equal to the sum of the squares of the lengths of the other two sides”.
Procedural knowledge denotes process information, for example, to demonstrate the validity of Pythagoras’ Theorem:
1. Draw a right angled triangle large enough to measure and small enough to fit on your substrate (paper, whiteboard, sandy beach etc.).
2. Nominate the hypotenuse as A and the other two sides as B and C respectively.
3. Using a suitable scale, measure the length of each side (A, B, C) of the triangle.
4. Find the square of length A, then B, then C. (To do this, multiply the value of A by A, the value of B by B and the value of C by C.)
5. Add B squared to C squared and compare with A squared. (Declarative statement: they should be equal.)
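The comparison in the final step above can be sketched as a short check. A minimal illustration in Python, assuming an ideal 3-4-5 right angled triangle in place of physically measured lengths:

```python
import math

def check_pythagoras(a, b, c):
    """Compare the square of the hypotenuse (a) with the sum of the
    squares of the other two sides (b and c), as in the final step above."""
    return math.isclose(a * a, b * b + c * c)

# An ideal 3-4-5 triangle is right angled, so the comparison succeeds.
print(check_pythagoras(5, 3, 4))   # True
# Lengthening the hypotenuse breaks the equality.
print(check_pythagoras(6, 3, 4))   # False
```

With physically measured sides, a tolerance wider than the default would be needed to absorb measurement error.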
Conditional knowledge is knowledge about when and how to use declarative and procedural knowledge. For example, appropriate times and places to apply Pythagoras’ Theorem include teaching maths or ancient history to 8-10 year old children, teaching ancient Greek to 13 year old children, using it as an example in the context of this essay, or using a physical right angled triangle as an analogue computer to compute the theorem.
Semantic and episodic memory functions are sub-classes of declarative knowledge, Tulving (2002). Semantic memory contains knowledge of the world, such as the fact that cats are furry, the hairless Sphynx notwithstanding. Meaning is formalised, culturally supported and linguistically common to native speakers of a language.
Episodic memory contains personal experience, childhood memory and conversations with people, Bruning et al. (2004). It is accessed with personal reference to times, places and events. Episodic memory contains knowledge learned informally by association.
Here is an example of an English ex-governess using episodic memory to great effect in her teaching. Miss Sherwin’s school had 24 pupils aged between five and ten years old. When a group of children was reciting multiplication tables, other children were present, including younger ones. When Miss Sherwin observed a younger child singing along with the table reciters, she invited the child to join them. Every child left her school effortlessly word perfect in their multiplication tables.
There are two other categories of information held in long term memory: implicit and explicit information, Bruning et al. (2004). Explicit information is information to which a person has conscious access; implicit information is acted upon, or inferred, without conscious access. Scholl (2005) identifies a normally implicit presupposition in humans that light comes from above, with reference to an optical illusion that uses six convex or concave discs to demonstrate the principle, Scholl (2005, p. 46). In this experiment, a photograph of a set of six craters or hilltops is identified as convex when seen from one direction and concave when viewed from the opposite direction. This demonstrates a response to the direction from which the scene is lit: both readings place the light above, not below, the scene.
Information is stored in long term memory in knowledge structures of varying complexity called ‘Schemata’. These are maps of knowledge and are first incorporated in small chunks and simple concepts. As a learner becomes more familiar with a domain, their schemata integrate, increase and include relationships. A single schema in a novice’s mind could be equivalent to a single option in a developed expert schema for a whole process or domain, Bruning et al. (2004), Sweller (2003). For example, double declutching to change gear on a crash gear box was once a four step process while learning to drive. Now, for experienced drivers, it is a smooth, unconscious element of driving, usually confined to steep hills and sharp bends.
Schema theory also postulates that schemata are developed for retrieving information from long term memory, Ericsson and Kintsch (1995), using reading comprehension as a test. As knowledge structures, schemata can contain any information, be it fact, process or unsubstantiated rumour.
Features of schemata include the capacity to store information in long term memory, to represent information at different levels of complexity, to relate to other schemata containing information, to form hierarchies of classification of that information and to represent relationships between members of classes and classes of information. Thus, a single schema for a new topic may have one element of information in it, while a complex schema may include multiple relationships to other schemata to represent a complex concept. Information can be held conceptually, in language and/or in images, Bruning et al. (2004). Schemata are also subject to revision in light of subsequent learning or internal review and understanding. For the purposes of this essay, the most important feature of schemata is the ability to create more of them and to relate relevant concepts as expertise develops.
Schema Construction and Automation and Learning
Schema construction and automation is the goal of learning. The evidence that learning has taken place is the quality and relevance of topic specific output retrieved from long term memory via working memory after the learning activity has ceased. Schemata automate only when sufficient information has been stored in long term memory as a result of learning and there has been sufficient purposeful practice to automate the task. The more advanced a learner becomes, the more schemata they have for that topic; therefore they can hold more knowledge in working memory while learning additional units of content and elements of relationship, Sweller (2003).
When someone learns to drive, initially there is too much information to hold in working memory at once. Steering, changing gear, reading the road conditions, judging distance, judging speed, using the brakes and accelerator and using the rear view mirror are all separate units of knowledge. These do not exist in isolation, so the elements of relationship between them also take up working memory; eight units and the relationships between them could yield 64 or more elements from this list alone. Some driving instructors use private space for the first lesson, to reduce the load so the learner can concentrate on controlling the vehicle before going on the road.
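The count behind this can be sketched as follows; treating every ordered pair of distinct skills as one element of relationship is an assumption made for illustration, not a claim from the cognitive load literature:

```python
from itertools import permutations

# The eight knowledge units named above.
skills = ["steering", "changing gear", "reading the road", "judging distance",
          "judging speed", "braking", "accelerating", "using the mirror"]

units = len(skills)
# One element of relationship per ordered pair of distinct skills: 8 * 7 = 56.
relations = len(list(permutations(skills, 2)))

print(units + relations)  # 64 elements competing for working memory
```

Even the more conservative count of unordered pairs (28 relations) yields 36 elements, far beyond the seven plus or minus two chunks available.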
In contrast, an experienced driver can do all of the above while conversing with a passenger, changing the radio station, opening a window, watching for police cars and navigating to their destination. They have had practice in using the skills they learned, so the schemata for driving become complex, related and automated. This process is built into skill acquisition in Australia with provisional licensing of newly qualified drivers for two years. In flying, the different grades of licence are obtained by a combination of demonstrated skill and hours of flying time. Ericsson (2005) identifies 10,000 hours, or ten years, of directed practice to reach expert status in a domain.
Schema construction has to occur before schemata can automate, Sweller (2003). For example, a child has to learn to co-ordinate their hand, arm and body for writing before they can make fine writing moves. They also have to learn to recognise and construct letters in their alphabet before they can write them knowingly. To avoid the requirement for non-existent prior knowledge, children are given exercises called ‘Writing Patterns’, which they copy between double lines. First, they use widely spaced lines and as their co-ordination improves, the lines are placed closer together. The patterns are not related to specific letters, simply copied with as much accuracy as possible. One system which has been commonly used in England is the set of writing patterns and related material developed by Marion Richardson, now out of print.
During the same time frame that a child uses writing patterns in one lesson, they learn to read and print letters in other lessons. Then, when the child is fluent in using writing patterns and can print in upper and lower case, they can apply the patterns to the written word and learn to use joined writing.
Learning in the early stages of any skill or topic is necessarily slower than more advanced work due to the need to construct schemata to supplement the initial seven plus or minus two chunks of combined information and relata. The act of constructing schemata enables a student to solve similar problems to those used in the learning context, but not to transfer that ability to new or unrecognised problems. This only occurs when schemata have automated sufficiently to enable a student to bring sufficient attention to new situations without overloading working memory, Baddeley (1992).
Evolution of Human Cognitive Architecture
In the context of the Theory of Evolution, at first glance, this pattern of learning might seem counter intuitive. A human starting out with minimal schemata appears to be at risk of making fatal errors before they can learn enough to survive, let alone reproduce. However, as they are cared for during their formative years until they reach physical maturity, there is time to experience their environment for long enough to acquire a basic set of schemata and automate them. The humans who survive to adulthood know friend from stranger and kin from other, prey from predator and other relevant matters, Tooby, Cosmides and Barrett (2005).
When a human encounters an unknown situation, they use means-ends analysis to solve problems. This uses all of their seven plus or minus two chunk capacity in working memory on two or three options and their relationships. If the human picks an option that works, they survive and learn more. If the human had the capacity to consider five or six options and their relationships with no prior knowledge, they could take too long to decide anything and be eaten.
An initial working memory capacity greater than seven plus or minus two, including relationships, would allow for untested learning on a scale that militates against survival, Sweller (2003). Not only would the human deliberate for too long, but they could also learn untried material that undermined the functional schemata they already had. This too would militate against survival.
Evolutionary change is incremental, Sweller (2003). A single genetic mutation can occur in one human. If it is non-viable, that human’s fitness is reduced and the mutation is self limiting. If a mutation is viable, it is passed on to the human’s offspring. If the mutation enhances fitness, it will enter the gene pool more rapidly than one that is merely viable, due to the increased number and fitness of offspring and their descendants. Over a period of many generations in a stable environment, the gene pool becomes fitter for that environment. If the environment changes, the criteria for fitness change and those populations with the capacity to adapt to the new environment survive by learning how to live in it.
Similarity in Principles of Evolutionary Change and Learning
Learning can be compared with evolution, Sweller (2003), Van Merrienboer and Sweller (2005). Although evolutionary change is genetic and refers to a population over millions of years, while schema construction and automation is conceptual and refers to an individual over a few years, both show a pattern of conservation of the existing state. The slow dissemination of fitness enhancing mutations compared with the size of the human genome protects the existing genome from radical change. The small capacity of working memory for new information and relationships protects the existing knowledge in long term memory from radical reorganisation with untried and possibly damaging material, Sweller (2003).
The organisation of long term memory is thus protected from the loss of fitness that learning non-functional new information would entail. By the time sufficient schemata have been constructed to enable advanced learning to occur, the human has sufficient knowledge to be aware of the possible value of additional knowledge and to use informed judgement to handle it safely. At this stage, problem solving in the domain uses domain specific knowledge and promotes fitness, instead of relying on means-ends analysis.
With learning placed in the context of evolution, the Modal Model of memory and research findings on differences in working memory capacity for new and developed learning, Sweller (2003) had the framework in which to develop a theory of learning on empirical grounds. This is the Cognitive Load Theory of learning, which uses the capacity of working memory at different stages to inform the design of learning sessions and materials, Sweller (2003).
Cognitive Load Theory
Cognitive Load Theory states that effective learning can only take place where the cognitive capacity of an individual student in a particular domain is not exceeded. If the cognitive load of a lesson has too many units and elements for that learner at their stage of knowledge, working memory cannot hold the load for long enough to transfer it to long term memory. Then schemata are not constructed and learning does not occur.
Cognitive load comprises units of knowledge and elements of relationship; element interactivity occurs between units and their relationships. Together, these create the cognitive load of a learning task, Sweller (2003). If the interacting knowledge units and their relata exceed seven plus or minus two chunks, new learning does not take place. Sweller and Chandler (1994) cite two to three knowledge units as the maximum in this context, as the rest of the load comprises elements of interacting relata.
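As a rough sketch of this arithmetic, the following minimal model may help; the simple additive load and the fixed capacity value are illustrative assumptions, not part of the theory's formal statement:

```python
def fits_working_memory(new_units, new_relations, capacity=7):
    """Crude additive model: new learning is held to stall when the
    count of novel knowledge units plus their interacting relations
    exceeds working memory capacity (seven, plus or minus two, chunks)."""
    return new_units + new_relations <= capacity

# Three units with their three pairwise relations: 6 chunks, fits.
print(fits_working_memory(3, 3))   # True
# Five units with all ten pairwise relations: 15 chunks, overload.
print(fits_working_memory(5, 10))  # False
```

The model makes visible why two to three genuinely new units is a practical ceiling: the relata between them consume the remaining chunks.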
Cognitive Load and Learning
From this information it follows that basic instruction needs to be clear, simple and specific. Rote learning may have a function in facilitating basic schema construction, to enable more relata further into the program. The first lesson in touch typing introduces the ‘Home Keys’ (asdf jkl;) and their function as the place to return to and triangulate from when using other keys. Many keyboards have a raised dot on the F and J keys to facilitate identification by touch. This approach is framed for students so they can accept being unable to produce text immediately.
Cognitive Load Theory enables course designers to increase cognitive load as students progress through the program. The amount of element interactivity between units of knowledge, combined with the units themselves, equals the cognitive load of a lesson. If the cognitive load is greater than the capacity of a student’s working memory for that lesson content, the lesson will not be learned. As small, specific schemata are constructed from the building blocks, they become available to contribute to handling an increased cognitive load in subsequent lessons, Sweller (2005).
Touch typing progresses by learning two additional keys at a time with reference to the home keys. Only when all keys have been learned and practised with correct fingering at low speed does typing real text begin, with this exercise: The quick brown fox jumps over the lazy dog.
Worked Examples and Problem Completion, Van Merrienboer and Sweller (2005)
Constructed schemata enable a student to solve relevant problems, but to learn to solve these problems, students need schemata. To develop more comprehensive schemata than single, rote learned items, relata need to be learned, and worked examples of problems can assist with this. An effective way to use them, in keeping with Cognitive Load Theory, is to guide students through up to ten different examples worked all the way through. Then students can complete the last line of another set of worked examples, progress to the last two lines, and gradually work backwards until they can work whole problems of the same class by themselves. This applies to algebra, English grammar tasks, foreign language tasks, the investment method of valuation and any other problem solving elements of learning. Many classical music students learn new pieces backwards in this manner, in phrases from the end. With each new phrase, they play onwards to the end, increasing familiarity with the parts they have already learned. They find it assists them to memorise the whole piece for performance: if they can remember the opening phrase, the rest is familiar.
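The backward-working sequence described above can be sketched as a small generator; `backward_fading` and the step labels are hypothetical illustrations, not a published procedure:

```python
def backward_fading(worked_steps):
    """Yield successive practice versions of one worked example: first
    every step shown, then all but the last, and so on until the
    student works the whole problem unaided."""
    n = len(worked_steps)
    for hidden in range(n + 1):
        shown = worked_steps[:n - hidden]
        yield shown, hidden  # steps shown, steps left to the student

# Hypothetical steps for a simple algebra problem.
steps = ["state the formula", "substitute values", "simplify", "solve"]
for shown, to_complete in backward_fading(steps):
    print(len(shown), "step(s) shown,", to_complete, "for the student")
```

Each pass leaves one more step for the student, so germane load grows only as fast as the relevant schemata are constructed.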
The Expertise Reversal Effect
For advanced learners, worked examples may be detrimental. When a student has automated, or at least well developed, schemata for a particular task, following a set of instructions that differs from their own understanding can reduce their effectiveness. Automated schemata run so fast and contain such comprehensive relata that sudden exposure to a different method for solving the problem can slow the process and reduce accuracy. This is the ‘Expertise Reversal Effect’, Kalyuga, Ayres, Chandler and Sweller (2003), which can manifest with advanced students when not enough knowledge is presupposed in a lesson or too much instruction is given.
Three post-graduate classes used different criteria for setting written work. One offered minimal guidance and required students to produce two research essays of a given length; no more information was forthcoming. One gave a topic and two papers to get the students started, and one gave a title and a detailed plan for use as an option. Retrospectively, the majority of students preferred the second approach. Prospectively, they had thought the detailed plan would help, but in practice some found it elicited the expertise reversal effect. None of the students had ever been presented with a detailed plan before, but they agreed that it would have been particularly helpful at school or as a first year undergraduate.
Having matched the amount of guidance to the knowledge level of particular students and the amount of information to the capacities of their working memories, course designers can create programs with increasing cognitive load to fit the changing requirements of students. However, the quality of instruction is as important as the sequence and amount of information given. Poor instruction can increase cognitive load even for individual knowledge units, Sweller, Van Merrienboer and Paas (1998).
Intrinsic Cognitive Load
Every knowledge unit has an intrinsic cognitive load. This is the load imposed by learning a unit and its relata. While it can be reduced to rote components initially, it still has a minimum, irreducible cognitive load, Van Merrienboer and Sweller (2005).
Extraneous Cognitive Load
Instruction imposes additional cognitive load, some of which can be extraneous to the process. Unnecessary detail, insufficient instruction, inappropriate orders of delivery and poor use of audio-visual aids can all contribute to extraneous cognitive load, Van Merrienboer and Sweller (2005). For example, there is a split-attention effect, Chandler and Sweller (1992), which occurs when cross-referencing sources in different places. If a diagram is on one page and its explanatory text is on another, cognitive load is increased by having to remember the content of one page while consulting the other. Reading from slides while listening also splits attention, as does looking up data from tables while reading a text. Split attention can be excluded from learning materials by providing all required information in one place, or by splitting it between different sensory systems for simultaneous delivery, known as the Modality Effect, Van Merrienboer and Sweller (2005). When using the Modality Effect it is important to avoid redundant presentation of identical material in different locations, Van Merrienboer and Sweller (2005).
Germane Cognitive Load
Germane cognitive load is created by activities especially designed to promote schema construction, Sweller, Van Merrienboer and Paas (1998). Worked examples at the appropriate stage impose germane cognitive load. Typing practice with real text and business letters produces fluency, schema automation and speed by imposing germane cognitive load. The key is creating relevant learning activities.
Cognitive Load and Course Design
The possibilities of cognitive load theory for influencing course design and teacher training are far reaching. Learning has become big business, and ineffective ideologies with no empirical backing are used routinely as frameworks for course design and delivery. The Discovery Model of Experiential Learning requires extensive background knowledge for problem solving to be effective, yet it is commonly used with students who have little or no domain specific knowledge. It has already been established that problem solving by means-ends analysis imposes enough extraneous cognitive load to disable effective learning. Instructional guidance is particularly necessary for beginners, to reduce cognitive load to an acceptable level, Sweller (1999). Shared projects may require communication between students, but in the absence of sufficient guidance they may learn mistakes from each other. Also, in any shared project there is room for individual members to avoid learning some or all of the material themselves, as someone else will cover it.
The most effective approach to course design is to ascertain the level of prior knowledge of students; start with minimal new information where the subject is new; give clear, accurate instruction; and present sources in a manner that concentrates visual attention in one place while explaining the visuals verbally, not asking students to read at the same time as they see visual aids and hear the explanation.
Where there is prior knowledge, refer to it and include it to aid recalling existing relevant schemata and use worked examples with incremental student participation. Move to less guidance with advancement until advanced learners have minimal guidance but directed practice. At this stage the intention is to elicit schemata automation to enable transfer of the knowledge to other contexts of problem solving, Sweller (1999).
Cognitive Load Theory provides an effective framework for designing and delivering course work to learners of any standard. It is backed by empirical research supporting different amounts and types of instruction according to the level of learners, and it enables teachers to provide well crafted guidance in their subjects. As the only learning theory currently demonstrating academic credibility, it can be offered to teaching organisations in any context. While the credentials of Cognitive Load Theory should make it attractive in the training environment, there is some resistance from adherents of the fashionable ideologies of the time, who are accustomed to defending their causes in the absence of research evidence.
Anderson, J.R., & Matessa, M. (1997). A Production System Theory of Serial Memory. Psychological Review, 104, 728-748.
Atkinson, R.C., & Shiffrin, R.M. (1968). Human Memory: A Proposed System and its Control Processes. In Spence, K.W., & Spence, J.T. (Eds.), The Psychology of Learning and Motivation. New York: Academic Press.
Atkinson, R.C., & Shiffrin, R.M. (1971). The Control of Short Term Memory. Scientific American, 225(2), 82-90.
Baddeley, A.D. (1968). How does acoustic similarity influence short-term memory? Quarterly Journal of Experimental Psychology, 20, 249-264.
Baddeley, A.D., & Hitch, G.J. (1974). Working Memory. In Bower, G.H. (Ed.), Recent Advances in Learning and Motivation (Vol. 8). New York: Academic Press.
Baddeley, A.D. (1992). Working Memory. Science, 255, 556-559.
Baddeley, A.D. (2001). Is Working Memory Still Working? American Psychologist, 56, 849-864.
Bartlett, F. (1932). Remembering: A Study in Experimental and Social Psychology. London: Cambridge University Press.
Bruning, R.H., Schraw, G.J., Norby, M.M., & Ronning, R.R. (2004). Cognitive Psychology and Instruction. Ch. 2, Sensory, Short Term and Working Memory (pp. 14-35) and Ch. 3, Long Term Memory: Structures and Models (pp. 36-64). Upper Saddle River, NJ: Pearson/Merrill/Prentice Hall.
Chandler, P., & Sweller, J. (1991). Cognitive Load Theory and the Format of Instruction. Cognition and Instruction, 8, 293-332.
Chandler, P., & Sweller, J. (1992). The Split Attention Effect as a Factor in the Design of Instruction. British Journal of Educational Psychology, 62, 233-246.
Darwin, C.J., Turvey, M.T., & Crowder, R.G. (1972). An Auditory Analogue of the Sperling Partial Report Procedure: Evidence for Brief Auditory Store. Cognitive Psychology, 3, 255-267.
Ericsson, K.A., & Kintsch, W. (1995). Long Term Working Memory. Psychological Review, 102(2), 211-245.
James, W. (1890). The Principles of Psychology. New York: Holt.
Kalyuga, S., Ayres, P., Chandler, P., & Sweller, J. (2003). The Expertise Reversal Effect. Educational Psychologist, 38, 23-31.
Maudsley, H. (1876). The Physiology of Mind. London: MacMillan.
McClelland, J.L. (1995). A Connectionist Perspective on Knowledge and Development. In Simon, T.J., & Halford, G.S. (Eds.), Developing Cognitive Competence: New Approaches to Process Modeling (pp. 157-204). Hillsdale, NJ: Erlbaum.
McClelland, J.L., McNaughton, B.L., & O'Reilly, R.C. (1995). Why there are Complementary Learning Systems in the Hippocampus and Neocortex: Insights from the Successes and Failures of Connectionist Models of Learning and Memory. Psychological Review, 102, 419-457.
McClelland, J.L., & Rogers, T.T. (2003). The Parallel Distributed Processing Approach to Semantic Cognition. Nature Reviews Neuroscience, 4, 310-322.
Miller, G.A. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information. Psychological Review, 63, 81-97.
Newell, A., & Simon, H.A. (1972). Human Problem Solving. Englewood Cliffs, NJ: Prentice Hall.
Scholl, B.J. (2005). Innateness and (Bayesian) Visual Perception. In Carruthers, P., Laurence, S., & Stich, S. (Eds.), The Innate Mind (pp. 305-337). New York: Oxford University Press.
Smith, D.J. (2001). www.smithsrisca.demon.co.uk/PSYatkinsonetal1971.html
Sperling, G. (1960). The Information Available in Brief Visual Presentations. Psychological Monographs: General and Applied, 74 (11, Whole No. 498).
Sweller, J., Van Merrienboer, J., & Paas, F. (1998). Cognitive Architecture and Instructional Design. Educational Psychology Review, 10(3), 257-287.
Sweller, J. (1999). Instructional Design in Technical Areas. Camberwell, Victoria: Australian Council for Educational Research.
Sweller, J. (2003). Evolution of Human Cognitive Architecture. In Ross, B. (Ed.), The Psychology of Learning and Motivation (Vol. 43, pp. 215-266). San Diego: Academic Press.
Tooby, J., Cosmides, L., & Barrett, H.C. (2005). Resolving the Debate on Innate Ideas: Learnability Constraints and the Evolved Interpretation of Motivational and Conceptual Functions. In Carruthers, P., Laurence, S., & Stich, S. (Eds.), The Innate Mind (pp. 305-337). New York: Oxford University Press.
Van Merrienboer, J., & Sweller, J. (2005). Cognitive Load Theory and Complex Learning: Recent Developments and Future Directions. Educational Psychology Review, 17(2), 147-177.