Before they can solve problems, however, students must first know what type of visual representation to create and use for a given mathematics problem. Some students—for example, high-achieving and gifted students—do this automatically, whereas others need to be explicitly taught how. This is especially the case for students who struggle with mathematics and those with mathematics learning disabilities. Without explicit, systematic instruction on how to create and use visual representations, these students often create visual representations that are disorganized or contain incorrect or partial information. Consider the examples below.
Mrs. Aldridge asks her first-grade students to add 2 + 4 by drawing dots.
Notice that Talia gets the correct answer. However, because Colby draws his dots in a haphazard fashion, he fails to count all of them and consequently arrives at the wrong solution.
Mr. Huang asks his students to solve the following word problem:
The flagpole needs to be replaced. The school would like to replace it with the same size pole. When Juan stands 11 feet from the base of the pole, the angle of elevation from Juan’s feet to the top of the pole is 70 degrees. How tall is the pole?
Compare the drawings below created by Brody and Zoe to represent this problem. Notice that Brody drew an accurate representation and applied the correct strategy. In contrast, Zoe drew a picture with partially correct information. The 11 is in the correct place, but the 70° is not. As a result of her inaccurate representation, Zoe is unable to move forward and solve the problem. However, given an accurate representation developed by someone else, Zoe is more likely to solve the problem correctly.
Some students will not be able to grasp mathematics skills and concepts using only the types of visual representations noted in the table above. Very young children and students who struggle with mathematics often require different types of visual representations known as manipulatives. These concrete, hands-on materials and objects—for example, an abacus or coins—help students to represent the mathematical idea they are trying to learn or the problem they are attempting to solve. Manipulatives can help students develop a conceptual understanding of mathematical topics. (For the purpose of this module, the term concrete objects refers to manipulatives and the term visual representations refers to schematic diagrams.)
It is important that the teacher make explicit the connection between the concrete object and the abstract concept being taught. The goal is for the student to eventually understand the concepts and procedures without the use of manipulatives. For secondary students who struggle with mathematics, teachers should show the abstract along with the concrete or visual representation and explicitly make the connection between them.
A move from concrete objects or visual representations to using abstract equations can be difficult for some students. One strategy teachers can use to help students systematically transition among concrete objects, visual representations, and abstract equations is the Concrete-Representational-Abstract (CRA) framework.
CRA is effective across all age levels and can assist students in learning concepts, procedures, and applications. When implementing each component, teachers should use explicit, systematic instruction and continually monitor student work to assess their understanding, asking them questions about their thinking and providing clarification as needed. Concrete and representational activities must reflect the actual process of solving the problem so that students are able to generalize the process to solve an abstract equation. The illustration below highlights each of these components.
One promising practice for quickly moving secondary students with mathematics difficulties or disabilities from the use of manipulatives and visual representations to the abstract equation is the CRA-I strategy. In this modified version of CRA, the teacher simultaneously presents the content using concrete objects, visual representations of the concrete objects, and the abstract equation. Studies have shown that this framework is effective for teaching algebra to this population of students (Strickland & Maccini, 2012; Strickland & Maccini, 2013; Strickland, 2017).
Kim Paulsen discusses the benefits of manipulatives and a number of things to keep in mind when using them (time: 2:35).
Kim Paulsen, EdD, Associate Professor, Special Education, Vanderbilt University
Transcript: Kim Paulsen, EdD
Manipulatives are a great way of helping kids understand conceptually. The use of manipulatives really helps students see that conceptually, and it clicks a little more with them. Some of the things, though, that we need to remember when we’re using manipulatives is that it is important to give students a little bit of free time when you’re using a new manipulative so that they can just explore with them. We need to have specific rules for how to use manipulatives, that they aren’t toys, that they really are learning materials, and how students pick them up, how they put them away, the right time to use them, and making sure that they’re not distracters while we’re actually doing the presentation part of the lesson. One of the important things is that we don’t want students to memorize the algorithm or the procedures while they’re using the manipulatives. It really is just to help them understand conceptually. That doesn’t mean that kids are automatically going to understand conceptually or be able to make that bridge between using the concrete manipulatives into them being able to solve the problems. For some kids, it is difficult to use the manipulatives. That’s not how they learn, and so we don’t want to force kids to have to use manipulatives if it’s not something that is helpful for them. So we have to remember that manipulatives are one way to think about teaching math.
I think part of the reason that some teachers don’t use them is because it takes a lot of time, it takes a lot of organization, and they also feel that students get too reliant on using manipulatives. One way to think about using manipulatives is that you do it a couple of lessons when you’re teaching a new concept, and then take those away so that students are able to do just the computation part of it. It is true we can’t walk around life with manipulatives in our hands. And I think one of the other reasons that a lot of schools or teachers don’t use manipulatives is because they’re very expensive. And so it’s very helpful if all of the teachers in the school can pool resources and have a manipulative room where teachers can go check out manipulatives so that it’s not so expensive. Teachers have to know how to use them, and that takes a lot of practice.
In most organisations, you will find that while they have a process, nobody seems to know it exactly, or even where to go to find it. The problem, it seems, is with the way in which processes are documented. Process documents are usually laboured over at the time of their writing, then shelved without much thought at all. The reason for this, I believe, is that there are primarily only two times when a process document is actually referenced:
In my mind, I would much prefer a simpler process flow that is actually used by staff, even if it doesn't cover every possible eventuality along the way. The visual process document provides the most effective way of presenting the flow of how we go about completing our tasks. It's typically printable on one page (though it might have to be A3), it's pinnable to your office cubicle, and, sometimes just as importantly, it can be pasted into PowerPoint presentations for the business.
So how do you present your testing process?
Joel Deutscher is an experienced performance test consultant, passionate about continuous improvement. Joel works with Planit's Technical Testing Services as a Principal Consultant in Sydney, Australia. You can read more about Joel on LinkedIn.
* E-mail: [email protected] (ZY); [email protected] (JKL)
Understanding the computational mechanisms that underlie the encoding and decoding of environmental stimuli is a crucial investigation in neuroscience. Central to this pursuit is the exploration of how the brain represents visual information across its hierarchical architecture. A prominent challenge resides in discerning the neural underpinnings of the processing of dynamic natural visual scenes. Although considerable research efforts have been made to characterize individual components of the visual pathway, a systematic understanding of the distinctive neural coding associated with visual stimuli, as they traverse this hierarchical landscape, remains elusive. In this study, we leverage the comprehensive Allen Visual Coding—Neuropixels dataset and utilize the capabilities of deep learning neural network models to study neural coding in response to dynamic natural visual scenes across an expansive array of brain regions. Our study reveals that our decoding model adeptly deciphers visual scenes from neural spiking patterns exhibited within each distinct brain area. A compelling observation arises from the comparative analysis of decoding performances, which manifests as a notable encoding proficiency within the visual cortex and subcortical nuclei, in contrast to a relatively reduced encoding activity within hippocampal neurons. Strikingly, our results unveil a robust correlation between our decoding metrics and well-established anatomical and functional hierarchy indexes. These findings corroborate existing knowledge in visual coding related to artificial visual stimuli and illuminate the functional role of these deeper brain regions using dynamic stimuli. Consequently, our results suggest a novel perspective on the utility of decoding neural network models as a metric for quantifying the encoding quality of dynamic natural visual scenes represented by neural responses, thereby advancing our comprehension of visual coding within the complex hierarchy of the brain.
Understanding how the brain processes visual information is a crucial area of neuroscience research. One of the main challenges is studying how the brain handles dynamic natural visual scenes. Although there has been progress in studying parts of the visual pathway, we still do not fully understand how different areas of the brain work together to process these scenes. Here we used the comprehensive Allen Visual Coding—Neuropixels dataset and advanced deep learning models to explore how the brain encodes and decodes visual information. We found that our model could accurately interpret visual scenes based on neural activity from various regions of the brain. Our findings show a strong link between our decoding results and established brain hierarchy indexes. This not only supports existing knowledge about visual coding but also sheds light on the role of deeper brain regions in processing visual scenes. Our study suggests that decoding neural network models can be a valuable tool for understanding how the brain encodes visual information, providing new insights into the complex workings of the visual system in the brain.
Citation: Chen Y, Beech P, Yin Z, Jia S, Zhang J, Yu Z, et al. (2024) Decoding dynamic visual scenes across the brain hierarchy. PLoS Comput Biol 20(8): e1012297. https://doi.org/10.1371/journal.pcbi.1012297
Editor: Daniele Marinazzo, Ghent University, BELGIUM
Received: December 12, 2023; Accepted: July 3, 2024; Published: August 2, 2024
Copyright: © 2024 Chen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data used in this work are publicly available recordings from the Visual Coding - Neuropixels dataset, provided by the Allen Institute for Brain Science. The stimuli images and neural data are available at https://portal.brain-map.org/circuits-behavior/visual-coding-neuropixels. The code used to generate the results in this paper is available at https://github.com/beizai/Decoding.
Funding: This work was supported by the National Natural Science Foundation of China 62176003 and 62088102 and Beijing Nova Program 20230484362 (ZY) and the MOST of China 2022ZD0208604 and 2022ZD0208605 and National Natural Science Foundation of China Grants T2325008 and 820712002 (JZ), and Royal Society Newton Advanced Fellowship of UK Grant NAF-R1-191082 (JKL). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Over the course of several decades, extensive research has yielded deep insights into the neural encoding of various attributes within artificial visual stimuli, including features such as the direction of moving gratings. This wealth of knowledge has shed light on the encoding mechanisms employed by visual neurons located in different regions of the brain, particularly within the early visual processing systems, which include the retina [ 1 – 5 ], the lateral geniculate nuclei (LGN) [ 6 – 10 ], and the primary visual cortex (V1) [ 11 – 14 ]. Nonetheless, a formidable challenge remains in elucidating how neurons distributed across various regions of the brain represent natural scenes, comprising natural images and videos [ 15 – 18 ]. This challenge is particularly pronounced when investigating the neural coding of dynamic natural scenes, such as videos [ 19 – 24 ]. The visual system stands as a typical neural pathway that is of critical importance in mediating interactions with the external environment. A considerable expanse of the cerebral cortex is allocated to the processing of visual information [ 25 – 27 ]. This complex cascade of information processing unfolds as we receive copious sensory inputs from the external world and engage in higher-order cognitive functions following the orchestration of these inputs across various brain areas.
The retina serves as the inaugural site of the conversion of visual scenes into electrical signals, marking the initial phase in the trajectory of visual coding. These visual signals traverse the brain in the form of neural spikes, commencing their journey with the retinal ganglion cells, progressing through the LGN located in the thalamus, and ultimately arriving at the visual cortex. Within the realm of the visual cortex, two discernible information pathways, namely the dorsal and ventral streams, have been suggested to come into play [ 28 , 29 ]. In both streams, the initial point is V1, which serves as the pivotal site of early neural processing [ 30 ]. In the dorsal stream, where intricate spatial and locational information processing occurs, neural signals undergo further transmission to the anterior pretectal nuclei (APN) [ 31 , 32 ]. Simultaneously, within the ventral stream, which contributes to memory formation, the information travels a divergent route, eventually reaching the hippocampus [ 33 ]. The elucidation of how visual signals are encoded and subsequently decoded within the cerebral confines is a paramount inquiry in vision research [ 34 ]. In particular, the revelation of how visual scenes traverse a hierarchical neural structure within the brain constitutes a foundational inquiry that holds profound implications not only for the realm of vision but also for the broader computational principles governing the functioning of neurons and neuronal circuits [ 4 , 19 , 35 ].
Over the past several decades, the mechanism governing the encoding of visual scenes has undergone extensive investigation, culminating in a wealth of insightful perspectives on the encoding of specific visual attributes, including, but not limited to, luminance contrast, directional motion, and velocity [ 4 , 13 , 35 ]. Moreover, these inquiries have been extended to encompass more intricate and holistic visual scenes, such as natural images and videos [ 16 , 19 , 36 ]. In contrast to encoding, the challenge of decoding visual information from neural signals has been predominantly approached from an engineering perspective. In this context, a variety of methodologies and models have been developed, primarily oriented toward addressing classification tasks related to object categorization [ 37 – 39 ], as well as the endeavor of reconstructing pixel-level images [ 40 – 50 ]. These recent studies in decoding and the reconstruction of pixel-level imagery provide fresh perspectives that hold significant implications for the evaluation of visual neuroprosthetic devices and the advancement of vision restoration [ 51 – 54 ].
However, the interaction of encoding and decoding methodologies within the context of visual coding, especially those pertaining to dynamic natural scenes, remains an area marked by limited exploration. In this study, we unite these facets to offer a comprehensive perspective on visual coding. To accomplish this, we investigate a well-established and robust resource, the Allen Visual Coding—Neuropixels experimental dataset [ 55 ], utilizing a deep learning neural network decoding model [ 49 ]. We explore this dataset systematically to unravel the intricacies of visual coding distributed across a wide spectrum of hierarchical brain regions. These encompass three distinct regions within the nucleus, six segments within the visual cortex, and four divisions within the hippocampus. Using our decoding model, we undertake the task of reconstructing every individual pixel of the images from the corresponding neural spikes. In this pursuit, we ascertain our capability to accurately decode video pixels from the neural activity of neurons located within the nucleus and the visual cortex, while discerning a significant decrease in decoding accuracy within the hippocampus. In particular, our findings reveal a strong correlation between our decoding accuracy and the classical encoding metrics obtained from experiments employing artificial stimuli. Furthermore, these findings extend to encompass the alignment of our decoding accuracy with the established anatomical and functional hierarchy indexes identified through experimental investigation. These significant outcomes underscore the substantive meaning and competence of our decoding model in deciphering visual information embedded within neural signals. Consequently, our study introduces an innovative approach to quantifying the extent to which visual information derived from dynamic scenes is encapsulated within neural signals from distinct cerebral regions.
To elucidate the hierarchical organization of visual processing within the intricate pathways of the brain, we used a robust and expansive experimental dataset, the Allen Visual Coding—Neuropixels dataset [ 55 ]. This dataset provides recordings from thousands of neurons in mice, captured simultaneously with Neuropixels probes across a diverse array of brain regions ( Fig 1 ). Our investigation focused on three principal brain regions, encompassing a total of 13 identified brain areas. These regions are delineated as the visual cortex (comprising VISp, VISl, VISrl, VISal, VISpm, and VISam) located at the uppermost tier of the brain, the hippocampus (encompassing CA1, CA3, DG, and SUB) located at an intermediary level, and the nucleus (encompassing LGN, lateral posterior nucleus—LP, and APN) located in the deeper recesses of the brain ( Fig 1A ).
(A) A hierarchical structure of 13 brain areas in three brain regions of the visual cortex, hippocampus, and nucleus, was recorded in response to dynamic videos. The visual information flow is indicated by arrows. (B) The distribution of cell locations recorded in response to videos on multiple neuropixels. Each datapoint is an individual cell. (C) Example neuronal spike trains of 10 cells in each brain area in response to a video presentation. (D) The distribution of cell numbers in each brain area. (E) The firing activity showing means and standard deviations (SDs) of the spike counts averaged over the entire duration of video stimuli in each brain area. (F) The decoding workflow. A deep learning neural network decoder takes the input of neural spikes and outputs images. The decoder performance indicates how much visual information is encoded by different brain areas. Visual scenes are natural movies presented as visual stimuli in the Allen Visual Coding—Neuropixels dataset ( https://portal.brain-map.org/circuits-behavior/visual-coding-neuropixels ).
https://doi.org/10.1371/journal.pcbi.1012297.g001
Within the Allen Visual Coding—Neuropixels dataset, a diverse pool of visual stimuli was deployed, ranging from well-designed artificial scenes, such as drifting gratings designed to assess fundamental tuning properties, to dynamic video scenes that have not been extensively investigated. Neurons were systematically recorded in response to these stimuli, with multiple Neuropixels probes capturing a large set of cells distributed across a wide spectrum of brain areas ( Fig 1B ). Notably, the spiking activity observed in response to the video presentations exhibited dynamic temporal patterns both within individual cells and across the cell population ( Fig 1C ).
While previous studies have successfully characterized the encoding properties of artificial visual scenes, including the tuning of direction and orientation, across various brain regions [ 55 ], the encoding of dynamic video scenes remains comparatively less understood. Here we selected a subset of cells exhibiting firing activities in response to video stimuli. The cell count within this subset exhibited variation, ranging from 355 cells in CA3 to 2443 cells in CA1, with most brain regions comprising more than 800 cells ( Fig 1D , S1 Table ). The firing rates, including means and standard deviations ( Fig 1E ), as well as Fano factors and inter-spike intervals ( S1 Fig ), show a diverse range of values, yet demonstrate consistency across the distinct brain areas. This comprehensive dataset with high-quality neuronal recordings and dynamic video stimuli empowers our exploration into the representation of dynamic natural scenes across different brain regions.
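For reference, these summary statistics can be computed per cell as in the short sketch below; the function and the stand-in data are illustrative, not code from the study.

```python
import numpy as np

def firing_stats(counts_per_frame, spike_times):
    """Basic response statistics for one cell.

    counts_per_frame: spikes counted in each movie-frame bin.
    spike_times: spike times in seconds for the same cell.
    """
    mean_count = counts_per_frame.mean()            # mean spikes per frame
    fano = counts_per_frame.var() / mean_count      # Fano factor (variance/mean)
    isi = np.diff(np.sort(spike_times))             # inter-spike intervals
    return mean_count, fano, isi.mean()

# Stand-in data for one cell; real counts and times come from the dataset.
rng = np.random.default_rng(0)
counts = rng.poisson(0.5, size=3600)                # 3600 movie frames
times = np.sort(rng.uniform(0, 120, size=counts.sum()))
print(firing_stats(counts, times))
```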
The core focus of our study is the direct decoding of dynamic visual scenes from neural spikes. To achieve this, we leverage our previously developed deep learning model, which has been validated to decode image pixel data from neuronal spikes originating in the retina [ 49 ]. Here, we extend the application of this model to explore the intricacies of this expansive dataset. Our model, designed as an end-to-end deep learning neural network ( Methods ), receives sequences of spikes from a population of cells as input and generates, as output, the corresponding pixels of video frames associated with these spikes. By quantifying the fidelity and quality of the reconstructed images, we are equipped to study the extent to which visual information is encoded within distinct brain areas. It is a reasonable expectation that the nucleus and the visual cortex should exhibit a substantial information load, whereas the hippocampus is expected to bear relatively less visual information ( Fig 1F ).
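To make the setup concrete, the following is a minimal PyTorch sketch of such an end-to-end spike-to-frame decoder. It is a stand-in, not the published architecture of [ 49 ]: the layer sizes, the two-stage upsampling head, and the downscaled 64 × 128 grayscale output are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpikeToFrameDecoder(nn.Module):
    # Maps a window of population spike counts to one grayscale frame.
    def __init__(self, n_cells, t_bins, frame_hw=(64, 128)):
        super().__init__()
        h, w = frame_hw
        self.h0, self.w0 = h // 4, w // 4            # coarse map, upsampled 4x
        self.fc = nn.Sequential(
            nn.Linear(n_cells * t_bins, 512), nn.ReLU(),
            nn.Linear(512, 64 * self.h0 * self.w0), nn.ReLU(),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 2x upsample
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),   # 2x upsample
            nn.Sigmoid(),                            # pixel values in [0, 1]
        )

    def forward(self, spikes):                       # spikes: (B, n_cells, t_bins)
        z = self.fc(spikes.flatten(1))
        z = z.view(-1, 64, self.h0, self.w0)
        return self.deconv(z)                        # (B, 1, H, W)

# One training step on stand-in data, minimizing pixel-wise MSE.
model = SpikeToFrameDecoder(n_cells=800, t_bins=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
spikes = torch.randn(8, 800, 10).clamp(min=0)        # fake spike-count windows
frames = torch.rand(8, 1, 64, 128)                   # fake target frames
loss = nn.functional.mse_loss(model(spikes), frames)
opt.zero_grad()
loss.backward()
opt.step()
```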
Utilizing a consistent deep neural network framework, we focus on the task of decoding the same visual images by inputting distinct neural spike data originating from various brain regions. Our model exhibited a superb capability for decoding and reconstructing image pixels with a high degree of precision ( Fig 2A ). To evaluate the quality of the decoded images, we employed two widely accepted quantitative metrics: Structural Similarity (SSIM) and Peak Signal-to-Noise Ratio (PSNR). Both metrics served as measures of the performance of our decoding model, yielding consistent evaluations of the reconstructed image quality ( Fig 2B ) that are also robust to variation in the model parameter settings ( S2 Fig ).
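Both metrics are available off the shelf; the sketch below scores a batch of reconstructions with scikit-image, assuming decoded and original frames are grayscale arrays scaled to [0, 1]. The data here are random placeholders, not the dataset's movie frames.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def score_reconstructions(decoded, originals):
    """Mean SSIM and PSNR over a set of decoded frames.

    decoded, originals: arrays of shape (n_frames, H, W), values in [0, 1].
    """
    ssim = [structural_similarity(o, d, data_range=1.0)
            for o, d in zip(originals, decoded)]
    psnr = [peak_signal_noise_ratio(o, d, data_range=1.0)
            for o, d in zip(originals, decoded)]
    return float(np.mean(ssim)), float(np.mean(psnr))

# Placeholder data with 400 test frames, the count used in the violin plots.
rng = np.random.default_rng(0)
orig = rng.random((400, 64, 128))
dec = np.clip(orig + 0.05 * rng.standard_normal(orig.shape), 0, 1)
print(score_reconstructions(dec, orig))
```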
(A) Example of decoded video frame images using spikes of each individual brain area. The original images are on the top (Origin). The decoded images from each brain area are colored according to the visual cortex, nucleus, and hippocampus. (B) Decoding metrics, SSIM and PSNR, indicate the quality of decoded images in different brain areas. The random cases serve as decoding baselines (dashed lines), using two shuffling scenarios, shuffled spikes in the primary visual cortex (VISp-shuffled), and all six areas of the visual cortex (VI-shuffled). The values in violin plots are computed with 400 test images in this and the following figures. Images are natural movies presented as visual stimuli in the Allen Visual Coding—Neuropixels dataset ( https://portal.brain-map.org/circuits-behavior/visual-coding-neuropixels ).
https://doi.org/10.1371/journal.pcbi.1012297.g002
A pronounced trend of visual coding decay was revealed as we traversed the hierarchical landscape of the visual information pathway. The primary visual cortex (VISp) emerged as the most proficient in rendering images with a high degree of fidelity, faithfully capturing the details of the stimuli. Furthermore, within the sub-regions, each of the six distinct brain areas located within the visual cortex, as well as the LGN and LP within the thalamus, and the APN within the midbrain, exhibited decoding results of considerable quality. These regions could effectively reconstruct a significant portion of the original image details. In contrast, results obtained from the hippocampus areas were notably inferior, reflected in diminished values of decoding metrics and a substantial loss of fine-grained details from the original stimuli.
To determine the significance of the decoding metrics obtained, we performed an experiment involving the random shuffling of spikes within the primary visual cortex (VISp-shuffled) and across all six regions of the visual cortex (VI-shuffled). This process effectively perturbed the temporal relationships between the spikes and the stimulus images, rendering the reconstruction process entirely random without any meaningful information. The decoding metrics SSIM and PSNR were measured for the data resulting from the shuffled spikes, serving as a baseline for comparison. Our decoding results consistently outperformed the shuffled baseline, underscoring the meaningful nature of our decoding approach and its capacity to faithfully reflect the extent of visual information encoded within neural spikes ( Fig 2B ).
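One simple way to implement such a control is to permute the pairing between spike windows and frames, as sketched below; the paper's exact shuffling scheme may differ in detail.

```python
import numpy as np

def shuffled_baseline(spike_windows, rng=None):
    """Break the spike-stimulus correspondence for a chance-level baseline.

    spike_windows: (n_frames, n_cells, t_bins) spike counts aligned to
    frames. Returns a copy with the frame order randomly permuted, so each
    spike window is paired with the wrong stimulus frame.
    """
    rng = rng or np.random.default_rng()
    perm = rng.permutation(len(spike_windows))
    return spike_windows[perm]

# Feeding shuffled windows through the trained decoder and scoring the
# outputs yields the dashed baseline levels shown in Fig 2B.
```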
To undertake a comprehensive quantification of how visual scenes are encoded across various brain areas, we aggregated all nearby sub-areas into three cohesive and interconnected macroscopic brain regions: the visual cortex, nucleus, and hippocampus. Using the collective neuronal spikes originating from these three distinct brain regions, we conducted a thorough examination of decoding outcomes and subsequently compared their performance across these regions ( Fig 3 ). In particular, the example decoded images in each region show a striking level of precision in image reconstruction, distinguishing them significantly from the results of shuffled spikes ( Fig 3A ). These favorable results were further validated by the decoding metrics SSIM and PSNR, which underscored the superior performance of both the visual cortex and the nucleus in contrast to the hippocampus ( Fig 3B ).
(A) Samples of original images (Origin) and decoded images in each combined brain region (Visual cortex, nucleus, hippocampus). The decoded results from shuffled spikes across all brain regions are listed as a baseline. (B) Corresponding decoding metrics in each case of (A). SSIM and PSNR metrics in each brain region and baseline with shuffled data. (C) Decoding matrices where the diagonal elements are the values in (B) and the off-diagonal elements are the values of model generalizability, e.g., using the models trained on each brain area to predict other test brain areas. Marked values are means over 400 test images in this and the following figures. Images are natural movies presented as visual stimuli in the Allen Visual Coding—Neuropixels dataset ( https://portal.brain-map.org/circuits-behavior/visual-coding-neuropixels ).
https://doi.org/10.1371/journal.pcbi.1012297.g003
To explore the generalizability of our decoding models, we applied a model trained on one specific brain region to data originating from different regions. This approach facilitated the construction of a decoding matrix ( Fig 3C ), wherein the diagonal elements reflect the model’s performance when trained and tested with data from the same brain region. In contrast, the off-diagonal elements represent the model’s generalizability, where the model was trained on one region’s data and tested on another’s. Notably, we observed that the decoding models exhibited a remarkable degree of specificity, with relatively low generalizability. The decoding metrics associated with off-diagonal elements closely match the performance observed in the shuffled baseline, underscoring the distinctive nature of our decoding models ( Fig 3C ).
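The construction of such a matrix can be sketched as follows, assuming trained per-region decoders with matched input dimensions; `models`, `test_sets`, and `score_fn` are placeholders for the components described above.

```python
import numpy as np

def decoding_matrix(models, test_sets, score_fn):
    """Cross-region generalization matrix of a decoding metric.

    models:    dict region -> trained decoder (assumed to accept another
               region's spikes, i.e., matched population sizes).
    test_sets: dict region -> (spike_windows, target_frames).
    score_fn:  callable(decoded_frames, target_frames) -> scalar metric.
    """
    regions = list(models)
    M = np.zeros((len(regions), len(regions)))
    for i, train_region in enumerate(regions):      # rows: training region
        for j, test_region in enumerate(regions):   # columns: test region
            spikes, frames = test_sets[test_region]
            decoded = models[train_region](spikes)  # run the decoder
            M[i, j] = score_fn(decoded, frames)     # diagonal: same region
    return regions, M
```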
In pursuit of a more detailed comparison between different brain areas, we systematically replicated our decoding procedure using controlled cell numbers from each region. To control for sample size, we randomly selected 800 cells from each of the six areas within the visual cortex, as well as from the LGN, LP, and CA1; this selection was made with the understanding that the other regions possessed fewer than 800 cells. Remarkably, the decoding results derived from these subsets of 800 cells closely match those obtained with the full number of cells ( Fig 4A ). This observation was further substantiated by the decoding metrics SSIM and PSNR. Furthermore, we explored two distinct scenarios wherein we mixed areas from the visual cortex, with or without VISp. In each case, decoding with 800 cells consistently yielded similar results. However, the decoding performance with CA1 remained notably inferior when contrasted with that of the visual cortex, LGN, and LP. The decoding models trained on individual brain areas exhibited remarkable specificity, demonstrating inferior performance when transferred to test data from other brain regions (evident in the off-diagonal metrics, which closely mirrored the shuffled baseline, as illustrated in Fig 4B ).
(A) Examples of decoded images with 800 cells of each brain area and two combined areas (VI-all: all combined six areas in the visual cortex; VI w/o VISp: combined five areas of the visual cortex without VISp). (B) Decoding metrics of SSIM and PSNR. Matrices show the decoding values using models trained on each brain area while testing on the same (diagonal) and different (off-diagonal) areas. Images are natural movies presented as visual stimuli in the Allen Visual Coding—Neuropixels dataset ( https://portal.brain-map.org/circuits-behavior/visual-coding-neuropixels ).
https://doi.org/10.1371/journal.pcbi.1012297.g004
We then investigated the influence of cell quantity on the decoding outcomes. Cells were randomly selected in varying numbers, ranging from 50 to 2000 within VISp, and the decoder was trained using differing numbers of cells under the same scheme ( Fig 5A ). Not surprisingly, the quality of decoding decreased when fewer cells were employed. This examination was extended to encompass all brain regions, and it became evident that the decoding metrics exhibited a consistent upward trend with an increase in the number of cells included for model training. However, an intriguing phenomenon was observed—the performance reached a state of saturation once enough cells were incorporated ( Fig 5B ). Specifically, the visual cortex and nucleus regions appeared to reach a saturation point with approximately 500 cells, whereas the hippocampus necessitated around 1000 cells for the same effect. This observation implies that cells within the nucleus and visual cortex exhibit a greater degree of specialization in processing visual information in contrast to those within the hippocampus.
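A sketch of this subsampling sweep is given below, with `train_fn` and `score_fn` standing in for the decoder training and evaluation routines described earlier.

```python
import numpy as np

def subsample_and_score(spike_windows, frames, train_fn, score_fn,
                        cell_counts=(50, 100, 200, 500, 800, 1000, 2000),
                        n_repeats=3, rng=None):
    """Decoding quality as a function of population size.

    For each cell count, draw random cell subsets, retrain the decoder,
    and record the metric; curves like Fig 5B plot the mean and SD.
    In practice the scoring would use held-out test frames.
    """
    rng = rng or np.random.default_rng()
    n_cells = spike_windows.shape[1]
    results = {}
    for k in cell_counts:
        if k > n_cells:
            continue                                 # area has fewer cells
        scores = []
        for _ in range(n_repeats):
            idx = rng.choice(n_cells, size=k, replace=False)
            model = train_fn(spike_windows[:, idx, :], frames)
            decoded = model(spike_windows[:, idx, :])
            scores.append(score_fn(decoded, frames))
        results[k] = (np.mean(scores), np.std(scores))
    return results
```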
(A) Examples of decoding images with different numbers of cells (50–2000) in VISp. (B) Decoding metrics (mean±SD) are convergent over an increasing number of cells in each brain area. Images are natural movies presented as visual stimuli in the Allen Visual Coding—Neuropixels dataset ( https://portal.brain-map.org/circuits-behavior/visual-coding-neuropixels ).
https://doi.org/10.1371/journal.pcbi.1012297.g005
A notable observation was that brain areas exhibiting higher decoding performance with the full population consistently outperformed other regions across all cell counts ( Fig 5B ). The performance curve of VISp consistently ranked highest among all segments of the visual cortex, even with a limited number of cells for decoding. Similarly, the LGN consistently exhibited superior decoding results compared to LP and APN. In contrast, the hippocampus displayed a lower decoding performance, even when a substantial number of CA1 cells were utilized. Even though the dataset featured fewer cells in CA3, DG, and SUB, it was anticipated that the decoding performance of these hippocampal regions, with a greater number of cells, would remain suboptimal, comparable to that in CA1. These findings suggest that an accurate decoding of visual scenes can be achieved with a relatively small number of cells, depending on the brain areas. In terms of image pixel decoding, the information contained in the spikes of each brain area appears to possess a certain degree of redundancy. The total information derived from the original stimuli, funneled through each brain area, seems to be a constant quantity. Even when a substantial number of cells were employed in CA1, the volume of information it encompassed was not comparable to that decoded from a limited number of cells in V1. Collectively, our results suggest that cell quantity is not a deterministic factor in the decodability of visual scenes.
The dataset encompasses neural activity responses to a diverse array of stimuli, including static and drifting gratings, serving as a valuable resource for calculating neural selectivity towards different orientations and directions. To gain a deeper comprehension of how decoding performance may elucidate encoding capabilities, we characterized the relationship between decoding performance and cell selectivity tuning. We employed three classical indexes widely used in the field of visual coding ( Fig 6 ): orientation selectivity to static grating stimuli, orientation selectivity to drifting gratings, and directional selectivity to drifting gratings.
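For reference, the snippet below computes global orientation and direction selectivity indexes from a tuning curve using the circular-variance definitions, one common convention; the dataset ships precomputed indexes that may follow a different formula.

```python
import numpy as np

def osi_dsi(rates, angles_deg):
    """Global orientation/direction selectivity from a tuning curve.

    rates: mean responses to gratings drifting at each angle in angles_deg.
    Uses the circular-variance definitions
        OSI = |sum_k r_k exp(2i theta_k)| / sum_k r_k
        DSI = |sum_k r_k exp(i theta_k)|  / sum_k r_k.
    """
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    r = np.asarray(rates, dtype=float)
    osi = np.abs(np.sum(r * np.exp(2j * theta))) / np.sum(r)
    dsi = np.abs(np.sum(r * np.exp(1j * theta))) / np.sum(r)
    return osi, dsi

# A cell responding strongly at 0 and 180 degrees is orientation-selective
# but not direction-selective:
print(osi_dsi([10, 1, 10, 1], [0, 90, 180, 270]))  # high OSI, DSI near 0
```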
The relationship between natural scene neural activity image reconstruction performance and directional visual feature cell selectivity indexes. Natural scene decoding performance metrics SSIM (top; A-C) and PSNR (bottom; D-F) are plotted against (A, D) orientation selectivity indexes to static gratings, (B, E) orientation selectivity indexes to drifting gratings, and (C, F) directional selectivity indexes to drifting gratings. Solid datapoints are means. The circle size is proportional to the SD of different selectivity indexes.
https://doi.org/10.1371/journal.pcbi.1012297.g006
Within the visual cortex, neurons demonstrated a pronounced tendency for high orientation and directional selectivity tuning, a propensity that is significantly more pronounced compared to neurons within the LGN, LP and APN, consistent with established conventions in the field [ 7 , 13 ]. In contrast, all hippocampal areas exhibited both low decoding metrics and diminished orientation selectivity. This diversity generates a distinct pattern, categorizing regions into three clusters: the visual cortex, marked by its high orientation encoding capabilities coupled with a strong decoding performance; the nucleus, marked by low encoding but a strong decoding performance; and the hippocampus, marked by both low encoding and decoding performance. Consequently, these findings indicate a striking consistency in the quantitative relationships between our decoding metrics derived from natural scenes and the encoding tuning properties traditionally associated with artificial scenes. Therefore, our decoding model serves as a functional metric for the study of the encoding of natural visual scenes.
Recent investigations have revealed the presence of a hierarchical organization within the visual cortex, established through an analysis of regional connections [ 30 , 55 ] ( Fig 7A ). To delve deeper into the hierarchy of natural scene decoding, we conducted an in-depth examination of the regions within the visual cortex ( Fig 6 ). This inspection revealed a substantial positive correlation between the SSIM and PSNR decoding metrics and the selectivity indexes associated with artificial stimuli in the six areas of the visual cortex ( Fig 7B–7D ). Notably, a stronger correlation emerged between decoding performance and cell selectivity for drifting gratings compared to static gratings ( Fig 7B–7D ), signifying a more prominent role for the visual cortex in encoding orientation and direction for dynamic vision compared to static vision.
(A) Diagram of the mouse visual cortex, showing the anatomical layout of the regions. Region labels correspond to those used in the previous analyses: V1 (VISp), LM (VISl), RL (VISrl), AL (VISal), PM (VISpm), AM (VISam). (B-D) The relationship within the visual cortex between decoding metrics SSIM (top) and PSNR (bottom), and directional visual feature cell selectivity indexes (B) orientation selectivity using static gratings, (C) orientation selectivity to drifting gratings, and (D) directional selectivity to drifting gratings. (E-H) Correlation between decoding performance metrics SSIM (top) and PSNR (bottom) with (E) anatomical hierarchy score [ 56 ]; (F) hierarchical level [ 30 ]; (G) receptive field (RF) area [ 55 ]; and (H) RF diameter [ 30 ]. Data are presented as mean values. R is the Pearson correlation coefficient. For all correlations P<0.05.
https://doi.org/10.1371/journal.pcbi.1012297.g007
Subsequently, we further dissected the correlation between the decoding metrics and the previously established hierarchical structure within the visual cortex [ 30 , 55 , 56 ] ( Fig 7E–7H ). The decoding metrics exhibited a high correlation with the anatomical hierarchy score [ 56 ] ( Fig 7E ). Similarly, the hierarchical level values obtained from another independent anatomical tracing study, which employed electrophysiological methods, displayed a robust correlation with both decoding metrics [ 30 ] ( Fig 7F ). Furthermore, we analyzed another aspect of visual neuron properties, receptive field (RF) size, which has been previously demonstrated to expand in higher-order areas as visual features are aggregated [ 30 , 55 ]. Intriguingly, we observed a highly significant correlation between decoding metrics and RF sizes, a correlation that was consistent across both experiments [ 30 , 55 ] ( Fig 7G–7H ). In particular, the correlation was more pronounced in the case of the Allen Visual Coding—Neuropixels dataset ( Fig 7E and 7G , R > 0.9) compared to the data presented by [ 30 ] ( Fig 7F and 7H , R < 0.9).
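Quantifying such a relationship amounts to a Pearson correlation across areas, as in the sketch below; the numbers are made-up stand-ins used only to show the calculation, with the real per-area decoding and hierarchy scores coming from Fig 7 and refs [ 30 , 56 ].

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up stand-in values, one per visual-cortex area.
areas = ["VISp", "VISl", "VISrl", "VISal", "VISpm", "VISam"]
ssim_by_area = np.array([0.62, 0.58, 0.57, 0.55, 0.53, 0.51])
hierarchy_score = np.array([-0.36, -0.14, -0.06, 0.15, 0.33, 0.44])

# pearsonr returns the correlation coefficient and its p-value.
r, p = pearsonr(ssim_by_area, hierarchy_score)
print(f"Pearson R = {r:.2f}, p = {p:.4f}")
```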
Our primary focus revolves around the direct decoding of dynamic visual scenes from neural spikes, employing an end-to-end deep learning model as the decoding mechanism. Leveraging the valuable resource provided by the Allen Visual Coding—Neuropixels dataset, we systematically explored how visual scenes find their neural representation within diverse brain regions. Specifically, we endeavor to address the longstanding question concerning the extent to which neurons in the hippocampus encode visual information. Traditionally, it has been a well-established notion that the primary contributors to visual information encoding are the visual cortex and the LGN, while the hippocampus, in its capacity for information integration towards associative learning and memory formation, encodes only a fraction of the visual information. Here, our decoding approach introduces, for the first time, a quantitative metric for this established paradigm.
Conventional paradigms in neuroscience, particularly within the domain of decoding studies, have traditionally focused on using neural signals to retrieve and classify information related to stimuli. These efforts have yielded valuable insights into the neural code [ 55 , 57 – 59 ]. In recent years, a significant shift has been observed, with an emerging emphasis on the reconstruction of input images or videos using a diverse array of neural signals. These signals encompass functional magnetic resonance imaging (fMRI) data [ 38 , 41 , 42 , 60 ], calcium imaging signals [ 48 , 50 ], and neural spikes [ 3 , 43 , 45 – 47 , 61 – 63 ]. In the present study, we break new ground by undertaking the task of decoding and reconstructing visual scenes across multiple brain regions, ranging from the nucleus and visual cortex to the hippocampus, with a specific focus on unraveling the intricacies of the visual hierarchy.
The fundamental question of how neurons in the brain encode visual signals stands as a cornerstone in the field of neuroscience. The prevailing notion entails that visual signals undergo initial encoding within retinal ganglion cells, further progressing through encoding and decoding processes in the LGN and the visual cortex. As these signals traverse deeper into the brain, they are subjected to processing that extracts more abstract, meaningful information for the purpose of learning and memory formation. Previously, our work demonstrated the capacity of deep learning models to decode image pixels from neural spikes in the retina [ 49 ]. In the current investigation, we apply a similar approach to address decoding using neurons in later stages of the visual pathway.
Of particular note, the LGN and LP, located at the forefront of the visual pathway following the retina, exhibit remarkable accuracy in decoding natural scenes. Their proximity to the initial stage of visual processing may account for their ability to retain a substantial amount of visual information. The six regions within the visual cortex and the APN in the midbrain also yield commendable decoding results. Given the crucial roles played by the primary visual cortex in both the dorsal and ventral visual streams and the involvement of the APN in the processing of spatial location information within the dorsal stream, it is reasonable to anticipate their need for a wealth of original stimulus information.
In contrast, the decoding results for the hippocampal regions, including CA1, CA3, DG, and SUB, appear notably less robust. This discrepancy could be attributed, in part, to their position at the far end of the visual pathway. It is conceivable that the processes of learning and memory formation do not necessitate an exhaustive retention of pixel-level details from the original stimuli. Our findings align with the prevailing understanding that visual information undergoes hierarchical processing across distinct neural regions, each contributing to a unique facet of the visual information pathway.
The redundancy inherent in the representation of pixel-level information within each distinct brain area becomes apparent in our findings. Notably, we have observed that the use of 800 cells is sufficient to approximate decoding results that align closely with those obtained using the entire population of cells within a given brain area. Intriguingly, even when working with a small subset of cells, such as 50 or 100, our decoding outcomes remain reasonable and surpass those generated by random baseline measures. Furthermore, increasing the number of cells serves to reduce blurriness and enhance the overall quality of image details in our decoding results. It becomes evident that, once the cell count reaches approximately 800, or even as few as 500, the decoding outcomes are in line with those derived from a larger dataset of 2000 cells. However, when progressing into the hippocampus, we find that a more substantial number of cells, approximately 1000 to 1500 in the case of CA1, is required to attain a sufficient level of pixel decoding. These empirical insights provide compelling evidence of the redundancy in the representation of pixel-level stimuli within each unique brain area. It is remarkable to note that even a sparse population of cells has the capacity to capture and convey the essential details of image pixels, underscoring the efficiency and robustness of neural encoding.
Information processing within the visual pathway unfolds along a hierarchical cascade involving sequential stages encompassing the retina, subcortical regions, and cortical areas. Within this complex neural pathway, visual information undergoes a transformation and is distributed extensively to various regions of the brain. This cascade is widely believed to follow a hierarchical organizational principle, with higher-order brain regions performing more sophisticated computations involving increasingly encoded information [ 56 ]. A rich body of previous research, utilizing advanced techniques such as 2-photon calcium imaging [ 56 ] and electrophysiology [ 30 ], has established the presence of hierarchical organization within distinct regions of the mouse visual cortex. The visual hierarchy becomes notably manifest in the response latency as one traverses along the hierarchy, with higher-order regions exhibiting slower response times, thus corroborating previous findings derived from imaging studies [ 30 , 56 ].
To reveal the capacity of decoding metrics in elucidating the information representation across this hierarchical visual pathway, we conducted a rigorous examination of their interplay with the functional anatomical structure. Our exploration involved three well-established encoding indexes, designed to capture the selective activity of neurons within various brain regions when exposed to artificial stimuli in the form of static and drifting grating patterns, as sourced from the Allen Visual Coding—Neuropixels dataset. These indexes offer insights into how individual neurons are finely tuned to distinct visual features, such as orientation and direction [ 64 ]. Subsequently, the tuning properties of neurons were correlated with decoding performance metrics obtained through the presentation of natural visual scenes. This critical analysis aimed to uncover the intricate relationship between neural encoding properties, such as selectivity tuning, and the representation of real-world visual stimuli by neural decoding ( Fig 6 ).
Areas within the visual cortex and the hippocampus reveal a noteworthy correspondence between the reconstruction performance metrics SSIM and PSNR obtained from the decoding model and the selectivity of cells to orientation and direction. These findings highlight that decoding performance metrics, as acquired from the presentation of natural scenes, are quantitatively linked to the tuning properties of cells for encoding orientation and direction with conventional artificial stimuli. Consequently, the decoding model offers a valuable metric for the comprehensive study of orientation-based encoding in natural visual scenes, transcending the realm of artificial stimuli. In contrast, the regions located within the nucleus exhibit high reconstruction performance while displaying low cell selectivity indexes. This observation is in harmony with prior research indicating a lower direction selectivity index in the LGN compared to V1 [ 7 , 11 , 64 ].
Nevertheless, it is imperative to emphasize that the collective cell selectivity within a region does not necessarily correspond to the overall amount of visual information retained. The mouse LGN, for example, receives input from various types of retinal ganglion cells, encompassing information that spans a wide spectrum of visual features [ 8 ]. Furthermore, neurons within the LGN of rodents and other mammals have been observed to manifest circularly symmetric receptive fields, in contrast to the elongated receptive fields found in V1 [ 7 , 11 , 64 , 65 ]. While LGN neurons may indeed carry the necessary information to interpret orientation and direction, their selectivity appears to become more pronounced when their receptive fields converge within V1. Additionally, neurons in the rodent and mammalian LGN demonstrate marked selectivity to light intensity and linear spatial summation, in addition to their orientation and direction information [ 66 , 67 ]. The human LGN, characterized by a layered structure, carries a highly diverse range of visual information emanating from both the parvocellular and magnocellular visual pathways [ 68 ]. Consequently, certain groups of cells within the region may exhibit orientation selectivity, while others are attuned to different facets of visual information, thus impacting the overall orientation and direction selectivity indexes.
In contrast, the visual cortex prominently exhibits an elevated degree of orientation and direction selectivity, implying the primacy of this form of information representation within the region. However, each distinct region within the visual cortex demonstrates varying selectivity indexes, with V1 emerging as the most selective for orientation and direction among artificial stimuli in the majority of cases. This aligns cohesively with the previously outlined functional hierarchy of the visual cortex, wherein V1, marked by its highest orientation selectivity, serves as the point of origin for an expansive network of orientation-selective cells originating from LGN inputs [ 7 , 11 , 64 ]. Subsequent regions within the visual cortex hierarchy exhibit weaker preferences for orientation, indicating their role in processing alternative forms of visual information and the transformation of visual data from V1 into more complex representations. For instance, V3 has been suggested to demonstrate a greater preference for texture and pattern information [ 69 ]. Hence, the inclusion of additional selectivity metrics, such as those focused on texture selectivity or those unrelated to orientation and direction, would likely reveal distinctive trends among various regions.
An essential inquiry concerns the comparative reconstruction performance of the LGN and LP models when contrasted with V1. An equivalent performance may indicate that the LGN does not utilize other information besides orientation for enhancing reconstruction accuracy. In contrast, better performance could suggest the utilization of alternative visual information sources to enhance reconstruction performance. Moreover, higher reconstruction performance may imply the model’s capacity to employ spikes derived from alternative forms of visual information to augment reconstruction. While the current decoding model serves as a functional metric for the investigation of natural vision encoding beyond artificial stimuli, its competence in deciphering visual features other than orientation and direction remains enigmatic. Consequently, the examination of regions primarily relying on forms of visual information other than orientation, such as the visual cortex, may necessitate the application of alternative cell tuning metrics or decoding model architectures for the exploration and interpretation of the presence of diverse forms of visual information. Further studies are needed to address these intriguing questions in more detail.
Deep neural network (DNN) models have emerged as valuable tools in neuroscience research [ 70 ], particularly for visual coding using neuronal data from spikes to calcium imaging and fMRI [ 34 , 50 , 54 , 71 – 74 ]. Seminal studies have demonstrated that DNNs can identify neuronal representations of visual features encoded in the visual cortex and inferior temporal (IT) cortex of primates [ 34 , 71 , 72 ]. In the mouse visual system, DNNs have aligned visual cortical areas with corresponding model layers using the same Allen Visual Coding—Neuropixels dataset while focusing on neuronal responses to natural images [ 75 ]. Further applications of DNNs on additional data from the mouse visual system have provided deeper insights into visual coding [ 24 , 76 , 77 ]. While DNNs were initially designed to model the ventral visual processing pathway [ 34 ], questions remain regarding the adequacy of these models in fully explaining the underlying biological visual processes [ 78 ].
Similar to primates, mice can perceptually detect higher-order statistical dependencies in texture images, distinguishing between different texture families across visual areas in alignment with DNN predictions [ 79 ]. This observation implies that mouse visual cortex areas may represent semantic features of learned visual categories [ 80 ]. Beyond visual coding, the rodent hippocampus is suggested to have similar roles in learning and memory as observed in primates [ 81 , 82 ]. In our study, we demonstrate that the same DNN reveals less pixel information in hippocampal neurons compared to the thalamus and visual cortex. This finding aligns with the general belief that the hippocampus encodes more abstract information, such as concepts [ 83 ]. Consequently, a pertinent follow-up question is how to decode this abstract information from mouse hippocampal neurons. Recent studies have shown that DNNs can decode semantic information from human fMRI data using advanced generative models. These models process latent embeddings and resample learned semantic information to generate or reconstruct new images with similar concepts [ 84 – 87 ]. Such generative models might be valuable for decoding semantic information represented by mouse hippocampal neurons. Additionally, developing models constrained by neuroscience knowledge could enhance decoding accuracy, offering new insights into the fundamental workings of the biological brain [ 88 , 89 ].
Our current findings demonstrate that the decoding accuracy of natural scenes is closely correlated with encoding metrics derived from artificial scenes. While the interaction between encoding and decoding has been explored for simple stimuli such as directional selectivity [ 55 ], the investigation of complex natural scenes has been limited due to constraints in computational models and methodologies [ 22 , 24 ]. Previous studies have shown that DNNs can be valuable tools for decoding and reconstructing pixel-level information of natural scenes [ 43 , 44 , 47 , 50 ], yet these studies lacked a clear correlation with established encoding metrics. Our present work aims to elucidate this tight correlation, enabling the use of DNNs to quantify neuronal response patterns and compute the differences between patterns in response to both artificial and natural stimuli.
Our approach extends conventional metrics for assessing the distance between spike trains [90, 91] with the capability to handle natural scenes. Consequently, our model provides practical, quantitative metrics for characterizing the quality of vision restoration from the neuronal responses of cells treated with neuroprostheses [51, 52], offering a robust framework for evaluating and improving neuroprosthetic treatments.
In the current work, our DNNs are based on convolutional neural networks that process video frame by frame, without incorporating temporal information. Neuronal dynamics, however, are characterized by complex temporal patterns, and including temporal information could yield deeper insights. Our initial aim was to exploit the temporal sequence of visual processing and develop a spatiotemporal model operating on continuous video frames. However, recent advances in deep learning for computer vision indicate that image-based foundation models consistently outperform video-based models on most video understanding tasks [92]; models developed on static images can thus address dynamic video tasks frame by frame, without temporal information [93]. Consistent with these observations, we found that analyzing video frame by frame yields better decoding results. Although our current model does not exploit temporal sequence information, it provides a practical method for studying dynamic visual scenes and neuronal responses. Nonetheless, decoding dynamic continuous scenes while accounting for temporal information could offer more profound insights, and future work is needed to investigate temporal dynamics comprehensively.
It is well known that temporal delays exist across hierarchical brain areas in processing visual information [55]; for instance, deeper areas like the hippocampus respond to visual stimuli with delays of several tens of milliseconds. To assess the impact of these delays on decoding, we reconstructed images using frames preceding the neuronal response and observed minimal effects (S3 Fig), as sketched below. However, our current model may not be suitable for examining how delays shape the processing and representation of visual information. At this stage, our decoding metrics reflect the hierarchical delays between brain areas, with the effect on reconstruction error more pronounced in deeper areas. Future work will need to develop alternative modeling approaches that can investigate the interpretability of decoding performance while accounting for temporal delays.
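In practice, this delay analysis amounts to re-pairing each neuronal response with an earlier video frame. The following is a minimal sketch of such a pairing, our own illustration rather than the authors' code; the function name and array shapes are assumptions.

```python
import numpy as np

def delayed_pairs(counts, frames, delay):
    """Pair each response with the frame `delay` frames earlier.

    counts : (n_frames, n_cells) binned spike counts
    frames : (n_frames, H, W) movie frames
    Boundary frames without a valid partner are dropped.
    """
    if delay == 0:
        return counts, frames
    return counts[delay:], frames[:-delay]
```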
Neuronal dynamics across all areas of the mouse visual system can be modulated by behavioral variables and other contextual signals [24, 94–97]. The experimental settings of the Allen Visual Coding—Neuropixels data involved anesthetized mice, thereby minimizing the influence of contextual differences. Future studies should consider how visual scenes are represented in mice in more naturalistic settings, with such additional variables taken into account.
The extracellular electrophysiology data we use are from the Allen Visual Coding—Neuropixels dataset [55]. The dataset comprises multiple experimental sessions, each containing three hours of data collected with the same procedure in a different mouse. Each session includes spiking responses to a battery of stimuli, including gratings and two movie clips of 30 and 120 seconds. In this study, we used the 120-second movie, a total of 3600 frames, each with a size of 304×608 pixels. Because sessions differ in which brain areas were recorded and in how many cell units each area contains, we counted the cells that fired spikes during the movie clips and calculated the average number of spikes per movie frame across all sessions; cells that did not fire during a session were excluded from the analysis. Across all sessions, there are 25 brain areas containing more than 20,000 cell units in total. Among them, we selected the brain areas with more than 300 cells; the numbers of cells and their corresponding brain areas are shown in S1 Table.
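As a concrete illustration of this preprocessing step, the following is a minimal sketch of binning spike times into per-frame spike counts. The array names (`spike_times`, `frame_onsets`) and the bin-edge convention are our assumptions, not taken from the original pipeline.

```python
import numpy as np

def spikes_per_frame(spike_times, frame_onsets):
    """Count each cell's spikes within each movie-frame interval.

    spike_times : list of 1-D arrays, one array of spike times (s) per cell
    frame_onsets: 1-D array of frame onset times (s), length n_frames + 1
                  (the last entry closes the final frame's interval)
    Returns an (n_frames, n_cells) array of spike counts.
    """
    n_frames = len(frame_onsets) - 1
    counts = np.zeros((n_frames, len(spike_times)))
    for j, st in enumerate(spike_times):
        # histogram of this cell's spike times over the frame boundaries
        counts[:, j], _ = np.histogram(st, bins=frame_onsets)
    return counts
```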
The decoding model used in this study is similar to our previous models [49]. The first part of the network is a multilayer perceptron (MLP) with four fully connected layers. The input layer has one unit per recorded cell, so each cell's spike data correspond to one input node of the network. The two hidden layers have sizes 16384 and 8092 and use the ReLU activation function. The final layer has size 4096 and outputs a 64×64 intermediate image.
The intermediate images from the MLP then pass through a convolutional neural network to produce the final decoding result. This network is divided into two parts. The first part contains four convolutional layers that use convolution and down-sampling to reduce the image size; the four kernels have sizes (256, 7, 7), (512, 5, 5), (1024, 3, 3), and (1024, 3, 3), all with stride (2, 2). This part retains the main information from the intermediate image while filtering out redundant noise. The second part contains six convolutional layers and six up-sampling layers; the kernel sizes are (1024, 3, 3), (512, 3, 3), (256, 3, 3), (256, 3, 3), (128, 5, 5), and (1, 7, 7), all with stride (1, 1), and all up-sampling layers use a factor of (2, 2). After the second part, the network reconstructs the features of the original stimulus as a 256×256-pixel image. Between every two convolutional layers there are batch normalization, a ReLU activation function, and a dropout layer with probability 0.25.
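To make the architecture concrete, here is a minimal PyTorch sketch that follows the layer sizes above. The padding values, upsampling mode, and exact placement of normalization and dropout are our assumptions where the text leaves them unspecified, so this is an illustration under stated assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SpikeDecoder(nn.Module):
    """Spike-to-image decoder: MLP to a 64x64 intermediate image,
    then a convolutional encoder-decoder up to 256x256."""

    def __init__(self, n_cells: int, dropout: float = 0.25):
        super().__init__()
        # MLP: n_cells -> 16384 -> 8092 -> 4096 (sizes as stated in the text)
        self.mlp = nn.Sequential(
            nn.Linear(n_cells, 16384), nn.ReLU(),
            nn.Linear(16384, 8092), nn.ReLU(),
            nn.Linear(8092, 4096),
        )

        def down(cin, cout, k):  # stride-2 conv block (halves spatial size)
            return nn.Sequential(
                nn.Conv2d(cin, cout, k, stride=2, padding=k // 2),
                nn.BatchNorm2d(cout), nn.ReLU(), nn.Dropout2d(dropout),
            )

        def up(cin, cout, k):    # 2x upsample + stride-1 conv block
            return nn.Sequential(
                nn.Upsample(scale_factor=2),
                nn.Conv2d(cin, cout, k, stride=1, padding=k // 2),
                nn.BatchNorm2d(cout), nn.ReLU(), nn.Dropout2d(dropout),
            )

        # Encoder: 64 -> 32 -> 16 -> 8 -> 4 spatial
        self.encoder = nn.Sequential(
            down(1, 256, 7), down(256, 512, 5),
            down(512, 1024, 3), down(1024, 1024, 3),
        )
        # Decoder: 4 -> 8 -> 16 -> 32 -> 64 -> 128 -> 256 spatial
        self.decoder = nn.Sequential(
            up(1024, 1024, 3), up(1024, 512, 3), up(512, 256, 3),
            up(256, 256, 3), up(256, 128, 5),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 1, 7, padding=3),
        )

    def forward(self, spikes):          # spikes: (batch, n_cells)
        x = self.mlp(spikes).view(-1, 1, 64, 64)
        return self.decoder(self.encoder(x))
```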
All decoding experiments used the same settings. Because of the high resolution of the original natural movie, we cropped the middle part of each image, reducing the frames to 256×256 pixels for our decoding model. From the 3600 image frames, we discarded the temporal order and randomly selected 3200 frames as the training and validation set and 400 frames as the test set. The loss function of the network is the mean squared error (MSE) between the reconstructed image and the original image. Training uses the Adam optimizer with a learning rate of 0.001, a batch size of 16, and 400 epochs. All experiments were conducted on an Nvidia RTX 3080. The model has about 300 million parameters in total, and a single training run takes about 200 minutes.
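A minimal training loop consistent with these settings might look as follows, reusing the SpikeDecoder sketch above. The `spikes` and `frames` tensors here are synthetic stand-ins for the binned responses and cropped movie frames, and the split below folds validation into the training set for brevity.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

n_cells = 2015                            # e.g., all VISp cells
spikes = torch.rand(3600, n_cells)        # synthetic stand-in for spike counts
frames = torch.rand(3600, 1, 256, 256)    # synthetic stand-in for movie frames

device = "cuda" if torch.cuda.is_available() else "cpu"
dataset = TensorDataset(spikes, frames)
train_set, test_set = random_split(dataset, [3200, 400])  # temporal order discarded
loader = DataLoader(train_set, batch_size=16, shuffle=True)

model = SpikeDecoder(n_cells=n_cells).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for epoch in range(400):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)       # MSE between reconstruction and frame
        loss.backward()
        optimizer.step()
```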
We used two popular quantitative metrics to compare the decoded images with the original stimulus images of natural scenes: structural similarity (SSIM) and peak signal-to-noise ratio (PSNR). SSIM compares the luminance, contrast, and structure of the two images. It is based on the hypothesis that human vision perceives distortion by extracting structural information from images; its value lies between -1 and 1 and increases with image quality. PSNR measures the degree of image distortion as 10·log10(MAX²/MSE), where MAX is the maximum possible pixel value and MSE is the mean squared pixel error between the reconstructed and original images; the larger the PSNR, the less distorted the result. All the violin plots show the decoding metric values over the 400 test images; the single values of the decoding metrics in all the other figures are the mean metric values over the 400 test images.
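Both metrics are available in scikit-image; a short sketch of how the per-image scores could be computed and averaged over the test set (the function name and array shapes are our assumptions):

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def mean_metrics(recon, orig):
    """Mean SSIM and PSNR over stacks of grayscale images.

    recon, orig : arrays of shape (n_images, H, W), values in [0, 1]
    """
    ssim = [structural_similarity(o, r, data_range=1.0)
            for o, r in zip(orig, recon)]
    psnr = [peak_signal_noise_ratio(o, r, data_range=1.0)
            for o, r in zip(orig, recon)]
    return float(np.mean(ssim)), float(np.mean(psnr))
```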
We used three cell tuning selectivity indexes established in visual coding: orientation selectivity to static gratings (OSI SG), orientation selectivity to drifting gratings (OSI DG), and direction selectivity to drifting gratings (DSI DG). All three indexes were measured and provided in the Allen Visual Coding—Neuropixels data, and the details of each index and the experimental protocols are given in [55]. We calculated the mean of each index over all the cells that respond to these grating stimuli; the means were then used as an overall measurement of cell tuning selectivity in different brain areas, to be compared with our decoding metrics for dynamic videos.
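For reference, one common vector-sum definition of such selectivity indexes can be computed as below. This is a generic formulation and may differ in detail from the exact definitions used in the Allen pipeline.

```python
import numpy as np

def global_osi(rates, orientations_deg):
    """Vector-sum orientation selectivity: |sum R(t)*e^(2it)| / sum R(t),
    with the doubled angle because orientation is 180-degree periodic."""
    theta = np.deg2rad(orientations_deg)
    return np.abs(np.sum(rates * np.exp(2j * theta))) / np.sum(rates)

def global_dsi(rates, directions_deg):
    """Vector-sum direction selectivity, using e^(it) over 360 degrees."""
    theta = np.deg2rad(directions_deg)
    return np.abs(np.sum(rates * np.exp(1j * theta))) / np.sum(rates)

# Example: a cell responding mostly to 90-degree gratings.
rates = np.array([1.0, 8.0, 1.0, 1.0])     # mean responses per condition
print(global_osi(rates, [0, 45, 90, 135]))  # larger values = sharper tuning
```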
Two types of hierarchy indexes for the six visual cortical areas were used in this study. For fair comparison and validation across experiments, we used indexes provided by recent studies of the Allen Visual Coding—Neuropixels data [55, 56] and by a separate study [30]. The first type is anatomical: the Allen data provide an anatomical hierarchy score for a large number of brain areas [56], and we also used the hierarchy level measured in a separate recent study [30]; the two measures have been shown to be consistent with each other [30]. The second type is functional, based on neuronal responses to well-defined artificial stimuli: from the several functional indexes provided by the Allen data [55], we used the receptive field (RF) size, named the RF area, measured with Gabor stimuli [55], and we also used the RF diameter measured with drifting sinusoidal gratings in a separate study [30]. We calculated the mean RF area over all the cells in the Allen data that responded to Gabor stimuli. Because the two studies include different numbers of cells, we used the mean values of both measures for the six visual cortical areas. In total there are four hierarchy indexes, whose values are listed in S2 Table.
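One straightforward way to relate such per-area hierarchy indexes to per-area decoding metrics is a rank correlation, for example Spearman's rho; a minimal sketch with placeholder values (illustration only, not the authors' analysis code):

```python
import numpy as np
from scipy.stats import spearmanr

# Placeholder numbers for illustration only; substitute the real per-area
# mean decoding metric and a hierarchy index from S2 Table.
rng = np.random.default_rng(0)
mean_ssim = rng.random(6)   # mean SSIM per visual cortical area (6 areas)
hierarchy = rng.random(6)   # e.g., anatomical hierarchy score per area

rho, p = spearmanr(mean_ssim, hierarchy)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```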
S1 Table. The number of cells and their brain areas in response to the movie stimulus.
https://doi.org/10.1371/journal.pcbi.1012297.s001
S2 Table. Two anatomical hierarchy indexes (anatomical hierarchy score and hierarchy level) and two measures of receptive field (RF area and RF diameter), taken from previous studies.
https://doi.org/10.1371/journal.pcbi.1012297.s002
S1 Fig. Statistics of firing activity in each brain area: the distribution of inter-spike intervals and Fano factors.
https://doi.org/10.1371/journal.pcbi.1012297.s003
S2 Fig. Decoding performance is robust to variation of the model parameter settings. Decoding metrics of VISp (using all 2015 cells) are robust to changes of the filter size in the model: the size of the filters in the last few decoding layers was varied, with 7×7 as the default setting.
https://doi.org/10.1371/journal.pcbi.1012297.s004
S3 Fig. Decoding results of VISp (using all 2015 cells) and CA1 (using all 2443 cells) with different time delays. Decoding was conducted using the same neuronal responses but with the video frames shifted by different numbers of frames before the response time.
https://doi.org/10.1371/journal.pcbi.1012297.s005
We would like to thank Zhile Yang for the helpful discussions.
Unsupervised video object segmentation separates foreground objects from videos without annotations. However, existing methods rely mainly on a single modality to process motion information and perform poorly with occlusions and static objects. To address this problem, we propose a multimodal motion perception network (M2PNet) based on self-supervised training for completely unlabeled video object segmentation tasks. M2PNet adopts a dual-path encoder–decoder structure that models spatial and temporal features to capture richer spatiotemporal information. Specifically, the spatial path computes spatial contrast matrices based on co-attention mechanisms to enhance motion-related region correlations and accurately represent motion areas. The temporal path first uses residual connections and attention mechanisms to strengthen the fused representations of different modalities, and then iteratively learns motion change patterns based on slot attention to capture motion characteristics. In addition, we use residual maps to capture subtle interframe changes and to build connections between different modal features. Extensive experiments on public datasets show that our multimodal method achieves significant improvements over other methods, demonstrating the advantage of deep fusion of multisource features. Our implementation is available at https://github.com/cao3082423114/M2PNet .
Author summary: Understanding how the brain processes visual information is a crucial area of neuroscience research. One of the main challenges is studying how the brain handles dynamic natural visual scenes. Although there has been progress in studying parts of the visual pathway, we still do not fully understand how different areas of the brain work together to process these scenes.