Factor Analysis: a means for theory and instrument development in support of construct validity

Mohsen Tavakol

1 School of Medicine, Medical Education Centre, the University of Nottingham, UK

Angela Wetzel

2 School of Education, Virginia Commonwealth University, USA

Introduction

Factor analysis (FA) allows us to simplify a set of complex variables or items using statistical procedures to explore the underlying dimensions that explain the relationships between the multiple variables/items. For example, to explore inter-item relationships for a 20-item instrument, a basic analysis would produce a 20 × 20 matrix of 400 correlations (190 of them unique item pairs); it is not an easy task to keep such a matrix in our heads. FA simplifies a matrix of correlations so a researcher can more easily understand the relationship between items in a scale and the underlying factors that the items may have in common. FA is a commonly applied and widely promoted procedure for developing and refining clinical assessment instruments to produce evidence for the construct validity of the measure.
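
To make the size of the problem concrete, the following minimal Python sketch builds a correlation matrix for a 20-item instrument and counts its entries. The simulated responses, the sample size of 100 and all variable names are illustrative assumptions, not data from the article.

```python
import numpy as np

# Hypothetical data: 100 respondents answering 20 items (random noise only).
rng = np.random.default_rng(0)
n_respondents, n_items = 100, 20
responses = rng.normal(size=(n_respondents, n_items))

# Item-by-item correlation matrix (columns are items).
corr = np.corrcoef(responses, rowvar=False)

total_cells = corr.size                        # every cell of the 20 x 20 matrix
unique_pairs = n_items * (n_items - 1) // 2    # distinct item pairs below the diagonal

print(corr.shape, total_cells, unique_pairs)
```

Even after discarding the diagonal and the mirrored upper triangle, the researcher is left with 190 distinct correlations to interpret, which is the matrix FA is meant to summarise.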

In the literature, the strong association between construct validity and FA is well documented, as the method provides evidence based on test content and evidence based on internal structure, key components of construct validity. 1 From FA, evidence based on internal structure and evidence based on test content can be examined to tell us what the instrument really measures - the intended abstract concept (i.e., a factor/dimension/construct) or something else. Establishing construct validity for the interpretations from a measure is critical to high quality assessment and subsequent research using outcomes data from the measure. Therefore, FA should be a researcher’s best friend during the development and validation of a new measure or when adapting a measure to a new population. FA is also a useful companion when critiquing existing measures for application in research or assessment practice. However, despite the popularity of FA, when applied in medical education instrument development, factor analytic procedures do not always match best practice. 2 This editorial article is designed to help medical educators use FA appropriately.

The Applications of FA

The applications of FA depend on the purpose of the research. Generally speaking, the two most important types of FA are Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA).

Exploratory Factor Analysis

Exploratory Factor Analysis (EFA) is widely used in medical education research in the early phases of instrument development, specifically for measures of latent variables that cannot be assessed directly. Typically, in EFA, the researcher, through a review of the literature and engagement with content experts, selects as many instrument items as necessary to fully represent the latent construct (e.g., professionalism). Then, using EFA, the researcher examines the factor loadings, along with other criteria (e.g., previous theory, minimum average partial, 3 parallel analysis, 4 conceptual meaningfulness, etc.), to refine the measure. Suppose an instrument consisting of 30 questions yields two factors - Factor 1 and Factor 2. A good way to define a factor as a theoretical construct is to look at its factor loadings. 5 The factor loading is the correlation between the item and the factor; a factor loading of more than 0.30 usually indicates a moderate correlation between the item and the factor. Most statistical software, such as SAS, SPSS and R, provides factor loadings. Upon review of the items loading on each factor, the researcher identifies two distinct constructs, with items loading on Factor 1 all related to professionalism, and items loading on Factor 2 related, instead, to leadership. Here, EFA helps the researcher build evidence based on internal structure by retaining only those items with appropriately high loadings on Factor 1 for professionalism, the construct of interest.

It is important to note that Principal Component Analysis (PCA) is often applied and described, in error, as exploratory factor analysis. 2 , 6 PCA is appropriate if the study primarily aims to reduce the number of original items in the intended instrument to a smaller set. 7 However, if the instrument is being designed to measure a latent construct, EFA, using Maximum Likelihood (ML) or Principal Axis Factoring (PAF), is the appropriate method. 7 These exploratory procedures statistically analyze the interrelationships between the instrument items and domains to uncover the unknown underlying factorial structure (dimensions) of the construct of interest. PCA, by design, seeks to explain total variance (i.e., common, specific and error variance) in the correlation matrix.

The sum of the squared loadings across the factors for a particular item indicates the proportion of variance for that item that is explained by the factors. This is called the communality. The higher the communality value, the more the extracted factors explain the variance of the item. Further, the mean of the squared loadings on a factor specifies the proportion of variance explained by that factor. For example, assume four items of an instrument load on Factor 1, with factor loadings of 0.86, 0.75, 0.66 and 0.58, respectively. If you square the factor loading of an item, you get the proportion of the variance of that item which is explained by Factor 1. In this example, Factor 1 explains 74%, 56%, 44% and 34% of the variance of item1, item2, item3 and item4, respectively. If you sum the squared factor loadings of Factor 1, you get the eigenvalue, which is about 2.1 (2.07), and dividing the eigenvalue by the number of items (2.07/4 = 0.52) gives the proportion of variance accounted for by Factor 1, which is 52%.
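
The arithmetic in the four-item example can be checked with a few lines of Python. This is only a sketch of the calculation described in the text; the variable names are illustrative.

```python
# Factor 1 loadings for the four items in the worked example.
loadings = [0.86, 0.75, 0.66, 0.58]

# Squared loading = share of each item's variance explained by Factor 1.
squared = [loading ** 2 for loading in loadings]

# Sum of squared loadings on the factor = eigenvalue of Factor 1.
eigenvalue = sum(squared)

# Eigenvalue divided by the number of items = proportion of variance explained.
proportion = eigenvalue / len(loadings)

print([round(s, 4) for s in squared])
print(round(eigenvalue, 4), round(proportion, 4))
```

The squared loadings are 0.7396, 0.5625, 0.4356 and 0.3364, so the eigenvalue is 2.0741 and the proportion of variance is roughly 0.52.
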
Since PCA does not separate specific variance and error variance from common variance, it often inflates factor loadings and limits the potential for the factor structure to be generalized and applied with other samples in subsequent study. On the other hand, the Maximum Likelihood and Principal Axis Factoring extraction methods separate common variance from unique variance (specific and error variance), which overcomes this issue with PCA. Thus, the proportion of variance explained by an extracted factor more precisely reflects the extent to which the latent construct is measured by the instrument items. This focus on shared variance among items explained by the underlying factor, particularly during instrument development, helps the researcher understand the extent to which a measure captures the intended construct. It is useful to mention that in PAF, the initial communalities are not set to 1 but are based on the squared multiple correlation coefficient. Indeed, if you run a multiple regression to predict, say, item1 (dependent variable) from the other items (independent variables) and then look at the R-squared (R2), you will see that R2 is equal to the initial communality of item1 used by PAF.
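
The link between the squared multiple correlation and the regression R2 can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the authors' procedure: the one-factor data are simulated, the loadings and sample size are made up, and the squared multiple correlations are obtained from the standard identity 1 - 1/diag(R^-1), where R is the item correlation matrix.

```python
import numpy as np

# Hypothetical one-factor data: 500 respondents, 4 items (illustrative values).
rng = np.random.default_rng(42)
n = 500
factor = rng.normal(size=(n, 1))
true_loadings = np.array([[0.8, 0.7, 0.6, 0.5]])
items = factor @ true_loadings + 0.6 * rng.normal(size=(n, 4))

# Squared multiple correlations (initial communalities) from the inverse
# of the correlation matrix.
R = np.corrcoef(items, rowvar=False)
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# Regress standardized item1 on the other standardized items.
z = (items - items.mean(axis=0)) / items.std(axis=0)
y, X = z[:, 0], z[:, 1:]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
r_squared = 1.0 - residuals.var() / y.var()

print(round(smc[0], 6), round(r_squared, 6))
```

The two printed values agree, which is why the squared multiple correlation is a natural starting estimate of how much of an item's variance is shared with the other items.
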

Confirmatory Factor Analysis

When prior EFA studies are available for your intended instrument, Confirmatory Factor Analysis builds on those findings, allowing you to confirm or disconfirm the underlying factor structures, or dimensions, extracted in prior research. CFA is a theory- or model-driven approach that tests how well the data “fit” the proposed model or theory. CFA thus departs from EFA in that researchers must first identify a factor model before analysing the data. More fundamentally, CFA is a means for statistically testing the internal structure of instruments and relies on maximum likelihood estimation (MLE) and a different set of standards for assessing the suitability of the construct of interest. 7 , 8

Factor analysts usually use a path diagram to show the theoretical and hypothesized relationships between items and factors, creating a hypothetical model to test using the ML method. In the path diagram, circles or ovals represent factors and rectangles represent the instrument items. Lines (→ or ↔) represent relationships between variables; no line means no relationship. A single-headed arrow shows a causal relationship (the variable that the arrowhead points to is the dependent variable), and a double-headed arrow shows a covariance between variables or factors.

If CFA indicates that the primary factors, or first-order factors, produced by the prior PAF are correlated, then the second-order factors need to be modelled and estimated to get a greater understanding of the data. It should be noted that if the prior EFA applied an orthogonal rotation to the factor solution, the factors produced would be uncorrelated. Hence, the analysis of the second-order factors is not possible. Generally, in social science research, most constructs assume inter-related factors, and researchers should therefore apply an oblique rotation. The justification for analyzing the second-order factors is that when correlations between the primary factors exist, CFA can statistically model a broad picture of factors not captured by the primary factors (i.e., the first-order factors). 9 The analysis of the first-order factors is like surveying mountains with zoom-lens binoculars, while the analysis of the second-order factors uses a wide-angle lens. 10 Goodness-of-fit tests need to be conducted when evaluating the hypothetical model tested by CFA. The question is: do the new data fit the hypothetical model? However, the statistical models behind the goodness-of-fit tests are complex and extend beyond the scope of this editorial paper; thus, we strongly encourage readers to consult factor analysts for resources and advice.

Conclusions

Factor analysis methods can be incredibly useful tools for researchers attempting to establish high quality measures of constructs that cannot be observed directly. Specifically, the factor solution derived from an Exploratory Factor Analysis provides a snapshot of the statistical relationships of the key behaviors, attitudes, and dispositions of the construct of interest. This snapshot provides critical evidence for the validity of the measure based on the fit of the test content to the theoretical framework that underlies the construct. Further, the relationships between factors, which can be explored with EFA and confirmed with CFA, help researchers interpret the theoretical connections between underlying dimensions of a construct, and can even extend to relationships across constructs in a broader theoretical model. However, studies that do not apply recommended extraction, rotation, and interpretation in FA risk drawing faulty conclusions about the validity of a measure. As measures are picked up by other researchers and applied in experimental designs, or by practitioners as assessments in practice, application of measures with subpar evidence for validity produces a ripple effect across the field. It is incumbent on researchers to ensure best practices are applied, or to engage with methodologists for support and consultation where there are gaps in knowledge of methods. Further, it remains important to critically evaluate measures selected for research and practice, focusing on those that demonstrate alignment with best practice for FA and instrument development. 7 , 11

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Exploratory factor analysis: Current use, methodological developments and recommendations for good practice

  • Published: 20 May 2019
  • Volume 40, pages 3510–3521 (2021)

  • David Goretzko 1 ,
  • Trang Thien Huong Pham 1 &
  • Markus Bühner 1  

Psychological research often relies on Exploratory Factor Analysis (EFA). As the outcome of the analysis highly depends on the chosen settings, there is a strong need for guidelines in this context. Therefore, we want to examine the recent methodological developments as well as the current practice in psychological research. We reviewed ten years of studies containing EFAs and contrasted them with new methodological options. We focused on four major issues: an adequate sample size, the extraction method, the rotation method and the factor retention criterion determining the number of factors. Finally, we present modified recommendations based on these reviewed empirical studies and practical considerations.

Regularization means that an additional term is added to an objective function to solve an otherwise not solvable problem. Here instead of estimating several unique variances which can be infeasible when the sample size is too small, a so-called regularization parameter is selected that adjusts the initial estimates of the unique variances.

The anti-image can be pictured as the negative of the image of a matrix. The image covariance matrix contains the variation of each variable that can be explained by the other variables (partial covariance coefficients); the respective anti-image consists of the negatives, which can be described as the unique components. For more detail, have a look at Kaiser ( 1976 ) or detailed EFA textbooks, as the anti-image correlation matrix is a commonly used tool to evaluate whether an EFA is applicable to the data (see also the Measure of Sampling Adequacy (MSA), Kaiser 1970 ).
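
As a rough illustration of how the anti-image information feeds into a sampling adequacy check, the sketch below computes an overall Kaiser-Meyer-Olkin style index from a correlation matrix. It is a minimal sketch, not code from the article: the function name `kmo_overall` is hypothetical, and the partial correlations are derived from the inverse correlation matrix in the usual way.

```python
import numpy as np

def kmo_overall(R):
    """Overall sampling adequacy from a correlation matrix R (hypothetical helper)."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    # Partial correlation of each item pair, controlling for all other items.
    partial = -inv / np.outer(scale, scale)
    off_diagonal = ~np.eye(R.shape[0], dtype=bool)
    sum_r2 = np.sum(R[off_diagonal] ** 2)
    sum_p2 = np.sum(partial[off_diagonal] ** 2)
    # Values near 1 mean the correlations are mostly shared, not unique.
    return float(sum_r2 / (sum_r2 + sum_p2))

# Illustrative 3-item correlation matrix (made-up values).
example = [[1.0, 0.6, 0.5],
           [0.6, 1.0, 0.4],
           [0.5, 0.4, 1.0]]
print(kmo_overall(example))
```

When the partial correlations are small relative to the ordinary correlations, the index approaches 1, suggesting the items share enough variance for an EFA to be meaningful.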

The RMSE is defined as the root of the MSE, which is the averaged squared distance between parameters and their estimates. In this case, the differences between the given eigenvalues and the eigenvalues obtained from the simulated data sets of the specific k-factor population are computed.
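
The RMSE described here can be sketched in a few lines of Python. The function name `eigenvalue_rmse` and the example eigenvalues are illustrative assumptions; the sketch only shows the definition, not the full comparison-data procedure.

```python
import numpy as np

def eigenvalue_rmse(observed, simulated):
    """Root mean squared difference between two sets of eigenvalues."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    # Square the differences, average them (MSE), then take the root (RMSE).
    return float(np.sqrt(np.mean((observed - simulated) ** 2)))

# Made-up eigenvalues from an empirical data set and one simulated data set.
empirical = [3.0, 1.0, 0.6, 0.4]
from_simulation = [2.8, 1.1, 0.7, 0.4]
print(eigenvalue_rmse(empirical, from_simulation))
```

A smaller value means the simulated k-factor population reproduces the empirical eigenvalues more closely.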

They varied the number of factors (one to five), the number of response categories (two to 20), used correlated and uncorrelated solutions and sample sizes between 200 and 1000.

It only requires the number of items and the sample size, so it can be applied without knowing much about the structure of the data – for example when evaluating published results.

The so-called complexity function is the objective function which is minimized with regard to specific constraints to achieve a particular rotation of the pattern matrix. We recommend the article of Browne ( 2001 ) explaining the link between constraints and rotation criterion in more detail.

In common ML estimation, an objective function (that is derived from the log-likelihood) is minimized. Here a so-called penalty term is added to this function. It penalizes a high number of parameters (in this case loadings, especially cross-loadings). The more parameters are estimated to be non-zero, the higher this term gets and the harder it becomes to achieve a minimum, so adding this penalty yields more small (or even zero) loadings, depending on the type of penalty. You can read about penalizing the likelihood in the EFA estimation process in more detail in Jin, Moustaki and Yang-Wallentin ( 2018 ).

Problem of rotation indeterminacy (see introduction section)

The optimization process is done with respect to different constraints, but apart from that it is equivalent for all rotation methods. Therefore, theoretical considerations must be taken into account to make a reasonable decision (are cross-loadings consistent with theoretical assumptions, etc.).

Baglin, J. (2014). Improving your exploratory factor analysis for ordinal data: A demonstration using FACTOR. Practical Assessment, Research & Evaluation, 19 (5). Retrieved from http://pareonline.net/getvn.asp?v=19&n=5 . Accessed 15 Dec 2017.

Barendse, M. T., Oort, F. J., & Timmerman, M. E. (2015). Using exploratory factor analysis to determine the dimensionality of discrete responses. Structural Equation Modeling: A Multidisciplinary Journal, 22 (1), 87–101. https://doi.org/10.1080/10705511.2014.934850 .

Beauducel, A., & Herzberg, P. Y. (2006). On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Structural Equation Modeling, 13 (2), 186–203. https://doi.org/10.1207/s15328007sem1302_2 .

Beavers, A. S., Lounsbury, J. W., Richards, J. K., Huck, S. W., Skolits, G. J., & Esquivel, S. L. (2013). Practical considerations for using exploratory factor analysis in educational research. Practical Assessment, Research & Evaluation, 18 (6). Retrieved from http://pareonline.net/getvn.asp?v=18&n=6 . Accessed 15 Dec 2017.

Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110 (2), 203–219. https://doi.org/10.1037/0033-295X.110.2.203 .

Braeken, J., & Van Assen, M. A. (2017). An empirical Kaiser criterion. Psychological Methods, 22 (3), 450–466. https://doi.org/10.1037/met0000074 .

Browne, M. W. (1974). Generalized least squares estimators in the analysis of covariance structures. South African Statistical Journal, 8 (1), 1–24. Retrieved from http://hdl.handle.net/10520/AJA0038271X_175 . Accessed 28 Mar 2018.

Browne, M. W. (2001). An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36 (1), 111–150. https://doi.org/10.1207/S15327906MBR3601_05 .

Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods & Research, 21 (2), 230–258. https://doi.org/10.1177/0049124192021002005 .

Cabrera-Nguyen, P. (2010). Author guidelines for reporting scale development and validation results in the journal of the Society for Social Work and Research. Journal of the Society for Social Work and Research, 1 (2), 99–103. https://doi.org/10.5243/jsswr.2010.8 .

Conway, J. M., & Huffcutt, A. I. (2003). A review and evaluation of exploratory factor analysis practices in organizational research. Organizational Research Methods, 6 (2), 147–168. https://doi.org/10.1177/1094428103251541 .

Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10 (7), 1–9. Retrieved from http://pareonline.net/getvn.asp?v=10&n=7 . Accessed 24 Oct 2017.

Crawford, C. B., & Ferguson, G. A. (1970). A general rotation criterion and its use in orthogonal rotation. Psychometrika, 35 (3), 321–332. https://doi.org/10.1007/BF02310792 .

Cureton, E. E., & Mulaik, S. A. (1975). The weighted varimax rotation and the promax rotation. Psychometrika, 40 (2), 183–195. https://doi.org/10.1007/BF02291565 .

De Winter, J. C. F., & Dodou, D. (2012). Factor recovery by principal axis factoring and maximum likelihood factor analysis as a function of factor pattern and sample size. Journal of Applied Statistics, 39 (4), 695–710. https://doi.org/10.1080/02664763.2011.610445 .

Dinno, A. (2009). Exploring the sensitivity of Horn's parallel analysis to the distributional form of random data. Multivariate Behavioral Research, 44 (3), 362–388. https://doi.org/10.1080/00273170902938969 .

Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4 (3), 272–299. https://doi.org/10.1037/1082-989X.4.3.272 .

Ford, J. K., MacCallum, R. C., & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39 (2), 291–314. https://doi.org/10.1111/j.1744-6570.1986.tb00583.x .

Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale: Lawrence Erlbaum Associates.

Gorsuch, R. L. (1990). Common factor analysis versus component analysis: Some well and little known facts. Multivariate Behavioral Research, 25 (1), 33–39. https://doi.org/10.1207/s15327906mbr2501_3 .

Gorsuch, R. L. (1997). Exploratory factor analysis: Its role in item analysis. Journal of Personality Assessment, 68 (3), 532–560. https://doi.org/10.1207/s15327752jpa6803_5 .

Harman, H. H., & Jones, W. H. (1966). Factor analysis by minimizing residuals (minres). Psychometrika, 31 (3), 351–368. https://doi.org/10.1007/BF02289468 .

Hirose, K., & Yamamoto, M. (2014). Estimation of an oblique structure via penalized likelihood factor analysis. Computational Statistics & Data Analysis, 79 , 120–132. https://doi.org/10.1016/j.csda.2014.05.011 .

Hirose, K., & Yamamoto, M. (2015). Sparse estimation via nonconcave penalized likelihood in factor analysis model. Statistics and Computing, 25 (5), 863–875. https://doi.org/10.1007/s11222-014-9458-0 .

Hogarty, K. Y., Hines, C. V., Kromrey, J. D., Ferron, J. M., & Mumford, K. R. (2005). The quality of factor solutions in exploratory factor analysis: The influence of sample size, communality, and overdetermination. Educational and Psychological Measurement, 65 (2), 202–226. https://doi.org/10.1177/0013164404267287 .

Holgado-Tello, F. P., Chacón-Moscoso, S., Barbero-García, I., & Vila-Abad, E. (2010). Polychoric versus Pearson correlations in exploratory and confirmatory factor analysis of ordinal variables. Quality & Quantity, 44 (1), 153–166. https://doi.org/10.1007/s11135-008-9190-y .

Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30 (2), 179–185. https://doi.org/10.1007/BF02289447 .

Ihara, M., & Kano, Y. (1986). A new estimator of the uniqueness in factor analysis. Psychometrika, 51 (4), 563–566. https://doi.org/10.1007/BF02295595 .

Jin, S., Moustaki, I., & Yang-Wallentin, F. (2018). Approximated penalized maximum likelihood for exploratory factor analysis: An orthogonal case. Psychometrika, 83 (3), 628–649. https://doi.org/10.1007/s11336-018-9623-z .

Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32 (4), 443–482. https://doi.org/10.1007/BF02289658 .

Jöreskog, K. G., & Moustaki, I. (2001). Factor analysis of ordinal variables: A comparison of three approaches. Multivariate Behavioral Research, 36 (3), 347–387. https://doi.org/10.1207/S15327906347-387 .

Jöreskog, K. G., Olsson, U. H., & Yang-Wallentin, F. (2016). Multivariate analysis with LISREL . Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-33153-9 .

Jung, S., & Lee, S. (2011). Exploratory factor analysis for small samples. Behavior Research Methods, 43 (3), 701–709. https://doi.org/10.3758/s13428-011-0077-9 .

Jung, S., & Takane, Y. (2008). Regularized common factor analysis. In K. Shigemasu, A. Okada, T. Imaizumi, & T. Hoshino (Eds.), New trends in psychometrics (pp. 141–149). Tokyo: University Academic Press.

Kaiser, H. F. (1970). A second generation little jiffy. Psychometrika, 35 (4), 401–415. https://doi.org/10.1007/BF02291817 .

Kaiser, H. F. (1976). Image and anti-image covariance matrices from a correlation matrix that may be singular. Psychometrika, 41 (3), 295–300. https://doi.org/10.1007/BF02293555 .

Katsikatsou, M., Moustaki, I., Yang-Wallentin, F., & Jöreskog, K. G. (2012). Pairwise likelihood estimation for factor analysis models with ordinal data. Computational Statistics & Data Analysis, 56 (12), 4243–4258. https://doi.org/10.1016/j.csda.2012.04.010 .

Klein, O., Hardwicke, T. E., Aust, F., Breuer, J., Danielsson, H., Hofelich Mohr, A., … Frank, M. C. (2018). A practical guide for transparency in psychological science. Retrieved from https://osf.io/79epu . Accessed 28 Mar 2018

Lorenzo, U., & Ferrando, P. J. (1996). FACOM: A library for relating solutions obtained in exploratory factor analysis. Behavior Research Methods, Instruments, & Computers, 28 (4), 627–630. https://doi.org/10.3758/BF03200553 .

Lorenzo, U., & Ferrando, P. J. (1998). NFACOM: A new program for relating solutions in exploratory factor analysis. Behavior Research Methods, Instruments, & Computers, 30 (4), 724–725. https://doi.org/10.3758/BF03209493 .

Lorenzo-Seva, U. (2000). The weighted oblimin rotation. Psychometrika, 65 (3), 301–318. https://doi.org/10.1007/BF02296148 .

Lorenzo-Seva, U., Timmerman, M. E., & Kiers, H. A. (2011). The Hull method for selecting the number of common factors. Multivariate Behavioral Research, 46 (2), 340–364. https://doi.org/10.1080/00273171.2011.564527 .

MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor analysis. Psychological Methods, 4 (1), 84–99. https://doi.org/10.1037/1082-989X.4.1.84 .

Maroof, D. A. (2012). Exploratory factor analysis. In D. A. Maroof (Ed.), Statistical methods in neuropsychology: Common procedures made comprehensible (pp. 23–34). New York: Springer Science + Business Media, LLC. https://doi.org/10.1007/978-1-4614-3417-7_4 .

Marsh, H. W., Morin, A. J., Parker, P. D., & Kaur, G. (2014). Exploratory structural equation modeling: An integration of the best features of exploratory and confirmatory factor analysis. Annual Review of Clinical Psychology, 10 , 85–110. https://doi.org/10.1146/annurev-clinpsy-032813-153700 .

Mundfrom, D. J., Shaw, D. G., & Ke, T. L. (2005). Minimum sample size recommendations for conducting factor analyses. International Journal of Testing, 5 (2), 159–168. https://doi.org/10.1207/s15327574ijt0502_4 .

Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9 (4), 599–620. https://doi.org/10.1207/S15328007SEM0904_8 .

Myers, N. D., Jin, Y., Ahn, S., Celimli, S., & Zopluoglu, C. (2015). Rotation to a partially specified target matrix in exploratory factor analysis in practice. Behavior Research Methods, 47 (2), 494–505. https://doi.org/10.3758/s13428-014-0486-7 .

Peres-Neto, P. R., Jackson, D. A., & Somers, K. M. (2005). How many principal components? Stopping rules for determining the number of non-trivial axes revisited. Computational Statistics & Data Analysis, 49 (4), 974–997. https://doi.org/10.1016/j.csda.2004.06.015 .

Preacher, K. J., Zhang, G., Kim, C., & Mels, G. (2013). Choosing the optimal number of factors in exploratory factor analysis: A model selection perspective. Multivariate Behavioral Research, 48 (1), 28–56. https://doi.org/10.1080/00273171.2012.710386 .

Rhemtulla, M., Brosseau-Liard, P. É., & Savalei, V. (2012). When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychological Methods, 17 (3), 354–373. https://doi.org/10.1037/a0029315 .

Rouquette, A., & Falissard, B. (2011). Sample size requirements for the internal validation of psychiatric scales. International Journal of Methods in Psychiatric Research, 20 (4), 235–249. https://doi.org/10.1002/mpr.352 .

Ruscio, J., & Roche, B. (2012). Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure. Psychological Assessment, 24 (2), 282–292. https://doi.org/10.1037/a0025697 .

Sass, D. A., & Schmitt, T. A. (2010). A comparative investigation of rotation criteria within exploratory factor analysis. Multivariate Behavioral Research, 45 (1), 73–103. https://doi.org/10.1080/00273170903504810 .

Schmitt, T. A. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29 (4), 304–321. https://doi.org/10.1177/0734282911406653 .

Schmitt, T. A., & Sass, D. A. (2011). Rotation criteria and hypothesis testing for exploratory factor analysis: Implications for factor pattern loadings and interfactor correlations. Educational and Psychological Measurement, 71 (1), 95–113. https://doi.org/10.1177/0013164410387348 .

Schönbrodt, F. D., & Perugini, M. (2013). At what sample size do correlations stabilize? Journal of Research in Personality, 47 (5), 609–612. https://doi.org/10.1016/j.jrp.2013.05.009 .

Shrout, P. E., & Rodgers, J. L. (2018). Psychology, science, and knowledge construction: Broadening perspectives from the replication crisis. Annual Review of Psychology, 69 , 487–510. https://doi.org/10.1146/annurev-psych-122216-011845 .

Steiger, J. H. (1979). Factor indeterminacy in the 1930's and the 1970's some interesting parallels. Psychometrika, 44 (2), 157–167. https://doi.org/10.1007/BF02293967 .

Steiger, J. H., & Schönemann, P. H. (1978). A history of factor indeterminacy. In S. Shye (Ed.), Theory construction and data analysis in the behavioral sciences (pp. 136–178). San Francisco: Jossey-Bass.

Suhr, D. D. (2005). Principal component analysis vs. exploratory factor analysis (paper 203-30). Paper presented at the meeting of the SAS Users Group International (SUGI 30), Philadelphia, PA. Retrieved from http://www2.sas.com/proceedings/sugi30/203-30.pdf . Accessed 17 Oct 2017

Timmerman, M. E., & Lorenzo-Seva, U. (2011). Dimensionality assessment of ordered polytomous items with parallel analysis. Psychological Methods, 16 (2), 209–220. https://doi.org/10.1037/a0023353 .

Widaman, K. F. (2012). Exploratory factor analysis and confirmatory factor analysis. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology, Vol 3: Data analysis and research publication (pp. 361–389). Washington, DC: American Psychological Association.

Yong, A. G., & Pearce, S. (2013). A beginner’s guide to factor analysis: Focusing on exploratory factor analysis. Tutorial in Quantitative Methods for Psychology, 9 (2), 79–94. https://doi.org/10.20982/tqmp.09.2.079 .

Zwick, W. R., & Velicer, W. F. (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99 (3), 432–442. https://doi.org/10.1037/0033-2909.99.3.432 .

Author information

Authors and affiliations.

Department of Psychology, Ludwig Maximilians University Munich, Giselastr. 13, 80802, Munich, Germany

David Goretzko, Trang Thien Huong Pham & Markus Bühner

Corresponding author

Correspondence to David Goretzko .

Ethics declarations

Conflict of interest.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Goretzko, D., Pham, T.T.H. & Bühner, M. Exploratory factor analysis: Current use, methodological developments and recommendations for good practice. Curr Psychol 40 , 3510–3521 (2021). https://doi.org/10.1007/s12144-019-00300-2


Published : 20 May 2019

Issue Date : July 2021

DOI : https://doi.org/10.1007/s12144-019-00300-2


  • Exploratory factor analysis
  • Sample size
  • Retention criteria
  • Rotation methods
  • Extraction methods

One Size Doesn’t Fit All: Using Factor Analysis to Gather Validity Evidence When Using Surveys in Your Research

  • Christopher Runyon

*Address correspondence to: Eva Knekta ( E-mail Address: [email protected] ).

Department of Science and Mathematics Education, Umeå University, 901 87 Umeå, Sweden

Department of Biological Sciences, Florida International University, Miami, FL 33199


Department of Educational Psychology, University of Texas at Austin, Austin, TX 78712

National Board of Medical Examiners, Philadelphia, PA 19104

Across all sciences, the quality of measurements is important. Survey measurements are only appropriate for use when researchers have validity evidence within their particular context. Yet, this step is frequently skipped or is not reported in educational research. This article briefly reviews the aspects of validity that researchers should consider when using surveys. It then focuses on factor analysis, a statistical method that can be used to collect an important type of validity evidence. Factor analysis helps researchers explore or confirm the relationships between survey items and identify the total number of dimensions represented on the survey. The essential steps to conduct and interpret a factor analysis are described. This use of factor analysis is illustrated throughout by a validation of Diekman and colleagues’ goal endorsement instrument for use with first-year undergraduate science, technology, engineering, and mathematics students. We provide example data, annotated code, and output for analyses in R, an open-source programming language and software environment for statistical computing. For education researchers using surveys, understanding the theoretical and statistical underpinnings of survey validity is fundamental for implementing rigorous education research.

THE USE OF SURVEYS IN BIOLOGY EDUCATION RESEARCH

Surveys and achievement tests are common tools used in biology education research to measure students’ attitudes, feelings, and knowledge. In the early days of biology education research, researchers designed their own surveys (also referred to as “measurement instruments” 1 ) to obtain information about students. Generally, each question on these instruments asked about something different and did not involve extensive use of measures of validity to ensure that researchers were, in fact, measuring what they intended to measure ( Armbruster et al. , 2009 ; Rissing and Cogan, 2009 ; Eddy and Hogan, 2014 ). In recent years, researchers have begun adopting existing measurement instruments. This shift may be due to researchers’ increased recognition of the amount of work that is necessary to create and validate survey instruments (cf. Andrews et al. , 2017 ; Wachsmuth et al. , 2017 ; Wiggins et al. , 2017 ). While this shift is a methodological advancement, as a community of researchers we still have room to grow. As biology education researchers who use surveys, we need to understand both the theoretical and statistical underpinnings of validity to appropriately employ instruments within our contexts. As a community, biology education researchers need to move beyond simply adopting a “validated” instrument to establishing the validity of the scores produced by the instrument for a researcher’s intended interpretation and use. This will allow education researchers to produce more rigorous and replicable science. In this primer, we walk the reader through important validity aspects to consider and report when using surveys in their specific context.

Measuring Variables That Are Not Directly Observable

Some variables measured in education studies are directly observable. For example, the percent of international students in a class or the amount of time students spend on a specific task can both be directly observed by the researcher. Other variables that researchers may want to measure are not directly observable, such as students’ attitudes, feelings, and knowledge. The measurement of unobservable variables is what we focus on in this primer. To study these unobservable variables, researchers collect several related observable variables (responses to survey items) and use them to make inferences about the unobservable variable, termed “latent variable” or “construct” 2 in the measurement literature. For example, when assessing students’ knowledge of evolution, it is intuitive that a single item (i.e., a test question) would not be sufficient to make judgments about the entirety of students’ evolution knowledge. Instead, students’ scores from several items measuring different aspects of evolution are combined into a sum score. The measurement of attitudes and feelings (e.g., students’ goals, students’ interest in biology) is no different. For example, say a researcher wanted to understand the degree to which students embrace goals focused on improving themselves, agentic goals , as will be seen in our illustrating example in this primer. Instead of asking students one question about how important it is for them to improve themselves, an instrument was created to include a number of items that focus on slightly different aspects of improving the self. The observed responses to these survey items can then be combined to represent the construct agentic goal endorsement . To combine a number of items to represent one construct, the researcher must provide evidence that these items truly represent the same construct. 
In this paper, we provide an overview of the evidence necessary to have confidence in using a survey instrument for one’s specific purpose and go into depth for one type of statistical evidence for validity: factor analysis.

The aims of this article are 1) to briefly review the theoretical background for instrument validation and 2) to provide a step-by-step description of how to use factor analysis to gather evidence about the number and nature of constructs in an instrument. We begin with a brief theoretical background about validity and constructs to situate factor analysis in the larger context of instrument validation. Next, we discuss coefficient alpha, a statistic currently used, and often misused, in educational research as evidence for validity. The remainder of the article explores the statistical method of factor analysis. We describe what factor analysis is, when it is appropriate to use it, what we can learn from it, and the essential steps in conducting it. An evaluation of the number and nature of constructs in the Diekman et al. (2010) goal-endorsement instrument when used with first-year undergraduate science, technology, engineering, and mathematics (STEM) students is provided to illustrate the steps involved in conducting a factor analysis and how to report it in a paper (see Boxes 1 – 7 ). The illustrating example comes from a unique data collection and analysis made by the authors of this article. Data, annotated code, and output from the analyses run in R (an open-source programming language and software environment for statistical computing; R Core Team, 2016 ) for this example are included in the Supplemental Material.

WHAT IS VALIDITY?

The quality of measurements is important to all sciences. Although different terms are used in different disciplines, the underlying principles and problems are the same across disciplines. For example, in physics, the terms “accuracy” and “precision” are commonly used to describe how confident researchers should be in their measurements. In the discourse about survey quality, validity and reliability are the key concepts for measurement quality. Roughly, validity refers to whether an instrument actually measures what it is designed to measure, and reliability is the consistency of the instrument’s measurements.

In this section, we will briefly outline what validity is and the many types of validity evidence. Reliability, and its relation to validity, will be discussed in The Misuse of Coefficient Alpha . Before getting into the details, we need to emphasize a critical concept about validity that is often overlooked: validity is not a characteristic of an instrument, but rather a characteristic of the use of an instrument in a particular context. Anytime an instrument is used in a new context, at least some measures of its validity must be established for that specific context.

Validity Is Not a Property of the Instrument

Validity refers to the degree to which evidence and theory support the interpretations of the test score for the proposed use. (AERA, APA, and NCME, 2014, p. 11)

Thus, validity is not a property of the measurement instrument but rather refers to its proposed interpretation and use. Validity must be considered each time an instrument is used ( Kane, 2016 ). An instrument may be validated for a certain population and purpose, but that does not mean it will work across all populations and for all purposes. For example, a validation of Diekman’s goal-endorsement instrument (Diekman et al., 2010) as a reasonable measure of university students’ goal endorsement does not automatically validate the use of the instrument for measuring 6-year-olds’ goal endorsement. Similarly, a test validated for one purpose, such as being a reasonable measure of sixth-grade mathematical achievement, does not automatically validate it for use with other purposes, such as placement and advancement decisions ( Kane, 2016 ). The validation of a survey may also be time sensitive, as cultures continually change. Using a survey from the 1980s about the use of technology would be employing a dated view of what is meant by “technology” today.

Types of Validity Evidence

Validation is a continuous and iterative process of collecting many different types of evidence to support the claim that researchers are measuring what they aim to measure. The latest Standards for Educational and Psychological Testing describes many types of validity evidence to consider when validating an instrument for a particular purpose (AERA, APA, and NCME, 2014, chap. 1). These types of evidence and illustrative examples are summarized in Table 1. For example, one important aspect to consider is whether the individual items that make up the survey are interpreted by the respondents in the way intended by the researcher. Researchers must also consider whether the individual items constitute a good representation of the construct and whether the items collectively represent all the important aspects of that construct. Looking at our illustrative example (Box 1 and Table 2), we could ask whether items 15–23 (i.e., helping others, serving humanity, serving community, working with people, connection with others, attending to others, caring for others, intimacy, and spiritual rewards) in the goal-endorsement instrument constitute a good representation of the construct “helping others and one’s community.” Yet another type of validity evidence involves demonstrating that the scores obtained for a construct on an instrument of interest correlate with other measures of the same or closely related constructs.

Table 1. Types of validity evidence to consider when validating an instrument according to the Standards for Educational and Psychological Testing (AERA, APA, and NCME, 2014)

Type of validity evidence | Definition | Example considerations
Evidence based on test content | Analyses of the relationship between an instrument’s content and the construct it is intended to measure | Does this instrument represent the appropriate aspects of communal goals (construct) as described by the theoretical framework?
Evidence based on response processes | Information on how respondents answer the instrument’s items |
Evidence based on internal structure | Analyses of internal relationships between instrument items and instrument components and how they conform to the intended construct | Does factor analysis support the relationships between items suggested by the theoretical framework?
Evidence based on relations to other variables | Analyses of the relationships of instrument scores to variables external to the instrument and to other instruments that measure the same construct or related constructs |
Evidence based on the consequences of testing | The extent to which the consequences of the use of the score are congruent with the proposed uses of the instrument |

a Many of the example considerations are in reference to the elements in the Diekman et al. (2010) instrument; we provide these only as motivating examples and encourage readers to apply the example within their own work.

b If and how to include consequences of testing as a measure of validity is highly debated in educational and psychological measurement (see Mehrens, 1997 ; Lissitz and Samuelsen, 2007 ; Borsboom et al. , 2004 ; Cizek, 2016 ; Kane, 2016 ). We chose to present the view of validity as described in the latest Standards for Educational and Psychological Testing ( AERA, APA, and NCME, 2014 ).

BOX 1. How to describe the purpose (abbreviated), instrument, and sample for publication illustrated with the goal-endorsement example

Defining the construct and intended use of the instrument.

The aim of this study was to analyze the internal structure of the goal-endorsement instrument described by Diekman et al. (2010) for use with incoming first-year university STEM students. The objective is to use the knowledge gained through the survey to design STEM curricula that might leverage the goals students perceive as most important to increase student interest in their STEM classes.

The theoretical framework leading to the development of this survey has a long and well-established history. Bakan (1966) originally proposed that two orientations could be used to characterize the human experience: agentic (orientation to the self) and communal (orientation to others). Agentic goals can thus be seen as goals focusing on improving the self or one’s own circumstances. Communal goals are goals focusing on helping others and one’s community and being part of a community. Gender socialization theory contributed to our understanding of who holds these goals most strongly: women are socialized to desire and assume more communal roles, while men assume more agentic roles (Eagly et al., 2000; Prentice and Carranza, 2002; Su et al., 2009).

This framework and survey were first used in the context of STEM education by Diekman et al. (2010) . They found these two goal orientations to be predictive of women’s attrition from STEM, particularly when they perceive STEM careers to be at odds with the communal goals important to them. Current research in this area has expanded beyond the focus on gender differences and has recognized that all humans value communal goals to some degree and that there is also variation in importance placed on communal goals by racial and ethnic groups ( Smith et al. , 2014 ), social class (Stephens et al. , 2012), and college generation status ( Allen et al. , 2015 ). The majority of this work has been done with the general population of undergraduates. Our proposed use of the survey is to explore the variation in goals among different groups in a STEM-exclusive sample.

The instrument

The goal-endorsement survey described by Diekman et al. (2010) aims to measure how others-focused (communal) versus self-focused (agentic) students are. The instrument asks students to rate “how important each of the following kinds of goals [is] to you personally” on a scale of 1 (not at all important) to 7 (very important). The original measurement instrument has 23 items that have been reported as two factors: agentic (14 items) and communal (nine items) goals (see Table 2 for a listing of the items). The survey has been used many times in different contexts and has been shown to be predictive in ways hypothesized by theory. Diekman et al. (2010) briefly report on an EFA supporting the proposed two-factor structure of the instrument with a sample of undergraduates from introductory psychology courses.

Data collection and participants

The questionnaire was distributed in Fall 2015 and 2016 to entering first-year undergraduate students in STEM fields (biology, biochemistry, physics, chemistry, math, and computer science) at a large southern U.S. R1 university. Students took the questionnaire in the weeks before their first Fall semester. In total, 796 students (70% women) completed the questionnaire. Fifteen percent of the students were first-generation students, and 24% came from underrepresented minorities.

Sample size

In our study, the total sample size was 796 students. Considering the number of factors (two) and the relatively large number of items per factor (nine and 14), the sample size was deemed more than sufficient to perform factor analysis ( Gagne and Hancock, 2006 ; Wolf et al. , 2013 ).

Table 2. Items included in the Diekman et al. (2010) goal-endorsement instrument

Items | Three-factor solution (factors 1–3) | Four-factor solution (factors 1–4) | Five-factor solution (factors 1–5)
1  Power  0.74  0.74  0.76
2  Recognition  0.69  0.60  0.60
3  Achievement  0.44  0.69  0.68
4  Mastery  0.45  0.20  0.39  0.38
5  Self-promotion  0.56  0.59  0.21  0.56  0.21
6  Independence  0.65  0.66  0.66
7  Individualism  0.62  0.65  0.65
8  Status  0.79  0.75  0.74
9  Focus on the self  0.50  0.47  0.20  0.47
10  Success  0.23  0.39  0.65  0.65
11  Financial rewards  0.59  0.55  0.52
12  Self-direction  0.64  0.56  0.56
13  Demonstrating skills or competence  0.48  0.46  0.20  0.43
14  Competition  0.33  0.25  0.34  0.36
15  Helping others  0.86  0.86  0.82
16  Serving humanity  0.72  0.74  0.80
17  Serving community  0.77  0.76  0.83
18  Working with people  0.48  0.48  0.65
19  Connection with others  0.49  0.49  0.82
20  Attending to others  0.77  0.78  0.76  0.27
21  Caring for others  0.81  0.80  0.70  0.22
22  Intimacy  0.23  0.24  0.25  0.27  0.30
23  Spiritual rewards  0.46  0.46  0.47

a Items 1–14 originally represented the agentic scale, and items 15–23 represented the communal scale. Standardized pattern coefficients from the initial EFA for the three-, four-, and five-factor solutions are reported in columns 3–14. For clarity, pattern coefficients <0.2 are not shown.

The use of existing surveys usually allows the collection of less validity evidence than the creation and use of a new survey. Specifically, if previous studies collected validity evidence for the use of the survey for a similar purpose and with a similar population as the intended research, researchers can then reference that validity evidence and present less of their own. It is important to note that, even if a survey has a long history of established use, this alone does not provide adequate validity evidence for it being an appropriate measurement instrument. It is worth researchers’ time to go through the published uses of the survey and identify all the different types of validity evidence that have been collected. They can then identify the additional evidence they want to collect to feel confident applying the instrument for their intended interpretation and use. For a more detailed description of different types of validity evidence and a pedagogical description of the process of instrument validation, see Reeves and Marbach-Ad (2016) and Andrews et al. (2017) .

In this article, we will focus on the third type of validity evidence listed in Table 1 , evidence based on internal structure. Investigating the internal structure of an instrument is crucial in order to be confident that you can combine several related items to represent a specific construct. We will describe an empirical tool to gather information about the internal relationships between items in a measurement instrument: factor analysis. On its own, factor analysis is not sufficient to establish the validity of the use of an instrument in a researcher’s context and for their purpose. However, when factor analysis is combined with other validity evidence, it can increase a researcher’s confidence that they are invoking the theoretical frameworks used in the development of the instrument: that is, the researcher is correctly interpreting the results as representing the construct the instrument purports to measure.

INSTRUMENT SCOPE: ONE OR SEVERAL CONSTRUCTS?

As described in Measuring Variables That Are Not Directly Observable, a construct cannot be directly measured. Instead, different aspects of a construct are represented by different individual items. The foundational assumption in instrument development is that the construct is what drives respondents to answer similarly on all these items. Thus, it is reasonable to distill the responses on all these items into one single score and make inferences about the construct. Measurement instruments can be used to measure a single construct, several distinct constructs, or even make finer distinctions within a construct. The number of intended constructs or aspects of a construct to be measured is referred to as an instrument’s dimensionality.

Unidimensional Scales

An instrument that aims to measure one underlying construct is a unidimensional scale. To interpret a set of items as if they measure the same construct, one must have both theoretical and empirical evidence that the items function as intended, that is, that they do indeed represent a single construct. If a researcher takes a single value (such as the mean) to represent a set of responses to a group of items that are unrelated to one another theoretically (e.g., I like biology, I enjoy doing dissection, I know how to write a biology lab report), the resulting value would be difficult to interpret at best, if not meaningless. While all of these items are related to biology, they do not represent a specific, common construct. Taking the mean response from these three items as a measure of interest in biology would be highly problematic. For example, one could be interested in biology but dislike dissection, and one’s laboratory writing skills are likely influenced by aspects other than interest in biology. Even when a set of items theoretically seems to measure the same construct, the researcher must empirically show that students demonstrate a coherent response pattern over the set of items to validate their use to measure the construct. If students do not demonstrate a coherent response pattern, this indicates that the items are not functioning as intended and they may not all measure the same construct. Thus, the single value used to represent the construct from that group of items would contain very little information about the intended construct.

Multidimensional Scales

An instrument that is constructed to measure several related constructs or several different aspects of a construct is called a multidimensional scale. For example, the Diekman et al. (2010) goal-endorsement instrument (see items in Box 1 and Table 2) we use in this article is a multidimensional scale: it theoretically aims to measure two different aspects of student goal endorsement. To be able to separate the results into two subscales, one must test that the items measure distinctly different constructs. It is important to note that whether a set of items represents different constructs can differ depending on the intended populations, which is why collecting evidence on the researcher’s own population is so critical. Wigfield and Eccles (1992) illustrate this concept in a study of children of different ages. Children in early or middle elementary school did not seem to distinguish between their perceptions of interest, importance, and usefulness of mathematics, while older children did appear to differentiate between these constructs. Thus, while it is meaningful to discuss interest, importance, and usefulness as distinct constructs for older children, it is not meaningful to do so with younger children.

In summary, before using a survey, one has to have gathered all the appropriate validity evidence for the proposed interpretations and use. When measuring a construct, one important step in this validation procedure is to explicitly describe and empirically analyze the assumed dimensionality of the survey.

THE MISUSE OF COEFFICIENT ALPHA: UNDERSTANDING THE DIFFERENCE BETWEEN RELIABILITY AND VALIDITY

In many biology educational research papers, researchers only provide coefficient alpha (also called Cronbach’s alpha) as evidence of validity. For example, in Eddy et al. (2015) , the researchers describe the alpha of two scales on a survey and no other evidence of validity or dimensionality. This usage is widely agreed to be a misuse of coefficient alpha ( Green and Yang, 2009 ). To understand why this is the case, we have to understand how validity and reliability differ and what specifically coefficient alpha measures.

Reliability is about consistency when a testing procedure is repeated (AERA, APA, and NCME, 2014). For example, assuming that students do not change their goal endorsement, do repeated measurements of students’ goal endorsement using Diekman’s goal-endorsement instrument give consistent results? Theoretically, reliability can be defined as the ratio between the true variance in the construct among the participating respondents (the latent, unobserved variance the researcher aims to interpret) and the observed variance as measured by the measurement instrument (Crocker and Algina, 2008). The observed variance for an item is a combination of the true variance and measurement error. Measurement error is the extent to which responses are affected by factors other than the construct of interest (Fowler, 2014). For example, ideally, students’ responses to Diekman’s goal-endorsement instrument would only be affected by their actual goal endorsement. But students’ responses may also be affected by things unrelated to the construct of goal endorsement. For instance, responses on communal goals items may be influenced by social desirability, that is, students’ desire to answer in a way that they think others would want them to. Students’ responses on items may also depend on external circumstances while they were completing the survey, such as the time of day or the noise level in their environment. While it is impossible to avoid measurement error completely, minimizing measurement error increases the ratio between the true and the observed variance, which increases the likelihood that the instrument will yield similar results over repeated use.
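The ratio described above can be written compactly. The notation here is ours rather than the article's: \sigma^2_T stands for the true variance, \sigma^2_E for the variance due to measurement error, and \sigma^2_X for the observed variance, under the usual classical test theory assumption that error is uncorrelated with the true score.

```latex
\[
\text{reliability} \;=\; \frac{\sigma^2_T}{\sigma^2_X}
\;=\; \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
\]
```

Read this way, reducing \sigma^2_E while \sigma^2_T stays fixed moves the ratio toward 1, which is the sense in which minimizing measurement error increases reliability.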

Unfortunately, a construct cannot, by definition, be directly measured; the true score variance is unknown. Thus, reliability itself cannot be directly measured and must be estimated. One way to estimate reliability is to distribute the instrument to the same group of students multiple times and analyze how similar the responses of the same students are over time. Often it is not desirable or practically feasible to distribute the same instrument multiple times. Coefficient alpha provides a means to estimate reliability for an instrument based on a single distribution. 3 Simply put, coefficient alpha is the correlation of an instrument to itself ( Tavakol and Dennick, 2011 ). Calculation of coefficient alpha is based on the assumption that all items in a scale measure the same construct. If the average correlation among items on a scale is high, then the scale is said to be reliable.
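As a rough illustration of how coefficient alpha is obtained from a single administration, the sketch below uses the standard variance-based formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the sum score). The function name, the data layout (one row per respondent), and the five made-up response rows are assumptions for illustration only; they are not taken from the article's Supplemental Material.

```python
from statistics import variance

def coefficient_alpha(responses):
    """Estimate coefficient alpha for one scale.

    responses: list of rows, one per respondent; each row holds that
    respondent's scores on the k items of the scale (hypothetical layout).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the respondents' sum scores.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Made-up responses from five students to three 7-point items.
demo = [
    [7, 6, 7],
    [5, 5, 6],
    [3, 4, 3],
    [6, 6, 5],
    [2, 3, 2],
]
alpha = coefficient_alpha(demo)
print(round(alpha, 3))
```

Note that the function computes a value for any set of items it is given. Nothing in the calculation checks whether the items actually measure one construct, which is the assumption the formula rests on.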

The use and misuse of coefficient alpha as an estimate of reliability has been extensively discussed by researchers (e.g., Green and Yang, 2009 ; Sijtsma, 2009 ; Raykov and Marcoulides, 2017 ; McNeish, 2018 ). It is outside the scope of this article to fully explain and take a stand among these arguments. Although coefficient alpha may be a good estimator of reliability under certain circumstances, it has limitations. We will further elaborate on two limitations that are most pertinent within the context of instrument validation.

Limitation 1: Coefficient Alpha Is about Reliability, Not Validity

A high coefficient alpha does not prove that researchers are measuring what they intended to measure, only that they measured the same thing consistently. In other words, coefficient alpha is an estimation of reliability. Reliability and validity complement each other: for valid interpretations to be made using an instrument, the reliability of that instrument must be high. However, if the test is invalid, then reliability does not matter. Thus, high reliability is necessary, but not sufficient, to make valid interpretations from scores resulting from instrument administration. Consider this analogy using observable phenomena: a calibrated scale might produce consistent values for the weight of a student and thus the measure is reliable, but using this score to make interpretations about the students’ height would be completely invalid. Similarly, a survey’s coefficient alpha could be high, but the survey instrument could still not be measuring what the researcher intended it to measure.

Limitation 2: Coefficient Alpha Does Not Provide Evidence of Dimensionality of the Scale

Coefficient alpha does not provide evidence for whether the instrument measures one or several underlying constructs (Schmitt, 1996; Sijtsma, 2009; Yang and Green, 2011). Schmitt (1996) provides two illustrative examples of why a high coefficient alpha should not be taken as proof of a unidimensional instrument. He shows that a six-item instrument in which all items have equal correlations to one another (a unidimensional instrument) could yield the same coefficient alpha as a six-item instrument with item correlations clearly showing a two-dimensional pattern (i.e., an instrument with item correlations of 0.5 across all items has the same coefficient alpha as an instrument with item correlations of 0.8 between some items and item correlations of 0.3 between other items). Thus, as Yang and Green (2011) conclude, “A scale can be unidimensional and have a low or a high coefficient alpha; a scale can be multidimensional and have a low or a high coefficient alpha” (p. 380).
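The six-item comparison above can be checked numerically with the standardized form of alpha, k × r̄ / (1 + (k − 1) × r̄), where r̄ is the mean inter-item correlation. The text gives only the correlation values (0.5, 0.8, and 0.3), so the split of the two-dimensional instrument into two clusters of three items is our assumption, and the helper names are hypothetical.

```python
from itertools import combinations

def standardized_alpha(corr):
    """Standardized coefficient alpha from an inter-item correlation matrix."""
    k = len(corr)
    pairs = list(combinations(range(k), 2))
    mean_r = sum(corr[i][j] for i, j in pairs) / len(pairs)
    return k * mean_r / (1 + (k - 1) * mean_r)

def make_corr(groups, within, between):
    """Correlation matrix: items in the same group correlate `within`,
    items in different groups correlate `between`."""
    k = len(groups)
    return [
        [1.0 if i == j else (within if groups[i] == groups[j] else between)
         for j in range(k)]
        for i in range(k)
    ]

# Unidimensional pattern: every pair of items correlates 0.5.
one_dim = make_corr([0] * 6, within=0.5, between=0.5)
# Two-dimensional pattern (assumed split): two clusters of three items,
# 0.8 within a cluster and 0.3 between clusters.
two_dim = make_corr([0, 0, 0, 1, 1, 1], within=0.8, between=0.3)

alpha_one = standardized_alpha(one_dim)
alpha_two = standardized_alpha(two_dim)
print(round(alpha_one, 3), round(alpha_two, 3))
```

Under this assumed split, 6 of the 15 item pairs correlate 0.8 and 9 correlate 0.3, so the mean correlation is again 0.5 and the two alphas coincide even though the correlation patterns differ.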

In conclusion, reporting only coefficient alpha is not sufficient evidence 1) to make valid interpretations of the scores from an instrument or 2) to prove that a set of items measure only one underlying construct (unidimensionality). We encourage readers interested in learning more about reliability to read chapters 7–9 in Bandalos (2018) . In the following section, we describe another statistical tool, factor analysis, which actually tests the dimensionality among a set of items.

FACTOR ANALYSIS: EVIDENCE OF DIMENSIONALITY AMONG A SET OF ITEMS

Factor analysis is a statistical technique that analyzes the relationships between a set of survey items to determine whether participants’ responses on different subsets of items relate more closely to one another than to other subsets; that is, it is an analysis of the dimensionality among the items (Raykov and Marcoulides, 2008; Leandre et al., 2012; Tabachnick and Fidell, 2013; Kline, 2016; Bandalos, 2018). This technique was explicitly developed to better elucidate the dimensionality underpinning sets of achievement test items (Mulaik, 1987). Speaking in terms of constructs, factor analysis can be used to analyze whether it is likely that a certain set of items together measure a predefined construct (collecting validity evidence relating to internal structure; Table 1). Factor analysis can broadly be divided into exploratory factor analysis (EFA) and confirmatory factor analysis (CFA).

Exploratory Factor Analysis

EFA can be used to explore patterns underlying a data set. As such, EFA can elucidate how different items and constructs relate to one another and help develop new theories. EFA is suitable during early stages of instrument development. By using EFA, the researcher can identify items that do not empirically belong to the intended construct and that should be removed from the survey. Further, EFA can be used to explore the dimensionality of the instrument. Sometimes EFA is conflated with principal component analysis (PCA; Leandre et al. , 2012). PCA and EFA differ from each other in several fundamental ways. EFA is a statistical technique that should be used to identify plausible underlying constructs for a set of items. In EFA, the variance the items share is assumed to represent the construct and the nonshared variance is assumed to represent measurement errors. PCA is a data reduction technique that does not assume an underlying construct. PCA reduces a number of observed variables to a smaller number of components that account for the most variance in the observed variables. In PCA, all variance is considered, that is, it assumes no measurement errors. Within educational research, PCA may be useful when measuring multiple observable variables, for example, when creating an index from a checklist of different behaviors. For readers interested in reading more about the distinction between EFA and PCA and why EFA is the most suitable for exploring constructs, see Leandre et al. (2012) or Raykov and Marcoulides (2008) .

Confirmatory Factor Analysis

CFA is used to confirm a previously stated theoretical model. Essentially, when using CFA, the researcher is testing whether the data collected supports a hypothesized model. CFA is suitable when the theoretical constructs are well understood and clearly articulated and the validity evidence on the internal structure of the scale (the relationship between the items) has already been obtained in similar contexts. The researcher can then specify the relationship between the item and the construct and use CFA to confirm the hypothesized number of constructs, the relationship between the constructs, and the relationship between the constructs and the items. CFA may be appropriate when a researcher is using a preexisting survey that has an established structure with a similar population of respondents.

A Brief Technical Description of Factor Analysis

Mathematically, factor analysis involves the analysis of the variances and covariances among the items. The shared variance among items is assumed to represent the construct. In factor analysis, the constructs (the shared variances) are commonly referred to as factors. Nonshared variance is considered error variance. During an EFA, the covariances among all items are analyzed together, and items sharing a substantial amount of variance are collapsed into a factor. During a CFA, the shared variance among items that are prespecified to measure the same underlying construct is extracted. Figure 1 illustrates EFA and CFA on an instrument consisting of eight observable variables (items) aiming to measure two constructs (factors): F1 and F2. In EFA, no a priori assumption of which items represent which factors is necessary: the EFA determines these relationships. In CFA, the shared variance of items 1–4 is specified by the researcher to represent F1, and the shared variance of items 5–8 is specified to represent F2. Even further, part of what CFA tests is that items 1–4 do not represent F2, and items 5–8 do not represent F1. For both EFA and CFA, nonshared variance is considered error variance.

FIGURE 1. Conceptual illustration of EFA and CFA. Observed variables (items 1–8) are represented by squares, and constructs (factors F1 and F2) are represented by ovals. Factor loadings/pattern coefficients representing the effect of the factor on the item (i.e., the unique correlation between the factor and the item) are represented by arrows. σj, variance for factor j; Ei, unique error variance for item i. The factor loading for one item on each factor is set to 1 to give the factors an interpretable scale.

Figures illustrating the relationships between items and factors (such as Figure 1) are interpreted as follows. The double-headed arrow between the factors represents the correlation between the two factors (factor correlations). Each one-directional arrow between the factors and the items represents the unique correlation between the factor and the item (called “pattern coefficient” in EFA and “factor loading” in CFA). The pattern coefficients and factor loadings are similar to regression coefficients in a multiple regression. For example, consider the self-promotion item on Diekman’s goal-endorsement instrument. The factor loading/pattern coefficient for this item tells the researcher how much of the average respondent’s answer on this item is due to his or her general interest in agentic goals versus something unique about that item (error variance). For readers interested in more mathematical details about factor analysis, we recommend Kline (2016), Tabachnick and Fidell (2013), or Yong and Pearce (2013).

Should EFA and CFA Be Applied on the Same Sample?

If a researcher decides that EFA is the best approach for analyzing the data, the results from the EFA should ideally be confirmed with a CFA before using the measurement instrument for research. This confirmation should never be conducted on the same sample as the initial EFA. Doing so does not provide generalizable information, as the CFA will be (essentially) repeating many of the relationships that were established through the EFA. Additionally, there could be something nuanced about the way the particular sample responds to items that might not be found in a second sample. For these reasons (among others), it is best practice to perform an EFA and CFA on independent samples. If a researcher has a large enough sample size, this can be done by randomly dividing the initial sample into two independent groups. It is also not uncommon for a researcher using an existing survey to decide that a CFA is suitable to start with but then discover that the data do not fit the theoretical model specified. In this case, it is completely justified and recommended to conduct a second round of analyses starting with an EFA on half of the initial sample followed by a CFA on the other half of the sample (Bandalos and Finney, 2010).
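
As a minimal sketch of the random split described above, the following divides a list of respondent identifiers into two non-overlapping halves. The function name, the seed, and the use of consecutive integers as identifiers are assumptions for illustration; the sample size of 796 matches the goal-endorsement example.

```python
import random


def split_sample(respondent_ids, seed=2019):
    """Randomly divide respondents into two independent halves (EFA half, CFA half)."""
    ids = list(respondent_ids)
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical identifiers 1..796, one per respondent.
efa_ids, cfa_ids = split_sample(range(1, 797))
print(len(efa_ids), len(cfa_ids))
```

Fixing and reporting the seed lets readers reproduce the same two subsamples; no respondent appears in both halves.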

FACTOR ANALYSIS STEP BY STEP

In this section, we 1) describe the important considerations when preparing to perform a factor analysis, 2) introduce the essential analytical decisions made during an analysis, and 3) discuss how to interpret the outputs from factor analyses. We illustrate each step with real data using factor analysis to analyze the dimensionality of a goal-endorsement instrument ( Diekman et al. , 2010 ). Further, annotated code and output for running and analyzing EFA and CFA in R are provided as Supplemental Material (R syntax and Section 2) along with sample data.

Before delving into the technical details, we would like to be clear that conducting a factor analysis involves many decisions. There are no golden rules to follow to make these decisions. Instead, the researcher must make holistic judgments based on his or her specific context and available theoretical and empirical information. Factor analysis requires collecting evidence to build an argument to support a suggested instrument structure. The more time a researcher spends with the data investigating the effect of different possible decisions, the more confident they will be in finalizing the survey structure. As always, it is critical that a researcher’s decisions are guided by previously collected evidence and empirical information and not by a priori assumptions that the researcher wishes to support.

Defining the Construct and Intended Use of the Instrument

An essential prerequisite when selecting (or developing) and analyzing an instrument is to explicitly define the intended purpose and use of the instrument. Further, the theoretical construct or constructs that one aims to measure should be clearly defined, and the current general understanding of the construct should be described. The next step is to confirm a good alignment between the construct of interest and the instrument selected to measure it, that is, that the items on the instrument actually represent what one aims to measure (evidence based on content; Table 1 ). For a researcher to be able to use CFA for validation, an instrument must include at least four items in total. A multidimensional scale should have at least three but preferably five or more items for each theorized subscale. In very special cases, two items can be acceptable for a subscale ( Yong and Pearce, 2013 ; Kline, 2016 ). 4 For an abbreviated example of how to write up this type of validity for a manuscript using a survey instrument, see Box 1 .

Sample Size

The appropriate sample size needed for factor analysis is a multifaceted question. Larger sample sizes are generally better, as they will enhance the accuracy of all estimates and increase statistical power ( Gagne and Hancock, 2006 ). Early guidelines on sample sizes for factor analysis were general in their nature, such as a minimum sample size of 100 or 200 (e.g., see Boomsma, 1982 ; Gorsuch, 1983 ; Comrey and Lee, 1992 ). Although it is very tempting to adopt such general guidelines, caution must be taken, as they might lead to underestimating or overestimating the sample size needed ( Worthington and Whittaker, 2006 ; Tabachnick and Fidell, 2013 ; Wolf et al. , 2013 ).

The sample size needed depends on many elements, including number of factors, number of items per factor, size of factor loadings or pattern coefficients, correlations between factors, missing values in the data, reliability of the measurements, and the expected effect size of the parameters of interest (Gagne and Hancock, 2006; Worthington and Whittaker, 2006; Wolf et al., 2013). Wolf et al. (2013) showed that a sufficient sample size for a one-factor CFA with eight items and factor loadings of 0.8 could be as low as 30 respondents. For a two-factor CFA with three or four items per scale and factor loadings of 0.5, a sample size of ∼450 respondents is needed. For EFA, Leandre et al. (2012) recommend that under “moderately good” conditions (communalities 5 of 0.40–0.70 and at least three items for each factor), a sample of at least 200 should be sufficient, while under poor conditions (communalities lower than 0.40 and some factors with only two items), a sample size of at least 400 is needed. Thus, when deciding on an appropriate sample size, one should consider the unique properties of the actual survey. The articles written by Wolf et al. (2013) and Gagne and Hancock (2006) provide a good starting point for such considerations. See Box 1 for an example of how to discuss sample size decisions in a manuscript.

In some cases, it may be implausible to have the large sample sizes necessary to obtain stable estimates from an EFA or a CFA. Often researchers must work with data that have already been collected or are using a study design that simply does not include a large number of respondents. In these circumstances, it is strongly recommended that one use a measurement instrument that has already been validated for use in a similar population for a similar purpose. In addition to considering and analyzing other relevant types of validity evidence (see Table 1 ), the researchers should report on validity evidence based on internal structure from other studies and describe the context of those studies relative to their own context. The researchers should also acknowledge in the methods and limitation sections that they could not run dimensionality checks on their sample. Further, researchers can also analyze a correlation matrix 6 of the responses to the survey items from their own data collection to get a sense of how the items may relate to one another in their context. This correlation matrix may be reported to help provide preliminary validity evidence based on internal structure.

Properties of the Data

As with any statistical analysis, before performing a factor analysis the researcher must investigate whether the data meet the assumptions for the proposed analysis. Section 1 of the Supplemental Material provides a summary of what a researcher should check for in the data for the purposes of meeting the assumptions of a factor analysis and an illustration applied to the example data. These include analyses of missing values, outliers, factorability, normality, linearity, and multicollinearity. Box 3 provides an example of how to report these analyses in a manuscript.

Analytic Considerations for CFA

Once the data are screened to determine their properties, several analytical decisions must be made. Because there are some differences in analytical decisions and outputs for EFA and CFA, we will discuss EFA and CFA in separate sections. We will start with CFA, as most researchers adopting an existing instrument will use this method first and may not ever need to perform an EFA. See Box 2 for how to report analytical considerations for a CFA in a manuscript.

BOX 2. What to report in the methods of a publication for a CFA using the goal-endorsement example

We chose to start with a CFA to confirm a two-factor solution, because 1) the theoretical framework underlying the instrument is well understood and articulated and 2) Diekman et al. (2010) performed an EFA on a similar population to ours that supported the two-factor solution. If the assumed factor model was confirmed, then we could confidently combine the items into two sum scores and interpret the data as representing both an agentic and a communal factor. CFA was run using the R package lavaan ( Rosseel, 2012 ).

Selecting an estimator

In consideration of the ordinal and nonnormal nature of the data, the robust maximum-likelihood estimation (MLR) was used to extract the variances from the data. Full-information maximum likelihood in the estimation procedure was used to handle the missing data.

Specifying a two-factor CFA

To confirm the factor structure proposed by Diekman et al. (2010) , we specified a two-factor CFA, with items 1–14 representing the agentic scale and items 15–23 representing the communal factor ( Table 2 ). Correlation between the two factors was allowed. For identification purposes, the factor loading for one item on each factor was set to 1. The number of variances and covariances in the data was 276 (23(23 + 1)/2), which was larger than the number of parameter estimates (one factor correlation, 23 error terms, 21 factor loadings, and variances for each factor). Thus, the model was overidentified.

Selecting model fit indices and setting cutoff values

Multiple fit indices (chi-square value from MLR [MLR χ²]; comparative fit index [CFI]; the root-mean-square error of approximation [RMSEA]; and the standardized root-mean-square residual [SRMR]) were consulted to evaluate model fit. The fit indices were chosen to represent an absolute, a parsimony-adjusted, and an incremental fit index. Consistent with the recommendations by Hu and Bentler (1999), the following criteria were used to evaluate the adequacy of the models: CFI > 0.95, SRMR < 0.08, and RMSEA < 0.06. Coefficient alpha was computed based on the model results and used to assess reliability. Values > 0.70 were considered acceptable.

Selecting an Estimator.

When performing a CFA, a researcher must choose a statistical method for extracting the variance from the data. There are several different methods available, including unweighted least squares, generalized least squares, maximum likelihood, robust maximum likelihood, principal axis factoring, alpha factoring, and image factoring. Each of these methods has its strengths and weaknesses. Kline (2016) and Tabachnick and Fidell (2013) provide a useful discussion of several of these methods and when best to apply each one. In general, because data from surveys are often on an ordinal level (e.g., data from Likert scales) and sometimes slightly nonnormally distributed, estimators robust against nonnormality, such as maximum-likelihood estimation with robust standard errors (MLR) or weighted least-squares estimation (WLS), are often suitable for performing CFA. Whether MLR or WLS is more suitable depends partly on the number of response options for the survey items. MLR works best when data can be considered continuous. In most cases, scales with seven response options work well for this purpose, whereas scales with five response options are questionably continuous. MLR is still often used in estimation for five response options, but with four or fewer response options, WLS is better (Finney and DiStefano, 2006). The decision regarding the number of response options to include in a survey should not be driven by these considerations. Rather, the number of response options and properties of the data should drive the selection of the CFA estimator. Although more response options for an item allow researchers to model it as continuous, respondents may not be able to meaningfully differentiate between the different response options. Fewer response options usually offer less ambiguity but usually result in less variation in the responses. For example, if students are provided with 10 options to indicate their level of agreement with a given item, it is possible that not all of the response options will be used. In such a case, fewer response options may better capture the latent distribution of possible responses to an item.

Specifying the Model.

The purpose of a CFA is to test whether the data collected with an instrument support the hypothesized model. Using theory and previous validations of the instrument, the researcher specifies how the different items and factors relate to one another (see Figure 1 for an example model). For a CFA, the number of parameters that the researcher aims to estimate (e.g., error terms, variances, correlations, and factor loadings) must be less than or equal to the number of possible variances and covariances among the items (Kline, 2016). For a CFA, a simple equation tells you the number of possible variances and covariances: p(p + 1)/2, where p = number of items. If the number of parameters to estimate is more than the number of possible variances and covariances among the items, the CFA is called “underidentified” and will not provide interpretable results. When the number of parameters to be estimated equals the number of covariances and variances among the items, the model is deemed “just identified” and will result in perfect fit of the data to the model, regardless of the true relationship between the items. To test whether the data fit the theoretical model, the number of parameters that are being estimated needs to be less than the number of variances and covariances observed in the data. In this case, the model is “overidentified.” For the example CFA in Figure 1, the number of possible variances and covariances is 8(8 + 1)/2 = 36, and the number of parameters to estimate is 17 (one factor correlation, eight error terms, six factor loadings, and variances for each of the two factors 7 ), thus the model is overidentified.
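
The counting rule can be written out as a small sketch. The function names are hypothetical, and the parameter count assumes the simple structure used in the examples here: each item loads on exactly one factor, all factors are allowed to correlate, and one loading per factor is fixed to 1. The rule is a necessary condition for identification, not a guarantee.

```python
def simple_structure_parameters(n_items, n_factors):
    """Count free parameters for a CFA in which each item loads on one factor,
    all factors correlate, and one loading per factor is fixed to 1."""
    error_variances = n_items
    free_loadings = n_items - n_factors
    factor_variances = n_factors
    factor_correlations = n_factors * (n_factors - 1) // 2
    return error_variances + free_loadings + factor_variances + factor_correlations


def cfa_identification(n_items, n_free_parameters):
    """Compare observed variances/covariances, p(p + 1)/2, with free parameters."""
    n_moments = n_items * (n_items + 1) // 2
    df = n_moments - n_free_parameters
    if df < 0:
        status = "underidentified"
    elif df == 0:
        status = "just identified"
    else:
        status = "overidentified"
    return n_moments, df, status


# Figure 1 example: 8 items, 2 factors.
print(cfa_identification(8, simple_structure_parameters(8, 2)))
# Goal-endorsement example: 23 items, 2 factors.
print(cfa_identification(23, simple_structure_parameters(23, 2)))
```

For the 23-item goal-endorsement model, this count gives 276 − 47 = 229 degrees of freedom, which agrees with the df reported for the initial CFA in Box 3.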

Choosing Appropriate Model Fit Indices.

The true splendor of CFA is that so-called model fit indices have been developed to help researchers understand whether the data support the hypothesized theoretical model. 8 The closest statistic to an omnibus test of model fit is the model chi-square test. The null hypothesis for the chi-square test is that there is no difference between the hypothesized model and the observed relationships within the data. Several researchers argue that this is an unrealistic hypothesis ( Hu and Bentler, 1999 ; Tabachnick and Fidell, 2013 ). A close approximation of the data to the model is more realistic than a perfect model fit. Further, the model chi-square test is very sensitive to sample size (the chi-square statistic tends to increase with an increase in sample size, all other considerations constant; Kline, 2016). Thus, while large sample sizes provide good statistical power, the null hypothesis that the factor model and the data do not differ from each other may be rejected although the difference is actually quite small. Given these concerns, it is important to consider the result of the chi-square test in conjunction with multiple other model fit indices.
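
The sample-size sensitivity can be illustrated with a rough sketch. It assumes the common textbook form of the test statistic, χ² = (N − 1) × F, where F is the minimized discrepancy between the model and the data (some software multiplies by N instead). The discrepancy value of 0.05 is hypothetical, and 30.144 is the tabled critical value for 19 degrees of freedom at α = 0.05.

```python
CRITICAL_VALUE_DF19 = 30.144  # chi-square critical value, df = 19, alpha = 0.05


def model_chi_square(min_discrepancy, n_respondents):
    """Model chi-square under the (N - 1) * F convention (an assumption here)."""
    return (n_respondents - 1) * min_discrepancy


small_misfit = 0.05  # hypothetical, identical in both samples
chi_square_n100 = model_chi_square(small_misfit, 100)
chi_square_n800 = model_chi_square(small_misfit, 800)

for n, value in ((100, chi_square_n100), (800, chi_square_n800)):
    rejected = value > CRITICAL_VALUE_DF19
    print(n, round(value, 2), "rejected" if rejected else "not rejected")
```

With the same small discrepancy, the model is retained at N = 100 but rejected at N = 800, which is why the chi-square result is read alongside other fit indices.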

Many model fit indices have been developed that quantify the degree of fit between the model and the data. That is, the values provided by these indices are not intended to make binary (fit vs. no fit) judgments about model fit. These model fit indices can be divided into absolute, parsimony-adjusted, and incremental fit indices (Bandalos and Finney, 2010). Because each type of index has its strengths and weaknesses (e.g., sensitivity to sample size, model complexity, or misspecified factor correlations), using at least two different types of fit indices is recommended (Hu and Bentler, 1999; Tabachnick and Fidell, 2013). The researcher should decide a priori which model fit indices to use and the cutoff values that will be considered a good enough indicator of model fit to the data. Hu and Bentler (1999) recommend using one of the relative fit indices, such as the comparative fit index (CFI) with a cutoff of >0.95, in combination with the standardized root-mean-square residual (SRMR; absolute fit index, good model < 0.08) or the root-mean-square error of approximation (RMSEA; parsimony-adjusted fit index, good model < 0.06) as indicators of good fit. Some researchers, including Hu and Bentler (1999), caution against using these cutoff values as golden rules because it might lead to incorrect rejection of acceptable models (Marsh et al., 2004; Perry et al., 2015).
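
As a screening aid only, and not as a substitute for the holistic judgment described above, the a priori cutoffs can be recorded and compared against the estimated indices. The sketch below uses the Hu and Bentler (1999) values quoted in the text and the fit indices reported for the initial CFA in Box 3; the dictionary layout and function name are hypothetical.

```python
# Cutoffs decided a priori: (direction of "good", cutoff value).
CUTOFFS = {
    "CFI": (">", 0.95),
    "SRMR": ("<", 0.08),
    "RMSEA": ("<", 0.06),
}


def evaluate_fit(indices):
    """Return, for each index, whether its value meets the a priori cutoff."""
    results = {}
    for name, value in indices.items():
        direction, cutoff = CUTOFFS[name]
        results[name] = value > cutoff if direction == ">" else value < cutoff
    return results


# Values reported for the initial two-factor CFA (Box 3).
initial_cfa = {"CFI": 0.818, "SRMR": 0.079, "RMSEA": 0.084}
verdict = evaluate_fit(initial_cfa)
print(verdict)
```

Only SRMR meets its cutoff here, which mirrors the mixed picture described for the initial CFA; the pattern across indices, not any single comparison, informs the conclusion.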

Interpreting the Outputs from CFA

After making all the suggested analytical decisions, a researcher is now ready to apply a CFA to the data. Model fit indices that the researcher a priori decided to use are the first element of the output that should be interpreted from a CFA. If these indices suggest that the data do not fit the specified model, then the researcher does not have empirical support for using the hypothesized survey structure. This is exactly what happened when we initially ran a CFA on Diekman’s goal-endorsement instrument example (see Box 3 ). In this case, focus should shift to understanding the source of the model misfit. For example, one should ask whether there are any items that do not seem to correlate with their specified latent factor, whether any correlations seem to be missing, or whether some items on a factor group together more strongly than other items on that same factor. These questions can be answered by analyzing factor loadings, correlation residuals, and modification indices. In the following sections, we describe these in more detail. See Boxes 3 , 6 , and 7 for examples of how to discuss and present output from a CFA in a paper.

BOX 3. How to interpret and report CFA output for publication using the goal-endorsement example, initial CFA

Descriptive statistics.

No items were missing more than 1.3% of their values, and this missingness was random (Little’s MCAR test: chi-square = 677.719, df = 625, p = 0.075, implemented with the BaylorEdPsych package; Beaujean, 2012). Mean values for the items ranged from 4.1 to 6.3. Most items had a skewness and kurtosis below |1.0|, and all items had a skewness below |2.0| and kurtosis below |4.0|. Mardia’s multivariate normality test (implemented with the psych package; Revelle, 2017) showed significant multivariate skewness and kurtosis values. Intra-subscale correlations ranged from 0.02 to 0.73, and the lowest tolerance value was 0.36.

Interpreting output from the initial two-factor CFA

Results from the initial two-factor CFA indicated that, in our population, the data did not support the model specified. The chi-square test of model fit was significant (χ² = 1549, df = 229, p < 0.001), but this test is known to be sensitive to minor model misspecification with large sample sizes (n = 796). However, additional model fit indices also indicated that the data did not support the model specified. SRMR was 0.079, suggesting good fit, but CFI was 0.818, and RMSEA was 0.084. Thus, the hypothesized model was not empirically supported by the data.

To better understand this model misspecification, we explored the factor loadings, correlational residuals, original interitem correlation matrix, and modification indices. Several factor loadings were well below 0.7, indicating that the factors did not explain these items well. Analysis of correlational residuals did not point out any single item-pair correlation as especially problematic; rather, several correlational residuals were greater than |0.10|. Consequently, the poor model fit did not seem to be primarily caused by a few ill-fitting items. A reinvestigation of the interitem correlation matrix made when analyzing the factorability of the data (see the Supplemental Material, Section 1) suggested the presence of more than two factors. This was most pronounced for the agentic scale, for which some items had a relatively high correlation to one another and lower correlations to other items in that scale. Inspection of the modification indices suggested adding correlations between, for example, the items achievement and mastery. Together, these patterns indicate that the data might be better represented by more than two factors.

Factor Loadings.

As mentioned in Brief Technical Description of Factor Analysis, factor loadings represent how much of the respondent’s response to an item is due to the factor. When a construct is measured using a set of items, the assumption is that each item measures a slightly different aspect of the construct and that the common variance among them is the best possible representation of the construct. High, but not too high, factor loadings for these items are preferred. If several items have high standardized factor loadings 9 (e.g., above 0.9), this suggests that they share a lot of variance, which indicates that these items may be too similar and thus do not contribute unique information (Clark and Watson, 1995). On the other hand, if an item has a low factor loading on its focal factor, it means that the item shares little or no variance with the other items that theoretically belong to the same focal factor and thus its contribution to the factor is low. Including items with low factor loadings when combining the scores from several items into a single score (sum, average, or common variance) will introduce bias into the results. 10 There is, however, no clear rule for when an item has a factor loading that is too low to be included. Bandalos and Finney (2010) argue that, because the items are specifically chosen to indicate a factor, one would hope that the variability explained in the item by the factor would be high (at least 50%). Squaring the standardized factor loading provides the amount of variability explained in the item by the factor (R²), indicating that it is desirable to have standardized factor loadings of at least 0.7 (R² = 0.7² = 0.49, or ∼50%). However, the acceptable strength of the factor loading depends on the theoretically assumed relationship between the item and the factor. Some items might be more theoretically distant from the factor and therefore have lower factor loadings, but still comprise an essential part of the factor. This reinforces the idea that there are no hard and fast rules in factor analysis. Even if an item does not reach the suggested level for factor loading, if a researcher can argue from a theoretical basis for its inclusion, then it could be included.
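
The relationship between a standardized loading and the variance it explains can be sketched as follows. The item names and loading values are hypothetical, and the 0.7 and 0.9 thresholds are the rough guides mentioned above rather than fixed rules.

```python
def summarize_loadings(standardized_loadings, low=0.7, high=0.9):
    """For each item, report R^2 (the squared standardized loading) and a note."""
    summary = {}
    for item, loading in standardized_loadings.items():
        r_squared = loading ** 2
        if loading < low:
            note = "low: check the theoretical rationale for keeping the item"
        elif loading > high:
            note = "very high: item may be redundant with others"
        else:
            note = "within the suggested range"
        summary[item] = (round(r_squared, 2), note)
    return summary


# Hypothetical standardized loadings for three items on one factor.
loadings = {"item_a": 0.92, "item_b": 0.70, "item_c": 0.45}
loading_summary = summarize_loadings(loadings)
for item, (r_squared, note) in loading_summary.items():
    print(item, r_squared, note)
```

A flagged item is a prompt to revisit theory, not an automatic removal.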

Correlation Residuals.

As mentioned before, CFA is used to confirm a previously stated theoretical model. In CFA, the collected data are used to evaluate the accuracy of the proposed model by comparing the discrepancy between what the theoretical model implies (e.g., a two-factor model in the Diekman et al. [2010] example) and what is observed in the actual data. Correlation residuals represent the differences between the observed correlations in the data and the correlations implied by the CFA (Bandalos and Finney, 2010). Local areas of misfit can be identified by inspecting correlation residuals. Correlation residuals greater than |0.10| are indicative of a specific item-pair relationship that is poorly reproduced by the model (Kline, 2016). This guideline may be too low when working with small sample sizes and too high when working with large sample sizes and, as with all other fit indices, should only be used as one element among many to understand model fit.
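
A minimal sketch of how correlation residuals arise, assuming a standardized two-factor solution in which each item loads on a single factor: the model-implied correlation between two items is the product of their loadings, multiplied by the factor correlation when the items belong to different factors. All item names, loadings, and observed correlations below are hypothetical.

```python
def implied_correlation(a, b, loadings, factor_of, factor_corr):
    """Model-implied correlation between items a and b (standardized solution)."""
    product = loadings[a] * loadings[b]
    return product if factor_of[a] == factor_of[b] else product * factor_corr


def flag_residuals(observed, loadings, factor_of, factor_corr, threshold=0.10):
    """Return item pairs whose residual (observed - implied) exceeds the threshold."""
    flagged = []
    for (a, b), r_observed in observed.items():
        residual = r_observed - implied_correlation(a, b, loadings, factor_of, factor_corr)
        if abs(residual) > threshold:
            flagged.append((a, b, round(residual, 3)))
    return flagged


# Hypothetical standardized estimates: x1, x2 on F1; x3, x4 on F2.
loadings = {"x1": 0.80, "x2": 0.70, "x3": 0.75, "x4": 0.60}
factor_of = {"x1": "F1", "x2": "F1", "x3": "F2", "x4": "F2"}
factor_corr = 0.40

# Hypothetical observed correlations for each item pair.
observed = {
    ("x1", "x2"): 0.58, ("x1", "x3"): 0.25, ("x1", "x4"): 0.20,
    ("x2", "x3"): 0.38, ("x2", "x4"): 0.15, ("x3", "x4"): 0.44,
}

flagged_pairs = flag_residuals(observed, loadings, factor_of, factor_corr)
print(flagged_pairs)
```

In this made-up case only the x2–x3 pair exceeds |0.10|: the items correlate more strongly than the model allows, which might point to a missing cross-loading or shared wording.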

Modification Indices.

Most statistical software used for CFA provides modification indices that can easily be viewed by the user. Modification indices propose a series of possible additions to the model and estimate the amount the model’s chi-square value would decrease if the suggested parameter were added (recall that a lower chi-square value indicates better model fit). For example, if an item strongly correlates with two factors but is constrained to only correlate with one, the modification index associated with adding a relationship to the second factor would indicate how much the chi-square model fit is expected to improve with the addition of this factor loading. In short, modification indices can be used to better understand which items or relationships might be driving the poor model fit.

If (and only if) theoretically justified, a suggested relationship can be added or problematic items can be removed during a CFA. However, caution should be taken before adding or removing any parameters ( Bandalos and Finney, 2010 ). As Bandalos and Finney (2010) state, “Researchers must keep in mind that the purpose of conducting a CFA study is to gain a better understanding of the underlying structure of the variables, not to force models to fit” (p. 112). If post hoc changes to the model are made, the analysis becomes more explorative in nature, and thus tenuous. The modified model should ideally be confirmed with a new data set to avoid championing a model that has an artificially good model fit.

Best practice if the model does not fit (as noted in Factor Analysis ) is to split the data and conduct a second round of analyses starting with an EFA using half of the sample and then conducting a CFA with the other half ( Bandalos and Finney, 2010 ). To see an example of how to write up this secondary CFA analysis, see Boxes 6 and 7 of the goal-endorsement example.

When the Model Fit Is Good.

When model fit indices indicate that the hypothesized model is a plausible explanation of the relationships between the items in the data, factor loadings and the correlation between the latent variables in the model (so-called factor correlations) can be interpreted and a better understanding of the construct can be gained. It is also now appropriate to calculate and report the coefficient alpha, omega, or any other index of reliability for each of the subscales. The researcher can more confidently use the results from the instrument to make conclusions about the intended constructs based on combined scale scores (given that other relevant validity evidence presented in Table 1 also supports the intended interpretations). If a researcher has used CFA to examine the dimensionality of the items and finds that the scale functions as intended, this information should be noted in the methods section of the research manuscript when describing the measurement instruments used in the study. At the very least, the researcher should report the estimator and fit indices that were used and accompanying values for the fit indices. If the scale has been adapted in some way, or if it is being empirically examined for the first time, all of the factor loadings and factor correlations should also be reported so future researchers can compare their values with these original estimates. These could be reported as a standalone instrument validation paper or in the methods section of a study using that instrument.

Analytical Considerations for EFA

If a researcher’s data do not fit the model proposed in the CFA, then using the items as indicators of the hypothesized construct is not sensible. If the researcher wants to continue to use the existing items, it is prudent to investigate this misfit to better understand the relationships between the items. This calls for the use of an EFA, where the relationships between variables and factors are not predetermined (i.e., a model is not specified a priori) but are instead allowed to emerge from the data. As mentioned before, EFA could also be the first choice for a researcher if the instrument is in an early stage of development. We outline the steps for conducting an EFA in the following sections. See Box 4 for a description of how to describe analytical considerations for an EFA in the methods section.

BOX 4. What to report in the methods of a publication for an EFA using the goal-endorsement example

Because the results from the initial CFA indicated that the data did not support a two-factor solution, we proceeded with an EFA to explore the factor structure of the data. The original sample was randomly divided into equal-sized parts, and EFA was performed on half of the sample ( n = 398) to determine the dimensionality of the goal-endorsement scale and detect possible problematic items. This was followed by a CFA ( n = 398) to confirm the result gained from the EFA. EFA and CFA were run using the R package lavaan ( Rosseel, 2012 ).

Selecting an estimator for the EFA

Considering the ordinal and nonnormal nature of the data, a principal axis factor estimator was used to extract the variances from the data. Only cases with complete items were used in the EFA.

Factor rotation

Because theory and the preceding CFA indicated that the different subscales are correlated, quartimin rotation (an oblique rotation) was chosen for the EFA.

Determining the number of factors

Visual inspection of the scree plot, parallel analysis (PA) based on eigenvalues from the principal components and factor analysis in combination with theoretical considerations were used to decide on the appropriate number of factors to retain. PA was implemented with the psych package ( Revelle, 2017 ).

Just as with CFA, the first step in an EFA is selecting a statistical method to use to extract the variances from the data. The considerations for the selection of this estimator are similar to those for CFA (see Selecting an Estimator ). One of the most commonly used methods for extracting variance when conducting an EFA on ordinal data with slight nonnormality is principal axis factoring (Leandre et al. , 2012). If the items in one’s instrument have fewer than five response options, WLS can be considered.

Factor Rotation.

Factor rotation is a technical step to make the final output from the model easier to interpret (see Bandalos, 2018 , pp. 327–334, for more details). The main decision for the researcher to make here is whether the rotation should be orthogonal or oblique ( Raykov and Marcoulides, 2008 ; Leandre et al. , 2012; Bandalos, 2018 ). Orthogonal means that the factors are uncorrelated to one another in the model. Oblique allows the factors to correlate to one another. In educational studies, factors are likely to correlate to one another; thus oblique rotation should be chosen unless a strong hypothesis for uncorrelated factors exists (Leandre et al. , 2012). Orthogonal and oblique are actually families of rotations, so once the larger choice of family is made, a specific rotation method must be chosen. The specific rotation method within the oblique category that is chosen does not generally have a strong effect on the results ( Bandalos and Finney, 2010 ). However, the researcher should always provide information about which rotation method was used ( Bandalos and Finney, 2010 ).
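
To give a feel for what rotation does, the sketch below applies a rigid (orthogonal) 45-degree rotation to a hypothetical two-factor loading matrix. The loadings are invented for illustration. The example does not implement quartimin or any other oblique method, which additionally allow the factors to correlate. It only shows that rotation redistributes the loadings across factors without changing how much of each item's variance the solution explains.

```python
import math

def rotate(loadings, degrees):
    """Apply an orthogonal (rigid) rotation to a two-factor loading matrix."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(a * c - b * s, a * s + b * c) for a, b in loadings]

def communalities(loadings):
    """Sum of squared loadings per item (valid for uncorrelated factors)."""
    return [a * a + b * b for a, b in loadings]

# Hypothetical unrotated loadings: every item loads on both factors.
unrotated = [(0.60, 0.50), (0.65, 0.45), (0.55, -0.50), (0.60, -0.55)]
rotated = rotate(unrotated, 45)

for before, after in zip(unrotated, rotated):
    print(before, "->", tuple(round(x, 2) for x in after))
```

After rotation each item loads mainly on one factor, which is easier to interpret, while the communalities are identical before and after.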

Determining the Number of Factors.

After selecting the methods for estimation and rotation, researchers must determine how many factors to extract for EFA. This step is recognized as the greatest challenge of an EFA, and the issue has generated a large amount of debate (e.g., Cattell, 1966 ; Crawford et al. , 2010 ; Leandre et al. , 2012). Commonly used methods are to retain all factors with an eigenvalue >1 or to use a scree plot. Eigenvalues are roughly a measure of the amount of information contained in a factor, so factors with higher eigenvalues are the most useful for understanding the data. A scree plot is a plot of eigenvalues versus number of factors. Scree plots allow researchers to visually estimate the number of factors that are informative by considering the shape of the plot (see the annotated output in the Supplemental Material, Section 2, for an example of a scree plot). These two methods are considered heuristic, and many researchers recommend also using parallel analysis (PA) or the minimum average partial correlation test to determine the appropriate number of factors ( Ledesma and Valero-Mora, 2007 ; Leandre et al. , 2012; Tabachnick and Fidell, 2013 ). In addition, several statistics that mathematically analyze the shape of the scree plot have been developed in an effort to provide a nonvisual method of determining the number of factors ( Ruscio and Roche, 2012 ; Raiche et al. , 2013).
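
The eigenvalue >1 rule and parallel analysis can be compared in a short sketch. The code below is an assumption-laden illustration, not the procedure implemented in the psych package: it uses NumPy, the principal-components variant of parallel analysis (eigenvalues of the unreduced correlation matrix), normally distributed random comparison data, and simulated item responses. All function and variable names are hypothetical.

```python
import numpy as np

def eigenvalues_desc(data):
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Compare observed eigenvalues with those from random data of the same shape."""
    rng = np.random.default_rng(seed)
    n, k = data.shape
    observed = eigenvalues_desc(data)
    simulated = np.array([
        eigenvalues_desc(rng.standard_normal((n, k))) for _ in range(n_sims)
    ])
    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold
    # Retain components up to the first one that fails to beat the random threshold.
    n_retain = k if exceeds.all() else int(np.argmin(exceeds))
    return observed, threshold, n_retain

# Simulated data, for illustration only: 6 items driven by 2 uncorrelated factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((400, 2))
loadings = np.array([[0.8, 0], [0.7, 0], [0.75, 0], [0, 0.8], [0, 0.7], [0, 0.75]])
items = factors @ loadings.T + 0.6 * rng.standard_normal((400, 6))

observed, threshold, n_retain = parallel_analysis(items)
kaiser = int((observed > 1).sum())  # eigenvalue > 1 rule
print("observed eigenvalues:", np.round(observed, 2))
print("random-data thresholds:", np.round(threshold, 2))
print("retain (parallel analysis):", n_retain, "| retain (eigenvalue > 1):", kaiser)
```

Plotting `observed` against the component number gives the scree plot. Overlaying `threshold` on the same axes shows where the observed eigenvalues drop below what random data would produce.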

We recommend using a number of these indices, as well as theoretical considerations, to determine the number of factors to retain. The results of all of the various methods discussed provide plausible solutions that can all be explored to evaluate the best solution. When these indices are in agreement, this provides more evidence of a clear factor structure in the data. To make each factor interpretable, it is of utmost importance that the number and nature of factors retained make theoretical sense (see Box 5 for a discussion on how many factors to retain). Further, the intended use for the survey should also be considered. For example, say a researcher is interested in studying two distinct populations of students. If the empirical and theoretical evidence supports both a two-factor and a three-factor solution, but the three-factor solution provides a clearer distinction between the two populations of interest, then the researcher might choose the three-factor solution (see Box 7).

BOX 5. How to interpret and report EFA output for publication using the goal-endorsement example

Initial EFAs

Parallel analysis based on eigenvalues from the principal components and factor analysis indicated three components and five factors. The scree plot indicated an initial leveling out at four factors and a second leveling out at six factors.

We started by running a three-factor model and then increased the number of factors by one until we had run all models ranging from three to six factors. The pattern matrices were then examined in detail with a special focus on whether the factors made theoretical sense (see Table 2 for pattern matrices for the three-, four-, and five-factor models). The three-factor solution consisted of one factor with high factor loadings for the items representing communal goals (explaining 17% of the variance in the data). The items originally representing agentic goals were split into two factors. One factor included items that theoretically could be described as prestige (explaining 12% of the variance in the data) and the other items related to autonomy and competency (explaining 11% of the variance in the data). The total variance explained by the three-factor model was 41%. In the four-factor solution, the autonomy and competency items were split into two different factors. In the five-factor solution, three items from the original communal goals scale (working with people, connection to others, and intimacy) contributed most to the additional factor. In total, 48% of the variance was explained by the five-factor model. For a six-factor solution, the sixth factor included only one item with pattern loadings greater than 0.40, and thus a six-factor solution was deemed to be inappropriate.

In conclusion, the communal scale might represent one underlying construct as suggested by previous research or it might be split into two subscales represented by items related to 1) serving others and 2) connection. Our data did not support a single agentic factor. Instead, these items seemed to fit on two or three subscales: prestige, autonomy, and possibly competency. Because all the suggested solutions (three-, four-, and five-factor solutions) included a number of poorly fitting items, we decided to remove items and run a second set of EFAs before proceeding to the CFA.

Second round of EFAs

On the basis of the results from the initial EFAs, we first continued with a three-factor solution, removing items with low pattern coefficients (<0.40; 10: success, 14: competition, and 22: intimacy, to begin with; Table 2 ). When these variables were removed in a stepwise manner, additional items now showed low pattern coefficients (<0.40) and/or low communalities in the new EFA solutions. The new items showing low pattern coefficients were items belonging to their own factors in the five-factor EFA (i.e., items representing competency and connection). Not until all items from these two scales were removed was a stable three-factor solution achieved with pattern coefficients >0.40. Thus, to achieve a three-factor solution, including only items with pattern coefficients >0.40, we had to drop 30% of the items and, consequently, extensively narrow the content validity of the scale.

To further explore a five-factor solution, we decided, on the basis of the empirical results and the theoretical meaning of the items, to stepwise remove items 4 (mastery), 14 (competition), and 22 (intimacy). We used an inclusive pattern coefficient cutoff (<0.40) for this initial round of validation, because we wanted to keep as many items as possible from the original scale. If some items continue to show pattern coefficients below 0.5 over repeated data collections, researchers should reconsider whether these items should be kept in the scale. The new 20-item five-factor solution resulted in theoretically the same factors as for the first five-factor EFA, but now all pattern coefficients but one were above 0.50 on the primary factor and below 0.20 on the other factors ( Table 3 ). In total, 52% of the variance in the data was explained.

TABLE 3. Standardized pattern coefficients for the Diekman et al. (2010) goal-endorsement instrument from the second EFA for the five-factor solution

Item | Description | Pattern coefficient(s)
1 | Power | 0.75
2 | Recognition | 0.60
3 | Achievement | 0.81
5 | Self-promotion | 0.56
6 | Independence | 0.65
7 | Individualism | 0.69
8 | Status | 0.76
9 | Focus on the self | 0.50
10 | Success | 0.55
11 | Financial rewards | 0.55
12 | Self-direction | 0.55
13 | Demonstrating skills or competence | 0.40
15 | Helping others | 0.84
16 | Serving humanity | 0.80
17 | Serving community | 0.80
18 | Working with people | 0.94
19 | Connection with others | 0.53
20 | Attending to others | 0.75
21 | Caring for others | 0.74
23 | Spiritual rewards | 0.50, 0.20

a For clarity, pattern coefficients <0.2 are not shown.

In conclusion, the initial CFA, as well as the EFA, indicated that the two-dimensional scale previously suggested was not supported in our sample. The EFA mainly indicated a three- or a five-factor solution. To achieve a good three-factor solution, we had to exclude 30% of the original items. The final three factors were labeled “prestige,” “autonomy,” and “service.” Both the empirical data and theoretical considerations suggested two additional factors: a competency factor and a connection factor. We continued with this five-factor solution, as it allowed us to retain more of the original items and made theoretical sense, as the five factors were just a further parsing of the original agentic and communal scales.

BOX 6. How to interpret and report CFA output for publication using the goal-endorsement example, second CFA

Based on the results from the EFAs, a second CFA was specified using the five-factor model with 20 items (excluding 4: mastery, 14: competition, and 22: intimacy). The specified five-factor CFA demonstrated appropriate model fit (χ² = 266, df = 160, p < 0.001, CFI = 0.959, RMSEA = 0.046, and SRMR = 0.050). Factor loadings were close to or above 0.70 for all but three items (Figure 2), meaning that, for most items, around 50% of the variance in the items was explained (R² ≈ 0.5) by the theorized factor. This means that the factors explained most of the items well. Factor correlations were highest between the service and connection factors (0.76) and between the autonomy and competency factors (0.67). The lowest factor correlation found was between the prestige and service factors (0.21). Coefficient alpha values for the subscales were 0.81, 0.77, 0.66, 0.87, and 0.77 for prestige, autonomy, competency, service, and connection, respectively.

FIGURE 2. Results from the final five-factor CFA model. Survey items (for item descriptions, see Table 3) are represented by squares, and factors are represented by ovals. The numbers below the double-headed arrows represent correlations between the factors; the numbers by the one-directional arrows between the factors and the items represent standardized factor loadings. Small arrows indicate error terms. *, p < 0.01; p < 0.001 for all other estimates.

BOX 7. Writing conclusions from factor analysis for publication using the goal-endorsement example

Conclusions.

The results from the factor analysis did not confirm the proposed two-factor goal-endorsement scale for use with college STEM majors. Instead, our results indicated five subscales: prestige, autonomy, competency, service, and connection (Table 4). The five-factor solution aligned with Diekman et al.’s (2010) original two-factor scale, because communal items did not mix with agentic items. Our sample did, however, allow us to further refine the solution for the original two scales. Finer parsing of the agentic and communal scales may help identify important differences between students and allow researchers to better understand factors contributing to retention in STEM majors. In addition, with items related to autonomy and competency moved to their own scales, the refined prestige scale focusing on factors like power, recognition, and status may be a more direct contrast to the service scale. Additional evidence in support of this refinement includes the finding that the five-factor solution better distinguishes the service scale and the prestige scale (factor correlation = 0.21) than the two-factor solution (factor correlation between agentic and communal factors = 0.35). Further, retention may be significantly correlated to prestige but not to autonomy. Alternatively, differences between genders may exist for the service scale but not the connection scale.

TABLE 4. Proposed five-factor solution. Items within each factor are ordered by highest to lowest factor loadings

Service | Prestige | Autonomy | Connection | Competency
Helping others | Status | Individualism | Working with people | Achievement
Serving humanity | Power | Independence | Connection with others | Success
Serving community | Recognition | Self-direction | | Competence
Attending to others | Self-promotion | Focus on the self | |
Caring for others | Financial rewards | | |
Spiritual rewards | | | |

On the basis of the results of this factor analysis, we recommend using the five-factor solution for interpreting the results of the current data set, but interpreting the connection and competency scales with some caution, for reasons summarized in the next section.

Limitations and future studies

The proposed five-factor solution needs additional work. In particular, both the competency and connection scales need further development. Only two items represented connection, and this is not adequate to represent the full aspect of this construct, especially to make it clearly distinct from the construct of service. The competency scale included only three items, coefficient alpha was 0.66, and factor loadings for the scale were low (<0.40) for demonstrating skills or competency.

Another limitation of this study is that the sample consisted of 70% women, an overrepresentation of women for a typical undergraduate STEM population. Further studies should confirm whether the suggested dimensionality holds in a more representative sample. Future studies should also test whether the instrument has the same structure with STEM students from different backgrounds (i.e., measurement invariance should be investigated). The work presented here only establishes the dimensionality of the survey. We recommend the collection of other types of validity evidence, such as evidence based on content or relationships to other variables, to further strengthen our confidence that the scores from this survey represent STEM students’ goal orientation.

Interpreting Output from EFA

The aim of EFA is to gain a better understanding of underlying patterns in the data, investigate dimensionality, and identify potentially problematic items. In addition to the results from parallel analysis or other methods used to estimate the number of factors, other informative measures include pattern coefficients and communalities. These outputs from an EFA will be discussed in this section. See Box 5 for an example of how to write up the output from an EFA.

Pattern Coefficients and Communalities.

Pattern coefficients and communalities are parameters describing the relationship between the items and the factors. They help researchers understand the meaning of the factors and identify items that do not empirically appear to belong to their theorized factor.

Pattern coefficients closely correspond to factor loadings in CFA, and they are commonly the focal output from an EFA (Leandre et al. , 2012). Pattern coefficients represent the impact each factor has on an item after controlling for the impact of all the other factors on that item. A high pattern coefficient suggests that the item is well explained by a particular factor. However, as with CFA, there is no clear rule as to when an item has a pattern coefficient too low to be considered part of a particular factor. Guidelines for minimum pattern coefficient values range from 0.40 to 0.70. In other words, all items with pattern coefficients equal to or higher than the chosen cutoff value can be considered “good” items and should be kept in the survey ( Matsunaga, 2010 ).

It is also important to consider the magnitude of any cross-loadings. Cross-loading describes the situation in which an item seems to be influenced by more than one factor in the model. Cross-loading is indicated when an item has high pattern coefficients for multiple factors. Using that item is problematic when creating a summed/mean score for a factor, as responses to that item are not uniquely driven by its hypothesized factor, but instead by additional measured factors. Cross-loadings higher than 0.20 or 0.30 are usually considered to be problematic ( Matsunaga, 2010 ), especially if the item does not have a particularly strong loading on a focal factor.
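
A simple screening routine along these lines is sketched below. It is a minimal illustration, assuming a primary-coefficient cutoff of 0.40 and a cross-loading cutoff of 0.30 (both within the ranges discussed above). The function name, item labels, and coefficients are hypothetical. Flagged items still need to be judged against theory before any decision to remove them.

```python
def flag_items(pattern, primary_cutoff=0.40, cross_cutoff=0.30):
    """Return {item: reason} for items that warrant a closer look."""
    flagged = {}
    for item, coefs in pattern.items():
        # Rank the absolute pattern coefficients from largest to smallest.
        ranked = sorted((abs(c) for c in coefs), reverse=True)
        if ranked[0] < primary_cutoff:
            flagged[item] = "low primary coefficient"
        elif len(ranked) > 1 and ranked[1] > cross_cutoff:
            flagged[item] = "cross-loading"
    return flagged

# Hypothetical pattern matrix (rows: items, columns: two factors).
pattern = {
    "item_a": [0.75, 0.05],
    "item_b": [0.35, 0.10],
    "item_c": [0.55, 0.45],
    "item_d": [0.02, 0.81],
}
flagged = flag_items(pattern)
print(flagged)
```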

Communality represents the percentage of the variance in responses on an item accounted for by all factors in the proposed model. Communalities are similar to R 2 in CFA (see Factor Loadings ). However, in CFA, the variance in an item is only explained by one factor, while in EFA, the variance in one item can be explained by several factors. Low communality for an item means that the variance in the item is not well explained by any part of the model, and thus that item could be a subject for elimination.
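
For readers who want to see how a communality is assembled from EFA output, the sketch below computes it for a single item from its pattern coefficients and the factor correlation matrix, using the standard expression h² = Σⱼ Σₖ pⱼ φⱼₖ pₖ. The numbers are hypothetical. With uncorrelated (orthogonal) factors the expression reduces to the sum of squared coefficients.

```python
def communality(coefs, phi):
    """Variance in one item explained jointly by all factors.

    coefs: pattern coefficients of the item on each factor.
    phi:   factor correlation matrix (identity for orthogonal factors).
    """
    m = len(coefs)
    return sum(coefs[j] * phi[j][k] * coefs[k] for j in range(m) for k in range(m))

item = [0.6, 0.2]                       # hypothetical pattern coefficients
phi_oblique = [[1.0, 0.3], [0.3, 1.0]]  # hypothetical factor correlation of 0.3
phi_identity = [[1.0, 0.0], [0.0, 1.0]]

h2_oblique = communality(item, phi_oblique)
h2_orthogonal = communality(item, phi_identity)
print(round(h2_oblique, 3), round(h2_orthogonal, 3))
```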

We emphasize that, even if pattern coefficients or communalities indicate that an item might be subject for elimination, it is important to consider the alignment between the item and hypothesized construct before actually eliminating the item. The items in a scale are presumably chosen for some theoretical reason, and eliminating any items can cause a decrease in content validity ( Bandalos and Finney, 2010 ). If any item is removed, the EFA should be rerun to ensure that the original factor structure persists. This can be done on the same data set, as EFA is exploratory in nature.

Interpreting the Final Solution.

Once the factors and the items make empirical and theoretical sense, the factor solution can be interpreted, and suitable names for the factors should be chosen (see Box 5 for a discussion of the output from an EFA). Important sources of information for this include the amount of variance explained by the whole solution and by each factor, factor correlations, pattern coefficients, communality values, and the underlying theory. Because the names of the factors will be used to communicate the results, it is crucial that the names reflect the meaning of the underlying items. Because the item responses are manifestations of the constructs, different sets of items representing a construct will, accordingly, lead to slightly different, nuanced interpretations of that construct. Once a plausible solution has been identified by an EFA, it is important to note that stronger support for the solution can be obtained by testing the hypothesized model using a CFA on a new sample.

CONCLUDING REMARKS

In this article, we have discussed the need for understanding the validity evidence available for an existing survey before its use in discipline-based educational research. We emphasized that validity is not a property of the measurement instrument itself but is instead a property of the instrument’s use. Thus, each time a researcher decides to use an instrument, they have to consider to what degree evidence and theory support the intended interpretations and use of the instrument. A researcher should always review the different kinds of validity evidence described by AERA, APA, and NCME (2014 ; Table 1 ) before using an instrument and should identify the evidence they need to feel confident when employing the instrument for an intended use. When using several related items to measure an underlying construct, one important validity aspect to consider is whether a set of items can confidently be combined to represent that construct. In this paper, we have shown how factor analysis (both exploratory and confirmatory) can be used to investigate that.

We recognize that the information presented herein may seem daunting and a potential barrier to carrying out important, substantive, educational research. We appreciate this sentiment and have experienced those fears ourselves, but we feel that properly understanding procedures for vetting instruments before their use is essential for robust and replicable research. To reiterate, at issue here is the confidence and trust one can have in one’s own research, both after its initial completion and in future studies that will rely on the replicability of results. Again, we can use an analogy for the measurement of unobservable phenomena: one would not expect an uncalibrated and a calibrated scale to produce the same values for the weight of a rock. This does not mean that the uncalibrated scale will necessarily produce invalid measurements, only that one’s confidence in its ability to produce valid measurements should be tempered by the knowledge that it has not yet been calibrated. Research conducted using uncalibrated or biased instruments, regardless of discipline, is at risk of inferring conclusions that are incorrect. The researcher may make the appropriate inferences given the values provided by the instrument, but if the instrument itself is invalid for the proposed use, then the inferences drawn are also invalid. Our aim in presenting these methods is to strengthen the research conducted in biology education and continue to improve the quality of biology education in higher education.

1 In this article, we will use the terms “surveys,” “measurement instrument,” and “instrument” interchangeably. We will, however, put the most emphasis on the term “measurement instrument,” because it conveys the importance of considering the quality of the measurement resulting from the instrument’s use.

2 “Latent variables” and “constructs” both refer to phenomena that are not directly observable. Examples could include a student’s goals, the strength of his or her interest in biology, or his or her tolerance of failure. The term “latent variable” is commonly used when discussing these phenomena from a measurement point of view, while “construct” is a more general term used when discussing these phenomena from a theoretical perspective. In this article, we will use the term “construct” only when referring to phenomena that are not directly observable.

3 In addition to coefficient alpha, there are a number of other reliability estimates available. We refer interested readers to Bandalos (2018) , Sijtsma (2009) , and Crocker and Algina (2008) .

4 This is partly due to identification issues (see Specifying the Model ).

5 In EFA, communalities describe how much of the variance in an item is explained by the factor. For more information about communalities, see Interpreting Output from EFA .

6 For a description of a correlation matrix, see the Supplemental Material, Sections 1 and 2.

7 It is necessary to set the metric to interpret factor loadings and variances in a CFA model. This is commonly done by either 1) choosing one of the factor loadings and fixing it to 1 (this is done for each factor in the model) or 2) by fixing the variance of the latent factors to 1. We have chosen the former approach for this example.

8 For some software and estimation methods, model fit indices are also provided for EFA. In a similar way as for CFA, these model fit indices can be used to evaluate the fit of the data to the model.

9 When using CFA, the default setting in most software is to provide factor loadings in the original metric of the items, such that the results are covariances between the items and the factor. Because these values are unstandardized, it is sometimes hard to interpret these relationships. For this reason, it is common to standardize factor loadings and other model relationships (e.g., correlations between latent factors), which puts them in the more familiar correlation format that is bounded by −1 and +1.

10 When distilling the responses of several items into a single score, one is implicitly assuming that all of the items measure the underlying construct equally well (usually without measurement error) and are of equal theoretical importance. Fully discussing the nuances of how to create a single score from a set of items is beyond the scope of this paper, but we would be remiss if we did not at least mention it and encourage the reader to seek more information, such as DiStefano et al. (2009).

ACKNOWLEDGMENTS

We are indebted to Ashely Rowland, Melissa McCartney, Matthew Kararo, Julie Charbonnier, and Marie Janelle Tacloban for their comments on earlier versions of this article. The research reported in this paper was supported by awards from the National Science Foundation (NSF DUE 1534195 and 1711082). This research was conducted under approved IRB 2015-06-0055, University of Texas at Austin.

  • Allen, J. M., Muragishi, G. A., Smith, J. L., Thoman, D. B., & Brown, E. R. (2015). To grab and to hold: Cultivating communal goals to overcome cultural and structural barriers in first-generation college students’ science interest. Translational Issues in Psychological Science, 1(4), 331.
  • American Educational Research Association, American Psychological Association, and National Council for Measurement in Education (AERA, APA, and NCME). (2014). Standards for educational and psychological testing. Washington, DC.
  • Andrews, S. E., Runyon, C., & Aikens, M. L. (2017). The math–biology values instrument: Development of a tool to measure life science majors’ task values of using math in the context of biology. CBE—Life Sciences Education, 16(3), ar45.
  • Armbruster, P., Patel, M., Johnson, E., & Weiss, M. (2009). Active learning and student-centered pedagogy improve student attitudes and performance in introductory biology. CBE—Life Sciences Education, 8(3), 203–213.
  • Bakan, D. (1966). The duality of human existence: An essay on psychology and religion. Oxford, UK: Rand McNally.
  • Bandalos, D. L. (2018). Measurement theory and applications for the social sciences. New York: Guilford.
  • Bandalos, D. L., & Finney, S. J. (2010). Factor analysis. Exploratory and confirmatory. In Hancock, G. R., & Mueller, R. O. (Eds.), The reviewer’s guide to quantitative methods in the social science (pp. 93–114). New York: Routledge.
  • Beaujean, A. A. (2012). BaylorEdPsych: R package for Baylor University educational psychology quantitative courses. Retrieved from https://CRAN.R-project.org/package=BaylorEdPsych
  • Boomsma, A. (1982). Robustness of LISREL against small sample sizes in factor analysis models. In Joreskog, K. G., & Wold, H. (Eds.), Systems under indirect observation: Causality, structure, prediction (Part 1, pp. 149–173). Amsterdam, Netherlands: North Holland.
  • Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071.
  • Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245–276.
  • Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7(3), 309–319.
  • Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.
  • Crawford, A. V., Green, S. B., Levy, R., Lo, W.-J., Scott, L., Svetina, D., & Thompson, M. S. (2010). Evaluation of parallel analysis methods for determining the number of factors. Educational and Psychological Measurement, 70(6), 885–901.
  • Crocker, L., & Algina, J. (2008). Introduction to classical and modern test theory. Mason, OH: Cengage Learning.
  • Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302.
  • Diekman, A. B., Brown, E. R., Johnston, A. M., & Clark, E. K. (2010). Seeking congruity between goals and roles: A new look at why women opt out of science, technology, engineering, and mathematics careers. Psychological Science, 21(8), 1051–1057.
  • DiStefano, C., Zhu, M., & Mindrila, D. (2009). Understanding and using factor scores: Considerations for the applied researcher. Practical Assessment, Research & Evaluation, 14(20), 1–11.
  • Eagly, A. H., Wood, W., & Diekman, A. (2000). Social role theory of sex differences and similarities: A current appraisal. In Eckes, T., & Trautner, H. M. (Eds.), The developmental social psychology of gender (pp. 123–174). Mahwah, NJ: Erlbaum.
  • Eddy, S. L., Brownell, S. E., Thummaphan, P., Lan, M. C., & Wenderoth, M. P. (2015). Caution, student experience may vary: Social identities impact a student’s experience in peer discussions. CBE—Life Sciences Education, 14(4), ar45.
  • Eddy, S. L., & Hogan, K. A. (2014). Getting under the hood: How and for whom does increasing course structure work? CBE—Life Sciences Education, 13(3), 453–468.
  • Finney, S. J., & DiStefano, C. (2006). Nonnormal and categorical data in structural equation modeling. In Hancock, G. R., & Mueller, R. O. (Eds.), A second course in structural equation modeling (pp. 269–314). Greenwich, CT: Information Age.
  • Fowler, F. J. (2014). Survey research methods. Los Angeles: Sage.
  • Gagne, P., & Hancock, G. R. (2006). Measurement model quality, sample size, and solution propriety in confirmatory factor models. Multivariate Behavioral Research, 41(1), 65–83.
  • Gorsuch, R. L. ( 1983 ). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum. Google Scholar
  • Green, S. B., & Yang, Y. ( 2009 ). Commentary on coefficient alpha: A cautionary tale . Psychometrika , 74 (1), 121–135. Google Scholar
  • Hu, L., & Bentler, P. ( 1999 ). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives . Structural Equation Modeling: A Multidisciplinary Journal , 6 (1), 1–55. Google Scholar
  • Kline, R. B. ( 2016 ). Principles and practise of structural equation modeling (4th ed.). New York: Guilford. Google Scholar
  • Leandre, R., Fabrigar, L. R., & Wegener, D. T. ( 2012 ). Exploratory factor analysis . Oxford, UK: Oxford University Press. Google Scholar
  • Ledesma, R.D., & Valero-Mora, P. ( 2007 ). Determining the number of factors to retain in EFA: An easy-to-use computer program for carrying out parallel analysis . Practical Assessment, Research & Evaluation , 12 (2) Google Scholar
  • Lissitz, R. W., & Samuelsen, K. ( 2007 ). A suggested change in terminology and emphasis regarding validity and education . Educational Researcher , 36 (8), 437–448. Google Scholar
  • Matsunaga, M. ( 2010 ). How to factor analyze your data right: Do’s, don’t and how-to’s . International Journal of Psychological Research , 3 . Retrieved February 24, 2019, from www.redalyc.org/html/2990/ 299023509007/ Google Scholar
  • McNeish, D. ( 2018 ). Thanks coefficient alpha, we’ll take it from here . Psychological Methods , 23 (3), 412–433. https://dx.doi.org/10.1037/met0000144 Google Scholar
  • Mehrens, W. A. ( 1997 ). The consequences of consequential validity . Educational Measurement: Issues and Practise , 16 (2), 16–18. Google Scholar
  • Messick, S. ( 1995 ). Validity of psychological-assessment—Validation of inferences from person’s responses and performances as scientific inquiry into score meaning . American Psychologist , 50 (9), 741–749. Google Scholar
  • Perry, J. L., Nicholls, A. R, Clough, P. J., & Crust, L. ( 2015 ). Assessing model fit: Caveats and recommendations for confirmatory factor analysis and exploratory structural equation modeling . Measurement in Physical Education and Exercise Science , 19 (1), 12–21. Google Scholar
  • Prentice, D. A., & Carranza, E. ( 2002 ). What women and men should be, shouldn’t be, are allowed to be, and don’t have to be: The contents of prescriptive gender stereotypes . Psychology of Women Quarterly , 26 (4), 269–281. Google Scholar
  • Raykov, T., & Marcoulides, G. A. ( 2008 ). An introduction to applied multivariate analysis . New York: Routledge. Google Scholar
  • R Core Team . ( 2016 ). R: A language and environment for statistical computing . Vienna, Austria: R Foundation for Statistical Computing. Retrieved February 24, 2019, from www.R-project.org Google Scholar
  • Reeves, T. D., & Marbach-Ad, G. ( 2016 ). Contemporary test validity in theory and practice: A primer for discipline-based education researchers . CBE—Life Sciences Education , 15 (1), rm1. Google Scholar
  • Revelle, W. ( 2017 ). psych: Procedures for Personality and Psychological Research, Northwestern University, Evanston, Illinois, USA . Retrieved February 24, 2019, from https://CRAN.R-project.org/package=psychVersion=1.7.8 Google Scholar
  • Rissing, S. W., & Cogan, J. G. ( 2009 ). Can an inquiry approach improve college student learning in a teaching laboratory ? CBE—Life Sciences Education , 8 (1), 55–61. Link ,  Google Scholar
  • Rosseel, Y. ( 2012 ). lavaan: An R Package for Structural Equation Modeling . Journal of Statistical Software , 48 (2), 1–36. Google Scholar
  • Ruscio, J., & Roche, B. ( 2012 ). Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure . Psychological Assessment , 24 (2), 282. Medline ,  Google Scholar
  • Schmitt, N. ( 1996 ). Uses and abuses of coefficient alpha . Psychological Assessment , 8 (4), 350–353. Google Scholar
  • Sijtsma, K. ( 2009 ). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha . Psychometrika , 74 (1), 107. Medline ,  Google Scholar
  • Slaney, K. ( 2017 ). Construct validity: Developments and debates . In Validating psychological constructs: Historical, Philosophical, and Practical Dimensions (pp. 83–109) (Palgrave studies in the theory and history of psycho­logy). London: Palgrave Macmillan. Google Scholar
  • Smith, J. L., Cech, E., Metz, A., Huntoon, M., & Moyer, C. ( 2014 ). Giving back or giving up: Native American student experiences in science and engineering . Cultural Diversity and Ethnic Minority Psychology , 20 (3), 413. Medline ,  Google Scholar
  • Stephens, N. M., Fryberg, S. A., Markus, H. R., Johnson, C. S., & Covarrubias, R. ( 2012 ). Unseen disadvantage: How American universities’ focus on independence undermines the academic performance of first-generation college students . Journal of Personality and Social Psychology , 102 (6), 1178–1197. Retrieved from http://doi.org/10.1037/a0027143 Medline ,  Google Scholar
  • Su, R., Rounds, J., & Armstrong, P. I. ( 2009 ). Men and things, women and people: A meta-analysis of sex differences in interests . Psychological Bulletin , 135 (6), 859. Medline ,  Google Scholar
  • Tabachnick, B. G., & Fidell, L. S. ( 2013 ). Using multivariate statistics (6th ed). Boston: Pearson. Google Scholar
  • Tavakol, M., & Dennick, R. ( 2011 ). Making sense of Cronbach’s alpha . International Journal of Medical Education , 2 , 53–55. Retrieved February 24, 2019, from http://doi.org/10.5116/ijme.4dfb.8dfd Medline ,  Google Scholar
  • Wachsmuth, L. P., Runyon, C. R., Drake, J. M., & Dolan, E. L. ( 2017 ). Do biology students really hate math? Empirical insights into undergraduate life science majors’ emotions about mathematics . CBE—Life Sciences Education , 16 (3), ar49. Link ,  Google Scholar
  • Wiggins, B. L., Eddy, S. L., Wener-Fligner, L., Freisem, K., Grunspan, D. Z., Theobald, E. J., & Crowe, A. J. ( 2017 ). ASPECT: A survey to assess student perspective of engagement in an active-learning classroom . CBE—Life Sciences Education , 16 (2), ar32. Link ,  Google Scholar
  • Wolf, E. J., Harrington, K. M., Clark, S. L., & Miller, M. W. ( 2013 ). Sample size requirements for structural equation models: An evaluation of power, bias, and solution propriety . Educational and Psychological Measurement , 73 (6), 913–934. Google Scholar
  • Worthington, R. L., & Whittaker, T. A. ( 2006 ). Scale development research: A content analysis and recommendations for best practices . The Counseling Psychologist , 34 (6), 806–838. Google Scholar
  • Yang, Y., & Green, S. B. ( 2011 ). Coefficient alpha: A reliability coefficient for the 21st century ? Journal of Psychoeducational Assessment , 29 (4), 377–392. Google Scholar
  • Yong, A. G., & Pearce, S. ( 2013 ). A beginner’s guide to factor analysis: Focusing on exploratory factor analysis . Tutorials in Quantitative Methods for Psychology , 9 (2), 79–94. Google Scholar
  • Coping behavior versus coping style: characterizing a measure of coping in undergraduate STEM contexts 14 February 2022 | International Journal of STEM Education, Vol. 9, No. 1
  • A case study of a novel summer bridge program to prepare transfer students for research in biological sciences 20 December 2022 | Disciplinary and Interdisciplinary Science Education Research, Vol. 4, No. 1
  • An Exploration of Masculinities and Concurrency Among Black Sexual Minority and Majority Men: Implications for HIV/STI Prevention 7 December 2022 | Annals of LGBTQ Public and Population Health, Vol. 3, No. 4
  • Validation and Scoring of the Greek Version of the Strategic and Clinical Quality Indicators in Postoperative Pain Management (SCQIPP) Questionnaire 1 Dec 2022 | Journal of PeriAnesthesia Nursing, Vol. 37, No. 6
  • Why Travel to Georgia? Motivations, Experiences, and Country’s Image Perceptions of Wine Tourists 6 October 2022 | Tourism and Hospitality, Vol. 3, No. 4
  • Behavioural Factors for Users of Bicycles as a Transport Alternative: A Case Study 15 December 2022 | Sustainability, Vol. 14, No. 24
  • An interest-oriented laboratory microbial engineering course is helpful for the cultivation of scientific literacy 23 November 2022 | Journal of Biological Education
  • Reliability and Validity of a Perinatal Shared Decision-Making Measure: The Childbirth Options, Information, and Person-Centered Explanation 1 Nov 2022 | Journal of Obstetric, Gynecologic & Neonatal Nursing, Vol. 51, No. 6
  • Development of a survey instrument to assess individual and organizational use of climate adaptation science 1 Nov 2022 | Environmental Science & Policy, Vol. 137
  • Assessing how students value learning communication skills in an undergraduate anatomy and physiology course 3 December 2021 | Anatomical Sciences Education, Vol. 15, No. 6
  • HLS19-NAV—Validation of a New Instrument Measuring Navigational Health Literacy in Eight European Countries 25 October 2022 | International Journal of Environmental Research and Public Health, Vol. 19, No. 21
  • Socio-Economic Factors Affecting Member’s Satisfaction towards National Health Insurance: An Evidence from the Philippines 21 November 2022 | International Journal of Environmental Research and Public Health, Vol. 19, No. 22
  • How do STEM graduate students perceive science communication? Understanding science communication perceptions of future scientists 3 October 2022 | PLOS ONE, Vol. 17, No. 10
  • Validation of the Short-Test of Functional Health Literacy in Adults for the Samoan Population 1 Oct 2022 | HLRP: Health Literacy Research and Practice, Vol. 6, No. 4
  • Sensory quality control: Assessment of food company employees' knowledge, attitudes, and practices 29 June 2022 | Journal of Sensory Studies, Vol. 37, No. 5
  • Nurses' professional values scale‒three: Validation and psychometric appraisal among Saudi undergraduate student nurses 1 Oct 2022 | Journal of Taibah University Medical Sciences, Vol. 17, No. 5
  • Joshua Premo ,
  • Brittney N. Wyatt ,
  • Matthew Horn , and
  • Heather Wilson-Ashworth
  • Kristy Jean Wilson, Monitoring Editor
  • Shorter and sweeter: the 16-item version of the SRS questionnaire shows better structural validity than the 20-item version in young patients with spinal deformity 27 April 2022 | Spine Deformity, Vol. 10, No. 5
  • Validation of the Spanish Version of the Copenhagen Burnout Inventory in Mexican Medical Residents 1 Sep 2022 | Archives of Medical Research, Vol. 53, No. 6
  • Human–cobot interaction fluency and cobot operators’ job performance. The mediating role of work engagement: A survey 1 Sep 2022 | Robotics and Autonomous Systems, Vol. 155
  • Importance of Top Management Commitment to Organizational Citizenship Behaviour towards the Environment, Green Training and Environmental Performance in Pakistani Industries 5 September 2022 | Sustainability, Vol. 14, No. 17
  • Validation of the Korean Version of Nurses’ Moral Courage Scale 15 September 2022 | International Journal of Environmental Research and Public Health, Vol. 19, No. 18
  • Differential Item Functioning Analysis of the Fundamental Concepts for Organic Reaction Mechanisms Inventory 28 July 2022 | Journal of Chemical Education, Vol. 99, No. 8
  • Beyond online search strategies: The effects of internet epistemic beliefs and different note‐taking formats on online multiple document reading comprehension 11 April 2022 | Journal of Computer Assisted Learning, Vol. 38, No. 4
  • Investigating Student Engagement in General Chemistry Active Learning Activities using the Activity Engagement Survey (AcES) 7 June 2022 | Journal of Chemical Education, Vol. 99, No. 7
  • Distinct pathways to stakeholder use versus academic contribution in climate adaptation research 8 June 2022 | Conservation Letters, Vol. 15, No. 4
  • Lisa A. Corwin ,
  • Michael E. Ramsey ,
  • Eric A. Vance ,
  • Elizabeth Woolner ,
  • Stevie Maiden ,
  • Nina Gustafson and
  • Joseph A. Harsh
  • Erin Shortlidge, Monitoring Editor
  • “Thandi should feel embarrassed”: describing the validity and reliability of a tool to measure depression-related stigma among patients with depressive symptoms in Malawi 20 November 2021 | Social Psychiatry and Psychiatric Epidemiology, Vol. 57, No. 6
  • Attitudes of Nursing Staff in Hospitals towards Restraint Use: A Cross-Sectional Study 10 June 2022 | International Journal of Environmental Research and Public Health, Vol. 19, No. 12
  • Identifying the Psychometric Properties of the Malay Version of the WHOQOL-BREF among Employees with Obesity Problem 20 June 2022 | International Journal of Environmental Research and Public Health, Vol. 19, No. 12
  • Motivation in Reading Primary Scientific Literature: a questionnaire to assess student purpose and efficacy in reading disciplinary literature 19 May 2022 | International Journal of Science Education, Vol. 44, No. 8
  • Development of the Adolescent Opioid Safety and Learning (AOSL) scale using exploratory factor analysis 1 May 2022 | Research in Social and Administrative Pharmacy, Vol. 18, No. 5
  • Caregiver social support and child toilet training in rural Odisha, India: What types of support facilitate training and how? 19 October 2021 | Applied Psychology: Health and Well-Being, Vol. 14, No. 2
  • The Competence Scale in Managing Behavioral and Psychological Symptoms of Dementia (CS-MBPSD) for family caregivers: Instrument development and cross-sectional validation study 1 May 2022 | International Journal of Nursing Studies, Vol. 129
  • Portuguese Version of the Spiritual Well-Being Questionnaire: Validation Study in People under Assisted Reproductive Techniques 26 April 2022 | Religions, Vol. 13, No. 5
  • Why Did Students Report Lower Test Anxiety during the COVID-19 Pandemic? 29 Apr 2022 | Journal of Microbiology & Biology Education, Vol. 23, No. 1
  • A new perspective of work stress on teaching performance by competencies 13 April 2022 | International Journal of Leadership in Education, Vol. 17
  • Development and validation of Online Classroom Learning Environment Inventory (OCLEI): The case of Indonesia during the COVID-19 pandemic 4 March 2021 | Learning Environments Research, Vol. 25, No. 1
  • Validity and reliability of a social skills scale among Chilean health sciences students: A cross-sectional study 14 March 2022 | European Journal of Translational Myology, Vol. 32, No. 1
  • Assessment of the Psychometric Properties of the Holland Sleep Disorders Questionnaire in the Iranian Population 14 Mar 2022 | Sleep Disorders, Vol. 2022
  • Comprehensive assessment of reliability and validity for the clinical cases in simulated community pharmacy 11 March 2022 | Pharmacy Education
  • Development and Evaluation of a Survey to Measure Student Engagement at the Activity Level in General Chemistry 9 February 2022 | Journal of Chemical Education, Vol. 99, No. 3
  • Developing a Multilevel Scale to Assess Retention of Workers with Disabilities 9 June 2021 | Journal of Occupational Rehabilitation, Vol. 32, No. 1
  • One size does NOT fit all: Understanding differences in perceived organizational support during the COVID‐19 pandemic 3 March 2022 | Business and Society Review, Vol. 127, No. S1
  • Dimensionality and Reliability of the Intentions to Seeking Counseling Inventory with International Students 9 July 2021 | Journal of International Students, Vol. 12, No. 1
  • Nursing Warmth Scale (NWS): Development and empirical validation 7 June 2022 | Avances en Enfermería, Vol. 40, No. 2
  • Psychometric properties and factor structure of the Finnish version of the Health Care Providers’ Pain and Impairment Relationship Scale 1 Feb 2022 | Musculoskeletal Science and Practice, Vol. 57
  • Evolution of the earthworm (Eisenia fetida) microbial community in vitro and in vivo under tetracycline stress 1 Feb 2022 | Ecotoxicology and Environmental Safety, Vol. 231
  • The moderation effect of work engagement on entrepreneurial attitude and organizational commitment: evidence from Thailand’s entry-level employees during the COVID-19 pandemic 29 July 2021 | Asia-Pacific Journal of Business Administration, Vol. 14, No. 1
  • 2022 | Global Business and Organizational Excellence, Vol. 41, No. 5
  • 2022 | Science & Education, Vol. 31, No. 3
  • 2022 | Medical Science Educator, Vol. 32, No. 3
  • Factors influencing undergraduate nursing students’ evaluation of teaching effectiveness in a nursing program at a higher education institution in Namibia 1 Jan 2022 | International Journal of Africa Nursing Sciences, Vol. 17
  • 2022 | Ocean & Coastal Management, Vol. 225
  • 2022 | Journal of Microbiology & Biology Education, Vol. 23, No. 1
  • 2022 | Evolution: Education and Outreach, Vol. 15, No. 1
  • 2022 | PLOS ONE, Vol. 17, No. 6
  • Psychometric Properties of Remote Teaching Efficacy Scale in Employed Filipino Teachers during COVID-19 Crisis 1 Jan 2022 | Journal of Digital Educational Technology, Vol. 2, No. 1
  • The Technology Acceptance of Video Consultations for Type 2 Diabetes Care in General Practice: Cross-sectional Survey of Danish General Practitioners 30 August 2022 | Journal of Medical Internet Research, Vol. 24, No. 8
  • The Intersection of Persuasive System Design and Personalization in Mobile Health: Statistical Evaluation 14 September 2022 | JMIR mHealth and uHealth, Vol. 10, No. 9
  • An assessment of the policy and regulatory outcome by the telecom services users: The emerging economy study 9 May 2022 | Journal of Governance and Regulation, Vol. 11, No. 2, special issue
  • Identifying Structure in Program-Level Competencies and Skills 1 Jan 2022
  • Impact of a Mentorship Training Course on the Prevalence of Burnout in Nurse Leaders Working in a Regional Healthcare System 1 Jan 2022 | SSRN Electronic Journal, Vol. 38
  • Increasing Resilience of Utility Tunnel PPP Projects Through Risk Management: A Case on in Shiyan City 2 September 2022
  • The Ph.D. Panic: Examining the Relationships Among Teaching Anxiety, Teaching Self-Efficacy, And Coping in Biology Graduate Teaching Assistants (GTAs) 1 Jan 2022 | Journal of Research in Science, Mathematics and Technology Education, Vol. 5, No. SI
  • Development and validation of ESL/EFL reading strategies inventory 1 Jan 2022 | Ampersand, Vol. 9
  • Modifying the ASPECT Survey to Support the Validity of Student Perception Data from Different Active Learning Environments 15 Dec 2021 | Journal of Microbiology & Biology Education, Vol. 22, No. 3
  • S. Salehi ,
  • S. A. Berk ,
  • R. Brunelli ,
  • S. Cotner ,
  • C. Creech ,
  • A. G. Drake ,
  • S. Fagbodun ,
  • S. Hebert ,
  • J. Hewlett ,
  • A. C. James ,
  • M. Shuster ,
  • J. R. St. Juliana ,
  • D. B. Stovall ,
  • R. Whittington ,
  • M. Zhong , and
  • C. J. Ballen
  • Rebecca Price, Monitoring Editor
  • Lauren Hensley ,
  • Amy Kulesza ,
  • Joshua Peri ,
  • Anna C. Brady ,
  • Christopher A. Wolters ,
  • David Sovic , and
  • Caroline Breitenberger
  • Joel K. Abraham, Monitoring Editor
  • Development of emergency nursing care competency scale for school nurses 14 April 2021 | BMC Nursing, Vol. 20, No. 1
  • Measuring sexual violence stigma in humanitarian contexts: assessment of scale psychometric properties and validity with female sexual violence survivors from Somalia and Syria 24 December 2021 | Conflict and Health, Vol. 15, No. 1
  • Quantifying fear of failure in STEM: modifying and evaluating the Performance Failure Appraisal Inventory (PFAI) for use with STEM undergraduates 6 July 2021 | International Journal of STEM Education, Vol. 8, No. 1
  • Measuring COVID-19 related anxiety and obsession: Validation of the Coronavirus Anxiety Scale and the Obsession with COVID-19 Scale in a probability Chinese sample 1 Dec 2021 | Journal of Affective Disorders, Vol. 295
  • Proposal of a temporality perspective for a successful organizational change project 10 August 2021 | International Journal of Workplace Health Management, Vol. 14, No. 5
  • On Black Male Leadership: A Study of Leadership Efficacy, Servant Leadership, and Engagement Mediated by Microaggressions 31 August 2021 | Advances in Developing Human Resources, Vol. 23, No. 4
  • Development and validation of the athletes’ rights survey 15 November 2021 | BMJ Open Sport & Exercise Medicine, Vol. 7, No. 4
  • Establishing appropriate sample size for developing and validating a questionnaire in nursing research 28 October 2021 | Belitung Nursing Journal, Vol. 7, No. 5
  • Addressing the Unique Qualities of Upper-Level Biology Course-based Undergraduate Research Experiences through the Integration of Skill-Building 3 May 2021 | Integrative and Comparative Biology, Vol. 61, No. 3
  • Reassessment of climate zones for high-level pavement analysis using machine learning algorithms and NASA MERRA-2 data 1 Oct 2021 | Advanced Engineering Informatics, Vol. 50
  • Mediation Analysis in Discipline-Based Education Research Using Structural Equation Modeling: Beyond “What Works” to Understand How It Works, and for Whom 10 Sep 2021 | Journal of Microbiology & Biology Education, Vol. 22, No. 2
  • Psychometric Evaluation of the Nurses Professional Values Scale-3: Indonesian Version 20 August 2021 | International Journal of Environmental Research and Public Health, Vol. 18, No. 16
  • Chemistry self-efficacy in lower-division chemistry courses: changes after a semester of instruction and gaps still remain between student groups 1 January 2021 | Chemistry Education Research and Practice, Vol. 22, No. 3
  • The incoherence of sustainability literacy assessed with the Sulitest 25 February 2021 | Nature Sustainability, Vol. 4, No. 6
  • Validation of the Arabic Version of the Copenhagen Psychosocial Questionnaire II (A-COPSOQ II) among Workers in Oil and Gas Industrial Sector 1 June 2021 | Journal of Biomedical Research & Environmental Sciences
  • Somatic symptoms have negligible impact on Patient Health Questionnaire‐9 depression scale scores in neurological patients 26 March 2021 | European Journal of Neurology, Vol. 28, No. 6
  • Enhancing the Positive Impact Rating: A New Business School Rating in Support of a Sustainable Future 8 June 2021 | Sustainability, Vol. 13, No. 12
  • Multi-institutional Study of Self-Efficacy within Flipped Chemistry Courses 30 March 2021 | Journal of Chemical Education, Vol. 98, No. 5
  • Hospitality workers’ COVID-19 risk perception and depression: A contingent model based on transactional theory of stress model 1 May 2021 | International Journal of Hospitality Management, Vol. 95
  • Commonalities and specificities of positive youth development in the U.S. and Taiwan 1 Mar 2021 | Journal of Applied Developmental Psychology, Vol. 73
  • Course-Based Undergraduate Research Experiences Spanning Two Semesters of Biology Impact Student Self-Efficacy but not Future Goals 27 September 2023 | Journal of College Science Teaching, Vol. 50, No. 4
  • Preliminary validity and reliability evidence of the Brief Antisocial Behavior Scale (B-ABS) in young adults from four countries 22 February 2021 | PLOS ONE, Vol. 16, No. 2
  • Construct Validity and Test–Retest Reliability of the Automated Vehicle User Perception Survey 25 January 2021 | Frontiers in Psychology, Vol. 12
  • Cross-Cultural Adaptation and Validation of the Malay Satisfaction Questionnaire for Osteoporosis Prevention in Malaysia 1 June 2021 | Patient Preference and Adherence, Vol. Volume 15
  • Validity and Reliability of the Turkish Version of the COVID Stress Scale 1 Jan 2021 | Journal of Korean Academy of Nursing, Vol. 51, No. 5
  • Symptom clusters and quality of life among patients with chronic heart failure: A cross‐sectional study 28 August 2020 | Japan Journal of Nursing Science, Vol. 18, No. 1
  • Measuring university students’ interest in biology: evaluation of an instrument targeting Hidi and Renninger’s individual interest 19 May 2020 | International Journal of STEM Education, Vol. 7, No. 1
  • Belonging in general chemistry predicts first-year undergraduates’ performance and attrition 1 January 2020 | Chemistry Education Research and Practice, Vol. 21, No. 4
  • Eva Knekta ,
  • Kyriaki Chatzikyriakidou ,, and
  • Melissa McCartney
  • David Feldon, Monitoring Editor
  • Brie Tripp and
  • Erin E. Shortlidge
  • Developing and testing a measure of COVID-19 organizational support of healthcare workers – results from Peru, Ecuador, and Bolivia 1 Sep 2020 | Psychiatry Research, Vol. 291
  • Developing and validating five-construct model of customer satisfaction in beauty and cosmetic E-commerce 1 Sep 2020 | Heliyon, Vol. 6, No. 9
  • Stepfanie M. Aguillon ,
  • Gregor-Fausto Siegmund ,
  • Renee H. Petipas ,
  • Abby Grace Drake ,
  • Sehoya Cotner , and
  • Cissy J. Ballen
  • Sarah L. Eddy, Monitoring Editor
  • Amanda R. Butz and
  • Janet L. Branchaw
  • All Happy Emotions Are Alike but Every Unhappy Emotion Is Unhappy in Its Own Way: A Network Perspective to Academic Emotions 30 April 2020 | Frontiers in Psychology, Vol. 11
  • Developing a Scale to Measure Students’ Attitudes toward Science 5 January 2020 | International Journal of Assessment Tools in Education, Vol. 6, No. 4
  • Ashley A. Rowland ,
  • Sarah Eddy , and
  • Cynthia Brame, Monitoring Editor
  • Beyond linear regression: A reference for analyzing common data types in discipline based education research 3 July 2019 | Physical Review Physics Education Research, Vol. 15, No. 2
  • Identification of University Students’ Psychological Capital Components from Islamic Perspective 1 March 2019 | Applied Issues in Quarterly Journal of Islamic Education, Vol. 4, No. 1
  • Motivation, Self-efficacy, and Student Engagement in Intermediate Mechanical Engineering Courses

factor analysis journal research

Submitted: 26 April 2018 Revised: 20 September 2018 Accepted: 27 November 2018

© 2019 E. Knekta et al. CBE—Life Sciences Education © 2019 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).

Perceived school service quality and vocational students’ learning satisfaction: Mediating role of conceptions of vocational education

Tao Guo, Tianxin Li, Zhanyong Qi

Recognition of the game situation in baseball

Yasuhiro Hashimoto, Hiroshi Takahashi,  [ ... ], Hiroki Nakata

Satisfaction of basic needs mediates relationships between incremental mindsets and well-being

Marzena Cypryańska, John B. Nezlek

The patronage of religious tourism seen from its motivations that predict satisfaction and loyalty: The Virgin of Chaguaya in Bolivia

Mauricio Carvache-Franco, Jose Loaiza-Torres,  [ ... ], Wilmer Carvache-Franco

Accurate classification of wheat freeze injury severity from the color information in digital canopy images

Jibo Zhang, Haijun Huan,  [ ... ], Pei Zhang

The effect of workforce diversity on organizational performance with the mediation role of workplace ethics: Empirical evidence from food and beverage industry

Abel Tewolde Mehari, Zerihun Ayenew Birbirsa, Gemechu Nemera Dinber

Healthy city evaluation based on factor analysis—Taking cities in the Guangxi Zhuang Autonomous Region as an example

Hui Huang, Shuxin Huang,  [ ... ], Shuguang Deng

The antecedents of customer satisfaction in the live-streaming commerce of green agricultural products

Ying Wang, Lin Fang, Jialing Pan

Measuring Strong, Skillful, Good and Transpersonal Will: The development of the Multidimensional Will Scale

Andrea Bonacchi, Georgia Marunic,  [ ... ], Francesca Chiesi

Measuring the relationship between museum attributes and visitors: An application of topic model on museum online reviews

Hong Huo, Keqin Shen, Chunjia Han, Mu Yang

Validation of the patient reported experiences and outcomes of safety in primary care compact Form Brazil

Ana Elisa Bauer de Camargo Silva, Tanielly Paula Sousa,  [ ... ], Cristina Alves Bernardes

Study protocol for the development and validation of a questionnaire evaluating predisposition to immunosuppressant medication non-adherence of kidney pre-transplant patients. The KATITA project

Luana Cristina Lins de Medeiros Oliveira, Rand Randall Martins, Antonio Gouveia Oliveira

Prosthesis usability experience is associated with extent of upper limb prosthesis adoption: A Structural Equation Modeling (SEM) analysis

Linda J. Resnik, Matthew Borgia,  [ ... ], Pengsheng Ni

Noora Shrestha. Factor Analysis as a Tool for Survey Analysis. American Journal of Applied Mathematics and Statistics. Vol. 9, No. 1, 2021, pp 4-11. https://pubs.sciepub.com/ajams/9/1/2

Factor Analysis as a Tool for Survey Analysis

Factor analysis is particularly suitable for reducing a large number of related variables to a smaller, more manageable number of factors before using them in other analyses such as multiple regression or multivariate analysis of variance. It can also be beneficial in developing a questionnaire. Sometimes adding more statements to a questionnaire fails to give a clear understanding of the variables. With the help of factor analysis, irrelevant questions can be removed from the final questionnaire. This study applies factor analysis to identify the factors underlying the variables of a questionnaire that measures tourist satisfaction. The Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett’s test of sphericity are used to assess the factorability of the data. The determinant of the correlation matrix is calculated to examine multicollinearity among the variables. To determine the number of factors to extract, Kaiser’s criterion and the scree test are examined. Varimax orthogonal rotation is applied to minimize the number of variables that have high loadings on each factor. Internal consistency is confirmed by calculating Cronbach’s alpha and composite reliability to test the accuracy of the instrument. Convergent validity is established when the average variance extracted is greater than or equal to 0.5. The results show that factor analysis not only detects irrelevant items but also extracts the valuable factors from the data set of a questionnaire survey. Applying factor analysis to questionnaire evaluation gives decision makers valuable input, letting them focus on a few important factors rather than a large number of parameters.

1. Introduction

Factor analysis is a multivariate statistical technique applied to a single set of variables when the investigator is interested in determining which variables in the set form logical subsets that are relatively independent of one another [1]. In other words, factor analysis is particularly useful for identifying the factors underlying the variables by grouping related variables into the same factor [2]. This paper focuses on the application of factor analysis to reduce a large number of inter-correlated measures to a few representative constructs or factors that can be used for subsequent analysis [3]. The goal of the present work is to examine the application of factor analysis to the items of a questionnaire that measures tourist satisfaction. To identify the factors, it is therefore necessary to understand the concept of factor analysis and the steps for applying it to a questionnaire survey.

Factor analysis is based on the assumption that all variables correlate to some degree. The variables should be measured at least at the ordinal level. The sample size for factor analysis should be large; a commonly accepted guideline is a ten-to-one ratio of respondents to variables [3, 4]. There are two main approaches to factor analysis: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Exploratory factor analysis is used for checking dimensionality and is often applied in the early stages of research to gather information about the interrelationships among a set of variables [5]. Confirmatory factor analysis, on the other hand, is a more complex and sophisticated set of techniques used later in the research process to test specific hypotheses or theories concerning the structure underlying a set of variables [6, 7].

Several studies have examined and discussed the application of factor analysis to reduce large data sets and to identify the factors extracted from the analysis [8, 9, 10, 11]. In the tourism business, tourist satisfaction can be measured by a large number of parameters. Factor analysis may cluster these variables into different factors, where each factor measures some dimension of tourist satisfaction. Factors are constructed so that the variables within each one are linked with each other in some way. The significant factors are extracted to explain the maximum variability of the group under study. The application of factor analysis gives decision makers and policy makers valuable input, letting them focus on a few factors rather than a large number of parameters. People in the tourism business are interested in knowing what makes their customers or tourists choose a particular destination. There may be countless concerns on which the opinions of tourists can be sought. Several issues, such as local food, weather conditions, culture, nature, recreation activities, photography, travel video making, transportation, medical treatment, water supply, safety, communication, trekking, mountaineering, environment, natural resources, and cost of accommodation, may be explored through responses to a tourist survey and through a literature review [12]. Using factor analysis, the large number of variables may be grouped into components such as component one, component two, and so on. Instead of concentrating on many issues, the researcher or policy maker can devise a strategy to optimize these components for the growth of the tourism business.
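The grouping of many correlated variables into a few components can be illustrated with a minimal sketch. It assumes simulated data (six hypothetical items driven by two latent factors) and uses Kaiser's criterion, which retains components whose eigenvalue of the item correlation matrix exceeds 1. The function name `kaiser_retained_factors` and all numbers are illustrative assumptions, not the author's SPSS procedure or data.

```python
import numpy as np


def kaiser_retained_factors(data):
    """Return the eigenvalues of the item correlation matrix (descending)
    and the number of eigenvalues greater than 1 (Kaiser's criterion)."""
    corr = np.corrcoef(data, rowvar=False)        # items are columns
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    n_factors = int(np.sum(eigenvalues > 1.0))
    return eigenvalues, n_factors


# Simulated responses (hypothetical): 200 respondents, 6 items, 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = np.array([
    [0.8, 0.0], [0.8, 0.0], [0.8, 0.0],   # items 1-3 load on factor 1
    [0.0, 0.8], [0.0, 0.8], [0.0, 0.8],   # items 4-6 load on factor 2
])
items = latent @ loadings.T + 0.6 * rng.normal(size=(200, 6))

eigenvalues, n_factors = kaiser_retained_factors(items)
print("Eigenvalues:", np.round(eigenvalues, 3))
print("Components retained by Kaiser's criterion:", n_factors)
```

Because the eigenvalues of a correlation matrix sum to the number of items, an eigenvalue above 1 marks a component that accounts for more variance than any single standardized item, which is the rationale usually given for the cutoff.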

The contribution of this paper is twofold and relates to the advantages of factor analysis. First, factor analysis can be applied to the development of a questionnaire. Through the analysis, irrelevant questions can be removed from the final questionnaire, and the remaining questions can be categorized into different parameters. Second, factor analysis can be used to simplify data, for example by decreasing the number of variables in regression models. This study also encourages researchers to consider the step-by-step process of identifying factors using factor analysis. Sometimes adding more statements or items to a questionnaire fails to give a clear understanding of the variables. Using factor analysis, a large number of related variables is reduced to a few factors, a more manageable number, before they are used in other analyses such as multiple regression or multivariate analysis of variance [7, 13]. Hence, instead of examining all the parameters, a few extracted factors can be studied, which in turn explain the variation in the group characteristics. The present study therefore discusses the factor analysis of a questionnaire that measures tourist satisfaction. Data collected from a tourist satisfaction survey are used as an example for the factor analysis.

A structured questionnaire was designed to collect primary data. The data were collected from international tourists who travelled to various places in Nepal in 2019. Tourists older than 25 years of age who had been in Nepal for over a week and had experience of travelling there were included in this study. A pilot study was carried out among 15 tourists, who were not included in the sample, to identify possible errors in the questionnaire and so improve its reliability (Cronbach’s alpha > 0.7). The questionnaire consists of questions and statements related to the independent and dependent variables, which were developed on the basis of the literature review. Each statement was rated on a five-point (1 to 5) Likert scale, with the high score of 5 indicating strong agreement with that statement. The statements were written to reflect hospitality, destination attractions, and relaxation. The data were gathered from the first week of November 2019 to the last week of December 2019. The outbreak of the 2019 novel coronavirus affected the data collection process, and hence a convenience sampling method was used to select respondents. A total of 220 questionnaires were distributed among the tourists, but only 200 respondents provided their reactions to the statements, a response rate of 91%. All statistical analysis was performed using IBM SPSS version 23.

The reliability of a questionnaire is examined with Cronbach’s alpha. It provides a simple way to measure whether or not a score is reliable. It is used under the assumption that there are multiple items measuring the same underlying construct; for example, in a tourist satisfaction survey there are a few questions that all ask different things but, when combined, could be said to measure overall satisfaction. Cronbach’s alpha is a measure of internal consistency. It is also considered to be a measure of scale reliability and, in its standard form, can be expressed as

$$\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_T^2}\right)$$

where $k$ is the number of items, $\sigma_i^2$ is the variance of item $i$, and $\sigma_T^2$ is the variance of the total score.
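As a rough illustration of how Cronbach's alpha can be computed, the sketch below uses the standard variance-based form: the number of items divided by that number minus one, multiplied by one minus the ratio of the summed item variances to the variance of the total score. The function name `cronbach_alpha` and the small response matrix are hypothetical and are not taken from the survey data.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # sample variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses (6 respondents, 4 items), for illustration only
responses = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 2, 3, 3],
])
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

A value above the 0.7 threshold used in this study would be read as acceptable internal consistency.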

The average variance extracted (AVE) and the composite reliability (CR) coefficients are related to the quality of a measure. AVE is a measure of the amount of variance that is captured by a construct in relation to the amount of variance due to measurement error [15]. Specifically, AVE is used to assess convergent validity.

Convergent validity measures the extent to which multiple indicators of the same construct are correlated and in agreement. The factor loadings of the items, the composite reliability, and the average variance extracted have to be calculated to determine convergent validity [16]. The values of AVE and CR range from 0 to 1, where a higher value indicates a higher level of reliability. An AVE greater than or equal to 0.5 confirms convergent validity. The average variance extracted is the sum of squared loadings divided by the number of items and is given by

$$\mathrm{AVE} = \frac{\sum_{i=1}^{n} \lambda_i^2}{n}$$

Composite reliability is a measure of internal consistency in scale items [17]. According to Fornell and Larcker (1981), composite reliability is an indicator of the shared variance among the observed variables used as indicators of a latent construct. CR for each construct is obtained by dividing the squared sum of the completely standardized factor loadings by this squared sum plus the total variance of the error terms of the indicators. CR can be calculated as:

$$\mathrm{CR} = \frac{\left(\sum_{i=1}^{n} \lambda_i\right)^2}{\left(\sum_{i=1}^{n} \lambda_i\right)^2 + \sum_{i=1}^{n} \mathrm{Var}(e_i)}$$

Here, $n$ is the number of items, $\lambda_i$ is the factor loading of item $i$, and $\mathrm{Var}(e_i)$ is the variance of the error of item $i$. Values of composite reliability between 0.6 and 0.7 are acceptable, while in more advanced phases the value has to be higher than 0.7. According to Fornell and Larcker (1981), if AVE is less than 0.5 but composite reliability is higher than 0.6, the convergent validity of the construct is still adequate.
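The two coefficients can be checked with a short calculation. The sketch below takes AVE as the mean of the squared loadings and CR as the squared sum of loadings divided by that squared sum plus the summed error variances. It assumes standardized loadings, so that the error variance of each item is one minus its squared loading; that assumption is not stated in the text. The loadings are those reported for the hospitality component in the results (0.77, 0.70, 0.71, 0.74); the function names are hypothetical.

```python
import numpy as np

def average_variance_extracted(loadings):
    """AVE: sum of squared loadings divided by the number of items."""
    lam = np.asarray(loadings, dtype=float)
    return np.sum(lam ** 2) / lam.size

def composite_reliability(loadings, error_vars=None):
    """CR: squared sum of loadings over (squared sum + total error variance)."""
    lam = np.asarray(loadings, dtype=float)
    if error_vars is None:
        # Assumption: standardized loadings, so Var(e_i) = 1 - lambda_i^2
        error_vars = 1.0 - lam ** 2
    squared_sum = np.sum(lam) ** 2
    return squared_sum / (squared_sum + np.sum(error_vars))

hospitality = [0.77, 0.70, 0.71, 0.74]   # loadings reported for component 1
ave_value = average_variance_extracted(hospitality)
cr_value = composite_reliability(hospitality)
print(f"AVE = {ave_value:.2f}, CR = {cr_value:.2f}")
```

Under these assumptions the results round to the AVE of 0.53 and CR of 0.82 reported for this component.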

This study employs exploratory factor analysis to examine the data set, identify complicated interrelationships among items, and group items that are part of integrated concepts. Because of its exploratory nature, factor analysis does not differentiate between independent and dependent variables. Factor analysis clusters similar variables into the same factor to identify underlying variables, and it uses only the correlation matrix of the data. In this study, factor analysis with principal components extraction is used to examine whether the statements represent identifiable factors related to tourist satisfaction. Principal component analysis (PCA) refers to the statistical process that calculates the principal components of the data in order to emphasize variation and bring out strong patterns in the data set [6], [9].

Factor Model with ‘m’ Common Factors

Let $X = (X_1, X_2, \ldots, X_p)'$ be a random vector with mean vector $\mu$ and covariance matrix $\Sigma$. The factor analysis model assumes that $X = \mu + \lambda F + \varepsilon$, where $\lambda = \{\lambda_{jk}\}_{p \times m}$ denotes the matrix of factor loadings and $\lambda_{jk}$ is the loading of the j-th variable on the k-th common factor; $F = (F_1, F_2, \ldots, F_m)'$ denotes the vector of latent factor scores and $F_k$ is the score on the k-th common factor; and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_p)'$ denotes the vector of latent error terms and $\varepsilon_j$ is the j-th specific factor.

There are three major steps for factor analysis: a) assessment of the suitability of the data, b) factor extraction, and c) factor rotation and interpretation. They are described as:

2.2.1.1. Assessment of the Suitability of the data

To determine the suitability of the data set for factor analysis, the sample size and the strength of the relationship among the items have to be considered [1], [18]. Generally, a larger sample is recommended for factor analysis, i.e. ten cases for each item. Nevertheless, a smaller sample size can also be sufficient if solutions have several high loading marker variables (> 0.80) [18]. To determine the strength of the relationship among the items, there must be evidence of correlation coefficients > 0.3 in the correlation matrix. The existence of multicollinearity in the data is a type of disturbance that alters the result of the analysis. It is a state of very high inter-correlations among the independent variables. Multicollinearity can make some of the significant variables in a research study appear statistically insignificant, so that the statistical inferences made about the data may not be trustworthy [19], [20]. Hence, the presence of multicollinearity among the variables is examined with the determinant score.

Determinant Score

The value of the determinant is an important test for multicollinearity or singularity. The determinant score of the correlation matrix should be > 0.00001, which indicates an absence of multicollinearity. If the determinant value is < 0.00001, it is important to identify pairs of variables with correlation coefficient r > 0.8 and consider eliminating them from the analysis. A lower score might indicate that groups of three or more questions/statements have high inter-correlations, so the threshold for item elimination should be reduced until this condition is satisfied. If the correlation matrix is singular, the determinant |R| = 0 [20], [21].
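The determinant check can be sketched in a few lines. The example below computes the determinant of a correlation matrix, compares it with the 0.00001 threshold, and lists variable pairs whose correlation exceeds 0.8. The function name, the default thresholds as parameters, and the 3 x 3 matrix are illustrative assumptions, not the study's data.

```python
import numpy as np

def multicollinearity_check(R, det_threshold=1e-5, r_threshold=0.8):
    """Return (determinant, passes_threshold, highly correlated pairs)."""
    R = np.asarray(R, dtype=float)
    det = float(np.linalg.det(R))
    rows, cols = np.triu_indices_from(R, k=1)    # upper triangle, no diagonal
    high_pairs = [
        (int(i), int(j), float(R[i, j]))
        for i, j in zip(rows, cols)
        if abs(R[i, j]) > r_threshold
    ]
    return det, bool(det > det_threshold), high_pairs

# Hypothetical correlation matrix for three items
R_example = np.array([
    [1.0, 0.9, 0.3],
    [0.9, 1.0, 0.4],
    [0.3, 0.4, 1.0],
])
det, ok, pairs = multicollinearity_check(R_example)
print(f"determinant = {det:.3f}, above threshold: {ok}, pairs with |r| > 0.8: {pairs}")
```

Here the determinant passes the threshold, but the first two items would still be flagged for review because their correlation exceeds 0.8.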

There are two statistical measures to assess the factorability of the data: Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett’s test of Sphericity.

Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy

The KMO test is intended to measure the suitability of data for factor analysis. In other words, it tests the adequacy of the sample size. The test measures sampling adequacy for each variable in the model and for the complete model. The KMO measure of sampling adequacy is given by the formula:

$$\mathrm{KMO} = \frac{\sum_{i \neq j} R_{ij}^2}{\sum_{i \neq j} R_{ij}^2 + \sum_{i \neq j} U_{ij}^2}$$

where $R_{ij}$ are the elements of the correlation matrix and $U_{ij}$ are the elements of the partial covariance matrix. The KMO value varies from 0 to 1. KMO values between 0.8 and 1.0 indicate that the sampling is adequate. KMO values between 0.7 and 0.79 are middling and values between 0.6 and 0.69 are mediocre. KMO values less than 0.6 indicate that the sampling is not adequate and that remedial action should be taken. If the value is less than 0.5, the results of the factor analysis are unlikely to be suitable for the analysis of the data. If the sample size is < 300, the average communality of the retained items has to be tested. An average value > 0.6 is acceptable for sample sizes < 100, and an average value between 0.5 and 0.6 is acceptable for sample sizes between 100 and 200 [1], [22], [23], [24].
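For readers who want to see the calculation, the sketch below computes the overall KMO value as the sum of squared off-diagonal correlations divided by that sum plus the sum of squared off-diagonal partial coefficients. It assumes that the partial coefficients are the partial correlations obtained from the inverse of the correlation matrix, which is the usual reading of the measure; the function name and the equicorrelated example matrix are hypothetical.

```python
import numpy as np

def kmo_overall(R):
    """Overall Kaiser-Meyer-Olkin measure from a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    # Assumption: U_ij are partial correlations, -P_ij / sqrt(P_ii * P_jj) with P = R^-1
    partial = -inv / np.outer(scale, scale)
    off_diagonal = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diagonal] ** 2)
    u2 = np.sum(partial[off_diagonal] ** 2)
    return r2 / (r2 + u2)

# Hypothetical matrix: three items that all correlate 0.5 with each other
R_example = np.full((3, 3), 0.5)
np.fill_diagonal(R_example, 1.0)
kmo_value = kmo_overall(R_example)
print(f"KMO = {kmo_value:.3f}")
```

Small partial correlations relative to the raw correlations push the value towards 1, which is why a high KMO is read as evidence that the items share common variance.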

Bartlett’s Test of Sphericity

Bartlett’s test of sphericity tests the null hypothesis H 0 : the variables are orthogonal, i.e. the original correlation matrix is an identity matrix, indicating that the variables are unrelated and therefore unsuitable for structure detection. The alternative hypothesis is H 1 : the variables are not orthogonal, i.e. they are correlated enough that the correlation matrix diverges significantly from the identity matrix. A significance value < 0.05 indicates that a factor analysis may be worthwhile for the data set.

To measure the overall relation between the variables, the determinant of the correlation matrix |R| is calculated. Under H 0 , |R| = 1; if the variables are highly correlated, then |R| ≈ 0. Bartlett’s test of sphericity is given by:

$$\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right) \ln |R|$$

where p is the number of variables, n is the total sample size, and R is the correlation matrix [22], [24].
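As a worked check, the sketch below evaluates Bartlett's statistic as minus (n - 1 - (2p + 5)/6) times the natural logarithm of the determinant of the correlation matrix. The inputs are the values reported in this study (determinant 0.038, n = 200, p = 11). The degrees of freedom p(p - 1)/2 are the standard choice for this test and are an addition here, not something stated in the text; the function name is hypothetical.

```python
import math

def bartlett_sphericity(det_r, n, p):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    chi_square = -(n - 1 - (2 * p + 5) / 6) * math.log(det_r)
    dof = p * (p - 1) // 2   # standard degrees of freedom (assumption, see lead-in)
    return chi_square, dof

chi_square, dof = bartlett_sphericity(det_r=0.038, n=200, p=11)
print(f"chi-square = {chi_square:.2f}, df = {dof}")
```

With the determinant rounded to 0.038 the statistic comes out at about 636, close to the reported test value of 637.65; the small gap is consistent with rounding of the determinant.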

Factor extraction involves determining the smallest number of factors that can be used to best represent the interrelationships among the set of variables. There are many approaches to extracting the underlying factors. To obtain factor solutions, principal component analysis or common factor analysis can be used. This study uses principal component analysis (PCA) because the purpose of the study is to obtain the minimum number of factors required to represent the available data set.

To Determine the Number of Factors to be Extracted

In this study two techniques are used to assist in the decision concerning the number of factors to retain: Kaiser’s Criterion and Scree Test. The Kaiser’s criterion (Eigenvalue Criterion) and the Scree test can be used to determine the number of initial unrotated factors to be extracted. The eigenvalue is a ratio between the common variance and the specific variance explained by a specific factor extracted.

Kaiser’s (Eigenvalue) Criterion

The eigenvalue of a factor represents the amount of the total variance explained by that factor. In factor analysis, the notable factors with an eigenvalue greater than one are retained. The logic underlying this rule is reasonable. An eigenvalue greater than one is considered to be significant, and it indicates that more common variance than unique variance is explained by that factor [7], [22], [23], [25]. Measured and composite variables are separate classes of variables. Factors are latent constructs created as aggregates of measured variables and so should consist of more than a single measured variable. But eigenvalues, like all sample statistics, have some sampling error. Hence, it is very important for the researcher to exercise some judgment in using this strategy to determine the number of factors to extract or retain [26].
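Kaiser's rule is straightforward to sketch: compute the eigenvalues of the correlation matrix, divide each by the number of variables to get the share of total variance, and count the eigenvalues greater than one. The correlation matrix below is hypothetical (two pairs of items correlating 0.6 within each pair and 0 across pairs) and the function name is an illustrative choice.

```python
import numpy as np

def kaiser_criterion(R):
    """Return (sorted eigenvalues, share of variance, number of factors with eigenvalue > 1)."""
    R = np.asarray(R, dtype=float)
    eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest first
    explained = eigenvalues / eigenvalues.sum()          # eigenvalues sum to p
    n_retain = int(np.sum(eigenvalues > 1.0))
    return eigenvalues, explained, n_retain

# Hypothetical correlation matrix with two clear item pairs
R_example = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])
eigenvalues, explained, n_retain = kaiser_criterion(R_example)
print("eigenvalues:", np.round(eigenvalues, 2))
print("cumulative variance explained:", np.round(np.cumsum(explained), 2))
print("factors retained:", n_retain)
```

Plotting the sorted eigenvalues against their rank gives the scree plot used alongside this rule.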

Cattell (1966) proposed a graphical test for determining the number of factors. A scree plot graphs eigenvalue magnitudes on the vertical axis, with eigenvalue numbers constituting the horizontal axis. The eigenvalues are plotted as dots within the graph, and a line connects successive values. Factor extraction should be stopped at the point where there is an ‘elbow’ or leveling of the plot. This test is used to identify the optimum number of factors that can be extracted before the amount of unique variance begins to dominate the common variance structure [4], [27], [28].

Factors obtained in the initial extraction phase are often difficult to interpret because of significant cross-loadings, in which many factors are correlated with many variables. There are two main approaches to factor rotation: orthogonal (uncorrelated) and oblique (correlated) factor solutions. In this study, orthogonal factor rotation is used because it results in solutions that are easier to interpret and to report. Varimax, quartimax, and equimax are the methods related to orthogonal rotation. The varimax method developed by Kaiser (1958) is used to minimize the number of variables that have high loadings on each factor. Varimax focuses on maximizing the differences between the squared pattern structure coefficients on a factor (i.e. it takes a column perspective). The spread in loadings is maximized: loadings that are high after extraction become higher after rotation, and loadings that are low become lower. If the rotated component matrix shows many significant cross-loading values, it is suggested to delete the cross-loaded variables and rerun the factor analysis so that each item loads on only one component [26], [28], [29].
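To make the idea of varimax rotation concrete, here is a minimal sketch based on Kaiser's pairwise planar rotations, where each pair of factor columns is rotated by the angle that maximizes the variance of the squared loadings. It implements raw varimax without Kaiser normalization, so it is not a reproduction of the SPSS procedure used in this study. The function name and the small loading matrix are hypothetical: the unrotated loadings are built by rotating a known simple structure by 30 degrees so that the result can be checked.

```python
import numpy as np

def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax rotation via pairwise planar rotations (no Kaiser normalization)."""
    L = np.array(loadings, dtype=float)
    p, m = L.shape
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for j in range(m - 1):
            for k in range(j + 1, m):
                x, y = L[:, j].copy(), L[:, k].copy()
                u = x ** 2 - y ** 2
                v = 2.0 * x * y
                A, B = u.sum(), v.sum()
                C = np.sum(u ** 2 - v ** 2)
                D = 2.0 * np.sum(u * v)
                # Angle that maximizes the varimax criterion for this pair of columns
                phi = 0.25 * np.arctan2(D - 2.0 * A * B / p, C - (A ** 2 - B ** 2) / p)
                c, s = np.cos(phi), np.sin(phi)
                L[:, j] = c * x + s * y
                L[:, k] = -s * x + c * y
                largest_angle = max(largest_angle, abs(phi))
        if largest_angle < tol:   # no pair needed further rotation
            break
    return L

# Hypothetical simple structure, then rotated by 30 degrees to mimic an unrotated solution
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.5]])
angle = np.deg2rad(30.0)
T = np.array([[np.cos(angle), np.sin(angle)], [-np.sin(angle), np.cos(angle)]])
unrotated = simple @ T
rotated = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, each item's communality (the row sum of squared loadings) is the same before and after rotation; only the distribution of loadings across factors changes.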

Orthogonal Factor Model Assumptions

The orthogonal factor analysis model assumes the form $X = \mu + \lambda F + \varepsilon$ and adds the assumptions that $F \sim (0, I_m)$, i.e. the latent factors have mean zero, unit variance, and are uncorrelated; that $\varepsilon \sim (0, \Psi)$, where $\Psi = \mathrm{diag}(\psi_1, \psi_2, \ldots, \psi_p)$ with $\psi_j$ denoting the j-th specific variance; and that $\varepsilon_j$ and $F_k$ are independent of one another for all pairs j, k.
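A short derivation, using only the assumptions of the orthogonal factor model (factors with identity covariance, diagonal specific-variance matrix, and independence of factors and errors), shows the covariance structure the model implies. The cross terms vanish because the factors and the specific factors are independent.

```latex
\begin{aligned}
\Sigma = \operatorname{Cov}(X)
  &= \operatorname{Cov}(\lambda F + \varepsilon) \\
  &= \lambda \operatorname{Cov}(F)\, \lambda' + \operatorname{Cov}(\varepsilon) \\
  &= \lambda I_m \lambda' + \Psi \\
  &= \lambda \lambda' + \Psi
\end{aligned}
```

Reading off the j-th diagonal element gives the variance of the j-th variable as its sum of squared loadings plus its specific variance.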

Variance Explained by Common Factors

The portion of the variance of the j-th variable that is explained by the m common factors is called the communality of the j-th variable. The variance decomposes as $\sigma_{jj} = h_j^2 + \psi_j$, where $\sigma_{jj}$ is the variance of $X_j$ (i.e. the j-th diagonal element of $\Sigma$), $h_j^2 = (\lambda\lambda')_{jj} = \lambda_{j1}^2 + \lambda_{j2}^2 + \cdots + \lambda_{jm}^2$ is the communality of $X_j$ (the sum of its squared loadings), and $\psi_j$ is the specific variance (or uniqueness) of $X_j$ [14], [24], [26].
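The decomposition of each variable's variance into communality and uniqueness can be illustrated numerically. The sketch below assumes standardized variables, so each variance equals one and the uniqueness is one minus the communality; the loading matrix is hypothetical.

```python
import numpy as np

# Hypothetical loadings of three standardized items on two common factors
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.6],
])

communality = np.sum(loadings ** 2, axis=1)   # h_j^2: row sums of squared loadings
uniqueness = 1.0 - communality                # psi_j, assuming unit variances

# Covariance implied by the orthogonal factor model: loadings times their transpose, plus Psi
implied = loadings @ loadings.T + np.diag(uniqueness)

for j, (h2, psi) in enumerate(zip(communality, uniqueness), start=1):
    print(f"item {j}: communality = {h2:.2f}, uniqueness = {psi:.2f}")
```

The diagonal of the implied matrix is one for every item, which is the variance decomposition stated in the text for standardized variables.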

3. Results and Discussions

This section presents the results obtained with the statistical software SPSS. The participants were 200 tourists who had been travelling in Nepal during 2019. The largest share of tourists (26.5%) belonged to the age group 40 to 44 years. Participants ranged in age from 25 to 55 years (mean age = 39.8 years, standard deviation = 7.94); of the total sample, 54% (n = 108) were male and 46% (n = 92) were female. The respondents came from various parts of the world. The region-wise distribution of tourists was Asian-SAARC (n = 59, 29.5%), Asian-others (n = 58, 29%), European (n = 40, 20%), American (n = 18, 9%), Oceania (n = 17, 8.5%), and other (n = 8, 4%). Tourists visited Nepal for various purposes: 29.5% (n = 59) came for holiday and pleasure, 25% (n = 50) for adventure including trekking and mountaineering, 15% (n = 30) for volunteering and academic purposes, 22% (n = 44) for entertainment video and photography, and 8.5% (n = 17) for other purposes. The average length of stay of the respondents was found to be 12 days. According to Nepal tourism statistics 2019, the average length of stay of international tourists in Nepal dropped to 12.4 days in 2018 from 12.6 days in 2017 [30].

This study has followed three major steps for factor analysis: a) assessment of the suitability of the data, b) factor extraction, and c) factor rotation and interpretation.

Table 1. Correlation Matrix a and Determinant Score


Step 1: Assessment of the Suitability of the Data

The most significant factor in international tourists’ satisfaction is hospitality, which covers home stay with a local family, arts, crafts, and historic places, local souvenirs, and local food. Similarly, destination attraction, such as cultural activities, trekking, sightseeing, and safety during the travel period, plays a vital role in tourist satisfaction. Most tourists visit different places for relaxation and to experience a different lifestyle. These factors may be associated with the satisfaction of tourists. To analyze tourist satisfaction, the Kaiser-Meyer-Olkin measure is used to assess the suitability of the data for factor analysis. Similarly, Bartlett’s test of sphericity, the correlation matrix, and the determinant score are computed to assess the appropriateness of the data set for factor analysis [31].

In Table 1, the correlation matrix shows that there are sufficient correlations to justify the application of factor analysis. A few items have inter-correlations > 0.3, and it can be concluded that the hypothesized factor model appears to be suitable. The value of the determinant is an important test for multicollinearity. The determinant score of the correlation matrix is 0.038 > 0.00001, which indicates an absence of multicollinearity.

Table 2 shows that the value of the KMO statistic is 0.813 > 0.6, which indicates that sampling is adequate and that factor analysis is appropriate for the data. Bartlett’s test of sphericity is used to test the adequacy of the correlation matrix. The test is highly significant at p < 0.001, which shows that the correlation matrix has significant correlations among at least some of the variables. Here, the test value is 637.65 and the associated significance level is less than 0.0001. Hence, the hypothesis that the correlation matrix is an identity matrix is rejected; that is, the variables are not orthogonal. A significance value < 0.05 indicates that a factor analysis may be worthwhile for the data set.

Table 2. Kaiser-Meyer-Olkin and Bartlett’s Test of Sphericity


Step 2: Factor Extraction

Kaiser’s criterion and the scree test are used to determine the number of initial unrotated factors to be extracted. The eigenvalues associated with each factor represent the variance explained by those specific linear components. Factor loadings with values less than 0.4 are suppressed in the presentation of the results [23].

Table 3. Eigenvalues (EV) and Total Variance Explained


Table 3 presents the eigenvalues and total variance explained. The extraction method used in this study is principal component analysis. Before extraction, eleven linear components are identified within the data set. After extraction and rotation, there are three distinct linear components within the data set with eigenvalue > 1. The three extracted factors account for a combined 60.2% of the total variance. It is suggested that the proportion of the total variance explained by the retained factors should be at least 50%. The result shows that 60.2% of the common variance shared by the eleven variables can be accounted for by three factors. This is consistent with the KMO value of 0.813, which can be considered good and also indicates that factor analysis is useful for the variables. This initial solution suggests that the final solution will extract no more than three factors. The first component explained 22% of the total variance with eigenvalue 4.01. The second component explained 20.9% of the variance with eigenvalue 1.54. The third component explained 17.34% of the variance with eigenvalue 1.08.

In Figure 1, for the scree test, a graph is plotted with eigenvalues on the y-axis against the eleven component numbers in their order of extraction on the x-axis. The initial factors extracted are large factors with higher eigenvalues, followed by smaller factors. The scree plot is used to determine the number of factors to retain. Here, the scree plot shows that there are three factors for which the eigenvalue is greater than one and which account for most of the total variability in the data. The other factors account for a very small proportion of the variability and are considered less important.

Step 3: Factor Rotation and Interpretation

The present study has executed the extraction method based on principal component analysis and the orthogonal rotation method based on varimax with Kaiser normalization.

Table 4 shows the factor loadings, diagonal anti-image correlations, and communalities after extraction. The diagonal anti-image correlation gives information about the sampling adequacy of each item. The communalities reflect the common variance in the data structure after extraction of factors. Factor loading values express the relationship of each variable to the underlying factors. Variables with large loading values (> 0.40) are considered representative of the factor.

Figure 1. Scree Plot

Table 4. Summary for factors related to travel satisfaction


Component 1 is labeled ‘Hospitality’. It contains four items (homestay, local food, local souvenirs, and arts and crafts), which have correlations of 0.77, 0.70, 0.71, and 0.74 with component 1, respectively. The hospitality component explained 22% of the total variance with eigenvalue 4.01. Of these four items, arts, crafts, and historic places tends towards strongly agree according to its mean score of 4.23. The other three items (homestay, local souvenirs, and local food) tend towards agree according to their mean scale scores.

The second component, labeled ‘Destination Attraction’, explained 20.9% of the variance with eigenvalue 1.54. This component contains four items: sightseeing, trekking, cultural activities, and safety. These variables have correlations of 0.71, 0.72, 0.79, and 0.58 with component 2, respectively. The item cultural activities (mean = 4.25) tends towards strongly agree, whereas trekking, sightseeing, and safety tend towards agree according to their mean scale scores.

Component 3 is labeled ‘Relaxation’. It contains three items, namely stress relief, different lifestyle, and new experience, which have correlations of 0.52, 0.86, and 0.84 with component 3, respectively. The third component explained 17.34% of the variance with eigenvalue 1.08. All three items of the third component tend towards agree according to their mean scale scores.

In Table 4, the diagonal element of the anti-image correlation matrix gives information about the sampling adequacy of each item and must be > 0.5. The amount of variance in each variable that can be explained by the retained factors is represented by the communalities after extraction. The communalities indicate the common variance in the data set. The communality value corresponding to the first statement (Item_1) of the first component is 0.63, which means that 63% of the variance associated with this statement is common. Overall, 63%, 57%, 56%, 64%, 59%, 55%, 67%, 50%, 50%, 77%, and 72% of the variance associated with the first to eleventh statements, respectively, is common.

4. Reliability and Validity Test Results

Internal consistency is assessed by calculating Cronbach’s alpha to test the accuracy and reliability of the instrument. The threshold for an adequate Cronbach’s alpha is > 0.7. In Table 5, the components hospitality, destination attraction, and relaxation have Cronbach’s alpha values of 0.75, 0.74, and 0.71 respectively, which confirms the reliability of the survey instrument. The Cronbach’s alpha coefficient for the total scale is 0.82 > 0.7. This shows that the variables correlate with their component grouping and are thus internally consistent.

Convergent validity is established when the average variance extracted is ≥ 0.5. The AVE values for the components hospitality, destination attraction, and relaxation are 0.53, 0.50, and 0.56 respectively. According to Fornell and Larcker (1981), AVE ≥ 0.5 confirms convergent validity, and all the AVE values in Table 5 are greater than or equal to 0.5. The composite reliability values for components 1, 2, and 3 are 0.82, 0.79, and 0.79 respectively, which is evidence of internal consistency in the scale items.

Table 5. Reliability, Average Variance Extracted (AVE) and Composite Reliability (CR)


5. Conclusion

The goal of this study was to examine the factor analysis of a questionnaire in order to identify the main factors that measure tourist satisfaction. The suitability of the data set for factor analysis was assessed against the threshold values of the determinant score, the Kaiser-Meyer-Olkin measure, and Bartlett’s test of sphericity. Based on the results of this study, it can be concluded that factor analysis is a promising approach for extracting significant factors that explain the maximum variability of the group under study.

Hospitality, destination attraction, and relaxation are the major factors extracted, using principal component analysis and the varimax orthogonal rotation method, to measure the satisfaction of tourists. The application of factor analysis provides valuable inputs to decision makers and policy makers, who can focus on a few manageable factors rather than a large number of parameters. The findings of the study cannot be generalized to the larger population, so further studies could use a larger sample with probability sampling methods. Before stronger decisions are made about the tourist satisfaction factors used to promote the tourism of a country, further detailed research is required.

[1] Tabachnick, B.G. and Fidell, L.S., Using multivariate statistics (6th ed.), Pearson, 2013.
[2] Verma, J. and Abdel-Salam, A., John Wiley & Sons Inc., 2019.
[3] Ho, R., Chapman & Hall/CRC, Boca Raton, 2006.
[4] Hair, J.F., Anderson, R.E., Tatham, R.L., and Black, W.C., Multivariate data analysis (5th ed.), Prentice-Hall, Upper Saddle River, NJ, 1998.
[5] Pituch, K.A. and Stevens, J., Taylor & Francis, New York, 2016.
[6] Hair, J.J., Black, W.C., Babin, B.J., Anderson, R.R., and Tatham, R.L., Multivariate data analysis, Upper Saddle River, New Jersey, 2006.
[7] Pallant, J., SPSS survival manual: a step by step guide to data analysis using SPSS, Open University Press/McGraw-Hill, Maidenhead, 2010.
[8] Cerny, C.A. and Kaiser, H.F., “A study of a measure of sampling adequacy for factor analytic correlation matrices,” Multivariate Behavioral Research, 12(1), 43-47, 1977.
[9] Dziuban, C.D. and Shirkey, E.C., “When is a correlation matrix appropriate for factor analysis?” Psychological Bulletin, 81, 358-361, 1974.
[10] Bartlett, M.S., “The effect of standardization on a Chi-square approximation in factor analysis,” Biometrika, 38(3/4), 337-344, 1951.
[11] MacCallum, R.C., Widaman, K.F., Zhang, S., and Hong, S., “Sample size in factor analysis,” Psychological Methods, 4(1), 84-99, 1999.
[12] Dhakal, B., “Using factor analysis for residents’ attitudes towards economic impact of tourism in Nepal,” 7(5), 250-257, 2017.
[13] Snedecor, G.W. and Cochran, W.G., Statistical Methods (8th ed.), Iowa State University Press, Iowa, 1989.
[14] Lavrakas, P.J., Encyclopedia of survey research methods, SAGE Publications, Thousand Oaks, 2008.
[15] Fornell, C. and Larcker, D.F., “Evaluating structural equation models with unobservable variables and measurement error,” 18(1), 39-50, 1981.
[16] Hair, J., Hult, G.T.M., Ringle, C., and Sarstedt, M., Sage Publications, Los Angeles, 2014.
[17] Netemeyer, R.G., Bearden, W.O., and Sharma, S., Sage Publications, Thousand Oaks, Calif, 2003.
[18] Stevens, J., Applied multivariate statistics for the social sciences (3rd ed.), Lawrence Erlbaum Associates, Mahwah, NJ, 1996.
[19] Shrestha, N., “Detecting multicollinearity in regression analysis,” American Journal of Applied Mathematics and Statistics, 8(2), 39-42, 2020.
[20] Haitovsky, Y., “Multicollinearity in regression analysis: A comment,” Review of Economics and Statistics, 51(4), 486-489, 1969.
[21] Field, A., Discovering statistics using SPSS (3rd ed.), SAGE, London, 2009.
[22] Guttman, L., “Some necessary conditions for common-factor analysis,” Psychometrika, 19, 149-161, 1954.
[23] Kaiser, H.F., “A second generation little jiffy,” Psychometrika, 35, 401-415, 1970.
[24] Tucker, L.R. and MacCallum, R.C., Exploratory factor analysis, [E-book], available: net Library e-book.
[25] Verma, J., Data analysis in management with SPSS software, Springer, India, 2013.
[26] Thompson, B., Exploratory and confirmatory factor analysis: Understanding concepts and application, American Psychological Association, Washington D.C., 2004.
[27] Cattell, R.B., “The scree test for the number of factors,” Multivariate Behavioral Research, 1, 245-276, 1966.
[28] Cattell, R.B., Factor analysis, Greenwood Press, Westport, CT, 1973.
[29] Kaiser, H.F., “The varimax criterion for analytic rotation in factor analysis,” Psychometrika, 23, 187-200, 1958.
[30] MoCTCA, Ministry of Culture, Tourism & Civil Aviation, Government of Nepal, 2019.
[31] Pett, M.J., Lackey, N.R., and Sullivan, J.J., SAGE, Thousand Oaks, CA, 2003.

Published with license by Science and Education Publishing, Copyright © 2021 Noora Shrestha

Creative Commons


Research Method


Factor Analysis – Steps, Methods and Examples

Factor Analysis

Definition:

Factor analysis is a statistical technique that is used to identify the underlying structure of a relatively large set of variables and to explain these variables in terms of a smaller number of common underlying factors. It helps to investigate the latent relationships between observed variables.

Factor Analysis Steps

Here are the general steps involved in conducting a factor analysis:

1. Define the Research Objective:

Clearly specify the purpose of the factor analysis. Determine what you aim to achieve or understand through the analysis.

2. Data Collection:

Gather the data on the variables of interest. These variables should be measurable and related to the research objective. Ensure that you have a sufficient sample size for reliable results.

3. Assess Data Suitability:

Examine the suitability of the data for factor analysis. Check for the following aspects:

  • Sample size: Ensure that you have an adequate sample size to perform factor analysis reliably.
  • Missing values: Handle missing data appropriately, either by imputation or exclusion.
  • Variable characteristics: Verify that the variables are continuous or at least ordinal in nature. Categorical variables may require different analysis techniques.
  • Linearity: Assess whether the relationships among variables are linear.

4. Determine the Factor Analysis Technique:

There are different types of factor analysis techniques available, such as exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Choose the appropriate technique based on your research objective and the nature of the data.

5. Perform Factor Analysis:

   a. Exploratory Factor Analysis (EFA):

  • Extract factors: Use factor extraction methods (e.g., principal component analysis or common factor analysis) to identify the initial set of factors.
  • Determine the number of factors: Decide on the number of factors to retain based on statistical criteria (e.g., eigenvalues, scree plot) and theoretical considerations.
  • Rotate factors: Apply a factor rotation, either orthogonal (e.g., varimax) or oblique (e.g., promax), to simplify the factor structure and make it more interpretable.
  • Interpret factors: Analyze the factor loadings (correlations between variables and factors) to interpret the meaning of each factor.
  • Determine factor reliability: Assess the internal consistency or reliability of the factors using measures like Cronbach’s alpha.
  • Report results: Document the factor loadings, rotated component matrix, communalities, and any other relevant information.
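
The reliability step above mentions Cronbach's alpha. As a minimal sketch, alpha for the items that load on one factor can be computed from the item variances and the variance of the summed score. The function name and the response data below are hypothetical and only illustrate the calculation.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    items: one inner list of scores per item, all of the same length
    (one score per respondent).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score of each respondent across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: 3 items answered by 5 respondents.
items = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 5, 4, 5, 1],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together, which is what one would expect from items that measure the same factor.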

   b. Confirmatory Factor Analysis (CFA):

  • Formulate a theoretical model: Specify the hypothesized relationships among variables and factors based on prior knowledge or theoretical considerations.
  • Define measurement model: Establish how each variable is related to the underlying factors by assigning factor loadings in the model.
  • Test the model: Use statistical techniques like maximum likelihood estimation or structural equation modeling to assess the goodness-of-fit between the observed data and the hypothesized model.
  • Modify the model: If the initial model does not fit the data adequately, revise the model by adding or removing paths, allowing for correlated errors, or other modifications to improve model fit.
  • Report results: Present the final measurement model, parameter estimates, fit indices (e.g., chi-square, RMSEA, CFI), and any modifications made.
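
Two of the fit indices named above, RMSEA and CFI, can be derived from the chi-square statistics that CFA software reports. The sketch below uses the common textbook definitions; some programs use N rather than N - 1 in the RMSEA denominator, and all input values here are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation for a fitted model."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    denom = max(d_model, d_base)
    return 1.0 if denom == 0 else 1.0 - d_model / denom


# Hypothetical output of a CFA run on 400 respondents.
model_rmsea = rmsea(chi2=85.3, df=34, n=400)
model_cfi = cfi(chi2_model=85.3, df_model=34, chi2_base=1250.0, df_base=45)
print(round(model_rmsea, 3), round(model_cfi, 3))
```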

6. Interpret and Validate the Factors:

Once you have identified the factors, interpret them based on the factor loadings, theoretical understanding, and research objectives. Validate the factors by examining their relationships with external criteria or by conducting further analyses if necessary.

Types of Factor Analysis

Types of Factor Analysis are as follows:

Exploratory Factor Analysis (EFA)

EFA is used to explore the underlying structure of a set of observed variables without any preconceived assumptions about the number or nature of the factors. It aims to discover the number of factors and how the observed variables are related to those factors. EFA does not impose any restrictions on the factor structure and allows for cross-loadings of variables on multiple factors.

Confirmatory Factor Analysis (CFA)

CFA is used to test a pre-specified factor structure based on theoretical or conceptual assumptions. It aims to confirm whether the observed variables measure the latent factors as intended. CFA tests the fit of a hypothesized model and assesses how well the observed variables are associated with the expected factors. It is often used for validating measurement instruments or evaluating theoretical models.

Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique that can be considered a form of factor analysis, although it has some differences. PCA aims to explain the maximum amount of variance in the observed variables using a smaller number of uncorrelated components. Unlike traditional factor analysis, PCA does not assume that the observed variables are caused by underlying factors but focuses solely on accounting for variance.

Common Factor Analysis

It assumes that the observed variables are influenced by common factors and unique factors (specific to each variable). It attempts to estimate the common factor structure by extracting the shared variance among the variables while also considering the unique variance of each variable.

Hierarchical Factor Analysis

Hierarchical factor analysis involves multiple levels of factors. It explores both higher-order and lower-order factors, aiming to capture the complex relationships among variables. Higher-order factors are based on the relationships among lower-order factors, which are in turn based on the relationships among observed variables.

Factor Analysis Formulas

Factor Analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.

Here are some of the essential formulas and calculations used in factor analysis:

Correlation Matrix :

The first step in factor analysis is to create a correlation matrix, which calculates the correlation coefficients between pairs of variables.

Correlation coefficient (Pearson’s r) between variables X and Y is calculated as:

r(X,Y) = Σ[(xi – x̄)(yi – ȳ)] / [(n – 1) σx σy]

where: xi, yi are the paired data points; x̄, ȳ are the means of X and Y respectively; σx, σy are the sample standard deviations of X and Y respectively (computed with n – 1 in the denominator); and n is the number of data points.
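
A minimal sketch of this calculation, using only the Python standard library, is shown below. The helper names and the data are hypothetical; `statistics.stdev` is the sample standard deviation, which matches the n - 1 term in the formula.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / ((n - 1) * stdev(x) * stdev(y))


def correlation_matrix(columns):
    """Correlation of every variable (column) with every other variable."""
    return [[pearson_r(a, b) for b in columns] for a in columns]


# Hypothetical data: three variables measured on five respondents.
columns = [
    [1, 2, 3, 4, 5],
    [2, 4, 5, 4, 5],
    [5, 3, 4, 2, 1],
]
R = correlation_matrix(columns)
for row in R:
    print([round(value, 2) for value in row])
```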

Extraction of Factors :

The extraction of factors from the correlation matrix is typically done by methods such as Principal Component Analysis (PCA) or other similar methods.

The formula used in PCA to calculate the principal components (factors) involves finding the eigenvalues and eigenvectors of the correlation matrix.

Let’s denote the correlation matrix as R. If λ is an eigenvalue of R, and v is the corresponding eigenvector, they satisfy the equation: Rv = λv
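
As an illustration, the eigenvalues and eigenvectors of a small correlation matrix can be obtained with NumPy. The matrix below is hypothetical, and the "eigenvalue greater than 1" rule used at the end is one common retention criterion, not the only one.

```python
import numpy as np

# Hypothetical correlation matrix for four variables.
R = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])

# eigh is intended for symmetric matrices; it returns eigenvalues in
# ascending order, so reorder them from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Check the defining equation R v = lambda v for the first pair.
v, lam = eigenvectors[:, 0], eigenvalues[0]
satisfies_equation = np.allclose(R @ v, lam * v)

# Retain components whose eigenvalue exceeds 1.
n_retained = int(np.sum(eigenvalues > 1.0))
print(np.round(eigenvalues, 3), satisfies_equation, n_retained)
```

The eigenvalues of a correlation matrix sum to the number of variables, so each eigenvalue divided by that number gives the proportion of total variance attributed to its component.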

Factor Loadings :

Factor loadings are the correlations between the original variables and the factors. With principal component extraction, they can be calculated by multiplying each unit-length eigenvector by the square root of its corresponding eigenvalue.

Communality and Specific Variance :

Communality of a variable is the proportion of variance in that variable explained by the factors. It can be calculated as the sum of squared factor loadings for that variable across all factors.

The specific variance of a variable is the proportion of variance in that variable not explained by the factors, and it’s calculated as 1 – Communality.
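
The loadings, communalities, and specific variances described above can be sketched with NumPy as follows. The correlation matrix and the choice of two retained factors are hypothetical, and the extraction shown is the principal component approach, in which each unit-length eigenvector is multiplied by the square root of its eigenvalue.

```python
import numpy as np

# Hypothetical correlation matrix for four variables.
R = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])

eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

k = 2  # number of factors retained (an assumption for this sketch)

# Loadings: one row per variable, one column per retained factor.
loadings = eigenvectors[:, :k] * np.sqrt(eigenvalues[:k])

# Communality: sum of squared loadings across factors, per variable.
communalities = np.sum(loadings**2, axis=1)

# Specific variance: the share of each variable left unexplained.
specific_variance = 1.0 - communalities

print(np.round(communalities, 3))
print(np.round(specific_variance, 3))
```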

Factor Rotation : Factor rotation, such as Varimax or Promax, is used to make the output more interpretable. It doesn’t change the underlying relationships but affects the loadings of the variables on the factors.

For example, the Varimax rotation maximizes the variance of the squared loadings within each factor (column) across the variables (rows) of the factor matrix. This pushes loadings toward either high or near-zero values, making each factor easier to interpret.
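
The following is a minimal sketch of a Varimax rotation using a widely used SVD-based iteration. Varimax seeks the orthogonal rotation that makes the variance of the squared loadings in each column as large as possible. The unrotated loadings are hypothetical, and statistical packages may add refinements (such as Kaiser normalization) that are omitted here.

```python
import numpy as np


def simplicity(loadings):
    """Varimax criterion: summed variance of squared loadings per factor."""
    return float(np.sum(np.var(np.asarray(loadings) ** 2, axis=0)))


def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated loadings, orthogonal rotation matrix)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        column_ss = np.diag(np.sum(rotated**2, axis=0))
        gradient = L.T @ (rotated**3 - rotated @ column_ss / p)
        # The nearest orthogonal matrix to the gradient is the next rotation.
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        new_criterion = float(np.sum(s))
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return L @ rotation, rotation


# Hypothetical unrotated loadings: four variables on two factors.
unrotated = np.array([
    [0.7, 0.5],
    [0.8, 0.4],
    [0.6, -0.6],
    [0.5, -0.7],
])
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
print(round(simplicity(unrotated), 4), round(simplicity(rotated), 4))
```

Because the rotation matrix is orthogonal, each variable's communality (its row sum of squared loadings) is the same before and after rotation; only the distribution of loadings across the factors changes.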

Examples of Factor Analysis

Here are some real-world examples of factor analysis:

  • Psychological Research: In a study examining personality traits, researchers may use factor analysis to identify the underlying dimensions of personality by analyzing responses to various questionnaires or surveys. Factors such as extroversion, neuroticism, and conscientiousness can be derived from the analysis.
  • Market Research: In marketing, factor analysis can be used to understand consumers’ preferences and behaviors. For instance, by analyzing survey data related to product features, pricing, and brand perception, researchers can identify factors such as price sensitivity, brand loyalty, and product quality that influence consumer decision-making.
  • Finance and Economics: Factor analysis is widely used in portfolio management and asset pricing models. By analyzing historical market data, factors such as market returns, interest rates, inflation rates, and other economic indicators can be identified. These factors help in understanding and predicting investment returns and risk.
  • Social Sciences: Factor analysis is employed in social sciences to explore underlying constructs in complex datasets. For example, in education research, factor analysis can be used to identify dimensions such as academic achievement, socio-economic status, and parental involvement that contribute to student success.
  • Health Sciences: In medical research, factor analysis can be utilized to identify underlying factors related to health conditions, symptom clusters, or treatment outcomes. For instance, in a study on mental health, factor analysis can be used to identify underlying factors contributing to depression, anxiety, and stress.
  • Customer Satisfaction Surveys: Factor analysis can help businesses understand the key drivers of customer satisfaction. By analyzing survey responses related to various aspects of product or service experience, factors such as product quality, customer service, and pricing can be identified, enabling businesses to focus on areas that impact customer satisfaction the most.

Factor analysis in Research Example

Here’s an example of how factor analysis might be used in research:

Let’s say a psychologist is interested in the factors that contribute to overall wellbeing. They conduct a survey with 1000 participants, asking them to respond to 50 different questions relating to various aspects of their lives, including social relationships, physical health, mental health, job satisfaction, financial security, personal growth, and leisure activities.

Given the broad scope of these questions, the psychologist decides to use factor analysis to identify underlying factors that could explain the correlations among responses.

After conducting the factor analysis, the psychologist finds that the responses can be grouped into five factors:

  • Physical Wellbeing : Includes variables related to physical health, exercise, and diet.
  • Mental Wellbeing : Includes variables related to mental health, stress levels, and emotional balance.
  • Social Wellbeing : Includes variables related to social relationships, community involvement, and support from friends and family.
  • Professional Wellbeing : Includes variables related to job satisfaction, work-life balance, and career development.
  • Financial Wellbeing : Includes variables related to financial security, savings, and income.

By reducing the 50 individual questions to five underlying factors, the psychologist can more effectively analyze the data and draw conclusions about the major aspects of life that contribute to overall wellbeing.

In this way, factor analysis helps researchers understand complex relationships among many variables by grouping them into a smaller number of factors, simplifying the data analysis process, and facilitating the identification of patterns or structures within the data.

When to Use Factor Analysis

Here are some circumstances in which you might want to use factor analysis:

  • Data Reduction : If you have a large set of variables, you can use factor analysis to reduce them to a smaller set of factors. This helps in simplifying the data and making it easier to analyze.
  • Identification of Underlying Structures : Factor analysis can be used to identify underlying structures in a dataset that are not immediately apparent. This can help you understand complex relationships between variables.
  • Validation of Constructs : Factor analysis can be used to confirm whether a scale or measure truly reflects the construct it’s meant to measure. If all the items in a scale load highly on a single factor, that supports the construct validity of the scale.
  • Generating Hypotheses : By revealing the underlying structure of your variables, factor analysis can help to generate hypotheses for future research.
  • Survey Analysis : If you have a survey with many questions, factor analysis can help determine if there are underlying factors that explain response patterns.

Applications of Factor Analysis

Factor Analysis has a wide range of applications across various fields. Here are some of them:

  • Psychology : It’s often used in psychology to identify the underlying factors that explain different patterns of correlations among mental abilities. For instance, factor analysis has been used to identify personality traits (like the Big Five personality traits), intelligence structures (like Spearman’s g), or to validate the constructs of different psychological tests.
  • Market Research : In this field, factor analysis is used to identify the factors that influence purchasing behavior. By understanding these factors, businesses can tailor their products and marketing strategies to meet the needs of different customer groups.
  • Healthcare : In healthcare, factor analysis is used in a similar way to psychology, identifying underlying factors that might influence health outcomes. For instance, it could be used to identify lifestyle or behavioral factors that influence the risk of developing certain diseases.
  • Sociology : Sociologists use factor analysis to understand the structure of attitudes, beliefs, and behaviors in populations. For example, factor analysis might be used to understand the factors that contribute to social inequality.
  • Finance and Economics : In finance, factor analysis is used to identify the factors that drive financial markets or economic behavior. For instance, factor analysis can help understand the factors that influence stock prices or economic growth.
  • Education : In education, factor analysis is used to identify the factors that influence academic performance or attitudes towards learning. This could help in developing more effective teaching strategies.
  • Survey Analysis : Factor analysis is often used in survey research to reduce the number of items or to identify the underlying structure of the data.
  • Environment : In environmental studies, factor analysis can be used to identify the major sources of environmental pollution by analyzing the data on pollutants.

Advantages of Factor Analysis

Advantages of Factor Analysis are as follows:

  • Data Reduction : Factor analysis can simplify a large dataset by reducing the number of variables. This helps make the data easier to manage and analyze.
  • Structure Identification : It can identify underlying structures or patterns in a dataset that are not immediately apparent. This can provide insights into complex relationships between variables.
  • Construct Validation : Factor analysis can be used to validate whether a scale or measure accurately reflects the construct it’s intended to measure. This is important for ensuring the reliability and validity of measurement tools.
  • Hypothesis Generation : By revealing the underlying structure of your variables, factor analysis can help generate hypotheses for future research.
  • Versatility : Factor analysis can be used in various fields, including psychology, market research, healthcare, sociology, finance, education, and environmental studies.

Disadvantages of Factor Analysis

Disadvantages of Factor Analysis are as follows:

  • Subjectivity : The interpretation of the factors can sometimes be subjective, depending on how the data is perceived. Different researchers might interpret the factors differently, which can lead to different conclusions.
  • Assumptions : Factor analysis assumes that there’s some underlying structure in the dataset and that all variables are related. If these assumptions do not hold, factor analysis might not be the best tool for your analysis.
  • Large Sample Size Required : Factor analysis generally requires a large sample size to produce reliable results. This can be a limitation in studies where data collection is challenging or expensive.
  • Correlation, not Causation : Factor analysis identifies correlational relationships, not causal ones. It cannot prove that changes in one variable cause changes in another.
  • Complexity : The statistical concepts behind factor analysis can be difficult to understand and require expertise to implement correctly. Misuse or misunderstanding of the method can lead to incorrect conclusions.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

Ribosomal protein RPL39L is an efficiency factor in the cotranslational folding of a subset of proteins with alpha helical domains

The first three authors should be regarded as Joint First Authors.

Arka Banerjee, Meric Ataman, Maciej Jerzy Smialek, Debdatto Mookherjee, Julius Rabl, Aleksei Mironov, Lea Mues, Ludovic Enkler, Mairene Coto-Llerena, Alexander Schmidt, Daniel Boehringer, Salvatore Piscuoglio, Anne Spang, Nitish Mittal, Mihaela Zavolan, Ribosomal protein RPL39L is an efficiency factor in the cotranslational folding of a subset of proteins with alpha helical domains, Nucleic Acids Research , Volume 52, Issue 15, 27 August 2024, Pages 9028–9048, https://doi.org/10.1093/nar/gkae630

Increasingly many studies reveal how ribosome composition can be tuned to optimally translate the transcriptome of individual cell types. In this study, we investigated the expression pattern, structure within the ribosome and effect on protein synthesis of the ribosomal protein paralog 39L (RPL39L). With a novel mass spectrometric approach we revealed the expression of RPL39L protein beyond mouse germ cells, in human pluripotent cells, cancer cell lines and tissue samples. We generated RPL39L knock-out mouse embryonic stem cell (mESC) lines and demonstrated that RPL39L impacts the dynamics of translation, to support the pluripotency and differentiation, spontaneous and along the germ cell lineage. Most differences in protein abundance between WT and RPL39L KO lines were explained by widespread autophagy. By CryoEM analysis of purified RPL39 and RPL39L-containing ribosomes we found that, unlike RPL39, RPL39L has two distinct conformations in the exposed segment of the nascent peptide exit tunnel, creating a distinct hydrophobic patch that has been predicted to support the efficient co-translational folding of alpha helices. Our study shows that ribosomal protein paralogs provide switchable modular components that can tune translation to the protein production needs of individual cell types.

Protein synthesis is carried out by the ribosome, a highly conserved molecular machine with the same basic architecture in all free-living organisms. In mammals, the small, 40S ribosomal subunit contains the 18S ribosomal RNA (rRNA) and 33 ribosomal proteins (RPs), while the large 60S subunit contains 46 RPs along with the 5S, 5.8S and 28S rRNAs. Gene duplications gave rise to RP paralogs ( 1 ), some with evolutionarily-conserved tissue-specific patterns of expression ( 2 ). For instance, Rps27 and Rps27l have both been found to target p53 through their interaction with E3 ubiquitin ligase Mdm2 ( 3 ). Rpl22 and Rpl22l have been shown to regulate the splicing of pre-mRNA during morphogenesis, but also to target each other's transcript for degradation, as a simple mechanism to maintain stable protein levels within the cells ( 4 ). Rpl3 is necessary for myotube formation and growth ( 5 ), while its paralog Rpl3l is important for cardiac muscle contraction ( 6 , 7 ). A particular subset of RP paralogs resulted from the retrotransposition of X-chromosome-located RPs onto autosomal chromosomes. These include RPL36AL, RPL10L and RPL39L ( 8 ), as well as RPS4 ( 9 ). These RPs have strong expression bias for the male germ cell lineage, which would allow them to compensate for their respective X chromosome-encoded paralogs upon meiotic sex chromosome inactivation. Indeed, this function was demonstrated for RPL10L ( 10 ). RPL39L is a recently evolved ( 1 ) and non-redundant paralog of RPL39 that has recently been implicated in the translation of long-lived, sperm cell-specific proteins ( 11 ). However, the RPL39L mRNA was also observed outside of the germ cell lineage, particularly in ovarian ( 12 ) and breast cancer tissues ( 2 ), as well as in lung cancer ( 13 ) and neuroblastoma ( 11 ) cell lines, where the expression appears to be driven by gene amplifications ( 14 ) and CpG island hypomethylation ( 13 ).
These observations suggest that RPL39L’s function extends beyond the translation of long-lived sperm cell proteins. Unraveling this function has been challenging. RPL39L differs from RPL39 by only 4 or 3 amino acids in human and mouse, respectively, explaining why antibodies that can distinguish RPL39L from RPL39 are still lacking. In addition, the high arginine/lysine content of RPL39L leads to the almost complete digestion of the protein during standard sample preparation for mass spectrometry, probably restricting its detection to cell types with very high expression ( 11 ). As pure populations of RPL39L-containing ribosomes have not been obtained so far ( 11 ), how RPL39/RPL39L ribosomes differ has also remained unclear.

Our study aimed to determine the role of RPL39/RPL39L ribosome heterogeneity across mammalian cell types. We took advantage of mouse embryonic stem cells (mESC), a cell type with native expression of RPL39L, to generate RPL39L-deficient mESC lines by CRISPR/Cas9 genome editing. We then characterized their gene expression and capacity to differentiate both spontaneously and towards the sperm cell lineage. Ribosome footprinting along with mass spectrometric analysis in the presence and absence of protein degradation inhibitors revealed that RPL39L supports the translation of a heterogeneous collection of proteins. Many of these are involved in cell motility and polarization, thus explaining the critical role of RPL39L in spermatogenesis. As obtaining pure populations of RPL39L ribosomes from mice remains challenging, to analyze the impact of RPL39/RPL39L on ribosome structure we turned to yeast, which has only the RPL39 gene. We complemented RPL39-deficient yeast cells with either mouse RPL39 or RPL39L, and analyzed purified RPL39 and RPL39L ribosomes. While mouse and yeast RPL39 occupy virtually identical positions in the ribosome exit tunnel, RPL39L was found in two distinct conformations, one very similar to and the other distinct from that of RPL39. The alternative conformation creates a hydrophobic patch in the vestibular region of the nascent peptide exit tunnel (NPET), which was previously postulated to be necessary for the folding of amphipathic α-helices ( 15 ). Our results provide an example of paralogous RPs supporting the generation of ribosomes with distinct biophysical properties. In particular, the flexibility conferred to the peptide exit tunnel of the ribosome by RPL39L relative to RPL39 appears to be important for the folding and stability of proteins with long helical domains, many of which are abundant in the male germ cells, but can be more generally described as relevant for cell motility and polarization.

Quantification of RP gene expression in public RNA-seq data

Raw gene expression counts generated by STAR ( 16 ) for all 11 274 samples of the TCGA projects were downloaded from the GDC portal ( https://portal.gdc.cancer.gov/ ). In addition, STAR 2-PASS genomic alignments (BAM format) of short reads from 1226 breast cancer and solid normal tissue samples of the TCGA-BRCA project were obtained from the GDC portal (accession number phs000178.v11.p8). Short reads from a single-cell study of embryogenesis ( 17 ) were downloaded in .fastq format from the NCBI Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra ) and aligned with STAR v2.7.10b in 2-PASS mode. Normalized gene TPM values across cell types and tissues estimated from pseudo-bulk scRNA-seq data ( https://www.proteinatlas.org/download/rna_single_cell_type_tissue.tsv.zip ) and cell-level gene count data ( https://www.proteinatlas.org/download/rna_single_cell_read_count.zip ) were downloaded from the HPA portal ( 18 ).

From 90 RPs expressed in human cells, a subset of 4 core RPs (RPS9, RPS14, RPL4, RPL32) was stringently selected to satisfy the following criteria: (i) RPs are deeply embedded into the rRNA ( 19 ), (ii) bind early to ribosomal subunits during synthesis and assembly ( 20 ), (iii) are present in other taxonomic groups apart from eukaryotes ( 21 ), (iv) do not have strong paralogs in the human genome with DIOPT score >2 and ‘high’ DIOPT rank ( 22 ), (v) do not display strong evidence of tissue-specific expression ( 2 ). RPs with these features are more likely to be essential for cell survival ( Supplementary Figure S1A ), as indicated by significantly lower scores in CRISPR ( 23 ) and RNAi screening ( 24 ) (data available at the DepMap portal https://depmap.org/portal/ ).

mRNAs encoding RP genes were extracted from the v36 version of the GENCODE comprehensive annotation of the hg38 human genome assembly ( 25 ). RNA-SeQC ( 26 ) was utilized to obtain raw gene counts from uniquely mapped reads (MAPQ = 255) as well as from all reads, including multi-mapped ones (MAPQ>=0), in bulk RNA-seq data from the TCGA-BRCA project and scRNA-seq data from the embryogenesis study ( 17 ). Gene counts of RPL39L and of core RPs comprised only a small fraction (<7%) of multi-mapped reads in breast cancer and normal tissue samples ( Supplementary Figure S1B ), consistent with these RPs having a relatively low number of processed pseudogenes ( 27 ). Gene lengths of the RPs were estimated as average transcript lengths weighted by TPM values. StringTie v.2.2.1 ( 28 ) was utilized to identify mRNAs and quantify their TPM values across TCGA-BRCA samples. Median gene lengths of RPL39L , RPL39 , and core RPs were checked and found to have only minor differences between cancer and normal samples, indicating minimal isoform variation ( Supplementary Figure S1C ). The arithmetic means of these median values were further used as gene length estimates.

For RPL39L, RPL39 and core RPs, reads-per-kilobase (RPK) values were estimated from public bulk RNA-seq data from the TCGA project and scRNA-seq data from the embryogenesis study ( 17 ) as raw gene counts with pseudocount = 1 divided by the gene length. RPL39-to-core RP and RPL39L-to-core RP ratios were calculated as the ratios of RPL39 (respectively, RPL39L) RPK values relative to the median of core RP RPK values.
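
The RPK and ratio calculation described in this paragraph can be sketched as follows. The counts and gene lengths are hypothetical placeholders, and expressing the gene length in kilobases is an assumption based on the term "reads-per-kilobase".

```python
from statistics import median

CORE_RPS = ("RPS9", "RPS14", "RPL4", "RPL32")


def rpk(raw_count, gene_length_bp, pseudocount=1):
    """Reads per kilobase: (raw count + pseudocount) / gene length in kb."""
    return (raw_count + pseudocount) / (gene_length_bp / 1000.0)


def ratio_to_core(gene, counts, lengths_bp):
    """RPK of `gene` relative to the median RPK of the core RPs."""
    core_median = median(rpk(counts[g], lengths_bp[g]) for g in CORE_RPS)
    return rpk(counts[gene], lengths_bp[gene]) / core_median


# Hypothetical raw counts and gene lengths (bp) for one sample.
counts = {"RPL39L": 149, "RPL39": 4999, "RPS9": 7999,
          "RPS14": 5999, "RPL4": 13999, "RPL32": 2999}
lengths_bp = {"RPL39L": 500, "RPL39": 500, "RPS9": 800,
              "RPS14": 600, "RPL4": 1400, "RPL32": 600}

ratios = {gene: ratio_to_core(gene, counts, lengths_bp)
          for gene in ("RPL39L", "RPL39")}
print({gene: round(value, 3) for gene, value in ratios.items()})
```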

Cell culture

In the absence of feeder cells, the WT E14 mESC line was cultured in Dulbecco's Modified Eagle Medium (DMEM) (Gibco 31 966), which contained 20% fetal bovine serum (FBS; Gibco 16 141 079) tested for optimal mESC growth, NEAA (Gibco 11 140 050), sodium pyruvate (Gibco 11 360 039), 100 U/mL LIF (Millipore ESG1107) and 0.1 mM 2-β-mercaptoethanol (Merck ES-007-E), on 0.2% gelatin-coated plates. The culture medium was changed daily, and the cells were passaged every second or third day. Cells were cultured at 37°C in 5% CO2.

RPL39L CRISPR in E14.Tg2a (E14) mouse embryonic stem cells

All transfections for the CRISPR knockout were performed using the Lipofectamine 2000 reagent (Life Technologies), according to manufacturer's instructions. SgRNAs for the RPL39L gene were designed in pairs from the sequences upstream and downstream of the 5′ and 3′UTRs, respectively. Upstream sgRNAs were cloned into the px330 plasmid backbone with an mCherry marker, and downstream sgRNAs were cloned in the px330 backbone with a GFP marker. The cells positive for both mCherry and GFP expression were FACS-sorted. For single cell colony selection, FACS-sorted cells were diluted to a concentration of 0.75 cells per 100 μl in the ESC culture media. 100 μl of this solution was added to each well of a 96-well plate. The cells were allowed to grow for 2 days before wells with single clones were marked. Forward primers for CRISPR KO validation were designed upstream of the upstream sgRNAs and the reverse primers were designed downstream of the downstream sgRNAs (see Supplementary Table S1 ). In clones with homozygous KO the PCR product should consist of a single band, migrating lower than the wild type band (692 and 587 bp in KO, compared to 1157 bp in WT) when run on a 1.3% agarose gel. The bands were excised and gel purified using the Qiagen Gel purification Kit, according to manufacturer's instructions. The purified DNA was cloned into a pUC19 plasmid backbone using the Zero Blunt TOPO kit (Thermo Fisher) according to manufacturer's instructions. This plasmid was sequenced to validate the complete excision of RPL39L .
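
To illustrate why a dilution of 0.75 cells per well favors single clones, the sketch below estimates well occupancy. It assumes that the number of cells per well follows a Poisson distribution, which is a standard approximation for limiting dilution but is not stated in the text.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k cells in a well for a given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


# From the text: 0.75 cells per 100 μl, and 100 μl added per well.
mean_cells_per_well = 0.75

p_empty = poisson_pmf(0, mean_cells_per_well)
p_single = poisson_pmf(1, mean_cells_per_well)
p_multiple = 1.0 - p_empty - p_single

# Among wells that contain any cells, the fraction seeded by one cell.
single_given_occupied = p_single / (1.0 - p_empty)

print(round(p_empty, 3), round(p_single, 3), round(p_multiple, 3))
print(round(single_given_occupied, 3))
```

Under this assumption roughly a third of the wells in a 96-well plate start from exactly one cell, which is consistent with the need to inspect and mark wells containing single clones.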

LC–MS analysis

Sample preparation.

Cells were collected and lysed in 50 μl lysis buffer (2 M guanidinium–HCl, 0.1 M HEPES, 5 mM TCEP, pH 8.3) using strong ultra-sonication (10 cycles, Bioruptor, Diagenode). Protein concentration was determined by BCA assay (Thermo Fisher Scientific 23 227) using a small sample aliquot. Sample aliquots containing 50 μg of total proteins were supplemented with lysis buffer to 50 μl, reduced for 10 min at 95°C and alkylated at 10 mM iodoacetamide for 30 min at 25°C followed by adding N-acetyl-cysteine to a final concentration of 12.5 mM to quench any iodoacetamide excess. For global proteomics analyses, proteins were directly digested by incubation with sequencing-grade modified trypsin (1/50, w/w; Promega, Madison, Wisconsin) overnight at 37°C. For targeted analysis of RPL39 and RPL39L, a mixture containing 100 fmol of heavy reference peptides was added to the samples. Then, protein samples were propionylated by adding N-(propionyloxy)-succinimide (0.15 M in DMSO) to a final concentration of 22.5 mM and incubating for 2 h at 25°C with shaking at 500 rpm. This step prevents trypsin from cleaving after lysine residues and leads to the generation of larger peptides suited for LC–MS analysis for our target proteins. To quench the labeling reaction, 1.5 μl aqueous 1.5 M hydroxylamine solution was added and samples were incubated for another 10 min at 25°C with shaking at 500 rpm. Subsequently, the pH of the samples was increased to 11.9 by adding 1 M potassium phosphate buffer (pH 12) and incubated for 20 min at 25°C with shaking at 500 rpm to remove propionic acid linked to peptide hydroxyl groups. The reaction was stopped by adding 2 M hydrochloric acid until a pH < 2 was reached. After adding 70 μl of 1 M TEAB (pH 8.5), proteins were digested by incubation with sequencing-grade modified trypsin (1/50, w/w; Promega, Madison, Wisconsin) overnight at 37°C.
For all samples, the generated peptides were cleaned up using iST cartridges (PreOmics, Munich, Germany) according to the manufacturer's instructions. Samples were dried under vacuum and stored at −80°C until further use.

Global proteomics LC–MS analysis

Dried peptides were resuspended in 0.1% aqueous formic acid and subjected to LC–MS/MS analysis using a Q Exactive HF Mass Spectrometer fitted with an EASY-nLC 1000 (both Thermo Fisher Scientific) and a custom-made column heater set to 60°C. Peptides were resolved using a RP-HPLC column (75μm × 30cm) packed in-house with C18 resin (ReproSil-Pur C18-AQ, 1.9 μm resin; Dr. Maisch GmbH) at a flow rate of 0.2 μl min –1 . The following gradient was used for peptide separation: from 5% B to 15% B over 10 min to 30% B over 60 min to 45% B over 20 min to 95% B over 2 min followed by 18 min at 95% B. Buffer A was 0.1% formic acid in water and buffer B was 80% acetonitrile, 0.1% formic acid in water.
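For readers who want to reproduce the separation, the gradient described above can be written as a simple segment table. The sketch below assumes linear ramps within each segment (our assumption; the text only gives the end points) and uses hypothetical names.

```python
# Each tuple is (segment duration in minutes, %B reached at the end of the segment).
# Values are taken from the gradient described in the text; the run starts at 5% B.
SEGMENTS = [(10, 15.0), (60, 30.0), (20, 45.0), (2, 95.0), (18, 95.0)]
START_PERCENT_B = 5.0


def total_run_time() -> float:
    """Total programmed gradient length in minutes."""
    return float(sum(duration for duration, _ in SEGMENTS))


def percent_b(t_min: float) -> float:
    """Buffer B percentage at time t_min, assuming linear ramps within segments."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    t0, b0 = 0.0, START_PERCENT_B
    for duration, b1 in SEGMENTS:
        t1 = t0 + duration
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / duration
        t0, b0 = t1, b1
    return b0  # after the programmed run, hold the final composition


if __name__ == "__main__":
    print("total run time (min):", total_run_time())
    for t in (0, 10, 40, 70, 90, 92, 110):
        print(f"t = {t:>3} min -> {percent_b(t):.1f}% B")
```

With these segments, the programmed run lasts 110 min, of which the final 18 min are the 95% B hold.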

The mass spectrometer was operated in DDA mode with a total cycle time of approximately 1 s. Each MS1 scan was followed by higher-energy collisional dissociation (HCD) of the 10 most abundant precursor ions with dynamic exclusion set to 30 s. For MS1, 3e6 ions were accumulated in the Orbitrap over a maximum time of 100 ms and scanned at a resolution of 120 000 FWHM (at 200 m / z ). MS2 scans were acquired at a target setting of 1e5 ions, maximum accumulation time of 100 ms and a resolution of 30 000 FWHM (at 200 m / z ). Singly charged ions and ions with unassigned charge state were excluded from triggering MS2 events. The normalized collision energy was set to 35%, the mass isolation window was set to 1.1 m / z and one microscan was acquired for each spectrum.

Targeted LC–MS analysis of RPL39L

Parallel reaction-monitoring (PRM) assays ( 29 , 30 ) were generated from a mixture containing 25 fmol/μl of each proteotypic N -(propionyloxy)-succinimide-labeled heavy reference peptide (SSHKTFR (RPL39 human and mouse), SSHKTFTIKR (RPL39L human), ASHKTFR (rpl39l mouse), JPT Peptide Technologies GmbH). 2 μl of this standard peptide mix was subjected to LC–MS/MS analysis using a Q Exactive Plus Mass Spectrometer fitted with an EASY-nLC 1000 (both Thermo Fisher Scientific) and a custom-made column heater set to 60°C. Peptides were resolved using an EasySpray RP-HPLC column (75 μm × 25 cm, Thermo Fisher Scientific) and a pre-column setup at a flow rate of 0.2 μl/min. The mass spectrometer was operated in DDA mode. Each MS1 scan was followed by higher-energy collisional dissociation (HCD) of the precursor masses of the imported isolation list and the 20 most abundant precursor ions with dynamic exclusion for 20 s. Total cycle time was approximately 1 s. For MS1, 3e6 ions were accumulated in the Orbitrap cell over a maximum time of 50 ms and scanned at a resolution of 70 000 FWHM (at 200 m / z ). MS1-triggered MS2 scans were acquired at a target setting of 1e5 ions, a resolution of 17 500 FWHM (at 200 m / z ) and a mass isolation window of 1.4 Th. Singly charged ions and ions with unassigned charge state were excluded from triggering MS2 events. The normalized collision energy was set to 27% and one microscan was acquired for each spectrum.

The acquired raw-files were searched using the MaxQuant software (Version 1.6.2.3) against the same human and mouse database mentioned above using default parameters except protein, peptide and site FDR were set to 1 and Lys8, Arg10 and propyl (K) were added as variable modifications. The search results were imported into Skyline (v21.1.0.278) ( 31 ) to build a spectral library and assign the most intense transitions to each peptide. An unscheduled mass isolation list containing all peptide ion masses was exported and imported into the Q Exactive Plus operating software for PRM analysis. For PRM-MS analysis, peptide samples were resuspended in 0.1% aqueous formic acid. Due to the required protein propionylation for rpl39 and rpl39l LC–MS analysis, the heavy reference peptides were already spiked in at a concentration of 2 fmol of heavy reference peptides per 1 μg of total endogenous peptide mass during sample preparation (see above). The samples were subjected to LC–MS/MS analysis on the same LC–MS system described above using the following settings: The MS2 resolution of the orbitrap was set to 17 500/140 000 FWHM (at 200 m / z ) and the fill time to 50/500 ms for heavy/light peptides. AGC target was set to 3e6, the normalized collision energy was set to 27%, ion isolation window was set to 0.4 m / z and the first mass was fixed to 100 m / z . A MS1 scan at 35 000 resolution (FWHM at 200 m / z ), AGC target 3e6 and fill time of 50 ms was included in each MS cycle. All raw-files were imported into Skyline for protein/peptide quantification. To control for sample amount variations during sample preparation, the total ion chromatogram (only comprising precursor ions with two to five charges) of each sample was determined using Progenesis QI software (Nonlinear Dynamics (Waters), Version 2.0) and used for normalization of light (endogenous) peptide abundances.

Sample preparation for inhibition of protein degradation

E14 and RPL39L KO cells were treated for 5 h at 37°C and 5% CO 2 with 10 μM ( S )-MG132 (STEMCELL Technologies Catalog #73 264) and 5 μM Bafilomycin-A1 (STEMCELL Technologies Catalog #74 242) in the normal media described above. For LC–MS as well as for western blot analysis, cells were scraped and washed twice with warm DPBS, then lysed in a corresponding lysis buffer.

Spontaneous differentiation

Spontaneous differentiation of mESC lines was carried out in conventional mESC medium (as stated above) with 10% fetal bovine serum and no LIF ( 32 ). To avoid attachment, embryoid bodies (EBs) were generated by growing 750 000 cells in suspension for 6 days in non-adherent dishes (Greiner Bio-One 633 181). Spheroids were allowed to grow without disturbing the plate for the first 3 days. The medium was changed every second day afterwards. The EBs were harvested after 6 days with 25 ml Pasteur pipettes and washed three times with 1× PBS before further analysis.

Spermatogenic differentiation

Differentiation of mESCs toward sperm cells was done according to the protocol described in ref. ( 33 ). In short, E14 cells were harvested and plated as hanging drops of medium on Petri dishes with PBS at the bottom of the plate, with 1250 ESCs per 25 μl in each drop. The resultant EBs were transferred onto Petri dishes (10–15 EBs per dish) after 3 days in hanging drop culture. Neurobasal medium (Gibco 21 103 049), supplemented with B27 (Invitrogen 17 504 044), was used as the differentiation medium, according to the manufacturer's protocol. Retinoic acid (Sigma R2625-50MG) was added to the culture medium at a final concentration of 0.1 μM one day after EB transfer, and the medium was replaced every two days to minimize deterioration. The contents of each plate were harvested after 4 days.

Polysome profiling and ribo-seq

For ribo-seq analysis, WT and RPL39L KO E14 mESCs were propagated in 5 × 15 cm Petri Dishes (Falcon, 353 025) per sample as described in the cell culture section. The medium was replenished three hours prior to cell collection at the confluency of 50–70%. Before harvesting, the cells were treated with 100μg/ml cycloheximide (CHX) (Sigma, G7698) for 15 minutes at 5% CO 2 , 37°C, to freeze the elongating ribosomes. The cells were harvested on ice in a cold room. Cells were washed twice with ice cold DPBS (Lonza, BE17-512Q) containing 100μg/ml CHX. Cells were scraped, collected, spun down, flash frozen and stored at −80°C.

The cell pellet was resuspended in 900 μl of ice-cold polysome lysis buffer (20 mM Tris–HCl (Sigma, T294), pH 7.5; 100 mM NaCl (Sigma, 71 386); 10 mM MgCl 2 (Sigma, 63 069); 1% Triton X-100 (Sigma, T8787); freshly added 2 mM DTT (Sigma, 646 563); 100 μg/ml cycloheximide (Sigma, G7698); 400 U of RNasin Plus RNase inhibitor (Promega, N261B); 20 U of Turbo DNase (Ambion, AM2238) and cOmplete Mini, EDTA-free protease inhibitors (Roche, 11 836 170 001)) by pipetting up and down. After resuspending the pellet, the sample was incubated for 5 min at 4°C with continuous rotation (50 rpm), passed through a 23G needle (Braun, 4 657 640) at least 10 times, and incubated for an additional 5 min at 4°C with continuous rotation (50 rpm). The cell lysate was clarified by centrifugation at 3000 × g for 3 min at 4°C, followed by centrifugation of the supernatant from the first step at 10 000 × g for 5 min at 4°C. 50 μl of clarified lysate from each sample was set aside for mRNA-seq library preparation, snap frozen and stored at −80°C. The optical density (OD) of the remaining lysate was measured at A 260 with a NanoDrop2000. For ribo-seq analysis, the lysate equivalent to OD ( A 260 ) = 7 was subjected to RNase I digestion (5 U per OD, 35 U in total; Invitrogen, AM2294) at 22°C for 20 min with continuous rotation at 1000 rpm in a thermoblock (Eppendorf). RNase I was inactivated by adding 10 μl of the RNase inhibitor SuperaseIN (Invitrogen, AM2696) to each reaction. An equal amount of undigested lysate was also used for the polysome profile.

A 10–50% linear sucrose gradient was prepared using a Gradient Master instrument (Biocomp) according to the manufacturer's instructions. In brief, 10% and 50% sucrose (Sigma, 84 100) solutions were prepared in buffer containing 50 mM Tris–HCl, pH 7.5 (Sigma, T294); 50 mM NH 4 Cl (Sigma, 09 718); 12 mM MgCl 2 (Sigma, 63 069); 100 μg/ml CHX (Sigma, G7698); 0.5 mM DTT (Sigma, 646 563) and 10 μl SuperaseIN (Invitrogen, AM2696). A 14 × 89 mm tube (Beckman Coulter, 331 372) (used in rotor SW-41/TH-641; Beckman Coulter) with a long cap that can hold 800 μl of sample was used to prepare the gradient. First, the 10% solution was laid in the tube, and then the 50% solution was under-laid with the help of a long syringe. The gradient was formed using the pre-set program for a 10% to 50% sucrose gradient on the Gradient Master 108 instrument (Biocomp). The gradient was cooled to 4°C by keeping it in the fridge for a minimum of one hour. Undigested (for polysome profile) and digested samples (for ribo-seq) were loaded onto pre-cooled 10–50% sucrose gradients and centrifuged at 35 000 rpm for 3 h at 4°C in a SW-41Ti rotor (Beckman Coulter). Finally, all gradients were monitored at wavelength A 254 and fractionated using a Piston Gradient Fractionator (Biocomp). Thirty separate fractions of 0.37 ml were collected in 1.5 ml Eppendorf tubes for each digested and undigested ribosome profile using a Gilson collector attached to the fractionator.

The appropriate fractions containing 80S monosomes were processed for ribo-seq library preparation by combining the protocols from ( 34 , 35 ). In brief, RNA was isolated from the appropriate monosome fractions using the hot phenol method. RNA fragments of appropriate size (28–32 nt) were obtained by running samples on a 15% polyacrylamide denaturing TBE-Urea gel and visualized with SYBR Gold dye (Life Technologies). Size-selected RNA was dephosphorylated by T4 polynucleotide kinase (PNK, New England Biolabs, B0201S) treatment for 1 h at 37°C. PNK was heat inactivated and RNA was purified using the phenol–chloroform method followed by overnight precipitation of RNA in ethanol. An RNA amount equivalent to 12 ng was used to prepare the sequencing library using the SMARTer® smRNA-Seq Kit for Illumina® (Takara 635 031) according to the kit manual up to cDNA synthesis. After cDNA synthesis, rRNA was depleted using the protocol and probes defined in ( 34 ), followed by a final PCR according to the SMARTer kit with multiplexing barcodes. The PCR product was purified on an 8% polyacrylamide native TBE gel and sequenced on a NextSeq 500 instrument at the genomics facility Basel.

The translation rate of cells was calculated from polysome profiles as the area under the curve corresponding to polysomes divided by the area under the curve corresponding to the monosome (80S). This ratio was calculated for every sample/clone and compared to the WT control. An unpaired one-sided t-test was used to determine whether the KO clones exhibit a significantly increased rate of translation relative to WT.
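The polysome-to-monosome ratio described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the trapezoidal integration, the absence of baseline subtraction, and the way the monosome and polysome windows are specified are all our assumptions, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def trapezoid_area(x: Sequence[float], y: Sequence[float],
                   lo: float, hi: float) -> float:
    """Trapezoidal area under y(x), restricted to sample points with lo <= x <= hi."""
    pts = [(xi, yi) for xi, yi in zip(x, y) if lo <= xi <= hi]
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


def polysome_to_monosome_ratio(position: Sequence[float],
                               a254: Sequence[float],
                               monosome_window: Tuple[float, float],
                               polysome_window: Tuple[float, float]) -> float:
    """Area under the polysome region divided by the area under the 80S peak.

    The windows are (start, end) positions along the gradient and must be
    chosen by inspecting each profile (hypothetical inputs here).
    """
    mono = trapezoid_area(position, a254, *monosome_window)
    poly = trapezoid_area(position, a254, *polysome_window)
    if mono <= 0:
        raise ValueError("monosome area must be positive")
    return poly / mono


if __name__ == "__main__":
    # Toy absorbance trace, for illustration only (not measured data).
    position = [0, 1, 2, 3, 4, 5, 6]
    a254 = [0.0, 2.0, 0.0, 1.0, 1.0, 1.0, 0.0]
    print(polysome_to_monosome_ratio(position, a254, (0, 2), (2, 6)))
```

The per-clone ratios obtained this way could then be compared with the WT ratios using a one-sided unpaired t-test, for example with `scipy.stats.ttest_ind(ko, wt, alternative="greater")` if SciPy is available.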

RNA-seq sample preparation

Cells were maintained as described in the cell culture section. RNA was isolated using the AccuPure Cell/Blood RNA Mini Kit ( 95 ) (AccuBioMed, R10096) on an iColumn24 Robot (AccuBioMed) with DNase1 treatment and elution in a 50 μl volume. RNA-seq samples were prepared using the TruSeq Stranded mRNA Illumina kit and sequenced on a NovaSeq 6000 instrument at the genomics facility Basel.

Analysis of ribosome profiling data

Reads from fastq files were trimmed with fastx_clipper from FASTX-Toolkit version 0.0.14 with parameters ‘-a (3′ adapter) AAAAAAAAAA, -l (minimum-length) 20, -c (discards non-clipped sequences) and -n (discards sequences with unknown (N) nucleotides)’. The trimmed reads were further trimmed with fastq_quality_trimmer from the same toolkit with parameters ‘-t (minimum quality) 20, -Q (quality type) 33’. Then the trimmed reads were filtered for read quality with fastq_quality_filter from the same toolkit with the following parameters: ‘-q (minimum quality) 20, -p (minimum percent of bases that must have [-q] quality) 90, -l (minimum-length) 20’. These reads were first aligned to ribosomal RNA (rRNA) sequences obtained from the Mus musculus ribosomal DNA (rDNA) complete repeating unit ( https://www.ncbi.nlm.nih.gov/nuccore/bk000964 ) using Segemehl ( 36 ) version 0.2.0. The reads that did not map to rDNA were then aligned to the longest coding transcript of each gene identified from the Mus musculus GRCm38–mm10 genome assembly, Ensembl 99 annotation, using Segemehl. The uniquely mapped reads from this alignment were used for downstream analysis.
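To make the read-quality criterion concrete, the sketch below re-expresses the filter thresholds stated above (minimum quality 20, at least 90% of bases at or above that quality, minimum length 20, Phred+33 encoding) in plain Python. It is an illustrative re-implementation of the criterion and not the FASTX-Toolkit code; the function names are hypothetical.

```python
from typing import List


def phred33(quality_string: str) -> List[int]:
    """Decode a Phred+33 encoded FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_quality_filter(quality_string: str,
                          min_quality: int = 20,
                          min_percent: float = 90.0,
                          min_length: int = 20) -> bool:
    """Return True if a read satisfies the length and quality thresholds.

    A read passes when it has at least min_length bases and at least
    min_percent of its bases have a quality score >= min_quality.
    """
    scores = phred33(quality_string)
    if len(scores) < min_length:
        return False
    good = sum(1 for s in scores if s >= min_quality)
    return 100.0 * good / len(scores) >= min_percent


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(passes_quality_filter("I" * 18 + "#" * 2))  # 90% of bases >= Q20
    print(passes_quality_filter("I" * 17 + "#" * 3))  # 85% of bases >= Q20
```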

Analysis of RNA sequencing data

Single-end reads from raw fastq files were processed using ZARP ( 37 ) workflow with the default parameters, Mus musculus GRCm38–mm10 genome assembly, Ensembl 99 annotation, 3′ adapter ‘GATCGGAAGAGCACAC’, ‘SR’ for library type (strand-specific reads coming from the reverse strand). The kallisto ( 38 ) output (version 0.46.2) of ZARP workflow was used for downstream analysis.

Analysis of differential expression, translation and translational efficiency

Differential expression (RNA-seq) and differential translation (ribo-seq) analyses were performed using the DESeq2 R package ( 39 ) version 1.34.0 with default parameters. The deltaTE ( 40 ) procedure was applied for differential translation efficiency analysis using the reads mapped to coding sequence (CDS) regions obtained from the GRCm38–mm10 genome assembly, Ensembl 99 annotation, for both RNA-seq and ribo-seq libraries. The Rsubread ( 41 ) R package version 2.8.2 was employed to obtain the RNA-seq reads aligned to CDS regions. An in-house algorithm was used to obtain the ribo-seq reads mapped to CDS regions based on their estimated P-sites.
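The quantity targeted by a differential translation efficiency analysis can be illustrated with a simple point estimate: the change in ribo-seq signal that is not explained by the change in mRNA level. The sketch below is only a conceptual illustration under our own assumptions (mean normalized counts per condition, a pseudocount of 1). The deltaTE procedure itself fits a DESeq2 model with an interaction term and estimates dispersions, which this sketch does not attempt; all names and numbers are hypothetical.

```python
import math


def log2_fold_change(treated_mean: float, control_mean: float,
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of mean normalized counts, with a pseudocount to avoid log(0)."""
    return math.log2((treated_mean + pseudocount) / (control_mean + pseudocount))


def delta_te(ribo_ko: float, ribo_wt: float,
             rna_ko: float, rna_wt: float,
             pseudocount: float = 1.0) -> float:
    """Point estimate of the change in translation efficiency (log2 scale).

    Computed as log2FC(ribo-seq) - log2FC(RNA-seq) for one gene.
    """
    ribo_lfc = log2_fold_change(ribo_ko, ribo_wt, pseudocount)
    rna_lfc = log2_fold_change(rna_ko, rna_wt, pseudocount)
    return ribo_lfc - rna_lfc


if __name__ == "__main__":
    # Hypothetical mean CDS counts for one gene (not data from this study).
    print(delta_te(ribo_ko=399, ribo_wt=99, rna_ko=199, rna_wt=99))
```

In this toy case the ribo-seq signal increases four-fold while the mRNA level doubles, so the estimated translation efficiency increases two-fold (a value of 1 on the log2 scale).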

Gene ontology (GO) analysis

The clusterProfiler ( 42 ) R package version 3.18.1 was used for all the GO term analyses reported in this study. The ComplexHeatmap ( 42 , 43 ) version 2.6.2 and circlize ( 44 ) version 0.4.15 R packages were used to construct the heatmaps. InteractiVenn ( 45 ) was used for Venn diagrams.

Analysis of LC–MS data

The raw files were searched against a protein database containing sequences of the SwissProt ( 46 ) entries of Mus musculus (in total 17 137 protein sequences) along with commonly observed contaminants using FragPipe version 18.0.0 (downloaded from https://github.com/Nesvilab/FragPipe/releases ). Protein intensities obtained from the FragPipe platform were imputed with the impute.knn function from impute R package ( 47 ) version 1.64.0. Normalization of protein intensities and differential protein expression analyses were performed using limma R package ( 48 ) version 3.64.0.

For amino acid mismatch analysis, the procedure developed by Mordret et al. ( 49 ) was employed. The dependent peptides required for mismatch identification were obtained using MaxQuant computational platform ( 50 ) version 2.1.3.0.

RNA from spontaneous and spermatogenic differentiation was isolated using the AccuPure Cell/Blood RNA Mini Kit ( 95 ) (AccuBioMed, R10096) on an iColumn24 Robot (AccuBioMed) with DNase1 treatment and elution in a 50 μl volume. Reverse transcription was performed from 500–1000 ng of RNA using the SuperScript IV First-Strand Synthesis System (Invitrogen, 18 091 200) with random hexamers (Promega, 300 453). The qPCR reaction was performed in a 20 μl volume using PowerSYBR Green PCR Master Mix (Applied Biosystems, 4 367 659) on a QuantStudio 3 System (Applied Biosystems, A28567) using Comparative CT (ΔΔCT) analysis and Standard Run mode. Rrm2 was used as the endogenous control for all analyses. The primer sequences used are in Supplementary Table S2 .
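The comparative CT calculation mentioned above can be sketched as follows. The sketch assumes 100% amplification efficiency (a doubling per cycle), which is the standard assumption of the 2^(−ΔΔCT) method but is not stated explicitly in the text; the CT values and function name are hypothetical, with Rrm2 as the endogenous control as described.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """Relative expression of a target gene by the comparative CT method.

    ct_ref_* are CT values of the endogenous control (Rrm2 in this study);
    the calibrator is the condition to which the sample is compared.
    Assumes that the amount of product doubles in every cycle.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical CT values, for illustration only.
    fold = relative_expression(ct_target_sample=24.0, ct_ref_sample=18.0,
                               ct_target_calibrator=26.0, ct_ref_calibrator=18.0)
    print(f"fold change relative to calibrator: {fold:.1f}")
```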

Spontaneous differentiation organoids immunofluorescence

Organoids were fixed using 4% PFA in PBS at 4°C for 2 h. After two washes in PBS, the organoids were transferred to 10% sucrose solution in PBS and stored at 4°C for 1 day. This was followed by a 1-day incubation in 20% sucrose in PBS at 4°C, and finally by a 1-day incubation in 30% sucrose in PBS at 4°C. Next, the organoids were embedded in PolyFreeze Tissue Freezing Medium (Sigma, SHH0025) in ibidi slides (ibiTreat, 80 826), snap frozen on dry ice and stored at −80°C until cryosectioning. Cryosections of 12-μm thickness were made on Superfrost Ultra Plus Gold Adhesion slides (Thermo Fisher, 11 976 299) using a Leica Microsystems cryostat. Slides were stored at −80°C or processed immediately. For IF, slides were air-dried at RT for 1 h, washed in PBS three times for 5 min each and permeabilized in 0.2% Triton X-100 in PBS for 30 min at RT. The cryosections on each slide were then circumscribed using an ImmEdge Hydrophobic Barrier Pen (Vector Labs) and blocking solution (1% BSA, 5% mouse serum in 0.2% Triton X-100) was added for 30 min at RT. Fluorophore-conjugated primary antibodies, Gata4-Alexa594 (SantaCruz, sc-25310) and Nestin-Alexa488 (SantaCruz, sc-23927), were diluted 1:50 in blocking solution (with 0.1% Tween) and incubated in the dark at 4°C overnight. Finally, slides were washed 3 times with 0.1% Tween in PBS and mounted with VECTASHIELD Antifade Mounting Medium with DAPI (Vector Laboratories, H-1200-10). The images were acquired using a Zeiss LSM800 confocal microscope, analyzed using Fiji software ( 51 ), and visualized and deposited using OMERO ( 52 ) (project ID:12 920).

The image analysis was done in Fiji ( 51 ) using software available on github https://github.com/imcf-shareables/stem_cell_analysis . The script uses TrackMate ( 53 ) and StarDist ( 54 ) to segment the nuclei in 3D in a defined region of interest. The mean intensity of the endoderm marker was measured in the nuclear region while the mean intensity of the ectoderm marker was measured in a 3D layer around the nuclei obtained by dilating the nuclei and subtracting the originals from the dilation, using the 3DImageJSuite ( 55 ). The results were then saved to CSV for statistical analysis.

Western blotting

Samples were collected from cell culture by scraping, centrifuged (5 min at 210 rcf, 4°C), washed and resuspended in RIPA buffer containing phosphatase inhibitor (Roche 4 906 837 001) and protease inhibitor cocktail (Roche 118 361 530 001). After sonication (2 × 4 times, amplitude 60%, pulse 0.5) on a Hielscher UP50H probe sonicator and centrifugation (5 min at 5000 rpm, 4°C) of the samples, a BCA measurement was performed to determine the protein concentration of the samples. For SDS gel electrophoresis, 30 μg of each sample was mixed with 4× Laemmli buffer containing β-mercaptoethanol, heated for 10 min at 95°C and centrifuged before loading onto a 10-well 5–20% gradient gel (BioRad 456–1093). The finished gels were blotted via the semi-dry method (20 V, 400 mA, 45 min) onto a nitrocellulose membrane (GE, Amersham Protran Premium 0.2 μm NC), stained with Ponceau, washed with 1× TBST, blocked with 5% BSA in 1× TBST and kept overnight at 4°C rotating in the respective primary antibody. The membranes were then washed 3 × 10 min with 1× TBST, incubated in secondary antibody at RT for 1 h and washed again. They were imaged in the Fusion FX from VILBER (Software FUSIONFX7 Edge 18.11) with Amersham ECL Western Blotting Detection Reagents RPN2106 from GE Life Science.

As a loading control, the membranes were probed with an antibody against GAPDH (Histone H3 for EIF2A and P-EIF2A). For details of the antibodies and the respective concentrations used, please refer to Supplementary Table S3 .

Yeast strains and media

Yeast strains were grown either in rich medium composed of 1% (w/v) yeast extract, 1% (w/v) peptone, 40 mg/l adenine and 2% (w/v) glucose (YPD) or in synthetic complete medium (HC) composed of 0.17% (w/v) yeast nitrogen base with ammonium sulfate and without amino acids, 2% (w/v) glucose and mixtures of amino acids (MP Biomedicals) depending on the auxotrophies used for selection. Cells were grown at 30°C or 23°C. Solid media contained 2% (w/v) agar and were supplemented with paromomycin (P9297 Sigma) or l -azetidine-2-carboxylic acid (AZC; A0760 Sigma) when needed. Yeast RPL39 genomic deletion was done according to standard procedures ( 69 ), using pAG32 as the PCR template for gene replacement, and integration of the hygromycin resistance gene was confirmed by PCR (see Tables S3 and S4).

Yeast transformation

Three units of OD 600 of yeast cells were grown in appropriate YPD or HC media to mid-log phase. Cells were spun down and washed in 1 volume of 1× TE and 10 mM LiAc. The pellet was then resuspended in 350 μl of transformation mix (1× TE, 100 mM LiAc, 8.5% (v/v) ssDNA, 70% (v/v) PEG3000), incubated with DNA (PCR product or 1 μg of plasmid DNA) for 1 h at 42°C, spun down (30 s at 10 000 × g at RT), resuspended in 100 μl of YPD or HC media and cells were plated onto selective media and incubated at 30°C.

All RPL39 variants were cloned into the pRS413-GPD plasmid digested with BamHI and SalI using the Gibson assembly kit (NEB). Guide blocks (IDT; Supplementary Table S4 ) of 156 bp consisting of mouse RPL39, mouse RPL39L, or yeast RPL39 were designed and used as PCR templates with dedicated primers ( Supplementary Table S5 ).

Ribosome purification from yeast cells

The ribosomes from the various yeast lines were extracted using the protocol described in ( 56 ). In brief, the yeast cells were grown in HC -His medium, harvested by centrifugation at exponential growth phase and snap-frozen in liquid nitrogen. The frozen pellet was disrupted using a SPEX SamplePrep 6875 Freezer Mill in liquid nitrogen. The crushed frozen pellet was resuspended in RES (50 mM Hepes pH 7.6, 200 mM KCl, 10 mM MgCl 2 , 5 mM EDTA, 250 mM sucrose, 2 mM DTT). Cell debris was cleared by centrifugation for 60 min at 25 600 × g in a Beckman Coulter Type 45 Ti rotor. The 80S ribosome-containing supernatant was decanted and added to a cushion of 50% (w/w) sucrose (62 mM Hepes pH 7.6, 62 mM KCl, 12 mM MgCl 2 , 6 mM EDTA, 50% (w/w) sucrose, 0.025% sodium azide and 2 mM DTT), which was then centrifuged for 20 h at 184 000 × g and 4°C (Beckman Ti70 rotor). After the supernatant was removed, the pellet was resuspended in PRE buffer (50 mM Hepes pH 7.6, 10 mM KCl, 10 mM MgCl 2 , 0.02% sodium azide and 2 mM DTT). Centrifugation at 103 000 × g and 4°C for 14 h separated the ribosomal subunits on a 50% (w/v) sucrose gradient (52 mM Hepes pH 7.6, 727 mM KCl, 10 mM MgCl 2 , 0.021% sodium azide and 2 mM DTT) in a Beckman Coulter XE-90 ultracentrifuge.

Cryo-EM analysis

Vitrification and cryo-EM data collection.

Samples were vitrified on Quantifoil R2/2 holey carbon grids, coated in-house with 1 nm continuous carbon. Grids were subjected to glow discharge for 15 s with 15 mA current directly before sample vitrification. The climate chamber of a ThermoScientific Vitrobot was equilibrated to a temperature of 4°C and 95% humidity. A volume of 4 μl ribosome sample was subjected to a vitrification protocol using 30 s pre-blot incubation followed by 2 or 3 s blotting, and subsequent rapid transfer into liquid ethane-propane mix. Micrographs were collected on a ThermoScientific Titan Krios cryo-electron microscope equipped with a Gatan K3 direct electron detector and a GIF BioContinuum energy filter. Images were acquired at 300 kV accelerating voltage in counted super-resolution mode with an electron dose of 45 e − /Å 2 at a nominal magnification of 105 000×, resulting in a pixel size of 0.84 Å/pixel. Micrographs were exposed for 1 s and fractionated into 40 frames. The slit width of the energy filter was set to 20 eV.

Cryo-EM image processing

All image processing was carried out in cryoSPARC (version v3.3.1 + 220 315) ( 57 ) ( Supplementary Figure S8 ). Micrographs were dose weighted and motion corrected using Patch Motion Correction. Defocus was subsequently estimated by CTF fitting using Patch CTF Estimation. Particles were selected using the blob picker with a circular blob template with a diameter between 300 and 500 Å. Visual inspection confirmed that the blob picker accurately selected ribosomal particles. Particles were extracted with a box size of 600 pixels and scaled to 400 pixels (Nyquist limit 2.54 Å) and subjected to 2D classification into 200 classes. Classes that contained 60S or 80S ribosomal subunits were retained while empty classes and classes that contained junk particles (ice blobs, carbon edges), or 40S ribosomal subunits were removed. An initial model was generated from a subset of 63 945 particles of dataset 1 (rpl39Δ-MmRPL39L) using cryoSPARC ab-initio model generation (1 class). A soft mask encompassing the 60S subunit, but excluding the 40S subunit was created from the initial model and used for refinement. The resulting map at 2.63 Å resolution and the associated mask were used as the initial model for refinement of all datasets. Using the reconstruction calculated from a subset of the data filtered to 30 Å resolution as the initial model, 3D maps of all samples were reconstructed by Homogeneous Refinement. As the resolution of the refined maps of samples Sc60S, rpl39Δ-ScRPL39, rpl39Δ-MmRPL39 and rpl39Δ-MmRPL39L exceeded the Nyquist limit of the binned particle images, particles were re-extracted with 600 pixel box size, without binning and subsequently refined using Homogeneous Refinement and Non-uniform Refinement. Resolution of the maps was further improved by Global and subsequent Local CTF Refinement.

Model building and refinement

The yeast 60S ribosomal subunit structure (7TOO, ( 58 )) was docked into the density map by rigid-body fitting in UCSF Chimera ( 59 ). Parts of the original model that were outside of cryo-EM density due to flexibility were deleted from the model. Ribosomal protein and rRNA positions were adjusted in Coot ( 60 ) and MmRPL39 and MmRPL39L, respectively, were built into the density. Metal ions, chloride ions, water, spermine and spermidine were placed into positive difference density according to the chemistry of the environment and the geometry of the coordinating groups; divalent ions were built as magnesium ions and monovalent ions were built as potassium ions. Metal ion coordination restraints were generated and optimized with ReadySet ( 61 ). The model was refined into the density using Phenix (dev-4788–00) ( 62 ) real space refinement ( Supplementary Figure S8 , Supplementary Table S6 ), using five cycles with secondary structure, Ramachandran and side-chain rotamer restraints. The quality of the refinement was validated using real-space correlation coefficients (model versus map at FSC = 0.5). Images were prepared with PyMol (citation: The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.).

The RP paralog RPL39L is expressed in a variety of normal and malignant cells

Aiming to determine the breadth of RPL39L expression across normal human cells, we examined the single-cell sequencing (scRNA-seq) data from The Human Protein Atlas (HPA ( 18 ), https://www.proteinatlas.org/download/rna_single_cell_type_tissue.tsv.zip ). While most abundant in the cells of the male germ cell lineage, the RPL39L mRNA is also present in other cell types, such as the extravillous trophoblast, where the RPL39L level is ∼12-fold lower compared to spermatocytes (Figure 1A , top). In contrast, another germ cell-specific RP, RPL10L , is virtually absent outside of the male germ line (Figure 1A , top). For comparison, we also investigated the variation of mRNAs encoding core RPs (identified as described in Materials and methods, Supplementary Figure S1 ) across cell types, finding it to be much smaller relative to the RP paralogs (Figure 1A , top). The scRNA-seq data also gave us the opportunity to determine whether RPL39L replaces RPL39 in specific cell types or whether the two genes are co-expressed. In the HPA scRNA-seq read count data ( https://www.proteinatlas.org/download/rna_single_cell_read_count.zip ), all cells that contained RPL39L -derived reads also had RPL39 -derived reads. The sole exceptions were male germ cells, most of which contained exclusively RPL39L reads. This suggests that some cell types have heterogeneous RPL39/RPL39L populations of ribosomes (Figure 1A , bottom panel). The relatively high expression level of RPL39L in trophoblast cells (Figure 1A , top) prompted us to further examine the dynamics of RPL39L/RPL39 expression in early human development relative to adult human tissues. As the overall abundance of ribosomes differs across cell types ( 2 ), we always examined the variation in RPL39L/RPL39 levels relative to the core RPs. 
Reanalyzing scRNA-seq data sets from pre-implantation embryos and embryonic stem cells (ESCs) ( 17 ) along with normal adult tissue samples in The Cancer Genome Atlas ( https://www.cancer.gov/tcga/ ) we found that, relative to core RPs, RPL39 maintained the same level in embryonic and differentiated cells (Figure 1B ), while the level of RPL39L was significantly higher in embryonic cells than in adult normal cells (Figure 1B ). The RPL39L -to-core RP ratio also varied more than 30-fold across cancer types (Figure 1B ), in many cancers reaching the values observed in embryonic cells. The ratio was highest in lung (LUSC) and cervix (CESC) tumors. In contrast, the RPL39 -to-core RP ratio fluctuated much less across tumors (Ansari–Bradley test, P -value < 10 −4 ) and relative to normal cells.

To determine whether the RPL39L mRNA is indeed translated into protein, we examined an extensive ribosome footprinting dataset from human primary cells and tissues ( 63 ). As shown in Figure 1C , we found the number of RPL39L-derived ribosome footprints to be largely proportional to the number of RPL39L-derived RNA-seq reads, indicating that the RPL39L mRNA does indeed undergo translation. Furthermore, human embryonic stem cells had relatively high expression of RPL39L, as we have seen in other datasets (Figure 1B ). We also sought to measure the RPL39L protein in a few cellular systems. As RPL39L-specific antibodies are not available, we turned to mass spectrometry. Due to its high lysine/arginine content, the RPL39L protein is not reliably captured in standard proteomics analyses. Thus, we modified the sample preparation, acetylating the lysines in the proteome to direct the cleavage by trypsin to arginines only. We used heavy-labeled reference peptides to measure the RPL39L-to-RPL39 protein ratio in mature mouse sperm cells as positive control, and then in mouse embryonic stem cells, the human MDA-MB-231 breast cancer cell line, as well as human tissue samples, namely bone marrow-derived mesenchymal stem cells and breast cancer tissue. RPL39L was detectable at the protein level in these cell types, albeit at lower levels compared to mouse sperm (Figure 1D ). To further answer the question of whether the protein is incorporated into ribosomes, we carried out the mass spectrometric analysis on ribosome populations purified by sucrose cushioning from E14 and MDA-MB-231 cell lines. The proportion of Rpl39l in the ribosomes of E14 mESC cells was similar to the proportion in the total lysate, while in the MDA-MB-231 cell line the proportion was much lower than in total lysate.

In human breast cancer tissue samples with heterogeneous cell type composition, RPL39L was also detectable, at lower abundance compared to pluripotent stem cells (Figure 1D ). Our data thus provide conclusive mass spectrometric evidence of RPL39L protein expression not only in male germ cells, but also in pluripotent cells and cancer cell lines. The broader expression pattern of RPL39L compared to other germ cell-biased RP paralogs points to the relevance of RPL39/RPL39L ribosome heterogeneity beyond spermatogenesis.

KO of RPL39L impairs the pluripotency of E14 mESCs

To determine the role of RPL39L in pluripotent cells we generated RPL39L knockout (KO) mESCs by CRISPR-mediated deletion of the coding region of this gene in the E14 mouse stem cell line. To minimize the chance of interfering with RPL39 we designed the sgRNAs to target non-coding regions of RPL39L , which are not shared with RPL39 . To further exclude sgRNA-specific off-target effects, we designed two sets of guide RNAs that matched non-coding regions around the RPL39L CDS (Figure 2A ), and for each sgRNA set we selected clones that originated from independent editing events ( Supplementary Figure S2 ). We comprehensively analyzed two distinct clones for each sgRNA pair, in which the editing of the RPL39L locus was validated both by qPCR amplification with probes that flanked the edited region, and by measuring the protein expression with targeted proteomics (Figure 2B , C). All four clones (labeled as 1.17, 1.20, 2.9 and 2.11) exhibited homozygous deletions of the RPL39L locus, resulting from a spectrum of editing events including seemingly distinct deletions on the two chromosomes in clone 2.11 ( Supplementary Figure S2 ). The RPL39L protein was undetectable in all of the edited clones except for 1.17, where some residual protein expression was detected.

Characterization of RPL39L KO mESCs. (A) Schema of sgRNA design. Two distinct pairs of sgRNAs (in green and blue) were designed to target flanking regions of the RPL39L coding sequence (CDS, shown as a red box). Primers (black lines) from further upstream and downstream in the RPL39L locus were used for amplification, and the expected sizes of the PCR products are indicated for both the WT locus (1157 nts) and the edited loci (587 and 692 nts, respectively, for the sgRNA sets 1 and 2). (B) PCR products from the WT E14 cells and the 4 independent clones, 1.17 and 1.20 generated with sgRNA set #1 and 2.9 and 2.11 generated with sgRNA set #2. (C) Expression of RPL39L in individual clones determined with targeted proteomics (n = 3 for each clone). (D) Representative bright field images of colonies from all analyzed clones. (E) Results of the ethynyldeoxyuridine (EdU, thymidine analog) incorporation assay. EdU-treated cells were fixed, permeabilized and the AF488 fluorophore was linked to the EdU in the replicated DNA by click-chemistry. The label intensity was measured by FACS (mean fluorescence intensity, MFI). (F) Results of the annexin binding assay. Cells were incubated with AF488-conjugated Annexin V and counterstained with phycoerythrin (PE). The proportion of AF488⁺PE⁻ (apoptotic) cells relative to the parent cell population was determined by FACS. (G) RT-qPCR of the pluripotency factors SOX2, OCT4 and NANOG. Values are 2^(−ΔΔCt), relative to RRM2 (internal reference) and to WT. (H) Representative Western blots of pluripotency markers SOX2, OCT4 and NANOG in the RPL39L KO lines and WT relative to GAPDH. In panels F and G, *, ** and *** correspond to P-values < 0.05, < 0.01 and < 0.001, respectively, in the two-tailed t-test comparing KO lines with WT.

The RPL39L KO cells were viable in culture, forming colonies that were similar in morphology to those formed by the WT mESCs (Figure 2D). Analysis of 5-ethynyl-2′-deoxyuridine (EdU) incorporation during 2 h of treatment did not reveal significant differences between KO and WT cells, indicating that the RPL39L KO does not significantly impact proliferation in mESCs (Figure 2E). Annexin V staining showed only a small, though statistically significant, increase in the proportion of apoptotic cells relative to WT (Figure 2F). Despite no obvious changes in colony morphology, the expression of pluripotency markers was significantly altered: the Sox2 and Oct4 levels were reduced in KO compared to WT E14 lines, while Nanog showed a less consistent reduction (Figure 2G). These differences were also apparent at the protein level (Figure 2H). Thus, RPL39L KO lines are viable, but have reduced expression of pluripotency markers relative to WT mESCs.
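The RT-qPCR values in Figure 2G are reported as 2^(−ΔΔCt), relative to the internal reference RRM2 and to WT. The arithmetic behind such a value can be sketched as follows; the function name and the Ct values in the usage note are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target_ko, ct_ref_ko, ct_target_wt, ct_ref_wt):
    """Return 2^(-ddCt) for a target gene in a KO sample relative to WT.

    All arguments are threshold-cycle (Ct) values; 'ref' denotes the
    internal reference gene measured in the same sample.
    """
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_ko = ct_target_ko - ct_ref_ko
    delta_ct_wt = ct_target_wt - ct_ref_wt
    # Compare the normalized KO value with the normalized WT value.
    delta_delta_ct = delta_ct_ko - delta_ct_wt
    # One extra cycle corresponds to a two-fold lower starting amount,
    # assuming perfect amplification efficiency.
    return 2.0 ** (-delta_delta_ct)
```

With these made-up inputs, `relative_expression(25.0, 20.0, 24.0, 20.0)` gives 0.5, i.e. half the WT level. The sketch assumes 100% amplification efficiency for both primer pairs.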

RPL39L KO mESCs exhibit differentiation defects

The perturbed expression of pluripotency markers in RPL39L KO cells prompted us to further investigate their ability to differentiate. Given the relevance of RPL39L for the spermatogenic lineage (11, 64), we first subjected WT and RPL39L KO E14 lines to in vitro differentiation along this lineage using a previously described protocol (33). The emergence of spermatocyte-like cells in the WT E14 mESC culture demonstrated that the protocol works as expected (Figure 3A). In contrast, we did not identify any spermatocyte-like cells in the cultures of RPL39L KO lines, and the expression of the germ-cell lineage markers Stella and Dazl was significantly reduced relative to differentiating WT cells (Figure 3B). These results show that RPL39L is necessary for spermatogenesis in vitro, consistent with previous observations in mice (11, 64). To determine whether the KO of RPL39L impacts other developmental lineages as well, we carried out spontaneous differentiation of all our mESC lines in vitro, by culturing the cells in leukemia inhibitory factor (LIF)-free medium under non-adherent conditions (65). We then analyzed the expression of lineage markers both by qRT-PCR and by immunofluorescence-based quantification of protein levels in embryoid body sections (Figure 3C–F). The qRT-PCR revealed decreased expression of the extraembryonic endoderm markers GATA6, GATA4 and DAB2 in the organoids generated from the KO cells relative to those generated from WT cells (Figure 3C, E) and increased expression of the ectodermal lineage markers NESTIN, FGF5 and PAX6 (Figure 3D, E). The protein level quantification from confocal images of NESTIN (ectoderm) and GATA4 (endoderm) validated the mRNA-level results (Figure 3F). Thus, the KO of RPL39L impacts spontaneous differentiation of mESCs, primarily towards the ectoderm and endoderm.

RPL39L KO leads to differentiation defects in E14 mESCs. (A) Representative bright field images showing the spermatogenic differentiation of WT E14 cells. Red arrows indicate spermatocyte-like cells. (B) qRT-PCR assays of DAZL (late) and STELLA (early) sperm cell markers (33) (y-axis, log2 fold-change) in differentiating (4 days in RA-containing medium) populations of KO clones (x-axis) relative to WT. (C) qRT-PCR of extraembryonic endoderm markers (DAB2, GATA4, GATA6) (65) in RPL39L KO lines relative to WT. (D) Similar for FGF5, NESTIN, and PAX6 ectoderm markers (65). (E) Immunofluorescence staining of embryoid bodies subjected to spontaneous differentiation: GATA4 was used as endoderm marker, NESTIN as ectoderm marker and DAPI to delineate the nucleus. (F) Quantification of GATA4 and NESTIN expression in immunofluorescence images. AFU - arbitrary fluorescence units normalized to DAPI. In all panels, *, ** and *** correspond to P-values <0.05, <0.01 and < 0.001, respectively in the two tailed t-test comparing KO lines with WT.

RPL39L KO lines exhibit perturbed protein synthesis and ER stress

Because RPL39L is a ribosome component, we evaluated the effect of RPL39L KO on global translation to uncover the mechanisms underlying the observed functional defects. We generated polysome profiles from all clones and found that the KO of RPL39L leads to a small but statistically significant increase in the polysome-to-monosome ratio (Figure 4A, B). This indicates that RPL39L KO leads either to a small increase in the rate of translation (66) or to an elongation defect (67) in mESCs.

Impact of RPL39L KO on mRNA translation. (A) Example polysome profiles from the WT, 1.20 and 2.9 RPL39L KO E14 cell lines. (B) Ratio of the area under the profile corresponding to polysomes vs. monosomes (80S), in polysome profiles obtained from the KO clones. Fold-changes were calculated relative to the median ratio in the corresponding WT (dashed line at 1, n = 3 for all cell lines). (C) Log2 fold-changes in the translation efficiency (TE), mRNA level and the number of ribosome protected fragments (RPF) for specific genes, in mutant clones relative to WT E14 cells (n = 3 for each clone). Shown are all genes with a significant change in TE in at least one of the RPL39L KO clones. Values are capped at −1 and 1. (D, E) Gene Ontology analysis of mRNAs with reduced (D) and increased (E) TE in RPL39L KO clones. (F) Representative western blots and corresponding quantification (from n = 3 for each clone) of UPR markers PERK (phospho-Thr980) and EIF2A (phospho-Ser51). Intensities of phosphorylated proteins were normalized by the respective unphosphorylated forms and are relative to WT, for which the relative phosphorylation level was set to 1. (G) Representative western blot and corresponding quantification (from n = 3 for each clone) showing lower global O-GlcNAc modification of proteins in RPL39L KO lines when compared to the WT. Values are relative to GAPDH (loading control) and WT (level set to 1). In all panels, *, ** and *** correspond to P-values <0.05, <0.01 and < 0.001, respectively in the two tailed t-test comparing KO lines with WT.
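The polysome-to-monosome ratio in panel B is described as a ratio of areas under the profile, expressed as a fold-change relative to the median WT ratio. A minimal sketch of that calculation is given below. The trapezoidal integration, the function names and the window boundaries are assumptions made for illustration; the source does not state how the areas were integrated or where the 80S and polysome regions were delimited.

```python
import statistics


def trapezoid_area(positions, absorbance, start, end):
    """Trapezoidal area under the profile, using the points in [start, end]."""
    points = [(x, y) for x, y in zip(positions, absorbance) if start <= x <= end]
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def polysome_to_monosome(positions, absorbance, mono_window, poly_window):
    """Ratio of the polysome area to the monosome (80S) area.

    mono_window and poly_window are hypothetical (start, end) gradient
    positions chosen by the analyst.
    """
    mono = trapezoid_area(positions, absorbance, *mono_window)
    poly = trapezoid_area(positions, absorbance, *poly_window)
    return poly / mono


def fold_change_vs_wt(ko_ratios, wt_ratios):
    """Express KO ratios relative to the median of the WT ratios."""
    wt_median = statistics.median(wt_ratios)
    return [ratio / wt_median for ratio in ko_ratios]
```

A fold-change above 1 then corresponds to a higher polysome-to-monosome ratio in the KO clone than in the WT reference (the dashed line at 1 in the figure).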

To determine whether some transcripts are specifically impacted in translation by the RPL39L KO, we carried out ribosome footprinting ( 34 ), sequencing ribosome-protected mRNA fragments (RPFs) from both WT and RPL39L KO mESCs. The RPF data fulfilled expected quality criteria such as the vast majority of reads mapping to the coding regions of mRNAs and the 3 nucleotide periodicity of inferred P site locations ( Supplementary Figure S3 ). We further sequenced the mRNAs from these cells and calculated the translation efficiency (TE) per mRNA as the ratio of the RPF and mRNA read density along the CDS (see Materials and methods). Focusing on mRNAs whose TE was significantly altered ( P -value < 0.01 in ΔTE test, see Materials and methods) in at least one of the KO lines relative to the WT, we found that while the differences in RPFs and TE in KO clones relative to WT were generally small, the direction of change was highly consistent among specific classes of mRNAs (Figure 4C ). Gene Ontology analysis showed that transcripts encoding components of the endomembrane system, including the Golgi apparatus and the endoplasmic reticulum (ER) experienced the strongest reduction in TE across the KO lines (Figure 4D ). In contrast, we found a significantly increased TE for transcripts associated with the cellular response to chemical and oxidative stress (Figure 4E ). Examples of increased and decreased RPF coverage of specific genes are shown in Supplementary Figure S3 . These results suggest that the production of specific classes of proteins, associated with subcellular compartments such as the ER and Golgi apparatus, is impaired in RPL39L KO cells.
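The translation efficiency described above, the ratio of RPF to mRNA read density along the CDS, can be illustrated with a short sketch. The normalization to reads per kilobase per million mapped reads, the function names and the counts in the usage note are assumptions for illustration. The sketch does not reproduce the ΔTE significance test referred to in Materials and methods.

```python
import math


def read_density(read_count, cds_length_nt, library_size):
    """Reads per kilobase of CDS per million mapped reads (assumed normalization)."""
    return read_count / (cds_length_nt / 1000.0) / (library_size / 1e6)


def translation_efficiency(rpf_count, mrna_count, cds_length_nt,
                           rpf_library_size, mrna_library_size):
    """TE of one mRNA: RPF density divided by mRNA density along the CDS."""
    rpf = read_density(rpf_count, cds_length_nt, rpf_library_size)
    mrna = read_density(mrna_count, cds_length_nt, mrna_library_size)
    return rpf / mrna


def capped_log2_te_change(te_ko, te_wt, cap=1.0):
    """log2 fold-change of TE (KO over WT), capped to [-cap, cap] for display."""
    change = math.log2(te_ko / te_wt)
    return max(-cap, min(cap, change))
```

For example, an mRNA with 200 RPF reads and 100 RNA-seq reads over a 1000-nt CDS, in libraries of one million reads each, has a TE of 2.0. The capping mirrors the display range of −1 to 1 mentioned in the caption of Figure 4C.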

To further elucidate the relationship between the RPL39L KO and cellular stress, we measured the levels of both phosphorylated PKR-like ER kinase (PERK), a sensor of ER stress ( 68 ) and of its phosphorylation target, the eukaryotic initiation factor 2a (EIF2A) ( 69 ), which is responsible for regulation of many unfolded protein response (UPR)-associated stress-response genes. Western blotting showed that both of these markers were elevated in RPL39L KO compared to WT E14 cells (Figure 4F ). We further used an O-GlcNAc antibody to label the glycosylated proteins from all our cell lines on a western blot. Integrating the chemiluminescence signal over the entire lanes corresponding to specific cell lines, we found reduced global signals in the samples from RPL39L KO cell lines compared to the WT E14 line (Figure 4G ). Thus, ER stress markers are upregulated and protein glycosylation is impaired in RPL39L KO mESC lines relative to WT.

Increased degradation underlies the perturbed protein levels in RPL39L KO cell lines

To learn more about the protein synthesis in RPL39L KO and WT cells, we measured the protein levels, both in steady-state and upon inhibition of proteasome and autophagy-dependent protein degradation (by treatment with MG132 and Bafilomycin A1, respectively), by shot-gun mass spectrometry. Overall, we identified 5026 proteins, 2992 of which were detected in both treated and untreated conditions (Supplementary Figure S4). Strikingly, many proteins whose expression was perturbed by the RPL39L KO were restored to levels similar to those in WT by the inhibition of protein degradation, as indicated by the narrower distribution of log2 fold-changes of KO relative to WT cells in protease inhibitor-treated compared to untreated cells (Figure 5A). Along with the observed ER stress, this points to a reduced stability of proteins in RPL39L KO lines, which in turn suggests an increased production of defective proteins. This could be due to translation errors, e.g. amino acid misincorporation, or to defects in co-translational protein folding. The proteins whose reduced level in a KO clone relative to WT was restored by the treatment with protease inhibitors are shown in Figure 5B, which demonstrates their consistent behavior across KO clones. Western blotting further confirmed these results (Figure 5C, D). We further validated that the majority of the effect came from Bafilomycin A1, while MG132 contributed little, if at all, to rescuing the levels of proteins that are degraded in the RPL39L KO cells (Supplementary Figure S5). Gene Ontology analysis revealed enrichments in categories associated with the cytoskeleton and microtubules, indicating that the proteins stabilized by the protease inhibitor treatment are components of cellular membranes, contributing to cytoskeletal organization (Supplementary Figure S4). With a previously developed tool that specifically searches for amino acid substitutions in the measured peptides (49), we were unable to detect an enrichment of misincorporation in these proteins (Supplementary Figure S4). Thus, the most plausible explanation for the observed proteome changes is that RPL39L ribosomes facilitate the co-translational folding of specific proteins, and that their absence in the RPL39L KO clones leads to the production of defective proteins with compromised stability.

RPL39L KO clones exhibit enhanced degradation of specific classes of proteins. (A) Distribution of log2 fold changes in protein levels between untreated (blue) or MG-132 + Bafilomycin-A1-treated RPL39L KO and WT cells. Each panel corresponds to one KO clone. Three biological replicates for each condition were used to calculate average protein abundance levels and respective fold-changes relative to WT. (B) Heatmap of protein-level Δlog2 fold changes in KO cells relative to WT between untreated and treated cells. Included are all proteins with a significant downregulation in at least one of the untreated KO clones. (C) Representative western blot results showing the expression of a subset of proteins from (B) in untreated (left) and MG-132 + Bafilomycin-A1-treated cells (right). (D) Quantification of western blots as shown in (B), from three replicates for each protein and each condition.

Conformational differences between RPL39/RPL39L ribosome peptide exit tunnels

To understand how RPL39L could influence the dynamics of translation and the co-translational folding of proteins we sought to analyze RPL39L and RPL39-containing ribosomes by cryo-electron microscopy (cryo-EM). Mouse RPL39L has only three substitutions relative to RPL39: S2A, R28Q and R36M. Since these differences are subtle, structural investigations of RPL39/RPL39L-ribosomes necessitate high-resolution structures, which can only be obtained from homogeneous (RPL39 or RPL39L only) samples. Obtaining pure populations of RPL39L-containing ribosomes has been an unsolved challenge ( 11 ), which we have decided to overcome by expressing the mouse RPL39 and RPL39L in yeast RPL39 KO cell lines (Figure 6A ). In yeast, just like in mammals, residues R28 and R36 of RPL39 face the lumen of the NPET and are in direct proximity to the nascent polypeptide chain.

RPL39L introduces a hydrophobic patch in the NPET. (A) Scheme of yeast RPL39 KO and insertion of mouse RPL39 and mouse RPL39L. (B) The RPL39 KO causes growth defects in yeast upon environmental challenges like growth at 23 degrees Celsius in HC-His media. Expression of mouse RPL39 or RPL39L rescues these phenotypes. +EV are cells transformed with an empty vector as control. The experiments were carried out in two independent RPL39 knockout clones (#1 and #2). (C) RPL39L (cartoon representation, orange) shown in context of the large ribosomal subunit (refined atomic model of rpl39Δ-MmRpl39l; rRNAs are shown in gray, proteins are shown in pale blue surface representation). RPL39L is embedded inside the 60S subunit, adjacent to the exit of the NPET. Part of the 5.8S rRNA and ribosomal proteins have been removed for clarity on the center and right panel. Nascent protein chains and regulatory protein complexes pass the NPET in direct proximity to RPL39L as evident from aligned structures containing NPET-bound chains (nascent chains and regulatory proteins, PDB-IDs: 6M62, 7OBR, 7TM3, 7TUT, 7QWQ, 7QWS, shown as purple semi-transparent surfaces). (D) Side view of RPL39L (orange) and the surrounding 5.8S rRNA (gray) in direct contact with the NPET-facing region of RPL39L. The region containing Q28 and M36 is located directly adjacent to protein chains localized in the lumen of the NPET in structures containing nascent chains or tunnel-bound regulatory complexes (protein chain models shown as purple semi-transparent surfaces). (E) Comparison of the atomic model in the immediate surrounding of R/Q28 and R/M36 in WT yeast RPL39 (cyan), mouse RPL39 (dark blue), and mouse RPL39L (orange), shown in stick representation. The experimental cryo-EM density (semi-transparent surface, light blue) is shown superimposed on the refined atomic model. 
Maps around R28 are shown at a threshold of 6.5σ (yRPL39), 5.5σ (mRPL39), and 4.25σ (mRPL39L), while maps around R/M36 at a threshold of 5σ (yRPL39), 4σ (mRPL39), and 3.75σ (mRPL39L). Experimental cryo-EM density around M36 in mRPL39L is substantially weaker than the density observed either in yeast or mouse RPL39, due to increased conformational heterogeneity. (F) At a lower threshold, an alternative conformation is apparent in the experimental cryo-EM density (blue surface) of the region around M36 in mRPL39L, as RPL39L adopts two alternative conformations that differ substantially relative to the protein backbone and side chains. (G) In both WT yRPL39 (atomic model, stick representation, cyan) and mRPL39 (dark blue), the side chain of R36 faces the lumen of the NPET, potentially in direct contact with the protein chains inside the NPET (fitted chains of nascent protein and regulatory complexes, shown as purple semi-transparent surfaces). In mRPL39L, side chains of M36 and I35 hydrophobic residues are facing the NPET chains, forming a hydrophobic spot inside the NPET.

Thus, the replacement of R28 by Q28 and R36 by M36 in RPL39L, that is, of positively-charged residues by polar and hydrophobic residues, could perturb the co-translational folding or localization of proteins. The high degree of structural conservation of the 60S ribosomal subunit core in general, and of the region directly surrounding RPL39 in particular (mouse: RMSD = 0.52 Å, human: RMSD = 0.25 Å) further justifies the use of a heterologous system ( 70 ).
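The RMSD values quoted here summarize how far matched atoms lie from each other in two structures. As a minimal sketch of the quantity itself, the function below computes the RMSD between two coordinate sets that are assumed to be already superposed and paired atom by atom. The function name and the coordinates in the usage note are hypothetical, and the sketch does not perform the structural alignment that a structure-analysis package would carry out first.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates in Å.

    Assumes both structures are already superposed and that the i-th atom
    in coords_a corresponds to the i-th atom in coords_b.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))
```

For two atoms where one pair coincides and the other is displaced by 2 Å, `rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)])` returns the square root of 2, about 1.41 Å. Sub-ångström values such as those reported here therefore indicate nearly identical atomic positions.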

An increased rate of translation errors reduces the growth of RPL39 KO ( rpl39 Δ) yeast strains at cold temperatures and sensitizes the cells to translation-interfering drugs like paromomycin ( 71 ) and azetidine-2-carboxylic acid (AZC) ( 72 ). We replicated these phenotypes in two distinct yeast RPL39 KO clones and further showed that they are rescued by the introduction of yRPL39 (rpl39Δ-ScRPL39 clones), mouse RPL39 (rpl39Δ-MmRPL39 clones) and mouse RPL39L (rpl39Δ-MmRPL39L clones) (Figure 6B and Supplementary Figure S6 ). This indicates that both mRPL39 and mRPL39L occupy the place of yRPL39 in the NPET and have a similar capacity to ensure translation accuracy ( 71 ) and promote co-translational folding ( 72 ) in yeast. The results also suggest that RPL39L shares the ancestral function of RPL39, though it may have acquired additional functions following its emergence in mammalian species.

Ribosomal particles from the WT yeast strain Sc60S, the yeast RPL39 KO strain rpl39Δ, the yeast rescue strain rpl39Δ-ScRPL39 and the mouse RPL39/RPL39L-complemented RPL39 KO strains rpl39Δ-MmRPL39 and rpl39Δ-MmRPL39L were studied by single-particle cryo-EM ( Supplementary Figures S7 , S8 ). The structure of the rpl39Δ ribosome demonstrated the absence of RPL39 in the KO strain, with an otherwise structurally unaltered 60S subunit (overall RMSD = 0.308 Å), including the region surrounding RPL39 (RMSD = 0.38 Å) ( Supplementary Figure S8 ). Expression of yRPL39 in an rpl39Δ background resulted in a correctly assembled 60S subunit that did not differ significantly in structure from the wild type 60S (RMSD = 0.192 Å), corroborating the feasibility of the complementation approach ( Supplementary Figure S8 ). Mouse RPL39 and RPL39L both integrate into yeast 60S subunits when expressed in the rpl39Δ background ( Supplementary Figure S8 ). Mouse RPL39 adopted a structure that was highly similar to that of RPL39 (RMSD = 0.482 Å) in the WT yeast ribosomes ( Supplementary Figure S8 ). The structure of RPL39 in mouse (PDB: 6SWA, RMSD = 0.546 Å, ( 73 )) and in human ribosomes (PDB: 7OW7 ( 74 ), RMSD = 0.451 Å) is also in agreement with the structure of rpl39Δ-MmRPL39, suggesting that our heterologous system adequately reflects the structure and conformation of mammalian RPL39 ( Supplementary Figure S7 ). The comparison of rpl39Δ-MmRPL39 and rpl39Δ-MmRPL39L maps shows that the amino acids S2 in RPL39 and A2 in RPL39L are found at the same position, deeply embedded in rRNA 1490–1493, where they serve a structural role. The loop containing R28 and R36, exposed to the lumen of the NPET (Figure 6C – E ), exhibits weaker cryo-EM density than the remainder of the protein in the map of rpl39Δ-MmRPL39. 
The exchange of R28 and R36 with glutamine and methionine, respectively, in RPL39L further increases the flexibility (Figure 6D - E ) and enables the loop (residues 32–37) to adopt an alternative conformation where the Cɑ atom of I35 is displaced by 5.2 Å towards the exit of the NPET (Figure 6E - F ). The positions of the basic R28 side chain in RPL39 and polar Q28 side chain in RPL39L are nearly identical. While residues M29, K30, T31, and G32 adopt the same conformation in RPL39 and RPL39L, the side chain of N33 has a slight displacement of Cγ: 1.87 Å. The side chain of K34, which is located in direct proximity to the phosphate backbone of A351 in RPL39, protrudes into the lumen of the tunnel in RPL39L. However, a partial occlusion of the tunnel appears unlikely as the side-chain cryo-EM density for K34 is very weak. The position of I35, which is facing towards the wall of the tunnel (A351 and A42) in RPL39, is occupied by M36 in RPL39L, leaving I35 oriented towards the lumen of the tunnel in RPL39L in the alternative conformation (Figure 6G ). These observed structural changes do not substantially alter the overall electrostatic surface potential of the tunnel-exposed surface of RPL39/RPL39L, which is overwhelmingly positive in both cases. However, they do affect a narrow part of the ribosomal tunnel, traversed by the nascent chain (Figure 6D , E). Specifically, in RPL39L, Q28 and M36/I35 are found directly adjacent to each other at the location occupied by R28 and R36 in RPL39, introducing a clearly defined hydrophobic patch at the surface of the tunnel (Figure 6G ). This may influence the translation speed, efficiency or the co-translational folding, either via direct interaction with the nascent chain (Figure 6G ) or via interaction with translation regulation machinery such as the NAC complex, which was shown to bind to RPL39 ( 75 ).
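
The RMSD values quoted above summarize how far matched atoms lie from each other in two superposed models. As a minimal sketch of that calculation, assuming the two models have already been superposed and their atoms matched one-to-one (the superposition step itself is not shown, the coordinates below are hypothetical, and this is not the pipeline used in the study):

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumes both models are already superposed in the same frame and that
    atom i in coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha coordinates in angstroms, for illustration only
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_b = [(0.0, 0.3, 0.0), (1.5, 0.0, 0.4), (3.0, 0.0, 0.0)]
print(f"RMSD = {rmsd(model_a, model_b):.3f} A")
```

A single displaced atom, such as the 5.2 Å shift of the I35 Cα atom, contributes its squared displacement to the sum, which is why a local rearrangement can coexist with a small overall RMSD when many atoms are compared.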

Altered translation dynamics of RPL39L KO-destabilized proteins

To determine how these conformational changes in the NPET may impact co-translational processes, we returned to the ribosome footprinting data, as such data were previously used to investigate multiple aspects of protein synthesis, including co-translational folding (reviewed in ( 76 )). We estimated the change in ribosome dwell time on individual codons between WT and RPL39L KO clones as described before ( 77 ), constraining for the presence of a specific amino acid at a specific distance from the P site. We tested all 20 amino acids and all distances between 20 and 45 amino acids ( Supplementary Figure S9 ). To identify consistent patterns of codon dwell time changes in independent RPL39L KO clones, we calculated the cosine similarity scores between dwell time changes with respect to WT for pairs of clones and then the average across all pairwise comparisons. The highest average score, indicating the most consistent change in dwell times across RPL39L KO clones, came from the amino acid leucine, when located at position +31 with respect to the P site. This is illustrated in Figure 7A , which shows that the presence of a leucine at position +31 leads to longer dwell times for leucine codons and shorter dwell times for the CAC codon of histidine. Histidine and leucine are amino acids that are enriched in helix-turn-helix, zinc finger, and leucine zipper coiled-coil motifs, which occur in proteins of diverse functions ( 78–80 ). Seven of the 39 (18%) proteins that were consistently destabilized in the KO lines contained coiled-coils, which is ∼2-fold higher than the ∼8% estimated frequency of coiled-coil domain-containing proteins in the human proteome ( 81 ). We also estimated the proportion of amino acids involved in coiled-coil domains as a function of their relative location in the sequence and then compared this fraction between proteins that were consistently destabilized in the KO lines and those that did not change. 
A Mann–Whitney U test showed a significantly higher fraction of coiled-coil structure in the C-terminal half of presumed RPL39L targets compared to non-targets (Figure 7B ). Interestingly, in the Human Protein Atlas, six of the seven proteins (SMC3, KIF23, KIF5B, KIF20A, DLGAP5, BCAP31) have spermatocytes and/or trophoblast as the tissues of highest expression (Figure 7C ). These are the tissues where RPL39L expression is also highest. Together with the ribosome profiling and protein stability data, these observations suggest a role of RPL39L ribosomes in the co-translational folding of a subclass of proteins.
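
The consistency score described above (cosine similarity of dwell-time changes for each pair of KO clones, averaged over all pairs) can be sketched as follows. This is an illustrative reconstruction under stated assumptions: each clone is represented by a vector of per-codon dwell-time changes relative to WT, in the same codon order for every clone. The numbers are hypothetical, and how the changes were scaled in the study (for example as differences or log ratios) is not specified here.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(changes_by_clone):
    """Average cosine similarity over all unordered pairs of clones."""
    pairs = list(combinations(sorted(changes_by_clone), 2))
    if not pairs:
        raise ValueError("at least two clones are required")
    scores = [
        cosine_similarity(changes_by_clone[a], changes_by_clone[b])
        for a, b in pairs
    ]
    return sum(scores) / len(scores)


# Hypothetical dwell-time changes (KO relative to WT) for three codons,
# keyed by the clone labels used in Figure 7A
toy_changes = {
    "1_17": [0.4, 0.3, -0.5],
    "1_20": [0.5, 0.2, -0.4],
    "2_9": [0.3, 0.3, -0.6],
    "2_11": [0.4, 0.1, -0.5],
}
score = mean_pairwise_cosine(toy_changes)
print(f"mean pairwise cosine similarity = {score:.3f}")
```

Repeating this for every amino acid and every distance from the P site yields one score per condition; the condition with the highest score is the one where the clones agree most in the direction and relative size of their dwell-time changes.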

Altered codon dwell times in RPL39L KO relative to WT clones. (A) Change in codon dwell time in the RPL39L KO clones relative to WT, conditioned on the presence of leucine at position +31 in the NPET. For each amino acid the 4 columns correspond to the 4 independent clones, shown always in the same order: 1_17, 1_20, 2_9 and 2_11. (B) Location-dependent P-values in the Mann–Whitney U test comparing the fractions of amino acids in coiled-coil structures within 100 length bins in presumed RPL39L targets (n = 39) and non-target proteins (n = 4720). (C) Expression level (TPM) of mRNAs encoding proteins with coiled-coil domains that are destabilized in the RPL39L KO clones. Shown are values obtained by scRNA-seq of various cell types in the Human Protein Atlas. The gene names are indicated at the top of each panel and the cell types are labeled on the x-axis.
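
The per-bin comparison in panel B can be illustrated with a small, self-contained Mann–Whitney U test. The sketch below uses the normal approximation with tie and continuity corrections, which is one common variant and not necessarily the one used to produce the figure. The per-protein coiled-coil fractions are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average rank
        avg_rank = (i + 1 + j) / 2.0
        members_of_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += avg_rank * members_of_x
        t = j - i
        tie_term += t ** 3 - t
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # all values tied: no evidence of a difference
    z = max(abs(u_x - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p_value = min(math.erfc(z / math.sqrt(2.0)), 1.0)
    return u_x, p_value


# Hypothetical per-protein coiled-coil fractions within one sequence bin
targets = [0.40, 0.55, 0.10, 0.70, 0.35]
non_targets = [0.00, 0.05, 0.10, 0.00, 0.20, 0.00, 0.15]
u_stat, p_value = mann_whitney_u(targets, non_targets)
print(f"U = {u_stat:.1f}, two-sided p = {p_value:.3f}")
```

Running the same test in each bin along the protein sequence gives one P-value per location, which is the quantity plotted in panel B. For small samples an exact test is preferable to the normal approximation.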


mRNA translation is carried out by the ribosome, a highly conserved molecular machine dating back to the last universal common ancestor of cellular life ( 82 ). Although a ribosome can, in principle, translate any of the mRNAs expressed in a cell, the protein output varies among mRNAs, dependent on the efficiency of translation initiation, the speed of peptide chain elongation ( 83 , 84 ), and factors such as the total number of ribosomes in the cell ( 85–87 ). In recent years, a variety of factors have been found to modify the protein output of mRNAs at the level of ribosomes, among them ribosomal protein paralogs (reviewed in ( 88 )). The human and mouse genomes encode around 20 RP paralogs, which have a more restricted distribution across tissues than canonical RPs ( 2 ). RPL39L is a recently evolved RP paralog ( 1 ) that has been shown to be critical for male fertility ( 11 , 64 ). The RPL39L mRNA is also expressed outside of the male germ line, in normal and cancer cells ( 2 , 12 , 13 ), where its role is still unknown. Furthermore, while RPL39L has been implicated in the folding of long-lived sperm proteins ( 11 , 64 ), how this is achieved is unclear.

To answer these questions, we generated RPL39L mESC KO lines by CRISPR-mediated genome editing. ESCs naturally express RPL39L , as we have found in analyses of bulk and scRNA-seq datasets. To confirm RPL39L expression at the protein level, we developed a mass spectrometry-based approach wherein lysine residues are acetylated proteome-wide to direct the tryptic cleavage to arginine residues. With its high content of lysines and arginines (31% of 51 amino acids), RPL39L is almost completely digested by trypsin during standard sample preparation for mass spectrometry, leaving only incompletely cleaved peptides amenable for detection. This may explain why prior mass spectrometric analyses have not detected this protein in tissues with a relatively low frequency of RPL39L-expressing cells. If RPL39 and RPL39L carry out their functions as part of ribosomes, as has been suggested before ( 11 , 89 ), the RPL39L/RPL39 ratios that we estimated imply that ∼1% of the ribosomes in pluripotent cells contain RPL39L. Indeed, mass spectrometry data of ribosomes purified by sucrose cushioning corroborate this result. Moreover, single cells that contain RPL39L reads also contain RPL39 reads, which indicates that pluripotent cells have heterogeneous RPL39/RPL39L ribosome populations. The relatively high expression of RPL39L mRNA in extravillous trophoblast, breast glandular and pancreatic endocrine cells from the Human Protein Atlas, as well as multiple cancer types, suggested that RPL39L may support protein synthesis in the context of secretory processes.
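
The rationale for blocking lysines can be illustrated with a simplified in silico digestion. The sketch below assumes an idealized trypsin that cleaves after every lysine (K) and arginine (R), ignores the proline rule and missed cleavages, and models lysine acetylation simply as restricting cleavage to arginine. The sequence is a hypothetical lysine- and arginine-rich toy peptide, not the RPL39L sequence.

```python
def tryptic_peptides(sequence, cleave_after="KR"):
    """Split a protein sequence after every residue listed in cleave_after."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def basic_fraction(sequence):
    """Fraction of residues that are lysine or arginine."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    return sum(sequence.count(r) for r in "KR") / len(sequence)


# Hypothetical toy sequence, for illustration only
toy_sequence = "MASHKTFRIKRLAKKQNRGIW"

standard = tryptic_peptides(toy_sequence)                 # cleave after K and R
lys_blocked = tryptic_peptides(toy_sequence, cleave_after="R")  # acetylated lysines

print(f"K+R fraction: {basic_fraction(toy_sequence):.2f}")
print("standard digest:", standard)
print("lysines blocked:", lys_blocked)
```

With cleavage restricted to arginine, the same sequence yields fewer and longer peptides, which are more likely to fall in a mass range suitable for detection. For scale, 31% of 51 residues corresponds to about 16 lysines and arginines.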

By morphological and molecular characterization of KO cell lines, we demonstrated that RPL39L contributes to mESC pluripotency and differentiation along multiple lineages. Like RPL10L, RPL39L is a recently evolved autosomal paralog of an X chromosome-encoded RP. These paralogs are thought to take part in the formation of testis-specific ribosomes, necessary for spermatogenesis ( 89 ). Indeed, RPL10L has been shown to compensate for the reduced level of RPL10 during meiotic sex chromosome inactivation ( 10 ), while mice deficient in RPL39L exhibit spermatogenesis defects ( 11 , 64 ). We found that RPL39L KO mESCs are not only impaired in the ability to differentiate into sperm cells but also show defects in spontaneous differentiation. The expression of ectoderm markers was increased and that of endoderm markers decreased in embryoid bodies derived from RPL39L KO mESCs compared to those derived from WT cells. Whether the defects that we observed in vitro can also be detected in early mouse development has not been investigated ( 11 , 64 ).

To evaluate the impact of RPL39L on translation, we carried out ribosome profiling ( 34 ). Consistent with previous studies ( 11 , 64 ), we found a small upregulation in the polysome-to-monosome ratio in the KO compared to WT clones, which indicates an increased protein synthesis rate. At the level of individual mRNAs, the changes in translation efficiency were very consistent across KO clones and occurred in mRNAs with specific functions. ER and Golgi-associated membrane proteins were generally less translated, while stress response and oxidative damage-related proteins were more translated in the RPL39L KO compared to WT mESCs. The increased phosphorylation of PERK and EIF2A further indicated that the stress was due to the ER-associated unfolded protein response (UPR) ( 90 ), consistent with an apparent increase in protein degradation that we also consistently found across all RPL39L KO clones. Pluripotent cells have higher resistance to ER stress than differentiated cells. However, the normal differentiation along the endodermal and ectodermal lineages requires UPR to be maintained at a specific level ( 91 ), which may explain why differentiation is perturbed as the mESCs experienced increased ER stress upon RPL39L KO.

We demonstrated that the reduced protein levels in RPL39L KO relative to WT mESCs were restored by the inhibition of autophagy, showing that protein stability is impaired in RPL39L KO cells, as also demonstrated by previous work ( 11 ). Some of the destabilized proteins have a testis bias of expression (according to the mRNA level data in the HPA), but the set more generally included proteins that are involved in cytoskeleton organization, cellular localization (specifically to membranes) and polarization. As we did not find an increased frequency of amino acid misincorporation in the KO clones, we investigated whether the decreased protein stability is due to alterations in other co-translational processes such as folding. RPL39 is known to interact with transmembrane alpha helices that fold in the NPET, but not with the alpha helices of secretory proteins ( 92 ). To elucidate the origin of these interactions, we turned to structural analyses. Obtaining pure populations of RPL39L ribosomes is very challenging and has not been achieved so far. Thus, to determine how RPL39/RPL39L impact the NPET structure, we overexpressed the mouse RPL39 and RPL39L in RPL39 KO yeast strains. The RPL39 KO yeast lines are sensitive to cold, paromomycin and AZC, phenotypes that have been described before ( 71 , 72 ). These phenotypes are rescued by both mouse RPL39 and mouse RPL39L, which indicates that RPL39L can perform the basic functions of RPL39. Analysis of single ribosome particles from these RPL39/RPL39L-expressing strains revealed that the main structural difference is a conformational change that involves the conserved residue I35, and leads to the appearance of a hydrophobic patch in the vestibular region of NPET, in the position where RPL39-containing ribosomes expose a positive charge. The loop containing Q28 and M36 further shows a higher flexibility compared to that provided by RPL39. 
Notably, we did not find electron densities supporting an occlusion of the tunnel region by RPL39L as proposed by a previous study ( 11 ). In fact, we did not find such evidence upon re-examining the published data either. Thus, it seems unlikely that RPL39L promotes the folding of specific proteins by occluding the NPET. What other mechanism could be responsible? Early studies hypothesized that a hydrophobic patch near the vestibular region of the NPET nucleates the co-translational folding of alpha helices ( 15 ). While the first structures of the 60S subunit of a eubacterial ribosome ( 93 ) did not identify such a patch ( 94 ), in prokaryotes, the uL23 RP, located at the vestibular region of the NPET, recruits the trigger factor a (TFa) ( 95 ) to provide the hydrophobic surface for folding of cytosolic hydrophobic proteins ( 96 ). In eukaryotes, RPL39 replaces a loop of uL23 in the lower part of the tunnel, making it considerably narrower ( 97 ), which is thought to promote the entropic stabilization of alpha helices ( 98 ), and probably reduces the need for the hydrophobic patch to promote nascent protein folding. However, a narrower tunnel also leads to a reduced speed and increased accuracy of translation, both of which are impacted by the deletion of RPL39 ( 71 ). It is remarkable that the distinguishing feature of RPL39L ribosomes is the re-emergence of the hydrophobic patch, provided by RPL39L instead of the TFa chaperone, as chaperone function has diversified during the evolution of eukaryotes ( 99 ).

Although our study was done in mESCs, we found that the RPL39L-sensitive proteins have a testis bias of expression. To better understand the properties of these proteins, we analyzed the dynamics of translation by calculating ribosome dwell times on individual codons and on codons encoding individual amino acids. We then conditioned this calculation on the presence of specific amino acids at specific distances upstream of the P-site of the ribosome. Alterations in codon dwell time relative to WT were consistent between clones, particularly when leucine was present at position +31. In this case (though also more generally) dwell times increased for leucine codons and decreased for the CAC codon of histidine in RPL39L KO cells compared to WT. These amino acids occur in helix-turn-helix, zinc finger, and leucine zipper coiled-coil motifs ( 100 ), which are found in amphiphilic alpha helices ( 101 , 102 ). The mRNA level data from HPA further show that proteins with coiled-coil domains that are destabilized in the RPL39L KO mESCs exhibit an expression pattern surprisingly similar to that of RPL39L , in that they have highest mRNA-level expression in the spermatogenic lineage and/or trophoblast. Although amphiphilic alpha helices can fold in RPL39-containing ribosomes, the presence of a hydrophobic patch and better hydrophilic/hydrophobic partitioning may allow a more reliable folding of such domains, which is important when the burden of protein production is high, as in secretory cells (glandular, endocrine etc.) or cells that undergo large changes in polarity (developing sperm cells, ES cells etc.) ( 103 , 104 ).

Our work elucidates the function of a specialized ribosome, defined by RPL39L and present in sperm cells, but also in other cell types such as mESCs. However, additional questions remain to be addressed in the future. First, the implications of RPL39L KO for in vivo animal development are insufficiently understood. As already mentioned, RPL39L-deficient mice obtained via CRISPR/Cas9 genome editing ( 11 , 64 ) were not investigated beyond the spermatogenesis defect. Second, at the molecular level, it will be interesting to investigate further how ES cells and cancer cells fine-tune their translation output and the quality of synthesized proteins using modular ribosome components.

Sequencing data have been deposited to the NCBI BioProject database ( https://www.ncbi.nlm.nih.gov/bioproject/ ), under accession PRJNA951511, and the mass spectrometry data to the MassIVE repository ( https://massive.ucsd.edu/ProteoSAFe/static/massive.jsp ), accession number MSV000091698. Processing scripts are available from the Zenodo repository, with DOI 10.5281/zenodo.7794007. Cryo-EM data have been deposited: Mouse RPL39L integrated into the yeast 60S ribosomal subunit (PDB ID 8PFR) and EMDB entry ID EMD-17653; Mouse RPL39 integrated into the yeast 60S ribosomal subunit (PDB ID 8P8N) and EMDB entry ID EMD-17550; Yeast 60S ribosomal subunit, RPL39 deletion (PDB ID 8P8M) and EMDB entry ID EMD-17549; Yeast 60S ribosomal subunit (PDB ID 8P8U) and EMDB entry ID EMD-17552.

Supplementary Data are available at NAR Online.

We are grateful to Dr Constance Ciaudo for suggestions regarding mESC differentiation, to Dr Arnaud Schieberich for the hBM-MSC cells, Dr Frederic Schmitt for the mouse samples, Dr. Sebastian Hiller and Dr Nenad Ban for insightful discussions on the ribosome structure, and Dr Vinay Tergaonkar for suggestions on the project. We thank the Imaging Core Facility at Biozentrum (IMCF), especially Laurent Guerard, Dr Sara Roig and Dr Kai Schleicher for help with the image acquisition and data analysis, and the sciCORE team for supporting the computational infrastructure. We would also like to thank Stella Stefanova and Janine Boegli of Biozentrum FACS core facility for help with the FACS data acquisition, and Philippe Demougin and Dr Christian Beisel from the genomics facility Basel for help with RNA sequencing. Cryo-EM data were collected in the Scientific Center for Optical and Electron Microscopy (ScopeM). We thank Miroslav Peterek (ScopeM) for support and Pavel Afanasyev (CEMK) for helpful discussions. This work was supported in part by the SNF grant #51NF40_141735 for the NCCR project ‘RNA & Disease’ in which the Zavolan group participated. The results shown here are in part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga . Graphical abstract was created with Biorender.com.

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung [51NF40_141735]. Funding for open access charge: Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung.

Conflict of interest statement . None declared.

Wong Q.W.-L. , Li J. , Ng S.R. , Lim S.G. , Yang H. , Vardy L.A. RPL39L is an example of a recently evolved ribosomal protein paralog that shows highly specific tissue expression patterns and is upregulated in ESCs and HCC tumors . RNA Biol. 2014 ; 11 : 33 – 41 .


Guimaraes J.C. , Zavolan M. Patterns of ribosomal protein expression specify normal and malignant human cells . Genome Biol. 2016 ; 17 : 236 .

Xiong X. , Zhao Y. , He H. , Sun Y. Ribosomal protein S27-like and S27 interplay with p53-MDM2 axis as a target, a substrate and a regulator . Oncogene . 2011 ; 30 : 1798 – 1811 .

Zhang Y. , O’Leary M.N. , Peri S. , Wang M. , Zha J. , Melov S. , Kappes D.J. , Feng Q. , Rhodes J. , Amieux P.S. et al. . Ribosomal proteins Rpl22 and Rpl22l1 control morphogenesis by regulating pre-mRNA splicing . Cell Rep. 2017 ; 18 : 545 – 556 .

Chaillou T. , Zhang X. , McCarthy J.J. Expression of muscle-specific ribosomal protein L3-like impairs myotube growth . J. Cell. Physiol. 2016 ; 231 : 1894 – 1902 .

Shiraishi C. , Matsumoto A. , Ichihara K. , Yamamoto T. , Yokoyama T. , Mizoo T. , Hatano A. , Matsumoto M. , Tanaka Y. , Matsuura-Suzuki E. et al. . RPL3L-containing ribosomes determine translation elongation dynamics required for cardiac function . Nat. Commun. 2023 ; 14 : 2131 .

Milenkovic I. , Santos Vieira H.G. , Lucas M.C. , Ruiz-Orera J. , Patone G. , Kesteven S. , Wu J. , Feneley M. , Espadas G. , Sabidó E. et al. . Dynamic interplay between RPL3- and RPL3L-containing ribosomes modulates mitochondrial activity in the mammalian heart . Nucleic Acids Res. 2023 ; 51 : 5301 – 5324 .

Uechi T. , Maeda N. , Tanaka T. , Kenmochi N. Functional second genes generated by retrotransposition of the X-linked ribosomal protein genes . Nucleic Acids Res. 2002 ; 30 : 5369 – 5375 .

Sugihara Y. , Sadohara E. , Yonezawa K. , Kugo M. , Oshima K. , Matsuda T. , Nadano D. Identification and expression of an autosomal paralogue of ribosomal protein S4, X-linked, in mice: potential involvement of testis-specific ribosomal proteins in translation and spermatogenesis . Gene . 2013 ; 521 : 91 – 99 .

Jiang L. , Li T. , Zhang X. , Zhang B. , Yu C. , Li Y. , Fan S. , Jiang X. , Khan T. , Hao Q. et al. . RPL10L Is required for male meiotic division by compensating for RPL10 during meiotic sex chromosome inactivation in mice . Curr. Biol. 2017 ; 27 : 1498 – 1505 .

Li H. , Huo Y. , He X. , Yao L. , Zhang H. , Cui Y. , Xiao H. , Xie W. , Zhang D. , Wang Y. et al. . A male germ-cell-specific ribosome controls male fertility . Nature . 2022 ; 612 : 725 – 731 .

Rohozinski J. , Anderson M.L. , Broaddus R.E. , Edwards C.L. , Bishop C.E. Spermatogenesis associated retrogenes are expressed in the human ovary and ovarian cancers . PLoS One . 2009 ; 4 : e5064 .

Yan P. , Yang X. , Wang J. , Wang S. , Ren H. A novel CpG island methylation panel predicts survival in lung adenocarcinomas . Oncol. Lett. 2019 ; 18 : 1011 – 1022 .

PCAWG Transcriptome Core Group Calabrese C. , Davidson N.R. , Demircioğlu D. , Fonseca N.A. , He Y. , Kahles A. , Lehmann K.-V. , Liu F. , Shiraishi Y. et al. . Genomic basis for RNA alterations in cancer . Nature . 2020 ; 578 : 129 – 136 .

Liao S. , Lin J. , Do H. , Johnson A.E. Both lumenal and cytosolic gating of the aqueous ER translocon pore are regulated from inside the ribosome during membrane protein integration . Cell . 1997 ; 90 : 31 – 41 .

Dobin A. , Davis C.A. , Schlesinger F. , Drenkow J. , Zaleski C. , Jha S. , Batut P. , Chaisson M. , Gingeras T.R. STAR: ultrafast universal RNA-seq aligner . Bioinformatics . 2013 ; 29 : 15 – 21 .

Yan L. , Yang M. , Guo H. , Yang L. , Wu J. , Li R. , Liu P. , Lian Y. , Zheng X. , Yan J. et al. . Single-cell RNA-seq profiling of human preimplantation embryos and embryonic stem cells . Nat. Struct. Mol. Biol. 2013 ; 20 : 1131 – 1139 .

Uhlén M. , Fagerberg L. , Hallström B.M. , Lindskog C. , Oksvold P. , Mardinoglu A. , Sivertsson Å. , Kampf C. , Sjöstedt E. , Asplund A. et al. . Proteomics. Tissue-based map of the human proteome . Science . 2015 ; 347 : 1260419 .

Kater L. , Thoms M. , Barrio-Garcia C. , Cheng J. , Ismail S. , Ahmed Y.L. , Bange G. , Kressler D. , Berninghausen O. , Sinning I. et al. . Visualizing the assembly pathway of nucleolar pre-60S ribosomes . Cell . 2017 ; 171 : 1599 – 1610 .

de la Cruz J. , Karbstein K. , Woolford J.L. Jr Functions of ribosomal proteins in assembly of eukaryotic ribosomes in vivo . Annu. Rev. Biochem. 2015 ; 84 : 93 – 129 .

Razi A. , Ortega J. Ribosomal proteins: their role in the assembly, structure and function of the ribosome . eLS . 2017 ; John Wiley & Sons, Ltd https://doi.org/10.1002/9780470015902.a0000535.pub2 .


Hu Y. , Flockhart I. , Vinayagam A. , Bergwitz C. , Berger B. , Perrimon N. , Mohr S.E. An integrative approach to ortholog prediction for disease-focused and other functional studies . BMC Bioinf. 2011 ; 12 : 357 .

Pacini C. , Dempster J.M. , Boyle I. , Gonçalves E. , Najgebauer H. , Karakoc E. , van der Meer D. , Barthorpe A. , Lightfoot H. , Jaaks P. et al. . Integrated cross-study datasets of genetic dependencies in cancer . Nat. Commun. 2021 ; 12 : 1661 .

McFarland J.M. , Ho Z.V. , Kugener G. , Dempster J.M. , Montgomery P.G. , Bryan J.G. , Krill-Burger J.M. , Green T.M. , Vazquez F. , Boehm J.S. et al. . Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration . Nat. Commun. 2018 ; 9 : 4610 .

Frankish A. , Diekhans M. , Jungreis I. , Lagarde J. , Loveland J.E. , Mudge J.M. , Sisu C. , Wright J.C. , Armstrong J. , Barnes I. et al. . Gencode 2021 . Nucleic Acids Res. 2021 ; 49 : D916 – D923 .

Graubert A. , Aguet F. , Ravi A. , Ardlie K.G. , Getz G. RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts . Bioinformatics . 2021 ; 37 : 3048 – 3050 .

Zhang Z. , Harrison P. , Gerstein M. Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome . Genome Res. 2002 ; 12 : 1466 – 1482 .

Pertea M. , Pertea G.M. , Antonescu C.M. , Chang T.-C. , Mendell J.T. , Salzberg S.L. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads . Nat. Biotechnol. 2015 ; 33 : 290 – 295 .

Peterson A.C. , Russell J.D. , Bailey D.J. , Westphall M.S. , Coon J.J. Parallel reaction monitoring for high resolution and high mass accuracy quantitative, targeted proteomics . Mol. Cell. Proteomics . 2012 ; 11 : 1475 – 1488 .

Gallien S. , Duriez E. , Crone C. , Kellmann M. , Moehring T. , Domon B. Targeted proteomic quantification on quadrupole-orbitrap mass spectrometer . Mol. Cell. Proteomics . 2012 ; 11 : 1709 – 1723 .

MacLean B. , Tomazela D.M. , Shulman N. , Chambers M. , Finney G.L. , Frewen B. , Kern R. , Tabb D.L. , Liebler D.C. , MacCoss M.J. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments . Bioinformatics . 2010 ; 26 : 966 – 968 .

Kurosawa H. Methods for inducing embryoid body formation: in vitro differentiation system of embryonic stem cells . J. Biosci. Bioeng. 2007 ; 103 : 389 – 398 .

Kerkis A. , Fonseca S.A.S. , Serafim R.C. , Lavagnolli T.M.C. , Abdelmassih S. , Abdelmassih R. , Kerkis I. In vitro differentiation of male mouse embryonic stem cells into both presumptive sperm cells and oocytes . Cloning Stem Cells . 2007 ; 9 : 535 – 548 .

Ingolia N.T. , Brar G.A. , Rouskin S. , McGeachy A.M. , Weissman J.S. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments . Nat. Protoc. 2012 ; 7 : 1534 – 1550 .

Hornstein N. , Torres D. , Das Sharma S. , Tang G. , Canoll P. , Sims P.A. Ligation-free ribosome profiling of cell type-specific translation in the brain . Genome Biol. 2016 ; 17 : 149 .

Hoffmann S. , Otto C. , Kurtz S. , Sharma C.M. , Khaitovich P. , Vogel J. , Stadler P.F. , Hackermüller J. Fast mapping of short sequences with mismatches, insertions and deletions using index structures . PLoS Comput. Biol. 2009 ; 5 : e1000502 .

Katsantoni M. , Gypas F. , Herrmann C.J. , Burri D. , Bak M. , Iborra P. , Agarwal K. , Ataman M. , Börsch A. , Zavolan M. et al. . ZARP: an automated workflow for processing of RNA-seq data . F1000Research . 2024 ; 13 : 533 .

Bray N.L. , Pimentel H. , Melsted P. , Pachter L. Near-optimal probabilistic RNA-seq quantification . Nat. Biotechnol. 2016 ; 34 : 525 – 527 .

Love M.I. , Huber W. , Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 . Genome Biol. 2014 ; 15 : 550 .

Chothani S. , Adami E. , Ouyang J.F. , Viswanathan S. , Hubner N. , Cook S.A. , Schafer S. , Rackham O.J.L. deltaTE: detection of translationally regulated genes by integrative analysis of ribo-seq and RNA-seq data . Curr. Protoc. Mol. Biol. 2019 ; 129 : e108 .

Liao Y. , Smyth G.K. , Shi W. The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads . Nucleic Acids Res. 2019 ; 47 : e47 .

Yu G. , Wang L.-G. , Han Y. , He Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters . Omics . 2012 ; 16 : 284 – 287 .

Gu Z. , Eils R. , Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data . Bioinformatics . 2016 ; 32 : 2847 – 2849 .

Gu Z. , Gu L. , Eils R. , Schlesner M. , Brors B. circlize implements and enhances circular visualization in R . Bioinformatics . 2014 ; 30 : 2811 – 2812 .

Heberle H. , Meirelles G.V. , da Silva F.R. , Telles G.P. , Minghim R. InteractiVenn: a web-based tool for the analysis of sets through Venn diagrams . BMC Bioinf. 2015 ; 16 : 169 .

UniProt Consortium UniProt: the Universal Protein knowledgebase in 2023 . Nucleic Acids Res. 2023 ; 51 : D523 – D531 .

Hastie T. , Tibshirani R. , Narasimhan B. , Chu G 2017 ; Impute bioconductor .

Ritchie M.E. , Phipson B. , Wu D. , Hu Y. , Law C.W. , Shi W. , Smyth G.K. limma powers differential expression analyses for RNA-sequencing and microarray studies . Nucleic Acids Res. 2015 ; 43 : e47 .

Mordret E. , Dahan O. , Asraf O. , Rak R. , Yehonadav A. , Barnabas G.D. , Cox J. , Geiger T. , Lindner A.B. , Pilpel Y. Systematic detection of amino acid substitutions in proteomes reveals mechanistic basis of ribosome errors and selection for translation fidelity . Mol. Cell . 2019 ; 75 : 427 – 441 .

Tyanova S. , Temu T. , Cox J. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics . Nat. Protoc. 2016 ; 11 : 2301 – 2319 .

Schindelin J. , Arganda-Carreras I. , Frise E. , Kaynig V. , Longair M. , Pietzsch T. , Preibisch S. , Rueden C. , Saalfeld S. , Schmid B. et al. . Fiji: an open-source platform for biological-image analysis . Nat. Methods . 2012 ; 9 : 676 – 682 .

Burel J.-M. , Besson S. , Blackburn C. , Carroll M. , Ferguson R.K. , Flynn H. , Gillen K. , Leigh R. , Li S. , Lindner D. et al. . Publishing and sharing multi-dimensional image data with OMERO . Mamm. Genome . 2015 ; 26 : 441 – 447 .

Ershov D. , Phan M.-S. , Pylvänäinen J.W. , Rigaud S.U. , Le Blanc L. , Charles-Orszag A. , Conway J.R.W. , Laine R.F. , Roy N.H. , Bonazzi D. et al. . TrackMate 7: integrating state-of-the-art segmentation algorithms into tracking pipelines . Nat. Methods . 2022 ; 19 : 829 – 832 .

Zhao Y. , Fu C. , Zhang W. , Ye C. , Wang Z. , Ma H.-F. Automatic segmentation of cervical cells based on star-convex polygons in pap smear images . Bioengineering (Basel) . 2022 ; 10 : 47 .

Ollion J. , Cochennec J. , Loll F. , Escudé C. , Boudier T. TANGO: a generic tool for high-throughput 3D image analysis for studying nuclear organization . Bioinformatics . 2013 ; 29 : 1840 – 1841 .

Rabl J. , Leibundgut M. , Ataide S.F. , Haag A. , Ban N. Crystal structure of the eukaryotic 40S ribosomal subunit in complex with initiation factor 1 . Science . 2011 ; 331 : 730 – 736 .

Punjani A. , Rubinstein J.L. , Fleet D.J. , Brubaker M.A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination . Nat. Methods . 2017 ; 14 : 290 – 296 .

Loveland A.B. , Svidritskiy E. , Susorov D. , Lee S. , Park A. , Zvornicanin S. , Demo G. , Gao F.-B. , Korostelev A.A. Ribosome inhibition by C9ORF72-ALS/FTD-associated poly-PR and poly-GR proteins revealed by cryo-EM . Nat. Commun. 2022 ; 13 : 2776 .

Pettersen E.F. , Goddard T.D. , Huang C.C. , Couch G.S. , Greenblatt D.M. , Meng E.C. , Ferrin T.E. UCSF Chimera–a visualization system for exploratory research and analysis . J. Comput. Chem. 2004 ; 25 : 1605 – 1612 .

Emsley P. , Cowtan K. Coot: model-building tools for molecular graphics . Acta Crystallogr. D Biol. Crystallogr. 2004 ; 60 : 2126 – 2132 .

Liebschner D. , Afonine P.V. , Baker M.L. , Bunkóczi G. , Chen V.B. , Croll T.I. , Hintze B. , Hung L.W. , Jain S. , McCoy A.J. et al. . Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix . Acta Crystallogr D Struct. Biol. 2019 ; 75 : 861 – 877 .

Afonine P.V. , Poon B.K. , Read R.J. , Sobolev O.V. , Terwilliger T.C. , Urzhumtsev A. , Adams P.D. Real-space refinement in PHENIX for cryo-EM and crystallography . Acta Crystallogr D Struct. Biol. 2018 ; 74 : 531 – 544 .

Chothani S.P. , Adami E. , Widjaja A.A. , Langley S.R. , Viswanathan S. , Pua C.J. , Zhihao N.T. , Harmston N. , D’Agostino G. , Whiffin N. et al. . A high-resolution map of human RNA translation . Mol. Cell . 2022 ; 82 : 2885 – 2899 .

Zou Q. , Yang L. , Shi R. , Qi Y. , Zhang X. , Qi H. Proteostasis regulated by testis-specific ribosomal protein RPL39L maintains mouse spermatogenesis . iScience . 2021 ; 24 : 103396 .

Ngondo R.P. , Cirera-Salinas D. , Yu J. , Wischnewski H. , Bodak M. , Vandormael-Pournin S. , Geiselmann A. , Wettstein R. , Luitz J. , Cohen-Tannoudji M. et al. . Argonaute 2 is required for extra-embryonic endoderm differentiation of mouse embryonic stem cells . Stem Cell Rep. 2018 ; 10 : 461 – 476 .

Pospísek M. , Valásek L. Polysome profile analysis–yeast . Methods Enzymol. 2013 ; 530 : 173 – 181 .

Wu C.C.-C. , Zinshteyn B. , Wehner K.A. , Green R. High-resolution ribosome profiling defines discrete ribosome elongation states and translational regulation during cellular stress . Mol. Cell . 2019 ; 73 : 959 – 970 .

Jäger R. , Bertrand M.J.M. , Gorman A.M. , Vandenabeele P. , Samali A. The unfolded protein response at the crossroads of cellular life and death during endoplasmic reticulum stress . Biol. Cell . 2012 ; 104 : 259 – 270 .

Harding H.P. , Zhang Y. , Ron D. Protein translation and folding are coupled by an endoplasmic-reticulum-resident kinase . Nature . 1999 ; 397 : 271 – 274 .

Fleming G. , Belhumeur P. , Skup D. , Fried H.M. Functional substitution of mouse ribosomal protein L27’ for yeast ribosomal protein L29 in yeast ribosomes . Proc. Natl. Acad. Sci. U.S.A. 1989 ; 86 : 217 – 221 .

Dresios J. , Derkatch I.L. , Liebman S.W. , Synetos D. Yeast ribosomal protein L24 affects the kinetics of protein synthesis and ribosomal protein L39 improves translational accuracy, while mutants lacking both remain viable . Biochemistry . 2000 ; 39 : 7236 – 7244 .

Micic J. , Rodríguez-Galán O. , Babiano R. , Fitzgerald F. , Fernández-Fernández J. , Zhang Y. , Gao N. , Woolford J.L. , de la Cruz J. Ribosomal protein eL39 is important for maturation of the nascent polypeptide exit tunnel and proper protein folding during translation . Nucleic Acids Res. 2022 ; 50 : 6453 – 6473 .

Kraushar M.L. , Krupp F. , Harnett D. , Turko P. , Ambrozkiewicz M.C. , Sprink T. , Imami K. , Günnigmann M. , Zinnall U. , Vieira-Vieira C.H. et al. . Protein synthesis in the developing neocortex at near-atomic resolution reveals Ebp1-mediated neuronal proteostasis at the 60S tunnel exit . Mol. Cell . 2021 ; 81 : 304 – 322 .

Faille A. , Warren A.J. EIF6-bound large subunit of the human ribosome . 2022 ; https://doi.org/10.2210/pdb7ow7/pdb .

Gamerdinger M. , Kobayashi K. , Wallisch A. , Kreft S.G. , Sailer C. , Schlömer R. , Sachs N. , Jomaa A. , Stengel F. , Ban N. et al. . Early scanning of nascent polypeptides inside the ribosomal tunnel by NAC . Mol. Cell . 2019 ; 75 : 996 – 1006 .

Collart M.A. , Weiss B. Ribosome pausing, a dangerous necessity for co-translational events . Nucleic Acids Res. 2020 ; 48 : 1043 – 1055 .

Legrand C. , Tuorto F. RiboVIEW: a computational framework for visualization, quality control and statistical analysis of ribosome profiling data . Nucleic Acids Res. 2020 ; 48 : e7 .

Ng A. , Xavier R.J. Leucine-rich repeat (LRR) proteins: integrators of pattern recognition and signaling in immunity . Autophagy . 2011 ; 7 : 1082 – 1084 .

Liu X. , Vrana K.E. Leucine zippers and coiled-coils in the aromatic amino acid hydroxylases . Neurochem. Int. 1991 ; 18 : 27 – 31 .

Matsushima N. , Kretsinger R.H. Numerous variants of leucine rich repeats in proteins from nucleo-cytoplasmic large DNA viruses . Gene . 2022 ; 817 : 146156 .

Rose A. , Schraegle S.J. , Stahlberg E.A. , Meier I. Coiled-coil protein composition of 22 proteomes–differences and common themes in subcellular infrastructure and traffic control . BMC Evol. Biol. 2005 ; 5 : 66 .

Mushegian A. Gene content of LUCA, the last universal common ancestor . Front. Biosci. 2008 ; 13 : 4657 – 4666 .

Schwanhäusser B. , Busse D. , Li N. , Dittmar G. , Schuchhardt J. , Wolf J. , Chen W. , Selbach M. Global quantification of mammalian gene expression control . Nature . 2011 ; 473 : 337 – 342 .

Riba A. , Di Nanni N. , Mittal N. , Arhné E. , Schmidt A. , Zavolan M. Protein synthesis rates and ribosome occupancies reveal determinants of translation elongation rates . Proc. Natl. Acad. Sci. U.S.A. 2019 ; 116 : 15023 – 15032 .

Lodish H.F. Model for the regulation of mRNA translation applied to haemoglobin synthesis . Nature . 1974 ; 251 : 385 – 388 .

Mills E.W. , Green R. Ribosomopathies: there's strength in numbers . Science . 2017 ; 358 : eaan2755 .

Guimaraes J.C. , Mittal N. , Gnann A. , Jedlinski D. , Riba A. , Buczak K. , Schmidt A. , Zavolan M. A rare codon-based translational program of cell proliferation . Genome Biol. 2020 ; 21 : 44 .

Gerst J.E. Pimp my ribosome: ribosomal protein paralogs specify translational control . Trends Genet. 2018 ; 34 : 832 – 845 .

Sugihara Y. , Honda H. , Iida T. , Morinaga T. , Hino S. , Okajima T. , Matsuda T. , Nadano D. Proteomic analysis of rodent ribosomes revealed heterogeneity including ribosomal proteins L10-like, L22-like 1, and L39-like . J. Proteome Res. 2010 ; 9 : 1351 – 1366 .

Read A. , Schröder M. The unfolded protein response: an overview . Biology . 2021 ; 10 : 384 .

Kratochvílová K. , Moráň L. , Paďourová S. , Stejskal S. , Tesařová L. , Šimara P. , Hampl A. , Koutná I. , Vaňhara P. The role of the endoplasmic reticulum stress in stemness, pluripotency and development . Eur. J. Cell Biol. 2016 ; 95 : 115 – 123 .

Woolhead C.A. , McCormick P.J. , Johnson A.E. Nascent membrane and secretory proteins differ in FRET-detected folding far inside the ribosome and in their exposure to ribosomal proteins . Cell . 2004 ; 116 : 725 – 736 .

Ban N. , Nissen P. , Hansen J. , Moore P.B. , Steitz T.A. The complete atomic structure of the large ribosomal subunit at 2.4 A resolution . Science . 2000 ; 289 : 905 – 920 .

Nissen P. , Hansen J. , Ban N. , Moore P.B. , Steitz T.A. The structural basis of ribosome activity in peptide bond synthesis . Science . 2000 ; 289 : 920 – 930 .

Ferbitz L. , Maier T. , Patzelt H. , Bukau B. , Deuerling E. , Ban N. Trigger factor in complex with the ribosome forms a molecular cradle for nascent proteins . Nature . 2004 ; 431 : 590 – 596 .

Baram D. , Pyetan E. , Sittner A. , Auerbach-Nevo T. , Bashan A. , Yonath A. Structure of trigger factor binding domain in biologically homologous complex with eubacterial ribosome reveals its chaperone action . Proc. Natl. Acad. Sci. U.S.A. 2005 ; 102 : 12017 – 12022 .

Dao Duc K. , Batra S.S. , Bhattacharya N. , Cate J.H.D. , Song Y.S. Differences in the path to exit the ribosome across the three domains of life . Nucleic Acids Res. 2019 ; 47 : 4198 – 4210 .

Ziv G. , Haran G. , Thirumalai D. Ribosome exit tunnel can entropically stabilize alpha-helices . Proc. Natl. Acad. Sci. U.S.A. 2005 ; 102 : 18956 – 18961 .

Pechmann S. , Willmund F. , Frydman J. The ribosome as a hub for protein quality control . Mol. Cell . 2013 ; 49 : 411 – 421 .

Acharya A. , Ruvinov S.B. , Gal J. , Moll J.R. , Vinson C. A heterodimerizing leucine zipper coiled coil system for examining the specificity of a position interactions: amino acids I, V, L, N, A, and K . Biochemistry . 2002 ; 41 : 14122 – 14131 .

Szilák L. , Moitra J. , Vinson C. Design of a leucine zipper coiled coil stabilized 1.4 kcal mol-1 by phosphorylation of a serine in the e position . Protein Sci. 1997 ; 6 : 1273 – 1283 .

Massari M.E. , Murre C. Helix-loop-helix proteins: regulators of transcription in eucaryotic organisms . Mol. Cell. Biol. 2000 ; 20 : 429 – 440 .

Pandol S.J. , Gorelick F.S. , Lugea A. Environmental and genetic stressors and the unfolded protein response in exocrine pancreatic function - a hypothesis . Front. Physiol. 2011 ; 2 : 8 .

Baser A. , Skabkin M. , Martin-Villalba A. Neural stem cell activation and the role of protein synthesis . Brain Plast . 2017 ; 3 : 27 – 41 .

  • Online ISSN 1362-4962
  • Print ISSN 0305-1048
  • Copyright © 2024 Oxford University Press
COMMENTS

  1. Exploratory Factor Analysis: A Guide to Best Practice

    Exploratory factor analysis (EFA) is a multivariate statistical method that has become a fundamental tool in the development and validation of psychological theories and measurements. However, researchers must make several thoughtful and evidence-based methodological decisions while conducting an EFA, and there are a number of options available ...

  2. Factor Analysis: a means for theory and instrument development in

    It should be noted if the prior EFA applied an orthogonal rotation to the factor solution, the factors produced would be uncorrelated. Hence, the analysis of the second-order factors is not possible. Generally, in social science research, most constructs assume inter-related factors, and therefore should apply an oblique rotation.

  3. Exploratory factor analysis: Current use, methodological developments

    Psychological research often relies on Exploratory Factor Analysis (EFA). As the outcome of the analysis highly depends on the chosen settings, there is a strong need for guidelines in this context. Therefore, we want to examine the recent methodological developments as well as the current practice in psychological research. We reviewed ten years of studies containing EFAs and contrasted them ...

  4. Notes to Factor Analysis Techniques for Construct Validity

    This paper introduces and discusses factor analysis techniques for construct validity, including some suggestions for reporting using the evidence to support the construct validity from exploratory...

  5. One Size Doesn't Fit All: Using Factor Analysis to Gather Validity

    A brief history of the philosophical foundations of exploratory factor analysis. Journal of Multivariate Behavioral Research, 22(3), ... Scale development research: A content analysis and recommendations for best practices. The Counseling Psychologist, 34(6), 806-838.

  6. On exploratory factor analysis: A review of recent evidence, an

    Based on the findings from factor analysis research, it seems likely that the use of such methods may have had a material, adverse effect on the solutions generated. ... Of the 773 research papers published in the 10 journals in 2012, 54 factor analyses (and PCA) were reported in the findings sections of 28 papers. Of the 54 factor analysis ...

  7. Exploratory Factor Analysis: Implications for Theory, Research, and

    Exploratory factor analysis (EFA) serves many useful purposes in human resource development (HRD) research. The most frequent applications of EFA among researchers consists of reducing relatively large sets of variables into more manageable ones, developing and refining a new instrument's scales, and exploring relations among variables to build theory.

  8. Exploratory Factor Analysis; Concepts and Theory

    1 Introduction. Factor analysis is a significant instrument which is utilized in development, refinement, and evaluation of tests, scales, and measures (Williams, Brown et al. 2010). Exploratory factor analysis (EFA) is widely used and broadly applied statistical approach in information system, social science, education and psychology.

  9. Exploratory Factor Analysis: Basics and Beyond

    Exploratory factor analysis (EFA) is a statistical method used to answer a wide range of research questions pertaining to the underlying structure of a set of variables. A primary goal of this chapter is to provide sufficient background information to foster a comprehensive understanding for the series of methodological decisions that have to ...

  10. Exploratory Factor Analysis: A Guide to Best Practice

    Journal of Research in Personality, 29, 168-188. doi ... Factor analysis was deemed the most well-suited procedure to uncover an underlying structure of the items due to its prevalence in ...

  11. (PDF) Overview of Factor Analysis

    Chapter 1. Theoretical Introduction. Factor analysis is a collection of methods used to examine how underlying constructs influence the responses on a number of measured variables ...

  12. A systematic literature review of exploratory factor analyses in

    Exploratory factor analysis (EFA) is a powerful statistical technique that enables researchers to use their judgement and interpretation to identify a set of latent factors that meaningfully and parsimoniously represent a set of indicators (Goretzko et al., 2021, Hair et al., 2019, Howard, 2016, Watkins, 2018). The technique estimates the number of latent factors underlying the indicators as ...

  13. Factor Analysis as a Tool for Survey Analysis

    Results The ICOPES-TW had a two-factor structure (body functionality [eigenvalue = 1.932] and life adaptation [eigenvalue = 1.170]) as indicated by the results of exploratory factor analysis.

  14. PDF A Beginner's Guide to Factor Analysis: Focusing on Exploratory Factor

    The formula for deriving the communalities is $h_j^2 = a_{j1}^2 + a_{j2}^2 + \dots + a_{jm}^2$, where a equals the loadings for j variables. Using the factor loadings in Table 1, we then calculate the communalities using the aforementioned formula, thus ... = 0.78. The values in the table represent the factor loadings and how much the variable contributes to ... Figure 2.

  15. Evaluating the use of exploratory factor analysis in psychological

    Despite the widespread use of exploratory factor analysis in psychological research, researchers often make questionable decisions when conducting these analyses. This article reviews the major design and analytical decisions that must be made when conducting a factor analysis and notes that each of these decisions has important consequences for the obtained results. Recommendations that have ...

  16. Factor Analysis in Counseling Research and Practice

    This article summarizes the general uses and major characteristics of factor analysis, particularly as they may apply to counseling research and practice. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) are overviewed, including their principal aims, procedures, and interpretations. The basic steps of each type of ...

  17. Factor analysis

    Discover a faster, simpler path to publishing in a high-quality journal. PLOS ONE promises fair, rigorous peer review, broad scope, and wide readership - a perfect fit for your research every time.. Learn More Submit Now

  18. Factor Analysis as a Tool for Survey Analysis

    The application of factor analysis for questionnaire evaluation provides very valuable inputs to the decision makers to focus on few important factors rather than a large number of parameters. ... "Evaluating structural equation models with unobservable variables and measurement error," Journal of Marketing Research, 18(1), 39-50. 1981.

  19. Factor Analysis

    Factor Analysis Steps. Here are the general steps involved in conducting a factor analysis: 1. Define the Research Objective: Clearly specify the purpose of the factor analysis. Determine what you aim to achieve or understand through the analysis. 2. Data Collection: Gather the data on the variables of interest.

  20. The Application and Misapplication of Factor Analysis in Marketing Research

    The use of factor analysis as a method for examining the dimensional structure of data is contrasted with its frequent misapplication as a tool for identifying clusters and segments. Procedures for determining when a data set is appropriate for factoring, for determining the number of factors to extract, and for rotation are discussed.

  21. Exploratory Factor Analysis: A Guide to Best Practice

    Exploratory factor analysis (EFA) is a multivariate statistical method that has become a fundamental tool in the development and validation of psychological theories and measurements. However, researchers must make several thoughtful and evidence-based methodological decisions while conducting an EFA, and there are a number of options available ...

  22. Use of Exploratory Factor Analysis in Published Research:

    Given the proliferation of factor analysis applications in the literature, the present article examines the use of factor analysis in current published research across four psychological journals. Notwithstanding ease of analysis due to computers, the appropriate use of factor analysis requires a series of thoughtful researcher judgments.

  23. Co-factor analysis of citation networks: Journal of Computational and

    Simulations show that our estimator has promising finite sample properties, and that naive approaches fail to recover latent co-factor structure. We leverage our estimator to investigate 255,780 papers published in statistics journals from 1898 to 2024, resulting in the most comprehensive topic model of the statistics literature to date.

  24. In progress (January 2025)

    Receive an update when the latest issues in this journal are published. Sign in to set up alerts ... Research article Full text access Vibration suppression of a platform by a fractional type electromagnetic damper and inerter-based nonlinear energy sink ... Research article Full text access Analysis on transient wave propagation in the soft ...

  25. Ribosomal protein RPL39L is an efficiency factor in the cotranslational

    Nucleic Acids Research, Volume 52, Issue 15, 27 August 2024, Pages 9028-9048, https ... Nitish Mittal, Mihaela Zavolan, Ribosomal protein RPL39L is an efficiency factor in the cotranslational folding of a subset of proteins with alpha helical domains, Nucleic ... By CryoEM analysis of purified RPL39 and RPL39L-containing ribosomes we found ...

  26. Reviewer Resources: Confirmatory Factor Analysis

    Abstract. Confirmatory factor analyses (CFA) are widely used in the organizational literature. As a result, understanding how to properly conduct these analyses, report the results, and interpret their implications is critically important for advancing organizational research. The goal of this paper is to summarize the complexities of CFA ...

  27. Tourists' Satisfaction, Experience, and Revisit ...

    Considering that the factor loadings between 0.5 and 0.7 can be retained if AVE and CR values of the related ... Nitzl C. (2020). Assessing measurement model quality in PLS-SEM using confirmatory composite analysis. Journal of Business Research, 109, 101-110. Hair J. F., Hult G. T. M., Ringle C. M., Sarstedt M. (2017 ...
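
Two quantities mentioned in the snippets above, communalities (item 14) and the AVE and CR values used to judge factor loadings (item 27), can be illustrated with a minimal Python sketch. It assumes standardized loadings and the common textbook definitions: a variable's communality is the sum of its squared loadings across factors, AVE (average variance extracted) is the mean squared loading of a construct's indicators, and CR (composite reliability) is the squared sum of the loadings divided by that squared sum plus the summed error variances, each taken as 1 minus the squared loading. The function names and all loading values are hypothetical and are not taken from any of the papers listed.

```python
def communalities(loadings):
    """Return one communality per variable.

    `loadings` is a list of rows, one row per variable, each row holding
    that variable's loadings on every factor.
    """
    return [sum(a * a for a in row) for row in loadings]


def average_variance_extracted(lams):
    """Mean of the squared standardized loadings of one construct."""
    return sum(l * l for l in lams) / len(lams)


def composite_reliability(lams):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances).

    Error variance of each indicator is assumed to be 1 - loading^2,
    which holds only for standardized loadings.
    """
    squared_sum = sum(lams) ** 2
    error_variance = sum(1 - l * l for l in lams)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical two-factor loading matrix for three variables.
example_loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.6],
]
example_communalities = communalities(example_loadings)

# Hypothetical loadings of three indicators on a single construct.
construct_loadings = [0.7, 0.8, 0.6]
ave = average_variance_extracted(construct_loadings)
cr = composite_reliability(construct_loadings)

print("communalities:", [round(h, 3) for h in example_communalities])
print("AVE:", round(ave, 3), "CR:", round(cr, 3))
```

In this sketch an indicator with a loading of 0.6 stays in the construct, and whether that is acceptable would then depend on how the resulting AVE and CR compare with whatever cut-offs the analyst has adopted.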