What is The Null Hypothesis & When Do You Reject The Null Hypothesis

By Julia Simkus, Saul McLeod, PhD, and Olivia Guy-Evans, MSc (Simply Psychology)

A null hypothesis is a statistical concept suggesting no significant difference or relationship between measured variables. It’s the default assumption unless empirical evidence proves otherwise.

The null hypothesis states no relationship exists between the two variables being studied (i.e., one variable does not affect the other).

The null hypothesis is the statement that a researcher or an investigator wants to disprove.

Testing the null hypothesis can tell you whether your results are due to the effect of manipulating the independent variable or due to random chance.

How to Write a Null Hypothesis

Null hypotheses (H0) start as research questions that the investigator rephrases as statements indicating no effect or relationship between the independent and dependent variables.

It is a default position that your research aims to challenge or confirm.

For example, if studying the impact of exercise on weight loss, your null hypothesis might be:

There is no significant difference in weight loss between individuals who exercise daily and those who do not.
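To make this concrete, here is a minimal sketch in Python (not part of the original article) of how such a null hypothesis could be tested with a two-sample t-test. The data are simulated, and the group sizes, means, and variable names are illustrative assumptions only.

```python
# A minimal sketch (simulated data, illustrative numbers only) of testing the
# null hypothesis "there is no difference in weight loss between individuals
# who exercise daily and those who do not" with a two-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
daily_exercise = rng.normal(loc=5.0, scale=2.0, size=40)  # kg lost, exercise group
no_exercise = rng.normal(loc=4.0, scale=2.0, size=40)     # kg lost, control group

t_stat, p_value = stats.ttest_ind(daily_exercise, no_exercise)

alpha = 0.05
if p_value <= alpha:
    print(f"p = {p_value:.3f} <= {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} > {alpha}: fail to reject the null hypothesis")
```

The printed decision simply applies the rule discussed later in this article: reject the null when the p-value is at or below the chosen significance level.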

Examples of Null Hypotheses

Research Question | Null Hypothesis
Do teenagers use cell phones more than adults? | Teenagers and adults use cell phones the same amount.
Do tomato plants exhibit a higher rate of growth when planted in compost rather than in soil? | Tomato plants show no difference in growth rates when planted in compost rather than soil.
Does daily meditation decrease the incidence of depression? | Daily meditation does not decrease the incidence of depression.
Does daily exercise increase test performance? | There is no relationship between daily exercise time and test performance.
Does the new vaccine prevent infections? | The vaccine does not affect the infection rate.
Does flossing your teeth affect the number of cavities? | Flossing your teeth has no effect on the number of cavities.

When Do We Reject The Null Hypothesis? 

We reject the null hypothesis when the data provide strong enough evidence to conclude that it is likely incorrect. This often occurs when the p-value (probability of observing the data given the null hypothesis is true) is below a predetermined significance level.

If the collected data are inconsistent with what the null hypothesis predicts, the researcher can conclude that the data do not support the null hypothesis, and thus the null hypothesis is rejected.

Rejecting the null hypothesis means that a relationship does exist between a set of variables and the effect is statistically significant (p ≤ 0.05).

If the data collected from the random sample are not statistically significant, then the null hypothesis is not rejected, and the researchers conclude that there is insufficient evidence of a relationship between the variables.

You need to perform a statistical test on your data in order to evaluate how consistent it is with the null hypothesis. A p-value is one statistical measurement used to validate a hypothesis against observed data.

Calculating the p-value is a critical part of null-hypothesis significance testing because it quantifies how strongly the sample data contradicts the null hypothesis.

The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.


Usually, a researcher uses a significance level of 0.05 or 0.01 (corresponding to a 95% or 99% confidence level) as a general guideline for deciding whether to reject or keep the null.

When your p-value is less than or equal to your significance level, you reject the null hypothesis.

In other words, smaller p-values are taken as stronger evidence against the null hypothesis. Conversely, when the p-value is greater than your significance level, you fail to reject the null hypothesis.

In this case, the sample data provides insufficient evidence to conclude that the effect exists in the population.

Because you can never know with complete certainty whether there is an effect in the population, your inferences about a population will sometimes be incorrect.

When you incorrectly reject the null hypothesis, it’s called a type I error. When you incorrectly fail to reject it, it’s called a type II error.
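As a rough illustration of the type I error rate, here is a minimal simulation sketch in Python (not from the original article; all numbers are assumptions). It repeatedly tests a null hypothesis that is true by construction and counts how often that null is incorrectly rejected.

```python
# A minimal simulation sketch (all numbers are assumptions): test a null hypothesis
# that is true by construction many times and count how often it is incorrectly
# rejected. The long-run rate of these type I errors should be close to alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n_sims, false_rejections = 0.05, 5000, 0

for _ in range(n_sims):
    a = rng.normal(0, 1, 30)      # group A, drawn from N(0, 1)
    b = rng.normal(0, 1, 30)      # group B, drawn from the same population
    _, p = stats.ttest_ind(a, b)
    if p <= alpha:                # rejecting a true null is a type I error
        false_rejections += 1

print(f"type I error rate = {false_rejections / n_sims:.3f} (expected about {alpha})")
```

Because the null hypothesis is true in every simulated study, roughly 5% of the tests reject it, which is exactly the type I error rate implied by α = 0.05.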

Why Do We Never Accept The Null Hypothesis?

The reason we do not say “accept the null” is because we are always assuming the null hypothesis is true and then conducting a study to see if there is evidence against it. And, even if we don’t find evidence against it, a null hypothesis is not accepted.

A lack of evidence only means that you haven’t proven that something exists. It does not prove that something doesn’t exist. 

It is risky to conclude that the null hypothesis is true merely because we did not find evidence to reject it. It is always possible that researchers elsewhere have disproved the null hypothesis, so we cannot accept it as true, but instead, we state that we failed to reject the null. 

One can either reject the null hypothesis, or fail to reject it, but can never accept it.

Why Do We Use The Null Hypothesis?

We can never prove with 100% certainty that a hypothesis is true; we can only collect evidence that supports a theory. However, testing a hypothesis can set the stage for rejecting or retaining it within a certain confidence level.

The null hypothesis is useful because it can tell us whether the results of our study are due to random chance or the manipulation of a variable (with a certain level of confidence).

A null hypothesis is rejected if the observed data would be significantly unlikely to occur if the null hypothesis were true, and it is retained (not rejected) if the observed outcome is consistent with the position held by the null hypothesis.

Rejecting the null hypothesis sets the stage for further experimentation to see if a relationship between two variables exists. 

Hypothesis testing is a critical part of the scientific method as it helps decide whether the results of a research study support a particular theory about a given population. Hypothesis testing is a systematic way of backing up researchers’ predictions with statistical analysis.

It helps provide sufficient statistical evidence that either favors or rejects a certain hypothesis about the population parameter. 

Purpose of a Null Hypothesis 

  • The primary purpose of the null hypothesis is to serve as a precise, testable assumption that the research attempts to disprove. 
  • Whether rejected or retained, the null hypothesis can help further progress a theory in many scientific cases.
  • A null hypothesis can be used to ascertain how consistent the outcomes of multiple studies are.

Do you always need both a Null Hypothesis and an Alternative Hypothesis?

The null (H0) and alternative (Ha or H1) hypotheses are two competing claims that describe the effect of the independent variable on the dependent variable. They are mutually exclusive, which means that only one of the two hypotheses can be true. 

While the null hypothesis states that there is no effect in the population, an alternative hypothesis states that there is an effect or relationship between the variables.

The goal of hypothesis testing is to make inferences about a population based on a sample. In order to undertake hypothesis testing, you must express your research hypothesis as a null and alternative hypothesis. Both hypotheses are required to cover every possible outcome of the study. 

What is the difference between a null hypothesis and an alternative hypothesis?

The alternative hypothesis is the complement to the null hypothesis. The null hypothesis states that there is no effect or no relationship between variables, while the alternative hypothesis claims that there is an effect or relationship in the population.

It is the claim that you expect or hope will be true. The null hypothesis and the alternative hypothesis are always mutually exclusive, meaning that only one can be true at a time.

What are some problems with the null hypothesis?

One major problem with the null hypothesis is that researchers often treat failing to reject the null as a failure of the experiment. However, either outcome of a hypothesis test is a positive result: even if the null is not refuted, the researchers still learn something new.

Why can a null hypothesis not be accepted?

We can either reject or fail to reject a null hypothesis, but never accept it. If your test fails to detect an effect, this is not proof that the effect doesn’t exist. It just means that your sample did not have enough evidence to conclude that it exists.

We can't accept a null hypothesis because a lack of evidence for an effect is not proof that the effect does not exist. Instead, we fail to reject it.

Failing to reject the null indicates that the sample did not provide sufficient evidence to conclude that an effect exists.

If the p-value is greater than the significance level, then you fail to reject the null hypothesis.

Is a null hypothesis directional or non-directional?

A hypothesis test can either contain an alternative directional hypothesis or a non-directional alternative hypothesis. A directional hypothesis is one that contains the less than (“<“) or greater than (“>”) sign.

A nondirectional hypothesis contains the not equal sign (“≠”).  However, a null hypothesis is neither directional nor non-directional.

A null hypothesis is a prediction that there will be no change, relationship, or difference between two variables.

The directional hypothesis or nondirectional hypothesis would then be considered alternative hypotheses to the null hypothesis.



Hypothesis Testing with One Sample

Null and Alternative Hypotheses

OpenStaxCollege


The actual test begins by considering two hypotheses . They are called the null hypothesis and the alternative hypothesis . These hypotheses contain opposing viewpoints.

H 0 : The null hypothesis: It is a statement about the population that either is believed to be true or is used to put forth an argument unless it can be shown to be incorrect beyond a reasonable doubt.

H a : The alternative hypothesis: It is a claim about the population that is contradictory to H 0 and what we conclude when we reject H 0 .

Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data.

After you have determined which hypothesis the sample supports, you make a decision. There are two options for a decision. They are “reject H 0 ” if the sample information favors the alternative hypothesis or “do not reject H 0 ” or “decline to reject H 0 ” if the sample information is insufficient to reject the null hypothesis.

Mathematical Symbols Used in H 0 and H a :

H 0 may contain: | H a may contain:
equal (=) | not equal (≠)
greater than or equal to (≥) | less than (<)
less than or equal to (≤) | more than (>)

H 0 always has a symbol with an equal in it. H a never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers (including one of the co-authors in research work) use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.

H 0 : No more than 30% of the registered voters in Santa Clara County voted in the primary election. p ≤ 0.30

H a : More than 30% of the registered voters in Santa Clara County voted in the primary election. p > 0.30

A medical trial is conducted to test whether or not a new medicine reduces cholesterol by 25%. State the null and alternative hypotheses.

H 0 : The drug reduces cholesterol by 25%. p = 0.25

H a : The drug does not reduce cholesterol by 25%. p ≠ 0.25

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). The null and alternative hypotheses are:

H 0 : μ = 2.0

H a : μ ≠ 2.0

We want to test whether the mean height of eighth graders is 66 inches. State the null and alternative hypotheses. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : μ = 66
  • H a : μ ≠ 66

We want to test if college students take less than five years to graduate from college, on the average. The null and alternative hypotheses are:

H 0 : μ ≥ 5

H a : μ < 5

We want to test if it takes fewer than 45 minutes to teach a lesson plan. State the null and alternative hypotheses. Fill in the correct symbol ( =, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : μ ≥ 45
  • H a : μ < 45

In an issue of U. S. News and World Report , an article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third pass. The same article stated that 6.6% of U.S. students take advanced placement exams and 4.4% pass. Test if the percentage of U.S. students who take advanced placement exams is more than 6.6%. State the null and alternative hypotheses.

H 0 : p ≤ 0.066

H a : p > 0.066

On a state driver’s test, about 40% pass the test on the first try. We want to test if more than 40% pass on the first try. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : p = 0.40
  • H a : p > 0.40
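For readers who want to see the arithmetic behind a test like this, here is a minimal one-proportion z-test sketch in Python. Only the hypotheses H 0 : p = 0.40 and H a : p > 0.40 come from the example above; the sample counts (125 first-try passes out of 300 drivers) are hypothetical.

```python
# A minimal sketch of a one-proportion z-test for the driver's-test example.
# Only H0: p = 0.40 and Ha: p > 0.40 come from the text; the sample counts
# (125 first-try passes out of 300 drivers) are hypothetical.
import math
from scipy import stats

p0, successes, n = 0.40, 125, 300
p_hat = successes / n

# z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n); right-tailed test for Ha: p > 0.40
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = stats.norm.sf(z)                 # P(Z >= z) under the null hypothesis

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, one-sided p-value = {p_value:.3f}")
```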


Bring to class a newspaper, some news magazines, and some Internet articles . In groups, find articles from which your group can write null and alternative hypotheses. Discuss your hypotheses with the rest of the class.

Chapter Review

In a hypothesis test, sample data is evaluated in order to arrive at a decision about some type of claim. If certain conditions about the sample are satisfied, then the claim can be evaluated for a population. In a hypothesis test, we state a null and an alternative hypothesis, examine the sample data, and decide whether there is enough evidence to reject the null hypothesis.

Formula Review

H 0 and H a are contradictory.

H 0 has: equal (=), greater than or equal to (≥), or less than or equal to (≤)
H a has: not equal (≠), greater than (>), or less than (<)

If α ≤ p -value, then do not reject H 0 .

If α > p -value, then reject H 0 .

α is preconceived. Its value is set before the hypothesis test starts. The p -value is calculated from the data.

You are testing that the mean speed of your cable Internet connection is more than three Megabits per second. What is the random variable? Describe in words.

The random variable is the mean Internet speed in Megabits per second.

You are testing that the mean speed of your cable Internet connection is more than three Megabits per second. State the null and alternative hypotheses.

The American family has an average of two children. What is the random variable? Describe in words.

The random variable is the mean number of children an American family has.

The mean entry level salary of an employee at a company is $58,000. You believe it is higher for IT professionals in the company. State the null and alternative hypotheses.

A sociologist claims the probability that a person picked at random in Times Square in New York City is visiting the area is 0.83. You want to test to see if the proportion is actually less. What is the random variable? Describe in words.

The random variable is the proportion of people picked at random in Times Square visiting the city.

A sociologist claims the probability that a person picked at random in Times Square in New York City is visiting the area is 0.83. You want to test to see if the claim is correct. State the null and alternative hypotheses.

In a population of fish, approximately 42% are female. A test is conducted to see if, in fact, the proportion is less. State the null and alternative hypotheses.

Suppose that a recent article stated that the mean time spent in jail by a first–time convicted burglar is 2.5 years. A study was then done to see if the mean time has increased in the new century. A random sample of 26 first-time convicted burglars in a recent year was picked. The mean length of time in jail from the survey was 3 years with a standard deviation of 1.8 years. Suppose that it is somehow known that the population standard deviation is 1.5. If you were conducting a hypothesis test to determine if the mean length of jail time has increased, what would the null and alternative hypotheses be? The distribution of the population is normal.

A random survey of 75 death row inmates revealed that the mean length of time on death row is 17.4 years with a standard deviation of 6.3 years. If you were conducting a hypothesis test to determine if the population mean time on death row could likely be 15 years, what would the null and alternative hypotheses be?

  • H 0 : __________
  • H a : __________
  • H 0 : μ = 15
  • H a : μ ≠ 15

The National Institute of Mental Health published an article stating that in any one-year period, approximately 9.5 percent of American adults suffer from depression or a depressive illness. Suppose that in a survey of 100 people in a certain town, seven of them suffered from depression or a depressive illness. If you were conducting a hypothesis test to determine if the true proportion of people in that town suffering from depression or a depressive illness is lower than the percent in the general adult American population, what would the null and alternative hypotheses be?

Some of the following statements refer to the null hypothesis, some to the alternate hypothesis.

State the null hypothesis, H 0 , and the alternative hypothesis, H a , in terms of the appropriate parameter ( μ or p ).

Statement | Hypotheses
The mean number of years Americans work before retiring is 34. | H 0 : μ = 34; H a : μ ≠ 34
At most 60% of Americans vote in presidential elections. | H 0 : p ≤ 0.60; H a : p > 0.60
The mean starting salary for San Jose State University graduates is at least $100,000 per year. | H 0 : μ ≥ 100,000; H a : μ < 100,000
Twenty-nine percent of high school seniors get drunk each month. | H 0 : p = 0.29; H a : p ≠ 0.29
Fewer than 5% of adults ride the bus to work in Los Angeles. | H 0 : p = 0.05; H a : p < 0.05
The mean number of cars a person owns in her lifetime is not more than ten. | H 0 : μ ≤ 10; H a : μ > 10
About half of Americans prefer to live away from cities, given the choice. | H 0 : p = 0.50; H a : p ≠ 0.50
Europeans have a mean paid vacation each year of six weeks. | H 0 : μ = 6; H a : μ ≠ 6
The chance of developing breast cancer is under 11% for women. | H 0 : p ≥ 0.11; H a : p < 0.11
Private universities' mean tuition cost is more than $20,000 per year. | H 0 : μ ≤ 20,000; H a : μ > 20,000

Over the past few decades, public health officials have examined the link between weight concerns and teen girls’ smoking. Researchers surveyed a group of 273 randomly selected teen girls living in Massachusetts (between 12 and 15 years old). After four years the girls were surveyed again. Sixty-three said they smoked to stay thin. Is there good evidence that more than thirty percent of the teen girls smoke to stay thin? The alternative hypothesis is:

  • p < 0.30
  • p > 0.30

A statistics instructor believes that fewer than 20% of Evergreen Valley College (EVC) students attended the opening night midnight showing of the latest Harry Potter movie. She surveys 84 of her students and finds that 11 attended the midnight showing. An appropriate alternative hypothesis is:

  • p > 0.20
  • p < 0.20

Previously, an organization reported that teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently, the mean is higher. Fifteen randomly chosen teenagers were asked how many hours per week they spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a hypothesis test. The null and alternative hypotheses are:

  • H o : \(\overline{x}\) = 4.5, H a : \(\overline{x}\) > 4.5
  • H o : μ ≥ 4.5, H a : μ < 4.5
  • H o : μ = 4.75, H a : μ > 4.75
  • H o : μ = 4.5, H a : μ > 4.5
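As a rough check on the last pair of hypotheses listed above (H 0 : μ = 4.5 vs. H a : μ > 4.5), here is a minimal one-sample t-test sketch in Python computed from the reported summary statistics; it is an illustration, not part of the original exercise.

```python
# A minimal sketch (not part of the original exercise) checking the last option
# above, H0: mu = 4.5 vs. Ha: mu > 4.5, from the reported summary statistics.
import math
from scipy import stats

n, xbar, s, mu0 = 15, 4.75, 2.0, 4.5

t = (xbar - mu0) / (s / math.sqrt(n))     # one-sample t statistic
p_value = stats.t.sf(t, df=n - 1)         # right-tailed p-value for Ha: mu > 4.5

print(f"t = {t:.3f}, p = {p_value:.3f}")  # roughly t = 0.48, p = 0.32 -> fail to reject H0
```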

Data from the National Institute of Mental Health. Available online at http://www.nimh.nih.gov/publicat/depression.cfm.

Null and Alternative Hypotheses Copyright © 2013 by OpenStaxCollege is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.

13.1 Understanding Null Hypothesis Testing

Learning Objectives

  • Explain the purpose of null hypothesis testing, including the role of sampling error.
  • Describe the basic logic of null hypothesis testing.
  • Describe the role of relationship strength and sample size in determining statistical significance and make reasonable judgments about statistical significance based on these two factors.

  The Purpose of Null Hypothesis Testing

As we have seen, psychological research typically involves measuring one or more variables in a sample and computing descriptive statistics for that sample. In general, however, the researcher’s goal is not to draw conclusions about that sample but to draw conclusions about the population that the sample was selected from. Thus researchers must use sample statistics to draw conclusions about the corresponding values in the population. These corresponding values in the population are called  parameters . Imagine, for example, that a researcher measures the number of depressive symptoms exhibited by each of 50 adults with clinical depression and computes the mean number of symptoms. The researcher probably wants to use this sample statistic (the mean number of symptoms for the sample) to draw conclusions about the corresponding population parameter (the mean number of symptoms for adults with clinical depression).

Unfortunately, sample statistics are not perfect estimates of their corresponding population parameters. This is because there is a certain amount of random variability in any statistic from sample to sample. The mean number of depressive symptoms might be 8.73 in one sample of adults with clinical depression, 6.45 in a second sample, and 9.44 in a third—even though these samples are selected randomly from the same population. Similarly, the correlation (Pearson’s  r ) between two variables might be +.24 in one sample, −.04 in a second sample, and +.15 in a third—again, even though these samples are selected randomly from the same population. This random variability in a statistic from sample to sample is called  sampling error . (Note that the term error  here refers to random variability and does not imply that anyone has made a mistake. No one “commits a sampling error.”)
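The following minimal simulation sketch in Python (not from the text; the population mean, spread, and sample size are assumptions) shows sampling error directly: every sample is drawn from the same population, yet each sample mean comes out a little different.

```python
# A minimal simulation sketch (the population mean, spread, and sample size are
# assumptions): every sample below comes from the same population, yet the sample
# means differ from sample to sample. That variability is sampling error.
import numpy as np

rng = np.random.default_rng(1)
population_mean = 8.0   # hypothetical mean number of depressive symptoms

for i in range(3):
    sample = rng.normal(loc=population_mean, scale=3.0, size=50)
    print(f"sample {i + 1}: mean = {sample.mean():.2f}")
```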

One implication of this is that when there is a statistical relationship in a sample, it is not always clear that there is a statistical relationship in the population. A small difference between two group means in a sample might indicate that there is a small difference between the two group means in the population. But it could also be that there is no difference between the means in the population and that the difference in the sample is just a matter of sampling error. Similarly, a Pearson’s  r  value of −.29 in a sample might mean that there is a negative relationship in the population. But it could also be that there is no relationship in the population and that the relationship in the sample is just a matter of sampling error.

In fact, any statistical relationship in a sample can be interpreted in two ways:

  • There is a relationship in the population, and the relationship in the sample reflects this.
  • There is no relationship in the population, and the relationship in the sample reflects only sampling error.

The purpose of null hypothesis testing is simply to help researchers decide between these two interpretations.

The Logic of Null Hypothesis Testing

Null hypothesis testing  is a formal approach to deciding between two interpretations of a statistical relationship in a sample. One interpretation is called the  null hypothesis  (often symbolized  H 0  and read as “H-naught”). This is the idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error. Informally, the null hypothesis is that the sample relationship “occurred by chance.” The other interpretation is called the  alternative hypothesis  (often symbolized as  H 1 ). This is the idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.

Again, every statistical relationship in a sample can be interpreted in either of these two ways: It might have occurred by chance, or it might reflect a relationship in the population. So researchers need a way to decide between them. Although there are many specific null hypothesis testing techniques, they are all based on the same general logic. The steps are as follows:

  • Assume for the moment that the null hypothesis is true. There is no relationship between the variables in the population.
  • Determine how likely the sample relationship would be if the null hypothesis were true.
  • If the sample relationship would be extremely unlikely, then reject the null hypothesis  in favor of the alternative hypothesis. If it would not be extremely unlikely, then  retain the null hypothesis .

Following this logic, we can begin to understand why Mehl and his colleagues concluded that there is no difference in talkativeness between women and men in the population. In essence, they asked the following question: “If there were no difference in the population, how likely is it that we would find a small difference of  d  = 0.06 in our sample?” Their answer to this question was that this sample relationship would be fairly likely if the null hypothesis were true. Therefore, they retained the null hypothesis—concluding that there is no evidence of a sex difference in the population. We can also see why Kanner and his colleagues concluded that there is a correlation between hassles and symptoms in the population. They asked, “If the null hypothesis were true, how likely is it that we would find a strong correlation of +.60 in our sample?” Their answer to this question was that this sample relationship would be fairly unlikely if the null hypothesis were true. Therefore, they rejected the null hypothesis in favor of the alternative hypothesis—concluding that there is a positive correlation between these variables in the population.

A crucial step in null hypothesis testing is finding the likelihood of the sample result if the null hypothesis were true. This probability is called the  p value . A low  p  value means that the sample result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A p  value that is not low means that the sample result would be likely if the null hypothesis were true and leads to the retention of the null hypothesis. But how low must the  p  value be before the sample result is considered unlikely enough to reject the null hypothesis? In null hypothesis testing, this criterion is called  α (alpha)  and is almost always set to .05. If there is a 5% chance or less of a result as extreme as the sample result if the null hypothesis were true, then the null hypothesis is rejected. When this happens, the result is said to be  statistically significant . If there is greater than a 5% chance of a result as extreme as the sample result when the null hypothesis is true, then the null hypothesis is retained. This does not necessarily mean that the researcher accepts the null hypothesis as true—only that there is not currently enough evidence to reject it. Researchers often use the expression “fail to reject the null hypothesis” rather than “retain the null hypothesis,” but they never use the expression “accept the null hypothesis.”

The Misunderstood  p  Value

The  p  value is one of the most misunderstood quantities in psychological research (Cohen, 1994) [1] . Even professional researchers misinterpret it, and it is not unusual for such misinterpretations to appear in statistics textbooks!

The most common misinterpretation is that the  p  value is the probability that the null hypothesis is true—that the sample result occurred by chance. For example, a misguided researcher might say that because the  p  value is .02, there is only a 2% chance that the result is due to chance and a 98% chance that it reflects a real relationship in the population. But this is incorrect . The  p  value is really the probability of a result at least as extreme as the sample result  if  the null hypothesis  were  true. So a  p  value of .02 means that if the null hypothesis were true, a sample result this extreme would occur only 2% of the time.

You can avoid this misunderstanding by remembering that the  p  value is not the probability that any particular  hypothesis  is true or false. Instead, it is the probability of obtaining the  sample result  if the null hypothesis were true.

[Figure: "Null Hypothesis" comic, retrieved from http://imgs.xkcd.com/comics/null_hypothesis.png (CC-BY-NC 2.5)]

Role of Sample Size and Relationship Strength

Recall that null hypothesis testing involves answering the question, “If the null hypothesis were true, what is the probability of a sample result as extreme as this one?” In other words, “What is the  p  value?” It can be helpful to see that the answer to this question depends on just two considerations: the strength of the relationship and the size of the sample. Specifically, the stronger the sample relationship and the larger the sample, the less likely the result would be if the null hypothesis were true. That is, the lower the  p  value. This should make sense. Imagine a study in which a sample of 500 women is compared with a sample of 500 men in terms of some psychological characteristic, and Cohen’s  d  is a strong 0.50. If there were really no sex difference in the population, then a result this strong based on such a large sample should seem highly unlikely. Now imagine a similar study in which a sample of three women is compared with a sample of three men, and Cohen’s  d  is a weak 0.10. If there were no sex difference in the population, then a relationship this weak based on such a small sample should seem likely. And this is precisely why the null hypothesis would be rejected in the first example and retained in the second.
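Here is a minimal sketch in Python (not from the text) of the two scenarios just described, using a t-test computed from summary statistics; the standard deviations are set to 1 so that the mean difference equals Cohen's d.

```python
# A minimal sketch (not from the text) of the two examples above: a strong effect
# (d = 0.50) with 500 per group versus a weak effect (d = 0.10) with 3 per group.
# With SD = 1 in both groups, the mean difference equals Cohen's d.
from scipy import stats

for d, n in [(0.50, 500), (0.10, 3)]:
    res = stats.ttest_ind_from_stats(mean1=d, std1=1.0, nobs1=n,
                                     mean2=0.0, std2=1.0, nobs2=n)
    print(f"d = {d}, n = {n} per group: p = {res.pvalue:.4f}")
# The strong/large case gives a tiny p-value (reject the null); the weak/small
# case gives a large p-value (retain the null).
```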

Of course, sometimes the result can be weak and the sample large, or the result can be strong and the sample small. In these cases, the two considerations trade off against each other so that a weak result can be statistically significant if the sample is large enough and a strong relationship can be statistically significant even if the sample is small. Table 13.1 shows roughly how relationship strength and sample size combine to determine whether a sample result is statistically significant. The columns of the table represent the three levels of relationship strength: weak, medium, and strong. The rows represent four sample sizes that can be considered small, medium, large, and extra large in the context of psychological research. Thus each cell in the table represents a combination of relationship strength and sample size. If a cell contains the word  Yes , then this combination would be statistically significant for both Cohen’s  d  and Pearson’s  r . If it contains the word  No , then it would not be statistically significant for either. There is one cell where the decision for  d  and  r  would be different and another where it might be different depending on some additional considerations, which are discussed in Section 13.2 “Some Basic Null Hypothesis Tests”

Sample Size | Weak | Medium | Strong
Small (N = 20) | No | No | d = Maybe, r = Yes
Medium (N = 50) | No | Yes | Yes
Large (N = 100) | d = Yes, r = No | Yes | Yes
Extra large (N = 500) | Yes | Yes | Yes

Although Table 13.1 provides only a rough guideline, it shows very clearly that weak relationships based on medium or small samples are never statistically significant and that strong relationships based on medium or larger samples are always statistically significant. If you keep this lesson in mind, you will often know whether a result is statistically significant based on the descriptive statistics alone. It is extremely useful to be able to develop this kind of intuitive judgment. One reason is that it allows you to develop expectations about how your formal null hypothesis tests are going to come out, which in turn allows you to detect problems in your analyses. For example, if your sample relationship is strong and your sample is medium, then you would expect to reject the null hypothesis. If for some reason your formal null hypothesis test indicates otherwise, then you need to double-check your computations and interpretations. A second reason is that the ability to make this kind of intuitive judgment is an indication that you understand the basic logic of this approach in addition to being able to do the computations.

Statistical Significance Versus Practical Significance

Table 13.1 illustrates another extremely important point. A statistically significant result is not necessarily a strong one. Even a very weak result can be statistically significant if it is based on a large enough sample. This is closely related to Janet Shibley Hyde’s argument about sex differences (Hyde, 2007) [2] . The differences between women and men in mathematical problem solving and leadership ability are statistically significant. But the word  significant  can cause people to interpret these differences as strong and important—perhaps even important enough to influence the college courses they take or even who they vote for. As we have seen, however, these statistically significant differences are actually quite weak—perhaps even “trivial.”

This is why it is important to distinguish between the  statistical  significance of a result and the  practical  significance of that result.  Practical significance refers to the importance or usefulness of the result in some real-world context. Many sex differences are statistically significant—and may even be interesting for purely scientific reasons—but they are not practically significant. In clinical practice, this same concept is often referred to as “clinical significance.” For example, a study on a new treatment for social phobia might show that it produces a statistically significant positive effect. Yet this effect still might not be strong enough to justify the time, effort, and other costs of putting it into practice—especially if easier and cheaper treatments that work almost as well already exist. Although statistically significant, this result would be said to lack practical or clinical significance.

[Figure: "Conditional Risk" comic, retrieved from http://imgs.xkcd.com/comics/conditional_risk.png (CC-BY-NC 2.5)]

Key Takeaways

  • Null hypothesis testing is a formal approach to deciding whether a statistical relationship in a sample reflects a real relationship in the population or is just due to chance.
  • The logic of null hypothesis testing involves assuming that the null hypothesis is true, finding how likely the sample result would be if this assumption were correct, and then making a decision. If the sample result would be unlikely if the null hypothesis were true, then it is rejected in favor of the alternative hypothesis. If it would not be unlikely, then the null hypothesis is retained.
  • The probability of obtaining the sample result if the null hypothesis were true (the  p  value) is based on two considerations: relationship strength and sample size. Reasonable judgments about whether a sample relationship is statistically significant can often be made by quickly considering these two factors.
  • Statistical significance is not the same as relationship strength or importance. Even weak relationships can be statistically significant if the sample size is large enough. It is important to consider relationship strength and the practical significance of a result in addition to its statistical significance.
Exercises

  • Discussion: Imagine a study showing that people who eat more broccoli tend to be happier. Explain for someone who knows nothing about statistics why the researchers would conduct a null hypothesis test.
  • Practice: Use Table 13.1 to decide whether each of the following sample results is statistically significant.
      • The correlation between two variables is r = −.78 based on a sample size of 137.
      • The mean score on a psychological characteristic for women is 25 ( SD = 5) and the mean score for men is 24 ( SD = 5). There were 12 women and 10 men in this study.
      • In a memory experiment, the mean number of items recalled by the 40 participants in Condition A was 0.50 standard deviations greater than the mean number recalled by the 40 participants in Condition B.
      • In another memory experiment, the mean scores for participants in Condition A and Condition B came out exactly the same!
      • A student finds a correlation of r = .04 between the number of units the students in his research methods class are taking and the students' level of stress.

References

  1. Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
  2. Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16, 259–263.


Null & Alternative Hypotheses | Definitions, Templates & Examples

Published on May 6, 2022 by Shaun Turney. Revised on June 22, 2023.

The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test :

  • Null hypothesis ( H 0 ): There’s no effect in the population .
  • Alternative hypothesis ( H a or H 1 ) : There’s an effect in the population.

Table of Contents

  • Answering your research question with hypotheses
  • What is a null hypothesis?
  • What is an alternative hypothesis?
  • Similarities and differences between null and alternative hypotheses
  • How to write null and alternative hypotheses
  • Other interesting articles
  • Frequently asked questions

The null and alternative hypotheses offer competing answers to your research question . When the research question asks “Does the independent variable affect the dependent variable?”:

  • The null hypothesis ( H 0 ) answers “No, there’s no effect in the population.”
  • The alternative hypothesis ( H a ) answers “Yes, there is an effect in the population.”

The null and alternative are always claims about the population. That’s because the goal of hypothesis testing is to make inferences about a population based on a sample . Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample. It’s critical for your research to write strong hypotheses .

You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis. Each type of statistical test comes with a specific way of phrasing the null and alternative hypothesis. However, the hypotheses can also be phrased in a general way that applies to any test.


The null hypothesis is the claim that there’s no effect in the population.

If the sample provides enough evidence against the claim that there’s no effect in the population ( p ≤ α), then we can reject the null hypothesis . Otherwise, we fail to reject the null hypothesis.

Although “fail to reject” may sound awkward, it’s the only wording that statisticians accept . Be careful not to say you “prove” or “accept” the null hypothesis.

Null hypotheses often include phrases such as “no effect,” “no difference,” or “no relationship.” When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).

You can never know with complete certainty whether there is an effect in the population. Some percentage of the time, your inference about the population will be incorrect. When you incorrectly reject the null hypothesis, it’s called a type I error . When you incorrectly fail to reject it, it’s a type II error.

Examples of null hypotheses

The table below gives examples of research questions and null hypotheses. There’s always more than one way to answer a research question, but these null hypotheses can help you get started.

Research question | Null hypothesis ( H 0 ) | Test
Does tooth flossing affect the number of cavities? | Tooth flossing has no effect on the number of cavities. | t test: The mean number of cavities per person does not differ between the flossing group (µ1) and the non-flossing group (µ2) in the population; µ1 = µ2.
Does the amount of text highlighted in the textbook affect exam scores? | The amount of text highlighted in the textbook has no effect on exam scores. | Linear regression: There is no relationship between the amount of text highlighted and exam scores in the population; β1 = 0.
Does daily meditation decrease the incidence of depression? | Daily meditation does not decrease the incidence of depression.* | Two-proportions test: The proportion of people with depression in the daily-meditation group (p1) is greater than or equal to the no-meditation group (p2) in the population; p1 ≥ p2.

*Note that some researchers prefer to always write the null hypothesis in terms of "no effect" and "=". It would be fine to say that daily meditation has no effect on the incidence of depression and p1 = p2.

The alternative hypothesis ( H a ) is the other answer to your research question . It claims that there’s an effect in the population.

Often, your alternative hypothesis is the same as your research hypothesis. In other words, it’s the claim that you expect or hope will be true.

The alternative hypothesis is the complement to the null hypothesis. Null and alternative hypotheses are exhaustive, meaning that together they cover every possible outcome. They are also mutually exclusive, meaning that only one can be true at a time.

Alternative hypotheses often include phrases such as “an effect,” “a difference,” or “a relationship.” When alternative hypotheses are written in mathematical terms, they always include an inequality (usually ≠, but sometimes < or >). As with null hypotheses, there are many acceptable ways to phrase an alternative hypothesis.

Examples of alternative hypotheses

The table below gives examples of research questions and alternative hypotheses to help you get started with formulating your own.

Research question | Alternative hypothesis ( H a ) | Test
Does tooth flossing affect the number of cavities? | Tooth flossing has an effect on the number of cavities. | t test: The mean number of cavities per person differs between the flossing group (µ1) and the non-flossing group (µ2) in the population; µ1 ≠ µ2.
Does the amount of text highlighted in a textbook affect exam scores? | The amount of text highlighted in the textbook has an effect on exam scores. | Linear regression: There is a relationship between the amount of text highlighted and exam scores in the population; β1 ≠ 0.
Does daily meditation decrease the incidence of depression? | Daily meditation decreases the incidence of depression. | Two-proportions test: The proportion of people with depression in the daily-meditation group (p1) is less than the no-meditation group (p2) in the population; p1 < p2.

Null and alternative hypotheses are similar in some ways:

  • They’re both answers to the research question.
  • They both make claims about the population.
  • They’re both evaluated by statistical tests.

However, there are important differences between the two types of hypotheses, summarized in the following table.

Null hypothesis ( H 0 ) | Alternative hypothesis ( H a )
A claim that there is no effect in the population. | A claim that there is an effect in the population.
Equality symbol (=, ≥, or ≤) | Inequality symbol (≠, <, or >)
Rejected when p ≤ α | Supported when p ≤ α
Failed to reject when p > α | Not supported when p > α


To help you write your hypotheses, you can use the template sentences below. If you know which statistical test you’re going to use, you can use the test-specific template sentences. Otherwise, you can use the general template sentences.

General template sentences

The only thing you need to know to use these general template sentences are your dependent and independent variables. To write your research question, null hypothesis, and alternative hypothesis, fill in the following sentences with your variables:

Does independent variable affect dependent variable ?

  • Null hypothesis ( H 0 ): Independent variable does not affect dependent variable.
  • Alternative hypothesis ( H a ): Independent variable affects dependent variable.

Test-specific template sentences

Once you know the statistical test you’ll be using, you can write your hypotheses in a more precise and mathematical way specific to the test you chose. The table below provides template sentences for common statistical tests.

Statistical test | Null hypothesis ( H 0 ) | Alternative hypothesis ( H a )
t test (with two groups) | The mean dependent variable does not differ between group 1 (µ1) and group 2 (µ2) in the population; µ1 = µ2. | The mean dependent variable differs between group 1 (µ1) and group 2 (µ2) in the population; µ1 ≠ µ2.
ANOVA (with three groups) | The mean dependent variable does not differ between group 1 (µ1), group 2 (µ2), and group 3 (µ3) in the population; µ1 = µ2 = µ3. | The mean dependent variable of group 1 (µ1), group 2 (µ2), and group 3 (µ3) are not all equal in the population.
Correlation | There is no correlation between independent variable and dependent variable in the population; ρ = 0. | There is a correlation between independent variable and dependent variable in the population; ρ ≠ 0.
Simple linear regression | There is no relationship between independent variable and dependent variable in the population; β1 = 0. | There is a relationship between independent variable and dependent variable in the population; β1 ≠ 0.
Two-proportions test | The dependent variable expressed as a proportion does not differ between group 1 (p1) and group 2 (p2) in the population; p1 = p2. | The dependent variable expressed as a proportion differs between group 1 (p1) and group 2 (p2) in the population; p1 ≠ p2.

Note: The template sentences above assume that you're performing two-tailed tests . Two-tailed tests are appropriate for most studies.
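The table above maps naturally onto standard statistical software. The sketch below (Python with SciPy; simulated data, and all variable names and effect sizes are assumptions) shows how each test returns a statistic and a p-value that you then compare against your significance level.

```python
# A minimal sketch (simulated data; variable names and effect sizes are assumptions)
# of how the tests in the table above are run with SciPy. Each call returns a test
# statistic and a two-tailed p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(size=100)                         # independent variable
y = 0.5 * x + rng.normal(size=100)               # dependent variable
group1 = rng.normal(0.0, 1.0, 50)
group2 = rng.normal(0.3, 1.0, 50)
group3 = rng.normal(0.6, 1.0, 50)

print(stats.ttest_ind(group1, group2))           # t test: H0 is mu1 = mu2
print(stats.f_oneway(group1, group2, group3))    # one-way ANOVA: H0 is mu1 = mu2 = mu3
print(stats.pearsonr(x, y))                      # correlation: H0 is rho = 0
print(stats.linregress(x, y))                    # simple regression: H0 is beta1 = 0
# A two-proportions test can be computed from the normal approximation, as in the
# one-proportion sketch earlier in this document.
```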

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

The null hypothesis is often abbreviated as H 0 . When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).

The alternative hypothesis is often abbreviated as H a or H 1 . When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (“ x affects y because …”).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses . In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.

Turney, S. (2023, June 22). Null & Alternative Hypotheses | Definitions, Templates & Examples. Scribbr. Retrieved August 5, 2024, from https://www.scribbr.com/statistics/null-and-alternative-hypotheses/


8.6 - Interaction Effects

Example 8-4: Depression Treatments

Now that we've clarified what additive effects are, let's take a look at an example where including " interaction terms " is appropriate.

Some researchers (Daniel, 1999) were interested in comparing the effectiveness of three treatments for severe depression. For the sake of simplicity, we denote the three treatments A, B, and C. The researchers collected the following data ( Depression Data ) on a random sample of n = 36 severely depressed individuals:

  • \(y_{i} =\) measure of the effectiveness of the treatment for individual i
  • \(x_{i1} =\) age (in years) of individual i
  • \(x_{i2} = 1\) if individual i received treatment A and 0, if not
  • \(x_{i3} = 1\) if individual i received treatment B and 0, if not

A scatter plot of the data with treatment effectiveness on the y -axis and age on the x -axis looks like this:

[Figure: scatterplot of treatment effectiveness (y-axis) versus age (x-axis), grouped by treatment]

The blue circles represent the data for individuals receiving treatment A, the red squares represent the data for individuals receiving treatment B, and the green diamonds represent the data for individuals receiving treatment C.

In the previous example, the two estimated regression functions had the same slopes —that is, they were parallel. If you tried to draw three best-fitting lines through the data of this example, do you think the slopes of your lines would be the same? Probably not! In this case, we need to include what are called " interaction terms " in our formulated regression model.

A (second-order) multiple regression model with interaction terms is:

\(y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3}+\beta_{12}x_{i1}x_{i2}+\beta_{13}x_{i1}x_{i3}+\epsilon_i\)

and the independent error terms \(\epsilon_i\) follow a normal distribution with mean 0 and equal variance \(\sigma^{2}\). Perhaps not surprisingly, the terms \(x_{i1} x_{i2}\) and \(x_{i1} x_{i3}\) are the interaction terms in the model.

Let's investigate our formulated model to discover in what way the predictors have an " interaction effect " on the response. We start by determining the formulated regression function for each of the three treatments. In short —after a little bit of algebra (see below) —we learn that the model defines three different regression functions —one for each of the three treatments:

Treatment Formulated regression function
If patient receives A, then \( \left(x_{i2} = 1, x_{i3} = 0 \right) \) and ...

\(\mu_Y=(\beta_0+\beta_2)+(\beta_1+\beta_{12})x_{i1}\)

If patient receives B, then \( \left(x_{i2} = 0, x_{i3} = 1 \right) \) and ...

\(\mu_Y=(\beta_0+\beta_3)+(\beta_1+\beta_{13})x_{i1}\)

If patient receives C, then \( \left(x_{i2} = 0, x_{i3} = 0 \right) \) and ...

\(\mu_Y=\beta_0+\beta_{1}x_{i1}\)

So, in what way does including the interaction terms, \(x_{i1} x_{i2}\) and \(x_{i1} x_{i3}\), in the model imply that the predictors have an " interaction effect " on the mean response? Note that the slopes of the three regression functions differ —the slope of the first line is \(\beta_1 + \beta_{12}\), the slope of the second line is \(\beta_1 + \beta_{13}\), and the slope of the third line is \(\beta_1\). What does this mean in a practical sense? It means that...

  • the effect of the individual's age \(\left( x_1 \right)\) on the treatment's mean effectiveness \(\left(\mu_Y \right)\) depends on the treatment \(\left(x_2 \text{ and } x_3\right)\), and ...
  • the effect of treatment \(\left(x_2 \text{ and } x_3\right)\) on the treatment's mean effectiveness \(\left(\mu_Y \right)\) depends on the individual's age \(\left( x_1 \right)\).

In general, then, what does it mean for two predictors " to interact "?

  • Two predictors interact if the effect on the response variable of one predictor depends on the value of the other .
  • A slope parameter can no longer be interpreted as the change in the mean response for each unit increase in the predictor, while the other predictors are held constant.

And, what are " interaction effects "?

A regression model contains interaction effects if the response function is not additive and cannot be written as a sum of functions of the predictor variables. That is, a regression model contains interaction effects if:

\(\mu_Y \ne f_1(x_1)+f_2(x_2)+ \cdots +f_{p-1}(x_{p-1})\)

For our example concerning treatment for depression, the mean response:

\(\mu_Y=\beta_0+\beta_1x_{1}+\beta_2x_{2}+\beta_3x_{3}+\beta_{12}x_{1}x_{2}+\beta_{13}x_{1}x_{3}\)

can not be separated into distinct functions of each of the individual predictors. That is, there is no way of "breaking apart" \(\beta_{12} x_1 x_2 \text{ and } \beta_{13} x_1 x_3\) into distinct pieces. Therefore, we say that \(x_1 \text{ and } x_2\) interact, and \(x_1 \text{ and } x_3\) interact.

In returning to our example, let's recall that the appropriate steps in any regression analysis are:

  • Model formulation
  • Model estimation
  • Model evaluation

So far, within the model-building step, all we've done is formulate the regression model as:

\(y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3}+\beta_{12}x_{i1}x_{i2}+\beta_{13}x_{i1}x_{i3}+\epsilon_i\)

We can use Minitab —or any other statistical software for that matter —to estimate the model. Doing so, Minitab reports:

Regression Equation

y = 6.21 + 1.0334 age + 41.30 x2 + 22.71 x3 - 0.703 agex2 - 0.510 agex3
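For readers not using Minitab, here is a minimal sketch of how the same interaction model could be fit in Python with statsmodels. The file name "depression.csv" and its column names are hypothetical assumptions, since the Depression Data set is referenced above but not reproduced here.

```python
# A minimal sketch of fitting the interaction model in Python with statsmodels.
# The file name "depression.csv" and its column names (y, age, TRT) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("depression.csv")             # assumed columns: y, age, TRT (A/B/C)
df["x2"] = (df["TRT"] == "A").astype(int)      # 1 if the individual received treatment A
df["x3"] = (df["TRT"] == "B").astype(int)      # 1 if the individual received treatment B

# y ~ b0 + b1*age + b2*x2 + b3*x3 + b12*age*x2 + b13*age*x3
model = smf.ols("y ~ age + x2 + x3 + age:x2 + age:x3", data=df).fit()
print(model.params)                            # estimates of b0, b1, b2, b3, b12, b13
```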

Now, if we plug the possible values for \(x_2 \text{ and } x_3\) into the estimated regression function, we obtain the three "best fitting" lines —one for each treatment (A, B, and C) —through the data. Here's the algebra for determining the estimated regression function for patients receiving treatment A. Setting \(x_2 = 1\) and \(x_3 = 0\) gives:

\(\hat{y}=6.21+1.0334x_1+41.30(1)+22.71(0)-0.703x_1(1)-0.510x_1(0)\)

\(\hat{y}=(6.21+41.30)+(1.0334-0.703)x_1=47.5+0.33x_1\)

Doing similar algebra for patients receiving treatments B and C, we obtain:

  • If the patient receives treatment A, then \(\left(x_2 = 1, x_3 = 0 \right)\) and \(\hat{y}=47.5+0.33x_1\)
  • If the patient receives treatment B, then \(\left(x_2 = 0, x_3 = 1 \right)\) and \(\hat{y}=28.9+0.52x_1\)
  • If the patient receives treatment C, then \(\left(x_2 = 0, x_3 = 0 \right)\) and \(\hat{y}=6.21+1.03x_1\)

And, plotting the three "best fitting" lines, we obtain:

depression treatments scatterplot with groups and fitted lines

What do the estimated slopes tell us?

  • For patients in this study receiving treatment A, the effectiveness of the treatment is predicted to increase by 0.33 units for every additional year in age.
  • For patients in this study receiving treatment B, the effectiveness of the treatment is predicted to increase by 0.52 units for every additional year in age.
  • For patients in this study receiving treatment C, the effectiveness of the treatment is predicted to increase by 1.03 units for every additional year in age.

In short, the effect of age on the predicted treatment effectiveness depends on the treatment given. That is, age appears to interact with treatment in its impact on treatment effectiveness. The interaction is exhibited graphically by the "nonparallelness" (is that a word?) of the lines.
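As a quick numeric check of these slopes and fitted lines, the sketch below simply plugs the coefficients reported in the output above into the estimated regression function; the chosen age of 40 is only an illustrative value.

```python
def predict_effectiveness(age, treatment):
    """Plug the reported coefficients into the estimated regression function."""
    b0, b1 = 6.21, 1.0334          # intercept and age slope (treatment C is the baseline)
    b2, b3 = 41.30, 22.71          # offsets for treatments A and B
    b12, b13 = -0.703, -0.510      # age-by-treatment interaction coefficients
    x2 = 1 if treatment == "A" else 0
    x3 = 1 if treatment == "B" else 0
    return b0 + b1 * age + b2 * x2 + b3 * x3 + b12 * age * x2 + b13 * age * x3

# Slopes implied by the interactions: A = 0.33, B = 0.52, C = 1.03 units per year of age.
for trt in ("A", "B", "C"):
    print(trt, round(predict_effectiveness(40, trt), 1))
```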

Of course, our primary goal is not to draw conclusions about this particular sample of depressed individuals, but rather about the entire population of depressed individuals. That is, we want to use our estimated model to draw conclusions about the larger population of depressed individuals. Before we do so, however, we first should evaluate the model.

The residuals versus fits plot:

residual vs fitted value plot

exhibits all of the "good" behavior, suggesting that the model fits well, there are no obvious outliers, and the error variances are indeed constant. And, the normal probability plot:

normal probability plot

exhibits a linear trend and a large P -value, suggesting that the error terms are indeed normally distributed.

Having successfully built (formulated, estimated, and evaluated) a model, we now can use the model to answer our research questions. Let's consider two different questions that we might want answered.

First research question. For every age, is there a difference in the mean effectiveness for the three treatments? As is usually the case, our formulated regression model helps determine how to answer the research question. Our formulated regression model suggests that answering the question involves testing whether the population regression functions are identical.

That is, we need to test the null hypothesis \(H_0 \colon \beta_2 = \beta_3 =\beta_{12} = \beta_{13} = 0\) against the alternative \(H_A \colon\) at least one of these slope parameters is not 0.

We know how to do that! The relevant software output:

Analysis of Variance

Source         DF   Seq SS    Seq MS   F-Value  P-Value
Regression      5  4932.85    986.57     64.04    0.000
  age           1  3424.43   3424.43    222.29    0.000
  x2            1   803.80    803.80     52.18    0.000
  x3            1     1.19      1.19      0.08    0.783
  age*x2        1   375.00    375.00     24.34    0.000
  age*x3        1   328.42    328.42     21.32    0.000
Error          30   462.15     15.40
  Lack-of-Fit  27   285.15     10.56      0.18    0.996
  Pure Error    3   177.00     59.00
Total          35  5395.00

tells us that the appropriate partial F -statistic for testing the above hypothesis is:

\(F=\frac{(803.8+1.19+375+328.42)/4}{15.4}=24.49\)

And, Minitab tells us:

F Distribution with 4 DF in Numerator and 30 DF in denominator

x \(p(X \leq x)\)
24.49 1.00000

that the probability of observing an F-statistic, with 4 numerator and 30 denominator degrees of freedom, less than our observed test statistic 24.49 is > 0.999. Therefore, our P-value is < 0.001. We can reject our null hypothesis. There is sufficient evidence at the \(\alpha = 0.05\) level to conclude that there is a significant difference in the mean effectiveness for the three treatments.

Second research question. Does the effect of age on the treatment's effectiveness depend on the treatment? Our formulated regression model suggests that answering the question involves testing whether the two interaction parameters \(\beta_{12} \text{ and } \beta_{13}\) are significant. That is, we need to test the null hypothesis \(H_0 \colon \beta_{12} = \beta_{13} = 0\) against the alternative \(H_A \colon\) at least one of the interaction parameters is not 0.

Using the sequential sums of squares from the same analysis of variance table, the appropriate partial F-statistic is:

\(F=\dfrac{(375+328.42)/2}{15.4}=22.84\)

F Distribution with 2 DF in Numerator and 30 DF in denominator

x \(p(X \leq x)\)
22.84 1.00000

The Minitab output tells us that the probability of observing an F-statistic, with 2 numerator and 30 denominator degrees of freedom, less than our observed test statistic 22.84 is > 0.999. Therefore, our P-value is < 0.001. We can reject our null hypothesis. There is sufficient evidence at the \(\alpha = 0.05\) level to conclude that the effect of age on the treatment's effectiveness depends on the treatment.
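Both partial F-statistics and their P-values can be reproduced with a few lines of Python. This is a minimal cross-check sketch (scipy assumed); the numbers are taken directly from the analysis of variance table above.

```python
from scipy import stats

mse = 15.4                                   # mean squared error with 30 df

# Question 1: H0: beta2 = beta3 = beta12 = beta13 = 0  (4 numerator df)
f1 = (803.80 + 1.19 + 375.00 + 328.42) / 4 / mse
p1 = stats.f.sf(f1, 4, 30)                   # upper-tail probability (P-value)

# Question 2: H0: beta12 = beta13 = 0  (2 numerator df)
f2 = (375.00 + 328.42) / 2 / mse
p2 = stats.f.sf(f2, 2, 30)

print(round(f1, 2), p1)                      # about 24.49, P-value well below 0.001
print(round(f2, 2), p2)                      # about 22.84, P-value well below 0.001
```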

A model with an interaction term

For the depression study, plug the appropriate values for \(x_2 \text{ and } x_3\) into the formulated regression function

\(\mu_Y = \beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3+\beta_{12}x_1 x_2+\beta_{13} x_1 x_3\)

and perform the necessary algebra to determine:

  • The formulated regression function for patients receiving treatment B.
  • The formulated regression function for patients receiving treatment C.

For treatment B, \(x_2 = 0, x_3 = 1\), so:

\(\mu_Y = \beta_0+\beta_1 x_1+\beta_2(0)+\beta_3(1)+\beta_{12} x_1(0)+\beta_{13} x_1(1) = (\beta_0+\beta_3)+(\beta_1+\beta_{13})x_1\)

For treatment C, \(x_2 = 0, x_3 = 0\), so:

\(\mu_Y = \beta_0+\beta_1 x_1+\beta_2(0)+\beta_3(0)+\beta_{12} x_1(0)+\beta_{13} x_1(0) = \beta_0+\beta_1 x_1\)

For the depression study, plug the appropriate values for \(x_2 \text{ and } x_3\) into the estimated regression function

\(\hat{y} = 6.21 + 1.0334 x_1 + 41.30 x_2+22.71 x_3 - 0.703 x_1 x_2 - 0.510 x_1 x_3\)

and perform the necessary algebra to determine:

  • The estimated regression function for patients receiving treatment B.
  • The estimated regression function for patients receiving treatment C.

For treatment B, \(x_2 = 0, x_3 = 1\), so:

\(\hat{y} = 6.21 + 1.0334 x_1 + 41.30(0) + 22.71(1) - 0.703 x_1(0) - 0.510 x_1(1) = (6.21 + 22.71)+(1.0334 - 0.510) x_1 = 28.92 + 0.523 x_1\)

For treatment C, \(x_2 = 0, x_3 = 0\), so:

\(\hat{y} = 6.21 + 1.0334 x_1 + 41.30(0) + 22.71(0) - 0.703 x_1(0) - 0.510 x_1(0) = 6.21 + 1.033 x_1\)

For the first research question that we addressed for the depression study, show that there is no difference in the mean effectiveness between treatments B and C, for all ages, provided that \(\beta_3 = 0 \text{ and } \beta_{13} = 0\). (HINT: Follow the argument presented in the chalk-talk comparing treatments A and C.)

\(\mu_Y|\text{Treatment B} - \mu_Y|\text{Treatment C} = \left[(\beta_0 + \beta_3)+(\beta_1 + \beta_{13}) x_1\right] - (\beta_0 + \beta_1 x_1) = \beta_3 + \beta_{13} x_1 = 0\), if \(\beta_3 = \beta_{13} = 0\)

A study of atmospheric pollution on the slopes of the Blue Ridge Mountains (Tennessee) was conducted. The Lead Moss data contains the levels of lead found in 70 fern moss specimens (in micrograms of lead per gram of moss tissue) collected from the mountain slopes, as well as the elevation of the moss specimen (in feet) and the direction (1 if east, 0 if west) of the slope face.

  • Write the equation of a second-order model relating mean lead level, E ( y ), to elevation \(\left(x_1 \right)\) and the slope face \(\left(x_2 \right)\) that includes an interaction between elevation and slope face in the model.
  • Graph the relationship between mean lead level and elevation for the different slope faces that are hypothesized by the model in part a.
  • In terms of the β's of the model in part a, give the change in lead level for every one-foot increase in elevation for moss specimens on the east slope.
  • Fit the model in part a to the data using an available statistical software package. Is the overall model statistically useful for predicting lead level? Test using \(α = 0.10\).
  • Write the estimated equation of the model in part a relating mean lead level, E ( y ), to elevation \(\left(x_1 \right)\) and slope face \(\left(x_2 \right)\).

(a) \(\mu_Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2\)

(b) [Figure: hypothesized relationship between mean lead level and elevation, with one line for each slope face]

(c) For the east slope, \(x_2=1\), so \(\mu_Y = \beta_0 + \beta_1 x_1 + \beta_2(1) + \beta_{12} x_1(1) = (\beta_0 + \beta_2)+(\beta_1 + \beta_{12}) x_1\). The average lead level therefore changes by \(\beta_1 + \beta_{12}\) micrograms of lead per gram of moss tissue for every one-foot increase in elevation for moss specimens on the east slope.

(d) [Minitab output] Since the p-value for testing whether the overall model is statistically useful for predicting lead level is 0.857, we conclude that this model is not statistically useful.

(e) The estimated equation appears in the Minitab output, but since the model is not statistically useful, this equation doesn't do us much good.
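If you want to try parts (d) and (e) yourself, here is a minimal, hypothetical sketch of the fit in Python. The file name and column names ("lead", "elevation", "slope") are assumptions about how the Lead Moss data might be stored, not part of the original exercise.

```python
# Sketch only: fit the interaction model E(y) = b0 + b1*elevation + b2*slope + b12*elevation*slope.
import pandas as pd
import statsmodels.formula.api as smf

moss = pd.read_csv("lead_moss.csv")          # hypothetical file and column names

fit = smf.ols("lead ~ elevation * slope", data=moss).fit()
print(fit.fvalue, fit.f_pvalue)              # overall test of model usefulness, part (d)
print(fit.params)                            # estimated equation, part (e)
```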


Understanding Null Hypothesis Testing

Rajiv S. Jhangiani; I-Chant A. Chiang; Carrie Cuttler; and Dana C. Leighton

Learning Objectives

  • Explain the purpose of null hypothesis testing, including the role of sampling error.
  • Describe the basic logic of null hypothesis testing.
  • Describe the role of relationship strength and sample size in determining statistical significance and make reasonable judgments about statistical significance based on these two factors.

 The Purpose of Null Hypothesis Testing

As we have seen, psychological research typically involves measuring one or more variables in a sample and computing descriptive summary data (e.g., means, correlation coefficients) for those variables. These descriptive data for the sample are called statistics .  In general, however, the researcher’s goal is not to draw conclusions about that sample but to draw conclusions about the population that the sample was selected from. Thus researchers must use sample statistics to draw conclusions about the corresponding values in the population. These corresponding values in the population are called parameters . Imagine, for example, that a researcher measures the number of depressive symptoms exhibited by each of 50 adults with clinical depression and computes the mean number of symptoms. The researcher probably wants to use this sample statistic (the mean number of symptoms for the sample) to draw conclusions about the corresponding population parameter (the mean number of symptoms for adults with clinical depression).

Unfortunately, sample statistics are not perfect estimates of their corresponding population parameters. This is because there is a certain amount of random variability in any statistic from sample to sample. The mean number of depressive symptoms might be 8.73 in one sample of adults with clinical depression, 6.45 in a second sample, and 9.44 in a third—even though these samples are selected randomly from the same population. Similarly, the correlation (Pearson’s  r ) between two variables might be +.24 in one sample, −.04 in a second sample, and +.15 in a third—again, even though these samples are selected randomly from the same population. This random variability in a statistic from sample to sample is called  sampling error . (Note that the term error  here refers to random variability and does not imply that anyone has made a mistake. No one “commits a sampling error.”)

One implication of this is that when there is a statistical relationship in a sample, it is not always clear that there is a statistical relationship in the population. A small difference between two group means in a sample might indicate that there is a small difference between the two group means in the population. But it could also be that there is no difference between the means in the population and that the difference in the sample is just a matter of sampling error. Similarly, a Pearson’s  r  value of −.29 in a sample might mean that there is a negative relationship in the population. But it could also be that there is no relationship in the population and that the relationship in the sample is just a matter of sampling error.
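The sample-to-sample variability described above is easy to see in a small simulation. The sketch below (numpy assumed; sample size of 30 chosen only for illustration) repeatedly draws two unrelated variables from the same population and prints the sample correlation each time.

```python
import numpy as np

rng = np.random.default_rng(1)
for sample in range(3):
    x = rng.normal(size=30)                      # 30 observations per sample (illustrative)
    y = rng.normal(size=30)                      # generated independently of x, so population r = 0
    r = np.corrcoef(x, y)[0, 1]
    print(f"sample {sample + 1}: r = {r:+.2f}")  # nonzero values arise purely from sampling error
```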

In fact, any statistical relationship in a sample can be interpreted in two ways:

  • There is a relationship in the population, and the relationship in the sample reflects this.
  • There is no relationship in the population, and the relationship in the sample reflects only sampling error.

The purpose of null hypothesis testing is simply to help researchers decide between these two interpretations.

The Logic of Null Hypothesis Testing

Null hypothesis testing (often called null hypothesis significance testing or NHST) is a formal approach to deciding between two interpretations of a statistical relationship in a sample. One interpretation is called the   null hypothesis  (often symbolized  H 0 and read as “H-zero”). This is the idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error. Informally, the null hypothesis is that the sample relationship “occurred by chance.” The other interpretation is called the alternative hypothesis  (often symbolized as  H 1 ). This is the idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.

Again, every statistical relationship in a sample can be interpreted in either of these two ways: It might have occurred by chance, or it might reflect a relationship in the population. So researchers need a way to decide between them. Although there are many specific null hypothesis testing techniques, they are all based on the same general logic. The steps are as follows:

  • Assume for the moment that the null hypothesis is true. There is no relationship between the variables in the population.
  • Determine how likely the sample relationship would be if the null hypothesis were true.
  • If the sample relationship would be extremely unlikely, then reject the null hypothesis  in favor of the alternative hypothesis. If it would not be extremely unlikely, then  retain the null hypothesis .

Following this logic, we can begin to understand why Mehl and his colleagues concluded that there is no difference in talkativeness between women and men in the population. In essence, they asked the following question: “If there were no difference in the population, how likely is it that we would find a small difference of  d  = 0.06 in our sample?” Their answer to this question was that this sample relationship would be fairly likely if the null hypothesis were true. Therefore, they retained the null hypothesis—concluding that there is no evidence of a sex difference in the population. We can also see why Kanner and his colleagues concluded that there is a correlation between hassles and symptoms in the population. They asked, “If the null hypothesis were true, how likely is it that we would find a strong correlation of +.60 in our sample?” Their answer to this question was that this sample relationship would be fairly unlikely if the null hypothesis were true. Therefore, they rejected the null hypothesis in favor of the alternative hypothesis—concluding that there is a positive correlation between these variables in the population.

A crucial step in null hypothesis testing is finding the probability of the sample result or a more extreme result if the null hypothesis were true (Lakens, 2017). [1] This probability is called the p value . A low  p value means that the sample or more extreme result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A p value that is not low means that the sample or more extreme result would be likely if the null hypothesis were true and leads to the retention of the null hypothesis. But how low must the p value criterion be before the sample result is considered unlikely enough to reject the null hypothesis? In null hypothesis testing, this criterion is called α (alpha) and is almost always set to .05. If there is a 5% chance or less of a result at least as extreme as the sample result if the null hypothesis were true, then the null hypothesis is rejected. When this happens, the result is said to be statistically significant . If there is greater than a 5% chance of a result as extreme as the sample result when the null hypothesis is true, then the null hypothesis is retained. This does not necessarily mean that the researcher accepts the null hypothesis as true—only that there is not currently enough evidence to reject it. Researchers often use the expression “fail to reject the null hypothesis” rather than “retain the null hypothesis,” but they never use the expression “accept the null hypothesis.”
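The decision rule in that paragraph can be written out in a few lines of code. This is only a schematic illustration of the logic, not a substitute for running an actual test.

```python
def nhst_decision(p_value, alpha=0.05):
    """Apply the standard decision rule: reject H0 when p <= alpha, otherwise fail to reject."""
    if p_value <= alpha:
        return "reject the null hypothesis (statistically significant)"
    return "fail to reject the null hypothesis"

print(nhst_decision(0.02))   # rejected: a result this extreme occurs 5% of the time or less under H0
print(nhst_decision(0.30))   # retained: not enough evidence against H0
```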

The Misunderstood  p  Value

The  p  value is one of the most misunderstood quantities in psychological research (Cohen, 1994) [2] . Even professional researchers misinterpret it, and it is not unusual for such misinterpretations to appear in statistics textbooks!

The most common misinterpretation is that the  p  value is the probability that the null hypothesis is true—that the sample result occurred by chance. For example, a misguided researcher might say that because the  p  value is .02, there is only a 2% chance that the result is due to chance and a 98% chance that it reflects a real relationship in the population. But this is incorrect . The  p  value is really the probability of a result at least as extreme as the sample result  if  the null hypothesis  were  true. So a  p  value of .02 means that if the null hypothesis were true, a sample result this extreme would occur only 2% of the time.

You can avoid this misunderstanding by remembering that the  p  value is not the probability that any particular  hypothesis  is true or false. Instead, it is the probability of obtaining the  sample result  if the null hypothesis were true.

[Image: "Null Hypothesis" comic; image description below.]

Role of Sample Size and Relationship Strength

Recall that null hypothesis testing involves answering the question, “If the null hypothesis were true, what is the probability of a sample result as extreme as this one?” In other words, “What is the  p  value?” It can be helpful to see that the answer to this question depends on just two considerations: the strength of the relationship and the size of the sample. Specifically, the stronger the sample relationship and the larger the sample, the less likely the result would be if the null hypothesis were true. That is, the lower the  p  value. This should make sense. Imagine a study in which a sample of 500 women is compared with a sample of 500 men in terms of some psychological characteristic, and Cohen’s  d  is a strong 0.50. If there were really no sex difference in the population, then a result this strong based on such a large sample should seem highly unlikely. Now imagine a similar study in which a sample of three women is compared with a sample of three men, and Cohen’s  d  is a weak 0.10. If there were no sex difference in the population, then a relationship this weak based on such a small sample should seem likely. And this is precisely why the null hypothesis would be rejected in the first example and retained in the second.
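The two examples above can be made concrete with a short calculation. The sketch below uses the standard approximation for an equal-group two-sample t-test, t = d × sqrt(n/2) with 2n − 2 degrees of freedom; it is a rough illustration of the idea, not a full power analysis.

```python
from math import sqrt
from scipy import stats

for d, n in [(0.50, 500), (0.10, 3)]:        # (Cohen's d, sample size per group)
    t = d * sqrt(n / 2)                      # approximate t statistic for equal-n groups
    p = 2 * stats.t.sf(t, df=2 * n - 2)      # two-tailed p value
    print(f"d = {d}, n = {n} per group: p = {p:.3g}")
# The strong effect in large samples yields a tiny p value; the weak effect in tiny samples does not.
```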

Of course, sometimes the result can be weak and the sample large, or the result can be strong and the sample small. In these cases, the two considerations trade off against each other so that a weak result can be statistically significant if the sample is large enough and a strong relationship can be statistically significant even if the sample is small. Table 13.1 shows roughly how relationship strength and sample size combine to determine whether a sample result is statistically significant. The columns of the table represent the three levels of relationship strength: weak, medium, and strong. The rows represent four sample sizes that can be considered small, medium, large, and extra large in the context of psychological research. Thus each cell in the table represents a combination of relationship strength and sample size. If a cell contains the word  Yes , then this combination would be statistically significant for both Cohen’s  d  and Pearson’s  r . If it contains the word  No , then it would not be statistically significant for either. There is one cell where the decision for  d  and  r  would be different and another where it might be different depending on some additional considerations, which are discussed in Section 13.2 “Some Basic Null Hypothesis Tests”

Table 13.1 How Relationship Strength and Sample Size Combine to Determine Whether a Result Is Statistically Significant

Sample size              Weak              Medium   Strong
Small (N = 20)           No                No       d = Maybe, r = Yes
Medium (N = 50)          No                Yes      Yes
Large (N = 100)          d = Yes, r = No   Yes      Yes
Extra large (N = 500)    Yes               Yes      Yes

Although Table 13.1 provides only a rough guideline, it shows very clearly that weak relationships based on medium or small samples are never statistically significant and that strong relationships based on medium or larger samples are always statistically significant. If you keep this lesson in mind, you will often know whether a result is statistically significant based on the descriptive statistics alone. It is extremely useful to be able to develop this kind of intuitive judgment. One reason is that it allows you to develop expectations about how your formal null hypothesis tests are going to come out, which in turn allows you to detect problems in your analyses. For example, if your sample relationship is strong and your sample is medium, then you would expect to reject the null hypothesis. If for some reason your formal null hypothesis test indicates otherwise, then you need to double-check your computations and interpretations. A second reason is that the ability to make this kind of intuitive judgment is an indication that you understand the basic logic of this approach in addition to being able to do the computations.

Statistical Significance Versus Practical Significance

Table 13.1 illustrates another extremely important point. A statistically significant result is not necessarily a strong one. Even a very weak result can be statistically significant if it is based on a large enough sample. This is closely related to Janet Shibley Hyde’s argument about sex differences (Hyde, 2007) [3] . The differences between women and men in mathematical problem solving and leadership ability are statistically significant. But the word  significant  can cause people to interpret these differences as strong and important—perhaps even important enough to influence the college courses they take or even who they vote for. As we have seen, however, these statistically significant differences are actually quite weak—perhaps even “trivial.”

This is why it is important to distinguish between the  statistical  significance of a result and the  practical  significance of that result.  Practical significance refers to the importance or usefulness of the result in some real-world context. Many sex differences are statistically significant—and may even be interesting for purely scientific reasons—but they are not practically significant. In clinical practice, this same concept is often referred to as “clinical significance.” For example, a study on a new treatment for social phobia might show that it produces a statistically significant positive effect. Yet this effect still might not be strong enough to justify the time, effort, and other costs of putting it into practice—especially if easier and cheaper treatments that work almost as well already exist. Although statistically significant, this result would be said to lack practical or clinical significance.

[Image: "Conditional Risk" comic; image description below.]

Image Description

“Null Hypothesis” long description:  A comic depicting a man and a woman talking in the foreground. In the background is a child working at a desk. The man says to the woman, “I can’t believe schools are still teaching kids about the null hypothesis. I remember reading a big study that conclusively disproved it  years  ago.”  [Return to “Null Hypothesis”]

“Conditional Risk” long description:  A comic depicting two hikers beside a tree during a thunderstorm. A bolt of lightning goes “crack” in the dark sky as thunder booms. One of the hikers says, “Whoa! We should get inside!” The other hiker says, “It’s okay! Lightning only kills about 45 Americans a year, so the chances of dying are only one in 7,000,000. Let’s go on!” The comic’s caption says, “The annual death rate among people who know that statistic is one in six.”  [Return to “Conditional Risk”]

Media Attributions

  • Null Hypothesis by XKCD, CC BY-NC (Attribution NonCommercial)
  • Conditional Risk by XKCD, CC BY-NC (Attribution NonCommercial)

Notes

1. Lakens, D. (2017, December 25). About p-values: Understanding common misconceptions [Blog post]. Retrieved from https://correlaid.org/en/blog/understand-p-values/
2. Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
3. Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16, 259–263.

Glossary

  • Statistics: Descriptive data that involves measuring one or more variables in a sample and computing descriptive summary data (e.g., means, correlation coefficients) for those variables.
  • Parameters: Corresponding values in the population.
  • Sampling error: The random variability in a statistic from sample to sample.
  • Null hypothesis testing: A formal approach to deciding between two interpretations of a statistical relationship in a sample.
  • Null hypothesis: The idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error (often symbolized H0 and read as “H-zero”).
  • Alternative hypothesis: An alternative to the null hypothesis (often symbolized as H1), this hypothesis proposes that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.
  • Reject the null hypothesis: A decision made by researchers using null hypothesis testing which occurs when the sample relationship would be extremely unlikely.
  • Retain the null hypothesis: A decision made by researchers in null hypothesis testing which occurs when the sample relationship would not be extremely unlikely.
  • p value: The probability of obtaining the sample result or a more extreme result if the null hypothesis were true.
  • α (alpha): The criterion that shows how low a p-value should be before the sample result is considered unlikely enough to reject the null hypothesis (usually set to .05).
  • Statistically significant: An effect that is unlikely due to random chance and therefore likely represents a real effect in the population.
  • Practical significance: The importance or usefulness of the result in some real-world context.

Understanding Null Hypothesis Testing Copyright © by Rajiv S. Jhangiani; I-Chant A. Chiang; Carrie Cuttler; and Dana C. Leighton is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



Statistics By Jim

Making statistics intuitive

Null Hypothesis: Definition, Rejecting & Examples

By Jim Frost

What is a Null Hypothesis?

The null hypothesis in statistics states that there is no difference between groups or no relationship between variables. It is one of two mutually exclusive hypotheses about a population in a hypothesis test.

[Photograph of Rodin's statue, The Thinker, pondering the null hypothesis.]

  • Null Hypothesis H 0 : No effect exists in the population.
  • Alternative Hypothesis H A : The effect exists in the population.

In every study or experiment, researchers assess an effect or relationship. This effect can be the effectiveness of a new drug, building material, or other intervention that has benefits. There is a benefit or connection that the researchers hope to identify. Unfortunately, no effect may exist. In statistics, we call this lack of an effect the null hypothesis. Researchers assume that this notion of no effect is correct until they have enough evidence to suggest otherwise, similar to how a trial presumes innocence.

In this context, the analysts don’t necessarily believe the null hypothesis is correct. In fact, they typically want to reject it because that leads to more exciting findings about an effect or relationship. The new vaccine works!

You can think of it as the default theory that requires sufficiently strong evidence to reject. Like a prosecutor, researchers must collect sufficient evidence to overturn the presumption of no effect. Investigators must work hard to set up a study and a data collection system to obtain evidence that can reject the null hypothesis.

Related post : What is an Effect in Statistics?

Null Hypothesis Examples

Null hypotheses start as research questions that the investigator rephrases as a statement indicating there is no effect or relationship.

  • Does the vaccine prevent infections? → The vaccine does not affect the infection rate.
  • Does the new additive increase product strength? → The additive does not affect mean product strength.
  • Does the exercise intervention increase bone mineral density? → The intervention does not affect bone mineral density.
  • As screen time increases, does test performance decrease? → There is no relationship between screen time and test performance.

After reading these examples, you might think they’re a bit boring and pointless. However, the key is to remember that the null hypothesis defines the condition that the researchers need to discredit before suggesting an effect exists.

Let’s see how you reject the null hypothesis and get to those more exciting findings!

When to Reject the Null Hypothesis

So, you want to reject the null hypothesis, but how and when can you do that? To start, you’ll need to perform a statistical test on your data. The following is an overview of performing a study that uses a hypothesis test.

The first step is to devise a research question and the appropriate null hypothesis. After that, the investigators need to formulate an experimental design and data collection procedures that will allow them to gather data that can answer the research question. Then they collect the data. For more information about designing a scientific study that uses statistics, read my post 5 Steps for Conducting Studies with Statistics .

After data collection is complete, statistics and hypothesis testing enter the picture. Hypothesis testing takes your sample data and evaluates how consistent they are with the null hypothesis. The p-value is a crucial part of the statistical results because it quantifies how strongly the sample data contradict the null hypothesis.

When the sample data provide sufficient evidence, you can reject the null hypothesis. In a hypothesis test, this process involves comparing the p-value to your significance level .

Rejecting the Null Hypothesis

Reject the null hypothesis when the p-value is less than or equal to your significance level. Your sample data favor the alternative hypothesis, which suggests that the effect exists in the population. For a mnemonic device, remember—when the p-value is low, the null must go!

When you can reject the null hypothesis, your results are statistically significant. Learn more about Statistical Significance: Definition & Meaning .

Failing to Reject the Null Hypothesis

Conversely, when the p-value is greater than your significance level, you fail to reject the null hypothesis. The sample data provide insufficient evidence to conclude that the effect exists in the population. When the p-value is high, the null must fly!

Note that failing to reject the null is not the same as proving it. For more information about the difference, read my post about Failing to Reject the Null .

That’s a very general look at the process. But I hope you can see how the path to more exciting findings depends on being able to rule out the less exciting null hypothesis that states there’s nothing to see here!

Let’s move on to learning how to write the null hypothesis for different types of effects, relationships, and tests.

Related posts : How Hypothesis Tests Work and Interpreting P-values

How to Write a Null Hypothesis

The null hypothesis varies by the type of statistic and hypothesis test. Remember that inferential statistics use samples to draw conclusions about populations. Consequently, when you write a null hypothesis, it must make a claim about the relevant population parameter . Further, that claim usually indicates that the effect does not exist in the population. Below are typical examples of writing a null hypothesis for various parameters and hypothesis tests.

Related posts : Descriptive vs. Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

Group Means

T-tests and ANOVA assess the differences between group means. For these tests, the null hypothesis states that there is no difference between group means in the population. In other words, the experimental conditions that define the groups do not affect the mean outcome. Mu (µ) is the population parameter for the mean, and you’ll need to include it in the statement for this type of study.

For example, an experiment compares the mean bone density changes for a new osteoporosis medication. The control group does not receive the medicine, while the treatment group does. The null states that the mean bone density changes for the control and treatment groups are equal.

  • Null Hypothesis H0: Group means are equal in the population: µ1 = µ2, or µ1 − µ2 = 0
  • Alternative Hypothesis HA: Group means are not equal in the population: µ1 ≠ µ2, or µ1 − µ2 ≠ 0
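A minimal sketch of this kind of test in Python is shown below; the bone density change values are made-up placeholders, included only to make the example runnable.

```python
from scipy import stats

# Placeholder bone density changes for the control and treatment groups.
control   = [0.02, -0.01, 0.00, 0.03, -0.02, 0.01]
treatment = [0.05,  0.04, 0.06, 0.02,  0.03, 0.07]

t_stat, p_value = stats.ttest_ind(treatment, control)
print(t_stat, p_value)        # reject H0: mu1 = mu2 when p_value <= the significance level
```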

Group Proportions

Proportions tests assess the differences between group proportions. For these tests, the null hypothesis states that there is no difference between group proportions. Again, the experimental conditions did not affect the proportion of events in the groups. P is the population proportion parameter that you’ll need to include.

For example, a vaccine experiment compares the infection rate in the treatment group to the control group. The treatment group receives the vaccine, while the control group does not. The null states that the infection rates for the control and treatment groups are equal.

  • Null Hypothesis H0: Group proportions are equal in the population: p1 = p2.
  • Alternative Hypothesis HA: Group proportions are not equal in the population: p1 ≠ p2.
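A hedged sketch of a two-proportion test follows; the infection counts and group sizes are placeholders, not data from any real trial.

```python
from statsmodels.stats.proportion import proportions_ztest

infected = [11, 28]           # infections in the treatment and control groups (placeholders)
enrolled = [500, 500]         # participants per group (placeholders)

z_stat, p_value = proportions_ztest(infected, enrolled)
print(z_stat, p_value)        # reject H0: p1 = p2 when p_value <= the significance level
```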

Correlation and Regression Coefficients

Some studies assess the relationship between two continuous variables rather than differences between groups.

In these studies, analysts often use either correlation or regression analysis . For these tests, the null states that there is no relationship between the variables. Specifically, it says that the correlation or regression coefficient is zero. As one variable increases, there is no tendency for the other variable to increase or decrease. Rho (ρ) is the population correlation parameter and beta (β) is the regression coefficient parameter.

For example, a study assesses the relationship between screen time and test performance. The null states that there is no correlation between this pair of variables. As screen time increases, test performance does not tend to increase or decrease.

  • Null Hypothesis H 0 : The correlation in the population is zero: ρ = 0.
  • Alternative Hypothesis H A : The correlation in the population is not zero: ρ ≠ 0.
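Here is a minimal sketch of the correlation test; the screen time and test score values are placeholders used only to illustrate the call.

```python
from scipy import stats

screen_time = [1, 2, 2, 3, 4, 5, 6, 7]            # hours per day (placeholders)
test_score  = [88, 85, 90, 80, 78, 75, 70, 68]    # placeholder scores

r, p_value = stats.pearsonr(screen_time, test_score)
print(r, p_value)             # reject H0: rho = 0 when p_value <= the significance level
```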

For all these cases, the analysts define the hypotheses before the study. After collecting the data, they perform a hypothesis test to determine whether they can reject the null hypothesis.

The preceding examples are all for two-tailed hypothesis tests. To learn about one-tailed tests and how to write a null hypothesis for them, read my post One-Tailed vs. Two-Tailed Tests .

Related post : Understanding Correlation




Reader Interactions

Comment (February 15, 2022 at 9:32 am): Dear sir, I always read your notes on research methods. Kindly tell me, is there a book available on all of these?

Comment (February 20, 2022 at 9:26 pm): Is a null hypothesis regularly (always) stated in the negative? "there is no" or "does not"

Reply (February 23, 2022 at 9:21 pm): Typically, the null hypothesis includes an equal sign. The null hypothesis states that the population parameter equals a particular value. That value is usually one that represents no effect. In the case of a one-sided hypothesis test, the null still contains an equal sign, but it's "greater than or equal to" or "less than or equal to." If you wanted to translate the null hypothesis from its native mathematical expression, you could use the expression "there is no effect." But the mathematical form more specifically states what it's testing.

It's the alternative hypothesis that typically contains "does not equal."

There are some exceptions. For example, in an equivalence test where the researchers want to show that two things are equal, the null hypothesis states that they're not equal.

In short, the null hypothesis states the condition that the researchers hope to reject. They need to work hard to set up an experiment and data collection that'll gather enough evidence to be able to reject the null condition.

Comment (January 10, 2024 at 1:23 pm): Hi Jim, in your comment you state that equivalence test null and alternate hypotheses are reversed. For hypothesis tests of data fits to a probability distribution, the null hypothesis is that the probability distribution fits the data. Is this correct?

Reply (January 10, 2024 at 2:15 pm): Those are two separate things, equivalence testing and normality tests. But, yes, you're correct for both.

Hypotheses are switched for equivalence testing. You need to "work" (i.e., collect a large sample of good quality data) to be able to reject the null that the groups are different to be able to conclude they're the same.

With typical hypothesis tests, if you have low quality data and a low sample size, you'll fail to reject the null that they're the same, concluding they're equivalent. But that's more a statement about the low quality and small sample size than anything to do with the groups being equal.

So, equivalence testing makes you work to obtain a finding that the groups are the same (at least within some amount you define as a trivial difference).

For normality testing, and other distribution tests, the null states that the data follow the distribution (normal or whatever). If you reject the null, you have sufficient evidence to conclude that your sample data don't follow the probability distribution. That's a rare case where you hope to fail to reject the null. And it suffers from the problem I describe above, where you might fail to reject the null simply because you have a small sample size. In that case, you'd conclude the data follow the probability distribution, but it's more that you don't have enough data for the test to register the deviation. In this scenario, if you had a larger sample size, you'd reject the null and conclude it doesn't follow that distribution.

I don't know of any equivalence-testing-type approach for distribution fit tests where you'd need to work to show the data follow a distribution, although I haven't looked for one either!

Comment (January 11, 2024 at 2:57 pm): Thanks for the reply.

Null Hypothesis Examples

ThoughtCo / Hilary Allison

  • Ph.D., Biomedical Sciences, University of Tennessee at Knoxville
  • B.A., Physics and Mathematics, Hastings College

In statistical analysis, the null hypothesis assumes there is no meaningful relationship between two variables. Testing the null hypothesis can tell you whether your results are due to the effect of manipulating the independent variable or due to chance. It's often used in conjunction with an alternative hypothesis, which assumes there is, in fact, a relationship between two variables.

The null hypothesis is among the easiest hypotheses to test using statistical analysis, making it perhaps the most valuable hypothesis for the scientific method. By evaluating a null hypothesis in addition to another hypothesis, researchers can support their conclusions with a higher level of confidence. Below are examples of how you might formulate a null hypothesis to fit certain questions.

What Is the Null Hypothesis?

The null hypothesis states there is no relationship between the measured phenomenon (the dependent variable ) and the independent variable , which is the variable an experimenter typically controls or changes. You do not​ need to believe that the null hypothesis is true to test it. On the contrary, you will likely suspect there is a relationship between a set of variables. One way to prove that this is the case is to reject the null hypothesis. Rejecting a hypothesis does not mean an experiment was "bad" or that it didn't produce results. In fact, it is often one of the first steps toward further inquiry.

To distinguish it from other hypotheses, the null hypothesis is written as H0 (which is read as “H-nought,” "H-null," or "H-zero"). A significance test is used to determine how likely the observed results would be if the null hypothesis were true. A confidence level of 95% or 99% is common. Keep in mind, even if the confidence level is high, there is still a small chance the null hypothesis is not true, perhaps because the experimenter did not account for a critical factor or because of chance. This is one reason why it's important to repeat experiments.

Examples of the Null Hypothesis

To write a null hypothesis, first start by asking a question. Rephrase that question in a form that assumes no relationship between the variables. In other words, assume a treatment has no effect. Write your hypothesis in a way that reflects this.

  • Are teens better at math than adults? → Age has no effect on mathematical ability.
  • Does taking aspirin every day reduce the chance of having a heart attack? → Taking aspirin daily does not affect heart attack risk.
  • Do teens use cell phones to access the internet more than adults? → Age has no effect on how cell phones are used for internet access.
  • Do cats care about the color of their food? → Cats express no food preference based on color.
  • Does chewing willow bark relieve pain? → There is no difference in pain relief after chewing willow bark versus taking a placebo.

Other Types of Hypotheses

In addition to the null hypothesis, the alternative hypothesis is also a staple in traditional significance tests . It's essentially the opposite of the null hypothesis because it assumes the claim in question is true. For the first item in the table above, for example, an alternative hypothesis might be "Age does have an effect on mathematical ability."

Key Takeaways

  • In hypothesis testing, the null hypothesis assumes no relationship between two variables, providing a baseline for statistical analysis.
  • Rejecting the null hypothesis suggests there is evidence of a relationship between variables.
  • By formulating a null hypothesis, researchers can systematically test assumptions and draw more reliable conclusions from their experiments.


13.1 Understanding Null Hypothesis Testing

Learning objectives.

  • Explain the purpose of null hypothesis testing, including the role of sampling error.
  • Describe the basic logic of null hypothesis testing.
  • Describe the role of relationship strength and sample size in determining statistical significance and make reasonable judgments about statistical significance based on these two factors.

The Purpose of Null Hypothesis Testing

As we have seen, psychological research typically involves measuring one or more variables for a sample and computing descriptive statistics for that sample. In general, however, the researcher’s goal is not to draw conclusions about that sample but to draw conclusions about the population that the sample was selected from. Thus researchers must use sample statistics to draw conclusions about the corresponding values in the population. These corresponding values in the population are called parameters . Imagine, for example, that a researcher measures the number of depressive symptoms exhibited by each of 50 clinically depressed adults and computes the mean number of symptoms. The researcher probably wants to use this sample statistic (the mean number of symptoms for the sample) to draw conclusions about the corresponding population parameter (the mean number of symptoms for clinically depressed adults).

Unfortunately, sample statistics are not perfect estimates of their corresponding population parameters. This is because there is a certain amount of random variability in any statistic from sample to sample. The mean number of depressive symptoms might be 8.73 in one sample of clinically depressed adults, 6.45 in a second sample, and 9.44 in a third—even though these samples are selected randomly from the same population. Similarly, the correlation (Pearson’s r ) between two variables might be +.24 in one sample, −.04 in a second sample, and +.15 in a third—again, even though these samples are selected randomly from the same population. This random variability in a statistic from sample to sample is called sampling error . (Note that the term error here refers to random variability and does not imply that anyone has made a mistake. No one “commits a sampling error.”)

One implication of this is that when there is a statistical relationship in a sample, it is not always clear that there is a statistical relationship in the population. A small difference between two group means in a sample might indicate that there is a small difference between the two group means in the population. But it could also be that there is no difference between the means in the population and that the difference in the sample is just a matter of sampling error. Similarly, a Pearson’s r value of −.29 in a sample might mean that there is a negative relationship in the population. But it could also be that there is no relationship in the population and that the relationship in the sample is just a matter of sampling error.

In fact, any statistical relationship in a sample can be interpreted in two ways:

  • There is a relationship in the population, and the relationship in the sample reflects this.
  • There is no relationship in the population, and the relationship in the sample reflects only sampling error.

The purpose of null hypothesis testing is simply to help researchers decide between these two interpretations.

The Logic of Null Hypothesis Testing

Null hypothesis testing is a formal approach to deciding between two interpretations of a statistical relationship in a sample. One interpretation is called the null hypothesis (often symbolized H 0 and read as “H-naught”). This is the idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error. Informally, the null hypothesis is that the sample relationship “occurred by chance.” The other interpretation is called the alternative hypothesis (often symbolized as H 1 ). This is the idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.

Again, every statistical relationship in a sample can be interpreted in either of these two ways: It might have occurred by chance, or it might reflect a relationship in the population. So researchers need a way to decide between them. Although there are many specific null hypothesis testing techniques, they are all based on the same general logic. The steps are as follows:

  • Assume for the moment that the null hypothesis is true. There is no relationship between the variables in the population.
  • Determine how likely the sample relationship would be if the null hypothesis were true.
  • If the sample relationship would be extremely unlikely, then reject the null hypothesis in favor of the alternative hypothesis. If it would not be extremely unlikely, then retain the null hypothesis .

Following this logic, we can begin to understand why Mehl and his colleagues concluded that there is no difference in talkativeness between women and men in the population. In essence, they asked the following question: “If there were no difference in the population, how likely is it that we would find a small difference of d = 0.06 in our sample?” Their answer to this question was that this sample relationship would be fairly likely if the null hypothesis were true. Therefore, they retained the null hypothesis—concluding that there is no evidence of a sex difference in the population. We can also see why Kanner and his colleagues concluded that there is a correlation between hassles and symptoms in the population. They asked, “If the null hypothesis were true, how likely is it that we would find a strong correlation of +.60 in our sample?” Their answer to this question was that this sample relationship would be fairly unlikely if the null hypothesis were true. Therefore, they rejected the null hypothesis in favor of the alternative hypothesis—concluding that there is a positive correlation between these variables in the population.

A crucial step in null hypothesis testing is finding the likelihood of the sample result if the null hypothesis were true. This probability is called the p value . A low p value means that the sample result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A high p value means that the sample result would be likely if the null hypothesis were true and leads to the retention of the null hypothesis. But how low must the p value be before the sample result is considered unlikely enough to reject the null hypothesis? In null hypothesis testing, this criterion is called α (alpha) and is almost always set to .05. If there is less than a 5% chance of a result as extreme as the sample result if the null hypothesis were true, then the null hypothesis is rejected. When this happens, the result is said to be statistically significant . If there is greater than a 5% chance of a result as extreme as the sample result when the null hypothesis is true, then the null hypothesis is retained. This does not necessarily mean that the researcher accepts the null hypothesis as true—only that there is not currently enough evidence to conclude that it is true. Researchers often use the expression “fail to reject the null hypothesis” rather than “retain the null hypothesis,” but they never use the expression “accept the null hypothesis.”

The Misunderstood p Value

The p value is one of the most misunderstood quantities in psychological research (Cohen, 1994). Even professional researchers misinterpret it, and it is not unusual for such misinterpretations to appear in statistics textbooks!

The most common misinterpretation is that the p value is the probability that the null hypothesis is true—that the sample result occurred by chance. For example, a misguided researcher might say that because the p value is .02, there is only a 2% chance that the result is due to chance and a 98% chance that it reflects a real relationship in the population. But this is incorrect . The p value is really the probability of a result at least as extreme as the sample result if the null hypothesis were true. So a p value of .02 means that if the null hypothesis were true, a sample result this extreme would occur only 2% of the time.

You can avoid this misunderstanding by remembering that the p value is not the probability that any particular hypothesis is true or false. Instead, it is the probability of obtaining the sample result if the null hypothesis were true.

Role of Sample Size and Relationship Strength

Recall that null hypothesis testing involves answering the question, “If the null hypothesis were true, what is the probability of a sample result as extreme as this one?” In other words, “What is the p value?” It can be helpful to see that the answer to this question depends on just two considerations: the strength of the relationship and the size of the sample. Specifically, the stronger the sample relationship and the larger the sample, the less likely the result would be if the null hypothesis were true. That is, the lower the p value. This should make sense. Imagine a study in which a sample of 500 women is compared with a sample of 500 men in terms of some psychological characteristic, and Cohen’s d is a strong 0.50. If there were really no sex difference in the population, then a result this strong based on such a large sample should seem highly unlikely. Now imagine a similar study in which a sample of three women is compared with a sample of three men, and Cohen’s d is a weak 0.10. If there were no sex difference in the population, then a relationship this weak based on such a small sample should seem likely. And this is precisely why the null hypothesis would be rejected in the first example and retained in the second.

Of course, sometimes the result can be weak and the sample large, or the result can be strong and the sample small. In these cases, the two considerations trade off against each other so that a weak result can be statistically significant if the sample is large enough and a strong relationship can be statistically significant even if the sample is small. Table 13.1 “How Relationship Strength and Sample Size Combine to Determine Whether a Result Is Statistically Significant” shows roughly how relationship strength and sample size combine to determine whether a sample result is statistically significant. The columns of the table represent the three levels of relationship strength: weak, medium, and strong. The rows represent four sample sizes that can be considered small, medium, large, and extra large in the context of psychological research. Thus each cell in the table represents a combination of relationship strength and sample size. If a cell contains the word Yes , then this combination would be statistically significant for both Cohen’s d and Pearson’s r . If it contains the word No , then it would not be statistically significant for either. There is one cell where the decision for d and r would be different and another where it might be different depending on some additional considerations, which are discussed in Section 13.2 “Some Basic Null Hypothesis Tests”

Table 13.1 How Relationship Strength and Sample Size Combine to Determine Whether a Result Is Statistically Significant

Relationship strength
Sample Size Weak Medium Strong
Small ( = 20) No No

= Maybe

= Yes

Medium ( = 50) No Yes Yes
Large ( = 100)

= Yes

= No

Yes Yes
Extra large ( = 500) Yes Yes Yes

Although Table 13.1 “How Relationship Strength and Sample Size Combine to Determine Whether a Result Is Statistically Significant” provides only a rough guideline, it shows very clearly that weak relationships based on medium or small samples are never statistically significant and that strong relationships based on medium or larger samples are always statistically significant. If you keep this in mind, you will often know whether a result is statistically significant based on the descriptive statistics alone. It is extremely useful to be able to develop this kind of intuitive judgment. One reason is that it allows you to develop expectations about how your formal null hypothesis tests are going to come out, which in turn allows you to detect problems in your analyses. For example, if your sample relationship is strong and your sample is medium, then you would expect to reject the null hypothesis. If for some reason your formal null hypothesis test indicates otherwise, then you need to double-check your computations and interpretations. A second reason is that the ability to make this kind of intuitive judgment is an indication that you understand the basic logic of this approach in addition to being able to do the computations.

Statistical Significance Versus Practical Significance

Table 13.1 “How Relationship Strength and Sample Size Combine to Determine Whether a Result Is Statistically Significant” illustrates another extremely important point. A statistically significant result is not necessarily a strong one. Even a very weak result can be statistically significant if it is based on a large enough sample. This is closely related to Janet Shibley Hyde’s argument about sex differences (Hyde, 2007). The differences between women and men in mathematical problem solving and leadership ability are statistically significant. But the word significant can cause people to interpret these differences as strong and important—perhaps even important enough to influence the college courses they take or even who they vote for. As we have seen, however, these statistically significant differences are actually quite weak—perhaps even “trivial.”

This is why it is important to distinguish between the statistical significance of a result and the practical significance of that result. Practical significance refers to the importance or usefulness of the result in some real-world context. Many sex differences are statistically significant—and may even be interesting for purely scientific reasons—but they are not practically significant. In clinical practice, this same concept is often referred to as “clinical significance.” For example, a study on a new treatment for social phobia might show that it produces a statistically significant positive effect. Yet this effect still might not be strong enough to justify the time, effort, and other costs of putting it into practice—especially if easier and cheaper treatments that work almost as well already exist. Although statistically significant, this result would be said to lack practical or clinical significance.
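
To illustrate the distinction numerically (a sketch, not an analysis from any study discussed here), the snippet below shows that a very weak correlation becomes "statistically significant" once the sample is large enough, even though it explains a trivial share of the variance.

```python
from math import sqrt
from scipy import stats

def p_from_r(r, n):
    """Two-tailed p value for a Pearson correlation r observed in a sample of n."""
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), n - 2)

r, n = 0.06, 10_000
print(p_from_r(r, n))   # far below .05 -> statistically significant
print(r ** 2)           # ...yet only 0.36% of the variance is explained
```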

Key Takeaways

  • Null hypothesis testing is a formal approach to deciding whether a statistical relationship in a sample reflects a real relationship in the population or is just due to chance.
  • The logic of null hypothesis testing involves assuming that the null hypothesis is true, finding how likely the sample result would be if this assumption were correct, and then making a decision. If the sample result would be unlikely if the null hypothesis were true, then it is rejected in favor of the alternative hypothesis. If it would not be unlikely, then the null hypothesis is retained.
  • The probability of obtaining the sample result if the null hypothesis were true (the p value) is based on two considerations: relationship strength and sample size. Reasonable judgments about whether a sample relationship is statistically significant can often be made by quickly considering these two factors.
  • Statistical significance is not the same as relationship strength or importance. Even weak relationships can be statistically significant if the sample size is large enough. It is important to consider relationship strength and the practical significance of a result in addition to its statistical significance.
  • Discussion: Imagine a study showing that people who eat more broccoli tend to be happier. Explain for someone who knows nothing about statistics why the researchers would conduct a null hypothesis test.

Practice: Use Table 13.1 “How Relationship Strength and Sample Size Combine to Determine Whether a Result Is Statistically Significant” to decide whether each of the following results is statistically significant. (A rough numerical check of some of these items is sketched after the list.)

  • The correlation between two variables is r = −.78 based on a sample size of 137.
  • The mean score on a psychological characteristic for women is 25 ( SD = 5) and the mean score for men is 24 ( SD = 5). There were 12 women and 10 men in this study.
  • In a memory experiment, the mean number of items recalled by the 40 participants in Condition A was 0.50 standard deviations greater than the mean number recalled by the 40 participants in Condition B.
  • In another memory experiment, the mean scores for participants in Condition A and Condition B came out exactly the same!
  • A student finds a correlation of r = .04 between the number of units the students in his research methods class are taking and the students’ level of stress.
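
For readers who want to verify their table-based answers, here is a rough numerical check of the first three items, a sketch assuming equal-variance independent-groups tests and using the same d-to-t and r-to-t conversions as above; it supplements rather than replaces Table 13.1.

```python
from math import sqrt
from scipy import stats

def p_from_r(r, n):
    """Two-tailed p for a Pearson r with sample size n."""
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), n - 2)

def p_from_d(d, n1, n2):
    """Two-tailed p for an independent-groups Cohen's d with group sizes n1 and n2."""
    t = d * sqrt(n1 * n2 / (n1 + n2))
    return 2 * stats.t.sf(abs(t), n1 + n2 - 2)

print(p_from_r(-0.78, 137))    # first item: p far below .05 -> significant
print(p_from_d(0.20, 12, 10))  # second item: d = (25 - 24) / 5 = 0.20 -> not significant
print(p_from_d(0.50, 40, 40))  # third item: d = 0.50, 40 per condition -> significant
```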

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.

Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16, 259–263.

Research Methods in Psychology Copyright © 2016 by University of Minnesota is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

The problem of depression in adolescence

  • PMID: 7246333

The subjects for this study were tested using MMPI Scales 2 and 4 and the Beck Depression Inventory. Four groups were studied: adolescent male patients, adolescent female patients, mothers, and fathers. The first null hypothesis stated that there would be no significant incidence of depression among adolescents hospitalized in a specific hospital facility. This null hypothesis was rejected for adolescents in the sample, both male and female, on the basis of the MMPI results. The second null hypothesis stated that there would be no significant incidence of depression among parents of adolescents admitted to a specific hospital for psychiatric treatment. There was not sufficient evidence to suggest that parents were depressed to a significant degree; therefore, the second null hypothesis was not rejected based on the results of the tests used in this study. However, a significant number of fathers fell within the range of mild depression; mothers did not appear depressed. The third null hypothesis stated that there would be no significant relationship between the presence of depression in hospitalized adolescents and the presence of depression in one or both parents. There was not sufficient evidence to reject this hypothesis; thus, this study does not demonstrate a correlation in depression between family members.

Depression history, depression vulnerability and the experience of everyday negative events

Megan A. O’Grady

1 University of Connecticut Health Center

Howard Tennen

Stephen Armeli

2 Fairleigh Dickinson University

This study examined whether deficits in dealing with daily problems emerge before a depressive episode (i.e., pre-existing vulnerability) or after a depressive episode (i.e., psychosocial scar). Participants completed a 30-day daily diary in which they reported their most negative event of the day, their appraisals of that event, and their mood. Three years later, they completed a structured depression interview. The sample consisted of 350 college students, 24 of whom had a past history of depression and 54 of whom experienced a depressive episode subsequent to diary completion. Multilevel modeling revealed that students with past depression blamed others more than the never-depressed and those with subsequent depression, which supported the scar hypothesis. In support of the vulnerability hypothesis, as compared to the never-depressed group, participants with past depression demonstrated steeper declines in positive mood on more stressful days but did not significantly differ from the subsequent depression group. Overall, our findings do not provide clear support for either hypothesis; however, this study is the first to use a daily diary design to directly compare individuals with past depression to individuals who would subsequently experience depression.

Daily stressors have been linked to increases in negative mood ( Bolger, DeLongis, Kessler, & Schilling, 1989 ), physical illness ( Stone, Reed, & Neale, 1987 ), and relationship problems ( Bodenmann, Pihet, & Kayser, 2006 ). Moreover, ample research has examined how various personality factors, such as neuroticism, are related to differential exposure and reactivity to daily stressors (e.g., Bolger & Zuckerman, 1995 ). We know little, however, about whether individuals who have a history of depression, or who are vulnerable to depression, appraise, cope with, and react to stressors in ways that distinguish them from their more resilient peers. In the present study we examined the experience of daily stressful encounters among young adults who have a history of depression, those who are vulnerable to depression (by virtue of experiencing a subsequent depressive episode), and a group of never depressed individuals.

Evidence suggests that a history of major depression affects outcomes in diverse areas. For example, formerly depressed women show an increased risk for coronary heart disease due to a dysfunction in the blood vessels that is not evident among their never depressed counterparts ( Wagner, Tennen, Mansoor, & Abott, 2006 ). Intervention efficacy is also related to history of depression, as Zautra et al. (2008) found that treatment outcomes for adapting to rheumatoid arthritis depended on whether patients had a history of recurrent depression. Further, as compared to people who were never depressed, dysfunctional attitudes and negative attribution styles are elevated after remission from a major depressive episode ( Eaves & Rush, 1984 ; Haeffel et al., 2005 ). As illustrated here, the effects of a major depressive episode are far-reaching and present even when depressive episodes occurred and resolved years before (e.g., Fifield, Tennen, Reisine, & McQuillan, 1998 ).

Investigators have recently begun to investigate vulnerability conferred by history of depression using micro-longitudinal research designs – such designs allow for a better understanding of the day to day unfolding of the stress and coping process. For example, Tennen, Affleck, and Zautra (2006) found that fibromyalgia (FM) patients with a history of depression vented emotions more and experienced lower levels of pain coping efficacy on days with increased pain as compared to FM patients who had never been depressed. They also found that mood became less positive on more painful days among patients who had a history of depression and elevated levels of current depressive symptoms, a pattern that was not evidenced among those without a history of depression. The latter findings are consistent with the priming model, also known as the interaction model, which suggests that depressive vulnerability remains latent until it is triggered or primed by a state of elevated distress ( Roberts & Kassel, 1996 ; Tennen et al., 2006 ). Conner et al. (2006) found a similar pattern among individuals with rheumatoid arthritis.

We are aware of no research that has examined the role of history of depression in the experience of daily negative events or stressors. However, several studies have examined the association between current depressive symptoms and daily negative event exposure, appraisal and reactivity. Gunthert, Cohen, and Armeli (2002) found that individuals with elevated current depressive symptoms reported low levels of coping efficacy, and coping strategies were particularly ineffective in relieving distress when dealing with negative daily events. Also, similar to the pain and mood relationships in studies of history of depression, current depressive symptoms moderated the relationship between stressfulness ratings of daily negative events and positive affect. Gunthert, Cohen, Butler, and Beck (2007) found that among clinic outpatients high in current depressive symptoms, as compared to those low in current symptoms, negative thoughts and negative mood increased to a greater degree on days following a stressful interpersonal encounter as compared to days following a non-interpersonal stressor. Myin-Germeys et al. (2003) found that people with major depressive disorder showed greater increases in negative mood in response to daily stress than did their non-depressed counterparts.

These studies suggest that both mood and appraisals of negative events are affected by current depression status. This is supported by the broader depression literature, which indicates that depressed individuals maintain a negative evaluation bias ( Haaga & Beck, 1992 ) and that depression is characterized by lack of self-efficacy and control, pessimism, hopelessness and maladaptive patterns of blame and coping ( Abramson, Metalsky, & Alloy, 1989 ; Beck, Rush, Shaw, & Emery, 1979 ). It is unclear, however, whether previously depressed people or those who would experience a depressive episode during the next several years react to negative events less effectively, and whether current levels of depressive symptoms contribute to their stress-related responses.

Although previous research has shown that individuals with prior depression have some deficits when dealing with daily problems ( Conner et al., 2006 ; Tennen et al., 2006 ), we do not know whether these deficits emerge before or after the depressive episode. Two competing models have been posited to explain deficits among people with a history of depression. The “scar hypothesis” suggests that a depressive episode leaves lasting changes in personality and self-concept that lead a person to be more vulnerable to future mood disturbance ( Rohde, Lewinsohn, & Seeley, 1990 ; Zautra et al., 2007 ). Based on this model one would not anticipate high levels of stress reactivity prior to the experience of the first depressive episode; increased stress reactivity should only occur after the first depressive episode. On the other hand, a vulnerability or “trait marker” hypothesis posits that formerly depressed individuals have preexisting characteristics that make them vulnerable to depression and that persist beyond remission from the depressive episode ( Rohde et al., 1990 ; Tennen et al., 2006 ). According to this model, increased stress reactivity would be expected both prior to and after experiencing a depressive episode. Distinguishing between the two hypotheses is important in order to refine etiological models of depression and design effective treatment programs that aim to prevent future episodes of depression ( Rohde et al., 1990 ).

There is mixed support in the literature regarding these two hypotheses. For example, in a longitudinal, prospective study, Rohde et al. (1990) found support for the scar hypothesis because individuals viewed themselves as less socially skilled after, but not prior to, a depressive episode. Similarly, Rohde, Lewinsohn, and Seeley (1994) reported that a variety of “psychosocial scars” (e.g., depressive symptoms, anxiety, emotional reliance on others) were evident among adolescents recovered from a major depressive episode. However, Rohde et al. (1990) also found that participants who became depressed during their study rated their health as poorer than non-depressed controls both before and after the episode, supporting the vulnerability hypothesis. Beevers, Rohde, Stice, and Nolen-Hoeksema (2007) found additional support for the vulnerability hypothesis in their prospective study of female adolescents. They found that the negative effects of a major depressive episode did not persist on a variety of psychological and social outcomes after recovery from that episode as the scar hypothesis would suggest; instead, many factors were elevated both before and after depression onset. Finally, while a recent review found more consistent support for the vulnerability hypothesis, some studies did support the scar hypothesis ( Christensen & Kessing, 2006 ). However, this review focused exclusively on personality traits (e.g., whether personality was altered by a depressive episode), rather than psychosocial functioning.

Importantly, to our knowledge, no study focusing on the daily stress process using a daily process design has compared these models. Although the rigorous prospective studies in this area have made important contributions to our understanding of the psychosocial deficits associated with a history of depression and depression vulnerability, daily process designs can capture deficits as they play out in the context of daily functioning, and in relation to day-to-day changes in life encounters. Psychosocial scars and vulnerability deficits may be difficult to detect in traditional longitudinal or cross-sectional studies if they are manifested only during periods of elevated stress ( Rohde et al., 1990 ); daily studies may be more sensitive to detecting the interplay between current depressive symptoms, elevated daily stressors, and history of depression status.

The Current Study

We combined the virtues of daily process methods and a longer-term longitudinal study design to distinguish between the scar and vulnerability hypotheses. Specifically, a cohort of young adults completed an initial baseline survey and an electronic daily diary each day for 30 days. We then measured history of depression status three years after the baseline and diary assessment. In this way we were able to distinguish participants who had experienced a depressive episode prior to their participation in the study, those who became depressed for the first time subsequent to completing the diary, and those who had never experienced a depressive episode.

The diary portion of the study focused on two aspects of the daily stress process: negative event appraisals (perceived threat, appraised control, coping efficacy, other-blame attributions 1 ) and mood. Negative event appraisals and negative mood have been linked to psychological and biological strain, increased risk of mental and physical health problems, and compromised academic performance ( Felsten, 2002 ; Hojat, Gonnella, Erdmann, & Vogel, 2003 ; Lazarus & Folkman, 1984 ).

In line with previous research on history of depression and daily functioning (e.g., Conner et al., 2006 ), we anticipated that formerly depressed individuals, as compared to those without a history of depression, would report more daily event stress, less event control, less event coping efficacy, more event threat, and more negative mood. We also predicted that on days appraised as more stressful, participants with a history of depression, compared to their never depressed peers, would show greater elevations in negative mood, event threat, and other-blame, as well as greater decreases in event control and event coping efficacy.

Further, in our comparison of formerly depressed versus subsequently depressed participants, the scar hypothesis would be supported if formerly depressed participants showed a more maladaptive pattern of event attributions and mood than subsequently depressed participants. More specifically, the scar hypothesis would predict that participants who had been depressed prior to beginning the diary study, compared to those who became depressed subsequent to their study participation, would show greater deficits in mood and appraisals and would display more pronounced within-person changes in daily mood and appraisals when experiencing greater negative event stress. On the other hand, the vulnerability hypothesis would receive support if participants with past depression and those with subsequent depression show similar patterns of event appraisals and mood on average and when experiencing increased negative event stress. It should be noted that for either hypothesis to be fully supported, never depressed participants should exhibit more adaptive patterns of functioning than individuals with a past history of depression. In tests of each of these questions, we also examined the priming hypothesis by determining whether history of depression status interacted with current depressive symptoms to predict average levels of the daily variables and within-person relations between daily stress and the other daily variables (see Conner et al., 2006 and Tennen et al., 2006 ).

Participants and Procedure

Participants were recruited from an introductory psychology subject pool to take part in a study of daily experience and health-related behaviors. Introductory psychology is a core university course and the most heavily subscribed (non-required) course at the University. Over 3,000 students from diverse majors enroll in this course every academic year. The subject pool participants during recruitment for this study were similar to the overall student body in SAT scores (1167 vs. 1168 campus-wide) and ethnic make-up (14% minority [non-Caucasian] vs. 17% campus-wide).

Participants first attended an information session where they provided consent and were given instructions on how to complete an Internet-based initial assessment and daily survey. Within a week of the information session, students completed an initial baseline assessment, which included demographic, personality, and health measures, by logging onto a secure website. Approximately two weeks after completing the initial assessment, participants began the daily diary procedure. They used a secure website to complete a daily survey between 2:30pm and 7:00pm for 30 days; each survey took about 5 minutes. To improve compliance with the daily surveys, participants received daily e-mail reminders. At the end of the 30 days, participants attended an exit session, received compensation, and agreed to be contacted for a future follow-up interview. Participants received course credit and monetary compensation for participation. Approximately three years later, they were contacted via phone or e-mail and invited to participate in the depression phone interview portion of the study. There were separate consent and reimbursement procedures for this portion of the study.

Of the 574 participants who enrolled in the larger study, 447 (78%) completed the structured diagnostic interview for depression three years after completing the 30-day electronic diary. Of the 447 participants who completed the diagnostic interview, 17 were excluded because it was determined that they were depressed at the time they completed the diary portion of the study, and 4 were excluded because they reported a depressive episode both prior to the diary and during the subsequent 3 years. Finally, participants ( n = 76) were excluded if they completed fewer than 15 days of the diary or had incomplete data from the initial assessment 2 . This resulted in a final sample of 350 (156 men; 194 women). Participants were mostly White (89%), and at the start of the study were mainly freshmen (59%) with a mean age of 18.65 ( SD = .86). These 350 participants provided 8,869 days of data out of a possible 10,500 daily surveys, a compliance rate of 84%.

Initial assessment

In addition to demographic questions, participants completed the 13-item short form of the Beck Depression Inventory (BDI; Beck & Beck, 1972 ) to measure current depressive symptoms. BDI scores were summed (α = .83) and ranged from 0 – 20 ( M = 4.5, SD = 4.05). Neuroticism was also assessed and served as a control variable, as past research shows that neuroticism is related to depression and to stressor appraisals, coping, and reactivity ( Bolger & Schilling, 1991 ; Tennen et al., 2006 ). Neuroticism was measured using the 12-item Neuroticism subscale of the NEO Five-Factor Inventory ( Costa & McCrae, 1992 ). Participants rated the extent to which they agreed with statements measuring neurotic tendencies (e.g., I often feel inferior) from 1 ( strongly disagree ) to 7 ( strongly agree ). Items were averaged (α = .86) and scores ranged from 1.08 – 6.33 ( M = 3.56, SD = 1).

Daily mood, negative event stress, and negative event appraisals

Participants described their current daily mood by rating adjectives on a scale from 1 ( not at all ) to 5 ( extremely ). A “pleasant mood” scale (α = .88) 3 was created by averaging daily scores for “happy” and “cheerful.” An “unpleasant mood” scale (α = .73) was created by averaging daily scores for “sad” and “dejected.” Items were drawn from Larsen and Diener’s (1992) mood circumplex, and similar to Tennen et al. (2006), only the pure pleasant and unpleasant mood states were retained. Participants were then asked to think about their most negative experience that day, provide a brief description of it, and rate how stressful and threatening this event was, how much they felt they could control the event’s outcome, how well they felt they could deal with the event, and whether this event was someone else’s fault. These questions were rated on a scale of 1 ( not at all ) to 7 ( extremely ). Participants listed a variety of negative events, the most common being academic (33%), with the remainder interpersonal (e.g., friendship issues; 15%), health (e.g., trouble sleeping; 25%), or other (e.g., the weather; 27%) events 4 .
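
As a small sketch of this composite scoring (the data frame and its column names are hypothetical; only the averaging rule comes from the text), the daily mood scales could be formed as follows:

```python
import pandas as pd

# Hypothetical long-format diary data: one row per person per day.
diary = pd.DataFrame({
    "person":   [1, 1, 2, 2],
    "day":      [1, 2, 1, 2],
    "happy":    [3, 4, 2, 1],
    "cheerful": [3, 5, 2, 2],
    "sad":      [1, 1, 2, 4],
    "dejected": [1, 2, 2, 3],
})

# Pleasant mood = mean of "happy" and "cheerful"; unpleasant mood = mean of "sad" and "dejected".
diary["pleasant_mood"] = diary[["happy", "cheerful"]].mean(axis=1)
diary["unpleasant_mood"] = diary[["sad", "dejected"]].mean(axis=1)
```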

History of depression status

The mood episode module of the Structured Clinical Interview for the DSM-IV (SCID-I; First, Spitzer, Gibbon, & Williams, 2002 ) was used to classify participants’ history of depression status. This interview occurred three years after the diary portion of the study. The SCID-I requires that the mood disturbance have created significant impairment in the person’s life at the time of the depressive episode. Interviews were conducted by research associates who were trained in the administration and coding of the SCID-I by a clinical psychologist who was also a trained SCID evaluator. SCID depression interviews were conducted by phone; phone administration has been shown to be comparable with face-to-face protocols ( Rohde, Lewinsohn, & Seeley, 1997 ; Simon, Revicki, & Von Korff, 1993 ). To qualify for a diagnosis of lifetime major depression, a participant needed to endorse a time in which he or she experienced depressed mood or loss of interest every day or nearly every day for at least two weeks and report that these changes in mood/interest significantly impaired his or her functioning at that time. During this period, the participant also needed to report having experienced at least four of the following symptoms: changes in appetite or weight; sleep disturbance; fatigue or lack of energy; diminished self-worth; motor agitation or slowing; and suicidal thoughts. The depressive episode could not be due to normal bereavement, injury, illness, alcohol/drugs, or medication. During these interviews participants reported both past and current depressive episodes. Prior research indicates that people can accurately recall the occurrence of a previous depressive episode ( Thompson, Bogner, Coyne, Gallo, & Eaton, 2004 ).
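
Purely as a schematic of the decision rule paraphrased above, and not a representation of the SCID-I itself (which uses clinician-administered probes and skip logic; all argument names here are hypothetical), the lifetime classification could be sketched as:

```python
def meets_lifetime_mdd_criteria(
    two_weeks_depressed_mood_or_anhedonia: bool,
    num_additional_symptoms: int,          # of the symptoms listed in the text
    caused_significant_impairment: bool,
    better_explained_by_exclusion: bool,   # bereavement, injury, illness, substances, medication
) -> bool:
    """Schematic version of the lifetime major depression rule described in the text."""
    return (
        two_weeks_depressed_mood_or_anhedonia
        and num_additional_symptoms >= 4
        and caused_significant_impairment
        and not better_explained_by_exclusion
    )

# Example: two weeks of depressed mood, five additional symptoms, impairment, no exclusions.
print(meets_lifetime_mdd_criteria(True, 5, True, False))  # True
```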

Participants were classified into three depression groups: past depression (i.e., a major depressive episode prior to the diary portion of the study; n = 24), subsequent depression (i.e., a first major depressive episode during the three years following the beginning of the study; n = 54), and never depressed ( n = 272). The mean age at the depressive episode was 20.88 ( SD = 1.09) for participants in the subsequent depression group and 15.74 ( SD = 2.25) for those in the past depression group. See Table 1 for descriptive statistics for each depression group on the personality measures, diary outcomes, and negative events reported.
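
The three-group coding can be sketched as follows; the function and the example years are hypothetical, and the "excluded" branch mirrors the exclusions described in the Participants section:

```python
def depression_group(episode_years, diary_year):
    """Classify a participant as past, subsequent, or never depressed relative to
    the diary year, given the years of any major depressive episodes reported."""
    before = any(year < diary_year for year in episode_years)
    after = any(year > diary_year for year in episode_years)
    if before and not after:
        return "past depression"
    if after and not before:
        return "subsequent depression"
    if not episode_years:
        return "never depressed"
    return "excluded"  # depressed during the diary year, or both before and after

print(depression_group([2004], diary_year=2006))  # past depression
print(depression_group([2008], diary_year=2006))  # subsequent depression
print(depression_group([], diary_year=2006))      # never depressed
```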

Table 1. Descriptive statistics based on depression history status, M (SD) or %

Measure | Subsequent Depression Group | Never Depressed Group | Past Depression Group
Age | 18.57 (.69) | 18.63 (.83) | 19.13 (1.26)
BDI | 6.13 (4.58) | 4.11 (3.92) | 5.25 (3.27)
Neuroticism | 3.93 (.98) | 3.45 (.97) | 3.94 (1.13)
Academic Events | 28.2% | 34.4% | 26.7%
Interpersonal Events | 17.9% | 13.9% | 19.6%
Health Events | 24.9% | 24.9% | 25.8%
Other Events | 27.9% | 26.7% | 26.5%
Negative Event Stress | 4.52 (.66) | 4.05 (1.05) | 4.47 (.68)
Negative Event Threat | 3.05 (1.57) | 2.76 (1.08) | 3.04 (.98)
Negative Event Control | 3.74 (.98) | 3.88 (.93) | 3.91 (.96)
Negative Event Ability to Deal | 4.76 (.96) | 4.72 (1.01) | 4.79 (.74)
Negative Event Other Blame | 2.91 (.82) | 2.65 (.93) | 3.50 (.91)
Pleasant Mood | 2.45 (.70) | 2.71 (.74) | 2.67 (.59)
Unpleasant Mood | 1.38 (.40) | 1.33 (.40) | 1.40 (.39)

One-way ANOVAs and Tukey’s post-hoc tests indicated that participants with past depression were slightly older at the beginning of the study than those who were never depressed and those who went on to become depressed subsequent to diary participation ( F (2,347) = 4.07, p < .05). There were also mean differences between groups on the BDI and Neuroticism measures. The subsequent depression group had the highest BDI scores, and scored significantly higher than participants in the never depressed group; participants with past depression did not differ from the other two groups on the BDI ( F (2,347) = 6.21, p < .01). Never depressed participants had the lowest Neuroticism scores, and their Neuroticism scores were significantly lower than those in the subsequent depression and past depression groups ( F (2,347) = 7.33, p < .01). Chi-square analyses indicated there were no significant differences between the depression groups on gender, race, and dating status.
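
For readers who want to see how group comparisons of this kind can be run, here is a minimal sketch in Python with simulated data whose means and SDs roughly follow the BDI values in Table 1; it is illustrative only and is not the authors' analysis code.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
# Simulated BDI-like scores for the three depression-history groups.
subsequent = rng.normal(6.1, 4.6, 54)
never = rng.normal(4.1, 3.9, 272)
past = rng.normal(5.3, 3.3, 24)

f_stat, p_val = stats.f_oneway(subsequent, never, past)   # one-way ANOVA
scores = np.concatenate([subsequent, never, past])
groups = ["subsequent"] * 54 + ["never"] * 272 + ["past"] * 24
print(f_stat, p_val)
print(pairwise_tukeyhsd(scores, groups))                  # Tukey's HSD post-hoc comparisons
```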

Statistical analysis

The daily diary data had a nested structure in which the 30 repeated daily assessments (level 1) were nested within people (level 2; Raudenbush & Bryk, 2002 ). Therefore, multilevel modeling using Hierarchical Linear Modeling software (v. 6.06; Raudenbush, Bryk, & Congdon, 2008 ) was used to investigate the hypotheses. In all analyses, continuous level 1 variables were person-mean centered and continuous level 2 variables were grand mean-centered. The history of depression status variable was dummy-coded such that the past depression group was compared to the no depression and the subsequent depression groups.
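
As a rough illustration of the centering and dummy-coding steps just described (the data frames and column names are hypothetical, and this is not the authors' preprocessing code), the steps might look like this:

```python
import pandas as pd

# Hypothetical long-format diary data (one row per person-day) and person-level data.
daily = pd.DataFrame({
    "person": [1, 1, 1, 2, 2, 2],
    "event_stress": [4, 6, 5, 2, 3, 7],
})
person_level = pd.DataFrame({
    "person": [1, 2],
    "bdi": [6, 3],
    "neuroticism": [4.0, 3.0],
    "age": [19, 18],
    "group": ["past depression", "never depressed"],
})

# Person-mean centering of the level-1 predictor (daily negative event stress).
daily["stress_pmc"] = (
    daily["event_stress"] - daily.groupby("person")["event_stress"].transform("mean")
)

# Grand-mean centering of continuous level-2 predictors.
for col in ["bdi", "neuroticism", "age"]:
    person_level[col + "_gmc"] = person_level[col] - person_level[col].mean()

# Dummy codes with the past-depression group as the reference category.
person_level["subsequent_dep"] = (person_level["group"] == "subsequent depression").astype(int)
person_level["never_dep"] = (person_level["group"] == "never depressed").astype(int)
```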

History of Depression Status and Average Mood and Negative Event Appraisals

Intercept-only models were constructed to determine whether history of depression status was associated with average levels of mood and event appraisals; current depressive symptoms (BDI), neuroticism and age were also included in all models. Age and neuroticism were included as control variables because they differed based on history of depression status. In addition, to test the priming hypothesis, a product term derived from the history of depression status and current depressive symptoms predictors was included to test for their interactive effect (e.g., Fifield et al., 1998 ). Using the pleasant mood outcome as an example, the multilevel equations are as follows:
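
A plausible form of these equations, reconstructed from the coefficients discussed in the next paragraph (the assignment of the BDI, neuroticism, and age control terms to G03 through G05 is an assumption), is:

Level 1 (days within persons):
\[ \text{PleasantMood}_{ij} = \beta_{0j} + r_{ij} \]

Level 2 (persons):
\[ \beta_{0j} = G_{00} + G_{01}\,\text{SubsequentDep}_j + G_{02}\,\text{NeverDep}_j + G_{03}\,\text{BDI}_j + G_{04}\,\text{Neuroticism}_j + G_{05}\,\text{Age}_j + G_{06}\,(\text{BDI}_j \times \text{SubsequentDep}_j) + G_{07}\,(\text{BDI}_j \times \text{NeverDep}_j) + u_{0j} \]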

In these equations, we were interested in the significance of G01 and G02, as these indicate the unique effects of history of depression status (at average levels of BDI) on average pleasant mood. The G06 and G07 terms were also of interest because they indicate whether history of depression status and current depression interact to affect average mood in these models. If these interaction terms were not significant, they were dropped from the model to ease interpretation of the lower-order BDI effect.

As Table 2 indicates, individuals with past depression were more likely to blame others for a negative event than were the never depressed and subsequently depressed groups. Also, there were two significant interactions between history of depression status and current symptoms, indicating a priming effect. As illustrated in Figure 1, individuals in the past depression group had the strongest relationship between current symptoms and perceived threat, while those with subsequent depression had the weakest. As Table 2 indicates, these slopes differed significantly between the subsequent and past depression groups. The slopes did not differ between the past and never depressed groups. Figure 2 shows a similar pattern, such that those with past depression had the strongest relationship between current symptoms and unpleasant mood, with the past depression group’s slope significantly differing from the subsequent depression group’s slope. There was also a marginally significant trend for the never depressed group’s slope to differ from the past depression group’s slope.

Figure 1. Perceived threat of negative event as a function of current depressive symptoms and history of depression status.

Figure 2. Unpleasant mood as a function of current depressive symptoms and history of depression status.

Table 2. The association between depression history status, current depressive symptoms, and average diary measures (unstandardized coefficients)

Outcome Measure | Subsequent Depression Group | Never Depressed Group | BDI | BDI × Subsequent Depression | BDI × Never Depressed
Negative Event Stress | −0.07 | −0.37 | 0.05 | -- | --
Negative Event Threat | 0.10 | −0.13 | 0.14 | −.16 | −.07
Negative Event Control | −0.19 | −0.06 | 0.02 | -- | --
Negative Event Ability to Deal | −0.05 | −0.24 | −0.01 | -- | --
Negative Event Other Blame | −0.58 | −0.84 | −0.001 | -- | --
Pleasant Mood | −0.22 | −0.15 | −0.04 | -- | --
Unpleasant Mood | −0.01 | −0.001 | 0.06 | −.06 | −.05

Note. Neuroticism and age were controlled for in all models. Interaction terms were dropped from the model if they were not significant.

History of Depression Status and the Relationship between Negative Event Stress, Mood, and Appraisals

We determined whether history of depression status was related to differences in the within-person daily associations between negative event stress and mood and appraisals using a series of slope-as-outcomes models in HLM. Using pleasant mood as an example, the multilevel equations are as follows:
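
A plausible reconstruction of these equations, with the same caveats as for the earlier intercept-only model (the ordering of the control terms is an assumption), is:

Level 1 (days within persons):
\[ \text{PleasantMood}_{ij} = \beta_{0j} + \beta_{1j}\,\text{EventStress}_{ij} + r_{ij} \]

Level 2 (persons), slope equation:
\[ \beta_{1j} = G_{10} + G_{11}\,\text{SubsequentDep}_j + G_{12}\,\text{NeverDep}_j + G_{13}\,\text{BDI}_j + G_{14}\,\text{Neuroticism}_j + G_{15}\,\text{Age}_j + G_{16}\,(\text{BDI}_j \times \text{SubsequentDep}_j) + G_{17}\,(\text{BDI}_j \times \text{NeverDep}_j) + u_{1j} \]

with the intercept \(\beta_{0j}\) modeled as in the intercept-only equations above and daily event stress person-mean centered.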

In these models we focused on the significance tests of G11 and G12, as these represent the association between history of depression status and the within-person relation between negative event stress and pleasant mood (while controlling for current depression, age, and neuroticism). In addition, the G16 and G17 terms indicate whether history of depression status and current depression interact in predicting the relationship between daily negative event stress and mood. Again, if the BDI × history of depression status interaction was not significant, it was removed from the model to ease interpretation of the lower-order terms.

As noted in Table 3, deviations from one’s mean level of negative event stress were related to perceiving events as more threatening, feeling less able to deal with the event, greater other-blame for the event, less pleasant mood, and more unpleasant mood. In addition, history of depression status affected the relationship between negative event stress and pleasant mood. As shown in Figure 3, as daily negative event stress increased, daily pleasant mood declined most sharply among those with past depression, as compared to those with no depression history and subsequent depression; however, the difference in slopes was significant only between the past and never depressed groups. We also found a three-way interaction between current symptoms, history of depression status (subsequent depression vs. past depression), and daily stress in predicting daily negative event threat. Figure 4 illustrates that there was a stronger moderating effect of depression status on the relationship between BDI scores and daily perceived event threat on high stress days versus low stress days. Specifically, among past depressed individuals, but not subsequently depressed individuals, current symptoms showed a stronger relationship with daily threat perceptions on high versus low stress days.

Figure 3. Pleasant mood as a function of negative event stress and history of depression status.

Figure 4. Perceived threat of negative event as a function of perceived negative event stress, current depressive symptoms, and history of depression status.

Table 3. The within-person association between depression history status, current depressive symptoms, negative event stress, and diary measures (unstandardized coefficients)

Outcome Measure | Daily Event Stress | Subsequent Depression × Daily Stress | Never Depressed × Daily Stress | BDI × Daily Stress | BDI × Subsequent Depression × Daily Stress | BDI × Never Depressed × Daily Stress
Negative Event Threat | 0.54 | −0.01 | −0.04 | 0.04 | −.04 | −.03
Negative Event Control | 0.05 | −0.01 | −0.01 | −0.004 | -- | --
Negative Event Ability to Deal | −0.27 | −0.02 | 0.09 | −0.01 | -- | --
Negative Event Other Blame | 0.18 | −0.06 | 0.01 | −0.003 | -- | --
Pleasant Mood | −0.16 | 0.03 | 0.06 | 0.001 | -- | --
Unpleasant Mood | 0.09 | −0.02 | −0.03 | 0.001 | -- | --

Note. Neuroticism and age were controlled for in all models. The interaction terms involving depression history group and BDI were dropped from the model if they were not significant.

We predicted that formerly depressed individuals, as compared to those without a history of depression, would appraise daily events more negatively. In support of this prediction, we found that compared to participants who had never been depressed, those who had experienced a depressive episode prior to study participation were more inclined to blame others for the negative events they encountered during the diary period. While blaming the self for negative events is often considered maladaptive and has been emphasized in the literature, blaming others is also associated with poor psychological adjustment because it may interfere with adaptive coping, lower one’s sense of control over events, or hinder social support ( Hall, French, & Marteau, 2003 ; Tennen & Affleck, 1990 ). Therefore, our findings suggest that depression researchers should pay closer attention to other-blame among formerly depressed individuals.

We also predicted that on days with a more stressful event, individuals with a history of depression would report greater increases in maladaptive event-related appraisals, and more negative and less positive mood as compared to never depressed individuals. Echoing Conner et al. (2006) , we found that participants who had been depressed prior to study participation demonstrated steeper declines in positive mood on more stressful days. Similar to Tennen et al. (2006) we did not find this pattern for negative affect. It is possible that because there was less variability in negative mood than positive mood, this effect was harder to capture. Future research will be needed to determine why in some cases positive affect is affected but negative affect is not. In any case, declines in positive affect are important even without increases in negative affect. For example, according to the ‘broaden and build’ theory ( Fredrickson, 2001 ), positive emotions increase emotional well-being by broadening people’s attention and cognitive resources and should facilitate coping with stress (Fredrickson & Joiner, 2002). The declines in positive mood seen in this study among the formerly depressed on high stress days indicate that these individuals may have difficulty finding meaning in everyday negative events, which could continue their cycle of lower positive affect and pessimistic thinking (Fredrickson & Joiner, 2002).

Our findings indicate that people with a history of depression do have deficits on some of the outcomes in this study as compared to those who have never been depressed, suggesting that either psychosocial scars or preexisting vulnerabilities may be responsible. In this study, we found support for both explanations. Full support for the scar hypothesis would be provided if the past depression group significantly differed from both the subsequently depressed and the never depressed groups by reporting greater deficits in functioning. The strongest support for this hypothesis comes from the finding that individuals with a history of depression blamed others for negative events more, on average, compared to those who would subsequently become depressed and those who were never depressed. Additional support for the scar hypothesis, albeit less strong, comes from a priming effect. Participants with past depression who also had high current depressive symptoms reported higher daily levels of unpleasant mood as compared to participants in the subsequent depression and never depressed groups. However, full support for the scar hypothesis was dampened due to the marginal nature of the difference between the never depressed and past depression groups’ slopes; a significant difference between these groups would be required for full support. An additional priming effect suggested that among individuals with greater current depressive symptoms, the past depression group perceived events to be more threatening than the subsequent depression group, especially on high negative event stress days, providing initial support for the scar hypothesis. However, this finding did not provide full support because the past and never depressed groups did not significantly differ from each other. While tentative, the findings reported here in support of the scar hypothesis raise concerns regarding individuals with a history of depression because other-blame, negative mood, and appraising events as threatening have been related to poorer psychological adjustment among a variety of populations (e.g., Chandler, Kennedy, & Sandhu, 2007 ; Felsten, 2002 ; Pakenham, 1999 ; Pakenham & Rinaldis, 2001 ).

As noted, several of these findings occurred as a result of a priming effect, which suggests that vulnerabilities from depressive episodes are triggered or “primed” by a state of elevated distress (e.g., high current depressive symptoms; Roberts & Kassel, 1996 ). This indicates that in some cases, deficits due to “psychosocial scars” may only be evident in day-to-day functioning at times when symptoms are elevated. Therefore, in addition to collecting information about past depressive episodes, researchers should also measure current symptoms and determine their interactive effect on the stress reactivity process.

We also found evidence for the vulnerability hypothesis, with the strongest support coming from our finding that as compared to the never depressed group, participants who had been depressed prior to study participation demonstrated steeper declines in positive mood on more stressful days and they did not significantly differ from the subsequent depression group. Similar support for the vulnerability hypothesis is seen in the lack of differences between the past depression and subsequent depression groups on four of the seven outcomes, indicating that these two groups have similar cognitive patterns in some areas both prior to and after depressive episodes. However, due to a host of factors (especially the small sample sizes), these null findings should be interpreted with caution.

While we add to the mixed findings on the scar vs. vulnerability hypothesis debate, the findings of this study are noteworthy because we provided a stringent test of the hypotheses by comparing previously depressed individuals to both never depressed and subsequently depressed individuals using a daily diary method. To our knowledge, this is the first such test of these hypotheses.

Strengths and Limitations

This study had several distinct strengths and offers some insight into the role of major depression in daily functioning. We combined a daily process and longitudinal study design, and this method differs from the cross-sectional and prospective designs that have previously been used to investigate the scar versus the vulnerability hypothesis. This study is the first to use such a design to directly compare individuals who had previously experienced a depressive episode to individuals who would subsequently experience a depressive episode several years later, allowing for a clear comparison of the two hypotheses. Our study methods also allowed us to examine the influence of individual differences in history of depression status on daily within-person slopes (e.g., when-then contingencies or behavioral signatures; Shoda, Mischel & Wright, 1994 ; Tennen, Affleck, Armeli, & Carney, 2000 ), rather than utilizing nomothetic methods, which at times have not detected deficiencies in functioning before and after depressive episodes (see Haeffel et al., 2005 ).

Despite these notable strengths, there are limitations which must be mentioned. First, the past depression group was relatively small, which limits the power of our analyses. Second, although we used a well-validated method to measure history of depression status (i.e., SCID interviews), we nonetheless had to rely on participants’ recollections of the timing of their previous depression. Although the SCID has been shown to be valid for recall of previous episodes ( Thompson et al., 2004 ), this method is not as precise as prospective assessment methods (e.g., Alloy et al., 1999 ). This study offered a unique glimpse into everyday psychosocial functioning among individuals who would subsequently become depressed; however, a diary study embedded within a fully prospective longitudinal design will be an important next step. In addition, a more comprehensive (e.g., multiple daily assessments) study of daily stress, appraisals, coping, and mood might provide more reliable assessments of these processes.

Although we followed participants for three years after they completed the diary, the students in our study had not yet reached the normative age of highest risk for the onset of a first major depressive episode ( Zisook et al., 2004 ). Therefore, there are students in the never depressed group who may go on to experience a depressive episode perhaps leading us to understate the observed associations and miss others. Even longer follow-up periods will be required to fully answer the questions posed in this study. In addition, because the average age of first onset of major depression is in the mid-twenties, this study may reflect outcomes specific to those with early-onset/adolescent depression. As compared to adult-onset, pre-adulthood onset is associated with greater clinical (e.g., more irritability) and psychosocial (e.g., lower educational attainment) deficits; therefore, future research may need to determine whether similar outcomes are found among those with adult-onset depression (e.g., Kessler, Foster, Saunders, & Stang, 1994; Zisook et al., 2004 ). We did not assess whether participants had received a depression diagnosis, or whether they had received treatment; therefore, it is unknown how this may have affected the results. In the future, diagnostic and treatment history should be gathered and perhaps used as a control in statistical analyses. Finally, our sample was predominantly White and consisted of college students. This is a strength because previous history of depression diary studies have focused on clinical populations and this study extends findings to a more normative population; however, we do not know whether these findings generalize to more diverse populations in terms of race, age, education, and socioeconomic status. We did, however, document that our study participants reflected the broader student population on campus.

Implications

This study has implications for depression theory and research. It is important to understand why differences in functioning are present between the formerly depressed and never depressed in order to better understand the progression and treatment of depression. Theoretically, we found some support for the scar hypothesis, though several other studies have found limited to no support for it. It is possible that other studies did not support the scar hypothesis because of the nature of the outcomes measured (e.g., non stress-related outcomes), or the method used. As we mentioned previously, daily process methods appear to be particularly well suited to testing the scar and vulnerability hypotheses and future investigations should consider using such methods.

Methodologically, this study suggests that typical control groups used in depression research may need to be reconsidered. Although all of our study participants would be considered non-depressed in most studies, and thus would be eligible to be included in a control group in a depression study, our previously depressed participants showed some depression-related appraisal and mood reactivity patterns. Although investigators cannot identify who will go on to experience a depressive episode, they can reliably determine depression history.

Future Directions

We know little about whether individuals who have a history of depression or who are vulnerable to depression are exposed to more daily stressors than their never-depressed counterparts; however, our methods in the current study did not allow us to examine differential stress generation and exposure processes between the depression groups. While research indicates that depressed individuals appear to generate stressful interpersonal events ( Hammen, 1991 ), future research should examine whether history of depression status and the appraisal processes investigated in this study affect the generation of stressful events.

In the current study, some outcomes supported the vulnerability hypothesis, while others supported the scar hypothesis. It is not clear why this occurred; however, it could be because we asked participants to report only on the most negative event of their day rather than allowing them to report an unlimited number of daily negative events. In addition, much less is known about how positive events are perceived among depressed individuals. Future research should investigate whether some types of appraisals may be particularly prone to deficits related to depression, and should expand the type and number of negative events reported, as well as investigate positive events. Finally, it is possible that both hypotheses may be true in some cases, as this and other research indicates that both scars and preexisting vulnerabilities contribute to deficits in functioning.

In conclusion, previous daily diary research has consistently suggested that past depressive episodes negatively affect individuals’ appraisals, coping and mood. However, these studies did not determine why such vulnerabilities existed. In this study, we offer the first examination of the scar versus the vulnerability hypotheses using daily process methods in addition to extending previous findings on the history of depression and daily life. Overall, our findings do not provide clear support for either the scar hypothesis or the vulnerability hypothesis, with the past depression and subsequent depression groups differing on several of the outcome measures, but not on others. However, this study is an important first step to understanding why people who experienced depression in the past are sometimes less able to maintain their well-being in the face of daily challenges.

Acknowledgments

This research was supported by grants T32-AA007290 and P50-AA03510 from the National Institute on Alcohol Abuse and Alcoholism.

1 A variety of negative event attributions (see Abramson et al., 1989 ; Tennen & Affleck, 1990 ) have been linked to depression, including the perception of how stable and global the event cause is, and blaming the self or others for the event. In the current study we only assessed other-blame. We considered assessing other types of attributions but felt other-blame was important because it has received little attention in the literature and including other attribution types would have required many additional diary items.

2 Analyses indicated that excluded students ( M = 18.95, SD = 1.34) were slightly older than included students ( M = 18.65, SD = .86), t (572) = −3.23, p < .01, and differed by race and gender in that a larger proportion of males and students who identified as Hispanic/Latino or Other were excluded, χ2(4) = 15.56, p < .01.

3 Alpha reliabilities for composite measures were computed at the daily level (across all individuals and all days).

4 Additional multilevel analyses (available from the first author) were performed to investigate whether event type interacted with depression group to affect appraisals and mood. Results indicated that the relationship between event type (e.g., academic, interpersonal, health) and the outcome variables did not differ across depression groups except for the pleasant mood outcome. For this outcome, the past depression group’s positive mood decreased on days when a negative event was non-academic, as compared to an academic event. The other two groups’ moods were not affected by whether the event was academic or non-academic.

  • Abramson LY, Metalsky GI, Alloy LB. Hopelessness depression: A theory-based subtype of depression. Psychological Review. 1989;96:358–372.
  • Alloy LB, Abramson LY, Whitehouse WG, Hogan ME, Tashman NA, Steinberg DL, et al. Depressogenic cognitive styles: Predictive validity, information processing and personality characteristics, and developmental origins. Behaviour Research and Therapy. 1999;37:503–531.
  • Beck AT, Beck RW. Screening depressed patients in family practice: A rapid technique. Postgraduate Medicine. 1972;52:81–85.
  • Beck AT, Rush AJ, Shaw BF, Emery G. Cognitive therapy of depression. New York: Guilford; 1979.
  • Beevers CG, Rohde P, Stice E, Nolen-Hoeksema S. Recovery from major depressive disorder among female adolescents: A prospective test of the scar hypothesis. Journal of Consulting and Clinical Psychology. 2007;75:888–900.
  • Bodenmann G, Pihet S, Kayser K. The relationship between dyadic coping and marital quality: A 2-year longitudinal study. Journal of Family Psychology. 2006;20:485–493.
  • Bolger N, DeLongis A, Kessler RC, Schilling EA. Effects of daily stress on negative mood. Journal of Personality and Social Psychology. 1989;57:808–818.
  • Bolger N, Schilling EA. Personality and the problems of everyday life: The role of neuroticism in exposure and reactivity to daily stressors. Journal of Personality. 1991;59:355–386.
  • Bolger N, Zuckerman A. A framework for studying personality in the stress process. Journal of Personality and Social Psychology. 1995;69:890–902.
  • Chandler M, Kennedy P, Sandhu N. The association between threat appraisals and psychological adjustment in partners of people with spinal cord injuries. Rehabilitation Psychology. 2007;52:470–477.
  • Christensen MV, Kessing LV. Do personality traits predict first onset in depressive and bipolar disorder? Nordic Journal of Psychiatry. 2006;60:79–88.
  • Conner TS, Tennen H, Zautra AJ, Affleck G, Armeli S, Fifield J. Coping with rheumatoid arthritis pain in daily life: Within-person analyses reveal hidden vulnerability for the formerly depressed. Pain. 2006;126:198–209.
  • Costa PT, McCrae RR. Revised NEO Personality Inventory (NEO PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources; 1992.
  • Eaves G, Rush AJ. Cognitive patterns in symptomatic and remitted unipolar major depression. Journal of Abnormal Psychology. 1984;93:31–40.
  • Felsten G. Minor stressors and depressed mood: Reactivity is more strongly correlated than total stress. Stress and Health. 2002;18:75–81.
  • Fifield J, Tennen H, Reisine S, McQuillan J. Depression and the long-term risk of pain, fatigue, and disability in patients with rheumatoid arthritis. Arthritis and Rheumatism. 1998;41:1851–1857.
  • First MB, Spitzer RL, Gibbon M, Williams JBW. Structured clinical interview for DSM-IV-TR Axis I disorders, research version, non-patient edition (SCID-I/NP). New York: Biometrics Research, New York State Psychiatric Institute; 2002.
  • Fredrickson BL. The role of positive emotions in positive psychology: The broaden-and-build theory of positive emotions. American Psychologist. 2001;56:218–226.
  • Fredrickson BL, Joiner T. Positive emotions trigger an upward spiral toward emotional well-being. Psychological Science. 2002;13:172–175.
  • Gunthert KC, Cohen LH, Armeli S. Unique effects of depressive and anxious symptomatology on daily stress and coping. Journal of Social and Clinical Psychology. 2002;21:583–609.
  • Gunthert KC, Cohen L, Butler AC, Beck JS. Depression and next-day spillover of negative mood and depressive cognitions following interpersonal stress. Cognitive Therapy and Research. 2007;31:521–532.
  • Haaga D, Beck AT. Cognitive therapy. In: Paykel ES, editor. Handbook of affective disorders. 2nd ed. New York: The Guilford Press; 1992. pp. 511–523.
  • Haeffel GJ, Abramson LY, Voelz ZR, Metalsky GI, Halberstadt L, Dykman BM, et al. Negative cognitive styles, dysfunctional attitudes, and the remitted depressive paradigm: A search for the elusive cognitive vulnerability to depression factor among remitted depressives. Emotion. 2005;5:343–348.
  • Hall S, French DP, Marteau TM. Causal attributions following serious unexpected negative events: A systematic review. Journal of Social & Clinical Psychology. 2003;22:515–536.
  • Hojat M, Gonnella JS, Erdmann JB, Vogel WH. Medical students’ cognitive appraisal of stressful life events as related to personality, physical well-being, and academic performance: A longitudinal study. Personality and Individual Differences. 2003;35:219–235.
  • Hammen C. Generation of stress in the course of unipolar depression. Journal of Abnormal Psychology. 1991;100:555–561.
  • Kessler RC, Foster CL, Saunders WB, Stang PE. The social consequences of psychiatric disorders: I. Educational attainment. American Journal of Psychiatry. 1995;152:1026–1032.
  • Larsen RJ, Diener E. Promises and problems with the circumplex model of emotion. In: Clark MS, editor. Review of personality and social psychology. Newbury Park: Sage; 1992. pp. 25–59.
  • Lazarus RS, Folkman S. Stress, appraisal, and coping. New York: Springer; 1984.
  • Myin-Germeys I, Peeters F, Havermans R, Nicolson NA, deVries MW, Delespaul P, et al. Emotional reactivity to daily life stress in psychosis and affective disorder: An experience sampling study. Acta Psychiatrica Scandinavica. 2003;107:124–131.
  • Pakenham KI. Adjustment to multiple sclerosis: Application of a stress and coping model. Health Psychology. 1999;18:383–392.
  • Pakenham KI, Rinaldis M. The role of illness, resources, appraisal and coping strategies in adjustment to HIV/AIDS: The direct and buffering effects. Journal of Behavioral Medicine. 2001;24:259–279.
  • Raudenbush SW, Bryk AS. Hierarchical linear models: Applications and data analysis methods. London: Sage; 2002.
  • Raudenbush SW, Bryk AS, Congdon R. HLM 6: Hierarchical linear and nonlinear modeling. Scientific Software International; 2008.
  • Roberts JE, Kassel JD. Mood-state dependence in cognitive vulnerability to depression: The roles of positive and negative affect. Cognitive Therapy and Research. 1996;20:1–12.
  • Rohde P, Lewinsohn PM, Seeley JR. Are people changed by the experience of having an episode of depression? A further test of the scar hypothesis. Journal of Abnormal Psychology. 1990;99:264–271.
  • Rohde P, Lewinsohn PM, Seeley JR. Are adolescents changed by an episode of major depression? Journal of the American Academy of Child & Adolescent Psychiatry. 1994;33:1289–1298.
  • Rohde P, Lewinsohn PM, Seeley JR. Comparability of telephone and face-to-face interviews assessing Axis I and II disorders. American Journal of Psychiatry. 1997;154:1593–1598.
  • Shoda Y, Mischel W, Wright J. Intraindividual stability in the organization and patterning of behavior: Incorporating psychological situations into the idiographic analysis of personality. Journal of Personality and Social Psychology. 1994;67:674–687.
  • Simon GE, Revicki D, Von Korff M. Telephone assessment of depression severity. Journal of Psychiatric Research. 1993;27:247–252.
  • Stone AA, Reed BR, Neale JM. Changes in daily event frequency precede episodes of physical symptoms. Journal of Human Stress. 1987;13:70–74.
  • Tennen H, Affleck G. Blaming others for threatening events. Psychological Bulletin. 1990;108:209–232.
  • Tennen H, Affleck G, Armeli S, Carney MA. A daily process approach to coping: Linking theory, research, and practice. American Psychologist. 2000;55:626–636.
  • Tennen H, Affleck G, Zautra A. Depression history and coping with chronic pain: A daily process analysis. Health Psychology. 2006;25:370–379.
  • Thompson R, Bogner HR, Coyne JC, Gallo JJ, Eaton WW. Personal characteristics associated with consistency of recall of depressed or anhedonic mood in the 13-year follow-up of the Baltimore Epidemiologic Catchment Area Survey. Acta Psychiatrica Scandinavica. 2004;109:345–354.
  • Wagner JA, Tennen H, Mansoor GA, Abbott G. History of depressive disorder and endothelial function in postmenopausal women. Psychosomatic Medicine. 2006;68:80–86.
  • Zautra AJ, Parrish BP, Van Puymbroeck CM, Tennen H, Davis MC, Reich JW, et al. Depression history, stress, and pain in rheumatoid arthritis patients. Journal of Behavioral Medicine. 2007;30:187–197.
  • Zautra AJ, Davis MC, Reich JW, Nicassio P, Tennen H, Finan P, et al. Comparison of cognitive behavioral and mindfulness meditation interventions on adaptation to rheumatoid arthritis for patients with and without history of recurrent depression. Journal of Consulting and Clinical Psychology. 2008;76:408–421.
  • Zisook S, Rush AJ, Albala A, Alpert J, Balasubramani GK, Fava M, et al. Factors that differentiate early vs. later onset of major depression disorder. Psychiatry Research. 2004;129:127–140.

  • Open access
  • Published: 01 April 2022

Transcriptome-wide association study for postpartum depression implicates altered B-cell activation and insulin resistance

  • Jerry Guintivano   ORCID: orcid.org/0000-0003-3541-1101 1 ,
  • Karolina A. Aberg   ORCID: orcid.org/0000-0001-6103-5168 2 ,
  • Shaunna L. Clark 3 ,
  • David R. Rubinow 1 ,
  • Patrick F. Sullivan 1 , 4 , 5 ,
  • Samantha Meltzer-Brody 1 &
  • Edwin J. C. G. van den Oord 2  

Molecular Psychiatry volume 27, pages 2858–2867 (2022)

Postpartum depression (PPD) affects 1 in 7 women and has negative mental health consequences for both mother and child. However, the precise biological mechanisms behind the disorder are unknown. Therefore, we performed the largest transcriptome-wide association study (TWAS) for PPD (482 cases, 859 controls) to date using RNA-sequencing in whole blood and deconvoluted cell types. No transcriptional changes were observed in whole blood. B-cells showed a majority of transcriptome-wide significant results (891 transcripts representing 789 genes), with pathway analyses implicating altered B-cell activation and insulin resistance. Integration of other data types revealed that cell type-specific DNA methylation loci and disease-associated eQTLs (deQTLs), but not hormones/neuropeptides (estradiol, progesterone, oxytocin, BDNF), serve as regulators for part of the transcriptional differences between cases and controls. Further, deQTLs were enriched for several brain region-specific eQTLs, but no overlap with MDD risk loci was observed. Altogether, our results constitute a convergence of evidence for the pathways most affected in PPD, with data across different biological mechanisms.

Introduction

Postpartum depression (PPD), a diagnostic subtype of major depressive disorder (MDD) that occurs in the postpartum period, is a common complication of the perinatal period. It affects approximately half a million women annually in the U.S. [1, 2, 3, 4] and is one of the greatest causes of maternal morbidity and mortality [5, 6]. Additionally, PPD can have long-term adverse consequences for the newborn [7, 8, 9, 10, 11]. Despite this impact on public health, there is a lack of studies investigating the biology behind PPD. Although the precise mechanisms are unknown, PPD is a complex disorder that likely involves the culmination of genetic risk factors, response to hormonal fluctuations, and environmental factors.

Pregnancy is characterized by dynamic physiological changes that are expected to return to pre-pregnancy levels during the postpartum period. The stress axis, reproductive system, glucoregulation, and immune activation are a few examples of biological systems that interact and adapt to support the growing fetus. Perturbations in the recovery of these systems after childbirth, in addition to other risk factors, could result in PPD symptoms. Performing transcriptome-wide association studies (TWAS) allows for the interrogation of functional changes associated with case status. Employing TWAS, we can identify the biologically relevant changes in PPD that result from the aforementioned system disruptions, providing crucial insight into specific causes of the disorder.

Traditionally, TWAS have been performed on bulk tissues that are composed of multiple diverse cell types. This cellular heterogeneity has a detrimental impact on the ability to detect disease associations [12]. In bulk tissue, case-control differences will be “diluted” when they affect only one or a few cell types, may cancel out if the differences are of opposite signs across cell types, or may be undetectable altogether if the differences involve low-abundance cells. Furthermore, identifying the specific cell types from which the association signals originate is key to formulating refined hypotheses of PPD disease pathology, designing proper follow-up experiments, and developing effective clinical interventions. Efforts to address these issues have been made using purified cell populations or single-cell RNA sequencing. However, these methods are labor-intensive and/or cost-prohibitive for most large-scale transcriptomic interrogations. As an alternative, statistical methods have been developed to deconvolute the effects of individual cell types using data generated from bulk tissue [12, 13, 14, 15, 16, 17].

Transcriptomic information can also be combined with other data types, leading to deeper mechanistic insight into the regulation of transcription. For example, single nucleotide polymorphisms (SNPs) have been shown to regulate expression in a cell type-specific manner [18, 19]. In addition, DNA methylation can regulate gene transcription [20, 21]. Identifying the (epi-)genomic regulators of transcriptional differences is a key step for generating novel hypotheses about PPD disease etiology and would allow, for example, the design of functional follow-up studies. The identification of (epi-)genomic regulators also has translational value, as they are potential targets for correcting aberrant transcription.

In this work, we performed the largest TWAS for PPD, using RNA-sequencing of whole blood, in a cohort of women six weeks following childbirth. To date, only three other TWAS of PPD have been performed, with sample sizes ranging from 6 to 15 cases and 10 to 122 controls [22, 23, 24]. Further, the analyses presented here are performed on a cell type-specific level. Additionally, SNPs, DNA methylation, and hormone levels were used to identify regulators underlying the observed case-control transcriptional differences. Altogether, this represents the largest and most diverse interrogation into the biology of PPD.

We recruited a case-control cohort of 1551 women (579 PPD cases, 972 controls) at six weeks postpartum, with PPD case-control status established using clinical interviews. Participants were racially (66.5% Black, 32.9% White, and 0.6% Asian) and ethnically (15.9% Hispanic) diverse (Table S1) [25]. We generated whole blood-derived transcription data using RNA-sequencing (RNA-seq), which resulted in 134,302 known transcripts from 51,079 genes (88.2% of all Ensembl annotated genes) [26]. In addition, to study regulators of PPD expression differences, we generated array-based DNA methylation data (Illumina 450k), assaying 373,635 CpGs, and genome-wide SNP data which, with imputation, assayed ~12.5 million variants. Cell type proportions were estimated from the DNA methylation data using standard methods [14, 27]. The average estimated cell type proportions were 9.7% CD8+ T-cells, 16.3% CD4+ T-cells, 5.6% B-cells, 4.7% monocytes, 59.4% granulocytes, and 3.0% natural killer cells. The validity of the estimated proportions was established through high correlations with complete blood counts available for a subset of participants (Supplemental Results) [14, 27]. We observed differences in cell proportions between cases and controls for CD4+ T-cells (β = −0.06, p = 0.03) and granulocytes (β = 0.06, p = 0.02), but not for the other cell types (Table S1). All downstream analyses control for cell type proportions.
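
Reference-based estimation of cell proportions from methylation data amounts to a constrained regression of each sample’s methylation profile onto cell type-specific reference profiles. The snippet below is a minimal sketch of that idea using non-negative least squares; the reference matrix, CpG selection, and mixture are hypothetical stand-ins for the established pipelines cited above [14, 27].

```python
# Minimal sketch of reference-based cell-type deconvolution (Houseman-style).
# The reference matrix and data here are invented for illustration only.
import numpy as np
from scipy.optimize import nnls

def estimate_proportions(sample_betas: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """sample_betas: (n_cpgs,) beta values for one sample at discriminating CpGs.
    reference: (n_cpgs, n_cell_types) mean beta per cell type at the same CpGs.
    Returns non-negative proportions rescaled to sum to 1."""
    coefs, _ = nnls(reference, sample_betas)   # constrain coefficients to be >= 0
    total = coefs.sum()
    return coefs / total if total > 0 else coefs

# Toy example: 3 CpGs, 2 cell types, a 60/40 mixture recovered from the mixed profile.
ref = np.array([[0.9, 0.1],
                [0.8, 0.2],
                [0.1, 0.9]])
mix = 0.6 * ref[:, 0] + 0.4 * ref[:, 1]
print(estimate_proportions(mix, ref))  # approximately [0.6, 0.4]
```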

TWAS for PPD identifies cell type-specific dysregulation

In addition to the TWAS of whole blood, we performed cell type-specific TWAS using a statistical deconvolution approach [12, 16, 18, 28, 29]. Figure 1A shows Manhattan plots, QQ-plots, and lambdas for the TWAS results for whole blood and each cell type separately. Across all analyses, we correct for multiple testing using the false discovery rate (FDR), with a q-value < 0.1 threshold indicating significance. Decreasing the q-value threshold to 0.05 would, for example, result in only a modest reduction in false positives but would decrease the probability of finding true effects exponentially, as shown previously [30]. The TWAS in whole blood and in the different cell types each revealed transcripts that were significantly associated with PPD (see Tables S2–S7 for all transcripts with p-value < 0.05). The QQ-plots and lambdas from the TWAS results (Fig. 1A), in combination with TWAS of permuted case-control status for each analysis (Fig. S1), which yielded average lambdas of approximately one, indicated no evidence of test-statistic inflation.
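
As a rough illustration of the multiple-testing step, the sketch below computes Benjamini-Hochberg adjusted p-values and applies the q < 0.1 cutoff. It is a simplified stand-in for the FDR/q-value procedure referenced in the text, and the p-values are invented.

```python
# Minimal sketch of FDR control via the Benjamini-Hochberg procedure,
# declaring results significant at q < 0.1 as in the text. Illustrative only.
import numpy as np

def bh_qvalues(pvals: np.ndarray) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (a simple q-value stand-in)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)          # p_(i) * n / i
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]   # enforce monotonicity
    q = np.empty(n)
    q[order] = np.clip(ranked, 0, 1)
    return q

pvals = np.array([1e-6, 0.0004, 0.01, 0.04, 0.2, 0.8])   # made-up p-values
q = bh_qvalues(pvals)
print(q < 0.1)  # which tests pass the q < 0.1 threshold
```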

Figure 1

A Manhattan plots and QQ-plots of PPD transcriptome-wide association study (TWAS) results in bulk tissue and individual cell types. Non-significant transcripts are shown in grey/black. Significant transcripts at an FDR < 0.1 are shown in color. B Upset plot showing overlap of significant TWAS transcripts across bulk tissue and individual cell types. C Biotype distribution of significant TWAS transcripts in bulk tissue and individual cell types.

Whole blood

TWAS in whole blood revealed nine transcripts from nine genes that were significantly associated with PPD (Table  S2 ). The most significant transcript was for the small nuclear RNA, RNVU1-9 ( p -value = 1.92 × 10 −7 ). Using the Ensembl biotype annotations [ 26 ], which provide an indication of transcript function, we see that all nine transcripts are processed transcripts (Fig.  1C ).

Granulocytes

TWAS in granulocytes had two transcripts from two genes significantly associated with PPD case status: a protein-coding transcript for SH3PXD2A ( p -value = 8.81 × 10 −8 ) and a processed transcript for CSF3R ( p -value = 7.45 × 10 −8 ) (Fig.  1C ; Table  S3 ).

Monocytes

Monocyte TWAS resulted in eight transcripts from eight genes that reached transcriptome-wide significance (Table S4). Transcript biotypes are varied, with three protein-coding, three processed transcripts, one nonsense-mediated decay, and one immunoglobulin (IG) gene (Fig. 1C). The top hit (p-value = 6.60 × 10−7) is a protein-coding transcript for the gene VWA3B.

CD8+ T-cells

There were 90 transcripts in 87 genes that reached transcriptome-wide significance in the TWAS of CD8+ T-cells (Table S5). These transcripts are composed of 56 protein-coding, 24 processed transcripts, five pseudogenes, three nonsense-mediated decay, and two IG genes (Fig. 1C). In this cell population, a protein-coding transcript for the gene HEATR2 was the most significantly associated transcript (p-value = 6.46 × 10−12). We found that the significant protein-coding transcripts were enriched for 18 pathways, which group into three clusters related to protein secretion, signal transduction, and neuronal processes (Table S8).

CD4+ T-cells

In CD4+ T-cells, 36 transcripts from 35 genes passed transcriptome-wide significance (Table S6). These significant transcripts included 21 protein-coding, 11 processed transcripts, three pseudogenes, and one IG gene (Fig. 1C). IGKV1D-13 was the top transcript (p-value = 2.94 × 10−4). Top protein-coding transcripts from the CD4+ T-cell TWAS were enriched for 19 pathways, which segregate into three clusters related to neuronal development, signal transduction, and cell receptor signaling (Table S9).

B-cells

The TWAS in B-cells yielded 891 transcripts representing 789 genes that were transcriptome-wide significant (Table S7). The altered transcripts comprise 534 protein-coding, 242 processed transcripts, 56 nonsense-mediated decay, 46 pseudogenes, 12 IG genes, and one T-cell receptor (TR) gene (Fig. 1C). The most significant (p-value = 1.76 × 10−120) is the only protein-coding transcript for the gene FMOD. Pathway analyses of the protein-coding findings show enrichment for 98 pathways. Pathway clustering results in ten clusters, including those related to B-cell activation, apoptotic pathways, cellular starvation, cellular metabolism, nucleic acid metabolism, neuron cell morphology, organ development, glucose metabolism, and cell adhesion and cytoskeleton organization (Table S10).

Overall, individual cell types have unique profiles of transcripts that are differentially expressed between cases and controls (Fig. 1B). However, an overlap of 20 significant transcripts is seen among CD4+ T-cells, CD8+ T-cells, and B-cells, which are all lymphocytes. These overlapping differentially expressed transcripts could reflect common functions associated with PPD that are shared by the different lymphocyte cell types. The overlapping transcripts are enriched for three pathway clusters that involve nervous system development, regulation of signal transduction, and regulation of macromolecule metabolism (Table S11).

Top loci in whole blood overlap findings from previous transcriptome studies of PPD

Three other transcriptome studies of PPD have been performed [ 22 , 23 , 24 ]. These previous studies were performed using whole blood and are independent of the current study, which allowed us to test whether our top bulk results overlap the most significant genes in the previous studies.

As shown in Table 1, we found that the top 5% of our whole blood results shared genes with the significant genes reported by Landsman et al. Further, for the remaining two studies (Pan et al. and Mehta et al.), we observed overlap between the top 5% of our whole blood results and the top 5% of the reported genes. We used sign tests to compare the overall patterns of results between our results and the previous PPD TWAS. Under the null, the expectation is that 50% of the signs of the overlapping genes will be the same between two independent sets of results. The significance of the observed proportion was evaluated using the binomial distribution. From this, we found that our results were concordant with the results of Mehta et al. (p < 0.001); the lack of concordance with the other studies may be due to their small sample sizes.
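
The sign test described here reduces to a binomial test of the observed proportion of concordant signs against 0.5. A minimal sketch follows, with invented counts and a one-sided alternative (the text does not state the sidedness used).

```python
# Minimal sketch of the sign test: under the null, signs of effect estimates for
# overlapping genes agree 50% of the time, and the observed count is evaluated
# against Binomial(n, 0.5). The counts below are hypothetical.
from scipy.stats import binomtest

n_overlap = 40        # hypothetical number of overlapping genes
n_concordant = 31     # hypothetical number with the same sign in both studies
result = binomtest(n_concordant, n=n_overlap, p=0.5, alternative="greater")
print(result.pvalue)
```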

SNPs and DNA methylation, but not hormones, may regulate PPD-associated transcripts

Whole blood and cell type-specific TWAS identified multiple transcripts that differed between cases and controls. We performed regulation analyses to study whether these differences were regulated by pregnancy-related hormones (estradiol, E2; progesterone, P4; oxytocin; BDNF), DNA methylation, or SNPs measured in the same samples. These analyses followed the model depicted in Fig.  S4 , which assumes the tested marker regulates transcription (a mediator), which in turn alters PPD risk. As there may be additional mechanisms through which the marker may affect PPD, the model also allows for a direct effect of the marker on PPD. The null hypothesis states that the marker does not regulate differentially expressed transcripts (H 0 : a  ×  b  = 0). SNPs and CpGs were tested as putative regulators if they had a nominal association with case status ( p  < 0.05) and were within a 10 kb window (cis-acting elements) of the genes tested. Hormones were tested as putative regulators if they had a nominal association with case status ( p  < 0.05).

The postpartum period is characterized by a large fluctuation in hormones, making dysregulation as a result of hormone changes an attractive mechanism. We observed a significant association between PPD case status and oxytocin levels (p-value = 2.38 × 10−4), but not for any other measured hormones (Table S1). However, when we examined oxytocin as a potential regulator of transcriptional differences, we did not observe any significant effects (Table S12). This is not to say hormones do not contribute to case-control differences, but we do not see supporting evidence that hormones regulate the specific transcriptional differences observed in this study.

Among the disease-associated DNA methylation sites (10 CpGs in bulk, 158 CpGs in CD8 + T-cells, 67 CpGs in CD4 + T-cells, 1,520 CpGs in B-cells, 13 CpGs in monocytes, 5 CpGs in granulocytes), we identified five CpGs that were significantly associated with four differentially expressed transcripts in B-cells ( q -value < 0.1; Table  S13 ). Thus, these methylation marks may serve as potential cis-acting regulators for their corresponding protein-coding transcripts for CD22 , CXXC5 , MYO1D , and KCNG1 [ 31 ]. It should be noted that methylation was assayed with a commonly used array-based platform resulting in 373,635 high-quality methylation markers, which corresponds to ~1.3% out of the 28.3 million CpGs in the human genome [ 32 ]. Thus, it is possible, and likely, that many additional regulatory methylation marks are present that were not assayed in our dataset.

Using our SNP data, we examined whether disease-associated eQTLs (deQTL) exist for any of the differentially expressed transcripts. We tested 36 SNPs in whole blood, 1696 SNPs in CD8 + T-cells, 322 SNPs in CD4 + T-cells, 15,154 SNPs in B-cells, 149 SNPs in monocytes, and 58 SNPs in granulocytes as putative regulators. No deQTLs were identified for whole blood and granulocytes. However, we detected 17 deQTLs for seven transcripts in CD8 + T-cells, seven deQTLs for three transcripts in CD4 + T-cells, 523 deQTLs for 124 transcripts in B-cells, and four deQTLs for two transcripts in monocytes ( q -value < 0.1; Table  S14 ). The majority of the transcripts with deQTLs were protein-coding (78%). These deQTLs are genomic regulators of transcriptional differences associated with PPD and have translational value as potential targets for correcting aberrant transcription.

PPD deQTLs are enriched for multiple brain eQTLs but not for MDD GWAS loci

We tested whether the PPD deQTLs we identified were enriched for significant MDD GWAS loci [33, 34]. However, we did not observe any overlap between PPD deQTLs and significant MDD GWAS loci. The lack of overlap may be due to differences in ancestry between studies: our PPD cohort comprises mainly Black and Hispanic women, whereas the MDD GWAS was limited to individuals of European ancestry. Alternatively, our results could support the literature suggesting that PPD is a distinct disorder with a different underlying etiology [35, 36, 37, 38].

Further, to examine the potential impact of PPD deQTLs across tissues, we tested whether they overlapped eQTLs in bulk brain tissue or neurons from various brain regions as reported by the most recent GTEx analyses [18, 19]. We found significant enrichment of our deQTLs for eQTLs in nearly all brain regions (Table 2; 10 out of 12, 83.3%), with the exception of the amygdala and substantia nigra. However, we did not identify any enrichment for eQTLs in neurons. Although additional investigations would be required, this could suggest that the PPD deQTLs detected in this study do not exert their effects on neurons, but rather on non-neuronal cell types, such as glia. In total, there were 45 deQTL-containing genes overlapping with eQTLs in various brain tissues (Table S14).

Convergent evidence implicates pathways most affected in PPD

Pathway analyses of TWAS results identified pathways that are altered in PPD cases compared with controls. However, RNA transcription is potentially regulated by many biological processes. Our analyses suggested that SNPs (deQTLs), and to a lesser extent DNA methylation, may be involved in the regulation of PPD-related transcriptional differences. By examining the overlap of these pathways with deQTLs, we can identify which pathways may be disrupted by genetic loci in cases, as opposed to other factors. We identified 138 transcripts regulated by deQTLs in whole blood and across the five cell types examined. More specifically, 124 of these transcripts were found in B-cells and overlap pathways in every pathway cluster identified. Figure 2 illustrates the B-cell pathways that contain at least one deQTL-containing gene, organized by cluster. Additional results for other cell types can be found in Supplemental Tables. Overall, our results suggest the pathway clusters shown in Fig. 2 may be dysregulated due to genetic factors in women with PPD.

Figure 2

Each cluster is represented by a different color. Opaque bars are the total number of genes overlapping the pathway. Solid bars are the number of genes with a deQTL. Black points are -log(p-value) for the pathway enrichment.

To generate novel hypotheses for PPD disease pathology, we studied the biological underpinnings of PPD in a large cohort of women six weeks after childbirth. Results showed cell type-specific transcriptional differences associated with PPD, with a majority of the changes seen in B-cells. Furthermore, these associations were significantly overrepresented in multiple sets of pathways. These pathways reflected the significant effects of SNPs regulating the PPD-associated transcriptional changes. This constitutes a convergence of evidence with data from two different biological mechanisms.

Pregnancy is characterized by substantial changes in multiple physiological systems. Failure to return to pre-pregnancy levels during the postpartum period may contribute to PPD symptoms, making these systems candidate mechanisms for PPD. Our association and pathway results implicate two such, potentially co-occurring, mechanisms: B-cell activation and insulin resistance (IR). In addition to showing pregnancy-related changes, both mechanisms have previously been linked to depression [39, 40, 41, 42, 43, 44].

Our results specifically implicate B-cell activation (Fig. 2, cluster 10), which plays a critical role in the immune system. B-cells become activated when their receptors recognize and bind an antigen. Activated B-cells then produce antibodies and secrete pro- and anti-inflammatory factors. During pregnancy, B-cells undergo dynamic changes as the maternal immune system has to balance tolerance of the foreign, growing fetus with maintaining vigilance against pathogens [45, 46]. Thus, B-cell concentrations are significantly lower during the third trimester and immediately following delivery compared to non-pregnant women, but levels typically return to those seen in non-pregnant women by six weeks postpartum [47].

A growing body of evidence suggests that inflammatory processes may play a significant role in PPD [24, 48, 49, 50]. However, the specific role of B-cells has yet to be elucidated. Recent work has shown an increase in B-cell densities in the brains of those with mood disorders compared to controls [39]. Furthermore, in whole blood, altered B-cell homeostasis was observed in those with MDD compared to controls [40, 41]. A possible mechanism contributing to increased B-cell activation could be related to autoimmunity [51]. Depression is often co-morbid with autoimmune disease; the risk of depression is 1.25–3.56 times higher in people with autoimmune disease than in those without [52, 53, 54]. Additionally, a feature of many autoimmune disorders is a loss of B-cell tolerance coinciding with the inappropriate production of autoantibodies [51, 55]. Thus, an aberrant autoimmune response could potentially contribute to PPD.

Further, we did not observe significant differences in B-cell proportions between cases and controls (p-value = 0.78). As multiple B-cell subtypes exist, however, differences may be present in specific subsets even if overall B-cell proportions do not differ.

Not only do we observe a pathway cluster, composed of 12 pathways, directly related to B-cell activation (cluster 10), but we also see multiple pathway clusters associated with cellular metabolism, which supports our B-cell activation hypothesis. Activation initiates cellular reprogramming of quiescent naïve B-cells to drive re-entry into the cell cycle [56]. This rapid expansion requires the production of biomolecules (lipids, proteins, and nucleotides; clusters 2, 5, 6, and 9) at an increased rate. Additionally, work in mice has shown that B-cell activation results in increased glucose uptake (cluster 2) and mitochondrial remodeling (cluster 8) [56]. Upon B-cell activation, not only is there a slew of metabolic changes, but there are also changes to the cellular structure (cluster 1). Antigen binding triggers substantial remodeling of the cell cytoskeleton, which induces cell spreading, the formation of the immune synapse, and the gathering of antigen for endocytosis [57]. Additionally, apoptosis is a carefully regulated process throughout the lifecycle of B-cells. Disruptions to apoptotic pathways (cluster 8) affect multiple processes including homeostasis, quality control of the antibody response, and tolerance [58].

The second implicated mechanism, insulin resistance (IR), is supported by several factors. Insulin promotes the absorption of excess blood glucose into other tissues for energy storage. IR occurs when cells become insensitive to the effects of insulin, leading to a buildup of blood glucose and insulin. Starting in the second trimester of pregnancy, insulin sensitivity is progressively reduced by as much as 60–80% [59]. This coincides with steady increases in insulin [60]. These changes serve as a physiological adaptation of the mother to ensure an adequate carbohydrate supply for the rapidly growing fetus [61]. After delivery, insulin returns to pre-pregnancy levels [62, 63, 64].

IR is a risk factor for depression. Rodent studies have shown that brain IR alters dopamine turnover and induces anxiety and depressive-like behaviors in mice [ 65 ]. In humans, greater glycemic variability has been associated with negative moods [ 66 ]. IR typically predates the development of diabetes. A meta-analysis of 27 studies demonstrated that depression is significantly associated with hyperglycemia for both type 1 and type 2 diabetes [ 67 ]. Studies further suggest that insulin-sensitizing agents could play a significant role in the treatment of major depression, particularly in patients with documented IR [ 68 , 69 ]. Pregnancy is known to increase the risk of developing Type 2 diabetes after giving birth [ 70 ]. Furthermore, pre-pregnancy or gestational diabetes was independently associated with perinatal depression, including new onset of PPD [ 71 , 72 , 73 ].

We tested whether genes implicated by our top results were significantly overrepresented among genes related to A1C [74] and IR [75] in whole blood. Hemoglobin A1C levels are a measure of a person’s blood sugar levels over the past three months and are highly correlated with measures of IR [76]. We found that the top 5% of our whole blood findings for PPD were enriched for the top 5% of associations with A1C (p-value = 4.79 × 10−7) and IR (p-value = 0.04). Pathway databases cannot directly implicate IR because no such pathway exists; IR is a disorder characterized by disruptions of multiple biological functions. However, IR can be implicated by nearly all clusters in our pathway analyses (Fig. 2). Given the evidence linking IR and B-cells [77, 78], it is reasonable to observe a signature of IR in B-cells. For example, B-cells contribute to the development of IR (cluster 10); these cells can promote IR through T-cell modulation and production of pathogenic antibodies [77, 78]. Insulin signaling regulates diverse cellular functions including metabolic pathways, apoptosis, mitogenesis, and membrane trafficking through protein kinases (cluster 1) [79, 80]. Insulin directly affects glucose metabolic processes (cluster 2). Circulating levels of purines (cluster 5) [81, 82], amino acids, and fatty acids (cluster 6) [83, 84] are also associated with IR. The administration of carboxylic acids (cluster 6) improved glycemic control, potentially by reducing IR [85]. IR may lead to inadequate intracellular glucose, potentially leading to apoptosis and intracellular starvation (clusters 7 and 8) [86]. Wnt signaling (cluster 9) is involved in the regulation of glucose homeostasis in multiple organs, particularly in insulin-responsive tissues [87].

A number of limitations of the present study should be mentioned. While we studied blood, the pathogenic processes for PPD most likely manifest in the brain. It is likely that, in studying blood, other possible PPD-related mechanisms have been missed. However, there is cross-talk between the two tissues across the blood-brain barrier [88]. This would allow altered B-cell activation and IR to affect the brain and be observed in our study. Furthermore, we observed deQTLs that affect genes in both blood and brain, specifically in brain regions implicated in mood disorders (e.g., hippocampus, cingulate cortex, frontal cortex). These deQTLs can be studied in model systems for functional follow-up to evaluate causality and their downstream biological effects [89]. Additionally, the B-cell activation and IR hypotheses for PPD require further validation through direct measurements in PPD cases versus controls.

In conclusion, we have performed the largest and most comprehensive biological interrogation of PPD, to date. Our results suggest that PPD is associated with an increase in B-cell activation, a finding previously unreported in the literature. While we do not know the precise mechanisms behind this increase in B-cell activation, we hypothesize it could be due to co-occurring dysregulation in IR. Additionally, we identified genetic variants, deQTLs, that regulate, in part, the transcriptional differences between cases and controls. Our findings require further validation and follow-up studies. However, these novel hypotheses for PPD provide promising avenues for future research.

Study population

Detailed information about the study can be found elsewhere [ 90 ]. Briefly, we followed the 2010 US Census terminology for describing the self-reported “race” and “ethnicity” (Hispanic or Non-Hispanic) of subjects. We refer to the participants as Asian, Latina (“of Latino, Hispanic, or Spanish origin”), Black (or African-American), and White (i.e., European ancestry, non-Hispanic).

Recruitment of postpartum women aged 17–45 years occurred from 9/2012 to 6/2016 in four outpatient obstetrical clinics (University of North Carolina Women’s Hospital, Wake County Health Department, Alamance County Health Department, East Carolina University School of Medicine) during routine six-week postpartum visits (± 1–2 weeks). Detailed recruitment procedures can be found in  Supplemental Methods .

Case-control status was determined using a clinical interview. All women attending these clinics were first screened for study inclusion using the Edinburgh Postnatal Depression Scale (EPDS). The 10-item EPDS is a commonly used PPD screening instrument, and high EPDS scores are consistent with a PPD diagnosis by structured clinical interview [91]. Women with high EPDS scores (≥11) or low EPDS scores (≤7) were invited to participate. All women then had PPD case status determined using the MINI diagnostic interview. For a full list of inclusion/exclusion criteria, see Supplemental Methods. Briefly, inclusion required no indication of MDD during the first or second trimester of pregnancy, a singleton pregnancy, and a live term birth (≥34 weeks gestation). This study was approved by the University of North Carolina Institutional Review Board Committee for the Protection of Human Subjects. All subjects provided written informed consent and signed the Health Insurance Portability and Accountability Act release.

Subject Assessments

All participants were administered the MINI International Neuropsychiatric Interview (MINI-Plus, version 6.0), a structured clinical interview for the assessment of psychiatric disorders [ 92 , 93 ]. Experienced and certified (κ > 0.8 versus criterion ratings) psychiatric research nurses working in each clinic setting administered the MINI-Plus. Cases for this study were defined by having current MDD as assessed by the MINI-Plus. Controls did not have current MDD using the MINI-Plus. All study procedures could be performed in Spanish with a native speaker.

Biological Sampling

Peripheral blood was sampled and immediately processed on-site at the time of subject assessment. For plasma separation, blood aliquots were centrifuged at 3300 rpm for 10 min at room temperature immediately after sampling. For serum separation, blood aliquots were centrifuged at 3300 rpm for 10 min at 2–8 °C, 60 min after blood draw. All plasma and serum samples were then snap-frozen and kept at −80 °C until analysis. Aliquots were drawn into PAXgene RNA tubes and stored at −80 °C until RNA extraction. Genomic DNA was extracted from aliquots of whole blood using the Qiagen Autopure LS, which utilizes Qiagen Puregene chemistry. Samples that were missing, had insufficient material, or did not meet minimum detection thresholds were excluded from analyses.

RNA Sequencing

For RNA extraction, samples were pulled from −80 °C freezers, allowed to thaw at +4 °C overnight, and extracted using the QIAsymphony platform. A detailed RNA extraction protocol can be found in Supplemental Methods. Fresh-frozen total RNA was prepared for sequencing with the Nugen Ovation Human Blood RNA-seq library prep kit according to the manufacturer’s instructions. RNA libraries were sequenced as 2 × 50 bp paired-end reads with 24 samples per lane on an Illumina HiSeq 4000 sequencer. Each sample was run on two different lanes at two different times. Samples were balanced by case status, age, race, ethnicity, and recruitment site across sequencing pools to reduce technical biases. Preliminary sample and read quality control (QC) was performed using FastQC with default settings, which generates reports on read quality and composition. No samples were dropped or required resequencing. Reads were aligned with HISAT2 (v2.1.0) and transcriptomes were reconstructed using StringTie (v1.3.3), both within the rnacocktail pipeline [94]. Reads from runs 1 and 2 for every sample were merged prior to quantification. The reference transcriptome was downloaded from ENSEMBL (GRCh37, release 92; http://ftp.ensembl.org/pub/grch37/release-92/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.gtf.gz ) [26] and used for the alignment, transcriptome reconstruction, and quantification steps. This reference includes all available biotypes, including protein-coding genes as well as pseudogenes, lncRNA, and ncRNA. Following transcriptome assembly, the StringTie merge option was used to combine all assembled transcriptomes across all samples, and transcripts were then re-quantified against the merged transcriptome (stringtie -eB) so that expression measures are consistent across all samples. Transcripts were excluded if they were depletion targets for library prep (ENSEMBL gene_biotype “rRNA”, HBA1, HBA2, HBB, HBD), unannotated (not associated with an ENSEMBL ID), present in < 1% of samples, or had an average TPM < 1 (low expression outlier) or > 20,000 (high expression outlier). Following this quality control, data for 108,474 transcripts remained for association testing.
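
The transcript-level expression filters described above can be expressed as a few vectorized operations on a transcripts-by-samples TPM matrix. The sketch below is illustrative only: it treats TPM > 0 as “present” (an assumption), uses simulated data, and omits the depletion-target and annotation filters handled upstream.

```python
# Minimal sketch of the expression filters: keep transcripts present in >= 1% of
# samples with mean TPM between 1 and 20,000. Data are simulated for illustration.
import numpy as np
import pandas as pd

def filter_transcripts(tpm: pd.DataFrame) -> pd.DataFrame:
    """tpm: rows = transcripts, columns = samples, values = TPM."""
    detected_frac = (tpm > 0).mean(axis=1)    # fraction of samples expressing each transcript
    mean_tpm = tpm.mean(axis=1)
    keep = (detected_frac >= 0.01) & (mean_tpm >= 1) & (mean_tpm <= 20_000)
    return tpm.loc[keep]

tpm = pd.DataFrame(np.random.lognormal(mean=0, sigma=2, size=(1000, 50)))
print(filter_transcripts(tpm).shape)
```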

For association testing, technical variables were measured for each sample, including: i) the total number of reads, the number of uniquely aligned reads, and the proportion of reads aligned using StringTie, ii) sequencing pool, and iii) the first ten principal components across all transcript counts for the library-prep depletion targets (ENSEMBL gene_biotype “rRNA”, HBA1, HBA2, HBB, HBD). Final association models included maternal age, race/ethnicity, estimated cell proportions, proportion of reads aligned, number of uniquely aligned reads, and sequencing pool. Additionally, principal components of TPM values were used to capture any remaining unmeasured sources of variation; one principal component (PC1) was included based on the scree test. Lastly, multidimensional outliers were excluded, resulting in data for 482 cases and 859 controls. Quantile-quantile plots for each cell type-specific TWAS (Fig. 1A), along with TWAS of permuted case-control status for each analysis (which yielded average lambdas of approximately 1; Fig. S1), indicated no evidence of test-statistic inflation under the empirical null.
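
Choosing covariate PCs “based on the scree test” is, in practice, a visual judgment. The sketch below substitutes a simple largest-drop (elbow) heuristic purely for illustration on simulated data; it is not the procedure used in the study.

```python
# Minimal sketch of selecting expression PCs as covariates via an elbow heuristic
# on the explained-variance ratios (a stand-in for visual scree inspection).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
tpm = rng.lognormal(mean=0, sigma=1, size=(300, 2000))       # samples x transcripts (toy)
pca = PCA(n_components=10).fit(np.log1p(tpm))
ratios = pca.explained_variance_ratio_
drops = ratios[:-1] - ratios[1:]
n_keep = int(np.argmax(drops) + 1)                            # elbow: largest drop in the scree
covariate_pcs = pca.transform(np.log1p(tpm))[:, :n_keep]      # PCs added to the association model
print(n_keep, covariate_pcs.shape)
```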

DNA Methylation Assessment

A detailed DNA methylation pipeline can be found in Supplemental Methods. DNA sample bisulfite conversion and microarray hybridization were performed through the Illumina Fast Track Genotyping service. DNA methylation was assessed using the Infinium Human 450k array. Quality control steps are described elsewhere [95]. Briefly, we employed a stringent quality control pipeline comprising the following steps: i) removal of samples with > 1% of probes with detection p-value > 0.001, ii) removal of probes with > 1% of samples with detection p-value > 0.001, iii) removal of cross-hybridizing probes, and iv) removal of probes containing a SNP with minor allele frequency > 1% within 10 bp of the single-base extension position [96]. Normalization of the DNA methylation data was performed using the BMIQ function [97] within the minfi package. Following quality control of probes, data for 373,635 CpGs remained for regulation analyses.

Residuals were used for regulation (deQTL) analyses. Covariates were selected using multiple regression analyses in RaMWAS [98] from a pool of multiple types of variables (see Supplemental Methods for a description of the variables tested). Final association models included maternal age, race/ethnicity, estimated cell proportions, slide and array (batch), median methylated and unmethylated signal intensities, and three PCs from raw control probes (PCs 2, 8, 10). Additionally, PCs of beta values were used to capture any remaining unmeasured sources of variation; PCs 1–5 were included based on the scree test. As a final step, multidimensional outliers across PCs 1–15 were identified using the mvoutliers R package and excluded, resulting in data for 503 cases and 897 controls.

SNP Genotyping

Genotypes were assessed using the Illumina Multi-Ethnic Genome Arrays (MEGA; Illumina, San Diego, CA, USA) through the Illumina Fast Track Genotyping service. GenomeStudio software version 2.0 (Illumina, San Diego, CA, USA) was used to call genotypes from raw Illumina data. We have described our quality control procedures for SNPs elsewhere [99, 100]. Briefly, SNPs were removed for poor genome mapping of the array probe, missingness (>0.01), and low MAF (<0.01), and any individual with high missingness (>0.01) was excluded. Genotypes were imputed against the Haplotype Reference Consortium [101] using the University of Michigan Imputation Server [102]. Following imputation, genotypes underwent another round of quality control: genotypes were excluded for low quality scores (r2 < 0.8), missingness (>0.01), and low MAF (<0.01), and again any individual with high missingness (>0.01) was excluded.
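
The post-imputation SNP filters (imputation r² ≥ 0.8, missingness ≤ 0.01, MAF ≥ 0.01) can be sketched as boolean masks over a dosage matrix. Everything below is simulated and simplified relative to the cited QC pipelines [99, 100].

```python
# Minimal sketch of post-imputation SNP QC: keep variants with good imputation
# quality, low missingness, and adequate minor allele frequency. Simulated data.
import numpy as np

def snp_qc_mask(geno: np.ndarray, r2: np.ndarray) -> np.ndarray:
    """geno: (n_snps, n_samples) dosages 0/1/2 with NaN for missing; r2: (n_snps,)."""
    missingness = np.isnan(geno).mean(axis=1)
    maf = np.nanmean(geno, axis=1) / 2.0
    maf = np.minimum(maf, 1.0 - maf)                 # fold allele frequency to the minor allele
    return (r2 >= 0.8) & (missingness <= 0.01) & (maf >= 0.01)

geno = np.random.choice([0.0, 1.0, 2.0, np.nan], size=(5, 100), p=[0.5, 0.3, 0.19, 0.01])
r2 = np.array([0.95, 0.6, 0.99, 0.85, 0.9])
print(snp_qc_mask(geno, r2))
```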

For deQTL analyses, principal components of SNPs were used to capture any unmeasured source of variation in our genetic data (e.g., population stratification). Two principal components (PCs 1 and 2) were included based on the scree test, resulting in data for 487 cases and 864 controls.

Cell type-specific analyses

Whole blood contains a mixture of different cell types. Cell type-specific TWAS are critical because expression changes may remain undetectable in whole blood: changes may cancel each other out when they have opposite effects across cell types, or may involve effects in low-abundance cells that are diluted by effects in common cell types [12, 18]. Cell type-specific TWAS also improve the biological interpretation of findings, as knowing the cell type in which the change occurred may provide further clues about the underlying biological mechanisms. Therefore, we performed cell type-specific deconvolution analyses, which were first introduced about 20 years ago [103]. Most of the initial deconvolution papers include sections showing the validity of the approach; for example, the 2010 paper by Shen-Orr [29] experimentally validated the method through tests with predesigned mixtures.

For TWAS, cell type-specific analyses were conducted using a deconvolution approach described and validated previously [16, 28, 29]. Briefly, cell proportions are estimated from bulk (cellularly heterogeneous) DNA methylation data using available reference panels [27]. These predicted cell type proportions are then used to test case-control differences on a cell type-specific level using all study samples with available bulk transcription data. The statistical model we use is:

\(Y_{bulk} = \mathop{\sum}\nolimits_{c=1}^{n_c} m_c P_c + \mathop{\sum}\nolimits_{c=1}^{n_c} m_c^{PPD}\left( {PPD \times P_c} \right) + E\)

Thus, measurements from bulk tissue \(Y_{bulk}\) are regressed on the c = 1 to \(n_c\) cell type proportions \(P_c\) and on the product of disease status for PPD, coded as 0 or 1, with the cell type proportions (\(PPD \times P_c\)). The model allows for covariates (not shown) and residual effects \(E\). Coefficient \(m_c\) is the effect of cell type c. The case-control difference \(m_c^{PPD}\) for cell type c is used to test the null hypothesis that cell type means are equal for cases and controls. Note that the model has no constant because \(\mathop {\sum }\nolimits_{c = 1}^{n_c} P_c \cong 1\). Alternatively, the model is sometimes written with a constant whereby one of the cell type proportions is omitted [17], but this produces identical results [16, 28, 29].
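
A minimal sketch of this interaction model on simulated data follows, assuming ordinary least squares for the bulk expression values; the coefficients on the PPD-by-proportion columns play the role of \(m_c^{PPD}\).

```python
# Minimal sketch of the deconvolution regression: bulk expression regressed,
# without an intercept, on cell-type proportions and PPD-by-proportion
# interactions. The interaction coefficients estimate within-cell-type
# case-control differences. All data below are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, n_cell_types = 500, 3
props = rng.dirichlet(alpha=[5, 3, 2], size=n)        # P_c, proportions summing to 1
ppd = rng.integers(0, 2, size=n)                       # case status coded 0/1
m = np.array([2.0, 1.0, 0.5])                          # cell-type means m_c
m_ppd = np.array([0.0, 0.0, 0.8])                      # true case-control shift in cell type 3
y = props @ m + (ppd[:, None] * props) @ m_ppd + rng.normal(scale=0.3, size=n)

X = np.hstack([props, ppd[:, None] * props])           # [P_c | PPD x P_c], no constant term
fit = sm.OLS(y, X).fit()
print(fit.params[n_cell_types:])                       # estimates of m_c^PPD per cell type
print(fit.pvalues[n_cell_types:])                      # tests of H0: m_c^PPD = 0
```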

Pathway analyses

Pathway analyses were performed with the Reactome [104] and GO [105, 106] databases. These analyses also used circular permutations (for a detailed description, see the Circular Permutations section of Supplemental Methods), which properly control the Type I error in the presence of correlated sites. Furthermore, as the permutations are performed on a marker level, they also properly account for gene size, as larger genes with more markers are more likely to be among the top results in the permutations. Specifically, we first mapped each marker to genes (Ensembl gene annotations GRCh37, release 92: ftp://ftp.ensembl.org/pub/grch37/release-92/ ) [26] using the Bioconductor GRanges package. Gene boundaries were extended to include a 10 kb upstream flank (i.e., promoters). Markers were allowed to map to multiple genes if their genomic position overlapped multiple unique gene annotations. After mapping, we performed 10,000 circular permutations at the marker level. For each permutation, a two-by-two table was created by cross-classifying whether or not the genes were among the top findings for each analysis (TWAS) versus whether or not the gene was in the tested pathway. Each gene was counted only once when creating this table (thus, if there were three markers in the gene, this was counted as one and not as three). Cramér's V (sometimes referred to as Cramér's phi) was used as the test statistic to measure whether genes from the pathway were overrepresented among the top analysis results. P-values were calculated as the proportion of permutations that yielded a value equal to or greater than the Cramér's V observed in the empirical data. A minimum of three input genes was required to be present in each queried pathway, and pathways were considered significant after controlling the family-wise error rate at α = 0.05. Pathway tests were run on all markers with a q-value < 0.1. As many pathways share a large number of common gene members, we used the Louvain method [107] in the igraph R package to cluster enriched pathways by similarity.
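
A toy version of this enrichment test is sketched below: Cramér's V is computed on the gene-level two-by-two table, and the null distribution comes from circularly shifting the marker-level top-hit labels before re-aggregating to genes. The marker-to-gene map, top-hit labels, and pathway membership are all simulated, and the gene mapping is heavily simplified relative to the GRanges-based procedure described above.

```python
# Minimal sketch of pathway enrichment via Cramér's V with circular permutations.
# Everything here is simulated for illustration only.
import numpy as np

def cramers_v(top: np.ndarray, in_pathway: np.ndarray) -> float:
    """Cramér's V for two binary gene indicators (equals |phi| for a 2x2 table)."""
    a = np.sum(top & in_pathway); b = np.sum(top & ~in_pathway)
    c = np.sum(~top & in_pathway); d = np.sum(~top & ~in_pathway)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return float(abs(a * d - b * c) / np.sqrt(denom)) if denom > 0 else 0.0

rng = np.random.default_rng(1)
n_markers, n_genes = 5000, 800
marker_gene = rng.integers(0, n_genes, size=n_markers)      # marker -> gene map (genomic order)
marker_top = rng.random(n_markers) < 0.05                    # marker-level "top finding" labels
in_pathway = np.zeros(n_genes, dtype=bool); in_pathway[:40] = True

def gene_top(marker_labels: np.ndarray) -> np.ndarray:
    top = np.zeros(n_genes, dtype=bool)
    top[marker_gene[marker_labels]] = True                   # each gene counted once
    return top

observed = cramers_v(gene_top(marker_top), in_pathway)
perms = np.array([cramers_v(gene_top(np.roll(marker_top, rng.integers(1, n_markers))), in_pathway)
                  for _ in range(1000)])                     # circular shifts of marker labels
p_value = float(np.mean(perms >= observed))                  # proportion of permutations >= observed
print(observed, p_value)
```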

Regulation Analyses

To study regulatory effects of SNPs, we perform mediation analyses using the model in Fig.  S4 (covariates not shown) assuming that the SNP regulates (path a ) transcription (a mediator), which in turn alters (path b ) PPD risk. As there may be additional mechanisms through which the marker may affect PPD, the model also allows for a direct effect of the SNP (path c ’) on PPD. The null hypothesis states that the marker does not regulate differentially expressed transcripts (H 0 : a  ×  b  = 0).

The model in Fig.  S4 is the standard approach underlying Mendelian Randomization [ 108 ] that aims to improve causal inferences in correlational studies. In this model, the SNP is the instrumental variable where the direction of effect has to be from SNP to transcript abundance (path a) and SNP to PPD status (path c’). Thus, the causal direction cannot be reversed as neither gene expression nor PPD status can change the SNP. The direction of effect from transcript abundance to PPD status (path b) can in principle be reversed. However, if we reverse the direction of effect there can no longer be an indirect effect of the SNP on PPD so that in these instances the null hypothesis, H0: a  ×  b  = 0, is true. Therefore, rejecting the null hypothesis in the SNP regulatory analyses essentially provides evidence for the causal direction of effects displayed in Fig.  S4 where transcriptional changes alter PPD risk and not the other way around.

We also used the model depicted in Fig.  S4 to study possible regulatory effects of hormones and DNA methylation. However, as both the direction of path a and path c’ can be reversed it no longer provides evidence for the causal direction of effects so strictly speaking significant findings cannot be interpreted as meaning that hormones and DNA methylation are regulators.

For each gene with a differentially expressed transcript, all annotated transcripts with expression data were tested. SNPs and CpGs were tested as putative regulators if they had a nominal association with case status (p < 0.05) and were within a 10 kb window of the genes tested. Hormones were tested as putative regulators if they had a nominal association with case status (p < 0.05). This pre-selection avoids running regulatory analyses with a large number of markers (SNPs, CpGs, hormones), the majority of which cannot be regulators because they do not affect transcript abundance. All raw data had their respective covariates regressed out prior to mediation testing. Causal mediation analyses were conducted with the mediation package (v4.5.0) in R; specifically, we used the mediate function, which implements a quasi-Bayesian approach with 1000 to 1,000,000 Monte Carlo draws for the approximation of the p-values for the mediation effect [109]. All analyses begin with 1000 simulations. If a p-value cannot be approximated (p-value = 0), another round is performed using 10,000 simulations. This process continues with a 10-fold increase in simulations until a p-value can be approximated or the number of simulations reaches 1,000,000. A q-value < 0.1 was used to declare significance [30].
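
The indirect-effect test H0: a × b = 0 can be approximated with a product-of-coefficients analogue of the quasi-Bayesian procedure: estimate paths a and b, then take Monte Carlo draws from their sampling distributions. The sketch below uses simulated data and is not the R mediation implementation used in the study.

```python
# Minimal sketch of a Monte Carlo product-of-coefficients mediation test
# for H0: a x b = 0 (marker -> transcript -> PPD). Simulated data only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 800
snp = rng.integers(0, 3, size=n).astype(float)              # marker (0/1/2 dosage)
transcript = 0.4 * snp + rng.normal(size=n)                  # path a: marker -> transcript
logit = -1.0 + 0.6 * transcript
ppd = rng.binomial(1, 1 / (1 + np.exp(-logit)))              # path b: transcript -> PPD (c' = 0)

fit_a = sm.OLS(transcript, sm.add_constant(snp)).fit()
fit_b = sm.Logit(ppd, sm.add_constant(np.column_stack([transcript, snp]))).fit(disp=0)

a_hat, a_se = fit_a.params[1], fit_a.bse[1]
b_hat, b_se = fit_b.params[1], fit_b.bse[1]
draws = rng.normal(a_hat, a_se, 10_000) * rng.normal(b_hat, b_se, 10_000)
p_indirect = min(1.0, 2 * min((draws <= 0).mean(), (draws >= 0).mean()))  # two-sided MC p-value
print(a_hat * b_hat, p_indirect)
```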

Data availability

Full transcriptomic, DNA methylation, and SNP data used to support the findings of this study have been, or will be, deposited in dbGaP (accession: phs002103.v1.p1; https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002103.v1.p1 ), but will be embargoed until results from the full datasets are in press.

Wisner KL, Moses-Kolko EL, Sit DK. Postpartum depression: A disorder in search of a definition. Arch Women’s Ment Health. 2010;13:37–40.

Marmorstein NR, Malone SM, Iacono WG. Psychiatric disorders among offspring of depressed mothers: associations with paternal psychopathology. Am J Psychiatry. 2004;161:1588–94.

Flynn HA, Davis M, Marcus SM, Cunningham R, Blow FC. Rates of maternal depression in pediatric emergency department and relationship to child service utilization. Gen Hosp Psychiatry. 2004;26:316–22.

Hamilton BE, Martin JA, Osterman MJ, Curtin SC, Matthews TJ. Births: Final data for 2014. Natl Vital Stat Rep. 2015;64:1–64.

Gavin NI, Gaynes BN, Lohr KN, Meltzer-Brody S, Gartlehner G, Swinson T. Perinatal depression: A systematic review of prevalence and incidence. Obstet Gynecol. 2005;106:1071–83.

Gaynes BN, Gavin N, Meltzer-Brody S, Lohr KN, Swinson T, Gartlehner G, et al. Perinatal depression: Prevalence, screening accuracy, and screening outcomes. Evid Rep Technol Assess. 2005;119:1–8.

Slomian J, Honvo G, Emonts P, Reginster JY, Bruyere O. Consequences of maternal postpartum depression: A systematic review of maternal and infant outcomes. Women’s Health (Lond). 2019;15:1745506519844044.

Gelaye B, Rondon MB, Araya R, Williams MA. Epidemiology of maternal depression, risk factors, and child outcomes in low-income and middle-income countries. Lancet Psychiatry. 2016;3:973–82.

Farias-Antunez S, Xavier MO, Santos IS. Effect of maternal postpartum depression on offspring’s growth. J Affect Disord. 2018;228:143–52.

Netsi E, Pearson RM, Murray L, Cooper P, Craske MG, Stein A. Association of persistent and severe postnatal depression with child outcomes. JAMA Psychiatry. 2018;75:247–53.

Oh Y, Joung YS, Baek JH, Yoo N. Maternal depression trajectories and child executive function over 9 years. J Affect Disord. 2020;276:646–52.

Shen-Orr SS, Gaujoux R. Computational deconvolution: Extracting cell type-specific information from heterogeneous samples. Curr Opin Immunol. 2013;25:571–8.

Aran D, Hu Z, Butte AJ. xCell: Digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 2017;18:220.

Koestler DC, Christensen B, Karagas MR, Marsit CJ, Langevin SM, Kelsey KT, et al. Blood-based profiles of DNA methylation predict the underlying distribution of cell types: a validation analysis. Epigenetics. 2013;8:816–26.

Guintivano J, Aryee MJ, Kaminsky ZA. A cell epigenotype specific model for the correction of brain cellular heterogeneity bias and its application to age, brain region and major depression. Epigenetics. 2013;8:290–302.

Zheng SC, Breeze CE, Beck S, Teschendorff AE. Identification of differentially methylated cell types in epigenome-wide association studies. Nat Methods. 2018;15:1059–66.

Montano CM, Irizarry RA, Kaufmann WE, Talbot K, Gur RE, Feinberg AP, et al. Measuring cell-type specific differential methylation in human brain tissue. Genome Biol. 2013;14:R94.

Kim-Hellmuth S, Aguet F, Oliva M, Munoz-Aguirre M, Kasela S, Wucher V, et al. Cell type-specific genetic regulation of gene expression across human tissues. Science. 2020;369:6509.

GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–30.

Harris CJ, Scheibe M, Wongpalee SP, Liu W, Cornett EM, Vaughan RM, et al. A DNA methylation reader complex that enhances gene transcription. Science. 2018;362:1182–6.

Neri F, Rapelli S, Krepelova A, Incarnato D, Parlato C, Basile G, et al. Intragenic DNA methylation prevents spurious transcription initiation. Nature. 2017;543:72–7.

Pan D, Xu Y, Zhang L, Su Q, Chen M, Li B, et al. Gene expression profile in peripheral blood mononuclear cells of postpartum depression patients. Sci Rep. 2018;8:10139.

Landsman A, Aidelman R, Smith Y, Boyko M, Greenberger C. Distinctive gene expression profile in women with history of postpartum depression. Genomics. 2017;109:1–8.

Mehta D, Grewen K, Pearson B, Wani S, Wallace L, Henders AK, et al. Genome-wide gene expression changes in postpartum depression point towards an altered immune landscape. Transl Psychiatry. 2021;11:155.

Guintivano J, Sullivan PF, Stuebe AM, Penders T, Thorp J, Rubinow DR, et al. Adverse life events, psychiatric history, and biological predictors of postpartum depression in an ethnically diverse sample of postpartum women. Psychol Med. 2018;48:1190–200.

Yates A, Akanni W, Amode MR, Barrell D, Billis K, Carvalho-Silva D, et al. Ensembl 2016. Nucleic Acids Res. 2016;44:D710–6.

Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinforma. 2012;13:86.

Chan RF, Turecki G, Shabalin AA, Guintivano J, Zhao M, Xie LY, et al. Cell type-specific methylome-wide association studies implicate neurotrophin and innate immune signaling in major depressive disorder. Biol Psychiatry. 2020;87:431–42.

Shen-Orr SS, Tibshirani R, Khatri P, Bodian DL, Staedtler F, Perry NM, et al. Cell type-specific gene expression differences in complex tissues. Nat Methods. 2010;7:287–9.

van den Oord EJ, Sullivan PF. False discoveries and models for gene discovery. Trends Genet. 2003;19:537–42.

Luo C, Hajkova P, Ecker JR. Dynamic DNA methylation: In the right place at the right time. Science. 2018;361:1336–40.

Babenko VN, Chadaeva IV, Orlov YL. Genomic landscape of CpG rich elements in human. BMC Evol Biol. 2017;17:19.

Wray NR, Ripke S, Mattheisen M, Trzaskowski M, Byrne EM, Abdellaoui A, et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat Genet. 2018;50:668–81.

Howard DM, Adams MJ, Clarke TK, Hafferty JD, Gibson J, Shirali M, et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat Neurosci. 2019;22:343–52.

Di Florio A, Meltzer-Brody S. Is postpartum depression a distinct disorder? Curr Psychiatry Rep. 2015;17:76.

Batt MM, Duffy KA, Novick AM, Metcalf CA, Epperson CN. Is postpartum depression different from depression occurring outside of the perinatal period? A review of the evidence. Focus (Am Psychiatr Publ). 2020;18:106–19.

Bernstein IH, Rush AJ, Yonkers K, Carmody TJ, Woo A, McConnell K, et al. Symptom features of postpartum depression: are they distinct? Depress Anxiety. 2008;25:20–6.

Pawluski JL, Lonstein JS, Fleming AS. The neurobiology of postpartum anxiety and depression. Trends Neurosci. 2017;40:106–20.

Schlaaff K, Dobrowolny H, Frodl T, Mawrin C, Gos T, Steiner J, et al. Increased densities of T and B lymphocytes indicate neuroinflammation in subgroups of schizophrenia and mood disorder patients. Brain Behav Immun. 2020;88:497–506.

Syed SA, Beurel E, Loewenstein DA, Lowell JA, Craighead WE, Dunlop BW, et al. Defective inflammatory pathways in never-treated depressed patients are associated with poor treatment response. Neuron. 2018;99:914–24 e913.

Ahmetspahic D, Schwarte K, Ambree O, Burger C, Falcone V, Seiler K, et al. Altered B cell homeostasis in patients with major depressive disorder and normalization of CD5 surface expression on regulatory B cells in treatment responders. J Neuroimmune Pharm. 2018;13:90–9.

Hamer JA, Testani D, Mansur RB, Lee Y, Subramaniapillai M, McIntyre RS. Brain insulin resistance: A treatment target for cognitive impairment and anhedonia in depression. Exp Neurol. 2019;315:1–8.

Kan C, Silva N, Golden SH, Rajala U, Timonen M, Stahl D, et al. A systematic review and meta-analysis of the association between depression and insulin resistance. Diabetes Care. 2013;36:480–9.

Frangou S, Shirali M, Adams MJ, Howard DM, Gibson J, Hall LS, et al. Insulin resistance: Genetic associations with depression and cognition in population based cohorts. Exp Neurol. 2019;316:20–6.

Muzzio D, Zenclussen AC, Jensen F. The role of B cells in pregnancy: the good and the bad. Am J Reprod Immunol. 2013;69:408–12.

Dutta S, Sengupta P, Haque N. Reproductive immunomodulatory functions of B cells in pregnancy. Int Rev Immunol. 2020;39:53–66.

Lima J, Martins C, Leandro MJ, Nunes G, Sousa MJ, Branco JC, et al. Characterization of B cells in healthy pregnant women from late pregnancy to post-partum: A prospective observational study. BMC Pregnancy Childbirth. 2016;16:139.

Brann E, Fransson E, White RA, Papadopoulos FC, Edvinsson A, Kamali-Moghaddam M, et al. Inflammatory markers in women with postpartum depressive symptoms. J Neurosci Res. 2020;98:1309–21.

Brann E, Papadopoulos F, Fransson E, White R, Edvinsson A, Hellgren C, et al. Inflammatory markers in late pregnancy in association with postpartum depression-A nested case-control study. Psychoneuroendocrinology. 2017;79:146–59.

Leff-Gelman P, Mancilla-Herrera I, Flores-Ramos M, Cruz-Fuentes C, Reyes-Grajeda JP, Garcia-Cuetara Mdel P, et al. The immune system and the role of inflammation in perinatal depression. Neurosci Bull. 2016;32:398–420.

Rawlings DJ, Metzler G, Wray-Dutra M, Jackson SW. Altered B cell signalling in autoimmunity. Nat Rev Immunol. 2017;17:421–36.

Andersson NW, Gustafsson LN, Okkels N, Taha F, Cole SW, Munk-Jorgensen P, et al. Depression and the risk of autoimmune disease: A nationally representative, prospective longitudinal study. Psychol Med. 2015;45:3559–69.

Siegmann EM, Muller HHO, Luecke C, Philipsen A, Kornhuber J, Gromer TW. Association of depression and anxiety disorders with autoimmune thyroiditis: A systematic review and meta-analysis. JAMA Psychiatry. 2018;75:577–84.

Euesden J, Danese A, Lewis CM, Maughan B. A bidirectional relationship between depression and the autoimmune disorders - New perspectives from the National Child Development Study. PLoS One. 2017;12:e0173015.

Carter RH. B cells in health and disease. Mayo Clin Proc. 2006;81:377–84.

Waters LR, Ahsan FM, Wolf DM, Shirihai O, Teitell MA. Initial B cell activation induces metabolic reprogramming and mitochondrial remodeling. iScience. 2018;5:99–109.

Tolar P. Cytoskeletal control of B cell responses to antigens. Nat Rev Immunol. 2017;17:621–34.

Defrance T, Casamayor-Palleja M, Krammer PH. The life and death of a B cell. Adv Cancer Res. 2002;86:195–225.

Buchanan TA, Metzger BE, Freinkel N, Bergman RN. Insulin sensitivity and B-cell responsiveness to glucose during late pregnancy in lean and moderately obese women with normal glucose tolerance or mild gestational diabetes. Am J Obstet Gynecol. 1990;162:1008–14.

Butte NF. Carbohydrate and lipid metabolism in pregnancy: Normal compared with gestational diabetes mellitus. Am J Clin Nutr. 2000;71:1256S–61S.

Sivan E, Homko CJ, Chen X, Reece EA, Boden G. Effect of insulin on fat metabolism during and after normal pregnancy. Diabetes. 1999;48:834–8.

Flores DL, Hendrick VC. Etiology and treatment of postpartum depression. Curr Psychiatry Rep. 2002;4:461–6.

Di Cianni G, Miccoli R, Volpe L, Lencioni C, Del Prato S. Intermediate metabolism in normal pregnancy and in gestational diabetes. Diabetes Metab Res Rev. 2003;19:259–70.

Chen TH, Lan TH, Yang CY, Juang KD. Postpartum mood disorders may be related to a decreased insulin level after delivery. Med Hypotheses. 2006;66:820–3.

Kleinridders A, Cai W, Cappellucci L, Ghazarian A, Collins WR, Vienberg SG, et al. Insulin resistance in brain alters dopamine turnover and causes behavioral disorders. Proc Natl Acad Sci USA. 2015;112:3463–8.

Penckofer S, Quinn L, Byrn M, Ferrans C, Miller M, Strange P. Does glycemic variability impact mood and quality of life? Diabetes Technol Ther. 2012;14:303–10.

Lustman PJ, Anderson RJ, Freedland KE, de Groot M, Carney RM, Clouse RE. Depression and poor glycemic control: a meta-analytic review of the literature. Diabetes Care. 2000;23:934–42.

Lin KW, Wroolie TE, Robakis T, Rasgon NL. Adjuvant pioglitazone for unremitted depression: Clinical correlates of treatment response. Psychiatry Res. 2015;230:846–52.

Colle R, de Larminat D, Rotenberg S, Hozer F, Hardy P, Verstuyft C, et al. PPAR-gamma agonists for the treatment of major depression: A review. Pharmacopsychiatry. 2017;50:49–55.

Kim C. Maternal outcomes and follow-up after gestational diabetes mellitus. Diabet Med. 2014;31:292–301.

Kozhimannil KB, Pereira MA, Harlow BL. Association between diabetes and perinatal depression among low-income mothers. JAMA. 2009;301:842–7.

Miller ES, Peri MR, Gossett DR. The association between diabetes and postpartum depression. Arch Women’s Ment Health. 2016;19:183–6.

Ruohomaki A, Toffol E, Upadhyaya S, Keski-Nisula L, Pekkanen J, Lampi J, et al. The association between gestational diabetes mellitus and postpartum depressive symptomatology: A prospective cohort study. J Affect Disord. 2018;241:263–8.

Slieker RC, van der Heijden A, van Leeuwen N, Mei H, Nijpels G, Beulens JWJ, et al. HbA1c is associated with altered expression in blood of cell cycle- and immune response-related genes. Diabetologia. 2018;61:138–46.

Razny U, Polus A, Goralska J, Zdzienicka A, Gruca A, Kapusta M, et al. Effect of insulin resistance on whole blood mRNA and microRNA expression affecting bone turnover. Eur J Endocrinol. 2019;181:525–37.

Umeno A, Yoshida Y. Utility of hemoglobin A1c in detecting risk of type 2 diabetes: comparison of hemoglobin A1c with other biomarkers. J Clin Biochem Nutr. 2019;65:59–64.

Winer DA, Winer S, Shen L, Chng MH, Engleman EG. B lymphocytes as emerging mediators of insulin resistance. Int J Obes Suppl. 2012;2:S4–S7.

Winer DA, Winer S, Shen L, Wadia PP, Yantha J, Paltser G, et al. B cells promote insulin resistance through modulation of T cells and production of pathogenic IgG antibodies. Nat Med. 2011;17:610–7.

Virkamaki A, Ueki K, Kahn CR. Protein-protein interaction in insulin signaling and the molecular mechanisms of insulin resistance. J Clin Invest. 1999;103:931–43.

Manning BD, Toker A. AKT/PKB signaling: Navigating the network. Cell. 2017;169:381–405.

Romeo GR, Jain M. Purine metabolite signatures and type 2 Diabetes: Innocent bystanders or actionable items? Curr Diab Rep. 2020;20:30.

Weber G, Lui MS, Jayaram HN, Pillwein K, Natsumeda Y, Faderan MA, et al. Regulation of purine and pyrimidine metabolism by insulin and by resistance to tiazofurin. Adv Enzym Regul. 1985;23:81–99.

Yoon MS. The emerging role of branched-chain amino acids in insulin resistance and metabolism. Nutrients 2016;8:405.

Sears B, Perry M. The role of fatty acids in insulin resistance. Lipids Health Dis. 2015;14:121.

Mingrone G, Castagneto-Gissey L, Mace K. Use of dicarboxylic acids in type 2 diabetes. Br J Clin Pharm. 2013;75:671–6.

Fournier AM. Intracellular starvation in the insulin resistance syndrome and type II diabetes mellitus. Med Hypotheses. 1998;51:95–9.

Abiola M, Favier M, Christodoulou-Vafeiadou E, Pichard AL, Martelly I, Guillet-Deniau I. Activation of Wnt/beta-catenin signaling increases insulin sensitivity through a reciprocal regulation of Wnt10b and SREBP-1c in skeletal muscle cells. PLoS One. 2009;4:e8509.

Daneman R, Prat A. The blood-brain barrier. Cold Spring Harb Perspect Biol. 2015;7:a020412.

Anzalone AV, Koblan LW, Liu DR. Genome editing with CRISPR-Cas nucleases, base editors, transposases and prime editors. Nat Biotechnol. 2020;38:824–44.

Guintivano J, Sullivan PF, Stuebe AM, Penders T, Thorp J, Rubinow DR, et al. Adverse life events, psychiatric history, and biological predictors of postpartum depression in an ethnically diverse sample of postpartum women. Psychol Med. 2018;48:1190–1200.

Cox JL, Holden JM, Sagovsky R. Detection of postnatal depression. Development of the 10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry. 1987;150:782–6.

Otsubo T, Tanaka K, Koda R, Shinoda J, Sano N, Tanaka S, et al. Reliability and validity of Japanese version of the Mini-International Neuropsychiatric Interview. Psychiatry Clin Neurosci. 2005;59:517–26.

Sheehan DV, Lecrubier Y, Sheehan KH, Amorim P, Janavs J, Weiller E, et al. The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J Clin Psychiatry. 1998;59:22–33.

Sahraeian SME, Mohiyuddin M, Sebra R, Tilgner H, Afshar PT, Au KF, et al. Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis. Nat Commun. 2017;8:59.

Guintivano J, Shabalin AA, Chan RF, Rubinow DR, Sullivan PF, Meltzer-Brody S, et al. Test-statistic inflation in methylome-wide association studies. Epigenetics. 2020;15:1163–6.

Chen YA, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, et al. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics. 2013;8:203–9.

Teschendorff AE, Marabita F, Lechner M, Bartlett T, Tegner J, Gomez-Cabrero D, et al. A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics. 2013;29:189–96.

Shabalin AA, Hattab MW, Clark SL, Chan RF, Kumar G, Aberg KA, et al. RaMWAS: Fast methylome-wide association study pipeline for enrichment platforms. Bioinformatics. 2018;34:2283–5.

International Schizophrenia Consortium, Purcell SM, Wray NR, Stone JL, Visscher PM, O’Donovan MC, et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–52.

Wang KS, Liu XF, Aragam N. A genome-wide meta-analysis identifies novel loci associated with schizophrenia and bipolar disorder. Schizophr Res. 2010;124:192–9.

Loh PR, Danecek P, Palamara PF, Fuchsberger C, Reshef YA, Finucane HK, et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet. 2016;48:1443–8.

Das S, Forer L, Schonherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nat Genet. 2016;48:1284–7.

Venet D, Pecasse F, Maenhaut C, Bersini H. Separation of samples into their constituents using gene expression data. Bioinformatics. 2001;17:S279–87.

Fabregat A, Sidiropoulos K, Garapati P, Gillespie M, Hausmann K, Haw R, et al. The Reactome pathway Knowledgebase. Nucleic Acids Res. 2016;44:D481–7.

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–9.

The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 2019;47:D330–8.

Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech-Theory Exp. 2008:12:P10008.

Davey Smith G, Ebrahim S. What can mendelian randomisation tell us about modifiable behavioural and environmental exposures? BMJ. 2005;330:1076–9.

Imai K, Keele L, Tingley D. A general approach to causal mediation analysis. Psychological Methods. 2010;15:309–34.

Author information

Authors and Affiliations

Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

Jerry Guintivano, David R. Rubinow, Patrick F. Sullivan & Samantha Meltzer-Brody

Center for Biomarker Research and Precision Medicine, Virginia Commonwealth University, Richmond, VA, USA

Karolina A. Aberg & Edwin J. C. G. van den Oord

Department of Psychiatry & Behavioral Sciences, Texas A&M University, College Station, TX, USA

Shaunna L. Clark

Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

Patrick F. Sullivan

Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden

Contributions

EVDO, JG, and KA conceived the idea, designed and supervised the implementation of the study, interpreted the results, and revised the paper. JG conducted the analyses, interpreted the results, and drafted and revised the paper. SC contributed to the analyses. PFS, DR, and SMB contributed to the study design and interpretation of the results. All authors discussed and commented on the paper.

Corresponding author

Correspondence to Jerry Guintivano .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Material

Supplemental Tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Guintivano, J., Aberg, K.A., Clark, S.L. et al. Transcriptome-wide association study for postpartum depression implicates altered B-cell activation and insulin resistance. Mol Psychiatry 27, 2858–2867 (2022). https://doi.org/10.1038/s41380-022-01525-7

Received: 29 September 2021

Revised: 08 February 2022

Accepted: 09 March 2022

Published: 01 April 2022

Issue Date: June 2022

DOI: https://doi.org/10.1038/s41380-022-01525-7
