Statistics in Research
Understanding Statistical Analysis in Research
The Wikipedia definition of statistics states that “it is a discipline that concerns the collection, organization, analysis, interpretation and presentation of data.”
It means, as part of statistical analysis, we collect, organize, interpret, and draw meaningful conclusions from the data through mathematical explanations.
In the investigation of most clinical research questions, some form of quantitative data will be collected. Initially these data exist in raw form, which means that they are nothing more than a collection of numbers representing empirical observations from a group of individuals. For these data to be useful, they must be organized, summarized, and analyzed, so that their meaning can be communicated. Statistical analysis means investigating trends, patterns, and relationships using quantitative data.
To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.
Statistics used in research is broadly categorized into two types:
1. Descriptive Statistics
2. Inferential Statistics
Descriptive Statistics
As the name suggests, in descriptive statistics we describe data using numerical measures. The data is characterized based on its properties.
In descriptive statistics, there is no uncertainty – the statistics precisely describe the data that you collected. If you collect data from an entire population, you can directly compare these descriptive statistics to those from other populations.
Example: In descriptive statistics, we might collect data on the ACT scores of all 11th graders in a school for five years. We can use descriptive statistics to get a quick overview of the school’s scores in those years, and then directly compare the mean ACT score with the mean scores of other schools.
Types of Descriptive Statistics
There are four major types of descriptive statistics: [1]
1) Measures of Frequency: Shows how often something occurs. It is used to show how often a response is given, e.g., count, percent, frequency. It is primarily recorded and denoted in a tabular format and used for qualitative and quantitative data analysis. Common charts and graphs used in frequency distribution presentation and visualization include bar charts, histograms, pie charts, and line charts.
Example: Let us assume that a school takes a group of students on a picnic every year. Some of the students have visited the picnic spot once before, so they are visiting it for the second time. Other students have visited the picnic spot more than two times. Here, students are divided based on the number of visits. The number of visits, therefore, denotes the frequency distribution among the students.
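The picnic example can be sketched in a few lines of Python. This is only an illustration: the visit counts and the helper name frequency_table are made up for the example, not taken from any real data set.

```python
from collections import Counter

# Hypothetical data: number of times each of 10 students has visited the picnic spot.
visits = [1, 2, 1, 3, 2, 1, 1, 4, 2, 1]

def frequency_table(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(v, c, 100 * c / total) for v, c in sorted(counts.items())]

table = frequency_table(visits)
for value, count, percent in table:
    print(f"{value} visit(s): {count} students ({percent:.0f}%)")
```

Each row pairs a value with its count and its percentage of the total, which is the tabular form that a bar chart or pie chart would then visualize.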
2) Measures of Central Tendency: This is used when you want to describe the average or most commonly indicated response.
A measure of central tendency describes a set of data by identifying the central position in the data set as a single value. We can think of it as a tendency of data to cluster around a middle value. There are three different measures of central tendency: the mode, the median and the mean.
The Mode: The mode is the score that occurs most frequently in a distribution.
The Median: The median represents the mid-value of the given set of data when arranged in a particular order, such as ascending or descending order.
The Mean: Mean is the most used measure of central tendency. It represents the average of the given collection of data. It is equal to the sum of all the values in the collection of data divided by the total number of values.
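As a small sketch, the three measures can be computed with Python's standard statistics module. The scores below are hypothetical and chosen only so the results are easy to check by hand.

```python
import statistics

# Hypothetical test scores, for illustration only.
scores = [70, 85, 85, 90, 100]

mode = statistics.mode(scores)      # most frequent value
median = statistics.median(scores)  # middle value of the sorted data
mean = statistics.mean(scores)      # sum of the values / number of values

print(f"mode={mode}, median={median}, mean={mean}")
```

Here the sum of the scores is 430, so the mean is 430 / 5 = 86, while the mode and the median are both 85.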
3) Measures of Dispersion or Variation: This type of descriptive statistic is used when you want to show how “spread out” the data is. Variability explains the extent to which data points are dispersed from each other. It also describes the range of dispersion and the degree of variance occurring in the data sample from its highest to its lowest value.
In this category, the most used statistical measures are range, variance and standard deviation.
Range: The simplest measure of variability is the range. It is the difference between highest and lowest values in a distribution. Range is a relatively simple statistical measure; its applicability is limited because it is determined using only the two extreme scores in the distribution.
Variance: The variance is a measure of variability. It is calculated by taking the average of squared deviations from the mean. Variance tells you the degree of spread in your data set. The more spread the data, the larger the variance is in relation to the mean.
Standard deviation: The standard deviation is the square root of the variance and measures how far values typically lie from the mean, in the same units as the data. A low standard deviation means that the data points cluster closely around the mean, so the mean is a more reliable summary of the data. A high standard deviation means that the data points are widely spread around the mean, so the mean is a less reliable summary.
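The three measures of dispersion can be sketched as follows, using hypothetical values. One assumption to note: the variance described above (the average of squared deviations from the mean) divides by the number of values, which corresponds to statistics.pvariance in Python; when the data are a sample from a larger population, statistics.variance (which divides by n - 1) is the usual choice.

```python
import statistics

# Hypothetical data set, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)  # highest value minus lowest value
variance = statistics.pvariance(values)  # average squared deviation from the mean
std_dev = statistics.pstdev(values)      # square root of the variance

print(f"range={value_range}, variance={variance}, standard deviation={std_dev}")
```

The mean of these values is 5 and the squared deviations sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.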
4) Measures of Position: This type of statistical measure describes how scores fall in relation to one another. It relies on standardized scores and is used when we need to compare scores to a normalized score (e.g., a national norm).
Common measures of position are percentiles and quartiles.
Percentile: A percentile is a measure that is used to describe a score’s position within a distribution. For example, 90% of the data values lie below the 90th percentile, whereas 10% of the data values lie below the 10th percentile.
Quartiles: Quartiles are values that divide a data set into four groups containing an approximately equal number of observations. The three cut points fall at the 25th, 50th (the median), and 75th percentiles, so each group holds about 25% of the data.
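A minimal sketch of both measures of position, using hypothetical exam scores. The helper percent_below is an illustrative name, and method="inclusive" is one of several quantile conventions; other conventions can give slightly different cut points on small data sets.

```python
import statistics

# Hypothetical exam scores, for illustration only.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]

def percent_below(values, score):
    """Percentage of values that lie strictly below the given score."""
    return 100 * sum(v < score for v in values) / len(values)

# The three quartile cut points (25th, 50th and 75th percentiles).
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

print(f"Q1={q1}, Q2={q2}, Q3={q3}")
print(f"{percent_below(scores, 90):.1f}% of scores lie below 90")
```

The second quartile coincides with the median, which is why the median is also called the 50th percentile.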
Inferential Statistics
As we learned, descriptive statistics can be used to summarize and describe the data. However, they are not sufficient for testing theories about the effects of experimental treatments or for generalizing the behavior of samples to a population.
For these purposes, researchers use a set of methods called inferential statistics. Inferential statistics involve a decision-making process that allows researchers to draw inferences about population characteristics from sample data.
Collecting data from an entire population can be very difficult and expensive, which is why, most of the time, we can only acquire data from samples of the population we are targeting.
While descriptive statistics can only summarize a sample’s characteristics, inferential statistics use your sample to make reasonable guesses about the larger population. [3]
Example: In inferential statistics, we randomly select a sample of 11th graders in some state and collect data on their ACT scores and other characteristics. We can use inferential statistics to make estimates and test hypotheses about the whole population of 11th graders in the state based on the sample data.
Inferential statistics has two main objectives:
Making estimates about the target population (for example, the mean ACT score of all 11th graders in the US).
Testing hypotheses to draw conclusions about target populations (for example, the relationship between ACT scores and number of AP classes students took).
Since inferential statistics involves a decision-making process that allows us to draw conclusions about the target population’s characteristics from the sample data, it is important to consider certain pivotal concepts to ensure that the sample represents the larger population.
The two most important concepts of statistical reasoning to consider are:
· Probability: Probability is a complex but essential concept for understanding inferential statistics. In simple terms, probability means “likelihood” or “good chance”. Probability is the likelihood that any one event will occur, given all the possible outcomes. We use probability as a means of prediction.
· Sampling: Sampling is simply the process of choosing the subjects we collect data on. For more information on sampling, please check the “Sampling” section.
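To make the link between sampling and estimation concrete, here is a small simulation sketch. Everything in it is an assumption made for illustration: the "population" is 10,000 simulated ACT-style scores between 1 and 36, the sample size of 200 is arbitrary, and the seed is fixed only so the run is repeatable.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: 10,000 simulated scores between 1 and 36.
population = [rng.randint(1, 36) for _ in range(10_000)]

# Simple random sample: every member has the same chance of being chosen.
sample = rng.sample(population, 200)

population_mean = mean(population)
sample_mean = mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In practice the population mean is unknown; the point of the sketch is that a randomly drawn sample tends to produce a mean close to it, which is what allows inference from sample to population.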
The two most universally recognized statistics we consider when using inferential statistics are the p-value and the confidence interval (we will cover these in more detail under Hypothesis Testing).
Hypothesis Testing
Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about a population parameter or a population probability distribution.
When interpreting research findings, researchers need to assess whether these findings may have occurred by chance. Hypothesis testing is a systematic procedure for deciding whether the results of a research study support a particular theory which applies to a population.
Hypothesis testing uses sample data to evaluate a hypothesis about a population. A hypothesis test assesses how unusual the result is: whether it is reasonable chance variation, or whether the result is too extreme to be considered chance variation. [1] [5]
There is a five-step process involved in hypothesis testing: [2]
1. Choosing your null hypothesis and alternative hypothesis:
To carry out statistical hypothesis testing, researchers usually propose two statistical hypotheses: the null hypothesis, which the test may reject, and the research hypothesis, which is supported if the null hypothesis is rejected.
Research or alternative hypothesis: Research hypothesis is the initial hypothesis that you propose that predicts the relationship between the variables. It is also known as the alternative hypothesis.
For example: There is a relationship between SAT scores and socioeconomic status.
Null hypothesis: The null hypothesis is the opposite of the research hypothesis and expresses that there is no relationship between variables, or no differences between groups.
For example: There is no relationship between SAT scores and socioeconomic status.
2. Collect the data:
For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative or not sampled properly, then you cannot make statistical inferences about the population you are interested in.
For more information on sampling, check out our section of Sampling.
3. Conduct statistical testing:
A statistical test is a way to evaluate the evidence the data provides against a hypothesis.
These tests enable us to make decisions based on patterns observed in the data. There is a wide range of statistical tests. The choice of which statistical test to use depends upon the structure of the data, the distribution of the data, and the variable type.
There are many different types of tests used in inferential statistics, such as the t-test, Z-test, chi-square test, ANOVA, Mann-Whitney U test, binomial test, and one-sample median test. [4]
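As one concrete instance, a two-sided one-sample Z-test can be sketched with the standard library alone. The function name one_sample_z_test, the sample values, the hypothesized mean of 21 and the population standard deviation of 6 are all assumptions chosen for illustration; a Z-test is only appropriate when the population standard deviation is treated as known.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided Z-test of H0: population mean == mu0, with known population SD sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))  # standardized distance from mu0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails beyond |z|
    return z, p_value

# Hypothetical ACT scores from nine students, for illustration only.
sample = [24, 26, 23, 27, 25, 28, 22, 25, 25]
z, p = one_sample_z_test(sample, mu0=21, sigma=6)

print(f"z = {z:.2f}, p = {p:.4f}")
```

The sample mean here is 25, so z = (25 - 21) / (6 / 3) = 2, which corresponds to a two-sided p-value of roughly 0.046.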
4. Decide, based on the statistical results, whether to reject or fail to reject the null hypothesis: [5]
Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.
A concept known as the p-value provides a convenient basis for drawing conclusions in hypothesis-testing applications.
Statistical significance is a term used by researchers to explain that it is unlikely their observation could have occurred by chance. It is usually denoted by a p-value, or probability value. A p-value is calculated to assess whether trial results are likely to have occurred simply through chance, assuming that there is no real difference between the two comparison groups and assuming, of course, that the study was well conducted.
What Is P-Value?
A p-value is the probability of observing a difference at least as large as the one seen in the study if chance alone were at work, that is, if the null hypothesis were true. The p-value is compared against a predetermined cutoff (the significance level), beyond which we assert that the findings are “statistically significant”. When the p-value is sufficiently small, the results are unlikely to have arisen by chance alone, and we reject the idea that there is no difference between the treatments, hence rejecting the null hypothesis. When the p-value is large, the results in the data could be explained by chance, and we do not reject the idea that there is no difference between the treatments. In that case, the data are deemed consistent with (while not proving) the null hypothesis.
In most cases, you will use the p-value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05; that is, you reject when there is a less than 5% chance that you would see these results if the null hypothesis were true. By convention, p-values of less than 0.05 are considered “small”. That means that if the p-value is <0.05, there is a less than one in 20 chance that a difference as big as that seen in the study could have arisen just by chance. With p-values this small (<0.05), we say that the results from the study are statistically significant (unlikely to have arisen by chance). [5] [6] [9]
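One way to see what "could have arisen just by chance" means is an exact permutation test, sketched below. This is an illustrative technique, not a method prescribed by the sources cited here: if there were truly no difference between treatments, the group labels would be arbitrary, so we count how many relabelings of the pooled data give a difference in means at least as large as the one observed. The function name and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in group means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    splits = 0
    # Try every way of relabeling n_a of the pooled values as "group A".
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        splits += 1
    return extreme / splits

# Hypothetical outcome scores, for illustration only.
treatment = [12, 14, 15, 16]
control = [7, 8, 9, 10]
p = permutation_p_value(treatment, control)

print(f"p = {p:.4f}")
```

With these values only 2 of the 70 possible relabelings are as extreme as the observed split, giving p = 2/70, or about 0.029, which is below the conventional 0.05 cutoff.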
What Is a Confidence Interval?
A confidence interval provides a different type of information than what we get from the p-value used in hypothesis testing. Hypothesis testing produces a decision about any observed difference being either “statistically significant” or “statistically non-significant”.
In contrast, a confidence interval provides a range of plausible values for the size of the observed effect. This range is constructed in a way that we know how likely it is to capture the true, but unknown, effect size.
Thus, the formal definition of a confidence interval is “A range of values for a variable of interest (in our case, the measure of the treatment effect) constructed so that this range has a specified probability of including the true value of the variable. The specified probability is called the confidence level, and the end points of the confidence interval are called the confidence limits” (Last JM. A dictionary of epidemiology. Oxford: International Journal of Epidemiology. 1988)
It is conventional to construct confidence intervals at the 95% level, which means that 95% of the time properly constructed confidence intervals should contain the true value of the variable of interest. This corresponds to hypothesis testing with p-values, where the conventional cutoff is p < 0.05. [7] [8] [9]
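A 95% confidence interval for a mean can be sketched as follows. This is a simplified version that assumes a normal approximation; for small samples, an interval based on the t-distribution is typically used instead and is somewhat wider. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * stdev(sample) / sqrt(n)            # z times the standard error
    center = mean(sample)
    return center - margin, center + margin

# Hypothetical ACT scores, for illustration only.
sample = [24, 26, 23, 27, 25, 28, 22, 25, 25, 21, 29, 25]
low, high = mean_confidence_interval(sample)

print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The confidence limits are the two end points returned by the function; raising the confidence level (for example to 0.99) widens the interval.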
5. Report your findings:
Drawing a Conclusion
1. P-value <= significance level (α) => Reject your null hypothesis in favor of your alternative hypothesis. Your result is statistically significant.
2. P-value > significance level (α) => Fail to reject your null hypothesis. Your result is not statistically significant.
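The decision rule above is simple enough to write as a small function. The name decide and the default alpha of 0.05 are assumptions for illustration; the significance level should be fixed before looking at the data.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: compare the p-value with the significance level."""
    if p_value <= alpha:
        return "reject the null hypothesis (statistically significant)"
    return "fail to reject the null hypothesis (not statistically significant)"

print(decide(0.03))
print(decide(0.20))
```

Note that the second outcome is worded as "fail to reject" rather than "accept": a large p-value means the data are consistent with the null hypothesis, not that the null hypothesis has been proven.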
The results of hypothesis testing will be presented in the results and discussion sections of your research paper or thesis.
In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results.
Conclusion
Note that we never reject or fail to reject the alternative hypothesis; the decision is always stated in terms of the null hypothesis. This is because hypothesis testing is not designed to prove or disprove anything. Rather, it is designed to test whether a result could have occurred spuriously, that is, by chance. Thus, statistical hypothesis testing is a crucial tool for mathematically evaluating the outcome of a research question. [2]
[1] Portney L and Watkins M, Foundations of Clinical Research, 2nd edition, pages 372-376 and 387-402
[2] The Beginner's Guide to Statistical Analysis | 5 Steps & Examples (scribbr.com)
[3] Inferential Statistics | An Easy Introduction & Examples (scribbr.com)
[4] Choosing the Right Statistical Test | Types & Examples (scribbr.com)
[5] Understanding Hypothesis Tests: Why We Need to Use Hypothesis Tests in Statistics (minitab.com)
[6] Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics (minitab.com)
[7] Understanding Hypothesis Tests: Confidence Intervals and Confidence Levels (minitab.com)
[8] What_are_Conf_Inter.pdf (bandolier.org.uk)
[9] The clinician’s guide to p values, confidence intervals, and magnitude of effects | Eye (nature.com)