Reliability and Validity in Research
Reliability and validity are the two key concepts used in research to assess the quality of a study. They indicate how well a method or test measures something: reliability concerns how consistently a test measures what it is supposed to measure, while validity concerns the accuracy of the measure.
Reliability
In simple terms, research reliability is the degree to which a research method produces stable and consistent results. If a method produces consistent results, it is likely reliable and not influenced by external factors. [1] [2]
In other words, reliability is the extent to which results can be reproduced when the research is repeated under the same conditions.
Types of Reliability:
1. Test-Retest Reliability: The test-retest method involves giving a sample of individuals an identical test more than once, keeping all testing conditions as constant as possible. If the results are similar each time you give the test to the sample group, your research method is likely reliable and not influenced by external factors, such as the sample group's mood or the day of the week.
For example: Give a group of factory workers a survey about their satisfaction with the cafeteria meals on Monday, Wednesday, and Friday, then compare the results to check test-retest reliability.
2. Rater Reliability: This aspect of reliability is of utmost importance to the validity of any research study involving testers or raters, whether one individual does all the testing or several raters are involved in observation or measurement. Data cannot be interpreted with confidence unless those who collect, record, and reduce the data are reliable. In many studies, the testers undergo a period of training so that the methods are more standardized and the chance of error is reduced.
There are two types of rater reliability:
Intrarater Reliability: Intrarater reliability tests the reliability of one individual performing multiple trials or tests in the same setting. Essentially, you are testing for reliability and the chance of error when testing is done multiple times by the same tester or rater.
For example: A tester uses the same instrument to measure a subject's joint range of motion three times in a row.
Interrater Reliability: Interrater reliability evaluates the consistency of a measure across different testers or raters. Do you get the same results when different people conduct the same measurement under the same circumstances? This helps researchers avoid influencing factors related to the tester, such as personal bias and human error. If most of the results from different testers are similar, the testing method is likely reliable and can produce usable research, because the testers gathered the same data from the group.
For example: Multiple teachers may observe a group of children trying to read a given text to determine their reading levels and intellectual development and then compare notes to check for inter-rater reliability.
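One common statistic for this comparison is Cohen's kappa, which measures agreement between two raters while correcting for agreement expected by chance; the source does not prescribe a particular statistic, so this is an illustrative sketch with invented ratings:

```python
# Hypothetical reading-level ratings ("below", "at", "above" grade level)
# assigned by two teachers to the same eight children.
rater_a = ["at", "below", "at", "above", "at", "below", "above", "at"]
rater_b = ["at", "below", "at", "at",    "at", "below", "above", "at"]

def cohens_kappa(a, b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(a)
    labels = set(a) | set(b)
    # Proportion of items on which the raters actually agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal proportions.
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

print(round(cohens_kappa(rater_a, rater_b), 2))  # prints 0.79
```

Kappa values near 1.0 indicate strong interrater agreement; values near 0 mean the raters agree no more often than chance would predict.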
3. Internal Consistency: Questionnaires, surveys, written examinations, and interviews are usually composed of a set of questions or items designed to measure specific knowledge or attributes of their subjects. Internal consistency measures the extent to which the items measure various aspects of the same characteristic and nothing else. In other words, it measures the consistency of the measurement itself: do you get the same results from different parts of a test that are designed to measure the same thing?
For example: You design a questionnaire to measure depression levels. If you randomly split the results into two halves, there should be a strong correlation between the two sets of results. If the two results are very different, this indicates low internal consistency.
There are two methods of measuring internal consistency:
Split-half reliability test: You perform this test by splitting a research method, such as a survey or test, in half, delivering both halves separately to a sample group, and then comparing the results. If the results are consistent, the research method is likely reliable.
Inter-item reliability test: With this method, you administer multiple test items to sample groups and then calculate the correlations between the results. You then average these correlations and use that number to determine whether the results are reliable.
For example: You may give office workers a questionnaire about which software works best for data analysis, but split it in half, give each half to the workers separately, and then calculate the correlation to test for split-half reliability.
Later, you interview the office workers, bring them together in small groups, and observe them at work to determine which software gets used the most and which software the workers like best. You calculate the correlations between these answers and observations and average the results to find the average inter-item reliability.
Validity
Validity in research simply means that a test or tool measures what it claims to measure. Validity places emphasis on the objectives of a test and on the inferences that can be drawn from test scores or measurements. For example, a ruler is considered a valid instrument for measuring length because we can determine how long an object is by measuring it in inches or centimeters. Thus, validity addresses what we can do with test results, for example, whether we can make accurate predictions about a patient's treatment prognosis based on the outcome of a test. [2] [3]
Types of Validity Measurements
1. Face Validity: This is the weakest form of validity. Face validity is simply whether the test appears (at face value) to measure what it claims to.
2. Content Validity: Content validity measures the extent to which the test covers all aspects of the concept being measured or covers all the content it needs to provide the outcome you are expecting. It is useful with questionnaires, examinations, inventories, and interviews that attempt to evaluate a range of information by selected test items or questions.
3. Criterion-related Validity: The extent to which the result of a measure corresponds to other valid measures of the same concept. This is the most practical approach to validity testing and the most objective one. It is based on the ability of one test to predict results obtained on the other test. The test to be validated, called the target test, is compared with a gold standard or criterion measure that is already established to be valid.
Concurrent validity: Concurrent validity is studied when the measurements to be validated and the criterion measures are taken at approximately the same time (concurrently), so that they both reflect the same incidence of behavior. This approach to validation is useful when a new or untested tool is potentially more efficient, easier to administer, more practical, or safer to use than the established one and is being proposed as an alternative instrument.
Predictive validity: Predictive validity establishes that a measure is a valid predictor of some future criterion score. A test with good predictive validity helps a researcher make sound decisions by providing a basis for predicting outcomes or future behaviors.
4. Construct Validity: Construct validity reflects the ability of an instrument to measure an abstract concept, or construct: how well a set of indicators represents a concept that is not directly measurable. It concerns the appropriateness of inferences made on the basis of observations or measurements (often test scores), specifically whether a test can reasonably be considered to reflect the intended construct. Constructs are abstractions deliberately created by researchers to conceptualize a latent variable, one that correlates with scores on a given measure although it is not directly observable. Construct validity asks: does the measure behave the way the theory says a measure of that construct should behave? It is essential to the perceived overall validity of a test and is particularly important in the social sciences and language studies.
[1] Portney L, Watkins M. Foundations of Clinical Research: Applications to Practice, 2nd ed., pp. 61-70.
[2] Reliability vs. Validity in Research | Difference, Types and Examples (scribbr.com)
[3] Portney L, Watkins M. Foundations of Clinical Research, 2nd ed., pp. 79-88.