Wednesday 11 January 2017

What is the Alternative to Self-Report? What Works Best for the Assessment of Personality?

Written by Nea Lulik, MSc in Psychology of Individual Differences

Psychology has long sought measures that accurately capture theoretical constructs. Personality is one such construct: abstract and difficult to measure accurately. Because personality is part of every individual, it seems logical to assume that self-report is the best way to assess it. In past decades self-report was widely used as the main, and often the only, measure of personality. However, people are biased in many ways, and this has led to a re-evaluation of self-report as the fundamental measure of personality.



Recently, interest has emerged in developing and using alternatives to self-report, either on their own or in combination with it. Alternatives to self-report (Schwarz, 1999) include the informant method (Vazire, 2006), the behavioural method (Moskowitz, 1986), the multiple method (McDonald, 2008) and the computer-based personality judgement model (Youyou, Kosinski & Stillwell, 2015). These methods can measure various psychological constructs, traits and processes, but in this article I will concentrate on personality measurement. All four methods are described below, including their strengths and weaknesses.

Self-report is a very common method for assessing a wide range of psychological constructs and traits in psychological research (Schwarz, 1999). It usually involves asking participants about their traits, feelings, behaviour, attitudes, beliefs and personality. Self-report measures are “self-observations of individuals collected in the form of global ratings or responses to items on questionnaires” (Moskowitz, 1986, p. 299). Self-report can be direct, indirect or open-ended (Paulhus & Vazire, 2007).

Validity and reliability are very important in the scientific method. Reliability means that a test is self-consistent and can be expected to give the same results when it is repeated over time. Validity means that a test measures what it is supposed to measure, and it requires high reliability. This is especially difficult to achieve with personality tests, because personality itself is an abstract construct (Kline, 1983). Construct validity is the most important form of validity in personality research (Bagozzi, 1993). It is obtained when the questionnaire is formulated in the context of a theory that can predict behaviours in relation to the questionnaire (Loewenthal, 1996). Besides reliability and validity, it is also very important to pay attention to other major factors that might affect the quality of the data.
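To make the idea of self-consistency more concrete, here is a minimal sketch of Cronbach's alpha, one common index of internal-consistency reliability. It is not taken from any of the sources cited in this article; the function name, the data layout and the toy ratings are my own assumptions for illustration only.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Internal-consistency estimate for a questionnaire.

    responses: one list per respondent, each holding that person's
    scores on the same k items (hypothetical data layout).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [pvariance(item) for item in zip(*responses)]
    # Variance of each respondent's total score.
    total_variance = pvariance([sum(person) for person in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data (made up): four respondents answering three items on a 1-5 scale.
toy = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
alpha = cronbach_alpha(toy)
print(round(alpha, 2))
```

Values closer to 1 indicate that the items vary together. A high alpha says nothing about validity on its own: a scale can be very consistent and still measure the wrong thing.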

Self-report results can be affected by many factors: minor changes in question wording, question format or question context can result in great changes in the results (Schwarz, 1999). The phrasing of items can affect an individual's response through, among other things, the understanding of the question and the recall of relevant behaviour (Holden & Troister, 2009). Open versus closed response formats, frequency scales and reference periods can also influence results (Schwarz, 1999). Rating scales can affect results too, depending on whether the numeric values used in the scale are positive or negative (Schwarz, Knauper, Hippler, Noelle-Neumann & Clark, 1991). Reversing the relation between the verbal and numerical labels, for example so that lower numbers correspond to stronger agreement, can also affect the results (Rammstedt & Krebs, 2007).

Self-report measurement has many advantages: it is practical, efficient, convenient, easy to administer and inexpensive; it gives direct insight into very personal information about the individual; individuals are usually motivated to respond; and most response biases can be controlled (McDonald, 2008). However, it also has many disadvantages (Paulhus & Vazire, 2007). Many sources have raised potential issues with the credibility of responses and with systematic errors (Wiechman, Smith, Smoll & Ptacek, 2000) in self-reports. These stem from response biases such as social desirability, acquiescent and extreme responding, and non-response bias, as well as from the assumption that participants are self-knowledgeable and have no distorted self-perception, the non-situation-specific language of questions, and cultural limitations (Mundia, 2011). All these biases represent a threat to construct validity, which is why they have to be controlled or minimized (Moskowitz, 1986).

Social desirability is one of the main concerns in personality measurement (Bäckström, Björklund & Larsson, 2009), especially in self-report, and it is described as “the tendency of subjects to endorse an item according to how socially desirable a response is” (Kline, 1983, p. 19). People who engage in this kind of response bias are motivated to provide a positive self-presentation (Holden & Troister, 2009): they want to appear in a favourable light and are motivated by the approval of others. Basically, they want to present themselves to the outside world very positively. This is quite common in human beings, but the extent of social desirability is not uniform and it reveals itself differently in different situations (Mundia, 2011). Paulhus (1984) distinguished two components of social desirability: self-deception and impression management. Self-deceptive positivity is characterized as an honest but overly positive view of oneself, and it has been shown to be linked to adjustment (Paulhus, 1991). Impression management, meanwhile, is closer to the standard notion of social desirability: openly behaving in socially desirable ways so that others see the individual in a positive light (Paulhus, 1984). Some studies (Paunonen & LeBel, 2012) argue that social desirability does not significantly influence results, because the distribution resembles the normal (Gaussian) curve, though biased toward the positive end. Many studies, however, are still prone to significant social desirability effects (Holden & Troister, 2009; Soubelet & Salthouse, 2011), or find that it masks important relations between variables. It therefore needs to be controlled or minimized (Wiechman, Smith, Smoll & Ptacek, 2000) with scales, inventories or statistical techniques (Paulhus, 1991).

Another type of response bias is acquiescent responding, described as “the tendency to agree with an item, regardless of its content” (Holden & Troister, 2009, p. 126). People who engage in this kind of response bias can agree with two statements even though they are mutually exclusive (Paulhus, 1991). Research has shown that individuals are more likely to reply ‘yes’ to a neutral statement than to an extreme one (Knowles & Condon, 1999). The authors concluded that the best way to control this response bias is to balance trait-indicating items with trait-contraindicating items, and to balance the ratio of assertions and negations, so that there is the same number of ‘yes’ and ‘no’ answers. Although there are ways to control and minimize this kind of response bias, it can still occur and cause problems with the interpretation of results, or make them impossible to interpret at all.
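The balancing idea can be sketched in a few lines. This is only an illustration under my own assumptions: a 1-5 agreement scale, a questionnaire with equal numbers of trait-indicating and trait-contraindicating items, and hypothetical item ids; it is not the procedure of any study cited here.

```python
from statistics import mean


def score_balanced_scale(answers, reverse_keyed, low=1, high=5):
    """Score a balanced scale and estimate acquiescence.

    answers: dict mapping item id -> raw rating on a low..high scale.
    reverse_keyed: set of item ids worded against the trait.
    """
    midpoint = (low + high) / 2
    # Flip trait-contraindicating items so a high value always means
    # "more of the trait".
    keyed = [
        (low + high - rating) if item in reverse_keyed else rating
        for item, rating in answers.items()
    ]
    trait_score = mean(keyed)
    # On a balanced scale, mean raw agreement above the midpoint
    # suggests agreeing regardless of item content.
    acquiescence = mean(list(answers.values())) - midpoint
    return trait_score, acquiescence


# Hypothetical respondent who strongly agrees with every item.
yea_sayer = {"q1": 5, "q2": 5, "q3": 5, "q4": 5}
trait, bias = score_balanced_scale(yea_sayer, reverse_keyed={"q3", "q4"})
print(trait, bias)
```

Because half of the items are flipped, indiscriminate agreement cancels out in the trait score and shows up in the separate acquiescence index instead.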

The third response bias is extreme responding. Extreme responding represents “the tendency to use the extreme choices on a rating scale, both positive and negative” (Holden & Troister, 2009, p. 126). Extreme response bias has been found to be stable over time and a consistent individual difference (Paulhus & Vazire, 2007). A study of extreme response style (Naemi, Beal & Payne, 2009) found that intolerance of ambiguity is related to extreme responding. The authors found that decisiveness accounts for a significant amount of the variance in extreme response style. They also found that a quick response time, together with intolerance of ambiguity, decisiveness and simplistic thinking, leads to an extreme response style. This kind of response bias is difficult to interpret, because it is never really clear whether the individual's response is due to decisiveness, a lack of introspection regarding intensity or frequency, a tendency toward extreme ratings, or something else altogether.
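One simple way to quantify extreme responding is the proportion of answers that fall on the endpoints of the scale. The sketch below assumes a 1-5 rating scale and uses made-up answers; the function name is hypothetical and the index is a common descriptive shortcut rather than the measure used in the studies above.

```python
def extreme_response_index(ratings, low=1, high=5):
    """Share of ratings that use either endpoint of the scale."""
    if not ratings:
        raise ValueError("no ratings given")
    extremes = sum(1 for r in ratings if r in (low, high))
    return extremes / len(ratings)


# Made-up answers from two respondents on the same eight items.
moderate = [2, 3, 4, 3, 2, 4, 3, 3]
extreme = [1, 5, 5, 1, 5, 3, 1, 5]
print(extreme_response_index(moderate), extreme_response_index(extreme))
```

The index only describes how often the endpoints were chosen. It cannot say whether a high value reflects decisiveness, genuinely extreme trait levels or a response style.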

A related matter concerning the credibility of responses in self-report is distorted self-perception. Self-report measures rely on participants' self-understanding and presume that participants are self-aware and self-knowledgeable (McDonald, 2008). That is not always the case, however, considering that self-reports measure a person's self-perception of a psychological trait and not the actual trait (Roberts, Zeidner & Matthews, 2001). Even that assumes the information we seek is available to conscious interpretation. Response biases can therefore arise from self-perception itself, because people are predisposed toward self-enhancement (Smith, 2005) or are trying to preserve an unrealistically positive image of themselves (McDonald, 2008). In contrast with social desirability, distorted self-perception cannot be controlled or measured by a lie scale, which is a limitation.

Self-report often suffers from non-context-specific language in questionnaire items (McDonald, 2008). Answers to personality questionnaires are thought to be influenced by many features of the respondent's semantic networks, such as the individual's ego ideal, the intention to be semantically consistent across questions, and the representations of distinct life experiences that might be retrieved when answering a question (Kagan, 2007). Also, questionnaires are constructed such that a person can consider either the trait itself or the context in which the trait is present, so that different answers to the same item may be misinterpreted as implying different behaviours.

Self-report is also considered to have cultural limitations. Research on cultural differences in response style and the role of dialectical thinking (Hamamura, Heine & Paulhus, 2008) found that East Asians exhibit more ambivalent and moderate responding on self-report than North Americans. There has also been a debate in the literature (Smith, 2005) about self-enhancement in individualist and collectivist cultures, which suggests that there are indeed cultural differences in self-report.

Considering all the above disadvantages of self-report, it is logical and understandable to consider alternatives.


An alternative to self-report is informant report.

Informant report is a method of “obtaining peer reports in which a number of other informants provide ratings about the individual” (McDonald, 2008, p. 5). The observers can be parents, friends, spouses, peers or co-workers: basically, people who share a history with the participant (Moskowitz, 1986).

Generally, this method uses inventories to quantify the information from informants. This information includes everyday functioning, the frequency and level of specific behaviours, and ratings on personality scales worded in the third person or using a meta-perceptual approach (Simms, Zelazny, Yam & Gros, 2010). In the meta-perceptual approach, informants rate their perception of the participant's self-perception.

The advantage of this method is that informants provide more objective information about the individual (McDonald, 2008). If more informants are involved, there will probably be more data, which could improve the reliability of the results. Informants can also provide situational insight into behaviours (Hofstee, 1994). Most importantly, there are no social desirability response biases in informant report. In the past, the traditional informant method was seen as time-consuming, expensive and of doubtful validity because of dishonest responding, and informants were believed to be unwilling to cooperate. In the twenty-first century, however, with the development of the internet and technology, this view should be reconsidered. The informant report method has the potential to become more practical, more convenient, less time-consuming and inexpensive through informant questionnaires on the internet (Vazire, 2006). With e-mail and internet questionnaires, the researcher saves time on data entry and can easily keep track of informants' participation. Results (Vazire, 2006) also show that participants are willing to cooperate, thanks to the time-efficiency of internet questionnaires, and that validity would increase if participants are acquainted with each other, as would the consensus correlations among informants' ratings and self-other agreement. However, this method still has some weaknesses. Like self-report, it can suffer from response biases such as acquiescent and extreme responding. Issues can also arise in choosing the right informants, because they can be biased by their relationship with the participant or by the research aims (McDonald, 2008). The method can also have difficulty assessing a specific behaviour in a specific situation (Berry, Carpenter & Barratt, 2012). Finally, the most obvious weakness is that others cannot access certain personal information about the individual, because only the individual has access to it (Hofstee, 1994; Paulhus & Vazire, 2007).
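Self-other agreement is usually expressed as a correlation between self-ratings and informant ratings. The sketch below is a minimal illustration under my own assumptions: each participant has one self-rating on a trait and ratings from one or more informants, the informant ratings are averaged per participant, and agreement is the Pearson correlation across participants. The function names and numbers are made up and do not come from the studies cited above.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2)")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("ratings do not vary")
    return cov / (sx * sy)


def self_other_agreement(self_ratings, informant_ratings):
    """Correlate self-ratings with the mean rating of each person's informants.

    self_ratings: one trait score per participant.
    informant_ratings: for each participant, a list of the scores given
    by that participant's informants (hypothetical data layout).
    """
    aggregated = [mean(scores) for scores in informant_ratings]
    return pearson_r(self_ratings, aggregated)


# Made-up extraversion scores for four participants, two informants each.
selves = [4.0, 2.5, 3.5, 1.5]
informants = [[3.5, 4.5], [3.0, 2.0], [3.0, 3.5], [2.0, 2.5]]
print(round(self_other_agreement(selves, informants), 2))
```

Averaging several informants before correlating is the reason more informants can help: idiosyncratic views of single raters partly cancel out in the mean.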


Over the years researchers have started relying more and more on self-report and abandoning behavioural observation. 

Behavioural observation is a method in which the observer collects behavioural data by observing overt behaviour, either in an artificially constructed laboratory or in a naturally occurring situation (Moskowitz, 1986). This method involves external judges viewing and coding the individual's behaviours (McDonald, 2008).

The researcher can use various technological tools, such as cameras, microphones or the electronically activated recorder, to avoid missing important information (Mehl, Gosling & Pennebaker, 2006).

Behavioural observation in a laboratory setting involves the researcher observing and measuring the participant's behaviour in one or two brief standardized situations. The procedures are implemented in the laboratory setting itself, which minimizes any loss in the standardization of the situation and in the reliability with which responses are coded (Moskowitz, 1986). The strengths of this setting are that behaviour is observed directly and that the observer is in control of the situation and of the stimuli, which can be used to elicit the overt behaviour of interest. Behavioural observation in a laboratory setting also has many weaknesses: it is time-consuming, inconvenient and ethically problematic (Baumeister, Vohs & Funder, 2007). Because it is performed in a laboratory, it is artificial and gives insight into situation-specific behaviours only (Kagan, 2007), so generalizations cannot be made (McDonald, 2008). Another possible problem in the laboratory setting is the participant's awareness of being observed. Social desirability can occur in this type of setting when the participant responds with the observer's expectations in mind. As in self-report, social desirability can create many problems of validity (Moskowitz, 1986).

On the other hand, behavioural observation in naturally occurring situations is restricted to one general setting in which data are collected on various occasions. Situations might vary, depending on the setting the observer chooses. Most of the time it is possible to identify persistent configurations of stimuli (Moskowitz, 1986). Because data collection takes place over a period of time, and not in one or two brief situations, it gives the researcher insight into the individual's more general behaviour (Mehl, Gosling & Pennebaker, 2006). However, the presence of the observer might delay adaptation, even though a participant engaged in ordinary activities will revert to habitual modes of responding (Moskowitz, 1986).

Overall, the behavioural observation method has many strengths: it directly examines behaviour, yields situation-specific information, has fewer response biases than self-report, and can be carried out in two settings (McDonald, 2008). Its weaknesses are that it lacks practicality and convenience, requires complex coding of behaviour, is time-consuming and expensive, and raises ethical concerns. Also, because of the nature of observational studies, it is not entirely clear where the line between overt behaviour and a specific trait lies.
 

After considering all the alternatives to self-report and their strengths and weaknesses, it seems safe to conclude that combining these measures will provide greater construct validity and results that are closer to the truth (Kagan, 2007; Williamson, 2007; Holden & Troister, 2009).

Multiple-method measurement can improve construct validity, provide more accuracy, and enrich the data by setting methods against each other (Hofstee, 1994).

Combining behavioural observation with self-report can lead to greater validity of self-report responses, because the observational measures provide additional support for those responses (Fulmer & Frijters, 2009). This method encompasses all the strengths of the individual measures (McDonald, 2008). Although it also encompasses all their weaknesses, these will be less problematic because of the mixture of methods. This type of method also requires more effort, expense, resources, time and skill, but the greater construct validity and better results should be worth it.


A recent study (Youyou, Kosinski & Stillwell, 2015) proposed another way to assess human personality or, more specifically, to make personality judgements. The study compared the accuracy of human and computer-based personality judgements.

Personality judgements were obtained from self-report, from other-report by one or two of the individual's friends, and from the individual's Likes on Facebook (a digital footprint), on which the computer-based judgements relied. The authors adopted the realistic approach, with three key criteria: self-other agreement, inter-judge agreement and external validity.

Results for self-other agreement showed that the computers' average accuracy across the Big Five traits grows steadily with the number of Likes on the individual's profile. Results for inter-judge agreement showed that the average consensus between computer models was considerably higher than the agreement between the personality judgements of two of an individual's friends. As for external validity, the computers' judgements predicted twelve out of thirteen life outcomes, behaviours and behaviour-related traits better than human judges did. This study has opened up a new approach to the assessment of personality, which is a huge strength. It might become commonly used, so that researchers would have less trouble collecting data. It is also inexpensive, accurate and less time-consuming. On the other hand, a weakness could emerge with regard to ethics and the protection of people's privacy.


Finally, having described and evaluated self-report and all the alternative measures of personality, it is safe to say that the best method for assessing personality is multiple-method measurement. The observational and informant methods have proved very useful and, although they have disadvantages, psychologists should consider them as valuable as self-report. The observational method in laboratory settings has potential for good construct validity thanks to control over the situation, though its reliability is poor. Behavioural observation in natural settings, however, has good potential for both construct validity and reliability, provided data are collected on a sufficient number of occasions. Informant reports have high reliability and have been shown to improve the validity of personality assessment. Self-report should absolutely still be used in the assessment of personality, because it provides unique information that only the individual has access to. It should, however, definitely be combined with another method, either observational or informant. Combining two or more methods would achieve better construct validity and richer data. Combining different measures might also, in the future, yield a measure that accurately captures theoretical constructs such as personality. More research should be done in this field, especially in combination with ever-growing technology. The computer-based personality judgement model looks promising, but it is relatively new and needs more research. Even so, it is a big step in the right direction.


References:
Bäckström, M., Björklund, F., & Larsson, M. (2009). Five-Factor Inventories Have a Major General Factor Related to Social Desirability Which Can Be Reduced by Framing Items Neutrally. Journal of Research in Personality, 335-344.
Bagozzi, R. P. (1993). Assessing Construct Validity in Personality Research: Applications to Measures of Self-Esteem. Journal of Research in Personality, 49-87.
Baumeister, R. F., Vohs, K. D., & Funder, D. C. (2007). Psychology as the Science of Self-Reports and Finger Movements. Whatever Happened to Actual Behavior? Perspectives on Psychological Science, 396-403.
Berry, C., Carpenter, N., & Barratt, C. (2012). Do Other-Reports of Counterproductive Work Behavior Provide an Incremental Contribution Over Self-Report? A Meta-Analytic Comparison. Journal of Applied Psychology, 613-636.
Fulmer, S. M., & Frijters, J. C. (2009). A Review of Self-Report and Alternative Approaches in the Measurement of Student Motivation. Educational Psychology Review, 219-246.
Hamamura, T., Heine, S., & Paulhus, D. (2008). Cultural Differences in Response Style: The Role of Dialectical Thinking. Personality and Individual Differences, 932-942.
Hofstee, W. K. (1994). Who Should Own the Definition of Personality? European Journal of Personality, 149-162.
Holden, R., & Troister, T. (2009). Developments in the Self-Report Assessment of Personality and Psychopathology in Adults. Canadian Psychology, 120-130.
Kagan, J. (2007). A Trio of Concerns. Perspectives on Psychological Science, 361-376.
Kline, P. (1983). Personality: Measurement and theory. London: Hutchinson.
Knowles, E., & Condon, C. (1999). Why People Say "Yes": A Dual-Process Theory Of Acquiescence. Journal of Personality and Social Psychology, 379-386.
Loewenthal, K. M. (1996). An Introduction to Psychological Tests and Scales. London: UCL Press.
McDonald, J. D. (2008). Measuring Personality Constructs: The Advantages and Disadvantages of Self-Reports, Informant Reports and Behavioural Assessments. Enquire, 75-94.
Mehl, M. R., Gosling, S. D., & Pennebaker, J. W. (2006). Personality in Its Natural Habitat: Manifestations and Implicit Folk Theories of Personality in Daily Life. Journal of Personality and Social Psychology, 862-877.
Moskowitz, D. (1986). Comparison of Self-Reports, Reports by Knowledgeable Informants, and Behavioral Observation Data. Journal of Personality, 294-317.
Mundia, L. (2011). Social Desirability, Non-Response Bias and Reliability in a Long Self-Report Measure: Illustrations From the MMPI-2 Administered to Brunei Student Teachers. Educational Psychology, 207-224.
Naemi, B., Beal, D., & Payne, S. (2009). Personality Predictors of Extreme Response Style. Journal of Personality, 261-286.
Paulhus, D. (1984). Two-Component Models of Socially Desirable Responding. Personality Processes and Individual Differences, 598-609.
Paulhus, D. (1991). Measurement and Control of Response Bias. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of Personality and Social Psychological Attitudes (pp. 17-59). San Diego, CA: Academic Press.
Paulhus, D., & Vazire, S. (2007). The Self-Report Method. In R. W. Robins, R. C. Fraley, & R. F. Krueger (Eds.), Handbook of Research Methods in Personality Psychology (pp. 224-239). New York: Guilford.
Paunonen, S., & LeBel, E. (2012). Socially Desirable Responding and Its Elusive Effects on the Validity of Personality Assessments. Journal of Personality and Social Psychology, 158-175.
Rammstedt, B., & Krebs, D. (2007). Does Response Scale Format Affect the Answering of Personality Scales? European Journal of Psychological Assessment, 32-38.
Roberts, R., Zeidner, M., & Matthews, G. (2001). Does Emotional Intelligence Meet Traditional Standards for an Intelligence? Some New Data and Conclusions. Emotion, 196-231.
Schwarz, N. (1999). Self-Reports. American Psychologist, 93-105.
Schwarz, N., Knauper, B., Hippler, H.-J., Noelle-Neumann, E., & Clark, L. (1991). Rating Scales: Numeric Values May Change the Meaning of Scale Labels. Public Opinion Quarterly, 570-582.
Simms, L. J., Zelazny, K., Yam, W. H., & Gros, D. F. (2010). Self-Informant Agreement for Personality and Evaluative Person Description: Comparing Methods For Creating Informant Measures. European Journal of Personality, 207-221.
Smith, G. (2005). On Construct Validity: Issues of Method and Measurement. Psychological Assessment, 396-408.
Soubelet, A., & Salthouse, T. (2011). Influence of Social Desirability on Age Differences in Self-Reports of Mood and Personality. Journal of Personality, 741-762.
Vazire, S. (2006). Informant Reports: A Cheap, Fast, and Easy Method for Personality Assessment. Journal of Research in Personality, 472-481.
Wiechman, S., Smith, R., Smoll, F., & Ptacek, J. (2000). Masking Effects of Social Desirability Response Set On Relations Between Psychosocial Factors and Sport Injuries: A Methodological Note. Journal of Science and Medicine in Sport, 194-202.
Williamson, A. (2007). Using Self-Report Measures in Neurobehavioural Toxicology: Can They Be Trusted? NeuroToxicology, 227-234.
Youyou, W., Kosinski, M., & Stillwell, D. (2015). Computer-Based Personality Judgments are More Accurate Than Those Made by Humans. PNAS, 1036-1040.
