Assessing mental health service user and carer involvement in physical health care planning: The development and validation of a new patient-reported experience measure
Chris J. Sidey-Gibbons, Helen Brooks, Judith Gellatly, Nicola Small, Karina Lovell, Penny Bee

Affiliations: Patient-Reported Outcomes, Value, and Experience (PROVE) Center, Brigham and Women's Hospital, Boston, MA, United States of America; Department of Surgery, Harvard Medical School, Boston, MA, United States of America; Department of Psychological Sciences, Institute of Psychology, Health and Society, University of Liverpool, Liverpool, United Kingdom; Mental Health Research Group, Division of Nursing, Midwifery and Social Work, School of Health Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, United Kingdom; NIHR School of Primary Care Research, Division of Population Health, Health Services Research and Primary Care, School of Health Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, United Kingdom

Editor: Alessandra Solari, Foundation IRCCS Neurological Institute C. Besta, Italy
Data Availability Statement: All relevant data are in the paper and its Supporting Information files.

Funding: This research was funded by the National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care (NIHR CLAHRC) Greater Manchester and Research Trainees Coordinating Centre grant CDF-2017-10019. The views expressed in this article are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

We employed psychometric and statistical techniques to refine a bank of candidate questionnaire items, derived from qualitative interviews, into a valid and reliable measure of involvement in physical health care planning. We assessed the psychometric performance of the item bank using modern psychometric analyses, evaluating unidimensionality, scalability, fit to the partial credit Rasch model, category threshold ordering, local dependency, differential item functioning, and test-retest reliability. Once the bank was purified of poorly performing and erroneous items, we simulated computerized adaptive testing (CAT) with 15, 10, and 5 items using the calibrated item bank.
Issues with category threshold ordering, local dependency, and differential item functioning were evident for a number of items in the nascent item bank and were resolved by removing problematic items. The final 19-item PREM had excellent fit to the Rasch model (χ² = 192.94, df = 1515, P = .02, RMSEA = .03, 95% CI = .01-.04). The 19-item bank had excellent reliability (marginal r = 0.87). The correlation between questionnaire scores at baseline and 2-week follow-up was high (r = .70, P < .01), and 94.9% of assessment pairs were within the Bland-Altman limits of agreement. Simulated CAT demonstrated that assessments could be made using as few as 10 items (mean SE = .43).
We developed a flexible patient-reported experience measure to quantify service user and carer involvement in physical health care planning. We demonstrate the potential to substantially reduce assessment length whilst maintaining reliability by utilizing CAT.
People diagnosed with severe mental illnesses, such as schizophrenia and bipolar disorder, exhibit higher rates of physical co-morbidities and, as a result, are significantly more likely to die prematurely than the general population.[ ]
Factors contributing to this deterioration in physical health for mental health service users are known to include side-effects from anti-psychotic medications, higher rates of smoking and substance abuse, poor nutrition, and physical inactivity.[ ] Though the relationship between serious mental health issues, physical comorbidity, and reduced life expectancy is well understood, far less is known about how to organize care delivery to improve physical health and reduce the risk of associated morbidity in this population. Recent evidence suggests that, despite increased awareness of these issues, mortality risk associated with all mental health conditions is rising internationally.[ ]
One approach to improve the management of known risk factors is individualized care planning,[ ] an approach which involves service users and carers working collaboratively with professionals to co-develop a written care plan. This plan aims to accurately document the core issues that a service user would like to address as part of their mental health recovery. A growing body of research shows that, although collaborative care planning is aligned with the desires of both service users and carers, there is a paucity of care models which have been shown to effectively increase involvement in care planning for physical health in this population.[ ] More broadly, increasing the quality of mental health services was the top research priority expressed by an international working group comprising professionals, service users, and carers.[ ]
Progress in the development of interventions to improve care planning involvement between service users, carers, and providers is stymied by the lack of a meaningful outcome assessment. Quantification of abstract subjective phenomena, such as involvement with care planning, is best accomplished by directly assessing the perspective of the service user or carer, usually using a tool commonly referred to as a patient-reported experience measure (PREM).
Patient-reported experience measures are an efficient and accurate way to quantify the views of service users and their carers. A relevant example is the EQUIP PREM, which was developed by our group to assess service user and carer involvement in mental health care planning.[ ] Previous research has highlighted the importance of brief assessments for mental health service users and their carers, with a strong user preference for minimising response burden by developing shorter questionnaires.[ ] New assessment modalities, including computerized adaptive testing (CAT), are able to tailor person-centred assessments to the individual, a process which tends to result in shorter, more relevant assessments.
The objective of the current paper is to create a novel PREM to assess mental health service
user and carer involvement in physical health care planning. We seek to develop a PREM that
is accurate, reliable, and suitable for individualized CAT assessment.
Item development methods
A set of 67 candidate items was developed following qualitative interviews with mental health service users (SUs), their carers, and mental health professionals from the UK. Further details of the qualitative interview process can be found in a separate manuscript.[ ] Items were developed to reflect six pre-identified themes: three which covered general mental health care planning requirements and three which were unique to physical health care planning. The general themes were: tailoring a collaborative working relationship between service users, their carers, and service providers; maintaining a trusting relationship with a professional; and having access to a tangible document which could be edited and updated. The physical health themes were: valuing physical health equally with mental health, experiencing coordinated care between health professionals in different disciplines, and having a personalised physical health discussion.
Potential participants who expressed an interest in taking part in the study were given a participant information sheet written to current UK National Research Ethics Service (NRES) guidelines. We worked with service users and carers to co-develop the information sheet. The information sheet included details of the study, including the potential risks and benefits of taking part and the ways in which participants could take part (e.g. online via SelectSurvey or through the completion and return of paper versions of the questionnaire), and provided potential participants with the contact details of researchers should they wish to discuss their involvement prior to taking part. All participants responded affirmatively to the statement "I have read and understood the participant information sheet" and consent was implied by the completion and return of questionnaires. The study and all associated procedures were approved by the London - West London and Gene Therapy Advisory Committee (GTAC) Research Ethics Committee (16/LO/0386) in February 2016.
We fitted data from the nascent scale to the partial credit "Rasch" model (PCM)[ ] in order to assess psychometric performance. We evaluated factor structure, scalability, and monotonicity by fitting data to the non-parametric Mokken model before conducting more rigorous psychometric assessments using the PCM. The combination of the two methodologies has been shown to be useful in previous research conducted by members of our group and others.[ ] Where scale data did not conform to the assumptions of either the Mokken or the partial credit model,
an iterative process of item reduction was undertaken to remove the violating items from the
analysis. The iterative process involved assessments of scalability, model and item fit to the
PCM, category threshold disordering, local dependency, and differential item functioning
(DIF). Each concept and the method by which it is assessed is described in greater detail below.
Mokken analysis. The Mokken model is a non-parametric extension of the simple deterministic Guttman scaling model.[ ] The model provides a framework to extend the unrealistically error-free Guttman model using probabilistic estimation, thus accounting for measurement error.[ ] As a non-parametric item response theory (NIRT) model, the Mokken model relaxes some assumptions of item response theory whilst affirming essential assumptions such as unidimensionality and scalability.[ ] We fitted data to the double monotonicity model, an NIRT model which estimates a single parameter for each item (i.e., the level of the construct which that item assesses). By successfully fitting scale data to a Mokken model, it can be said to be both unidimensional and properly scaled. We utilized parallel polychoric principal component analysis, which compared the experimental eigenvalues with Monte Carlo-simulated eigenvalues, to verify the unidimensional factor structure before proceeding to item response theory analysis.[ ]
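As a toy illustration of the scalability idea behind Mokken analysis, the sketch below (not the authors' code; the data are invented and the study's real analyses used the R 'mokken' package on polytomous responses) computes Loevinger's H for dichotomous items, where H near 1 indicates a near-perfect Guttman pattern and values below .30 are conventionally considered unscalable:

```python
# Toy sketch of Loevinger's scalability coefficient H for dichotomous items.
# H = 1 - (observed Guttman errors / errors expected under independence).

def loevinger_H(data):
    """data: list of respondent rows, each a list of 0/1 item scores."""
    n, n_items = len(data), len(data[0])
    # Item popularity: proportion endorsing; more popular = "easier".
    p = [sum(row[i] for row in data) / n for i in range(n_items)]
    obs = exp = 0.0
    for i in range(n_items):
        for j in range(i + 1, n_items):
            easy, hard = (i, j) if p[i] >= p[j] else (j, i)
            # Guttman error: endorsing the harder item but not the easier one.
            obs += sum(1 for row in data if row[hard] and not row[easy])
            exp += n * p[hard] * (1 - p[easy])  # expected under independence
    return 1 - obs / exp

# A perfect Guttman pattern produces no errors, so H = 1.
perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(loevinger_H(perfect), 2))  # 1.0
```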
The partial credit model. The PCM is a measurement model which describes the probabilistic relationship between the assessment and the respondent as an interaction between the amount of the latent construct that the respondent has (i.e., involvement with physical health care planning) and the level of the latent construct which the item measures. Both can be described in terms of theta (θ). For example, an item which measures a very high level of physical health care planning (a question that we would not expect many people to affirm; for example, an item asking about a service user or carer's access to a document containing a detailed strategy for physical health care) would be less likely to be affirmed than an item measuring a low level of the latent construct (a question that we would expect many people to affirm; for example, an item asking whether a health care professional had asked whether a service user was receiving any care for physical health issues).
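The response probabilities described above can be sketched numerically. The following is an illustrative implementation of PCM category probabilities for a single item; the thresholds are invented for demonstration, not estimated from the study data:

```python
import math

def pcm_probs(theta, deltas):
    """Partial credit model: P(X = k | theta) for category thresholds `deltas`."""
    # Numerators: psi_0 = 1, psi_k = exp(sum_{j<=k}(theta - delta_j)).
    nums, s = [1.0], 0.0
    for d in deltas:
        s += theta - d
        nums.append(math.exp(s))
    total = sum(nums)
    return [x / total for x in nums]

# A 3-category item with invented thresholds at -1 and +1: at theta = 0 the
# middle category is the most probable response, as required for ordered
# category thresholds.
print([round(p, 3) for p in pcm_probs(0.0, [-1.0, 1.0])])  # [0.212, 0.576, 0.212]
```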
Goodness-of-fit statistics can be used to assess the data's fit to the PCM at both the item and scale level. In this study we used both the Chi-square statistic and the root-mean-square error of approximation (RMSEA) to evaluate fit to the model, accepting a non-significant Chi-square interaction (P > .05) and RMSEA < .05 as indicating good fit.[ ]
Category threshold ordering. In the case of a Likert or "multiple choice" item response, the probability of responding to each category is modelled separately. As the level of the underlying construct (i.e., involvement in physical health care planning) rises, the probability of responding to each Likert category rises to a peak before falling. Different probabilities are given for each response category at every level of θ. It is essential that each category becomes the most likely response option at a certain level of θ. If this is not the case, the item is said to exhibit category threshold disordering.
Category threshold disordering refers to the situation in which one or more of the Likert response categories is not the most likely response at any point along the underlying θ continuum. In the case of disordered category thresholds, we "collapsed" adjacent categories so that they received the same score. Care was taken not to collapse categories where it was semantically illogical to do so (i.e., "Agree" would not be collapsed into "Neither agree nor disagree"). An illustrated side-by-side example is provided in a previous paper from our group.[ ]
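The collapsing step amounts to a simple recoding of responses. A minimal sketch of the 0-1-1-2-2 rescoring scheme reported in the Results (the response coding and the handling of missing values as None are our own assumptions):

```python
# Minimal sketch of collapsing disordered categories by rescoring.
# The 0-1-1-2-2 mapping follows the scheme reported in the paper; treating
# missing responses as None is our own assumption.
RESCORE = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}  # original category -> collapsed score

def collapse(responses):
    return [None if r is None else RESCORE[r] for r in responses]

print(collapse([0, 1, 2, 3, 4, None]))  # [0, 1, 1, 2, 2, None]
```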
Local dependency. Item response theory models (of which the Rasch model is a special case) are predicated on the assumption that differences in responses to items are driven solely by changes in the underlying trait.[ ] One way in which items can violate this assumption is local dependency,
a situation whereby the response to one item is dependent on the response to another. In practice, this can occur where items are too similar. Local dependency is assessed using Yen's Q3 statistic, in which the correlations of item residuals are compared; item pairs with residual correlations beyond a threshold are said to be locally dependent. We set the threshold equal to .2 plus the average observed residual correlation.[ ]
Local dependency can be resolved in a number of ways, including subtesting (where locally dependent items are joined into a "super" item) and item deletion.[ ] As we began with a large bank of candidate items, we elected to remove items which were locally dependent. Our strategy was to remove an item if it was locally dependent with more than one other item. In the case that an item pair demonstrated dependency only with one another, item information curves for each item were examined alongside the item wording, and the item which provided less information was removed from further analysis.
Differential item functioning. Another issue which can interfere with the assumption that differences in item scores ought to be driven solely by differences in the underlying trait is differential item functioning (DIF).[ ] Differential item functioning occurs when the probability of a certain response to a question varies across different demographic groups. For example, if men were more likely to respond affirmatively to a certain item than women despite having an equal level of overall involvement with physical health care planning, that question would be said to be affected by DIF. We used the iterative hybrid ordinal logistic regression/item response theory approach to conduct DIF analyses. For items flagged as having significant DIF following Bonferroni correction, we used McFadden's pseudo-R² estimation, with the recommended cut-off of R² > .035 being indicative of meaningful DIF.[ ] By assessing DIF between service users and carers, we explored the suitability of the nascent PREM for both groups. Models were fitted with missing data present; however, missing data were imputed using IRT-based estimation.[ ] Given the well-documented issues with model fit statistics, we prioritized meeting the assumptions of the Mokken and Rasch models over model fit, as has been recommended elsewhere.[ ]
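The DIF decision rule reduces to comparing McFadden pseudo-R² values between nested models fitted with and without a group term. A hedged numerical sketch (the log-likelihood values are invented for illustration; the study's actual analyses used the R 'lordif' package):

```python
# Hedged sketch of the DIF decision rule: compare McFadden pseudo-R^2 between
# nested models with and without the group term; a change above .035 counts
# as meaningful DIF. Log-likelihood values here are invented.

def mcfadden_r2(ll_model, ll_null):
    return 1 - ll_model / ll_null

def meaningful_dif(ll_null, ll_theta, ll_theta_group, cutoff=0.035):
    """Return (pseudo-R^2 change, whether it exceeds the DIF cutoff)."""
    change = (mcfadden_r2(ll_theta_group, ll_null)
              - mcfadden_r2(ll_theta, ll_null))
    return change, change > cutoff

# Adding a service-user/carer group term improves fit substantially here,
# so this hypothetical item would be flagged (cf. item 65 in the Results).
change, flagged = meaningful_dif(ll_null=-500.0, ll_theta=-420.0,
                                 ll_theta_group=-390.0)
print(round(change, 3), flagged)  # 0.06 True
```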
Items that violated any of the above assumptions were removed, and the remaining items were reanalyzed. We evaluated the reliability of the final scale and the overall fit to the Rasch model. Once a final set of purified questions was calibrated by fitting them to the PCM, we simulated computerized adaptive tests (CATs).[ ] The CAT algorithm conducted stepwise assessments by iteratively selecting the item which would maximise the test information given the participant's current θ estimate, which is in turn based on their previous responses. The first item administered is the item which maximises information at the mean population level of θ.
We simulated CATs to assess the viability of brief assessment, using the final items as a "bank" of candidate items. In computerized adaptive testing, an algorithm is used to select the next most appropriate item for each patient based on their previous responses. This approach has been shown to substantially reduce the length of tick-box assessments whilst maintaining, and even increasing, reliability.[ ]
We simulated CATs using the Firestar script for R. Firestar uses a Bayesian expected a posteriori θ estimator and selected items based on the maximum posterior weighted information (MPWI) criterion. The MPWI selects items based on the item information weighted by the posterior distribution of trait values.[ ] This criterion has been shown to provide excellent measurement information for CAT using polytomous items.
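A simplified sketch of adaptive item selection: the paper's simulations used Firestar's MPWI criterion, but plain maximum Fisher information at the current θ estimate, shown below for binary Rasch items with invented difficulties, conveys the core idea of picking the most informative unadministered item:

```python
import math

def item_info(theta, b):
    """Fisher information of a binary Rasch item with difficulty b."""
    p = 1 / (1 + math.exp(-(theta - b)))
    return p * (1 - p)

def next_item(theta, difficulties, administered):
    """Pick the unadministered item giving the most information at theta."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates, key=lambda i: item_info(theta, difficulties[i]))

bank = [-2.0, -0.5, 0.1, 1.5]       # invented item difficulties
print(next_item(0.0, bank, set()))  # 2: difficulty closest to theta = 0
```

Information for a Rasch item peaks where difficulty equals θ, so the algorithm naturally works down the bank from best- to worst-targeted items, which is why the simulated assessments could stop after 10 items with little loss of precision.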
Analyses were conducted using the R Statistical Computing Environment with the 'mokken', 'mirt', 'lordif', 'psych', 'ggplot2', 'MethComp' and 'BlandAltmanLeh' packages installed.[ ]
Computerized adaptive testing simulation was conducted using the Firestar script, which was modified to add additional statistics. This modified Firestar code is available on request from the authors.
We collected data from 267 mental health service users from the United Kingdom. Sixty-seven participants completed the 67 candidate questionnaire items a second time after two weeks. No data were available on the number of participants who began the survey but did not complete it. Sixteen percent of PREM data were missing. Demographic information is displayed in Table 1.
Mokken analysis revealed violations of monotonicity for a number of items (5, 6, 8, 10, 25, 26, 40). In addition, Loevinger's scalability coefficient was too low (item H < .30) for items 7, 9, and 39. The 57 remaining items were free from violations of monotonicity and were unidimensional. Parallel principal component and factor analysis confirmed the unidimensional structure of the dataset, as the eigenvalues for the second factor/component (2.87, 2.16) were below the simulated eigenvalues in the Monte Carlo dataset (1.50, 1.19).
The remaining items were fitted to the partial credit model. The initial fit to the model was poor (RMSEA > .10), prompting evaluation of item performance in the context of Rasch model assumptions. There was substantial threshold disordering throughout the scale. A single solution was chosen to rescore all items (0-1-1-2-2). The amended threshold probability curves for all items can be seen in S1 Table.
Fig 1. Comparison of item information curves for locally dependent items.
Model fit improved slightly after rescoring but was still unacceptable (RMSEA = .097, 95% CI = .091-.10). We evaluated the correlations between item residuals, which were above the threshold in a number of instances. In total, 96 item pairs were locally dependent (see S2 Table). A total of 27 individual items that displayed local dependency with more than one other item were removed from the analysis. Four pairs of items remained which were locally dependent with one another. The item information curves for each pair of locally-dependent items were compared side-by-side (see Fig 1) and in each case the item with the lowest item information was removed from the scale.
Fig 1 shows item information curves for a pair of locally-dependent items. The amount of information which each item gives about the participant is shown on the y-axis and the level of the underlying construct that the person has (i.e., involvement with physical health care planning) is on the x-axis.
No DIF was detected for age or gender, but item 65 "My care planning team communicates effectively" was found to have meaningful DIF (R² change = .06) between service users and carers.
Following adjustment for category threshold ordering, local dependency, and differential item functioning, item 36 "I feel comfortable attending discussions about my care plan" misfit the model and was removed (χ² = 34.81, df = 15, P = .003). The removal of item 36 left a final item bank of 19 items which were free from breaches of the assumptions of the Rasch model and displayed excellent model fit (χ² = 192.94, df = 1515, P = .02, RMSEA = .03, 95% CI = .01-.04). The 19-item bank had excellent reliability (marginal r = 0.87). Details of the final items, including parameters, are given in Table 2.
PLOS ONE | https://doi.org/10.1371/journal.pone.0206507
The correlation between theta scores at baseline and 2-week follow-up was high (r = .70, P < .01). Bland-Altman analysis revealed that 94.9% of assessment pairs were within the 95% limits of agreement (see Fig 2).
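The limits-of-agreement calculation behind this check can be sketched as follows (the score pairs are invented; the study's actual analysis used R packages):

```python
from statistics import mean, stdev

def bland_altman_limits(t1, t2):
    """95% limits of agreement for paired test-retest scores."""
    diffs = [a - b for a, b in zip(t1, t2)]
    bias = mean(diffs)                # systematic difference between occasions
    spread = 1.96 * stdev(diffs)      # 1.96 x sample SD of the differences
    return bias - spread, bias + spread

# Invented score pairs: zero mean difference gives symmetric limits, and
# retest agreement is judged by the share of pairs falling inside them.
lo, hi = bland_altman_limits([10, 12, 9, 14, 11], [11, 12, 8, 13, 12])
print(round(lo, 2), round(hi, 2))  # -1.96 1.96
```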
Computerized adaptive testing
Fig 2. Bland-Altman plot for test-retest reliability.

Adaptive testing simulations were conducted with a simulated Gaussian N(-0.08, 1.90) distribution, matching the distribution of the data used to develop the item bank. Results of the CAT simulation are shown in Table 3. Assessments as short as 10 items demonstrated high correlation with the total score of the full scale. The overall information and standard error available in the entire item bank are displayed in Fig 3.
We present the co-development and validation of a new service user- and carer-reported assessment of involvement in physical health care planning for people with serious mental illness in mental health services: the EQUIP Physical Health PREM (EQUIP-PH PREM).
The new PREM contains 19 items which were successfully fitted to a single-parameter
Rasch item response theory model. The PREM is suitable for assessing both service users and
carers. The 19 items also serve as an item bank for computerized adaptive testing (CAT) which
can tailor assessments to the individuals who complete the PREM. We show that using CAT
administration could substantially reduce burden of response by reducing the number of
items in the assessment from 19 to 10, whilst still maintaining acceptable accuracy and high
correlation between scores.
Fig 3. Overall scale information and standard error. Key: SE = Standard Error.
In the EQUIP-PH PREM we provide a tool to support investigations into the experience of service users and carers who are receiving care for a severe mental illness from mental health providers. Adequate service user and carer involvement in care planning decisions is predicated on successful interaction both within and between stakeholder groups. In order to ensure the new PREM incorporated these important aspects, the items were developed in collaboration with service users, carers, and mental health professionals.
The final PREM items include those that cover having the opportunity and time to discuss physical health concerns, reflecting previously identified organisational barriers to providing integrated care.[ ] Similarly, they highlight the importance of co-created care plans, which are known to be highly valued by both service users and carers,[ ] and serve to facilitate the long-term self-management skills required to manage physical health.
This new PREM operationalizes the evidence-based best practice framework developed previously, allowing health care providers and service users to challenge current practice by quantifying service user and carer involvement from the user perspective.[ ] The measure will facilitate benchmarking of service quality and service user experience, aligned with contemporary philosophies and policies for collaborative, recovery-focused mental health care. The philosophy of the new PREM is that mental and physical health are equally important (so-called parity of esteem), and parity of esteem is increasingly being embedded in policy and practice imperatives derived from stakeholder consultation.[ ]
The EQUIP-PH PREM assesses issues which have been consistently highlighted in consultations with service users and carers and, as such, is well suited for use as a tool to assess the outcome of interventions. Other relevant interventions include those designed to improve inter- and intra-professional communication, including professional training and improved health systems to enhance the integration and continuity of care for those under the care of mental health services.
The current study has some limitations. Firstly, our dataset consisted of predominantly white, female service users. Though all systematic differences between demographic groups were corrected for in the current analysis, further research would be warranted to ensure that the items perform well in groups which were not well represented in our data. It should be noted that whilst we demonstrated uniform scale performance across demographic groups, including service users and carers, we did not collect information relating to comorbidities, physical activity, or substance use, and further research would be necessary to explicitly confirm that the scale is unaffected by differences in disease or lifestyle factors within groups of service users.
Our study is also limited by the necessity to evaluate the CATs using simulated, rather than actual, data. This technique is likely to slightly over-estimate the accuracy of the CAT, as it does not take into account aberrant responders who do not conform to the expectations of the model. Our previous research developing item banks for depression and quality of life suggests that this effect is marginal and that CAT assessment is efficient and precise both when simulations are made using participant data and when the CAT is deployed in the real world.[ ]
It is noteworthy that when administering CATs, each individual respondent is likely to complete a different combination of items forming a subset of the complete item bank. Though the scores from the unidimensional CAT and the fixed-length short-form are highly correlated, there is no guarantee that every patient will complete items from each of the content domains which were nominated by service users and carers. In the current manuscript, we prioritize brevity and accuracy and simulate CAT administration without content balancing or prioritizing certain items. We acknowledge that other users may prioritize item exposure and thus may utilize CATs differently.
Parties who wish to use CAT administration for the EQUIP-PH PREM are directed towards the many packages available for the R Statistical Programming Environment, including mirt and catR.[ ] One tool for implementing CATs is the Concerto platform, developed and maintained by the University of Cambridge. Further details can be found on the Concerto website (concertoplatform.com) or by request to the authors of this manuscript.
Our study also has some notable strengths. We have collected data from a geographically diverse group of both service users and carers and created a flexible assessment which can be used without modification for assessing and comparing both groups. The EQUIP-PH PREM which we have developed is related to the EQUIP measure, a questionnaire measure of service user and carer involvement in care planning, which was recently developed by our group.[ ] The two tools could be used together to gain a holistic understanding of how involved service users and carers are in mental health care planning. Further research could usefully be conducted to understand the scores from the two instruments in relation to one another and provide further insight into their use as tools to assess global care planning and service delivery.
In conclusion, the EQUIP-PH PREM is a brief, accurate, and flexible service user- and carer-reported assessment of involvement in physical health care planning for users of mental health services with serious mental illnesses. The measure provides a reliable means to evaluate and benchmark the quality of physical health management in the context of mental health services.
S1 Table. Anonymized baseline dataset.
S2 Table. Anonymized follow up dataset.
This research was funded by the National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care (NIHR CLAHRC) Greater Manchester. The views expressed in this article are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Conceptualization: Chris J. Sidey-Gibbons, Helen Brooks, Judith Gellatly, Karina Lovell.
Data curation: Helen Brooks, Judith Gellatly, Nicola Small, Penny Bee.
Formal analysis: Chris J. Sidey-Gibbons, Helen Brooks, Nicola Small, Karina Lovell, Penny Bee.
Funding acquisition: Karina Lovell.
Investigation: Chris J. Sidey-Gibbons.
Methodology: Chris J. Sidey-Gibbons, Penny Bee.
Project administration: Judith Gellatly, Karina Lovell, Penny Bee.
Resources: Chris J. Sidey-Gibbons, Karina Lovell, Penny Bee.
Visualization: Chris J. Sidey-Gibbons.
Writing ? original draft: Chris J. Sidey-Gibbons, Helen Brooks, Judith Gellatly, Penny Bee.
Writing ? review & editing: Chris J. Sidey-Gibbons, Helen Brooks, Judith Gellatly, Nicola
Small, Karina Lovell, Penny Bee.
1. Harris EC , Barraclough B . Excess mortality of mental disorder . Br J Psychiatry. The Royal College of Psychiatrists; 1998 ; 173 : 11 - 53 . https://doi.org/10.1192/BJP.173.1.11 PMID: 9850203
2. Walker ER , McGee RE , Druss BG . Mortality in Mental Disorders and Global Disease Burden Implications. JAMA Psychiatry . American Medical Association; 2015 ; 72 : 334 . https://doi.org/10.1001/ jamapsychiatry. 2014 .2502 PMID: 25671328
3. Rodgers M , Dalton J , Harden M , Street A , Parker G . Integrated care to address the physical health needs of people with severe mental illness: a rapid review . 2016 ; Available: https://www.ncbi.nlm.nih. gov/books/NBK355962/
4. Brown S , Birtwistle J , Roe L , Thompson C. The unhealthy lifestyle of people with schizophrenia . Psychol Med . 1999 ; 29 : 697 - 701 . https://doi.org/10.1017/S0033291798008186 PMID: 10405091
5. Doyle C , Lennox L , open DB -B, 2013 undefined. A systematic review of evidence on the links between patient experience and clinical safety and effectiveness . bmjopen.bmj.com. Available: http://bmjopen. bmj.com/content/3/1/e001570.short
6. Care Quality Commission . Survey of Mental Health Inpatient Services . 2009 .
7. Bee P , Brooks H , Fraser C , Lovell K. Professional perspectives on service user and carer involvement in mental health care planning: a qualitative study . Int J Nurs Stud . 2015 ; 52 : 1834 - 1835 . Available: http://www.sciencedirect.com/science/article/pii/S0020748915002308 https://doi.org/10.1016/j. ijnurstu. 2015 . 07 .008 PMID: 26253574
8. Fiorillo A , Luciano M , Del Vecchio V , Sampogna G , Obradors-Tarrago ? C, Maj M . Priorities for mental health research in Europe: A survey among national stakeholders' associations within the ROAMER project . World Psychiatry. Wiley-Blackwell; 2013 ; 12 : 165 - 170 . https://doi.org/10.1002/wps.20052 PMID: 23737426
9. Bee P, Gibbons C, Callaghan P, Fraser C, Lovell K. Evaluating and Quantifying User and Carer Involvement in Mental Health Care Planning (EQUIP): co-development of a new patient-reported outcome measure. PLoS One. 2016;11:e0149973. https://doi.org/10.1371/journal.pone.0149973 PMID: 26963252
10. Gibbons CJ, Bee PE, Walker L, Price O, Lovell K. Service user- and carer-reported measures of involvement in mental health care planning: methodological quality and acceptability to users. Front Psychiatry. 2014;5:178. https://doi.org/10.3389/fpsyt.2014.00178 PMID: 25566099
11. Gibbons C, Bower P, Lovell K, Valderas J, Skevington S. Electronic quality of life assessment using computer-adaptive testing. J Med Internet Res. 2016;18:e240. https://doi.org/10.2196/jmir.6053 PMID: 27694100
12. Small N, Brooks H, Grundy A, Pedley R, Gibbons C, Lovell K, et al. Understanding experiences of and preferences for service user and carer involvement in physical health care discussions within mental health care planning. BMC Psychiatry. 2017;17:138. https://doi.org/10.1186/s12888-017-1287-1 PMID: 28407746
13. Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Danish Institute for Educational Research; 1960.
14. Masters GN. A Rasch model for partial credit scoring. Psychometrika. 1982;47:149-174. https://doi.org/10.1007/BF02296272
15. Mokken RJ. A Theory and Procedure of Scale Analysis: With Applications in Political Research. Walter de Gruyter; 1971. Available: https://books.google.com/books?hl=en&lr=&id=vAumIrkzYj8C&pgis=1
16. Meijer RR, Sijtsma K, Smid NG. Theoretical and empirical comparison of the Mokken and the Rasch approach to IRT. Appl Psychol Meas. 1990;14:283-298. https://doi.org/10.1177/014662169001400306
17. Stochl J, Jones PB, Croudace TJ. Mokken scale analysis of mental health and well-being questionnaire item responses: a non-parametric IRT method in empirical research for applied health researchers. BMC Med Res Methodol. 2012;12:74. https://doi.org/10.1186/1471-2288-12-74 PMID: 22686586
18. Loe BS, Stillwell D, Gibbons C. Computerized adaptive testing provides reliable and efficient depression measurement using the CES-D scale. J Med Internet Res. 2017;19:e302. https://doi.org/10.2196/jmir.7453 PMID: 28931496
19. Gibbons CJ, Small N, Rick J, Burt J, Hann M, Bower P. The Patient Assessment of Chronic Illness Care produces measurements along a single dimension: results from a Mokken analysis. Health Qual Life Outcomes. 2017;15:61. https://doi.org/10.1186/s12955-017-0638-4 PMID: 28376878
20. Pallant J, Tennant A. An introduction to the Rasch measurement model: an example using the Hospital Anxiety and Depression Scale (HADS). Br J Clin Psychol. 2007;46:1-18. PMID: 17472198
21. Guttman L. The basis for scalogram analysis. 1949. Available: https://scholar.google.co.uk/scholar?hl=en&q=The+basis+for+Scalogram+analysis&btnG=&as_sdt=1%2C5&as_sdtp=#0
22. van Schuur WH. Mokken scale analysis: between the Guttman scale and parametric item response theory. Polit Anal. 2003;11:139-163. https://doi.org/10.1093/pan/mpg002
23. Watkins MW. Determining parallel analysis criteria. J Mod Appl Stat Methods. 2005;5:344-346. https://doi.org/10.22237/jmasm/1162354020
24. Abdi H, Williams LJ. Principal component analysis. Wiley Interdiscip Rev Comput Stat. 2010;2:433-459. https://doi.org/10.1002/wics.101
25. Browne MW, Cudeck R. Alternative ways of assessing model fit. In: Bollen KA, Long JS, editors. Testing Structural Equation Models. Newbury Park, CA: Sage; 1993. pp. 136-162.
26. Gibbons C, Bower P, Lovell K, Valderas J, Skevington S. Electronic quality of life assessment using computer-adaptive testing. J Med Internet Res. 2016;18:e240. https://doi.org/10.2196/jmir.6053 PMID: 27694100
27. Hambleton R, Swaminathan H, Rogers H. Fundamentals of Item Response Theory. Newbury Park, CA: Sage; 1991.
28. Wright B. Local dependency, correlations and principal components. Rasch Meas Trans. 1996;10:509-511.
29. Yen WM. Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Appl Psychol Meas. 1984;8:125-145. https://doi.org/10.1177/014662168400800201
30. Christensen KB, Makransky G, Horton M. Critical values for Yen's Q3: identification of local dependence in the Rasch model using residual correlations. Appl Psychol Meas. 2017;41:179-194.
31. Lundgren Nilsson Å, Tennant A. Past and present issues in Rasch analysis: the functional independence measure (FIM™) revisited. J Rehabil Med. 2011. Available: http://europepmc.org/abstract/med/21947180
32. Holland P, Wainer H. Differential Item Functioning. Hillsdale, NJ: Lawrence Erlbaum Associates; 2012.
33. Teresi J. Different approaches to differential item functioning in health applications: advantages, disadvantages and some neglected topics. Med Care. 2006. Available: http://journals.lww.com/lww-medicalcare/Abstract/2006/11001/Different_Approaches_to_Differential_Item.21.aspx
34. Chalmers R. mirt: a multidimensional item response theory package for the R environment. J Stat Softw. 2012. Available: https://scholar.google.co.uk/scholar?hl=en&q=Chalmers+MIRT&btnG=&as_sdt=1%2C5&as_sdtp=#0
35. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007;45:S22-S31. https://doi.org/10.1097/01.mlr.0000250483.85507.04 PMID: 17443115
36. Wainer H, Dorans N, Flaugher R, Green B. Computerized Adaptive Testing: A Primer. 2000. Available: https://books.google.co.uk/books?hl=en&lr=&id=73d9AwAAQBAJ&oi=fnd&pg=PP1&dq=Computerized+adaptive+testing:+A+Primer&ots=OtILaYiPRd&sig=y5BNptGGWjRqnwE3G9P4miUlzXo
37. Choi SW, Swartz RJ. Comparison of CAT item selection criteria for polytomous items. Appl Psychol Meas. 2009;33:419-440. https://doi.org/10.1177/0146621608327801 PMID: 20011456
38. van der Ark L. Mokken scale analysis in R. J Stat Softw. 2007;20:1-19. Available: http://www.jstatsoft.org/v20/a11/paper
39. Choi S, Gibbons L, Crane P. lordif: an R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo. J Stat Softw. 2011. Available: http://www.ncbi.nlm.nih.gov/pmc/articles/pmc3093114/
40. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria. Available: http://www.r-project.org
41. Revelle W. psych: procedures for personality and psychological research. Evanston, IL: Northwestern University; 2014. Available: https://scholar.google.co.uk/scholar?hl=en&q=psych%3A+Procedures+for+Personality+and+Psychological+Research&btnG=&as_sdt=1%2C5&as_sdtp=#0
42. Carstensen B. The MethComp package for R. In: Comparing Clinical Measurement Methods. Chichester, UK: John Wiley & Sons; 2010. pp. 149-152. https://doi.org/10.1002/9780470683019.ch13
43. Choi S. Firestar: computerized adaptive testing simulation program for polytomous item response theory models. Appl Psychol Meas. 2009. Available: http://media.metrik.de/uploads/incoming/pub/Literatur/2009_FirestarComputerizedAdaptiveTestingSimulationProgramforPolytomousItemResponseTheoryModels,Choi.pdf
44. Grundy AC, Bee P, Meade O, Callaghan P, Beatty S, Olleveant N, et al. Bringing meaning to user involvement in mental health care planning: a qualitative exploration of service user perspectives. J Psychiatr Ment Health Nurs. 2016;23:12-21. https://doi.org/10.1111/jpm.12275 PMID: 26634415
45. Cree L, Brooks HL, Berzins K, Fraser C, Lovell K, Bee P. Carers' experiences of involvement in care planning: a qualitative exploration of the facilitators and barriers to engagement with mental health services. BMC Psychiatry. 2015;15:208. https://doi.org/10.1186/s12888-015-0590-y PMID: 26319602
46. Millard C, Wessely S. Parity of esteem between mental and physical health. BMJ. 2014;349:g6821. https://doi.org/10.1136/bmj.g6821 PMID: 25398394
47. Magis D, Raîche G. catR: an R package for computerized adaptive testing. Appl Psychol Meas. 2011. Available: http://apm.sagepub.com/content/35/7/576.short
48. Psychometrics Centre. Concerto Adaptive Testing Platform. Cambridge: University of Cambridge; 2013.