Verifying the validity and reliability of the Japanese version of the Face, Legs, Activity, Cry, Consolability (FLACC) Behavioral Scale
Yujiro Matsuishi 0 1
Haruhiko Hoshino 0 1
Nobutake Shimojo 0 1
Yuki Enomoto 0 1
Takahiro Kido 0 1
Tetsuya Hoshino 0 1
Masahiko Sumitani 1
Yoshiaki Inoue 0 1
0 Department of Emergency and Critical Care Medicine, Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan, 2 Pediatric Intensive Care Unit, University of Tsukuba Hospital, Tsukuba, Ibaraki, Japan, 3 University of Tsukuba Hospital, Department of Pediatrics, Tsukuba, Ibaraki, Japan, 4 Department of Anesthesiology and Pain Relief Center, University of Tokyo Hospital, Tokyo, Japan
1 Editor: Kazutaka Ikeda, Tokyo Metropolitan Institute of Medical Science, JAPAN
Pediatric patients, especially in the preverbal stage, cannot self-report pain intensity; therefore, several validated observational tools, including the Face, Legs, Activity, Cry, Consolability (FLACC) Behavioral Scale, have been used as a benchmark to evaluate pediatric pain. Unfortunately, this scale is currently unavailable in Japanese, precluding its widespread use in Japanese hospitals.
Data Availability Statement: All relevant data are
within the paper and its Supporting Information
Funding: This study was supported by the Health
Labour Science Research Grant from the Japanese
Ministry of Health, Labour and Welfare
(H26-Kakushintekigan-ippan-060) to MS. The funder had
no role in study design, data collection and
analysis, decision to publish, or preparation of the
manuscript.
To translate and verify the validity and reliability of the Japanese version of the FLACC
Behavioral Scale.
Back-translation was first conducted by eight medical researchers; then an available sample
of patients at the University of Tsukuba Pediatric Intensive Care Unit (from May 2017 to
August 2017) was enrolled in a clinical study. Two researchers evaluated the reliability of the
translated FLACC Behavioral Scale by weighted kappa coefficient and intraclass correlation
coefficients (ICC). Observational pain was simultaneously measured by the visual analog
scale (VAS obs), and criterion validity was evaluated by correlation analysis.
The original author approved the translation. For the clinical study, a total of 121
observations were obtained from 24 pediatric patients. Agreement between observers was high
for each of the FLACC categories (Face: κ = 0.85, Legs: κ = 0.74, Activity:
κ = 0.89, Cry: κ = 0.93, Consolability: κ = 0.93) as well as for the total score (Total: κ = 0.95).
Correlation analysis demonstrated good criterion validity between the FLACC scale
and the VAS obs (r = 0.96).
Competing interests: The authors have declared
that no competing interests exist.
Our Japanese version of the FLACC Behavioral Scale shows high validity and reliability.
Relief of pain is a basic human right regardless of expressive ability. Of concern,
several studies have reported that patients in the pediatric intensive care unit (PICU)
require more invasive procedures than those in the general ward. Additionally, painful
procedures such as heel sticks and venous and arterial punctures are frequently performed in
the PICU, which would logically indicate higher pain levels in these settings. However, pediatric
nurses are often challenged to identify pain at the preverbal developmental stage, and efforts to
do so are further complicated in critically ill patients undergoing sedation and mechanical
ventilation. To address this problem, several validated observational tools, including the Face, Legs,
Activity, Cry, Consolability (FLACC) Behavioral Scale, have been developed for pediatric
patients in intensive care settings. The FLACC Behavioral Scale has the advantages of
wide recognition and distribution (it is available in several languages), and previous studies
have reported high reliability and validity in assessing acute pain in pediatric patients.
However, to date, reliable assessment tools for detecting pediatric pain, such as
the FLACC Behavioral Scale, have been unavailable in Japanese hospitals due to language
barriers. Thus, the aims of the present study were to translate the FLACC Behavioral Scale using
the back-translation method and to analyze the reliability and validity of this new Japanese
version.
Prior to the beginning of the study, written permission to translate the FLACC Behavioral
Scale was obtained from the developer (Ms. Sandra Merkel) and we received an Academic/Non-Profit
license from the University of Michigan. Translation was conducted using the back-translation method, which is widely accepted as a way to preserve the wording and meaning of the original in the translated version. The translation process of the FLACC Behavioral Scale was as follows (Fig 1).
In the first step, the principal researcher created a tentative English-to-Japanese version.
Next, we submitted this tentative version to a second set of translators consisting of a
Japanese nurse who had worked in the U.S. and a native speaker of American English. In the
third step, eight medical workers (two clinical researchers, two intensive care
physicians, two pediatricians and two nurses working in the PICU) discussed the differences
observed in all individual translations, back translated the document from English to Japanese,
and then resubmitted it to the translators described above. For consistency in translation and
to reduce variability between multi-disciplinary medical staff, the eight local medical
workers carefully checked for any differences between the original and back-translated
versions. Every effort was made to execute all the steps carefully in order to avoid loss of
the original content due to cultural differences. After completion, the final document was
checked and approved by the original author (Ms. Sandra Merkel). Technical details of this
process are described in our previous reports.
Fig 1. Back translation method. Flow of the back translation method used to translate the Face, Legs, Activity, Cry, Consolability (FLACC) Behavioral Scale.
The second and third translation steps described above were repeated once.
Although minor changes between the tentative and completed versions were needed to address nuances in Japanese meaning, there were no major changes. The completed version was checked and confirmed by the original author and posted on a website.
Validation and reliability study
We performed a validation and reliability study using our newly established Japanese version
of the FLACC Behavioral Scale. We enrolled patients from the PICU at the
University of Tsukuba Hospital every Wednesday from May to August 2017, and we excluded
patients receiving muscle relaxants. We recorded baseline characteristics, including age, sex,
diagnosis for PICU admission, ventilation status, withdrawal syndrome as assessed by the
Withdrawal Assessment Tool-Version 1 (WAT-1), delirium as assessed by the Cornell
Assessment of Pediatric Delirium (CAPD) and severity as calculated by the Pediatric Index of
Mortality 2 (PIM2). Evaluation with the FLACC Behavioral Scale was done by two
researchers, who also objectively and simultaneously measured pain using the observational visual
analog scale (VAS obs) for each patient. VAS obs is a method in which observers estimate a subject's
symptoms by observation. The use of VAS obs in neonates and children has been reported previously,
and the correlation between the FLACC Behavioral Scale and VAS obs was measured by correlation
analysis. According to Guilford's Rule of Thumb, we considered correlation coefficients of
less than 0.20 as "slight, almost negligible relationship"; 0.20 to 0.40 as "low correlation"; 0.40
to 0.70 as "moderate correlation"; 0.70 to 0.90 as "high correlation"; and greater than 0.90 as "very
high correlation". The main researcher was blinded to the other researcher's scores, and VAS obs was
evaluated before the FLACC Behavioral Scale to reduce bias.
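The interpretation bands quoted above can be written as a small lookup. The following is a minimal sketch, not part of the study's analysis: the function name `guilford_label` is hypothetical, and because the text does not say which band a shared boundary value belongs to, the code assumes 0.20, 0.40 and 0.70 fall in the higher band while 0.90 stays in "high correlation" (the text reserves "very high" for values greater than 0.90).

```python
def guilford_label(r: float) -> str:
    """Return the Guilford rule-of-thumb label for a correlation coefficient.

    Boundary handling is an assumption: 0.20, 0.40 and 0.70 are placed in the
    higher band, and 0.90 is treated as "high" because the text says
    "greater than 0.90" for the top band.
    """
    magnitude = abs(r)  # the bands describe strength, not direction
    if not 0.0 <= magnitude <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude < 0.20:
        return "slight, almost negligible relationship"
    if magnitude < 0.40:
        return "low correlation"
    if magnitude < 0.70:
        return "moderate correlation"
    if magnitude <= 0.90:
        return "high correlation"
    return "very high correlation"


print(guilford_label(0.96))
```

Under these assumptions, a coefficient of 0.96 is labelled "very high correlation".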
Adequate sample size and variability change depending on the cohort. Thus, we calculated the
required sample size based on previously published reliability data. Based on this previous
work, agreement between observers was taken as an estimate of strong correlation (r = 0.7).
We determined that a sample size of 17 patients would be required for a significance level (α) of 0.05 and test power (1-β) of 0.90.
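The text does not state the formula behind the figure of 17 patients. One common approach for a correlation coefficient is the Fisher z approximation, sketched below. This is an assumption about the method, not a description of the authors' calculation; the function name and the choice of a two-sided α are illustrative. With r = 0.7, α = 0.05 and power = 0.90, the approximation gives about 16.97, which rounds up to 17.

```python
import math
from statistics import NormalDist


def sample_size_for_correlation(r: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate N needed to detect correlation r (two-sided test, Fisher z method)."""
    if not 0.0 < abs(r) < 1.0:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    fisher_z = 0.5 * math.log((1 + abs(r)) / (1 - abs(r)))  # Fisher transform of r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


print(sample_size_for_correlation(0.7))  # 17
```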
Agreement between observers for each of the five FLACC categories was evaluated by the weighted
Cohen's kappa coefficient, which is commonly used for summarizing the cross-classification of
ordinal variables with identical categories. It allows the use of weights to describe the
closeness of agreement between categories. We additionally examined inter-rater agreement
(concordance) by the widely used intraclass correlation coefficient (ICC), which has 10
forms that can be chosen based on purpose. For this study, we selected the
two-way random-effects model (absolute agreement with multiple raters/measurements, ICC (2, k))
to generalize our reliability results.
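To make the weighting idea concrete, the sketch below computes Cohen's kappa with linear disagreement weights for two raters. It is illustrative only: the study used SPSS, the text does not state which weighting scheme was applied (linear weights are assumed here), and the function name and example scores are hypothetical. Each FLACC category is scored 0 to 2, so `n_levels` defaults to 3.

```python
from collections import Counter


def linear_weighted_kappa(rater_a, rater_b, n_levels=3):
    """Cohen's kappa with linear disagreement weights |i - j| / (n_levels - 1)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of observations")
    n = len(rater_a)
    max_distance = n_levels - 1
    # Observed weighted disagreement, averaged over all paired ratings.
    observed = sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / (n * max_distance)
    # Expected weighted disagreement if the raters were independent,
    # computed from each rater's marginal score frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        count_a[i] * count_b[j] * abs(i - j) for i in count_a for j in count_b
    ) / (n * n * max_distance)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single identical score")
    return 1.0 - observed / expected


# Hypothetical scores for one FLACC category from two observers.
observer_1 = [0, 1, 2, 0, 1, 2]
observer_2 = [0, 1, 2, 0, 1, 1]
print(round(linear_weighted_kappa(observer_1, observer_2), 3))  # 0.8
```

For the total score, which ranges from 0 to 10, the same function could be called with `n_levels=11`.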
To assess criterion validity, agreement between VAS obs and the FLACC Behavioral
Scale was evaluated by correlation analysis. All statistical analyses were performed using SPSS version 24 (SPSS, Inc., Chicago, IL). P values under 0.05 were considered statistically significant.
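The text does not specify which correlation coefficient was computed. As a hedged illustration, the sketch below implements Pearson's product-moment coefficient, which is one common choice; a rank-based coefficient would be an equally plausible reading. The function name and the example values are hypothetical and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [value - mean_x for value in x]
    dev_y = [value - mean_y for value in y]
    covariation = sum(a * b for a, b in zip(dev_x, dev_y))
    spread_x = sum(a * a for a in dev_x)
    spread_y = sum(b * b for b in dev_y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical paired scores (not study data).
flacc_total = [1, 2, 3, 4]
vas_obs = [1, 3, 2, 4]
print(round(pearson_r(flacc_total, vas_obs), 2))  # 0.8
```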
This study was approved by the Institutional Review Board (IRB) of the University of Tsukuba
Hospital, and written informed consent was obtained from patients or legally designated representatives (such as family) prior to the study.
From May to August 2017, a total of 121 observations were obtained from 24 pediatric patients. Table 1 presents baseline patient characteristics.
The median age at enrollment was 38 months (± 47), 45% of the patients were male and
50% of the patients received at least one day of mechanical ventilation. The PIM2
average was 1.6 (± 5.4) and the prevalence of delirium was 30%. No withdrawal syndrome was
noted in any patient. The primary medical diagnosis for PICU admission was cardiac surgery
(45%).
Agreement between observers was high for each of the FLACC categories (Face:
κ = 0.85, 95%CI [0.73–0.96], Legs: κ = 0.74, 95%CI [0.55–0.94], Activity: κ = 0.89, 95%CI
[0.73–1.0], Cry: κ = 0.93, 95%CI [0.8–1.0], Consolability: κ = 0.93, 95%CI [0.8–1.0]) as well as
for the total score (Total: κ = 0.95, 95%CI [0.91–0.98]). The categories of Cry and Consolability showed
the highest agreement between observers. The reliability of the FLACC Behavioral Scale was
slightly higher in patients who did not receive mechanical ventilation than in those who did
(Non-Mechanical Ventilation group: κ = 0.93, 95%CI [0.86–1.0] vs. Mechanical Ventilation
group: κ = 0.91, 95%CI [0.83–0.99]). Inter-rater agreement, as evaluated by ICC (2, k)
calculations, returned a similar result to Cohen's weighted kappa coefficient (Table 2).
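For readers unfamiliar with ICC (2, k), the sketch below shows how the two-way random-effects, absolute-agreement, average-measures coefficient can be computed from the ANOVA mean squares, following the standard definition in the Shrout and Fleiss and the McGraw and Wong formulations. It is an illustration under stated assumptions, not the SPSS procedure used in the study; the function name and example scores are hypothetical.

```python
def icc_2k(scores):
    """ICC(2, k): two-way random effects, absolute agreement, average of k raters.

    `scores` is a list of rows, one per observation, each holding k rater scores.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 observations and 2 raters, with complete rows")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sums of squares for observations (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((value - grand_mean) ** 2 for row in scores for value in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when there is no variance in the scores")
    return (ms_rows - ms_error) / denominator


# Hypothetical total scores from two observers on three observations.
paired_scores = [[1, 2], [3, 4], [5, 6]]
print(round(icc_2k(paired_scores), 3))  # 0.941
```

In this example the second observer scores one point higher every time, so the absolute-agreement form penalises the systematic offset and the coefficient falls below 1 even though the rank order is identical.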
The FLACC Behavioral Scale score correlated very highly with VAS obs (r = 0.96).
Table 2 (values not recovered). Groups: total number of observations, N = 121; with mechanical ventilation, N = 70; without mechanical ventilation, N = 51.
a: Data are kappa coefficient [95% confidence interval]
b: Data are intraclass correlation coefficient [95% confidence interval]
Fig 2. Criterion validity of the Japanese version of the FLACC Behavioral Scale. Correlation analysis between the observational visual analog scale (VAS obs) and the FLACC
Behavioral Scale. The FLACC Behavioral Scale score correlated significantly with VAS obs (r = 0.96).
The correlation was very high in both mechanically and non-mechanically ventilated patients
(Non-Mechanical Ventilation group: r = 0.96, Mechanical Ventilation group: r = 0.95).
The present study is the first to translate the FLACC Behavioral Scale from English to Japanese
by using the back-translation method. Because a previous study noted that direct translation
does not guarantee sufficient equivalency, we used the back-translation method
and included a multi-disciplinary committee to remedy content variance. Of particular
concern were medical terms and delicate nuances that might be hard for laypeople to understand, so
we chose as translators a Japanese nurse with certification and work experience in the U.S. as well as a native
speaker of American English. Additionally, we performed a criterion validation and reliability
study for the completed translation. As language barriers often prevent useful medical
evaluation standards from being propagated internationally, we hope that our present method can
be applied to other medical translation efforts. In the original study, the FLACC Behavioral
Scale showed a high correlation between observers (r = 0.92); however, diverse studies have
shown reliability ranging widely from moderate to high [20–22]. In this report, we show that our
Japanese version has both high criterion validity and reliability in assessing pain in
PICU patients. A previous study showed that the Cry category correlated poorly with other
categories, most likely because of intubation. For the Cry category, our results show high reliability (κ = 1.0,
ICC = 1) in mechanically ventilated patients and relatively low reliability in non-mechanically
ventilated patients (κ = 0.65, ICC = 0.79). This might be attributed to translation errors or
cohort differences. As for translation, there are no cultural differences in the concept or
language of crying between English and Japanese, so this could be ruled out. However, the fact
that the primary diagnosis category of participants was cardiac surgery (45%) leads to the
assumption that patients in need of mechanical ventilation might have a more severe condition
that requires sedation. Thus, they are not vigorous enough to cry and are therefore more
difficult to accurately assess in comparison with non-mechanically ventilated patients.
Correlation analysis demonstrated solid criterion validity between the FLACC scale
and the VAS obs (r = 0.96). In previous studies, the FLACC Behavioral Scale was compared
with other observational behavioral pain scales such as the Children's Hospital of Eastern Ontario
Pain Scale (CHEOPS), the Children's and Infants Post Operative Pain Scale (CHIPPS), and the
Objective Pain Scale (OPS) [20,23]. However, as Japanese hospitals do not currently use any of these observational scales, we chose the VAS obs, which is considered a simple assessment scale. Our present results are in line with the original author's results.
Our findings were limited by the use of a non-randomized participant pool that was chosen
primarily by availability during the study period, which may reduce the generalizability of our
findings. Additionally, in some measurements patient pain could not be estimated
because of the response to clinical emergencies. We included various diagnostic categories
to reflect intensive care settings, but the resulting sample sizes might be insufficient for
analyzing specific cohorts within each diagnostic condition.
We established a novel Japanese version of the Face, Legs, Activity, Cry, Consolability (FLACC) Behavioral Scale through back-translation and tested it clinically in patients in our PICU. High criterion validity and reliability were confirmed through our prospective study.
S1 File. This file contains all the data reported in the results.
This study was supported by the Health Labour Science Research Grant from the Japanese
Ministry of Health, Labour and Welfare (H26-Kakushintekigan-ippan-060) to MS. The funders
had no role in study design, data collection and analysis, decision to publish, or preparation
of the manuscript. We would like to thank Dr. Bryan J. Mathis of the University of Tsukuba
Medical English Communication Center for critical reading of this manuscript.
Conceptualization: Yujiro Matsuishi.
Data curation: Haruhiko Hoshino.
Formal analysis: Nobutake Shimojo.
Investigation: Yujiro Matsuishi, Haruhiko Hoshino, Takahiro Kido, Tetsuya Hoshino.
Methodology: Yujiro Matsuishi.
Project administration: Yujiro Matsuishi, Nobutake Shimojo, Yoshiaki Inoue.
Resources: Nobutake Shimojo.
Software: Yujiro Matsuishi, Yuki Enomoto.
Validation: Yuki Enomoto.
Visualization: Yujiro Matsuishi.
Writing – original draft: Yujiro Matsuishi.
Supervision: Nobutake Shimojo, Yuki Enomoto, Masahiko Sumitani, Yoshiaki Inoue.
1. Barker DP, Rutter N. Exposure to invasive procedures in neonatal intensive care unit admissions. Arch Dis Child Fetal Neonatal Ed. 1995;72(1):F47–8. PMID: 7743285
2. Carbajal R, Rousset A, Danan C. Epidemiology and treatment of painful procedures in neonates in intensive care units. JAMA. 2008;300(1):60–70. https://doi.org/10.1001/jama.300.1.60 PMID: 18594041
3. Merkel SI, Voepel-Lewis T, Shayevitz JR, Malviya S. The FLACC: a behavioral scale for scoring postoperative pain in young children. Pediatr Nurs. 23(3):293–7. PMID: 9220806
4. Kabes AM, Graves JK, Norris J. Further validation of the nonverbal pain scale in intensive care patients. Crit Care Nurse. 2009;29(1):59–66. https://doi.org/10.4037/ccn2009992 PMID: 19182281
5. Matsuishi Y, Hoshino H, Shimojo N, Enomoto Y, Kido T, Jesmin S, et al. Development of the Japanese version of the Preschool Confusion Assessment Method for the ICU. Acute Med Surg. 2017;1–4.
6. Matsuishi Y. Japanese version of The Face, Legs, Activity, Cry, Consolability (FLACC) Behavioral Scale. http://www.md.tsukuba.ac.jp/clinical-med/e-ccm/_src/317/FLACC_Japanese_HP.pdf
7. Franck LS, Harris SK, Soetenga DJ, Amling JK, Curley MAQ. The Withdrawal Assessment Tool–1 (WAT–1): An assessment instrument for monitoring opioid and benzodiazepine withdrawal symptoms in pediatric patients*. Pediatr Crit Care Med. 2008;9(6):573–80. https://doi.org/10.1097/PCC.0b013e31818c8328 PMID: 18838937
8. Traube C, Silver G, Kearney J, Patel A, Atkinson TM, Yoon MJ, et al. Cornell Assessment of Pediatric Delirium. Crit Care Med. 2014;42(3):656–63. https://doi.org/10.1097/CCM.0b013e3182a66b76 PMID: 24145848
9. Slater A, Shann F, Pearson G. PIM2: a revised version of the Paediatric Index of Mortality. Intensive Care Med. 2003;29(2):278–85. https://doi.org/10.1007/s00134-002-1601-2 PMID: 12541154
10. Lawrence J, Alcock D, McGrath P, Kay J, MacMurray SB, Dulberg C. The development of a tool to assess neonatal pain. Neonatal Netw. 1993 Sep;12(6):59–66. PMID: 8413140
11. LaMontagne LL, Johnson BD, Hepworth JT. Children's ratings of postoperative pain compared to ratings by nurses and physicians. Issues Compr Pediatr Nurs. 2018;14(4):241–7.
12. Guilford JP. Fundamental statistics in psychology and education. New York: McGraw Hill; 1956. 244 p.
13. Voepel-Lewis T, Zanotti J, Dammeyer JA, Merkel S. Reliability and validity of the face, legs, activity, cry, consolability behavioral tool in assessing acute pain in critically ill patients. Am J Crit Care. 2010;19(1):55–61. https://doi.org/10.4037/ajcc2010624 PMID: 20045849
14. Hulley SB, Cummings SR, Browner WS, Grady D N T. Designing clinical research: an epidemiologic approach. 4th ed. Lippincott Williams & Wilkins; 2013. 79 p.
15. Warrens MJ. Cohen's linearly weighted kappa is a weighted average. Adv Data Anal Classif. 2012 Apr 29;6(1):67–79.
16. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. Elsevier B.V.; 2016;15(2):155–63.
17. McGraw KO, Wong S. Forming inferences about some intraclass correlation coefficients. 1st ed. Psychol Methods; 1996. 30–46 p.
18. Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–8. PMID: 18839484
19. Brislin RW. Back-translation for cross-cultural research. J Cross Cult Psychol. 1970;1:185–216.
20. Bringuier S, Picot MC, Dadure C, Rochette A, Raux O, Boulhais M, et al. A prospective comparison of post-surgical behavioral pain scales in preschoolers highlighting the risk of false evaluations. Pain. International Association for the Study of Pain; 2009;145(1–2):60–8.
21. Ramelet A-S, Rees NW, McDonald S, Bulsara MK, Huijer Abu-Saad H. Clinical validation of the Multidimensional Assessment of Pain Scale. Pediatr Anesth. 2007;17(12):1156–65.
22. Gomez RJ, Barrowman N, Elia S, Manias E, Royle J, Harrison D. Establishing intra- and inter-rater agreement of the face, legs, activity, cry, consolability scale for evaluating pain in toddlers during immunization. Pain Res Manag. 2013;18(6):124–8.
23. Suraseranivongse S, Santawat U, Kraiprasit K, Petcharatana S, Prakkamodom S, Muntraporn N. Cross-validation of a composite pain scale for preschool children within 24 hours of surgery. 2001;87(3):400–5.
24. Rhee H, Belyea M, Mammen J. Visual analogue scale (VAS) as a monitoring tool for daily changes in asthma symptoms in adolescents: a prospective study. Allergy, Asthma Clin Immunol. BioMed Central; 2017;1–8.