A general class of improved population variance estimators under non-sampling errors using calibrated weights in stratified sampling
www.nature.com/scientificreports
M. K. Pandey 1*, G. N. Singh 1, Tolga Zaman 2, Aned Al Mutairi 3 & Manahil SidAhmed Mustafa 4
This paper proposes a new calibration estimator for population variance within a stratified two-phase
sampling design. It takes into account random non-response and measurement errors, specifically
applying this method to estimate the variance in gas turbine exhaust pressure data. The study
integrates additional information from two highly positively correlated auxiliary variables to develop
a general class of estimators tailored for the stratified two-phase sampling scheme. The properties of
these estimators, in terms of their biases and mean square errors, have been thoroughly examined
and extensively analyzed through numerical and simulation studies. Furthermore, the calibrated
weights of the strata are derived. The proposed estimators outperform the natural estimator of
population variance. Finally, suitable recommendations have been made for survey statisticians
intending to apply these findings to real-life problems.
In many practical scenarios, estimating population variance is a crucial task with wide-ranging applications,
spanning various domains including finance, healthcare, and weather forecasting. Actuaries and insurance analysts heavily rely on population variance estimation to make well-informed decisions. In the realm of weather
forecasting, grasping the variability in temperature, humidity, and other meteorological factors at diverse locations is fundamental for precise predictions. To bolster the precision of estimators in sample surveys, auxiliary
variables play a pivotal role. For instance, when estimating crop yields, incorporating data on the area covered
by crops can significantly enhance prediction accuracy. Several studies have pursued this idea: Ref. 1 examined the use of
auxiliary information in estimating the finite population variance; Ref. 2 developed a class of estimators using auxiliary
information for estimating finite population variance; and Ref. 3 introduced a new procedure for variance estimation
in simple random sampling using auxiliary information. Ref. 4 further improved the estimation of finite population
variance using dual supplementary information under stratified random sampling, while Ref. 6 explored the more
efficient use of auxiliary information in population variance estimation, presenting a new family of estimators.
Moreover, recent research has delved into variance estimation using auxiliary information, with innovative
approaches such as memory-type ratio and product estimators (Refs. 7, 8) gaining attention. These endeavors aim to enhance
the accuracy and reliability of population variance estimation in diverse sampling designs.
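To make the role of auxiliary information concrete, the following is a minimal sketch of a classical ratio-type variance estimator: the sample variance of the study variable is rescaled by the ratio of the known population variance of an auxiliary variable to its sample variance. It is an illustration only and is not claimed to be the estimator of any specific reference above; the function name `ratio_variance_estimate`, the simulated population, and all parameter values are assumptions made for this example.

```python
import random
import statistics


def ratio_variance_estimate(y_sample, x_sample, pop_var_x):
    """Ratio-type estimate of the population variance of y.

    Multiplies the sample variance of y by
    (known population variance of x) / (sample variance of x).
    """
    s2_y = statistics.variance(y_sample)  # sample variance, divisor n - 1
    s2_x = statistics.variance(x_sample)
    return s2_y * pop_var_x / s2_x


# Hypothetical population in which y is strongly and positively related to x.
random.seed(0)
N, n = 5000, 100
x_pop = [random.gammavariate(4.0, 2.0) for _ in range(N)]
y_pop = [3.0 * x + random.gauss(0.0, 1.0) for x in x_pop]
pop_var_x = statistics.variance(x_pop)  # assumed known before sampling

# Simple random sample without replacement.
idx = random.sample(range(N), n)
y_s = [y_pop[i] for i in idx]
x_s = [x_pop[i] for i in idx]

print("sample variance of y:    ", statistics.variance(y_s))
print("ratio-type estimate:     ", ratio_variance_estimate(y_s, x_s, pop_var_x))
print("population variance of y:", statistics.variance(y_pop))
```

When the two variables are highly positively correlated, a sample that overstates the spread of x tends to overstate the spread of y as well, so the ratio correction can pull the estimate back toward the population value.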
However, sample surveys often encounter practical challenges that result in non-response or missing data.
These challenges encompass non-contact, refusal to cooperate, and various other reasons. When a substantial
amount of data goes missing, it casts doubt on the reliability of ensuing statistical results. Diverse types of missing
data patterns, such as missing at random (MAR) and missing completely at random (MCAR), can be observed.
Particularly noteworthy is the MAR pattern, in which, given the observed data, the probability of missingness does
not depend on the values of the unobserved data.
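The distinction between the two patterns can be illustrated with a small simulation. This is a sketch under stated assumptions, not the non-response model used in this paper: the helper names `mcar_mask` and `mar_mask`, the logistic form of the MAR response mechanism, and all parameter values are hypothetical.

```python
import math
import random


def mcar_mask(n, p_missing, rng):
    """True where y is missing; every unit has the same missingness probability."""
    return [rng.random() < p_missing for _ in range(n)]


def mar_mask(x_observed, rng, slope=1.0, intercept=-1.0):
    """True where y is missing; the probability depends only on the observed x."""
    return [
        rng.random() < 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
        for x in x_observed
    ]


def observed_mean(values, missing):
    """Mean of the values that were not flagged as missing."""
    kept = [v for v, m in zip(values, missing) if not m]
    return sum(kept) / len(kept)


rng = random.Random(42)
n = 10_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]       # always observed
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # subject to non-response

print("full-data mean of y:   ", sum(y) / n)
print("respondent mean (MCAR):", observed_mean(y, mcar_mask(n, 0.3, rng)))
print("respondent mean (MAR): ", observed_mean(y, mar_mask(x, rng)))
```

Under MCAR the respondents behave like a random subsample, so their mean should stay close to the full-data mean. Under this MAR mechanism units with larger x are more likely to be missing, so one would expect the unadjusted respondent mean to drift away from it unless the observed x is used to adjust.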
In the presence of random non-response or measurement errors, various researchers have addressed the
need for robust estimators. Ref. 9 introduced a class of estimators using auxiliary information for estimating finite
1Department of Mathematics and Computing, Indian Institute of Technology (Indian School of Mines),
Dhanbad 826004, India. 2Faculty of Health Sciences, Gumushane University, Gumushane, Turkey. 3Department
of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428,
Riyadh 11671, Saudi Arabia. 4Department of Statistics, Faculty of Science, University of Tabuk, Tabuk, Saudi
Arabia. *email:
Scientific Reports | (2024) 14:2948 | https://doi.org/10.1038/s41598-023-47234-1
population variance in the presence of measurement errors, while Ref. 10 developed classes of factor-type estimators
in the presence of measurement error. Ref. 11 focused on the estimation of the population coefficient of variation in
the presence of measurement errors, and Ref. 12 worked on estimating the population mean in the presence of
measurement error and non-response under stratified random sampling. Ref. 13 contributed to the estimation of the finite
population distribution function with the dual use of auxiliary information under non-response, and Ref. 14 introduced
a generalized class of estimators for sensitive variables in the presence of measurement error and non-response.
Ref. 15 explored the estimation of the finite population mean using dual auxiliary variables for non-response under simple
random sampling, while Ref. 16 and Bhushan (2023) proposed classes of robust estimators to handle correlated
measurement errors and new logarithmic-type imputation techniques in the presence of measurement errors within the
survey sampling literature. Measurement errors may stem from flawed measuring instruments, shortcomings in survey
methodology, vague questionnaires, or imprecise measurements.
The calibration approach, pioneered by Ref. 18, has garnered prominence in statistical practice. Its objective is
to devise unbiased estimation procedures with minimal dispersion, leveraging auxiliary variables. Subsequent
researchers, exemplified by Refs. 19 and 20, have fine-tuned and extended calibration estimation procedures, striving to
minimize the divergence between initial and final weights while adhering to calibration equations and constraints.
Recent advances in calibration techniques, as demonstrated by Ref. 21, have focused on a class of calibration
estimators under stratified random sampling in the presence of various kinds of non-sampling errors. Ref. 22 explored
calibration estimation for ratio estimators in stratified sampling for proportion allocation, and Ref. 23 further advanced
the finite population distribution function estimation with the dual use of auxiliary information under simple
and stratified random sampling. Ref. 5 investigated the use of dual ancillary variables to estimate the population mean
under stratified random sampling, while Ref. 24 worked on modified estimators of the finite population distribution
function based on the dual use of auxiliary information under stratified random sampling. These techniques have
streamlined the optimization of stratum weights in stratified random sampling, ultimately refining estimates,
particularly when closely related auxiliary variables are integrated.
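The idea of minimizing the divergence between initial and final weights subject to a calibration equation can be sketched with the simplest textbook case: a chi-square distance and a single constraint on the stratum means of one auxiliary variable. This is an assumption-laden illustration, not the calibration scheme derived in this paper. The function name `calibrate_stratum_weights`, the tuning constants `q`, and the numerical values are hypothetical. The sketch minimizes sum_h (Omega_h - W_h)^2 / (q_h W_h) subject to sum_h Omega_h * xbar_h = Xbar, whose Lagrangian solution is Omega_h = W_h + lambda * q_h * W_h * xbar_h with lambda = (Xbar - sum_h W_h xbar_h) / sum_h q_h W_h xbar_h^2.

```python
def calibrate_stratum_weights(W, x_bar, X_bar, q=None):
    """Chi-square-distance calibration of stratum weights under one constraint.

    W     : initial stratum weights W_h
    x_bar : stratum sample means of the auxiliary variable
    X_bar : known population mean of the auxiliary variable
    q     : positive tuning constants q_h (defaults to 1 for every stratum)
    """
    if q is None:
        q = [1.0] * len(W)
    # Value of the calibration equation under the initial weights.
    current = sum(w * xb for w, xb in zip(W, x_bar))
    denom = sum(qh * w * xb * xb for qh, w, xb in zip(q, W, x_bar))
    lam = (X_bar - current) / denom  # Lagrange multiplier
    return [w + lam * qh * w * xb for qh, w, xb in zip(q, W, x_bar)]


# Hypothetical three-stratum example.
W = [0.5, 0.3, 0.2]
x_bar = [10.0, 20.0, 30.0]
X_bar = 18.0

omega = calibrate_stratum_weights(W, x_bar, X_bar)
print("calibrated weights:", omega)
print("calibration check: ", sum(o * xb for o, xb in zip(omega, x_bar)))
```

With only this one constraint the calibrated weights need not sum to one; adding further constraints, such as a sum-to-one condition or constraints on a second auxiliary variable, leads to a small linear system for the multipliers instead of a single closed-form expression.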
To underscore the practical significance of this research, let’s consider real-life examples:
1. In healthcare research, when conducting patient surveys to evaluate the effectiveness of medical treatment (...truncated)