Study design and population
PURE is a large-scale international study of the incidence, mortality and risk factors associated with non-communicable diseases among individuals from urban and rural communities in 21 countries, including Malaysia. The study is coordinated by the Population Health Research Institute (PHRI) in Hamilton, Ontario, Canada; data collection began in 2007, and follow-up is projected to continue until 2030. In Malaysia, data collection is carried out collaboratively by the Universiti Kebangsaan Malaysia (UKM) and the Universiti Teknologi MARA (UiTM). The study enrolled 15,792 Malaysian individuals aged between 35 and 70 years at baseline. The design of the PURE study has been described in previous studies [14,15,16,17]. All assessments reported in this paper were carried out as part of the PURE study in Malaysia.
Participants were recruited by convenience sampling from selected urban and rural areas throughout Peninsular and East Malaysia. After the field researcher obtained permission from the community leaders, health screening and promotion booths were set up in the communities’ assembly halls. With the help of community leaders, residents were informed and invited to visit the booths. Interested and eligible participants were briefed about the study. Once written informed consent was obtained, medical histories were taken, basic physical examinations were conducted and home visits were scheduled. During the home visits, other individuals living in the same household were invited to join the study. To ensure the feasibility of long-term follow-up, only household members who intended to continue living in their current home for a further 4 years were selected. All participants provided written informed consent after they understood that their participation was entirely voluntary.
To ensure standardised methods of data collection, research assistants were trained with comprehensive operation manuals, videos and workshops. Data were transferred electronically to the project office and the coordinating centre at PHRI for quality control. The protocol was approved by the Hamilton Health Sciences Research Ethics Board (PHRI), the Research and Ethics Committee (UKM Medical Centre) and the Research Ethics Committee (UiTM) (Project code: PHUM-2012–01).
Measurements
Participants were classified as part of the CVD group if they self-reported having been diagnosed by a certified medical practitioner with coronary heart disease (CHD), heart failure, stroke or hypertension, or if they had a recorded high blood pressure (SBP/DBP > 140/90 mmHg) [9]. Two blood pressure recordings were taken after 5 min of rest in a sitting position using an automatic Omron blood pressure monitor (model HEM-7111).
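The classification rule can be sketched in Python as follows. This is a minimal illustration rather than the study's own procedure: the function and parameter names are hypothetical, and two details that the text does not specify are assumed, namely that the two readings are averaged and that exceeding either cut-off (SBP > 140 or DBP > 90 mmHg) counts as high blood pressure.

```python
from statistics import mean

# Self-reported diagnoses that place a participant in the CVD group (from the text).
CVD_DIAGNOSES = {"coronary heart disease", "heart failure", "stroke", "hypertension"}


def has_high_bp(readings, sbp_cut=140, dbp_cut=90):
    """Return True if the averaged readings exceed either cut-off.

    readings: list of (systolic, diastolic) tuples in mmHg.
    Assumption: the readings are averaged and either value above its
    cut-off is sufficient; the source only states SBP/DBP > 140/90.
    """
    sbp = mean(r[0] for r in readings)
    dbp = mean(r[1] for r in readings)
    return sbp > sbp_cut or dbp > dbp_cut


def in_cvd_group(self_reported, readings):
    """Classify a participant from self-reported diagnoses and measured BP."""
    reported = {d.lower() for d in self_reported}
    return bool(reported & CVD_DIAGNOSES) or has_high_bp(readings)
```

Under these assumptions, a participant with no reported diagnosis and readings of (150, 85) and (146, 83) would be placed in the CVD group because the averaged systolic value exceeds 140 mmHg.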
Long-term dietary intake of participants was measured using a validated food frequency questionnaire [9]. Participants were asked “during the past year, on average, how often have you consumed the following foods or drinks” and were given the list of food items. The frequencies of consumption ranged from never to more than 6 times/day. Standard serving sizes (e.g. an egg) were assigned to each food item. To compute the daily food and nutrient intakes, the reported frequency of consumption for each food item was converted to a daily frequency and then multiplied by the portion size. Legumes included long beans, winged beans, peas and soybean products (tofu). Legume intake was reported as servings per day (1 serving = 1 cup = 72 g) [18]. Participants were grouped based on their intake into < 3 and ≥ 3 servings of legumes per day.
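The conversion from reported frequency to daily intake can be illustrated with a short Python sketch. The frequency categories and their per-day values below are hypothetical placeholders, since the questionnaire's actual categories are not listed here; only the 72 g serving size and the 3 servings/day cut-off come from the text.

```python
# Hypothetical mapping from FFQ frequency category to times per day.
# The real questionnaire categories and mid-points are not given in the text.
TIMES_PER_DAY = {
    "never": 0.0,
    "1-3 per month": 2 / 30,
    "1 per week": 1 / 7,
    "2-4 per week": 3 / 7,
    "5-6 per week": 5.5 / 7,
    "1 per day": 1.0,
    "2-3 per day": 2.5,
    "4-6 per day": 5.0,
    "more than 6 per day": 6.0,
}

GRAMS_PER_LEGUME_SERVING = 72  # 1 serving = 1 cup = 72 g (from the text)


def daily_grams(frequency, portion_grams):
    """Daily intake of one food item: daily frequency times portion size."""
    return TIMES_PER_DAY[frequency] * portion_grams


def legume_servings_per_day(items):
    """items: list of (frequency category, portion size in g) for legume foods."""
    total_grams = sum(daily_grams(freq, grams) for freq, grams in items)
    return total_grams / GRAMS_PER_LEGUME_SERVING


def legume_group(servings):
    """Assign the intake group used in the analysis."""
    return ">= 3 servings/day" if servings >= 3 else "< 3 servings/day"
```

For example, a participant reporting one 72 g legume item "2-3 per day" and another "1 per day" would, under this hypothetical mapping, total 3.5 servings per day and fall into the ≥ 3 group.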
Information on demographic characteristics was obtained from the validated Adult’s PURE questionnaire [15, 16, 19]. Demographic characteristics included age (rounded to the nearest year), sex, race (Malay or non-Malay), marital status (single, married or divorced), education level (none, primary, secondary or tertiary) and employment status (yes or no). The residency area (urban or rural) of the participants was defined based on local government gazetted areas. Urban areas were defined as areas occupied by more than 150 residents per square kilometre. Height and weight were measured using a portable stature meter and the TANITA (BC-558 Ironman®) segmental body composition analyser, respectively. Height was measured to the nearest 1 cm and body weight to the nearest 100 g, with participants wearing no shoes and only light clothing. Body mass index (BMI) was derived by dividing weight (kg) by height squared (m²), and individuals were categorised as obese (≥ 30 kg/m²) or non-obese (< 30 kg/m²).
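The BMI derivation and obesity cut-off can be expressed as a brief Python sketch. The function names are illustrative, and the conversion from centimetres to metres is made explicit because height was recorded in centimetres.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def bmi_category(bmi_value, cutoff=30.0):
    """Obese if BMI is at or above the cut-off, otherwise non-obese."""
    return "obese" if bmi_value >= cutoff else "non-obese"
```

For instance, a participant weighing 81.0 kg at a height of 180 cm has a BMI of 81.0 / 1.80² = 25.0 kg/m² and is classified as non-obese.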
Statistical analysis
The data were analysed using SPSS version 26. The chi-square test was used to assess differences between participants with and without CVD according to the following variables: legume intake, BMI, age, gender, race, marital status, education level, employment status and residency. Then, simple logistic regression was used to determine the association between each individual factor and CVD status. Biologically plausible characteristics with a p-value < 0.3 in the simple logistic regression were included in the multiple logistic regression model, for which the significance level was set at 0.05. The confounding factors BMI, age, gender, marital status, education level, employment status and residency were included in the final logistic regression model. The results were reported as frequencies, percentages, prevalence odds ratios (POR) and 95% confidence intervals (95% CI).
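To illustrate how a prevalence odds ratio and its 95% CI arise, the Python sketch below computes them from a 2×2 table. For a single binary factor, the odds ratio from a simple logistic regression equals the cross-product ratio of the table, and a Wald interval follows from the standard error of the log odds ratio. The analysis itself was run in SPSS; the function name and the counts in the usage note are hypothetical.

```python
import math


def prevalence_odds_ratio(a, b, c, d, z=1.96):
    """POR and Wald 95% CI from a 2x2 table.

    a: exposed with CVD,    b: exposed without CVD
    c: unexposed with CVD,  d: unexposed without CVD
    All four cells must be non-zero.
    """
    por = (a * d) / (b * c)
    # Standard error of log(POR) under the Wald approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(por) - z * se_log)
    upper = math.exp(math.log(por) + z * se_log)
    return por, lower, upper
```

With illustrative counts a = 20, b = 80, c = 10 and d = 90, the POR is (20 × 90) / (80 × 10) = 2.25. Adjusted estimates from the multiple logistic regression model cannot be obtained this way and require fitting the full model.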
Moderation analysis was employed to examine the effects of legume intake (moderator) on the relationship between BMI and CVD prevalence [20]. The moderation analysis replicated the Dawson method using a two-way interaction equation as shown below [21]. Only curvilinear interactions are shown in the results as these supersede the linear interaction analysis.
$$Y={b}_{0}+{b}_{1}X+{b}_{2}{X}^{2}+{b}_{3}Z+{b}_{4}XZ+{b}_{5}{X}^{2}Z+\varepsilon$$
(Two-way interaction equation)
where:
Y, dependent variable = CVD.
X, independent variable = BMI.
Z, moderator = legume intake.
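The role of the interaction terms can be made concrete with a small Python sketch that evaluates the equation for each level of the moderator. It assumes, for illustration only, that legume intake is coded Z = 0 (< 3 servings/day) or Z = 1 (≥ 3 servings/day); the coefficient values are hypothetical and are not study estimates.

```python
def predicted_y(x, z, b):
    """Evaluate Y = b0 + b1*X + b2*X^2 + b3*Z + b4*X*Z + b5*X^2*Z.

    x: BMI value, z: moderator code (assumed 0 or 1), b: (b0, ..., b5).
    The error term is omitted.
    """
    b0, b1, b2, b3, b4, b5 = b
    return b0 + b1 * x + b2 * x**2 + b3 * z + b4 * x * z + b5 * x**2 * z


def curve_coefficients(z, b):
    """Intercept, linear and quadratic coefficients of the BMI curve at a given Z."""
    b0, b1, b2, b3, b4, b5 = b
    return b0 + b3 * z, b1 + b4 * z, b2 + b5 * z


# Hypothetical coefficients, for illustration only.
B = (0.30, 0.10, 0.02, -0.05, -0.04, 0.03)

# Predicted values at low, mean and high BMI (in standardised units) per group.
curves = {z: [predicted_y(x, z, B) for x in (-1, 0, 1)] for z in (0, 1)}
```

Collecting terms shows that the moderator shifts the intercept by b3, the linear BMI slope by b4 and the curvature by b5, so a non-zero b4 or b5 means that the BMI curve differs between the two legume-intake groups.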
In this analysis, age, residency (urban/rural), education level and employment status were included as control variables. These variables were selected based on p < 0.05 in the preceding simple logistic regression and conform to known individual and social determinants of health. CVD status and legume intake were measured as categorical variables, while age and BMI were measured as continuous variables. Both age and BMI were Z-standardised before the moderation analysis. Moderation effects were then visualised graphically using the online Microsoft Excel spreadsheet by Dawson [21]. According to Dawson, the moderation effect was considered significant when both of the following conditions were met: (1) the interaction between the dependent and independent variables had a p-value of less than 0.05 as determined by the moderation analysis and (2) the visualisation of the interaction exhibited an intercept with the moderator’s slope [21].
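Z-standardisation of a continuous variable such as age or BMI can be sketched in Python as below. The sketch assumes the sample standard deviation (n − 1 denominator) is used; the text does not state which form was applied, and the function name and values in the usage note are illustrative.

```python
from statistics import mean, stdev


def z_standardise(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator); an assumption
    return [(v - m) / s for v in values]
```

Applied to BMI values of 22, 25, 28, 31 and 34 kg/m², the middle value equals the mean and therefore maps to a z-score of 0, with lower values negative and higher values positive.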