A categorical data analysis of impact of biogas on women of rural households – with examples from Nepal

Received Nov 18, 2018; Revised Jan 20, 2019; Accepted Jan 31, 2019

This paper is based on primary data collected from 400 households of biogas consumers. A detailed structured questionnaire was developed and pretested. The response to each question was recorded as a multiple-choice option, resulting in more than 450 categorical variables covering various aspects of households using biogas. The results focus on the women of these households. Interrelationships between several variables, including the role of the woman in various decisions related to the plant, were analyzed. The woman's decision making was regressed on several variables, including the asset index of the family, using logistic regression. The asset index was calculated from a principal component analysis of socio-economic variables. The family dynamics in the choice of biogas as a renewable energy source were quantified using odds ratios and regression coefficients. The interdependence between variables was tested using the chi-square test of independence of attributes. A novel data-based approach to the generation and analysis of categorical data is demonstrated. The suitability of generating categorical data in the absence of accurate measurement instruments is highlighted. The method is also suitable for countries without a strong backbone of good-quality official records, and it provides useful backup data for official statistics.


INTRODUCTION
Energy and women are very closely linked to one another. Finding fuel or energy for cooking is a woman's responsibility. Among the economically weaker sections of countries like Nepal, energy is as essential as water because it plays a key role in cooking. Here, women are expected to find fuel on a daily basis. As firewood is the primary cooking fuel in rural areas of Nepal, a lot of time and energy is spent collecting it.
An agriculture-based economy calls for labor-intensive work. Women's contribution as a labor force is mainly focused on farm and domestic activities. The subsistence-agriculture-based Nepalese economy involves about 50% females, 44% males and 6% children aged 10-14 years [1]. In all areas of Nepal the work burden is higher for women than for men, with the highest work burden found among women in the mountains. A significant proportion of women's time there is spent collecting firewood and fodder, indicating the depleted resource base in the mountain areas. According to census 2011, 19.72% of households reported ownership of land or house or both in the name of female family members [2]. In urban areas, 26.77% of households show female ownership of fixed assets, while the figure stands at 18.02% in rural areas.
The remainder of this paper is arranged as follows. Section 2, Research Method, explains the significance of principal components analysis and logistic regression and provides the theoretical background to these techniques. Section 3, Results and Analysis, describes the dataset that motivates this study, elaborates the steps followed in the design and implementation of the survey, and explains the design and construction of the biogas consumer profile database. Section 4 gives the conclusion.

RESEARCH METHOD
Multivariate analysis is a family of techniques that analyses the interdependence between multiple variables governing a phenomenon. As mentioned by Hair et al. [22], multivariate analysis methods relate not only to the analytic aspect of research but also to the design and approach to data collection for decision making and problem solving. In multivariate data, several variables may be interrelated, and the true information in the data may be obscured by this multicollinearity.

Principal component analysis
Principal components analysis is one of the methods of multivariate analysis. It retains the variability of the original data by transforming the correlated variables into fewer orthogonal variables. In this statistical approach, the interrelationships among a large number of variables can be analyzed. The technique focuses on condensing the information contained in a large number of original variables into a smaller set of variables (factors) with minimum loss of information. This data summarization helps identify the underlying dimensions or factors, the estimates of the factors and the contribution of each variable to the factors (termed loadings). The unrotated factor matrix of factor loadings is used when the main objective of the research is to find the best linear combination of variables, that is, the particular combination of original variables that accounts for more of the variance in the data as a whole than any other linear combination. The theoretical background of principal components analysis is as follows.
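Suppose we have a set of N variables, $a^*_{1j}$ to $a^*_{Nj}$, representing the ownership of N assets by each household j. Further, let us standardize each variable by its mean and standard deviation; for example,
$$a_{1j} = \frac{a^*_{1j} - \bar{a}^*_1}{s^*_1},$$
where $\bar{a}^*_1$ is the mean of $a^*_{1j}$ across households and $s^*_1$ is its standard deviation. These selected variables are expressed as linear combinations of a set of underlying components for each household j:
$$\begin{aligned} a_{1j} &= v_{11} A_{1j} + v_{12} A_{2j} + \cdots + v_{1N} A_{Nj} \\ &\;\;\vdots \\ a_{Nj} &= v_{N1} A_{1j} + v_{N2} A_{2j} + \cdots + v_{NN} A_{Nj} \end{aligned} \qquad (1)$$
where j indexes the households, the A's are the components and the v's are the coefficients on each component for each variable. The "scoring factors" from the model are recovered by inverting the system implied by (1), which yields a set of estimates for each of the N principal components:
$$\begin{aligned} A_{1j} &= f_{11} a_{1j} + f_{12} a_{2j} + \cdots + f_{1N} a_{Nj} \\ &\;\;\vdots \\ A_{Nj} &= f_{N1} a_{1j} + f_{N2} a_{2j} + \cdots + f_{NN} a_{Nj} \end{aligned}$$
where the f's are the scoring factors.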
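The construction above can be illustrated with a minimal sketch. It assumes the asset index is the score on the first principal component of the standardized ownership variables, with unit-length weights; software packages differ in how they scale scoring factors. The leading eigenvector of the correlation matrix is found here by power iteration, a simple substitute for a full eigendecomposition. The ownership data and all function names are hypothetical and are not taken from the survey.

```python
import math

def standardize(column):
    """Standardize one asset variable by its mean and standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]

def correlation_matrix(z):
    """Correlation matrix of already standardized variables."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zk)) / (n - 1) for zk in z] for zi in z]

def leading_eigenvector(matrix, iterations=500):
    """Unit-length leading eigenvector, approximated by power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def asset_index(columns):
    """Return (weights, scores): first-component weights and household scores."""
    z = [standardize(col) for col in columns]
    weights = leading_eigenvector(correlation_matrix(z))
    households = range(len(columns[0]))
    scores = [sum(w * zi[j] for w, zi in zip(weights, z)) for j in households]
    return weights, scores

# Hypothetical 0/1 ownership data for eight households (not survey data).
tv = [1, 1, 1, 1, 1, 0, 0, 0]
radio = [1, 1, 1, 0, 1, 1, 0, 0]
fridge = [1, 1, 0, 0, 0, 0, 0, 0]

weights, scores = asset_index([tv, radio, fridge])
for j, s in enumerate(scores):
    print(f"household {j}: asset index {s:+.2f}")
```

In this toy example, households owning all three assets receive the highest index and households owning none receive the lowest, which is the ordering an asset index is meant to capture.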

Multinomial logistic regression
Logistic regression, another method of multivariate analysis, predicts the probability of occurrence of a non-metric dependent variable, where the independent variables may be metric, non-metric or both. Multinomial logistic regression is used to predict categorical placement in, or the probability of category membership on, a dependent variable based on multiple independent variables. It allows multiple categories in both the dependent and the independent variables. Binary logistic regression is a special case of multinomial logistic regression: it allows only two categories of the dependent (outcome) variable but multiple categories of independent variables. Multinomial logistic regression uses maximum likelihood estimation to evaluate the probability of categorical membership.
Let J denote the number of categories of Y, where Y is a multinomial response variable. Let $\{\pi_1, \pi_2, \dots, \pi_J\}$ denote the response probabilities, which satisfy $\sum_{j} \pi_j = 1$. Logit models for multinomial responses pair each category with a baseline category [23]. When the last category (J) is the baseline, the baseline-category logits are
$$\log(\pi_j / \pi_J), \quad j = 1, \dots, J-1.$$
Given that the response falls in category j or J, this is the log odds that the response is j. The main difference between binary and multinomial logistic regression is that in binary logistic regression the dependent variable takes two values, so $j = 1, 2$, whereas in multinomial logistic regression $j = 1, 2, \dots, J$. The baseline-category logit model with predictor x is
$$\log(\pi_j / \pi_J) = \alpha_j + \beta_j x, \quad j = 1, \dots, J-1.$$
The model has J-1 equations with separate parameters for each. The effects vary according to the category paired with the baseline. When J = 2, this model simplifies to a single linear equation for $\log(\pi_1/\pi_2) = \mathrm{logit}(\pi_1)$, resulting in ordinary logistic regression for binary responses. There is an odds ratio associated with each predictor, denoted by Exp(B). Exp(B) is greater than 1 when the predictor increases the logit, equal to 1 when the predictor has no influence on the logit, and less than 1 when the predictor decreases the logit.
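As an illustration of the baseline-category logit model, the following sketch converts the J-1 linear predictors into response probabilities and computes Exp(B) for a coefficient. The coefficient values and function names are hypothetical and chosen only for the example; they are not estimates from this study.

```python
import math

def category_probabilities(alphas, betas, x):
    """Probabilities pi_1..pi_J from the J-1 baseline-category logits.

    alphas[j] + betas[j] * x is log(pi_j / pi_J) for each non-baseline
    category; the baseline category J is placed last in the result.
    """
    logits = [a + b * x for a, b in zip(alphas, betas)]
    denominator = 1.0 + sum(math.exp(l) for l in logits)
    probabilities = [math.exp(l) / denominator for l in logits]
    probabilities.append(1.0 / denominator)  # baseline category J
    return probabilities

def odds_ratio(beta):
    """Exp(B): multiplicative change in the odds per unit increase in x."""
    return math.exp(beta)

# Hypothetical coefficients for a response with J = 3 categories.
alphas = [0.5, -0.2]
betas = [0.8, -0.4]
probs = category_probabilities(alphas, betas, x=1.0)
print([round(p, 3) for p in probs])
print(round(odds_ratio(betas[0]), 3), round(odds_ratio(betas[1]), 3))

# With J = 2 the same function reduces to ordinary logistic regression.
binary = category_probabilities([0.5], [0.8], x=1.0)
print(round(binary[0], 3))
```

A positive coefficient gives Exp(B) above 1 and a negative coefficient gives Exp(B) below 1, matching the interpretation stated in the text.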

RESULTS AND ANALYSIS

Data
A detailed survey of 400 households of biogas consumers was conducted. The questionnaire comprised 59 questions, with each response provided as a multiple-choice option. This resulted in categorical data that could be analyzed on an ordinal scale. Because of the large sample size of 400 households, these ordinal data are treated as continuous by appeal to the Central Limit Theorem. The responses to these 59 questions yielded 467 variables. The information was stored in a consumer profile database of biogas users [16]. There were 2272 individuals living in these 400 households. Information was collected and stored on age distribution, land holdings, livestock, fuel wood expenses before and after the plants, assets and decision making power. Because the questions had a before-and-after structure, the "before" responses also provide information on households that have not installed biogas.

Results
A woman's role in decision making is closely related to the socioeconomic status of the household, so the questions assessing socioeconomic status held special importance here. Such questions are sensitive, however, and if they are asked directly the chances of getting a correct answer are very low. Questions that assess the income group indirectly, through material holdings, were therefore asked. These are called proxy asset indicators, and the interviewee's response can be verified by observation. Variables related to the ownership of assets, such as land holding, private ownership of water sources for drinking, bathing and irrigation, possession of cars, tractors, bicycles and radios, location of the toilet, and the amount of loan incurred, are considered proxy indicators of economic prosperity. Further, biogas consumers in Nepal, and in this study, are primarily farmers located in rural settings. In such settings most economic transactions are carried out outside the market, so direct questions on income earned fail to reflect the socio-economic situation of the respondents. Structured questions on proxy asset indicators were designed in four groups. The first group had 8 questions on asset ownership, covering land, house, electronic equipment such as computer, television, radio, mobile phones, telephones and refrigerator, and means of transportation; there were 10 indicator variables in this category. The second group, characteristics of the house dwelling and toilet, comprised 6 questions and 10 indicator variables. Here questions on the materials used in the construction of the house were asked, and the queries on the type of latrine distinguished, for example, an open latrine, a latrine far away from the house and a latrine close to the house. The third group, water source and needs, had 6 questions and 20 indicator variables.
Detailed questions on the use of water, such as for bathing or cooking, were asked, as were questions on different sources of water such as private well, open well and community water supply. The fourth and last group was […]
The descriptive statistics of the variables, including mean, standard deviation and correlation, were then calculated. These gave an idea of the average value of each variable per household and its spread across households. Most of the asset variables take the value 0 or 1 because they record responses to yes/no questions, with absence labeled 0 and presence labeled 1. Such binary data can be treated as being on an ordinal scale. The mean of the most common asset variables is close to 1 and the standard deviation is very low. For example, for the yes/no question "Do you have a latrine?" the mean is 0.97 and the standard deviation is 0.17, whereas for "Do you have a well built latrine?" the mean is 0.93 and the standard deviation is 0.26. This suggests that although a latrine is almost universal among the consumers, a well built latrine is somewhat less common. The data were mainly ordinal and ratio in nature. The large sample size of 400 households supports the assumption of normality by application of the Central Limit Theorem.
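The reported means and standard deviations of the latrine variables can be cross-checked, because for a 0/1 variable the sample variance is fully determined by the mean p and the sample size n: it equals p(1 - p) n / (n - 1). The sketch below applies this identity to the rounded means quoted above, so the agreement is approximate; the function name is illustrative.

```python
import math

def binary_sd(mean, n):
    """Sample standard deviation of a 0/1 variable with the given mean."""
    return math.sqrt(mean * (1.0 - mean) * n / (n - 1))

latrine_sd = binary_sd(0.97, 400)
well_built_sd = binary_sd(0.93, 400)
print(round(latrine_sd, 2), round(well_built_sd, 2))  # prints 0.17 0.26
```

Both values agree with the standard deviations reported in the text.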
The decision making power of a woman in the household depends on the household's socioeconomic standing, which is quantified using principal components analysis [22]. The dependence of the woman's decision making on socio-economic background, educational qualification and the work involvement of other family members is modeled using logistic regression. Here the results of the principal components analysis, expressed as an asset index, are incorporated as an independent variable. Out of 467 variables, 47 were identified as proxy asset indicators. These variables fall into six categories, namely ownership of fixed assets, type of house, type of toilet, source of water, ownership of land, and loan. Principal components analysis of these 47 asset variables extracted 10 components, which explained 60% of the total variance [22]. Using a linear combination of the three main principal components, the 400 households are divided into three groups on the basis of the asset index, which is a linear combination of factor loadings and normalized asset ownership variables. The rich group comprises the top 20% of households, the middle income group the middle 40%, and the lowest 40% are the economically most deprived. The robustness of this classification is tested against data on those assets that are conventionally owned by people who are economically better off. Thus the dimensionality of the data, comprising 47 interrelated asset variables, is reduced to 10 orthogonal variables. It is also seen that ownership of a private water source is the main factor differentiating economically well off households from poor households. An overview of the statistical methodologies used here is given in Figure 1.
Figure 3 shows that about 81.8% of females were actively involved in the construction and operation of the biogas plant.
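The division of households into the lowest 40%, middle 40% and top 20% by asset index can be sketched as a rank-based split. This is an assumed procedure for illustration: the text does not state how ties or group boundaries were handled, and the scores below are hypothetical.

```python
def classify_by_asset_index(scores):
    """Label each household poor / middle / rich by rank of its asset index.

    The lowest 40% of ranks are 'poor', the next 40% 'middle' and the
    top 20% 'rich'. Integer arithmetic avoids rounding at the cut-offs.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda j: scores[j])  # poorest first
    groups = [None] * n
    for rank, j in enumerate(order):
        if 5 * rank < 2 * n:
            groups[j] = "poor"
        elif 5 * rank < 4 * n:
            groups[j] = "middle"
        else:
            groups[j] = "rich"
    return groups

# Hypothetical asset index values for ten households (not survey data).
scores = [-1.8, 0.4, 2.1, -0.3, 0.9, -1.1, 1.5, -0.6, 0.1, -2.2]
groups = classify_by_asset_index(scores)
for score, group in zip(scores, groups):
    print(f"{score:+.1f} -> {group}")
```

Applied to 400 households, this split yields groups of 160, 160 and 80 households.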
As reflected in Figure 4, payment for the biogas plant was made by the husband in 341 households (85.3%), followed by other male members in 41 households (10.3%). Wives made the payment in only 16 households (4%). This shows that male members generally make the financial decisions on a source of renewable energy. The involvement of women was found to be less than that of men by almost 21%. Most households had their biogas plants registered under the husband's name (254 households, 63.5%), as reflected in Figure 5. This is nearly three times the share registered in wives' names (22.8%), followed by other male members in 43 households (10.8%). Plants registered under male members exceed those registered under female members by more than two times. Equal participation by both genders in decision making is shown in Figure 6.
The gender differential with respect to age and level of education among the 400 households of biogas consumers is shown in Table 1. The difference in educational background between males and females depends on socio-economic status. Male members are more actively involved in education than their female counterparts. The socio-economic standard of biogas consumers, in terms of ownership of material assets, is compared with the census 2011 data for rural households in Table 2. Only 3.3% of households in rural Nepal own a refrigerator [1], whereas 4% of biogas consumer households do. Televisions are owned by 81.25% of the rural biogas consumer households, which is much higher than the 30.7% of households in rural Nepal. The asset ownership data for biogas consumers are based on the survey carried out as part of this research. Biogas consumers are thus economically well off in comparison with households in rural Nepal generally.
A gender-wise comparison of the members of the 400 households with respect to education is given in Table 3. In the age group 6-16, for example, the gender-wise difference in the level of education is not significant: the proportion of male members going to school is 0.8034 and the proportion of female members is 0.8239. Thus at younger ages the difference with respect to education is not significant, but at older ages it is. For example, the gender-wise differentials among those passing the class 10 board examinations in the age groups 26-50 years and 50-75 years are highly significant.
The dependence between socio-economic status and the role of the wife in deciding for a biogas plant is given in Table 4. In 358 households, the husband or the wife played the key role in deciding for a biogas plant. As seen from Table 4, 55, 81 and 38 wives in the poor, middle and rich income groups respectively played a key role in deciding about the plant. The dependence between socio-economic status and the role of the wife in keeping the profits gained from the biogas plant is given in Table 5. In 340 households, the husband or the wife played the key role in keeping the profits. As seen from Table 5, 21, 36 and 7 wives in the poor, middle and rich income groups respectively play a key role in keeping profits from the plant. Table 4 and Table 5 thus relate to woman empowerment: they are used to test the role of the woman in the consumer household in deciding for the construction of the plant and her involvement in decision making and other household activities. The dependence of the decision making power of a woman on the socio-economic classification is tested with the chi-square test of independence of attributes. There is dependence between income group and the decision by the wife for the construction of a biogas plant (p = 0.02). There is also dependence between income group and the keeping by the wife of profit from the biogas plant (p = 0.02).
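The chi-square test for the decision data can be reproduced approximately from figures quoted in the text. The sketch below takes the wife counts (55, 81 and 38) from this paragraph and the income group sizes among the 358 households (139, 147 and 72) reported later in the paper. The husband counts are not quoted anywhere; they are derived here by subtraction, which is an assumption about how Table 4 is laid out.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for k, observed in enumerate(row):
            expected = row_totals[i] * col_totals[k] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

wife = [55, 81, 38]          # poor, middle, rich (quoted in the text)
group_sizes = [139, 147, 72]  # poor, middle, rich among the 358 households
# Assumption: every remaining household in a group had the husband deciding.
husband = [g - w for g, w in zip(group_sizes, wife)]

statistic, dof = chi_square_independence([wife, husband])
# With 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.2f}, dof = {dof}, p = {p_value:.3f}")
```

Under this assumption the p value rounds to 0.02, in line with the value reported for the decision on construction.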
The decision making power of a woman regarding a source of renewable energy is quantified mathematically. Logistic regression is used for this assessment, and the dynamics of change are quantified by odds ratios and regression coefficients. Answers to several questions were used as independent variables, namely who initiated the construction of the plant, who was involved in its construction, who made the payment for the construction, under whose name it is registered, and who decided for the plant. The options provided were husband, wife, son, daughter, other male member and other female member. The classification of income group, denoted by pcases, is the output of the principal components analysis [4] and is used as an independent variable in this logistic regression. The results of this binary logistic regression are provided in Table 6. The probability of a wife deciding in favor of biogas is predicted on the basis of income group: the decision by husband or wife is regressed on economic status. Economic status is thus an ordinal, multinomial independent variable, and the decision by husband or wife is a yes/no binary dependent variable. The probability of a wife deciding for a plant was regressed on all the variables, followed by backward elimination of the independent variables. Here the income classification pcases is ordinal, with 3, 2 and 1 representing the rich, middle income and poor groups respectively. The variable total animal is ratio data. The variable mnore500 denotes that a distance of more than 500 m is covered for the collection of firewood after the construction of the plant. Redfuelm1 is a binary yes/no variable for reduced fuel expenses as reported by male members; similarly, redfuelf1 is the response by female members, and redfuelmf1 is a binary yes/no variable for male and female members together.
District is nominal, with 3 standing for Bhaktapur district, 2 for Simara and 1 for Sarlahi. All these variables represent the post-construction phase. As seen from Table 6, the odds of the wife deciding for the construction of a biogas plant decrease by half from the rich income group to the poor income group (p value = 0.014), whereas the odds for the middle income group are the same as for the rich group. Here the last category is the reference category. The decision for a plant by a wife or a husband depends on socio-economic status (p value = 0.005). This implies that women in the middle and high income groups are equally likely to decide for the construction of a biogas plant, while in the lower income group the odds that a woman decides for a plant are lower by 50%. Among the 358 selected households, 139 belong to the low income group, 147 to the middle income group and 72 to the rich income group. Similarly, for those households that still cover more than 500 m, the odds in favour of a wife deciding for a plant are 2.7 times those of the husband (p value 0.001). The chances of the husband deciding for a plant are higher when the total number of cattle (total animals) increases (p value 0.007). Thus in households with cattle the husband is more likely to have a greater say in the decision making process. Similarly, the odds of a wife deciding for a plant are six times and three times as high in Simara and Sarlahi respectively as in Bhaktapur. If a male responds in the affirmative to reduced expenses, the odds of a wife deciding for the plant increase 2.6 times. The accuracy of the model is 63.1%.
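The direction and rough size of the income group effect can be checked with crude (unadjusted) odds ratios built from counts quoted in the text: the wife counts of 55, 81 and 38 from the Table 4 discussion and the group sizes of 139, 147 and 72. Husband counts are assumed to be the remainder in each group. These crude ratios are only a sanity check. They need not equal the Exp(B) values in Table 6, which come from a model that also includes other predictors.

```python
wife = {"poor": 55, "middle": 81, "rich": 38}
group_size = {"poor": 139, "middle": 147, "rich": 72}

# Odds of the wife (rather than the husband) deciding, per income group.
# Assumption: households not counted under "wife" had the husband deciding.
group_odds = {g: wife[g] / (group_size[g] - wife[g]) for g in wife}

# Crude odds ratios with the rich group as the reference category.
crude_or = {g: group_odds[g] / group_odds["rich"] for g in ("poor", "middle")}

for g, value in crude_or.items():
    print(f"{g} vs rich: odds ratio = {value:.2f}")
```

The poor-versus-rich ratio is roughly 0.6 and the middle-versus-rich ratio is close to 1, which is broadly consistent with the reported halving of the odds for the poor group and no difference for the middle group.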

CONCLUSION
This paper studies the effect of renewable energy in general, and biogas in particular, on uplifting the status of women. The intangible effect of energy use on the lives of women in a rural agrarian economy is measured with the help of several statistical methods. The work can be summarized as follows. First, a survey of 400 households of biogas consumers was conducted, followed by construction of the database. The trends and patterns in this multivariate data were then explored by exploratory data analysis. This was followed by principal components analysis of 47 asset variables. The results for the asset variables were then used to quantify the decision making power of women through logistic regression. The interrelationship between socio-economic status and the empowerment of a woman, in terms of decision making and profit keeping, is measured. First, the dependence between these attributes is explored with the chi-square test of independence of attributes. Then the probability of a wife deciding for a plant is regressed on several variables, including socio-economic status measured by the asset index and the number of animals. The impact is quantified with odds ratios.
The use of biogas saves women a mean of 65 minutes daily; the modal saving is 1-3 hours per day. This time was previously spent on the collection of firewood and in the kitchen. The time saved is used by the women for teaching children at home and for other income generating activities. Firewood-based kitchens have an adverse impact on health, and a switch to a smoke-free kitchen using biogas is good for the health of women. The women reaped financial benefits by engaging in income generating activities such as livestock rearing and working on the farm. There were fewer cases of respiratory tract diseases owing to the smoke-free kitchen. The use of biogas uplifts the status of women by giving them time for income generation.