Application of Multiple Unsupervised Models to Validate Clusters Robustness in Characterizing Smallholder Dairy Farmers

Nyambo, Devotha; Luhanga, Edith; Yonah, Zaipuna; Mujibi, Fidalis

URI: http://hdl.handle.net/123456789/94629

Description:

Research article published by Hindawi in The Scientific World Journal

The heterogeneity of smallholder dairy production systems complicates service provision, information sharing, and dissemination of new technologies, especially those needed to maximize productivity and profitability. To obtain homogeneous groups within which interventions can be made, it is necessary to define clusters of farmers who undertake similar management activities. This paper explores the robustness of production cluster definition, using various unsupervised learning algorithms to assess the best approach to defining clusters. Data were collected from 8179 smallholder dairy farms in Ethiopia and Tanzania. Out of a total of 500 variables, the 35 variables used to define production clusters and household membership of these clusters were selected by Principal Component Analysis and domain expert knowledge. Three clustering algorithms, K-means, fuzzy, and Self-Organizing Maps (SOM), were compared in terms of their grouping consistency and prediction accuracy. The model with the least household reallocation between clusters for training and testing data was deemed the most robust. Prediction accuracy was obtained by fitting a fixed effects model that included production clusters to milk yield, sales, and choice of breeding method. Results indicated that, for the Ethiopian dataset, clusters derived from the fuzzy algorithm had the highest predictive power (77% for milk yield and 48% for milk sales), while for the Tanzanian data, clusters derived from Self-Organizing Maps performed best. The average cluster membership reallocation was 15%, 12%, and 34% for K-means, SOM, and fuzzy, respectively, for households in Ethiopia.
Given the divergent performance of the algorithms evaluated, it is evident that, despite similar information being available for the study populations, the uniqueness of the data from each country had an overriding influence on cluster robustness and prediction accuracy. The results obtained in this study demonstrate the difficulty of generalizing model application and use across countries and production systems, despite seemingly similar information being collected.
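
The abstract does not spell out how household reallocation between clusters is computed. As a minimal sketch of one plausible reading, the function below compares the cluster labels that two model fits assign to the same households, aligns the arbitrary cluster ids by the permutation that maximizes agreement, and reports the share of households that still change cluster. The function name, the label lists, and the alignment step are illustrative assumptions, not the authors' procedure.

```python
from itertools import permutations


def reallocation_rate(labels_a, labels_b, n_clusters):
    """Share of households whose cluster differs between two labelings.

    Cluster ids are arbitrary, so labels_b is implicitly relabelled with
    the permutation that maximises agreement with labels_a.
    (Hypothetical helper; the paper's exact definition may differ.)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("labelings must be non-empty and of equal length")

    # counts[i][j] = households labelled i in fit A and j in fit B
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for a, b in zip(labels_a, labels_b):
        counts[a][b] += 1

    # Brute force over permutations is acceptable for a handful of clusters.
    best_agreement = max(
        sum(counts[i][perm[i]] for i in range(n_clusters))
        for perm in permutations(range(n_clusters))
    )
    return 1.0 - best_agreement / len(labels_a)


# Hypothetical labels for ten households from two model fits.
fit_on_training = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
fit_on_testing = [1, 1, 0, 2, 2, 2, 0, 0, 0, 0]
rate = reallocation_rate(fit_on_training, fit_on_testing, n_clusters=3)
print(f"reallocated: {rate:.0%}")
```

Under this reading, a lower rate means a more stable clustering. For more than a few clusters, an assignment solver such as SciPy's `linear_sum_assignment` would be a more practical way to align labels than enumerating permutations.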

Files in this item

Files	Size	Format
JA_ICSE_2019.pdf	2.091Mb	application/pdf
