Because map units usually contain multiple components, each sample point within a map unit may belong to any one of several components, and the probability of each component (label) is known only in aggregate, across all occurrences of the map unit. The training-set labels are therefore ambiguous. Several methods were used to disambiguate the training-set labels derived from the existing maps for machine learning: constrained sampling to reduce geographic and other errors, sequential updating of training-set labels using k-nearest neighbor classification, and a cost-sensitive classifier.
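As an illustration of the sequential label-updating idea, the following is a minimal sketch and not the procedure used in this study. It assumes that each sample point starts with the most probable component of its map unit, and that on each pass a point takes the label most common among its k nearest neighbors in feature space, restricted to the components listed for its own map unit. The function name `knn_relabel`, the tie-breaking by map unit probability, and the toy data are all hypothetical.

```python
from collections import Counter
from math import dist


def knn_relabel(points, candidates, k=3, max_iter=10):
    """Iteratively relabel ambiguous training points (illustrative only).

    points     -- list of feature tuples, one per sample point
    candidates -- list of dicts {component_label: map_unit_probability},
                  one per sample point (the components of its map unit)
    """
    # Start from the most probable component of each point's map unit.
    labels = [max(c, key=c.get) for c in candidates]
    for _ in range(max_iter):
        new_labels = []
        for i, p in enumerate(points):
            # Indices of the k nearest other points in feature space.
            neighbors = sorted(
                (j for j in range(len(points)) if j != i),
                key=lambda j: dist(p, points[j]),
            )[:k]
            votes = Counter(labels[j] for j in neighbors)
            # Only components of this point's map unit are allowed;
            # vote ties fall back to the map unit probability.
            best = max(
                candidates[i],
                key=lambda lab: (votes[lab], candidates[i][lab]),
            )
            new_labels.append(best)
        if new_labels == labels:  # converged: no label changed
            break
        labels = new_labels
    return labels


# Toy data: two map units with hypothetical component probabilities.
unit_1 = {"A": 0.7, "B": 0.3}
unit_2 = {"B": 0.8, "C": 0.2}
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (4.9, 4.9)]  # last point lies in unit_1 but resembles unit_2 points
candidates = [unit_1, unit_1, unit_1, unit_2, unit_2, unit_2, unit_1]
print(knn_relabel(points, candidates))
```

In this toy case the last point starts as the dominant component "A" of its map unit but is reassigned to "B", a minor component of the same map unit, because all of its nearest neighbors carry that label. Constrained sampling and the cost-sensitive classifier are not covered by this sketch.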
A variety of machine learning models were applied to all locations in the county and, alternately, to the adjacent county. A Random Forest classifier produced the most useful map predictions. Within each county, model predictions proved useful for locating many individual components within map unit consociations and associations. Predictive mapping also provided information useful for assessing optimal map unit density and composition for updates.