Wednesday, November 7, 2007 - 10:15 AM
300-4

Application of Machine Learning Techniques for Soil Survey Updates.

Brian Slater and Sakthi Kumaran Subburayalu. The Ohio State University, 2021 Coffey Road, 210 Kottman Hall, Columbus, OH 43210

Machine learning methods were used to build predictive soil-landscape models for two counties (Monroe and Noble) in southeast Ohio, USA, where soil survey updates are in progress. Both counties lie within the Central Allegheny Plateau (MLRA 126) and hence share a consistent range of major environmental correlates. Training data were sampled from 10 m resolution raster coverages of terrain attributes, surficial geology, climatic attributes, and historical vegetation. Information on soil map units and associated environmental correlates was extracted by sampling from existing county soil maps and associated geospatial layers.

Since map units usually contain multiple components, each sample point within a map unit may belong to any one of several components, and the general probability of each component (label) is known only across all occurrences of the map unit. The training-set labels are therefore ambiguous. Several methods were used to disambiguate the training-set labels derived from the existing maps, including constrained sampling to reduce geographic and other errors, sequential updating of training-set labels using a k-nearest neighbor classification, and the use of a cost-sensitive classifier.
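As an illustration of the sequential label-updating idea, the following is a minimal sketch and not the procedure used in the study. It assumes each sample point carries a covariate vector and a set of candidate components with prior proportions taken from its map unit. The function name `knn_relabel`, the component names, the covariate values, and the choices of `k` and `max_iter` are all hypothetical.

```python
from collections import Counter
import math


def knn_relabel(features, candidates, k=3, max_iter=10):
    """Iteratively reassign ambiguous labels by k-nearest-neighbor voting.

    features   : list of covariate tuples, one per sample point
    candidates : list of dicts {component: prior proportion}, giving the
                 components allowed by each point's map unit
    """
    # Start every point at the most probable component of its map unit.
    labels = [max(c, key=c.get) for c in candidates]
    for _ in range(max_iter):
        new_labels = []
        for i, x in enumerate(features):
            # The k nearest other points in covariate space.
            nearest = sorted(
                (j for j in range(len(features)) if j != i),
                key=lambda j: math.dist(x, features[j]),
            )[:k]
            votes = Counter(labels[j] for j in nearest)
            # Only components allowed by this point's map unit may win.
            scores = {c: votes[c] for c in candidates[i] if votes[c] > 0}
            if scores:
                # Most votes wins; ties fall back to the prior proportion.
                best = max(scores, key=lambda c: (scores[c], candidates[i][c]))
            else:
                best = labels[i]  # no neighbor supports any candidate
            new_labels.append(best)
        if new_labels == labels:  # labels are stable, stop updating
            break
        labels = new_labels
    return labels


# Hypothetical example: one covariate (e.g. a scaled terrain attribute).
features = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
candidates = [
    {"comp_a": 1.0},
    {"comp_a": 1.0},
    {"comp_a": 0.6, "comp_b": 0.4},  # two-component map unit
    {"comp_b": 1.0},
    {"comp_b": 1.0},
    {"comp_a": 0.6, "comp_b": 0.4},  # same map unit, different setting
]
relabeled = knn_relabel(features, candidates, k=2)
```

In this toy case the last point starts as `comp_a`, the dominant component of its map unit, and is moved to `comp_b` because its nearest neighbors in covariate space carry that label.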

A variety of machine learning models were applied to all locations within a county and, alternately, to the adjacent county. A Random Forest classifier produced the most useful map predictions. Within each county, model predictions proved useful for locating many individual components within map unit consociations and associations. Predictive mapping also provided information useful for assessing optimal map unit density and composition for updates.