Monday 14 March 2016

Archaeological density surface estimates around Ostlänken using statistical machine learning techniques

WP 3 in ARKDIS focuses on developing new quantitative methods for working with the large amounts of data produced in archaeology. Building on the detailed GIS data from archaeological excavations compiled for the Ostlänken project, Daniel Löwenborg is now finalising a collaboration project with Mike Ashcroft and Anna Nilsson from the Department of Information Technology, Uppsala University. The purpose is to use the information available from the excavations to estimate the existence and density of archaeological features in the parts of the Ostlänken corridor that have not been excavated. The estimates are based on existing data on the number of features per square metre of excavated area, including areas excavated without any evidence of archaeological features, with characterisations of the physical landscape as predictive variables.
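To make the role of the excavated areas without finds concrete, here is a minimal sketch in Python. The trench areas and feature counts are invented for illustration and are not taken from the Ostlänken data; the function name `density` is likewise just a placeholder. It shows how the estimated density drops once empty trenches are counted, and how the same records also give a simple rate of presence.

```python
# Hypothetical excavation records: (excavated area in m^2, number of features found).
# All numbers are made up for illustration only.
trenches = [
    (120.0, 6),
    (80.0, 0),    # excavated, no features found
    (200.0, 10),
    (100.0, 0),   # excavated, no features found
]


def density(records):
    """Features per square metre, pooled over all supplied trenches."""
    total_area = sum(area for area, _ in records)
    total_features = sum(count for _, count in records)
    return total_features / total_area


# Density when the empty trenches are part of the denominator.
density_all = density(trenches)

# Density when only trenches with finds are counted (biased upwards).
density_nonempty = density([t for t in trenches if t[1] > 0])

# Share of trenches that contained at least one feature.
presence_rate = sum(1 for _, count in trenches if count > 0) / len(trenches)

print(f"all trenches:       {density_all:.3f} features/m^2")
print(f"non-empty trenches: {density_nonempty:.3f} features/m^2")
print(f"presence rate:      {presence_rate:.2f}")
```

With these invented numbers, leaving out the empty trenches would overstate the density, which is one reason the "empty" areas matter for the estimates.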

Example of landscape variables calculated in GIS

The possibility of including “empty areas” is interesting, since that kind of information has not previously been available in a format ready to analyse on a large scale. Nor has data this detailed been available before, with information about the individual features found within an excavated area. Hence, it is now possible to work with much more data-intensive materials, which in turn calls for new methods. Using techniques from statistical machine learning, Anna is writing a program, as part of her Bachelor thesis, that will calculate the expected probability of existence and the expected density of features for the whole area. Predictions are based on a wide range of environmental variables describing the physical character of the land and the topography within the area. As the correlation between archaeology and landscape can be expected to be weak, it will also be essential to estimate how well the model fits and how large the errors can be expected to be. Statistical machine learning is so far fairly untested for this kind of archaeological model, but it holds the potential to greatly improve the process of visualising and analysing macro- and micro-level patterns.

An illustration of how the Random Forest algorithm works.
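As a rough illustration of the idea, and not the program written for the project, the sketch below builds a very small Random Forest style regressor using only the Python standard library. Each tree is grown on a bootstrap sample and considers a random subset of variables at every split, and the forest prediction is the average over the trees. The landscape variables (elevation and slope), the made-up relationship between slope and density, and all function names are assumptions for the example. The out-of-bag error at the end is one common way to get a feel for how large the prediction errors might be.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, targets, depth, rng, n_try):
    """Grow a small regression tree; each split tries a random subset of variables."""
    mean = sum(targets) / len(targets)
    if depth == 0 or len(rows) < 4:
        return mean  # leaf: mean density of the samples that ended up here
    best = None
    for j in rng.sample(range(len(rows[0])), n_try):
        # Skipping the smallest value guarantees both sides of a split are non-empty.
        for threshold in sorted({r[j] for r in rows})[1:]:
            left = [i for i, r in enumerate(rows) if r[j] < threshold]
            right = [i for i, r in enumerate(rows) if r[j] >= threshold]
            score = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, threshold, left, right)
    if best is None:
        return mean  # no usable split (all values identical)
    _, j, threshold, left, right = best
    return (
        j,
        threshold,
        build_tree([rows[i] for i in left], [targets[i] for i in left], depth - 1, rng, n_try),
        build_tree([rows[i] for i in right], [targets[i] for i in right], depth - 1, rng, n_try),
    )


def predict_tree(node, row):
    """Follow the splits down to a leaf value."""
    while isinstance(node, tuple):
        j, threshold, left, right = node
        node = left if row[j] < threshold else right
    return node


def fit_forest(rows, targets, n_trees=50, depth=3, seed=0):
    """Fit n_trees trees, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    n_try = max(1, len(rows[0]) // 2)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        tree = build_tree([rows[i] for i in idx], [targets[i] for i in idx], depth, rng, n_try)
        forest.append((tree, set(idx)))  # remember which samples the tree has seen
    return forest


def predict(forest, row):
    """Forest prediction: the average of the individual tree predictions."""
    return sum(predict_tree(tree, row) for tree, _ in forest) / len(forest)


def oob_rmse(forest, rows, targets):
    """Out-of-bag RMSE: each sample is predicted only by trees that never saw it."""
    errors = []
    for i, (row, target) in enumerate(zip(rows, targets)):
        preds = [predict_tree(tree, row) for tree, used in forest if i not in used]
        if preds:
            errors.append((sum(preds) / len(preds) - target) ** 2)
    return (sum(errors) / len(errors)) ** 0.5 if errors else float("nan")


# Synthetic data: (elevation in m, slope in degrees) -> features per m^2.
# The relationship (more features on gentle slopes, plus noise) is invented.
data_rng = random.Random(1)
rows = [(data_rng.uniform(0, 50), data_rng.uniform(0, 15)) for _ in range(40)]
targets = [max(0.0, 0.05 - 0.003 * slope + data_rng.gauss(0, 0.01)) for _, slope in rows]

forest = fit_forest(rows, targets)
estimate = predict(forest, (20.0, 2.0))  # an unexcavated spot with a gentle slope
error = oob_rmse(forest, rows, targets)
print(f"estimated density: {estimate:.4f} features/m^2 (out-of-bag RMSE {error:.4f})")
```

In practice one would more likely reach for an established library implementation; the hand-rolled version is only meant to show the bootstrap, random-variable and averaging steps in one place.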

With the finalised program it will be possible to use similar data to make predictions for any area. We are currently seeing an increase in the amount of archaeological data being made available, with more to come in the near future from the DAP project at the Swedish National Heritage Board, so the possibilities for this kind of analysis will only improve. With more data available, it will be possible to start working with different types of archaeological features, for example building models for only settlements or only burial grounds, as well as using the model with data from only certain time periods. This will make it possible to start asking some very specific archaeological questions and to explore settlement dynamics and social development.