Application of data mining techniques for the identification of healthy and pathological aging factors
Progress and achieved results
ImageH deals with the design, development and exploitation of machine learning techniques with the purpose of building a statistical inference system fed by a large and heterogeneous database collected in the Iberian Peninsula (MADRID+90 and BRAGA+90) that includes, among others, demographic, lifestyle, clinical, cognitive and neuroimaging data.
The objective is to study the combination of predictors of healthy aging and to arrive at a computational method that helps identify individuals at risk of developing dementia. Different models are developed based on both classical parametric methods and machine learning algorithms.
The experience and data accumulated by the CIEN Foundation in the Vallecas Project, through the longitudinal follow-up of a cohort of volunteers over 70 years of age, allow this project to be approached ambitiously. They open the possibility of testing different hypotheses related to the identification of protective factors against aging and early detection markers of Alzheimer's disease. At present, the project is towards the middle of the seventh visit and starting the ninth. ImageH will benefit from the dataset and the methodology developed for the Vallecas Project, which allows an optimal use of resources and the definition of realistic objectives. In essence, both the objectives and the analysis methodology of the Vallecas Project constitute a solid starting point for ImageH.
At ImageH we focus on a sector of the population that is so far poorly characterized in the scientific literature. There is a need for longitudinal studies capable of revealing how the brain functions throughout the aging process. The target population of PILEP+90 is of particular interest here. ImageH provides an opportunity to study connectivity patterns in the brains of nonagenarians, as well as network properties extracted from the covariance matrix. One of the theoretical foundations on which we rely in ImageH is the conceptualization of the brain as an adaptive system in a changing environment. Modeling the capacity of this organ to use available resources in response to metabolic and informational demands can be very useful for understanding the neurological substrate of a long and healthy life.
In collaboration with the Statistical Office of the Madrid City Council, stratified random sampling by census units is being carried out on a total of 692 persons over 90 years of age registered in the municipality of Madrid. At the time of writing this report, the project is in the phase of collecting forms. Afterwards, around fifty individuals will be selected to undergo a detailed clinical examination that includes structural, perfusion and functional magnetic resonance imaging (T1, 3D ASL and fMRI).
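As an illustration of stratified random sampling by census units, the following minimal sketch allocates a sample to each stratum in proportion to its size and then draws at random within each stratum. It is not the procedure used by the Statistical Office: the function name `stratified_sample`, the field `unit`, the three units "A", "B" and "C" and all the sizes are hypothetical, and proportional allocation is an assumption.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, stratum_of, total, seed=0):
    """Draw `total` records, allocated to strata in proportion to their size."""
    rng = random.Random(seed)

    # Group the records by stratum (here, a hypothetical census unit).
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    # Proportional allocation; the largest-remainder rule makes the
    # per-stratum sizes add up exactly to `total`.
    n = len(records)
    quotas = {k: total * len(v) / n for k, v in strata.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = total - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1

    # Simple random sampling without replacement inside each stratum.
    sample = []
    for k in sorted(strata):
        sample.extend(rng.sample(strata[k], alloc[k]))
    return sample


# Synthetic register: 100 people in three hypothetical census units.
population = (
    [{"id": i, "unit": "A"} for i in range(50)]
    + [{"id": i, "unit": "B"} for i in range(50, 80)]
    + [{"id": i, "unit": "C"} for i in range(80, 100)]
)
sample = stratified_sample(population, lambda r: r["unit"], total=10)
counts = Counter(r["unit"] for r in sample)
print(dict(counts))
```

With these toy sizes each unit contributes in proportion to its share of the register, so no unit is over- or under-represented by chance.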
In addition, the hiring of a postdoctoral researcher associated with the project has been successfully completed. This researcher is being trained in magnetic resonance imaging acquisition and modeling techniques.
2.1. Objectives, milestones and degree of compliance
The activities carried out in the context of ImageH and associated with each specific objective and milestone are described below. Table 1 presents a summary of the degree of compliance with the milestones defined in the project.
Table 1. Objectives, milestones and degree of compliance (15/10/2019)
2.2. Activities carried out
Table 2 describes the activities scheduled in the project along with their status at the time of writing this report.
2.3. Problems and changes in the work plan
As an integrated sub-project within PILEP+90, the progress of ImageH depends on progress in MADRID+90 and BRAGA+90. Thus, the delays that occurred in the execution of the work plan initially foreseen for MADRID+90 have conditioned the start of activities in ImageH. These delays are detailed in the report of the MADRID+90 Project, so it is not necessary to reiterate them here. It should be noted that neuroimaging studies are expected to start within the next two months. In addition, we have identified around twenty volunteers of the Vallecas Project who are already in their nineties or close to it, and we have already begun the cerebral volumetric analysis of this group of subjects (A.10).
3. Preliminary results
The demographic change registered in recent decades and the constant increase in life expectancy are two factors that condition the maintenance of standards of quality of life and well-being. The results expected from ImageH will yield new knowledge that can later be applied to the rest of the population.
With respect to the factors associated with healthy aging, the aim is to obtain information not only on those factors that individually characterize long-lived individuals, but also to determine the best combination of all of them through data mining techniques. The application of such machine learning techniques will help to discover non-obvious relationships in the data.
Preliminary results can be grouped into the following points:
Selection process and hiring of a postdoctoral researcher. Training in neuroimaging acquisition and modeling techniques.
Construction of a pipeline integrating heterogeneous data (demographic, lifestyle, cognitive and neuroimaging). Special attention has been paid to planning resonance sequences adapted to the needs and limitations of the nonagenarian population.
Volumetric segmentation of a group of 20 subjects over 85 years of age. The volume of subcortical structures has been estimated, together with cortical thickness and gyrification.
Design, implementation and validation of machine learning algorithms. Ensemble algorithms (a set of N classifiers), trained on the Vallecas Project sample, have been shown to be superior for predicting the transition to mild cognitive impairment. The algorithms will be validated with the ImageH sample.
One of the biggest challenges of machine learning techniques is the explainability of the results. The usual characterization of machine learning as a black-box system is not necessarily correct: there are techniques that make it possible to estimate the relative importance of the variables in the prediction. We have implemented a system for determining the importance of variables using permutation techniques (SHAP values). This work can be freely consulted at http://dx.doi.org/10.1101/785519. The techniques for evaluating the importance of variables will play a fundamental role in the construction of epistemologically plausible models in ImageH.
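The two ideas above, an ensemble of N classifiers and the estimation of variable importance by permutation, can be sketched together as follows. This is a toy illustration under stated assumptions, not the models or the system developed in the project: the base classifiers are decision stumps trained on bootstrap resamples and combined by majority vote, the data are synthetic (only the first variable determines the label), all function names are hypothetical, and the importance measure is plain permutation importance (the drop in accuracy after shuffling one variable), not a computation of SHAP values.

```python
import random


def fit_stump(X, y):
    """Return the (feature, threshold, polarity) with the best training accuracy."""
    best = (-1.0, 0, 0.0, 1)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                pred = [int(pol * (row[j] - t) > 0) for row in X]
                acc = sum(p == yi for p, yi in zip(pred, y)) / len(y)
                if acc > best[0]:
                    best = (acc, j, t, pol)
    return best[1:]


def fit_ensemble(X, y, n_models=15, seed=0):
    """Train each stump on a bootstrap resample of the data (bagging)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in X]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict(models, row):
    """Majority vote over the N stumps."""
    votes = sum(int(pol * (row[j] - t) > 0) for j, t, pol in models)
    return int(2 * votes > len(models))


def accuracy(models, X, y):
    return sum(predict(models, r) == yi for r, yi in zip(X, y)) / len(y)


def permutation_importance(models, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one variable is shuffled across subjects."""
    rng = random.Random(seed)
    base = accuracy(models, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [r[j] for r in X]
            rng.shuffle(col)
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
            drops.append(base - accuracy(models, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data: variable 0 determines the label, variable 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

models = fit_ensemble(X, y)
base_accuracy = accuracy(models, X, y)
importances = permutation_importance(models, X, y)
print(base_accuracy, importances)
```

In this toy setting, shuffling the informative variable degrades accuracy markedly while shuffling the noise variable leaves it unchanged, which is the kind of contrast that importance estimates are meant to expose. In practice the importance would be computed on held-out data rather than on the training sample.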