CANDIDATE 1. Sandra Benítez Peña, Instituto de Matemáticas de la Universidad de Sevilla (IMUS)
Departing from one-model-fits-all: A clustered approach to Data Envelopment Analysis
Authors: Sandra Benítez Peña, Peter Bogetoft and Dolores Romero Morales
In this paper, we jointly tackle the feature selection and peer selection problems in Data Envelopment Analysis. The goal is to cluster the Decision Making Units and develop a more targeted model for each cluster so as to maximize the average efficiency. This is formulated as a Mixed Integer Linear Programming problem, and a collection of constructive heuristics is developed. The clustered approach is illustrated on a real-world dataset from the benchmarking of electricity Distribution System Operators, as well as on two simulated datasets.
Keywords: Data Envelopment Analysis; Feature Selection; Clustering; Mixed Integer Linear Programming
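The intuition behind benchmarking each unit only against the peers in its own cluster can be illustrated with a toy case. The sketch below is not the authors' Mixed Integer Linear Programming model or their heuristics: it assumes a single input and a single output under constant returns to scale, where efficiency reduces to a unit's output/input ratio divided by the best ratio among its peers. The data and the function names `ratio_efficiency` and `average_efficiency` are hypothetical.

```python
from statistics import mean


def ratio_efficiency(inputs, outputs, peers, unit):
    """Efficiency of `unit` relative to the best output/input ratio among `peers`."""
    best = max(outputs[j] / inputs[j] for j in peers)
    return (outputs[unit] / inputs[unit]) / best


def average_efficiency(inputs, outputs, clusters):
    """Average efficiency when each unit is benchmarked only within its own cluster."""
    scores = [
        ratio_efficiency(inputs, outputs, cluster, unit)
        for cluster in clusters
        for unit in cluster
    ]
    return mean(scores)


# Hypothetical data: one input and one output for six units.
inputs = [2.0, 4.0, 5.0, 10.0, 8.0, 20.0]
outputs = [4.0, 6.0, 5.0, 5.0, 6.0, 8.0]

# One model for all units versus two clusters with their own peer sets.
single_model = average_efficiency(inputs, outputs, [[0, 1, 2, 3, 4, 5]])
clustered = average_efficiency(inputs, outputs, [[0, 1, 2], [3, 4, 5]])
print(single_model, clustered)
```

In this toy setting, restricting the peer set can only keep or raise each unit's score, which is why the clustered average is at least as high as the single-model average.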
CANDIDATE 2. Elena María Castilla González, Departamento de Estadística e Investigación Operativa. Universidad Complutense de Madrid
Inference for one-shot device test data under log-normal lifetimes
Authors: N. Balakrishnan and E. Castilla
One-shot devices result in an extreme case of interval censoring, wherein one can only know whether the failure time is before or after the test time. The study of one-shot device testing has advanced considerably in recent years, both in terms of estimation and of optimal design under different lifetime distributions. However, one-shot device testing analysis under the lognormal lifetime distribution has not been studied yet. While the hazard function of the exponential distribution is always constant, and those of the Weibull and gamma distributions are either increasing or decreasing, the lognormal distribution has an increasing-decreasing hazard, which is often encountered in practice, as units usually experience early failures and then stabilize over time in terms of performance. In this paper, we develop the EM algorithm for likelihood estimation based on one-shot device test data under the lognormal distribution, as well as the design of optimal constant-stress accelerated life tests (CSALTs) under this setup with budget constraints. A simulation study is carried out to assess the performance of the inference methods developed here, and some real-life data are analyzed for illustrative purposes.
Keywords: accelerated life-test; best test plan; EM algorithm; lognormal distribution; one-shot device; reliability; optimal design
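Two ideas from this abstract can be sketched numerically: the increasing-decreasing shape of the lognormal hazard, and the binary (failed or not failed by the test time) likelihood that one-shot device data produce. The sketch below is only an illustration under stated assumptions, not the EM algorithm or the test design developed in the paper. It omits stress covariates, drops the constant binomial coefficients from the log-likelihood, and all function names and data are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_cdf(t, mu, sigma):
    """Probability that a lognormal lifetime is at most t."""
    return normal_cdf((math.log(t) - mu) / sigma)


def lognormal_hazard(t, mu, sigma):
    """Hazard h(t) = f(t) / (1 - F(t)) of the lognormal distribution."""
    z = (math.log(t) - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))
    return pdf / (1.0 - normal_cdf(z))


def one_shot_loglik(mu, sigma, test_times, n_tested, n_failed):
    """Log-likelihood of one-shot data, up to an additive constant.

    At each test time only the number of devices that had already failed
    is observed, so each group contributes a binomial term.
    """
    total = 0.0
    for tau, k, n in zip(test_times, n_tested, n_failed):
        p = lognormal_cdf(tau, mu, sigma)
        total += n * math.log(p) + (k - n) * math.log(1.0 - p)
    return total


# The hazard first rises and then falls (mu = 0, sigma = 1).
for t in (0.1, 1.0, 50.0):
    print(t, lognormal_hazard(t, 0.0, 1.0))

# Hypothetical one-shot data: devices tested and failures seen at three test times.
print(one_shot_loglik(0.0, 1.0, [0.5, 1.0, 2.0], [10, 10, 10], [2, 5, 8]))
```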
CANDIDATE 3. Antonio Elías Fernández, Departamento de Matemáticas Aplicada, Grupo OASYS. Universidad de Málaga
Integrated Depths for Partially Observed Functional Data
Authors: Antonio Elías-Fernández, Raúl Jiménez, Anna M. Paganoni and Laura M. Sangalli
Partially observed functional data are frequently encountered in applications and are attracting increasing interest in the literature. Here we address the problem of measuring the centrality of a datum in a partially observed functional sample. We propose an integrated functional depth for partially observed functional data, which has good theoretical properties. The proposed depth corresponds, in mean, to a depth measure for fully observed functional data. Moreover, we prove the consistency of the empirical version of the proposed depth and demonstrate by simulations its very good performance on finite samples. Our proposal enables the use of benchmark methods based on depths, originally introduced for fully observed data, in the case of partially observed functional data. These include the functional boxplot, the outliergram and the depth vs depth classifiers. We illustrate our proposal on two case studies, the first concerning a problem of outlier detection in German electricity supply functions, the second regarding a classification problem with data obtained from medical imaging.
Keywords: Incomplete functional data, functional depth, robustness, functional outliers, functional boxplot, classification of partially observed functional data
CANDIDATE 4. Ricardo Gázquez, Departamento de Métodos Cuantitativos para la Economía y la Empresa. Instituto de Matemáticas. Universidad de Granada
Continuous maximal covering location problems with interconnected facilities
Authors: Víctor Blanco, Ricardo Gázquez
In this paper we analyze a continuous version of the maximal covering location problem, in which the facilities are required to be linked by means of a given graph structure (two facilities are allowed to be linked only if a given distance is not exceeded). We propose a mathematical programming framework for the problem and different resolution strategies. First, we provide a Mixed Integer Non Linear Programming formulation for the problem and derive some geometrical properties that allow us to reformulate it as an equivalent pure integer linear programming problem. We propose two branch-&-cut approaches obtained by relaxing some sets of constraints of the former formulation. We also develop a math-heuristic algorithm for the problem, capable of solving larger instances. We report the results of an extensive battery of computational experiments comparing the performance of the different approaches.
Keywords: Maximal Covering Location, Continuous Location, Mixed Integer Non Linear Programming, Integer Linear Programming, Branch-&-Cut approaches
CANDIDATE 5. Inmaculada Gutiérrez García-Pardo, Facultad de Estadística. Instituto de Evaluación Sanitaria. Universidad Complutense de Madrid
Community detection problem based on fuzzy measures
Authors: Inmaculada Gutiérrez, Daniel Gómez, Javier Castro, Rosa Espínola
Two branches of science are combined in this work: Graph Theory and Fuzzy Sets Theory. Both are very popular in the active field of Data Analysis. Based on classical tools such as the Shapley value and the interaction index, the weighted graph associated with a fuzzy measure is defined as a simple way to represent and understand these complex functions. Then, a more complex model is characterized, the extended fuzzy graph, which allows the representation of real situations that cannot be modeled by classical tools. On this basis, a new view of the community detection problem is proposed: finding communities in a graph with additional information modeled by a fuzzy measure. A methodology to solve this problem is proposed and tested on several benchmark models, using the normalized mutual information (NMI) for evaluation. To conclude, a case study related to the COVID-19 crisis is presented.
Keywords: Community detection problem, Fuzzy measures, Graph Theory, Extended fuzzy graph, Weighted graph associated with a fuzzy measure, Shapley value, Interaction index
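As a small illustration of how a detected partition can be scored against a benchmark partition with the NMI, the sketch below computes one common normalization, 2·I(A;B) / (H(A) + H(B)). The abstract does not state which NMI variant the authors use, so this choice is an assumption, and the function names and label vectors are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a community labelling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labellings of the same nodes."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """Normalized mutual information, 2 * I(A;B) / (H(A) + H(B))."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0.0:
        return 1.0  # both labellings put every node in a single community
    return 2.0 * mutual_information(a, b) / (h_a + h_b)


# Hypothetical benchmark communities and two candidate partitions of six nodes.
truth = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]   # same communities with different labels
partly_wrong = [0, 0, 1, 1, 1, 1]  # one node assigned to the wrong community
print(nmi(truth, relabelled), nmi(truth, partly_wrong))
```

The score does not depend on how communities are labelled, which is why it suits comparisons against benchmark models with known ground truth.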
CANDIDATE 6. María Asunción Jiménez Cordero, Grupo OASYS. Universidad de Málaga
A novel embedded min-max approach for feature selection in nonlinear Support Vector Machine classification
Authors: Asunción Jiménez-Cordero, Juan Miguel Morales, Salvador Pineda
In recent years, feature selection has become a challenging problem in several machine learning fields, such as classification. Support Vector Machine (SVM) is a well-known technique applied in classification tasks. Various methodologies have been proposed in the literature to select the most relevant features in SVM. Unfortunately, all of them either deal with the feature selection problem in the linear classification setting or propose ad-hoc approaches that are difficult to implement in practice. In contrast, we propose an embedded feature selection method based on a min-max optimization problem, where a trade-off between model complexity and classification accuracy is sought. By leveraging duality theory, we equivalently reformulate the min-max problem and solve it directly using off-the-shelf software for nonlinear optimization. The efficiency and usefulness of our approach are tested on several benchmark data sets in terms of accuracy, number of selected features and interpretability.
Keywords: Machine learning, Min-max optimization, Duality theory, Feature selection, Nonlinear Support Vector Machine classification
CANDIDATE 7. Roi Naveiro, Instituto de Ciencias Matemáticas (ICMAT-CSIC)
Protecting Classifiers From Attacks
Authors: Víctor Gallego, Roi Naveiro, Alberto Redondo, David Ríos Insua and Fabrizio Ruggeri
In multiple domains such as malware detection, automated driving systems, or fraud detection, classification algorithms are susceptible to attacks by malicious agents who are able to perturb the values of the covariates of instances to attain certain goals. Such problems pertain to the field of adversarial machine learning and have been dealt with mostly through game-theoretic ideas with strong underlying common knowledge assumptions. These are not realistic in numerous application domains. We present an alternative statistical framework that accounts for the lack of knowledge about the attacker’s behavior using adversarial risk analysis. A key ingredient is the ability to sample from the distribution of originating instances given the possibly attacked observed one. We propose a sampling procedure based on approximate Bayesian computation that is usable during operations; within it, we simulate the attacker’s problem taking into account our uncertainty about the attacker’s elements. Large-scale problems require an alternative, scalable approach that affects the training stage instead. Overall, we are able to robustify statistical classification algorithms against malicious attacks.
Keywords: Classification, Bayesian Methods, Adversarial Machine Learning, Adversarial Risk Analysis, Deep Models.
CANDIDATE 8. Consuelo Parreño Torres, Departamento de Estadística e Investigación Operativa. Universidad de Valencia
Minimizing crane times in pre-marshalling problems
Authors: Consuelo Parreño-Torres, Ramon Alvarez-Valdes, Rubén Ruiz, Kevin Tierney
The pre-marshalling problem has been extensively studied in recent years with the aim of minimizing the number of movements needed to rearrange a bay of containers. Time is a more realistic objective for measuring process efficiency, and we show that it does not correlate with the number of movements. As a result, we study the problem of minimizing crane times and develop two exact approaches to solve it: an integer linear model and a branch and bound algorithm with new upper and lower bounds, dominance criteria, and a heuristic procedure. Together they provide optimal solutions for problems of practical size.
Keywords: Logistics, Container pre-marshalling, Crane time, Maritime transport, Terminal operations