Associate Professor Gustavo Batista

Lecturer
  • 2016. Habilitation. Computer Science. University of São Paulo at São Carlos, Brazil.
  • 2003. Doctor of Philosophy (PhD). Computer Science. University of São Paulo at São Carlos, Brazil.
  • 1997. Master in Science (MSc). Computer Science. University of São Paulo at São Carlos, Brazil.
  • 1994. Bachelor (BS). Computer Science. São Paulo State University, Brazil.
Engineering
Computer Science and Engineering

I joined UNSW as an associate professor in 2018, after working for more than ten years at the University of São Paulo (USP). From 2010 to 2012, I was a visiting researcher at the University of California, Riverside (UCR), working in Prof. Eamonn Keogh's laboratory.

During my stay at UCR, I continued my work on time series analysis, particularly developing methods for classification and clustering of time-oriented data. In conjunction with Dr Keogh, I proposed the first time series distance measure invariant to complexity, as well as speed-up techniques for comparing massive amounts of time series data under warping.

More recently, I have worked with data streams, particularly on classification with label latency, and have proposed efficient unsupervised methods to detect concept drifts and to learn in the presence of these changes in the data distribution.

My research is motivated by applying Machine Learning in practice. My approach is to work on challenging applications that help my students and me to identify gaps in the literature or assumptions in the state-of-the-art that do not hold for our applications. This research approach often leads to contributions both to Computer Science and to the application areas.

One instance of this approach is the challenge of running classification algorithms on embedded devices. For example, I have developed lightweight models that can run in environments with severe power restrictions, such as satellites and sensors. One notable application is the development of sensors that automatically classify insects in flight, allowing the creation of surveillance systems for disease vectors, invasive species and pests. I have also developed EmbML, a Machine Learning tool that converts scikit-learn and Weka classifiers into C++ code crafted to run on low-power microcontrollers, such as those found in the Arduino family.

In recent years, I have actively worked in the area of Machine Learning Quantification, developing new algorithms to count events accurately. These recent developments have led to the proposal of a novel Data Mining task known as One-class Quantification as well as a family of efficient quantification algorithms.

The impact of my research can be measured by the number of recent papers citing my research articles. According to Google Scholar, my papers have more than 9,000 citations, with more than 1,000 citations in 2020.

Phone
+61-2-9385 1607
Location
Room 510L, Building J17, School of Computer Science and Engineering, University of New South Wales, NSW 2052
  • Book Chapters | 2022
    da Silva TP; Parmezan ARS; Batista GEAPA, 2022, 'Geographic Context-Based Stacking Learning for Election Prediction from Socio-economic Data', in , pp. 641 - 656,
    Book Chapters | 2021
    NADAI BLD; MOURA L; MALETZKE AG; BATISTA GEAPA; CORBI JJ, 2021, 'TECNOLOGIA NO MONITORAMENTO AMBIENTAL DE MOSQUITOS TRANSMISSORES DE DOENÇAS: QUAIS SÃO OS DESAFIOS? UMA BREVE REVISÃO', in INDICADORES BIOLÓGICOS DE QUALIDADE EM AMBIENTES AQUÁTICOS CONTINENTAIS: MÉTRICAS E RECORTES PARA ANÁLISES, RFB Editora,
    Book Chapters | 2019
    dos Reis D; Maletzke A; Cherman E; Batista G, 2019, 'One-Class Quantification', in Machine Learning and Knowledge Discovery in Databases, Springer Nature, pp. 273 - 289,
    Book Chapters | 2014
    Maletzke AG; Lee HD; Enrique G; Batista APA; Coy CSR; Fagundes JAJ; Chung WF, 2014, 'Time series classification with motifs and characteristics', in Soft Computing for Business Intelligence, Springer, Berlin, Heidelberg, pp. 125 - 138
  • Journal articles | 2024
    Donyavi Z; Serapiao ABS; Batista G, 2024, 'MC-SQ and MC-MQ: Ensembles for Multi-Class Quantification', IEEE Transactions on Knowledge and Data Engineering, 36, pp. 4007 - 4019,
    Journal articles | 2024
    de Nadai BL; Moura L; Castro GB; Silva KJS; Maletzke AG; Corbi JJ; Batista GEAPA; Machado RB, 2024, 'Can microplastic contamination affect the wing morphology and wingbeat frequency of Aedes aegypti (Diptera: Culicidae) mosquitoes?', Environmental Science and Pollution Research,
    Journal articles | 2023
    Pashamokhtari A; Batista G; Gharakheili HH, 2023, 'Efficient IoT Traffic Inference: From Multi-view Classification to Progressive Monitoring', ACM Transactions on Internet of Things, 5,
    Journal articles | 2023
    Pashamokhtari A; Okui N; Nakahara M; Kubota A; Batista G; Habibi Gharakheili H, 2023, 'Dynamic Inference from IoT Traffic Flows under Concept Drifts in Residential ISP Networks', IEEE Internet of Things Journal, 10, pp. 15761 - 15773,
    Journal articles | 2022
    Parmezan ARS; Souza VMA; Batista GEAPA, 2022, 'Time Series Prediction via Similarity Search: Exploring Invariances, Distance Measures and Ensemble Functions', IEEE Access, 10, pp. 78022 - 78043,
    Journal articles | 2022
    Parmezan ARS; Souza VMA; Seth A; Žliobaitė I; Batista GEAPA, 2022, 'Hierarchical classification of pollinating flying insects under changing environments', Ecological Informatics, 70,
    Journal articles | 2022
    Tsutsui Da Silva L; Souza VMA; Batista GEAPA, 2022, 'An Open-Source Tool for Classification Models in Resource-Constrained Hardware', IEEE Sensors Journal, 22, pp. 544 - 554,
    Journal articles | 2021
    Li J; Sharma A; Mishra D; Batista G; Seneviratne A, 2021, 'COVID-Safe Spatial Occupancy Monitoring Using OFDM-Based Features and Passive WiFi Samples', ACM Transactions on Management Information Systems, 12, pp. 1 - 24,
    Journal articles | 2021
    Parmezan ARS; Souza VMA; Žliobaitė I; Batista GEAPA, 2021, 'Changes in the wing-beat frequency of bees and wasps depending on environmental conditions: a study with optical sensors', Apidologie, 52, pp. 731 - 748,
    Journal articles | 2021
    de Nadai BL; Maletzke AG; Corbi JJ; Batista GEAPA; Reiskind MH, 2021, 'The impact of body size on Aedes [Stegomyia] aegypti wingbeat frequency: implications for mosquito identification', Medical and Veterinary Entomology, 35, pp. 617 - 624,
    Journal articles | 2020
    Reis DD; de Souto M; de Sousa E; Batista G, 2020, 'Quantifying With Only Positive Training Data', arXiv preprint arXiv:2004.10356
    Journal articles | 2020
    Souza VMA; dos Reis DM; Maletzke AG; Batista GEAPA, 2020, 'Challenges in benchmarking stream learning algorithms with real-world data', Data Mining and Knowledge Discovery, 34, pp. 1805 - 1858,
    Journal articles | 2019
    Parmezan ARS; Souza VMA; Batista GEAPA, 2019, 'Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model', Information Sciences, 484, pp. 302 - 337
    Journal articles | 2019
    Silva DF; Yeh C-CM; Zhu Y; Batista GEAPA; Keogh E, 2019, 'Fast similarity matrix profile for music analysis and exploration', IEEE Transactions on Multimedia, 21, pp. 29 - 38,
    Journal articles | 2018
    Maletzke AG; dos Reis DM; Batista GEAPA, 2018, 'Combining instance selection and self-training to improve data stream quantification', Journal of the Brazilian Computer Society, 24, pp. 12 - 12,
    Journal articles | 2018
    Silva DF; Giusti R; Keogh E; Batista GEAPA, 2018, 'Speeding up similarity search under dynamic time warping by pruning unpromising alignments', Data Mining and Knowledge Discovery, 32, pp. 988 - 1016
    Journal articles | 2017
    Souza V; Rossi RG; Batista GEAPA; Rezende SO, 2017, 'Unsupervised active learning techniques for labeling training sets: An experimental evaluation on sequential data', Intelligent Data Analysis, 21, pp. 1061 - 1095
    Journal articles | 2015
    Batista GEAPA; Delgado M; Bernardini F, 2015, 'ENIAC 2013 Special Issue', Journal of Intelligent and Robotic Systems: Theory and Applications, 80, pp. 225 - 226,
    Journal articles | 2015
    Prati RC; Batista GEAPA; Silva DF, 2015, 'Class imbalance revisited: a new experimental setup to assess the performance of treatment methods', Knowledge and Information Systems, 45, pp. 247 - 270
    Journal articles | 2015
    Silva DF; Souza VMA; Ellis DPW; Keogh EJ; Batista GEAPA, 2015, 'Exploring Low Cost Laser Sensors to Identify Flying Insect Species: Evaluation of Machine Learning and Signal Processing Methods', Journal of Intelligent and Robotic Systems: Theory and Applications, 80, pp. 313 - 330,
    Journal articles | 2015
    Silva DF; Souza VMA; Ellis DPW; Keogh EJ; Batista GEAPA, 2015, 'Exploring low cost laser sensors to identify flying insect species', Journal of Intelligent & Robotic Systems, 80, pp. 313 - 330
    Journal articles | 2014
    Batista GEAPA; Keogh EJ; Tataw OM; De Souza VMA, 2014, 'CID: an efficient complexity-invariant distance for time series', Data Mining and Knowledge Discovery, 28, pp. 634 - 669
    Journal articles | 2014
    Chen Y; Why A; Batista G; Mafra-Neto A; Keogh E, 2014, 'Flying Insect Classification with Inexpensive Sensors', Journal of Insect Behavior, 27, pp. 657 - 677,
    Journal articles | 2014
    Chen Y; Why A; Batista G; Mafra-Neto A; Keogh E, 2014, 'Flying insect detection and classification with inexpensive sensors', JoVE (Journal of Visualized Experiments), pp. e52111 - e52111
    Journal articles | 2014
    Del Gaudio R; Batista G; Branco A, 2014, 'Coping with highly imbalanced datasets: A case study with definition extraction in a multilingual setting', Natural Language Engineering, 20, pp. 327 - 359
    Journal articles | 2014
    Parmezan ARS; Batista GEAPA; others, 2014, 'ICMC-USP time series prediction repository', Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, Brasil. URL https://goo.gl/uzxGZJ
    Journal articles | 2013
    Rakthanmanon T; Campana B; Mueen A; Batista G; Westover B; Zhu Q; Zakaria J; Keogh E, 2013, 'Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping.', ACM Trans Knowl Discov Data, 7,
    Journal articles | 2013
    Rakthanmanon T; Campana B; Mueen A; Batista G; Westover B; Zhu Q; Zakaria J; Keogh E, 2013, 'Addressing big data time series: Mining trillions of time series subsequences under dynamic time warping', ACM Transactions on Knowledge Discovery from Data (TKDD), 7, pp. 1 - 31
    Journal articles | 2013
    Silva DF; de Souza VMAA; Batista GEAPA, 2013, 'A comparative study between MFCC and LSF coefficients in automatic recognition of isolated digits pronounced in Portuguese and English', Acta Scientiarum. Technology, 35, pp. 621 - 628
    Journal articles | 2012
    Prati RC; Batista GEAPA, 2012, 'A complexity-invariant measure based on fractal dimension for time series classification', International Journal of Natural Computing Research (IJNCR), 3, pp. 59 - 73
    Journal articles | 2011
    Milaré CR; Batista GEAPA; Carvalho ACPLF, 2011, 'A hybrid approach to learn with imbalanced classes using evolutionary algorithms', Logic Journal of IGPL, 19, pp. 293 - 293
    Journal articles | 2011
    Prati RC; Batista GEAPA; Monard MC, 2011, 'A survey on graphical methods for classification predictive performance evaluation', IEEE Transactions on Knowledge and Data Engineering, 23, pp. 1601 - 1618,
    Journal articles | 2010
    Prati R; Batista G; Monard M, 2010, 'A survey on graphical methods for classification predictive performance evaluation', Knowledge and Data Engineering, IEEE Transactions on, pp. 1 - 1
    Journal articles | 2008
    Prati RC; Batista GEAPA; Monard MC, 2008, 'Curvas ROC para avaliação de classificadores', Revista IEEE América Latina, 6, pp. 215 - 222
    Journal articles | 2008
    Prati RC; Batista GEDAPA; Monard MC, 2008, 'Evaluating classifiers using ROC curves', IEEE Latin America Transactions, 6, pp. 215 - 222,
    Journal articles | 2006
    Batista GEAPA; Milaré CR; Prati RC; Monard MC, 2006, 'A Comparison of Methods for Rule Subset Selection Applied to Associative Classification.', Inteligencia artificial: Revista Iberoamericana de Inteligencia Artificial, 10, pp. 29 - 35
    Journal articles | 2005
    Batista G; Prati R; Monard M, 2005, 'Balancing strategies and class overlapping', Advances in Intelligent Data Analysis VI, pp. 741 - 741
    Journal articles | 2004
    Batista GEAPA; Prati RC; Monard MC, 2004, 'A study of the behavior of several methods for balancing machine learning training data', ACM SIGKDD Explorations Newsletter, 6, pp. 20 - 29
    Journal articles | 2004
    Milaré C; Batista G; de Carvalho A; Monard M, 2004, 'Applying genetic and symbolic learning algorithms to extract rules from artificial neural networks', MICAI 2004: Advances in Artificial Intelligence, pp. 833 - 843
    Journal articles | 2004
    Prati R; Batista G; Monard M, 2004, 'Class imbalances versus class overlapping: an analysis of a learning system behavior', MICAI 2004: Advances in Artificial Intelligence, pp. 312 - 321
    Journal articles | 2003
    Batista GEAPA; Monard MC, 2003, 'An analysis of four missing data treatment methods for supervised learning', Applied Artificial Intelligence, 17, pp. 519 - 533
    Journal articles | 2003
    Batista GEAPA; Monard MC, 2003, 'Descrição da arquitetura e do projeto do ambiente computacional DISCOVER LEARNING ENVIRONMENT—DLE', Relatório Técnico do ICMC/USP
    Journal articles | 2003
    Batista GEAPA; Monard MC, 2003, 'Experimental comparison of K-nearest neighbour and mean or mode imputation methods with the internal strategies used by C4.5 and CN2 to treat missing data', University of Sao Paulo
    Journal articles | 2002
    Batista GEAPA; Monard MC, 2002, 'A Study of K-Nearest Neighbour as an Imputation Method.', HIS, 87, pp. 48 - 48
    Journal articles | 2002
    Batista GEAPA; Monard MC, 2002, 'K-Nearest Neighbour as Imputation Method: Experimental Results', Technical report, ICMC-USP
    Journal articles | 2002
    Monard MC; Batista GEAPA, 2002, 'Learning with Skewed Class Distributions', Advances in Logic, Artificial Intelligence, and Robotics: LAPTEC 2002, 85, pp. 173 - 173
    Journal articles | 2000
    Batista G; Carvalho A; Monard M, 2000, 'Applying one-sided selection to unbalanced datasets', MICAI 2000: Advances in Artificial Intelligence, pp. 315 - 325
    Journal articles | 1997
    Batista GEAPA, 1997, 'Um ambiente de avaliação de algoritmos de aprendizado de máquina utilizando exemplos', Dissertação de Mestrado, ICMC-USP
  • Working Papers | 2023
    Pashamokhtari A; Batista G; Habibi Gharakheili H, 2023, Quantifying and Managing Impacts of Concept Drifts on IoT Traffic Inference in Residential ISP Networks, arXiv, 2301.06695v2
    Working Papers | 2022
    Pashamokhtari A; Batista G; Habibi Gharakheili H, 2022, AdIoTack: Quantifying and Refining Resilience of Decision Tree Ensemble Inference Models against Adversarial Volumetric Attacks on IoT Networks, arXiv, ARTN 102801
  • Conference Papers | 2024
    Azizi S; Okui N; Nakahara M; Kubota A; Batista G; Gharakheili HH, 2024, 'Poster: Understanding and Managing Changes in IoT Device Behaviors for Reliable Network Traffic Inference', in SIGCOMM Posters and Demos 2024 - Proceedings of the 2024 SIGCOMM Poster and Demo Sessions, Part of: SIGCOMM 2024, pp. 25 - 27,
    Conference Papers | 2024
    Gil MZ; Hu Z; Lyu M; Batista G; Habibi Gharakheili H, 2024, 'Systematic Mapping and Temporal Reasoning of IoT Cyber Risks using Structured Data', in Asian Internet Engineering Conference, AINTEC 2024, pp. 18 - 25,
    Conference Papers | 2024
    Wang H; Zhi W; Batista G; Chandra R, 2024, 'Pedestrian Trajectory Prediction Using Dynamics-based Deep Learning', in Proceedings - IEEE International Conference on Robotics and Automation, pp. 15068 - 15075,
    Conference Papers | 2023
    Donyavi Z; Serapio A; Batista G, 2023, 'MC-SQ: A Highly Accurate Ensemble for Multi-class Quantification', in 2023 SIAM International Conference on Data Mining, SDM 2023, pp. 622 - 630,
    Preprints | 2023
    Hamza A; Gharakheili HH; Benson TA; Batista G; Sivaraman V, 2023, Detecting Anomalous Microflows in IoT Volumetric Attacks via Dynamic Monitoring of MUD Activity
    Preprints | 2023
    Pashamokhtari A; Okui N; Nakahara M; Kubota A; Batista G; Gharakheili HH, 2023, Quantifying and Managing Impacts of Concept Drifts on IoT Traffic Inference in Residential ISP Networks
    Conference Papers | 2023
    Serapião ABS; Donyavi Z; Batista G, 2023, 'Ensembles of Classifiers and Quantifiers with Data Fusion for Quantification Learning', in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 3 - 17,
    Conference Papers | 2022
    Chen B; Bakhshi A; Batista G; Ng B; Chin TJ, 2022, 'Update Compression for Deep Neural Networks on the Edge', in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 3075 - 3085,
    Preprints | 2022
    Pashamokhtari A; Batista G; Gharakheili HH, 2022, AdIoTack: Quantifying and Refining Resilience of Decision Tree Ensemble Inference Models against Adversarial Volumetric Attacks on IoT Networks
    Conference Papers | 2022
    Tin D; Shahpasand M; Gharakheili HH; Batista G, 2022, 'Classifying Time-Series of IoT Flow Activity using Deep Learning and Intransitive Features', in International Conference on Software, Knowledge Information, Industrial Management and Applications, SKIMA, pp. 192 - 197,
    Conference Papers | 2021
    Da Silva TP; Parmezan ARS; Batista GEAPA, 2021, 'A Graph-Based Spatial Cross-Validation Approach for Assessing Models Learned with Selected Features to Understand Election Results', in Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021, pp. 909 - 915,
    Conference Papers | 2021
    Hassan W; Maletzke A; Batista G, 2021, 'Pitfalls in Quantification Assessment', in CEUR Workshop Proceedings
    Conference Papers | 2021
    Maletzke A; Reis DD; Hassan W; Batista G, 2021, 'Accurately Quantifying under Score Variability', in Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 1228 - 1233,
    Conference Papers | 2021
    Sharma A; Li J; Mishra D; Batista G; Seneviratne A, 2021, 'Passive WiFi CSI Sensing Based Machine Learning Framework for COVID-Safe Occupancy Monitoring', in 2021 IEEE International Conference on Communications Workshops, ICC Workshops 2021 - Proceedings, Institute of Electrical and Electronics Engineers (IEEE), ELECTR NETWORK, pp. 1 - 6, presented at 2021 IEEE International Conference on Communications Workshops (ICC Workshops), ELECTR NETWORK, 14 June 2021 - 23 June 2021,
    Preprints | 2021
    da Silva LT; Souza VMA; Batista GEAPA, 2021, An Open-Source Tool for Classification Models in Resource-Constrained Hardware
    Conference Papers | 2020
    Hassan W; Maletzke A; Batista G, 2020, 'Accurately quantifying a billion instances per second', in Proceedings - 2020 IEEE 7th International Conference on Data Science and Advanced Analytics, DSAA 2020, pp. 1 - 10,
    Conference Papers | 2020
    Jacintho LHM; da Silva TP; Parmezan ARS; de Almeida Prado Alves Batista GE; Batista G, 2020, 'Brazilian Presidential Elections: Analysing Voting Patterns in Time and Space Using a Simple Data Science Pipeline', in Anais do Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2020), Sociedade Brasileira de Computacao - SB, pp. 217 - 224, presented at Anais do Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2020),
    Conference Papers | 2020
    Maletzke A; Hassan W; dos Reis D; Batista G, 2020, 'The Importance of the Test Set Size in Quantification Assessment', in IJCAI, IJCAI, YOKOHAMA, pp. 2640 - 2646, presented at Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence Main track, YOKOHAMA,
    Conference Papers | 2020
    Rebello G; Hu Y; Thilakarathna K; Batista G; Seneviratne A; Duarte OCMB, 2020, 'Melhorando a Acurácia da Detecção de Lavagem de Dinheiro na Rede Bitcoin', in Anais XXXVIII Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos (SBRC 2020), Sociedade Brasileira de Computação, pp. 728 - 741, presented at Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos,
    Preprints | 2020
    Reis DD; de Souto M; de Sousa E; Batista G, 2020, Quantifying With Only Positive Training Data
    Preprints | 2020
    Souza VMA; Reis DMD; Maletzke AG; Batista GEAPA, 2020, Challenges in Benchmarking Stream Learning Algorithms with Real-world Data
    Conference Papers | 2020
    de Sá JMC; Rossi ALD; Batista GEAPA; Garcia LPF, 2020, 'Algorithm recommendation for data streams', in Proceedings - International Conference on Pattern Recognition, pp. 6073 - 6080,
    Conference Papers | 2019
    Maletzke A; dos Reis D; Cherman E; Batista G, 2019, 'DyS: a Framework for Mixture Models in Quantification', in Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
    Conference Papers | 2019
    Tsutsui Da Silva L; Souza VMA; Batista GEAPA, 2019, 'EmbML Tool: Supporting the use of supervised learning algorithms in low-cost embedded systems', in Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI, pp. 1633 - 1637,
    Conference Papers | 2018
    Maletzke A; dos Reis D; Cherman E; Batista G, 2018, 'On the Need of Class Ratio Insensitive Drift Tests for Data Streams', in Second International Workshop on Learning with Imbalanced Domains: Theory and Applications, pp. 110 - 124
    Conference Papers | 2018
    Moreira dos Reis D; Maletzke A; Silva DF; Batista GEAPA, 2018, 'Classifying and counting with recurrent contexts', in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1983 - 1992
    Conference Papers | 2018
    Parmezan ARS; Souza VMA; Batista GEAPA, 2018, 'Towards Hierarchical Classification of Data Streams', in 23rd Iberoamerican Congress on Pattern Recognition (CIARP), pp. 314 - 322
    Conference Papers | 2018
    Silva DF; Batista GEAPA; Keogh E, 2018, 'Large-Scale Similarity-Based Time Series Mining', in Anais do Concurso de Teses e Dissertações da SBC (CTD-SBC), Sociedade Brasileira de Computação - SBC, presented at XXXI Concurso de Teses e Dissertações da SBC,
    Conference Papers | 2018
    Silva DF; Batista GEAPA, 2018, 'Elastic time series motifs and discords', in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, pp. 237 - 242, IEEE
    Conference Papers | 2018
    Souza V; Pinho T; Batista G, 2018, 'Evaluating stream classifiers with delayed labels information', in Proceedings - 2018 Brazilian Conference on Intelligent Systems, BRACIS 2018, pp. 408 - 413,
    Conference Papers | 2018
    da Silva TP; Souza VMA; Batista GEAPA; de Arruda Camargo H, 2018, 'A Fuzzy Classifier for Data Streams with Infinitely Delayed Labels', in 23rd Iberoamerican Congress on Pattern Recognition (CIARP)
    Conference Papers | 2018
    dos Reis D; Maletzke A; Cherman E; Batista G, 2018, 'One-class quantification', in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, Dublin, Ireland, pp. 273 - 289, presented at ECML PKDD 2018, Dublin, Ireland, 10 September 2018 - 14 September 2018,
    Conference Papers | 2018
    dos Reis DM; Maletzke AG; Batista GEAPA, 2018, 'Unsupervised context switch for classification tasks on data streams with recurrent concepts', in Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pp. 518 - 524
    Conference Papers | 2017
    Batista GEAPA; Tinós R, 2017, 'Message from program chairs', in Proceedings - 2017 Brazilian Conference on Intelligent Systems, BRACIS 2017, pp. xiv,
    Conference Papers | 2017
    Maletzke AG; dos Reis DM; Batista GEAPA, 2017, 'Quantification in data streams: Initial results', in 2017 Brazilian Conference on Intelligent Systems (BRACIS), IEEE, pp. 43 - 48, IEEE
    Conference Papers | 2016
    Giusti R; Silva DF; Batista GEAPA, 2016, 'Improved Time Series Classification with Representation Diversity and SVM', in 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, pp. 1 - 6, presented at 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 18 December 2016 - 20 December 2016,
    Conference Papers | 2016
    Silva DF; Batista GEAPA; Keogh E; others , 2016, 'On the effect of endpoints on dynamic time warping', in SIGKDD Workshop on Mining and Learning from Time Series II, San Francisco, CA. Association for Computing Machinery-ACM
    Conference Papers | 2016
    Silva DF; Batista GEAPA; Keogh E, 2016, 'Prefix and Suffix Invariant Dynamic Time Warping', in 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, pp. 1209 - 1214, presented at 2016 IEEE 16th International Conference on Data Mining (ICDM), 12 December 2016 - 15 December 2016,
    Conference Papers | 2016
    Silva DF; Batista GEAPA, 2016, 'Speeding up all-pairwise dynamic time warping matrix calculation', in Proceedings of the 2016 SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, pp. 837 - 845, Society for Industrial and Applied Mathematics
    Conference Papers | 2016
    Silva DF; Yeh CCM; Batista GEAPA; Keogh E, 2016, 'SiMPle: Assessing music similarity using subsequences joins', in Proceedings of the 17th International Society for Music Information Retrieval Conference, ISMIR 2016, pp. 23 - 29
    Conference Papers | 2016
    Sousa CAR; Batista GEAPA, 2016, 'Constrained Local and Global Consistency for semi-supervised learning', in 2016 23rd International Conference on Pattern Recognition (ICPR), IEEE, pp. 1689 - 1694, IEEE
    Conference Papers | 2016
    dos Reis DM; Flach P; Matwin S; Batista G, 2016, 'Fast unsupervised online drift detection using incremental kolmogorov-smirnov test', in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1545 - 1554
    Other | 2015
    Chen Y; Keogh E; Hu B; Begum N; Bagnall A; Queen A; Batista G, 2015, The ucr time series classification archive,
    Conference Papers | 2015
    Giusti R; Silva DF; Batista GEAPA, 2015, 'Time series classification with representation ensembles', in International Symposium on Intelligent Data Analysis, Springer, Cham, pp. 108 - 119, Springer, Cham
    Conference Papers | 2015
    Oliveira LS; Batista GEAPA, 2015, 'Igmm-cd: a gaussian mixture classification algorithm for data streams with concept drifts', in 2015 Brazilian Conference on Intelligent Systems (BRACIS), IEEE, pp. 55 - 61, IEEE
    Conference Papers | 2015
    Parmezan ARS; Batista GEAPA, 2015, 'A study of the use of complexity measures in the similarity search process adopted by knn algorithm for time series prediction', in 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), IEEE, pp. 45 - 51, IEEE
    Conference Papers | 2015
    Qi Y; Cinar GT; Souza VMA; Batista GEAPA; Wang Y; Principe JC, 2015, 'Effective insect recognition using a stacked autoencoder with maximum correntropy criterion', in 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1 - 7, IEEE
    Conference Papers | 2015
    Silva DF; de Souza VMA; Batista GEAPA, 2015, 'Music Shapelets for Fast Cover Song Recognition.', in ISMIR, pp. 441 - 447
    Conference Papers | 2015
    Souza VMA; Batista GEAPA; Souza-Filho NE, 2015, 'Automatic classification of drum sounds with indefinite pitch', in 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1 - 8, IEEE
    Conference Papers | 2015
    Souza VMA; Silva DF; Batista GEAPA; Gama J, 2015, 'Classification of Evolving Data Streams with Infinitely Delayed Labels', in IEEE International Conference on Machine Learning & Applications (ICMLA), pp. 214 - 219
    Other | 2015
    Souza VMA; Silva DF; Gama J; Batista GEAPA, 2015, Nonstationary environments-archive,
    Conference Papers | 2015
    Souza VMA; Silva DF; Gama JA; Batista GEAPA, 2015, 'Data Stream Classification Guided by Clustering on Nonstationary Environments and Extreme Verification Latency', in SIAM International Conference on Data Mining (SDM), pp. 873 - 881
    Conference Papers | 2015
    de Sousa AR; Batista GEAPA, 2015, 'Robust multi-class graph transduction with higher order regularization', in 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1 - 8, IEEE
    Conference Papers | 2015
    de Sousa CAR; Souza VMA; Batista GEAPA, 2015, 'An experimental analysis on time series transductive classification on graphs', in 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1 - 8, IEEE
    Preprints | 2014
    Chen Y; Why A; Batista G; Mafra-Neto A; Keogh E, 2014, Flying Insect Classification with Inexpensive Sensors
    Conference Papers | 2014
    Lemes CI; Silva DF; Batista GEAPA, 2014, 'Adding diversity to rank examples in anytime nearest neighbor classification', in 2014 13th International Conference on Machine Learning and Applications, IEEE, pp. 129 - 134, IEEE
    Conference Papers | 2014
    Silva DF; Rossi RG; Rezende SO; Batista GEAPA, 2014, 'Music Classification by Transductive Learning Using Bipartite Heterogeneous Networks', in International Society of Music Information Retrieval Conference (ISMIR)
    Conference Papers | 2014
    Souza VMA; Silva DF; Batista GEAPA, 2014, 'Extracting texture features for time series classification', in 2014 22nd International Conference on Pattern Recognition, IEEE, pp. 1425 - 1430, IEEE
    Conference Papers | 2014
    de Sousa CAR; Souza VMA; Batista GEAPA, 2014, 'Time series transductive classification on imbalanced data sets: an experimental study', in 2014 22nd International Conference on Pattern Recognition, IEEE, pp. 3780 - 3785, IEEE
    Conference Papers | 2013
    Chen Y; Hu B; Keogh E; Batista GEAPA, 2013, 'DTW-D: time series semi-supervised learning from a single example', in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 383 - 391
    Conference Papers | 2013
    Domingues MA; Cherman EA; Nogueira BM; Conrado MS; Rossi RG; De Padua R; Marcacini RM; Souza VMA; Batista GEAPA; Rezende SO, 2013, 'A comparative study of algorithms for recommending given names', in 2013 2nd International Conference on Informatics and Applications, ICIA 2013, pp. 66 - 71,
    Conference Papers | 2013
    Domingues MA; Marcacini RM; Rezende SO; Batista GEAPA, 2013, 'Improving the recommendation of given names by using contextual information', in CEUR Workshop Proceedings, pp. 61 - 72
    Conference Papers | 2013
    Giusti R; Batista GEAPA, 2013, 'An empirical comparison of dissimilarity measures for time series classification', in 2013 Brazilian Conference on Intelligent Systems, IEEE, pp. 82 - 88, IEEE
    Conference Papers | 2013
    Maletzke AG; Lee HD; Batista GEAPA; Rezende SO; Machado RB; Voltolini RF; Maciel JN; Silva F; dos Santos LB; Wu FC, 2013, 'Time Series Classification using Motifs and Characteristics Extraction: A Case Study on ECG Databases', in Lopez JCL; Andrade RAE; Perez RB; Carrillo PAA (eds.), PROCEEDINGS OF THE FOURTH INTERNATIONAL WORKSHOP ON KNOWLEDGE DISCOVERY, KNOWLEDGE MANAGEMENT AND DECISION SUPPORT (EUREKA-2013), ATLANTIS PRESS, MEXICO, Mazatlan, pp. 322 - 329, presented at 4th International Workshop on Knowledge Discovery, Knowledge Management and Decision Support (Eureka), MEXICO, Mazatlan, 06 November 2013 - 08 November 2013,
    Conference Papers | 2013
    Rakthanmanon T; Keogh E, 2013, 'Data mining a trillion time series subsequences under dynamic time warping', in Twenty-Third International Joint Conference on Artificial Intelligence
    Conference Papers | 2013
    Silva D; Papadopoulos H; Batista GEAPA; Ellis DPW, 2013, 'A video compression-based approach to measure music structural similarity', in International Society for Music Information Retrieval Conference, pp. 95 - 10
    Conference Papers | 2013
    Silva DF; De Souza VMA; Batista GEAPA; Keogh E; Ellis DPW, 2013, 'Applying machine learning and audio analysis techniques to insect recognition in intelligent traps', in 2013 12th International Conference on Machine Learning and Applications, IEEE, pp. 99 - 104, IEEE
    Conference Papers | 2013
    Silva DF; De Souza VMA; Batista GEAPA, 2013, 'Time series classification using compression distance of recurrence plots', in 2013 IEEE 13th International Conference on Data Mining, IEEE, pp. 687 - 696, IEEE
    Conference Papers | 2013
    de Sousa CAR; Rezende SO; Batista GEAPA, 2013, 'Influence of graph construction on semi-supervised learning', in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Heidelberg, pp. 160 - 175, Springer, Berlin, Heidelberg
    Conference Papers | 2013
    de Souza VMA; Silva DF; Batista GEAPA, 2013, 'Classification of data streams applied to insect recognition: Initial results', in 2013 Brazilian Conference on Intelligent Systems, IEEE, pp. 76 - 81, IEEE
    Conference Papers | 2012
    Alves GEDAP; Silva DF; Prati RC; others, 2012, 'An experimental design to evaluate class imbalance treatment methods', in 2012 11th International Conference on Machine Learning and Applications, IEEE, pp. 95 - 101, IEEE
    Conference Papers | 2012
    Qiang Z; Rakthanmanon T; Batista G; Keogh E, 2012, 'A novel approximation to dynamic time warping allows anytime clustering of massive time series datasets', in SIAM International Conference on Data Mining, pp. 999 - 1010
    Conference Papers | 2012
    Rakthanmanon T; Campana B; Mueen A; Batista G; Westover B; Zhu Q; Zakaria J; Keogh E, 2012, 'Searching and mining trillions of time series subsequences under dynamic time warping', in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 262 - 270
    Conference Papers | 2012
    Silva DF; de Souza VMA; Batista GEAPA; Giusti R, 2012, 'Spoken digit recognition in portuguese using line spectral frequencies', in Ibero-American Conference on Artificial Intelligence, Springer, Berlin, Heidelberg, pp. 241 - 250, Springer, Berlin, Heidelberg
    Conference Papers | 2011
    Batista G; Keogh E; Neto AM; Rowton E, 2011, 'SIGKDD demo: sensors and software to allow computational entomology, an emerging application of data mining', in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp. 761 - 764, ACM
    Conference Papers | 2011
    Batista G; Wang X; Keogh E, 2011, 'A Complexity-Invariant Distance Measure for Time Series', in SDM-2011: Proceedings of SIAM International Conference on Data Mining
    Conference Papers | 2011
    Batista GEAPA; Hao Y; Keogh E; Mafra-Neto A, 2011, 'Towards automatic classification on flying insects using inexpensive sensors', in 2011 10th International Conference on Machine Learning and Applications and Workshops, IEEE, pp. 364 - 369, IEEE
    Conference Papers | 2010
    Batista GEAPA; Campana B; Keogh E, 2010, 'Classification of Live Moths Combining Texture, Color and Shape Primitives', in Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on, IEEE, pp. 903 - 906, IEEE
    Conference Papers | 2010
    Giusti R; Batista GEAPA, 2010, 'Discovering Knowledge Rules with Multi-Objective Evolutionary Computing', in Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on, IEEE, pp. 119 - 124, IEEE
    Conference Papers | 2009
    Batista GEAPA; Silva DF, 2009, 'How k-Nearest Neighbor Parameters Affect its Performance', in X Argentine Symposium on Artificial Intelligence
    Conference Papers | 2009
    Prati RC; Batista GEAPA; Monard MC, 2009, 'Data mining with imbalanced class distributions: concepts and methods.', in IICAI, pp. 359 - 376
    Conference Papers | 2008
    Giusti R; Batista GEAPA; Prati RC, 2008, 'Evaluating Ranking Composition Methods for Multi-Objective Optimization of Knowledge Rules', in Hybrid Intelligent Systems, 2008. HIS’08. Eighth International Conference on, IEEE, pp. 537 - 542, IEEE
    Conference Papers | 2008
    Matsubara ET; Prati RC; Batista GEAPA; Monard MC, 2008, 'Missing value imputation using a semi-supervised rank aggregation approach', in Brazilian Symposium on Artificial Intelligence, Springer, Berlin, Heidelberg, pp. 217 - 226, Springer, Berlin, Heidelberg
    Conference Papers | 2008
    Prati RC; Batista GEAPA; Monard MC, 2008, 'A study with class imbalance and random sampling for a decision tree learning system', in IFIP International Conference on Artificial Intelligence in Theory and Practice, Springer, Boston, MA, pp. 131 - 140, Springer, Boston, MA
    Conference Papers | 2005
    Batista GEAPA; Prati RC; Monard MC, 2005, 'Balancing strategies and class overlapping', in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 24 - 35
    Conference Papers | 2005
    Matsubara ET; Monard MC; Batista GEAPA, 2005, 'Multi-view semi-supervised learning: An approach to obtain different views from text datasets', in Proceeding of the 2005 conference on Advances in Logic Based Intelligent Systems: Selected Papers of LAPTEC 2005, IOS Press, pp. 97 - 104, IOS Press
    Conference Papers | 2004
    Batista GEAPA; Monard MC; Bazzan ALC, 2004, 'Improving rule induction precision for automated annotation by balancing skewed data sets', in International Symposium on Knowledge Exploration in Life Science Informatics, Springer, Berlin, Heidelberg, pp. 20 - 32, Springer, Berlin, Heidelberg
    Conference Papers | 2004
    Prati RC; Batista GEAPA; Monard MC, 2004, 'Learning with class skews and small disjuncts', in Brazilian Symposium on Artificial Intelligence, Springer, Berlin, Heidelberg, pp. 296 - 306, Springer, Berlin, Heidelberg
    Theses / Dissertations | 2003
    BATISTA GE, 2003, Pré-processamento de dados em aprendizado de máquinas supervisionado, Tese (Doutorado)-Instituto de Ciências Matemáticas e de Computação …
    Conference Papers | 2003
    Batista GEAPA; Bazan AL; Monard MC, 2003, 'Balancing training data for automated annotation of keywords: a case study', in Proceedings of the Second Brazilian Workshop on Bioinformatics, pp. 35 - 43
    Conference Papers | 2002
    Lorena AC; Batista GEAPA; De Carvalho ACPLF; Monard MC, 2002, 'Splice junction recognition using machine learning techniques', in Proceedings of the First Brazilian Workshop on Bioinformatics, Citeseer, pp. 32 - 39, Citeseer
    Conference Papers | 2002
    Lorena AC; Batista GEAPA; de Carvalho ACPLF; Monard MC, 2002, 'The influence of noisy patterns on the performance of learning methods in the splice junction recognition problem', in Neural Networks, 2002. SBRN 2002. Proceedings. VII Brazilian Symposium on, IEEE, pp. 31 - 36, IEEE
    Conference Papers | 2001
    Batista GEAPA; Monard MC, 2001, 'A study of K-nearest neighbour as a model-based method to treat missing data', in Argentine Symposium on Artificial Intelligence
    Conference Papers | 2000
    Baranauskas JA; Monard MC; Batista GEAPA, 2000, 'A computational environment for extracting rules from databases', in Management Information Systems, pp. 321 - 330

Grant funding as principal investigator

  • 2017 – 2019: FAPESP e-Science Research Grant. Intelligent Traps and Sensors: an Innovative Approach to Control Insect Pests and Disease Vectors. $55,000.
  • 2016 – 2019: USAID Combating Zika and Future Threats Grand Challenge. An Intelligent Trap and Mobile Application to Motivate Local Mosquito Control Activities. $500,000.
  • 2017 – 2019: CNPq Research Fellow. Novel Approaches in Machine Learning Applied to Automatic Insect Recognition. $25,000.
  • 2015 – 2016: Google LA Research Award. Controlling Dengue Fever Mosquitoes using Intelligent Sensors and Traps. $24,000.
  • 2012 – 2014: FAPESP Research Grant. Complexity-invariance for Classification, Clustering and Motif Discovery in Time Series. $30,000.
  • 2013 – 2015: FAPESP-CALDO International Cooperation Grant. Research on Geospatial Marine Biology Data Mining using Time Series, Text Mining and Visualization (with Stan Matwin co-PI for NSERC). $20,000.
  • 2013 – 2015: FAPESP-CNPq Research Grant. Intelligent Sensors for Controlling Agricultural Pests and Disease-vector Insects. $55,000.
  • 2014 – 2017: CNPq Universal Research Grant. Real-time Monitoring of Insect Pests in Agriculture and the Environment. $25,000.
  • 2014 – 2017: FAPESP New Frontiers Grant. Time Series Classification Algorithms Applied to Embedded Systems. $30,000.
  • 2007 – 2009: FAPESP Research Grant. Machine Learning and Class Imbalance. $10,000.

  • 2020. Best Research Paper Award. IEEE International Conference on Data Science and Advanced Analytics (IEEE-DSAA).
  • 2017 – 2020. Research Fellow, level 2. National Council for Scientific and Technological Development, CNPq.
  • 2014 – 2017. Research Fellow, level 2. National Council for Scientific and Technological Development, CNPq.
  • 2015 – 2016. Google Research Award in Latin America. Google Inc.
  • 2012. Best Research Paper Award. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (ACM-KDD).

I have worked in Machine Learning during my entire career. My main contributions to the field are the following:

Quantification: I have developed counting algorithms that are robust to the changes in data distribution that occur in real-world applications. The algorithms developed by my research group, such as those of the DyS family, are among the most accurate available. We recently developed an ultra-fast counting algorithm that performs comparably to the state of the art. This algorithm received the Best Research Paper Award at DSAA-2020.
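To illustrate what a counting (quantification) algorithm estimates, the sketch below contrasts plain "classify and count" with the textbook adjusted-count correction. It is not the DyS method or the ultra-fast algorithm mentioned above. The function names, and the assumption that the classifier's true and false positive rates are known from validation data, are illustrative only.

```python
def classify_and_count(predictions):
    """Naive estimate: the fraction of items predicted positive (label 1)."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    return sum(1 for p in predictions if p == 1) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Correct the naive estimate using the classifier's error rates.

    tpr and fpr are assumed to have been measured on labelled validation
    data. The correction solves observed = p * tpr + (1 - p) * fpr for p.
    """
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the correction to exist")
    observed = classify_and_count(predictions)
    estimate = (observed - fpr) / (tpr - fpr)
    # Clip to a valid proportion, since sampling noise can push it outside [0, 1].
    return min(1.0, max(0.0, estimate))
```

For example, if 62 of 100 items are predicted positive by a classifier with a true positive rate of 0.8 and a false positive rate of 0.2, the naive estimate is 0.62 while the adjusted estimate is about 0.7.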

Time Series Mining: I have created algorithms to classify and cluster time-oriented data under different invariances, such as warping. These developments led to the UCR suite, a framework for time series matching under warping that received the KDD Best Research Paper Award in 2012. More recently, we further improved the search speed of the UCR suite, creating the UCR-USP suite. I also proposed the first time series distance invariant to complexity.
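The idea of a complexity-invariant distance can be sketched in a few lines, following the commonly cited formulation in which the Euclidean distance is scaled by the ratio of the two series' complexity estimates. This is a minimal illustration, not the reference implementation. The function names and the handling of constant series are assumptions made for this example.

```python
import math


def complexity_estimate(series):
    """Square root of the summed squared differences between consecutive points."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(series, series[1:])))


def euclidean(q, c):
    """Euclidean distance between two equal-length series."""
    if len(q) != len(c):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(q, c)))


def cid(q, c):
    """Euclidean distance times max(CE) / min(CE) of the two series."""
    ed = euclidean(q, c)
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    low, high = min(ce_q, ce_c), max(ce_q, ce_c)
    if low == 0.0:
        # Assumption for this sketch: two constant series get a factor of 1,
        # and a constant series versus a varying one is infinitely far.
        return ed if high == 0.0 else math.inf
    return ed * (high / low)
```

The correction factor is always at least 1, so two series of similar complexity keep roughly their Euclidean distance, while a smooth series and a jagged one are pushed further apart.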

Class imbalance: My initial research involved the development and assessment of methods to deal with class-imbalanced data. My research focused on discussing the challenges of learning with imbalanced data, including the scenarios in which skewed distributions would impose difficulties for classifiers. My articles are among the most cited on the topic, including the ACM SIGKDD paper of 2004 with more than 2,500 citations.

Missing data imputation: During my PhD, I worked with data preprocessing techniques, including missing data imputation methods. I developed the use of k-nearest neighbour (k-NN) as a flexible technique for missing data imputation and demonstrated its efficacy compared with other state-of-the-art techniques. k-NN is currently one of the most used imputation algorithms due to its simple implementation, ability to deal with missing data in multiple attributes and capacity to work with continuous and discrete features.
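A minimal sketch of k-NN imputation for numeric attributes is shown below. It is an illustration under stated assumptions, not the implementation from the thesis: missing values are represented as None, distances are computed only over attributes observed in both rows, and the missing value is filled with the mean of its k nearest neighbours. The function name knn_impute is hypothetical.

```python
import math


def knn_impute(rows, k=3):
    """Return a copy of rows with None entries filled from the k nearest rows."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different row with attribute j observed.
                if m == i or other[j] is None:
                    continue
                shared = [a for a in range(len(row))
                          if row[a] is not None and other[a] is not None]
                if not shared:
                    continue
                # Normalise by the number of shared attributes so rows with
                # different amounts of missing data remain comparable.
                dist = math.sqrt(
                    sum((row[a] - other[a]) ** 2 for a in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed
```

For a discrete attribute, the mean in the last step would be replaced by the most frequent value among the neighbours, and the distance would need a suitable treatment of categorical values.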

My Research Supervision

  • Tiago Pinho da Silva, PhD student: Election Forensics: Detecting Irregularities in Electoral Data Under Spatial Non-Stationarity.
  • Antonio Parmezan, PhD student: Hierarchical Classification of Data Streams.