Speakers – VLDB Summer School 2025

Amr El Abbadi

(University of California at Santa Barbara, USA)

Privacy-preserving access in large scale data systems

Abstract: In this tutorial-style presentation, we will delve into the specifics of various privacy-preserving data management operations. In particular, we focus on public data and on the efficient and scalable use of PIR (Private Information Retrieval), which is one of the main mechanisms proposed in recent years. However, PIR requires the server to treat data as an array of elements, and clients retrieve data using an index into the array. This requirement limits the use of PIR in many practical settings, especially for key-value stores, where the client may be interested in a particular key, or for public document repositories like Wikipedia, where a client poses a query using multiple keywords and is interested in retrieving the top-k most relevant documents. Solving these problems requires efficient, scalable PIR mechanisms as well as secure matrix-vector multiplication. In this talk, we will discuss recent efforts to support such functionalities, using Fully Homomorphic Encryption (FHE), to improve the performance, scalability, and expressiveness of privacy-preserving queries of public data.

Bio: Amr El Abbadi is a Professor of Computer Science. He received his B. Eng. from Alexandria University, Egypt, and his Ph.D. from Cornell University. His research interests are in the fields of fault-tolerant distributed systems and databases, focusing recently on Cloud data management, blockchain-based systems and privacy concerns. Prof. El Abbadi is an ACM Fellow, AAAS Fellow, and IEEE Fellow. He was Chair of the Computer Science Department at UCSB from 2007 to 2011. He served as Associate Graduate Dean at the University of California, Santa Barbara from 2021 to 2023. He has served as a journal editor for several database journals, including The VLDB Journal, IEEE Transactions on Computers and The Computer Journal. He has been Program Chair for multiple database and distributed systems conferences, including most recently SIGMOD 2022. He currently serves on the executive committee of the IEEE Technical Committee on Data Engineering (TCDE) and was a board member of the VLDB Endowment from 2002 to 2008. In 2007, Prof. El Abbadi received the UCSB Senate Outstanding Mentorship Award for his excellence in mentoring graduate students. In 2013, his student Sudipto Das received the SIGMOD Jim Gray Doctoral Dissertation Award. Prof. El Abbadi is also a co-recipient of the Test of Time Award at EDBT/ICDT 2015. Recently, papers he co-authored received an Outstanding Paper Award at NSDI (Networked Systems Design and Implementation) 2024 and the Test of Time Award from MDM (Mobile Data Management) 2024. He has published over 350 articles in databases and distributed systems and has supervised over 40 PhD students.

George Fletcher

(Eindhoven University of Technology, Netherlands)

Data-in-the-loop: theories of data for the working database researcher

Abstract: What is data? This is arguably the primary scientific question underlying the data management systems research field. Every research result, every contribution to the literature in our field, sheds new light on this question. Historically, however, this question has remained too implicit in our work. Explicitly grappling with this question would be a significant step towards strengthening and enriching the scientific foundations of the study of data systems. First, this would help us more openly and critically understand the ways in which we have and have not been finding answers to this question. Second, this effort would better connect us to the broader literature across the many STEM, social science, and humanities fields also studying data and data systems. This, in turn, would help us more critically reflect on our responsibilities to society. Progress in these directions will help both broaden and deepen our scientific and societal impacts as a research community. In this lecture, we take first steps in this direction with a hands-on crash course in theories of data for data management systems researchers. We will study established models coming from the social sciences and humanities, grounded in concrete practical use cases. We will then consider implications for challenging new areas, new intellectual terrain, for database research.

Bio: George Fletcher is a data systems researcher and educator. His recent work increasingly focuses on new challenges in the human and social aspects of query languages and schema languages. His work received the ACM SIGMOD Research Highlight Award 2023, the ACM SIGMOD 2023 Best Industrial Paper Award, the VLDB 2022 Best Research Paper Award Runner Up, and the ACM ICER 2021 Honorable Mention Award. He has been on the faculty at Eindhoven University of Technology for the past 15 years. George did his graduate work at Indiana University Bloomington, where he defended a PhD in the computer science department. His undergraduate studies in mathematics and cognitive science were completed at the University of North Florida.

Sudeepa Roy

(Duke University, USA)

Interpretable Data Analysis with Causal Inference and Explanations

Abstract: In current times, data is considered synonymous with knowledge, profit, power, and entertainment, requiring the development of new interpretable data analysis techniques to extract useful insights from data and help users make data-driven decisions. In this talk, I will first discuss causal inference, which estimates the effect of a treatment on an outcome and has been studied in the statistics and AI literature for many years. Causal inference provides a means to estimate the impact of a certain intervention on the world that correlation, association, or model-based prediction analysis cannot provide, and is indispensable in health, public policy, and other domains. I will give a brief tutorial on causal inference, make a connection between causal inference and database research, and discuss how causal inference can be used in prescriptive data analytics. Then, I will talk about meaningful explanation methods for different stages of the data analysis pipeline. Finally, I will make a connection with responsible data analysis with fairness and privacy, and conclude with directions for future research.

Bio: Sudeepa Roy is an Associate Professor in Computer Science at Duke University. She works in data management, with a focus on the foundational aspects of big data analysis, which includes causal inference and explanations for big data, debugging queries, data repair, data provenance, and probabilistic databases. Before joining Duke in 2015, she did a postdoc at the University of Washington and obtained her Ph.D. from the University of Pennsylvania. She co-directs the Almost Matching Exactly lab for interpretable causal inference at Duke University, and is a recipient of an NSF CAREER Award, a VLDB Early Career Research Contributions Award, and a Google Ph.D. fellowship in structured data. She is serving as the Program Committee Chair of the International Conference on Database Theory (ICDT) 2025 and as a Program Committee Co-Chair of ACM SIGMOD 2026.

Angela Bonifati

(Lyon 1 University & CNRS, France)

Graph Analytics for Strong and Trustworthy AI

Abstract: Strong AI, also known as artificial general intelligence (AGI), aims to conceive machines that have humanlike intelligence and can learn, react and make prompt decisions based on changes in the environment. On the other hand, graphs are the backbone of knowledge representation and evolution, and they are at the heart of expressive graph database systems. Modern applications, especially in the area of graph analytics, need to bridge the gap between knowledge retrieval and understanding. To mitigate the risks of AGI and fully tap the potential of understanding and explaining advanced processes, rigorous mathematical theories for causality can be leveraged to model and query graph-based causal data, going beyond plain correlations. In this talk, I will address the problem of combining these endeavours by presenting our latest results on the analytical operators needed for fully integrating causality into graph databases. To address these issues, it is necessary to study new and unique theoretical foundations for in-database causality, extend standardized graph languages to support causality, and design and implement new causal graph analytical systems.

Bio: Angela Bonifati is a Distinguished Professor of Computer Science at Lyon 1 University and at the CNRS Liris research lab, where she leads the Database Group. She has also been an Adjunct Professor at the University of Waterloo in Canada since 2020 and a Senior member of the French University Institute (IUF) since 2023. Her current research interests are in several aspects of data management, including graph databases, knowledge graphs and data integration, with additional interests in data science and AI. She has co-authored more than 200 publications in top venues of the data management field, including five Best Paper Awards, two books and an invited paper in ACM SIGMOD Record 2018. She is the recipient of the prestigious IEEE TCDE Impact Award 2023 and a co-recipient of an ACM Research Highlights Award 2023. She is the Program Chair of IEEE ICDE 2025, the General Chair of VLDB 2026 and an Associate Editor for the Proceedings of VLDB and for several other journals, including the VLDB Journal, IEEE TKDE and ACM TODS. She was the Program Chair of ACM SIGMOD 2022 and EDBT 2020. She is the Chair of the EDBT Executive Board (2020-2025), a member of the PVLDB Board of Trustees (2024-2029) and a member of the IEEE TCDE executive committee (2024-2029).

Zoi Kaoudi

(IT University of Copenhagen, Denmark)

Unified Data Analytics: Seamless Data Systems Integration and Optimization

Abstract: In today’s data-driven world, organizations rely on multiple data processing systems for their applications, often leading to fragmented workflows, performance inefficiencies, and redundant ad hoc scripts. In this talk, I will explore how to streamline and optimize data processing across diverse and heterogeneous environments. I will first discuss the opportunities of cross-platform data processing and demonstrate how Apache Wayang enables seamless execution across engines like Apache Spark, Flink, and PostgreSQL. Wayang offers an abstraction layer on top of existing data systems, thereby optimizing performance, reducing costs, and maintaining flexibility without vendor lock-in. Additionally, I will share our latest research on using generative ML models to enhance Wayang’s query optimizer. The talk will conclude with a live demo of the system.

Bio: Zoi Kaoudi is an Associate Professor in the Computer Science Department at the IT University of Copenhagen (ITU). Her current research focus is on (i) leveraging machine learning techniques for data-intensive systems, (ii) improving the performance and ease of use of machine learning systems, and (iii) advancing knowledge graph embeddings with ontologies and logical reasoning. Before joining ITU, she held positions in various places around the world. She worked as a Senior Researcher at the Technical University of Berlin, as a Scientist at the Qatar Computing Research Institute (QCRI), as a visiting researcher at IMIS-Athena Research Center, and as a postdoctoral researcher at Inria Saclay. She received her Ph.D. from the National and Kapodistrian University of Athens in 2011. She has co-authored articles in both the database and ML communities and served as a member of the Program Committee for several international database conferences. She received the Best Demonstration Award at ICDE 2022 for her work on training data generation for learning-based query optimization.

Burcu Külahçıoğlu Özkan

(Delft University of Technology, Netherlands)

Testing Distributed Systems

Abstract: Distributed systems are difficult to design, implement, and test. They must ensure correctness under the concurrent execution of a distributed set of processes, under different delivery orderings of asynchronous messages, and in the presence of network and process failures. Unforeseen interleavings of concurrent events and failures can result in unexpected malfunctions or outages in the executions of distributed systems and databases. In this tutorial, we will focus on concurrency and fault-tolerance bugs in distributed systems, providing an overview of state-of-the-art testing techniques to efficiently detect these bugs in distributed system implementations.

Bio: Burcu Külahçıoğlu Özkan is an assistant professor and Delft Technology Fellow in the Software Engineering Research Group at TU Delft. She received her PhD from Koç University in Istanbul, Turkey, followed by postdoctoral research at the Max Planck Institute for Software Systems (MPI-SWS) in Kaiserslautern, Germany. Her research focuses on formal methods, model checking, software testing, and debugging of concurrent programs and distributed systems. She is a recipient of academic research awards and grants from Amazon Research and the Stellar Development Foundation.

Marcin Żukowski

(Snowflake co-founder, Poland)

The story of two architectures

Abstract: In this talk, Marcin will discuss two major database architecture innovations he was involved with over his career. The first is vectorized query execution, a de facto standard for all modern analytical databases. The second is Snowflake, a novel three-tier database architecture, foundational for the cloud but also strongly influencing modern lakehouse solutions. For both architectures, Marcin will discuss the rationale, major technical contributions, benefits, and impact.

Bio: Marcin is a recognized expert in the field of database processing. His research, especially the concept of vectorized query execution, has become fundamental to the current generation of analytical data processing systems. Marcin holds MSc degrees in CS from Warsaw University and Vrije Universiteit Amsterdam. He completed his PhD thesis at the Dutch Center for Mathematics and Computer Science (CWI). Based on that work he co-founded Vectorwise, and as its CEO led it to its acquisition by Actian in 2010. In 2013 he co-founded Snowflake, where he played major technical and leadership roles. Snowflake’s debut on the NYSE in 2020 was the largest software company IPO in history. Today Marcin is engaged in supporting the startup ecosystem in Poland through his roles as advisor and investor.