Hanchao Ma
I am a postdoctoral associate in Computer Science at Case Western Reserve University, advised by Yinghui Wu. My research focuses on next-generation AI data systems, with the goal of establishing a new systems paradigm that integrates knowledge graphs and large language models for end-to-end knowledge discovery, reasoning, and autonomous management of ML assets and workflows. My work spans graph-based knowledge discovery and scientific workflow platforms such as CRUX, with publications in leading venues including VLDB, SIGMOD, and ICDE.
I previously worked as a research intern at Pacific Northwest National Laboratory and Microsoft Research Asia, and as an Applied Scientist Intern at Amazon, where I focused on graph data systems, knowledge discovery, and LLM-driven applications, including large-scale knowledge graph and recommendation systems.
Current Focus
Next-Generation AI Data Systems: A new paradigm unifying knowledge graphs and LLMs for end-to-end knowledge discovery and reasoning.
Graph-Based Knowledge Discovery: Interactive systems for unbiased, exploratory, and explainable graph exploration.
ML Asset & Workflow Ecosystems: Self-organizing systems for automated, reliable, and reproducible AI workflows.
Projects
Intelligent Graph Data Systems for Knowledge Exploration in AI Infrastructure
Interactive systems for fair, diverse, and explainable exploration of large-scale graph data, enabling unbiased knowledge discovery.

Related papers:
- Ontology-Based Entity Matching in Attributed Graphs (VLDB 2019)
- Diversified Subgraph Query Generation with Group Fairness (WSDM 2022)
- Subgraph Query Generation with Fairness and Diversity Constraints (ICDE 2022)
- Fair Group Summarization with Graph Patterns (ICDE 2023)
- Explaining Missing Data in Graphs: A Constraint-Based Approach (ICDE 2021)
- GRIP: Constraint-Based Explanation of Missing Answers for Graph Queries (SIGMOD demo)
Robust Graph Learning for Graph Data Quality
Learning-based systems for detecting errors and improving robustness in knowledge graphs under limited supervision.

Related papers:
- GEDet: Adversarially Learned Few-Shot Detection of Erroneous Nodes in Graphs (BigData 2020)
- GALE: Active Adversarial Learning for Erroneous Node Detection in Graphs (ICDE 2023)
- RoboGNN: Robust Node Classification under Link Perturbation (IJCAI 2022)
AI-Powered Scientific Data Assets & Workflow Management Systems
A unified platform for scientific data management, provenance tracking, and workflow exploration, enabling reproducible research and automated dataset/model discovery.

Related papers:
- CRUX: Crowdsourced Materials Science Resource and Workflow Exploration (CIKM 2022)
- Selecting Top-k Data Science Models by Example Dataset (CIKM 2023)
- ModsNet: Performance-Aware Top-k Model Search Using Exemplar Datasets (VLDB 2024)
- Generating Skyline Datasets for Data Science Models (EDBT 2025)
