Ph.D. Candidate in Computer Science
Carnegie Mellon University
saranyav [at] andrew.cmu.edu
I work on making AI systems more secure, private, and understandable. My research combines formal verification and machine learning to address vulnerabilities in areas like fraud detection, secure code generation, and privacy-preserving protocols. I’m especially focused on identifying and exploiting weaknesses through red teaming and jailbreaks, building tools that help us understand why these systems break, and developing methods to make them safer.
I am fortunate to be advised by Christos Faloutsos and Matt Fredrikson. Previously, I completed my undergraduate degree at Harvard with a joint concentration in computer science and government, writing my thesis with Cynthia Dwork and Jim Waldo. After Harvard, I spent three years as an associate at Goldman Sachs before beginning my PhD.
My AI governance experience includes running National Security Policy at Harvard's Institute of Politics, graduate coursework at the Kennedy School, an internship at Booz Allen Hamilton, and research collaboration with Bruce Schneier at the Berkman Klein Center. At CMU, I served as a teaching assistant for Norman Sadeh's Security, Privacy and Public Policy course and I guest lecture for his AI governance class. I am currently supported by the Department of Defense National Defense Science and Engineering Graduate Fellowship through the Army Research Office.
Core research areas:
Nils Palumbo, Ravi Mangal, Zifan Wang, Saranya Vijayakumar, Corina Pasareanu, Somesh Jha
In this work we present a formal framework for mechanistically interpreting neural networks, motivated by abstract interpretation from program analysis. We apply this framework to analyze a Transformer model trained to solve the 2-SAT problem. Through our analysis, we uncover that the model learns a systematic algorithm: first parsing input formulas into clause-level representations in the initial layers, then evaluating satisfiability by enumerating possible Boolean variable valuations. Our work provides evidence that the extracted mechanistic interpretation satisfies our proposed formal axioms, demonstrating how the model systematically solves 2-SAT problems.
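The enumeration strategy described above can be sketched as a brute-force 2-SAT check. This is an illustrative Python sketch of the underlying algorithm, not the model's internal mechanism; the function name and literal encoding are assumptions for exposition:

```python
from itertools import product

def two_sat_by_enumeration(clauses, n_vars):
    """Check satisfiability of a 2-SAT formula by enumerating valuations.

    `clauses` is a list of literal pairs; a literal is a nonzero int,
    where +i means variable i is true and -i means variable i is negated.
    """
    for valuation in product([False, True], repeat=n_vars):
        def holds(lit):
            value = valuation[abs(lit) - 1]
            return value if lit > 0 else not value
        # A valuation satisfies the formula if every clause has a true literal.
        if all(holds(a) or holds(b) for a, b in clauses):
            return True
    return False

# (x1 OR x2) AND (NOT x1 OR x2) is satisfiable (take x2 = True)
print(two_sat_by_enumeration([(1, 2), (-1, 2)], 2))   # True
# (x1 OR x1) AND (NOT x1 OR NOT x1) is unsatisfiable
print(two_sat_by_enumeration([(1, 1), (-1, -1)], 1))  # False
```

Enumeration is exponential in the number of variables, which is what makes it plausible for a fixed-size Transformer operating over short formulas but not a general-purpose 2-SAT solver.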
Priyanshu Kumar, Saranya Vijayakumar, Elaine Lau, Tu Trinh, Zifan Wang, Matt Fredrikson
Our work investigates whether LLMs' safety training, which makes them refuse harmful instructions in chat contexts, extends to non-chat agent scenarios, particularly browser agents. We show that while LLMs may refuse harmful requests in chat form, the same models, when used as browser agents, often fail to maintain these refusals. Through extensive testing with BrowserART (Browser Agent Red teaming Toolkit) on 100 browser-based harmful behaviors, we demonstrate that browser agents pursue many harmful behaviors that would be refused in chat contexts. We find that GPT-4o- and o1-preview-based browser agents pursued 98 and 63 harmful behaviors respectively (out of 100) under certain attack conditions, indicating that chat-based safety training does not sufficiently transfer to agent contexts.
Zifan Wang*, Saranya Vijayakumar*, Kaiji Lu, Vijay Ganesh, Somesh Jha, Matt Fredrikson
A novel framework combining neural networks with SMT solvers to enhance reasoning capabilities while maintaining computational efficiency, achieving a 15% accuracy improvement on complex reasoning tasks.
Mirela Cazzolato, Saranya Vijayakumar, Meng-Chieh Lee, Namyong Park, Catalina Vajiac, Christos Faloutsos
A system for detecting and visualizing fraud patterns in large-scale telecommunication networks, combining advanced graph mining with interactive visualizations. Successfully deployed with a Portuguese telecom provider.
Saranya Vijayakumar*, Matt Fredrikson, Christos Faloutsos
This paper introduces MalCentroid, a novel framework for tracking malware evolution through behavioral analysis. While existing malware detection approaches achieve high accuracy on known samples, they typically treat each sample in isolation and rely on surface features that are easily manipulated. MalCentroid addresses both limitations by decomposing malware samples into behavioral primitives extracted from control flow graphs and tracking their evolution through a centroid-based embedding space. The framework maintains multiple behavioral prototypes per malware family, enabling it to track behavioral drift over time and identify truly novel variants. Evaluation on large-scale datasets demonstrates two key advantages: the ability to track malware evolution patterns (showing families exhibiting drift rates up to 0.633 and uncovering parallel evolution across distinct families), and inherent robustness against adversarial manipulation (with most attack vectors achieving success rates below 5%, versus 97% against image-based approaches). These results suggest that focusing on behavioral primitives rather than surface features provides both better evolution tracking and natural security benefits.
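The centroid idea above can be illustrated with a minimal sketch: embed each sample's behavioral primitives as a vector, summarize a family by the centroid of its samples, and measure drift as the distance between centroids of earlier and later batches. All names and the cosine-distance drift measure here are assumptions for exposition, not MalCentroid's actual implementation:

```python
import numpy as np

def family_centroid(embeddings):
    """Centroid of a family's behavioral-primitive embedding vectors."""
    return np.mean(embeddings, axis=0)

def drift_score(old_samples, new_samples):
    """Illustrative drift measure: cosine distance between the centroids
    of an earlier and a later batch of samples from one malware family."""
    c_old = family_centroid(old_samples)
    c_new = family_centroid(new_samples)
    cos = np.dot(c_old, c_new) / (np.linalg.norm(c_old) * np.linalg.norm(c_new))
    return 1.0 - cos

# Synthetic example: a later batch whose behavior has shifted.
rng = np.random.default_rng(0)
old = rng.normal(loc=0.0, size=(50, 8))
new = rng.normal(loc=0.5, size=(50, 8))  # shifted behavioral distribution
print(round(abs(drift_score(old, old)), 3))          # ~0 for identical batches
print(drift_score(old, new) > drift_score(old, old))  # True: shifted batch drifts more
```

Comparing distributions of behavioral primitives, rather than surface features, is what gives this style of tracking its robustness: an attacker must change what the program does, not just how it looks.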
Saranya Vijayakumar, Christos Faloutsos, Matt Fredrikson
A comprehensive evaluation of traditional, transformer-based, and visual-textual approaches for detecting AI-generated code across multiple programming languages.
Saranya Vijayakumar, Matt Fredrikson, Norman Sadeh
AI Governance Course (17-416/17-716), March 31, 2025
Information Security, Privacy & Policy (17-331/631), November 21, 2024
AI Governance Course (17-416/17-716), April 3, 2024
Information Security, Privacy & Policy (17-331/631), December 5, 2023
Information Security, Privacy & Policy (17-331/631), Fall 2024
I believe in creating an inclusive learning environment that emphasizes practical understanding and critical thinking. My teaching approach combines theoretical foundations with hands-on experience, preparing students for both academic and industry challenges.
Teaching Assistant
Instructors: Norman Sadeh and Hana Habib
Course Highlights:
Teaching Assistant
Instructor: Dave Touretzky
Course Highlights:
Participant in Carnegie Mellon's teaching development program
Mentored Philip Negrin on AI Code Detection research (project video) and a second student on their research project.
Guided a team of 4 students on privacy research analyzing Google's Topics API (USENIX PEPR '24).
TgrApp system visualization interface
Developed novel visualization and detection methods for analyzing million-scale fraud patterns in telecommunication networks, leading to deployed solutions with real-world impact.
Formal verification of the Olvid messaging protocol using ProVerif
Early work on fairness in algorithmic decision-making systems, combining technical analysis with policy implications.
Featured Article in Harvard Political Review
Undergraduate Thesis on Fairness Metrics in ML
Investigating privacy vulnerabilities in Google's Topics API through novel LLM-based approaches
Under Review, 2025
Novel techniques for evaluating and enhancing privacy protections in modern web APIs
Implemented healthcare technology solutions with Partners in Health, Lima, Peru
Led computer science education programs in Boston public schools
NSA (Declined)
Army Research Office
Carnegie Mellon University Eberly Center
Goldman Sachs - New York, NY
Beto O'Rourke Senate Campaign