There are several techniques addressing the fundamental problem of protecting individual sequence data such as medical records, purchase data, movement data, web-search query data or DNA data. These techniques have hardly been employed in practice because: (1) they offer unsatisfactory privacy or utility guarantees or assume that the data collector protects the data, an operation that most data collectors are unable or unwilling to perform; (2) these techniques mostly rely on optimizing generic criteria (e.g. information-theoretic or distance measures), which are not necessarily linked to data analytic tasks.
In this project, we will consider the challenging and practical setting where analytics must be performed accurately on transformed (sanitized) data. We will be typically interested in designing tools with theoretical guarantees on privacy-utility trade-offs. This setting requires new privacy models and algorithms; in particular, combinatorial algorithms on strings and graphs. The ultimate goal is to develop a new framework (model, algorithms, implementations) for privacy preserving in sequence data that is: (1) provably privacy-preserving; (2) useful and usable; and (3) extensible.
|Supervisors||Leen Stougie (VU)|
|PhD Student||Michelle Sweering|
|Location||Centrum Wiskunde Informatica (CWI)|