December 4th, 2015, Purdue University
The Midwest PL Summit is an informal workshop with the goal of fostering exchange and collaboration among faculty and students in the Greater Midwest area. Everybody who is interested in programming languages and compilers is welcome to attend. Please use the link below to register and, if you like, submit a talk proposal. Our aim is to have a broad selection of talks about ongoing research, interesting ideas, and anything that may be of interest to the PL community.
There will be no formal proceedings, but we plan to make abstracts and slides available on the web.
Organizers: Milind Kulkarni & Tiark Rompf
The deadline for online registration is November 17.
Please register here.
Purdue University is conveniently located between Chicago and Indianapolis on I-65.
The event will take place at:
Lawson Computer Science Building, room 3102,
305 N. University Ave. West Lafayette, IN 47907
Parking will be available on-site.
If you are planning to stay overnight, we recommend one of the following hotels:
| Time | Activity |
|---|---|
| 9:00-10:30 | Breakfast and welcome |
| 10:30-12:00 | Session I: Faculty intro talks (Amr Sabry, Jeremy Siek, Milind Kulkarni, Sam Midkiff, Aaron Stump, Xiangyu Zhang, John Reppy, Tiark Rompf, Chung-chieh Shan, Ravi Chugh, Mathias Payer) |
| 12:00-1:00 | Lunch break |
| 1:00-2:20 | Session II: Memory corruption: why protection is hard (Mathias Payer, Purdue University); Symbolic Bayesian inference by lazy partial evaluation (Chung-chieh Shan, Indiana University); Hacking LLVM for Grad Students (Scott Carr, Purdue University); Diderot: A Parallel Domain-Specific Language for Image Analysis and Visualization (John Reppy, University of Chicago) |
| 2:20-2:50 | Coffee break |
| 2:50-4:40 | Session III: Data-Driven Shape Specification Inference (He Zhu, Purdue University); Program Synthesis for Direct Manipulation Interfaces (Ravi Chugh, University of Chicago); Lambda Encodings Reborn (Aaron Stump, The University of Iowa); Effect of GC on Android Power Management (Ahmed Hussein, Purdue University); Declarative Programming over Eventually Consistent Data Stores (Gowtham Kaki, Purdue University) |
| 4:40-5:10 | Coffee break |
| 5:10-5:30 | Wrap up |
| 6:00- | Dinner |
Memory corruption has plagued systems since the dawn of computing. With the rise of defense techniques such as stack cookies, ASLR, and DEP, attacks have become much more complicated, yet control-flow hijacking is still prevalent. Attackers increasingly rely on code reuse, often combined with some form of information disclosure. Stronger defense mechanisms have been proposed, but none has seen wide deployment so far, due to (i) the time it takes to deploy a security mechanism, (ii) incompatibility with specific features, and (iii), most severely, performance overhead. In this talk, we evaluate the security benefits and limitations of the status quo and look into upcoming compiler-based defense mechanisms, with a focus on novel techniques for low-overhead memory safety.
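To make the attack pattern concrete, here is the textbook stack-smashing example (our own minimal sketch, not material from the talk): a fixed-size stack buffer and an unchecked copy, the exact bug that stack cookies detect and that DEP pushes attackers toward code reuse to exploit.

```cpp
#include <cstdio>
#include <cstring>

// Classic stack smashing: input longer than 16 bytes overwrites adjacent
// stack data and eventually the saved return address, hijacking control
// flow. A stack cookie inserts a random value between buf and the saved
// return address and aborts if it changed; DEP makes injected bytes
// non-executable, which is why modern attacks reuse existing code instead.
void vulnerable(const char *input) {
    char buf[16];
    strcpy(buf, input);   // no bounds check: the overflow corrupts the stack
    printf("%s\n", buf);
}

int main(int argc, char **argv) {
    if (argc > 1)
        vulnerable(argv[1]);  // attacker-controlled input
    return 0;
}
```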
Bayesian inference, of posterior knowledge based on prior knowledge and observed evidence, is typically implemented by applying Bayes's theorem, solving an equation in which the posterior multiplied by the probability of an observation equals a joint probability. But when we observe a value of a continuous variable, the observation usually has probability zero, and Bayes's theorem says only that zero times the unknown is zero. To infer a posterior distribution from a zero-probability observation, we turn to the statistical technique of disintegration. The classic formulation of disintegration tells us only what constitutes a posterior distribution, not how to compute it. But by representing all distributions and observations as terms of our probabilistic language, core Hakaru, we have developed the first constructive method of computing disintegrations, solving the problem of drawing inferences from zero-probability observations. Our method uses a lazy partial evaluator to transform terms of core Hakaru, and we argue its correctness by a semantics of core Hakaru in which monadic terms denote measures. The method, which has been implemented in a larger system, is useful not only on its own but also in composition with sampling and other inference methods commonly used in machine learning.
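In standard measure-theoretic notation (ours, not core Hakaru's), the zero-probability problem and the disintegration requirement can be stated as follows.

```latex
% Bayes's theorem is an equation in the unknown posterior:
%   p(\theta \mid x)\, p(x) = p(x, \theta).
% For a continuous observation X = x we have p(x) = 0, so the equation
% reads 0 = 0 and determines nothing. Disintegration instead asks for a
% kernel \kappa that reassembles the joint measure \mu from the marginal
% \nu of X:
\[
  \mu(A \times B) \;=\; \int_{A} \kappa(x, B)\, \nu(\mathrm{d}x)
  \qquad \text{for all measurable } A, B,
\]
% and \kappa(x, \cdot) then serves as the posterior distribution given X = x.
```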
Many grad students in programming languages implement their projects with LLVM. It is a powerful tool, but it has a significant learning curve. Drawing on my experiences and tutorials from around the web, in my talk I will provide guidance for novice to intermediate LLVM hackers. Specifically, the talk will contain advice for writing, building, and debugging LLVM passes efficiently.
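As a hedged illustration of the starting point (our own sketch, not an example from the talk): with the legacy pass manager of the LLVM 3.x era, a first pass is a small C++ class compiled into a shared library and run with `opt -load`. The pass name `countinsts` below is made up.

```cpp
#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

namespace {
// Minimal analysis pass: count the IR instructions in each function.
struct CountInsts : public FunctionPass {
  static char ID;                 // address is used as a unique pass ID
  CountInsts() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    unsigned N = 0;
    for (BasicBlock &BB : F)
      N += BB.size();             // number of instructions in this block
    errs() << F.getName() << ": " << N << " instructions\n";
    return false;                 // we did not modify the IR
  }
};
} // end anonymous namespace

char CountInsts::ID = 0;
static RegisterPass<CountInsts> X("countinsts",
                                  "Count IR instructions in each function");
```

A typical invocation: `clang -emit-llvm -c foo.c -o foo.bc`, then `opt -load ./CountInsts.so -countinsts foo.bc -o /dev/null`.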
The analysis of structure in three-dimensional images is increasingly valuable for biomedical research and computational science. At the same time, the computational burden of processing images is increasing as devices produce images of higher resolution (e.g., typical CT scans have gone from 128^3 to roughly 512^3 resolution). With the latest scanning technologies, it is also more common for the values measured at each sample to be multi-dimensional rather than a single scalar, which further complicates implementing mathematically correct methods.
Diderot is a domain-specific language (DSL) for programming advanced 3D image visualization and analysis algorithms. These algorithms, such as volume rendering, fiber tractography, and particle systems, are naturally defined as computations over continuous tensor fields that are reconstructed from the discrete image data. Diderot combines a high-level mathematical programming notation based on tensor calculus with an abstract bulk-synchronous parallelism model. Diderot is designed to both enable rapid prototyping of new image analysis algorithms and high performance on a range of parallel platforms.
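The simplest instance of reconstructing a continuous field from discrete image data is trilinear interpolation; the sketch below (ours, with a made-up index layout, and far simpler than Diderot's higher-order kernels and tensor-valued samples) shows the shape of that computation.

```cpp
#include <cstddef>
#include <vector>

// Reconstruct a continuous scalar field F(x, y, z) from a discrete 3D
// image by blending the 8 samples around the probe point (trilinear
// interpolation). The caller must keep the point at least one voxel away
// from the upper boundary; real systems handle borders and use
// higher-order kernels.
struct Image3D {
    std::size_t nx, ny, nz;
    std::vector<float> v;                       // nx*ny*nz samples, x fastest
    float at(std::size_t i, std::size_t j, std::size_t k) const {
        return v[(k * ny + j) * nx + i];
    }
};

float probe(const Image3D &img, float x, float y, float z) {
    std::size_t i = static_cast<std::size_t>(x);
    std::size_t j = static_cast<std::size_t>(y);
    std::size_t k = static_cast<std::size_t>(z);
    float fx = x - i, fy = y - j, fz = z - k;   // fractional position in the cell
    // Blend along x, then y, then z.
    float c00 = img.at(i, j,   k  ) * (1 - fx) + img.at(i+1, j,   k  ) * fx;
    float c10 = img.at(i, j+1, k  ) * (1 - fx) + img.at(i+1, j+1, k  ) * fx;
    float c01 = img.at(i, j,   k+1) * (1 - fx) + img.at(i+1, j,   k+1) * fx;
    float c11 = img.at(i, j+1, k+1) * (1 - fx) + img.at(i+1, j+1, k+1) * fx;
    float c0 = c00 * (1 - fy) + c10 * fy;
    float c1 = c01 * (1 - fy) + c11 * fy;
    return c0 * (1 - fz) + c1 * fz;             // reconstructed field value
}
```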
In this talk, I will give an overview of the design of Diderot and examples of its use. I will then describe aspects of its implementation with a focus on how we translate the notation of tensor calculus to efficient code.
Diderot is joint work with Gordon Kindlmann, Charisee Chiw, Lamont Samuels, and Nick Seltzer.
We present a novel automated procedure for discovering expressive shape specifications for sophisticated functional data structures. Our approach extracts potential shape predicates based on the definition of constructors of arbitrary user-defined inductive data types, and combines these predicates within an expressive first-order specification language using a lightweight data-driven learning procedure. Notably, this technique requires no programmer annotations, and is equipped with a type-based decision procedure to verify the correctness of discovered specifications. Experimental results indicate that our implementation is both efficient and effective, capable of automatically synthesizing sophisticated shape specifications over a range of complex data types, going well beyond the scope of existing solutions.
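To give a feel for what such specifications look like (our illustrative rendering, with hypothetical predicates `mem` and `ord` rather than the paper's exact syntax), a shape specification the technique might discover for list reversal is:

```latex
% mem(l, u): u occurs in list l;  ord(l, u, v): u occurs before v in l.
\[
  \forall u.\; \mathit{mem}(\mathit{rev}\,l,\, u) \Leftrightarrow \mathit{mem}(l,\, u)
  \qquad
  \forall u\, v.\; \mathit{ord}(\mathit{rev}\,l,\, u, v) \Leftrightarrow \mathit{ord}(l,\, v, u)
\]
```

That is, reversal preserves the elements of the list and inverts their order.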
Direct manipulation interfaces are developed in many domains where objects have inherently visual representations. After prototyping phases, however, relying solely on direct manipulation can lead to repetitive copy-and-paste tasks and, furthermore, can make it difficult for expert users to manipulate complex content in reusable and composable ways. What is needed are software interfaces that allow users to freely mix programmatic and direct manipulation, each used for its distinct strengths, a combination we dub prodirect manipulation.
As an example of this vision, we present the Sketch-n-Sketch prodirect manipulation editor for designing Scalable Vector Graphics (SVG) images. In Sketch-n-Sketch, the user first defines a program in a small functional language that generates an output SVG canvas, and then directly manipulates the output canvas while the system infers program updates, in real-time, to match the changes. To achieve this novel workflow, we employ a lightweight, trace-based program synthesis algorithm, together with a set of design decisions that mitigate ambiguities without user intervention. We demonstrate that Sketch-n-Sketch allows a variety of examples to be designed and maintained more easily than when using purely programmatic or purely direct manipulation tools alone.
(Joint work with Jacob Albers, Brian Hempel, and Mitch Spradlin)
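The core of trace-based synthesis can be seen in a toy form (our own sketch, far simpler than Sketch-n-Sketch's algorithm): each output value carries a trace recording how it was computed from program constants, and a user edit is translated into an equation over those constants.

```cpp
#include <cstdio>
#include <vector>

// Toy trace-based update: an output value carries the linear trace
//   val = sum_i coeff_i * const_i
// over program constants. When the user drags the output to newVal, we
// pick one constant to change (freezing the rest) and solve for it -- a
// crude stand-in for Sketch-n-Sketch's ambiguity-mitigation heuristics.
struct TraceTerm { int constId; double coeff; };

double solveForConst(const std::vector<TraceTerm> &trace,
                     const std::vector<double> &consts,
                     int unknown, double newVal) {
    double rest = 0, coeff = 0;
    for (const auto &t : trace) {
        if (t.constId == unknown) coeff += t.coeff;
        else rest += t.coeff * consts[t.constId];
    }
    return (newVal - rest) / coeff;   // new value for the chosen constant
}

int main() {
    std::vector<double> consts = {100.0, 30.0};          // e.g., x0 and dx
    std::vector<TraceTerm> trace = {{0, 1.0}, {1, 2.0}}; // output = x0 + 2*dx = 160
    double dx = solveForConst(trace, consts, 1, 190.0);  // user drags output to 190
    std::printf("dx becomes %.1f\n", dx);                // prints 45.0
    return 0;
}
```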
Lambda encodings represent familiar datatypes like numbers, lists, and trees as pure lambda-calculus terms. Numerous obstacles prevented lambda encodings from being adopted in proof assistants based on type theory; instead, such tools add systems of primitive inductive datatypes to the core lambda calculus. In this talk, I will show solutions -- one old and several new -- to the problems with lambda encodings in type theory. This opens the possibility of basing proof assistants on pure lambda calculus only. One benefit of this is support for higher-order encodings, for data with binders (such as expressions in programming languages).
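For readers who have not seen a lambda encoding, the Church numerals are the classic example: a number n is represented as the function that applies its first argument n times.

```latex
\[
\begin{aligned}
  0             &= \lambda s.\, \lambda z.\; z \\
  1             &= \lambda s.\, \lambda z.\; s\, z \\
  \mathit{succ} &= \lambda n.\, \lambda s.\, \lambda z.\; s\, (n\, s\, z) \\
  \mathit{add}  &= \lambda m.\, \lambda n.\, \lambda s.\, \lambda z.\; m\, s\, (n\, s\, z)
\end{aligned}
\]
```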
Mobile devices must conserve power while still providing the desired responsiveness and throughput. This opens new dimensions for evaluating the virtual machine (VM) on devices that rely on a managed runtime. These dimensions offer not just the usual tradeoffs, but also a "free lunch": the energy cost of garbage collection (GC) can be lowered with minimal impact on throughput and responsiveness.
We achieve this with a new GC-aware frequency scaling governor for Android devices. We also explore the tradeoffs of power and performance in vivo for a range of realistic GC variants, with established benchmarks and real applications running on Android virtual machines. We control for variation due to dynamic voltage and frequency scaling (DVFS), JIT compilation, and across established dimensions of heap memory size and concurrency.
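One plausible reading of the policy (our own simplified sketch; the hooks `set_cpu_freq_khz` and `gc_active` are hypothetical stand-ins for the governor's sysfs interface and the VM's GC state) is that the largely memory-bound GC thread gains little from high clock speeds, so the governor can drop the frequency while collection is in progress:

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

std::atomic<bool> gc_active{false};       // set by the VM around collections

void set_cpu_freq_khz(int khz) {          // stand-in for writing scaling_setspeed
    std::printf("cpu frequency -> %d kHz\n", khz);
}

void governor_loop() {
    const int kLow = 300000, kHigh = 1500000;
    bool last = false;
    set_cpu_freq_khz(kHigh);
    for (int tick = 0; tick < 50; ++tick) {   // bounded so the sketch terminates
        bool now = gc_active.load();
        if (now != last) {                    // switch frequency on GC start/end
            set_cpu_freq_khz(now ? kLow : kHigh);
            last = now;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

int main() {
    std::thread gov(governor_loop);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    gc_active = true;    // VM signals the start of a collection cycle
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    gc_active = false;   // collection done
    gov.join();
    return 0;
}
```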
Our results reveal that significant energy is consumed by Android's GC thread, and that varying GC strategy can reduce total on-chip energy (by 20–30%), application throughput (by 10–40%), or worst-case pause times (by 20–30%). We also show that integration between GC and power mechanisms is more effective in controlling the impact of GC than standard techniques such as heap resizing.
User-facing online services utilize geo-distributed data stores to minimize latency and tolerate partial failures, with the intention of providing a fast, always-on experience. However, geo-distribution does not come for free; application developers have to contend with weak consistency behaviors and with the lack of abstractions for composably constructing high-level replicated data types, which necessitates complex application logic and invariably exposes inconsistencies to the user. Some commercial distributed data stores and several academic proposals provide a lattice of consistency levels, with stronger consistency guarantees incurring increased latency and throughput costs. However, correctly assigning the right consistency level to an operation requires subtle reasoning and is often an error-prone task.
In this talk, I will present QUELEA, a declarative programming model for eventually consistent data stores (ECDS), equipped with a specification language capable of expressing fine-grained application-level consistency properties. A consistency enforcement system analyzes specifications and automatically generates the appropriate consistency protocol for the method protected by the specification. Next, I will briefly describe an implementation of QUELEA on top of Cassandra, an off-the-shelf ECDS, and our experience using QUELEA to implement well-known CRDTs and large web application benchmarks. Finally, I will report on the progress of our attempts to synthesize consistency specifications from concurrent test cases.
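As a flavor of such specifications (our approximate rendering, not QUELEA's concrete syntax), a session guarantee like "read my writes" can be stated as a first-order contract over visibility and session order.

```latex
% vis(a, b): effect a is visible to effect b;  so(a, b): a precedes b in
% the same session;  \hat{\eta} is the effect being specified. "Read my
% writes" demands that every earlier effect of the session be visible:
\[
  \forall a.\; \mathit{so}(a, \hat{\eta}) \Rightarrow \mathit{vis}(a, \hat{\eta})
\]
% The enforcement system maps such a contract to the weakest consistency
% level of the store that satisfies it.
```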