Monday, November 18

9:00am PST

Opening Remarks

Monday November 18, 2019 9:00am - 9:10am PST
Marriott Marquis San Diego Marina - San Diego Room B/C

9:10am PST

LightStep Sponsored Session: Observability for Deep Systems - Spoons (aka Daniel Spoonhower), LightStep
Software architectures have evolved: applications are not just getting bigger but scaling deeper. Observability tools must adapt to this new environment or leave developers with lots of responsibility but little control. I'll describe deep systems and where they came from, as well as the opportunity they have created for observability practitioners. All this in only 10 minutes!

Spoons (aka Daniel Spoonhower)

CTO and Cofounder, LightStep
Before LightStep, I spent almost six years at Google where I worked on developer tools as part of both Google’s internal infrastructure and Cloud Platform teams. I have published papers on the performance of parallel programs, garbage collection, and real-time programming. I have...

Monday November 18, 2019 9:10am - 9:20am PST
Marriott Marquis San Diego Marina - San Diego Room B/C
  Sponsored Session
  • Session Slides Included Yes

9:20am PST

Tracing is for Everyone, Not Just Backend Engineers. (How Tracing Could Help Front-end Engineers to Build a Better UX) - Nina Stawski, Omnition
There's been a lot of talk about the importance of observability and tracing for microservice-based applications. The use cases involved are usually focused on backend engineers and DevOps. But what about us front-end engineers? We also want to know how things work. More often than not, we get blamed first when something breaks, so it is important to understand the whole application, not just the front end.

Currently, observability is not a top concern for front-end engineers, and I will show why it should be. In many cases, even if the application's speed cannot be changed significantly, you can apply small tricks and add microinteractions to improve the UX. Besides, emerging tooling in OpenCensus and OpenTelemetry is easy to configure, enriches the existing data, and helps developers correlate traces between the backend and the UI.

Nina Stawski

Senior UI/UX Engineer, Omnition (acquired by Splunk)

Monday November 18, 2019 9:20am - 9:50am PST
Marriott Marquis San Diego Marina - San Diego Room B/C
  • Session Slides Included Yes

9:50am PST

Real-Time Application Maps for Proactive and Actionable Visibility - Aloke Guha, OpsCruise
Today’s observability tools provide volumes of time-series data, statistical trends including anomaly detection, and correlational analyses. We argue that operations teams need an integrated and cohesive understanding of the application that maps interdependencies across microservices, as well as dependencies on the orchestration and infrastructure services. We show that, beyond metrics, logs, and traces, capturing configuration information is necessary for creating complete application maps that give deeper insight into application behavior. In addition, establishing a standard approach to capturing the attributes of the complete application environment will enable automated detection and causal analysis of application problems. We will present some early findings on building real-time, actionable application maps for cloud applications.

Aloke Guha

CTO, OpsCruise
Aloke Guha is a serial entrepreneur with an extensive career working in data center technologies including storage and networking, big data, and machine learning and analytics predating the AI winters. Before co-founding OpsCruise, he was Vice President of Analytics and Big Data products...

Monday November 18, 2019 9:50am - 10:20am PST
Marriott Marquis San Diego Marina - San Diego Room B/C
  • Session Slides Included Yes

10:20am PST


Monday November 18, 2019 10:20am - 10:50am PST
Marriott Marquis San Diego Marina - San Diego Room B/C

10:50am PST

SlackTrace: A New Tracing Tool - Suman Karumuri, Slack
Trace data contains very rich information about a request's execution. However, current tracing tools only expose that information as a trace view or a service graph, which severely limits the questions we can ask of trace data and diminishes the utility of tracing. From past experience, we found that these limitations arise because, unlike logs or metrics, raw trace data can't be queried.

To query raw trace data easily, we designed a new span format called SpanEvent and built our tracing infrastructure, called SlackTrace, around it. In addition to presenting the trace data as a trace view and a service graph, the SpanEvent format allows us to query raw span data using SQL, which lets us derive rich insights from trace data that are not possible with existing tracing systems. In this talk, I will present the SpanEvent format and an overview of our SlackTrace infrastructure.


Suman Karumuri

Senior Staff Engineer, Slack
Suman Karumuri is a Senior Staff Engineer at Slack. He is passionate about all things observability. Previously, he served as the lead for the Zipkin project at Twitter.

Monday November 18, 2019 10:50am - 11:20am PST
Marriott Marquis San Diego Marina - San Diego Room B/C
  • Session Slides Included Yes

11:20am PST

LeitMotif: An Abstraction for Debugging Distributed Applications - Mania Abdi, Northeastern University
Abstractions, such as APIs, allow developers to build complex distributed applications out of smaller building blocks. In contrast, there are very few abstractions available to limit the amount of complexity engineers must deal with when diagnosing problems in production applications. This mismatch means that diagnosis will continue to become more challenging as systems continue to scale. We present the workflow motif abstraction, instantiations of which capture frequent or important processing patterns observed in the workflow of requests. We argue that use of motifs can make existing diagnosis techniques more powerful and enable new use cases. We discuss features needed from distributed tracing infrastructures to generate useful motifs, progress on modifying frequent-subgraph mining algorithms to identify motifs from traces, and initial experiences using motifs to debug problems.

Mania Abdi

PhD Student, Northeastern University

Monday November 18, 2019 11:20am - 11:50am PST
Marriott Marquis San Diego Marina - San Diego Room B/C

11:50am PST

When Connections are Magic: Understanding Performance in Serverless - James Burns, LightStep
Observability! Cloud Functions! APIs! What could go wrong?! While researching the performance of object storage APIs, there appeared to be custom runtime magic at work, causing significant performance differences. Further research showed that it was *not magic*, but it led to even more questions.
Working with modern systems means network connections, many of them. Understanding how those connections impact your customers' experience can be difficult. Distributed tracing helps isolate which parts of the system are failing, but when tracing is implemented only at the RPC level, the causes and scope of network-induced issues can be lost. See how network-level insights can be integrated into distributed traces, and hear how to effectively practice iterative observability, from the specific case of this research to a general framework for investigation.

James Burns

Head of Research, LightStep
From network load balancers to FPGAs to ASICs to embedded security to cloud ops at scale, James has seen how systems work but, more interestingly, how they fail. He is passionate about sharing what he's learned to level up teams, make developers happier, and improve customer experiences...

Monday November 18, 2019 11:50am - 12:20pm PST
Marriott Marquis San Diego Marina - San Diego Room B/C
  • Session Slides Included Yes

12:20pm PST

Networking Lunch

Monday November 18, 2019 12:20pm - 1:20pm PST
Marriott Marquis San Diego Marina - San Diego Room B/C

1:20pm PST

A Picture is Worth 1,000 Traces - Steve Flanders, Splunk & Yuri Shkuro, Uber Technologies
Distributed tracing has emerged as the go-to solution for understanding what’s going on in ever-changing cloud-native architectures. A single trace can reveal many things (network latencies, time spent in databases, a service spinning idly, etc.), but finding the right trace among billions that demonstrates a problem in a large distributed application is very hard. By looking at traces in aggregate, we can eliminate the need to state and validate hypotheses; instead, answers start to emerge naturally, especially when we use creative visualizations that put our visual cortex to work without overloading it with useless information. This talk will present the power of aggregate analysis of distributed traces by highlighting its applications beyond performance troubleshooting.

Steve Flanders

Director of Engineering, Splunk
As Engineering Director at Splunk, Steve leads Observability “Getting Data In”: the top contributor to the CNCF OpenTelemetry project. Previously, he served as a founding member and Head of Product at Omnition; and Global Engineering Manager for log analytics at VMware. Steve’s...

Yuri Shkuro

Software Engineer, Uber Technologies
Yuri Shkuro is a software engineer at Uber Technologies, working on distributed tracing, observability, reliability, and performance problems; author of the book ["Mastering Distributed Tracing"](https://www.shkuro.com/books/2019-mastering-distributed-tracing/); creator of Jaeger...

Monday November 18, 2019 1:20pm - 1:50pm PST
Marriott Marquis San Diego Marina - San Diego Room B/C

1:50pm PST

Testing in A Distributed Systems World - Fernando Mayo, Undefined Labs
While microservices are becoming the norm due to advancements in development, deployment, and monitoring techniques over the last few years, we are still using the same testing methodologies we used for monolithic apps. In this talk, we look at how distributed tracing can be applied to testing modern, distributed applications, from unit to end-to-end tests, to continuously give developers invaluable insight into how entire applications behave, and when and why they fail, before they are deployed to production. We'll also discuss the power of distributed context propagation and how it can be leveraged for testing purposes, from safely testing in production to failure injection.

Fernando Mayo

CTO & Co-founder, Undefined Labs
Fernando is the CTO & co-founder of Undefined Labs, a startup building developer tools. Previously he worked at Docker, after the acquisition of Tutum, which he co-founded. He is obsessed with improving the developer workflow, from coding and testing, to deployment and monitoring...

Monday November 18, 2019 1:50pm - 2:20pm PST
Marriott Marquis San Diego Marina - San Diego Room B/C
  • Session Slides Included Yes

2:20pm PST

Pythia: An Automated, Cross-layer Instrumentation Framework for Diagnosing Performance Problems in Distributed Applications - Emre Ates, Boston University
It is extremely difficult to know a priori where to enable instrumentation to help diagnose problems that may occur in the future. We present Pythia, an automated, cross-layer instrumentation framework that explores the space of possible instrumentation choices and enables the instrumentation needed to diagnose a newly observed problem in production systems. Pythia builds on distributed tracing and uses statistical techniques to identify where instrumentation is needed. This talk will discuss 1) the scalable design of Pythia; 2) our progress on identifying promising data structures to represent the instrumentation search space across multiple data center stack layers (e.g., application and kernel), which must trade off compactness, exhaustiveness, and accuracy; and 3) algorithms to search this space quickly while staying under a specific instrumentation budget.

Emre Ates

PhD Student, Boston University
Emre Ates is a Ph.D. candidate in the Department of Electrical and Computer Engineering of Boston University. His current research interests include automated analytics on large-scale computing systems and distributed systems.

Monday November 18, 2019 2:20pm - 2:50pm PST
Marriott Marquis San Diego Marina - San Diego Room B/C
  • Session Slides Included Yes

2:50pm PST

Dynatrace Sponsored Session - Observability: Where Are We Headed? - Alois Reitbauer, Dynatrace
Observability is helping us move beyond the traditional paradigm of monitoring. Companies are looking for more answers and gathering more data than traditional alerting can provide. On our journey, we learned that simply having more data brings just as many challenges. Ultimately, the value comes from what you do with the insights in that data. Let’s look at some of these challenges and how they can be addressed.

Monday November 18, 2019 2:50pm - 3:00pm PST
Marriott Marquis San Diego Marina - San Diego Room B/C

3:00pm PST

Networking Break

Monday November 18, 2019 3:00pm - 3:30pm PST
Marriott Marquis San Diego Marina - San Diego Room B/C

3:30pm PST

New Relic Sponsored Session - Mike Panchenko, New Relic
At New Relic, we’re going all in with Kubernetes. That doesn’t just mean delivering features that allow customers to observe and monitor Kubernetes; it also means embracing Kubernetes as the de facto standard for orchestrating workloads across the entire New Relic data platform.
This lightning talk will cover the trials and tribulations of planning and migrating New Relic’s massively scaled distributed database (a database that processes up to 1.5 billion data points a minute) to Kubernetes. You’ll learn how monitoring and observability, as well as the tooling created for spreading our workloads out over many heterogeneous clusters, have been critical to the success of the migration thus far. We will also share our perspective on what to expect and be prepared for in the future of this fast-growing space.

Mike Panchenko

General Manager & Senior Director of Site Engineering, New Relic
Mike Panchenko is a General Manager & Senior Director of Site Engineering responsible for modernizing and scaling New Relic’s internal platform and Site Engineering organization. Prior to working at New Relic, Mike was co-founder and CTO of Opsmatic, a company he started after spending...

Monday November 18, 2019 3:30pm - 3:40pm PST
Marriott Marquis San Diego Marina - San Diego Room B/C

3:40pm PST

Reliable Observability at Scale: Error Budgets for 1,000+ - Fred Moyer, Zendesk
Observability and reliability engineering have been on a convergent course for several years. Error Budgets joined the reliability lexicon of engineering organizations in 2016 with the release of the SRE book. In practice, implementing work at the intersection of observability and reliability has largely been the domain of specialists. How can one democratize these techniques to put them in the hands of a thousand engineers at once?

At Zendesk we developed simple algorithms and practical approaches for implementing SLIs, SLOs, and Error Budgets at scale using a number of observability tools. This talk will show the approaches we developed and how we were able to quickly manage observability instrumentation across dozens of teams in a complex ecosystem (CDN, UI, middleware, backend, queues, databases, etc.).

Fred Moyer

SRE Observability Economist, Zendesk
Fred is an SRE and resident Observability Economist at Zendesk. He previously worked with high scale telemetry at Circonus, and scaled large web systems at Turnitin. Fred developed the first Istio community adapter in 2018, and was a White Camel Award winner in 2013. He likes to daydream...

Monday November 18, 2019 3:40pm - 4:10pm PST
Marriott Marquis San Diego Marina - San Diego Room B/C
  • Session Slides Included Yes

4:10pm PST

Monday November 18, 2019 4:10pm - 4:40pm PST
Marriott Marquis San Diego Marina - San Diego Room B/C

4:40pm PST


Monday November 18, 2019 4:40pm - 4:50pm PST
Marriott Marquis San Diego Marina - San Diego Room B/C

6:00pm PST

OPS Reception
After OPS wraps up, attendees are invited to join us at The Whiskey House (420 Third Avenue) from 6:00 to 8:00 pm for drinks, food, and more conversation with your fellow observability enthusiasts. No tickets are required, but please bring your conference badge with you.

Monday November 18, 2019 6:00pm - 8:00pm PST
The Whiskey House 420 Third Avenue