Monday, November 18
 

9:00am

Welcome
Opening Remarks

Monday November 18, 2019 9:00am - 9:10am
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

9:10am

LightStep Sponsored Session
tbd

Monday November 18, 2019 9:10am - 9:20am
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

9:20am

Tracing is for Everyone, Not Just Backend Engineers. (How Tracing Could Help Front-end Engineers to Build a Better UX) - Nina Stawski, Omnition
There's been a lot of talk about the importance of observability and tracing for microservice-based applications. The use cases involved usually focus on backend engineers and DevOps. But what about us front-end engineers? We also want to know how things work. More often than not, we get blamed first when something breaks, so it is important to understand the whole application, not just the front-end.

Currently, observability is not the top concern for front-end engineers, and I will show why it should be. In many cases, even if the application speed cannot be changed significantly, you can apply little tricks and add microinteractions to improve the UX. Besides, emerging tooling in OpenCensus and OpenTelemetry is easy to configure, enriches the existing data and helps developers to correlate traces between backend and UI.

Speakers

Nina Stawski

Senior UI/UX Engineer, Omnition (acquired by Splunk)


Monday November 18, 2019 9:20am - 9:50am
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

9:50am

Real-Time Application Maps for Proactive and Actionable Visibility - Aloke Guha, OpsCruise
Today’s observability provides volumes of time-series data, statistical trends including anomaly detection and correlational analyses. We argue that operations teams need an integrated and cohesive understanding of the application that maps interdependencies across microservices and dependencies on the orchestration and infrastructure services. We show that, beyond metrics, logs, and traces, capturing configuration information is necessary for creating complete application maps and gaining deeper insights into application behavior. In addition, establishing a standard lexicon for specifying the attributes of the complete application environment will enable automated detection and causal analysis of application problems. We will present some early findings on building real-time actionable application maps for cloud applications.

Speakers

Aloke Guha

CTO, OpsCruise
Aloke Guha is a serial entrepreneur with an extensive career working in data center technologies including storage and networking, big data, and machine learning and analytics predating the AI winters. Before co-founding OpsCruise, he was Vice President of Analytics and Big Data products...


Monday November 18, 2019 9:50am - 10:20am
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

10:20am

Break
Networking

Monday November 18, 2019 10:20am - 10:50am
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

10:50am

SlackTrace: A New Tracing Tool - Suman Karumuri, Slack
Trace data contains very rich information about a request execution. However, current tracing tools only expose that information as a trace view or a service graph, which severely limits the questions we can ask of trace data and diminishes the utility of tracing. From past experience, we found that these limitations arise because, unlike logs or metrics, we can’t query raw trace data.

To query raw trace data easily, we designed a new span format called SpanEvent and built our tracing infrastructure, called SlackTrace, around it. In addition to presenting the trace data as a trace view and a service graph, the SpanEvent format allows us to query raw span data using SQL, which lets us derive rich insights from trace data that are not possible with existing tracing systems. In this talk, I will present the SpanEvent format and an overview of our SlackTrace infrastructure.
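
To make the idea of querying raw span data with SQL concrete, here is a minimal sketch. It does not reproduce the SpanEvent format, which this abstract does not describe; the flat `spans` table, its column names, and the sample rows are all hypothetical, and SQLite stands in for whatever query engine the real system uses.

```python
import sqlite3

# Hypothetical sample spans: (trace_id, span_id, parent_id, service, name, duration_ms).
# These rows are invented for illustration only.
SAMPLE_SPANS = [
    ("t1", "a", None, "web", "GET /feed", 120.0),
    ("t1", "b", "a", "db", "SELECT posts", 80.0),
    ("t2", "c", None, "web", "GET /feed", 100.0),
    ("t2", "d", "c", "db", "SELECT posts", 60.0),
    ("t2", "e", "c", "cache", "GET user", 5.0),
]


def build_span_table(rows):
    """Load spans into an in-memory table (assumed schema, not SpanEvent)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spans ("
        "trace_id TEXT, span_id TEXT, parent_id TEXT, "
        "service TEXT, name TEXT, duration_ms REAL)"
    )
    conn.executemany("INSERT INTO spans VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def slowest_operations(conn, limit=3):
    """Aggregate across all traces: call count and mean duration per operation."""
    query = (
        "SELECT service, name, COUNT(*) AS calls, AVG(duration_ms) AS avg_ms "
        "FROM spans GROUP BY service, name "
        "ORDER BY avg_ms DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    conn = build_span_table(SAMPLE_SPANS)
    for row in slowest_operations(conn):
        print(row)
```

A question like "which operations are slowest on average across all traces" is awkward to answer from a single trace view, but becomes a single aggregate query once spans are stored as rows.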

Speakers

Suman Karumuri

Senior Staff Engineer, Slack
Suman Karumuri is the lead for distributed tracing at Pinterest. Previously, he served as the lead for the Zipkin project at Twitter. He is the author of an upcoming book, Distributed Tracing, from O’Reilly.


Monday November 18, 2019 10:50am - 11:20am
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

11:20am

LeitMotif: An Abstraction for Debugging Distributed Applications - Mania Abdi, Northeastern University
Abstractions, such as APIs, allow developers to build complex distributed applications out of smaller building blocks. In contrast, there are very few abstractions available to limit the amount of complexity engineers must deal with when diagnosing problems in production applications. This mismatch means that diagnosis will continue to become more challenging as systems continue to scale. We present the workflow motif abstraction, instantiations of which capture frequent or important processing patterns observed in the workflow of requests. We argue that use of motifs can make existing diagnosis techniques more powerful and enable new use cases. We discuss features needed from distributed tracing infrastructures to generate useful motifs, progress on modifying frequent-subgraph mining algorithms to identify motifs from traces, and initial experiences using motifs to debug problems.

Speakers

Mania Abdi

PhD Student, Northeastern University


Monday November 18, 2019 11:20am - 11:50am
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

11:50am

When Connections are Magic: Understanding Performance in Serverless - James Burns, LightStep
While investigating API performance in serverless environments at major cloud providers, we saw that runtimes may modify HTTP request behavior by providing "pre-connected" clients, leading to significant performance differences, as much as 4x in some cases. Integrating information from net/http/httptrace in Go into distributed traces led to invaluable insights into how and why connections (and hence entire transactions) performed differently in many environments.

Working with many modern systems means network connections, many of them. Understanding how those connections impact your customer's experience can be difficult. Distributed tracing helps isolate which parts of the system are failing, but when it is implemented only at the RPC level, the reasons for and scope of network-induced issues can be lost. We provide specific examples and source code to get this visibility in your applications.

Speakers

James Burns

Head of Research, LightStep
From network load balancers to FPGAs to ASICs to embedded security to cloud ops at scale, James has seen how systems work but, more interestingly, how they fail. He is passionate about sharing what he's learned to level up teams, make developers happier, and improve customer experiences...


Monday November 18, 2019 11:50am - 12:20pm
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

12:20pm

Lunch
Networking Lunch

Monday November 18, 2019 12:20pm - 1:20pm
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

1:20pm

A Picture is Worth 1,000 Traces - Yuri Shkuro, Uber Technologies - Spiros Xanthos, Omnition
Distributed tracing has emerged as the go-to solution for understanding what’s going on in ever-changing cloud native architectures. A single trace can reveal many things: network latencies, time spent in databases, a service spinning idly, etc., but finding the right trace among billions that demonstrates a problem in a large distributed application is very hard. By looking at traces in aggregate, we can eliminate the need to state and validate hypotheses; instead, answers start to emerge naturally, especially when we use creative visualizations that put our visual cortex to work without overloading it with useless information. This talk will present the power of aggregate analysis of distributed traces by highlighting its applications beyond performance troubleshooting.

Speakers

Spiros Xanthos

Founder and CEO, Omnition
Spiros Xanthos is the CEO and Founder of Omnition, an Observability platform for Cloud Native Applications. Omnition is one of the companies building OpenCensus.io and now OpenTelemetry.io that is replacing OpenCensus and OpenTracing to become the standard instrumentation and collection...

Yuri Shkuro

Software Engineer, Uber Technologies
Yuri Shkuro is a software engineer at Uber Technologies, working on distributed tracing, observability, reliability, and performance problems; author of the book ["Mastering Distributed Tracing"](https://www.shkuro.com/books/2019-mastering-distributed-tracing/); creator of Jaeger...


Monday November 18, 2019 1:20pm - 1:50pm
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

1:50pm

Testing in A Distributed Systems World - Fernando Mayo, Undefined Labs
While microservices are becoming the norm due to advancements in development, deployment and monitoring techniques in the last few years, we are still using the same testing methodologies we used for monolithic apps. In this talk, we look at how distributed tracing can be applied to testing modern, distributed applications, from unit to end-to-end tests, to continuously give developers invaluable insight on how entire applications behave, and when and why they fail, before they are deployed to production. We'll also discuss the power of distributed context propagation and how it can be leveraged for testing purposes, from safely testing in production to failure injection.

Speakers

Fernando Mayo

CTO & Co-founder, Undefined Labs
Fernando is the CTO & co-founder of Undefined Labs, a startup building developer tools. Previously he worked at Docker, after the acquisition of Tutum, which he co-founded. He is obsessed with improving the developer workflow, from coding and testing, to deployment and monitoring...


Monday November 18, 2019 1:50pm - 2:20pm
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

2:20pm

Pythia: An Automated, Cross-layer Instrumentation Framework for Diagnosing Performance Problems in Distributed Applications - Emre Ates, Boston University
It is extremely difficult to understand where to enable instrumentation a priori to help diagnose problems that may occur in the future. We present Pythia, an automated cross-layer instrumentation framework, which explores the space of possible instrumentation choices and enables the instrumentation needed to diagnose a newly observed problem in production systems. Pythia builds on distributed tracing and uses statistical techniques to identify where instrumentation is needed. This talk will discuss 1) the scalable design of Pythia; 2) our progress on identifying promising data structures to represent the instrumentation search space across multiple data center stack layers (e.g., application and kernel), structures that must trade off between compactness, exhaustiveness, and accuracy; and 3) our work on creating algorithms to search this space quickly while staying under a specific instrumentation budget.

Speakers

Emre Ates

PhD Student, Boston University
Emre Ates is a Ph.D. candidate in the Department of Electrical and Computer Engineering of Boston University. His current research interests include automated analytics on large-scale computing systems and distributed systems.


Monday November 18, 2019 2:20pm - 2:50pm
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

2:50pm

Dynatrace Sponsored Session
tbd

Monday November 18, 2019 2:50pm - 3:00pm
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

3:00pm

Break
Networking Break

Monday November 18, 2019 3:00pm - 3:30pm
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

3:30pm

New Relic Sponsored Session - Mike Panchenko, New Relic
At New Relic, we’re going all in with Kubernetes. That doesn’t just mean delivering features to customers that allow them to observe and monitor Kubernetes, but also embracing Kubernetes as the de facto standard for orchestrating workloads running on the entire New Relic data platform.
This lightning talk will cover the trials and tribulations of planning and migrating New Relic’s massively scaled distributed database (a database that processes up to 1.5 billion data points a minute) to Kubernetes. You’ll learn how monitoring and observability, as well as the tooling created for spreading our workloads out over many heterogeneous clusters, have been critical for the success of the migration thus far. We will also share our perspective about what to expect and be prepared for in the future of this fast-growing space.

Speakers

Mike Panchenko

General Manager & Senior Director of Site Engineering, New Relic
Mike Panchenko is a General Manager & Senior Director of Site Engineering responsible for modernizing and scaling New Relic’s internal platform and Site Engineering organization. Prior to working at New Relic, Mike was co-founder and CTO of Opsmatic, a company he started after spending...


Monday November 18, 2019 3:30pm - 3:40pm
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

3:40pm

Observability at Scale with Neural Networks: A More Proactive Approach - Keshav Peswani, Expedia Group
Modern observability platforms have evolved beyond simple application logs and now include distributed tracing systems such as Zipkin, Haystack, etc. While these systems help detect problems manually, combining them with real-time, intelligent, and accurate alerting mechanisms enables automated detection of these problems. In this talk, we will demonstrate how we scaled our system in real time to ingest ever-increasing terabytes of tracing data in production and use it for trending service errors and latencies. As this volume grew, we felt the need for a real-time intelligent alerting and monitoring system to move towards 24/7 reliability. We will talk about how we use neural networks on telemetry data to perform anomaly detection, including a deep dive into the architecture of the automated training pipeline and cost-effective online compute using kstreams.

Speakers

Keshav Peswani

Senior Software Development Engineer, Expedia Group


Monday November 18, 2019 3:40pm - 4:10pm
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

4:10pm

Reliable Observability at Scale: Error Budgets for 1,000+ - Fred Moyer, Zendesk
Observability and reliability engineering have been on a convergent course for several years. Error Budgets joined the reliability lexicon of engineering organizations in 2016 with the release of the SRE book. The intersection of observability and reliability has largely been the domain of specialists for practical implementation. How can one democratize these techniques to put them in the hands of a thousand engineers at once?

At Zendesk, we developed simple algorithms and practical approaches for implementing SLIs, SLOs, and Error Budgets at scale using a number of observability tools. This talk will show the approaches developed and how we were able to manage observability instrumentation across dozens of teams quickly in a complex ecosystem (CDN, UI, middleware, backend, queues, dbs, etc.).
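
For readers new to the terms, the arithmetic behind an error budget can be sketched as follows. This is the generic request-based definition, not the algorithms the talk presents; the function name, its parameters, and the sample numbers are assumptions made for illustration.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Return (allowed_failures, remaining_failures, fraction_consumed).

    slo_target is the SLO as a fraction (e.g. 0.999 for "99.9% of requests
    succeed"). The error budget is the share of requests allowed to fail
    while still meeting that target.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests < 0 or failed_requests < 0:
        raise ValueError("request counts must be non-negative")

    allowed = (1.0 - slo_target) * total_requests
    remaining = allowed - failed_requests
    # With no traffic there is no budget to consume.
    if allowed == 0:
        consumed = 0.0 if failed_requests == 0 else float("inf")
    else:
        consumed = failed_requests / allowed
    return allowed, remaining, consumed


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 250 failures, 99.9% SLO.
    allowed, remaining, consumed = error_budget(0.999, 1_000_000, 250)
    print(f"allowed={allowed:.0f} remaining={remaining:.0f} consumed={consumed:.0%}")
```

In this hypothetical case the SLI is the measured success ratio, the SLO is the 99.9% target, and the budget is roughly 1,000 failed requests, of which about a quarter has been spent.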

Speakers

Fred Moyer

Staff Site Reliability Engineer, Zendesk
SLOgician, bitmasks&, C/Perl/Ruby/Go/blablabla. Staff SRE at Zendesk. Likes TSDBs, operational telemetry, mountain biking, high cardinality. Previously Circonus and Turnitin. 2018 Google Istio Developer award, 2013 Perl White Camel award.


Monday November 18, 2019 4:10pm - 4:40pm
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA

4:40pm

Closing
Wrap-up

Monday November 18, 2019 4:40pm - 4:50pm
Marriott Marquis San Diego Marina - San Diego Room B/C 333 W Harbor Dr, San Diego, CA 92101, USA
