class: bottom, left background-image: url(plotly-for-r.svg) background-size: contain # Welcome ## Thanks for coming! --- class: inverse, center background-image: url(../your-turn.jpeg) background-size: contain ## Your turn .pull-left[ (1) Install the required software <https://workshops.cpsievert.me/20171118/> (2) Run the docker container so that RStudio appears when you visit <http://localhost:8787> (in Chrome or Firefox, please!) (3) In the R console, enter ```r browseURL("~/day1/index.html") ``` If you see [this popup](popup.png), press "Try Again". <img src="popup.png" /> ] .pull-right[ (4) Go to Tools -> Global Options and configure your pane layout [like this](panes.png). <img src="panes.png" height="325"/> (5) Share (w/ your neighbor) 3 things you're hoping to take away from this workshop (share them via Slack if you like!) ] --- class: middle <h3 align="center"> About me </h3> * Maintainer of plotly's R package (for nearly 3 years!) * Before that: animint, LDAvis, pitchRx, rdom * PhD in statistics from Iowa State * Thesis: [Interfacing R with Web Technologies for Interactive Statistical Graphics and Computing with Data](https://search.proquest.com/openview/3a91971f82fd4af20a78bebb079f5035/1.pdf?pq-origsite=gscholar&cbl=18750&diss=y) * Founder & CEO, [Sievert Consulting LLC](https://consulting.cpsievert.me/) * Currently looking for more projects! --- ## About you <iframe src="attendees.html" width="100%" height="450" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> <https://plot.ly/ggplot2/geom_density/> --- ## Same data, another look <iframe src="attendees2.html" width="100%" height="450" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> <https://plot.ly/r/parallel-coordinates-plot/> --- ## Is there anything in particular that you would like covered? * I'm most interested in **linking and animating views without shiny**. Already pretty well acquainted with R/RStudio/Plotly, but hoping to learn how to use Plotly more efficiently and extend my current usage with animation, crosstalk, etc. I use Plotly most often within flexdashboard applications, so most interested in non-Shiny options. * I want to learn how to **animate my plotly graphs/charts.** * More **shiny** dev * **Creating documents with data visualizations in Markdown files**. I've been working to get better with Markdown, but if there's a way to include interactive visualizations and/or animations in Markdown files that can be sent to readers that would be great. I work in consulting so I often need to send reports to executives and including interactive data visualizations in those reports would be incredibly useful for me. * Looks good! Stoked for the class. For me, I **don't use rmarkdown** and rarely use ggplot2 - so those could be dropped from the list. * Looks like all of Day1 currently would be old news. Wouldn't **half a day of R/RStudio/ggplot2** suffice for your audience? --- ## Is there anything else that we can help you with at the workshop? * I'd like to know how plotly performs with displaying **large data** sets * Plotting large amounts of **time series** data efficiently * Plotly custom **controls and mapping** options. * Learn how to apply **interactive filters** in Plotly if that's supported (e.g. click on a single bar within a bar chart on the left which filters the heat-map to the right). * More advanced visualization techniques such as a chord diagram or a sunburst * Check out [sunburstR](https://github.com/timelyportfolio/sunburstR) * Suggestions/hints on how to create clean, **good code** would be especially appreciated. * Check out [tidyverse](https://www.tidyverse.org/) * **Extended breakout session** time for help on specific project i am bringing --- ## Is there anything else that we can help you with at the workshop? * I'd like to know how plotly performs with displaying **large data** sets * Plotting large amounts of **time series** data efficiently * Plotly custom **controls and mapping** options. * Learn how to apply **interactive filters** in Plotly if that's supported (e.g. click on a single bar within a bar chart on the left which filters the heat-map to the right). * More advanced visualization techniques such as a chord diagram or a sunburst * Check out [sunburstR](https://github.com/timelyportfolio/sunburstR) * Suggestions/hints on how to create clean, **good code** would be especially appreciated. * Check out [tidyverse](https://www.tidyverse.org/) * **Extended breakout session** time for help on specific project i am bringing .footnote[ ### I'll try my best -- 2 days is not enough! ### PLEASE PLEASE PLEASE stop me to clarify and/or ask questions ] --- ## A minimal bar chart * Every **plotly** chart is powered by [plotly.js](https://github.com/plotly/plotly.js), plus some extra R/JS magic 🎩 🐰. * How/why did `plot_ly()` draw a bar chart? What if we want something different? ```r library(plotly) plot_ly(x = c("A", "B"), y = 1:2) ``` <iframe src="01-simple-bar.html" width="100%" height="400" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> --- class: inverse, center background-image: url(../your-turn.jpeg) background-size: contain ## Your turn In your R console enter: ```r file.edit('~/day1/your-turn.R') ``` Work through the comments/code/questions in this R script. Feel free to work with your neighbor and ask me questions! .footnote[ If you have question(s) about it **rmarkdown**, now would be a good time to ask (we won't have time to cover it together). **PS.** Did you know [the workshop slides](index.Rmd) were created with **rmarkdown**? ] --- The **scatter** trace type is quite general. It provides the foundation for many charts (e.g., [polygons](https://plot.ly/ggplot2/geom_polygon/), [ribbons](https://plot.ly/ggplot2/geom_ribbon/), [filled areas](https://plot.ly/r/filled-area-plots/), etc), extends to different coordinate systems (e.g, [scatter3d](https://plot.ly/r/3d-line-plots/), [scattergeo](https://plot.ly/r/bubble-maps/), and [scattermapbox](https://plot.ly/r/scattermapbox/)), and rendering systems (e.g., [scattergl](https://plot.ly/r/webgl-vs-svg/)) ```r subplot(shareY = TRUE, plot_ly(x = 1:2, y = 1:2), plot_ly(x = 1:2, y = 1:2, mode = "lines"), plot_ly(x = 1:2, y = 1:2, mode = "markers+lines"), plot_ly(x = 1:2, y = 1:2, text = 1:2, mode = "text"), plot_ly(x = 1:2, y = 1:2, text = 1:2, mode = "markers+lines+text") ) ``` <iframe src="02-scatter-modes.html" width="100%" height="400" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> --- class: inverse, middle, center # Working with actual data ```r file.edit('~/day1/demo.R') ``` --- ## Which visualization is better? ```r subplot(shareX = TRUE, nrows = 2, plot_ly(logs, x = ~date, y = ~package, z = ~count, type = "heatmap"), plot_ly(logs, x = ~date, y = ~count, color = ~package, mode = "lines") ) ``` <iframe src="03-heatmap.html" width="100%" height="400" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> --- background-image: url(bostock-heer-groups.png) background-size: contain ## Famous question: which is larger (and by how much)? <font color="#DEE5FF">A</font> or <font color="#FDDFA4">B</font>? .footnote[ These questions drive at least two influential papers: * [Cleveland and McGill (1984)](http://info.slis.indiana.edu/~katy/S637-S11/cleveland84.pdf) * [Bostock and Heer (2010)]() This figure is from [Data Visualization for Social Science](http://socviz.co/) (highly recommended!) in reference to Bostock and Heer. ] --- background-image: url(bostock-heer-findings.png) background-size: contain ## Position is best, especially along common scale and baseline .footnote[ Figure from [Heer and Bostock (2010)](http://vis.stanford.edu/files/2010-MTurk-CHI.pdf) ] --- background-image: url(hierarchy.png) background-size: contain ## A more general guideline from Cleveland and McGill .footnote[ * Figure from [Data Points: Visualization That Means Something](https://issuu.com/wiley_publishing/docs/data_points_sample_a15b4e87f1b924) by Nathan Yau ] --- class: center, inverse, middle # Interactive techniques can aid in these tasks --- ## Again, which is better? <iframe src="03-heatmap.html" width="100%" height="550" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> --- background-image: url(https://i.imgur.com/bdd64us.gif) background-size: contain class: inverse <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ## Graphing 5 time series ## —————————— --- background-image: url(https://i.imgur.com/bdd64us.gif) background-size: contain class: inverse <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ## Graphing 5 time series ## —————————— <h1 align="right"> 1,000 time series!</h2> --- ## With all my `installed.packages()`, yikes! ```r plot_ly(logz, x = ~date, y = ~count) %>% group_by(package) %>% add_lines(alpha=0.3) ``` <iframe src="04-lines.html" width="100%" height="500" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> --- ## Can improve a bit with interaction ```r library(crosstalk) SharedData$new(logz, ~package, "Select package(s)") %>% plot_ly(x = ~date, y = ~count) %>% group_by(package) %>% add_lines(alpha=0.3) %>% highlight(dynamic = TRUE, selectize = TRUE, persistent = TRUE) ``` <iframe src="04-lines-b.html" width="100%" height="400" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> --- ## [heatmaply](https://github.com/talgalili/heatmaply#readme) is awesome for visualizing a numeric matrices! <iframe src="05-heatmaply.html" width="100%" height="550" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> --- background-image: url(https://i.imgur.com/bdd64us.gif) background-size: contain class: inverse <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ## Graphing 1,000 time series ## —————————— --- background-image: url(https://i.imgur.com/bdd64us.gif) background-size: contain class: inverse <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> <br /> ## Graphing 1,000 time series ## —————————— <h1 align="right"> 1,000,000 time series!</h2> --- class: center, middle # Overview first, then zoom and filter, then details on demand Ben Shneiderman .footnote[ Popular information visualization perspective ] --- class: center, inverse, middle # Visualization surprise, but don't scale well. Models scale well, but don't surprise Hadley Wickham .footnote[ Popular statistical graphics perspective ] --- class: inverse background-image: url(../your-turn.jpeg) background-size: contain <h2 align="center"> Your turn </h2> Have a look at some plotly "extension" packages! * [heatmaply](https://github.com/talgalili/heatmaply#readme) * [iheatmapr](https://github.com/ropensci/iheatmapr#readme) * [visdat](http://visdat.njtierney.com/articles/experimental_features.html#interactivity) * [vistime](https://github.com/shosaco/vistime#readme) * [ggplotgui](https://github.com/gertstulp/ggplotgui#readme) .footnote[ **Exercise**: Most of these packages have a function that returns a plotly object (e.g., `heatmaply::heatmaply()`). Use a plotly function to modify/customize the result (e.g., add a title with `plotly::layout()`) For all CRAN packages that use plotly, see the "Reverse dependencies" section on <https://cran.r-project.org/package=plotly> ] --- ## What about *long* time series? * *Tens* of thousands points is responsive with *SVG* ```r y <- sample(c(-1, 1), 1e4, TRUE) x <- seq(Sys.time(), Sys.time() + length(y) - 1, by = "1 sec") plot_ly(x = x, y = cumsum(y)) %>% add_lines() %>% rangeslider() ``` <a href="06-rangeslider.html" target="_blank"> <div align="center"> <img src="svg.png" width="100%" /> </div> </a> --- ## What about *long* time series? * *Hundreds* of thousands time series points is responsive with *WebGL* (but [no rangeslider (yet)](https://community.plot.ly/t/is-there-a-way-to-keep-the-range-slider-while-using-scattergl/4556/3)) ```r y <- sample(c(-1, 1), 1e5, TRUE) x <- seq(Sys.time(), Sys.time() + length(y) - 1, by = "1 sec") plot_ly(x = x, y = cumsum(y)) %>% add_lines() %>% toWebGL() ``` <a href="07-webgl.html" target="_blank"> <div align="center"> <img src="webgl.png" width="100%" /> </div> </a> --- ## What about performance (beyond time-series)? ### SVG vs Canvas, in general * The *Scalable* in SVG, means scalable in terms of bounding box size. * No matter the context, your browser will struggle to render > 30,000 SVG elements. * This is why [canvas based elements](https://en.wikipedia.org/wiki/Canvas_element) exist (the difference is similar to pdf vs png) #### Time series doesn't scale well, even in a canvas context * Time series has performance limitations that other data types don't (this is pretty universal). ### High performance plotly charts * scattergl -- basically same as scatter (with [limitations](https://github.com/plotly/plotly.js/issues/130)), but [responsive with >1M points](scattergl.R). * pointcloud -- more restricted than scattergl, but [responsive w/ several million points](pointcloud.R). * heatmapgl -- [response with >1M cells](heatmapgl.R) in heatmap. --- ## More time series tips #### Have *lots* of *long* time series? * Consider [extracting/visualizing features from each series](https://robjhyndman.com/talks/Google-Oct2015-part3.pdf) * Some useful packages: [anomalous](https://github.com/robjhyndman/anomalous) and [tscognostics](https://github.com/earowang/tscognostics) * See my work on [visualizing pedestrian traffic](https://github.com/cpsievert/pedestrians) with plotly * Consider a tool like [trelliscope](http://ryanhafen.com/blog/trelliscopejs-plotly) for exploring many "groups" #### Visualization of models/predictions? * Start with [forecast](https://cran.r-project.org/package=forecast) and/or [mgcv](https://cran.r-project.org/package=mgcv) for model fitting. * Use a strategy similar to [here](https://plotly-book.cpsievert.me/a-case-study-of-housing-sales-in-texas.html#fig:forecast) to plot forecasts. #### Is seasonality important? * Consider "wrapping" your time-series * Wrap (i.e., group) your series by hand (get inspired by this [paper](https://arxiv.org/pdf/1412.6675.pdf)) * Checkout out [sugrrants](https://github.com/earowang/sugrrants) (`ggplotly()` converts the ggplot2 plots) --- ## Texas housing prices ```r library(dplyr) tx <- txhousing %>% select(city, year, month, median) %>% filter(city %in% c("Galveston", "Midland", "Odessa", "South Padre Island")) tx #> # A tibble: 748 x 4 #> city year month median #> <chr> <int> <int> <dbl> #> 1 Galveston 2000 1 95000 #> 2 Galveston 2000 2 100000 #> 3 Galveston 2000 3 98300 #> 4 Galveston 2000 4 111100 #> 5 Galveston 2000 5 89200 #> 6 Galveston 2000 6 108600 #> 7 Galveston 2000 7 99000 #> 8 Galveston 2000 8 96200 #> 9 Galveston 2000 9 104000 #> 10 Galveston 2000 10 118800 #> # ... with 738 more rows ``` --- ## Wrap by year, facet by city ```r ggplot(tx, aes(month, median, group = year)) + geom_line() + facet_wrap(~city, ncol = 2) ``` ![](index_files/figure-html/unnamed-chunk-10-1.png)<!-- --> --- ## Compare across cities within year *and* across years within city ```r TX <- SharedData$new(tx, ~year) p <- ggplot(TX, aes(month, median, group = year)) + geom_line() + facet_wrap(~city, ncol = 2) (gg <- ggplotly(p, tooltip = "year")) ``` <iframe src="08-small-multiples.html" width="100%" height="425" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> --- ## Set selection mode and default selections ```r highlight(gg, "plotly_hover", defaultValues = "2006") ``` <iframe src="08-modes.html" width="100%" height="425" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> --- ## Make comparisons with dynamic brush ```r highlight(gg, dynamic = TRUE, persistent = TRUE, selectize = TRUE) ``` <iframe src="08-dynamic.html" width="100%" height="500" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> --- ## Customize the appearance of selections ```r highlight( gg, dynamic = TRUE, persistent = TRUE, selected = attrs_selected(mode = "markers+lines", marker = list(symbol = "x")) ) ``` <iframe src="08-custom.html" width="100%" height="400" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> --- ## Automate queries via animation ```r p <- ggplot(tx, aes(month, median)) + geom_line(aes(group = year), alpha = 0.2) + geom_line(aes(frame = year), color = "red") + facet_wrap(~city, ncol = 2) ggplotly(p) ``` <iframe src="08-automate.html" width="100%" height="400" scrolling="no" seamless="seamless" frameBorder="0"> </iframe> --- class: inverse, center background-image: url(../your-turn.jpeg) background-size: contain ## Your turn Visit [this post](http://ryanhafen.com/blog/trelliscopejs-plotly), replicate the example (no install needed), and use trelliscope to visualize `txhousing` (or, more preferably, your own data!) .footnote[ ## Now would be a good time to ask me about personal projects (I have a 6pm flight tomorrow)! #### Unfortunately, we won't cover maps in this workshop, but [day 1 of last years workshop](https://plotcon17.cpsievert.me/workshop/day1/#1) covers the topic in depth. ]