Making Sense of Big Data


A few months ago, I wrote an article on my favorite Python viz tools: HoloViz. Many people are interested in learning more about Datashader, the big data visualization tool in the HoloViz family. I absolutely love Datashader and love how it creates meaningful visualizations of large datasets very…
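Datashader's pipeline aggregates data onto a fixed-size grid before shading it, so the rendered image has a fixed number of pixels no matter how many points go in. The pure-Python sketch below illustrates only that aggregation idea with a simple count per pixel. `aggregate_counts` is a hypothetical helper, not Datashader's API; in Datashader itself this step is done by `Canvas.points` with a reduction such as `ds.count()`, and it is far faster than a Python loop.

```python
import random


def aggregate_counts(xs, ys, x_range, y_range, width, height):
    """Hypothetical helper: bin points into a height x width grid of counts.

    Assumes x_range and y_range are (min, max) with max > min.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # points outside the viewport are dropped
        # Map each coordinate to a pixel index; points on the upper edge
        # are clamped into the last bin.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


# Usage: 10,000 random points collapse into an 8 x 8 grid of counts,
# which a shading step would then map to colors.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(10_000)]
ys = [random.gauss(0, 1) for _ in range(10_000)]
grid = aggregate_counts(xs, ys, (-4, 4), (-4, 4), width=8, height=8)
print(sum(map(sum, grid)))  # number of points that fell inside the viewport
```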

Hands-on Tutorials

Photo by Marek Piwnicki on Unsplash

Why do we need variance reduction?

When we run online experiments or A/B tests, we need to ensure our test has high statistical power so that we have a high probability of detecting the experimental effect if it does exist. What are the factors that might affect power? …
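The textbook factors behind power are the significance level, the sample size, the effect size, and the variance of the metric. As a minimal sketch, assuming a two-sided two-sample z-test with equal group sizes and a known common standard deviation (a normal approximation, not necessarily the setup used in the full article), the function below shows why reducing variance raises power. `power_two_sample` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test (hypothetical helper).

    delta: true difference in means; sigma: common standard deviation.
    """
    z = NormalDist()
    se = sigma * sqrt(2 / n_per_group)   # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    shift = abs(delta) / se
    # Probability the test statistic lands beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Same effect and sample size; halving the standard deviation raises power.
print(power_two_sample(0.1, sigma=1.0, n_per_group=1000))  # about 0.61
print(power_two_sample(0.1, sigma=0.5, n_per_group=1000))  # about 0.99
```

With no true effect (`delta=0`), the function returns the significance level, which is a quick sanity check on the formula.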


It surprises me that many data scientists do not know HoloViz. HoloViz is my favorite Python viz ecosystem, which comprises seven Python libraries: Panel, hvPlot, HoloViews, GeoViews, Datashader, Param, and Colorcet.

Why do I love HoloViz?

HoloViz allows users to build Python visualizations and interactive dashboards with super easy and flexible Python…

Hands-on Tutorials

Streaming and Refreshing

Photo by Sasha • Stories on Unsplash

Data scientists use data visualization to communicate data and generate insights. It’s essential for data scientists to know how to create meaningful visualization dashboards, especially real-time dashboards. This article talks about two ways to get your real-time dashboard in Python:

  • First, we use streaming data and create an auto-updated streaming…
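A streaming dashboard usually keeps only a rolling window of the most recent data and redraws from it whenever new points arrive. The sketch below shows that rolling-window idea with a bounded `deque`; `RollingBuffer` is a hypothetical class for illustration, not a HoloViz API (the HoloViews `Buffer` stream offers a similar `length` setting for this purpose).

```python
import random
from collections import deque


class RollingBuffer:
    """Hypothetical helper: keep only the most recent `length` points of a stream."""

    def __init__(self, length):
        self.points = deque(maxlen=length)

    def send(self, batch):
        # Once the deque is full, the oldest points fall off automatically.
        self.points.extend(batch)

    def snapshot(self):
        return list(self.points)


# Usage: simulate 250 incoming readings but keep only the latest 100.
stream_buffer = RollingBuffer(length=100)
for tick in range(250):
    stream_buffer.send([(tick, random.random())])
    # A real dashboard would redraw its plot from stream_buffer.snapshot() here.
latest = stream_buffer.snapshot()
print(latest[0][0], len(latest))  # first tick still kept, window size
```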

Photo by Kelly Sikkema on Unsplash

Check out the slideshow of this article here:

There are two parts to this article:

  1. How to turn your Jupyter Notebooks into a slideshow and output it to an HTML file.
  2. How to host an HTML file on GitHub.

Jupyter Notebook slides

First, let’s create a new environment called slideshow, install Jupyter Notebook…

Photo by Jack Ward on Unsplash

A Salesforce database can be a hot mess. The figure below illustrates the relationship among some of the data tables in Salesforce. As you can see, the relationship among data tables (i.e., objects) can be complicated and hard to work with. I wrote a blog post previously on how to…

Photo by Raimond Klavins on Unsplash

How do you query BigQuery data? This article talks about three ways to query BigQuery data in Python. I hope you find them useful.


conda install notebook google-cloud-bigquery sqlalchemy pybigquery


To authenticate with Google Cloud locally, you will need to install the Google Cloud SDK and log in/authenticate with the following command…


Photo by Amy Shamblen on Unsplash

Multiclass logistic regression is also called multinomial logistic regression or softmax regression. It is used when we want to predict more than two classes. A lot of people use multiclass logistic regression all the time but don’t really know how it works. …
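The core of multiclass logistic regression is that each class k gets its own weight vector w_k and bias b_k, producing a score z_k = w_k · x + b_k, and the softmax function turns those scores into probabilities P(y = k | x) = exp(z_k) / Σ_j exp(z_j). Training minimizes the cross-entropy loss, -log P(y = true class | x). The weights below are made-up numbers for illustration, not fitted values.

```python
from math import exp, log


def softmax(scores):
    m = max(scores)  # subtract the max score for numerical stability
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, weights, biases):
    """weights: one coefficient list per class; biases: one intercept per class."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(scores)


def cross_entropy(probs, true_class):
    # Loss for a single example: negative log-probability of the true class.
    return -log(probs[true_class])


# Usage with three classes and two features (illustrative weights).
x = [1.0, 2.0]
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
biases = [0.0, 0.0, 0.0]
probs = predict_proba(x, weights, biases)
predicted = max(range(len(probs)), key=lambda k: probs[k])
print(probs, predicted, cross_entropy(probs, predicted))
```

Subtracting the maximum score leaves the probabilities unchanged, because the same factor cancels in the numerator and the denominator, but it prevents `exp` from overflowing on large scores.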

Photo by Sarah Kilian on Unsplash

Software testing is essential for software development. Software engineers are encouraged to use test-driven development (TDD), a software development process in which test cases are written first and the software is developed afterward. For data scientists, it is not always easy or feasible to write tests first. Nevertheless, software…
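In TDD, you first write tests that describe the behavior you want, watch them fail, and then write just enough code to make them pass. A minimal sketch using Python's built-in `unittest` is shown below; `clip_outliers` is a hypothetical data-cleaning function chosen only for illustration.

```python
import unittest


def clip_outliers(values, lower, upper):
    """Hypothetical function under test: clamp each value into [lower, upper]."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return [min(max(v, lower), upper) for v in values]


class TestClipOutliers(unittest.TestCase):
    # In TDD these tests are written first and fail until clip_outliers exists.
    def test_values_are_clamped(self):
        self.assertEqual(clip_outliers([-5, 0, 5, 50], 0, 10), [0, 0, 5, 10])

    def test_empty_input(self):
        self.assertEqual(clip_outliers([], 0, 1), [])

    def test_invalid_bounds(self):
        with self.assertRaises(ValueError):
            clip_outliers([1], 5, 0)
```

Saved in a file whose name starts with `test_`, these tests can be run with `python -m unittest`; pytest also collects `unittest.TestCase` classes.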

Photo by Greg Rakozy on Unsplash

Many data scientists like to use Jupyter Notebook or JupyterLab for data exploration, visualization, and model building. I know some data scientists refuse to use Jupyter Notebook, but I love using Jupyter Notebook/Lab for my experiments and explorations. …

Sophia Yang

Ph.D. | Senior Data Scientist @ Anaconda | Twitter @ sophiamyang | All views are my own
