Violin Plots: Examples, Best Practices, and How to Create

Standard summary statistics, such as the mean and median, are useful, but they rarely tell a data set’s full story. To truly understand your data, you also need to see its shape, spread, and density. Violin plots are a powerful visualization tool that reveals the full distribution of your data, including its peaks, valleys, and overall form, offering insights that summary numbers and even other charts might miss.

This guide is your resource for understanding and using violin plots well. We will explore what they are, how to read them, and when they are the best choice for your analysis. You’ll also learn about design best practices, how to create them, and their limitations. Let’s dive in and discover how to make your data tell a more complete story.

What is a violin plot?

A violin plot is a statistical chart used to visualize the distribution of numerical data and its probability density. Think of it as a hybrid that combines the features of a box plot and a density plot. The name comes from its shape, which often resembles a violin.

The core concept behind a violin plot is its width. The width of the plot at any given point represents the density of the data at that value. A wider section indicates that more data points are concentrated around that value, while a narrower section means fewer data points. This gives you a clear and intuitive picture of where your data is clustered.

This chart type merges the strengths of two other common visualizations. Like a box plot, it can display key summary statistics such as the median, the interquartile range (IQR), and the overall range of the data. However, it goes a step further. Like a density plot, it shows the full distribution, revealing patterns like multiple peaks (multimodality) that a box plot would hide.

Let’s dig a little deeper into how violin plots differ from other similar charts. Box plots, for instance, are excellent for showing medians and quartiles, but they simplify the distribution into a box, hiding its underlying shape. You could have two very different distributions that produce identical box plots. As another example, histograms show data distribution by grouping values into bins, but their appearance can change significantly depending on the number and width of these bins. 

Violin plots, by contrast, use a technique called kernel density estimation to create a smooth, continuous curve. The curve does not depend on bin edges, although it is still shaped by a smoothing parameter called the bandwidth. The result is often a more stable picture of the distribution than a histogram, and it shows far more of the shape than a box plot.
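To make the point about identical box plots concrete, here is a minimal sketch in Python using only the standard library. The two small data sets are invented for illustration: the first is evenly spread, while the second clusters around two values. The quartile convention used (`method="inclusive"`) is one of several in common use, so other tools may report slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Illustrative data, not taken from a real study.
evenly_spread = [0, 1, 2, 3, 4, 5, 6, 7, 8]
two_clusters = [0, 2, 2, 2, 4, 6, 6, 6, 8]

summary_even = five_number_summary(evenly_spread)
summary_clustered = five_number_summary(two_clusters)

print(summary_even)
print(summary_clustered)
print("Same box plot:", summary_even == summary_clustered)
```

Both data sets share the same five numbers, so their box plots would look the same even though one is flat and the other has two clusters.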

When (and why) to use a violin plot

Violin plots are incredibly useful in specific analytical situations. Knowing when to use one can elevate your data storytelling and lead to deeper insights.

Comparing distributions across multiple groups

One of the primary strengths of a violin plot is its ability to compare the distributions of a numerical variable across several different categories or groups. By placing multiple violins side by side, you can quickly assess differences in shape, spread, and central tendency. 

For example, you could compare salary distributions across different departments in a company or test scores across various classrooms. This side-by-side comparison makes it easy to spot which groups are similar and which are distinct.

Identifying skewness, spread, and variability

A violin plot’s shape immediately communicates key characteristics of your data’s distribution. You can see if the data is symmetric or skewed to one side. A long tail on one end of the violin indicates skewness. The overall height and width of the violin show the spread and variability. A tall, thin violin suggests data is spread out over a wide range but not densely clustered, while a short, wide violin indicates data is concentrated within a narrow range.

Revealing multimodal or bimodal distributions

This is where violin plots truly shine. A multimodal distribution is one with two or more peaks, meaning the data has multiple clusters of high frequency. A box plot would completely obscure this fact, often showing a single median between the peaks that does not accurately represent the data’s true nature. 

A violin plot, with its density-based shape, will clearly show these multiple peaks. Identifying multimodality is often a critical insight, suggesting that your data set may contain distinct subgroups that should be analyzed separately.
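As a small illustration of how a median can land between two peaks, the sketch below uses made-up test scores that form two clusters. It relies only on Python’s standard library, and the 5-point window is an arbitrary choice for the example.

```python
import statistics

# Invented test scores forming two clusters, one near 75 and one near 95.
scores = [72, 73, 74, 75, 75, 76, 77, 78,
          92, 93, 94, 95, 95, 96, 97, 98]

median = statistics.median(scores)

# Count how many scores actually sit close to the median.
near_median = [s for s in scores if abs(s - median) <= 5]

print("Median:", median)
print("Scores within 5 points of the median:", len(near_median))
```

Here the median falls in the gap between the clusters, at a value that no student came close to scoring. A box plot would draw its center line there, while a violin plot would show a narrow waist.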

Situations where summary statistics alone are insufficient

Relying only on summary statistics—such as the mean or median—can be misleading. The famous Anscombe’s quartet is a classic example of how four data sets with nearly identical simple statistical properties can look vastly different when graphed. Violin plots help you avoid this pitfall by providing a visual representation of the entire distribution. When the shape of the data is as important as its central tendency, a violin plot is a strong choice.

When not to use a violin plot

Despite their advantages, violin plots aren’t always the best option. For very small data sets, the kernel density estimation can be misleading, creating a smooth shape that does not accurately reflect the few data points available. They can also be complex for audiences unfamiliar with statistical charts. If your goal is simplicity and you are presenting to a non-technical audience, a simpler chart like a bar chart or a box plot might be more effective.

How violin plots work: The mechanics

To use violin plots effectively, you need to understand the mechanics behind their construction. The process involves statistical techniques that turn your raw data into the informative shapes you see on the chart.

At the heart of a violin plot is kernel density estimation (KDE). KDE is a method for estimating the probability density function of a random variable from a set of data points. In simpler terms, it’s a way to create a smooth curve that represents the distribution of your data. The process involves placing a “kernel” (a small, symmetric function, usually a Gaussian curve) at each data point and then summing all the kernels to generate the final density estimate. The smoothness of this final curve is controlled by a parameter called bandwidth.
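The kernel-summing idea described above can be sketched in a few lines of plain Python. This is a minimal illustration with a Gaussian kernel and a hand-picked bandwidth of 0.5, not the implementation used by any particular charting tool. The sample values and the evaluation grid are assumptions made for the example.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(data, x, bandwidth):
    """Estimate the density at x by averaging one kernel per data point."""
    n = len(data)
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in data) / (n * bandwidth)


# Illustrative sample with two loose clusters.
data = [2.1, 2.4, 2.5, 3.0, 3.2, 5.8, 6.0, 6.1, 6.5]

# Evaluate the estimate on a grid that extends well past the data range.
step = 0.1
grid = [-3.0 + step * i for i in range(151)]  # -3.0 to 12.0
density = [kde(data, x, bandwidth=0.5) for x in grid]

# A density should integrate to about 1 (simple Riemann sum here).
area = sum(density) * step
print("Approximate area under the curve:", round(area, 3))
print("Density near a cluster (x=2.5):", round(kde(data, 2.5, 0.5), 3))
print("Density in the gap (x=4.5):", round(kde(data, 4.5, 0.5), 3))
```

The estimate is high where points cluster and low in the gap between clusters, which is exactly what the width of a violin encodes.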

The resulting density curve is then plotted vertically. This curve is mirrored on the other side of a central axis to create the symmetrical “violin” shape. This symmetry does not add new information; it simply makes the shape easier to read and more visually pleasing.
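The mirroring step can be sketched with NumPy and Matplotlib, assuming both are installed. The data is simulated, the bandwidth of 4.0 is hand-picked, and `gaussian_kde_curve` is a hypothetical helper written for this example rather than a library function.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(50, 10, 200)  # simulated values, for illustration only


def gaussian_kde_curve(data, grid, bandwidth):
    """Hypothetical helper: Gaussian KDE evaluated at each grid value."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.sum(axis=1) / (len(data) * bandwidth)


grid = np.linspace(data.min() - 15, data.max() + 15, 200)
density = gaussian_kde_curve(data, grid, bandwidth=4.0)

fig, ax = plt.subplots()
# Data values run up the vertical axis; the density is drawn to the right
# and mirrored to the left of a central axis at x = 0.
ax.fill_betweenx(grid, -density, density, alpha=0.6)
ax.set_xticks([])
ax.set_ylabel("Value")
# fig.savefig("violin_shape.png")  # uncomment to write the image to disk
```

The left half is just the negated right half, which is why the mirrored side carries no extra information.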

The relationship between the width and frequency is the key takeaway. Where the violin is widest, the data is most concentrated. Where it is narrow, the data is sparse. The peaks in the violin’s shape correspond to the modes, or the most frequent values, in your data set.

The choice of bandwidth in the KDE process is crucial. A small bandwidth can lead to a jagged, undersmoothed curve that reflects the noise in the data rather than the true underlying distribution. A large bandwidth can create an oversmoothed curve that obscures important features like multiple peaks. Most software tools automatically select a reasonable bandwidth, but it’s good to be aware of how it affects the final visualization.
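One rough way to see the bandwidth effect is to count the local peaks in the density estimate at different bandwidths. The sketch below does this in plain Python. The data set (two evenly spaced clusters) and the three bandwidth values are assumptions chosen to make the contrast obvious, and counting peaks on a grid is only a crude diagnostic.

```python
import math


def kde(data, x, bandwidth):
    """Gaussian kernel density estimate at x."""
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return total / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))


def count_peaks(data, bandwidth, grid):
    """Count grid points whose density is strictly higher than both neighbors."""
    density = [kde(data, x, bandwidth) for x in grid]
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )


# Illustrative data: one cluster from 70 to 80 and another from 90 to 100.
data = [70, 72, 74, 76, 78, 80, 90, 92, 94, 96, 98, 100]
grid = [40 + 0.1 * i for i in range(901)]  # 40.0 to 130.0

peaks_small = count_peaks(data, 0.3, grid)    # undersmoothed
peaks_medium = count_peaks(data, 3.0, grid)   # moderate smoothing
peaks_large = count_peaks(data, 15.0, grid)   # oversmoothed

print("Peaks with bandwidth 0.3:", peaks_small)
print("Peaks with bandwidth 3.0:", peaks_medium)
print("Peaks with bandwidth 15.0:", peaks_large)
```

With the smallest bandwidth the curve spikes at individual data points, the middle value recovers the two clusters, and the largest value blurs them into a single bump.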

Finally, when comparing multiple groups, it is essential to maintain consistent scaling across violins. Both the vertical axis (representing the data values) and the horizontal scaling (representing density) must be the same for all violins. If the widths are not scaled consistently, you cannot accurately compare the densities between different groups.
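A minimal sketch of consistent width scaling follows, in plain Python. It divides every group’s density by one shared maximum rather than by each group’s own maximum. The group names, values, bandwidth, and `max_half_width` are all assumptions for the example.

```python
import math


def kde(data, x, bandwidth):
    """Gaussian kernel density estimate at x."""
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return total / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))


# Illustrative groups: same center, very different spread.
groups = {
    "concentrated": [10.0, 10.5, 11.0, 11.5, 12.0],
    "spread_out": [5.0, 8.0, 11.0, 14.0, 17.0],
}
grid = [0.2 * i for i in range(111)]  # shared value axis, 0.0 to 22.0
bandwidth = 1.0                       # same bandwidth for every group
max_half_width = 0.4                  # widest allowed half-width, in plot units

curves = {
    name: [kde(values, x, bandwidth) for x in grid]
    for name, values in groups.items()
}

# One shared scale factor, so equal widths mean equal densities across groups.
shared_max = max(max(curve) for curve in curves.values())
half_widths = {
    name: [max_half_width * (d / shared_max) for d in curve]
    for name, curve in curves.items()
}

widest = {name: max(widths) for name, widths in half_widths.items()}
print(widest)
```

Some plotting libraries instead stretch each violin to the same maximum width by default, so it is worth checking which convention your tool applies before comparing widths across groups.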

Key components of a violin plot

A violin plot packs a lot of information into a single shape. Understanding its individual components will help you interpret it correctly.

  • Violin shape and density curve: This is the main body of the plot. Its outline is the kernel density estimate of the data distribution. The width at any point indicates the frequency of data at that value.
  • Median indicator: Most violin plots include a marker, typically a white dot or a horizontal line, inside the violin to show the median (the 50th percentile) of the data. This tells you the central point of the distribution.
  • Interquartile range (IQR): Often, a thick black bar or box is drawn inside the violin to represent the IQR. This range contains the middle 50 percent of the data, spanning from the first quartile (25th percentile) to the third quartile (75th percentile). The length of this bar gives you a sense of the data’s spread around the median.
  • Whiskers or range markers: Some violin plots include lines (whiskers) extending from the IQR bar. These can represent different things, but they often extend to the most extreme data point that lies within one and a half times the IQR of the quartiles, or to the minimum and maximum values in the data, similar to a box plot. They help show the full range of the data and can help in identifying potential outliers.
  • Optional overlays: To add even more detail, violin plots can be combined with other plot types. A common practice is to overlay a box plot directly on top of the violin. Another option is to add individual data points, often with a bit of “jitter” (small random offsets) to prevent them from overlapping. These overlays provide more granular detail while still showing the overall distribution shape.
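The summary markers listed above can be computed directly. The sketch below uses Python’s standard library and one common whisker convention (the most extreme points within 1.5 times the IQR of the quartiles). The function name `violin_summary` and the sample values are hypothetical, and other tools may use different quartile or whisker rules.

```python
import statistics


def violin_summary(values, whisker_factor=1.5):
    """Compute the markers often drawn inside a violin (one common convention)."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # lowest point inside the fences
        "whisker_high": inside[-1],  # highest point inside the fences
        "outliers": [v for v in ordered if v < lower_fence or v > upper_fence],
    }


# Illustrative values with one unusually large observation.
data = [12, 15, 17, 18, 20, 21, 23, 25, 48]
summary = violin_summary(data)

for name, value in summary.items():
    print(f"{name}: {value}")
```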

Types and variants of violin plots

Violin plots are versatile and can be adapted to fit different analytical needs. Here are some common types and variations you might encounter.

  • Single violin plots: The simplest form is a single violin that visualizes the distribution of one data set. This is useful for understanding the shape, spread, and skewness of a single variable.
  • Grouped or comparative violin plots: This is the most common use case, where multiple violins are placed side by side to compare distributions across different categories. This format is excellent for highlighting differences between groups.
  • Split violin plots: A split violin plot is used to compare the distributions of two different subgroups within a single category. The violin is split down the middle, with each half representing one of the subgroups. For example, you could compare the distribution of salaries for men and women within the same department. This is a very efficient way to show direct comparisons.
  • Violin plots with box plot overlays: This popular variant includes a box plot within the violin. It provides the best of both worlds: the detailed distribution shape from the violin and the precise summary statistics (median, quartiles) from the box plot.
  • Violin plots with jittered points: In this variation, individual data points are plotted on top of the violin. A small amount of horizontal randomness (jitter) is added to each point to prevent overplotting. This is especially useful for smaller data sets, as it allows you to see both the overall distribution and the location of each individual data point.
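As one example of these variants, the sketch below draws violins with jittered points on top, using NumPy and Matplotlib (assumed to be installed). The group names, the simulated values, and the jitter range of plus or minus 0.08 are illustrative choices.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
# Small simulated groups, where showing individual points is most useful.
groups = {
    "Group A": rng.normal(50, 8, 30),
    "Group B": rng.normal(60, 12, 30),
}

fig, ax = plt.subplots()
positions = list(range(1, len(groups) + 1))
parts = ax.violinplot(
    list(groups.values()),
    positions=positions,
    showmedians=True,
    showextrema=False,
)

# Overlay each observation with a small random horizontal offset (jitter)
# so that points with similar values do not hide one another.
for pos, values in zip(positions, groups.values()):
    jitter = rng.uniform(-0.08, 0.08, size=len(values))
    ax.scatter(pos + jitter, values, s=10, color="black", alpha=0.6)

ax.set_xticks(positions)
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("Value")
# fig.savefig("violin_jitter.png")  # uncomment to write the image to disk
```

The jitter only moves points sideways, so each point’s vertical position still shows its true value.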

Design best practices and pitfalls

A well-designed violin plot is clear and easy to interpret. Following best practices can help you create effective visualizations, while being aware of common pitfalls can prevent you from misleading your audience.

Design best practices

  • Use a consistent bandwidth and scale: When comparing multiple violins, ensure that the bandwidth parameter used for the KDE is the same for all of them. Also, apply one width-scaling rule to every violin (for example, making the area of each violin proportional to the number of observations in its group), and clearly state which rule you used.
  • Avoid overcrowding: While it is tempting to compare many groups at once, too many violins on a single chart can become cluttered and difficult to read. If you have more than eight to 10 groups, consider breaking them into smaller, related charts or using a different visualization type.
  • Choose readable color schemes: Use color to distinguish between groups, but choose a palette that is easy on the eyes and accessible to people with color vision deficiencies. Avoid overly bright or clashing colors.
  • Label axes clearly: Always label your vertical axis with the name of the variable and its units. The horizontal axis should clearly label the different categories or groups being compared. A descriptive title is also essential.

Common pitfalls of violin plots

  • Misleading density smoothing: Be mindful of the bandwidth setting. An overly smooth plot can hide important features, while an undersmoothed plot can be noisy and confusing. Let your software choose a default, but be prepared to adjust it if the result looks misleading.
  • Unclear medians or IQRs: Ensure that the markers for the median and IQR are clearly visible and distinct. A simple legend explaining what each component represents can be very helpful for your audience.
  • Overcomplicated designs: Avoid adding too many elements. A violin plot with a box plot overlay, jittered points, and complex coloring can become overwhelming. Choose the components that are most relevant to the story you want to tell and leave out the rest.

Examples and storytelling tips

The real power of a violin plot comes from its ability to tell a story with data. Here are a few examples and tips for using them to build a compelling narrative.

Example: Comparing test scores across classes

Imagine you are an educator analyzing test scores for three different classes (Class A, Class B, and Class C). A simple bar chart of average scores might show that all three classes have a similar mean score of around 85 percent.

However, a violin plot could reveal a much richer story. The plot for Class A might be a wide, symmetrical violin centered at 85, indicating a standard bell-curve distribution. The plot for Class B might be bimodal, with two distinct peaks around 75 and 95. This suggests two subgroups of students: one that struggled and one that excelled. The plot for Class C might be skewed, with a long tail towards the lower scores, indicating that while most students did well, a significant number performed poorly. This level of insight allows the educator to tailor their approach for each class.
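The scenario above can be simulated to see why the averages alone hide the story. This sketch uses only Python’s standard library. The distributions, sample size, and random seed are assumptions chosen to mimic the three classes, not real test data.

```python
import random
import statistics

rng = random.Random(7)
n = 400  # simulated students per class


def clip(score):
    """Keep simulated scores on a 0 to 100 scale."""
    return min(100.0, max(0.0, score))


# Class A: bell-shaped around 85.
class_a = [clip(rng.gauss(85, 5)) for _ in range(n)]
# Class B: two subgroups, one near 75 and one near 95.
class_b = [clip(rng.gauss(75, 3)) for _ in range(n // 2)] + \
          [clip(rng.gauss(95, 3)) for _ in range(n // 2)]
# Class C: most scores high, with a long tail toward lower scores.
class_c = [clip(95 - rng.expovariate(1 / 10)) for _ in range(n)]


def share_between(scores, low, high):
    """Fraction of scores falling in [low, high]."""
    return sum(low <= s <= high for s in scores) / len(scores)


for name, scores in [("A", class_a), ("B", class_b), ("C", class_c)]:
    print(
        f"Class {name}: mean={statistics.mean(scores):.1f}, "
        f"median={statistics.median(scores):.1f}, "
        f"share scoring 80-90={share_between(scores, 80, 90):.2f}"
    )
```

All three means come out close to 85, yet most of Class A scores between 80 and 90 while very few students in Class B do, and Class C’s median sits above its mean because of the low-score tail.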

Storytelling tips

  • Explain what the width means: When presenting a violin plot, start by explaining that the width of the violin represents how many data points are at that level. This is the most crucial concept for your audience to grasp.
  • Call out peaks and clusters: Point to the peaks in the distribution. A single peak indicates a unimodal distribution, while multiple peaks suggest distinct subgroups that may warrant further investigation.
  • Compare shape, not just the center: Encourage your audience to look beyond the median. Compare the overall shapes of the violins. Is one wider than another? Is one skewed while the other is symmetrical? These differences in shape often tell a more interesting story than differences in the median alone.

How to create a violin plot

Creating a violin plot requires a bit of data preparation and the right tools. While not as straightforward as a bar chart, many modern software platforms make it accessible.

Data preparation

First, you need to structure your data correctly. You will need at least one numeric variable that you want to visualize. This could be anything from temperatures and heights to sales figures and test scores. If you want to compare distributions, you will also need a categorical variable to group your data. This variable defines the different violins that will be plotted side by side (for example, department, region, or experimental group).

Before plotting, it is also a good practice to consider how you will handle outliers and missing values. Violin plots will include outliers in their density calculation, so it is important to be aware of their presence.
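A minimal data-preparation sketch follows, assuming the data arrives as (category, value) rows. It groups the numeric values by category and drops missing entries before any density is calculated. The department names and salary figures are invented, and dropping missing values is only one possible policy.

```python
import math
from collections import defaultdict

# Illustrative long-format records: one row per observation.
records = [
    ("Engineering", 98000.0),
    ("Engineering", 105000.0),
    ("Engineering", None),       # missing value
    ("Sales", 61000.0),
    ("Sales", float("nan")),     # missing value stored as NaN
    ("Sales", 72000.0),
    ("Support", 48000.0),
]


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


grouped = defaultdict(list)  # category -> list of numeric values
dropped = 0
for department, salary in records:
    if is_missing(salary):
        dropped += 1
        continue
    grouped[department].append(salary)

for department, salaries in grouped.items():
    print(department, len(salaries), "values")
print("Dropped rows with missing values:", dropped)
```

Each list in `grouped` becomes one violin. Reporting how many rows were dropped, and how many remain per group, helps flag groups that are too small for a trustworthy density estimate.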

In common tools

Most modern statistical and visualization platforms offer built-in capabilities for creating violin plots. In these tools, you typically select the violin plot type, assign your numeric variable to the vertical axis, and assign your categorical variable to the horizontal axis. You can then configure options like showing a box plot overlay, adjusting colors, and modifying the KDE’s bandwidth.
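As one concrete example of this workflow, the sketch below uses Matplotlib’s `violinplot` method (with NumPy for simulated data), assuming both libraries are installed. The region names, values, and labels are invented for illustration. Other tools expose similar options under different names.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Simulated order values per region; "West" is deliberately bimodal.
data_by_region = {
    "North": rng.normal(200, 30, 150),
    "South": rng.normal(240, 50, 150),
    "West": np.concatenate([rng.normal(150, 15, 75), rng.normal(260, 20, 75)]),
}
labels = list(data_by_region)
positions = np.arange(1, len(labels) + 1)

fig, ax = plt.subplots(figsize=(6, 4))
parts = ax.violinplot(
    [data_by_region[name] for name in labels],  # numeric variable, one array per group
    positions=positions,                        # one slot per category
    showmedians=True,                           # draw a median marker
    showextrema=True,                           # draw min and max markers
    bw_method="scott",                          # rule used to pick the KDE bandwidth
)

ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_xlabel("Region")
ax.set_ylabel("Order value (USD)")
ax.set_title("Order value distribution by region")
# fig.savefig("violins_by_region.png")  # uncomment to write the image to disk
```

Passing a number to `bw_method` instead of a rule name is one way to experiment with more or less smoothing.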

In Excel

Creating violin plots in Excel is more challenging, as they are not a native chart type. It’s not impossible but requires workarounds. One method involves using third-party add-ins that extend Excel’s charting capabilities. Another advanced approach involves calculating the density distribution in the worksheet and using a stacked area chart or scatter plot with smoothed lines to mimic the violin shape. For most users, using a tool with native support is a more practical option.

Limitations and when to use an alternative

While powerful, violin plots have limitations. Their primary drawback is that they can be difficult for non-technical audiences to interpret. The concept of a kernel density estimate is not intuitive for everyone. If your audience is not familiar with statistical charts, you will need to spend time explaining how to read the plot.

Because of this, violin plots are not always a good choice for general business dashboards where quick, at-a-glance comprehension is key. In these cases, simpler charts may be more appropriate.

When a violin plot is not the right fit, consider these alternatives:

  • Box plots: If your audience is familiar with them and you only need to show summary statistics, box plots are a great, compact alternative.
  • Histograms or density plots: If you are only visualizing a single distribution, a standard histogram or density plot can be clearer.
  • Strip plots or jitter plots: For smaller data sets, plotting the individual points as a strip plot or jitter plot can be more honest and revealing than a smoothed density curve.

Conclusion and key takeaways about violin plots

Violin plots offer a deep and nuanced view of your data’s distribution. By combining the statistical summary of a box plot with the detailed shape of a density plot, they provide insights that are easy to miss with other chart types. They excel at comparing distributions across multiple groups and are uniquely capable of revealing complex patterns like multimodality.

To use them effectively, remember to focus on clear design, proper labeling, and the context of your audience. Always explain what the width of the violin represents, and guide your audience to compare the shapes, not just the medians. While they may require a bit more explanation than simpler charts, the rich, detailed story they tell is often worth the effort. Use violin plots when understanding the complete shape of your data truly matters.

Frequently asked questions

What does the width of a violin plot represent?

The width of a violin plot at any given point indicates the density or frequency of data at that specific value. A wider section means more data points are clustered around that value, while a narrower section indicates fewer data points.

How is a violin plot different from a box plot?

A box plot summarizes a distribution with five key numbers: the median, the first and third quartiles, and the minimum and maximum values. A violin plot can show the same summary statistics, and it also visualizes the entire distribution’s shape using a kernel density estimate, revealing features like multiple peaks that a box plot would hide.

What is kernel density estimation?

Kernel density estimation (KDE) is a statistical method used to estimate the probability density function of a variable. It creates a smooth curve that represents the distribution of the data points, which forms the outline of the violin plot.

Can violin plots show outliers?

Yes, violin plots can indicate the presence of outliers. The “whiskers” that extend from the central box can be defined to show the data range, and any points that fall outside this range can be considered outliers. Additionally, the smoothed density curve itself will extend to cover all data points, so a long, thin tail can suggest the presence of outlying values.

Are violin plots good for dashboards?

It depends on the audience. For dashboards intended for analysts or data-savvy users, violin plots can provide valuable, dense information. However, for general business dashboards aimed at a non-technical audience, simpler charts like bar charts or box plots are often a better choice because they are easier to interpret quickly.
