Iplot Scatter

  • Label points in the scatter plot
  • Scatter plots with multiple groups


This article describes how to create a scatter plot using R software and the ggplot2 package. The function geom_point() is used.

In a scatter plot, the coordinates of each point are defined by two columns of a data frame, and filled circles are used by default to represent each point. The marker size and color can also vary from point to point. This kind of plot is useful for examining the relationship between two variables.



The mtcars data set is used in the examples below.
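As a minimal sketch of the data preparation, the code below converts the cyl column to a factor so that it can later be used as a grouping variable. Working on a copy named df is an assumption made here for illustration; the original article may have modified mtcars directly.

```r
# Work on a copy (hypothetical name: df) so the built-in data set stays untouched
df <- mtcars

# cyl is numeric by default; convert it to a factor for grouping
df$cyl <- as.factor(df$cyl)

# Inspect the columns used in the examples
head(df[, c("mpg", "cyl", "wt", "qsec")])
```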

Simple scatter plots are created using the function geom_point(). The color, size and shape of the points can be changed through the arguments of geom_point().
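A basic example might look like the sketch below, which plots mpg against wt from mtcars. The variable choice and the specific size, shape and color values are illustrative assumptions.

```r
library(ggplot2)

# Basic scatter plot: car weight (wt) on x, miles per gallon (mpg) on y
p_basic <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Same data with larger, diamond-shaped points (shape 23 has an outline and a fill)
p_styled <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 2, shape = 23, color = "darkblue", fill = "lightblue")

print(p_basic)
print(p_styled)
```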

Note that the size of the points can also be controlled by the values of a continuous variable, by mapping that variable to the size aesthetic.
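For instance, the point size could be mapped to a continuous column inside aes(). Using qsec here is an assumption chosen for illustration.

```r
library(ggplot2)

# Point size varies with the continuous variable qsec
p_size <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(size = qsec))

print(p_size)
```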

Read more on point shapes : ggplot2 point shapes

To label points in the scatter plot, the function geom_text() can be used :
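One way to label the points is sketched below, using the row names of mtcars (the car models) as labels. The helper column model and the size and vjust values are assumptions made for this example.

```r
library(ggplot2)

# Copy the data and store the row names in a regular column (hypothetical name: model)
df <- mtcars
df$model <- rownames(df)

# Draw the points, then place each label slightly above its point
p_labels <- ggplot(df, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_text(aes(label = model), size = 3, vjust = -0.8)

print(p_labels)
```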

Read more on text annotations : ggplot2 - add texts to a plot

Add regression lines

The functions below can be used to add regression lines to a scatter plot :

  • geom_smooth() and stat_smooth()
  • geom_abline()

geom_abline() is described in a separate article : ggplot2 add straight lines to a plot.

Only the function geom_smooth() is covered in this section.

A simplified format is geom_smooth(method = "auto", se = TRUE, fullrange = FALSE, level = 0.95), where :


  • method : smoothing method to be used. Possible values are lm, glm, gam, loess, rlm.
    • method = "loess": This is the default value for a small number of observations. It computes a smooth local regression. You can read more about loess using the R code ?loess.
    • method = "lm": It fits a linear model. Note that it is also possible to specify the formula, for example formula = y ~ poly(x, 3) for a degree 3 polynomial.
  • se : logical value. If TRUE, the confidence interval is displayed around the smooth.
  • fullrange : logical value. If TRUE, the fit spans the full range of the plot.
  • level : level of the confidence interval to use. The default value is 0.95.
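The arguments listed above can be combined as in the following sketch. The explicit formula = y ~ x is an addition made here to keep the calls unambiguous, and the plotted variables are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Linear regression line with its 95% confidence band
p_lm <- p + geom_smooth(method = "lm", formula = y ~ x)

# Linear regression line without the confidence band
p_lm_no_se <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Local regression (loess) smooth
p_loess <- p + geom_smooth(method = "loess", formula = y ~ x)

# Degree 3 polynomial fit with a 99% confidence band
p_poly <- p + geom_smooth(method = "lm", formula = y ~ poly(x, 3), level = 0.99)

print(p_lm)
print(p_lm_no_se)
print(p_loess)
print(p_poly)
```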


Change the appearance of points and lines

This section describes how to change :

  • the color and the shape of points
  • the line type and color of the regression line
  • the fill color of the confidence interval

Note that a transparent color is used, by default, for the confidence band. This can be changed with the argument alpha, for example : geom_smooth(fill = "blue", alpha = 1)
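The appearance settings described in this section might be combined as in the sketch below. The particular shape, colors, line type and alpha value are arbitrary choices for illustration.

```r
library(ggplot2)

p_custom <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # Blue, diamond-shaped points
  geom_point(shape = 18, color = "blue") +
  # Dashed dark red regression line with a light blue confidence band
  geom_smooth(method = "lm", formula = y ~ x,
              linetype = "dashed", color = "darkred",
              fill = "blue", alpha = 0.2)

print(p_custom)
```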

Read more on point shapes : ggplot2 point shapes

Read more on line types : ggplot2 line types

This section describes how to change point colors and shapes automatically and manually.

Change the point color/shape/size automatically

In the R code below, point shapes, colors and sizes are controlled by the levels of the factor variable cyl :
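A sketch of such a grouped plot follows. It assumes cyl has been converted to a factor on a copy of mtcars named df. Mapping size to a factor works, but ggplot2 warns that using size for a discrete variable is not advised.

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

# Point shape depends on the level of cyl
p_shape <- ggplot(df, aes(x = wt, y = mpg, shape = cyl)) +
  geom_point()

# Point shape and color depend on the level of cyl
p_shape_color <- ggplot(df, aes(x = wt, y = mpg, shape = cyl, color = cyl)) +
  geom_point()

# Shape, color and size all depend on cyl (triggers a ggplot2 warning for size)
p_all <- ggplot(df, aes(x = wt, y = mpg, shape = cyl, color = cyl, size = cyl)) +
  geom_point()

print(p_shape)
print(p_shape_color)
```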

Add regression lines

Regression lines can be added as follows :

Note that you can also change the line type of the regression lines by using the aesthetic linetype = cyl.
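With a grouping variable mapped to color, geom_smooth() fits one line per group. The sketch below assumes the same df copy of mtcars with cyl as a factor.

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

p <- ggplot(df, aes(x = wt, y = mpg, color = cyl, shape = cyl)) +
  geom_point()

# One regression line (with confidence band) per level of cyl
p_groups <- p + geom_smooth(method = "lm", formula = y ~ x)

# No confidence bands, and lines extended over the full x range of the plot
p_fullrange <- p + geom_smooth(method = "lm", formula = y ~ x,
                               se = FALSE, fullrange = TRUE)

# Line type also varies by group
p_linetype <- p + geom_smooth(aes(linetype = cyl), method = "lm",
                              formula = y ~ x, se = FALSE)

print(p_groups)
print(p_fullrange)
print(p_linetype)
```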

The fill color of the confidence bands can be changed as follows :
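One possible approach is to map the fill aesthetic of geom_smooth() to the grouping variable, as sketched here under the same df assumption as before.

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

# The band of each regression line is filled according to its cyl group
p_fill <- ggplot(df, aes(x = wt, y = mpg, color = cyl, shape = cyl)) +
  geom_point() +
  geom_smooth(aes(fill = cyl), method = "lm", formula = y ~ x)

print(p_fill)
```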

Change the point color/shape/size manually

The functions below are used :

  • scale_shape_manual() for point shapes
  • scale_color_manual() for point colors
  • scale_size_manual() for point sizes
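The three manual scales listed above could be used together as in this sketch. The shape codes, hex colors and sizes are arbitrary illustrative values, one per level of cyl.

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

p_manual <- ggplot(df, aes(x = wt, y = mpg, shape = cyl, color = cyl, size = cyl)) +
  geom_point() +
  # One value per factor level (4, 6 and 8 cylinders)
  scale_shape_manual(values = c(3, 16, 17)) +
  scale_color_manual(values = c("#999999", "#E69F00", "#56B4E9")) +
  scale_size_manual(values = c(2, 3, 4))

print(p_manual)
```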

Point and line colors can also be changed using the functions :

  • scale_color_brewer() : to use color palettes from RColorBrewer package
  • scale_color_grey() : to use grey color palettes
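As a sketch, either palette function can be added to a grouped plot. The palette name "Dark2" and the use of theme_classic() are assumptions made for this example.

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

p <- ggplot(df, aes(x = wt, y = mpg, color = cyl, shape = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, fullrange = TRUE)

# RColorBrewer palette
p_brewer <- p + scale_color_brewer(palette = "Dark2") + theme_classic()

# Grey scale
p_grey <- p + scale_color_grey() + theme_classic()

print(p_brewer)
print(p_grey)
```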

Read more on ggplot2 colors here : ggplot2 colors

To add marginal rugs to a scatter plot, the function geom_rug() can be used :

sides : a string that controls which sides of the plot the rugs appear on. It can contain any combination of the letters "trbl", for top, right, bottom and left.
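A minimal sketch of a scatter plot with rugs is shown below. The built-in faithful data set is an assumption made here for illustration.

```r
library(ggplot2)

# Rugs on the bottom ("b") and left ("l") sides show the marginal distributions
p_rug <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  geom_rug(sides = "bl")

print(p_rug)
```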

To overlay a 2D density estimate on a scatter plot, the functions geom_density_2d() or stat_density_2d() can be used :
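The two functions might be used as follows. The faithful data set and the gradient colors are illustrative assumptions; after_stat(level) refers to the contour level computed by the stat.

```r
library(ggplot2)

sp <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point()

# Contour lines of the 2D density estimate
p_contour <- sp + geom_density_2d()

# Filled contours, colored by the computed density level
p_filled <- sp +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", alpha = 0.5) +
  scale_fill_gradient(low = "blue", high = "red")

print(p_contour)
print(p_filled)
```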

Read more on ggplot2 colors here : ggplot2 colors

To draw ellipses around the points of a scatter plot, the function stat_ellipse() can be used as follows :
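A sketch with one overall ellipse and one ellipse per group follows. The faithful data set and the grouping rule eruptions > 3 are assumptions chosen for illustration.

```r
library(ggplot2)

# A single ellipse around all points
p_ellipse <- ggplot(faithful, aes(x = waiting, y = eruptions)) +
  geom_point() +
  stat_ellipse()

# One ellipse per group, here short versus long eruptions
p_group <- ggplot(faithful, aes(x = waiting, y = eruptions, color = eruptions > 3)) +
  geom_point()

# type = "norm" assumes a multivariate normal distribution
p_group_norm <- p_group + stat_ellipse(type = "norm")

print(p_ellipse)
print(p_group_norm)
```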

The number of observations is counted in each bin and displayed using any of the functions below :

  • geom_bin2d() for adding a heatmap of 2D bin counts
  • stat_bin_2d() for counting the number of observations in rectangular bins
  • stat_summary_2d() for applying a function over 2D rectangular bins

The main arguments of these functions are :

  • geom : geometrical object to display the data
  • bins : Number of bins in both vertical and horizontal directions. The default value is 30
  • fun : function for summary

The diamonds data set from the ggplot2 package is used :
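A default binned plot might be produced as sketched below. Plotting price against carat is an assumption made for this example.

```r
library(ggplot2)

# Look at the two columns used in the plot
head(diamonds[, c("carat", "price")])

# Heatmap of counts in rectangular bins (30 bins in each direction by default)
p_bins <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d()

print(p_bins)
```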

Change the number of bins :

Or specify the width of bins :
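Both options can be sketched on the same base plot. The values bins = 10 and binwidth = c(1, 1000) are illustrative; binwidth takes one width for the x direction and one for the y direction.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price))

# At most 10 x 10 bins
p_bins10 <- p + geom_bin2d(bins = 10)

# Bins that are 1 carat wide and 1000 price units high
p_binwidth <- p + geom_bin2d(binwidth = c(1, 1000))

print(p_bins10)
print(p_binwidth)
```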

Step 1/3. Create some data :
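The data for this step could be simulated as in the sketch below. The seed, sample sizes and means are assumptions chosen only to produce two overlapping groups.

```r
# Fix the seed so the simulated data is reproducible
set.seed(1234)

# Two groups of 500 points with different centers
x <- c(rnorm(500, mean = -1), rnorm(500, mean = 1.5))
y <- c(rnorm(500, mean = 1), rnorm(500, mean = 1.7))
group <- as.factor(rep(c(1, 2), each = 500))

df <- data.frame(x, y, group)
head(df)
```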

Step 2/3. Create the plots :
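A sketch of the three plots is given below: a scatter plot colored by group and one marginal density plot per axis. The simulated data is recreated so the block runs on its own, and the object names scatterPlot, xdensity and ydensity are assumptions. Flipping the y density with coord_flip() is a design choice made here so that it lines up with the vertical axis of the scatter plot.

```r
library(ggplot2)

# Simulated data (assumed values), as in step 1
set.seed(1234)
df <- data.frame(
  x = c(rnorm(500, mean = -1), rnorm(500, mean = 1.5)),
  y = c(rnorm(500, mean = 1), rnorm(500, mean = 1.7)),
  group = as.factor(rep(c(1, 2), each = 500))
)

# Scatter plot colored by group
scatterPlot <- ggplot(df, aes(x = x, y = y, color = group)) +
  geom_point() +
  theme(legend.position = "bottom")

# Marginal density of x
xdensity <- ggplot(df, aes(x = x, fill = group)) +
  geom_density(alpha = 0.5) +
  theme(legend.position = "none")

# Marginal density of y, flipped to run along the vertical axis
ydensity <- ggplot(df, aes(x = y, fill = group)) +
  geom_density(alpha = 0.5) +
  theme(legend.position = "none") +
  coord_flip()

print(scatterPlot)
```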


Create a blank placeholder plot :
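One simple way to build an empty plot is sketched here. Using theme_void() to strip axes and background is an assumption; the original may have removed theme elements one by one.

```r
library(ggplot2)

# An empty plot that only reserves space in the layout
blankPlot <- ggplot() +
  geom_blank(aes(x = 1, y = 1)) +
  theme_void()

print(blankPlot)
```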

Step 3/3. Put the plots together:

To put multiple plots on the same page, the gridExtra package can be used. Install it with install.packages("gridExtra").


Arrange the plots with adapted heights and widths for each row and column :
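The arrangement might be sketched with gridExtra::grid.arrange() as below. The data and plots are recreated so the block runs on its own; the object names and the width and height ratios are assumptions. The call is guarded so that nothing is drawn if gridExtra is not installed.

```r
library(ggplot2)

# Simulated data and plots (assumed values and names), as in the previous steps
set.seed(1234)
df <- data.frame(
  x = c(rnorm(500, mean = -1), rnorm(500, mean = 1.5)),
  y = c(rnorm(500, mean = 1), rnorm(500, mean = 1.7)),
  group = as.factor(rep(c(1, 2), each = 500))
)
scatterPlot <- ggplot(df, aes(x, y, color = group)) + geom_point() +
  theme(legend.position = "bottom")
xdensity <- ggplot(df, aes(x, fill = group)) + geom_density(alpha = 0.5) +
  theme(legend.position = "none")
ydensity <- ggplot(df, aes(y, fill = group)) + geom_density(alpha = 0.5) +
  theme(legend.position = "none") + coord_flip()
blankPlot <- ggplot() + geom_blank(aes(1, 1)) + theme_void()

# 2 x 2 layout: x density on top, scatter plot bottom left, y density bottom right
if (requireNamespace("gridExtra", quietly = TRUE)) {
  gridExtra::grid.arrange(xdensity, blankPlot, scatterPlot, ydensity,
                          ncol = 2, nrow = 2,
                          widths = c(4, 1.4), heights = c(1.4, 4))
}
```

The wider first column and taller second row give most of the page to the scatter plot, while the blank plot fills the unused top-right cell.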

Read more on how to arrange multiple ggplots in one page : ggplot2 - Easy way to mix multiple graphs on the same page

Change colors manually :
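Manual colors could be applied with scale_color_manual() for the points and scale_fill_manual() for the density areas, as sketched here. The data is recreated with assumed values, and the two hex colors are arbitrary.

```r
library(ggplot2)

# Simulated data (assumed values)
set.seed(1234)
df <- data.frame(
  x = c(rnorm(500, mean = -1), rnorm(500, mean = 1.5)),
  y = c(rnorm(500, mean = 1), rnorm(500, mean = 1.7)),
  group = as.factor(rep(c(1, 2), each = 500))
)

# One color per group (hypothetical palette)
group_colors <- c("#999999", "#E69F00")

scatter_manual <- ggplot(df, aes(x, y, color = group)) +
  geom_point() +
  scale_color_manual(values = group_colors)

density_manual <- ggplot(df, aes(x, fill = group)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = group_colors)

print(scatter_manual)
print(density_manual)
```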

Read more on ggplot2 colors here : ggplot2 colors

This analysis has been performed using R software (ver. 3.2.4) and ggplot2 (ver. 2.1.0)

