Data_Bootcamp directory/folder. The link goes to a display of the notebook; you need to click on the Raw button to get the real file. Be sure to download it as filetype ipynb.
import statements.
justdoit() to the object x with x.justdoit().
df with df.columns and row labels with df.index.
x in a dataframe df as df['x'], a series.
read_csv() and read_excel() functions in Pandas.
Data_Bootcamp directory. Return to the Jupyter tab in your browser that points to that directory. Look for the file named bootcamp_graphics.ipynb.
Click to open it. That will open the notebook in a new tab. The notebook will say at the top: "Python graphics: Matplotlib fundamentals" in large bold letters.
wbdf on its own -- results in the display of wbdf. That works as long as it's the last statement in the cell.
rm is the return on the equity market overall and rf is the riskfree return.
wbdf? What are its column and row labels?
ff.index? What does that tell us?
x
and y variables. Typically we graph some y variable -- or perhaps several of them -- against an x variable, with x on the horizontal axis and y on the vertical axis. We need to tell Excel which is which.
x and y variables. By default, the x variable is the dataframe's index and the y variables are the columns of the dataframe -- all of them that can be plotted (e.g. columns with a numeric dtype).
us.plot()
into a code cell and run it. This plots every column of the dataframe us as a line against the index, the year of the observation. The lines have different colors. We didn't ask for this, it's built in. A legend associates each variable name with a line color. This is also built in.
us. To plot one line, we apply the same method to a single variable -- a series. The statement us['gdp'].plot() plots GDP alone. The first part -- us['gdp'] -- is the single variable GDP. The second part -- .plot() -- plots it.
y argument when we call the plot method. The statement us.plot(y="gdp") will produce the same plot as us['gdp'].plot().
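A minimal sketch of these three plot statements follows. The numbers in us are made-up placeholders (an assumption, not the notebook's data), chosen only so the cell runs on its own; the column names gdp and pce and the year index follow the text.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the notebook's `us` dataframe (made-up values)
us = pd.DataFrame(
    {"gdp": [13000.0, 13500.0, 14000.0, 14500.0],
     "pce": [9000.0, 9300.0, 9600.0, 9900.0]},
    index=pd.Index([2003, 2004, 2005, 2006], name="year"),
)

ax_all = us.plot()            # every numeric column as a line against the index

plt.figure()                  # fresh figure so the series gets its own axes
ax_series = us["gdp"].plot()  # one column (a series) plotted alone

ax_y = us.plot(y="gdp")       # same line, selected with the y argument
```

In a notebook each statement would normally sit in its own cell. The plt.figure() call is only needed here because a series plot otherwise draws on the axes that are already open.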
us.plot(kind='bar') produces a bar chart of the same data.
x and y. We'll use gdp as x and pce (consumption) as y. The general syntax for a dataframe df is df.plot.scatter(x,y).
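As a minimal sketch of the df.plot.scatter(x,y) syntax, the cell below builds a small stand-in for us with made-up numbers (an assumption; only the column names gdp and pce come from the text) and plots consumption against GDP.

```python
import pandas as pd

# Hypothetical stand-in for `us` (made-up values; column names follow the text)
us = pd.DataFrame(
    {"gdp": [13000.0, 13500.0, 14000.0, 14500.0],
     "pce": [9000.0, 9300.0, 9600.0, 9900.0]},
    index=pd.Index([2003, 2004, 2005, 2006], name="year"),
)

# x and y are column names: gdp on the horizontal axis, pce on the vertical axis
ax = us.plot.scatter(x="gdp", y="pce")
```

Unlike the line plot, a scatter plot has no default x and y, so both column names must be given.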
In this case we use us.plot.scatter('gdp', 'pce').
us.plot(kind='bar') and us.plot.bar() in separate cells. Show that they produce the same bar chart.
us.plot():
kind='area'
subplots=True
sharey=True
figsize=(3,6)
ylim=(0,16000)
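One way to try the options listed above is sketched here, one statement per option. The dataframe is a made-up stand-in for us (an assumption, not the notebook's data). sharey=True is paired with subplots=True because it only has a visible effect when there is more than one subplot.

```python
import pandas as pd

# Hypothetical stand-in for `us` (made-up values; column names follow the text)
us = pd.DataFrame(
    {"gdp": [13000.0, 13500.0, 14000.0, 14500.0],
     "pce": [9000.0, 9300.0, 9600.0, 9900.0]},
    index=pd.Index([2003, 2004, 2005, 2006], name="year"),
)

ax_area = us.plot(kind="area")                     # filled areas instead of lines
axes_sub = us.plot(subplots=True)                  # one subplot per column
axes_sharey = us.plot(subplots=True, sharey=True)  # subplots with a common y scale
ax_size = us.plot(figsize=(3, 6))                  # 3 inches wide, 6 inches tall
ax_ylim = us.plot(ylim=(0, 16000))                 # fix the range of the y axis
```

With subplots=True the plot method returns an array of axes, one per column, rather than a single axis object.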
us.plot? in a new cell. Run the cell (shift-enter or click on the run cell icon). What options do you see for the kind= argument? Which ones have we tried? What are the other ones?
ff. The basic plot statement is
rm) that varies a lot and one (the riskfree return rf) that does not.
title='Fama-French returns', grid=True, and legend=False. What does the documentation say about them? What do they do?
wbdf to create a bar chart of GDP per capita, the variable 'gdppc'.
Bonus points: Create a horizontal bar chart. Which do you prefer?
subplots(), which creates the objects fig and ax on the left. The subplots() function produces a blank figure, which is displayed in the Jupyter notebook. The names fig and ax can be anything, but these choices are standard.
fig is a figure object and ax is an axis object. (Try type(fig) and type(ax) to see why.) Once more, the words don't mean what we might think they mean:
fig is a blank canvas for creating a figure.
ax is everything in it: axes, labels, lines or bars, legend, and so on.
ax. We typically do this with dataframe plot methods:
kind='bar'
to convert this into a bar chart.
alpha=0.65 to the bar chart. What does it do?
plot(x,y) function.
fig, ax objects and apply plot methods to them.
set_xlabel() method to add an x-axis label. What would you choose? Or would you prefer to leave it empty?
ax.legend? to access the documentation for the legend method. What options appeal to you?
us.plot? to get the documentation.
set_ylim() method to start the y axis at zero. Hint: Use ax.set_ylim? to get the documentation.
ff that includes both returns. Bonus points: Add a title with the set_title
method.
subplots statement asks for a graph with two rows (top and bottom) and one column. That is, two graphs, one on top of the other. The sharex=True argument makes the x axes the same. The print statement tells us "Object ax has dimension 2", one for the GDP graph, and one for the consumption graph.
ax at zero, which should be getting familiar by now.) This gives us a double graph, with GDP at the top and consumption at the bottom. Put another way, the figure fig contains two axes (ax[0] and ax[1]) and each axis has one plot in it.
figsize=(width, height). The sizes are measured in inches, which get shrunk a bit when we display them in Jupyter. Here's a version with a much larger height that we discovered by experimenting:
36 comes from experimenting. We count from the bottom starting with zero.
barh
above with bar. (Try it and see what you think.)
x on the horizontal axis, y on the vertical axis, and a third variable represented by the size of the bubble.) From a technical perspective, this is simply another argument in a scatter plot. Here's an example in which x is GDP per capita, y is life expectancy, and the third variable is population:
10**6 "scaling" on the second line. The bubble size is a little tricky to calibrate. Without the scaling, the bubbles are larger than the graph. We played around until they looked reasonable.
ggplot, bmh, dark_background, and grayscale. Which ones do you like? Why?
cals
contains the calories in 100 grams of several different foods.
'Food' as the index of cals.
cals using figure and axis objects.
alpha=0.5. What does it do?
plot(x,y), which wouldn't be our choice, but the content is very good.
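For comparison, here is a minimal sketch of the two styles this chapter distinguishes: the plot(x,y) function, and methods applied to fig, ax objects. The numbers and titles are made up for illustration and are not taken from the notebook.

```python
import matplotlib.pyplot as plt

# Made-up numbers, for illustration only
x = [2003, 2004, 2005, 2006]
y = [13000, 13500, 14000, 14500]

# Function style: plot(x, y) draws on the current figure, creating one if needed
lines = plt.plot(x, y)
plt.title("Function style")

# Object style: create fig and ax first, then apply methods to ax
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Object style")
ax.set_ylim(0)   # start the y axis at zero
```

The object style is slightly longer, but every change is applied to a named axis, which makes it easier to work with figures that have several subplots.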