If you are an electrical engineer, and want to know how to use Python to get data from the Internet and display it, this post is for you.
(This is the third part of a four part series. By the end we’ll have written a Python script to display a chart of electricity market prices. Enter your email → on the right so that you don’t miss the final post.)
- All you need to analyse the electricity market pt 1
- All you need to analyse the electricity market pt 2
- All you need to analyse the electricity market pt 3
- All you need to analyse the electricity market final
Australian electricity prices are high – let’s analyse
Very high electricity prices [Larger Size] |
Previously I mentioned that the Australian electricity prices have gone through the roof (more than doubling) since the introduction of the carbon tax.
This series of posts is exploring how to analyse market data accessible from the internet. The methods described can be adapted to your country’s data or any sort of data available on the internet.
We began the series with a post detailing how to obtain a CSV file that contains the latest electricity market prices.
Then we unzipped the price data CSV file that was downloaded in Part 1 and had a brief look at its contents.
Now we will teach you how to pull that CSV file apart using Python. You will master the ability to highlight the columns and rows that you are interested in. Just like top Japanese chefs are qualified to cut the good meat from the fugu fish, you too will learn to slice the good data from the CSV file.
The code we have developed over this series so far:
- downloads a zipped file;
- unzips it; and
- reads the file contents as a CSV.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
|
The example CSV file downloaded earlier had over 30 columns of information and many thousands of rows. We’ll use Python to get exactly the columns and rows that we want.
Which rows are important? Knowing what is in the CSV file is paramount at this stage. To this end, the site that provides the data may also provide a specification of the file structure. Failing that, you may have to get intimate with the data and spend a bit of time working out the format for yourself.
For the data provided by the Australian electricity market operator, the first
CSV column is a label. Each label describes the purpose of that row. There are
three values, C
, I
or D
. Shown below is an example of the data stored
in the first column,
1 2 3 4 5 6 7 8 9 10 11 |
|
Rows marked with a C
are comment rows, they give further information about
the file itself but aren’t necessary for us to worry about.
Rows marked with an I
are header rows. The header row is just like a header
row you use in a normal Microsoft Excel spreadsheet, it indicates what data is
stored in that column. For our goal of finding the price of electricity in
different regions of Australia over time, the columns that we are looking for
are SETTLEMENTDATE
(date and time), REGIONID
(price region) and RRP
(electricity price $/MWh).
Rows marked with a D
are the data rows. We’ll take these rows for the
SETTLEMENTDATE
, REGIONID
and RRP
then plot them on a chart.
Multiple header rows in a CSV file?
Immediately though, we run into a problem. Notice in the CSV file structure
figure shown above that there are multiple I
header rows? There are no less
than four in the CSV file we downloaded. You can think of it as four CSV files
crammed into a single file.
We are only interested in one of these four sections, the section with the
columns we mentioned before, SETTLEMENTDATE
, REGIONID
and RRP
. Through
analysis of the file structure, we know that all the data rows we are
interested in all begin with:
1
|
|
Master technique one: filter
Here is how to update the program to print the rows beginning with D,TREGION,,1
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
|
The filter
function will only return the rows for which the function
is_halfhourly_data
returns True
.
Master technique 2: map
Having correctly isolated the rows that interest us, slice up those
columns and get the SETTLEMENTDATE
, REGIONID
and RRP
columns (column
numbers 4
, 6
and 7
).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
|
The map
function will return a list of the result of
get_date_region_and_rrp
called with each row in halfhourly_data
as an argument.
1 2 3 4 |
|
The above section of code using map
is the equivalent of this code using
for
loop.
1 2 3 4 5 6 7 8 |
|
map
is an extremely versatile function. Those four lines of for
loop code
are replaced with one map
line of code.
And now only the good data remains
The date_region_price
variable will have these contents:
1 2 3 4 5 |
|
Only the columns that we are interested in. Nice work!
However there is still a small problem. All of the data values are still text.
Update the get_date_region_and_rrp
to convert the first column to a date,
keep the second as a string and the third to a floating point value.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
|
Great, now our data is in a format that Python can understand and plot. This is
what is contained in the date_region_price
value now:
1 2 3 4 5 |
|
Perfect. We now have all the data formatted exactly the way we want it.
Conclusion
We’ve used filter
and map
to quickly and efficiently sort and slice the
CSV data. Just like a master Japanese chef, I’m sure that you will not poison
your patrons with bad slices of data. filter
and map
are advanced level
functions that are often used to replace for
loops. Please practice using
map
and filter
. Experiment with them so that you understand how they work!
In the next, and final, blog post of this series we’ll show you how to plot the results using matplotlib.