All you need to analyse the electricity market pt 3

If you are an electrical engineer, and want to know how to use Python to get data from the Internet and display it, this post is for you.

(This is the third part of a four part series. By the end we’ll have written a Python script to display a chart of electricity market prices. Enter your email → on the right so that you don’t miss the final post.)

All you need to analyse the electricity market pt 1
All you need to analyse the electricity market pt 2
All you need to analyse the electricity market pt 3
All you need to analyse the electricity market final

Australian electricity prices are high – let’s analyse

Australian Electricity Prices are higher since the introduction of the carbon tax

Very high electricity prices [Larger Size]

Previously I mentioned that the Australian electricity prices have gone through the roof (more than doubling) since the introduction of the carbon tax.

This series of posts is exploring how to analyse market data accessible from the internet. The methods described can be adapted to your country’s data or any sort of data available on the internet.

We began the series with a post detailing how to obtain a CSV file that contains the latest electricity market prices.

Then we unzipped the price data CSV file that was downloaded in Part 1 and had a brief look at its contents.

Now we will teach you how to pull that CSV file apart using Python. You will master the ability to highlight the columns and rows that you are interested in. Just like top Japanese chefs are qualified to cut the good meat from the fugu fish, you too will learn to slice the good data from the CSV file.

The code we have developed over this series so far:

downloads a zipped file;
unzips it; and
reads the file contents as a CSV.

extractziptocsv.py

from __future__ import with_statement
import csv
from urllib2 import urlopen
from StringIO import StringIO
from zipfile import ZipFile

PRICE_REPORTS_URL = 'http://www.nemweb.com.au/Reports/CURRENT/Public_Prices'
ZIP_URL = '/PUBLIC_PRICES_201207040000_20120705040607.ZIP'

# zippedfile is now one long string.
zippedfile = urlopen(PRICE_REPORTS_URL + ZIP_URL).read()

# StringIO turns the string into a real file-like object.
opened_zipfile = ZipFile(StringIO(zippedfile))

# assuming there is only one CSV in the zipped file.
csv_filename = opened_zipfile.namelist()[0]

prices_csv_file = opened_zipfile.open(csv_filename)

prices_csv_reader = csv.reader(prices_csv_file)

The example CSV file downloaded earlier had over 30 columns of information and many thousands of rows. We’ll use Python to get exactly the columns and rows that we want.

Which rows are important? Knowing what is in the CSV file is paramount at this stage. To this end, the site that provides the data may also provide a specification of the file structure. Failing that, you may have to get intimate with the data and spend a bit of time working out the format for yourself.

For the data provided by the Australian electricity market operator, the first CSV column is a label. Each label describes the purpose of that row. There are three values, C, I or D. Shown below is an example of the data stored in the first column,

CSV file structure

C,
I,
D,
D,
D,
D,
I,
D,
D,
I,
...

Rows marked with a C are comment rows, they give further information about the file itself but aren’t necessary for us to worry about.

Rows marked with an I are header rows. The header row is just like a header row you use in a normal Microsoft Excel spreadsheet, it indicates what data is stored in that column. For our goal of finding the price of electricity in different regions of Australia over time, the columns that we are looking for are SETTLEMENTDATE (date and time), REGIONID (price region) and RRP (electricity price $/MWh).

Rows marked with a D are the data rows. We’ll take these rows for the SETTLEMENTDATE, REGIONID and RRP then plot them on a chart.

Multiple header rows in a CSV file?

Immediately though, we run into a problem. Notice in the CSV file structure figure shown above that there are multiple I header rows? There are no less than four in the CSV file we downloaded. You can think of it as four CSV files crammed into a single file.

We are only interested in one of these four sections, the section with the columns we mentioned before, SETTLEMENTDATE, REGIONID and RRP. Through analysis of the file structure, we know that all the data rows we are interested in all begin with:

example D data row

D,TREGION,,1

Master technique one: `filter`

Here is how to update the program to print the rows beginning with D,TREGION,,1:

takedatarows.py

from __future__ import with_statement
import csv
from urllib2 import urlopen
from StringIO import StringIO
from zipfile import ZipFile

PRICE_REPORTS_URL = 'http://www.nemweb.com.au/Reports/CURRENT/Public_Prices'
ZIP_URL = '/PUBLIC_PRICES_201207040000_20120705040607.ZIP'

# zippedfile is now one long string.
zippedfile = urlopen(PRICE_REPORTS_URL + ZIP_URL).read()

# StringIO turns the string into a real file-like object.
opened_zipfile = ZipFile(StringIO(zippedfile))

# assuming there is only one CSV in the zipped file.
csv_filename = opened_zipfile.namelist()[0]

prices_csv_file = opened_zipfile.open(csv_filename)

prices_csv_reader = csv.reader(prices_csv_file)

def is_halfhourly_data(row):
    """Returns True if the given row starts with 'D', 'TREGION', '', '1'"""
    return row[:4] == ["D", "TREGION", "", "1"]

print filter(is_halfhourly_data, prices_csv_reader)

The filter function will only return the rows for which the function is_halfhourly_data returns True.

Master technique 2: `map`

Having correctly isolated the rows that interest us, slice up those columns and get the SETTLEMENTDATE, REGIONID and RRP columns (column numbers 4, 6 and 7).

takedatarows.py

from __future__ import with_statement
import csv
from urllib2 import urlopen
from StringIO import StringIO
from zipfile import ZipFile

PRICE_REPORTS_URL = 'http://www.nemweb.com.au/Reports/CURRENT/Public_Prices'
ZIP_URL = '/PUBLIC_PRICES_201207040000_20120705040607.ZIP'

# zippedfile is now one long string.
zippedfile = urlopen(PRICE_REPORTS_URL + ZIP_URL).read()

# StringIO turns the string into a real file-like object.
opened_zipfile = ZipFile(StringIO(zippedfile))

# assuming there is only one CSV in the zipped file.
csv_filename = opened_zipfile.namelist()[0]

prices_csv_file = opened_zipfile.open(csv_filename)

prices_csv_reader = csv.reader(prices_csv_file)

def is_halfhourly_data(row):
    """Returns True if the given row starts with 'D', 'TREGION', '', '1'"""
    return row[:4] == ["D", "TREGION", "", "1"]

halfhourly_data = filter(is_halfhourly_data, prices_csv_reader)

def get_date_region_and_rrp(row):
    return (row[4], row[6], row[7])

date_region_price = map(get_date_region_and_rrp, halfhourly_data)

The map function will return a list of the result of get_date_region_and_rrp called with each row in halfhourly_data as an argument.

def get_date_region_and_rrp(row):
    return (row[4], row[6], row[7])

date_region_price = map(get_date_region_and_rrp, halfhourly_data)

The above section of code using map is the equivalent of this code using for loop.

def get_date_region_and_rrp(row):
    return (row[4], row[6], row[7])

date_region_price = []

for row in halfhourly_data:
    new_row = get_date_region_and_rrp(row)
    date_region_price.append(new_row)

map is an extremely versatile function. Those four lines of for loop code are replaced with one map line of code.

And now only the good data remains

The date_region_price variable will have these contents:

date_region_price output

[["2012/04/02 10:30:00", "NSW1", "34.40"],
 ["2012/04/02 10:30:00", "QLD1", "34.67"],
 ["2012/04/02 10:30:00", "VIC1", "35.83"],
 ...
]

Only the columns that we are interested in. Nice work!

However there is still a small problem. All of the data values are still text. Update the get_date_region_and_rrp to convert the first column to a date, keep the second as a string and the third to a floating point value.

converted_data.py

from __future__ import with_statement
import csv
import datetime
from urllib2 import urlopen
from StringIO import StringIO
from zipfile import ZipFile

PRICE_REPORTS_URL = 'http://www.nemweb.com.au/Reports/CURRENT/Public_Prices'
ZIP_URL = '/PUBLIC_PRICES_201207040000_20120705040607.ZIP'

# zippedfile is now one long string.
zippedfile = urlopen(PRICE_REPORTS_URL + ZIP_URL).read()

# StringIO turns the string into a real file-like object.
opened_zipfile = ZipFile(StringIO(zippedfile))

# assuming there is only one CSV in the zipped file.
csv_filename = opened_zipfile.namelist()[0]

prices_csv_file = opened_zipfile.open(csv_filename)

prices_csv_reader = csv.reader(prices_csv_file)

def is_halfhourly_data(row):
    """Returns True if the given row starts with 'D', 'TREGION', '', '1'"""
    return row[:4] == ["D", "TREGION", "", "1"]

halfhourly_data = filter(is_halfhourly_data, prices_csv_reader)

def get_date_region_and_rrp(row):
    """
    Returns the SETTLEMENTDATE, REGION and RRP from the given
    PUBLIC_PRICES CSV data row.

    SETTLEMENTDATE is converted to a Python date (the time is discarded);
    REGION is left as a string; and
    RRP is converted to a floating point.
    """
    return (datetime.datetime.strptime(row[4], '%Y/%m/%d %H:%M:%S').date(),
            row[6],
            float(row[7]))

date_region_price = map(get_date_region_and_rrp, halfhourly_data)

Great, now our data is in a format that Python can understand and plot. This is what is contained in the date_region_price value now:

date_region_price converted output

[[datetime.date(2012, 4, 2), "NSW1", 34.40],
 [datetime.date(2012, 4, 2), "QLD1", 34.67],
 [datetime.date(2012, 4, 2), "VIC1", 35.83],
 ...
]

Perfect. We now have all the data formatted exactly the way we want it.

Conclusion

We’ve used filter and map to quickly and efficiently sort and slice the CSV data. Just like a master Japanese chef, I’m sure that you will not poison your patrons with bad slices of data. filter and map are advanced level functions that are often used to replace for loops. Please practice using map and filter. Experiment with them so that you understand how they work!

In the next, and final, blog post of this series we’ll show you how to plot the results using matplotlib.

Python for Power Systems

A blog for power systems engineers to learn Python.

All You Need to Analyse the Electricity Market Pt 3

Australian electricity prices are high – let’s analyse

Multiple header rows in a CSV file?

Master technique one: `filter`

Master technique 2: `map`

And now only the good data remains

Conclusion

Australian electricity prices are high – let’s analyse

Multiple header rows in a CSV file?

Master technique one: filter

Master technique 2: map

And now only the good data remains

Conclusion

Master technique one: `filter`

Master technique 2: `map`