Previously I mentioned that Australian electricity prices have gone through the
roof (more than doubling) since the introduction of the carbon tax.
This series of posts is exploring how to analyse market data accessible from the
internet. The methods described can be adapted to your country’s data or any
sort of data available on the internet.
We began the series with a post detailing how to obtain a CSV file that
contains the latest electricity market prices.
Then we unzipped the price data CSV file that was downloaded in Part 1 and had a
brief look at its contents.
    from __future__ import with_statement
    import csv
    import datetime
    from urllib2 import urlopen
    from StringIO import StringIO
    from zipfile import ZipFile

    PRICE_REPORTS_URL = 'http://www.nemweb.com.au/Reports/CURRENT/Public_Prices'
    ZIP_URL = '/PUBLIC_PRICES_201207040000_20120705040607.ZIP'

    # zippedfile is now one long string.
    zippedfile = urlopen(PRICE_REPORTS_URL + ZIP_URL).read()

    # StringIO turns the string into a real file-like object.
    opened_zipfile = ZipFile(StringIO(zippedfile))

    # assuming there is only one CSV in the zipped file.
    csv_filename = opened_zipfile.namelist()[0]

    prices_csv_file = opened_zipfile.open(csv_filename)

    prices_csv_reader = csv.reader(prices_csv_file)


    def is_halfhourly_data(row):
        """Returns True if the given row starts with 'D', 'TREGION', '', '1'"""
        return row[:4] == ["D", "TREGION", "", "1"]


    halfhourly_data = filter(is_halfhourly_data, prices_csv_reader)


    def get_date_region_and_rrp(row):
        """
        Returns the SETTLEMENTDATE, REGION and RRP from the given
        PUBLIC_PRICES CSV data row.

        SETTLEMENTDATE is converted to a Python date (the time is discarded);
        REGION is left as a string; and
        RRP is converted to a floating point.
        """
        return (datetime.datetime.strptime(row[4], '%Y/%m/%d %H:%M:%S').date(),
                row[6],
                float(row[7]))


    date_region_price = map(get_date_region_and_rrp, halfhourly_data)
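To try the filtering and conversion steps without downloading anything, here is a minimal sketch that builds a tiny zip archive in memory and runs the same two helper functions over it. It is written for Python 3, so it uses io.BytesIO and io.TextIOWrapper where the listing above uses StringIO. The file name and the sample rows are made up for illustration; they only mimic the column positions the helpers rely on (row[4], row[6] and row[7]) and are not real market data.

```python
import csv
import datetime
import io
from zipfile import ZipFile

# Made-up sample rows: only the column positions used below are meaningful.
SAMPLE_CSV = (
    "C,SAMPLE,HEADER,ROW\n"
    "D,TREGION,,1,2012/07/04 00:30:00,1,NSW1,45.5\n"
    "D,TREGION,,1,2012/07/04 00:30:00,1,QLD1,43.2\n"
    "D,OTHER,,2,2012/07/04 00:05:00,1,NSW1,44.0\n"
)

# Build a zip archive in memory to stand in for the downloaded bytes.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("sample_prices.csv", SAMPLE_CSV)
zipped_bytes = buffer.getvalue()

# From here on the steps mirror the listing: open the zip, read the first CSV.
opened_zipfile = ZipFile(io.BytesIO(zipped_bytes))
csv_filename = opened_zipfile.namelist()[0]
with opened_zipfile.open(csv_filename) as raw:
    rows = list(csv.reader(io.TextIOWrapper(raw, encoding="utf-8")))


def is_halfhourly_data(row):
    """Returns True if the given row starts with 'D', 'TREGION', '', '1'"""
    return row[:4] == ["D", "TREGION", "", "1"]


def get_date_region_and_rrp(row):
    """Returns (date, region, price) from a half-hourly data row."""
    return (datetime.datetime.strptime(row[4], "%Y/%m/%d %H:%M:%S").date(),
            row[6],
            float(row[7]))


# Keep only the half-hourly rows, then convert each one.
date_region_price = [get_date_region_and_rrp(row)
                     for row in rows if is_halfhourly_data(row)]
print(date_region_price)
```

Only the two rows that start with 'D', 'TREGION', '', '1' survive the filter; the header row and the 'OTHER' row are dropped.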
The completed example
Here is the complete code to download and plot the electricity prices with
Python. We’ll step through the most important parts and show you two of
Python’s advanced features: defaultdict and yield.
If you aren’t interested in the advanced Python code, you can skip to the end.
The matplotlib code that creates the chart is very short and easy to
follow.
    from __future__ import with_statement
    from collections import defaultdict
    import csv
    import datetime
    from urllib2 import urlopen
    from StringIO import StringIO
    from zipfile import ZipFile

    import matplotlib.pyplot as plt

    PRICE_REPORTS_URL = 'http://www.nemweb.com.au/Reports/CURRENT/Public_Prices'
    ZIP_URL = '/PUBLIC_PRICES_201207040000_20120705040607.ZIP'
    REGIONS = ("QLD1", "NSW1", "VIC1", "SA1", "TAS1")

    # Name of the local copy of the zip file (the URL path without slashes).
    zip_filename = ZIP_URL.replace('/', '')

    # zippedfile is now one long string. Use the local copy if there is one;
    # otherwise download the file and save it for next time. The file is
    # opened in binary mode because zip data is not text.
    try:
        with open(zip_filename, 'rb') as f:
            zippedfile = f.read()
    except IOError:
        zippedfile = urlopen(PRICE_REPORTS_URL + ZIP_URL).read()
        with open(zip_filename, 'wb') as f:
            f.write(zippedfile)

    # StringIO turns the string into a real file-like object.
    opened_zipfile = ZipFile(StringIO(zippedfile))

    # assuming there is only one CSV in the zipped file.
    csv_filename = opened_zipfile.namelist()[0]

    prices_csv_file = opened_zipfile.open(csv_filename)

    prices_csv_reader = csv.reader(prices_csv_file)


    def is_halfhourly_data(row):
        """Returns True if the given row starts with 'D', 'TREGION', '', '1'"""
        return row[:4] == ["D", "TREGION", "", "1"]


    halfhourly_data = filter(is_halfhourly_data, prices_csv_reader)


    def get_date_region_and_rrp(row):
        """
        Returns the SETTLEMENTDATE, REGION and RRP from the given
        PUBLIC_PRICES CSV data row.

        SETTLEMENTDATE is converted to a Python datetime (the time is kept);
        REGION is left as a string; and
        RRP is converted to a floating point.
        """
        return (datetime.datetime.strptime(row[4], '%Y/%m/%d %H:%M:%S'),
                row[6],
                float(row[7]))


    prices = map(get_date_region_and_rrp, halfhourly_data)


    def get_region_price(date_region_prices, regions):
        """
        Returns the dates and prices in two columns grouped by region,
        suitable for plotting with matplotlib. The order of returned prices
        is the same order as the `regions` argument.

        Args:
            date_region_prices: a list of (date, region, price) tuples
                [(datetime(2012, 8, 9), "QLD1", 45.6),
                 (datetime(2012, 8, 9, 1), "NSW1", 46.0) ... ]
            regions: A list of the regions to return.

        >>> get_region_price([(datetime(2012, 9, 9), "QLD1", 43.2),
                              (datetime(2012, 9, 9), "NSW1", 45.5),
                              (datetime(2012, 9, 10), "NSW1", 44.2), ...],
                             ("NSW1", "QLD1"))
        [(datetime(2012, 9, 9), datetime(2012, 9, 10)) (45.5, 44.2)],
        [(datetime(2012, 9, 9),) (43.2,)]
        ...
        """
        region_prices = defaultdict(list)
        for date, region, price in date_region_prices:
            region_prices[region + 'd'].append(date)
            region_prices[region + 'p'].append(price)

        for region in regions:
            yield region_prices[region + 'd'], region_prices[region + 'p']


    figure = plt.figure()

    for dates, prices in get_region_price(prices, REGIONS):
        plt.plot(dates, prices, '-')

    plt.legend(REGIONS)
    plt.grid()
    plt.xlabel("Time of day")
    plt.ylabel("Electricity Price A$/MWh")
    figure.autofmt_xdate()
    plt.show()
Grouping the regions together
I want to plot the prices of each of the five Australian regions as a separate
series, but the data isn’t organised into separate x axis and y axis values.
Instead there is one long Python list that covers all regions.
Here is a way to use defaultdict to collect the dates and prices per region,
for example for the 'NSW1' and 'VIC1' regions. The defaultdict (see the official
docs) is just like a normal dictionary, except that it has one additional powerful feature:
`defaultdict` will auto-initialise a new value if you attempt to access a `key` that is missing.
Confused? Here is a concrete example: grouping power station names by
generator category (Nuclear, Wind, ...) using a normal Python dict:
with a normal dict
    gens = {}
    gens['NUCLEAR'] = ['NUCLEAR-1', 'HILLVIEW-2', 'NUCLEAR-2']

    # we can add a new nuclear generator name.
    gens['NUCLEAR'].append('HILLVIEW-1')

    # we can't add a new wind farm name - yet.
    gens['WINDFARM'].append('WINDY-HILL-1')
    # KeyError 'WINDFARM'!

    # first must make a new empty wind farm list
    gens['WINDFARM'] = []
    # now this works.
    gens['WINDFARM'].append('WINDY-HILL-1')
A KeyError exception will be raised on line 8, the first
gens['WINDFARM'].append(...) call, because 'WINDFARM' is a key that doesn’t
exist in the gens dictionary yet. It isn’t until line 12, where the empty list
is assigned, that the 'WINDFARM' key is entered into the dictionary and the
first wind farm can be appended.
Here is the same code using defaultdict to initialise an empty list when
there is a missing key. Notice that there is no need to create a key with
an empty list before appending.
with defaultdict
    from collections import defaultdict

    gens = defaultdict(list)

    # empty list created for 'NUCLEAR' and straight away we extend it.
    gens['NUCLEAR'].extend(['NUCLEAR-1', 'HILLVIEW-2', 'NUCLEAR-2'])
    gens['NUCLEAR'].append('HILLVIEW-1')

    # empty list created for 'WINDFARM' and straight away we append.
    gens['WINDFARM'].append('WINDY-HILL-1')
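One consequence of this auto-initialisation is worth seeing once: any lookup with square brackets creates the missing key, even if nothing is appended afterwards. The short sketch below shows this, and also that the in operator and .get() leave the keys alone. The 'SOLAR' and 'HYDRO' category names are made up for illustration.

```python
from collections import defaultdict

gens = defaultdict(list)
gens['WINDFARM'].append('WINDY-HILL-1')

# Testing membership does not create anything.
solar_before = 'SOLAR' in gens

# A plain lookup is enough to create the key with an empty list.
gens['SOLAR']
solar_after = 'SOLAR' in gens

# .get() behaves as it does on a normal dict and does not add the key.
hydro = gens.get('HYDRO')

print((solar_before, solar_after, hydro, sorted(gens)))
```

So when you only want to check whether a category exists, use in or .get() rather than indexing with square brackets.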
Having seen defaultdict, take another look at this code section from
final.py:
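The section in question is the grouping loop inside get_region_price. Here is a minimal sketch of it, lifted out of the function and fed a few sample tuples so that it runs on its own. The dates and prices are made up for illustration and are not real market data.

```python
import datetime
from collections import defaultdict

# Made-up sample data in the same (date, region, price) shape.
date_region_prices = [
    (datetime.datetime(2012, 9, 9), "QLD1", 43.2),
    (datetime.datetime(2012, 9, 9), "NSW1", 45.5),
    (datetime.datetime(2012, 9, 10), "NSW1", 44.2),
]

region_prices = defaultdict(list)
for date, region, price in date_region_prices:
    region_prices[region + 'd'].append(date)   # x axis values
    region_prices[region + 'p'].append(price)  # y axis values

print(sorted(region_prices))
print(region_prices['NSW1p'])
```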
It makes two lists for every region. The key to the first list, region + 'd',
looks like NSW1d or VIC1d. The key to the second list, region + 'p', looks
like NSW1p or VIC1p.
The d stands for date, our x axis, and the p stands for price, our y axis.
It’s time to plot those x and y values.
Making your own iterator
Use the yield keyword in a function to turn it into a generator: calling the
function returns something that can be used in a for loop (an iterable).
I used the yield keyword in the get_region_price function to return the
date and price (x and y axis) pairs that were grouped using defaultdict.
They are returned one region at a time in the for loop.
yield will take some getting used to if you’ve never seen it before. Try
working with this script on your computer so you can see what is happening:
yield demo
    names_age = [("NUCLEAR1", 10), ("NUCLEAR2", 11), ("WINDYPEAK", 2)]


    def generator_names(names_age):
        print '[generator_names] started the generator_names generator'
        for name, age in names_age:
            print '[generator_names] about to yield', name
            yield name


    for name in generator_names(names_age):
        print 'I got name: ', name
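If running the demo still leaves the order of events unclear, it can help to drive the generator by hand with the built-in next(). The sketch below is a variation on the demo and not part of final.py: instead of printing, it records each step in a list so that the order can be inspected. The log argument is an addition made for this illustration.

```python
names_age = [("NUCLEAR1", 10), ("NUCLEAR2", 11), ("WINDYPEAK", 2)]


def generator_names(names_age, log):
    log.append('started')
    for name, age in names_age:
        log.append('about to yield ' + name)
        yield name


log = []
gen = generator_names(names_age, log)

# Calling the function has not run any of its body yet.
log_after_call = list(log)

# The body runs up to the first yield, then pauses there.
first = next(gen)
log_after_first = list(log)

# A for loop keeps asking for the next value until the generator is exhausted.
remaining = [name for name in gen]

print(log_after_call)
print(first)
print(log_after_first)
print(remaining)
```

The log stays empty until the first next() call, which shows that the function body only runs when a value is actually requested.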
Plotting the dates and prices is very easy once you have them in two lists, x
axis and y axis.
The plot commands are similar to Matlab plotting routines:
displaying a chart
    figure = plt.figure()

    for dates, prices in get_region_price(prices, REGIONS):
        plt.plot(dates, prices, '-')

    plt.legend(REGIONS)
    plt.grid()
    plt.xlabel("Time of day")
    plt.ylabel("Electricity Price A$/MWh")
    figure.autofmt_xdate()
    plt.show()
Here is the list of steps in the code above.
1. Create an empty figure;
2. Plot each region’s prices;
3. Display a legend;
4. Enable grid lines;
5. Set the x-axis label;
6. Set the y-axis label;
7. Auto-rotate the x axis date labels; and
8. Show the plot.
Conclusion
The final program is quite short, just 100 lines of code, yet it covers
a wide range of tasks:
- Downloading files from the internet;
- Unzipping files;
- Reading CSV files;
- Sorting, transposing and filtering data; and
- Displaying data on a chart.
The post may not be clear in certain areas, or you may want us to write about
something in more detail, so tell us using the form below.