{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\"Continuum



" ] }, { "cell_type": "heading", "level": 1, "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Interactive Financial Analytics\n", "\n", "with Python & IPython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Eurex Tutorial with Examples based on the VSTOXX Volatility Index**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dr. Yves J. Hilpisch\n", "\n", "Continuum Analytics Europe GmbH\n", "\n", "www.continuum.io\n", "\n", "yves@continuum.io\n", "\n", "@dyjh\n", "\n", "PyData London – 21. February 2014" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "You find the presentation and the IPython Notebook here:\n", " \n", "* http://www.hilpisch.com/YH_PyData_Eurex_Tutorial.html\n", "* http://www.hilpisch.com/YH_PyData_Eurex_Tutorial.ipynb" ] }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "About Me" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A brief bio:\n", "\n", "* Managing Director Europe of Continuum Analytics Inc.\n", "* Founder of Visixion GmbH \u2013 The Python Quants\n", "* Lecturer Mathematical Finance at Saarland University\n", "* Focus on Financial Industry and Financial Analytics\n", "* Book (2013) \"Derivatives Analytics with Python\"\n", "* Book (July 2014) \"Python for Finance\", O'Reilly\n", "* Dr.rer.pol in Mathematical Finance\n", "* Graduate in Business Administration\n", "* Martial Arts Practitioner and Fan\n", "\n", "See www.hilpisch.com." ] }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Python for Analytics" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Corporations, decision makers and analysts nowadays generally face a number of problems with data:\n", "\n", " * **sources**: data typically comes from different sources, like from the Web, from in-house databases or it is generated in-memory, e.g. for simulation purposes \n", " * **formats**: data is generally available in different formats, like SQL databases/tables, Excel files, CSV files, arrays, proprietary formats \n", " * **structure**: data typically comes differently structured, be it unstructured, simply indexed, hierarchically indexed, in table form, in matrix form, in multidimensional arrays \n", " * **completeness**: real-world data is generally incomplete, i.e. there is missing data (e.g. along an index) or multiple series of data cannot be aligned correctly (e.g. two time series with different time indexes)\n", " * **conventions**: for some types of data there a many \u201ccompeting\u201d conventions with regard to formatting, like for dates and time\n", " * **interpretation**: some data sets contain information that can be easily and intelligently interpreted, like a time index, others not, like texts\n", " * **performance**: reading, streamlining, aligning, analyzing \u2013 i.e. 
processing \u2013 (big) data sets might be slow " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In addition to these data-oriented problems, there are typically organizational issues that have to be considered:\n", "\n", " * **departments**: most companies are organized in departments with different technologies, databases, etc., leading to \u201cdata silos\u201d\n", " * **analytics skills**: analytical and business skills are generally possessed by people working in line functions (e.g. production) or administrative functions (e.g. finance)\n", " * **technical skills**: technical skills, like retrieving data from databases and visualizing them, are generally possessed by people in technology functions (e.g. development, systems operations)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "At Continuum Analytics, the vision for Python-based data analytics is the following:\n", "\n", "\u201cTo revolutionize data analytics and visualization by moving high-level Python code and domain expertise closer to data. This vision rests on four pillars:\n", "\n", "* **simplicity**: advanced, powerful analytics, accessible to domain experts and business users via a simplified programming paradigm\n", "* **interactivity**: interactive analysis and visualization of massive data sets \n", "* **collaboration**: collaborative, shareable analysis (data, code, results, graphics)\n", "* **scalability**: out-of-core, distributed data processing\u201d" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This tutorial **focuses** on\n", "\n", " * Python as a general purpose financial analytics environment\n", " * interactive analytics examples\n", " * prototyping-like Python usage\n", " \n", "It does **not** address important issues such as\n", "\n", " * architectural issues regarding hardware and software\n", " * development processes, testing, documentation and production\n", " * real-world problem modeling" ] }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Some Python Fundamentals" ] }, { "cell_type": "heading", "level": 3, "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Fundamental Python Libraries" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A fundamental Python stack for interactive data analytics and visualization should at least contain the following libraries and tools:\n", "\n", " * **Python** – the Python interpreter itself\n", " * **NumPy** – high performance, flexible array structures and operations\n", " * **SciPy** – collection of scientific modules and functions (e.g. for regression, optimization, integration)\n", " * **pandas** – time series and panel data analysis and I/O \n", " * **PyTables** – hierarchical, high performance database (e.g. for out-of-memory analytics)\n", " * **matplotlib** – 2D and 3D visualization\n", " * **IPython** – interactive data analytics, visualization, publishing\n", " \n", "It is best to use either the Python distribution **Anaconda** or the Web-based analytics environment **Wakari**. Both provide almost \"complete\" Python environments."
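] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As a quick sanity check \u2013 a minimal sketch, nothing more \u2013 you can import these libraries and inspect their version strings (the exact numbers will of course differ between installations)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy, scipy, pandas, matplotlib, IPython\n", "# collect the version string of each library in a dictionary\n", "{lib.__name__: lib.__version__ for lib in (numpy, scipy, pandas, matplotlib, IPython)}" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If one of the imports fails, the respective library is missing from the environment; with Anaconda it can be added via conda."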
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For example, pandas can, among others, help with the following data-related problems:\n", "\n", " * **sources**: pandas reads data directly from different data sources such as SQL databases, Excel files or JSON-based APIs\n", " * **formats**: pandas can process input data in different formats like CSV files or Excel files; it can also generate output in different formats like CSV, XLS, HTML or JSON\n", " * **structure**: pandas' strength lies in structured data formats, like time series and panel data\n", " * **completeness**: pandas automatically deals with missing data in most circumstances, e.g. computing sums even in the presence of \u201cnot a number\u201d, i.e. missing, values\n", " * **conventions/interpretation**: for example, pandas can interpret and convert different date-time formats to Python datetime objects and/or timestamps\n", " * **performance**: most pandas classes, methods and functions are implemented in a performance-oriented fashion, making heavy use of the Python-to-C compiler Cython" ] }, { "cell_type": "heading", "level": 3, "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "First Interactive Steps with Python" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As a simple example, let's generate a NumPy array with five sets of 1000 (pseudo-)random numbers each." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np # this imports the NumPy library" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "data = np.random.standard_normal((5, 1000)) # generate 5 sets of 1000 random numbers each\n", "data[:, :5].round(3) # show the first five values of each set, rounded to 3 digits" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's plot a **histogram** of the 1st, 2nd and 3rd data sets." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import matplotlib as mpl # this imports matplotlib\n", "import matplotlib.pyplot as plt # this imports matplotlib.pyplot\n", "%matplotlib inline\n", "  # enable inline plotting" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "plt.hist([data[0], data[1], data[2]], label=['Set 0', 'Set 1', 'Set 2'])\n", "plt.grid(True) # grid for better readability\n", "plt.legend()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We then want to plot the 'running' **cumulative sum** of each set." ] }, { "cell_type": "code", "collapsed": false, "input": [ "plt.figure() # initialize figure object\n", "plt.grid(True)\n", "for i, data_set in enumerate(data): # iterate over all rows\n", "    plt.plot(data_set.cumsum(), label='Set %s' % i)\n", "    # plot the running cumulative sum of each row\n", "plt.legend(loc=0) # write legend with labels" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Some **fundamental statistics** from our data sets."
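] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A brief aside: the (pseudo-)random numbers above differ from run to run. If you need reproducible results, fix the seed of NumPy's random number generator first \u2013 a minimal sketch; the seed value 1000 is arbitrary." ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.random.seed(1000) # fix the seed; any integer works\n", "np.random.standard_normal((2, 5)).round(3) # identical numbers on every run" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Back to our five data sets: first the mean and the standard deviation per set, then the correlation matrix."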
] }, { "cell_type": "code", "collapsed": false, "input": [ "data.mean(axis=1) # average value of the 5 sets" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "data.std(axis=1) # standard deviation of the 5 sets" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "np.corrcoef(data).round(3) # correlation matrix of the 5 data sets" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "First Financial Analytics Example" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We need to make a couple of imports for what is to come." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import pandas.io.data as pdd\n", "from urllib import urlretrieve" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The convenience function _**DataReader**_ makes it easy to read historical stock price data from Yahoo! Finance (http://finance.yahoo.com)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "index = pdd.DataReader('^GDAXI', data_source='yahoo', start='2007/3/30')\n", "  # '^GDAXI' is the ticker symbol of the DAX index; the EURO STOXX 50 would be '^SX5E'" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "index.info()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "pandas' strength is the handling of **indexed/labeled/structured data**, like time series data." ] }, { "cell_type": "code", "collapsed": false, "input": [ "index.tail()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "pandas makes it easy to implement **vectorized operations**, like calculating log returns over whole time series." ] }, { "cell_type": "code", "collapsed": false, "input": [ "index['Returns'] = np.log(index['Close'] / index['Close'].shift(1))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In addition, pandas makes **plotting** quite simple and compact." ] }, { "cell_type": "code", "collapsed": false, "input": [ "index[['Close', 'Returns']].plot(subplots=True, style='b', figsize=(8, 5))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We now want to check **how annual volatility changes over time**." ] }, { "cell_type": "code", "collapsed": false, "input": [ "index['Mov_Vol'] = pd.rolling_std(index['Returns'], window=252) * np.sqrt(252)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Obviously, the annual volatility **changes significantly** over time."
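] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The factor sqrt(252) in the cell above annualizes the daily return volatility under the usual square-root-of-time rule, assuming 252 trading days per year. As a cross-check \u2013 a sketch only \u2013 the same scaling applied to the full-sample standard deviation of the returns:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "vol_daily = index['Returns'].std() # daily volatility over the full sample\n", "vol_daily * np.sqrt(252) # annualized, assuming 252 trading days per year" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following plot shows the index level, the log returns and the moving annual volatility together."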
] }, { "cell_type": "code", "collapsed": false, "input": [ "index[['Close', 'Returns', 'Mov_Vol']].plot(subplots=True, style='b', figsize=(8, 5))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Exercise" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Trend-based investment strategy with the EURO STOXX 50 index:\n", "\n", "* two trends: 42d & 252d\n", "* long, short and cash positions\n", "* no transaction costs\n", "\n", "Signal generation:\n", "\n", "* invest (go long) when the 42d trend is more than 100 points above the 252d trend\n", "* sell (go short) when the 42d trend is more than 20 points below the 252d trend\n", "* hold cash (no interest) when neither condition is true" ] }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Historical Correlation between EURO STOXX 50 and VSTOXX" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "It is a stylized fact that stock indexes and related volatility indexes are **highly negatively correlated**. The following example analyzes this stylized fact based on the EURO STOXX 50 stock index and the VSTOXX volatility index using ordinary least-squares (OLS) regression." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "First, we collect historical data for both the EURO STOXX 50 stock index and the VSTOXX volatility index." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import datetime as dt\n", "from urllib import urlretrieve" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "es_url = 'http://www.stoxx.com/download/historical_values/hbrbcpe.txt'\n", "vs_url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'\n", "urlretrieve(es_url, 'es.txt')\n", "urlretrieve(vs_url, 'vs.txt')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The **EURO STOXX 50 data** is not yet in the right format. Some house cleaning is necessary (I)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "lines = open('es.txt').readlines() # reads the whole file line-by-line" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "lines[:5] # the header is not well formatted" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The **EURO STOXX 50 data** is not yet in the right format. Some house cleaning is necessary (II)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "lines[3883:3890] # from 27.12.2001 on, an additional semicolon appears" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The **EURO STOXX 50 data** is not yet in the right format. Some house cleaning is necessary (III)."
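] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "An alternative to the helper-column approach shown next would be to strip the trailing semicolons directly \u2013 a sketch only, under the assumption (suggested by the inspection above) that the extra semicolon sits at the end of the affected lines:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# sketch: strip a possible trailing ';' from every data line\n", "cleaned = [line.rstrip().rstrip(';') + '\\n' for line in lines[4:]]\n", "cleaned[:2] # inspect the first two cleaned lines" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The tutorial instead writes a corrected header plus a dummy column name 'DEL' to a new file, as follows."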
] }, { "cell_type": "code", "collapsed": false, "input": [ "lines = open('es.txt').readlines() # reads the whole file line-by-line\n", "new_file = open('es50.txt', 'w') # opens a new file\n", "new_file.writelines('date' + lines[3][:-1].replace(' ', '') + ';DEL' + lines[3][-1])\n", "  # writes the corrected header line (line index 3 of the original file,\n", "  # extended by an additional column name) as the first line of the new file\n", "new_file.writelines(lines[4:-1]) # writes the remaining lines of the original file\n", "new_file.close() # closes the file so that everything is flushed to disk" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The **EURO STOXX 50 data** is not yet in the right format. Some house cleaning is necessary (IV)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "list(open('es50.txt'))[:5] # opens the new file for inspection" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Now, the data can be safely read into a DataFrame object." ] }, { "cell_type": "code", "collapsed": false, "input": [ "es = pd.read_csv('es50.txt', index_col=0, parse_dates=True, sep=';', dayfirst=True)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "del es['DEL'] # delete the helper column" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "es.info()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The VSTOXX data can be read without touching the raw data." ] }, { "cell_type": "code", "collapsed": false, "input": [ "vs = pd.read_csv('vs.txt', index_col=0, header=2, parse_dates=True, sep=',', dayfirst=True)\n", "\n", "# you can alternatively read from the Web source directly\n", "# without saving the csv file to disk:\n", "# vs = pd.read_csv(vs_url, index_col=0, header=2,\n", "#                  parse_dates=True, sep=',', dayfirst=True)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We now **merge the data** for further analysis." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import datetime as dt\n", "data = pd.DataFrame({'EUROSTOXX' :\n", "    es['SX5E'][es.index > dt.datetime(1999, 12, 31)]})\n", "data = data.join(pd.DataFrame({'VSTOXX' :\n", "    vs['V2TX'][vs.index > dt.datetime(1999, 12, 31)]}))\n", "data.info()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's inspect the two time series." ] }, { "cell_type": "code", "collapsed": false, "input": [ "data.head()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A **picture** can tell almost the complete story." ] }, { "cell_type": "code", "collapsed": false, "input": [ "data.plot(subplots=True, grid=True, style='b', figsize=(10, 5))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We now generate **log returns** for both time series."
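] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As a side note, the log return log(S(t) / S(t-1)) equals the first difference of the log prices. This equivalence provides a quick cross-check of the vectorized computation that follows \u2013 a sketch:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# log(S_t / S_t-1) == log(S_t) - log(S_t-1); both yield the same log returns\n", "np.log(data).diff().head()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell computes the log returns used throughout the remaining analysis."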
] }, { "cell_type": "code", "collapsed": false, "input": [ "rets = np.log(data / data.shift(1))\n", "rets.head()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To this new data set, also stored in a DataFrame object, we apply **OLS**." ] }, { "cell_type": "code", "collapsed": false, "input": [ "xdat = rets['EUROSTOXX']\n", "ydat = rets['VSTOXX']\n", "model = pd.ols(y=ydat, x=xdat)\n", "model" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Again, we want to see how our results look graphically." ] }, { "cell_type": "code", "collapsed": false, "input": [ "plt.plot(xdat, ydat, 'r.')\n", "ax = plt.axis() # grab axis values\n", "x = np.linspace(ax[0], ax[1] + 0.01)\n", "plt.plot(x, model.beta[1] + model.beta[0] * x, 'b', lw=2)\n", "plt.grid(True)\n", "plt.axis('tight')" ], "language": "python", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let us see if we can identify **systematic patterns over time**. And indeed, during the 2007/2008 crisis (yellow dots) volatility was more pronounced than in more recent times (red dots)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "mpl_dates = mpl.dates.date2num(rets.index)\n", "plt.figure(figsize=(8, 4))\n", "plt.scatter(rets['EUROSTOXX'], rets['VSTOXX'], c=mpl_dates, marker='o')\n", "plt.grid(True)\n", "plt.xlabel('EUROSTOXX')\n", "plt.ylabel('VSTOXX')\n", "plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),\n", "    format=mpl.dates.DateFormatter('%d %b %y'))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Exercise" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We want to test whether the EURO STOXX 50 and/or the VSTOXX returns are **normally distributed** or not (e.g. if they might have fat tails). We want to do a\n", "\n", "* graphical illustration (using _**qqplot**_ of _**statsmodels.api**_) and a\n", "* statistical test (using _**normaltest**_ of _**scipy.stats**_)\n", "\n", "Add on: plot a histogram of the log return frequencies and compare that to a normal distribution with the same mean and variance (using e.g. _**norm.pdf**_ from _**scipy.stats**_)" ] }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Constant Proportion VSTOXX Investment" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A number of studies have illustrated that **constant proportion investments in volatility derivatives** – given a diversified equity portfolio – might improve investment performance considerably. 
See, for instance, the study\n", "\n", "The Benefits of Volatility Derivatives in Equity Portfolio Management\n", "\n", "We now want to replicate (in a simplified fashion) what you can flexibly test on the basis of two backtesting applications for **VSTOXX-based investment strategies**:\n", "\n", "Two Assets Backtesting\n", "\n", "Four Assets Backtesting\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The strategy we are going to implement and test is characterized as follows:\n", "\n", "* An investor has total wealth of, say, 100,000 EUR\n", "* He invests, say, 70% of that into a diversified equity portfolio\n", "* The remainder, i.e. 30%, is invested in the VSTOXX index directly\n", "* Through (daily) trading the investor keeps the proportions constant\n", "* No transaction costs apply; all assets are infinitely divisible" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We already have the necessary data available. However, we want to drop NaN values and **normalize the index values**." ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = data.dropna()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "data = data / data.ix[0] * 100" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "data.head()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "First, the **initial investment**." ] }, { "cell_type": "code", "collapsed": false, "input": [ "invest = 100\n", "cratio = 0.3\n", "data['Equity'] = (1 - cratio) * invest / data['EUROSTOXX'][0]\n", "data['Volatility'] = cratio * invest / data['VSTOXX'][0]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This can already be considered a **static** investment strategy." ] }, { "cell_type": "code", "collapsed": false, "input": [ "data['Static'] = (data['Equity'] * data['EUROSTOXX']\n", "    + data['Volatility'] * data['VSTOXX'])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "data[['EUROSTOXX', 'Static']].plot(figsize=(10, 5))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Second, the dynamic strategy with **daily adjustments** to keep the value ratio constant." ] }, { "cell_type": "code", "collapsed": false, "input": [ "for i in range(1, len(data)):\n", "    evalue = data['Equity'][i - 1] * data['EUROSTOXX'][i]\n", "    # value of the equity position\n", "    vvalue = data['Volatility'][i - 1] * data['VSTOXX'][i]\n", "    # value of the volatility position\n", "    tvalue = evalue + vvalue\n", "    # total wealth\n", "    data['Equity'][i] = (1 - cratio) * tvalue / data['EUROSTOXX'][i]\n", "    # re-allocation of total wealth to the equity ...\n", "    data['Volatility'][i] = cratio * tvalue / data['VSTOXX'][i]\n", "    # ... and the volatility position" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Third, the **total wealth position**."
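] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "An aside: the explicit rebalancing loop above is easy to follow but slow. Because the proportions are reset every day, the same wealth path can also be computed in vectorized form \u2013 a sketch; the column name 'Dynamic_vec' is illustrative and not used elsewhere:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# daily portfolio return under constant proportions: weighted sum of\n", "# the equity and volatility simple returns\n", "port_rets = ((1 - cratio) * data['EUROSTOXX'].pct_change()\n", "             + cratio * data['VSTOXX'].pct_change())\n", "data['Dynamic_vec'] = invest * (1 + port_rets.fillna(0)).cumprod()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The loop-based **total wealth position** follows."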
] }, { "cell_type": "code", "collapsed": false, "input": [ "data['Dynamic'] = (data['Equity'] * data['EUROSTOXX']\n", "    + data['Volatility'] * data['VSTOXX'])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "data.head()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A brief check whether the **ratios are indeed constant**." ] }, { "cell_type": "code", "collapsed": false, "input": [ "(data['Volatility'] * data['VSTOXX'] / data['Dynamic'])[:5]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "(data['Equity'] * data['EUROSTOXX'] / data['Dynamic'])[:5]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let us inspect the performance of the strategy." ] }, { "cell_type": "code", "collapsed": false, "input": [ "data[['EUROSTOXX', 'Dynamic']].plot(figsize=(10, 5))" ], "language": "python", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Exercise" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Write a Python function which allows for an **arbitrary but constant ratio** to be invested in the VSTOXX index and which returns net performance values (in percent) for the constant proportion VSTOXX strategy.\n", "\n", "Add on: find the ratio to be invested in the VSTOXX that gives the maximum performance." ] }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Analyzing High Frequency Data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Using standard Python functionality and pandas, the code that follows reads **intraday, high-frequency data** from a Web source, plots it and resamples it." ] }, { "cell_type": "code", "collapsed": false, "input": [ "url = 'http://hopey.netfonds.no/posdump.php?'\n", "url += 'date=%s%s%s&paper=AAPL.O&csv_format=csv' % ('2014', '02', '19')\n", "# you may have to adjust the date since only recent dates are available\n", "urlretrieve(url, 'aapl.csv')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "AAPL = pd.read_csv('aapl.csv', index_col=0, header=0, parse_dates=True)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "AAPL.info()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The **intraday evolution** of the Apple stock price." ] }, { "cell_type": "code", "collapsed": false, "input": [ "AAPL['bid'].plot()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "AAPL = AAPL[AAPL.index > dt.datetime(2014, 2, 19, 10, 0, 0)]\n", "  # only data from after 10am on that day" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A **resampling of the data** is easily accomplished with pandas."
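] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Before resampling, one more vectorized custom calculation \u2013 a sketch under the assumption that the position dump also contains an 'offer' column next to 'bid', as the Netfonds files usually do \u2013 namely the bid-offer spread:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "AAPL['spread'] = AAPL['offer'] - AAPL['bid'] # assumes an 'offer' column exists\n", "AAPL['spread'].describe() # summary statistics of the spread" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now to the resampling itself."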
] }, { "cell_type": "code", "collapsed": false, "input": [ "# this resamples the record frequency to 5 minutes, using mean as the aggregation rule\n", "AAPL_5min = AAPL.resample(rule='5min', how='mean').fillna(method='ffill')\n", "AAPL_5min.head()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's have a graphical look at the new data set." ] }, { "cell_type": "code", "collapsed": false, "input": [ "AAPL_5min['bid'].plot()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "With pandas you can easily apply **custom functions** to time series data." ] }, { "cell_type": "code", "collapsed": false, "input": [ "AAPL_5min['bid'].apply(lambda x: 2 * 540 - x).plot()\n", "  # this mirrors the stock price development at the 540 level" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Why Python for Financial Analytics & Visualization?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "10 years ago, Python was considered *exotic* in the analytics space – at best. Languages/packages like R and Matlab dominated the scene. Today, Python has become a major force in financial analytics & visualization due to a number of characteristics:\n", "\n", " * **multi-purpose**: prototyping, development, production, systems administration – Python is one for all\n", " * **libraries**: there is a library for almost any task or problem you face\n", " * **efficiency**: Python speeds up all IT development tasks for analytics applications and reduces maintenance costs\n", " * **performance**: Python has evolved from a scripting language to a 'meta' language with bridges to all high performance environments (e.g. LLVM, multi-core CPUs, GPUs, clusters)\n", " * **interoperability**: Python seamlessly integrates with almost any other language and technology\n", " * **interactivity**: Python allows domain experts to get closer to their business and financial data pools and to do real-time analytics\n", " * **collaboration**: solutions like Wakari with IPython Notebook allow the easy sharing of code, data, results, graphics, etc.\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "One of the easiest ways to deploy Python today across a whole organization with a heterogeneous IT infrastructure is via **Wakari**, Continuum's Web-/Browser- and Python-based data analytics environment. It is available both as a cloud solution and as an enterprise solution for in-house deployment.\n", "\n", "Wakari logo\n", "\n", "enterprise.wakari.io" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Continuum Analytics logo
" ] }, { "cell_type": "heading", "level": 2, "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Continuum Analytics Europe GmbH – Python Data Exploration & Visualization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Continuum Analytics Inc.** – the company Web site\n", "\n", "www.continuum.io\n", "\n", "**Dr. Yves J. Hilpisch** – my personal Web site\n", "\n", "www.hilpisch.com\n", "\n", "**Derivatives Analytics with Python** – my new book\n", "\n", "Read an Excerpt and Order the Book\n", " \n", "**Contact Us**\n", "\n", "yves@continuum.io\n", "\n", "@dyjh\n" ] } ], "metadata": {} } ] }