The Python Quants



Python for Next Generation Data Analytics

Dr. Yves J. Hilpisch

The Python Quants GmbH

www.pythonquants.com

yves@pythonquants.com

@dyjh

BI Forum, Budapest – 06. November 2013

Python for Analytics

Corporations, decision makers and analysts nowadays generally face a number of problems with data:

  • sources: data typically comes from different sources, like from the Web, from in-house databases or it is generated in-memory, e.g. for simulation purposes
  • formats: data is generally available in different formats, like SQL databases/tables, Excel files, CSV files, arrays, proprietary formats
  • structure: data typically comes differently structured, be it unstructured, simply indexed, hierarchically indexed, in table form, in matrix form, in multidimensional arrays
  • completeness: real-world data is generally incomplete, i.e. there is missing data (e.g. along an index) or multiple series of data cannot be aligned correctly (e.g. two time series with different time indexes)
  • conventions: for some types of data there are many “competing” conventions with regard to formatting, like for dates and times
  • interpretation: some data sets contain information that can be easily and intelligently interpreted, like a time index, while others, like texts, cannot
  • performance: reading, streamlining, aligning, analyzing – i.e. processing – (big) data sets might be slow

In addition to these data-oriented problems, there typically are organizational issues that have to be considered:

  • departments: the majority of companies are organized in departments with different technologies, databases, etc., leading to “data silos”
  • analytics skills: analytical and business skills in general are possessed by people working in line functions (e.g. production) or administrative functions (e.g. finance)
  • technical skills: technical skills, like retrieving data from databases and visualizing them, are generally possessed by people in technology functions (e.g. development, systems operations)

This presentation focuses on

  • Python as a general purpose analytics environment
  • interactive analytics examples
  • prototyping-like Python usage

It does not address such important issues like

  • architectural issues regarding hardware and software
  • development processes, testing, documentation and production
  • real world problem modeling

Some Python Fundamentals

Fundamental Python Libraries

A fundamental Python stack for interactive data analytics and visualization should at least contain the following libraries and tools:

  • Python – the Python interpreter itself
  • NumPy – high performance, flexible array structures and operations
  • SciPy – collection of scientific modules and functions (e.g. for regression, optimization, integration)
  • pandas – time series and panel data analysis and I/O
  • PyTables – hierarchical, high performance database (e.g. for out-of-memory analytics)
  • matplotlib – 2d and 3d visualization
  • IPython – interactive data analytics, visualization, publishing

It is best to use either the Python distribution Anaconda or the Web-based analytics environment Wakari. Both provide almost "complete" Python environments.
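
A quick way to check whether these libraries are available in a given environment is to import them and print their version strings (a minimal sketch, assuming the stack above is installed):

import numpy, scipy, pandas, matplotlib, tables, IPython
# print the name and version of each core library of the stack
for lib in (numpy, scipy, pandas, matplotlib, tables, IPython):
    print lib.__name__, lib.__version__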

For example, pandas can, among others, help with the following data-related problems:

  • sources: pandas reads data directly from different data sources such as SQL databases, Excel files or JSON based APIs
  • formats: pandas can process input data in different formats like CSV files or Excel files; it can also generate output in different formats like CSV, XLS, HTML or JSON
  • structure: pandas' strength lies in structured data formats, like time series and panel data
  • completeness: pandas automatically deals with missing data in most circumstances, e.g. computing sums even if there are “not a number” (NaN), i.e. missing, values (see the short sketch after this list)
  • conventions/interpretation: for example, pandas can interpret and convert different date-time formats to Python datetime objects and/or timestamps
  • performance: the majority of pandas classes, methods and functions are implemented in a performance-oriented fashion making heavy use of the Python/C compiler Cython
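
As a minimal sketch, the missing-data and the date-parsing points from this list can be illustrated as follows:

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.5, np.nan, 4.0])
print s.sum()  # 7.5 -- NaN (missing) values are skipped by default

# different date formats are interpreted and converted to timestamps
print pd.to_datetime('2013-11-06')
print pd.to_datetime('6 Nov 2013')  # different input format, same timestamp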

First Interactive Steps with Python

As a simple example let's generate a NumPy array with five sets of 1000 (pseudo-)random numbers each.

In [1]:
import numpy as np  # this imports the NumPy library
In [2]:
data = np.random.standard_normal((5, 1000))  # generate 5 sets with 1000 rn each
data[:, :5].round(3)  # print first five values of each set rounded to 3 digits
Out[2]:
array([[ 0.321, -1.314, -0.593, -0.102,  1.658],
       [-1.482,  0.179, -0.682, -1.432, -0.41 ],
       [-0.138,  0.023, -0.363,  1.308,  0.297],
       [ 0.562, -0.749, -1.585, -0.208, -0.089],
       [ 1.425, -0.555, -0.573,  0.169, -0.078]])

Let's plot a histogram of the 1st, 2nd and 3rd data set.

In [3]:
import matplotlib.pyplot as plt  # this imports matplotlib.pyplot
In [4]:
plt.hist([data[0], data[1], data[2]], label=['Set 0', 'Set 1', 'Set 2'])
plt.grid(True)  # grid for better readability
plt.legend()
Out[4]:
<matplotlib.legend.Legend at 0x105901590>

We then want to plot the 'running' cumulative sum of each set.

In [5]:
plt.figure()  # initialize figure object
plt.grid(True) 
for data_set in enumerate(data):  # iterate over all rows
    plt.plot(data_set[1].cumsum(), label='Set %s' % data_set[0])
        # plot the running cumulative sums for each row
plt.legend(loc=0)  # write legend with labels
Out[5]:
<matplotlib.legend.Legend at 0x105d13a50>

Some fundamental statistics from our data sets.

In [6]:
data.mean(axis=1)  # average value of the 5 sets
Out[6]:
array([ 0.02138191, -0.01091389,  0.00409084,  0.006534  ,  0.01840274])
In [7]:
data.std(axis=1)  # standard deviation of the 5 sets
Out[7]:
array([ 1.0369874 ,  1.03725012,  0.99011055,  1.00291577,  0.97987308])
In [8]:
np.corrcoef(data)  # correlation matrix of the 5 data sets
Out[8]:
array([[ 1.        , -0.01430132, -0.04065085,  0.03645453,  0.02389929],
       [-0.01430132,  1.        , -0.0287767 , -0.02230724,  0.03217344],
       [-0.04065085, -0.0287767 ,  1.        ,  0.00691503, -0.0021722 ],
       [ 0.03645453, -0.02230724,  0.00691503,  1.        ,  0.00393679],
       [ 0.02389929,  0.03217344, -0.0021722 ,  0.00393679,  1.        ]])

General I/O Capabilities of Python

SQL Databases

Currently, and in the foreseeable future as well, most corporate data is stored in SQL databases. We take this as a starting point and generate a SQL table with dummy data.

In [9]:
filename = 'numbers'
import sqlite3 as sq3  # imports the SQLite library
# Remove files if they exist
import os
try:
    os.remove(filename + '.db')
    os.remove(filename + '.h5')
    os.remove(filename + '.csv')
    os.remove(filename + '.xls')
except:
    pass

We create a table and populate it with the dummy data.

In [10]:
query = 'CREATE TABLE numbs (no1 real, no2 real, no3 real, no4 real, no5 real)'
con = sq3.connect(filename + '.db')
con.execute(query)
con.commit()
In [11]:
# Writing Random Numbers to SQLite3 Database
rows = 500000  # 500,000 rows for the SQL table (ca. 25 MB in size)
nr = np.random.standard_normal((rows, 5))
%time con.executemany('INSERT INTO numbs VALUES(?, ?, ?, ?, ?)', nr)
con.commit()
CPU times: user 8.83 s, sys: 177 ms, total: 9.01 s
Wall time: 9.03 s

We can now read the data into a NumPy array.

In [12]:
# Reading a couple of rows from SQLite3 database table
np.array(con.execute('SELECT * FROM numbs').fetchmany(3)).round(3)
Out[12]:
array([[ 1.678, -1.522,  0.455,  0.484, -1.186],
       [ 0.217, -2.242, -0.893, -0.357, -2.32 ],
       [ 1.241, -0.432,  0.731,  0.372,  1.517]])
In [13]:
# Reading the whole table into a NumPy array
%time result = np.array(con.execute('SELECT * FROM numbs').fetchall())
CPU times: user 4.71 s, sys: 134 ms, total: 4.84 s
Wall time: 4.87 s

In [14]:
nr = 0.0; result = 0.0  # memory clean-up

Reading and Writing Data with pandas

pandas is optimized both for convenience and performance when it comes to analytics and I/O operations.

In [15]:
import pandas as pd
import pandas.io.sql as pds
In [16]:
%time data = pds.read_frame('SELECT * FROM numbs', con)
con.close()
CPU times: user 2.11 s, sys: 159 ms, total: 2.27 s
Wall time: 2.28 s

In [17]:
data.head()
Out[17]:
        no1       no2       no3       no4       no5
0  1.677696 -1.521913  0.455371  0.483874 -1.185816
1  0.216890 -2.242000 -0.893104 -0.357223 -2.319643
2  1.240746 -0.432435  0.730807  0.371782  1.516961
3  0.575273 -1.219279 -0.413639  0.433625  0.910346
4 -1.913407 -1.233322 -0.063490 -1.231014 -0.571479

The plotting of larger data sets is often a problem. Here, this is easily circumvented by plotting only every 500th data row, for instance.

In [18]:
%time data[::500].cumsum().plot(grid=True, legend=True)  # every 500th row is plotted
CPU times: user 69.9 ms, sys: 1.61 ms, total: 71.6 ms
Wall time: 71.2 ms

Out[18]:
<matplotlib.axes.AxesSubplot at 0x107f1b050>

There are some reasons why you may want to write the DataFrame to disk:

  • Database Performance: you do not want to query the SQL database anew every time
  • I/O Speed: you want to speed up data/information retrieval
  • Data Structure: maybe you want to store additional data with the original data set or you have generated a DataFrame out of multiple sources

pandas is tightly integrated with PyTables, a library that implements the HDF5 standard for Python. It makes data storage and I/O operations not only convenient but quite fast as well.

In [19]:
h5 = pd.HDFStore(filename + '.h5', 'w')
%time h5['data'] = data  # writes the DataFrame to HDF5 database
CPU times: user 31.4 ms, sys: 35.4 ms, total: 66.8 ms
Wall time: 84.9 ms

Now, we can read the data from the HDF5 file into another DataFrame object. Performance of such an operation generally is quite high.

In [20]:
%time data_new = h5['data']
CPU times: user 3.4 ms, sys: 12.3 ms, total: 15.7 ms
Wall time: 15.4 ms

In [21]:
%time data.sum() - data_new.sum()   # checking if the DataFrames are the same
CPU times: user 74.4 ms, sys: 15.3 ms, total: 89.7 ms
Wall time: 83 ms

Out[21]:
no1    0
no2    0
no3    0
no4    0
no5    0
dtype: float64
In [22]:
data_new = 0.0  # memory clean-up

The data is easily exported to different standard data file formats like comma separated value files (CSV).

In [23]:
%time data.to_csv(filename + '.csv')  # writes the whole data set as CSV file
CPU times: user 4.89 s, sys: 129 ms, total: 5.02 s
Wall time: 5.08 s

In [24]:
csv_file = open(filename + '.csv')
for i in range(5):
    print csv_file.readline(),
,no1,no2,no3,no4,no5
0,1.677695668253341,-1.5219134106087264,0.4553707994114911,0.4838739437594936,-1.185815732601574
1,0.2168901241866966,-2.2420002426484427,-0.8931035434748411,-0.3572227389588546,-2.3196427644529645
2,1.2407462786194512,-0.43243481195488837,0.7308065597573864,0.37178197861313794,1.516960849144284
3,0.5752733225616632,-1.2192792718946073,-0.413638685603497,0.43362460986398205,0.9103460017903237

pandas also interacts easily with Microsoft Excel spreadsheet files (XLS/XLSX).

In [25]:
%time data.ix[:65000].to_excel(filename + '.xls')  # first 65,000 rows written to an Excel file
CPU times: user 9.35 s, sys: 197 ms, total: 9.55 s
Wall time: 9.59 s

With pandas, you can read data from an Excel file and, e.g., plot it in a single step.

In [26]:
%time pd.read_excel(filename + '.xls', 'sheet1').hist(grid=True)
CPU times: user 2.38 s, sys: 66.2 ms, total: 2.44 s
Wall time: 2.44 s

Out[26]:
array([[<matplotlib.axes.AxesSubplot object at 0x1084433d0>,
        <matplotlib.axes.AxesSubplot object at 0x10841b090>,
        <matplotlib.axes.AxesSubplot object at 0x10843f950>],
       [<matplotlib.axes.AxesSubplot object at 0x106f21dd0>,
        <matplotlib.axes.AxesSubplot object at 0x106f42190>,
        <matplotlib.axes.AxesSubplot object at 0x1070fd950>]], dtype=object)

One can also generate HTML code, e.g. for automatic Web-based report generation.

In [27]:
html_code = data.ix[:2].to_html()
html_code
Out[27]:
u'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>no1</th>\n      <th>no2</th>\n      <th>no3</th>\n      <th>no4</th>\n      <th>no5</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td> 1.677696</td>\n      <td>-1.521913</td>\n      <td> 0.455371</td>\n      <td> 0.483874</td>\n      <td>-1.185816</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td> 0.216890</td>\n      <td>-2.242000</td>\n      <td>-0.893104</td>\n      <td>-0.357223</td>\n      <td>-2.319643</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td> 1.240746</td>\n      <td>-0.432435</td>\n      <td> 0.730807</td>\n      <td> 0.371782</td>\n      <td> 1.516961</td>\n    </tr>\n  </tbody>\n</table>'
In [28]:
from IPython.core.display import HTML
HTML(html_code)
Out[28]:
        no1       no2       no3       no4       no5
0  1.677696 -1.521913  0.455371  0.483874 -1.185816
1  0.216890 -2.242000 -0.893104 -0.357223 -2.319643
2  1.240746 -0.432435  0.730807  0.371782  1.516961

Some final checks.

In [29]:
ll numb*
-rw-rw-rw-  1 yhilpisch  staff  52469010  5 Nov 16:11 numbers.csv
-rw-r--r--  1 yhilpisch  staff  27224064  5 Nov 16:11 numbers.db
-rw-rw-rw-  1 yhilpisch  staff  24007368  5 Nov 16:11 numbers.h5
-rw-rw-rw-  1 yhilpisch  staff   8130560  5 Nov 16:11 numbers.xls

In [30]:
data = 0.0  # memory clean-up
try:
    os.remove(filename + '.db')
    os.remove(filename + '.h5')
    os.remove(filename + '.csv')
    os.remove(filename + '.xls')
except:
    pass

Parallel Execution of Algorithms

Parallelization of Algorithms

First, let us define the function we want to evaluate on a large array.

In [31]:
from math import *
def f(x):
    return abs(cos(x)) ** 0.5 + sin(2 + 3 * sqrt(abs(x)))
In [32]:
x = pi
f(x)
Out[32]:
1.8594415167660776

Second, as a benchmark, a pure Python implementation.

In [33]:
I = 10000000
x = range(I)
def f1(x):
    return [f(x) for x in x]
In [34]:
%time r = f1(x)
CPU times: user 7.82 s, sys: 208 ms, total: 8.03 s
Wall time: 8.13 s

In [35]:
print r[:5]
[1.9092974268256817, -0.2238716875184228, 0.6045609273704432, 1.7863049827563042, 1.7978405413358955]

A better implementation uses NumPy vectorization, which avoids the slow Python-level loop.

In [36]:
import numpy as np
x = np.arange(I)
def f2(x):
    return (np.abs(np.cos(x)) ** 0.5 +
            np.sin(2 + 3 * np.sqrt(np.abs(x))))
In [37]:
%time r = f2(x)
CPU times: user 937 ms, sys: 177 ms, total: 1.11 s
Wall time: 1.2 s

In [38]:
print r[:5]
[ 1.90929743 -0.22387169  0.60456093  1.78630498  1.79784054]

An even better and parallel version uses the numexpr library.

In [39]:
import numexpr as ne
ex = 'abs(cos(x)) ** 0.5 + sin(2 + 3 * sqrt(abs(x)))'
def f3(x):
    ne.set_num_threads(4)
    return ne.evaluate(ex)
In [40]:
%time r = f3(x)
CPU times: user 1.13 s, sys: 18.6 ms, total: 1.14 s
Wall time: 384 ms

In [41]:
print r[:5]
[ 1.90929743 -0.22387169  0.60456093  1.78630498  1.79784054]

Let us compare the performance of the different functions in a bit more detail.

In [42]:
from perf_comp import perf_comp
In [43]:
func_list = [f1, f2, f3]
In [44]:
perf_comp(func_list, 'x', rep=3)
function: f3, av. time sec:   0.35817, relative:    1.0
function: f2, av. time sec:   0.85174, relative:    2.4
function: f1, av. time sec:   9.68758, relative:   27.0
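
The perf_comp helper is imported from a small custom module and is not a standard library function. A minimal sketch of what such a comparison helper could look like (assuming it is defined in the same namespace as the input data and that the data argument names the input object):

from timeit import default_timer as timer

def perf_comp(func_list, data, rep=3):
    ''' Minimal sketch of a performance comparison helper (hypothetical). '''
    res = {}
    for func in func_list:
        arg = globals()[data]  # look up the input object by its name
        times = []
        for _ in range(rep):
            t0 = timer()
            func(arg)
            times.append(timer() - t0)
        res[func.__name__] = sum(times) / rep  # average execution time
    best = min(res.values())
    for name in sorted(res, key=res.get):  # fastest function first
        print 'function: %s, av. time sec: %9.5f, relative: %6.1f' % \
            (name, res[name], res[name] / best)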

The parallel version with numexpr is orders of magnitude faster than pure Python.

Using IPython for Distributed Computing

The algorithm we (arbitrarily) choose is the Monte Carlo estimator for a European option value in the Black-Scholes-Merton set-up. In this set-up the underlying follows the stochastic differential equation:

\[dS_t = r S_t dt + \sigma S_t dZ_t\]

The Monte Carlo estimator for a call option value looks as follows:

\(C_0 = e^{-rT} \frac{1}{I} \sum_{i=1}^{I} \max[S_T(i) - K, 0]\)

where \(S_T(i)\) is the \(i\)-th simulated value of the underlying at maturity \(T\).
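
The simulation below discretizes the SDE over equidistant time intervals of length \(\Delta t = T / M\) using the exact scheme for geometric Brownian motion

\[S_{t + \Delta t} = S_{t} \exp \left[ \left( r - \frac{1}{2} \sigma^2 \right) \Delta t + \sigma \sqrt{\Delta t} \, z_t \right]\]

where \(z_t\) is a standard normally distributed random number.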

A dynamic implementation of the BSM Monte Carlo valuation could look as follows.

In [45]:
def BSM_MCS(S0):
    ''' Dynamic Black-Scholes-Merton MCS Estimator for European Calls. '''
    import numpy as np
    K = 100.; T = 1.0; r = 0.05; vola = 0.2
    M = 50; I = 20000
    dt = T / M
    rand = np.random.standard_normal((M + 1, I))
    S = np.zeros((M + 1, I)); S[0] = S0
    for t in range(1, M + 1):
        S[t] = S[t-1] * np.exp((r - 0.5 * vola ** 2) * dt + vola * np.sqrt(dt) * rand[t])
    BSM_MCS_Value = np.exp(-r * T) * np.sum(np.maximum(S[-1] - K, 0)) / I  # discounted MCS estimator
    return BSM_MCS_Value
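
As a quick plausibility check, a single valuation at \(S_0 = 100\) should come out close to the known analytical Black-Scholes-Merton value of about 10.45 for these parameters (up to the Monte Carlo error):

round(BSM_MCS(100.), 2)  # Monte Carlo estimate, close to ca. 10.45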

The Sequential Calculation

The benchmark case of sequential valuation of \(n\) options.

In [46]:
def seqValue(n):
    optionValues = []
    for S in np.linspace(80, 100, n):
        optionValues.append(BSM_MCS(S))
    return optionValues

The Parallel Calculation

First, start a (local) cluster.

In [47]:
# in the shell ...
####
# ipcluster start -n 4
####
# or maybe more cores

Second, enable parallel computing capabilities.

In [48]:
from IPython.parallel import Client
c = Client(profile="default")
view = c.load_balanced_view()

The function for the parallel execution of a number \(n\) of different option valuations.

In [49]:
def parValue(n):
    optionValues = []
    for S in np.linspace(80, 100, n):
        value = view.apply_async(BSM_MCS, S)
        # asynchronously calculate the option values
        optionValues.append(value)
    c.wait(optionValues)
    return optionValues

Let us again compare performance of the two different approaches.

In [50]:
n = 100  # number of option valuations
perf_comp((seqValue, parValue), data='n', rep=3)
function: parValue, av. time sec:   3.26991, relative:    1.0
function: seqValue, av. time sec:   6.11594, relative:    1.9

Performance scales with the number of CPU cores available; however, distribution overhead keeps the observed speed-up below the theoretical maximum.
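
The number of engines actually available can be checked via the client object created above:

print len(c.ids)  # number of engines in the local cluster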

Analyzing Financial Time Series Data

Historical Correlation between EURO STOXX 50 and VSTOXX

It is a stylized fact that stock indexes and related volatility indexes are highly negatively correlated. The following example analyzes this stylized fact based on the EURO STOXX 50 stock index and the VSTOXX volatility index using ordinary least-squares regression (OLS).

First, we collect historical data for both the EURO STOXX 50 stock index and the VSTOXX volatility index.

In [51]:
import pandas as pd
from urllib import urlretrieve
In [52]:
try:
    open('es.txt')
    open('vs.txt')
except:
    es_url = 'http://www.stoxx.com/download/historical_values/hbrbcpe.txt'
    vs_url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
    urlretrieve(es_url, 'es.txt')
    urlretrieve(vs_url, 'vs.txt')

The EURO STOXX 50 data is not yet in the right format. Some house cleaning is necessary.

In [53]:
lines = open('es.txt').readlines()  # reads the whole file line-by-line
new_file = open('es50.txt', 'w')  # opens a new file
new_file.writelines('date' + lines[3][:-1].replace(' ', '') + ';DEL' + lines[3][-1])
    # writes the corrected third line of the original file as the first line of the new file
new_file.writelines(lines[4:-1])  # writes the remaining lines of the original file
print list(open('es50.txt'))[:5]  # prints the first five lines of the new file for inspection
['date;SX5P;SX5E;SXXP;SXXE;SXXF;SXXA;DK5F;DKXF;DEL\n', '31.12.1986;775.00 ;  900.82 ;   82.76 ;   98.58 ;   98.06 ;   69.06 ;  645.26  ;  65.56\n', '01.01.1987;775.00 ;  900.82 ;   82.76 ;   98.58 ;   98.06 ;   69.06 ;  645.26  ;  65.56\n', '02.01.1987;770.89 ;  891.78 ;   82.57 ;   97.80 ;   97.43 ;   69.37 ;  647.62  ;  65.81\n', '05.01.1987;771.89 ;  898.33 ;   82.82 ;   98.60 ;   98.19 ;   69.16 ;  649.94  ;  65.82\n']

Now, the data can be safely read into a DataFrame object.

In [54]:
es = pd.read_csv('es50.txt', index_col=0, parse_dates=True, sep=';', dayfirst=True)
In [55]:
del es['DEL']  # delete the helper column
es
Out[55]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6907 entries, 1986-12-31 00:00:00 to 2013-10-10 00:00:00
Data columns (total 8 columns):
SX5P    6907  non-null values
SX5E    6907  non-null values
SXXP    6907  non-null values
SXXE    6907  non-null values
SXXF    6906  non-null values
SXXA    6906  non-null values
DK5F    6906  non-null values
DKXF    6906  non-null values
dtypes: float64(7), object(1)

The VSTOXX data can be read without touching the raw data.

In [56]:
vs = pd.read_csv('vs.txt', index_col=0, header=2,
                 parse_dates=True, sep=',', dayfirst=True)

# you can alternatively read from the Web source directly
# without saving the csv file to disk:
# vs = pd.read_csv(vs_url, index_col=0, header=2,
#                  parse_dates=True, sep=',', dayfirst=True)

We now merge the data for further analysis.

In [57]:
from datetime import datetime
data = pd.DataFrame({'EUROSTOXX' :
                     es['SX5E'][es.index > datetime(1999, 12, 31)]})
data = data.join(pd.DataFrame({'VSTOXX' :
                     vs['V2TX'][vs.index > datetime(1999, 12, 31)]}))
data
Out[57]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3532 entries, 2000-01-03 00:00:00 to 2013-10-10 00:00:00
Data columns (total 2 columns):
EUROSTOXX    3532  non-null values
VSTOXX       3512  non-null values
dtypes: float64(2)

Let's inspect the two time series.

In [58]:
data.head()
Out[58]:
            EUROSTOXX   VSTOXX
date
2000-01-03    4849.22  30.9845
2000-01-04    4657.83  33.2225
2000-01-05    4541.75  32.5944
2000-01-06    4500.69  31.1811
2000-01-07    4648.27  27.4407

A picture can tell the complete story.

In [59]:
data.plot(subplots=True, grid=True, style='b')
Out[59]:
array([<matplotlib.axes.AxesSubplot object at 0x1071d4d50>,
       <matplotlib.axes.AxesSubplot object at 0x105f46e90>], dtype=object)

We now generate log returns for both time series.
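
For an index level \(S_t\), the daily log return is defined as \(r_t = \log(S_t / S_{t-1})\); this is exactly what the shift-and-divide operation below computes.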

In [60]:
rets = np.log(data / data.shift(1)) 
rets.head()
Out[60]:
            EUROSTOXX    VSTOXX
date
2000-01-03        NaN       NaN
2000-01-04  -0.040268  0.069740
2000-01-05  -0.025237 -0.019087
2000-01-06  -0.009082 -0.044328
2000-01-07   0.032264 -0.127785

To this new data set, also stored in a DataFrame object, we apply ordinary least-squares regression (OLS).
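
The regression model is \(y_t = \alpha + \beta x_t + \varepsilon_t\), with the VSTOXX log returns as dependent variable \(y\) and the EURO STOXX 50 log returns as explanatory variable \(x\).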

In [61]:
xdat = rets['EUROSTOXX']
ydat = rets['VSTOXX']
model = pd.ols(y=ydat, x=xdat)
model
Out[61]:

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         3491
Number of Degrees of Freedom:   2

R-squared:         0.5573
Adj R-squared:     0.5571

Rmse:              0.0378

F-stat (1, 3489):  4391.4224, p-value:     0.0000

Degrees of Freedom: model 1, resid 3489

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -2.7062     0.0408     -66.27     0.0000    -2.7863    -2.6262
     intercept    -0.0007     0.0006      -1.05     0.2952    -0.0019     0.0006
---------------------------------End of Summary---------------------------------

Again, we want to see how our results look graphically.

In [62]:
plt.plot(xdat, ydat, 'r.')
ax = plt.axis()  # grab axis values
x = np.linspace(ax[0], ax[1] + 0.01)
plt.plot(x, model.beta[1] + model.beta[0] * x, 'b', lw=2)
plt.grid(True)
plt.axis('tight')
Out[62]:
(-0.10000000000000001, 0.16, -0.43366353822915737, 0.43687964474802654)

Analyzing High Frequency Data

Using pandas, the code that follows reads intraday, high-frequency data from a Web source, plots it and resamples it.

In [63]:
try:
    AAPL = pd.read_csv('aapl.csv', index_col=0, header=0, parse_dates=True)
except:
    date = datetime.now()
    url1 = 'http://hopey.netfonds.no/posdump.php?'
    url2 = 'date=%s%s%s&paper=AAPL.O&csv_format=csv' % ('2013', '10', '31')
    # you may have to adjust the date since only recent dates are available
    urlretrieve(url1 + url2, 'aapl.csv')
    AAPL = pd.read_csv('aapl.csv', index_col=0, header=0, parse_dates=True)
In [64]:
AAPL
Out[64]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 12944 entries, 2013-10-31 09:00:01 to 2013-10-31 21:21:26
Data columns (total 6 columns):
bid                  12944  non-null values
bid_depth            12944  non-null values
bid_depth_total      12944  non-null values
offer                12944  non-null values
offer_depth          12944  non-null values
offer_depth_total    12944  non-null values
dtypes: float64(2), int64(4)

The intraday evolution of the Apple stock price.

In [65]:
AAPL['bid'].plot()
Out[65]:
<matplotlib.axes.AxesSubplot at 0x106e58090>

A resampling of the data is easily accomplished with pandas.

In [66]:
import datetime as dt
# this resamples the record frequency to 5 minutes, using mean as aggregation rule
AAPL_resam = AAPL.resample(rule='5min', how='mean')
AAPL_resam.head()
Out[66]:
                            bid   bid_depth  bid_depth_total    offer  offer_depth  offer_depth_total
time
2013-10-31 09:00:00  519.223333  133.333333       133.333333  527.000          100                100
2013-10-31 09:05:00         NaN         NaN              NaN      NaN          NaN                NaN
2013-10-31 09:10:00         NaN         NaN              NaN      NaN          NaN                NaN
2013-10-31 09:15:00         NaN         NaN              NaN      NaN          NaN                NaN
2013-10-31 09:20:00  523.290000  500.000000       500.000000  526.995          300                300

Let's have a graphical look at the new data set.

In [67]:
AAPL_resam['bid'].plot()
Out[67]:
<matplotlib.axes.AxesSubplot at 0x10b1eb5d0>

Obviously, there has not been trading activity in every interval.

pandas has a number of approaches to fill in missing data.

In [68]:
AAPL_resam['bid'].fillna(method='ffill').plot()  # this forward fills missing data
Out[68]:
<matplotlib.axes.AxesSubplot at 0x10b19fad0>
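
Forward filling is only one option; backward filling or interpolation, for example, are available as well:

AAPL_resam['bid'].fillna(method='bfill')  # backward fill missing data
AAPL_resam['bid'].interpolate()  # linear interpolation between observed values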

Compute-Intensive Applications

The simulation of the movement of \(N\) bodies in relation to each other, where \(N\) is a large number and a body can be a star, a planet or a small particle, is an important problem in physics.

The basic simulation algorithm involves a number of iterations ("loops") since one has to determine the force that every single body exerts on all the other bodies in the system to be simulated (e.g. a galaxy or a system of atomic particles).

The algorithm lends itself pretty well to parallelization. However, the example code that follows does not use any parallelization techniques. Rather, it illustrates what speed-ups are achievable by

  • using vectorization approaches with NumPy and
  • using the just-in-time compiling capabilities of Numba

A description and discussion of the algorithm can be found in the recent article Vladimirov, Andrey and Vadim Karpusenko (2013): "Test-Driving Intel Xeon Phi Coprocessors with a Basic N-Body Simulation." White Paper, Colfax International.

Pure Python

We first use pure Python to implement the algorithm. We start by generating random vectors of particle locations and a zero vector for the speed of each particle.

In [69]:
import random
In [70]:
nParticles = 500
particle = []
for i in range(nParticles):
    particle.append([random.gauss(0.0, 1.0) for j in range(3)])
In [71]:
particlev = [[0, 0, 0] for x in particle]

Next, we define the simulation function for the movements of each particle given their gravitational interaction.

In [72]:
def nbody_py(particle, particlev):  # lists as input
    nSteps = 15; dt = 0.01
    for step in range(1, nSteps + 1, 1):
        for i in range(nParticles):
            Fx = 0.0; Fy = 0.0; Fz = 0.0
            for j in range(nParticles):
                if j != i:
                    dx = particle[j][0] - particle[i][0]
                    dy = particle[j][1] - particle[i][1]
                    dz = particle[j][2] - particle[i][2]
                    drSquared = dx * dx + dy * dy + dz * dz
                    drPowerN32 = 1.0 / (drSquared + sqrt(drSquared))
                    Fx += dx * drPowerN32
                    Fy += dy * drPowerN32
                    Fz += dz * drPowerN32
                particlev[i][0] += dt * Fx
                particlev[i][1] += dt * Fy
                particlev[i][2] += dt * Fz
        for i in range(nParticles):
            particle[i][0] += particlev[i][0] * dt
            particle[i][1] += particlev[i][1] * dt
            particle[i][2] += particlev[i][2] * dt

We then execute the function to measure execution speed.

In [73]:
%time nbody_py(particle, particlev)
CPU times: user 6.52 s, sys: 19.3 ms, total: 6.54 s
Wall time: 6.55 s

NumPy Solution

The basic algorithm allows for a number of vectorizations by using NumPy.

In [74]:
import numpy as np
In [75]:
particle = np.random.standard_normal((nParticles, 3))
particlev = np.zeros_like(particle)

Using NumPy arrays and matplotlib's 3d plotting capabilities, we can easily visualize the initial positions of the particles in space.

In [76]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
In [77]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(particle[:, 0], particle[:, 1], particle[:, 2],
           c='r', marker='.')
Out[77]:
<mpl_toolkits.mplot3d.art3d.Patch3DCollection at 0x1081c9b50>

The respective NumPy simulation function is considerably more compact.

In [78]:
def nbody_np(particle, particlev):  # NumPy arrays as input
    nSteps = 15; dt = 0.01
    for step in range(1, nSteps + 1, 1):
        Fp = np.zeros((nParticles, 3))
        for i in range(nParticles):
            dp = particle - particle[i]
            drSquared = np.sum(dp ** 2, axis=1)
            h = drSquared + np.sqrt(drSquared)
            drPowerN32 = 1. / np.maximum(h, 1E-10)
            Fp += -(dp.T * drPowerN32).T
            particlev += dt * Fp
        particle += particlev * dt

It is also considerably faster.

In [79]:
%time nbody_np(particle, particlev)
CPU times: user 493 ms, sys: 1.47 ms, total: 495 ms
Wall time: 494 ms

Let's check how the particles moved. The following is a snapshot of the final positions of all particles.

In [80]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(particle[:, 0], particle[:, 1], particle[:, 2],
           c='r', marker='.')
Out[80]:
<mpl_toolkits.mplot3d.art3d.Patch3DCollection at 0x10820d8d0>

Dynamically Compiled Version

Numba is an open-source, NumPy-aware optimizing compiler for Python sponsored by Continuum Analytics, Inc. It uses the remarkable LLVM compiler infrastructure to compile Python byte-code to machine code, especially for use with the NumPy run-time and SciPy modules.

For this case, we first implement a hybrid of the pure Python (loop-heavy) version and the NumPy array-based version: it keeps the explicit loops but operates on NumPy arrays.

In [81]:
def nbody_hy(particle, particlev):  # NumPy arrays as input
    nSteps = 15; dt = 0.01
    for step in range(1, nSteps + 1, 1):
        for i in range(nParticles):
            Fx = 0.0; Fy = 0.0; Fz = 0.0
            for j in range(nParticles):
                if j != i:
                    dx = particle[j,0] - particle[i,0]
                    dy = particle[j,1] - particle[i,1]
                    dz = particle[j,2] - particle[i,2]
                    drSquared = dx * dx + dy * dy + dz * dz
                    drPowerN32 = 1.0 / (drSquared + sqrt(drSquared))
                    Fx += dx * drPowerN32
                    Fy += dy * drPowerN32
                    Fz += dz * drPowerN32
                particlev[i, 0] += dt * Fx
                particlev[i, 1] += dt * Fy
                particlev[i, 2] += dt * Fz
        for i in range(nParticles):
            particle[i,0] += particlev[i,0] * dt
            particle[i,1] += particlev[i,1] * dt
            particle[i,2] += particlev[i,2] * dt

This function is then compiled with Numba.

In [82]:
import numba as nb
In [83]:
nbody_nb = nb.autojit(nbody_hy)
nbody_nb(particle, particlev)

(Numba emits lengthy DEBUG output at this point: the LLVM intermediate representation generated for the compiled function.)
  %lsr.iv21 = phi i64 [ %lsr.iv.next22, %"loop_body_20:29" ], [ 500, %"for_condition_4:17.loopexit" ]
  %lsr.iv15 = phi i64 [ %lsr.iv.next16, %"loop_body_20:29" ], [ %21, %"for_condition_4:17.loopexit" ]
  %lsr.iv4 = phi i8* [ %89, %"loop_body_20:29" ], [ %2, %"for_condition_4:17.loopexit" ]
  %lsr.iv414 = bitcast i8* %lsr.iv4 to double*
  %lsr.iv45 = bitcast i8* %lsr.iv4 to i1*
  %73 = load double* %lsr.iv414, align 8, !tbaa !2
  %74 = mul i64 %lsr.iv15, -1
  %75 = inttoptr i64 %74 to double*
  %76 = load double* %75, align 8, !tbaa !2
  %77 = fmul double %76, 1.000000e-02
  %78 = fadd double %73, %77
  store double %78, double* %lsr.iv414, align 8, !tbaa !2
  %scevgep12 = getelementptr i8* %lsr.iv4, i64 %8
  %scevgep1213 = bitcast i8* %scevgep12 to double*
  %79 = load double* %scevgep1213, align 8, !tbaa !2
  %80 = mul i64 %lsr.iv15, -1
  %sunkaddr45 = add i64 %17, %80
  %sunkaddr46 = inttoptr i64 %sunkaddr45 to double*
  %81 = load double* %sunkaddr46, align 8, !tbaa !2
  %82 = fmul double %81, 1.000000e-02
  %83 = fadd double %79, %82
  %scevgep10 = getelementptr i8* %lsr.iv4, i64 %8
  %scevgep1011 = bitcast i8* %scevgep10 to double*
  store double %83, double* %scevgep1011, align 8, !tbaa !2
  %sunkaddr47 = ptrtoint i8* %lsr.iv4 to i64
  %sunkaddr48 = mul i64 %8, 2
  %sunkaddr49 = add i64 %sunkaddr47, %sunkaddr48
  %sunkaddr50 = inttoptr i64 %sunkaddr49 to double*
  %84 = load double* %sunkaddr50, align 8, !tbaa !2
  %85 = mul i64 %lsr.iv15, -1
  %sunkaddr51 = mul i64 %17, 2
  %sunkaddr52 = add i64 %85, %sunkaddr51
  %sunkaddr53 = inttoptr i64 %sunkaddr52 to double*
  %86 = load double* %sunkaddr53, align 8, !tbaa !2
  %87 = fmul double %86, 1.000000e-02
  %88 = fadd double %84, %87
  %sunkaddr54 = ptrtoint i8* %lsr.iv4 to i64
  %sunkaddr55 = mul i64 %8, 2
  %sunkaddr56 = add i64 %sunkaddr54, %sunkaddr55
  %sunkaddr57 = inttoptr i64 %sunkaddr56 to double*
  store double %88, double* %sunkaddr57, align 8, !tbaa !2
  %scevgep = getelementptr i1* %lsr.iv45, i64 %6
  %89 = bitcast i1* %scevgep to i8*
  %lsr.iv.next16 = sub i64 %lsr.iv15, %15
  %lsr.iv.next22 = add i64 %lsr.iv21, -1
  %exitcond32 = icmp eq i64 %lsr.iv.next22, 0
  br i1 %exitcond32, label %"for_condition_3:16.loopexit", label %"loop_body_20:29"
}

declare double @"numba.math.['double'].sqrt"(double)

define { i64, i8* }* @__numba_specialized_1_numba_2E_codegen_2E_llvmwrapper_2E___numba_wrapper_nbody_hy(i8* %self, { i64, i8* }* %args) {
entry:
  %0 = alloca { i64, i8* }*
  %1 = alloca { i64, i8* }*
  %return_value = alloca { i64, i8* }*
  %2 = call i32 ({ i64, i8* }*, i8*, ...)* @PyArg_ParseTuple({ i64, i8* }* %args, i8* getelementptr inbounds ([3 x i8]* @__STR_0, i32 0, i32 0), { i64, i8* }** %1, { i64, i8* }** %0)
  %3 = icmp eq i32 %2, 0
  br i1 %3, label %cleanup.if.true, label %cleanup.if.end

cleanup_label:                                    ; preds = %empty1, %error_label
  %4 = load { i64, i8* }** %return_value
  ret { i64, i8* }* %4

error_label:                                      ; preds = %empty2, %cleanup.if.true
  store { i64, i8* }* null, { i64, i8* }** %return_value
  %5 = load { i64, i8* }** %return_value, !tbaa !14
  call void @Py_XINCREF({ i64, i8* }* %5)
  br label %cleanup_label

cleanup.if.true:                                  ; preds = %entry
  br label %error_label

cleanup.if.end:                                   ; preds = %entry
  %6 = load { i64, i8* }** %1
  %7 = load { i64, i8* }** %0
  %8 = bitcast { i64, i8* }* %6 to { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }*
  %9 = bitcast { i64, i8* }* %7 to { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }*
  call void @__numba_specialized_0___main___2E_nbody_hy({ i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %8, { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %9)
  br label %empty

empty:                                            ; preds = %cleanup.if.end
  %10 = call i8* @PyErr_Occurred()
  %11 = ptrtoint i8* %10 to i64
  %12 = icmp ne i64 %11, 0
  br i1 %12, label %empty2, label %empty1

empty1:                                           ; preds = %empty
  store { i64, i8* }* inttoptr (i64 4296527248 to { i64, i8* }*), { i64, i8* }** %return_value
  %13 = load { i64, i8* }** %return_value, !tbaa !14
  call void @Py_XINCREF({ i64, i8* }* %13)
  br label %cleanup_label

empty2:                                           ; preds = %empty
  br label %error_label
}

declare i8* @PyErr_Occurred()

declare { i64, i8* }* @Py_BuildValue(i8*, ...)

declare i32 @PyArg_ParseTuple({ i64, i8* }*, i8*, ...)

declare void @PyErr_Clear()

!tbaa = !{!0, !1, !2, !3, !4, !5, !6, !7, !8, !9, !10, !11, !12, !13, !14, !0, !1, !14}

!0 = metadata !{metadata !"root"}
!1 = metadata !{metadata !"char *", metadata !0}
!2 = metadata !{metadata !"float3264 *", metadata !1}
!3 = metadata !{metadata !"npy_intp **", metadata !1}
!4 = metadata !{metadata !"tbaa_type(numpy shape, npy_intp *) *", metadata !3}
!5 = metadata !{metadata !"tbaa_type(numpy strides, npy_intp *) *", metadata !3}
!6 = metadata !{metadata !"unique0", metadata !1}
!7 = metadata !{metadata !"unique1", metadata !1}
!8 = metadata !{metadata !"unique2", metadata !1}
!9 = metadata !{metadata !"unique3", metadata !1}
!10 = metadata !{metadata !"unique4", metadata !1}
!11 = metadata !{metadata !"unique5", metadata !1}
!12 = metadata !{metadata !"unique6", metadata !1}
!13 = metadata !{metadata !"unique7", metadata !1}
!14 = metadata !{metadata !"object", metadata !1}


This new, compiled version of the hybrid function is faster still.

In [84]:
particle = np.random.standard_normal((nParticles, 3))
particlev = np.zeros_like(particle)
In [85]:
%time nbody_nb(particle, particlev)
CPU times: user 69.4 ms, sys: 113 µs, total: 69.5 ms
Wall time: 69.6 ms

Speed Comparisons

The dynamically compiled version is by far the fastest – the pure Python version is slower by roughly three orders of magnitude.

In [86]:
func_list = [nbody_py, nbody_np, nbody_nb]
In [87]:
perf_comp(func_list, data='particle, particlev', rep=3)
function: nbody_nb, av. time sec:   0.05783, relative:    1.0
function: nbody_np, av. time sec:   0.50086, relative:    8.7
function: nbody_py, av. time sec:  74.38617, relative: 1286.2
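
The helper perf_comp is defined earlier in the notebook. Purely for illustration, a minimal sketch of such a timing helper – built on the standard library's timeit.repeat, with the signature and the output format modeled on the call and output above (an assumption, not the original implementation) – could look like this:

from timeit import repeat

def perf_comp(func_list, data, rep=3, number=1):
    # hypothetical re-implementation of the timing helper (illustrative only)
    # func_list: list of function objects; data: string with the argument list
    res = {}
    for func in func_list:
        name = func.__name__
        stmt = '%s(%s)' % (name, data)
        setup = 'from __main__ import %s, %s' % (name, data)
        # each repetition executes the statement 'number' times;
        # store the average time per repetition
        times = repeat(stmt=stmt, setup=setup, repeat=rep, number=number)
        res[name] = sum(times) / rep
    fastest = min(res.values())
    for name in sorted(res, key=res.get):
        print('function: %s, av. time sec: %9.5f, relative: %6.1f'
              % (name, res[name], res[name] / fastest))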

Python for Next Generation Data Analytics & Visualization?!

Ten years ago, Python was considered exotic in the analytics space – at best. Languages/packages like R and MATLAB dominated the scene. Today, Python has become a major force in data analytics & visualization due to a number of characteristics:

  • multi-purpose: prototyping, development, production, systems administration – Python is one language for all of them
  • libraries: there is a library for almost any task or problem you face
  • efficiency: Python speeds up IT development tasks for analytics applications and reduces maintenance costs
  • performance: Python has evolved from a scripting language into a 'meta' language with bridges to all high-performance environments (e.g. LLVM, multi-core CPUs, GPUs, clusters) – see the sketch after this list
  • interoperability: Python integrates seamlessly with almost any other language and technology
  • interactivity: Python allows domain experts to get closer to their business and financial data pools and to do real-time analytics
  • collaboration: solutions like Wakari with IPython Notebook allow the easy sharing of code, data, results, graphics, etc.
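
To make the 'meta' language point concrete, here is a minimal, hypothetical sketch of the LLVM bridge via Numba (function name and array shape are made up for illustration; it assumes Numba is installed, e.g. via Anaconda):

import numpy as np
from numba import jit

@jit  # dynamic (just-in-time) compilation to machine code via LLVM
def grid_sum(a):
    # explicit Python loops that Numba turns into fast native code
    m, n = a.shape
    total = 0.0
    for i in range(m):
        for j in range(n):
            total += a[i, j]
    return total

a = np.random.standard_normal((2000, 2000))
grid_sum(a)  # the first call triggers compilation; later calls run at near-C speed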

One of the easiest ways today to deploy Python across a whole organization, and to collaborate across a heterogeneous IT infrastructure, is Wakari.io, Continuum's Web-/browser- and Python-based data analytics environment. It is available both as a cloud solution and as an enterprise solution for in-house deployment.

Wakari

The Python Quants



The Python Quants GmbH – Experts for Python in Quant Finance

The Python Quants – the company Web site

www.pythonquants.com

Dr. Yves J. Hilpisch – my personal Web site

www.hilpisch.com

Derivatives Analytics with Python – my new book

Read About and Order the Book

Contact Us

yves@pythonquants.com

@dyjh