python - Pandas - Creating Difference Matrix from Data Frame


I'm trying to create a matrix that shows the differences between the rows of a pandas DataFrame.

    import pandas as pd

    data = {'country': ['gb', 'jp', 'us'], 'values': [20.2, -10.5, 5.7]}
    df = pd.DataFrame(data)

This gives me:

      country  values
    0      gb    20.2
    1      jp   -10.5
    2      us     5.7

and I would like it to become this (differences going vertically, i.e. each cell is the column's value minus the row's value):

      country     gb     jp     us
    0      gb    0.0  -30.7  -14.5
    1      jp   30.7    0.0   16.2
    2      us   14.5  -16.2    0.0

Is this achievable with a built-in function, or do I need to build a loop to get the desired output? Thanks for any help!

This is a standard use case for NumPy's broadcasting:

    df['values'].values - df['values'].values[:, None]
    Out:
    array([[  0. , -30.7, -14.5],
           [ 30.7,   0. ,  16.2],
           [ 14.5, -16.2,   0. ]])

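We access the underlying NumPy array through the values attribute, and [:, None] introduces a new axis. The resulting column vector of shape (3, 1) is broadcast against the original array of shape (3,), so the result is two-dimensional (3 x 3), with entry [i, j] equal to values[j] - values[i].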
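To make the shapes involved concrete, here is a small sketch that rebuilds the question's DataFrame and prints the shape at each step. Only the variable names v, col and diff are new; the computation is the same expression as in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'country': ['gb', 'jp', 'us'], 'values': [20.2, -10.5, 5.7]})

v = df['values'].values          # 1-D array, shape (3,)
col = v[:, None]                 # same data as a column vector, shape (3, 1)

# Broadcasting stretches (3,) and (3, 1) to a common shape (3, 3),
# so entry [i, j] is v[j] - v[i]: column value minus row value.
diff = v - col

print(v.shape, col.shape, diff.shape)   # (3,) (3, 1) (3, 3)
```

Because swapping i and j only flips the sign, the result is always antisymmetric (diff equals -diff.T) with zeros on the diagonal.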

You can then concat the result with the original country series:

    arr = df['values'].values - df['values'].values[:, None]
    pd.concat((df['country'], pd.DataFrame(arr, columns=df['country'])), axis=1)
    Out:
      country    gb    jp    us
    0      gb   0.0 -30.7 -14.5
    1      jp  30.7   0.0  16.2
    2      us  14.5 -16.2   0.0
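If you would rather have the country codes as the row labels instead of a separate column (a layout choice that is not part of the original answer), a minimal sketch is to pass them as both index and columns to the DataFrame constructor. The name mat is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'country': ['gb', 'jp', 'us'], 'values': [20.2, -10.5, 5.7]})
v = df['values'].to_numpy()

# Use the country codes as both row and column labels, so each cell
# can be looked up by name: mat.loc[row, col] == value[col] - value[row].
labels = df['country'].to_numpy()
mat = pd.DataFrame(v - v[:, None], index=labels, columns=labels)

print(mat)
print(mat.loc['gb', 'jp'])   # the jp value minus the gb value
```

Lookups by label then read naturally, e.g. mat.loc['us', 'gb'] is the gb value minus the us value.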

The array can also be generated with the following, as suggested by @Divakar:

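    import numpy as np

    # Select only the numeric column; df.values would also contain the country strings
    arr = np.subtract.outer(*[df['values'].values] * 2).T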

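Here we call .outer on the subtract ufunc, which applies it to every pair of elements from its two inputs. The outer result has entry [i, j] = a[i] - a[j], so the trailing .T transposes it to match the broadcasting version above.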
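As a quick check of that claim, the sketch below spells out the two inputs explicitly (the *[x] * 2 idiom just unpacks to x, x) and compares the transposed outer result with the broadcasting expression. Variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'country': ['gb', 'jp', 'us'], 'values': [20.2, -10.5, 5.7]})
v = df['values'].to_numpy()

# np.subtract.outer(a, b)[i, j] == a[i] - b[j] for every pair (i, j)
outer = np.subtract.outer(v, v)

# Transposing swaps the roles, giving v[j] - v[i], which matches the
# broadcasting expression v - v[:, None].
arr = outer.T
print(np.array_equal(arr, v - v[:, None]))   # True
```

Both approaches perform the same element-wise subtractions, so the results are identical, not merely close.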

