python - Pandas - Creating Difference Matrix from Data Frame
I'm trying to create a matrix that shows the differences between the rows in a pandas data frame.
    import pandas as pd

    data = {'country': ['gb', 'jp', 'us'], 'values': [20.2, -10.5, 5.7]}
    df = pd.DataFrame(data)
I want this:

      country  values
    0      gb    20.2
    1      jp   -10.5
    2      us     5.7
to become (differences going vertically):
      country    gb    jp    us
    0      gb   0.0 -30.7 -14.5
    1      jp  30.7   0.0  16.2
    2      us  14.5 -16.2   0.0
Is this achievable with a built-in function, or would I need to build a loop to get the desired output? Thanks for your help!
This is a standard use case for NumPy's broadcasting:
    df['values'].values - df['values'].values[:, None]
    Out:
    array([[  0. , -30.7, -14.5],
           [ 30.7,   0. ,  16.2],
           [ 14.5, -16.2,   0. ]])
We access the underlying NumPy array with the values attribute, and [:, None] introduces a new axis, so the result is two-dimensional.
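To make the broadcasting step easier to see, here is a minimal sketch that uses a plain NumPy array with the same numbers as the 'values' column. The names a, col and diff are only for illustration and are not part of the answer.

```python
import numpy as np

# Same numbers as the 'values' column in the question.
a = np.array([20.2, -10.5, 5.7])

col = a[:, None]   # shape (3, 1): the values laid out as a column
diff = a - col     # (3,) broadcast against (3, 1) gives (3, 3)

# diff[i, j] holds a[j] - a[i], so the matrix is antisymmetric
# with zeros on the diagonal.
print(a.shape, col.shape, diff.shape)   # prints (3,) (3, 1) (3, 3)
print(np.round(diff, 1))
```

Each column j of the result is a[j] minus every value in turn, which is why the differences read vertically.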
You can concat this with the original series:
    arr = df['values'].values - df['values'].values[:, None]
    pd.concat((df['country'], pd.DataFrame(arr, columns=df['country'])), axis=1)
    Out:
      country    gb    jp    us
    0      gb   0.0 -30.7 -14.5
    1      jp  30.7   0.0  16.2
    2      us  14.5 -16.2   0.0
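If you would rather look differences up by country code, one possible variation (not what the answer does, just a sketch) is to use the countries as both the index and the columns instead of concatenating. The names labels and labeled are made up for this example.

```python
import pandas as pd

data = {'country': ['gb', 'jp', 'us'], 'values': [20.2, -10.5, 5.7]}
df = pd.DataFrame(data)

# Same difference matrix as above: arr[i, j] = values[j] - values[i].
arr = df['values'].values - df['values'].values[:, None]

# Label both axes with the country codes (illustrative alternative to concat).
labels = df['country'].tolist()
labeled = pd.DataFrame(arr, index=labels, columns=labels)

# Row 'jp', column 'gb' is gb minus jp, i.e. 20.2 - (-10.5).
print(labeled.loc['jp', 'gb'])
```

This keeps the frame purely numeric, which can be handy if you want to do further arithmetic on it.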
The array can also be generated with the following, thanks to @divakar:
    import numpy as np
    arr = np.subtract.outer(*[df['values'].values] * 2).T
Here we are calling .outer on the subtract ufunc, which applies the subtraction to all pairs of its inputs.
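As a quick sanity check, the sketch below compares the two approaches on a plain array with the same numbers as the 'values' column. The variable names are illustrative only.

```python
import numpy as np

a = np.array([20.2, -10.5, 5.7])

outer = np.subtract.outer(a, a)   # outer[i, j] = a[i] - a[j]
broadcast = a - a[:, None]        # broadcast[i, j] = a[j] - a[i]

# Transposing the outer result swaps i and j, so the two agree.
same = np.allclose(outer.T, broadcast)
print(same)   # prints True
```

That is why the .T is needed: without it the signs would be flipped relative to the broadcasting version.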