Perf (II)

ravel() vs flatten()

np.ravel() returns a view whenever possible and only copies when it has to, so it is usually faster than flatten(), which always returns a copy.
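
A minimal sketch of the difference, assuming a small C-contiguous array (the variable names are illustrative only). np.shares_memory reports whether the result still points at the original buffer:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # C-contiguous array

r = a.ravel()      # a view here: no data is copied for a contiguous array
f = a.flatten()    # always a fresh copy

ravel_shares = np.shares_memory(a, r)
flatten_shares = np.shares_memory(a, f)

# A transposed array is not C-contiguous, so ravel() has to copy as well.
t = a.T
ravel_t_shares = np.shares_memory(t, t.ravel())

print(ravel_shares, flatten_shares, ravel_t_shares)
```

Because ravel() may return a view, writing to its result can modify the original array; use flatten() when an independent copy is needed.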

broadcasting rules

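When the shapes are broadcast-compatible, there is no need to replicate an array with np.tile(); NumPy stretches the smaller operand virtually, without copying it.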
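
As an illustration, here is a small sketch (array names and values are made up for the example) that adds a row vector to every row of a matrix, once with np.tile() and once with plain broadcasting:

```python
import numpy as np

x = np.arange(6, dtype=float).reshape(3, 2)   # shape (3, 2)
row = np.array([10.0, 20.0])                  # shape (2,)

# np.tile materializes a full (3, 2) array before the addition.
tiled = x + np.tile(row, (3, 1))

# Broadcasting aligns trailing dimensions: (3, 2) + (2,) -> (3, 2),
# without building the intermediate array.
broadcast = x + row

same = np.array_equal(tiled, broadcast)
print(same)
```

Both give the same result; the broadcast version simply skips the temporary array.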

numexpr

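numexpr can be around 5x faster than the equivalent NumPy expression on large arrays, since it evaluates the whole expression in chunks instead of allocating full-size temporary arrays.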

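import numexpr
import numpy as np
import pandas as pd

a = df['a']  # df: an existing DataFrame with a numeric column 'a'
expr = 'sin(a - 1) + 1'
result1 = np.sin(a - 1) + 1                # plain NumPy
result2 = numexpr.evaluate(expr)           # numexpr directly
result3 = pd.eval(expr, engine='numexpr')  # numexpr through pandas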

pd.eval, df.eval and df.query

Mainly useful for large arrays: they avoid allocating intermediate arrays, which saves memory and, as a result, time.

# about 2x faster than df1 + df2 on large frames, and uses less memory
pd.eval('df1 + df2')

# pd.eval
result1 = (df['A'] + df['B']) / (df['C'] - 1)
result2 = pd.eval("(df.A + df.B) / (df.C - 1)")

# df.eval
result3 = df.eval('(A + B) / (C - 1)')
# add/modify col
df.eval('D = (A + B) / C', inplace=True)

# df.eval with local variable (prefix it with @)
col_mean = df.mean(1)  # axis=1: mean across the columns, one value per row
result1 = df['A'] + col_mean
result2 = df.eval('A + @col_mean')

# df.query
result1 = df[(df.A < 0.5) & (df.B < 0.5)]
result2 = pd.eval('df[(df.A < 0.5) & (df.B < 0.5)]')
result3 = df.query('A < 0.5 & B < 0.5')

# df.query with local variable
avg = df['C'].mean()
result1 = df[(df.A < avg) & (df.B < avg)]
result2 = df.query('A < @avg & B < @avg')
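
The snippets above assume an existing df. A self-contained sketch for checking that the string-based forms give the same results as the plain expressions might look like this (the random data, the seed and the column names A, B, C are assumptions made for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical test data: 1000 rows, three columns of uniform random values.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 3)), columns=["A", "B", "C"])

# Arithmetic: plain pandas vs. df.eval
plain = (df["A"] + df["B"]) / (df["C"] - 1)
evaluated = df.eval("(A + B) / (C - 1)")

# Filtering: boolean mask vs. df.query with a local variable
avg = df["C"].mean()
masked = df[(df.A < avg) & (df.B < avg)]
queried = df.query("A < @avg & B < @avg")

eval_matches = np.allclose(plain, evaluated)
query_matches = masked.equals(queried)
print(eval_matches, query_matches)
```

On a frame this small the string forms are unlikely to be faster; the gain shows up mainly on large frames, and pandas falls back to its Python engine when numexpr is not installed.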