import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
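# Sample 30 points on a circle of radius R, parametrised by the angle fi.
lin = np.linspace(0, 1, 30)
R = 5
df = pd.DataFrame({'x': R * np.sin(lin*2*np.pi),
                   'y': R * np.cos(lin*2*np.pi),
                   'fi': lin*2*np.pi}).round(2)
# Replace exact matches of 0 and 3.14 with NaN to simulate missing data.
df = df.replace(0, np.nan)
df = df.replace(3.14, np.nan)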
df.head()
df.to_csv('dummy.csv', index=False)
df2 = pd.read_csv('dummy.csv')
df2.head()
Missing data is often encoded as -999 or some other ad-hoc sentinel value chosen by whoever produced the dataset, so it will not show up as NaN until you convert it explicitly.
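A minimal sketch of how such a sentinel could be turned into NaN, assuming the value is -999. The in-memory CSV and the names `raw`, `sentinel_df`, `replaced_df` and `n_missing` are hypothetical and used only for illustration; they are not part of the dataset above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content in which -999 stands for "missing".
raw = io.StringIO("x,y\n1.0,-999\n-999,2.5\n3.0,4.0\n")

# Option 1: declare the sentinel while reading the file.
sentinel_df = pd.read_csv(raw, na_values=[-999])

# Option 2: load the data as-is and replace the sentinel afterwards.
raw.seek(0)
replaced_df = pd.read_csv(raw).replace(-999, np.nan)

# Count the missing entries per column after the conversion.
n_missing = sentinel_df.isna().sum()
```

Either route should give the same frame here. Declaring the sentinel in `read_csv` keeps the conversion next to the loading step, while `replace` is handy when the sentinel is only discovered later.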
df.isna()
df.isna().sum()
plt.imshow(df.isna())
plt.figure(figsize=(25, 6))
plt.imshow(df.isna().T)
df.mean()
df = df.fillna(df.mean())
df.describe()
plt.figure(figsize=(6, 6))
plt.scatter(df.x, df.y)
plt.xlabel('this is the X axis label', fontsize=20)
plt.ylabel('this is Y', fontsize=40)
plt.title('this is the title', fontsize=22)
plt.show()
# Draw the histogram on a fresh figure rather than on top of the scatter plot.
plt.hist(df.x, bins=10)
plt.show()
# Recent seaborn versions require x, y and data to be passed as keywords.
sns.jointplot(x='x', y='fi', data=df, kind='kde')
sns.scatterplot(x='y', y='fi', data=df)
R, 2*np.pi
sns.boxplot(data=df)
Fiznum1 - course in physics BSc: http://oroszl.web.elte.hu/fiznum1/
Data Exploration and Visualisation - course in physics MSc: https://github.com/sdam-elte/data-exp-vis-2020
Kaggle datasets / competitions $\to$ notebooks $\to$ most voted
e.g. https://www.kaggle.com/therealcyberlord/coronavirus-covid-19-visualization-prediction