Part of an evolving cheat sheet
Read a file
For CSV use
1
| data = pd.read_csv('data.csv')
|
For excel use
1
| df = pd.read_excel (r'./Detected as deleted from API data.xlsx')
|
You might need to install a library for xlsx files
Get a summary of the data
Numerical summaries
Types of columns
Create a new column from combining with another
This adds up values in two columns
1
| data["totalqueuelength"] = data["Svc"] + data["Svc Que"]
|
This converts a couple of columns that have the data and time to a single field and turns it into a date
1
| data["datetime"] = pd.to_datetime(data["Date"] + " " + data["Time"], format='%m/%d/%Y %I:%M %p')
|
Check date formats at https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
Create a new column from a function
1
| df = df.assign(percent=lambda x: 100* x['deleted on date'] / x['total'])
|
Binning
You can bin the data by adding a new bin column to the dataframe
1
| df['binning'] = pd.cut(df['percent'], 5)
|
Average a column
Find row at an index
1
| df.iloc[[int(totalRecords * .95)]] # Find the row at the 95th percentile
|
Filter data rows
1
| data.loc[(data['datetime'] > '2021-10-16')]
|
Sort
1
| df.sort_values(by=['percent'], ascending=False)
|