07_handling_dataframe(bool-apply)
In [1]:
import pandas as pd
from collections import OrderedDict
In [2]:
scientists = pd.read_csv("./data/scientists.csv")


Changing a column's data type and adding new columns

In [3]:
print("Born type: {}".format(scientists["Born"].dtype))
print("Died type: {}".format(scientists["Died"].dtype))
Born type: object
Died type: object
In [4]:
born_datetime = pd.to_datetime(scientists["Born"], format="%Y-%m-%d")
print("born_datetime: \n{}".format(born_datetime))

print("\n===================================\n")
died_datetime = pd.to_datetime(scientists["Died"], format="%Y-%m-%d")
print("died_datetime: \n{}".format(died_datetime))
born_datetime: 
0   1920-07-25
1   1876-06-13
2   1820-05-12
3   1867-11-07
4   1907-05-27
5   1813-03-15
6   1912-06-23
7   1777-04-30
Name: Born, dtype: datetime64[ns]

===================================

died_datetime: 
0   1958-04-16
1   1937-10-16
2   1910-08-13
3   1934-07-04
4   1964-04-14
5   1858-06-16
6   1954-06-07
7   1855-02-23
Name: Died, dtype: datetime64[ns]
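
The conversion above can be tried without the CSV file. The following is a minimal sketch, assuming a small hand-built DataFrame (`sample` and `messy` are hypothetical names, with rows copied from the output above). It also shows the `errors="coerce"` option of `pd.to_datetime`, which the original notebook does not use, for data that may contain unparseable values.

```python
import pandas as pd

# Hypothetical stand-in for scientists.csv: three rows copied from the output above.
sample = pd.DataFrame({
    "Name": ["Rosaline Franklin", "William Gosset", "Florence Nightingale"],
    "Born": ["1920-07-25", "1876-06-13", "1820-05-12"],
    "Died": ["1958-04-16", "1937-10-16", "1910-08-13"],
})

# The columns hold text, not dates; the exact dtype name depends on the pandas version.
print(sample["Born"].dtype)

# Parse the strings with an explicit format, as in the cell above.
born = pd.to_datetime(sample["Born"], format="%Y-%m-%d")
print(born.dtype)

# Assumption for illustration: a column with one bad value.
# errors="coerce" turns values that cannot be parsed into NaT instead of raising.
messy = pd.Series(["1920-07-25", "unknown"])
parsed = pd.to_datetime(messy, format="%Y-%m-%d", errors="coerce")
print(parsed.isna().tolist())
```

Passing an explicit `format` keeps the parsing strict and predictable; `errors="coerce"` is only worth adding when missing dates are acceptable downstream.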
In [5]:
scientists["born_dt"], scientists["died_dt"] = [born_datetime, died_datetime]
scientists.head()
Out[5]:
                   Name        Born        Died  Age    Occupation     born_dt     died_dt
0     Rosaline Franklin  1920-07-25  1958-04-16   37       Chemist  1920-07-25  1958-04-16
1        William Gosset  1876-06-13  1937-10-16   61  Statistician  1876-06-13  1937-10-16
2  Florence Nightingale  1820-05-12  1910-08-13   90         Nurse  1820-05-12  1910-08-13
3           Marie Curie  1867-11-07  1934-07-04   66       Chemist  1867-11-07  1934-07-04
4         Rachel Carson  1907-05-27  1964-04-14   56     Biologist  1907-05-27  1964-04-14
In [6]:
scientists["age_days"] = (scientists["died_dt"] - scientists["born_dt"])
scientists.head()
Out[6]:
                   Name        Born        Died  Age    Occupation     born_dt     died_dt    age_days
0     Rosaline Franklin  1920-07-25  1958-04-16   37       Chemist  1920-07-25  1958-04-16  13779 days
1        William Gosset  1876-06-13  1937-10-16   61  Statistician  1876-06-13  1937-10-16  22404 days
2  Florence Nightingale  1820-05-12  1910-08-13   90         Nurse  1820-05-12  1910-08-13  32964 days
3           Marie Curie  1867-11-07  1934-07-04   66       Chemist  1867-11-07  1934-07-04  24345 days
4         Rachel Carson  1907-05-27  1964-04-14   56     Biologist  1907-05-27  1964-04-14  20777 days
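
Subtracting two datetime columns yields a timedelta column, which is why the values print as "13779 days". A rough sketch of how such a column could be turned into plain numbers is given below, assuming two small hand-built Series (`born` and `died`, with dates copied from the first two rows above). Dividing by 365.25 is an approximation chosen here for illustration, not something the original notebook does.

```python
import pandas as pd

# Dates for the first two scientists, copied from the table above.
born = pd.to_datetime(pd.Series(["1920-07-25", "1876-06-13"]), format="%Y-%m-%d")
died = pd.to_datetime(pd.Series(["1958-04-16", "1937-10-16"]), format="%Y-%m-%d")

# Datetime minus datetime gives a timedelta Series.
age = died - born

# .dt.days extracts the whole-day count as integers.
age_in_days = age.dt.days
print(age_in_days.tolist())

# Approximate age in whole years (365.25 days per year is an assumption).
age_in_years = (age_in_days / 365.25).astype(int)
print(age_in_years.tolist())
```

For these two rows the approximate years agree with the `Age` column in the original data (37 and 61).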
In [7]:
import random

print(scientists["Age"])
print("\n===================================\n")
# Calling random.shuffle() directly on a Series swaps values in place and
# raises SettingWithCopyWarning, so shuffle a plain list and assign it back.
random.seed(42)
shuffled_ages = scientists["Age"].tolist()
random.shuffle(shuffled_ages)
scientists["Age"] = shuffled_ages
print(scientists["Age"])
0    37
1    61
2    90
3    66
4    56
5    45
6    41
7    77
Name: Age, dtype: int64

===================================

0    66
1    56
2    41
3    77
4    90
5    45
6    37
7    61
Name: Age, dtype: int64
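
As an alternative to the standard-library `random` module, pandas can shuffle a column on its own. The sketch below swaps in `Series.sample(frac=1)`, which is a different technique from the cell above and will generally produce a different ordering for the same seed. The `ages` Series is a hypothetical stand-in built from the values printed above.

```python
import pandas as pd

# Hypothetical stand-in for scientists["Age"], using the values printed above.
ages = pd.Series([37, 61, 90, 66, 56, 45, 41, 77], name="Age")

# sample(frac=1) returns every row in random order;
# random_state makes the result repeatable.
shuffled = ages.sample(frac=1, random_state=42)

# The shuffled Series still carries the original index labels. Assigning it
# back to a DataFrame column as-is would realign on the index and undo the
# shuffle, so drop the old labels first.
shuffled = shuffled.reset_index(drop=True)
print(shuffled)
```

The `reset_index(drop=True)` step is the part that is easy to miss: pandas aligns on index labels during assignment, not on position.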


Dropping columns

In [8]:
print(scientists.columns)

print("\n===================================\n")

scientists_dropped = scientists.drop(["Age"], axis=1)
print(scientists_dropped.columns)
Index(['Name', 'Born', 'Died', 'Age', 'Occupation', 'born_dt', 'died_dt',
       'age_days'],
      dtype='object')

===================================

Index(['Name', 'Born', 'Died', 'Occupation', 'born_dt', 'died_dt', 'age_days'], dtype='object')
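
`drop` returns a new DataFrame and leaves the original untouched, which is why the result above is stored in `scientists_dropped`. A small sketch of that behaviour follows, assuming a one-row hand-built DataFrame (`df` is a hypothetical name). It uses the `columns=` keyword, which is equivalent to passing `axis=1`.

```python
import pandas as pd

# Hypothetical one-row frame for illustration.
df = pd.DataFrame({"Name": ["Marie Curie"], "Age": [66], "Occupation": ["Chemist"]})

# columns=["Age"] is equivalent to df.drop(["Age"], axis=1).
dropped = df.drop(columns=["Age"])
print(dropped.columns.tolist())

# The original DataFrame still has the column.
print(df.columns.tolist())

# errors="ignore" skips labels that do not exist instead of raising KeyError.
unchanged = df.drop(columns=["Missing"], errors="ignore")
print(unchanged.columns.tolist())
```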
In [9]:
from IPython.display import display, HTML

# Widen the notebook container to the full browser width.
display(HTML("<style> .container{width:100% !important;}</style>"))
