software_development:python_pandas

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
software_development:python_pandas [2022/08/04 05:40] – [I/O file] prgramsoftware_development:python_pandas [2025/07/07 14:12] (current) – external edit 127.0.0.1
Line 1: Line 1:
 ====== python pandas ====== ====== python pandas ======
 {{INLINETOC}} {{INLINETOC}}
 +
 +=== etc : list ===
 +<code python>
 +set( [list] ) # unique value
 +[list].sort() #자동적용?
 +[list1] + [list2]  #list합치기
 +</code>
 +
 +===== shape of df =====
 +=== Pivot_table ===
 +<code python>
 +df.pivot_table(index=[인덱스컬럼],
 +               columns=[컬럼1,컬럼2],
 +               values=[값],
 +               aggfunc='sum').reset_index()
 +</code>
 +  * string일 때는 aggfunc='max'
 +  * index에 NULL 있으면 안됨
 +== fillna ==
 +<code python>
 +df[['a','b','c']] = df[['a','b','c']].fillna('')
 +</code>
 +
 +=== group by ===
 +<code python>
 +df.groupby([컬럼들]).agg({'컬럼':sum}).reset_index()
 +
 +df.groupby([COLUMNS])['COLUMN'].max().reset_index()
 +
 +df = df.assign(date=pd.to_numeric(df['date'], errors='coerce')).groupby(['코드', '종목명']).agg({'date':np.min}).reset_index().drop_duplicates()
 +
 +df = df[['코드', 'date']].groupby(['코드']).agg({'date': [np.min, np.max]}).reset_index(level='종목코드')
 +df.columns = df.columns.droplevel()
 +</code>
 +
 +=== rank ===
 +<code python>
 +df['rank'] = df.groupby('code')['value'].rank(ascending=False)
 +</code>
 +
 +=== merge ===
 +<code python>
 +df_out = df_out.merge(df, on=['no', 'name'], how='outer') #left_on right_on
 +</code>
 +
 +
 +===== modify =====
 +=== Series to DF ===
 +<code python>
 +df = df.append(pd.DataFrame(pd.Series(dict_row)).transpose(), ignore_index=True)
 +</code>
 +
 +
 +=== rename ===
 +<code python>
 +df.rename(columns = {'컬럼':'new name'}, index={0:'new index'})
 +df.rename({1: 2, 2: 4}, axis='index')
 +
 +df.columns = [컬럼들..]
 +df.columns = ['1'] + df.columns[1:].tolist()
 +</code>
 +
 +=== order of columns ===
 +<code python>
 +#1
 +df = df.sort_index(axis='columns', level = 'MULTILEVEL INDEX NAME/no')
 +#2
 +df.columns
 +col_order = ['a','b','c']
 +df = df.reindex(col_order, axis='columns')
 +</code>
 +
 +
 +=== map ===
 +<code python>
 +df['코드'] = 'A' + df['코드'].map(lambda x: f'{x:0>6}' #6글자로
 +</code>
 +
 +
 +===== get info =====
  
 === Shapes === === Shapes ===
Line 13: Line 93:
  
 df.loc[df.sex=='W', 'year_born'].hist() df.loc[df.sex=='W', 'year_born'].hist()
 +
 +=== type ===
 +<code python>
 +df['value'] = pd.to_numeric(df['value'], errors='coerce')
 +df['value'].dtype
 +</code>
 +
 +
 +===== get element =====
  
 === Selects === === Selects ===
Line 18: Line 107:
 iloc: Select by position iloc: Select by position
 loc: Select by label loc: Select by label
 +  
 +df.loc[:,~df.columns.isin(['a','b'])]  
 +
 +df[~( df['a'].isin(['1','2','3']) & df['b']=='3' )] #row-wise
 +df.loc[~( df['a'].isin(['1','2','3']) & df['b']=='3' ), 8] #row-wise & column
 </code> </code>
  
 +=== row iteration ===
 +<code python>
 +for idx,row in anime[:2].iterrows():
 +    print(idx, row)
 +</code>
  
      
      
 =====I/O file===== =====I/O file=====
 +
 +=== encoding_errors - 'ignore'===
 +Encoding 제대로 했는데도 안되면..
 +공공데이터가 이런 경우가 많음.
 +
 +Error tokenizing data. C error: EOF inside string starting at row 0 | 판다스 에러
 +https://con2joa.tistory.com/m/60
 +quoting=csv.QUOTE_NONE 파라미터
 +
 +<code python>
 +import chardet
 +with open(file, 'rb') as rawdata:
 +    result = chardet.detect(rawdata.read(100000))
 +result
 +
 +
 +data = pd.read_csv( file, encoding='cp949', encoding_errors='ignore')
 +# on_bad_lines='skip'
 +# error_bad_lines=False
 +</code>
 +
 +=== to_numberic ===
 +<code python>
 +#1
 +df = pd.read_csv('file.csv', encoding='utf-8', index_col=0, converters={'col':int, 'col2':str})
 +#2
 +df['col'] = pd.to_numeric(df[col].str.replace(',',''), errors='coerce')
 +</code>
  
 === Excel === === Excel ===
Line 61: Line 188:
 file open/ append lines : https://thispointer.com/how-to-append-text-or-lines-to-a-file-in-python/           file open/ append lines : https://thispointer.com/how-to-append-text-or-lines-to-a-file-in-python/          
  
-=== row iteration === 
-<code python> 
-for idx,row in anime[:2].iterrows(): 
-    print(idx, row) 
-</code> 
  
  
-=== Series to DF === 
-<code python> 
-df = df.append(pd.DataFrame(pd.Series(dict_row)).transpose(), ignore_index=True) 
-</code> 
  
-=== rename === 
-<code python> 
-df.rename(columns = {'컬럼':'new name'}, index={0:'new index'}) 
-df.rename({1: 2, 2: 4}, axis='index') 
  
-df.columns = [컬럼들..] 
-df.columns = ['1'] + df.columns[1:].tolist() 
-</code> 
  
-=== map === 
-<code python> 
-df['코드'] = 'A' + df['코드'].map(lambda x: f'{x:0>6}' #6글자로 
-</code> 
-=== Pivot_table === 
-<code python> 
-df.pivot_table(index=[인덱스컬럼], 
-               columns=[컬럼1,컬럼2], 
-               values=[값], 
-               aggfunc='sum').reset_index() 
-</code> 
  
-=== group by === 
-<code python> 
-df.groupby([컬럼들]).agg({'컬럼':sum}).reset_index() 
- 
-df = df.assign(date=pd.to_numeric(df['date'], errors='coerce')).groupby(['코드', '종목명']).agg({'date':np.min}).reset_index().drop_duplicates() 
- 
-df = df[['코드', 'date']].groupby(['코드']).agg({'date': [np.min, np.max]}).reset_index(level='종목코드') 
-df.columns = df.columns.droplevel() 
-</code> 
- 
-=== merge === 
-<code python> 
-df_out = df_out.merge(df, on=['no', 'name'], how='outer') 
-</code> 
- 
-=== rank === 
-<code python> 
-df['rank'] = df.groupby('code')['value'].rank(ascending=False) 
-</code> 
-=== to_numberic === 
-<code python> 
-#1 
-df = pd.read_csv('file.csv', encoding='utf-8', index_col=0, converters={'col':int, 'col2':str}) 
-#2 
-df['col'] = pd.to_numeric(df[col].str.replace(',',''), errors='coerce') 
-</code> 
- 
- 
- 
-=== type === 
-<code python> 
-df['value'] = pd.to_numeric(df['value'], errors='coerce') 
-df['value'].dtype 
-</code> 
  
 {{tag>software_development python 파이썬 pandas 데이터정리 data_analysis}} {{tag>software_development python 파이썬 pandas 데이터정리 data_analysis}}
  • software_development/python_pandas.1659591620.txt.gz
  • Last modified: 2025/07/07 14:12
  • (external edit)