<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://yy2-hi.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://yy2-hi.github.io/" rel="alternate" type="text/html" /><updated>2024-08-28T17:00:19+09:00</updated><id>https://yy2-hi.github.io/feed.xml</id><title type="html">yy2-hi</title><subtitle>yy2-hi blog</subtitle><author><name>yy2-hi</name></author><entry><title type="html">Project 1 - Analysis of Seoul CCTV Status Data (1)</title><link href="https://yy2-hi.github.io/dataanalysis/cctvanalysis1/" rel="alternate" type="text/html" title="Project 1 - Analysis of Seoul CCTV Status Data (1)" /><published>2024-08-25T00:00:00+09:00</published><updated>2024-08-25T00:00:00+09:00</updated><id>https://yy2-hi.github.io/dataanalysis/cctvanalysis1</id><content type="html" xml:base="https://yy2-hi.github.io/dataanalysis/cctvanalysis1/"><![CDATA[<h1 id="project-01-analysis-seoul-cctv">Project 01. Analysis Seoul CCTV</h1>
<h2 id="프로젝트-개요">Project Overview</h2>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/3616de05-5ac7-451e-9a08-f4c2e3348e23/image.png" alt="" /></p>

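<h2 id="목표">Goals</h2>
<ul>
  <li>Obtain CCTV data for each district (gu) of Seoul</li>
  <li>Obtain population data</li>
  <li>Merge the CCTV data with the population data</li>
  <li>Clean and sort the data</li>
  <li>Visualize the data with graphs</li>
  <li>Identify the overall trend</li>
  <li>Highlight the data points that deviate from the trend</li>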
</ul>

<h2 id="데이터-읽기">Reading the Data</h2>
<h3 id="pandas로-csv-엑셀-파일-읽기">Reading CSV and Excel Files with Pandas</h3>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/f10200a1-6e1b-4d5d-834f-22f415707635/image.png" alt="" /></p>
<ul>
  <li>A module that offers data-handling power comparable to R</li>
  <li>Highly efficient within a single process</li>
  <li>Effectively a programmable, extensible Excel
<img src="https://velog.velcdn.com/images/yy2hi/post/c8721df6-ea98-4139-8916-d2158d5ce7b7/image.png" alt="" /></li>
</ul>

<hr />
<h4 id="pandas-dataframe-구조">Pandas DataFrame Structure</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/8185d8ec-628b-46f6-95a5-3cad8aeb67c8/image.png" alt="" /></p>

<hr />
<h4 id="column-이름으로-조회">Selecting by Column Name</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/4ad15318-8818-4093-9293-aafd98101344/image.png" alt="" /></p>

<hr />
<h4 id="서울-cctv-수-column-이름-변경">Renaming the Seoul CCTV Count Columns</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/83a5445d-37c0-4a38-a55d-756f4f67a035/image.png" alt="" /></p>

<hr />
<h4 id="엑셀-설정">Excel Read Options</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/79c3153e-b4c3-4ed5-b329-d34b3a24dfe0/image.png" alt="" /></p>

<ul>
  <li>Specify the row where reading starts (header) and the columns to read (usecols)</li>
</ul>
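<p>A minimal sketch of the two options above. To stay self-contained it parses an in-memory CSV with pd.read_csv; pd.read_excel accepts the same header and usecols arguments. The sample text and the column names (district, total, male) are made up for illustration and are not the real file layout.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import io
import pandas as pd

# Hypothetical file content: two title rows precede the real header row.
raw = "report title,,\n(unit: persons),,\ndistrict,total,male\nA,100,48\nB,200,99\n"

# header=2: the third row (0-based index 2) holds the column names.
# usecols: keep only the listed columns.
pop = pd.read_csv(io.StringIO(raw), header=2, usecols=["district", "total"])
print(pop)
</code></pre></div></div>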

<hr />
<h4 id="서울시-인구수-column-이름-변경">Renaming the Seoul Population Columns</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/243801e9-744e-4570-8e23-b7576630b7ca/image.png" alt="" /></p>

<hr />

<h3 id="pandas-basic">Pandas Basic</h3>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/8351c5d4-6d37-4154-b7c8-71897b98cc7a/image.png" alt="" /></p>

<ul>
  <li>pandas is conventionally imported as pd</li>
  <li>numpy, which provides many numerical functions, is conventionally imported as np</li>
</ul>

<hr />

<h4 id="pandas의-데이터형을-구성하는-기본-series">Series: the Basic Building Block of Pandas Data Types</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/39f202ea-8360-4680-bd45-0130ee6a2cd8/image.png" alt="" /></p>

<hr />

<h4 id="날짜시간-이용">Working with Dates (Times)</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/14872710-6556-44f4-ab39-30f4a6b68645/image.png" alt="" /></p>

<hr />

<h4 id="가장-많이-사용되는-데이터형-dataframe">DataFrame: the Most Commonly Used Data Type</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/c7d07f01-96d7-4336-803d-7a2e12751bf3/image.png" alt="" /></p>
<ul>
  <li>Specify the index and the columns</li>
</ul>

<hr />

<h4 id="dataframe의-기본-정보-확인">Checking Basic DataFrame Information</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/1a33680a-df3d-436f-9a9e-b31c86f21f95/image.png" alt="" /></p>

<ul>
  <li>Check the size and data type of each column</li>
</ul>

<hr />

<h4 id="dataframe의-통계적-기본-정보-확인">Checking Basic Statistics of a DataFrame</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/84534c44-4631-4e72-9803-753f741c8d1e/image.png" alt="" /></p>

<hr />

<h4 id="데이터-정렬">Sorting Data</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/329c4195-8745-4b44-ad99-e50eb6615258/image.png" alt="" /></p>

<hr />

<h4 id="특정-컬럼-읽기">Reading Specific Columns</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/5e3d0987-4af2-457c-ba72-f15287542b76/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/c358c8bd-3c16-41fe-94c4-7b5f45ff7414/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/e40b9542-1624-44f1-9f5c-dd6db2769252/image.png" alt="" /></p>

<ul>
  <li>Use iloc to access rows and columns by integer position only</li>
</ul>
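<p>A small sketch of position-based access with iloc, with loc shown for contrast. The frame and its labels are made up for illustration.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [10, 20, 30]},
    index=["x", "y", "z"],
)

first_row = df.iloc[0]        # first row, regardless of its label "x"
cell = df.iloc[1, 1]          # row position 1, column position 1
block = df.iloc[0:2, 0:1]     # row positions 0 and 1, column position 0
same_cell = df.loc["y", "B"]  # label-based lookup of the same cell
print(cell, same_cell)
</code></pre></div></div>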

<hr />

<h4 id="pandas-slice-under-condition">Pandas Slice under condition</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/df08cf37-d0dd-4ee9-abc3-ba1f9e45892b/image.png" alt="" /></p>

<ul>
  <li>The usual form is df[condition]</li>
  <li>The syntax differs between versions, so check which Pandas version any source code found online was written for
<img src="https://velog.velcdn.com/images/yy2hi/post/431f875a-b6ac-492f-a1b3-e69b1caf2a66/image.png" alt="" /></li>
</ul>
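<p>A minimal sketch of the df[condition] pattern. The condition is a boolean Series with one True/False per row; the frame below is made up, and the last line prints the installed Pandas version so it can be compared with the version a borrowed snippet assumes.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

df = pd.DataFrame({"district": ["A", "B", "C"], "cctv": [500, 1500, 900]})

mask = df["district"] != "B"    # boolean Series, one value per row
subset = df[mask]               # keeps only the rows where mask is True
chosen = df[df["district"].isin(["A", "B"])]   # membership test as the condition

print(pd.__version__)
</code></pre></div></div>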

<hr />

<h4 id="특정-요소-확인">Checking for Specific Elements</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/46960809-3791-4e1b-a0bd-11981b273838/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/5cf2ec12-95b8-4be1-8e0a-fca73b1bbbb5/image.png" alt="" /></p>

<hr />
<h4 id="특정-칼럼-제거">Dropping a Column</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/9e306eee-3443-479f-b4ce-cd9477931f6c/image.png" alt="" /></p>

<hr />

<h4 id="apply-메소드">The apply Method</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/0535d600-3a60-4435-8b07-4fd41e37fe6a/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/9312a98e-6561-49e6-abdf-fd14693e347c/image.png" alt="" /></p>

<ul>
  <li>You can apply a function you define yourself or a lambda function</li>
</ul>
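<p>A short sketch of both styles: a named function applied to every column, and a lambda applied to every element of one column. The helper name value_range and the numbers are illustrative only.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 6.0, 8.0]})

def value_range(col):
    """Spread between the largest and smallest value of a column."""
    return col.max() - col.min()

ranges = df.apply(value_range)             # named function, called once per column
doubled = df["A"].apply(lambda v: v * 2)   # lambda, called once per element
print(ranges)
</code></pre></div></div>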

<hr />

<h2 id="cctv-데이터-훑어보기">A Quick Look at the CCTV Data</h2>

<h4 id="cctv를-가장-적게-보유한-구">District with the Fewest CCTVs</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/2052b47d-618a-4deb-8002-4bef4790a3c6/image.png" alt="" /></p>

<hr />

<h4 id="cctv를-가장-많이-보유한-구">District with the Most CCTVs</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/1bc731fd-7f19-457a-8603-72a6a29c5698/image.png" alt="" /></p>

<hr />

<h4 id="전에-보유한-갯수-대비-최근-3년간-cctv를-많이-설치한-구">Districts That Installed the Most CCTVs in the Last 3 Years Relative to Their Earlier Count</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/f10d2955-ca3e-43a8-8a01-344e11131fbb/image.png" alt="" /></p>

<hr />

<h2 id="인구현황-데이터-훑어보기">A Quick Look at the Population Data</h2>
<h4 id="서울시-인구-데이터-확인">Inspecting the Seoul Population Data</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/41dd0ea2-dd89-4e49-b592-677862669519/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/3ede9e67-94e6-47e9-9ddf-a6ea7e47009a/image.png" alt="" /></p>

<hr />

<h4 id="데이터-초반-검증">Initial Data Validation</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/34cedfd9-4f10-4962-8d68-a14d3c96ec9c/image.png" alt="" /></p>

<hr />

<h4 id="외국인-고령자-비율-만들기">Creating the Foreigner and Elderly Ratios</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/893459fb-756c-4e96-a577-e5f2fbdfaa0e/image.png" alt="" /></p>

<hr />

<h4 id="인구수가-많은-구-확인">Districts with the Largest Populations</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/c421f38f-3709-4e03-8da7-929148e679f2/image.png" alt="" /></p>

<hr />

<h4 id="고령자비율-확인">Checking the Elderly Ratio</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/a2539557-7100-4566-abba-0ba5e0cb598f/image.png" alt="" /></p>

<hr />

<p><strong>Sources</strong>
Seoul CCTV installations by district and year, https://data.seoul.go.kr/dataList/OA-2734/F/1/datasetView.do
Seoul resident registration population statistics, https://data.seoul.go.kr/dataList/419/S/2/datasetView.do</p>]]></content><author><name>yy2-hi</name></author><category term="DataAnalysis" /><summary type="html"><![CDATA[Project 01. Analysis Seoul CCTV Project Overview]]></summary></entry><entry><title type="html">Project 1 - Analysis of Seoul CCTV Status Data (2)</title><link href="https://yy2-hi.github.io/dataanalysis/cctvanalysis2/" rel="alternate" type="text/html" title="Project 1 - Analysis of Seoul CCTV Status Data (2)" /><published>2024-08-25T00:00:00+09:00</published><updated>2024-08-25T00:00:00+09:00</updated><id>https://yy2-hi.github.io/dataanalysis/cctvanalysis2</id><content type="html" xml:base="https://yy2-hi.github.io/dataanalysis/cctvanalysis2/"><![CDATA[<h2 id="두-데이터-합치기">Merging the Two Datasets</h2>
<h4 id="merge를-이용한-데이터-병합">Merging Data with merge</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/b30a15a4-ff91-4b97-a199-6210dc0e39b7/image.png" alt="" /></p>

<hr />

<p><img src="https://velog.velcdn.com/images/yy2hi/post/9ea50617-6ff8-42a5-8235-d4dfdf55b339/image.png" alt="" /></p>

<ul>
  <li>Merge on the key column</li>
</ul>

<hr />

<p><img src="https://velog.velcdn.com/images/yy2hi/post/299ff756-e8d1-45b1-b92a-42bb08aa9d1b/image.png" alt="" /></p>

<ul>
  <li>Merge right onto left, keeping the keys of left (left join)</li>
</ul>

<hr />

<p><img src="https://velog.velcdn.com/images/yy2hi/post/735f49a0-cbef-486c-8444-8ad4b4056528/image.png" alt="" /></p>

<ul>
  <li>Merge on key, keeping the union of the keys (outer join)</li>
</ul>

<hr />

<p><img src="https://velog.velcdn.com/images/yy2hi/post/cc204afd-a2ee-4372-9b85-964b0ddd3782/image.png" alt="" /></p>

<ul>
  <li>Merge on the key column, keeping the intersection of the keys (inner join)</li>
</ul>
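<p>The merge variants described above can be sketched on two tiny frames. The frames and their values are made up for illustration; only pd.merge and its on/how parameters come from Pandas itself.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

left = pd.DataFrame({"key": ["K0", "K2", "K3"], "A": ["A0", "A2", "A3"]})
right = pd.DataFrame({"key": ["K0", "K1", "K2"], "B": ["B0", "B1", "B2"]})

inner = pd.merge(left, right, on="key")                   # default how="inner": keys present in both
left_join = pd.merge(left, right, on="key", how="left")   # every key of left; missing B becomes NaN
outer = pd.merge(left, right, on="key", how="outer")      # union of the keys of both frames

print(outer)
</code></pre></div></div>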

<hr />

<h3 id="데이터-병합-및-정리">Merging and Cleaning the Data</h3>
<h4 id="데이터-병합">Merging the Data</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/bc3f05ca-ec8f-4802-b21b-a3455491c82e/image.png" alt="" /></p>

<hr />

<h4 id="필요-없는-컬럼-제거">Removing Unneeded Columns</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/5d14f999-122d-44c7-8023-b11dc20c2260/image.png" alt="" /></p>

<hr />

<h4 id="인덱스-재지정">Setting a New Index</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/1115173e-068e-4c99-8e35-6b3621e2a203/image.png" alt="" /></p>

<ul>
  <li>Method for setting the index: set_index</li>
</ul>
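<p>A minimal sketch of set_index on a made-up frame. Note that it returns a new DataFrame unless the result is assigned back.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

df = pd.DataFrame({"district": ["A", "B"], "cctv": [500, 1500]})

indexed = df.set_index("district")   # returns a new frame; df itself is unchanged
print(indexed.loc["B", "cctv"])      # rows can now be looked up by district name
</code></pre></div></div>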

<hr />

<h3 id="상관관계correlation">Correlation</h3>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/ee217a35-7988-41a0-8f73-54dd4ffef1ef/image.png" alt="" /></p>

<hr />

<h4 id="corr">corr()</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/bac442dc-624b-4c2f-ab15-c45a396e1027/image.png" alt="" /></p>

<ul>
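  <li>When looking for relationships in data, comparing two variables is only meaningful if there is at least minimal evidence that they are related</li>
  <li>As a rule of thumb used here, variables whose correlation coefficient is 0.2 or higher are worth comparing</li>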
</ul>
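<p>A small sketch of how the correlation matrix can be obtained. The numbers and column names are invented; DataFrame.corr() returns pairwise Pearson coefficients by default, and np.corrcoef gives the same value for a single pair.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
import pandas as pd

df = pd.DataFrame({
    "population": [100, 200, 300, 400, 500],
    "cctv": [12, 19, 33, 38, 52],
    "noise": [5, 1, 4, 2, 3],
})

corr_matrix = df.corr()    # pairwise Pearson correlation of every column pair
r = np.corrcoef(df["population"], df["cctv"])[0, 1]   # the same coefficient for one pair
print(corr_matrix)
</code></pre></div></div>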

<p><img src="https://velog.velcdn.com/images/yy2hi/post/e300d49d-53e3-42d0-a3c7-d2f0e22d89af/image.png" alt="" /></p>

<ul>
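  <li>Variable most strongly correlated with the total number of CCTVs (소계) → population (인구수)</li>
  <li>∴ It is meaningful to analyze CCTV counts relative to population for each district and to find the districts with relatively few or many CCTVs</li>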
</ul>

<hr />

<h4 id="cctv-비율">CCTV Ratio</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/8a810798-48f0-441d-b3ea-15c40300a3b4/image.png" alt="" /></p>

<ul>
  <li>Districts with a high CCTV-to-population ratio</li>
</ul>

<hr />

<p><img src="https://velog.velcdn.com/images/yy2hi/post/0afe0648-c2e4-4785-97b8-64b3dcd1bf8d/image.png" alt="" /></p>

<ul>
  <li>Districts with a low CCTV-to-population ratio</li>
</ul>

<hr />

<h2 id="matplotlib">Matplotlib</h2>
<ul>
  <li>The most widely used visualization tool in Python</li>
  <li>In Jupyter Notebook it is convenient to have matplotlib results appear in the output cell, so use the %matplotlib inline option</li>
</ul>

<hr />

<h3 id="matplotlib-호출">Importing matplotlib</h3>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/c55b31c2-2b4e-4195-b05a-1adb5307c879/image.png" alt="" /></p>

<hr />

<h4 id="삼각함수-그리기">Plotting Trigonometric Functions</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/5a67317a-340a-4425-8caf-5be2d5c0a3fa/image.png" alt="" /></p>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/34c02836-0819-4523-9acd-a66868fa544f/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/62efa723-a009-49a6-a3df-3a21e78d923c/image.png" alt="" /></p>

<hr />

<h4 id="scatter">scatter()</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/ca39868b-146a-44dc-a2cd-311c68d0c035/image.png" alt="" /></p>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/1c5de3c8-04fd-4843-ac3d-7d84d1bdf8c0/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/ec2852c9-fc93-4f83-bd98-258ef1436b04/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/093c2643-eb86-4a36-a230-d68630fa9a70/image.png" alt="" /></p>

<hr />

<h2 id="데이터-시각화">Data Visualization</h2>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/bb3ec83a-33cb-4149-841c-9ac3914a886d/image.png" alt="" /></p>
<ul>
  <li>Apply a Korean font and make the minus sign render correctly (Windows: “Malgun Gothic”)</li>
</ul>

<hr />

<p><img src="https://velog.velcdn.com/images/yy2hi/post/93085a1d-7b7a-4d9d-bc4c-d14aeb9cc4fd/image.png" alt="" /></p>
<ul>
  <li>A Pandas DataFrame variable can call plot() directly</li>
  <li>When there are many entries (columns), sorting before plotting makes the graph easier to read</li>
</ul>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/d1476506-9636-45ff-abbb-d5808e551d5c/image.png" alt="" /></p>

<hr />

<p><img src="https://velog.velcdn.com/images/yy2hi/post/24242bbe-5dc3-4491-af17-8458b25c29fd/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/c4e40d31-0ed5-4921-a32c-d2e1fc110a0a/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/bcb772ff-10b3-4d21-a59e-b4e3a2db8d2f/image.png" alt="" /></p>

<hr />

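<h2 id="경향-파악">Identifying the Trend</h2>
<ul>
  <li>Districts with the most CCTVs in absolute terms: Gangnam, Yangcheon, Seocho, Gwanak, Eunpyeong, Yongsan</li>
  <li>Districts with the highest CCTV ratio: Jongno, Yongsan, Jung-gu</li>
  <li>Without showing the overall trend alongside, these figures are hard to convey properly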
<img src="https://velog.velcdn.com/images/yy2hi/post/9d568179-b03c-42be-9ef9-d3443bc91cdc/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/f6df4a4c-402d-4410-8079-20c49a613981/image.png" alt="" /></li>
</ul>

<hr />

<h3 id="선형회귀linear-regression-trend-파악">Identifying the Trend with Linear Regression</h3>
<h4 id="numpy를-이용한-1차-직선-만들기">Fitting a First-Degree Line with NumPy</h4>
<ul>
  <li>np.polyfit: computes the coefficients that define the line</li>
  <li>np.poly1d: turns the coefficients found by polyfit into a function that can be called from Python
<img src="https://velog.velcdn.com/images/yy2hi/post/c32872cf-4c93-4e75-947c-4f97edd20f2d/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/50b44dbf-b028-4a62-86d3-c110900a1cf7/image.png" alt="" /></li>
  <li>Pass the coefficients found by polyfit to complete the function</li>
</ul>
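<p>A minimal sketch of the polyfit and poly1d pair on made-up points that lie exactly on y = 2x + 1, so the fitted coefficients are easy to check by eye. The name f1 mirrors the function name used later in the post.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0               # points that lie exactly on y = 2x + 1

coeffs = np.polyfit(x, y, 1)    # degree 1: returns [slope, intercept]
f1 = np.poly1d(coeffs)          # callable polynomial built from the coefficients
print(coeffs, f1(5.0))
</code></pre></div></div>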

<hr />

<h4 id="인구-400000인-구에서-서울시의-전체-경향에-맞는-적당한-cctv-수">How Many CCTVs Would a District with a Population of 400,000 Have Under Seoul's Overall Trend?</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/e1e6cb25-c357-4e8c-99c0-efb063dd7c02/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/692fbd2b-2af5-4315-a74b-324fd7231125/image.png" alt="" /></p>
<ul>
  <li>Generate x data to draw the trend line</li>
  <li>np.linspace(a, b, n): generates n evenly spaced values from a to b
<img src="https://velog.velcdn.com/images/yy2hi/post/74f0e43e-7c93-47be-9277-e7aa568f9d90/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/1a3c54e2-7ede-4185-a02f-4e73b9af3c62/image.png" alt="" /></li>
</ul>
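<p>A sketch of how the x data for a trend line might be generated and evaluated. The slope and intercept passed to np.poly1d are placeholders, not the values fitted in the post, so the printed estimate is illustrative only.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

f1 = np.poly1d([0.001, 1000.0])          # hypothetical trend: made-up slope and intercept
fx = np.linspace(100000, 700000, 100)    # 100 evenly spaced x values, both ends included
trend = f1(fx)                           # trend value at every x; plotting fx against trend draws the line

print(f1(400000))                        # trend value for a population of 400,000
</code></pre></div></div>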

<hr />

<h2 id="경향에서-벗어난-데이터-강조">Highlighting Data That Deviates from the Trend</h2>
<h3 id="그래프-다듬기">Refining the Graph</h3>
<p>data_result['오차'] = data_result['소계'] - f1(data_result['인구수'])</p>

<ul>
  <li>Compute the error (오차) relative to the trend</li>
  <li>Trend value: pass each district's population to the f1 function: f1(data_result['인구수'])</li>
  <li>Actual value: data_result['소계']
<img src="https://velog.velcdn.com/images/yy2hi/post/3405a69d-2f14-468f-ab1d-f878de3ae5c3/image.png" alt="" /></li>
</ul>
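<p>The error computation above can be sketched end to end on made-up numbers. The column names follow the post (인구수 = population, 소계 = CCTV total, 오차 = error); the district labels, the values, and the variable name above_trend_first are assumptions for illustration.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
import pandas as pd

# Made-up numbers for four hypothetical districts.
data_result = pd.DataFrame(
    {"인구수": [100000, 200000, 300000, 400000], "소계": [1000, 1500, 1400, 2500]},
    index=["A", "B", "C", "D"],
)

# Fit the linear trend, then subtract the trend value from the actual value.
f1 = np.poly1d(np.polyfit(data_result["인구수"], data_result["소계"], 1))
data_result["오차"] = data_result["소계"] - f1(data_result["인구수"])

# Positive error: more CCTVs than the trend suggests; negative: fewer.
above_trend_first = data_result.sort_values(by="오차", ascending=False)
print(above_trend_first)
</code></pre></div></div>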

<hr />

<h4 id="경향-대비-cctv를-많이-가진-구">Districts with More CCTVs Than the Trend</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/8659b758-69c4-49c1-a728-2e92e47dee76/image.png" alt="" /></p>

<hr />

<h4 id="경향-대비-cctv를-적게-가진-구">Districts with Fewer CCTVs Than the Trend</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/c8254774-a61b-4545-8ec1-1a9502e6d472/image.png" alt="" /></p>

<hr />

<h4 id="강조하고-싶은-데이터-시각화">Visualizing the Data to Highlight</h4>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/897065b2-e9d5-4e2f-9199-470a578bbf57/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/5ed53a8b-9327-4ca4-9238-4f6376abc8f5/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/4c7ba5a7-2782-40bd-9c3d-0ddf863b1247/image.png" alt="" /></p>
<ul>
  <li>s: marker size</li>
  <li>c: color, set to the error from the trend computed above</li>
  <li>cmap: applies the user-defined colormap</li>
</ul>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/190f8cfc-210f-43e5-8346-771175a3c009/image.png" alt="" /></p>
<ul>
  <li>Label only the 5 districts furthest above and the 5 furthest below the trend, writing the district name next to each marker
<img src="https://velog.velcdn.com/images/yy2hi/post/1d3d8a0e-5f6c-4c9d-8dc8-7c4dd536e5e4/image.png" alt="" /></li>
  <li>text: command that draws text on the graph</li>
  <li>plt.text(x, y, text, options)</li>
  <li>Multiply the x and y data by 1.02 and 0.98 so that the district name does not overlap the marker
<img src="https://velog.velcdn.com/images/yy2hi/post/808494aa-cc94-4d77-9263-25e45852b747/image.png" alt="" /></li>
</ul>
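<p>A rough sketch of the scatter and text calls described above. It uses made-up values, the built-in "coolwarm" colormap in place of the post's user-defined map, and the off-screen Agg backend so that it runs without a display; all of these are assumptions for illustration.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import matplotlib
matplotlib.use("Agg")    # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values; 인구수 = population, 소계 = CCTV total, 오차 = error from the trend.
data = pd.DataFrame(
    {"인구수": [100000, 200000, 300000],
     "소계": [1000, 1500, 1400],
     "오차": [60.0, 120.0, -180.0]},
    index=["A", "B", "C"],
)

plt.figure(figsize=(8, 6))
# s sets the marker size, c colors each marker by its error, cmap picks the color scale.
plt.scatter(data["인구수"], data["소계"], s=50, c=data["오차"], cmap="coolwarm")
plt.colorbar()

# Nudge each label right (x * 1.02) and down (y * 0.98) so it clears the marker.
for name, row in data.iterrows():
    plt.text(row["인구수"] * 1.02, row["소계"] * 0.98, name, fontsize=12)

labels_drawn = len(plt.gca().texts)
</code></pre></div></div>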

<hr />
<h4 id="데이터-저장">Saving the Data</h4>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/ad28c5c3-9bd7-4225-8022-612a091a5cc4/image.png" alt="" /></p>

<hr />

<h4 id="출처">Sources</h4>
<p>Seoul CCTV installations by district and year, https://data.seoul.go.kr/dataList/OA-2734/F/1/datasetView.do
Seoul resident registration population statistics, https://data.seoul.go.kr/dataList/419/S/2/datasetView.do</p>]]></content><author><name>yy2-hi</name></author><category term="DataAnalysis" /><summary type="html"><![CDATA[Merging the Two Datasets Merging Data with merge]]></summary></entry><entry><title type="html">Project 9 - User Cluster Analysis</title><link href="https://yy2-hi.github.io/dataanalysis/clusteranalysis/" rel="alternate" type="text/html" title="Project 9 - User Cluster Analysis" /><published>2024-08-25T00:00:00+09:00</published><updated>2024-08-25T00:00:00+09:00</updated><id>https://yy2-hi.github.io/dataanalysis/clusteranalysis</id><content type="html" xml:base="https://yy2-hi.github.io/dataanalysis/clusteranalysis/"><![CDATA[<h1 id="클러스터링-군집화">Clustering</h1>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/6174ce34-b10a-4be3-93cf-9a5973bef2b6/image.png" alt="" /></p>

<ul>
  <li>Dividing a given dataset into several groups</li>
  <li>Discovering groups that share similar characteristics</li>
</ul>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/0cf56223-bfc3-4e5a-b9e6-78d9c2b89156/image.png" alt="" /></p>

<ul>
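  <li>Form groups so that members of the same group are close together (intra-cluster) and different groups are far apart (inter-cluster)</li>
  <li>If the correct group labels are known, it is a classification problem; <strong>clustering is unsupervised learning</strong>, so there is no ground truth for which group a point belongs to.</li>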
</ul>

<h1 id="언제-사용할까">When Is It Used?</h1>
<h3 id="유사한-뉴스-그룹--문서-군집화">Grouping Similar News: Document Clustering</h3>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/73b771ba-1157-4d32-8eac-3f5fc415f711/image.png" alt="" /></p>

<h3 id="가까운-위치-좌표끼리-묶기">Grouping Nearby Location Coordinates</h3>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/b3bfb038-3889-4411-ac9d-14a12d81d956/image.png" alt="" /></p>

<h3 id="유사한-유저군-나누기--마켓-세분화">Dividing Users into Similar Groups: Market Segmentation</h3>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/53ba5416-88a5-4cd7-a712-fdf49eb8fcac/image.png" alt="" /></p>

<ul>
  <li>Divide the market into an appropriate number of segments and find an effective policy for each segment and target.</li>
</ul>

<h4 id="sns-관심사-기반-자율-주행-이미지-인식-클러스터링">Clustering by SNS Interests; Clustering for Image Recognition in Autonomous Driving</h4>

<h1 id="종류">Types</h1>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/d43cf85a-c1b1-4dc4-991c-bc89796e0746/image.png" alt="" /></p>

<h2 id="partition-based-clustering">Partition-based Clustering</h2>

<ul>
  <li>Clustering in which the number of clusters (groups) is fixed in advance</li>
  <li>Representative algorithms
    <ul>
      <li>K-means</li>
      <li>K-Medoids</li>
    </ul>
  </li>
</ul>

<h3 id="k-means-clustering">K-means Clustering</h3>

<ol>
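  <li>The user defines the number of clusters (k) in advance</li>
  <li>Initially, <strong>k centroids</strong> are chosen at random
    <ul>
      <li>Each cluster has exactly one centroid.</li>
    </ul>
  </li>
  <li>Go through the data points and <strong>assign each one to the group with the nearest centroid</strong>
    <ul>
      <li>The distance between points is measured with the Euclidean distance.</li>
    </ul>
  </li>
  <li>Once every point has been assigned, <strong>recompute the centroid of each group as the mean of the points in the cluster</strong></li>
  <li>Repeat steps 3-4 and <strong>stop when no point changes group any more</strong>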
<img src="https://velog.velcdn.com/images/yy2hi/post/f0eb3d8c-963f-444d-a62f-b4b49d03be21/image.png" alt="" /></li>
</ol>
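<p>The steps above can be sketched in plain NumPy. This is an illustrative implementation under simple assumptions (initial centroids drawn from the data points, a fixed random seed, an iteration cap), not a replacement for a library routine; the function name kmeans and the sample points are made up.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means on an (n, d) array; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(n_iter):
        # Step 3: assign every point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 5: stop once no point changes group.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its members.
        for j in range(k):
            members = points[labels == j]
            if len(members):    # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(data, k=2)
print(labels)
</code></pre></div></div>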

<hr />

<h2 id="hierarchical-based-clustering">Hierarchical based Clustering</h2>
<ul>
  <li>
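    <p>Among the clusters, <strong>select the two with the highest similarity, i.e. the smallest distance, and merge them into one, repeatedly reducing the number of clusters</strong>. This approach is also called agglomerative clustering.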
<img src="https://velog.velcdn.com/images/yy2hi/post/2413c7f4-30a4-4379-b7d2-ca1565d0e65a/image.png" alt="" /></p>
  </li>
  <li>
    <p>To group nearby data together, the distance between clusters must first be computed.</p>
  </li>
</ul>

<h4 id="centroid-distance">Centroid Distance</h4>
<ul>
  <li>Computes the distance between the centroids of two clusters</li>
  <li>Can also be used outside hierarchical clustering</li>
</ul>

<h4 id="median-distance">Median Distance</h4>
<ul>
  <li>A method available in hierarchical clustering</li>
  <li>If cluster u was formed by merging clusters s and t, its centroid is not recomputed; the average of the existing centroids of s and t is used instead -&gt; faster computation</li>
</ul>
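<p>A minimal sketch of agglomerative clustering with the centroid distance, assuming Euclidean distance and a target number of clusters. The function name agglomerate and the sample points are made up; the median-distance variant would differ only in replacing the recomputed mean with the midpoint of the two merged centroids.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def agglomerate(points, n_clusters):
    """Merge the two clusters with the closest centroids until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]    # every point starts as its own cluster
    for _ in range(len(points) - n_clusters):
        # Centroid distance: recompute each cluster's mean, then compare the means.
        centroids = np.array([points[c].mean(axis=0) for c in clusters])
        dists = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
        np.fill_diagonal(dists, np.inf)             # ignore a cluster's distance to itself
        a, b = sorted(np.unravel_index(dists.argmin(), dists.shape))
        clusters[a] = clusters[a] + clusters[b]     # merge cluster b into cluster a
        del clusters[b]
    return clusters

data = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [20.0, 20.0]])
result = agglomerate(data, n_clusters=2)
print(result)
</code></pre></div></div>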

<hr />

<h2 id="density-based-clustering">Density based Clustering</h2>
<ul>
  <li>Clustering based on how densely the data is packed, i.e. its density</li>
</ul>

<h3 id="dbscan-clustering">DBSCAN clustering</h3>

<ul>
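  <li>The user does not need to specify the number of clusters.</li>
  <li>Clusters grow by repeatedly finding data close to an initial point</li>
  <li>Two parameters are needed; together they define what “close” means
    <ul>
      <li>Neighborhood radius a (used to link other points as neighbors)</li>
      <li>Minimum number of points b (used to define a dense region)</li>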
    </ul>
  </li>
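  <li>Points within distance a of each other are neighbors.</li>
  <li>If at least b points lie within distance a of a point, that point is defined as a core point</li>
  <li>A core point forms a cluster, and the points within distance a of that core point are assigned to the same cluster</li>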
</ul>
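<p>The rules above can be sketched as a small NumPy implementation. It assumes Euclidean distance, counts a point as its own neighbor, and names the two parameters eps (the distance a) and min_pts (the count b); these names, the function name dbscan, and the sample points are illustrative choices, not part of the original notes.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def dbscan(points, eps, min_pts):
    """Return one cluster id per point; -1 marks noise."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Neighbors of i: every point within distance eps (the point itself included).
    neighbors = [np.flatnonzero(np.less_equal(dists[i], eps)) for i in range(n)]
    # Core points have at least min_pts neighbors.
    is_core = np.greater_equal([len(nb) for nb in neighbors], min_pts)
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue    # already clustered, or not dense enough to seed a cluster
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if is_core[q]:
                        frontier.append(q)    # only core points keep expanding the cluster
        cluster += 1
    return labels

data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [50.0, 50.0]])
labels = dbscan(data, eps=1.5, min_pts=3)
print(labels)
</code></pre></div></div>

<p>In this made-up example the isolated point at (50, 50) has no neighbors other than itself, so it keeps the noise label -1.</p>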

<p><img src="https://velog.velcdn.com/images/yy2hi/post/75e21f89-6c88-4695-b7c9-5450bfb1624d/image.png" alt="" /></p>

<ul>
  <li>K-means forms groups around centroids, so its clusters tend to be <strong>round</strong></li>
  <li>DBSCAN puts neighboring points into the same cluster, so its clusters can take an <strong>arbitrary shape</strong></li>
</ul>

<hr />

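<h1 id="정답이-없는-클러스터링-어떻게-평가할까">Clustering Has No Ground Truth, So How Do We Evaluate It?</h1>
<p>Unlike a classification problem, where the correct answers are already known, it is hard to define a performance criterion. Several criteria exist; here we look at one that can be used when there are no ground-truth cluster labels.</p>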

<h2 id="silhouette-coefficient-실루엣-계수">Silhouette Coefficient</h2>

<ul>
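  <li>Compute the distance or dissimilarity for every pair of data points (i, j).
    <ul>
      <li>a_i: mean distance from i to the other points in its own cluster</li>
      <li>b_i: mean distance from i to the points of the nearest other cluster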
 <img src="https://velog.velcdn.com/images/yy2hi/post/b725dddc-ff3b-4f2b-ba07-4e672a8074f3/image.png" alt="" /></li>
    </ul>
  </li>
</ul>

<p>-&gt; If the mean distance within the point's own cluster (a) is the smaller one, the value is positive; if another cluster is closer, it is negative.
<img src="https://velog.velcdn.com/images/yy2hi/post/85f006be-48c5-496d-9020-c7bfb8366d5d/image.png" alt="" /></p>
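<p>A minimal NumPy sketch of the per-point silhouette value, s_i = (b_i - a_i) / max(a_i, b_i), using the definitions of a_i and b_i above. It assumes Euclidean distance and at least two clusters, and gives singleton clusters a score of 0 by convention; the function name and the sample data are made up.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def silhouette_values(points, labels):
    """Return s_i = (b_i - a_i) / max(a_i, b_i) for every point."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        same[i] = False              # exclude the point itself
        if not same.any():           # singleton cluster: score stays 0
            continue
        a_i = dists[i, same].mean()  # mean distance to the rest of its own cluster
        # b_i: smallest mean distance to the members of any other cluster.
        b_i = min(dists[i, labels == other].mean()
                  for other in np.unique(labels) if other != labels[i])
        scores[i] = (b_i - a_i) / max(a_i, b_i)
    return scores

data = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels = np.array([0, 0, 1, 1])
scores = silhouette_values(data, labels)
print(scores.mean())    # the silhouette coefficient is the mean over all points
</code></pre></div></div>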

<hr />

<h1 id="다양한-클러스터링-방법을-통해-유저-세그먼트를-나눠보자">Segmenting Users with Several Clustering Methods</h1>

<h2 id="라이브러리-불러오기">Loading the Libraries</h2>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">from</span> <span class="nn">matplotlib</span> <span class="kn">import</span> <span class="n">colors</span>
</code></pre></div></div>

<h2 id="-데이터-살펴보기">👣 Exploring the Data</h2>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Mount my Google Drive so that the Colaboratory notebook can access the files stored on the drive
</span><span class="kn">from</span> <span class="nn">google.colab</span> <span class="kn">import</span> <span class="n">drive</span>
<span class="n">drive</span><span class="p">.</span><span class="n">mount</span><span class="p">(</span><span class="s">'/content/drive'</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Load the data with pandas
# Point the path below at the location of customer_personality_analysis.csv
# Path settings
</span><span class="n">DRIVE_PATH</span> <span class="o">=</span> <span class="s">"/content/drive/MyDrive/"</span> <span class="c1"># root of my Drive
</span><span class="n">FILE_PATH_IN_MY_DRIVE</span> <span class="o">=</span> <span class="s">"zerobase/유저 데이터 분석/유저 군집 분석하기/data/customer_personality_analysis.csv"</span> <span class="c1"># path of the file inside my Drive
</span><span class="n">PATH</span> <span class="o">=</span> <span class="n">DRIVE_PATH</span> <span class="o">+</span>  <span class="n">FILE_PATH_IN_MY_DRIVE</span>

<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">PATH</span> <span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s">"</span><span class="se">\t</span><span class="s">"</span><span class="p">)</span> <span class="c1"># read the CSV file (tab-separated)
</span><span class="n">df</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>

<span class="c1"># A typical CSV separates fields with commas: ID,Year_Birth,Education
# This file separates them with tabs instead: ID\tYear\t
</span></code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>ID</th>
      <th>Year_Birth</th>
      <th>Education</th>
      <th>Marital_Status</th>
      <th>Income</th>
      <th>Kidhome</th>
      <th>Teenhome</th>
      <th>Dt_Customer</th>
      <th>Recency</th>
      <th>MntWines</th>
      <th>...</th>
      <th>NumWebVisitsMonth</th>
      <th>AcceptedCmp3</th>
      <th>AcceptedCmp4</th>
      <th>AcceptedCmp5</th>
      <th>AcceptedCmp1</th>
      <th>AcceptedCmp2</th>
      <th>Complain</th>
      <th>Z_CostContact</th>
      <th>Z_Revenue</th>
      <th>Response</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>5524</td>
      <td>1957</td>
      <td>Graduation</td>
      <td>Single</td>
      <td>58138.0</td>
      <td>0</td>
      <td>0</td>
      <td>04-09-2012</td>
      <td>58</td>
      <td>635</td>
      <td>...</td>
      <td>7</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>3</td>
      <td>11</td>
      <td>1</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2174</td>
      <td>1954</td>
      <td>Graduation</td>
      <td>Single</td>
      <td>46344.0</td>
      <td>1</td>
      <td>1</td>
      <td>08-03-2014</td>
      <td>38</td>
      <td>11</td>
      <td>...</td>
      <td>5</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>3</td>
      <td>11</td>
      <td>0</td>
    </tr>
    <tr>
      <th>2</th>
      <td>4141</td>
      <td>1965</td>
      <td>Graduation</td>
      <td>Together</td>
      <td>71613.0</td>
      <td>0</td>
      <td>0</td>
      <td>21-08-2013</td>
      <td>26</td>
      <td>426</td>
      <td>...</td>
      <td>4</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>3</td>
      <td>11</td>
      <td>0</td>
    </tr>
    <tr>
      <th>3</th>
      <td>6182</td>
      <td>1984</td>
      <td>Graduation</td>
      <td>Together</td>
      <td>26646.0</td>
      <td>1</td>
      <td>0</td>
      <td>10-02-2014</td>
      <td>26</td>
      <td>11</td>
      <td>...</td>
      <td>6</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>3</td>
      <td>11</td>
      <td>0</td>
    </tr>
    <tr>
      <th>4</th>
      <td>5324</td>
      <td>1981</td>
      <td>PhD</td>
      <td>Married</td>
      <td>58293.0</td>
      <td>1</td>
      <td>0</td>
      <td>19-01-2014</td>
      <td>94</td>
      <td>173</td>
      <td>...</td>
      <td>5</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>3</td>
      <td>11</td>
      <td>0</td>
    </tr>
  </tbody>
</table>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="s">"Total number of rows: "</span><span class="p">,</span><span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Number of columns: "</span><span class="p">,</span><span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">.</span><span class="n">columns</span><span class="p">))</span>

<span class="c1"># =&gt;</span>
<span class="c1">#</span>
<span class="c1"># Total number of rows:  2240</span>
<span class="c1"># Number of columns:  29</span>
</code></pre></div></div>

<h2 id="데이터-컬럼-종류">Data Columns</h2>

<h3 id="people-사람">People</h3>

<ul>
  <li>ID: Customer’s unique identifier</li>
  <li>Year_Birth: Customer’s birth year</li>
  <li>Education: Customer’s education level</li>
  <li>Marital_Status: Customer’s marital status</li>
  <li>Income: Customer’s yearly household income</li>
  <li>Kidhome: Number of children in customer’s household</li>
  <li>Teenhome: Number of teenagers in customer’s household</li>
  <li>Dt_Customer: Date of customer’s enrollment with the company</li>
  <li>Recency: Number of days since customer’s last purchase</li>
  <li>Complain: 1 if customer complained in the last 2 years, 0 otherwise</li>
</ul>

<h3 id="products-상품">Products</h3>

<ul>
  <li>MntWines: Amount spent on <strong>wine</strong> in last 2 years</li>
  <li>MntFruits: Amount spent on <strong>fruits</strong> in last 2 years</li>
  <li>MntMeatProducts: Amount spent on <strong>meat</strong> in last 2 years</li>
  <li>MntFishProducts: Amount spent on <strong>fish</strong> in last 2 years</li>
  <li>MntSweetProducts: Amount spent on <strong>sweets</strong> in last 2 years</li>
  <li>MntGoldProds: Amount spent on <strong>gold</strong> in last 2 years</li>
</ul>

<h3 id="promotion-프로모션">Promotion</h3>

<ul>
  <li>NumDealsPurchases: Number of purchases made with a discount</li>
  <li>AcceptedCmp1: 1 if customer accepted the offer in the 1st campaign, 0 otherwise</li>
  <li>AcceptedCmp2: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise</li>
  <li>AcceptedCmp3: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise</li>
  <li>AcceptedCmp4: 1 if customer accepted the offer in the 4th campaign, 0 otherwise</li>
  <li>AcceptedCmp5: 1 if customer accepted the offer in the 5th campaign, 0 otherwise</li>
  <li>Response: 1 if customer accepted the offer in the last campaign, 0 otherwise</li>
</ul>

<h3 id="place-구매-장소">Place</h3>

<ul>
  <li>NumWebPurchases: Number of purchases made through the company’s web site</li>
  <li>NumCatalogPurchases: Number of purchases made using a catalogue</li>
  <li>NumStorePurchases: Number of purchases made directly in stores</li>
  <li>NumWebVisitsMonth: Number of visits to company’s web site in the last month</li>
</ul>

<h2 id="-데이터-정제하기-cleaning">👣 Cleaning the Data</h2>

<ul>
  <li>Remove missing values and outliers</li>
  <li>Ideally the data could be used as-is, but that is rarely the case.
    <ul>
      <li>There may be missing values, and some values may need converting into a form the computer can work with (numeric).</li>
      <li>If outliers are too extreme, they can distort the model.</li>
    </ul>
  </li>
</ul>

<p>Let's look at the data below.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># .info() gives an overview of the data.
# It prints the number of rows and columns of df, the column names, the dtype of each column, and so on.
df.info()

=&gt;

&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 2240 entries, 0 to 2239
Data columns (total 29 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   2240 non-null   int64  
 1   Year_Birth           2240 non-null   int64  
 2   Education            2240 non-null   object 
 3   Marital_Status       2240 non-null   object 
 4   Income               2216 non-null   float64
 5   Kidhome              2240 non-null   int64  
 6   Teenhome             2240 non-null   int64  
 7   Dt_Customer          2240 non-null   object 
 8   Recency              2240 non-null   int64  
 9   MntWines             2240 non-null   int64  
 10  MntFruits            2240 non-null   int64  
 11  MntMeatProducts      2240 non-null   int64  
 12  MntFishProducts      2240 non-null   int64  
 13  MntSweetProducts     2240 non-null   int64  
 14  MntGoldProds         2240 non-null   int64  
 15  NumDealsPurchases    2240 non-null   int64  
 16  NumWebPurchases      2240 non-null   int64  
 17  NumCatalogPurchases  2240 non-null   int64  
 18  NumStorePurchases    2240 non-null   int64  
 19  NumWebVisitsMonth    2240 non-null   int64  
 20  AcceptedCmp3         2240 non-null   int64  
 21  AcceptedCmp4         2240 non-null   int64  
 22  AcceptedCmp5         2240 non-null   int64  
 23  AcceptedCmp1         2240 non-null   int64  
 24  AcceptedCmp2         2240 non-null   int64  
 25  Complain             2240 non-null   int64  
 26  Z_CostContact        2240 non-null   int64  
 27  Z_Revenue            2240 non-null   int64  
 28  Response             2240 non-null   int64  
dtypes: float64(1), int64(25), object(3)
memory usage: 507.6+ KB
</code></pre></div></div>

<ul>
  <li>4 Income : 2216 rows are non-null, so 24 values are missing.</li>
  <li>7 Dt_Customer : holds dates but is stored as a string (object) rather than a datetime type.</li>
</ul>
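<p>The two observations above can also be read off programmatically rather than by scanning the <code>.info()</code> output. The following is a minimal sketch on a made-up toy frame (the values and the name <code>toy</code> are assumptions for illustration, not the real dataset); it counts missing values per column and the share of rows affected.</p>

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up for illustration).
toy = pd.DataFrame({
    "Income": [58138.0, np.nan, 71613.0, np.nan],
    "Dt_Customer": ["2012-04-09", "2014-08-03", "2013-08-21", "2014-10-02"],
})

# Number of missing values in each column.
missing_counts = toy.isna().sum()
print(missing_counts)

# Share of rows that contain at least one missing value.
missing_share = toy.isna().any(axis=1).mean()
print(missing_share)

# The dtype of each column shows which ones are still plain strings (object).
print(toy.dtypes)
```

<p>On the real DataFrame, the same <code>isna().sum()</code> call should report 24 for Income, matching the difference between 2240 and 2216 in the output above.</p>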

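<p><strong>Removing missing values</strong></p>
<ul>
  <li>There are three main ways to handle missing values.
    <ul>
      <li>If the share of missing values is small, drop them.</li>
      <li>Fill them with the most frequent value or the mean.</li>
      <li>Build a model that predicts the missing values and fill them with its predictions.</li>
    </ul>
  </li>
  <li>In this dataset only 24 rows have missing values, so we drop them.</li>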
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Drop the rows that contain missing values.
# Options for missing values: 1. fill with the mean, 2. predict them, 3. drop them when there are only a few.
</span><span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">dropna</span><span class="p">()</span>
<span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>

<span class="o">=&gt;</span>

<span class="mi">2216</span>
</code></pre></div></div>
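<p>For comparison, here is a rough sketch of the first two strategies side by side: dropping rows versus filling with the mean (numeric column) or the most frequent value (categorical column). The toy frame and its values are assumptions for illustration; the post itself only uses <code>dropna()</code>.</p>

```python
import numpy as np
import pandas as pd

# Made-up toy data with one missing value per column.
toy = pd.DataFrame({
    "Income": [30000.0, np.nan, 50000.0, 70000.0],
    "Education": ["PhD", "Master", None, "PhD"],
})

# Strategy 1: drop every row that has at least one missing value.
dropped = toy.dropna()

# Strategy 2: fill the numeric column with its mean
# and the categorical column with its most frequent value (mode).
filled = toy.copy()
filled["Income"] = filled["Income"].fillna(filled["Income"].mean())
filled["Education"] = filled["Education"].fillna(filled["Education"].mode()[0])

print(len(dropped))
print(filled)
```

<p>Dropping keeps only fully observed rows, while filling keeps every row at the cost of introducing estimated values, which is why dropping is a reasonable choice when only about 1% of the rows are affected.</p>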
<p><strong>Cleaning date data</strong></p>
<ul>
  <li>Let's find the most recently enrolled customer and the longest-standing customer.</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Convert the string dates to datetime so that datetime functions can be used.
# Dt_Customer: the date the customer enrolled
</span><span class="n">df</span><span class="p">[</span><span class="s">"Dt_Customer"</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">to_datetime</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s">"Dt_Customer"</span><span class="p">])</span>
</code></pre></div></div>
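<p>When the raw strings are ambiguous (for example <code>04-09-2012</code>), the parsed date depends on whether the first field is read as the month or the day. The sketch below uses made-up strings, and the two format strings are assumptions to be checked against the actual CSV; it only shows how an explicit <code>format</code> removes the ambiguity.</p>

```python
import pandas as pd

# Made-up date strings that can be read two ways.
raw = pd.Series(["04-09-2012", "08-03-2014"])

# Read the first field as the month.
month_first = pd.to_datetime(raw, format="%m-%d-%Y")

# Read the first field as the day.
day_first = pd.to_datetime(raw, format="%d-%m-%Y")

print(month_first.dt.date.tolist())
print(day_first.dt.date.tolist())
```

<p>Passing <code>format</code> explicitly makes the intended interpretation visible in the code, so it is worth confirming which layout the source file uses before relying on the parsed dates.</p>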
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">df</span><span class="p">[</span><span class="s">"Dt_Customer"</span><span class="p">][:</span><span class="mi">5</span><span class="p">]:</span>
  <span class="k">print</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">date</span><span class="p">())</span>
  
<span class="o">=&gt;</span>

<span class="mi">2012</span><span class="o">-</span><span class="mi">04</span><span class="o">-</span><span class="mi">09</span>
<span class="mi">2014</span><span class="o">-</span><span class="mi">08</span><span class="o">-</span><span class="mi">03</span>
<span class="mi">2013</span><span class="o">-</span><span class="mi">08</span><span class="o">-</span><span class="mi">21</span>
<span class="mi">2014</span><span class="o">-</span><span class="mi">10</span><span class="o">-</span><span class="mi">02</span>
<span class="mi">2014</span><span class="o">-</span><span class="mi">01</span><span class="o">-</span><span class="mi">19</span>
</code></pre></div></div>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dates</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">df</span><span class="p">[</span><span class="s">"Dt_Customer"</span><span class="p">]:</span>
    <span class="n">i</span> <span class="o">=</span> <span class="n">i</span><span class="p">.</span><span class="n">date</span><span class="p">()</span>
    <span class="n">dates</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>  
    
<span class="k">print</span><span class="p">(</span><span class="n">dates</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="n">dates</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="nb">min</span><span class="p">(</span><span class="n">dates</span><span class="p">))</span>

<span class="o">=&gt;</span>

<span class="p">[</span><span class="n">datetime</span><span class="p">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2012</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">9</span><span class="p">),</span> <span class="n">datetime</span><span class="p">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2014</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">datetime</span><span class="p">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2013</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">21</span><span class="p">),</span> <span class="p">...</span> <span class="p">,</span> <span class="n">datetime</span><span class="p">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2014</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">25</span><span class="p">),</span> <span class="n">datetime</span><span class="p">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2014</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">24</span><span class="p">),</span> <span class="n">datetime</span><span class="p">.</span><span class="n">date</span><span class="p">(</span><span class="mi">2012</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">15</span><span class="p">)]</span>
<span class="mi">2014</span><span class="o">-</span><span class="mi">12</span><span class="o">-</span><span class="mi">06</span>
<span class="mi">2012</span><span class="o">-</span><span class="mi">01</span><span class="o">-</span><span class="mi">08</span>
</code></pre></div></div>

<ul>
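  <li>Dates to numbers! Convert the enrollment date (date) into the time elapsed since enrollment, first as a day difference and then as an integer (int).</li>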
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">days</span> <span class="o">=</span> <span class="p">[]</span> <span class="c1"># empty list to collect the results
</span><span class="n">recent_date</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">dates</span><span class="p">)</span> <span class="c1"># 2014-12-06
# the most recent enrollment date (each enrollment date is subtracted from this reference date)
</span><span class="k">for</span> <span class="n">date</span> <span class="ow">in</span> <span class="n">dates</span><span class="p">:</span>
    <span class="n">day_difference</span> <span class="o">=</span> <span class="n">recent_date</span> <span class="o">-</span> <span class="n">date</span>
    <span class="n">days</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">day_difference</span><span class="p">)</span>
<span class="n">df</span><span class="p">[</span><span class="s">"Customer_For"</span><span class="p">]</span> <span class="o">=</span> <span class="n">days</span>


<span class="n">df</span><span class="p">[</span><span class="s">"Customer_For"</span><span class="p">].</span><span class="n">head</span><span class="p">()</span>

<span class="o">=&gt;</span>

<span class="mi">0</span>   <span class="mi">971</span> <span class="n">days</span>
<span class="mi">1</span>   <span class="mi">125</span> <span class="n">days</span>
<span class="mi">2</span>   <span class="mi">472</span> <span class="n">days</span>
<span class="mi">3</span>    <span class="mi">65</span> <span class="n">days</span>
<span class="mi">4</span>   <span class="mi">321</span> <span class="n">days</span>
<span class="n">Name</span><span class="p">:</span> <span class="n">Customer_For</span><span class="p">,</span> <span class="n">dtype</span><span class="p">:</span> <span class="n">timedelta64</span><span class="p">[</span><span class="n">ns</span><span class="p">]</span>
</code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Convert the timedelta to a numeric type (the result is in nanoseconds: 971 days becomes 83894400000000000)
</span><span class="n">df</span><span class="p">[</span><span class="s">"Customer_For"</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">to_numeric</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s">"Customer_For"</span><span class="p">],</span> <span class="n">errors</span><span class="o">=</span><span class="s">"coerce"</span><span class="p">)</span>

<span class="c1"># errors: there are three options.
# errors = 'ignore' -&gt; if a value cannot be converted to a number, the input is returned unchanged (deprecated since pandas 2.2).
# errors = 'coerce' -&gt; if a value cannot be converted to a number, it is replaced with NaN.
# errors = 'raise' -&gt; if a value cannot be converted to a number, an exception is raised and execution stops.
</span></code></pre></div></div>
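<p>Because <code>pd.to_numeric</code> turns a timedelta column into raw nanoseconds, the resulting values are hard to read (971 days shows up as 83894400000000000). If a plain day count is preferred, a vectorized alternative is the <code>.dt.days</code> accessor. The sketch below is one possible approach, not the method used in this post; the toy dates are taken from the outputs above and the column name <code>Customer_For_Days</code> is hypothetical.</p>

```python
import pandas as pd

# Three enrollment dates taken from the outputs shown earlier.
toy = pd.DataFrame({
    "Dt_Customer": pd.to_datetime(["2012-04-09", "2014-08-03", "2014-12-06"]),
})

# Reference date: the most recent enrollment date in the column.
recent_date = toy["Dt_Customer"].max()

# Subtracting a column from a timestamp gives a timedelta Series in one step,
# so no explicit Python loop is needed.
tenure = recent_date - toy["Dt_Customer"]

# .dt.days extracts the whole-day component as integers.
toy["Customer_For_Days"] = tenure.dt.days
print(toy)
```

<p>The day counts carry the same information as the nanosecond integers, only on a scale that is easier to inspect in <code>head()</code> and <code>describe()</code>.</p>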

<p><strong>Tidying categorical data</strong></p>
<ul>
  <li>Of the columns we saw earlier, the categorical ones are Education and Marital_Status :)</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">[[</span><span class="s">'Education'</span><span class="p">,</span> <span class="s">'Marital_Status'</span><span class="p">]].</span><span class="n">head</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Education</th>
      <th>Marital_Status</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Graduation</td>
      <td>Single</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Graduation</td>
      <td>Single</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Graduation</td>
      <td>Together</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Graduation</td>
      <td>Together</td>
    </tr>
    <tr>
      <th>4</th>
      <td>PhD</td>
      <td>Married</td>
    </tr>
    <tr>
      <th>5</th>
      <td>Master</td>
      <td>Together</td>
    </tr>
    <tr>
      <th>6</th>
      <td>Graduation</td>
      <td>Divorced</td>
    </tr>
    <tr>
      <th>7</th>
      <td>PhD</td>
      <td>Married</td>
    </tr>
    <tr>
      <th>8</th>
      <td>PhD</td>
      <td>Together</td>
    </tr>
    <tr>
      <th>9</th>
      <td>PhD</td>
      <td>Together</td>
    </tr>
  </tbody>
</table>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">[</span><span class="s">"Marital_Status"</span><span class="p">].</span><span class="n">value_counts</span><span class="p">()</span>

<span class="o">=&gt;</span>

<span class="n">Married</span>     <span class="mi">864</span>
<span class="n">Together</span>    <span class="mi">580</span>
<span class="n">Single</span>      <span class="mi">480</span>
<span class="n">Divorced</span>    <span class="mi">232</span>
<span class="n">Widow</span>        <span class="mi">77</span>
<span class="n">Alone</span>         <span class="mi">3</span>
<span class="n">Absurd</span>        <span class="mi">2</span>
<span class="n">YOLO</span>          <span class="mi">2</span>
<span class="n">Name</span><span class="p">:</span> <span class="n">Marital_Status</span><span class="p">,</span> <span class="n">dtype</span><span class="p">:</span> <span class="n">int64</span>
</code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">[</span><span class="s">"Education"</span><span class="p">].</span><span class="n">value_counts</span><span class="p">()</span>

<span class="o">=&gt;</span>

<span class="n">Graduation</span>    <span class="mi">1127</span>
<span class="n">PhD</span>            <span class="mi">486</span>
<span class="n">Master</span>         <span class="mi">370</span>
<span class="mi">2</span><span class="n">n</span> <span class="n">Cycle</span>       <span class="mi">203</span>
<span class="n">Basic</span>           <span class="mi">54</span>
<span class="n">Name</span><span class="p">:</span> <span class="n">Education</span><span class="p">,</span> <span class="n">dtype</span><span class="p">:</span> <span class="n">int64</span>
</code></pre></div></div>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Simplify Marital_Status: living with a partner or living alone
</span><span class="n">df</span><span class="p">[</span><span class="s">"Living_With"</span><span class="p">]</span><span class="o">=</span><span class="p">(</span>
    <span class="n">df</span><span class="p">[</span><span class="s">"Marital_Status"</span><span class="p">]</span>
    <span class="p">.</span><span class="n">replace</span><span class="p">(</span>
      <span class="p">{</span><span class="s">"Married"</span><span class="p">:</span><span class="s">"Partner"</span><span class="p">,</span> 
       <span class="s">"Together"</span><span class="p">:</span><span class="s">"Partner"</span><span class="p">,</span> 
       <span class="s">"Absurd"</span><span class="p">:</span><span class="s">"Alone"</span><span class="p">,</span> 
       <span class="s">"Widow"</span><span class="p">:</span><span class="s">"Alone"</span><span class="p">,</span> 
       <span class="s">"YOLO"</span><span class="p">:</span><span class="s">"Alone"</span><span class="p">,</span> 
       <span class="s">"Divorced"</span><span class="p">:</span><span class="s">"Alone"</span><span class="p">,</span> 
       <span class="s">"Single"</span><span class="p">:</span><span class="s">"Alone"</span>
    <span class="p">})</span>
<span class="p">)</span>

<span class="c1"># Number of children: add Kidhome and Teenhome together instead of keeping them separate.
</span><span class="n">df</span><span class="p">[</span><span class="s">"Children"</span><span class="p">]</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Kidhome"</span><span class="p">]</span><span class="o">+</span><span class="n">df</span><span class="p">[</span><span class="s">"Teenhome"</span><span class="p">]</span>

<span class="c1"># The columns above also let us derive the family size.
</span><span class="n">df</span><span class="p">[</span><span class="s">"Family_Size"</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span>
    <span class="n">df</span><span class="p">[</span><span class="s">"Living_With"</span><span class="p">].</span><span class="n">replace</span><span class="p">({</span><span class="s">"Alone"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s">"Partner"</span><span class="p">:</span><span class="mi">2</span><span class="p">})</span>
    <span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="s">"Children"</span><span class="p">]</span>
<span class="p">)</span>

<span class="c1"># Whether the customer has children or not
</span><span class="n">df</span><span class="p">[</span><span class="s">"Is_Parent"</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="n">df</span><span class="p">.</span><span class="n">Children</span><span class="o">&gt;</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Simplify the education levels
</span><span class="n">df</span><span class="p">[</span><span class="s">"Education"</span><span class="p">]</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Education"</span><span class="p">].</span><span class="n">replace</span><span class="p">({</span><span class="s">"Basic"</span><span class="p">:</span><span class="s">"Undergraduate"</span><span class="p">,</span><span class="s">"2n Cycle"</span><span class="p">:</span><span class="s">"Undergraduate"</span><span class="p">,</span> <span class="s">"Graduation"</span><span class="p">:</span><span class="s">"Graduate"</span><span class="p">,</span> <span class="s">"Master"</span><span class="p">:</span><span class="s">"Postgraduate"</span><span class="p">,</span> <span class="s">"PhD"</span><span class="p">:</span><span class="s">"Postgraduate"</span><span class="p">})</span>
</code></pre></div></div>
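<p>One thing to watch when consolidating categories with a dictionary is that <code>replace</code> silently leaves unknown labels untouched. A small sketch of a way to verify coverage, using <code>Series.map</code> (which yields NaN for labels missing from the dictionary), is shown below. The labels come from the <code>value_counts()</code> output above; the variable names are illustrative assumptions.</p>

```python
import pandas as pd

# One example of each Education label seen in value_counts().
education = pd.Series(["Graduation", "PhD", "Master", "2n Cycle", "Basic"])

mapping = {
    "Basic": "Undergraduate",
    "2n Cycle": "Undergraduate",
    "Graduation": "Graduate",
    "Master": "Postgraduate",
    "PhD": "Postgraduate",
}

# map() returns NaN for any label that is missing from the dict,
# which makes gaps in the mapping easy to detect.
grouped = education.map(mapping)
unmapped = education[grouped.isna()].unique()

print(grouped.tolist())
print(unmapped)
```

<p>An empty <code>unmapped</code> array confirms that every original label was assigned to one of the three groups.</p>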
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Age can be derived from the birth year (2021 is used as the reference year).
</span><span class="n">df</span><span class="p">[</span><span class="s">"Age"</span><span class="p">]</span> <span class="o">=</span> <span class="mi">2021</span><span class="o">-</span><span class="n">df</span><span class="p">[</span><span class="s">"Year_Birth"</span><span class="p">]</span>

<span class="c1"># Add up the spending across all product categories to get the total amount spent, Spent.
</span><span class="n">df</span><span class="p">[</span><span class="s">"Spent"</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s">"MntWines"</span><span class="p">]</span><span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="s">"MntFruits"</span><span class="p">]</span><span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="s">"MntMeatProducts"</span><span class="p">]</span><span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="s">"MntFishProducts"</span><span class="p">]</span><span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="s">"MntSweetProducts"</span><span class="p">]</span><span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="s">"MntGoldProds"</span><span class="p">]</span>
</code></pre></div></div>
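<p>The long chain of <code>+</code> operators can also be written as a row-wise sum over a list of columns. The following sketch assumes that all spending columns share the <code>Mnt</code> prefix, as they do in the <code>.info()</code> output. The first two rows reuse values from the table below; the two <code>MntGoldProds</code> values are not displayed there and are inferred from the Spent totals.</p>

```python
import pandas as pd

# First two customers; MntGoldProds is inferred from the Spent column in the table.
toy = pd.DataFrame({
    "MntWines": [635, 11],
    "MntFruits": [88, 1],
    "MntMeatProducts": [546, 6],
    "MntFishProducts": [172, 2],
    "MntSweetProducts": [88, 1],
    "MntGoldProds": [88, 6],
})

# Collect every spending column by its shared prefix.
amount_columns = [name for name in toy.columns if name.startswith("Mnt")]

# axis=1 sums across columns, giving one total per row.
toy["Spent"] = toy[amount_columns].sum(axis=1)
print(toy["Spent"].tolist())
```

<p>Selecting by prefix has to happen before the columns are renamed to Wines, Fruits, and so on, since the prefix disappears after that step.</p>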
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Every row has the same value, so these columns are not needed
</span><span class="n">df</span><span class="p">.</span><span class="n">Z_CostContact</span><span class="p">.</span><span class="n">value_counts</span><span class="p">()</span>
<span class="n">df</span><span class="p">.</span><span class="n">Z_Revenue</span><span class="p">.</span><span class="n">value_counts</span><span class="p">()</span>

<span class="o">=&gt;</span>

<span class="mi">11</span>    <span class="mi">2216</span>
<span class="n">Name</span><span class="p">:</span> <span class="n">Z_Revenue</span><span class="p">,</span> <span class="n">dtype</span><span class="p">:</span> <span class="n">int64</span>
</code></pre></div></div>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Shorten the column names
</span><span class="n">df</span><span class="o">=</span><span class="n">df</span><span class="p">.</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s">"MntWines"</span><span class="p">:</span> <span class="s">"Wines"</span><span class="p">,</span><span class="s">"MntFruits"</span><span class="p">:</span><span class="s">"Fruits"</span><span class="p">,</span><span class="s">"MntMeatProducts"</span><span class="p">:</span><span class="s">"Meat"</span><span class="p">,</span><span class="s">"MntFishProducts"</span><span class="p">:</span><span class="s">"Fish"</span><span class="p">,</span><span class="s">"MntSweetProducts"</span><span class="p">:</span><span class="s">"Sweets"</span><span class="p">,</span><span class="s">"MntGoldProds"</span><span class="p">:</span><span class="s">"Gold"</span><span class="p">})</span>

<span class="c1"># Drop redundant or unneeded columns
</span><span class="n">to_drop</span> <span class="o">=</span> <span class="p">[</span><span class="s">"Marital_Status"</span><span class="p">,</span> <span class="s">"Dt_Customer"</span><span class="p">,</span> <span class="s">"Z_CostContact"</span><span class="p">,</span> <span class="s">"Z_Revenue"</span><span class="p">,</span> <span class="s">"Year_Birth"</span><span class="p">,</span> <span class="s">"ID"</span><span class="p">]</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="n">to_drop</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>


<span class="n">df</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Education</th>
      <th>Income</th>
      <th>Kidhome</th>
      <th>Teenhome</th>
      <th>Recency</th>
      <th>Wines</th>
      <th>Fruits</th>
      <th>Meat</th>
      <th>Fish</th>
      <th>Sweets</th>
      <th>...</th>
      <th>AcceptedCmp2</th>
      <th>Complain</th>
      <th>Response</th>
      <th>Customer_For</th>
      <th>Living_With</th>
      <th>Children</th>
      <th>Family_Size</th>
      <th>Is_Parent</th>
      <th>Age</th>
      <th>Spent</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Graduate</td>
      <td>58138.0</td>
      <td>0</td>
      <td>0</td>
      <td>58</td>
      <td>635</td>
      <td>88</td>
      <td>546</td>
      <td>172</td>
      <td>88</td>
      <td>...</td>
      <td>0</td>
      <td>0</td>
      <td>1</td>
      <td>83894400000000000</td>
      <td>Alone</td>
      <td>0</td>
      <td>1</td>
      <td>0</td>
      <td>64</td>
      <td>1617</td>
    </tr>
    <tr>
      <th>1</th>
      <td>Graduate</td>
      <td>46344.0</td>
      <td>1</td>
      <td>1</td>
      <td>38</td>
      <td>11</td>
      <td>1</td>
      <td>6</td>
      <td>2</td>
      <td>1</td>
      <td>...</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>10800000000000000</td>
      <td>Alone</td>
      <td>2</td>
      <td>3</td>
      <td>1</td>
      <td>67</td>
      <td>27</td>
    </tr>
    <tr>
      <th>2</th>
      <td>Graduate</td>
      <td>71613.0</td>
      <td>0</td>
      <td>0</td>
      <td>26</td>
      <td>426</td>
      <td>49</td>
      <td>127</td>
      <td>111</td>
      <td>21</td>
      <td>...</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>40780800000000000</td>
      <td>Partner</td>
      <td>0</td>
      <td>2</td>
      <td>0</td>
      <td>56</td>
      <td>776</td>
    </tr>
    <tr>
      <th>3</th>
      <td>Graduate</td>
      <td>26646.0</td>
      <td>1</td>
      <td>0</td>
      <td>26</td>
      <td>11</td>
      <td>4</td>
      <td>20</td>
      <td>10</td>
      <td>3</td>
      <td>...</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>5616000000000000</td>
      <td>Partner</td>
      <td>1</td>
      <td>3</td>
      <td>1</td>
      <td>37</td>
      <td>53</td>
    </tr>
    <tr>
      <th>4</th>
      <td>Postgraduate</td>
      <td>58293.0</td>
      <td>1</td>
      <td>0</td>
      <td>94</td>
      <td>173</td>
      <td>43</td>
      <td>118</td>
      <td>46</td>
      <td>27</td>
      <td>...</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>27734400000000000</td>
      <td>Partner</td>
      <td>1</td>
      <td>3</td>
      <td>1</td>
      <td>40</td>
      <td>422</td>
    </tr>
  </tbody>
</table>

<p><strong>Removing outliers</strong>
 Let's take another look at the data we have cleaned up.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">df</span><span class="p">.</span><span class="n">describe</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Income</th>
      <th>Kidhome</th>
      <th>Teenhome</th>
      <th>Recency</th>
      <th>Wines</th>
      <th>Fruits</th>
      <th>Meat</th>
      <th>Fish</th>
      <th>Sweets</th>
      <th>Gold</th>
      <th>...</th>
      <th>AcceptedCmp1</th>
      <th>AcceptedCmp2</th>
      <th>Complain</th>
      <th>Response</th>
      <th>Customer_For</th>
      <th>Children</th>
      <th>Family_Size</th>
      <th>Is_Parent</th>
      <th>Age</th>
      <th>Spent</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>count</th>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>...</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2.216000e+03</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
      <td>2216.000000</td>
    </tr>
    <tr>
      <th>mean</th>
      <td>52247.251354</td>
      <td>0.441787</td>
      <td>0.505415</td>
      <td>49.012635</td>
      <td>305.091606</td>
      <td>26.356047</td>
      <td>166.995939</td>
      <td>37.637635</td>
      <td>27.028881</td>
      <td>43.965253</td>
      <td>...</td>
      <td>0.064079</td>
      <td>0.013538</td>
      <td>0.009477</td>
      <td>0.150271</td>
      <td>4.423735e+16</td>
      <td>0.947202</td>
      <td>2.592509</td>
      <td>0.714350</td>
      <td>52.179603</td>
      <td>607.075361</td>
    </tr>
    <tr>
      <th>std</th>
      <td>25173.076661</td>
      <td>0.536896</td>
      <td>0.544181</td>
      <td>28.948352</td>
      <td>337.327920</td>
      <td>39.793917</td>
      <td>224.283273</td>
      <td>54.752082</td>
      <td>41.072046</td>
      <td>51.815414</td>
      <td>...</td>
      <td>0.244950</td>
      <td>0.115588</td>
      <td>0.096907</td>
      <td>0.357417</td>
      <td>2.008532e+16</td>
      <td>0.749062</td>
      <td>0.905722</td>
      <td>0.451825</td>
      <td>11.985554</td>
      <td>602.900476</td>
    </tr>
    <tr>
      <th>min</th>
      <td>1730.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>...</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000e+00</td>
      <td>0.000000</td>
      <td>1.000000</td>
      <td>0.000000</td>
      <td>25.000000</td>
      <td>5.000000</td>
    </tr>
    <tr>
      <th>25%</th>
      <td>35303.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>24.000000</td>
      <td>24.000000</td>
      <td>2.000000</td>
      <td>16.000000</td>
      <td>3.000000</td>
      <td>1.000000</td>
      <td>9.000000</td>
      <td>...</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>2.937600e+16</td>
      <td>0.000000</td>
      <td>2.000000</td>
      <td>0.000000</td>
      <td>44.000000</td>
      <td>69.000000</td>
    </tr>
    <tr>
      <th>50%</th>
      <td>51381.500000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>49.000000</td>
      <td>174.500000</td>
      <td>8.000000</td>
      <td>68.000000</td>
      <td>12.000000</td>
      <td>8.000000</td>
      <td>24.500000</td>
      <td>...</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>4.432320e+16</td>
      <td>1.000000</td>
      <td>3.000000</td>
      <td>1.000000</td>
      <td>51.000000</td>
      <td>396.500000</td>
    </tr>
    <tr>
      <th>75%</th>
      <td>68522.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>74.000000</td>
      <td>505.000000</td>
      <td>33.000000</td>
      <td>232.250000</td>
      <td>50.000000</td>
      <td>33.000000</td>
      <td>56.000000</td>
      <td>...</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>5.927040e+16</td>
      <td>1.000000</td>
      <td>3.000000</td>
      <td>1.000000</td>
      <td>62.000000</td>
      <td>1048.000000</td>
    </tr>
    <tr>
      <th>max</th>
      <td>666666.000000</td>
      <td>2.000000</td>
      <td>2.000000</td>
      <td>99.000000</td>
      <td>1493.000000</td>
      <td>199.000000</td>
      <td>1725.000000</td>
      <td>259.000000</td>
      <td>262.000000</td>
      <td>321.000000</td>
      <td>...</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>9.184320e+16</td>
      <td>3.000000</td>
      <td>5.000000</td>
      <td>1.000000</td>
      <td>128.000000</td>
      <td>2525.000000</td>
    </tr>
  </tbody>
</table>

<p>-&gt; Look at the max of Income (666666) and Age (128). Outliers seem to be hiding there!</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Set the colors
</span><span class="n">sns</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">rc</span><span class="o">=</span><span class="p">{</span><span class="s">"axes.facecolor"</span><span class="p">:</span><span class="s">"#FFF9ED"</span><span class="p">,</span><span class="s">"figure.facecolor"</span><span class="p">:</span><span class="s">"#FFF9ED"</span><span class="p">})</span>
<span class="n">pallet</span> <span class="o">=</span> <span class="p">[</span><span class="s">"#682F2F"</span><span class="p">,</span> <span class="s">"#9E726F"</span><span class="p">,</span> <span class="s">"#D6B2B1"</span><span class="p">,</span> <span class="s">"#B9C0C9"</span><span class="p">,</span> <span class="s">"#9F8A78"</span><span class="p">,</span> <span class="s">"#F3AB60"</span><span class="p">]</span>
<span class="n">cmap</span> <span class="o">=</span> <span class="n">colors</span><span class="p">.</span><span class="n">ListedColormap</span><span class="p">([</span><span class="s">"#682F2F"</span><span class="p">,</span> <span class="s">"#9E726F"</span><span class="p">,</span> <span class="s">"#D6B2B1"</span><span class="p">,</span> <span class="s">"#B9C0C9"</span><span class="p">,</span> <span class="s">"#9F8A78"</span><span class="p">,</span> <span class="s">"#F3AB60"</span><span class="p">])</span>

<span class="c1"># Plot the pairwise relationships among the selected columns below.
</span><span class="n">To_Plot</span> <span class="o">=</span> <span class="p">[</span> <span class="s">"Income"</span><span class="p">,</span> <span class="s">"Recency"</span><span class="p">,</span> <span class="s">"Customer_For"</span><span class="p">,</span> <span class="s">"Age"</span><span class="p">,</span> <span class="s">"Spent"</span><span class="p">,</span> <span class="s">"Is_Parent"</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Relative Plot Of Some Selected Features: A Data Subset"</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">pairplot</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="n">To_Plot</span><span class="p">],</span> <span class="n">hue</span><span class="o">=</span> <span class="s">"Is_Parent"</span><span class="p">,</span> <span class="n">palette</span><span class="o">=</span><span class="p">([</span><span class="s">"#682F2F"</span><span class="p">,</span><span class="s">"#F3AB60"</span><span class="p">]))</span>
</code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Remove outliers (Age, Income)
</span><span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">[(</span><span class="n">df</span><span class="p">[</span><span class="s">"Age"</span><span class="p">]</span><span class="o">&lt;</span> <span class="mi">90</span><span class="p">)]</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">[(</span><span class="n">df</span><span class="p">[</span><span class="s">"Income"</span><span class="p">]</span> <span class="o">&lt;</span> <span class="mi">600000</span><span class="p">)]</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Number of rows after removing outliers:"</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">df</span><span class="p">))</span>


<span class="c1"># These outliers can also be spotted from a box plot or the distribution.
</span>
<span class="o">=&gt;</span>

<span class="n">Number</span> <span class="n">of</span> <span class="n">rows</span> <span class="n">after</span> <span class="n">removing</span> <span class="n">outliers</span><span class="p">:</span> <span class="mi">2212</span>
</code></pre></div></div>
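
<p>The thresholds above (Age under 90, Income under 600000) were chosen by eye. As a rough sketch of how a box plot flags the same kind of points, the snippet below applies the usual 1.5 × IQR fences. The <code>toy</code> frame, the <code>iqr_bounds</code> helper, and all values in it are made up for illustration and are not part of the original dataset.</p>

```python
import pandas as pd

# Hypothetical toy data: one implausible age and one extreme income (same row).
toy = pd.DataFrame({
    "Age": [25, 31, 38, 44, 52, 60, 67, 128],
    "Income": [21000, 34000, 42000, 51000, 58000, 66000, 73000, 666666],
})


def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences a standard box plot uses for whiskers."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# One boolean column per feature: True where the value lies inside the fences.
inside = pd.DataFrame(
    {col: toy[col].between(*iqr_bounds(toy[col])) for col in ["Age", "Income"]}
)

# Keep only the rows that are inside the fences for every checked feature.
filtered = toy[inside.all(axis=1)]
print("rows before:", len(toy), "rows after:", len(filtered))
```

<p>Fixed cut-offs, as used in this post, are easier to explain; the IQR rule adapts to the data but may drop more rows on skewed features such as income.</p>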

<h3 id="-correlation-coefficients">💡 Correlation Coefficients</h3>

<ul>
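  <li>What is correlation?
    <ul>
      <li>Correlation measures the strength of the “linear” relationship between two variables.</li>
      <li>It ranges from -1 to 1: the closer it is to 1, the stronger the positive linear relationship, and the closer it is to -1, the stronger the negative linear relationship.</li>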
    </ul>
  </li>
</ul>
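
<p>To make the definition concrete, here is a minimal sketch of the Pearson coefficient computed by hand with NumPy. The <code>pearson_r</code> helper and the sample arrays are illustrative assumptions. The last case shows why “linear” matters: a perfect parabola has a clear relationship, yet its coefficient is zero.</p>

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r_up = pearson_r(x, 2 * x + 1)            # perfectly increasing line
r_down = pearson_r(x, -3 * x + 10)        # perfectly decreasing line
r_curve = pearson_r(x - 3, (x - 3) ** 2)  # symmetric parabola, not linear

print(r_up, r_down, r_curve)
```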

<p><img src="https://velog.velcdn.com/images/yy2hi/post/b8e521b9-0202-4d2f-ae12-09845888f741/image.png" alt="" /></p>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/cb466b6b-3e21-48b5-bd7e-a0f57f9a12fd/image.png" alt="" /></p>

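<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># numeric_only=True (pandas 1.5+) leaves out the object columns Education and Living_With
</span><span class="n">df</span><span class="p">.</span><span class="n">corr</span><span class="p">(</span><span class="n">numeric_only</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>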
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Income</th>
      <th>Kidhome</th>
      <th>Teenhome</th>
      <th>Recency</th>
      <th>Wines</th>
      <th>Fruits</th>
      <th>Meat</th>
      <th>Fish</th>
      <th>Sweets</th>
      <th>Gold</th>
      <th>...</th>
      <th>AcceptedCmp1</th>
      <th>AcceptedCmp2</th>
      <th>Complain</th>
      <th>Response</th>
      <th>Customer_For</th>
      <th>Children</th>
      <th>Family_Size</th>
      <th>Is_Parent</th>
      <th>Age</th>
      <th>Spent</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Income</th>
      <td>1.000000</td>
      <td>-0.514523</td>
      <td>0.034565</td>
      <td>0.007965</td>
      <td>0.688209</td>
      <td>0.507354</td>
      <td>0.692279</td>
      <td>0.520040</td>
      <td>0.523599</td>
      <td>0.388299</td>
      <td>...</td>
      <td>0.327524</td>
      <td>0.104036</td>
      <td>-0.027900</td>
      <td>0.161387</td>
      <td>-0.027892</td>
      <td>-0.343529</td>
      <td>-0.286638</td>
      <td>-0.403132</td>
      <td>0.199977</td>
      <td>0.792740</td>
    </tr>
    <tr>
      <th>Kidhome</th>
      <td>-0.514523</td>
      <td>1.000000</td>
      <td>-0.039066</td>
      <td>0.010623</td>
      <td>-0.497203</td>
      <td>-0.373258</td>
      <td>-0.439031</td>
      <td>-0.388643</td>
      <td>-0.377843</td>
      <td>-0.354922</td>
      <td>...</td>
      <td>-0.174261</td>
      <td>-0.081911</td>
      <td>0.037067</td>
      <td>-0.077901</td>
      <td>-0.057731</td>
      <td>0.688081</td>
      <td>0.583250</td>
      <td>0.520355</td>
      <td>-0.237497</td>
      <td>-0.557949</td>
    </tr>
    <tr>
      <th>Teenhome</th>
      <td>0.034565</td>
      <td>-0.039066</td>
      <td>1.000000</td>
      <td>0.014392</td>
      <td>0.003945</td>
      <td>-0.175905</td>
      <td>-0.261134</td>
      <td>-0.205235</td>
      <td>-0.163107</td>
      <td>-0.018579</td>
      <td>...</td>
      <td>-0.145198</td>
      <td>-0.015633</td>
      <td>0.007746</td>
      <td>-0.154402</td>
      <td>0.008986</td>
      <td>0.698199</td>
      <td>0.594481</td>
      <td>0.587993</td>
      <td>0.361932</td>
      <td>-0.137964</td>
    </tr>
    <tr>
      <th>Recency</th>
      <td>0.007965</td>
      <td>0.010623</td>
      <td>0.014392</td>
      <td>1.000000</td>
      <td>0.015981</td>
      <td>-0.005257</td>
      <td>0.022914</td>
      <td>0.000788</td>
      <td>0.025244</td>
      <td>0.018148</td>
      <td>...</td>
      <td>-0.021147</td>
      <td>-0.001429</td>
      <td>0.005713</td>
      <td>-0.200114</td>
      <td>0.030748</td>
      <td>0.018062</td>
      <td>0.014717</td>
      <td>0.002189</td>
      <td>0.015694</td>
      <td>0.020479</td>
    </tr>
    <tr>
      <th>Wines</th>
      <td>0.688209</td>
      <td>-0.497203</td>
      <td>0.003945</td>
      <td>0.015981</td>
      <td>1.000000</td>
      <td>0.385844</td>
      <td>0.568081</td>
      <td>0.396915</td>
      <td>0.389583</td>
      <td>0.391461</td>
      <td>...</td>
      <td>0.351610</td>
      <td>0.206309</td>
      <td>-0.036420</td>
      <td>0.246320</td>
      <td>0.148745</td>
      <td>-0.353356</td>
      <td>-0.296702</td>
      <td>-0.341994</td>
      <td>0.164615</td>
      <td>0.892996</td>
    </tr>
    <tr>
      <th>Fruits</th>
      <td>0.507354</td>
      <td>-0.373258</td>
      <td>-0.175905</td>
      <td>-0.005257</td>
      <td>0.385844</td>
      <td>1.000000</td>
      <td>0.546740</td>
      <td>0.593038</td>
      <td>0.571474</td>
      <td>0.393459</td>
      <td>...</td>
      <td>0.192417</td>
      <td>-0.009924</td>
      <td>-0.002956</td>
      <td>0.123007</td>
      <td>0.059828</td>
      <td>-0.395161</td>
      <td>-0.341414</td>
      <td>-0.410657</td>
      <td>0.013447</td>
      <td>0.612129</td>
    </tr>
    <tr>
      <th>Meat</th>
      <td>0.692279</td>
      <td>-0.439031</td>
      <td>-0.261134</td>
      <td>0.022914</td>
      <td>0.568081</td>
      <td>0.546740</td>
      <td>1.000000</td>
      <td>0.572986</td>
      <td>0.534624</td>
      <td>0.357556</td>
      <td>...</td>
      <td>0.313379</td>
      <td>0.043549</td>
      <td>-0.021017</td>
      <td>0.237966</td>
      <td>0.071381</td>
      <td>-0.504176</td>
      <td>-0.429948</td>
      <td>-0.574147</td>
      <td>0.033622</td>
      <td>0.845543</td>
    </tr>
    <tr>
      <th>Fish</th>
      <td>0.520040</td>
      <td>-0.388643</td>
      <td>-0.205235</td>
      <td>0.000788</td>
      <td>0.396915</td>
      <td>0.593038</td>
      <td>0.572986</td>
      <td>1.000000</td>
      <td>0.583484</td>
      <td>0.426299</td>
      <td>...</td>
      <td>0.261712</td>
      <td>0.002322</td>
      <td>-0.019098</td>
      <td>0.108135</td>
      <td>0.078042</td>
      <td>-0.427482</td>
      <td>-0.363522</td>
      <td>-0.449596</td>
      <td>0.041154</td>
      <td>0.641884</td>
    </tr>
    <tr>
      <th>Sweets</th>
      <td>0.523599</td>
      <td>-0.377843</td>
      <td>-0.163107</td>
      <td>0.025244</td>
      <td>0.389583</td>
      <td>0.571474</td>
      <td>0.534624</td>
      <td>0.583484</td>
      <td>1.000000</td>
      <td>0.356754</td>
      <td>...</td>
      <td>0.245113</td>
      <td>0.010142</td>
      <td>-0.020569</td>
      <td>0.116059</td>
      <td>0.076345</td>
      <td>-0.389152</td>
      <td>-0.330705</td>
      <td>-0.402064</td>
      <td>0.021516</td>
      <td>0.606652</td>
    </tr>
    <tr>
      <th>Gold</th>
      <td>0.388299</td>
      <td>-0.354922</td>
      <td>-0.018579</td>
      <td>0.018148</td>
      <td>0.391461</td>
      <td>0.393459</td>
      <td>0.357556</td>
      <td>0.426299</td>
      <td>0.356754</td>
      <td>1.000000</td>
      <td>...</td>
      <td>0.170853</td>
      <td>0.050976</td>
      <td>-0.030166</td>
      <td>0.141096</td>
      <td>0.145632</td>
      <td>-0.267776</td>
      <td>-0.235826</td>
      <td>-0.245380</td>
      <td>0.059779</td>
      <td>0.527101</td>
    </tr>
    <tr>
      <th>NumDealsPurchases</th>
      <td>-0.108207</td>
      <td>0.216594</td>
      <td>0.386805</td>
      <td>0.002591</td>
      <td>0.009117</td>
      <td>-0.134191</td>
      <td>-0.121128</td>
      <td>-0.143147</td>
      <td>-0.121395</td>
      <td>0.053047</td>
      <td>...</td>
      <td>-0.127586</td>
      <td>-0.038064</td>
      <td>0.003744</td>
      <td>0.003226</td>
      <td>0.199994</td>
      <td>0.436072</td>
      <td>0.373986</td>
      <td>0.388593</td>
      <td>0.066156</td>
      <td>-0.065571</td>
    </tr>
    <tr>
      <th>NumWebPurchases</th>
      <td>0.459265</td>
      <td>-0.372327</td>
      <td>0.162239</td>
      <td>-0.005680</td>
      <td>0.553663</td>
      <td>0.302301</td>
      <td>0.306841</td>
      <td>0.299428</td>
      <td>0.333608</td>
      <td>0.407873</td>
      <td>...</td>
      <td>0.159100</td>
      <td>0.034722</td>
      <td>-0.013524</td>
      <td>0.151084</td>
      <td>0.171834</td>
      <td>-0.148938</td>
      <td>-0.121879</td>
      <td>-0.073473</td>
      <td>0.162265</td>
      <td>0.529095</td>
    </tr>
    <tr>
      <th>NumCatalogPurchases</th>
      <td>0.696589</td>
      <td>-0.504598</td>
      <td>-0.112477</td>
      <td>0.024197</td>
      <td>0.634237</td>
      <td>0.485611</td>
      <td>0.733787</td>
      <td>0.532241</td>
      <td>0.494623</td>
      <td>0.441656</td>
      <td>...</td>
      <td>0.309130</td>
      <td>0.099931</td>
      <td>-0.018675</td>
      <td>0.219912</td>
      <td>0.091391</td>
      <td>-0.443199</td>
      <td>-0.372319</td>
      <td>-0.452734</td>
      <td>0.125856</td>
      <td>0.780250</td>
    </tr>
    <tr>
      <th>NumStorePurchases</th>
      <td>0.631424</td>
      <td>-0.501863</td>
      <td>0.049212</td>
      <td>-0.000460</td>
      <td>0.640219</td>
      <td>0.459875</td>
      <td>0.486349</td>
      <td>0.457885</td>
      <td>0.455150</td>
      <td>0.390693</td>
      <td>...</td>
      <td>0.178462</td>
      <td>0.085146</td>
      <td>-0.011947</td>
      <td>0.035563</td>
      <td>0.104245</td>
      <td>-0.323823</td>
      <td>-0.265916</td>
      <td>-0.284891</td>
      <td>0.138998</td>
      <td>0.675981</td>
    </tr>
    <tr>
      <th>NumWebVisitsMonth</th>
      <td>-0.650257</td>
      <td>0.447258</td>
      <td>0.130985</td>
      <td>-0.018965</td>
      <td>-0.321616</td>
      <td>-0.417741</td>
      <td>-0.539194</td>
      <td>-0.446151</td>
      <td>-0.422289</td>
      <td>-0.245973</td>
      <td>...</td>
      <td>-0.195200</td>
      <td>-0.007483</td>
      <td>0.020820</td>
      <td>-0.002625</td>
      <td>0.255436</td>
      <td>0.415558</td>
      <td>0.345316</td>
      <td>0.475856</td>
      <td>-0.120282</td>
      <td>-0.498769</td>
    </tr>
    <tr>
      <th>AcceptedCmp3</th>
      <td>-0.015152</td>
      <td>0.016135</td>
      <td>-0.042797</td>
      <td>-0.032361</td>
      <td>0.061360</td>
      <td>0.014644</td>
      <td>0.018416</td>
      <td>-0.000276</td>
      <td>0.001660</td>
      <td>0.125557</td>
      <td>...</td>
      <td>0.095562</td>
      <td>0.071649</td>
      <td>0.009620</td>
      <td>0.253849</td>
      <td>-0.006858</td>
      <td>-0.019518</td>
      <td>-0.026126</td>
      <td>-0.005472</td>
      <td>-0.061097</td>
      <td>0.053037</td>
    </tr>
    <tr>
      <th>AcceptedCmp4</th>
      <td>0.219633</td>
      <td>-0.162111</td>
      <td>0.038168</td>
      <td>0.017520</td>
      <td>0.373349</td>
      <td>0.006598</td>
      <td>0.091677</td>
      <td>0.016058</td>
      <td>0.029206</td>
      <td>0.024305</td>
      <td>...</td>
      <td>0.242681</td>
      <td>0.295015</td>
      <td>-0.027030</td>
      <td>0.180032</td>
      <td>0.013873</td>
      <td>-0.088427</td>
      <td>-0.076698</td>
      <td>-0.076936</td>
      <td>0.070035</td>
      <td>0.249118</td>
    </tr>
    <tr>
      <th>AcceptedCmp5</th>
      <td>0.395569</td>
      <td>-0.204582</td>
      <td>-0.190119</td>
      <td>0.000233</td>
      <td>0.472889</td>
      <td>0.208990</td>
      <td>0.375252</td>
      <td>0.194793</td>
      <td>0.258417</td>
      <td>0.176628</td>
      <td>...</td>
      <td>0.409420</td>
      <td>0.222918</td>
      <td>-0.008378</td>
      <td>0.324891</td>
      <td>-0.023459</td>
      <td>-0.284635</td>
      <td>-0.225671</td>
      <td>-0.346693</td>
      <td>-0.019025</td>
      <td>0.468695</td>
    </tr>
    <tr>
      <th>AcceptedCmp1</th>
      <td>0.327524</td>
      <td>-0.174261</td>
      <td>-0.145198</td>
      <td>-0.021147</td>
      <td>0.351610</td>
      <td>0.192417</td>
      <td>0.313379</td>
      <td>0.261712</td>
      <td>0.245113</td>
      <td>0.170853</td>
      <td>...</td>
      <td>1.000000</td>
      <td>0.176595</td>
      <td>-0.025018</td>
      <td>0.297212</td>
      <td>-0.037171</td>
      <td>-0.230291</td>
      <td>-0.185711</td>
      <td>-0.279387</td>
      <td>0.011941</td>
      <td>0.381354</td>
    </tr>
    <tr>
      <th>AcceptedCmp2</th>
      <td>0.104036</td>
      <td>-0.081911</td>
      <td>-0.015633</td>
      <td>-0.001429</td>
      <td>0.206309</td>
      <td>-0.009924</td>
      <td>0.043549</td>
      <td>0.002322</td>
      <td>0.010142</td>
      <td>0.050976</td>
      <td>...</td>
      <td>0.176595</td>
      <td>1.000000</td>
      <td>-0.011200</td>
      <td>0.169234</td>
      <td>0.006030</td>
      <td>-0.070037</td>
      <td>-0.059505</td>
      <td>-0.081575</td>
      <td>0.007821</td>
      <td>0.136336</td>
    </tr>
    <tr>
      <th>Complain</th>
      <td>-0.027900</td>
      <td>0.037067</td>
      <td>0.007746</td>
      <td>0.005713</td>
      <td>-0.036420</td>
      <td>-0.002956</td>
      <td>-0.021017</td>
      <td>-0.019098</td>
      <td>-0.020569</td>
      <td>-0.030166</td>
      <td>...</td>
      <td>-0.025018</td>
      <td>-0.011200</td>
      <td>1.000000</td>
      <td>-0.000145</td>
      <td>0.041662</td>
      <td>0.032181</td>
      <td>0.027081</td>
      <td>0.018124</td>
      <td>0.004602</td>
      <td>-0.034135</td>
    </tr>
    <tr>
      <th>Response</th>
      <td>0.161387</td>
      <td>-0.077901</td>
      <td>-0.154402</td>
      <td>-0.200114</td>
      <td>0.246320</td>
      <td>0.123007</td>
      <td>0.237966</td>
      <td>0.108135</td>
      <td>0.116059</td>
      <td>0.141096</td>
      <td>...</td>
      <td>0.297212</td>
      <td>0.169234</td>
      <td>-0.000145</td>
      <td>1.000000</td>
      <td>0.175743</td>
      <td>-0.167937</td>
      <td>-0.218383</td>
      <td>-0.203885</td>
      <td>-0.020937</td>
      <td>0.264443</td>
    </tr>
    <tr>
      <th>Customer_For</th>
      <td>-0.027892</td>
      <td>-0.057731</td>
      <td>0.008986</td>
      <td>0.030748</td>
      <td>0.148745</td>
      <td>0.059828</td>
      <td>0.071381</td>
      <td>0.078042</td>
      <td>0.076345</td>
      <td>0.145632</td>
      <td>...</td>
      <td>-0.037171</td>
      <td>0.006030</td>
      <td>0.041662</td>
      <td>0.175743</td>
      <td>1.000000</td>
      <td>-0.034836</td>
      <td>-0.027803</td>
      <td>-0.005195</td>
      <td>-0.021057</td>
      <td>0.138590</td>
    </tr>
    <tr>
      <th>Children</th>
      <td>-0.343529</td>
      <td>0.688081</td>
      <td>0.698199</td>
      <td>0.018062</td>
      <td>-0.353356</td>
      <td>-0.395161</td>
      <td>-0.504176</td>
      <td>-0.427482</td>
      <td>-0.389152</td>
      <td>-0.267776</td>
      <td>...</td>
      <td>-0.230291</td>
      <td>-0.070037</td>
      <td>0.032181</td>
      <td>-0.167937</td>
      <td>-0.034836</td>
      <td>1.000000</td>
      <td>0.849574</td>
      <td>0.799802</td>
      <td>0.092676</td>
      <td>-0.499931</td>
    </tr>
    <tr>
      <th>Family_Size</th>
      <td>-0.286638</td>
      <td>0.583250</td>
      <td>0.594481</td>
      <td>0.014717</td>
      <td>-0.296702</td>
      <td>-0.341414</td>
      <td>-0.429948</td>
      <td>-0.363522</td>
      <td>-0.330705</td>
      <td>-0.235826</td>
      <td>...</td>
      <td>-0.185711</td>
      <td>-0.059505</td>
      <td>0.027081</td>
      <td>-0.218383</td>
      <td>-0.027803</td>
      <td>0.849574</td>
      <td>1.000000</td>
      <td>0.692370</td>
      <td>0.078593</td>
      <td>-0.424497</td>
    </tr>
    <tr>
      <th>Is_Parent</th>
      <td>-0.403132</td>
      <td>0.520355</td>
      <td>0.587993</td>
      <td>0.002189</td>
      <td>-0.341994</td>
      <td>-0.410657</td>
      <td>-0.574147</td>
      <td>-0.449596</td>
      <td>-0.402064</td>
      <td>-0.245380</td>
      <td>...</td>
      <td>-0.279387</td>
      <td>-0.081575</td>
      <td>0.018124</td>
      <td>-0.203885</td>
      <td>-0.005195</td>
      <td>0.799802</td>
      <td>0.692370</td>
      <td>1.000000</td>
      <td>-0.011841</td>
      <td>-0.521603</td>
    </tr>
    <tr>
      <th>Age</th>
      <td>0.199977</td>
      <td>-0.237497</td>
      <td>0.361932</td>
      <td>0.015694</td>
      <td>0.164615</td>
      <td>0.013447</td>
      <td>0.033622</td>
      <td>0.041154</td>
      <td>0.021516</td>
      <td>0.059779</td>
      <td>...</td>
      <td>0.011941</td>
      <td>0.007821</td>
      <td>0.004602</td>
      <td>-0.020937</td>
      <td>-0.021057</td>
      <td>0.092676</td>
      <td>0.078593</td>
      <td>-0.011841</td>
      <td>1.000000</td>
      <td>0.115901</td>
    </tr>
    <tr>
      <th>Spent</th>
      <td>0.792740</td>
      <td>-0.557949</td>
      <td>-0.137964</td>
      <td>0.020479</td>
      <td>0.892996</td>
      <td>0.612129</td>
      <td>0.845543</td>
      <td>0.641884</td>
      <td>0.606652</td>
      <td>0.527101</td>
      <td>...</td>
      <td>0.381354</td>
      <td>0.136336</td>
      <td>-0.034135</td>
      <td>0.264443</td>
      <td>0.138590</td>
      <td>-0.499931</td>
      <td>-0.424497</td>
      <td>-0.521603</td>
      <td>0.115901</td>
      <td>1.000000</td>
    </tr>
  </tbody>
</table>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Show the correlation matrix as a heatmap (numeric columns only)
</span><span class="n">corrmat</span><span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">corr</span><span class="p">(</span><span class="n">numeric_only</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="mi">20</span><span class="p">))</span>  
<span class="n">sns</span><span class="p">.</span><span class="n">heatmap</span><span class="p">(</span><span class="n">corrmat</span><span class="p">,</span> <span class="n">annot</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s">"Spectral"</span><span class="p">,</span> <span class="n">center</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/12c70d73-8a50-4a3a-b825-669716d848b9/image.png" alt="" /></p>
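
<p>A 20 × 20 annotated heatmap is hard to scan, so it can help to list the strongest pairs as well. The sketch below is one possible way to do that; the <code>toy</code> frame reuses a few column names from this post, but its values are invented for illustration.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the numbers are made up.
toy = pd.DataFrame({
    "Income": [20, 35, 50, 65, 80, 95],
    "Spent": [5, 12, 22, 30, 41, 52],
    "Kidhome": [2, 2, 1, 1, 0, 0],
    "Recency": [30, 10, 50, 20, 60, 40],
})

corr = toy.corr()

# Keep the strict upper triangle so each pair appears once and the diagonal is dropped.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flatten to (row, column) pairs; dropna() removes the masked cells explicitly.
pairs = upper.stack().dropna()

# Sort by absolute strength, strongest first, keeping the sign in the values.
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```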

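<h2 id="-데이터-전처리하기-preprocessing">👣 Data Preprocessing</h2>
<ul>
  <li>Convert categorical data to numeric data</li>
  <li>Apply scaling so that the features are on a comparable scale</li>
  <li>Reduce the dimensionality of the features fed to the model</li>
</ul>

<p>Preprocessing the categorical data</p>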

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">dtypes</span>

<span class="o">=&gt;</span>

<span class="n">Education</span>               <span class="nb">object</span>
<span class="n">Income</span>                 <span class="n">float64</span>
<span class="n">Kidhome</span>                  <span class="n">int64</span>
<span class="n">Teenhome</span>                 <span class="n">int64</span>
<span class="n">Recency</span>                  <span class="n">int64</span>
<span class="n">Wines</span>                    <span class="n">int64</span>
<span class="n">Fruits</span>                   <span class="n">int64</span>
<span class="n">Meat</span>                     <span class="n">int64</span>
<span class="n">Fish</span>                     <span class="n">int64</span>
<span class="n">Sweets</span>                   <span class="n">int64</span>
<span class="n">Gold</span>                     <span class="n">int64</span>
<span class="n">NumDealsPurchases</span>        <span class="n">int64</span>
<span class="n">NumWebPurchases</span>          <span class="n">int64</span>
<span class="n">NumCatalogPurchases</span>      <span class="n">int64</span>
<span class="n">NumStorePurchases</span>        <span class="n">int64</span>
<span class="n">NumWebVisitsMonth</span>        <span class="n">int64</span>
<span class="n">AcceptedCmp3</span>             <span class="n">int64</span>
<span class="n">AcceptedCmp4</span>             <span class="n">int64</span>
<span class="n">AcceptedCmp5</span>             <span class="n">int64</span>
<span class="n">AcceptedCmp1</span>             <span class="n">int64</span>
<span class="n">AcceptedCmp2</span>             <span class="n">int64</span>
<span class="n">Complain</span>                 <span class="n">int64</span>
<span class="n">Response</span>                 <span class="n">int64</span>
<span class="n">Customer_For</span>             <span class="n">int64</span>
<span class="n">Living_With</span>             <span class="nb">object</span>
<span class="n">Children</span>                 <span class="n">int64</span>
<span class="n">Family_Size</span>              <span class="n">int64</span>
<span class="n">Is_Parent</span>                <span class="n">int64</span>
<span class="n">Age</span>                      <span class="n">int64</span>
<span class="n">Spent</span>                    <span class="n">int64</span>
<span class="n">dtype</span><span class="p">:</span> <span class="nb">object</span>
</code></pre></div></div>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Get the columns whose dtype is object
</span><span class="n">s</span> <span class="o">=</span> <span class="p">(</span><span class="n">df</span><span class="p">.</span><span class="n">dtypes</span> <span class="o">==</span> <span class="s">'object'</span><span class="p">)</span>
<span class="n">s</span><span class="p">[</span><span class="n">s</span><span class="p">.</span><span class="n">values</span> <span class="o">==</span> <span class="bp">True</span><span class="p">].</span><span class="n">index</span>  <span class="c1"># explicit comparison
</span><span class="n">s</span><span class="p">[</span><span class="n">s</span><span class="p">].</span><span class="n">index</span>  <span class="c1"># equivalent, shorter boolean indexing
</span>
<span class="o">=&gt;</span>

<span class="n">Index</span><span class="p">([</span><span class="s">'Education'</span><span class="p">,</span> <span class="s">'Living_With'</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s">'object'</span><span class="p">)</span>
</code></pre></div></div>
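
<p>pandas also offers <code>select_dtypes</code> for this kind of lookup. The sketch below selects every non-numeric column instead of comparing against <code>'object'</code>, on the assumption that this is what “categorical” means here. The <code>toy</code> frame is a stand-in, and its <code>Living_With</code> values are invented.</p>

```python
import pandas as pd

# Hypothetical stand-in for df with two text columns and two numeric ones.
toy = pd.DataFrame({
    "Education": ["Graduate", "Postgraduate", "Undergraduate"],
    "Income": [58138.0, 46344.0, 71613.0],
    "Living_With": ["Alone", "Partner", "Partner"],
    "Age": [64, 67, 56],
})

# Select everything that is not numeric; column order follows the frame.
object_cols = list(toy.select_dtypes(exclude="number").columns)
print("Categorical columns:", object_cols)
```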

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Categorical columns
</span><span class="n">s</span> <span class="o">=</span> <span class="p">(</span><span class="n">df</span><span class="p">.</span><span class="n">dtypes</span> <span class="o">==</span> <span class="s">'object'</span><span class="p">)</span>
<span class="n">object_cols</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="n">s</span><span class="p">].</span><span class="n">index</span><span class="p">)</span>

<span class="k">print</span><span class="p">(</span><span class="s">"Categorical columns:"</span><span class="p">,</span> <span class="n">object_cols</span><span class="p">)</span>

<span class="o">=&gt;</span>

<span class="n">Categorical</span> <span class="n">columns</span><span class="p">:</span> <span class="p">[</span><span class="s">'Education'</span><span class="p">,</span> <span class="s">'Living_With'</span><span class="p">]</span>
</code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Apply a label encoder to the categorical data
# What is a label encoder?
</span>   <span class="c1"># It maps each category of a categorical column to an integer.
</span><span class="kn">from</span> <span class="nn">sklearn.preprocessing</span> <span class="kn">import</span> <span class="n">LabelEncoder</span>

<span class="c1"># Encode Education
</span><span class="n">LE</span><span class="o">=</span><span class="n">LabelEncoder</span><span class="p">()</span>
<span class="n">df</span><span class="p">[</span><span class="s">'Education'</span><span class="p">]</span><span class="o">=</span><span class="n">df</span><span class="p">[[</span><span class="s">'Education'</span><span class="p">]].</span><span class="nb">apply</span><span class="p">(</span><span class="n">LE</span><span class="p">.</span><span class="n">fit_transform</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">[</span><span class="s">'Education'</span><span class="p">].</span><span class="n">head</span><span class="p">()</span>

<span class="o">=&gt;</span>

<span class="mi">0</span>    <span class="mi">0</span>
<span class="mi">1</span>    <span class="mi">0</span>
<span class="mi">2</span>    <span class="mi">0</span>
<span class="mi">3</span>    <span class="mi">0</span>
<span class="mi">4</span>    <span class="mi">1</span>
<span class="n">Name</span><span class="p">:</span> <span class="n">Education</span><span class="p">,</span> <span class="n">dtype</span><span class="p">:</span> <span class="n">int64</span>
</code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">LE</span><span class="p">.</span><span class="n">classes_</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">LE</span><span class="p">.</span><span class="n">inverse_transform</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">]))</span>

<span class="o">=&gt;</span>

<span class="p">[</span><span class="s">'Graduate'</span> <span class="s">'Postgraduate'</span> <span class="s">'Undergraduate'</span><span class="p">]</span>
<span class="p">[</span><span class="s">'Postgraduate'</span> <span class="s">'Undergraduate'</span> <span class="s">'Undergraduate'</span><span class="p">]</span>
</code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Convert the Living_With categories to numbers
</span><span class="n">df</span><span class="p">[</span><span class="s">'Living_With'</span><span class="p">]</span><span class="o">=</span><span class="n">df</span><span class="p">[[</span><span class="s">'Living_With'</span><span class="p">]].</span><span class="nb">apply</span><span class="p">(</span><span class="n">LE</span><span class="p">.</span><span class="n">fit_transform</span><span class="p">)</span>
<span class="n">df</span><span class="p">[</span><span class="s">'Living_With'</span><span class="p">].</span><span class="n">head</span><span class="p">()</span>

<span class="o">=&gt;</span>

<span class="mi">0</span>    <span class="mi">0</span>
<span class="mi">1</span>    <span class="mi">0</span>
<span class="mi">2</span>    <span class="mi">1</span>
<span class="mi">3</span>    <span class="mi">1</span>
<span class="mi">4</span>    <span class="mi">1</span>
<span class="n">Name</span><span class="p">:</span> <span class="n">Living_With</span><span class="p">,</span> <span class="n">dtype</span><span class="p">:</span> <span class="n">int64</span>
</code></pre></div></div>
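
<p>Note that the single <code>LE</code> object is re-fitted on <code>Living_With</code>, so after this step it can no longer decode <code>Education</code>. If the original labels are needed later, one option is to keep one encoder per column. The sketch below assumes a small made-up frame; the <code>encoders</code> dictionary and the <code>Living_With</code> values are illustrative only.</p>

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame; the Living_With values are made up.
toy = pd.DataFrame({
    "Education": ["Graduate", "Postgraduate", "Undergraduate", "Graduate"],
    "Living_With": ["Alone", "Partner", "Partner", "Alone"],
})

# Fit a separate encoder for each column and remember it by column name.
encoders = {}
for col in ["Education", "Living_With"]:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

# Each column can still be decoded with its own encoder.
education_labels = list(encoders["Education"].inverse_transform([1, 2, 2]))
living_labels = list(encoders["Living_With"].classes_)
print(education_labels, living_labels)
```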

<p>Scaling the features so they are on a comparable scale</p>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/746d9c21-7a80-46ff-bfb7-5c101b7a4bf5/image.png" alt="" /></p>
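
<p>Standardization replaces each value with z = (x - mean) / std, computed per column. The sketch below does this by hand in pandas on the first five <code>Income</code> and <code>Age</code> values shown in this post. It uses the population standard deviation (<code>ddof=0</code>), which, as far as I know, matches what <code>StandardScaler</code> computes; treat it as an illustration rather than a replacement for the scaler.</p>

```python
import numpy as np
import pandas as pd

# First five Income and Age values from the post's ds.head() output.
toy = pd.DataFrame({
    "Income": [58138.0, 46344.0, 71613.0, 26646.0, 58293.0],
    "Age": [64.0, 67.0, 56.0, 37.0, 40.0],
})

means = toy.mean()
stds = toy.std(ddof=0)  # population standard deviation, one value per column

# Column-wise standardization: every feature ends up with mean 0 and std 1.
scaled = (toy - means) / stds
print(scaled.round(3))
```

<p>Without this step, a column such as <code>Customer_For</code>, whose values are around 10<sup>16</sup>, would dominate any distance-based method.</p>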

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">sklearn.preprocessing</span> <span class="kn">import</span> <span class="n">StandardScaler</span>

<span class="n">ds</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">copy</span><span class="p">()</span>

<span class="c1"># These columns are kept aside to examine the campaign response rate per cluster later,
# so they are removed from the features used to build the clusters.
</span><span class="n">cols_del</span> <span class="o">=</span> <span class="p">[</span><span class="s">'AcceptedCmp3'</span><span class="p">,</span> <span class="s">'AcceptedCmp4'</span><span class="p">,</span> <span class="s">'AcceptedCmp5'</span><span class="p">,</span> <span class="s">'AcceptedCmp1'</span><span class="p">,</span><span class="s">'AcceptedCmp2'</span><span class="p">,</span> <span class="s">'Complain'</span><span class="p">,</span> <span class="s">'Response'</span><span class="p">]</span>
<span class="n">ds</span> <span class="o">=</span> <span class="n">ds</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="n">cols_del</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ds</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Education</th>
      <th>Income</th>
      <th>Kidhome</th>
      <th>Teenhome</th>
      <th>Recency</th>
      <th>Wines</th>
      <th>Fruits</th>
      <th>Meat</th>
      <th>Fish</th>
      <th>Sweets</th>
      <th>...</th>
      <th>NumCatalogPurchases</th>
      <th>NumStorePurchases</th>
      <th>NumWebVisitsMonth</th>
      <th>Customer_For</th>
      <th>Living_With</th>
      <th>Children</th>
      <th>Family_Size</th>
      <th>Is_Parent</th>
      <th>Age</th>
      <th>Spent</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0</td>
      <td>58138.0</td>
      <td>0</td>
      <td>0</td>
      <td>58</td>
      <td>635</td>
      <td>88</td>
      <td>546</td>
      <td>172</td>
      <td>88</td>
      <td>...</td>
      <td>10</td>
      <td>4</td>
      <td>7</td>
      <td>83894400000000000</td>
      <td>0</td>
      <td>0</td>
      <td>1</td>
      <td>0</td>
      <td>64</td>
      <td>1617</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0</td>
      <td>46344.0</td>
      <td>1</td>
      <td>1</td>
      <td>38</td>
      <td>11</td>
      <td>1</td>
      <td>6</td>
      <td>2</td>
      <td>1</td>
      <td>...</td>
      <td>1</td>
      <td>2</td>
      <td>5</td>
      <td>10800000000000000</td>
      <td>0</td>
      <td>2</td>
      <td>3</td>
      <td>1</td>
      <td>67</td>
      <td>27</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0</td>
      <td>71613.0</td>
      <td>0</td>
      <td>0</td>
      <td>26</td>
      <td>426</td>
      <td>49</td>
      <td>127</td>
      <td>111</td>
      <td>21</td>
      <td>...</td>
      <td>2</td>
      <td>10</td>
      <td>4</td>
      <td>40780800000000000</td>
      <td>1</td>
      <td>0</td>
      <td>2</td>
      <td>0</td>
      <td>56</td>
      <td>776</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0</td>
      <td>26646.0</td>
      <td>1</td>
      <td>0</td>
      <td>26</td>
      <td>11</td>
      <td>4</td>
      <td>20</td>
      <td>10</td>
      <td>3</td>
      <td>...</td>
      <td>0</td>
      <td>4</td>
      <td>6</td>
      <td>5616000000000000</td>
      <td>1</td>
      <td>1</td>
      <td>3</td>
      <td>1</td>
      <td>37</td>
      <td>53</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1</td>
      <td>58293.0</td>
      <td>1</td>
      <td>0</td>
      <td>94</td>
      <td>173</td>
      <td>43</td>
      <td>118</td>
      <td>46</td>
      <td>27</td>
      <td>...</td>
      <td>3</td>
      <td>6</td>
      <td>5</td>
      <td>27734400000000000</td>
      <td>1</td>
      <td>1</td>
      <td>3</td>
      <td>1</td>
      <td>40</td>
      <td>422</td>
    </tr>
  </tbody>
</table>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#Scaling
</span><span class="n">scaler</span> <span class="o">=</span> <span class="n">StandardScaler</span><span class="p">()</span>
<span class="n">scaler</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">ds</span><span class="p">)</span> <span class="c1"># compute the mean and variance of each feature
</span>
<span class="c1"># apply the scaling
</span><span class="n">scaled_ds</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">scaler</span><span class="p">.</span><span class="n">transform</span><span class="p">(</span><span class="n">ds</span><span class="p">),</span> <span class="n">columns</span><span class="o">=</span> <span class="n">ds</span><span class="p">.</span><span class="n">columns</span> <span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"All features are now scaled"</span><span class="p">)</span>

<span class="o">=&gt;</span>

<span class="n">All</span> <span class="n">features</span> <span class="n">are</span> <span class="n">now</span> <span class="n">scaled</span>
</code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#Scaled data to be used for reducing the dimensionality
</span><span class="k">print</span><span class="p">(</span><span class="s">"Dataframe to be used for further modelling:"</span><span class="p">)</span>
<span class="n">scaled_ds</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Education</th>
      <th>Income</th>
      <th>Kidhome</th>
      <th>Teenhome</th>
      <th>Recency</th>
      <th>Wines</th>
      <th>Fruits</th>
      <th>Meat</th>
      <th>Fish</th>
      <th>Sweets</th>
      <th>...</th>
      <th>NumCatalogPurchases</th>
      <th>NumStorePurchases</th>
      <th>NumWebVisitsMonth</th>
      <th>Customer_For</th>
      <th>Living_With</th>
      <th>Children</th>
      <th>Family_Size</th>
      <th>Is_Parent</th>
      <th>Age</th>
      <th>Spent</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>-0.893586</td>
      <td>0.287105</td>
      <td>-0.822754</td>
      <td>-0.929699</td>
      <td>0.310353</td>
      <td>0.977660</td>
      <td>1.552041</td>
      <td>1.690293</td>
      <td>2.453472</td>
      <td>1.483713</td>
      <td>...</td>
      <td>2.503607</td>
      <td>-0.555814</td>
      <td>0.692181</td>
      <td>1.973583</td>
      <td>-1.349603</td>
      <td>-1.264598</td>
      <td>-1.758359</td>
      <td>-1.581139</td>
      <td>1.018352</td>
      <td>1.676245</td>
    </tr>
    <tr>
      <th>1</th>
      <td>-0.893586</td>
      <td>-0.260882</td>
      <td>1.040021</td>
      <td>0.908097</td>
      <td>-0.380813</td>
      <td>-0.872618</td>
      <td>-0.637461</td>
      <td>-0.718230</td>
      <td>-0.651004</td>
      <td>-0.634019</td>
      <td>...</td>
      <td>-0.571340</td>
      <td>-1.171160</td>
      <td>-0.132545</td>
      <td>-1.665144</td>
      <td>-1.349603</td>
      <td>1.404572</td>
      <td>0.449070</td>
      <td>0.632456</td>
      <td>1.274785</td>
      <td>-0.963297</td>
    </tr>
    <tr>
      <th>2</th>
      <td>-0.893586</td>
      <td>0.913196</td>
      <td>-0.822754</td>
      <td>-0.929699</td>
      <td>-0.795514</td>
      <td>0.357935</td>
      <td>0.570540</td>
      <td>-0.178542</td>
      <td>1.339513</td>
      <td>-0.147184</td>
      <td>...</td>
      <td>-0.229679</td>
      <td>1.290224</td>
      <td>-0.544908</td>
      <td>-0.172664</td>
      <td>0.740959</td>
      <td>-1.264598</td>
      <td>-0.654644</td>
      <td>-1.581139</td>
      <td>0.334530</td>
      <td>0.280110</td>
    </tr>
    <tr>
      <th>3</th>
      <td>-0.893586</td>
      <td>-1.176114</td>
      <td>1.040021</td>
      <td>-0.929699</td>
      <td>-0.795514</td>
      <td>-0.872618</td>
      <td>-0.561961</td>
      <td>-0.655787</td>
      <td>-0.504911</td>
      <td>-0.585335</td>
      <td>...</td>
      <td>-0.913000</td>
      <td>-0.555814</td>
      <td>0.279818</td>
      <td>-1.923210</td>
      <td>0.740959</td>
      <td>0.069987</td>
      <td>0.449070</td>
      <td>0.632456</td>
      <td>-1.289547</td>
      <td>-0.920135</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.571657</td>
      <td>0.294307</td>
      <td>1.040021</td>
      <td>-0.929699</td>
      <td>1.554453</td>
      <td>-0.392257</td>
      <td>0.419540</td>
      <td>-0.218684</td>
      <td>0.152508</td>
      <td>-0.001133</td>
      <td>...</td>
      <td>0.111982</td>
      <td>0.059532</td>
      <td>-0.132545</td>
      <td>-0.822130</td>
      <td>0.740959</td>
      <td>0.069987</td>
      <td>0.449070</td>
      <td>0.632456</td>
      <td>-1.033114</td>
      <td>-0.307562</td>
    </tr>
  </tbody>
</table>

<ul>
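  <li>Up to this point, the categorical data has been converted to numeric form with LabelEncoder.</li>
  <li>The data has then been standardized with StandardScaler (zero mean and unit variance per feature).</li>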
</ul>
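<p>As a rough illustration of what the scaling step does to a single column, the sketch below computes z-scores by hand. The <code>incomes</code> values and the <code>standardize</code> helper are hypothetical and not taken from the dataset; the population standard deviation is used because that is what StandardScaler uses.</p>

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population std (ddof=0), matching StandardScaler
    return [(v - mu) / sigma for v in values]

# Toy income values (hypothetical, not taken from the dataset).
incomes = [30000.0, 45000.0, 60000.0, 75000.0, 90000.0]
z_scores = standardize(incomes)
print([round(z, 3) for z in z_scores])
# [-1.414, -0.707, 0.0, 0.707, 1.414]
```

<p>After scaling, every feature has mean 0 and standard deviation 1, so features measured in large units (such as Income) no longer dominate the distance calculations used later in clustering.</p>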

<h3 id="-dimension-reduction">💡 Dimension Reduction</h3>

<ul>
  <li>As the correlation matrix earlier showed, many of the features are correlated with one another.</li>
  <li>When there are too many variables, the model may not fit properly.</li>
</ul>

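<p>-&gt; Our data has many correlated features, and too many features overall to cluster on directly,
so let's reduce the dimensionality with PCA (principal component analysis), a dimensionality reduction technique.</p>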
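<p>To make the idea concrete, here is a minimal sketch of what PCA computes, written with NumPy's SVD rather than scikit-learn. The array <code>X</code> is a synthetic stand-in for <code>scaled_ds</code> (an assumption for illustration only), with one feature deliberately made almost identical to another so that the correlation shows up in the result.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for scaled_ds: 200 rows, 6 features,
# where the last feature is almost a copy of the first one.
base = rng.normal(size=(200, 5))
X = np.column_stack([base, base[:, 0] + 0.1 * rng.normal(size=200)])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column

# PCA: centre the data, then use the top right-singular vectors as new axes.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:3]              # the 3 directions of largest variance
projected = Xc @ components.T    # shape (200, 3), analogous to PCA_ds

# Share of the total variance carried by each component.
explained_ratio = (S ** 2) / np.sum(S ** 2)
print(projected.shape, explained_ratio[:3].round(3))
```

<p>Because two of the six features carry nearly the same information, the first component captures clearly more than one sixth of the variance. With scikit-learn, the same ratios are available as <code>pca.explained_variance_ratio_</code> after fitting, which is a quick way to check how much information three components retain.</p>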

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Reduce the data to 3 dimensions with PCA.
# Import the PCA class from sklearn.
</span><span class="kn">from</span> <span class="nn">sklearn.decomposition</span> <span class="kn">import</span> <span class="n">PCA</span>

<span class="n">pca</span> <span class="o">=</span> <span class="n">PCA</span><span class="p">(</span><span class="n">n_components</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">pca</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">scaled_ds</span><span class="p">)</span>
<span class="n">PCA_ds</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">pca</span><span class="p">.</span><span class="n">transform</span><span class="p">(</span><span class="n">scaled_ds</span><span class="p">),</span> <span class="n">columns</span><span class="o">=</span><span class="p">([</span><span class="s">"col1"</span><span class="p">,</span><span class="s">"col2"</span><span class="p">,</span> <span class="s">"col3"</span><span class="p">]))</span>

<span class="n">PCA_ds</span><span class="p">.</span><span class="n">describe</span><span class="p">().</span><span class="n">T</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>count</th>
      <th>mean</th>
      <th>std</th>
      <th>min</th>
      <th>25%</th>
      <th>50%</th>
      <th>75%</th>
      <th>max</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>col1</th>
      <td>2212.0</td>
      <td>-2.569775e-17</td>
      <td>2.878377</td>
      <td>-5.969394</td>
      <td>-2.538494</td>
      <td>-0.780421</td>
      <td>2.383290</td>
      <td>7.444305</td>
    </tr>
    <tr>
      <th>col2</th>
      <td>2212.0</td>
      <td>-8.994212e-17</td>
      <td>1.706839</td>
      <td>-4.312106</td>
      <td>-1.328321</td>
      <td>-0.158197</td>
      <td>1.242240</td>
      <td>6.142664</td>
    </tr>
    <tr>
      <th>col3</th>
      <td>2212.0</td>
      <td>7.388103e-17</td>
      <td>1.221956</td>
      <td>-3.530975</td>
      <td>-0.828811</td>
      <td>-0.021838</td>
      <td>0.800092</td>
      <td>6.615874</td>
    </tr>
  </tbody>
</table>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">PCA_ds</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>col1</th>
      <th>col2</th>
      <th>col3</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>4.994347</td>
      <td>-0.151233</td>
      <td>2.647706</td>
    </tr>
    <tr>
      <th>1</th>
      <td>-2.884455</td>
      <td>-0.006707</td>
      <td>-1.863811</td>
    </tr>
    <tr>
      <th>2</th>
      <td>2.617864</td>
      <td>-0.720806</td>
      <td>-0.254408</td>
    </tr>
    <tr>
      <th>3</th>
      <td>-2.676036</td>
      <td>-1.541954</td>
      <td>-0.922951</td>
    </tr>
    <tr>
      <th>4</th>
      <td>-0.649591</td>
      <td>0.209833</td>
      <td>-0.021460</td>
    </tr>
  </tbody>
</table>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#A 3D Projection Of Data In The Reduced Dimension
</span><span class="n">x</span> <span class="o">=</span><span class="n">PCA_ds</span><span class="p">[</span><span class="s">"col1"</span><span class="p">]</span>
<span class="n">y</span> <span class="o">=</span><span class="n">PCA_ds</span><span class="p">[</span><span class="s">"col2"</span><span class="p">]</span>
<span class="n">z</span> <span class="o">=</span><span class="n">PCA_ds</span><span class="p">[</span><span class="s">"col3"</span><span class="p">]</span>

<span class="c1"># Draw the 3D plot
# https://www.python-graph-gallery.com/370-3d-scatterplot
</span><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span>
<span class="n">ax</span> <span class="o">=</span> <span class="n">fig</span><span class="p">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">,</span> <span class="n">projection</span><span class="o">=</span><span class="s">"3d"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">z</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="s">"maroon"</span><span class="p">,</span> <span class="n">marker</span><span class="o">=</span><span class="s">"o"</span> <span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"A 3D Projection Of Data In The Reduced Dimension"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/a43a2837-7263-4978-85eb-eb694973027f/image.png" alt="" /></p>

<h2 id="-유저-세그먼트-나누기-clustering">👣 Segmenting Users: Clustering</h2>

<h3 id="-elbow-method">💡 Elbow Method</h3>

<p>How many groups should the data be split into?</p>

<ul>
  <li>
    <p>Elbow Method</p>
  </li>
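  <li>A small WSS (within-cluster sum of squares) means the points sit close to their cluster centers, so a smaller WSS is better.</li>
  <li>WSS keeps shrinking as clusters are added, but once an extra cluster barely reduces it, there is no need to increase the number of clusters any further.</li>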
</ul>
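<p>The quantity being minimized can be computed by hand. The sketch below is a small illustration with hypothetical 2D points and a hypothetical <code>wss</code> helper; it is the same quantity that KMeans reports as <code>inertia_</code> (the sum of squared distances of samples to their closest cluster center).</p>

```python
def wss(points, labels):
    """Within-cluster sum of squares: squared distance of each point to its cluster mean."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(axis) / len(members) for axis in zip(*members)]
        for p in members:
            total += sum((a - b) ** 2 for a, b in zip(p, centroid))
    return total

# Toy 2D points (hypothetical): two tight groups that are far apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
one_cluster = wss(points, [0, 0, 0, 0])
two_clusters = wss(points, [0, 0, 1, 1])
print(one_cluster, two_clusters)
# 104.0 4.0
```

<p>Splitting the toy data into its two natural groups cuts WSS from 104 to 4. Any further split could only remove the remaining 4, which is the flattening that the elbow method looks for.</p>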

<p><img src="https://velog.velcdn.com/images/yy2hi/post/30e2462d-adb0-427a-b402-ca2adf8c4a0e/image.png" alt="" /></p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Image source: https://heung-bae-lee.github.io/2020/05/30/machine_learning_19/
</span></code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from sklearn.cluster import KMeans

# Quick examination of the elbow method to find the number of clusters to form.
print('Elbow Method to determine the number of clusters to be formed:')

distortions = []

# Increase the number of clusters from 1 to 9 and
# store the within-cluster sum of squares for each k.
K = range(1, 10)
for k in K:
    kmeanModel = KMeans(n_clusters=k)
    kmeanModel.fit(PCA_ds)
    distortions.append(kmeanModel.inertia_) # within-cluster sum-of-squares criterion
</code></pre></div></div>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">distortions</span>

<span class="o">=&gt;</span>

<span class="p">[</span><span class="mf">28060.968908252147</span><span class="p">,</span>
 <span class="mf">13841.123695080303</span><span class="p">,</span>
 <span class="mf">9625.179269985394</span><span class="p">,</span>
 <span class="mf">7482.577127923258</span><span class="p">,</span>
 <span class="mf">6590.895912178157</span><span class="p">,</span>
 <span class="mf">5885.488242066771</span><span class="p">,</span>
 <span class="mf">5317.942980809662</span><span class="p">,</span>
 <span class="mf">4867.552990155307</span><span class="p">,</span>
 <span class="mf">4545.679730812983</span><span class="p">]</span>
</code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span><span class="mi">6</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">K</span><span class="p">,</span> <span class="n">distortions</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'k'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Distortion'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'The Elbow Method showing the optimal k'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/22ad5527-0c66-49b2-9ec8-c982855efbd6/image.png" alt="" /></p>
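<p>One way to read the elbow without relying only on the picture is to look at how much each extra cluster reduces WSS. The sketch below reuses the inertia values printed above; the relative-drop calculation is just one possible heuristic, not the only way to choose k.</p>

```python
# Inertia values printed above for k = 1..9.
distortions = [
    28060.968908252147, 13841.123695080303, 9625.179269985394,
    7482.577127923258, 6590.895912178157, 5885.488242066771,
    5317.942980809662, 4867.552990155307, 4545.679730812983,
]

# Relative reduction in WSS gained by going from k to k + 1 clusters.
relative_drops = [
    (before - after) / before
    for before, after in zip(distortions, distortions[1:])
]

for k, drop in enumerate(relative_drops, start=1):
    print(f"k={k} to k={k + 1}: WSS falls by {drop:.1%}")
```

<p>The gain shrinks from roughly 22% when moving from 3 to 4 clusters to roughly 12% when moving from 4 to 5, and keeps flattening after that, which is consistent with using 4 clusters.</p>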

<p>Let's run the clustering!</p>
<ul>
  <li>Here, hierarchical (agglomerative) clustering is applied first.</li>
</ul>
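<p>Agglomerative clustering starts with every point as its own cluster and repeatedly merges the two closest clusters until the requested number remains. The sketch below is a simplified pure-Python illustration, assuming Ward linkage (the default <code>linkage</code> of scikit-learn's AgglomerativeClustering); the toy points and helper names are hypothetical, and the real implementation is far more efficient.</p>

```python
from itertools import combinations

def centroid(cluster):
    return [sum(axis) / len(cluster) for axis in zip(*cluster)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def agglomerate(points, n_clusters):
    """Start with one cluster per point and repeatedly merge the cheapest pair."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i is always smaller than j
        del clusters[j]
    return clusters

# Toy 2D points (hypothetical): three visually separate groups.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
result = agglomerate(points, n_clusters=3)
print(sorted(sorted(c) for c in result))
# [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0), (10, 1)]]
```

<p>Ward linkage merges the pair that increases WSS the least, so it optimizes the same criterion that the elbow method inspected.</p>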

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">sklearn.cluster</span> <span class="kn">import</span> <span class="n">AgglomerativeClustering</span>


<span class="c1"># Distance measure: Euclidean (the default)
</span><span class="n">AC</span> <span class="o">=</span> <span class="n">AgglomerativeClustering</span><span class="p">(</span><span class="n">n_clusters</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>

<span class="c1"># fit model and predict clusters
</span><span class="n">yhat_AC</span> <span class="o">=</span> <span class="n">AC</span><span class="p">.</span><span class="n">fit_predict</span><span class="p">(</span><span class="n">PCA_ds</span><span class="p">)</span>
<span class="n">PCA_ds</span><span class="p">[</span><span class="s">"Clusters"</span><span class="p">]</span> <span class="o">=</span> <span class="n">yhat_AC</span>

<span class="c1"># Also add the labels to the original data.
</span><span class="n">df</span><span class="p">[</span><span class="s">"Clusters"</span><span class="p">]</span><span class="o">=</span> <span class="n">yhat_AC</span>
</code></pre></div></div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Plotting the clusters
</span><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span>
<span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">,</span> <span class="n">projection</span><span class="o">=</span><span class="s">'3d'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"bla"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">40</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="n">PCA_ds</span><span class="p">[</span><span class="s">"Clusters"</span><span class="p">],</span> <span class="n">marker</span><span class="o">=</span><span class="s">'o'</span><span class="p">,</span> <span class="n">cmap</span> <span class="o">=</span> <span class="n">cmap</span> <span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"The Plot Of The Clusters"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/2940d356-4b2d-415c-9a1d-aeaa6a9edffe/image.png" alt="" /></p>

<h2 id="-모델-평가하기-evaluation">👣 Model Evaluation</h2>
<p>Use EDA to understand the resulting clusters and their characteristics.
Are the groups evenly distributed?</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Plotting countplot of clusters
</span><span class="n">pal</span> <span class="o">=</span> <span class="p">[</span><span class="s">"#682F2F"</span><span class="p">,</span><span class="s">"#B9C0C9"</span><span class="p">,</span> <span class="s">"#9F8A78"</span><span class="p">,</span><span class="s">"#F3AB60"</span><span class="p">]</span>
<span class="n">pl</span> <span class="o">=</span> <span class="n">sns</span><span class="p">.</span><span class="n">countplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Clusters"</span><span class="p">],</span> <span class="n">palette</span><span class="o">=</span><span class="n">pal</span><span class="p">)</span>
<span class="n">pl</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Distribution Of The Clusters"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/1fbaf96a-d867-4c45-a0da-79d6db48c8de/image.png" alt="" /></p>

<p>Let's look at spending and annual income for each cluster.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pl</span> <span class="o">=</span> <span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">data</span> <span class="o">=</span> <span class="n">df</span><span class="p">,</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Spent"</span><span class="p">],</span> <span class="n">y</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Income"</span><span class="p">],</span><span class="n">hue</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Clusters"</span><span class="p">],</span> <span class="n">palette</span><span class="o">=</span> <span class="n">pal</span><span class="p">)</span>
<span class="n">pl</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Cluster's Profile Based On Income And Spending"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>


<span class="c1"># group 3: high spending &amp; low income (3)
# group 2: low spending &amp; low income (2)
# group 0: high spending &amp; average income (0)
# group 1: high spending &amp; high income (1)
</span></code></pre></div></div>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/e6313143-e7bc-40f0-b413-5471efefeb8e/image.png" alt="" /></p>

<p>Which customer segment spends the most?</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
<span class="n">pl</span><span class="o">=</span><span class="n">sns</span><span class="p">.</span><span class="n">swarmplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Clusters"</span><span class="p">],</span> <span class="n">y</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Spent"</span><span class="p">],</span> <span class="n">color</span><span class="o">=</span> <span class="s">"#CBEDDD"</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span> <span class="p">)</span>
<span class="n">pl</span><span class="o">=</span><span class="n">sns</span><span class="p">.</span><span class="n">boxenplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Clusters"</span><span class="p">],</span> <span class="n">y</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Spent"</span><span class="p">],</span> <span class="n">palette</span><span class="o">=</span><span class="n">pal</span><span class="p">)</span>
</code></pre></div></div>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/d9bd5c10-9927-47e9-8111-4e90969f925a/image.png" alt="" /></p>

<p>Which cluster responds best to the campaigns?</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Creating a feature to get a sum of accepted promotions 
</span><span class="n">df</span><span class="p">[</span><span class="s">"Total_Promos"</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s">"AcceptedCmp1"</span><span class="p">]</span><span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="s">"AcceptedCmp2"</span><span class="p">]</span><span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="s">"AcceptedCmp3"</span><span class="p">]</span><span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="s">"AcceptedCmp4"</span><span class="p">]</span><span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="s">"AcceptedCmp5"</span><span class="p">]</span>

<span class="c1"># Plotting count of total campaign accepted.
</span><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
<span class="n">pl</span> <span class="o">=</span> <span class="n">sns</span><span class="p">.</span><span class="n">countplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Total_Promos"</span><span class="p">],</span><span class="n">hue</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Clusters"</span><span class="p">],</span> <span class="n">palette</span><span class="o">=</span> <span class="n">pal</span><span class="p">)</span>
<span class="n">pl</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Count Of Promotion Accepted"</span><span class="p">)</span>
<span class="n">pl</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"Number Of Total Accepted Promotions"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>

<span class="c1"># Most customers did not respond. The promotions need to be designed better.
</span></code></pre></div></div>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/807370d2-5fc6-4fee-9aa2-4aac32154ae5/image.png" alt="" /></p>

<p>Which group responded well to discounts?</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Plotting the number of deals purchased
</span><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
<span class="n">pl</span><span class="o">=</span><span class="n">sns</span><span class="p">.</span><span class="n">boxenplot</span><span class="p">(</span><span class="n">y</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"NumDealsPurchases"</span><span class="p">],</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Clusters"</span><span class="p">],</span> <span class="n">palette</span><span class="o">=</span> <span class="n">pal</span><span class="p">)</span>
<span class="n">pl</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Number of Deals Purchased"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/1448db97-0710-4a4b-a76c-0d0587e8cf77/image.png" alt="" /></p>

<p>What do family composition and age look like in each cluster?</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Personal</span> <span class="o">=</span> <span class="p">[</span> <span class="s">"Income"</span><span class="p">,</span><span class="s">"Kidhome"</span><span class="p">,</span><span class="s">"Teenhome"</span><span class="p">,</span><span class="s">"Customer_For"</span><span class="p">,</span> <span class="s">"Age"</span><span class="p">,</span> <span class="s">"Children"</span><span class="p">,</span> <span class="s">"Family_Size"</span><span class="p">,</span> <span class="s">"Is_Parent"</span><span class="p">,</span> <span class="s">"Education"</span><span class="p">,</span><span class="s">"Living_With"</span><span class="p">]</span>

<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">Personal</span><span class="p">:</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
    <span class="n">sns</span><span class="p">.</span><span class="n">boxplot</span><span class="p">(</span><span class="n">y</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Clusters"</span><span class="p">],</span> <span class="n">palette</span><span class="o">=</span><span class="n">pal</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/eb396c81-c395-40da-8fab-9be950783d97/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/e248bc4d-ebcd-4833-b510-1a46e86008da/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/b2974746-9ba0-4dd4-b62c-7b9e05f50f54/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/ddbe0cf4-4697-4b77-a7eb-05128016b7bf/image.png" alt="" />
<img src="https://velog.velcdn.com/images/yy2hi/post/8dbd3dec-8f37-4928-a064-7d3b72bc6b49/image.png" alt="" /></p>

<ul>
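  <li>Parents, with 2 to 4 family members; includes single-parent households; teenagers at home; relatively older</li>
  <li>Not parents; at most about 2 family members; spread evenly across all ages; high annual income</li>
  <li>Mostly parents; at most 4 family members; one young child rather than a teenager; relatively younger</li>
  <li>Parents; up to 5 family members; relatively older; lower income</li>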
</ul>

<h2 id="-다양한-클러스터링-모델-사용해보기">👣 Trying Other Clustering Models</h2>

<h3 id="density-based-clustering-1">💡 Density-Based Clustering</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">sklearn.cluster</span> <span class="kn">import</span> <span class="n">DBSCAN</span>
<span class="c1"># Initiating the DBSCAN Clustering model 
</span><span class="n">DP</span> <span class="o">=</span> <span class="n">DBSCAN</span><span class="p">(</span><span class="n">eps</span><span class="o">=</span><span class="mf">0.30</span><span class="p">,</span> <span class="n">min_samples</span><span class="o">=</span><span class="mi">9</span><span class="p">)</span>

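<span class="c1"># fit model and predict clusters
# Note: PCA_ds already contains the "Clusters" column added earlier, so it is part of the distances here.
# Pass PCA_ds[["col1", "col2", "col3"]] instead to cluster on the PCA components only.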
</span><span class="n">DP_df</span> <span class="o">=</span> <span class="n">DP</span><span class="p">.</span><span class="n">fit_predict</span><span class="p">(</span><span class="n">PCA_ds</span><span class="p">)</span>
<span class="n">PCA_ds</span><span class="p">[</span><span class="s">"DBSCAN_Clusters"</span><span class="p">]</span> <span class="o">=</span> <span class="n">DP_df</span>

<span class="c1"># Adding the Clusters feature to the original dataframe.
</span><span class="n">df</span><span class="p">[</span><span class="s">"DBSCAN_Clusters"</span><span class="p">]</span><span class="o">=</span> <span class="n">DP_df</span>
</code></pre></div></div>
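<p>To show what <code>eps</code> and <code>min_samples</code> control, the sketch below finds DBSCAN's core points by hand: a point is a core point when at least <code>min_samples</code> points (itself included) lie within distance <code>eps</code>. The toy points and the <code>core_points</code> helper are hypothetical, and this covers only the first step of the algorithm, not the full cluster expansion.</p>

```python
from math import dist

def core_points(points, eps, min_samples):
    """Return the points with at least min_samples neighbours within eps (self included)."""
    cores = []
    for p in points:
        neighbours = sum(1 for q in points if dist(p, q) <= eps)
        if neighbours >= min_samples:
            cores.append(p)
    return cores

# Toy 2D points (hypothetical): a dense clump plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
cores = core_points(points, eps=0.3, min_samples=4)
print(cores)
# [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
```

<p>Points that are neither core points nor within <code>eps</code> of one receive the label <code>-1</code> (noise) from scikit-learn's DBSCAN. When <code>eps</code> is small relative to the spread of the data, many points can end up as noise, so it is worth checking how many <code>-1</code> labels appear in the cluster distribution.</p>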
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Plotting the clusters
</span><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span>
<span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">111</span><span class="p">,</span> <span class="n">projection</span><span class="o">=</span><span class="s">'3d'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"bla"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">40</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="n">PCA_ds</span><span class="p">[</span><span class="s">"DBSCAN_Clusters"</span><span class="p">],</span> <span class="n">marker</span><span class="o">=</span><span class="s">'o'</span><span class="p">,</span> <span class="n">cmap</span> <span class="o">=</span> <span class="s">'viridis'</span> <span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"The Plot Of The Clusters"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/67480ade-520a-4bbc-aa35-29992ed93446/image.png" alt="" /></p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Plotting countplot of clusters
</span><span class="n">pl</span> <span class="o">=</span> <span class="n">sns</span><span class="p">.</span><span class="n">countplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"DBSCAN_Clusters"</span><span class="p">])</span>
<span class="n">pl</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Distribution Of The Clusters"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/5877bb2c-5e39-4d9a-be4f-f7116d561834/image.png" alt="" /></p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pl</span> <span class="o">=</span> <span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">data</span> <span class="o">=</span> <span class="n">df</span><span class="p">,</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Spent"</span><span class="p">],</span> <span class="n">y</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Income"</span><span class="p">],</span><span class="n">hue</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"DBSCAN_Clusters"</span><span class="p">])</span>
<span class="n">pl</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Cluster's Profile Based On Income And Spending"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/5c8176e0-1e33-4017-80ac-a935623cea1b/image.png" alt="" /></p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
<span class="n">pl</span><span class="o">=</span><span class="n">sns</span><span class="p">.</span><span class="n">swarmplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"DBSCAN_Clusters"</span><span class="p">],</span> <span class="n">y</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Spent"</span><span class="p">],</span> <span class="n">color</span><span class="o">=</span> <span class="s">"#CBEDDD"</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span> <span class="p">)</span>
<span class="n">pl</span><span class="o">=</span><span class="n">sns</span><span class="p">.</span><span class="n">boxenplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"DBSCAN_Clusters"</span><span class="p">],</span> <span class="n">y</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Spent"</span><span class="p">])</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/6169ee79-4ca3-48aa-bc6e-1aadfac421e6/image.png" alt="" /></p>
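<p>When reading the DBSCAN plots, it helps to remember that scikit-learn's DBSCAN gives the label -1 to points it treats as noise, so a "-1" group is not a cluster. The sketch below shows this on made-up toy points (the <code>points</code> array is an assumption for illustration, not the customer data used in this post).</p>

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy points: two tight groups plus one far-away point.
points = np.array(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
     [20.0, 20.0]]
)

labels = DBSCAN(eps=0.3, min_samples=3).fit_predict(points)

# DBSCAN marks points that belong to no dense region with the label -1.
n_noise = int((labels == -1).sum())
n_clusters = len(set(labels) - {-1})
print(n_clusters, n_noise)
```

<p>Counting the -1 labels separately, as above, keeps noise from being mistaken for a real cluster in a count plot.</p>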

<h3 id="partition-based-clustering-1">💡Partition Based Clustering</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Kmeans</span> <span class="o">=</span> <span class="n">KMeans</span><span class="p">(</span><span class="n">n_clusters</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
<span class="c1"># fit model and predict clusters
</span><span class="n">Kmeans_df</span> <span class="o">=</span> <span class="n">Kmeans</span><span class="p">.</span><span class="n">fit_predict</span><span class="p">(</span><span class="n">PCA_ds</span><span class="p">)</span>
<span class="n">PCA_ds</span><span class="p">[</span><span class="s">"Kmeans_Clusters"</span><span class="p">]</span> <span class="o">=</span> <span class="n">Kmeans_df</span>
<span class="c1"># Adding the Clusters feature to the original dataframe.
</span><span class="n">df</span><span class="p">[</span><span class="s">"Kmeans_Clusters"</span><span class="p">]</span><span class="o">=</span> <span class="n">Kmeans_df</span>
</code></pre></div></div>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#Plotting countplot of clusters
</span><span class="n">pl</span> <span class="o">=</span> <span class="n">sns</span><span class="p">.</span><span class="n">countplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Kmeans_Clusters"</span><span class="p">])</span>
<span class="n">pl</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Distribution Of The Clusters"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/0d4eb129-f2f9-4420-9953-28d660610a0d/image.png" alt="" /></p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pl</span> <span class="o">=</span> <span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">data</span> <span class="o">=</span> <span class="n">df</span><span class="p">,</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Spent"</span><span class="p">],</span> <span class="n">y</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Income"</span><span class="p">],</span><span class="n">hue</span><span class="o">=</span><span class="n">df</span><span class="p">[</span><span class="s">"Kmeans_Clusters"</span><span class="p">])</span>
<span class="n">pl</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Cluster's Profile Based On Income And Spending"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="https://velog.velcdn.com/images/yy2hi/post/b2937793-0e24-4b6f-b2ba-f0dc57f68c01/image.png" alt="" /></p>

<h3 id="-silhouette-coefficient">💡 Silhouette Coefficient</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">sklearn.metrics</span> <span class="kn">import</span> <span class="n">silhouette_samples</span>

<span class="c1"># Hierarchical
</span><span class="n">sample_silhouette_values</span> <span class="o">=</span> <span class="n">silhouette_samples</span><span class="p">(</span><span class="n">PCA_ds</span><span class="p">,</span> <span class="n">PCA_ds</span><span class="p">[</span><span class="s">"Clusters"</span><span class="p">])</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Hierarchical"</span><span class="p">,</span><span class="n">sample_silhouette_values</span><span class="p">.</span><span class="n">mean</span><span class="p">())</span>

<span class="c1"># Density - DBSCAN
</span><span class="n">sample_silhouette_values</span> <span class="o">=</span> <span class="n">silhouette_samples</span><span class="p">(</span><span class="n">PCA_ds</span><span class="p">,</span> <span class="n">PCA_ds</span><span class="p">[</span><span class="s">"DBSCAN_Clusters"</span><span class="p">])</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Density"</span><span class="p">,</span><span class="n">sample_silhouette_values</span><span class="p">.</span><span class="n">mean</span><span class="p">())</span>


<span class="c1"># Kmeans
</span><span class="n">sample_silhouette_values</span> <span class="o">=</span> <span class="n">silhouette_samples</span><span class="p">(</span><span class="n">PCA_ds</span><span class="p">,</span> <span class="n">PCA_ds</span><span class="p">[</span><span class="s">"Kmeans_Clusters"</span><span class="p">])</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Kmeans"</span><span class="p">,</span><span class="n">sample_silhouette_values</span><span class="p">.</span><span class="n">mean</span><span class="p">())</span>

<span class="o">=&gt;</span>

<span class="n">Hierarchical</span> <span class="mf">0.45273806967143565</span>
<span class="n">Density</span> <span class="mf">0.05700586237988258</span>
<span class="n">Kmeans</span> <span class="mf">0.46554372154673307</span>
</code></pre></div></div>]]></content><author><name>yy2-hi</name></author><category term="DataAnalysis" /><summary type="html"><![CDATA[클러스터링, 군집화]]></summary></entry><entry><title type="html">Project 2 - 서울시 범죄 현황 데이터 분석 (1)</title><link href="https://yy2-hi.github.io/dataanalysis/crimeanalysis1/" rel="alternate" type="text/html" title="Project 2 - 서울시 범죄 현황 데이터 분석 (1)" /><published>2024-08-25T00:00:00+09:00</published><updated>2024-08-25T00:00:00+09:00</updated><id>https://yy2-hi.github.io/dataanalysis/crimeanalysis1</id><content type="html" xml:base="https://yy2-hi.github.io/dataanalysis/crimeanalysis1/"><![CDATA[<h1 id="project-02-analysis-seoul-crime">Project 02. Analysis Seoul Crime</h1>
<h2 id="프로젝트-개요">Project Overview</h2>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/6ecd2ac0-36b5-4ebc-b513-e3dc092f7e90/image.png" alt="" /></p>

<ul>
  <li>Use the data to check whether the three Gangnam districts are actually safe from crime</li>
</ul>

<h2 id="데이터-개요">Data Overview</h2>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>

<span class="c1"># Read the data
</span><span class="n">crime_raw_data</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">'../data/02. crime_in_Seoul.csv'</span><span class="p">,</span> <span class="n">thousands</span><span class="o">=</span><span class="s">","</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">"euc-kr"</span><span class="p">)</span>
                                                            <span class="c1"># thousands=",": without it, counts written with comma separators may be read as strings
</span><span class="n">crime_raw_data</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
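<p>A minimal sketch of what <code>thousands=","</code> changes, using a small in-memory CSV instead of the real file. The sample rows and the <code>csv_text</code> name are made up for illustration and are not taken from the dataset.</p>

```python
import io

import pandas as pd

# Hypothetical two-row sample; the count column uses "," as a thousands separator.
csv_text = '구분,건수\n중부,"1,149"\n수서,"1,666"\n'

# Without the option, the quoted values are kept as strings.
as_text = pd.read_csv(io.StringIO(csv_text))

# With thousands=",", pandas strips the separators and parses numbers.
as_number = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(as_text["건수"].dtype)
print(as_number["건수"].tolist())
```

<p>With the option set, the column can be summed or averaged directly, without a separate string-cleaning step.</p>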

<table>
  <thead>
    <tr>
      <th style="text-align: center"> </th>
      <th style="text-align: center">구분</th>
      <th>죄종</th>
      <th style="text-align: center">발생검거</th>
      <th>건수</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center">0</td>
      <td style="text-align: center">중부</td>
      <td>살인</td>
      <td style="text-align: center">발생</td>
      <td>2.0</td>
    </tr>
    <tr>
      <td style="text-align: center">1</td>
      <td style="text-align: center">중부</td>
      <td>살인</td>
      <td style="text-align: center">검거</td>
      <td>2.0</td>
    </tr>
    <tr>
      <td style="text-align: center">2</td>
      <td style="text-align: center">중부</td>
      <td>강도</td>
      <td style="text-align: center">발생</td>
      <td>3.0</td>
    </tr>
    <tr>
      <td style="text-align: center">3</td>
      <td style="text-align: center">중부</td>
      <td>강도</td>
      <td style="text-align: center">검거</td>
      <td>3.0</td>
    </tr>
    <tr>
      <td style="text-align: center">4</td>
      <td style="text-align: center">중부</td>
      <td>강간</td>
      <td style="text-align: center">발생</td>
      <td>141.0</td>
    </tr>
  </tbody>
</table>

<hr />

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_raw_data</span><span class="p">.</span><span class="n">info</span><span class="p">()</span>

<span class="o">=&gt;</span>

<span class="o">&lt;</span><span class="k">class</span> <span class="err">'</span><span class="nc">pandas</span><span class="p">.</span><span class="n">core</span><span class="p">.</span><span class="n">frame</span><span class="p">.</span><span class="n">DataFrame</span><span class="s">'&gt;
RangeIndex: 65534 entries, 0 to 65533
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   구분      310 non-null    object 
 1   죄종      310 non-null    object 
 2   발생검거    310 non-null    object 
 3   건수      310 non-null    float64
dtypes: float64(1), object(3)
memory usage: 2.0+ MB
</span></code></pre></div></div>
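<ul>
  <li>info(): check a summary of the data</li>
  <li>The RangeIndex reports 65534 entries, but only 310 rows actually contain data</li>
</ul>

<hr />

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_raw_data</span><span class="p">[</span><span class="s">'죄종'</span><span class="p">].</span><span class="n">unique</span><span class="p">()</span>

<span class="o">=&gt;</span>

array(['살인', '강도', '강간', '절도', '폭력', nan], dtype=object)
</code></pre></div></div>
<ul>
  <li>Inspect the unique values in a specific column</li>
  <li>A nan value turns up</li>
</ul>

<hr />

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_raw_data</span><span class="p">[</span><span class="n">crime_raw_data</span><span class="p">[</span><span class="s">'죄종'</span><span class="p">].</span><span class="n">isnull</span><span class="p">()].</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>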
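<table>
  <thead>
    <tr>
      <th style="text-align: center"> </th>
      <th style="text-align: center">구분</th>
      <th>죄종</th>
      <th style="text-align: center">발생검거</th>
      <th>건수</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center">310</td>
      <td style="text-align: center">NaN</td>
      <td>NaN</td>
      <td style="text-align: center">NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <td style="text-align: center">311</td>
      <td style="text-align: center">NaN</td>
      <td>NaN</td>
      <td style="text-align: center">NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <td style="text-align: center">312</td>
      <td style="text-align: center">NaN</td>
      <td>NaN</td>
      <td style="text-align: center">NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <td style="text-align: center">313</td>
      <td style="text-align: center">NaN</td>
      <td>NaN</td>
      <td style="text-align: center">NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <td style="text-align: center">314</td>
      <td style="text-align: center">NaN</td>
      <td>NaN</td>
      <td style="text-align: center">NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>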

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_raw_data</span> <span class="o">=</span> <span class="n">crime_raw_data</span><span class="p">[</span><span class="n">crime_raw_data</span><span class="p">[</span><span class="s">'죄종'</span><span class="p">].</span><span class="n">notnull</span><span class="p">()]</span>
<span class="n">crime_raw_data</span><span class="p">.</span><span class="n">info</span><span class="p">()</span>

<span class="o">=&gt;</span>

<span class="o">&lt;</span><span class="k">class</span> <span class="err">'</span><span class="nc">pandas</span><span class="p">.</span><span class="n">core</span><span class="p">.</span><span class="n">frame</span><span class="p">.</span><span class="n">DataFrame</span><span class="s">'&gt;
Int64Index: 310 entries, 0 to 309
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   구분      310 non-null    object 
 1   죄종      310 non-null    object 
 2   발생검거    310 non-null    object 
 3   건수      310 non-null    float64
dtypes: float64(1), object(3)
memory usage: 12.1+ KB
</span></code></pre></div></div>
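<p>The <code>notnull()</code> filter above keys on a single column. As an alternative sketch, <code>dropna(how="all")</code> removes only the rows in which every column is missing. The tiny <code>raw</code> frame below is a made-up stand-in for the real file, used only to show the idea.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the situation: two real rows followed by empty ones.
raw = pd.DataFrame(
    {
        "구분": ["중부", "중부", np.nan, np.nan],
        "죄종": ["살인", "강도", np.nan, np.nan],
        "건수": [2.0, 3.0, np.nan, np.nan],
    }
)

# Count rows where every column is missing.
empty_rows = int(raw.isnull().all(axis=1).sum())

# dropna(how="all") removes only rows that are entirely NaN.
cleaned = raw.dropna(how="all")

print(empty_rows, len(cleaned))
```

<p>Either approach should give the same 310 rows here if the empty rows are empty in every column, which the <code>isnull()</code> preview suggests but does not prove.</p>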
<hr />
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_raw_data</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
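<table>
  <thead>
    <tr>
      <th style="text-align: center"> </th>
      <th style="text-align: center">구분</th>
      <th>죄종</th>
      <th style="text-align: center">발생검거</th>
      <th>건수</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center">0</td>
      <td style="text-align: center">중부</td>
      <td>살인</td>
      <td style="text-align: center">발생</td>
      <td>2.0</td>
    </tr>
    <tr>
      <td style="text-align: center">1</td>
      <td style="text-align: center">중부</td>
      <td>살인</td>
      <td style="text-align: center">검거</td>
      <td>2.0</td>
    </tr>
    <tr>
      <td style="text-align: center">2</td>
      <td style="text-align: center">중부</td>
      <td>강도</td>
      <td style="text-align: center">발생</td>
      <td>3.0</td>
    </tr>
    <tr>
      <td style="text-align: center">3</td>
      <td style="text-align: center">중부</td>
      <td>강도</td>
      <td style="text-align: center">검거</td>
      <td>3.0</td>
    </tr>
    <tr>
      <td style="text-align: center">4</td>
      <td style="text-align: center">중부</td>
      <td>강간</td>
      <td style="text-align: center">발생</td>
      <td>141.0</td>
    </tr>
  </tbody>
</table>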

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_raw_data</span><span class="p">.</span><span class="n">tail</span><span class="p">()</span>
</code></pre></div></div>
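<table>
  <thead>
    <tr>
      <th style="text-align: center"> </th>
      <th style="text-align: center">구분</th>
      <th>죄종</th>
      <th style="text-align: center">발생검거</th>
      <th>건수</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: center">305</td>
      <td style="text-align: center">수서</td>
      <td>강간</td>
      <td style="text-align: center">검거</td>
      <td>144.0</td>
    </tr>
    <tr>
      <td style="text-align: center">306</td>
      <td style="text-align: center">수서</td>
      <td>절도</td>
      <td style="text-align: center">발생</td>
      <td>1149.0</td>
    </tr>
    <tr>
      <td style="text-align: center">307</td>
      <td style="text-align: center">수서</td>
      <td>절도</td>
      <td style="text-align: center">검거</td>
      <td>789.0</td>
    </tr>
    <tr>
      <td style="text-align: center">308</td>
      <td style="text-align: center">수서</td>
      <td>폭력</td>
      <td style="text-align: center">발생</td>
      <td>1666.0</td>
    </tr>
    <tr>
      <td style="text-align: center">309</td>
      <td style="text-align: center">수서</td>
      <td>폭력</td>
      <td style="text-align: center">검거</td>
      <td>1431.0</td>
    </tr>
  </tbody>
</table>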

<hr />

<h2 id="pandas-pivot-table">Pandas pivot table</h2>
<ul>
  <li>index, columns, values, aggfunc</li>
</ul>
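<p>As a rough sketch of how the four arguments fit together, here is a toy example. The <code>sales</code> frame and its column names are invented for illustration and are unrelated to the sales-funnel file used below.</p>

```python
import pandas as pd

# Hypothetical toy data; column names are invented for this sketch.
sales = pd.DataFrame(
    {
        "region": ["east", "east", "west", "west"],
        "item": ["pen", "ink", "pen", "pen"],
        "amount": [10, 20, 30, 40],
    }
)

table = pd.pivot_table(
    sales,
    index="region",    # row labels
    columns="item",    # column labels
    values="amount",   # the numbers to aggregate
    aggfunc="sum",     # how to combine values that share a cell
)
print(table)
```

<p>The two "west / pen" rows land in the same cell and are combined by <code>aggfunc</code>, while "west / ink" has no rows and comes out as NaN.</p>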

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_excel</span><span class="p">(</span><span class="s">"../data/02. sales-funnel.xlsx"</span><span class="p">)</span>
<span class="n">df</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>

<table>
  <thead>
    <tr>
      <th style="text-align: left"> </th>
      <th>Account</th>
      <th>Name</th>
      <th>Rep</th>
      <th>Manager</th>
      <th>Product</th>
      <th>Quantity</th>
      <th>Price</th>
      <th>Status</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">0</td>
      <td>714466</td>
      <td>Trantow-Barrows</td>
      <td>Craig Booker</td>
      <td>Debra Henley</td>
      <td>CPU</td>
      <td>1</td>
      <td>30000</td>
      <td>presented</td>
    </tr>
    <tr>
      <td style="text-align: left">1</td>
      <td>714466</td>
      <td>Trantow-Barrows</td>
      <td>Craig Booker</td>
      <td>Debra Henley</td>
      <td>Software</td>
      <td>1</td>
      <td>10000</td>
      <td>presented</td>
    </tr>
    <tr>
      <td style="text-align: left">2</td>
      <td>714466</td>
      <td>Trantow-Barrows</td>
      <td>Craig Booker</td>
      <td>Debra Henley</td>
      <td>Maintenance</td>
      <td>2</td>
      <td>5000</td>
      <td>pending</td>
    </tr>
    <tr>
      <td style="text-align: left">3</td>
      <td>737550</td>
      <td>Fritsch, Russel and Anderson</td>
      <td>Craig Booker</td>
      <td>Debra Henley</td>
      <td>CPU</td>
      <td>1</td>
      <td>35000</td>
      <td>declined</td>
    </tr>
    <tr>
      <td style="text-align: left">4</td>
      <td>146832</td>
      <td>Kiehn-Spinka</td>
      <td>Daniel Hilton</td>
      <td>Debra Henley</td>
      <td>CPU</td>
      <td>2</td>
      <td>65000</td>
      <td>won</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="index-설정">Setting the index</h3>
<h5 id="name-컬럼을-인덱스로-설정">Set the Name column as the index</h5>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="s">"Name"</span><span class="p">)</span>	<span class="c1"># equivalent to pd.pivot_table(df, index="Name")
</span></code></pre></div></div>

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Account</th>
      <th>Price</th>
      <th>Quantity</th>
    </tr>
    <tr>
      <th>Name</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Barton LLC</th>
      <td>740150</td>
      <td>35000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>Fritsch, Russel and Anderson</th>
      <td>737550</td>
      <td>35000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>Herman LLC</th>
      <td>141962</td>
      <td>65000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>Jerde-Hilpert</th>
      <td>412290</td>
      <td>5000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>Kassulke, Ondricka and Metz</th>
      <td>307599</td>
      <td>7000</td>
      <td>3.000000</td>
    </tr>
    <tr>
      <th>Keeling LLC</th>
      <td>688981</td>
      <td>100000</td>
      <td>5.000000</td>
    </tr>
    <tr>
      <th>Kiehn-Spinka</th>
      <td>146832</td>
      <td>65000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>Koepp Ltd</th>
      <td>729833</td>
      <td>35000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>Kulas Inc</th>
      <td>218895</td>
      <td>25000</td>
      <td>1.500000</td>
    </tr>
    <tr>
      <th>Purdy-Kunde</th>
      <td>163416</td>
      <td>30000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>Stokes LLC</th>
      <td>239344</td>
      <td>7500</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>Trantow-Barrows</th>
      <td>714466</td>
      <td>15000</td>
      <td>1.333333</td>
    </tr>
  </tbody>
</table>
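<p>No <code>aggfunc</code> was passed here, so <code>pivot_table</code> falls back to its default, the mean. That is where the Trantow-Barrows values of 15000 and 1.333333 come from. A small check, rebuilding just the three Trantow-Barrows rows shown in <code>df.head()</code> above (the <code>rows</code> frame is a hand-made subset, not the full file):</p>

```python
import pandas as pd

# The three Trantow-Barrows rows shown in df.head() above.
rows = pd.DataFrame(
    {
        "Name": ["Trantow-Barrows"] * 3,
        "Quantity": [1, 1, 2],
        "Price": [30000, 10000, 5000],
    }
)

# With no aggfunc given, pivot_table averages each numeric column.
by_pivot = rows.pivot_table(index="Name")

# The same numbers via an explicit groupby mean.
by_group = rows.groupby("Name")[["Price", "Quantity"]].mean()

print(by_pivot)
```

<p>The subset leaves out the text columns on purpose: depending on the pandas version, averaging over non-numeric columns may be skipped or may raise an error, so passing <code>values</code> explicitly is the safer habit.</p>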

<hr />
<h4 id="멀티-인덱스-설정">Setting a multi-index</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s">"Name"</span><span class="p">,</span> <span class="s">"Rep"</span><span class="p">,</span> <span class="s">"Manager"</span><span class="p">])</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th></th>
      <th>Account</th>
      <th>Price</th>
      <th>Quantity</th>
    </tr>
    <tr>
      <th>Name</th>
      <th>Rep</th>
      <th>Manager</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Barton LLC</th>
      <th>John Smith</th>
      <th>Debra Henley</th>
      <td>740150</td>
      <td>35000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>Fritsch, Russel and Anderson</th>
      <th>Craig Booker</th>
      <th>Debra Henley</th>
      <td>737550</td>
      <td>35000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>Herman LLC</th>
      <th>Cedric Moss</th>
      <th>Fred Anderson</th>
      <td>141962</td>
      <td>65000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>Jerde-Hilpert</th>
      <th>John Smith</th>
      <th>Debra Henley</th>
      <td>412290</td>
      <td>5000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>Kassulke, Ondricka and Metz</th>
      <th>Wendy Yule</th>
      <th>Fred Anderson</th>
      <td>307599</td>
      <td>7000</td>
      <td>3.000000</td>
    </tr>
    <tr>
      <th>Keeling LLC</th>
      <th>Wendy Yule</th>
      <th>Fred Anderson</th>
      <td>688981</td>
      <td>100000</td>
      <td>5.000000</td>
    </tr>
    <tr>
      <th>Kiehn-Spinka</th>
      <th>Daniel Hilton</th>
      <th>Debra Henley</th>
      <td>146832</td>
      <td>65000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>Koepp Ltd</th>
      <th>Wendy Yule</th>
      <th>Fred Anderson</th>
      <td>729833</td>
      <td>35000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>Kulas Inc</th>
      <th>Daniel Hilton</th>
      <th>Debra Henley</th>
      <td>218895</td>
      <td>25000</td>
      <td>1.500000</td>
    </tr>
    <tr>
      <th>Purdy-Kunde</th>
      <th>Cedric Moss</th>
      <th>Fred Anderson</th>
      <td>163416</td>
      <td>30000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>Stokes LLC</th>
      <th>Cedric Moss</th>
      <th>Fred Anderson</th>
      <td>239344</td>
      <td>7500</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>Trantow-Barrows</th>
      <th>Craig Booker</th>
      <th>Debra Henley</th>
      <td>714466</td>
      <td>15000</td>
      <td>1.333333</td>
    </tr>
  </tbody>
</table>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s">"Manager"</span><span class="p">,</span> <span class="s">"Rep"</span><span class="p">])</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>Account</th>
      <th>Price</th>
      <th>Quantity</th>
    </tr>
    <tr>
      <th>Manager</th>
      <th>Rep</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="3" valign="top">Debra Henley</th>
      <th>Craig Booker</th>
      <td>720237.0</td>
      <td>20000.000000</td>
      <td>1.250000</td>
    </tr>
    <tr>
      <th>Daniel Hilton</th>
      <td>194874.0</td>
      <td>38333.333333</td>
      <td>1.666667</td>
    </tr>
    <tr>
      <th>John Smith</th>
      <td>576220.0</td>
      <td>20000.000000</td>
      <td>1.500000</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">Fred Anderson</th>
      <th>Cedric Moss</th>
      <td>196016.5</td>
      <td>27500.000000</td>
      <td>1.250000</td>
    </tr>
    <tr>
      <th>Wendy Yule</th>
      <td>614061.5</td>
      <td>44250.000000</td>
      <td>3.000000</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="values-설정">Setting values</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s">"Manager"</span><span class="p">,</span> <span class="s">"Rep"</span><span class="p">],</span> <span class="n">values</span><span class="o">=</span><span class="s">"Price"</span><span class="p">)</span>
</code></pre></div></div>

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>Price</th>
    </tr>
    <tr>
      <th>Manager</th>
      <th>Rep</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="3" valign="top">Debra Henley</th>
      <th>Craig Booker</th>
      <td>20000.000000</td>
    </tr>
    <tr>
      <th>Daniel Hilton</th>
      <td>38333.333333</td>
    </tr>
    <tr>
      <th>John Smith</th>
      <td>20000.000000</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">Fred Anderson</th>
      <th>Cedric Moss</th>
      <td>27500.000000</td>
    </tr>
    <tr>
      <th>Wendy Yule</th>
      <td>44250.000000</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="price-컬럼-sum-연산-적용">Apply sum aggregation to the Price column</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s">"Manager"</span><span class="p">,</span> <span class="s">"Rep"</span><span class="p">],</span> <span class="n">values</span><span class="o">=</span><span class="s">"Price"</span><span class="p">,</span> <span class="n">aggfunc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>Price</th>
    </tr>
    <tr>
      <th>Manager</th>
      <th>Rep</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="3" valign="top">Debra Henley</th>
      <th>Craig Booker</th>
      <td>80000</td>
    </tr>
    <tr>
      <th>Daniel Hilton</th>
      <td>115000</td>
    </tr>
    <tr>
      <th>John Smith</th>
      <td>40000</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">Fred Anderson</th>
      <th>Cedric Moss</th>
      <td>110000</td>
    </tr>
    <tr>
      <th>Wendy Yule</th>
      <td>177000</td>
    </tr>
  </tbody>
</table>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s">"Manager"</span><span class="p">,</span> <span class="s">"Rep"</span><span class="p">],</span> <span class="n">values</span><span class="o">=</span><span class="s">"Price"</span><span class="p">,</span> <span class="n">aggfunc</span><span class="o">=</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">,</span> <span class="nb">len</span><span class="p">])</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr>
      <th></th>
      <th></th>
      <th>sum</th>
      <th>len</th>
    </tr>
    <tr>
      <th></th>
      <th></th>
      <th>Price</th>
      <th>Price</th>
    </tr>
    <tr>
      <th>Manager</th>
      <th>Rep</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="3" valign="top">Debra Henley</th>
      <th>Craig Booker</th>
      <td>80000</td>
      <td>4</td>
    </tr>
    <tr>
      <th>Daniel Hilton</th>
      <td>115000</td>
      <td>3</td>
    </tr>
    <tr>
      <th>John Smith</th>
      <td>40000</td>
      <td>2</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">Fred Anderson</th>
      <th>Cedric Moss</th>
      <td>110000</td>
      <td>4</td>
    </tr>
    <tr>
      <th>Wendy Yule</th>
      <td>177000</td>
      <td>4</td>
    </tr>
  </tbody>
</table>
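<p>Passing a list to <code>aggfunc</code> produces one block of columns per function. The sketch below uses the string names <code>"sum"</code> and <code>"count"</code> as an alternative to <code>np.sum</code> and <code>len</code>, and rebuilds only the four Craig Booker rows from <code>df.head()</code> above (the <code>booker</code> frame is a hand-made subset).</p>

```python
import pandas as pd

# The four Craig Booker rows from df.head() above.
booker = pd.DataFrame(
    {
        "Manager": ["Debra Henley"] * 4,
        "Rep": ["Craig Booker"] * 4,
        "Price": [30000, 10000, 5000, 35000],
    }
)

# String names are an alternative to passing np.sum and len.
summary = booker.pivot_table(
    index=["Manager", "Rep"], values="Price", aggfunc=["sum", "count"]
)
print(summary)
```

<p>One difference to keep in mind: <code>len</code> counts every row in the group, while <code>"count"</code> skips missing values. With no NaN in Price the two agree.</p>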

<h3 id="columns-설정">Setting columns</h3>
<h4 id="product를-컬럼으로-지정">Use Product as the columns</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s">"Manager"</span><span class="p">,</span> <span class="s">"Rep"</span><span class="p">],</span> <span class="n">values</span><span class="o">=</span><span class="s">"Price"</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="s">"Product"</span><span class="p">,</span> <span class="n">aggfunc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Product</th>
      <th>CPU</th>
      <th>Maintenance</th>
      <th>Monitor</th>
      <th>Software</th>
    </tr>
    <tr>
      <th>Manager</th>
      <th>Rep</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="3" valign="top">Debra Henley</th>
      <th>Craig Booker</th>
      <td>65000.0</td>
      <td>5000.0</td>
      <td>NaN</td>
      <td>10000.0</td>
    </tr>
    <tr>
      <th>Daniel Hilton</th>
      <td>105000.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>10000.0</td>
    </tr>
    <tr>
      <th>John Smith</th>
      <td>35000.0</td>
      <td>5000.0</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">Fred Anderson</th>
      <th>Cedric Moss</th>
      <td>95000.0</td>
      <td>5000.0</td>
      <td>NaN</td>
      <td>10000.0</td>
    </tr>
    <tr>
      <th>Wendy Yule</th>
      <td>165000.0</td>
      <td>7000.0</td>
      <td>5000.0</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="nan-값-설정--fill_value">Filling NaN values: fill_value</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s">"Manager"</span><span class="p">,</span> <span class="s">"Rep"</span><span class="p">],</span> <span class="n">values</span><span class="o">=</span><span class="s">"Price"</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="s">"Product"</span><span class="p">,</span> <span class="n">aggfunc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">,</span> <span class="n">fill_value</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Product</th>
      <th>CPU</th>
      <th>Maintenance</th>
      <th>Monitor</th>
      <th>Software</th>
    </tr>
    <tr>
      <th>Manager</th>
      <th>Rep</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="3" valign="top">Debra Henley</th>
      <th>Craig Booker</th>
      <td>65000</td>
      <td>5000</td>
      <td>0</td>
      <td>10000</td>
    </tr>
    <tr>
      <th>Daniel Hilton</th>
      <td>105000</td>
      <td>0</td>
      <td>0</td>
      <td>10000</td>
    </tr>
    <tr>
      <th>John Smith</th>
      <td>35000</td>
      <td>5000</td>
      <td>0</td>
      <td>0</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">Fred Anderson</th>
      <th>Cedric Moss</th>
      <td>95000</td>
      <td>5000</td>
      <td>0</td>
      <td>10000</td>
    </tr>
    <tr>
      <th>Wendy Yule</th>
      <td>165000</td>
      <td>7000</td>
      <td>5000</td>
      <td>0</td>
    </tr>
  </tbody>
</table>
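<p>A minimal sketch of the two options above, on a small made-up DataFrame rather than the sales file used in this post. It assumes the string "sum" is an acceptable stand-in for np.sum (recent pandas versions warn about the NumPy callable); the names toy, wide and wide_filled are only for illustration.</p>

```python
import pandas as pd

# Hypothetical toy data, not the sales file used in this post
toy = pd.DataFrame({
    "Rep": ["A", "A", "B", "B", "B"],
    "Product": ["CPU", "Software", "CPU", "CPU", "Monitor"],
    "Price": [30000, 10000, 35000, 30000, 5000],
})

# columns="Product" spreads the product names across the header.
# Rep/Product combinations that never occur come out as NaN.
wide = toy.pivot_table(index="Rep", columns="Product", values="Price", aggfunc="sum")

# fill_value=0 builds the same table with the gaps replaced by 0.
wide_filled = toy.pivot_table(
    index="Rep", columns="Product", values="Price", aggfunc="sum", fill_value=0
)

print(wide)
print(wide_filled)
```

<p>Because the NaN cells disappear, the filled table can keep an integer dtype, which matches the change from 65000.0 to 65000 in the two outputs above.</p>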

<hr />

<h4 id="2개-이상-index-values-설정">Setting two or more index and values entries</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s">"Manager"</span><span class="p">,</span> <span class="s">"Rep"</span><span class="p">,</span> <span class="s">"Product"</span><span class="p">],</span> <span class="n">values</span><span class="o">=</span><span class="p">[</span><span class="s">"Price"</span><span class="p">,</span> <span class="s">"Quantity"</span><span class="p">],</span> <span class="n">aggfunc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">,</span> <span class="n">fill_value</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>
<div>
<style scoped="">
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th></th>
      <th>Price</th>
      <th>Quantity</th>
    </tr>
    <tr>
      <th>Manager</th>
      <th>Rep</th>
      <th>Product</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="7" valign="top">Debra Henley</th>
      <th rowspan="3" valign="top">Craig Booker</th>
      <th>CPU</th>
      <td>65000</td>
      <td>2</td>
    </tr>
    <tr>
      <th>Maintenance</th>
      <td>5000</td>
      <td>2</td>
    </tr>
    <tr>
      <th>Software</th>
      <td>10000</td>
      <td>1</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">Daniel Hilton</th>
      <th>CPU</th>
      <td>105000</td>
      <td>4</td>
    </tr>
    <tr>
      <th>Software</th>
      <td>10000</td>
      <td>1</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">John Smith</th>
      <th>CPU</th>
      <td>35000</td>
      <td>1</td>
    </tr>
    <tr>
      <th>Maintenance</th>
      <td>5000</td>
      <td>2</td>
    </tr>
    <tr>
      <th rowspan="6" valign="top">Fred Anderson</th>
      <th rowspan="3" valign="top">Cedric Moss</th>
      <th>CPU</th>
      <td>95000</td>
      <td>3</td>
    </tr>
    <tr>
      <th>Maintenance</th>
      <td>5000</td>
      <td>1</td>
    </tr>
    <tr>
      <th>Software</th>
      <td>10000</td>
      <td>1</td>
    </tr>
    <tr>
      <th rowspan="3" valign="top">Wendy Yule</th>
      <th>CPU</th>
      <td>165000</td>
      <td>7</td>
    </tr>
    <tr>
      <th>Maintenance</th>
      <td>7000</td>
      <td>3</td>
    </tr>
    <tr>
      <th>Monitor</th>
      <td>5000</td>
      <td>2</td>
    </tr>
  </tbody>
</table>
</div>

<h3 id="aggfunc-2개-이상-설정">Setting two or more aggfunc entries</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span>
    <span class="n">index</span><span class="o">=</span><span class="p">[</span><span class="s">"Manager"</span><span class="p">,</span> <span class="s">"Rep"</span><span class="p">,</span> <span class="s">"Product"</span><span class="p">],</span>
    <span class="n">values</span><span class="o">=</span><span class="p">[</span><span class="s">"Price"</span><span class="p">,</span> <span class="s">"Quantity"</span><span class="p">],</span>
    <span class="n">aggfunc</span><span class="o">=</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">],</span> <span class="n">fill_value</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
    <span class="n">margins</span><span class="o">=</span><span class="bp">True</span> <span class="c1"># add a grand-total (All) row
</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr>
      <th></th>
      <th></th>
      <th></th>
      <th colspan="2" halign="left">sum</th>
      <th colspan="2" halign="left">mean</th>
    </tr>
    <tr>
      <th></th>
      <th></th>
      <th></th>
      <th>Price</th>
      <th>Quantity</th>
      <th>Price</th>
      <th>Quantity</th>
    </tr>
    <tr>
      <th>Manager</th>
      <th>Rep</th>
      <th>Product</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="7" valign="top">Debra Henley</th>
      <th rowspan="3" valign="top">Craig Booker</th>
      <th>CPU</th>
      <td>65000</td>
      <td>2</td>
      <td>32500.000000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>Maintenance</th>
      <td>5000</td>
      <td>2</td>
      <td>5000.000000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>Software</th>
      <td>10000</td>
      <td>1</td>
      <td>10000.000000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">Daniel Hilton</th>
      <th>CPU</th>
      <td>105000</td>
      <td>4</td>
      <td>52500.000000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>Software</th>
      <td>10000</td>
      <td>1</td>
      <td>10000.000000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">John Smith</th>
      <th>CPU</th>
      <td>35000</td>
      <td>1</td>
      <td>35000.000000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>Maintenance</th>
      <td>5000</td>
      <td>2</td>
      <td>5000.000000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th rowspan="6" valign="top">Fred Anderson</th>
      <th rowspan="3" valign="top">Cedric Moss</th>
      <th>CPU</th>
      <td>95000</td>
      <td>3</td>
      <td>47500.000000</td>
      <td>1.500000</td>
    </tr>
    <tr>
      <th>Maintenance</th>
      <td>5000</td>
      <td>1</td>
      <td>5000.000000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>Software</th>
      <td>10000</td>
      <td>1</td>
      <td>10000.000000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th rowspan="3" valign="top">Wendy Yule</th>
      <th>CPU</th>
      <td>165000</td>
      <td>7</td>
      <td>82500.000000</td>
      <td>3.500000</td>
    </tr>
    <tr>
      <th>Maintenance</th>
      <td>7000</td>
      <td>3</td>
      <td>7000.000000</td>
      <td>3.000000</td>
    </tr>
    <tr>
      <th>Monitor</th>
      <td>5000</td>
      <td>2</td>
      <td>5000.000000</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>All</th>
      <th></th>
      <th></th>
      <td>522000</td>
      <td>30</td>
      <td>30705.882353</td>
      <td>1.764706</td>
    </tr>
  </tbody>
</table>
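<p>A small sketch of how a list of aggregations and margins=True combine, again on made-up data. The string names "sum" and "mean" are assumed here in place of np.sum and np.mean. Note that the All row is computed from every underlying row, not from the per-group results.</p>

```python
import pandas as pd

# Hypothetical toy data: rep A has two sales, rep B has one
toy = pd.DataFrame({
    "Rep": ["A", "A", "B"],
    "Price": [30000, 10000, 35000],
    "Quantity": [1, 2, 1],
})

summary = toy.pivot_table(
    index="Rep",
    values=["Price", "Quantity"],
    aggfunc=["sum", "mean"],  # one top-level column group per function
    margins=True,             # append the All row
)

print(summary)
# The All mean of Quantity is (1 + 2 + 1) / 3, the mean over all three rows
print(summary.loc["All", ("mean", "Quantity")])
```

<p>This is why the All row above shows a mean Price of 30705.882353: it is 522000 divided by the 17 underlying rows, not an average of the per-product means.</p>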

<hr />

<h3 id="서울시-범죄-현황-데이터-정리">Organizing the Seoul crime data</h3>
<h4 id="데이터-정리">Organizing the data</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_station</span> <span class="o">=</span> <span class="n">crime_raw_data</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span>
    <span class="n">index</span><span class="o">=</span><span class="s">"구분"</span><span class="p">,</span>
    <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s">"죄종"</span><span class="p">,</span> <span class="s">"발생검거"</span><span class="p">],</span>
    <span class="n">aggfunc</span><span class="o">=</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">]</span>
<span class="p">)</span>
<span class="n">crime_station</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead tr th {
        text-align: left;
    }

    .dataframe thead tr:last-of-type th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr>
      <th></th>
      <th colspan="10" halign="left">sum</th>
    </tr>
    <tr>
      <th></th>
      <th colspan="10" halign="left">건수</th>
    </tr>
    <tr>
      <th>죄종</th>
      <th colspan="2" halign="left">강간</th>
      <th colspan="2" halign="left">강도</th>
      <th colspan="2" halign="left">살인</th>
      <th colspan="2" halign="left">절도</th>
      <th colspan="2" halign="left">폭력</th>
    </tr>
    <tr>
      <th>발생검거</th>
      <th>검거</th>
      <th>발생</th>
      <th>검거</th>
      <th>발생</th>
      <th>검거</th>
      <th>발생</th>
      <th>검거</th>
      <th>발생</th>
      <th>검거</th>
      <th>발생</th>
    </tr>
    <tr>
      <th>구분</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남</th>
      <td>269.0</td>
      <td>339.0</td>
      <td>26.0</td>
      <td>24.0</td>
      <td>3.0</td>
      <td>3.0</td>
      <td>1129.0</td>
      <td>2438.0</td>
      <td>2096.0</td>
      <td>2336.0</td>
    </tr>
    <tr>
      <th>강동</th>
      <td>152.0</td>
      <td>160.0</td>
      <td>13.0</td>
      <td>14.0</td>
      <td>5.0</td>
      <td>4.0</td>
      <td>902.0</td>
      <td>1754.0</td>
      <td>2201.0</td>
      <td>2530.0</td>
    </tr>
    <tr>
      <th>강북</th>
      <td>159.0</td>
      <td>217.0</td>
      <td>4.0</td>
      <td>5.0</td>
      <td>6.0</td>
      <td>7.0</td>
      <td>672.0</td>
      <td>1222.0</td>
      <td>2482.0</td>
      <td>2778.0</td>
    </tr>
    <tr>
      <th>강서</th>
      <td>239.0</td>
      <td>275.0</td>
      <td>10.0</td>
      <td>10.0</td>
      <td>10.0</td>
      <td>9.0</td>
      <td>1070.0</td>
      <td>1952.0</td>
      <td>2768.0</td>
      <td>3204.0</td>
    </tr>
    <tr>
      <th>관악</th>
      <td>264.0</td>
      <td>322.0</td>
      <td>10.0</td>
      <td>12.0</td>
      <td>7.0</td>
      <td>6.0</td>
      <td>937.0</td>
      <td>2103.0</td>
      <td>2707.0</td>
      <td>3235.0</td>
    </tr>
  </tbody>
</table>
</div>

<ul>
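  <li>The table is arranged so that the police station name is the index.</li>
  <li>The default aggfunc is the mean, so sum is specified explicitly here.</li>
  <li>The columns come out as a MultiIndex.</li>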
</ul>
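<p>To make the last two points concrete, here is a sketch on made-up records with hypothetical English column names (station, crime, status, count), not the actual crime file. It compares the default aggregation with aggfunc=["sum"] and shows where the extra column levels come from.</p>

```python
import pandas as pd

# Hypothetical long-format records; station X has two "occurred" rows for theft
records = pd.DataFrame({
    "station": ["X", "X", "X", "Y"],
    "crime": ["theft", "theft", "theft", "theft"],
    "status": ["occurred", "occurred", "arrested", "occurred"],
    "count": [10, 20, 12, 7],
})

# Default aggfunc is the mean: the two "occurred" rows are averaged
default_table = records.pivot_table(index="station", columns=["crime", "status"])

# aggfunc=["sum"] adds them instead and puts 'sum' on top of the column levels
sum_table = records.pivot_table(
    index="station", columns=["crime", "status"], aggfunc=["sum"]
)

print(default_table.columns.nlevels)  # value column name + crime + status
print(sum_table.columns.nlevels)      # 'sum' + value column name + crime + status
print(sum_table)
```

<p>Since values is not given, the remaining column (count here, 건수 in the post) is kept as its own level, which is why the output above has four header rows.</p>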

<hr />

<h4 id="다중-컬럼에서-특정-컬럼-제거">Dropping specific levels from multi-level columns</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_station</span><span class="p">.</span><span class="n">columns</span> <span class="c1"># MultiIndex
</span>
<span class="o">=&gt;</span>

<span class="n">MultiIndex</span><span class="p">([(</span><span class="s">'sum'</span><span class="p">,</span> <span class="s">'건수'</span><span class="p">,</span> <span class="s">'강간'</span><span class="p">,</span> <span class="s">'검거'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'sum'</span><span class="p">,</span> <span class="s">'건수'</span><span class="p">,</span> <span class="s">'강간'</span><span class="p">,</span> <span class="s">'발생'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'sum'</span><span class="p">,</span> <span class="s">'건수'</span><span class="p">,</span> <span class="s">'강도'</span><span class="p">,</span> <span class="s">'검거'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'sum'</span><span class="p">,</span> <span class="s">'건수'</span><span class="p">,</span> <span class="s">'강도'</span><span class="p">,</span> <span class="s">'발생'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'sum'</span><span class="p">,</span> <span class="s">'건수'</span><span class="p">,</span> <span class="s">'살인'</span><span class="p">,</span> <span class="s">'검거'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'sum'</span><span class="p">,</span> <span class="s">'건수'</span><span class="p">,</span> <span class="s">'살인'</span><span class="p">,</span> <span class="s">'발생'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'sum'</span><span class="p">,</span> <span class="s">'건수'</span><span class="p">,</span> <span class="s">'절도'</span><span class="p">,</span> <span class="s">'검거'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'sum'</span><span class="p">,</span> <span class="s">'건수'</span><span class="p">,</span> <span class="s">'절도'</span><span class="p">,</span> <span class="s">'발생'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'sum'</span><span class="p">,</span> <span class="s">'건수'</span><span class="p">,</span> <span class="s">'폭력'</span><span class="p">,</span> <span class="s">'검거'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'sum'</span><span class="p">,</span> <span class="s">'건수'</span><span class="p">,</span> <span class="s">'폭력'</span><span class="p">,</span> <span class="s">'발생'</span><span class="p">)],</span>
           <span class="n">names</span><span class="o">=</span><span class="p">[</span><span class="bp">None</span><span class="p">,</span> <span class="bp">None</span><span class="p">,</span> <span class="s">'죄종'</span><span class="p">,</span> <span class="s">'발생검거'</span><span class="p">])</span>
</code></pre></div></div>
<hr />
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_station</span><span class="p">[</span><span class="s">"sum"</span><span class="p">,</span> <span class="s">"건수"</span><span class="p">,</span> <span class="s">"강도"</span><span class="p">,</span> <span class="s">"검거"</span><span class="p">][:</span><span class="mi">5</span><span class="p">]</span>

<span class="o">=&gt;</span>

<span class="n">구분</span>
<span class="n">강남</span>    <span class="mf">26.0</span>
<span class="n">강동</span>    <span class="mf">13.0</span>
<span class="n">강북</span>     <span class="mf">4.0</span>
<span class="n">강서</span>    <span class="mf">10.0</span>
<span class="n">관악</span>    <span class="mf">10.0</span>
<span class="n">Name</span><span class="p">:</span> <span class="p">(</span><span class="nb">sum</span><span class="p">,</span> <span class="n">건수</span><span class="p">,</span> <span class="n">강도</span><span class="p">,</span> <span class="n">검거</span><span class="p">),</span> <span class="n">dtype</span><span class="p">:</span> <span class="n">float64</span>
</code></pre></div></div>
<hr />
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_station</span><span class="p">.</span><span class="n">columns</span> <span class="o">=</span> <span class="n">crime_station</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">droplevel</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span> <span class="c1"># drop the top two column levels ('sum', '건수')
</span><span class="n">crime_station</span><span class="p">.</span><span class="n">columns</span>

<span class="o">=&gt;</span>

<span class="n">MultiIndex</span><span class="p">([(</span><span class="s">'강간'</span><span class="p">,</span> <span class="s">'검거'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'강간'</span><span class="p">,</span> <span class="s">'발생'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'강도'</span><span class="p">,</span> <span class="s">'검거'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'강도'</span><span class="p">,</span> <span class="s">'발생'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'살인'</span><span class="p">,</span> <span class="s">'검거'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'살인'</span><span class="p">,</span> <span class="s">'발생'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'절도'</span><span class="p">,</span> <span class="s">'검거'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'절도'</span><span class="p">,</span> <span class="s">'발생'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'폭력'</span><span class="p">,</span> <span class="s">'검거'</span><span class="p">),</span>
            <span class="p">(</span><span class="s">'폭력'</span><span class="p">,</span> <span class="s">'발생'</span><span class="p">)],</span>
           <span class="n">names</span><span class="o">=</span><span class="p">[</span><span class="s">'죄종'</span><span class="p">,</span> <span class="s">'발생검거'</span><span class="p">])</span>
</code></pre></div></div>
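<p>The same steps on a tiny hand-built frame, as a sketch: select one column by its full tuple, then drop the two outer levels with droplevel. The English level values and the name frame are placeholders, not the post's data.</p>

```python
import pandas as pd

# Hypothetical 4-level columns shaped like the pivot output above
cols = pd.MultiIndex.from_tuples(
    [
        ("sum", "count", "theft", "arrested"),
        ("sum", "count", "theft", "occurred"),
    ],
    names=[None, None, "crime", "status"],
)
frame = pd.DataFrame([[12, 30]], index=["X"], columns=cols)

# Before dropping: a single column needs all four keys
picked = frame["sum", "count", "theft", "arrested"]

# Drop levels 0 and 1; only the named levels remain
frame.columns = frame.columns.droplevel([0, 1])

print(list(frame.columns.names))
print(frame["theft", "arrested"])
```

<p>After the drop, two keys are enough to reach the same column, which keeps the later code shorter.</p>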
<hr />
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_station</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr>
      <th>죄종</th>
      <th colspan="2" halign="left">강간</th>
      <th colspan="2" halign="left">강도</th>
      <th colspan="2" halign="left">살인</th>
      <th colspan="2" halign="left">절도</th>
      <th colspan="2" halign="left">폭력</th>
    </tr>
    <tr>
      <th>발생검거</th>
      <th>검거</th>
      <th>발생</th>
      <th>검거</th>
      <th>발생</th>
      <th>검거</th>
      <th>발생</th>
      <th>검거</th>
      <th>발생</th>
      <th>검거</th>
      <th>발생</th>
    </tr>
    <tr>
      <th>구분</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남</th>
      <td>269.0</td>
      <td>339.0</td>
      <td>26.0</td>
      <td>24.0</td>
      <td>3.0</td>
      <td>3.0</td>
      <td>1129.0</td>
      <td>2438.0</td>
      <td>2096.0</td>
      <td>2336.0</td>
    </tr>
    <tr>
      <th>강동</th>
      <td>152.0</td>
      <td>160.0</td>
      <td>13.0</td>
      <td>14.0</td>
      <td>5.0</td>
      <td>4.0</td>
      <td>902.0</td>
      <td>1754.0</td>
      <td>2201.0</td>
      <td>2530.0</td>
    </tr>
    <tr>
      <th>강북</th>
      <td>159.0</td>
      <td>217.0</td>
      <td>4.0</td>
      <td>5.0</td>
      <td>6.0</td>
      <td>7.0</td>
      <td>672.0</td>
      <td>1222.0</td>
      <td>2482.0</td>
      <td>2778.0</td>
    </tr>
    <tr>
      <th>강서</th>
      <td>239.0</td>
      <td>275.0</td>
      <td>10.0</td>
      <td>10.0</td>
      <td>10.0</td>
      <td>9.0</td>
      <td>1070.0</td>
      <td>1952.0</td>
      <td>2768.0</td>
      <td>3204.0</td>
    </tr>
    <tr>
      <th>관악</th>
      <td>264.0</td>
      <td>322.0</td>
      <td>10.0</td>
      <td>12.0</td>
      <td>7.0</td>
      <td>6.0</td>
      <td>937.0</td>
      <td>2103.0</td>
      <td>2707.0</td>
      <td>3235.0</td>
    </tr>
  </tbody>
</table>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_station</span><span class="p">.</span><span class="n">index</span>

<span class="o">=&gt;</span>

<span class="n">Index</span><span class="p">([</span><span class="s">'강남'</span><span class="p">,</span> <span class="s">'강동'</span><span class="p">,</span> <span class="s">'강북'</span><span class="p">,</span> <span class="s">'강서'</span><span class="p">,</span> <span class="s">'관악'</span><span class="p">,</span> <span class="s">'광진'</span><span class="p">,</span> <span class="s">'구로'</span><span class="p">,</span> <span class="s">'금천'</span><span class="p">,</span> <span class="s">'남대문'</span><span class="p">,</span> <span class="s">'노원'</span><span class="p">,</span> <span class="s">'도봉'</span><span class="p">,</span>
       <span class="s">'동대문'</span><span class="p">,</span> <span class="s">'동작'</span><span class="p">,</span> <span class="s">'마포'</span><span class="p">,</span> <span class="s">'방배'</span><span class="p">,</span> <span class="s">'서대문'</span><span class="p">,</span> <span class="s">'서부'</span><span class="p">,</span> <span class="s">'서초'</span><span class="p">,</span> <span class="s">'성동'</span><span class="p">,</span> <span class="s">'성북'</span><span class="p">,</span> <span class="s">'송파'</span><span class="p">,</span> <span class="s">'수서'</span><span class="p">,</span>
       <span class="s">'양천'</span><span class="p">,</span> <span class="s">'영등포'</span><span class="p">,</span> <span class="s">'용산'</span><span class="p">,</span> <span class="s">'은평'</span><span class="p">,</span> <span class="s">'종로'</span><span class="p">,</span> <span class="s">'종암'</span><span class="p">,</span> <span class="s">'중랑'</span><span class="p">,</span> <span class="s">'중부'</span><span class="p">,</span> <span class="s">'혜화'</span><span class="p">],</span>
      <span class="n">dtype</span><span class="o">=</span><span class="s">'object'</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'구분'</span><span class="p">)</span>
</code></pre></div></div>
<p>→ The index holds police station names, so the district (gu) each station belongs to still has to be looked up.</p>

<hr />

<h2 id="google-maps-api-설치-및-테스트">Installing and testing the Google Maps API</h2>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">googlemaps</span>
<span class="n">gmaps_key</span> <span class="o">=</span> <span class="s">"~"</span>
<span class="n">gmaps</span> <span class="o">=</span> <span class="n">googlemaps</span><span class="p">.</span><span class="n">Client</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="n">gmaps_key</span><span class="p">)</span>
<span class="n">gmaps</span><span class="p">.</span><span class="n">geocode</span><span class="p">(</span><span class="s">"서울영등포경찰서"</span><span class="p">,</span> <span class="n">language</span><span class="o">=</span><span class="s">"ko"</span><span class="p">)</span>

<span class="o">=&gt;</span>

<span class="p">[{</span><span class="s">'address_components'</span><span class="p">:</span> <span class="p">[{</span><span class="s">'long_name'</span><span class="p">:</span> <span class="s">'608'</span><span class="p">,</span>
    <span class="s">'short_name'</span><span class="p">:</span> <span class="s">'608'</span><span class="p">,</span>
    <span class="s">'types'</span><span class="p">:</span> <span class="p">[</span><span class="s">'premise'</span><span class="p">]},</span>
   <span class="p">{</span><span class="s">'long_name'</span><span class="p">:</span> <span class="s">'국회대로'</span><span class="p">,</span>
    <span class="s">'short_name'</span><span class="p">:</span> <span class="s">'국회대로'</span><span class="p">,</span>
    <span class="s">'types'</span><span class="p">:</span> <span class="p">[</span><span class="s">'political'</span><span class="p">,</span> <span class="s">'sublocality'</span><span class="p">,</span> <span class="s">'sublocality_level_4'</span><span class="p">]},</span>
   <span class="p">{</span><span class="s">'long_name'</span><span class="p">:</span> <span class="s">'영등포구'</span><span class="p">,</span>
    <span class="s">'short_name'</span><span class="p">:</span> <span class="s">'영등포구'</span><span class="p">,</span>
    <span class="s">'types'</span><span class="p">:</span> <span class="p">[</span><span class="s">'political'</span><span class="p">,</span> <span class="s">'sublocality'</span><span class="p">,</span> <span class="s">'sublocality_level_1'</span><span class="p">]},</span>
   <span class="p">{</span><span class="s">'long_name'</span><span class="p">:</span> <span class="s">'서울특별시'</span><span class="p">,</span>
    <span class="s">'short_name'</span><span class="p">:</span> <span class="s">'서울특별시'</span><span class="p">,</span>
    <span class="s">'types'</span><span class="p">:</span> <span class="p">[</span><span class="s">'administrative_area_level_1'</span><span class="p">,</span> <span class="s">'political'</span><span class="p">]},</span>
   <span class="p">{</span><span class="s">'long_name'</span><span class="p">:</span> <span class="s">'대한민국'</span><span class="p">,</span>
    <span class="s">'short_name'</span><span class="p">:</span> <span class="s">'KR'</span><span class="p">,</span>
    <span class="s">'types'</span><span class="p">:</span> <span class="p">[</span><span class="s">'country'</span><span class="p">,</span> <span class="s">'political'</span><span class="p">]},</span>
   <span class="p">{</span><span class="s">'long_name'</span><span class="p">:</span> <span class="s">'150-043'</span><span class="p">,</span>
    <span class="s">'short_name'</span><span class="p">:</span> <span class="s">'150-043'</span><span class="p">,</span>
    <span class="s">'types'</span><span class="p">:</span> <span class="p">[</span><span class="s">'postal_code'</span><span class="p">]}],</span>
  <span class="s">'formatted_address'</span><span class="p">:</span> <span class="s">'대한민국 서울특별시 영등포구 국회대로 608'</span><span class="p">,</span>
  <span class="s">'geometry'</span><span class="p">:</span> <span class="p">{</span><span class="s">'location'</span><span class="p">:</span> <span class="p">{</span><span class="s">'lat'</span><span class="p">:</span> <span class="mf">37.5260441</span><span class="p">,</span> <span class="s">'lng'</span><span class="p">:</span> <span class="mf">126.9008091</span><span class="p">},</span>
   <span class="s">'location_type'</span><span class="p">:</span> <span class="s">'ROOFTOP'</span><span class="p">,</span>
   <span class="s">'viewport'</span><span class="p">:</span> <span class="p">{</span><span class="s">'northeast'</span><span class="p">:</span> <span class="p">{</span><span class="s">'lat'</span><span class="p">:</span> <span class="mf">37.5273930802915</span><span class="p">,</span>
     <span class="s">'lng'</span><span class="p">:</span> <span class="mf">126.9021580802915</span><span class="p">},</span>
    <span class="s">'southwest'</span><span class="p">:</span> <span class="p">{</span><span class="s">'lat'</span><span class="p">:</span> <span class="mf">37.5246951197085</span><span class="p">,</span> <span class="s">'lng'</span><span class="p">:</span> <span class="mf">126.8994601197085</span><span class="p">}}},</span>
  <span class="s">'partial_match'</span><span class="p">:</span> <span class="bp">True</span><span class="p">,</span>
  <span class="s">'place_id'</span><span class="p">:</span> <span class="s">'ChIJ1TimJLaffDURptXOs0Tj6sY'</span><span class="p">,</span>
  <span class="s">'plus_code'</span><span class="p">:</span> <span class="p">{</span><span class="s">'compound_code'</span><span class="p">:</span> <span class="s">'GWG2+C8 대한민국 서울특별시'</span><span class="p">,</span>
   <span class="s">'global_code'</span><span class="p">:</span> <span class="s">'8Q98GWG2+C8'</span><span class="p">},</span>
  <span class="s">'types'</span><span class="p">:</span> <span class="p">[</span><span class="s">'establishment'</span><span class="p">,</span> <span class="s">'point_of_interest'</span><span class="p">,</span> <span class="s">'police'</span><span class="p">]}]</span>
</code></pre></div></div>
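<p>One way to pull the gu name and coordinates out of a response shaped like the one above, as a sketch. The helper extract_gu_and_location is hypothetical, and it assumes the gu is the component tagged sublocality_level_1, as in this particular response; other addresses may be tagged differently. The sample dict is an abbreviated copy of the output above, so no API key or network call is needed.</p>

```python
# Abbreviated copy of the geocode response shown above (only the fields used here)
sample = [{
    "address_components": [
        {"long_name": "608", "short_name": "608", "types": ["premise"]},
        {"long_name": "영등포구", "short_name": "영등포구",
         "types": ["political", "sublocality", "sublocality_level_1"]},
        {"long_name": "서울특별시", "short_name": "서울특별시",
         "types": ["administrative_area_level_1", "political"]},
    ],
    "formatted_address": "대한민국 서울특별시 영등포구 국회대로 608",
    "geometry": {"location": {"lat": 37.5260441, "lng": 126.9008091}},
}]


def extract_gu_and_location(results):
    """Hypothetical helper: return (gu, lat, lng) from a geocode result list."""
    if not results:
        # An empty list means the query matched nothing
        return None, None, None
    first = results[0]
    gu = None
    for component in first["address_components"]:
        # Assumption: the gu carries the sublocality_level_1 type
        if "sublocality_level_1" in component["types"]:
            gu = component["long_name"]
            break
    location = first["geometry"]["location"]
    return gu, location["lat"], location["lng"]


gu, lat, lng = extract_gu_and_location(sample)
print(gu, lat, lng)
```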

<h4 id="출처">Source</h4>
<p>Five major crimes by Seoul police station (서울시 관서별 5대 범죄 현황), https://www.data.go.kr/data/15054738/fileData.do?recommendDataYn=Y</p>]]></content><author><name>yy2-hi</name></author><category term="DataAnalysis" /><summary type="html"><![CDATA[Project 02. Analysis Seoul Crime Project overview]]></summary></entry><entry><title type="html">Project 2 - Seoul Crime Data Analysis (2)</title><link href="https://yy2-hi.github.io/dataanalysis/crimeanalysis2/" rel="alternate" type="text/html" title="Project 2 - Seoul Crime Data Analysis (2)" /><published>2024-08-25T00:00:00+09:00</published><updated>2024-08-25T00:00:00+09:00</updated><id>https://yy2-hi.github.io/dataanalysis/crimeanalysis2</id><content type="html" xml:base="https://yy2-hi.github.io/dataanalysis/crimeanalysis2/"><![CDATA[<h2 id="-pandas에-잘-맞춰진-반복문용-명령어--iterrows">📖 A looping method well suited to Pandas: iterrows()</h2>

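<ul>
  <li>Most Pandas DataFrames are two-dimensional, so looping over them with plain for statements makes the code harder to read. When a loop over a DataFrame is needed, iterrows() is recommended: each iteration yields the row's index and the row's contents separately.</li>
</ul>

<hr />

<h2 id="google-maps를-이용한-데이터-정리">Organizing the data with Google Maps</h2>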
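<p>A minimal sketch of the iterrows() pattern on a toy frame. The column names occurred, arrested and rate are made up for illustration; the point is only the (index, row) pair that each iteration yields and writing the result back with .loc.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame indexed by station name
stations = pd.DataFrame(
    {"occurred": [24.0, 14.0], "arrested": [26.0, 13.0]},
    index=pd.Index(["강남", "강동"], name="구분"),
)
stations["rate"] = np.nan  # placeholder column, filled in the loop

for idx, row in stations.iterrows():
    # idx is the index label; row is a Series holding that row's values
    stations.loc[idx, "rate"] = row["arrested"] / row["occurred"] * 100

print(stations)
```

<p>For simple arithmetic like this a vectorized expression would be faster. The loop form pays off when every row needs its own call, such as one geocoding request per police station.</p>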

<h4 id="구별-lat-lng-컬럼-추가">Adding the 구별 (district), lat, and lng columns</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_station</span><span class="p">[</span><span class="s">"구별"</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">nan</span>
<span class="n">crime_station</span><span class="p">[</span><span class="s">"lat"</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">nan</span>
<span class="n">crime_station</span><span class="p">[</span><span class="s">"lng"</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">nan</span>

<span class="n">crime_station</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr>
      <th>죄종</th>
      <th colspan="2" halign="left">강간</th>
      <th colspan="2" halign="left">강도</th>
      <th colspan="2" halign="left">살인</th>
      <th colspan="2" halign="left">절도</th>
      <th colspan="2" halign="left">폭력</th>
      <th>구별</th>
      <th>lat</th>
      <th>lng</th>
    </tr>
    <tr>
      <th>발생검거</th>
      <th>검거</th>
      <th>발생</th>
      <th>검거</th>
      <th>발생</th>
      <th>검거</th>
      <th>발생</th>
      <th>검거</th>
      <th>발생</th>
      <th>검거</th>
      <th>발생</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
    <tr>
      <th>구분</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남</th>
      <td>269.0</td>
      <td>339.0</td>
      <td>26.0</td>
      <td>24.0</td>
      <td>3.0</td>
      <td>3.0</td>
      <td>1129.0</td>
      <td>2438.0</td>
      <td>2096.0</td>
      <td>2336.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>강동</th>
      <td>152.0</td>
      <td>160.0</td>
      <td>13.0</td>
      <td>14.0</td>
      <td>5.0</td>
      <td>4.0</td>
      <td>902.0</td>
      <td>1754.0</td>
      <td>2201.0</td>
      <td>2530.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>강북</th>
      <td>159.0</td>
      <td>217.0</td>
      <td>4.0</td>
      <td>5.0</td>
      <td>6.0</td>
      <td>7.0</td>
      <td>672.0</td>
      <td>1222.0</td>
      <td>2482.0</td>
      <td>2778.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>강서</th>
      <td>239.0</td>
      <td>275.0</td>
      <td>10.0</td>
      <td>10.0</td>
      <td>10.0</td>
      <td>9.0</td>
      <td>1070.0</td>
      <td>1952.0</td>
      <td>2768.0</td>
      <td>3204.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>관악</th>
      <td>264.0</td>
      <td>322.0</td>
      <td>10.0</td>
      <td>12.0</td>
      <td>7.0</td>
      <td>6.0</td>
      <td>937.0</td>
      <td>2103.0</td>
      <td>2707.0</td>
      <td>3235.0</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
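<p>A minimal sketch of the same step on a toy frame. The name <code>stations</code> and its two rows are stand-ins for illustration, not the real <code>crime_station</code> data. Assigning a scalar to a new column name creates the column and fills every row with that value.</p>

```python
import numpy as np
import pandas as pd

# Toy stand-in for crime_station; the frame name is hypothetical.
stations = pd.DataFrame(
    {"강도검거": [26.0, 13.0], "강도발생": [24.0, 14.0]},
    index=pd.Index(["강남", "강동"], name="구분"),
)

# A scalar assignment broadcasts to every row and creates the column if needed.
for col in ["구별", "lat", "lng"]:
    stations[col] = np.nan

print(stations.isna().sum())
```

<p>Because <code>np.nan</code> is a float, all three new columns start out as float columns. Recent pandas versions may warn when a string is later written into such a column, so creating 구별 with an object dtype is one way to avoid that.</p>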

<hr />

<h4 id="iterrows">iterrows()</h4>
<ul>
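  <li>Look up the district (구) each police station belongs to from the station name</li>
  <li>Store the district name, latitude, and longitude</li>
  <li>Fill in the NaN values with a loop</li>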
</ul>
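<p>The three steps above can be sketched without calling any external API. In this sketch <code>fake_geocode</code> is a hypothetical dictionary that stands in for the geocoding call, and <code>stations</code> is a toy frame; the coordinates are rounded from the output shown in this post.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table standing in for the geocoding request.
fake_geocode = {
    "강남": ("강남구", 37.509, 127.067),
    "강동": ("강동구", 37.529, 127.127),
}

stations = pd.DataFrame(
    {"강도발생": [24.0, 14.0]},
    index=pd.Index(["강남", "강동"], name="구분"),
)
# Object dtype for the text column, float NaN for the coordinates.
stations["구별"] = pd.Series(index=stations.index, dtype="object")
stations["lat"] = np.nan
stations["lng"] = np.nan

# iterrows() yields (index label, row as a Series) pairs.
for idx, row in stations.iterrows():
    gu, lat, lng = fake_geocode[idx]
    # Write back through .loc; the yielded row may be a copy of the data.
    stations.loc[idx, "구별"] = gu
    stations.loc[idx, "lat"] = lat
    stations.loc[idx, "lng"] = lng

print(stations)
```

<p>Only the index label is needed here, so the row itself goes unused. Writing through <code>.loc[idx, column]</code> is what actually changes the frame.</p>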

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>

<span class="k">for</span> <span class="n">idx</span><span class="p">,</span> <span class="n">rows</span> <span class="ow">in</span> <span class="n">crime_station</span><span class="p">.</span><span class="n">iterrows</span><span class="p">():</span>
    <span class="n">station_name</span> <span class="o">=</span> <span class="s">"서울"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">idx</span><span class="p">)</span> <span class="o">+</span> <span class="s">"경찰서"</span>
    <span class="n">tmp</span> <span class="o">=</span> <span class="n">gmaps</span><span class="p">.</span><span class="n">geocode</span><span class="p">(</span><span class="n">station_name</span><span class="p">,</span> <span class="n">language</span><span class="o">=</span><span class="s">"ko"</span><span class="p">)</span>
    
    <span class="n">tmp_gu</span> <span class="o">=</span> <span class="n">tmp</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">get</span><span class="p">(</span><span class="s">"formatted_address"</span><span class="p">)</span>
    
    <span class="n">lat</span> <span class="o">=</span> <span class="n">tmp</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">get</span><span class="p">(</span><span class="s">"geometry"</span><span class="p">)[</span><span class="s">"location"</span><span class="p">][</span><span class="s">"lat"</span><span class="p">]</span>
    <span class="n">lng</span> <span class="o">=</span> <span class="n">tmp</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">get</span><span class="p">(</span><span class="s">"geometry"</span><span class="p">)[</span><span class="s">"location"</span><span class="p">][</span><span class="s">"lng"</span><span class="p">]</span>
    
    <span class="c1"># Store the coordinates for every row; only the fifth row (관악) gets its district name by hand.
</span>    <span class="n">crime_station</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">idx</span><span class="p">,</span> <span class="s">"lat"</span><span class="p">]</span> <span class="o">=</span> <span class="n">lat</span>
    <span class="n">crime_station</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">idx</span><span class="p">,</span> <span class="s">"lng"</span><span class="p">]</span> <span class="o">=</span> <span class="n">lng</span>
    
    <span class="k">if</span> <span class="n">count</span> <span class="o">==</span> <span class="mi">4</span><span class="p">:</span>
        <span class="n">crime_station</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">idx</span><span class="p">,</span> <span class="s">"구별"</span><span class="p">]</span> <span class="o">=</span> <span class="s">"관악구"</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">crime_station</span><span class="p">.</span><span class="n">loc</span><span class="p">[</span><span class="n">idx</span><span class="p">,</span> <span class="s">"구별"</span><span class="p">]</span> <span class="o">=</span> <span class="n">tmp_gu</span><span class="p">.</span><span class="n">split</span><span class="p">()[</span><span class="mi">2</span><span class="p">]</span>
        
    <span class="k">print</span><span class="p">(</span><span class="n">count</span><span class="p">)</span>
    <span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
    
<span class="n">crime_station</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>강간검거</th>
      <th>강간발생</th>
      <th>강도검거</th>
      <th>강도발생</th>
      <th>살인검거</th>
      <th>살인발생</th>
      <th>절도검거</th>
      <th>절도발생</th>
      <th>폭력검거</th>
      <th>폭력발생</th>
      <th>구별</th>
      <th>lat</th>
      <th>lng</th>
    </tr>
    <tr>
      <th>구분</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남</th>
      <td>269.0</td>
      <td>339.0</td>
      <td>26.0</td>
      <td>24.0</td>
      <td>3.0</td>
      <td>3.0</td>
      <td>1129.0</td>
      <td>2438.0</td>
      <td>2096.0</td>
      <td>2336.0</td>
      <td>강남구</td>
      <td>37.509435</td>
      <td>127.066958</td>
    </tr>
    <tr>
      <th>강동</th>
      <td>152.0</td>
      <td>160.0</td>
      <td>13.0</td>
      <td>14.0</td>
      <td>5.0</td>
      <td>4.0</td>
      <td>902.0</td>
      <td>1754.0</td>
      <td>2201.0</td>
      <td>2530.0</td>
      <td>강동구</td>
      <td>37.528511</td>
      <td>127.126822</td>
    </tr>
    <tr>
      <th>강북</th>
      <td>159.0</td>
      <td>217.0</td>
      <td>4.0</td>
      <td>5.0</td>
      <td>6.0</td>
      <td>7.0</td>
      <td>672.0</td>
      <td>1222.0</td>
      <td>2482.0</td>
      <td>2778.0</td>
      <td>강북구</td>
      <td>37.637304</td>
      <td>127.027340</td>
    </tr>
    <tr>
      <th>강서</th>
      <td>239.0</td>
      <td>275.0</td>
      <td>10.0</td>
      <td>10.0</td>
      <td>10.0</td>
      <td>9.0</td>
      <td>1070.0</td>
      <td>1952.0</td>
      <td>2768.0</td>
      <td>3204.0</td>
      <td>양천구</td>
      <td>37.539783</td>
      <td>126.829997</td>
    </tr>
    <tr>
      <th>관악</th>
      <td>264.0</td>
      <td>322.0</td>
      <td>10.0</td>
      <td>12.0</td>
      <td>7.0</td>
      <td>6.0</td>
      <td>937.0</td>
      <td>2103.0</td>
      <td>2707.0</td>
      <td>3235.0</td>
      <td>관악구</td>
      <td>37.474395</td>
      <td>126.951349</td>
    </tr>
  </tbody>
</table>
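<p>Taking <code>split()[2]</code> assumes the district is always the third word of the formatted address, which is presumably why one row is hard-coded in the loop. A hedged alternative is to search for the first word ending in 구. The helper <code>extract_gu</code> and the address strings below are made up for illustration; real geocoder output may differ.</p>

```python
def extract_gu(formatted_address):
    """Return the first whitespace-separated word ending in '구', or None."""
    for token in formatted_address.split():
        if token.endswith("구"):
            return token
    return None


# Made-up addresses showing two possible word orders.
print(extract_gu("대한민국 서울특별시 관악구 관악로5길 33"))
print(extract_gu("서울특별시 강남구 테헤란로114길 11 대한민국"))
```

<p>This is only a heuristic: a place name that happens to end in 구 (for example a word such as 입구) would match as well, so the result still deserves a quick check against the list of Seoul districts.</p>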

<hr />

<h4 id="데이터-정리">Tidying the data</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_station</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">get_level_values</span><span class="p">(</span><span class="mi">0</span><span class="p">)[</span><span class="mi">2</span><span class="p">]</span> <span class="o">+</span> <span class="n">crime_station</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">get_level_values</span><span class="p">(</span><span class="mi">1</span><span class="p">)[</span><span class="mi">2</span><span class="p">]</span>
<span class="nb">len</span><span class="p">(</span><span class="n">crime_station</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">get_level_values</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>

<span class="s">'강도검거'</span>
<span class="mi">13</span>

<span class="n">tmp</span> <span class="o">=</span> <span class="p">[</span>
    <span class="n">crime_station</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">get_level_values</span><span class="p">(</span><span class="mi">0</span><span class="p">)[</span><span class="n">n</span><span class="p">]</span> <span class="o">+</span> <span class="n">crime_station</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">get_level_values</span><span class="p">(</span><span class="mi">1</span><span class="p">)[</span><span class="n">n</span><span class="p">]</span>
    <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">crime_station</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">get_level_values</span><span class="p">(</span><span class="mi">0</span><span class="p">)))</span>
<span class="p">]</span>
<span class="n">tmp</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">tmp</span><span class="p">)</span>

<span class="p">([</span><span class="s">'강간검거'</span><span class="p">,</span>
  <span class="s">'강간발생'</span><span class="p">,</span>
  <span class="s">'강도검거'</span><span class="p">,</span>
  <span class="s">'강도발생'</span><span class="p">,</span>
  <span class="s">'살인검거'</span><span class="p">,</span>
  <span class="s">'살인발생'</span><span class="p">,</span>
  <span class="s">'절도검거'</span><span class="p">,</span>
  <span class="s">'절도발생'</span><span class="p">,</span>
  <span class="s">'폭력검거'</span><span class="p">,</span>
  <span class="s">'폭력발생'</span><span class="p">,</span>
  <span class="s">'구별'</span><span class="p">,</span>
  <span class="s">'lat'</span><span class="p">,</span>
  <span class="s">'lng'</span><span class="p">],</span>
 <span class="mi">13</span><span class="p">)</span>
 
<span class="n">crime_station</span><span class="p">.</span><span class="n">columns</span> <span class="o">=</span> <span class="n">tmp</span>
<span class="n">crime_station</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>강간검거</th>
      <th>강간발생</th>
      <th>강도검거</th>
      <th>강도발생</th>
      <th>살인검거</th>
      <th>살인발생</th>
      <th>절도검거</th>
      <th>절도발생</th>
      <th>폭력검거</th>
      <th>폭력발생</th>
      <th>구별</th>
      <th>lat</th>
      <th>lng</th>
    </tr>
    <tr>
      <th>구분</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남</th>
      <td>269.0</td>
      <td>339.0</td>
      <td>26.0</td>
      <td>24.0</td>
      <td>3.0</td>
      <td>3.0</td>
      <td>1129.0</td>
      <td>2438.0</td>
      <td>2096.0</td>
      <td>2336.0</td>
      <td>강남구</td>
      <td>37.509435</td>
      <td>127.066958</td>
    </tr>
    <tr>
      <th>강동</th>
      <td>152.0</td>
      <td>160.0</td>
      <td>13.0</td>
      <td>14.0</td>
      <td>5.0</td>
      <td>4.0</td>
      <td>902.0</td>
      <td>1754.0</td>
      <td>2201.0</td>
      <td>2530.0</td>
      <td>강동구</td>
      <td>37.528511</td>
      <td>127.126822</td>
    </tr>
    <tr>
      <th>강북</th>
      <td>159.0</td>
      <td>217.0</td>
      <td>4.0</td>
      <td>5.0</td>
      <td>6.0</td>
      <td>7.0</td>
      <td>672.0</td>
      <td>1222.0</td>
      <td>2482.0</td>
      <td>2778.0</td>
      <td>강북구</td>
      <td>37.637304</td>
      <td>127.027340</td>
    </tr>
    <tr>
      <th>강서</th>
      <td>239.0</td>
      <td>275.0</td>
      <td>10.0</td>
      <td>10.0</td>
      <td>10.0</td>
      <td>9.0</td>
      <td>1070.0</td>
      <td>1952.0</td>
      <td>2768.0</td>
      <td>3204.0</td>
      <td>양천구</td>
      <td>37.539783</td>
      <td>126.829997</td>
    </tr>
    <tr>
      <th>관악</th>
      <td>264.0</td>
      <td>322.0</td>
      <td>10.0</td>
      <td>12.0</td>
      <td>7.0</td>
      <td>6.0</td>
      <td>937.0</td>
      <td>2103.0</td>
      <td>2707.0</td>
      <td>3235.0</td>
      <td>관악구</td>
      <td>37.474395</td>
      <td>126.951349</td>
    </tr>
  </tbody>
</table>
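<p>The same flattening can be written by iterating over the column index directly, since each column of a two-level header is a tuple of its level values. The sketch below uses a small hand-built frame named <code>df</code> (a stand-in, not the real data) and joins each tuple into one string.</p>

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("강간", "검거"), ("강간", "발생"), ("구별", ""), ("lat", "")],
    names=["죄종", "발생검거"],
)
df = pd.DataFrame(
    [[269.0, 339.0, "강남구", 37.509435]], index=["강남"], columns=columns
)

# Iterating a MultiIndex yields one tuple per column; join the levels.
df.columns = ["".join(levels) for levels in df.columns]
print(list(df.columns))
```

<p>Columns whose second level is empty, such as 구별 and lat, keep their first-level name unchanged.</p>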
<hr />

<h4 id="데이터-저장">Saving the data</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_station</span><span class="p">.</span><span class="n">to_csv</span><span class="p">(</span><span class="s">"../data/02. crime_in_Seoul_raw.csv"</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s">","</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">"utf-8"</span><span class="p">)</span>
</code></pre></div></div>

<hr />

<h2 id="구별-데이터로-정리">Aggregating the data by district</h2>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_anal_station</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span>
    <span class="s">"../data/02. crime_in_Seoul_raw.csv"</span><span class="p">,</span> <span class="n">index_col</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">"utf-8"</span><span class="p">)</span> <span class="c1"># index_col=0 makes the first column ("구분") the index
</span><span class="n">crime_anal_station</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>강간검거</th>
      <th>강간발생</th>
      <th>강도검거</th>
      <th>강도발생</th>
      <th>살인검거</th>
      <th>살인발생</th>
      <th>절도검거</th>
      <th>절도발생</th>
      <th>폭력검거</th>
      <th>폭력발생</th>
      <th>구별</th>
      <th>lat</th>
      <th>lng</th>
    </tr>
    <tr>
      <th>구분</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남</th>
      <td>269.0</td>
      <td>339.0</td>
      <td>26.0</td>
      <td>24.0</td>
      <td>3.0</td>
      <td>3.0</td>
      <td>1129.0</td>
      <td>2438.0</td>
      <td>2096.0</td>
      <td>2336.0</td>
      <td>강남구</td>
      <td>37.509435</td>
      <td>127.066958</td>
    </tr>
    <tr>
      <th>강동</th>
      <td>152.0</td>
      <td>160.0</td>
      <td>13.0</td>
      <td>14.0</td>
      <td>5.0</td>
      <td>4.0</td>
      <td>902.0</td>
      <td>1754.0</td>
      <td>2201.0</td>
      <td>2530.0</td>
      <td>강동구</td>
      <td>37.528511</td>
      <td>127.126822</td>
    </tr>
    <tr>
      <th>강북</th>
      <td>159.0</td>
      <td>217.0</td>
      <td>4.0</td>
      <td>5.0</td>
      <td>6.0</td>
      <td>7.0</td>
      <td>672.0</td>
      <td>1222.0</td>
      <td>2482.0</td>
      <td>2778.0</td>
      <td>강북구</td>
      <td>37.637304</td>
      <td>127.027340</td>
    </tr>
    <tr>
      <th>강서</th>
      <td>239.0</td>
      <td>275.0</td>
      <td>10.0</td>
      <td>10.0</td>
      <td>10.0</td>
      <td>9.0</td>
      <td>1070.0</td>
      <td>1952.0</td>
      <td>2768.0</td>
      <td>3204.0</td>
      <td>양천구</td>
      <td>37.539783</td>
      <td>126.829997</td>
    </tr>
    <tr>
      <th>관악</th>
      <td>264.0</td>
      <td>322.0</td>
      <td>10.0</td>
      <td>12.0</td>
      <td>7.0</td>
      <td>6.0</td>
      <td>937.0</td>
      <td>2103.0</td>
      <td>2707.0</td>
      <td>3235.0</td>
      <td>관악구</td>
      <td>37.474395</td>
      <td>126.951349</td>
    </tr>
  </tbody>
</table>
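<p>The save-and-reload round trip can be checked in memory. This sketch writes a toy frame to an <code>io.StringIO</code> buffer instead of a file on disk; the frame and buffer names are illustrative.</p>

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"강도검거": [26.0, 13.0], "구별": ["강남구", "강동구"]},
    index=pd.Index(["강남", "강동"], name="구분"),
)

buffer = io.StringIO()      # in-memory stand-in for the CSV file
df.to_csv(buffer, sep=",")  # the index is written as the first column
buffer.seek(0)              # rewind before reading

# index_col=0 turns the first CSV column back into the index.
restored = pd.read_csv(buffer, index_col=0)
print(restored.index.name)
```

<p>Without <code>index_col=0</code>, the station names would come back as an ordinary column and the frame would get a fresh integer index.</p>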

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_anal_gu</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">crime_anal_station</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="s">"구별"</span><span class="p">,</span> <span class="n">aggfunc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">)</span>

<span class="k">del</span> <span class="n">crime_anal_gu</span><span class="p">[</span><span class="s">"lat"</span><span class="p">]</span>
<span class="n">crime_anal_gu</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="s">"lng"</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>

<span class="n">crime_anal_gu</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>강간검거</th>
      <th>강간발생</th>
      <th>강도검거</th>
      <th>강도발생</th>
      <th>살인검거</th>
      <th>살인발생</th>
      <th>절도검거</th>
      <th>절도발생</th>
      <th>폭력검거</th>
      <th>폭력발생</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>413.0</td>
      <td>516.0</td>
      <td>42.0</td>
      <td>39.0</td>
      <td>5.0</td>
      <td>5.0</td>
      <td>1918.0</td>
      <td>3587.0</td>
      <td>3527.0</td>
      <td>4002.0</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>152.0</td>
      <td>160.0</td>
      <td>13.0</td>
      <td>14.0</td>
      <td>5.0</td>
      <td>4.0</td>
      <td>902.0</td>
      <td>1754.0</td>
      <td>2201.0</td>
      <td>2530.0</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>159.0</td>
      <td>217.0</td>
      <td>4.0</td>
      <td>5.0</td>
      <td>6.0</td>
      <td>7.0</td>
      <td>672.0</td>
      <td>1222.0</td>
      <td>2482.0</td>
      <td>2778.0</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>264.0</td>
      <td>322.0</td>
      <td>10.0</td>
      <td>12.0</td>
      <td>7.0</td>
      <td>6.0</td>
      <td>937.0</td>
      <td>2103.0</td>
      <td>2707.0</td>
      <td>3235.0</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>234.0</td>
      <td>279.0</td>
      <td>6.0</td>
      <td>11.0</td>
      <td>4.0</td>
      <td>4.0</td>
      <td>1057.0</td>
      <td>2636.0</td>
      <td>2011.0</td>
      <td>2392.0</td>
    </tr>
  </tbody>
</table>
</div>
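<p>To see what the pivot does, here is a sketch on toy station-level rows where two stations share one district. The numbers are illustrative only, and <code>aggfunc="sum"</code> is passed as a string, which pandas also accepts.</p>

```python
import pandas as pd

# Toy rows: two stations fall in 강남구, one in 강동구.
stations = pd.DataFrame(
    {
        "강도검거": [26.0, 16.0, 13.0],
        "강도발생": [24.0, 15.0, 14.0],
        "구별": ["강남구", "강남구", "강동구"],
    },
    index=["강남", "수서", "강동"],
)

# One output row per district; numeric columns are summed within each group.
by_gu = pd.pivot_table(stations, index="구별", aggfunc="sum")
print(by_gu)
```

<p>Summing makes sense for counts, but not for lat and lng, which is why the post removes those two columns right after the pivot.</p>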

<h4 id="검거율-생성">Computing arrest rates</h4>
<ul>
  <li>Dividing one column by another column</li>
</ul>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>crime_anal_gu["강도검거"] / crime_anal_gu["강도발생"]
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>구별
강남구     1.076923
강동구     0.928571
강북구     0.800000
관악구     0.833333
광진구     0.545455
구로구     1.300000
금천구     1.000000
노원구     1.500000
도봉구     1.000000
동대문구    1.200000
동작구     1.000000
마포구     1.750000
서대문구    0.800000
서초구     0.769231
성동구     1.666667
성북구     1.000000
송파구     0.800000
양천구     1.000000
영등포구    0.736842
용산구     1.111111
은평구     0.777778
종로구     0.750000
중구      0.875000
중랑구     1.000000
dtype: float64
</code></pre></div></div>

<hr />

<ul>
  <li>Dividing several columns by a single column</li>
</ul>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>crime_anal_gu[["강도검거", "살인검거"]].div(crime_anal_gu["강도발생"], axis=0).head(3)
</code></pre></div></div>
<div>
<style scoped="">
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>강도검거</th>
      <th>살인검거</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>1.076923</td>
      <td>0.128205</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>0.928571</td>
      <td>0.357143</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>0.800000</td>
      <td>1.200000</td>
    </tr>
  </tbody>
</table>
</div>
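<p>A small sketch of why <code>axis=0</code> matters here, using the first two district rows from the tables above in a frame named <code>gu</code> (the name is illustrative). With <code>axis=0</code> the divisor Series is matched against the row index, so every row is divided by its own 강도발생 value.</p>

```python
import pandas as pd

gu = pd.DataFrame(
    {"강도검거": [42.0, 13.0], "살인검거": [5.0, 5.0], "강도발생": [39.0, 14.0]},
    index=pd.Index(["강남구", "강동구"], name="구별"),
)

# axis=0 aligns the divisor with the row index (one divisor per district).
ratios = gu[["강도검거", "살인검거"]].div(gu["강도발생"], axis=0)
print(ratios.round(6))
```

<p>Leaving <code>axis</code> at its default would try to match the divisor's district labels against the column names instead, which produces NaN everywhere.</p>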

<hr />

<ul>
  <li>Dividing several columns by several columns, pair by pair</li>
</ul>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>num = ["강간검거", "강도검거", "살인검거", "절도검거", "폭력검거"]
den = ["강간발생", "강도발생", "살인발생", "절도발생", "폭력발생"]

crime_anal_gu[num].div(crime_anal_gu[den].values).head()
</code></pre></div></div>
<div>
<style scoped="">
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>강간검거</th>
      <th>강도검거</th>
      <th>살인검거</th>
      <th>절도검거</th>
      <th>폭력검거</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>0.800388</td>
      <td>1.076923</td>
      <td>1.000000</td>
      <td>0.534709</td>
      <td>0.881309</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>0.950000</td>
      <td>0.928571</td>
      <td>1.250000</td>
      <td>0.514253</td>
      <td>0.869960</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>0.732719</td>
      <td>0.800000</td>
      <td>0.857143</td>
      <td>0.549918</td>
      <td>0.893449</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>0.819876</td>
      <td>0.833333</td>
      <td>1.166667</td>
      <td>0.445554</td>
      <td>0.836785</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>0.838710</td>
      <td>0.545455</td>
      <td>1.000000</td>
      <td>0.400986</td>
      <td>0.840719</td>
    </tr>
  </tbody>
</table>
</div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>target = ["강간검거율", "강도검거율", "살인검거율", "절도검거율", "폭력검거율"]

num = ["강간검거", "강도검거", "살인검거", "절도검거", "폭력검거"]
den = ["강간발생", "강도발생", "살인발생", "절도발생", "폭력발생"]

crime_anal_gu[target] = crime_anal_gu[num].div(crime_anal_gu[den].values) * 100
crime_anal_gu.head()
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>강간검거</th>
      <th>강간발생</th>
      <th>강도검거</th>
      <th>강도발생</th>
      <th>살인검거</th>
      <th>살인발생</th>
      <th>절도검거</th>
      <th>절도발생</th>
      <th>폭력검거</th>
      <th>폭력발생</th>
      <th>강간검거율</th>
      <th>강도검거율</th>
      <th>살인검거율</th>
      <th>절도검거율</th>
      <th>폭력검거율</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>413.0</td>
      <td>516.0</td>
      <td>42.0</td>
      <td>39.0</td>
      <td>5.0</td>
      <td>5.0</td>
      <td>1918.0</td>
      <td>3587.0</td>
      <td>3527.0</td>
      <td>4002.0</td>
      <td>80.038760</td>
      <td>107.692308</td>
      <td>100.000000</td>
      <td>53.470867</td>
      <td>88.130935</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>152.0</td>
      <td>160.0</td>
      <td>13.0</td>
      <td>14.0</td>
      <td>5.0</td>
      <td>4.0</td>
      <td>902.0</td>
      <td>1754.0</td>
      <td>2201.0</td>
      <td>2530.0</td>
      <td>95.000000</td>
      <td>92.857143</td>
      <td>125.000000</td>
      <td>51.425314</td>
      <td>86.996047</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>159.0</td>
      <td>217.0</td>
      <td>4.0</td>
      <td>5.0</td>
      <td>6.0</td>
      <td>7.0</td>
      <td>672.0</td>
      <td>1222.0</td>
      <td>2482.0</td>
      <td>2778.0</td>
      <td>73.271889</td>
      <td>80.000000</td>
      <td>85.714286</td>
      <td>54.991817</td>
      <td>89.344852</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>264.0</td>
      <td>322.0</td>
      <td>10.0</td>
      <td>12.0</td>
      <td>7.0</td>
      <td>6.0</td>
      <td>937.0</td>
      <td>2103.0</td>
      <td>2707.0</td>
      <td>3235.0</td>
      <td>81.987578</td>
      <td>83.333333</td>
      <td>116.666667</td>
      <td>44.555397</td>
      <td>83.678516</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>234.0</td>
      <td>279.0</td>
      <td>6.0</td>
      <td>11.0</td>
      <td>4.0</td>
      <td>4.0</td>
      <td>1057.0</td>
      <td>2636.0</td>
      <td>2011.0</td>
      <td>2392.0</td>
      <td>83.870968</td>
      <td>54.545455</td>
      <td>100.000000</td>
      <td>40.098634</td>
      <td>84.071906</td>
    </tr>
  </tbody>
</table>
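<p>The <code>.values</code> in the divisor is what makes the pairwise division work. The sketch below, on a two-row frame named <code>gu</code> built from numbers in the tables above, compares the label-aligned division with the positional one.</p>

```python
import pandas as pd

gu = pd.DataFrame(
    {
        "강도검거": [42.0, 13.0],
        "강도발생": [39.0, 14.0],
        "살인검거": [5.0, 5.0],
        "살인발생": [5.0, 4.0],
    },
    index=pd.Index(["강남구", "강동구"], name="구별"),
)
num = ["강도검거", "살인검거"]
den = ["강도발생", "살인발생"]

# DataFrame divided by DataFrame aligns on column labels; none match, so all NaN.
aligned = gu[num].div(gu[den])

# .values strips the labels, so the division is positional: num[i] / den[i].
rates = gu[num].div(gu[den].values) * 100
print(rates)
```

<p>The positional form depends on <code>num</code> and <code>den</code> listing the crime types in the same order, so the two lists should be kept side by side as they are in the post.</p>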

<hr />

<h4 id="필요없는-칼럼-제거">Dropping unneeded columns</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">del</span> <span class="n">crime_anal_gu</span><span class="p">[</span><span class="s">"강간검거"</span><span class="p">]</span>
<span class="k">del</span> <span class="n">crime_anal_gu</span><span class="p">[</span><span class="s">"강도검거"</span><span class="p">]</span>
<span class="n">crime_anal_gu</span><span class="p">.</span><span class="n">drop</span><span class="p">([</span><span class="s">"살인검거"</span><span class="p">,</span> <span class="s">"절도검거"</span><span class="p">,</span> <span class="s">"폭력검거"</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>

<span class="n">crime_anal_gu</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>강간발생</th>
      <th>강도발생</th>
      <th>살인발생</th>
      <th>절도발생</th>
      <th>폭력발생</th>
      <th>강간검거율</th>
      <th>강도검거율</th>
      <th>살인검거율</th>
      <th>절도검거율</th>
      <th>폭력검거율</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>516.0</td>
      <td>39.0</td>
      <td>5.0</td>
      <td>3587.0</td>
      <td>4002.0</td>
      <td>80.038760</td>
      <td>107.692308</td>
      <td>100.000000</td>
      <td>53.470867</td>
      <td>88.130935</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>160.0</td>
      <td>14.0</td>
      <td>4.0</td>
      <td>1754.0</td>
      <td>2530.0</td>
      <td>95.000000</td>
      <td>92.857143</td>
      <td>125.000000</td>
      <td>51.425314</td>
      <td>86.996047</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>217.0</td>
      <td>5.0</td>
      <td>7.0</td>
      <td>1222.0</td>
      <td>2778.0</td>
      <td>73.271889</td>
      <td>80.000000</td>
      <td>85.714286</td>
      <td>54.991817</td>
      <td>89.344852</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>322.0</td>
      <td>12.0</td>
      <td>6.0</td>
      <td>2103.0</td>
      <td>3235.0</td>
      <td>81.987578</td>
      <td>83.333333</td>
      <td>116.666667</td>
      <td>44.555397</td>
      <td>83.678516</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>279.0</td>
      <td>11.0</td>
      <td>4.0</td>
      <td>2636.0</td>
      <td>2392.0</td>
      <td>83.870968</td>
      <td>54.545455</td>
      <td>100.000000</td>
      <td>40.098634</td>
      <td>84.071906</td>
    </tr>
  </tbody>
</table>
</div>
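<p>The block above mixes <code>del</code> and <code>drop</code>, which remove columns in slightly different ways. A sketch on a throwaway frame with made-up column names:</p>

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})

del df["a"]                         # removes exactly one column, in place
df.drop("b", axis=1, inplace=True)  # same effect through drop
trimmed = df.drop(columns=["c"])    # returns a new frame; df keeps "c"

print(list(df.columns), list(trimmed.columns))
```

<p><code>del</code> handles one column at a time, while <code>drop</code> takes a list and can either modify the frame in place or return a copy.</p>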

<hr />

<h4 id="검거율-100보다-큰-숫자-변경">Capping arrest rates above 100</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_anal_gu</span><span class="p">[</span><span class="n">crime_anal_gu</span><span class="p">[</span><span class="n">target</span><span class="p">]</span> <span class="o">&gt;</span> <span class="mi">100</span><span class="p">]</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">crime_anal_gu</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>강간발생</th>
      <th>강도발생</th>
      <th>살인발생</th>
      <th>절도발생</th>
      <th>폭력발생</th>
      <th>강간검거율</th>
      <th>강도검거율</th>
      <th>살인검거율</th>
      <th>절도검거율</th>
      <th>폭력검거율</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>516.0</td>
      <td>39.0</td>
      <td>5.0</td>
      <td>3587.0</td>
      <td>4002.0</td>
      <td>80.038760</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>53.470867</td>
      <td>88.130935</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>160.0</td>
      <td>14.0</td>
      <td>4.0</td>
      <td>1754.0</td>
      <td>2530.0</td>
      <td>95.000000</td>
      <td>92.857143</td>
      <td>100.000000</td>
      <td>51.425314</td>
      <td>86.996047</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>217.0</td>
      <td>5.0</td>
      <td>7.0</td>
      <td>1222.0</td>
      <td>2778.0</td>
      <td>73.271889</td>
      <td>80.000000</td>
      <td>85.714286</td>
      <td>54.991817</td>
      <td>89.344852</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>322.0</td>
      <td>12.0</td>
      <td>6.0</td>
      <td>2103.0</td>
      <td>3235.0</td>
      <td>81.987578</td>
      <td>83.333333</td>
      <td>100.000000</td>
      <td>44.555397</td>
      <td>83.678516</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>279.0</td>
      <td>11.0</td>
      <td>4.0</td>
      <td>2636.0</td>
      <td>2392.0</td>
      <td>83.870968</td>
      <td>54.545455</td>
      <td>100.000000</td>
      <td>40.098634</td>
      <td>84.071906</td>
    </tr>
  </tbody>
</table>
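<p>The same cap can also be written with <code>DataFrame.clip</code>. This is a minimal sketch, not the post's own code: the two-row frame is sample data copied from the table above, and <code>target</code> is assumed to be a list of the 검거율 (arrest-rate) column names.</p>

```python
import pandas as pd

# Sample rows copied from the table above (only two columns, for brevity)
crime_sample = pd.DataFrame(
    {"강도검거율": [107.692308, 92.857143], "살인검거율": [100.0, 125.0]},
    index=pd.Index(["강남구", "강동구"], name="구별"),
)

# Assumption: target lists the arrest-rate columns to cap
target = ["강도검거율", "살인검거율"]

# clip(upper=100) replaces every value above 100 with 100 and leaves the rest unchanged
crime_sample[target] = crime_sample[target].clip(upper=100)
print(crime_sample)
```

<p>Unlike boolean-mask assignment, <code>clip</code> only touches the columns it is called on, so the crime-count columns cannot be overwritten by accident.</p>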

<hr />

<h4 id="컬럼-이름-변경">Rename columns</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_anal_gu</span><span class="p">.</span><span class="n">rename</span><span class="p">(</span>
    <span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s">"강간발생"</span><span class="p">:</span> <span class="s">"강간"</span><span class="p">,</span> <span class="s">"강도발생"</span><span class="p">:</span> <span class="s">"강도"</span><span class="p">,</span> <span class="s">"살인발생"</span><span class="p">:</span> <span class="s">"살인"</span><span class="p">,</span> <span class="s">"절도발생"</span><span class="p">:</span> <span class="s">"절도"</span><span class="p">,</span> <span class="s">"폭력발생"</span><span class="p">:</span> <span class="s">"폭력"</span><span class="p">},</span>
    <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">crime_anal_gu</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>강간</th>
      <th>강도</th>
      <th>살인</th>
      <th>절도</th>
      <th>폭력</th>
      <th>강간검거율</th>
      <th>강도검거율</th>
      <th>살인검거율</th>
      <th>절도검거율</th>
      <th>폭력검거율</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>516.0</td>
      <td>39.0</td>
      <td>5.0</td>
      <td>3587.0</td>
      <td>4002.0</td>
      <td>80.038760</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>53.470867</td>
      <td>88.130935</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>160.0</td>
      <td>14.0</td>
      <td>4.0</td>
      <td>1754.0</td>
      <td>2530.0</td>
      <td>95.000000</td>
      <td>92.857143</td>
      <td>100.000000</td>
      <td>51.425314</td>
      <td>86.996047</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>217.0</td>
      <td>5.0</td>
      <td>7.0</td>
      <td>1222.0</td>
      <td>2778.0</td>
      <td>73.271889</td>
      <td>80.000000</td>
      <td>85.714286</td>
      <td>54.991817</td>
      <td>89.344852</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>322.0</td>
      <td>12.0</td>
      <td>6.0</td>
      <td>2103.0</td>
      <td>3235.0</td>
      <td>81.987578</td>
      <td>83.333333</td>
      <td>100.000000</td>
      <td>44.555397</td>
      <td>83.678516</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>279.0</td>
      <td>11.0</td>
      <td>4.0</td>
      <td>2636.0</td>
      <td>2392.0</td>
      <td>83.870968</td>
      <td>54.545455</td>
      <td>100.000000</td>
      <td>40.098634</td>
      <td>84.071906</td>
    </tr>
  </tbody>
</table>]]></content><author><name>yy2-hi</name></author><category term="DataAnalysis" /><summary type="html"><![CDATA[📖 Pandas에 잘 맞춰진 반복문용 명령어 : iterrows()]]></summary></entry><entry><title type="html">Project 2 - 서울시 범죄 현황 데이터 분석 (3)</title><link href="https://yy2-hi.github.io/dataanalysis/crimeanalysis3/" rel="alternate" type="text/html" title="Project 2 - 서울시 범죄 현황 데이터 분석 (3)" /><published>2024-08-25T00:00:00+09:00</published><updated>2024-08-25T00:00:00+09:00</updated><id>https://yy2-hi.github.io/dataanalysis/crimeanalysis3</id><content type="html" xml:base="https://yy2-hi.github.io/dataanalysis/crimeanalysis3/"><![CDATA[<h2 id="범죄-데이터-정렬을-위한-데이터-정리">범죄 데이터 정렬을 위한 데이터 정리</h2>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_anal_gu</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>강간</th>
      <th>강도</th>
      <th>살인</th>
      <th>절도</th>
      <th>폭력</th>
      <th>강간검거율</th>
      <th>강도검거율</th>
      <th>살인검거율</th>
      <th>절도검거율</th>
      <th>폭력검거율</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>516.0</td>
      <td>39.0</td>
      <td>5.0</td>
      <td>3587.0</td>
      <td>4002.0</td>
      <td>80.038760</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>53.470867</td>
      <td>88.130935</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>160.0</td>
      <td>14.0</td>
      <td>4.0</td>
      <td>1754.0</td>
      <td>2530.0</td>
      <td>95.000000</td>
      <td>92.857143</td>
      <td>100.000000</td>
      <td>51.425314</td>
      <td>86.996047</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>217.0</td>
      <td>5.0</td>
      <td>7.0</td>
      <td>1222.0</td>
      <td>2778.0</td>
      <td>73.271889</td>
      <td>80.000000</td>
      <td>85.714286</td>
      <td>54.991817</td>
      <td>89.344852</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>322.0</td>
      <td>12.0</td>
      <td>6.0</td>
      <td>2103.0</td>
      <td>3235.0</td>
      <td>81.987578</td>
      <td>83.333333</td>
      <td>100.000000</td>
      <td>44.555397</td>
      <td>83.678516</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>279.0</td>
      <td>11.0</td>
      <td>4.0</td>
      <td>2636.0</td>
      <td>2392.0</td>
      <td>83.870968</td>
      <td>54.545455</td>
      <td>100.000000</td>
      <td>40.098634</td>
      <td>84.071906</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="정규화">Normalization</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Divide by the column maximum: the largest value becomes 1 (the minimum is not forced to 0)
</span><span class="n">crime_anal_gu</span><span class="p">[</span><span class="s">"강도"</span><span class="p">]</span> <span class="o">/</span> <span class="n">crime_anal_gu</span><span class="p">[</span><span class="s">"강도"</span><span class="p">].</span><span class="nb">max</span><span class="p">()</span>

<span class="n">구별</span>
<span class="n">강남구</span>     <span class="mf">1.000000</span>
<span class="n">강동구</span>     <span class="mf">0.358974</span>
<span class="n">강북구</span>     <span class="mf">0.128205</span>
<span class="n">관악구</span>     <span class="mf">0.307692</span>
<span class="n">광진구</span>     <span class="mf">0.282051</span>
<span class="n">구로구</span>     <span class="mf">0.256410</span>
<span class="n">금천구</span>     <span class="mf">0.179487</span>
<span class="n">노원구</span>     <span class="mf">0.153846</span>
<span class="n">도봉구</span>     <span class="mf">0.128205</span>
<span class="n">동대문구</span>    <span class="mf">0.256410</span>
<span class="n">동작구</span>     <span class="mf">0.179487</span>
<span class="n">마포구</span>     <span class="mf">0.102564</span>
<span class="n">서대문구</span>    <span class="mf">0.128205</span>
<span class="n">서초구</span>     <span class="mf">0.333333</span>
<span class="n">성동구</span>     <span class="mf">0.076923</span>
<span class="n">성북구</span>     <span class="mf">0.205128</span>
<span class="n">송파구</span>     <span class="mf">0.384615</span>
<span class="n">양천구</span>     <span class="mf">0.435897</span>
<span class="n">영등포구</span>    <span class="mf">0.487179</span>
<span class="n">용산구</span>     <span class="mf">0.230769</span>
<span class="n">은평구</span>     <span class="mf">0.230769</span>
<span class="n">종로구</span>     <span class="mf">0.307692</span>
<span class="n">중구</span>      <span class="mf">0.205128</span>
<span class="n">중랑구</span>     <span class="mf">0.358974</span>
<span class="n">Name</span><span class="p">:</span> <span class="n">강도</span><span class="p">,</span> <span class="n">dtype</span><span class="p">:</span> <span class="n">float64</span>

<span class="n">col</span> <span class="o">=</span> <span class="p">[</span><span class="s">"살인"</span><span class="p">,</span> <span class="s">"강도"</span><span class="p">,</span> <span class="s">"강간"</span><span class="p">,</span> <span class="s">"절도"</span><span class="p">,</span> <span class="s">"폭력"</span><span class="p">]</span>
<span class="n">crime_anal_norm</span> <span class="o">=</span> <span class="n">crime_anal_gu</span><span class="p">[</span><span class="n">col</span><span class="p">]</span> <span class="o">/</span> <span class="n">crime_anal_gu</span><span class="p">[</span><span class="n">col</span><span class="p">].</span><span class="nb">max</span><span class="p">()</span>
<span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>살인</th>
      <th>강도</th>
      <th>강간</th>
      <th>절도</th>
      <th>폭력</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>0.357143</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>0.977118</td>
      <td>0.733773</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>0.285714</td>
      <td>0.358974</td>
      <td>0.310078</td>
      <td>0.477799</td>
      <td>0.463880</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>0.500000</td>
      <td>0.128205</td>
      <td>0.420543</td>
      <td>0.332879</td>
      <td>0.509351</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>0.428571</td>
      <td>0.307692</td>
      <td>0.624031</td>
      <td>0.572868</td>
      <td>0.593143</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>0.285714</td>
      <td>0.282051</td>
      <td>0.540698</td>
      <td>0.718060</td>
      <td>0.438577</td>
    </tr>
  </tbody>
</table>
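<p>Dividing by the column maximum fixes the largest value at 1, but the smallest value stays at min/max rather than 0. The sketch below contrasts that with min-max scaling, which maps each column onto the full [0, 1] range. It is only an illustration: the three rows are sample counts taken from the tables above, and <code>min_max</code> is not part of the original analysis.</p>

```python
import pandas as pd

# Sample counts copied from the tables above (강남구, 강동구, 강북구)
counts = pd.DataFrame(
    {"강도": [39.0, 14.0, 5.0], "살인": [5.0, 4.0, 7.0]},
    index=pd.Index(["강남구", "강동구", "강북구"], name="구별"),
)

# Max scaling, as used in the post: each column is divided by its own maximum
max_scaled = counts / counts.max()

# Min-max scaling (an alternative, not used in the post): smallest value -> 0, largest -> 1
min_max = (counts - counts.min()) / (counts.max() - counts.min())

print(max_scaled)
print(min_max)
```

<p>Max scaling keeps the ratios between districts intact, which suits count data that has a natural zero.</p>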

<hr />

<h4 id="검거율-추가">Add arrest rates</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">col2</span> <span class="o">=</span> <span class="p">[</span><span class="s">"강간검거율"</span><span class="p">,</span> <span class="s">"강도검거율"</span><span class="p">,</span> <span class="s">"살인검거율"</span><span class="p">,</span> <span class="s">"절도검거율"</span><span class="p">,</span> <span class="s">"폭력검거율"</span><span class="p">]</span>
<span class="n">crime_anal_norm</span><span class="p">[</span><span class="n">col2</span><span class="p">]</span> <span class="o">=</span> <span class="n">crime_anal_gu</span><span class="p">[</span><span class="n">col2</span><span class="p">]</span>
<span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>살인</th>
      <th>강도</th>
      <th>강간</th>
      <th>절도</th>
      <th>폭력</th>
      <th>강간검거율</th>
      <th>강도검거율</th>
      <th>살인검거율</th>
      <th>절도검거율</th>
      <th>폭력검거율</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>0.357143</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>0.977118</td>
      <td>0.733773</td>
      <td>80.038760</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>53.470867</td>
      <td>88.130935</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>0.285714</td>
      <td>0.358974</td>
      <td>0.310078</td>
      <td>0.477799</td>
      <td>0.463880</td>
      <td>95.000000</td>
      <td>92.857143</td>
      <td>100.000000</td>
      <td>51.425314</td>
      <td>86.996047</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>0.500000</td>
      <td>0.128205</td>
      <td>0.420543</td>
      <td>0.332879</td>
      <td>0.509351</td>
      <td>73.271889</td>
      <td>80.000000</td>
      <td>85.714286</td>
      <td>54.991817</td>
      <td>89.344852</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>0.428571</td>
      <td>0.307692</td>
      <td>0.624031</td>
      <td>0.572868</td>
      <td>0.593143</td>
      <td>81.987578</td>
      <td>83.333333</td>
      <td>100.000000</td>
      <td>44.555397</td>
      <td>83.678516</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>0.285714</td>
      <td>0.282051</td>
      <td>0.540698</td>
      <td>0.718060</td>
      <td>0.438577</td>
      <td>83.870968</td>
      <td>54.545455</td>
      <td>100.000000</td>
      <td>40.098634</td>
      <td>84.071906</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="구별-cctv자료에서-인구수-cctv-수-추가">Add population and CCTV counts from the per-district CCTV data</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">result_CCTV</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"../data/01. CCTV_result.csv"</span><span class="p">,</span> <span class="n">index_col</span><span class="o">=</span><span class="s">"구별"</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">"utf-8"</span><span class="p">)</span>
<span class="n">result_CCTV</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>소계</th>
      <th>최근증가율</th>
      <th>인구수</th>
      <th>한국인</th>
      <th>외국인</th>
      <th>고령자</th>
      <th>외국인비율</th>
      <th>고령자비율</th>
      <th>CCTV비율</th>
      <th>오차</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>3238</td>
      <td>150.619195</td>
      <td>561052</td>
      <td>556164</td>
      <td>4888</td>
      <td>65060</td>
      <td>0.871220</td>
      <td>11.596073</td>
      <td>0.577130</td>
      <td>1549.200326</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>1010</td>
      <td>166.490765</td>
      <td>440359</td>
      <td>436223</td>
      <td>4136</td>
      <td>56161</td>
      <td>0.939234</td>
      <td>12.753458</td>
      <td>0.229358</td>
      <td>-544.642322</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>831</td>
      <td>125.203252</td>
      <td>328002</td>
      <td>324479</td>
      <td>3523</td>
      <td>56530</td>
      <td>1.074079</td>
      <td>17.234651</td>
      <td>0.253352</td>
      <td>-598.750923</td>
    </tr>
    <tr>
      <th>강서구</th>
      <td>911</td>
      <td>134.793814</td>
      <td>608255</td>
      <td>601691</td>
      <td>6564</td>
      <td>76032</td>
      <td>1.079153</td>
      <td>12.500021</td>
      <td>0.149773</td>
      <td>-830.268578</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>2109</td>
      <td>149.290780</td>
      <td>520929</td>
      <td>503297</td>
      <td>17632</td>
      <td>70046</td>
      <td>3.384722</td>
      <td>13.446362</td>
      <td>0.404854</td>
      <td>464.799395</td>
    </tr>
  </tbody>
</table>
</div>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_anal_norm</span><span class="p">[[</span><span class="s">"인구수"</span><span class="p">,</span> <span class="s">"CCTV"</span><span class="p">]]</span> <span class="o">=</span> <span class="n">result_CCTV</span><span class="p">[[</span><span class="s">"인구수"</span><span class="p">,</span> <span class="s">"소계"</span><span class="p">]]</span>
<span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div>
<style scoped="">
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>살인</th>
      <th>강도</th>
      <th>강간</th>
      <th>절도</th>
      <th>폭력</th>
      <th>강간검거율</th>
      <th>강도검거율</th>
      <th>살인검거율</th>
      <th>절도검거율</th>
      <th>폭력검거율</th>
      <th>인구수</th>
      <th>CCTV</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>0.357143</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>0.977118</td>
      <td>0.733773</td>
      <td>80.038760</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>53.470867</td>
      <td>88.130935</td>
      <td>561052</td>
      <td>3238</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>0.285714</td>
      <td>0.358974</td>
      <td>0.310078</td>
      <td>0.477799</td>
      <td>0.463880</td>
      <td>95.000000</td>
      <td>92.857143</td>
      <td>100.000000</td>
      <td>51.425314</td>
      <td>86.996047</td>
      <td>440359</td>
      <td>1010</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>0.500000</td>
      <td>0.128205</td>
      <td>0.420543</td>
      <td>0.332879</td>
      <td>0.509351</td>
      <td>73.271889</td>
      <td>80.000000</td>
      <td>85.714286</td>
      <td>54.991817</td>
      <td>89.344852</td>
      <td>328002</td>
      <td>831</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>0.428571</td>
      <td>0.307692</td>
      <td>0.624031</td>
      <td>0.572868</td>
      <td>0.593143</td>
      <td>81.987578</td>
      <td>83.333333</td>
      <td>100.000000</td>
      <td>44.555397</td>
      <td>83.678516</td>
      <td>520929</td>
      <td>2109</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>0.285714</td>
      <td>0.282051</td>
      <td>0.540698</td>
      <td>0.718060</td>
      <td>0.438577</td>
      <td>83.870968</td>
      <td>54.545455</td>
      <td>100.000000</td>
      <td>40.098634</td>
      <td>84.071906</td>
      <td>372298</td>
      <td>878</td>
    </tr>
  </tbody>
</table>
</div>
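<p>The column assignment above relies on pandas aligning rows by the 구별 index, not by position. In the outputs shown, 강서구 appears in <code>result_CCTV</code> but not among the first rows of <code>crime_anal_norm</code>, yet each district still receives its own values. The sketch below makes that alignment explicit with a left join. The frames are small samples built from the tables above, and the names <code>crime_sample</code>, <code>cctv_sample</code> and <code>merged</code> are hypothetical.</p>

```python
import pandas as pd

# Sample of the normalized crime table (values rounded for brevity)
crime_sample = pd.DataFrame(
    {"강도": [1.0, 0.358974]},
    index=pd.Index(["강남구", "강동구"], name="구별"),
)

# Sample of the CCTV table, in a different row order and with one extra district
cctv_sample = pd.DataFrame(
    {"인구수": [440359, 608255, 561052], "소계": [1010, 911, 3238]},
    index=pd.Index(["강동구", "강서구", "강남구"], name="구별"),
)

# A left join keeps the rows of crime_sample and looks up matching index labels
merged = crime_sample.join(
    cctv_sample[["인구수", "소계"]].rename(columns={"소계": "CCTV"}),
    how="left",
)
print(merged)
```

<p>Districts that exist only in the CCTV table are dropped, and a district missing from it would receive <code>NaN</code>, which is worth checking with <code>merged.isna().sum()</code> after combining the real data.</p>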

<hr />

<h4 id="정규화된-범죄발생-건수-전체-평균을-구해서-범죄-컬럼-대표값으로-사용">Use the mean of the normalized crime counts as the representative 범죄 (crime) column</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">col</span> <span class="o">=</span> <span class="p">[</span><span class="s">"강간"</span><span class="p">,</span> <span class="s">"강도"</span><span class="p">,</span> <span class="s">"살인"</span><span class="p">,</span> <span class="s">"절도"</span><span class="p">,</span> <span class="s">"폭력"</span><span class="p">]</span>
<span class="n">crime_anal_norm</span><span class="p">[</span><span class="s">"범죄"</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">crime_anal_norm</span><span class="p">[</span><span class="n">col</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># mean across each row (axis=1), one value per district
</span><span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<div class="output_area"><div class="output_subarea output_html rendered_html output_result" dir="auto"><div>
<style scoped="">
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>살인</th>
      <th>강도</th>
      <th>강간</th>
      <th>절도</th>
      <th>폭력</th>
      <th>강간검거율</th>
      <th>강도검거율</th>
      <th>살인검거율</th>
      <th>절도검거율</th>
      <th>폭력검거율</th>
      <th>인구수</th>
      <th>CCTV</th>
      <th>범죄</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>0.357143</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>0.977118</td>
      <td>0.733773</td>
      <td>80.038760</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>53.470867</td>
      <td>88.130935</td>
      <td>561052</td>
      <td>3238</td>
      <td>0.813607</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>0.285714</td>
      <td>0.358974</td>
      <td>0.310078</td>
      <td>0.477799</td>
      <td>0.463880</td>
      <td>95.000000</td>
      <td>92.857143</td>
      <td>100.000000</td>
      <td>51.425314</td>
      <td>86.996047</td>
      <td>440359</td>
      <td>1010</td>
      <td>0.379289</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>0.500000</td>
      <td>0.128205</td>
      <td>0.420543</td>
      <td>0.332879</td>
      <td>0.509351</td>
      <td>73.271889</td>
      <td>80.000000</td>
      <td>85.714286</td>
      <td>54.991817</td>
      <td>89.344852</td>
      <td>328002</td>
      <td>831</td>
      <td>0.378196</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>0.428571</td>
      <td>0.307692</td>
      <td>0.624031</td>
      <td>0.572868</td>
      <td>0.593143</td>
      <td>81.987578</td>
      <td>83.333333</td>
      <td>100.000000</td>
      <td>44.555397</td>
      <td>83.678516</td>
      <td>520929</td>
      <td>2109</td>
      <td>0.505261</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>0.285714</td>
      <td>0.282051</td>
      <td>0.540698</td>
      <td>0.718060</td>
      <td>0.438577</td>
      <td>83.870968</td>
      <td>54.545455</td>
      <td>100.000000</td>
      <td>40.098634</td>
      <td>84.071906</td>
      <td>372298</td>
      <td>878</td>
      <td>0.453020</td>
    </tr>
  </tbody>
</table>
</div></div></div>

<hr />

<h2 id="npmean">np.mean()</h2>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span>
    <span class="p">[[</span><span class="mf">0.357143</span><span class="p">,</span> <span class="mf">1.000000</span><span class="p">,</span> <span class="mf">1.000000</span><span class="p">,</span> <span class="mf">0.977118</span><span class="p">,</span> <span class="mf">0.733773</span><span class="p">],</span>
    <span class="p">[</span><span class="mf">0.285714</span><span class="p">,</span> <span class="mf">0.358974</span><span class="p">,</span> <span class="mf">0.310078</span><span class="p">,</span> <span class="mf">0.477799</span><span class="p">,</span> <span class="mf">0.463880</span><span class="p">]]</span>
<span class="p">)</span>

<span class="c1"># Output:
# array([[0.357143, 1.      , 1.      , 0.977118, 0.733773],
#        [0.285714, 0.358974, 0.310078, 0.477799, 0.46388 ]])
</span>
<span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span>
    <span class="p">[[</span><span class="mf">0.357143</span><span class="p">,</span> <span class="mf">1.000000</span><span class="p">,</span> <span class="mf">1.000000</span><span class="p">,</span> <span class="mf">0.977118</span><span class="p">,</span> <span class="mf">0.733773</span><span class="p">],</span>
    <span class="p">[</span><span class="mf">0.285714</span><span class="p">,</span> <span class="mf">0.358974</span><span class="p">,</span> <span class="mf">0.310078</span><span class="p">,</span> <span class="mf">0.477799</span><span class="p">,</span> <span class="mf">0.463880</span><span class="p">]]</span>
<span class="p">),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># axis=1: mean across each row; axis=0: mean down each column
</span>
<span class="c1"># Output:
# array([0.8136068, 0.379289 ])
</span>
</code></pre></div></div>
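<p>To make the <code>axis</code> argument concrete, the sketch below computes both directions on the same two rows and checks the result shapes. It also shows <code>DataFrame.mean(axis=1)</code>, which gives the same row means as <code>np.mean(..., axis=1)</code>. The variable names are illustrative only.</p>

```python
import numpy as np
import pandas as pd

values = np.array(
    [[0.357143, 1.000000, 1.000000, 0.977118, 0.733773],
     [0.285714, 0.358974, 0.310078, 0.477799, 0.463880]]
)

# axis=1 collapses the columns: one mean per row (2 values)
row_means = np.mean(values, axis=1)

# axis=0 collapses the rows: one mean per column (5 values)
col_means = np.mean(values, axis=0)

# The pandas method gives the same row means as np.mean(..., axis=1)
frame = pd.DataFrame(values, columns=["살인", "강도", "강간", "절도", "폭력"])
frame_row_means = frame.mean(axis=1)

print(row_means.shape, col_means.shape)
print(frame_row_means.to_numpy())
```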
<hr />

<h4 id="검거율의-평균을-검거-컬럼의-대표값으로-사용">Use the mean of the arrest rates as the representative 검거 (arrest) column</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">col</span> <span class="o">=</span> <span class="p">[</span><span class="s">"강간검거율"</span><span class="p">,</span> <span class="s">"강도검거율"</span><span class="p">,</span> <span class="s">"살인검거율"</span><span class="p">,</span> <span class="s">"절도검거율"</span><span class="p">,</span> <span class="s">"폭력검거율"</span><span class="p">]</span>
<span class="n">crime_anal_norm</span><span class="p">[</span><span class="s">"검거"</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">crime_anal_norm</span><span class="p">[</span><span class="n">col</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># axis=1: mean across each row
</span><span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>살인</th>
      <th>강도</th>
      <th>강간</th>
      <th>절도</th>
      <th>폭력</th>
      <th>강간검거율</th>
      <th>강도검거율</th>
      <th>살인검거율</th>
      <th>절도검거율</th>
      <th>폭력검거율</th>
      <th>인구수</th>
      <th>CCTV</th>
      <th>범죄</th>
      <th>검거</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>0.357143</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>0.977118</td>
      <td>0.733773</td>
      <td>80.038760</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>53.470867</td>
      <td>88.130935</td>
      <td>561052</td>
      <td>3238</td>
      <td>0.813607</td>
      <td>84.328112</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>0.285714</td>
      <td>0.358974</td>
      <td>0.310078</td>
      <td>0.477799</td>
      <td>0.463880</td>
      <td>95.000000</td>
      <td>92.857143</td>
      <td>100.000000</td>
      <td>51.425314</td>
      <td>86.996047</td>
      <td>440359</td>
      <td>1010</td>
      <td>0.379289</td>
      <td>85.255701</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>0.500000</td>
      <td>0.128205</td>
      <td>0.420543</td>
      <td>0.332879</td>
      <td>0.509351</td>
      <td>73.271889</td>
      <td>80.000000</td>
      <td>85.714286</td>
      <td>54.991817</td>
      <td>89.344852</td>
      <td>328002</td>
      <td>831</td>
      <td>0.378196</td>
      <td>76.664569</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>0.428571</td>
      <td>0.307692</td>
      <td>0.624031</td>
      <td>0.572868</td>
      <td>0.593143</td>
      <td>81.987578</td>
      <td>83.333333</td>
      <td>100.000000</td>
      <td>44.555397</td>
      <td>83.678516</td>
      <td>520929</td>
      <td>2109</td>
      <td>0.505261</td>
      <td>78.710965</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>0.285714</td>
      <td>0.282051</td>
      <td>0.540698</td>
      <td>0.718060</td>
      <td>0.438577</td>
      <td>83.870968</td>
      <td>54.545455</td>
      <td>100.000000</td>
      <td>40.098634</td>
      <td>84.071906</td>
      <td>372298</td>
      <td>878</td>
      <td>0.453020</td>
      <td>72.517393</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="최종-데이터-프레임">Final DataFrame</h2>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_anal_norm</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>살인</th>
      <th>강도</th>
      <th>강간</th>
      <th>절도</th>
      <th>폭력</th>
      <th>강간검거율</th>
      <th>강도검거율</th>
      <th>살인검거율</th>
      <th>절도검거율</th>
      <th>폭력검거율</th>
      <th>인구수</th>
      <th>CCTV</th>
      <th>범죄</th>
      <th>검거</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>0.357143</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>0.977118</td>
      <td>0.733773</td>
      <td>80.038760</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>53.470867</td>
      <td>88.130935</td>
      <td>561052</td>
      <td>3238</td>
      <td>0.813607</td>
      <td>84.328112</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>0.285714</td>
      <td>0.358974</td>
      <td>0.310078</td>
      <td>0.477799</td>
      <td>0.463880</td>
      <td>95.000000</td>
      <td>92.857143</td>
      <td>100.000000</td>
      <td>51.425314</td>
      <td>86.996047</td>
      <td>440359</td>
      <td>1010</td>
      <td>0.379289</td>
      <td>85.255701</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>0.500000</td>
      <td>0.128205</td>
      <td>0.420543</td>
      <td>0.332879</td>
      <td>0.509351</td>
      <td>73.271889</td>
      <td>80.000000</td>
      <td>85.714286</td>
      <td>54.991817</td>
      <td>89.344852</td>
      <td>328002</td>
      <td>831</td>
      <td>0.378196</td>
      <td>76.664569</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>0.428571</td>
      <td>0.307692</td>
      <td>0.624031</td>
      <td>0.572868</td>
      <td>0.593143</td>
      <td>81.987578</td>
      <td>83.333333</td>
      <td>100.000000</td>
      <td>44.555397</td>
      <td>83.678516</td>
      <td>520929</td>
      <td>2109</td>
      <td>0.505261</td>
      <td>78.710965</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>0.285714</td>
      <td>0.282051</td>
      <td>0.540698</td>
      <td>0.718060</td>
      <td>0.438577</td>
      <td>83.870968</td>
      <td>54.545455</td>
      <td>100.000000</td>
      <td>40.098634</td>
      <td>84.071906</td>
      <td>372298</td>
      <td>878</td>
      <td>0.453020</td>
      <td>72.517393</td>
    </tr>
    <tr>
      <th>구로구</th>
      <td>0.642857</td>
      <td>0.256410</td>
      <td>0.529070</td>
      <td>0.520294</td>
      <td>0.580125</td>
      <td>66.300366</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>45.078534</td>
      <td>84.702908</td>
      <td>441559</td>
      <td>1884</td>
      <td>0.505751</td>
      <td>79.216362</td>
    </tr>
    <tr>
      <th>금천구</th>
      <td>0.428571</td>
      <td>0.179487</td>
      <td>0.339147</td>
      <td>0.344320</td>
      <td>0.402090</td>
      <td>81.714286</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>51.740506</td>
      <td>88.736890</td>
      <td>253491</td>
      <td>1348</td>
      <td>0.338723</td>
      <td>84.438336</td>
    </tr>
    <tr>
      <th>노원구</th>
      <td>0.357143</td>
      <td>0.153846</td>
      <td>0.308140</td>
      <td>0.505857</td>
      <td>0.461313</td>
      <td>89.308176</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>39.849219</td>
      <td>84.419714</td>
      <td>558075</td>
      <td>1566</td>
      <td>0.357260</td>
      <td>82.715422</td>
    </tr>
    <tr>
      <th>도봉구</th>
      <td>0.214286</td>
      <td>0.128205</td>
      <td>0.238372</td>
      <td>0.235903</td>
      <td>0.264210</td>
      <td>98.373984</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>56.812933</td>
      <td>90.839695</td>
      <td>346234</td>
      <td>825</td>
      <td>0.216195</td>
      <td>89.205322</td>
    </tr>
    <tr>
      <th>동대문구</th>
      <td>0.357143</td>
      <td>0.256410</td>
      <td>0.368217</td>
      <td>0.528466</td>
      <td>0.484415</td>
      <td>83.157895</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>55.206186</td>
      <td>89.969720</td>
      <td>366011</td>
      <td>1870</td>
      <td>0.398930</td>
      <td>85.666760</td>
    </tr>
    <tr>
      <th>동작구</th>
      <td>0.571429</td>
      <td>0.179487</td>
      <td>0.629845</td>
      <td>0.333969</td>
      <td>0.304547</td>
      <td>45.846154</td>
      <td>100.000000</td>
      <td>75.000000</td>
      <td>45.187602</td>
      <td>86.935581</td>
      <td>408493</td>
      <td>1302</td>
      <td>0.403855</td>
      <td>70.593867</td>
    </tr>
    <tr>
      <th>마포구</th>
      <td>0.285714</td>
      <td>0.102564</td>
      <td>0.773256</td>
      <td>0.688368</td>
      <td>0.538871</td>
      <td>80.200501</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>37.198259</td>
      <td>85.062947</td>
      <td>385783</td>
      <td>980</td>
      <td>0.477755</td>
      <td>80.492341</td>
    </tr>
    <tr>
      <th>서대문구</th>
      <td>0.428571</td>
      <td>0.128205</td>
      <td>0.339147</td>
      <td>0.409425</td>
      <td>0.362303</td>
      <td>84.000000</td>
      <td>80.000000</td>
      <td>100.000000</td>
      <td>50.033267</td>
      <td>83.198381</td>
      <td>325028</td>
      <td>1254</td>
      <td>0.333530</td>
      <td>79.446329</td>
    </tr>
    <tr>
      <th>서초구</th>
      <td>0.357143</td>
      <td>0.333333</td>
      <td>0.829457</td>
      <td>0.600654</td>
      <td>0.428676</td>
      <td>63.317757</td>
      <td>76.923077</td>
      <td>100.000000</td>
      <td>50.204082</td>
      <td>86.783576</td>
      <td>445401</td>
      <td>2297</td>
      <td>0.509853</td>
      <td>75.445698</td>
    </tr>
    <tr>
      <th>성동구</th>
      <td>0.285714</td>
      <td>0.076923</td>
      <td>0.201550</td>
      <td>0.353037</td>
      <td>0.296846</td>
      <td>75.000000</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>69.135802</td>
      <td>86.967264</td>
      <td>312711</td>
      <td>1327</td>
      <td>0.242814</td>
      <td>86.220613</td>
    </tr>
    <tr>
      <th>성북구</th>
      <td>0.285714</td>
      <td>0.205128</td>
      <td>0.298450</td>
      <td>0.400436</td>
      <td>0.386505</td>
      <td>75.974026</td>
      <td>100.000000</td>
      <td>75.000000</td>
      <td>49.319728</td>
      <td>86.290323</td>
      <td>455407</td>
      <td>1651</td>
      <td>0.315247</td>
      <td>77.316815</td>
    </tr>
    <tr>
      <th>송파구</th>
      <td>0.642857</td>
      <td>0.384615</td>
      <td>0.453488</td>
      <td>0.692727</td>
      <td>0.603044</td>
      <td>78.632479</td>
      <td>80.000000</td>
      <td>88.888889</td>
      <td>41.211168</td>
      <td>85.375494</td>
      <td>671173</td>
      <td>1081</td>
      <td>0.555346</td>
      <td>74.821606</td>
    </tr>
    <tr>
      <th>양천구</th>
      <td>1.000000</td>
      <td>0.435897</td>
      <td>0.786822</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>85.467980</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>49.713974</td>
      <td>85.918592</td>
      <td>475018</td>
      <td>2482</td>
      <td>0.844544</td>
      <td>84.220109</td>
    </tr>
    <tr>
      <th>영등포구</th>
      <td>0.928571</td>
      <td>0.487179</td>
      <td>0.689922</td>
      <td>0.637701</td>
      <td>0.658783</td>
      <td>63.202247</td>
      <td>73.684211</td>
      <td>100.000000</td>
      <td>40.153780</td>
      <td>83.690509</td>
      <td>402024</td>
      <td>1277</td>
      <td>0.680431</td>
      <td>72.146149</td>
    </tr>
    <tr>
      <th>용산구</th>
      <td>0.285714</td>
      <td>0.230769</td>
      <td>0.486434</td>
      <td>0.405612</td>
      <td>0.437110</td>
      <td>85.258964</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>40.228341</td>
      <td>84.228188</td>
      <td>244444</td>
      <td>2096</td>
      <td>0.369128</td>
      <td>81.943099</td>
    </tr>
    <tr>
      <th>은평구</th>
      <td>0.428571</td>
      <td>0.230769</td>
      <td>0.302326</td>
      <td>0.453827</td>
      <td>0.488449</td>
      <td>91.025641</td>
      <td>77.777778</td>
      <td>100.000000</td>
      <td>53.421369</td>
      <td>86.636637</td>
      <td>491202</td>
      <td>2108</td>
      <td>0.380788</td>
      <td>81.772285</td>
    </tr>
    <tr>
      <th>종로구</th>
      <td>0.428571</td>
      <td>0.307692</td>
      <td>0.461240</td>
      <td>0.528466</td>
      <td>0.414925</td>
      <td>74.369748</td>
      <td>75.000000</td>
      <td>33.333333</td>
      <td>39.587629</td>
      <td>87.361909</td>
      <td>164257</td>
      <td>1619</td>
      <td>0.428179</td>
      <td>61.930524</td>
    </tr>
    <tr>
      <th>중구</th>
      <td>0.214286</td>
      <td>0.205128</td>
      <td>0.383721</td>
      <td>0.585671</td>
      <td>0.407957</td>
      <td>74.747475</td>
      <td>87.500000</td>
      <td>100.000000</td>
      <td>42.511628</td>
      <td>89.707865</td>
      <td>134593</td>
      <td>1023</td>
      <td>0.359353</td>
      <td>78.893394</td>
    </tr>
    <tr>
      <th>중랑구</th>
      <td>0.571429</td>
      <td>0.358974</td>
      <td>0.317829</td>
      <td>0.460637</td>
      <td>0.580125</td>
      <td>91.463415</td>
      <td>100.000000</td>
      <td>87.500000</td>
      <td>62.211709</td>
      <td>85.714286</td>
      <td>412780</td>
      <td>916</td>
      <td>0.457799</td>
      <td>85.377882</td>
    </tr>
  </tbody>
</table>
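<p>The <code>범죄</code> and <code>검거</code> columns in the table above are consistent with plain row-wise means of the five normalized crime columns and the five arrest-rate columns. The sketch below checks this on two rows copied from the table. The name <code>sample</code> is hypothetical, and treating the summary columns as simple means is an assumption inferred from the numbers, not a statement of how <code>crime_anal_norm</code> was actually built.</p>

```python
import pandas as pd

# Two rows copied from the table above (values rounded to six decimals).
rows = {
    "강남구": [0.357143, 1.000000, 1.000000, 0.977118, 0.733773,
            80.038760, 100.000000, 100.000000, 53.470867, 88.130935],
    "도봉구": [0.214286, 0.128205, 0.238372, 0.235903, 0.264210,
            98.373984, 100.000000, 100.000000, 56.812933, 90.839695],
}
crime_cols = ["살인", "강도", "강간", "절도", "폭력"]
arrest_cols = ["강간검거율", "강도검거율", "살인검거율", "절도검거율", "폭력검거율"]

# "sample" is a hypothetical two-row stand-in for crime_anal_norm.
sample = pd.DataFrame.from_dict(rows, orient="index", columns=crime_cols + arrest_cols)

# Assumption: the summary columns are plain row-wise means.
sample["범죄"] = sample[crime_cols].mean(axis=1)
sample["검거"] = sample[arrest_cols].mean(axis=1)

print(sample[["범죄", "검거"]].round(6))
```

<p>Rounded to six decimals, both rows reproduce the <code>범죄</code> and <code>검거</code> values listed in the table.</p>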

<hr />

<h2 id="seaborn">Seaborn</h2>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">from</span> <span class="nn">matplotlib</span> <span class="kn">import</span> <span class="n">rc</span>

<span class="n">plt</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"axes.unicode_minus"</span><span class="p">]</span> <span class="o">=</span> <span class="bp">False</span>
<span class="n">rc</span><span class="p">(</span><span class="s">"font"</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="s">"malgun gothic"</span><span class="p">)</span>
<span class="c1"># Equivalent to get_ipython().run_line_magic("matplotlib", "inline")
</span><span class="o">%</span><span class="n">matplotlib</span> <span class="n">inline</span>
</code></pre></div></div>
<hr />
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">14</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span> <span class="c1"># 100 evenly spaced values from 0 to 14 (inclusive)
</span>
<span class="o">=&gt;</span>

<span class="n">array</span><span class="p">([</span> <span class="mf">0.</span>        <span class="p">,</span>  <span class="mf">0.14141414</span><span class="p">,</span>  <span class="mf">0.28282828</span><span class="p">,</span>  <span class="mf">0.42424242</span><span class="p">,</span>  <span class="mf">0.56565657</span><span class="p">,</span>
       <span class="p">...,</span>
       <span class="mf">13.43434343</span><span class="p">,</span> <span class="mf">13.57575758</span><span class="p">,</span> <span class="mf">13.71717172</span><span class="p">,</span> <span class="mf">13.85858586</span><span class="p">,</span> <span class="mf">14.</span>        <span class="p">])</span>
</code></pre></div></div>
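<p>The first printed values (0, 0.14141414, 0.28282828, ...) follow from how <code>np.linspace</code> spaces its points: with both endpoints included, 100 points leave 99 gaps, so the step is 14 / 99. A minimal check, using nothing beyond NumPy:</p>

```python
import numpy as np

x = np.linspace(0, 14, 100)   # 100 points, both endpoints included
step = x[1] - x[0]            # spacing is (stop - start) / (num - 1)

print(len(x), x[0], x[-1])
print(step, 14 / 99)
```

<p>This is the main difference from <code>np.arange</code>, which takes a step size instead of a point count and excludes the stop value.</p>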
<hr />
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">14</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="n">y1</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">sin</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">y2</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">sin</span><span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="mf">0.5</span><span class="p">)</span>
<span class="n">y3</span> <span class="o">=</span> <span class="mi">3</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">sin</span><span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="mf">1.0</span><span class="p">)</span>
<span class="n">y4</span> <span class="o">=</span> <span class="mi">4</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">sin</span><span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="mf">1.5</span><span class="p">)</span>

<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y1</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y2</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y3</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y4</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/3689a77b-fb25-4893-90d1-ad988a1fe4cb/image.png" alt="" /></p>

<hr />

<h4 id="style--white-whitegrid-dark-darkgrid">style : “white”, “whitegrid”, “dark”, “darkgrid”</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">set_style</span><span class="p">(</span><span class="s">"dark"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y1</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y2</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y3</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y4</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/e0c315ac-4d3d-432a-a6dd-05681e0b7241/image.png" alt="" /></p>
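<p>To compare the four styles listed in the heading side by side, one option is <code>sns.axes_style()</code> used as a context manager, which applies a style only to axes created inside the <code>with</code> block. This is a sketch under that assumption; the 2x2 layout and the two sine curves are illustrative choices, not part of the original post.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

x = np.linspace(0, 14, 100)
styles = ["white", "whitegrid", "dark", "darkgrid"]

fig = plt.figure(figsize=(10, 8))
for i, style in enumerate(styles, start=1):
    # The style applies only to axes created inside the with-block
    with sns.axes_style(style):
        ax = fig.add_subplot(2, 2, i)
    ax.plot(x, np.sin(x), x, 2 * np.sin(x + 0.5))
    ax.set_title(style)

fig.tight_layout()
# In a notebook, drop the matplotlib.use("Agg") line and call plt.show() here.
```

<p>Unlike <code>sns.set_style()</code>, which changes the global default for every later figure, the context manager leaves the global style untouched.</p>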

<hr />

<h3 id="seaborn-tips-data">seaborn tips data</h3>
<ul>
  <li>boxplot</li>
  <li>swarmplot</li>
  <li>lmplot</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tips</span> <span class="o">=</span> <span class="n">sns</span><span class="p">.</span><span class="n">load_dataset</span><span class="p">(</span><span class="s">"tips"</span><span class="p">)</span>
<span class="n">tips</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>total_bill</th>
      <th>tip</th>
      <th>sex</th>
      <th>smoker</th>
      <th>day</th>
      <th>time</th>
      <th>size</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>16.99</td>
      <td>1.01</td>
      <td>Female</td>
      <td>No</td>
      <td>Sun</td>
      <td>Dinner</td>
      <td>2</td>
    </tr>
    <tr>
      <th>1</th>
      <td>10.34</td>
      <td>1.66</td>
      <td>Male</td>
      <td>No</td>
      <td>Sun</td>
      <td>Dinner</td>
      <td>3</td>
    </tr>
    <tr>
      <th>2</th>
      <td>21.01</td>
      <td>3.50</td>
      <td>Male</td>
      <td>No</td>
      <td>Sun</td>
      <td>Dinner</td>
      <td>3</td>
    </tr>
    <tr>
      <th>3</th>
      <td>23.68</td>
      <td>3.31</td>
      <td>Male</td>
      <td>No</td>
      <td>Sun</td>
      <td>Dinner</td>
      <td>2</td>
    </tr>
    <tr>
      <th>4</th>
      <td>24.59</td>
      <td>3.61</td>
      <td>Female</td>
      <td>No</td>
      <td>Sun</td>
      <td>Dinner</td>
      <td>4</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>239</th>
      <td>29.03</td>
      <td>5.92</td>
      <td>Male</td>
      <td>No</td>
      <td>Sat</td>
      <td>Dinner</td>
      <td>3</td>
    </tr>
    <tr>
      <th>240</th>
      <td>27.18</td>
      <td>2.00</td>
      <td>Female</td>
      <td>Yes</td>
      <td>Sat</td>
      <td>Dinner</td>
      <td>2</td>
    </tr>
    <tr>
      <th>241</th>
      <td>22.67</td>
      <td>2.00</td>
      <td>Male</td>
      <td>Yes</td>
      <td>Sat</td>
      <td>Dinner</td>
      <td>2</td>
    </tr>
    <tr>
      <th>242</th>
      <td>17.82</td>
      <td>1.75</td>
      <td>Male</td>
      <td>No</td>
      <td>Sat</td>
      <td>Dinner</td>
      <td>2</td>
    </tr>
    <tr>
      <th>243</th>
      <td>18.78</td>
      <td>3.00</td>
      <td>Female</td>
      <td>No</td>
      <td>Thur</td>
      <td>Dinner</td>
      <td>2</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="boxplot">boxplot</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="n">boxplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">tips</span><span class="p">[</span><span class="s">"total_bill"</span><span class="p">])</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/a75b2556-d4e3-46ed-a7ad-fe47fe993cfc/image.png" alt="" /></p>
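<p>A box plot summarizes a column with its quartiles: the box spans Q1 to Q3, the line inside is the median, and with seaborn's default setting (<code>whis=1.5</code>) points farther than 1.5 × IQR from the box are drawn as individual outliers. The sketch below computes those quantities by hand. The numbers are hypothetical, chosen so the quartiles are easy to read off; they are not taken from the tips data.</p>

```python
import numpy as np

# Hypothetical bill amounts, chosen so the quartiles are easy to read off
values = np.array([10, 12, 14, 16, 18, 20, 22, 24, 50])

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # whiskers stop at the last point inside the fences
upper_fence = q3 + 1.5 * iqr

inside = np.logical_and(values >= lower_fence, values <= upper_fence)
outliers = values[~inside]     # drawn as individual markers

print(q1, median, q3, iqr)
print(lower_fence, upper_fence, outliers)
```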

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="n">boxplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"day"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"total_bill"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">tips</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/16744f07-160c-42f7-8cf7-63a1240a6bb7/image.png" alt="" /></p>

<hr />

<h4 id="boxplot-hue-palette-option">boxplot hue, palette option</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="c1"># hue: split each box by a categorical column (here, smoker)
</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="n">boxplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"day"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"total_bill"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">tips</span><span class="p">,</span> <span class="n">hue</span><span class="o">=</span><span class="s">"smoker"</span><span class="p">,</span> <span class="n">palette</span><span class="o">=</span><span class="s">"Set3"</span><span class="p">)</span> <span class="c1"># palette: "Set1", "Set2", or "Set3"
</span><span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/dc54b2b3-f250-49c6-b3ff-0e4fc63eb004/image.png" alt="" /></p>
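<p>The grouping that <code>hue="smoker"</code> draws can also be inspected numerically: each box corresponds to one (day, smoker) pair. A minimal sketch with pandas follows. The frame <code>df</code> and its values are hypothetical rows shaped like the tips data, used so the example runs without downloading the dataset.</p>

```python
import pandas as pd

# Hypothetical rows in the same shape as the tips data (not the real values)
df = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sat", "Sun", "Sun", "Sun", "Sun"],
    "smoker": ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "total_bill": [20.0, 15.0, 30.0, 25.0, 18.0, 22.0, 26.0, 28.0],
})

# One group per (day, smoker) pair: the same split that hue="smoker" draws as separate boxes
medians = df.groupby(["day", "smoker"])["total_bill"].median().unstack("smoker")
print(medians)
```

<p>Each cell of <code>medians</code> is the center line of one box in the corresponding plot.</p>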

<hr />

<h4 id="swarmplot">swarmplot</h4>
<ul>
  <li>color: a grayscale level given as a string from "0" (black) to "1" (white)
    <div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="n">swarmplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"day"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"total_bill"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">tips</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"0"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>    </div>
    <p><img src="https://velog.velcdn.com/images/yy2hi/post/5b3e2b87-4090-4afe-93ab-6042e2de73b3/image.png" alt="" /></p>
  </li>
</ul>
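<p>The <code>color="0"</code> argument relies on matplotlib's rule that a string holding a float between 0 and 1 is read as a gray level. A quick way to see the mapping is <code>matplotlib.colors.to_rgb</code>:</p>

```python
from matplotlib.colors import to_rgb

# A string holding a float in [0, 1] is read as a gray level
for gray in ["0", "0.5", "1"]:
    print(gray, to_rgb(gray))
```

<p>So <code>"0"</code> gives black points, <code>"1"</code> white, and intermediate strings such as <code>"0.5"</code> a mid gray.</p>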

<hr />

<h4 id="boxplot-with-swarmplot">boxplot with swarmplot</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="n">boxplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"day"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"total_bill"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">tips</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">swarmplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"day"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"total_bill"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">tips</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"0"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/04950a63-f83f-4ca9-ba48-38cee5be543b/image.png" alt="" /></p>

<hr />

<h4 id="lmplot">lmplot</h4>
<ul>
  <li>Examine the relationship between total_bill and tip</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">set_style</span><span class="p">(</span><span class="s">"darkgrid"</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">lmplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"total_bill"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"tip"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">tips</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">7</span><span class="p">)</span> <span class="c1"># the size parameter was renamed to height in seaborn 0.9
</span><span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/a3fa36ef-6ed8-47c2-9fec-d987294560a0/image.png" alt="" /></p>
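<p>By default, the line that <code>lmplot</code> draws is a straight-line least-squares fit. The sketch below reproduces that kind of fit with <code>np.polyfit</code> on the first five rows of the tips table shown earlier. Five rows are far too few for a meaningful estimate; the point is only to show what is being computed, and the resulting numbers will differ from the fit on the full dataset.</p>

```python
import numpy as np

# First five rows of the tips table shown above
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59])
tip = np.array([1.01, 1.66, 3.50, 3.31, 3.61])

# Degree-1 least-squares fit: tip is approximated by slope * total_bill + intercept
slope, intercept = np.polyfit(total_bill, tip, 1)

# Same slope from the closed form cov(x, y) / var(x)
dx = total_bill - total_bill.mean()
dy = tip - tip.mean()
slope_check = (dx * dy).sum() / (dx ** 2).sum()

print(slope, intercept, slope_check)
```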

<hr />

<ul>
  <li>hue option
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">set_style</span><span class="p">(</span><span class="s">"darkgrid"</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">lmplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"total_bill"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"tip"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">tips</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span> <span class="n">hue</span><span class="o">=</span><span class="s">"smoker"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>    </div>
    <p><img src="https://velog.velcdn.com/images/yy2hi/post/e3000780-26f8-4277-aaf9-59cab28b561b/image.png" alt="" /></p>
  </li>
</ul>

<hr />

<h3 id="flights-datas">flights data</h3>
<ul>
  <li>heatmap</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">flights</span> <span class="o">=</span> <span class="n">sns</span><span class="p">.</span><span class="n">load_dataset</span><span class="p">(</span><span class="s">"flights"</span><span class="p">)</span>
<span class="n">flights</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>year</th>
      <th>month</th>
      <th>passengers</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1949</td>
      <td>Jan</td>
      <td>112</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1949</td>
      <td>Feb</td>
      <td>118</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1949</td>
      <td>Mar</td>
      <td>132</td>
    </tr>
    <tr>
      <th>3</th>
      <td>1949</td>
      <td>Apr</td>
      <td>129</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1949</td>
      <td>May</td>
      <td>121</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="pivotindex-columns-values">pivot(index, columns, values)</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">flights</span> <span class="o">=</span> <span class="n">flights</span><span class="p">.</span><span class="n">pivot</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="s">"month"</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="s">"year"</span><span class="p">,</span> <span class="n">values</span><span class="o">=</span><span class="s">"passengers"</span><span class="p">)</span>
<span class="n">flights</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>year</th>
      <th>1949</th>
      <th>1950</th>
      <th>1951</th>
      <th>1952</th>
      <th>1953</th>
      <th>1954</th>
      <th>1955</th>
      <th>1956</th>
      <th>1957</th>
      <th>1958</th>
      <th>1959</th>
      <th>1960</th>
    </tr>
    <tr>
      <th>month</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Jan</th>
      <td>112</td>
      <td>115</td>
      <td>145</td>
      <td>171</td>
      <td>196</td>
      <td>204</td>
      <td>242</td>
      <td>284</td>
      <td>315</td>
      <td>340</td>
      <td>360</td>
      <td>417</td>
    </tr>
    <tr>
      <th>Feb</th>
      <td>118</td>
      <td>126</td>
      <td>150</td>
      <td>180</td>
      <td>196</td>
      <td>188</td>
      <td>233</td>
      <td>277</td>
      <td>301</td>
      <td>318</td>
      <td>342</td>
      <td>391</td>
    </tr>
    <tr>
      <th>Mar</th>
      <td>132</td>
      <td>141</td>
      <td>178</td>
      <td>193</td>
      <td>236</td>
      <td>235</td>
      <td>267</td>
      <td>317</td>
      <td>356</td>
      <td>362</td>
      <td>406</td>
      <td>419</td>
    </tr>
    <tr>
      <th>Apr</th>
      <td>129</td>
      <td>135</td>
      <td>163</td>
      <td>181</td>
      <td>235</td>
      <td>227</td>
      <td>269</td>
      <td>313</td>
      <td>348</td>
      <td>348</td>
      <td>396</td>
      <td>461</td>
    </tr>
    <tr>
      <th>May</th>
      <td>121</td>
      <td>125</td>
      <td>172</td>
      <td>183</td>
      <td>229</td>
      <td>234</td>
      <td>270</td>
      <td>318</td>
      <td>355</td>
      <td>363</td>
      <td>420</td>
      <td>472</td>
    </tr>
  </tbody>
</table>
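<p>The reshaping that <code>pivot</code> performs can be seen on a four-row example: each distinct <code>month</code> becomes a row label, each distinct <code>year</code> a column, and <code>passengers</code> fills the cells. The frame <code>long_df</code> below is a hand-built stand-in using values from the tables above. Making <code>month</code> an ordered categorical is an assumption added here so that Jan sorts before Feb, since plain strings would sort alphabetically.</p>

```python
import pandas as pd

months = ["Jan", "Feb"]
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    # Ordered categorical so that Jan sorts before Feb (plain strings would sort alphabetically)
    "month": pd.Categorical(["Jan", "Feb", "Jan", "Feb"], categories=months, ordered=True),
    "passengers": [112, 118, 115, 126],  # values taken from the tables above
})

# Long format (one row per observation) to wide format (one cell per month/year pair)
wide = long_df.pivot(index="month", columns="year", values="passengers")
print(wide)
```

<p>Note that <code>pivot</code> raises an error if any (index, columns) pair appears more than once; <code>pivot_table</code> with an aggregation function is the usual alternative in that case.</p>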

<h4 id="heatmap">heatmap</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">8</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="n">heatmap</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">flights</span><span class="p">,</span> <span class="n">annot</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">fmt</span><span class="o">=</span><span class="s">"d"</span><span class="p">)</span> <span class="c1"># annot=True: show the value in each cell; fmt="d": format values as integers
</span><span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/22a3a10a-3f32-4ae3-a6d1-6c440249be01/image.png" alt="" /></p>

<hr />

<h4 id="colormap">colormap</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">8</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="n">heatmap</span><span class="p">(</span><span class="n">flights</span><span class="p">,</span> <span class="n">annot</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">fmt</span><span class="o">=</span><span class="s">"d"</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s">"YlGnBu"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/e57ae578-6555-49e6-8d7a-7089f61480f1/image.png" alt="" /></p>

<hr />

<h3 id="iris-data">iris data</h3>
<ul>
  <li>pairplot</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">iris</span> <span class="o">=</span> <span class="n">sns</span><span class="p">.</span><span class="n">load_dataset</span><span class="p">(</span><span class="s">"iris"</span><span class="p">)</span>
<span class="n">iris</span><span class="p">.</span><span class="n">tail</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>sepal_length</th>
      <th>sepal_width</th>
      <th>petal_length</th>
      <th>petal_width</th>
      <th>species</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>145</th>
      <td>6.7</td>
      <td>3.0</td>
      <td>5.2</td>
      <td>2.3</td>
      <td>virginica</td>
    </tr>
    <tr>
      <th>146</th>
      <td>6.3</td>
      <td>2.5</td>
      <td>5.0</td>
      <td>1.9</td>
      <td>virginica</td>
    </tr>
    <tr>
      <th>147</th>
      <td>6.5</td>
      <td>3.0</td>
      <td>5.2</td>
      <td>2.0</td>
      <td>virginica</td>
    </tr>
    <tr>
      <th>148</th>
      <td>6.2</td>
      <td>3.4</td>
      <td>5.4</td>
      <td>2.3</td>
      <td>virginica</td>
    </tr>
    <tr>
      <th>149</th>
      <td>5.9</td>
      <td>3.0</td>
      <td>5.1</td>
      <td>1.8</td>
      <td>virginica</td>
    </tr>
  </tbody>
</table>

<h4 id="pairplot">pairplot</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">set_style</span><span class="p">(</span><span class="s">"ticks"</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">pairplot</span><span class="p">(</span><span class="n">iris</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/4d705dd9-168b-458e-aa72-12461b1e3594/image.png" alt="" /></p>

<hr />

<ul>
  <li>hue option</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">pairplot</span><span class="p">(</span><span class="n">iris</span><span class="p">,</span> <span class="n">hue</span><span class="o">=</span><span class="s">"species"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/094f0196-f77a-44e9-bf2b-d4fd3204e0a6/image.png" alt="" /></p>

<hr />

<h4 id="원하는-컬럼만-pairplot">pairplot with selected columns only</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">pairplot</span><span class="p">(</span><span class="n">iris</span><span class="p">,</span> <span class="n">x_vars</span><span class="o">=</span><span class="p">[</span><span class="s">"sepal_width"</span><span class="p">,</span> <span class="s">"sepal_length"</span><span class="p">],</span> 
                   <span class="n">y_vars</span><span class="o">=</span><span class="p">[</span><span class="s">"petal_width"</span><span class="p">,</span> <span class="s">"petal_length"</span><span class="p">])</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/6c888b83-23f0-4d9f-abf8-ffe74286c024/image.png" alt="" /></p>
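<p>A rough sketch of how the grid above is laid out: with <code>x_vars</code> and <code>y_vars</code>, pairplot draws one panel per (y, x) combination, using <code>y_vars</code> for the rows and <code>x_vars</code> for the columns. The snippet only enumerates those combinations in plain Python; <code>panels</code> is a name made up for this illustration.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from itertools import product

x_vars = ["sepal_width", "sepal_length"]
y_vars = ["petal_width", "petal_length"]

# One panel per (row variable, column variable) pair:
# rows follow y_vars, columns follow x_vars
panels = list(product(y_vars, x_vars))

for row_var, col_var in panels:
    print(f"y={row_var:13s} x={col_var}")
</code></pre></div></div>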

<hr />

<h3 id="anscombe-data">anscombe data</h3>
<ul>
  <li>lmplot
    <div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">anscombe</span> <span class="o">=</span> <span class="n">sns</span><span class="p">.</span><span class="n">load_dataset</span><span class="p">(</span><span class="s">"anscombe"</span><span class="p">)</span>
<span class="n">anscombe</span><span class="p">.</span><span class="n">tail</span><span class="p">()</span>
</code></pre></div>    </div>
  </li>
</ul>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>dataset</th>
      <th>x</th>
      <th>y</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>39</th>
      <td>IV</td>
      <td>8.0</td>
      <td>5.25</td>
    </tr>
    <tr>
      <th>40</th>
      <td>IV</td>
      <td>19.0</td>
      <td>12.50</td>
    </tr>
    <tr>
      <th>41</th>
      <td>IV</td>
      <td>8.0</td>
      <td>5.56</td>
    </tr>
    <tr>
      <th>42</th>
      <td>IV</td>
      <td>8.0</td>
      <td>7.91</td>
    </tr>
    <tr>
      <th>43</th>
      <td>IV</td>
      <td>8.0</td>
      <td>6.89</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="lmplot-1">lmplot</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">set_style</span><span class="p">(</span><span class="s">"darkgrid"</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">lmplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"x"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"y"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">anscombe</span><span class="p">.</span><span class="n">query</span><span class="p">(</span><span class="s">"dataset == 'I'"</span><span class="p">),</span> <span class="n">ci</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">7</span><span class="p">)</span> <span class="c1"># ci: confidence interval (None hides the band)
</span><span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/8da92c91-bb26-4ee6-b594-5ae2633e8128/image.png" alt="" /></p>
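<p>The straight line in the plot above is an ordinary least-squares fit. As a minimal sketch, the same line can be computed by hand for Anscombe dataset I; the helper <code>fit_line</code> is a hypothetical name for this example and is not part of seaborn.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Anscombe dataset I (the 11 points selected by query("dataset == 'I'"))
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
</code></pre></div></div>
<p>For this dataset the result is roughly y = 3.00 + 0.50 * x, which is the line all four Anscombe datasets share.</p>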

<hr />

<h4 id="scatter-크기-조정">Adjusting scatter marker size</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">set_style</span><span class="p">(</span><span class="s">"darkgrid"</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">lmplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"x"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"y"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">anscombe</span><span class="p">.</span><span class="n">query</span><span class="p">(</span><span class="s">"dataset == 'I'"</span><span class="p">),</span> <span class="n">ci</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span> <span class="n">scatter_kws</span><span class="o">=</span><span class="p">{</span><span class="s">"s"</span><span class="p">:</span><span class="mi">80</span><span class="p">})</span> <span class="c1"># ci: confidence interval (None hides the band); scatter_kws "s": marker size
</span><span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/581f04f8-d978-44c1-9b21-9d67bcef8ef3/image.png" alt="" /></p>

<hr />

<h4 id="order-option">order option</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">set_style</span><span class="p">(</span><span class="s">"darkgrid"</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">lmplot</span><span class="p">(</span>
    <span class="n">x</span><span class="o">=</span><span class="s">"x"</span><span class="p">,</span>
    <span class="n">y</span><span class="o">=</span><span class="s">"y"</span><span class="p">,</span>
    <span class="n">data</span><span class="o">=</span><span class="n">anscombe</span><span class="p">.</span><span class="n">query</span><span class="p">(</span><span class="s">"dataset == 'II'"</span><span class="p">),</span>
    <span class="n">order</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
    <span class="n">ci</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span>
    <span class="n">scatter_kws</span><span class="o">=</span><span class="p">{</span><span class="s">"s"</span><span class="p">:</span><span class="mi">80</span><span class="p">})</span> <span class="c1"># ci: confidence interval (None hides the band)
</span><span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/fa9e0e2f-3ce9-4c81-9aa6-3a4cefec31f2/image.png" alt="" /></p>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">set_style</span><span class="p">(</span><span class="s">"darkgrid"</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">lmplot</span><span class="p">(</span>
    <span class="n">x</span><span class="o">=</span><span class="s">"x"</span><span class="p">,</span>
    <span class="n">y</span><span class="o">=</span><span class="s">"y"</span><span class="p">,</span>
    <span class="n">data</span><span class="o">=</span><span class="n">anscombe</span><span class="p">.</span><span class="n">query</span><span class="p">(</span><span class="s">"dataset == 'II'"</span><span class="p">),</span>
    <span class="n">order</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
    <span class="n">ci</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span>
    <span class="n">scatter_kws</span><span class="o">=</span><span class="p">{</span><span class="s">"s"</span><span class="p">:</span><span class="mi">80</span><span class="p">})</span> <span class="c1"># ci: confidence interval (None hides the band)
</span><span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/22b6d3cb-d671-4955-ba45-1a5d63760117/image.png" alt="" /></p>
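<p>To see why <code>order=2</code> fits dataset II so much better than <code>order=1</code>, the two fits can be compared numerically. This is only a sketch that assumes NumPy is installed and uses <code>np.polyfit</code> as a stand-in for the polynomial regression; <code>sum_squared_error</code> is a hypothetical helper name.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

# Anscombe dataset II (the points selected by query("dataset == 'II'"))
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])


def sum_squared_error(order):
    """Fit a polynomial of the given order and return its sum of squared residuals."""
    coeffs = np.polyfit(x, y, order)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))


sse_linear = sum_squared_error(1)
sse_quadratic = sum_squared_error(2)
print(f"order=1 SSE: {sse_linear:.3f}")
print(f"order=2 SSE: {sse_quadratic:.6f}")
</code></pre></div></div>
<p>The quadratic error is close to zero because the points of dataset II lie almost exactly on a parabola.</p>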

<hr />

<h4 id="outliner">outlier</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">set_style</span><span class="p">(</span><span class="s">"darkgrid"</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">lmplot</span><span class="p">(</span>
    <span class="n">x</span><span class="o">=</span><span class="s">"x"</span><span class="p">,</span>
    <span class="n">y</span><span class="o">=</span><span class="s">"y"</span><span class="p">,</span>
    <span class="n">data</span><span class="o">=</span><span class="n">anscombe</span><span class="p">.</span><span class="n">query</span><span class="p">(</span><span class="s">"dataset == 'III'"</span><span class="p">),</span>
    <span class="n">ci</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span>
    <span class="n">scatter_kws</span><span class="o">=</span><span class="p">{</span><span class="s">"s"</span> <span class="p">:</span> <span class="mi">80</span><span class="p">})</span> <span class="c1"># ci: confidence interval (None hides the band)
</span><span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/bd3228e7-b74e-4543-a5e0-fcfad2349d04/image.png" alt="" /></p>
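<p>A small sketch of how much the single outlier in dataset III pulls the least-squares line. Note the assumption: this simply drops the outlier and refits, which is not what seaborn's <code>robust=True</code> does (that option relies on a robust regression from statsmodels). <code>fit_line</code> is a hypothetical helper.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Anscombe dataset III; the outlier is the point at x == 13
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Fit with every point, outlier included
slope_all, intercept_all = fit_line(x, y)

# Drop the outlier and fit again
kept = [(a, b) for a, b in zip(x, y) if a != 13]
slope_kept, intercept_kept = fit_line([a for a, _ in kept], [b for _, b in kept])

print(f"with outlier:    y = {intercept_all:.2f} + {slope_all:.3f} * x")
print(f"without outlier: y = {intercept_kept:.2f} + {slope_kept:.3f} * x")
</code></pre></div></div>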

<hr />

<h4 id="robust">robust</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">set_style</span><span class="p">(</span><span class="s">"darkgrid"</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">lmplot</span><span class="p">(</span>
    <span class="n">x</span><span class="o">=</span><span class="s">"x"</span><span class="p">,</span>
    <span class="n">y</span><span class="o">=</span><span class="s">"y"</span><span class="p">,</span>
    <span class="n">data</span><span class="o">=</span><span class="n">anscombe</span><span class="p">.</span><span class="n">query</span><span class="p">(</span><span class="s">"dataset == 'III'"</span><span class="p">),</span>
    <span class="n">robust</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="n">ci</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
    <span class="n">height</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span>
    <span class="n">scatter_kws</span><span class="o">=</span><span class="p">{</span><span class="s">"s"</span> <span class="p">:</span> <span class="mi">80</span><span class="p">})</span> <span class="c1"># ci: confidence interval (None hides the band)
</span><span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/03bfdf60-9d57-4e88-bbe7-42b090d71a2b/image.png" alt="" /></p>]]></content><author><name>yy2-hi</name></author><category term="DataAnalysis" /><summary type="html"><![CDATA[범죄 데이터 정렬을 위한 데이터 정리 crime_anal_gu.head() 강간 강도 살인 절도 폭력 강간검거율 강도검거율 살인검거율 절도검거율 폭력검거율 구별 강남구 516.0 39.0 5.0 3587.0 4002.0 80.038760 100.000000 100.000000 53.470867 88.130935 강동구 160.0 14.0 4.0 1754.0 2530.0 95.000000 92.857143 100.000000 51.425314 86.996047 강북구 217.0 5.0 7.0 1222.0 2778.0 73.271889 80.000000 85.714286 54.991817 89.344852 관악구 322.0 12.0 6.0 2103.0 3235.0 81.987578 83.333333 100.000000 44.555397 83.678516 광진구 279.0 11.0 4.0 2636.0 2392.0 83.870968 54.545455 100.000000 40.098634 84.071906]]></summary></entry><entry><title type="html">Project 2 - 서울시 범죄 현황 데이터 분석 (4)</title><link href="https://yy2-hi.github.io/dataanalysis/crimeanalysis4/" rel="alternate" type="text/html" title="Project 2 - 서울시 범죄 현황 데이터 분석 (4)" /><published>2024-08-25T00:00:00+09:00</published><updated>2024-08-25T00:00:00+09:00</updated><id>https://yy2-hi.github.io/dataanalysis/crimeanalysis4</id><content type="html" xml:base="https://yy2-hi.github.io/dataanalysis/crimeanalysis4/"><![CDATA[<h2 id="서울시-범죄현황-데이터-시각화">서울시 범죄현황 데이터 시각화</h2>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>살인</th>
      <th>강도</th>
      <th>강간</th>
      <th>절도</th>
      <th>폭력</th>
      <th>강간검거율</th>
      <th>강도검거율</th>
      <th>살인검거율</th>
      <th>절도검거율</th>
      <th>폭력검거율</th>
      <th>인구수</th>
      <th>CCTV</th>
      <th>범죄</th>
      <th>검거</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>0.357143</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>0.977118</td>
      <td>0.733773</td>
      <td>80.038760</td>
      <td>100.000000</td>
      <td>100.000000</td>
      <td>53.470867</td>
      <td>88.130935</td>
      <td>561052</td>
      <td>3238</td>
      <td>0.813607</td>
      <td>84.328112</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>0.285714</td>
      <td>0.358974</td>
      <td>0.310078</td>
      <td>0.477799</td>
      <td>0.463880</td>
      <td>95.000000</td>
      <td>92.857143</td>
      <td>100.000000</td>
      <td>51.425314</td>
      <td>86.996047</td>
      <td>440359</td>
      <td>1010</td>
      <td>0.379289</td>
      <td>85.255701</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>0.500000</td>
      <td>0.128205</td>
      <td>0.420543</td>
      <td>0.332879</td>
      <td>0.509351</td>
      <td>73.271889</td>
      <td>80.000000</td>
      <td>85.714286</td>
      <td>54.991817</td>
      <td>89.344852</td>
      <td>328002</td>
      <td>831</td>
      <td>0.378196</td>
      <td>76.664569</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>0.428571</td>
      <td>0.307692</td>
      <td>0.624031</td>
      <td>0.572868</td>
      <td>0.593143</td>
      <td>81.987578</td>
      <td>83.333333</td>
      <td>100.000000</td>
      <td>44.555397</td>
      <td>83.678516</td>
      <td>520929</td>
      <td>2109</td>
      <td>0.505261</td>
      <td>78.710965</td>
    </tr>
    <tr>
      <th>광진구</th>
      <td>0.285714</td>
      <td>0.282051</td>
      <td>0.540698</td>
      <td>0.718060</td>
      <td>0.438577</td>
      <td>83.870968</td>
      <td>54.545455</td>
      <td>100.000000</td>
      <td>40.098634</td>
      <td>84.071906</td>
      <td>372298</td>
      <td>878</td>
      <td>0.453020</td>
      <td>72.517393</td>
    </tr>
  </tbody>
</table>
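<p>The normalized crime columns above look consistent with dividing each raw count by the column maximum. That is an inference from the numbers, since the normalization code is not part of this post. A quick check with the raw 강도 (robbery) counts from <code>crime_anal_gu.head()</code> earlier in this series; the names <code>robbery</code> and <code>robbery_norm</code> are made up for the example:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Raw 강도 (robbery) counts for the five districts shown by crime_anal_gu.head()
robbery = {"강남구": 39.0, "강동구": 14.0, "강북구": 5.0, "관악구": 12.0, "광진구": 11.0}

# Assumption: each column is divided by its maximum, so the largest value becomes 1.0
col_max = max(robbery.values())
robbery_norm = {gu: round(count / col_max, 6) for gu, count in robbery.items()}

for gu, value in robbery_norm.items():
    print(gu, value)
</code></pre></div></div>
<p>The results match the 강도 column of the table (1.0, 0.358974, 0.128205, 0.307692, 0.282051).</p>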

<hr />

<h4 id="pairplot-강도-살인-폭력에-대한-상관관계-확인">pairplot: checking correlations among 강도 (robbery), 살인 (murder), and 폭력 (violence)</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sns</span><span class="p">.</span><span class="n">pairplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">crime_anal_norm</span><span class="p">,</span> <span class="nb">vars</span><span class="o">=</span><span class="p">[</span><span class="s">"살인"</span><span class="p">,</span> <span class="s">"강도"</span><span class="p">,</span> <span class="s">"폭력"</span><span class="p">],</span> <span class="n">kind</span><span class="o">=</span><span class="s">"reg"</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">3</span><span class="p">);</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/e522a34b-30ef-48bc-80e1-62426c3902ca/image.png" alt="" /></p>
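<p>The regression panels give a visual impression of correlation; a Pearson coefficient puts a number on it. The sketch below uses only the five rows printed by <code>head()</code> above, so the result is illustrative and not the value for all districts. <code>pearson</code>, <code>murder</code>, and <code>violence</code> are hypothetical names.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from math import sqrt

# Normalized 살인 (murder) and 폭력 (violence) values from the five rows of head()
murder = [0.357143, 0.285714, 0.500000, 0.428571, 0.285714]
violence = [0.733773, 0.463880, 0.509351, 0.593143, 0.438577]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(murder, violence)
print(f"r = {r:.3f}")
</code></pre></div></div>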

<hr />

<h4 id="인구수-cctv와-살인-강도의-상관관계-확인">Checking how “인구수” (population) and “CCTV” correlate with “살인” (murder) and “강도” (robbery)</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">drawGraph</span><span class="p">():</span>
    <span class="n">sns</span><span class="p">.</span><span class="n">pairplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">crime_anal_norm</span><span class="p">,</span> 
                 <span class="n">x_vars</span><span class="o">=</span><span class="p">[</span><span class="s">"인구수"</span><span class="p">,</span> <span class="s">"CCTV"</span><span class="p">],</span>
                 <span class="n">y_vars</span><span class="o">=</span><span class="p">[</span><span class="s">"살인"</span><span class="p">,</span> <span class="s">"강도"</span><span class="p">],</span> 
                 <span class="n">kind</span><span class="o">=</span><span class="s">"reg"</span><span class="p">,</span>
                 <span class="n">height</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="n">drawGraph</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/cecab182-56af-4d9f-8fac-7a5994f3db7e/image.png" alt="" /></p>

<hr />

<h4 id="인구수-cctv-와-살인검거율-폭력검거율의-상관관계-확인">Checking how “인구수” (population) and “CCTV” correlate with “살인검거율” (murder arrest rate) and “폭력검거율” (violence arrest rate)</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">drawGraph</span><span class="p">():</span>
    <span class="n">sns</span><span class="p">.</span><span class="n">pairplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">crime_anal_norm</span><span class="p">,</span> 
                 <span class="n">x_vars</span><span class="o">=</span><span class="p">[</span><span class="s">"인구수"</span><span class="p">,</span> <span class="s">"CCTV"</span><span class="p">],</span>
                 <span class="n">y_vars</span><span class="o">=</span><span class="p">[</span><span class="s">"살인검거율"</span><span class="p">,</span> <span class="s">"폭력검거율"</span><span class="p">],</span> 
                 <span class="n">kind</span><span class="o">=</span><span class="s">"reg"</span><span class="p">,</span>
                 <span class="n">height</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="n">drawGraph</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/6f4b7140-7a64-42b1-850f-501bb96bab17/image.png" alt="" /></p>

<hr />

<h4 id="인구수-cctv-와-절도검거율-강도검거율의-상관관계-확인">Checking how “인구수” (population) and “CCTV” correlate with “절도검거율” (theft arrest rate) and “강도검거율” (robbery arrest rate)</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="k">def</span> <span class="nf">drawGraph</span><span class="p">():</span>
    <span class="n">sns</span><span class="p">.</span><span class="n">pairplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">crime_anal_norm</span><span class="p">,</span> 
                 <span class="n">x_vars</span><span class="o">=</span><span class="p">[</span><span class="s">"인구수"</span><span class="p">,</span> <span class="s">"CCTV"</span><span class="p">],</span>
                 <span class="n">y_vars</span><span class="o">=</span><span class="p">[</span><span class="s">"절도검거율"</span><span class="p">,</span> <span class="s">"강도검거율"</span><span class="p">],</span> 
                 <span class="n">kind</span><span class="o">=</span><span class="s">"reg"</span><span class="p">,</span>
                 <span class="n">height</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="n">drawGraph</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/ce440d41-a3fe-4de2-80d0-3cf1249bf472/image.png" alt="" /></p>

<hr />

<h4 id="검거율-heatmap">Arrest rate heatmap</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Sort by the "검거" (arrest) column
</span>
<span class="k">def</span> <span class="nf">drawGraph</span><span class="p">():</span>
    
    <span class="c1"># Build the sorted DataFrame
</span>    <span class="n">target_col</span> <span class="o">=</span> <span class="p">[</span><span class="s">"강간검거율"</span><span class="p">,</span> <span class="s">"강도검거율"</span><span class="p">,</span> <span class="s">"살인검거율"</span><span class="p">,</span> <span class="s">"절도검거율"</span><span class="p">,</span> <span class="s">"폭력검거율"</span><span class="p">,</span> <span class="s">"검거"</span><span class="p">]</span>
    <span class="n">crime_anal_norm_sort</span> <span class="o">=</span> <span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="s">"검거"</span><span class="p">,</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="c1"># descending order
</span>    
    <span class="c1"># Plot settings
</span>    <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">))</span>
    <span class="n">sns</span><span class="p">.</span><span class="n">heatmap</span><span class="p">(</span>
        <span class="n">data</span><span class="o">=</span><span class="n">crime_anal_norm_sort</span><span class="p">[</span><span class="n">target_col</span><span class="p">],</span>
        <span class="n">annot</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="c1"># show the cell values
</span>        <span class="n">fmt</span><span class="o">=</span><span class="s">"f"</span><span class="p">,</span> <span class="c1"># d: integer, f: float
</span>        <span class="n">linewidths</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span> <span class="c1"># width of the lines between cells
</span>        <span class="n">cmap</span><span class="o">=</span><span class="s">"RdPu"</span>
    <span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">"범죄 검거 비율(정규화된 검거의 합으로 정렬)"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="n">drawGraph</span><span class="p">()</span>    
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/9dc905b1-4088-4a01-a7aa-b3bcc5e37254/image.png" alt="" /></p>
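<p>The 검거 column used for sorting appears to be the row-wise mean of the five arrest-rate columns. This is inferred from the table values, not stated in the post. A minimal sketch that recomputes it for the five districts shown by <code>head()</code> and ranks them in descending order, as the heatmap does; <code>arrest_rates</code>, <code>arrest_mean</code>, and <code>ranking</code> are hypothetical names.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Arrest rates per district, in the order 강간, 강도, 살인, 절도, 폭력
arrest_rates = {
    "강남구": [80.038760, 100.000000, 100.000000, 53.470867, 88.130935],
    "강동구": [95.000000, 92.857143, 100.000000, 51.425314, 86.996047],
    "강북구": [73.271889, 80.000000, 85.714286, 54.991817, 89.344852],
    "관악구": [81.987578, 83.333333, 100.000000, 44.555397, 83.678516],
    "광진구": [83.870968, 54.545455, 100.000000, 40.098634, 84.071906],
}

# Assumption: "검거" is the mean of the five arrest-rate columns
arrest_mean = {gu: sum(rates) / len(rates) for gu, rates in arrest_rates.items()}

# Descending order, matching sort_values(by="검거", ascending=False)
ranking = sorted(arrest_mean, key=arrest_mean.get, reverse=True)

for gu in ranking:
    print(gu, round(arrest_mean[gu], 6))
</code></pre></div></div>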

<hr />

<h4 id="범죄발생-건수-heatmap">Crime count heatmap</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Sort by the "범죄" (crime) column
</span>
<span class="k">def</span> <span class="nf">drawGraph</span><span class="p">():</span>
    
    <span class="c1"># Build the sorted DataFrame
</span>    <span class="n">target_col</span> <span class="o">=</span> <span class="p">[</span><span class="s">"살인"</span><span class="p">,</span> <span class="s">"강도"</span><span class="p">,</span> <span class="s">"강간"</span><span class="p">,</span> <span class="s">"절도"</span><span class="p">,</span> <span class="s">"폭력"</span><span class="p">,</span> <span class="s">"범죄"</span><span class="p">]</span>
    <span class="n">crime_anal_norm_sort</span> <span class="o">=</span> <span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="s">"범죄"</span><span class="p">,</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="c1"># descending order
</span>    
    <span class="c1"># Plot settings
</span>    <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">))</span>
    <span class="n">sns</span><span class="p">.</span><span class="n">heatmap</span><span class="p">(</span>
        <span class="n">data</span><span class="o">=</span><span class="n">crime_anal_norm_sort</span><span class="p">[</span><span class="n">target_col</span><span class="p">],</span>
        <span class="n">annot</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="c1"># show the cell values
</span>        <span class="n">fmt</span><span class="o">=</span><span class="s">"f"</span><span class="p">,</span> <span class="c1"># format the values as floats
</span>        <span class="n">linewidths</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span> <span class="c1"># width of the lines between cells
</span>        <span class="n">cmap</span><span class="o">=</span><span class="s">"RdPu"</span>
    <span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">"범죄 비율(정규화된 발생 건수로 정렬)"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="n">drawGraph</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/ade88aba-4b34-4c4e-bbe8-16d619404f36/image.png" alt="" /></p>
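<p>The <code>fmt</code> values used in the heatmaps are ordinary Python format specifications. As far as I understand, seaborn applies them to each annotated cell, which would explain why <code>"d"</code> works for the integer flights table while the float-valued crime table needs <code>"f"</code>. A small sketch of what the specs do on their own; <code>float_with_d_ok</code> is a name made up for the example:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>value = 84.328112

print(format(value, "f"))    # fixed-point with six decimals
print(format(value, ".2f"))  # fixed-point with two decimals, shorter labels
print(format(417, "d"))      # integer

# "d" only accepts integers, so applying it to a float raises ValueError
try:
    format(value, "d")
    float_with_d_ok = True
except ValueError:
    float_with_d_ok = False

print(float_with_d_ok)
</code></pre></div></div>
<p>Passing <code>fmt=".2f"</code> instead of <code>fmt="f"</code> is one way to keep the annotations compact.</p>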

<hr />

<h4 id="데이터-저장">Saving the data</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">to_csv</span><span class="p">(</span><span class="s">"../data/02. crime_in_Seoul_final.csv"</span><span class="p">,</span> <span class="n">sep</span><span class="o">=</span><span class="s">","</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">"utf-8"</span><span class="p">)</span>
</code></pre></div></div>
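<p>Because <code>to_csv</code> writes the 구별 index as the first column, reading the file back needs <code>index_col=0</code> to restore it. A minimal round-trip sketch, assuming pandas is installed; it uses an in-memory buffer and a two-row stand-in DataFrame instead of the real file.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import io

import pandas as pd

# Two-row stand-in for crime_anal_norm, indexed by district ("구별")
df = pd.DataFrame(
    {"CCTV": [3238, 1010], "인구수": [561052, 440359]},
    index=pd.Index(["강남구", "강동구"], name="구별"),
)

buffer = io.StringIO()
df.to_csv(buffer, sep=",")  # the index "구별" is written as the first column
buffer.seek(0)

# index_col=0 turns the first column back into the index
restored = pd.read_csv(buffer, index_col=0)
print(restored)
</code></pre></div></div>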
<hr />

<h2 id="folium">folium</h2>
<h3 id="foliummap">folium.Map()</h3>
<ul>
  <li>location: tuple or list, default None Latitude and Longitude of Map (Northing, Easting).
    <div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m</span> <span class="o">=</span> <span class="n">folium</span><span class="p">.</span><span class="n">Map</span><span class="p">(</span><span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5665512</span><span class="p">,</span> <span class="mf">126.97805437</span><span class="p">],</span> <span class="n">zoom_start</span><span class="o">=</span><span class="mi">18</span><span class="p">)</span> <span class="c1"># 0 ~ 18
</span><span class="n">m</span>
</code></pre></div>    </div>
    <p><img src="https://velog.velcdn.com/images/yy2hi/post/0b713c64-cce9-41ca-9a3e-4751ca7a8d33/image.png" alt="" /></p>
  </li>
</ul>
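
<p>Because <code>location</code> takes latitude first and longitude second, swapping the two is an easy mistake to make. The following is a minimal sketch of a sanity check that could run before building the map; <code>validate_location</code> is a hypothetical helper written for this illustration and is not part of folium.</p>

```python
def validate_location(location):
    """Return (lat, lng) as floats, or raise ValueError if the pair is implausible.

    Hypothetical helper for illustration; folium itself does not provide it.
    """
    if len(location) != 2:
        raise ValueError("location must contain exactly two values: [lat, lng]")
    lat, lng = float(location[0]), float(location[1])
    # Latitude is bounded by the poles, longitude by the antimeridian.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} is outside [-90, 90]; are lat/lng swapped?")
    if not -180.0 <= lng <= 180.0:
        raise ValueError(f"longitude {lng} is outside [-180, 180]")
    return lat, lng


# Coordinates used in this post: latitude first, then longitude.
seoul = validate_location([37.5665512, 126.97805437])
```

<p>For Seoul, a swapped pair fails the latitude check because 126.97 is not a valid latitude. The check cannot catch swaps where both values happen to fall within [-90, 90].</p>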

<hr />

<h4 id="savepath">save(“path”)</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m</span><span class="p">.</span><span class="n">save</span><span class="p">(</span><span class="s">"./folium.html"</span><span class="p">)</span>
</code></pre></div></div>

<hr />

<h4 id="tiles-option">tiles option</h4>

<ul>
  <li>“OpenStreetMap”</li>
  <li>“Mapbox Bright” (Limited levels of zoom for free tiles)</li>
  <li>“Mapbox Control Room” (Limited levels of zoom for free tiles)</li>
  <li>“Stamen” (Terrain, Toner, and Watercolor)</li>
  <li>“Cloudmade” (Must pass API key)</li>
  <li>“Mapbox” (Must pass API key)</li>
  <li>“CartoDB” (positron and dark_matter)</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>m = folium.Map(
    location=[37.5665512, 126.97805437],
    zoom_start=14, # 0 ~ 18
    tiles="openstreetmap"
    )
folium.Marker((37.56583779,126.97512197)).add_to(m)
folium.Marker(
    location=[37.5665512, 126.97805437],
    popup="&lt;b&gt;서울&lt;/b&gt;",
    tooltip="&lt;i&gt;서울&lt;/i&gt;"
).add_to(m)
m

folium.Marker(
    location=[37.54878936,126.973356],
    popup="&lt;a href='https://zero-base.co.kr/' target='_blank'&gt;제로베이스&lt;/a&gt;",
    tooltip="&lt;i&gt;Zerobase&lt;/i&gt;"
).add_to(m)
m
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/5cafd53d-8bb5-45a7-bc67-63692167196a/image.png" alt="" /></p>

<hr />

<h3 id="foliummarker">folium.Marker()</h3>
<ul>
  <li>Creates a marker on the map</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>m = folium.Map(
    location=[37.5665512, 126.97805437],
    zoom_start=14, # 0 ~ 18
    tiles="openstreetmap"
    )
folium.Marker((37.56583779,126.97512197)).add_to(m)
folium.Marker(
    location=[37.5665512, 126.97805437],
    popup="&lt;b&gt;서울&lt;/b&gt;",
    tooltip="&lt;i&gt;서울&lt;/i&gt;"
).add_to(m)
m

folium.Marker(
    location=[37.54878936,126.973356],
    popup="&lt;a href='https://zero-base.co.kr/' target='_blank'&gt;제로베이스&lt;/a&gt;",
    tooltip="&lt;i&gt;Zerobase&lt;/i&gt;"
).add_to(m)
m
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/7e388b34-bd47-4dae-b90e-d7a2264226d8/image.png" alt="" /></p>

<hr />

<h3 id="foliumicon">folium.Icon()</h3>
<ul>
  <li>https://www.w3schools.com/bootstrap/bootstrap_ref_comp_glyphs.asp</li>
  <li>https://fontawesome.com/icons</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m</span> <span class="o">=</span> <span class="n">folium</span><span class="p">.</span><span class="n">Map</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5665512</span><span class="p">,</span> <span class="mf">126.97805437</span><span class="p">],</span>
    <span class="n">zoom_start</span><span class="o">=</span><span class="mi">13</span><span class="p">,</span> <span class="c1"># 0 ~ 18
</span>    <span class="n">tiles</span><span class="o">=</span><span class="s">"openstreetmap"</span>
    <span class="p">)</span>

<span class="c1"># icon basic
</span><span class="n">folium</span><span class="p">.</span><span class="n">Marker</span><span class="p">(</span>
<span class="p">(</span><span class="mf">37.54878936</span><span class="p">,</span><span class="mf">126.97336</span><span class="p">),</span>
<span class="n">icon</span><span class="o">=</span><span class="n">folium</span><span class="p">.</span><span class="n">Icon</span><span class="p">(</span><span class="n">color</span><span class="o">=</span><span class="s">"black"</span><span class="p">,</span> <span class="n">icon</span><span class="o">=</span><span class="s">'info-sign'</span><span class="p">)</span>
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>

<span class="c1"># icon_color
</span><span class="n">folium</span><span class="p">.</span><span class="n">Marker</span><span class="p">(</span>
    <span class="p">(</span><span class="mf">37.54712</span><span class="p">,</span> <span class="mf">127.047219</span><span class="p">),</span>
    <span class="n">popup</span><span class="o">=</span><span class="s">"&lt;b&gt;Subway&lt;/b&gt;"</span><span class="p">,</span>
    <span class="n">tooltip</span><span class="o">=</span><span class="s">"icon_color"</span><span class="p">,</span>
    <span class="n">icon</span><span class="o">=</span><span class="n">folium</span><span class="p">.</span><span class="n">Icon</span><span class="p">(</span>
        <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">,</span>
        <span class="n">icon_color</span><span class="o">=</span><span class="s">"pink"</span><span class="p">,</span>
        <span class="n">icon</span><span class="o">=</span><span class="s">"cloud"</span>
    <span class="p">)</span>
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>

<span class="c1"># Icon custom
</span><span class="n">folium</span><span class="p">.</span><span class="n">Marker</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.540372</span><span class="p">,</span><span class="mf">127.069276</span><span class="p">],</span>
    <span class="n">popup</span><span class="o">=</span><span class="s">"건대입구역"</span><span class="p">,</span>
    <span class="n">tooltip</span><span class="o">=</span><span class="s">"Icon custom"</span><span class="p">,</span>
    <span class="n">icon</span><span class="o">=</span><span class="n">folium</span><span class="p">.</span><span class="n">Icon</span><span class="p">(</span>
        <span class="n">color</span><span class="o">=</span><span class="s">"purple"</span><span class="p">,</span>
        <span class="n">icon_color</span><span class="o">=</span><span class="s">"white"</span><span class="p">,</span>
        <span class="n">icon</span><span class="o">=</span><span class="s">"glyphicon-cloud"</span><span class="p">,</span>
        <span class="n">angle</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span>
        <span class="n">prefix</span><span class="o">=</span><span class="s">"glyphicon"</span><span class="p">)</span> <span class="c1"># glyphicon
</span><span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>

<span class="c1"># tooltip
</span><span class="n">folium</span><span class="p">.</span><span class="n">Marker</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.544569</span><span class="p">,</span><span class="mf">127.055974</span><span class="p">],</span>
    <span class="n">popup</span><span class="o">=</span><span class="s">"&lt;b&gt;Subway&lt;/b&gt;"</span><span class="p">,</span>
    <span class="n">tooltip</span><span class="o">=</span><span class="s">"&lt;i&gt;성수역&lt;/i&gt;"</span>
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>

<span class="c1"># html
</span><span class="n">folium</span><span class="p">.</span><span class="n">Marker</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5030426</span><span class="p">,</span><span class="mf">127.041588</span><span class="p">],</span>
    <span class="n">popup</span><span class="o">=</span><span class="s">"&lt;a href='https://zero-base.co.kr/' target='_blank'&gt;제로베이스&lt;/a&gt;"</span><span class="p">,</span>
    <span class="n">tooltip</span><span class="o">=</span><span class="s">"&lt;i&gt;Zerobase&lt;/i&gt;"</span>
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>

<span class="n">m</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/ae25071a-ce33-4e3e-b434-e70c10b9b6f7/image.png" alt="" /></p>

<hr />

<h3 id="foliumclickformarker">folium.ClickForMarker()</h3>
<ul>
  <li>Creates a marker wherever the map is clicked</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>m = folium.Map(
    location=[37.5445, 127.0558],
    zoom_start=14,
    tiles="OpenStreetMap"
)

m.add_child(folium.ClickForMarker(popup="ClickForMarker"))
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/346920ba-4883-4788-8156-858e9c1fde07/image.png" alt="" /></p>

<hr />

<h3 id="foliumlatlngpopup">folium.LatLngPopup()</h3>
<ul>
  <li>Shows the latitude and longitude of the point clicked on the map</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>m = folium.Map(
    location=[37.5445, 127.0558],
    zoom_start=14,
    tiles="OpenStreetMap"
)

m.add_child(folium.LatLngPopup())
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/eca63ba2-87db-48cd-914e-9cdb90c04c96/image.png" alt="" /></p>

<hr />

<h3 id="foliumcircle-foliumcirclemarker">folium.Circle(), folium.CircleMarker()</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m</span> <span class="o">=</span> <span class="n">folium</span><span class="p">.</span><span class="n">Map</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5445</span><span class="p">,</span> <span class="mf">127.0558</span><span class="p">],</span>
    <span class="n">zoom_start</span><span class="o">=</span><span class="mi">14</span><span class="p">,</span>
    <span class="n">tiles</span><span class="o">=</span><span class="s">"OpenStreetMap"</span>
<span class="p">)</span>

<span class="c1"># Circle
</span><span class="n">folium</span><span class="p">.</span><span class="n">Circle</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5574</span><span class="p">,</span> <span class="mf">127.04370</span><span class="p">],</span>
    <span class="n">radius</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span>
    <span class="n">fill</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="n">color</span><span class="o">=</span><span class="s">"#eb9e34"</span><span class="p">,</span>
    <span class="n">fill_color</span><span class="o">=</span><span class="s">"red"</span><span class="p">,</span>
    <span class="n">popup</span><span class="o">=</span><span class="s">"Circle Popup"</span><span class="p">,</span>
    <span class="n">tooltip</span><span class="o">=</span><span class="s">"Circle Tooltip"</span>
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>

<span class="c1"># CircleMarker (radius in pixels)
</span><span class="n">folium</span><span class="p">.</span><span class="n">CircleMarker</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5434</span><span class="p">,</span> <span class="mf">127.04470</span><span class="p">],</span>
    <span class="n">radius</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span>
    <span class="n">fill</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="n">color</span><span class="o">=</span><span class="s">"#34ebc6"</span><span class="p">,</span>
    <span class="n">fill_color</span><span class="o">=</span><span class="s">"#c636eb"</span><span class="p">,</span>
    <span class="n">popup</span><span class="o">=</span><span class="s">"CircleMarker Popup"</span><span class="p">,</span>
    <span class="n">tooltip</span><span class="o">=</span><span class="s">"CircleMarker Tooltip"</span>
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>

<span class="n">m</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/a116a000-8fe0-4b1d-9e07-550c45538cf4/image.png" alt="" /></p>
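
<p>The practical difference between the two is the unit of <code>radius</code>: <code>folium.Circle</code> takes meters, so its on-screen size changes with the zoom level, while <code>folium.CircleMarker</code> takes pixels and stays the same size. The sketch below estimates how many pixels a <code>Circle</code> occupies, assuming the usual Web Mercator ground-resolution formula for 256-pixel tiles; <code>meters_per_pixel</code> and <code>circle_radius_px</code> are hypothetical helper names, not folium functions.</p>

```python
import math

# Ground resolution at the equator for zoom 0 with 256-pixel Web Mercator tiles:
# Earth's equatorial circumference (2 * pi * 6378137 m) divided by 256.
EQUATOR_M_PER_PX_Z0 = 2 * math.pi * 6378137 / 256


def meters_per_pixel(lat_deg, zoom):
    """Approximate ground distance covered by one screen pixel."""
    return EQUATOR_M_PER_PX_Z0 * math.cos(math.radians(lat_deg)) / (2 ** zoom)


def circle_radius_px(radius_m, lat_deg, zoom):
    """On-screen radius, in pixels, of a circle whose radius is given in meters."""
    return radius_m / meters_per_pixel(lat_deg, zoom)


# A 100 m folium.Circle near Seoul, seen at two zoom levels.
for zoom in (14, 16):
    px = circle_radius_px(100, 37.5574, zoom)
    print(f"zoom {zoom}: about {px:.1f} px")
```

<p>Under this approximation each zoom step doubles the pixel radius of a <code>Circle</code>, whereas a <code>CircleMarker</code> keeps the pixel radius it was given.</p>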

<hr />

<h3 id="foliumchoropleth">folium.Choropleth</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">state_data</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"../data/02. US_Unemployment_Oct2012.csv"</span><span class="p">)</span>
<span class="n">state_data</span><span class="p">.</span><span class="n">tail</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>State</th>
      <th>Unemployment</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>48</th>
      <td>WI</td>
      <td>6.8</td>
    </tr>
    <tr>
      <th>49</th>
      <td>WY</td>
      <td>5.1</td>
    </tr>
  </tbody>
</table>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">m</span> <span class="o">=</span> <span class="n">folium</span><span class="p">.</span><span class="n">Map</span><span class="p">([</span><span class="mi">43</span><span class="p">,</span> <span class="o">-</span><span class="mi">102</span><span class="p">],</span> <span class="n">zoom_start</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">folium</span><span class="p">.</span><span class="n">Choropleth</span><span class="p">(</span>
    <span class="n">geo_data</span><span class="o">=</span><span class="s">"../data/02. us-states.json"</span><span class="p">,</span> <span class="c1"># GeoJSON with the boundary coordinates
</span>    <span class="n">data</span><span class="o">=</span><span class="n">state_data</span><span class="p">,</span> <span class="c1">#Series or DataFrame
</span>    <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s">"State"</span><span class="p">,</span> <span class="s">"Unemployment"</span><span class="p">],</span> <span class="c1"># DataFrame columns
</span>    <span class="n">key_on</span><span class="o">=</span><span class="s">"feature.id"</span><span class="p">,</span>
    <span class="n">fill_color</span><span class="o">=</span><span class="s">"BuPu"</span><span class="p">,</span>
    <span class="n">fill_opacity</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="c1"># 0~1
</span>    <span class="n">line_opacity</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="c1"># 0~1
</span>    <span class="n">legend_name</span><span class="o">=</span><span class="s">"unemployment rate (%)"</span> 
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>

<span class="n">m</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/054d0bdb-8f3c-4075-aba7-a7346d2e430f/image.png" alt="" /></p>
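
<p><code>key_on="feature.id"</code> tells the choropleth which field of each GeoJSON feature to match against the first entry of <code>columns</code>. If a key in the data has no matching feature, that row has nothing to bind to. The following is a rough sketch of a pre-flight check, using a tiny made-up GeoJSON instead of the real <code>us-states.json</code>; <code>resolve_key</code> and <code>unmatched_keys</code> are hypothetical helpers.</p>

```python
def resolve_key(feature, key_on):
    """Follow a dotted path such as 'feature.id' or 'feature.properties.name'."""
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on is expected to start with 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]
    return value


def unmatched_keys(geojson, data_keys, key_on="feature.id"):
    """Return the data keys that have no matching feature in the GeoJSON."""
    feature_keys = {resolve_key(f, key_on) for f in geojson["features"]}
    return sorted(set(data_keys) - feature_keys)


# Tiny hypothetical GeoJSON with two states; geometry is omitted for brevity.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "WI", "properties": {"name": "Wisconsin"}},
        {"type": "Feature", "id": "WY", "properties": {"name": "Wyoming"}},
    ],
}

print(unmatched_keys(geojson, ["WI", "WY", "ZZ"]))  # → ['ZZ']
```

<p>With the real files, the same check could be run on <code>state_data["State"]</code> and the parsed contents of the GeoJSON file before calling <code>folium.Choropleth</code>.</p>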

<hr />

<h2 id="아파트-유형-지도-시각화">Map visualization of housing types</h2>
<ul>
  <li>Public Data Portal (공공데이터포털), https://data.go.kr/data/15066101/fileData.do</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"../data/02. 서울특별시 동작구_주택유형별 위치 정보 및 세대수 현황_20210825.csv"</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">"cp949"</span><span class="p">)</span>
<span class="n">df</span><span class="p">.</span><span class="n">tail</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>연번</th>
      <th>분류</th>
      <th>건물명</th>
      <th>행정동</th>
      <th>주소</th>
      <th>세대수</th>
      <th>위도</th>
      <th>경도</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>165</th>
      <td>166</td>
      <td>연립주택</td>
      <td>능내연립</td>
      <td>사당5동</td>
      <td>서울특별시 동작구 사당로8길 39</td>
      <td>22</td>
      <td>37.483599</td>
      <td>126.968672</td>
    </tr>
    <tr>
      <th>166</th>
      <td>167</td>
      <td>연립주택</td>
      <td>천록</td>
      <td>대방동</td>
      <td>서울특별시 동작구 등용로 43</td>
      <td>29</td>
      <td>37.505475</td>
      <td>126.933434</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="nan-데이터-제거">Removing NaN rows</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">dropna</span><span class="p">()</span>
<span class="n">df</span><span class="p">.</span><span class="n">info</span><span class="p">()</span>

<span class="o">=&gt;</span>

&lt;class 'pandas.core.frame.DataFrame'&gt;
Int64Index: 163 entries, 0 to 166
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   연번      163 non-null    int64  
 1   분류      163 non-null    object 
 2   건물명     163 non-null    object 
 3   행정동     163 non-null    object 
 4   주소      163 non-null    object 
 5   세대수     163 non-null    int64  
 6   위도      163 non-null    float64
 7   경도      163 non-null    float64
dtypes: float64(2), int64(2), object(4)
memory usage: 11.5+ KB
-------------------------------------------
<span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">reset_index</span><span class="p">(</span><span class="n">drop</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">df</span><span class="p">.</span><span class="n">tail</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>연번</th>
      <th>분류</th>
      <th>건물명</th>
      <th>행정동</th>
      <th>주소</th>
      <th>세대수</th>
      <th>위도</th>
      <th>경도</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>161</th>
      <td>166</td>
      <td>연립주택</td>
      <td>능내연립</td>
      <td>사당5동</td>
      <td>서울특별시 동작구 사당로8길 39</td>
      <td>22</td>
      <td>37.483599</td>
      <td>126.968672</td>
    </tr>
    <tr>
      <th>162</th>
      <td>167</td>
      <td>연립주택</td>
      <td>천록</td>
      <td>대방동</td>
      <td>서울특별시 동작구 등용로 43</td>
      <td>29</td>
      <td>37.505475</td>
      <td>126.933434</td>
    </tr>
  </tbody>
</table>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">columns</span>

<span class="o">=&gt;</span>

<span class="n">Index</span><span class="p">([</span><span class="s">'연번 '</span><span class="p">,</span> <span class="s">'분류 '</span><span class="p">,</span> <span class="s">'건물명'</span><span class="p">,</span> <span class="s">'행정동'</span><span class="p">,</span> <span class="s">'주소'</span><span class="p">,</span> <span class="s">'세대수'</span><span class="p">,</span> <span class="s">'위도'</span><span class="p">,</span> <span class="s">'경도'</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s">'object'</span><span class="p">)</span>
<span class="o">--------------------------------------------------------------------------</span>
<span class="n">df</span><span class="p">[</span><span class="s">"연번 "</span><span class="p">]</span>

<span class="o">=&gt;</span>

<span class="mi">0</span>        <span class="mi">1</span>
<span class="mi">1</span>        <span class="mi">2</span>
<span class="mi">2</span>        <span class="mi">3</span>
<span class="mi">3</span>        <span class="mi">4</span>
<span class="mi">4</span>        <span class="mi">5</span>
      <span class="p">...</span> 
<span class="mi">158</span>    <span class="mi">163</span>
<span class="mi">159</span>    <span class="mi">164</span>
<span class="mi">160</span>    <span class="mi">165</span>
<span class="mi">161</span>    <span class="mi">166</span>
<span class="mi">162</span>    <span class="mi">167</span>
<span class="n">Name</span><span class="p">:</span> <span class="n">연번</span> <span class="p">,</span> <span class="n">Length</span><span class="p">:</span> <span class="mi">163</span><span class="p">,</span> <span class="n">dtype</span><span class="p">:</span> <span class="n">int64</span>
<span class="o">-----------------------------------------------------------</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s">"연번 "</span><span class="p">:</span> <span class="s">"연번"</span><span class="p">,</span> <span class="s">"분류 "</span><span class="p">:</span> <span class="s">"분류"</span><span class="p">})</span>
<span class="n">df</span><span class="p">.</span><span class="n">연번</span><span class="p">[:</span><span class="mi">10</span><span class="p">]</span>

<span class="o">=&gt;</span>

<span class="mi">0</span>     <span class="mi">1</span>
<span class="mi">1</span>     <span class="mi">2</span>
<span class="mi">2</span>     <span class="mi">3</span>
<span class="mi">3</span>     <span class="mi">4</span>
<span class="mi">4</span>     <span class="mi">5</span>
<span class="mi">5</span>     <span class="mi">6</span>
<span class="mi">6</span>     <span class="mi">7</span>
<span class="mi">7</span>     <span class="mi">8</span>
<span class="mi">8</span>     <span class="mi">9</span>
<span class="mi">9</span>    <span class="mi">10</span>
<span class="n">Name</span><span class="p">:</span> <span class="n">연번</span><span class="p">,</span> <span class="n">dtype</span><span class="p">:</span> <span class="n">int64</span>
<span class="o">------------------------</span>
<span class="k">del</span> <span class="n">df</span><span class="p">[</span><span class="s">"연번"</span><span class="p">]</span>
<span class="n">df</span><span class="p">.</span><span class="n">tail</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>분류</th>
      <th>건물명</th>
      <th>행정동</th>
      <th>주소</th>
      <th>세대수</th>
      <th>위도</th>
      <th>경도</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>161</th>
      <td>연립주택</td>
      <td>능내연립</td>
      <td>사당5동</td>
      <td>서울특별시 동작구 사당로8길 39</td>
      <td>22</td>
      <td>37.483599</td>
      <td>126.968672</td>
    </tr>
    <tr>
      <th>162</th>
      <td>연립주택</td>
      <td>천록</td>
      <td>대방동</td>
      <td>서울특별시 동작구 등용로 43</td>
      <td>29</td>
      <td>37.505475</td>
      <td>126.933434</td>
    </tr>
  </tbody>
</table>
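
<p>The trailing spaces in <code>'연번 '</code> and <code>'분류 '</code> were renamed by hand above. When many columns are affected, the rename mapping can be generated instead. A small sketch follows, with <code>strip_rename_map</code> as a hypothetical helper name:</p>

```python
def strip_rename_map(columns):
    """Map each column name that has stray whitespace to its stripped form."""
    return {name: name.strip() for name in columns if name != name.strip()}


# Column names as shown in the df.columns output above.
columns = ["연번 ", "분류 ", "건물명", "행정동", "주소", "세대수", "위도", "경도"]
rename_map = strip_rename_map(columns)
print(rename_map)  # → {'연번 ': '연번', '분류 ': '분류'}
```

<p>The resulting dictionary can be passed to <code>df.rename(columns=rename_map)</code>. pandas also offers <code>df.columns.str.strip()</code>, which strips every column name in one step.</p>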

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span><span class="p">.</span><span class="n">describe</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>세대수</th>
      <th>위도</th>
      <th>경도</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>count</th>
      <td>163.000000</td>
      <td>163.000000</td>
      <td>163.000000</td>
    </tr>
    <tr>
      <th>mean</th>
      <td>371.920245</td>
      <td>37.497442</td>
      <td>126.949817</td>
    </tr>
    <tr>
      <th>std</th>
      <td>413.115354</td>
      <td>0.009532</td>
      <td>0.019861</td>
    </tr>
    <tr>
      <th>min</th>
      <td>21.000000</td>
      <td>37.477376</td>
      <td>126.906940</td>
    </tr>
    <tr>
      <th>25%</th>
      <td>86.000000</td>
      <td>37.490626</td>
      <td>126.933284</td>
    </tr>
    <tr>
      <th>50%</th>
      <td>199.000000</td>
      <td>37.496940</td>
      <td>126.949902</td>
    </tr>
    <tr>
      <th>75%</th>
      <td>518.500000</td>
      <td>37.505321</td>
      <td>126.967196</td>
    </tr>
    <tr>
      <th>max</th>
      <td>2621.000000</td>
      <td>37.514280</td>
      <td>126.981966</td>
    </tr>
  </tbody>
</table>
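
<p>The 50% and 75% rows of this table are the values that the map code in the next section uses as cut-offs (199 and 518 households). The sketch below shows how such cut-offs could be computed and applied rather than typed in by hand. It uses the standard library on made-up numbers, and <code>quartile_cutoffs</code> and <code>household_style</code> are hypothetical helper names.</p>

```python
import statistics


def quartile_cutoffs(values):
    """Return (median, upper quartile) using linear interpolation,
    the same scheme pandas' describe() uses by default."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q3


def household_style(households, median, q3):
    """Pick marker and circle colors from the household count."""
    return {
        "marker_color": "lightred" if households >= median else "lightblue",
        "circle_color": "pink" if households >= q3 else "green",
    }


# Hypothetical household counts, not the real Dongjak-gu data.
sample = [21, 86, 150, 199, 300, 518, 2621]
median, q3 = quartile_cutoffs(sample)
print(median, q3)
print(household_style(400, median, q3))
```

<p>On the real data, the same two numbers are available as <code>df["세대수"].quantile(0.5)</code> and <code>df["세대수"].quantile(0.75)</code>.</p>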

<hr />

<h4 id="folium-1">folium</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># folium
</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">folium</span><span class="p">.</span><span class="n">Map</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.497112</span><span class="p">,</span><span class="mf">126.94437795</span><span class="p">],</span>
    <span class="n">zoom_start</span><span class="o">=</span><span class="mi">13</span><span class="p">)</span>
<span class="k">for</span> <span class="n">idx</span><span class="p">,</span> <span class="n">rows</span> <span class="ow">in</span> <span class="n">df</span><span class="p">.</span><span class="n">iterrows</span><span class="p">():</span>
    <span class="c1"># location
</span>    <span class="n">lat</span><span class="p">,</span> <span class="n">lng</span> <span class="o">=</span> <span class="n">rows</span><span class="p">.</span><span class="n">위도</span><span class="p">,</span> <span class="n">rows</span><span class="p">.</span><span class="n">경도</span>
    
    <span class="c1"># Marker
</span>    <span class="n">folium</span><span class="p">.</span><span class="n">Marker</span><span class="p">(</span>
        <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="n">lat</span><span class="p">,</span> <span class="n">lng</span><span class="p">],</span>
        <span class="n">popup</span><span class="o">=</span><span class="n">rows</span><span class="p">.</span><span class="n">주소</span><span class="p">,</span>
        <span class="n">tooltip</span><span class="o">=</span><span class="n">rows</span><span class="p">.</span><span class="n">분류</span><span class="p">,</span>
        <span class="n">icon</span><span class="o">=</span><span class="n">folium</span><span class="p">.</span><span class="n">Icon</span><span class="p">(</span>
            <span class="n">icon</span><span class="o">=</span><span class="s">"home"</span><span class="p">,</span>
            <span class="n">color</span><span class="o">=</span><span class="s">"lightred"</span> <span class="k">if</span> <span class="n">rows</span><span class="p">.</span><span class="n">세대수</span> <span class="o">&gt;=</span> <span class="mi">199</span> <span class="k">else</span> <span class="s">"lightblue"</span><span class="p">,</span>
            <span class="n">icon_color</span><span class="o">=</span><span class="s">"darkred"</span> <span class="k">if</span> <span class="n">rows</span><span class="p">.</span><span class="n">세대수</span> <span class="o">&gt;=</span> <span class="mi">199</span> <span class="k">else</span> <span class="s">"darkblue"</span><span class="p">,</span>
        <span class="p">)</span>
    <span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>

    <span class="c1"># Circle (radius in meters)
</span>    <span class="n">folium</span><span class="p">.</span><span class="n">Circle</span><span class="p">(</span>
        <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="n">lat</span><span class="p">,</span> <span class="n">lng</span><span class="p">],</span>
        <span class="n">radius</span><span class="o">=</span><span class="n">rows</span><span class="p">.</span><span class="n">세대수</span> <span class="o">*</span> <span class="mf">0.5</span><span class="p">,</span>
        <span class="n">fill</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="n">color</span><span class="o">=</span><span class="s">"pink"</span> <span class="k">if</span> <span class="n">rows</span><span class="p">.</span><span class="n">세대수</span> <span class="o">&gt;=</span> <span class="mi">518</span> <span class="k">else</span> <span class="s">"green"</span><span class="p">,</span>
        <span class="n">fill_color</span><span class="o">=</span><span class="s">"pink"</span> <span class="k">if</span> <span class="n">rows</span><span class="p">.</span><span class="n">세대수</span> <span class="o">&gt;=</span> <span class="mi">518</span> <span class="k">else</span> <span class="s">"green"</span><span class="p">,</span>
    <span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>
    
<span class="n">m</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/e80e012d-82b7-4b06-b219-eaa899988488/image.png" alt="" /></p>]]></content><author><name>yy2-hi</name></author><category term="DataAnalysis" /><summary type="html"><![CDATA[서울시 범죄현황 데이터 시각화]]></summary></entry><entry><title type="html">Project 2 - 서울시 범죄 현황 데이터 분석 (5)</title><link href="https://yy2-hi.github.io/dataanalysis/crimeanalysis5/" rel="alternate" type="text/html" title="Project 2 - 서울시 범죄 현황 데이터 분석 (5)" /><published>2024-08-25T00:00:00+09:00</published><updated>2024-08-25T00:00:00+09:00</updated><id>https://yy2-hi.github.io/dataanalysis/crimeanalysis5</id><content type="html" xml:base="https://yy2-hi.github.io/dataanalysis/crimeanalysis5/"><![CDATA[<h2 id="서울시-범죄-현황에-대한-지도-시각화">Map visualization of crime in Seoul</h2>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">tail</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>살인</th>
      <th>강도</th>
      <th>강간</th>
      <th>절도</th>
      <th>폭력</th>
      <th>강간검거율</th>
      <th>강도검거율</th>
      <th>살인검거율</th>
      <th>절도검거율</th>
      <th>폭력검거율</th>
      <th>인구수</th>
      <th>CCTV</th>
      <th>범죄</th>
      <th>검거</th>
    </tr>
    <tr>
      <th>구별</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>중구</th>
      <td>0.214286</td>
      <td>0.205128</td>
      <td>0.383721</td>
      <td>0.585671</td>
      <td>0.407957</td>
      <td>74.747475</td>
      <td>87.5</td>
      <td>100.0</td>
      <td>42.511628</td>
      <td>89.707865</td>
      <td>134593</td>
      <td>1023</td>
      <td>0.359353</td>
      <td>78.893394</td>
    </tr>
    <tr>
      <th>중랑구</th>
      <td>0.571429</td>
      <td>0.358974</td>
      <td>0.317829</td>
      <td>0.460637</td>
      <td>0.580125</td>
      <td>91.463415</td>
      <td>100.0</td>
      <td>87.5</td>
      <td>62.211709</td>
      <td>85.714286</td>
      <td>412780</td>
      <td>916</td>
      <td>0.457799</td>
      <td>85.377882</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="살인발생-건수-지도-시각화">Map visualization of murder counts</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">my_map</span> <span class="o">=</span> <span class="n">folium</span><span class="p">.</span><span class="n">Map</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5502</span><span class="p">,</span> <span class="mf">126.982</span><span class="p">],</span>
    <span class="n">zoom_start</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span>
    <span class="n">tiles</span><span class="o">=</span><span class="s">"Stamen Toner"</span>
<span class="p">)</span>

<span class="n">folium</span><span class="p">.</span><span class="n">Choropleth</span><span class="p">(</span>
    <span class="n">geo_data</span><span class="o">=</span><span class="n">geo_str</span><span class="p">,</span> <span class="c1"># data holding the boundary coordinates
</span>    <span class="n">data</span><span class="o">=</span><span class="n">crime_anal_norm</span><span class="p">[</span><span class="s">"살인"</span><span class="p">],</span>
    <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">index</span><span class="p">,</span> <span class="n">crime_anal_norm</span><span class="p">[</span><span class="s">"살인"</span><span class="p">]],</span>
    <span class="n">key_on</span><span class="o">=</span><span class="s">"feature.id"</span><span class="p">,</span>
    <span class="n">fill_color</span><span class="o">=</span><span class="s">"PuRd"</span><span class="p">,</span>
    <span class="n">fill_opacity</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span>
    <span class="n">line_opacity</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span>
    <span class="n">legend_name</span><span class="o">=</span><span class="s">"정규화된 살인 발생 건수"</span>
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">my_map</span><span class="p">)</span>

<span class="n">my_map</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/3b85aa2c-93cd-4368-8f38-3ca4a4061aa1/image.png" alt="" /></p>
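<p>With <code>key_on="feature.id"</code>, the choropleth can only color a district whose GeoJSON feature <code>id</code> also appears as a key in the data. A quick way to catch mismatches before drawing is to compare the two sets of names. This is a minimal sketch under stated assumptions: <code>geo_sample</code> is a hypothetical, trimmed-down stand-in for <code>geo_str</code>, and <code>data_keys</code> stands in for <code>crime_anal_norm.index</code>.</p>

```python
# Hypothetical, trimmed-down GeoJSON; real files also carry a "geometry" per feature.
geo_sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "중구", "properties": {}},
        {"type": "Feature", "id": "중랑구", "properties": {}},
        {"type": "Feature", "id": "종로구", "properties": {}},
    ],
}

# Stand-in for the index of the data passed to Choropleth.
data_keys = {"중구", "중랑구"}


def unmatched_ids(geo, keys):
    """Return the feature ids that have no matching key in the data."""
    feature_ids = [feature["id"] for feature in geo["features"]]
    return sorted(fid for fid in feature_ids if fid not in keys)


missing = unmatched_ids(geo_sample, data_keys)
print(missing)
```

<p>An empty list means every district in the boundary file has a value to draw.</p>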

<hr />

<h3 id="성범죄-건수-지도-시각화">Map Visualization of Sexual Crime Counts</h3>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">my_map</span> <span class="o">=</span> <span class="n">folium</span><span class="p">.</span><span class="n">Map</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5502</span><span class="p">,</span> <span class="mf">126.982</span><span class="p">],</span>
    <span class="n">zoom_start</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span>
    <span class="n">tiles</span><span class="o">=</span><span class="s">"Stamen Toner"</span>
<span class="p">)</span>

<span class="n">folium</span><span class="p">.</span><span class="n">Choropleth</span><span class="p">(</span>
    <span class="n">geo_data</span><span class="o">=</span><span class="n">geo_str</span><span class="p">,</span> <span class="c1"># data holding the boundary coordinates
</span>    <span class="n">data</span><span class="o">=</span><span class="n">crime_anal_norm</span><span class="p">[</span><span class="s">"강간"</span><span class="p">],</span>
    <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">index</span><span class="p">,</span> <span class="n">crime_anal_norm</span><span class="p">[</span><span class="s">"강간"</span><span class="p">]],</span>
    <span class="n">key_on</span><span class="o">=</span><span class="s">"feature.id"</span><span class="p">,</span>
    <span class="n">fill_color</span><span class="o">=</span><span class="s">"PuRd"</span><span class="p">,</span>
    <span class="n">fill_opacity</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span>
    <span class="n">line_opacity</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span>
    <span class="n">legend_name</span><span class="o">=</span><span class="s">"정규화된 강간 발생 건수"</span>
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">my_map</span><span class="p">)</span>

<span class="n">my_map</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/7a87307b-3e94-4480-b936-fafee18c56b9/image.png" alt="" /></p>

<h3 id="5대-범죄-건수-지도-시각화">Map Visualization of the Five Major Crime Counts</h3>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="n">my_map</span> <span class="o">=</span> <span class="n">folium</span><span class="p">.</span><span class="n">Map</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5502</span><span class="p">,</span> <span class="mf">126.982</span><span class="p">],</span>
    <span class="n">zoom_start</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span>
    <span class="n">tiles</span><span class="o">=</span><span class="s">"Stamen Toner"</span>
<span class="p">)</span>

<span class="n">folium</span><span class="p">.</span><span class="n">Choropleth</span><span class="p">(</span>
    <span class="n">geo_data</span><span class="o">=</span><span class="n">geo_str</span><span class="p">,</span> <span class="c1"># data holding the boundary coordinates
</span>    <span class="n">data</span><span class="o">=</span><span class="n">crime_anal_norm</span><span class="p">[</span><span class="s">"범죄"</span><span class="p">],</span>
    <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">index</span><span class="p">,</span> <span class="n">crime_anal_norm</span><span class="p">[</span><span class="s">"범죄"</span><span class="p">]],</span>
    <span class="n">key_on</span><span class="o">=</span><span class="s">"feature.id"</span><span class="p">,</span>
    <span class="n">fill_color</span><span class="o">=</span><span class="s">"PuRd"</span><span class="p">,</span>
    <span class="n">fill_opacity</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span>
    <span class="n">line_opacity</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span>
    <span class="n">legend_name</span><span class="o">=</span><span class="s">"정규화된 5대 범죄 발생 건수"</span>
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">my_map</span><span class="p">)</span>

<span class="n">my_map</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/ab8b44ee-137a-4ad5-89ae-4081a065b319/image.png" alt="" /></p>

<hr />

<h3 id="인구-대비-범죄-발생-건수">Crime Counts Relative to Population</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tmp_criminal</span> <span class="o">=</span> <span class="n">crime_anal_norm</span><span class="p">[</span><span class="s">"범죄"</span><span class="p">]</span> <span class="o">/</span> <span class="n">crime_anal_norm</span><span class="p">[</span><span class="s">"인구수"</span><span class="p">]</span>

<span class="n">my_map</span> <span class="o">=</span> <span class="n">folium</span><span class="p">.</span><span class="n">Map</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5502</span><span class="p">,</span> <span class="mf">126.982</span><span class="p">],</span>
    <span class="n">zoom_start</span><span class="o">=</span><span class="mi">11</span><span class="p">,</span>
    <span class="n">tiles</span><span class="o">=</span><span class="s">"Stamen Toner"</span>
<span class="p">)</span>

<span class="n">folium</span><span class="p">.</span><span class="n">Choropleth</span><span class="p">(</span>
    <span class="n">geo_data</span><span class="o">=</span><span class="n">geo_str</span><span class="p">,</span> <span class="c1"># data holding the boundary coordinates
</span>    <span class="n">data</span><span class="o">=</span><span class="n">tmp_criminal</span><span class="p">,</span>
    <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">index</span><span class="p">,</span> <span class="n">tmp_criminal</span><span class="p">],</span>
    <span class="n">key_on</span><span class="o">=</span><span class="s">"feature.id"</span><span class="p">,</span>
    <span class="n">fill_color</span><span class="o">=</span><span class="s">"PuRd"</span><span class="p">,</span>
    <span class="n">fill_opacity</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span>
    <span class="n">line_opacity</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span>
    <span class="n">legend_name</span><span class="o">=</span><span class="s">"인구 대비 범죄 발생 건수"</span>
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">my_map</span><span class="p">)</span>

<span class="n">my_map</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/f2277223-2c26-4377-800e-c1f9564b5106/image.png" alt="" /></p>
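<p>Dividing a normalized score by the raw population gives values on the order of one millionth, which are hard to read in a legend. One option is to rescale to a fixed population unit. The sketch below uses the two rows from the <code>crime_anal_norm.tail(2)</code> table; the unit of 100,000 residents and the name <code>PER</code> are assumptions chosen for illustration.</p>

```python
# Values copied from the crime_anal_norm.tail(2) table above.
districts = {
    "중구": {"crime_score": 0.359353, "population": 134593},
    "중랑구": {"crime_score": 0.457799, "population": 412780},
}

PER = 100_000  # assumed reporting unit: score per 100,000 residents

per_capita = {
    gu: row["crime_score"] / row["population"] * PER
    for gu, row in districts.items()
}

# Districts ordered from highest to lowest population-adjusted score.
ranking = sorted(per_capita, key=per_capita.get, reverse=True)
print(ranking, per_capita)
```

<p>Here 중구 ranks above 중랑구 even though its raw crime score is lower, because its population is roughly a third as large.</p>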

<hr />

<h4 id="경찰서별-정보-범죄발생과-함께-정리">Organizing Police Station Data Alongside Crime Occurrences</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_anal_station</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span>
    <span class="s">"../data/02. crime_in_Seoul_1st.csv"</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">"utf-8"</span>
<span class="p">)</span>

<span class="n">crime_anal_station</span><span class="p">.</span><span class="n">tail</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>구분</th>
      <th>강간검거</th>
      <th>강간발생</th>
      <th>강도검거</th>
      <th>강도발생</th>
      <th>살인검거</th>
      <th>살인발생</th>
      <th>절도검거</th>
      <th>절도발생</th>
      <th>폭력검거</th>
      <th>폭력발생</th>
      <th>구별</th>
      <th>lat</th>
      <th>lng</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>29</th>
      <td>29</td>
      <td>중부</td>
      <td>96.0</td>
      <td>141.0</td>
      <td>3.0</td>
      <td>3.0</td>
      <td>2.0</td>
      <td>2.0</td>
      <td>485.0</td>
      <td>1204.0</td>
      <td>1164.0</td>
      <td>1335.0</td>
      <td>중구</td>
      <td>37.563646</td>
      <td>126.989580</td>
    </tr>
    <tr>
      <th>30</th>
      <td>30</td>
      <td>혜화</td>
      <td>64.0</td>
      <td>101.0</td>
      <td>6.0</td>
      <td>6.0</td>
      <td>2.0</td>
      <td>2.0</td>
      <td>379.0</td>
      <td>988.0</td>
      <td>842.0</td>
      <td>972.0</td>
      <td>종로구</td>
      <td>37.571840</td>
      <td>126.998856</td>
    </tr>
  </tbody>
</table>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">col</span> <span class="o">=</span> <span class="p">[</span><span class="s">"살인검거"</span><span class="p">,</span> <span class="s">"강도검거"</span><span class="p">,</span> <span class="s">"강간검거"</span><span class="p">,</span> <span class="s">"절도검거"</span><span class="p">,</span> <span class="s">"폭력검거"</span><span class="p">]</span>
<span class="n">tmp</span> <span class="o">=</span> <span class="n">crime_anal_station</span><span class="p">[</span><span class="n">col</span><span class="p">]</span> <span class="o">/</span> <span class="n">crime_anal_station</span><span class="p">[</span><span class="n">col</span><span class="p">].</span><span class="nb">max</span><span class="p">()</span> <span class="c1"># normalize: divide each column by its maximum
</span><span class="n">crime_anal_station</span><span class="p">[</span><span class="s">"검거"</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">tmp</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># axis=1: average across the columns, giving one value per row
</span><span class="n">crime_anal_station</span><span class="p">.</span><span class="n">tail</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>구분</th>
      <th>강간검거</th>
      <th>강간발생</th>
      <th>강도검거</th>
      <th>강도발생</th>
      <th>살인검거</th>
      <th>살인발생</th>
      <th>절도검거</th>
      <th>절도발생</th>
      <th>폭력검거</th>
      <th>폭력발생</th>
      <th>구별</th>
      <th>lat</th>
      <th>lng</th>
      <th>검거</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>29</th>
      <td>29</td>
      <td>중부</td>
      <td>96.0</td>
      <td>141.0</td>
      <td>3.0</td>
      <td>3.0</td>
      <td>2.0</td>
      <td>2.0</td>
      <td>485.0</td>
      <td>1204.0</td>
      <td>1164.0</td>
      <td>1335.0</td>
      <td>중구</td>
      <td>37.563646</td>
      <td>126.989580</td>
      <td>0.277182</td>
    </tr>
    <tr>
      <th>30</th>
      <td>30</td>
      <td>혜화</td>
      <td>64.0</td>
      <td>101.0</td>
      <td>6.0</td>
      <td>6.0</td>
      <td>2.0</td>
      <td>2.0</td>
      <td>379.0</td>
      <td>988.0</td>
      <td>842.0</td>
      <td>972.0</td>
      <td>종로구</td>
      <td>37.571840</td>
      <td>126.998856</td>
      <td>0.240065</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="경찰서-위치-마커-표시">Marking Police Station Locations</h2>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">my_map</span> <span class="o">=</span> <span class="n">folium</span><span class="p">.</span><span class="n">Map</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5502</span><span class="p">,</span> <span class="mf">126.982</span><span class="p">],</span> <span class="n">zoom_start</span><span class="o">=</span><span class="mi">11</span>
<span class="p">)</span>

<span class="k">for</span> <span class="n">idx</span><span class="p">,</span> <span class="n">rows</span> <span class="ow">in</span> <span class="n">crime_anal_station</span><span class="p">.</span><span class="n">iterrows</span><span class="p">():</span>
    <span class="n">folium</span><span class="p">.</span><span class="n">Marker</span><span class="p">(</span>
        <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="n">rows</span><span class="p">[</span><span class="s">"lat"</span><span class="p">],</span> <span class="n">rows</span><span class="p">[</span><span class="s">"lng"</span><span class="p">]]</span>
    <span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">my_map</span><span class="p">)</span>
    
<span class="n">my_map</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/15120575-b86e-4e58-9652-0a3ab02d2eea/image.png" alt="" /></p>

<hr />

<h4 id="경찰서-검거율-원에-적용">Scaling Circles by Police Station Arrest Rate</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">my_map</span> <span class="o">=</span> <span class="n">folium</span><span class="p">.</span><span class="n">Map</span><span class="p">(</span>
    <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5502</span><span class="p">,</span> <span class="mf">126.982</span><span class="p">],</span> <span class="n">zoom_start</span><span class="o">=</span><span class="mi">11</span>
<span class="p">)</span>

<span class="n">folium</span><span class="p">.</span><span class="n">Choropleth</span><span class="p">(</span>
    <span class="n">geo_data</span><span class="o">=</span><span class="n">geo_str</span><span class="p">,</span>
    <span class="n">data</span><span class="o">=</span><span class="n">crime_anal_norm</span><span class="p">[</span><span class="s">"범죄"</span><span class="p">],</span>
    <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="n">crime_anal_norm</span><span class="p">.</span><span class="n">index</span><span class="p">,</span> <span class="n">crime_anal_norm</span><span class="p">[</span><span class="s">"범죄"</span><span class="p">]],</span>
    <span class="n">key_on</span><span class="o">=</span><span class="s">"feature.id"</span><span class="p">,</span>
    <span class="n">fill_color</span><span class="o">=</span><span class="s">"PuRd"</span><span class="p">,</span>
    <span class="n">fill_opacity</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span>
    <span class="n">line_opacity</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span>
    
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">my_map</span><span class="p">)</span>

<span class="k">for</span> <span class="n">idx</span><span class="p">,</span> <span class="n">rows</span> <span class="ow">in</span> <span class="n">crime_anal_station</span><span class="p">.</span><span class="n">iterrows</span><span class="p">():</span>
    <span class="n">folium</span><span class="p">.</span><span class="n">CircleMarker</span><span class="p">(</span>
        <span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="n">rows</span><span class="p">[</span><span class="s">"lat"</span><span class="p">],</span> <span class="n">rows</span><span class="p">[</span><span class="s">"lng"</span><span class="p">]],</span>
        <span class="n">radius</span><span class="o">=</span><span class="n">rows</span><span class="p">[</span><span class="s">"검거"</span><span class="p">]</span> <span class="o">*</span> <span class="mi">50</span><span class="p">,</span>
        <span class="n">popup</span><span class="o">=</span><span class="n">rows</span><span class="p">[</span><span class="s">"구분"</span><span class="p">]</span> <span class="o">+</span> <span class="s">" : "</span> <span class="o">+</span> <span class="s">"%.2f"</span> <span class="o">%</span> <span class="n">rows</span><span class="p">[</span><span class="s">"검거"</span><span class="p">],</span>
        <span class="n">color</span><span class="o">=</span><span class="s">"#3186cc"</span><span class="p">,</span>
        <span class="n">fill</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="n">fill_color</span><span class="o">=</span><span class="s">"#3186cc"</span>
    <span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">my_map</span><span class="p">)</span>
    
<span class="n">my_map</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/2c613cce-3537-44bc-b040-43ea453b53c8/image.png" alt="" /></p>
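<p>Each circle's size and popup text come straight from the <code>검거</code> column. The sketch below recomputes both for the two stations shown in the table, without drawing a map. The <code>marker_spec</code> helper and the <code>RADIUS_SCALE</code> name are hypothetical; the multiplier of 50 and the popup format are the ones used in the loop above.</p>

```python
# Values copied from the crime_anal_station.tail(2) table above.
stations = [
    {"구분": "중부", "검거": 0.277182},
    {"구분": "혜화", "검거": 0.240065},
]

RADIUS_SCALE = 50  # same multiplier as in the CircleMarker call


def marker_spec(row):
    """Compute the radius (in pixels) and the popup text for one station."""
    radius = row["검거"] * RADIUS_SCALE
    popup = row["구분"] + " : " + "%.2f" % row["검거"]
    return radius, popup


specs = [marker_spec(row) for row in stations]
for radius, popup in specs:
    print(round(radius, 2), popup)
```

<p>Because <code>CircleMarker</code> takes its radius in screen pixels, the circles stay the same size when zooming; a multiplier like 50 is a visual choice rather than a geographic distance.</p>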

<hr />

<h2 id="서울시-범죄-현황-발생-장소-분석">Analysis of Crime Locations in Seoul</h2>
<h4 id="추가-검증">Additional Verification</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_loc_raw</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span>
    <span class="s">"../data/02. crime_in_Seoul_location.csv"</span><span class="p">,</span> <span class="n">thousands</span><span class="o">=</span><span class="s">","</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">"euc-kr"</span>
<span class="p">)</span>

<span class="n">crime_loc_raw</span><span class="p">.</span><span class="n">tail</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>범죄명</th>
      <th>장소</th>
      <th>발생건수</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>63</th>
      <td>폭력</td>
      <td>금융기관</td>
      <td>42</td>
    </tr>
    <tr>
      <th>64</th>
      <td>폭력</td>
      <td>기타</td>
      <td>26382</td>
    </tr>
  </tbody>
</table>
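<p>The <code>thousands=","</code> option tells <code>read_csv</code> to treat commas inside a number as digit separators, so a field such as <code>"26,382"</code> is read as the integer 26382 instead of a string. The standard-library sketch below mimics that step. The two-row CSV text is a hypothetical excerpt (how the counts are written in the real file is an assumption), and <code>parse_count</code> is an illustrative helper.</p>

```python
import csv
import io

# Hypothetical excerpt; the quoted "26,382" mimics a count written with a thousands separator.
raw_text = '범죄명,장소,발생건수\n폭력,금융기관,42\n폭력,기타,"26,382"\n'


def parse_count(text, thousands=","):
    """Drop the thousands separator, then convert the text to int."""
    return int(text.replace(thousands, ""))


reader = csv.DictReader(io.StringIO(raw_text))
counts = [parse_count(row["발생건수"]) for row in reader]
print(counts)
```

<p>Without this conversion the column would hold strings, and the later sum and division steps would not behave as numeric operations.</p>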

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_loc_raw</span><span class="p">.</span><span class="n">범죄명</span><span class="p">.</span><span class="n">unique</span><span class="p">()</span>
<span class="c1"># =&gt; array(['살인', '강도', '강간.추행', '절도', '폭력'], dtype=object)
</span>
<span class="n">crime_loc_raw</span><span class="p">[</span><span class="s">"장소"</span><span class="p">].</span><span class="n">unique</span><span class="p">()</span>
<span class="c1"># =&gt; array(['아파트, 연립 다세대', '단독주택', '노상', '상점', '숙박업소, 목욕탕', '유흥 접객업소', '사무실',
#           '역, 대합실', '교통수단', '유원지 ', '학교', '금융기관', '기타'], dtype=object)
</span>
<span class="c1"># Pivot to one row per place and one column per crime type, summing the counts
</span><span class="n">crime_loc</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span>
    <span class="n">crime_loc_raw</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="s">"장소"</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="s">"범죄명"</span><span class="p">,</span> <span class="n">aggfunc</span><span class="o">=</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">]</span>
<span class="p">)</span>

<span class="c1"># Drop the two outer column levels (aggregation name, value column), keeping the crime names
</span><span class="n">crime_loc</span><span class="p">.</span><span class="n">columns</span> <span class="o">=</span> <span class="n">crime_loc</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">droplevel</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span>
<span class="n">crime_loc</span><span class="p">.</span><span class="n">tail</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>범죄명</th>
      <th>강간.추행</th>
      <th>강도</th>
      <th>살인</th>
      <th>절도</th>
      <th>폭력</th>
    </tr>
    <tr>
      <th>장소</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>유흥 접객업소</th>
      <td>398</td>
      <td>13</td>
      <td>8</td>
      <td>2035</td>
      <td>2645</td>
    </tr>
    <tr>
      <th>학교</th>
      <td>33</td>
      <td>0</td>
      <td>0</td>
      <td>400</td>
      <td>203</td>
    </tr>
  </tbody>
</table>
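<p>The pivot step reshapes long-format rows (crime, place, count) into one row per place with one column per crime, summing any duplicates. The same idea can be sketched with nested dictionaries. The counts below are taken from the tables in this post, on the assumption that the raw file holds one row per (crime, place) pair; <code>pivot_sum</code> is a hypothetical helper, not a pandas function.</p>

```python
from collections import defaultdict

# Long-format rows (crime, place, count); counts are taken from the tables in this post.
records = [
    ("절도", "학교", 400),
    ("폭력", "학교", 203),
    ("절도", "유흥 접객업소", 2035),
    ("폭력", "유흥 접객업소", 2645),
    ("폭력", "금융기관", 42),
]


def pivot_sum(rows):
    """Build {place: {crime: total}} by summing counts per (place, crime) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for crime, place, count in rows:
        table[place][crime] += count
    return {place: dict(crimes) for place, crimes in table.items()}


pivot = pivot_sum(records)
print(pivot["학교"])
```

<p>Places become the outer keys (the pivot index) and crime names the inner keys (the pivot columns), which mirrors the <code>index="장소"</code>, <code>columns="범죄명"</code> arguments above.</p>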

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">col</span> <span class="o">=</span> <span class="p">[</span><span class="s">"살인"</span><span class="p">,</span> <span class="s">"강도"</span><span class="p">,</span> <span class="s">"강간.추행"</span><span class="p">,</span> <span class="s">"절도"</span><span class="p">,</span> <span class="s">"폭력"</span><span class="p">]</span> <span class="c1"># crime columns in crime_loc (not used below)
</span><span class="n">crime_loc_norm</span> <span class="o">=</span> <span class="n">crime_loc</span> <span class="o">/</span> <span class="n">crime_loc</span><span class="p">.</span><span class="nb">max</span><span class="p">()</span> <span class="c1"># normalize: divide each column by its maximum
</span><span class="n">crime_loc_norm</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>범죄명</th>
      <th>강간.추행</th>
      <th>강도</th>
      <th>살인</th>
      <th>절도</th>
      <th>폭력</th>
    </tr>
    <tr>
      <th>장소</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>교통수단</th>
      <td>0.324718</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.021027</td>
      <td>0.008415</td>
    </tr>
    <tr>
      <th>금융기관</th>
      <td>0.000940</td>
      <td>0.011494</td>
      <td>0.015385</td>
      <td>0.049738</td>
      <td>0.001592</td>
    </tr>
    <tr>
      <th>기타</th>
      <td>1.000000</td>
      <td>0.770115</td>
      <td>1.000000</td>
      <td>1.000000</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>노상</th>
      <td>0.463346</td>
      <td>1.000000</td>
      <td>0.338462</td>
      <td>0.429235</td>
      <td>0.929990</td>
    </tr>
    <tr>
      <th>단독주택</th>
      <td>0.185620</td>
      <td>0.172414</td>
      <td>0.461538</td>
      <td>0.103110</td>
      <td>0.135661</td>
    </tr>
  </tbody>
</table>
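<p>The normalization divides every column by its own maximum, so the place with the most incidents for a given crime becomes 1.0 and the others become fractions of it. The sketch below repeats the arithmetic for the <code>폭력</code> column using four counts that appear in the tables of this section. It assumes <code>기타</code> holds the column maximum, which matches its normalized value of 1.0; <code>max_normalize</code> is an illustrative helper.</p>

```python
# 폭력 (violence) counts copied from the tables in this section; only four places are shown.
violence = {"기타": 26382, "유흥 접객업소": 2645, "학교": 203, "금융기관": 42}


def max_normalize(column):
    """Divide every value by the column maximum so the largest becomes 1.0."""
    peak = max(column.values())
    return {place: count / peak for place, count in column.items()}


# Rounded to the six decimals that the DataFrame output displays.
normalized = {place: round(v, 6) for place, v in max_normalize(violence).items()}
print(normalized)
```

<p>The results match the <code>폭력</code> values printed for these places, so the normalized tables can be traced back to the raw counts by hand.</p>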

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_loc_norm</span><span class="p">[</span><span class="s">"종합"</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">crime_loc_norm</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">crime_loc_norm</span><span class="p">.</span><span class="n">tail</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>범죄명</th>
      <th>강간.추행</th>
      <th>강도</th>
      <th>살인</th>
      <th>절도</th>
      <th>폭력</th>
      <th>종합</th>
    </tr>
    <tr>
      <th>장소</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>유흥 접객업소</th>
      <td>0.187030</td>
      <td>0.149425</td>
      <td>0.123077</td>
      <td>0.093632</td>
      <td>0.100258</td>
      <td>0.130684</td>
    </tr>
    <tr>
      <th>학교</th>
      <td>0.015508</td>
      <td>0.000000</td>
      <td>0.000000</td>
      <td>0.018404</td>
      <td>0.007695</td>
      <td>0.008321</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="장소별-범죄-발생-장소">Crime Occurrence by Location</h2>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">crime_loc_norm_sort</span> <span class="o">=</span> <span class="n">crime_loc_norm</span><span class="p">.</span><span class="n">sort_values</span><span class="p">(</span><span class="s">"종합"</span><span class="p">,</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="c1"># descending order
</span>
<span class="k">def</span> <span class="nf">drawGraph</span><span class="p">():</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">))</span>
    <span class="n">sns</span><span class="p">.</span><span class="n">heatmap</span><span class="p">(</span>
        <span class="n">crime_loc_norm_sort</span><span class="p">,</span>
        <span class="n">annot</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="n">fmt</span><span class="o">=</span><span class="s">"f"</span><span class="p">,</span>
        <span class="n">linewidths</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
        <span class="n">cmap</span><span class="o">=</span><span class="s">"RdPu"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">"범죄 발생 장소"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="n">drawGraph</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/de2a44d6-2ed0-4bf5-813f-04f3b43bb786/image.png" alt="" /></p>]]></content><author><name>yy2-hi</name></author><category term="DataAnalysis" /><summary type="html"><![CDATA[서울시 범죄 현황에 대한 지도 시각화 crime_anal_norm.tail(2) 살인 강도 강간 절도 폭력 강간검거율 강도검거율 살인검거율 절도검거율 폭력검거율 인구수 CCTV 범죄 검거 구별 중구 0.214286 0.205128 0.383721 0.585671 0.407957 74.747475 87.5 100.0 42.511628 89.707865 134593 1023 0.359353 78.893394 중랑구 0.571429 0.358974 0.317829 0.460637 0.580125 91.463415 100.0 87.5 62.211709 85.714286 412780 916 0.457799 85.377882]]></summary></entry><entry><title type="html">Project 6 - 주유소 가격 비교</title><link href="https://yy2-hi.github.io/dataanalysis/gasstationanalysis/" rel="alternate" type="text/html" title="Project 6 - 주유소 가격 비교" /><published>2024-08-25T00:00:00+09:00</published><updated>2024-08-25T00:00:00+09:00</updated><id>https://yy2-hi.github.io/dataanalysis/gasstationanalysis</id><content type="html" xml:base="https://yy2-hi.github.io/dataanalysis/gasstationanalysis/"><![CDATA[<h1 id="selenium-basic">Selenium Basic</h1>
<ul>
  <li>https://www.selenium.dev/documentation/</li>
</ul>

<h3 id="1-셀레니움-설치">1. Installing Selenium</h3>

<ul>
  <li>Windows, Mac (Intel)
    <ul>
      <li>conda install selenium</li>
    </ul>
  </li>
  <li>Mac (M1)
    <ul>
      <li>pip install selenium</li>
    </ul>
  </li>
</ul>
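<p>After installing, it can help to confirm that the package is visible to the Python interpreter the notebook is actually using, since conda and pip may target different environments. A minimal standard-library check, where <code>is_installed</code> is a hypothetical helper name:</p>

```python
import importlib.util


def is_installed(module_name):
    """Return True when the module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


print("selenium installed:", is_installed("selenium"))
```

<p>If this prints <code>False</code>, the install most likely went into a different environment than the one running the notebook.</p>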

<h3 id="2-selenium-webdriver-사용하기">2. Using the Selenium WebDriver</h3>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">selenium</span> <span class="kn">import</span> <span class="n">webdriver</span>
<span class="kn">from</span> <span class="nn">selenium.webdriver.chrome.service</span> <span class="kn">import</span> <span class="n">Service</span>
<span class="kn">from</span> <span class="nn">selenium.webdriver.common.by</span> <span class="kn">import</span> <span class="n">By</span>

<span class="c1"># Recent Selenium 4 releases no longer accept executable_path; pass the driver path through a Service object
</span><span class="n">driver</span> <span class="o">=</span> <span class="n">webdriver</span><span class="p">.</span><span class="n">Chrome</span><span class="p">(</span><span class="n">service</span><span class="o">=</span><span class="n">Service</span><span class="p">(</span><span class="s">"../drive/chromedriver.exe"</span><span class="p">))</span>
<span class="n">driver</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"https://pinkwink.kr"</span><span class="p">)</span>

<span class="n">driver</span><span class="p">.</span><span class="n">quit</span><span class="p">()</span> <span class="c1"># quit the driver and close the browser
</span></code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/e0e63f1e-2a71-4787-bfe9-f5a8adfd46f3/image.png" alt="" /></p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Maximize the window
</span><span class="n">driver</span><span class="p">.</span><span class="n">maximize_window</span><span class="p">()</span>

<span class="c1"># Minimize the window
</span><span class="n">driver</span><span class="p">.</span><span class="n">minimize_window</span><span class="p">()</span>

<span class="c1"># Set the window size
</span><span class="n">driver</span><span class="p">.</span><span class="n">set_window_size</span><span class="p">(</span><span class="mi">600</span><span class="p">,</span> <span class="mi">600</span><span class="p">)</span>

<span class="c1"># Refresh the page
</span><span class="n">driver</span><span class="p">.</span><span class="n">refresh</span><span class="p">()</span>

<span class="c1"># Go back
</span><span class="n">driver</span><span class="p">.</span><span class="n">back</span><span class="p">()</span>

<span class="c1"># Go forward
</span><span class="n">driver</span><span class="p">.</span><span class="n">forward</span><span class="p">()</span>

<span class="c1"># Click an element
</span><span class="n">first_content</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">CSS_SELECTOR</span><span class="p">,</span> <span class="s">'#content &gt; div.cover-masonry &gt; div &gt; ul &gt; li:nth-child(1)'</span><span class="p">)</span>
<span class="n">first_content</span><span class="p">.</span><span class="n">click</span><span class="p">()</span>

<span class="c1"># Open a new tab
</span><span class="n">driver</span><span class="p">.</span><span class="n">execute_script</span><span class="p">(</span><span class="s">'window.open("https://www.naver.com")'</span><span class="p">)</span>

<span class="c1"># Switch tabs (index 0 is the first tab)
</span><span class="n">driver</span><span class="p">.</span><span class="n">switch_to</span><span class="p">.</span><span class="n">window</span><span class="p">(</span><span class="n">driver</span><span class="p">.</span><span class="n">window_handles</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>

<span class="c1"># Close the current tab
</span><span class="n">driver</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
</code></pre></div></div>
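<p>Opening a tab with <code>window.open</code> does not move the driver's focus to it, so the new tab has to be selected explicitly. The sketch below finds the new handle by comparing <code>driver.window_handles</code> before and after, which avoids relying on the order of the list. <code>open_in_new_tab</code> is a hypothetical helper name; it uses only <code>execute_script</code>, <code>window_handles</code>, and <code>switch_to.window</code>.</p>

```python
def open_in_new_tab(driver, url):
    """Open url in a new tab, switch to it, and return its window handle.

    Hypothetical helper, not a Selenium API.
    """
    before = set(driver.window_handles)
    # arguments[0] is the first extra argument passed to execute_script.
    driver.execute_script("window.open(arguments[0]);", url)
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("no new tab was opened")
    driver.switch_to.window(new_handles[0])
    return new_handles[0]
```

<p>Note that <code>driver.close()</code> closes only the tab that currently has focus, after which the driver must switch to one of the remaining handles, whereas <code>driver.quit()</code> ends the whole browser session.</p>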

<h3 id="3-화면-스크롤">3. Scrolling the page</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Scrollable height (length) of the page
# Run JavaScript code
</span><span class="n">driver</span><span class="p">.</span><span class="n">execute_script</span><span class="p">(</span><span class="s">'return document.body.scrollHeight'</span><span class="p">)</span>

<span class="c1"># Scroll to the bottom of the page
</span><span class="n">driver</span><span class="p">.</span><span class="n">execute_script</span><span class="p">(</span><span class="s">"window.scrollTo(0, document.body.scrollHeight);"</span><span class="p">)</span>

<span class="c1"># Save a screenshot of the currently visible area
</span><span class="n">driver</span><span class="p">.</span><span class="n">save_screenshot</span><span class="p">(</span><span class="s">"./last_height.png"</span><span class="p">)</span>

<span class="c1"># Scroll back to the top of the page
</span><span class="n">driver</span><span class="p">.</span><span class="n">execute_script</span><span class="p">(</span><span class="s">"window.scrollTo(0, 0);"</span><span class="p">)</span>

<span class="c1"># Scroll to a specific tag
</span><span class="kn">from</span> <span class="nn">selenium.webdriver</span> <span class="kn">import</span> <span class="n">ActionChains</span>

<span class="n">some_tag</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">CSS_SELECTOR</span><span class="p">,</span> <span class="s">"#content &gt; div:nth-child(2) &gt; div &gt; ul &gt; li:nth-child(1)"</span><span class="p">)</span>
<span class="n">action</span> <span class="o">=</span> <span class="n">ActionChains</span><span class="p">(</span><span class="n">driver</span><span class="p">)</span>
<span class="n">action</span><span class="p">.</span><span class="n">move_to_element</span><span class="p">(</span><span class="n">some_tag</span><span class="p">).</span><span class="n">perform</span><span class="p">()</span>
</code></pre></div></div>
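<p>On pages that load more content as you scroll, a single jump to <code>document.body.scrollHeight</code> may not reach the real bottom. The sketch below repeats the two JavaScript calls shown above until the height stops growing. <code>scroll_to_bottom</code> is a hypothetical helper, and the <code>pause</code> and <code>max_rounds</code> defaults are illustrative values, not settings from this post.</p>

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until document.body.scrollHeight stops growing.

    Returns the final height. Hypothetical helper, not a Selenium API.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so this is the bottom
        last_height = new_height
    return last_height
```

<p>The <code>max_rounds</code> cap keeps the loop from running forever on pages with endless feeds.</p>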

<h3 id="4-검색어-입력">4. Entering a search term</h3>
<h4 id="css_selector">CSS_SELECTOR</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">driver</span> <span class="o">=</span> <span class="n">webdriver</span><span class="p">.</span><span class="n">Chrome</span><span class="p">(</span><span class="s">'../drive/chromedriver.exe'</span><span class="p">)</span>
<span class="n">driver</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"https://www.naver.com"</span><span class="p">)</span>

<span class="n">keyword</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">CSS_SELECTOR</span><span class="p">,</span> <span class="s">"#query"</span><span class="p">)</span>
<span class="n">keyword</span><span class="p">.</span><span class="n">clear</span><span class="p">()</span>
<span class="n">keyword</span><span class="p">.</span><span class="n">send_keys</span><span class="p">(</span><span class="s">"파이썬"</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/d96bede0-35a7-4e8b-819e-514ea0c1c7c9/image.png" alt="" /></p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">search_btn</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">CSS_SELECTOR</span><span class="p">,</span> <span class="s">"#search_btn"</span><span class="p">)</span>
<span class="n">search_btn</span><span class="p">.</span><span class="n">click</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/a4bd5bdd-594b-40fd-8dcd-13ffaa414dbe/image.png" alt="" /></p>

<h4 id="xpath">XPATH</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>'//': search from the document root at any depth
'*': any tag (wildcard)
'/': direct child tag
'div[1]': the first div among its siblings

e.g. //*[@id="main_pack"]/section[2]/div/div[2]/panel-list/div/ul/li[1]/div/div/a
</code></pre></div></div>
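<p>The notation can be tried without a browser. The sketch below uses Python's standard <code>xml.etree.ElementTree</code>, which supports a limited subset of XPath, on a small hand-built tree; the tree and its element names are made up for illustration. One difference from Selenium: ElementTree paths are relative to an element, so the expression starts with <code>.//</code> rather than <code>//</code>.</p>

```python
import xml.etree.ElementTree as ET

# Build a tiny tree: html / body / div#main_pack / 2 x section / ul / 2 x li
html = ET.Element("html")
body = ET.SubElement(html, "body")
main = ET.SubElement(body, "div", {"id": "main_pack"})
for name in ("first", "second"):
    section = ET.SubElement(main, "section")
    ul = ET.SubElement(section, "ul")
    for n in (1, 2):
        li = ET.SubElement(ul, "li")
        li.text = "%s-%d" % (name, n)

# './/' searches at any depth, '*' matches any tag, '[@id=...]' filters by
# attribute, '/' steps to a direct child, and '[2]' picks the 2nd sibling.
hit = html.find(".//*[@id='main_pack']/section[2]/ul/li[1]")
print(hit.text)
```

<p>The expression selects the first <code>li</code> inside the second <code>section</code> under the element whose <code>id</code> is <code>main_pack</code>, the same reading that applies to the longer example above.</p>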

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">driver</span> <span class="o">=</span> <span class="n">webdriver</span><span class="p">.</span><span class="n">Chrome</span><span class="p">(</span><span class="s">'../drive/chromedriver.exe'</span><span class="p">)</span>
<span class="n">driver</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">'https://pinkwink.kr'</span><span class="p">)</span>

<span class="c1"># 1. Click the magnifying-glass (search) button
</span><span class="kn">from</span> <span class="nn">selenium.webdriver</span> <span class="kn">import</span> <span class="n">ActionChains</span>

<span class="n">search_tag</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">CSS_SELECTOR</span><span class="p">,</span> <span class="s">'.search'</span><span class="p">)</span>
<span class="n">action</span> <span class="o">=</span> <span class="n">ActionChains</span><span class="p">(</span><span class="n">driver</span><span class="p">)</span>
<span class="n">action</span><span class="p">.</span><span class="n">click</span><span class="p">(</span><span class="n">search_tag</span><span class="p">)</span>
<span class="n">action</span><span class="p">.</span><span class="n">perform</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/b552e41c-ac66-4882-bbcb-42bcded2fb1d/image.png" alt="" /></p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># 2. Type the search term
</span><span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">CSS_SELECTOR</span><span class="p">,</span> <span class="s">'#header &gt; div.search.on &gt; input[type=text]'</span><span class="p">).</span><span class="n">send_keys</span><span class="p">(</span><span class="s">'딥러닝'</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/5d669f99-78ab-425d-b0d2-6522fc4d1662/image.png" alt="" /></p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># 3. Click the search button
</span><span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">CSS_SELECTOR</span><span class="p">,</span> <span class="s">'#header &gt; div.search &gt; button'</span><span class="p">).</span><span class="n">click</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/1df4d2d0-5a1c-4f96-a13a-6f93244efe10/image.png" alt="" /></p>

<h3 id="5-selenium--beautifulsoup">5. selenium + beautifulsoup</h3>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Get the HTML source of the current page
</span><span class="n">driver</span><span class="p">.</span><span class="n">page_source</span>

<span class="o">=&gt;</span>

<span class="o">&lt;</span><span class="n">html</span> <span class="n">lang</span><span class="o">=</span><span class="s">"ko"</span><span class="o">&gt;&lt;</span><span class="n">head</span><span class="o">&gt;</span>\<span class="n">n</span>                \<span class="n">n</span>                \<span class="n">n</span>                        <span class="o">&lt;</span><span class="err">!</span><span class="o">--</span> <span class="n">BusinessLicenseInfo</span> <span class="o">-</span> <span class="n">START</span> <span class="o">--&gt;</span>\<span class="n">n</span>        \<span class="n">n</span>            <span class="o">&lt;</span><span class="n">link</span> 
									<span class="p">.</span>
                                    <span class="p">.</span>
                                    <span class="p">.</span>
                                    
</code></pre></div></div>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">bs4</span> <span class="kn">import</span> <span class="n">BeautifulSoup</span>

<span class="n">req</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">page_source</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">req</span><span class="p">,</span> <span class="s">"html.parser"</span><span class="p">)</span>

<span class="n">contents</span> <span class="o">=</span> <span class="n">soup</span><span class="p">.</span><span class="n">select</span><span class="p">(</span><span class="s">'.post-item'</span><span class="p">)</span>
<span class="n">contents</span>

<span class="o">=&gt;</span>

<span class="p">[</span><span class="o">&lt;</span><span class="n">div</span> <span class="n">class</span><span class="o">=</span><span class="s">"post-item"</span><span class="o">&gt;</span>
 <span class="o">&lt;</span><span class="n">a</span> <span class="n">href</span><span class="o">=</span><span class="s">"/1410"</span><span class="o">&gt;</span>
 <span class="o">&lt;</span><span class="n">span</span> <span class="n">class</span><span class="o">=</span><span class="s">"thum"</span><span class="o">&gt;</span>
 <span class="o">&lt;</span><span class="n">img</span> <span class="n">alt</span><span class="o">=</span><span class="s">""</span> <span class="n">src</span><span class="o">=</span><span class="s">"//i1.daumcdn.net/thumb/C264x200/?fname=https://blog.kakaocdn.net/dn/cn2qfp/btrW2tEffvS/m98SKBp0PBaA93pASI4Cl1/im
										.
                                        .
                                        .
시 올라올때 코란도 2인슨 벤 화물칸과 조수석에 짐을 pinkwink.kr 여러가지 상황이 있겠지만, 핑크랩이 선호하는 상황은 조금 특정지어져 있습니다. 기업은 아직 본격적으로 팀을 빌딩하지 못했거나 혹은 가능성을 먼저 보고 싶..&lt;/span&gt;
 &lt;/a&gt;
 &lt;/div&gt;]
</span></code></pre></div></div>
<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">contents</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>

<span class="o">=&gt;</span>

<span class="o">&lt;</span><span class="n">div</span> <span class="n">class</span><span class="o">=</span><span class="s">"post-item"</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">a</span> <span class="n">href</span><span class="o">=</span><span class="s">"/1407"</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">span</span> <span class="n">class</span><span class="o">=</span><span class="s">"thum"</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">img</span> <span class="n">alt</span><span class="o">=</span><span class="s">""</span> <span class="n">src</span><span class="o">=</span><span class="s">"//i1.daumcdn.net/thumb/C264x200/?fname=https://blog.kakaocdn.net/dn/wpuRJ/btrWismzvEK/d5xmwkQwdKvMew1fGM7KXk/img.png"</span><span class="o">/&gt;</span>
<span class="o">&lt;/</span><span class="n">span</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">span</span> <span class="n">class</span><span class="o">=</span><span class="s">"title"</span><span class="o">&gt;</span><span class="p">[</span><span class="n">수강생</span> <span class="n">프로젝트</span> <span class="n">소개</span><span class="p">]</span> <span class="n">네이버</span> <span class="n">쇼핑몰</span> <span class="n">데이터</span> <span class="n">기반</span> <span class="n">감성</span> <span class="n">분석</span><span class="o">&lt;/</span><span class="n">span</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">span</span> <span class="n">class</span><span class="o">=</span><span class="s">"date"</span><span class="o">&gt;</span><span class="mf">2023.</span> <span class="mf">1.</span> <span class="mf">17.</span> <span class="mi">08</span><span class="p">:</span><span class="mi">00</span><span class="o">&lt;/</span><span class="n">span</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">span</span> <span class="n">class</span><span class="o">=</span><span class="s">"excerpt"</span><span class="o">&gt;</span><span class="p">...</span><span class="n">이런</span> <span class="n">저런</span> <span class="n">이</span><span class="p">..</span><span class="o">&lt;/</span><span class="n">span</span><span class="o">&gt;</span>
<span class="o">&lt;/</span><span class="n">a</span><span class="o">&gt;</span>
<span class="o">&lt;/</span><span class="n">div</span><span class="o">&gt;</span>
</code></pre></div></div>
<hr />

<h1 id="정말-셀프-주유소가-저렴할까">Are self-service gas stations really cheaper?</h1>
<h3 id="1-데이터-확보하기-위한-작업">1. Preparing to collect the data</h3>
<ul>
  <li>https://www.opinet.co.kr/searRgSelect.do</li>
  <li>Inspect the site structure</li>
  <li>Target data
    <ul>
      <li>Brand</li>
      <li>Price</li>
      <li>Self-service or not</li>
      <li>Location</li>
    </ul>
  </li>
</ul>

<h3 id="2-셀레니움으로-접근">2. Accessing the site with Selenium</h3>
<h4 id="requirements">requirements</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">selenium</span> <span class="kn">import</span> <span class="n">webdriver</span>
<span class="kn">from</span> <span class="nn">selenium.webdriver.common.by</span> <span class="kn">import</span> <span class="n">By</span>
<span class="kn">import</span> <span class="nn">time</span>
</code></pre></div></div>

<h4 id="페이지-접근">Opening the page</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Open the page
</span><span class="n">url</span> <span class="o">=</span> <span class="s">"https://www.opinet.co.kr/searRgSelect.do"</span>
<span class="n">driver</span> <span class="o">=</span> <span class="n">webdriver</span><span class="p">.</span><span class="n">Chrome</span><span class="p">(</span><span class="s">"../drive/chromedriver.exe"</span><span class="p">)</span>
<span class="n">driver</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="n">driver</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/ff57161b-2e43-4cbc-adfd-0e12ba32275d/image.png" alt="" /></p>

<h4 id="지역-시도">Region: city/province (si/do)</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sido_list_raw</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">CSS_SELECTOR</span><span class="p">,</span> <span class="s">"#SIDO_NM0"</span><span class="p">)</span>
<span class="n">sido_list_raw</span><span class="p">.</span><span class="n">text</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/823960ef-c63a-4cc1-b774-6898c1afb7c7/image.png" alt="" /></p>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sido_list</span> <span class="o">=</span> <span class="n">sido_list_raw</span><span class="p">.</span><span class="n">find_elements</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">CSS_SELECTOR</span><span class="p">,</span> <span class="s">'option'</span><span class="p">)</span>
<span class="nb">len</span><span class="p">(</span><span class="n">sido_list</span><span class="p">),</span> <span class="n">sido_list</span><span class="p">[</span><span class="mi">17</span><span class="p">].</span><span class="n">text</span>

<span class="o">=&gt;</span>

<span class="p">(</span><span class="mi">18</span><span class="p">,</span> <span class="s">'제주'</span><span class="p">)</span>
</code></pre></div></div>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sido_list</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">get_attribute</span><span class="p">(</span><span class="s">"value"</span><span class="p">)</span>

<span class="o">=&gt;</span>

<span class="s">'서울특별시'</span>
</code></pre></div></div>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># sido_names = []
</span>
<span class="c1"># for option in sido_list:
#     sido_names.append(option.get_attribute("value"))
</span>
<span class="n">sido_names</span> <span class="o">=</span> <span class="p">[</span><span class="n">option</span><span class="p">.</span><span class="n">get_attribute</span><span class="p">(</span><span class="s">"value"</span><span class="p">)</span> <span class="k">for</span> <span class="n">option</span> <span class="ow">in</span> <span class="n">sido_list</span><span class="p">]</span>
<span class="n">sido_names</span><span class="p">[:</span><span class="mi">5</span><span class="p">]</span>

<span class="o">=&gt;</span>

<span class="p">[</span><span class="s">''</span><span class="p">,</span> <span class="s">'서울특별시'</span><span class="p">,</span> <span class="s">'부산광역시'</span><span class="p">,</span> <span class="s">'대구광역시'</span><span class="p">,</span> <span class="s">'인천광역시'</span><span class="p">]</span>
</code></pre></div></div>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sido_names</span> <span class="o">=</span> <span class="n">sido_names</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
<span class="n">sido_names</span>

<span class="o">=&gt;</span>

<span class="p">[</span><span class="s">'서울특별시'</span><span class="p">,</span>
 <span class="s">'부산광역시'</span><span class="p">,</span>
 <span class="s">'대구광역시'</span><span class="p">,</span>
		<span class="p">.</span>
        <span class="p">.</span>
        <span class="p">.</span>
 <span class="s">'경상남도'</span><span class="p">,</span>
 <span class="s">'제주특별자치도'</span><span class="p">]</span>
</code></pre></div></div>
<hr />

<h4 id="구">District (gu)</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">gu_list_raw</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">ID</span><span class="p">,</span> <span class="s">'SIGUNGU_NM0'</span><span class="p">)</span> <span class="c1"># parent tag
</span><span class="n">gu_list</span> <span class="o">=</span> <span class="n">gu_list_raw</span><span class="p">.</span><span class="n">find_elements</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">TAG_NAME</span><span class="p">,</span> <span class="s">'option'</span><span class="p">)</span> <span class="c1"># child tags
</span>
<span class="n">gu_names</span> <span class="o">=</span> <span class="p">[</span><span class="n">option</span><span class="p">.</span><span class="n">get_attribute</span><span class="p">(</span><span class="s">"value"</span><span class="p">)</span> <span class="k">for</span> <span class="n">option</span> <span class="ow">in</span> <span class="n">gu_list</span><span class="p">]</span>
<span class="n">gu_names</span> <span class="o">=</span> <span class="n">gu_names</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
<span class="n">gu_names</span><span class="p">[:</span><span class="mi">5</span><span class="p">],</span> <span class="nb">len</span><span class="p">(</span><span class="n">gu_names</span><span class="p">)</span>

<span class="o">=&gt;</span>

<span class="p">([</span><span class="s">'강남구'</span><span class="p">,</span> <span class="s">'강동구'</span><span class="p">,</span> <span class="s">'강북구'</span><span class="p">,</span> <span class="s">'강서구'</span><span class="p">,</span> <span class="s">'관악구'</span><span class="p">],</span> <span class="mi">25</span><span class="p">)</span>
</code></pre></div></div>

<h4 id="엑셀-저장">Saving to Excel</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">tqdm.notebook</span> <span class="kn">import</span> <span class="n">tqdm</span>

<span class="k">for</span> <span class="n">gu</span> <span class="ow">in</span> <span class="n">tqdm</span><span class="p">(</span><span class="n">gu_names</span><span class="p">):</span>
    <span class="c1"># Choose the district in the drop-down
</span>    <span class="n">element</span> <span class="o">=</span> <span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">ID</span><span class="p">,</span> <span class="s">'SIGUNGU_NM0'</span><span class="p">)</span>
    <span class="n">element</span><span class="p">.</span><span class="n">send_keys</span><span class="p">(</span><span class="n">gu</span><span class="p">)</span>
    <span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>

    <span class="c1"># Click the Excel download button
</span>    <span class="n">driver</span><span class="p">.</span><span class="n">find_element</span><span class="p">(</span><span class="n">By</span><span class="p">.</span><span class="n">XPATH</span><span class="p">,</span> <span class="s">'//*[@id="glopopd_excel"]'</span><span class="p">).</span><span class="n">click</span><span class="p">()</span>
    <span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/c4523cb7-21fd-415a-958a-f491e296c0e8/image.png" alt="" /></p>
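<p>The loop relies on <code>time.sleep(1)</code> being long enough for each download to finish. As a sketch of an alternative, the helper below polls the download folder until a finished new file shows up. <code>wait_for_new_file</code> is a hypothetical name, and treating <code>.crdownload</code> and <code>.tmp</code> as the suffixes of unfinished downloads is an assumption about how the browser names partial files.</p>

```python
import os
import time


def wait_for_new_file(directory, known, timeout=10.0, poll=0.2):
    """Return the name of a finished file in `directory` that is not in `known`.

    Hypothetical helper. Files ending in '.crdownload' or '.tmp' are assumed
    to be unfinished downloads and are skipped. Raises TimeoutError if no
    finished new file appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = set(os.listdir(directory))
        finished = sorted(
            name for name in current - set(known)
            if not name.endswith((".crdownload", ".tmp"))
        )
        if finished:
            return finished[0]
        if time.monotonic() >= deadline:
            raise TimeoutError("no new file appeared in %r" % directory)
        time.sleep(poll)
```

<p>Inside the loop this would mean recording <code>known = set(os.listdir(download_dir))</code> before the click and calling <code>wait_for_new_file(download_dir, known)</code> after it, where <code>download_dir</code> stands for whatever folder the browser saves to.</p>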

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">driver</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
</code></pre></div></div>

<h3 id="3-데이터-정리">3. Cleaning the data</h3>
<h4 id="requirements-1">requirements</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">from</span> <span class="nn">glob</span> <span class="kn">import</span> <span class="n">glob</span>
</code></pre></div></div>

<h4 id="파일-목록-한번에-가져오기">Getting the file list in one call</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">glob</span><span class="p">(</span><span class="s">'../data/지역_*.xls'</span><span class="p">)</span>

<span class="o">=&gt;</span>

<span class="p">[</span><span class="s">'../data</span><span class="se">\\</span><span class="s">지역_위치별(주유소) (1).xls'</span><span class="p">,</span>
 <span class="s">'../data</span><span class="se">\\</span><span class="s">지역_위치별(주유소) (10).xls'</span><span class="p">,</span>
 <span class="s">'../data</span><span class="se">\\</span><span class="s">지역_위치별(주유소) (11).xls'</span><span class="p">,</span>
					<span class="p">.</span>
                    <span class="p">.</span>
                    <span class="p">.</span>
 <span class="s">'../data</span><span class="se">\\</span><span class="s">지역_위치별(주유소) (8).xls'</span><span class="p">,</span>
 <span class="s">'../data</span><span class="se">\\</span><span class="s">지역_위치별(주유소) (9).xls'</span><span class="p">,</span>
 <span class="s">'../data</span><span class="se">\\</span><span class="s">지역_위치별(주유소).xls'</span><span class="p">]</span>
</code></pre></div></div>
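<p>In the listing above, <code>(10)</code> comes right after <code>(1)</code> because the names are compared as text, and <code>glob</code> itself makes no promise about order. If a reproducible, human-friendly order is wanted, one option is a sort key that compares digit runs as numbers. <code>natural_key</code> below is a hypothetical helper, shown on shortened stand-in file names.</p>

```python
import re


def natural_key(path):
    """Sort key that compares runs of digits as integers.

    re.split with a capturing group alternates text and digit parts,
    so every odd position holds a digit run.
    """
    parts = re.split(r"(\d+)", path)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = ["f (10).xls", "f (2).xls", "f (1).xls", "f.xls"]
ordered = sorted(names, key=natural_key)
print(ordered)
```

<p>Applied to the real list, this would be <code>sorted(glob('../data/지역_*.xls'), key=natural_key)</code>.</p>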

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Store the file names
</span><span class="n">stations_files</span> <span class="o">=</span> <span class="n">glob</span><span class="p">(</span><span class="s">'../data/지역_*.xls'</span><span class="p">)</span>

<span class="c1"># Read just one file as a check
</span><span class="n">tmp</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_excel</span><span class="p">(</span><span class="n">stations_files</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">header</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">tmp</span><span class="p">.</span><span class="n">tail</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>지역</th>
      <th>상호</th>
      <th>주소</th>
      <th>상표</th>
      <th>전화번호</th>
      <th>셀프여부</th>
      <th>고급휘발유</th>
      <th>휘발유</th>
      <th>경유</th>
      <th>실내등유</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>12</th>
      <td>서울특별시</td>
      <td>천호현대주유소</td>
      <td>서울 강동구 천중로 67 (천호동)</td>
      <td>현대오일뱅크</td>
      <td>02-484-9323</td>
      <td>N</td>
      <td>-</td>
      <td>1823</td>
      <td>1924</td>
      <td>-</td>
    </tr>
    <tr>
      <th>13</th>
      <td>서울특별시</td>
      <td>광성주유소</td>
      <td>서울 강동구 올림픽로 673 (천호동)</td>
      <td>S-OIL</td>
      <td>02-470-5133</td>
      <td>N</td>
      <td>-</td>
      <td>1978</td>
      <td>2028</td>
      <td>1900</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="concat">concat</h4>
<ul>
  <li>Use when the files share the same format and only need to be stacked one after another</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tmp_raw</span> <span class="o">=</span> <span class="p">[]</span>

<span class="k">for</span> <span class="n">file_name</span> <span class="ow">in</span> <span class="n">stations_files</span><span class="p">:</span>
    <span class="n">tmp</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_excel</span><span class="p">(</span><span class="n">file_name</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
    <span class="n">tmp_raw</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">tmp</span><span class="p">)</span>
<span class="n">stations_raw</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">concat</span><span class="p">(</span><span class="n">tmp_raw</span><span class="p">)</span>
<span class="n">stations_raw</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>지역</th>
      <th>상호</th>
      <th>주소</th>
      <th>상표</th>
      <th>전화번호</th>
      <th>셀프여부</th>
      <th>고급휘발유</th>
      <th>휘발유</th>
      <th>경유</th>
      <th>실내등유</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>서울특별시</td>
      <td>재건에너지 재정제2주유소 고속셀프지점</td>
      <td>서울특별시 강동구  천호대로 1246 (둔촌제2동)</td>
      <td>현대오일뱅크</td>
      <td>02-487-2030</td>
      <td>Y</td>
      <td>-</td>
      <td>1569</td>
      <td>1669</td>
      <td>-</td>
    </tr>
    <tr>
      <th>1</th>
      <td>서울특별시</td>
      <td>구천면주유소</td>
      <td>서울 강동구 구천면로 357 (암사동)</td>
      <td>현대오일뱅크</td>
      <td>02-441-0536</td>
      <td>N</td>
      <td>-</td>
      <td>1584</td>
      <td>1693</td>
      <td>-</td>
    </tr>
    <tr>
      <th>2</th>
      <td>서울특별시</td>
      <td>(주)소모에너지 신월주유소</td>
      <td>서울 강동구 양재대로 1323 (성내동)</td>
      <td>GS칼텍스</td>
      <td>02-6956-6674</td>
      <td>Y</td>
      <td>1836</td>
      <td>1586</td>
      <td>1698</td>
      <td>1650</td>
    </tr>
    <tr>
      <th>3</th>
      <td>서울특별시</td>
      <td>대성석유(주)길동주유소</td>
      <td>서울 강동구 천호대로 1168</td>
      <td>GS칼텍스</td>
      <td>02-474-7222</td>
      <td>N</td>
      <td>1845</td>
      <td>1596</td>
      <td>1728</td>
      <td>1600</td>
    </tr>
    <tr>
      <th>4</th>
      <td>서울특별시</td>
      <td>(주)삼표에너지 고덕주유소</td>
      <td>서울 강동구 고덕로 39 (암사동)</td>
      <td>GS칼텍스</td>
      <td>02-441-3327</td>
      <td>Y</td>
      <td>1845</td>
      <td>1625</td>
      <td>1745</td>
      <td>1615</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>28</th>
      <td>서울특별시</td>
      <td>대청주유소</td>
      <td>서울 강남구 개포로 654 (일원동)</td>
      <td>SK에너지</td>
      <td>02-445-5500</td>
      <td>N</td>
      <td>2486</td>
      <td>2246</td>
      <td>2236</td>
      <td>1836</td>
    </tr>
    <tr>
      <th>29</th>
      <td>서울특별시</td>
      <td>갤러리아주유소</td>
      <td>서울 강남구 압구정로 426</td>
      <td>SK에너지</td>
      <td>02-540-4965</td>
      <td>N</td>
      <td>2488</td>
      <td>2290</td>
      <td>2349</td>
      <td>1840</td>
    </tr>
    <tr>
      <th>30</th>
      <td>서울특별시</td>
      <td>SK논현주유소</td>
      <td>서울 강남구 논현로 747 (논현동)</td>
      <td>SK에너지</td>
      <td>02-511-0955</td>
      <td>N</td>
      <td>2495</td>
      <td>2290</td>
      <td>2360</td>
      <td>1835</td>
    </tr>
    <tr>
      <th>31</th>
      <td>서울특별시</td>
      <td>(주)새서울네트웍스 제이제이주유소</td>
      <td>서울 강남구 언주로 716</td>
      <td>현대오일뱅크</td>
      <td>02-518-5631</td>
      <td>N</td>
      <td>2548</td>
      <td>2298</td>
      <td>2387</td>
      <td>-</td>
    </tr>
    <tr>
      <th>32</th>
      <td>서울특별시</td>
      <td>(주)만정에너지 삼보주유소</td>
      <td>서울 강남구 봉은사로 433 (삼성동)</td>
      <td>GS칼텍스</td>
      <td>02-518-5141</td>
      <td>N</td>
      <td>2818</td>
      <td>2578</td>
      <td>2570</td>
      <td>1850</td>
    </tr>
  </tbody>
</table>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stations_raw</span><span class="p">.</span><span class="n">info</span><span class="p">()</span>

<span class="o">=&gt;</span>

<span class="o">&lt;</span><span class="k">class</span> <span class="err">'</span><span class="nc">pandas</span><span class="p">.</span><span class="n">core</span><span class="p">.</span><span class="n">frame</span><span class="p">.</span><span class="n">DataFrame</span><span class="s">'&gt;
Int64Index: 443 entries, 0 to 32
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   지역      443 non-null    object
 1   상호      443 non-null    object
 2   주소      443 non-null    object
 3   상표      443 non-null    object
 4   전화번호    443 non-null    object
 5   셀프여부    443 non-null    object
 6   고급휘발유   443 non-null    object
 7   휘발유     443 non-null    object
 8   경유      443 non-null    object
 9   실내등유    443 non-null    object
dtypes: object(10)
memory usage: 38.1+ KB
</span></code></pre></div></div>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stations_raw</span><span class="p">.</span><span class="n">columns</span>

<span class="o">=&gt;</span>

<span class="n">Index</span><span class="p">([</span><span class="s">'지역'</span><span class="p">,</span> <span class="s">'상호'</span><span class="p">,</span> <span class="s">'주소'</span><span class="p">,</span> <span class="s">'상표'</span><span class="p">,</span> <span class="s">'전화번호'</span><span class="p">,</span> <span class="s">'셀프여부'</span><span class="p">,</span> <span class="s">'고급휘발유'</span><span class="p">,</span> <span class="s">'휘발유'</span><span class="p">,</span> <span class="s">'경유'</span><span class="p">,</span> <span class="s">'실내등유'</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="s">'object'</span><span class="p">)</span>
</code></pre></div></div>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stations</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span>
    <span class="s">"상호"</span><span class="p">:</span> <span class="n">stations_raw</span><span class="p">[</span><span class="s">"상호"</span><span class="p">],</span>
    <span class="s">"주소"</span><span class="p">:</span> <span class="n">stations_raw</span><span class="p">[</span><span class="s">"주소"</span><span class="p">],</span>
    <span class="s">"가격"</span><span class="p">:</span> <span class="n">stations_raw</span><span class="p">[</span><span class="s">"휘발유"</span><span class="p">],</span>
    <span class="s">"셀프"</span><span class="p">:</span> <span class="n">stations_raw</span><span class="p">[</span><span class="s">"셀프여부"</span><span class="p">],</span>
    <span class="s">"상표"</span><span class="p">:</span> <span class="n">stations_raw</span><span class="p">[</span><span class="s">"상표"</span><span class="p">]</span>
<span class="p">})</span>
<span class="n">stations</span><span class="p">.</span><span class="n">tail</span><span class="p">()</span>
</code></pre></div></div>
<div>

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>상호</th>
      <th>주소</th>
      <th>가격</th>
      <th>셀프</th>
      <th>상표</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>28</th>
      <td>대청주유소</td>
      <td>서울 강남구 개포로 654 (일원동)</td>
      <td>2246</td>
      <td>N</td>
      <td>SK에너지</td>
    </tr>
    <tr>
      <th>29</th>
      <td>갤러리아주유소</td>
      <td>서울 강남구 압구정로 426</td>
      <td>2290</td>
      <td>N</td>
      <td>SK에너지</td>
    </tr>
    <tr>
      <th>30</th>
      <td>SK논현주유소</td>
      <td>서울 강남구 논현로 747 (논현동)</td>
      <td>2290</td>
      <td>N</td>
      <td>SK에너지</td>
    </tr>
    <tr>
      <th>31</th>
      <td>(주)새서울네트웍스 제이제이주유소</td>
      <td>서울 강남구 언주로 716</td>
      <td>2298</td>
      <td>N</td>
      <td>현대오일뱅크</td>
    </tr>
    <tr>
      <th>32</th>
      <td>(주)만정에너지 삼보주유소</td>
      <td>서울 강남구 봉은사로 433 (삼성동)</td>
      <td>2578</td>
      <td>N</td>
      <td>GS칼텍스</td>
    </tr>
  </tbody>
</table>
</div>
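<p>The same five-column frame can also be built by selecting and renaming columns rather than assembling a dictionary. The sketch below is only an illustration: the two-row <code>stations_raw</code> is a toy stand-in whose values are copied from the table above, not the full dataset.</p>

```python
import pandas as pd

# Toy stand-in for stations_raw; the two rows are copied from the table above.
stations_raw = pd.DataFrame({
    "상호": ["구천면주유소", "대청주유소"],
    "주소": ["서울 강동구 구천면로 357 (암사동)", "서울 강남구 개포로 654 (일원동)"],
    "상표": ["현대오일뱅크", "SK에너지"],
    "셀프여부": ["N", "N"],
    "휘발유": ["1584", "2246"],
})

# Select the needed columns in the desired order, then rename two of them.
stations = (
    stations_raw[["상호", "주소", "휘발유", "셀프여부", "상표"]]
    .rename(columns={"휘발유": "가격", "셀프여부": "셀프"})
)
print(stations.columns.tolist())
```

<p>Because <code>rename</code> returns a new DataFrame, <code>stations_raw</code> keeps its original column names.</p>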

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">eachAddress</span> <span class="ow">in</span> <span class="n">stations</span><span class="p">[</span><span class="s">"주소"</span><span class="p">]:</span>
    <span class="k">print</span><span class="p">(</span><span class="n">eachAddress</span><span class="p">.</span><span class="n">split</span><span class="p">()[</span><span class="mi">1</span><span class="p">])</span>
    
<span class="o">=&gt;</span>

<span class="n">강동구</span>
<span class="n">강동구</span>
<span class="n">강동구</span>
  <span class="p">.</span>
  <span class="p">.</span>
  <span class="p">.</span>
<span class="n">강남구</span>
<span class="n">강남구</span>
<span class="n">강남구</span>
</code></pre></div></div>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stations</span><span class="p">[</span><span class="s">"구"</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="n">eachAddress</span><span class="p">.</span><span class="n">split</span><span class="p">()[</span><span class="mi">1</span><span class="p">]</span> <span class="k">for</span> <span class="n">eachAddress</span> <span class="ow">in</span> <span class="n">stations</span><span class="p">[</span><span class="s">"주소"</span><span class="p">]]</span>
<span class="n">stations</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>상호</th>
      <th>주소</th>
      <th>가격</th>
      <th>셀프</th>
      <th>상표</th>
      <th>구</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>재건에너지 재정제2주유소 고속셀프지점</td>
      <td>서울특별시 강동구  천호대로 1246 (둔촌제2동)</td>
      <td>1569</td>
      <td>Y</td>
      <td>현대오일뱅크</td>
      <td>강동구</td>
    </tr>
    <tr>
      <th>1</th>
      <td>구천면주유소</td>
      <td>서울 강동구 구천면로 357 (암사동)</td>
      <td>1584</td>
      <td>N</td>
      <td>현대오일뱅크</td>
      <td>강동구</td>
    </tr>
    <tr>
      <th>2</th>
      <td>(주)소모에너지 신월주유소</td>
      <td>서울 강동구 양재대로 1323 (성내동)</td>
      <td>1586</td>
      <td>Y</td>
      <td>GS칼텍스</td>
      <td>강동구</td>
    </tr>
    <tr>
      <th>3</th>
      <td>대성석유(주)길동주유소</td>
      <td>서울 강동구 천호대로 1168</td>
      <td>1596</td>
      <td>N</td>
      <td>GS칼텍스</td>
      <td>강동구</td>
    </tr>
    <tr>
      <th>4</th>
      <td>(주)삼표에너지 고덕주유소</td>
      <td>서울 강동구 고덕로 39 (암사동)</td>
      <td>1625</td>
      <td>Y</td>
      <td>GS칼텍스</td>
      <td>강동구</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>28</th>
      <td>대청주유소</td>
      <td>서울 강남구 개포로 654 (일원동)</td>
      <td>2246</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>29</th>
      <td>갤러리아주유소</td>
      <td>서울 강남구 압구정로 426</td>
      <td>2290</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>30</th>
      <td>SK논현주유소</td>
      <td>서울 강남구 논현로 747 (논현동)</td>
      <td>2290</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>31</th>
      <td>(주)새서울네트웍스 제이제이주유소</td>
      <td>서울 강남구 언주로 716</td>
      <td>2298</td>
      <td>N</td>
      <td>현대오일뱅크</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>32</th>
      <td>(주)만정에너지 삼보주유소</td>
      <td>서울 강남구 봉은사로 433 (삼성동)</td>
      <td>2578</td>
      <td>N</td>
      <td>GS칼텍스</td>
      <td>강남구</td>
    </tr>
  </tbody>
</table>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stations</span><span class="p">[</span><span class="s">"구"</span><span class="p">].</span><span class="n">unique</span><span class="p">(),</span> <span class="nb">len</span><span class="p">(</span><span class="n">stations</span><span class="p">[</span><span class="s">"구"</span><span class="p">].</span><span class="n">unique</span><span class="p">())</span>

<span class="o">=&gt;</span>

<span class="p">(</span><span class="n">array</span><span class="p">([</span><span class="s">'강동구'</span><span class="p">,</span> <span class="s">'동대문구'</span><span class="p">,</span> <span class="s">'동작구'</span><span class="p">,</span> <span class="s">'마포구'</span><span class="p">,</span> <span class="s">'서대문구'</span><span class="p">,</span> <span class="s">'서초구'</span><span class="p">,</span> <span class="s">'성동구'</span><span class="p">,</span> <span class="s">'성북구'</span><span class="p">,</span> <span class="s">'송파구'</span><span class="p">,</span>
        <span class="s">'양천구'</span><span class="p">,</span> <span class="s">'영등포구'</span><span class="p">,</span> <span class="s">'강북구'</span><span class="p">,</span> <span class="s">'용산구'</span><span class="p">,</span> <span class="s">'은평구'</span><span class="p">,</span> <span class="s">'종로구'</span><span class="p">,</span> <span class="s">'중구'</span><span class="p">,</span> <span class="s">'중랑구'</span><span class="p">,</span> <span class="s">'강서구'</span><span class="p">,</span>
        <span class="s">'관악구'</span><span class="p">,</span> <span class="s">'광진구'</span><span class="p">,</span> <span class="s">'구로구'</span><span class="p">,</span> <span class="s">'금천구'</span><span class="p">,</span> <span class="s">'노원구'</span><span class="p">,</span> <span class="s">'도봉구'</span><span class="p">,</span> <span class="s">'강남구'</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">object</span><span class="p">),</span>
 <span class="mi">25</span><span class="p">)</span>
</code></pre></div></div>
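<p>The list comprehension above can also be written with the pandas string accessor. This is a minimal sketch on two addresses taken from the table; the variable names <code>addresses</code> and <code>gu</code> are made up for the illustration.</p>

```python
import pandas as pd

addresses = pd.Series([
    "서울특별시 강동구  천호대로 1246 (둔촌제2동)",
    "서울 강남구 개포로 654 (일원동)",
])

# str.split() with no argument splits on any run of whitespace,
# so the double space in the first address does not create an empty token.
gu = addresses.str.split().str[1]
print(gu.tolist())
```

<p>Both forms take the second whitespace-separated token, so they rely on every address starting with the city name followed by the district.</p>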
<hr />

<h4 id="가격-데이터형-변환-object---float">Converting the price dtype: object -&gt; float</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stations</span><span class="p">[</span><span class="s">"가격"</span><span class="p">]</span> <span class="o">=</span> <span class="n">stations</span><span class="p">[</span><span class="s">"가격"</span><span class="p">].</span><span class="n">astype</span><span class="p">(</span><span class="s">"float"</span><span class="p">)</span>

<span class="o">=&gt;</span>

<span class="nb">ValueError</span><span class="p">:</span> <span class="n">could</span> <span class="ow">not</span> <span class="n">convert</span> <span class="n">string</span> <span class="n">to</span> <span class="nb">float</span><span class="p">:</span> <span class="s">'-'</span>
</code></pre></div></div>
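<p>The error comes from the <code>'-'</code> placeholder that marks a missing price. One possible way to deal with it, shown here only as a sketch on a toy Series, is <code>pd.to_numeric</code> with <code>errors="coerce"</code>, which turns unparseable values into NaN so they can be counted or dropped.</p>

```python
import pandas as pd

prices = pd.Series(["1569", "-", "2246"])

# errors="coerce" turns values that cannot be parsed (here "-") into NaN
# instead of raising ValueError.
numeric = pd.to_numeric(prices, errors="coerce")
print(numeric.isna().sum())  # number of entries that failed to parse

# Keep only the rows that hold a real price.
cleaned = numeric.dropna()
print(cleaned.tolist())
```

<p>Coercion hides every unparseable value, not only <code>'-'</code>, so it is worth checking the NaN count before dropping anything.</p>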

<h4 id="가격-정보-있는-주유소만-사용">Using only stations that have price information</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Stations without price information
</span><span class="n">stations</span><span class="p">[</span><span class="n">stations</span><span class="p">[</span><span class="s">"가격"</span><span class="p">]</span> <span class="o">==</span> <span class="s">"-"</span><span class="p">]</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>상호</th>
      <th>주소</th>
      <th>가격</th>
      <th>셀프</th>
      <th>상표</th>
      <th>구</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>15</th>
      <td>제이제이에너지</td>
      <td>서울 은평구 응암로 163</td>
      <td>-</td>
      <td>Y</td>
      <td>SK에너지</td>
      <td>은평구</td>
    </tr>
  </tbody>
</table>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># .copy() makes an independent frame, so the later column assignment does not trigger SettingWithCopyWarning
</span><span class="n">stations</span> <span class="o">=</span> <span class="n">stations</span><span class="p">[</span><span class="n">stations</span><span class="p">[</span><span class="s">"가격"</span><span class="p">]</span> <span class="o">!=</span> <span class="s">"-"</span><span class="p">].</span><span class="n">copy</span><span class="p">()</span>
<span class="n">stations</span><span class="p">.</span><span class="n">tail</span><span class="p">()</span>
</code></pre></div></div>

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>상호</th>
      <th>주소</th>
      <th>가격</th>
      <th>셀프</th>
      <th>상표</th>
      <th>구</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>28</th>
      <td>대청주유소</td>
      <td>서울 강남구 개포로 654 (일원동)</td>
      <td>2246</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>29</th>
      <td>갤러리아주유소</td>
      <td>서울 강남구 압구정로 426</td>
      <td>2290</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>30</th>
      <td>SK논현주유소</td>
      <td>서울 강남구 논현로 747 (논현동)</td>
      <td>2290</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>31</th>
      <td>(주)새서울네트웍스 제이제이주유소</td>
      <td>서울 강남구 언주로 716</td>
      <td>2298</td>
      <td>N</td>
      <td>현대오일뱅크</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>32</th>
      <td>(주)만정에너지 삼보주유소</td>
      <td>서울 강남구 봉은사로 433 (삼성동)</td>
      <td>2578</td>
      <td>N</td>
      <td>GS칼텍스</td>
      <td>강남구</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="다시-가격-데이터형-변환">Converting the price dtype again</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stations</span><span class="p">[</span><span class="s">"가격"</span><span class="p">]</span> <span class="o">=</span> <span class="n">stations</span><span class="p">[</span><span class="s">"가격"</span><span class="p">].</span><span class="n">astype</span><span class="p">(</span><span class="s">"float"</span><span class="p">)</span>
<span class="n">stations</span><span class="p">.</span><span class="n">info</span><span class="p">()</span>

<span class="o">=&gt;</span>

<span class="o">&lt;</span><span class="k">class</span> <span class="err">'</span><span class="nc">pandas</span><span class="p">.</span><span class="n">core</span><span class="p">.</span><span class="n">frame</span><span class="p">.</span><span class="n">DataFrame</span><span class="s">'&gt;
Int64Index: 442 entries, 0 to 32
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   상호      442 non-null    object 
 1   주소      442 non-null    object 
 2   가격      442 non-null    float64
 3   셀프      442 non-null    object 
 4   상표      442 non-null    object 
 5   구       442 non-null    object 
dtypes: float64(1), object(5)
memory usage: 24.2+ KB
</span></code></pre></div></div>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stations</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>상호</th>
      <th>주소</th>
      <th>가격</th>
      <th>셀프</th>
      <th>상표</th>
      <th>구</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>재건에너지 재정제2주유소 고속셀프지점</td>
      <td>서울특별시 강동구  천호대로 1246 (둔촌제2동)</td>
      <td>1569.0</td>
      <td>Y</td>
      <td>현대오일뱅크</td>
      <td>강동구</td>
    </tr>
    <tr>
      <th>1</th>
      <td>구천면주유소</td>
      <td>서울 강동구 구천면로 357 (암사동)</td>
      <td>1584.0</td>
      <td>N</td>
      <td>현대오일뱅크</td>
      <td>강동구</td>
    </tr>
    <tr>
      <th>2</th>
      <td>(주)소모에너지 신월주유소</td>
      <td>서울 강동구 양재대로 1323 (성내동)</td>
      <td>1586.0</td>
      <td>Y</td>
      <td>GS칼텍스</td>
      <td>강동구</td>
    </tr>
    <tr>
      <th>3</th>
      <td>대성석유(주)길동주유소</td>
      <td>서울 강동구 천호대로 1168</td>
      <td>1596.0</td>
      <td>N</td>
      <td>GS칼텍스</td>
      <td>강동구</td>
    </tr>
    <tr>
      <th>4</th>
      <td>(주)삼표에너지 고덕주유소</td>
      <td>서울 강동구 고덕로 39 (암사동)</td>
      <td>1625.0</td>
      <td>Y</td>
      <td>GS칼텍스</td>
      <td>강동구</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>28</th>
      <td>대청주유소</td>
      <td>서울 강남구 개포로 654 (일원동)</td>
      <td>2246.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>29</th>
      <td>갤러리아주유소</td>
      <td>서울 강남구 압구정로 426</td>
      <td>2290.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>30</th>
      <td>SK논현주유소</td>
      <td>서울 강남구 논현로 747 (논현동)</td>
      <td>2290.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>31</th>
      <td>(주)새서울네트웍스 제이제이주유소</td>
      <td>서울 강남구 언주로 716</td>
      <td>2298.0</td>
      <td>N</td>
      <td>현대오일뱅크</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>32</th>
      <td>(주)만정에너지 삼보주유소</td>
      <td>서울 강남구 봉은사로 433 (삼성동)</td>
      <td>2578.0</td>
      <td>N</td>
      <td>GS칼텍스</td>
      <td>강남구</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="인덱스-재정렬">Resetting the index</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stations</span><span class="p">.</span><span class="n">reset_index</span><span class="p">(</span><span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">stations</span><span class="p">.</span><span class="n">tail</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>index</th>
      <th>상호</th>
      <th>주소</th>
      <th>가격</th>
      <th>셀프</th>
      <th>상표</th>
      <th>구</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>437</th>
      <td>28</td>
      <td>대청주유소</td>
      <td>서울 강남구 개포로 654 (일원동)</td>
      <td>2246.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>438</th>
      <td>29</td>
      <td>갤러리아주유소</td>
      <td>서울 강남구 압구정로 426</td>
      <td>2290.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>439</th>
      <td>30</td>
      <td>SK논현주유소</td>
      <td>서울 강남구 논현로 747 (논현동)</td>
      <td>2290.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>440</th>
      <td>31</td>
      <td>(주)새서울네트웍스 제이제이주유소</td>
      <td>서울 강남구 언주로 716</td>
      <td>2298.0</td>
      <td>N</td>
      <td>현대오일뱅크</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>441</th>
      <td>32</td>
      <td>(주)만정에너지 삼보주유소</td>
      <td>서울 강남구 봉은사로 433 (삼성동)</td>
      <td>2578.0</td>
      <td>N</td>
      <td>GS칼텍스</td>
      <td>강남구</td>
    </tr>
  </tbody>
</table>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">del</span> <span class="n">stations</span><span class="p">[</span><span class="s">"index"</span><span class="p">]</span>
<span class="n">stations</span><span class="p">.</span><span class="n">tail</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>상호</th>
      <th>주소</th>
      <th>가격</th>
      <th>셀프</th>
      <th>상표</th>
      <th>구</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>437</th>
      <td>대청주유소</td>
      <td>서울 강남구 개포로 654 (일원동)</td>
      <td>2246.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>438</th>
      <td>갤러리아주유소</td>
      <td>서울 강남구 압구정로 426</td>
      <td>2290.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>439</th>
      <td>SK논현주유소</td>
      <td>서울 강남구 논현로 747 (논현동)</td>
      <td>2290.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>440</th>
      <td>(주)새서울네트웍스 제이제이주유소</td>
      <td>서울 강남구 언주로 716</td>
      <td>2298.0</td>
      <td>N</td>
      <td>현대오일뱅크</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>441</th>
      <td>(주)만정에너지 삼보주유소</td>
      <td>서울 강남구 봉은사로 433 (삼성동)</td>
      <td>2578.0</td>
      <td>N</td>
      <td>GS칼텍스</td>
      <td>강남구</td>
    </tr>
  </tbody>
</table>
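<p>The two steps above (reset the index, then delete the leftover <code>index</code> column) can be combined by passing <code>drop=True</code>. This is a minimal sketch on a toy frame with a repeated index; the data is made up for the illustration.</p>

```python
import pandas as pd

# Toy frame whose index repeats, as happens after concatenating several tables.
df = pd.DataFrame({"가격": [1569.0, 2246.0, 2578.0]}, index=[0, 1, 0])

# drop=True discards the old index instead of inserting it as an "index" column.
df = df.reset_index(drop=True)
print(df.index.tolist())
print(df.columns.tolist())
```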

<hr />

<h3 id="4-주유-가격-정보-시각화">4. Visualizing fuel price information</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">import</span> <span class="nn">platform</span>
<span class="kn">from</span> <span class="nn">matplotlib</span> <span class="kn">import</span> <span class="n">font_manager</span><span class="p">,</span> <span class="n">rc</span>

<span class="n">get_ipython</span><span class="p">().</span><span class="n">run_line_magic</span><span class="p">(</span><span class="s">"matplotlib"</span><span class="p">,</span> <span class="s">"inline"</span><span class="p">)</span>
<span class="c1"># Equivalent to the notebook magic: %matplotlib inline
</span>
<span class="n">path</span> <span class="o">=</span> <span class="s">"C:/Windows/Fonts/malgun.ttf"</span>

<span class="k">if</span> <span class="n">platform</span><span class="p">.</span><span class="n">system</span><span class="p">()</span> <span class="o">==</span> <span class="s">"Darwin"</span><span class="p">:</span>
    <span class="n">rc</span><span class="p">(</span><span class="s">"font"</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="s">"Arial Unicode MS"</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">platform</span><span class="p">.</span><span class="n">system</span><span class="p">()</span> <span class="o">==</span> <span class="s">"Windows"</span><span class="p">:</span>
    <span class="n">font_name</span> <span class="o">=</span> <span class="n">font_manager</span><span class="p">.</span><span class="n">FontProperties</span><span class="p">(</span><span class="n">fname</span><span class="o">=</span><span class="n">path</span><span class="p">).</span><span class="n">get_name</span><span class="p">()</span>
    <span class="n">rc</span><span class="p">(</span><span class="s">"font"</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="n">font_name</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Unknown system. sorry~"</span><span class="p">)</span>
</code></pre></div></div>
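<p>The platform check above can be separated into a small lookup, which makes it easy to add another operating system. The helper below is hypothetical: <code>pick_korean_font</code> and <code>FONT_BY_SYSTEM</code> are names invented for this sketch, and the Linux entry assumes the NanumGothic font has been installed separately.</p>

```python
import platform

# Hypothetical lookup: OS name to a font family that contains Korean glyphs.
# "Malgun Gothic" is the family name of malgun.ttf; the Linux entry is an
# assumption and only works if NanumGothic is installed.
FONT_BY_SYSTEM = {
    "Darwin": "Arial Unicode MS",
    "Windows": "Malgun Gothic",
    "Linux": "NanumGothic",
}


def pick_korean_font(system_name=None):
    """Return a font family name for the given OS, or None if it is unknown."""
    if system_name is None:
        system_name = platform.system()
    return FONT_BY_SYSTEM.get(system_name)


print(pick_korean_font("Darwin"))
```

<p>The returned name can be passed to <code>rc("font", family=...)</code>. Setting <code>plt.rcParams["axes.unicode_minus"] = False</code> as well keeps minus signs from rendering as empty boxes with fonts that lack that glyph.</p>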

<hr />

<h4 id="boxplotfeat-pandas">boxplot(feat. pandas)</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stations</span><span class="p">.</span><span class="n">boxplot</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s">"가격"</span><span class="p">,</span> <span class="n">by</span><span class="o">=</span><span class="s">"셀프"</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">8</span><span class="p">));</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/df3f32d5-9584-43ef-aabd-a023018325fa/image.png" alt="" /></p>

<hr />

<h4 id="boxplotfeat-seaborn">boxplot(feat. seaborn)</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">8</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="n">boxplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"셀프"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"가격"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">stations</span><span class="p">,</span> <span class="n">palette</span><span class="o">=</span><span class="s">"Set3"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">grid</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/58ef7368-dca7-4ed0-ab0e-a21766d4ba04/image.png" alt="" /></p>

<hr />

<h4 id="boxplotfeat-seaborn-1">boxplot(feat. seaborn), by brand and self-service</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">8</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="n">boxplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="s">"상표"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"가격"</span><span class="p">,</span> <span class="n">hue</span><span class="o">=</span><span class="s">"셀프"</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">stations</span><span class="p">,</span> <span class="n">palette</span><span class="o">=</span><span class="s">"Set3"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">grid</span><span class="p">(</span><span class="bp">True</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/67e254cf-6f9a-4bb2-9873-0cfd8c8bdd88/image.png" alt="" /></p>
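<p>The numbers behind the box plots can be read off with <code>groupby</code>. The sketch below uses six toy rows (the prices and self-service flags come from the tables above, with the brands simplified to two rows each), so the printed values describe the toy data, not the full dataset.</p>

```python
import pandas as pd

# Toy rows for illustration only.
stations = pd.DataFrame({
    "가격": [1569.0, 1584.0, 1586.0, 1596.0, 2246.0, 2290.0],
    "셀프": ["Y", "N", "Y", "N", "N", "N"],
    "상표": ["현대오일뱅크", "현대오일뱅크", "GS칼텍스", "GS칼텍스", "SK에너지", "SK에너지"],
})

# Median price per self-service flag: the centre line of each box.
median_by_self = stations.groupby("셀프")["가격"].median()

# Count, mean, quartiles and extremes per (brand, self-service) pair.
summary = stations.groupby(["상표", "셀프"])["가격"].describe()

print(median_by_self)
print(summary)
```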

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">folium</span>
<span class="kn">import</span> <span class="nn">warnings</span>
<span class="n">warnings</span><span class="p">.</span><span class="n">simplefilter</span><span class="p">(</span><span class="n">action</span><span class="o">=</span><span class="s">"ignore"</span><span class="p">,</span> <span class="n">category</span><span class="o">=</span><span class="nb">FutureWarning</span><span class="p">)</span>
</code></pre></div></div>

<hr />

<h4 id="가장-비싼-주유소-10개">The 10 most expensive stations</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stations</span><span class="p">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="s">"가격"</span><span class="p">,</span> <span class="n">ascending</span><span class="o">=</span><span class="bp">False</span><span class="p">).</span><span class="n">head</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span> 
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>상호</th>
      <th>주소</th>
      <th>가격</th>
      <th>셀프</th>
      <th>상표</th>
      <th>구</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>273</th>
      <td>서남주유소</td>
      <td>서울 중구 통일로 30</td>
      <td>2697.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>중구</td>
    </tr>
    <tr>
      <th>240</th>
      <td>서계주유소</td>
      <td>서울 용산구  청파로 367 (청파동)</td>
      <td>2649.0</td>
      <td>N</td>
      <td>GS칼텍스</td>
      <td>용산구</td>
    </tr>
    <tr>
      <th>441</th>
      <td>(주)만정에너지 삼보주유소</td>
      <td>서울 강남구 봉은사로 433 (삼성동)</td>
      <td>2578.0</td>
      <td>N</td>
      <td>GS칼텍스</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>272</th>
      <td>필동주유소</td>
      <td>서울 중구 퇴계로 196 (필동2가)</td>
      <td>2499.0</td>
      <td>N</td>
      <td>GS칼텍스</td>
      <td>중구</td>
    </tr>
    <tr>
      <th>440</th>
      <td>(주)새서울네트웍스 제이제이주유소</td>
      <td>서울 강남구 언주로 716</td>
      <td>2298.0</td>
      <td>N</td>
      <td>현대오일뱅크</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>239</th>
      <td>한석주유소</td>
      <td>서울 용산구 이촌로 164</td>
      <td>2290.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>용산구</td>
    </tr>
    <tr>
      <th>438</th>
      <td>갤러리아주유소</td>
      <td>서울 강남구 압구정로 426</td>
      <td>2290.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>439</th>
      <td>SK논현주유소</td>
      <td>서울 강남구 논현로 747 (논현동)</td>
      <td>2290.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>437</th>
      <td>대청주유소</td>
      <td>서울 강남구 개포로 654 (일원동)</td>
      <td>2246.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>강남구</td>
    </tr>
    <tr>
      <th>238</th>
      <td>에너비스</td>
      <td>서울 용산구 한남대로 82 (한남동)</td>
      <td>2217.0</td>
      <td>N</td>
      <td>SK에너지</td>
      <td>용산구</td>
    </tr>
  </tbody>
</table>

<hr />

<h4 id="가장-싼-주유소-10개">The 10 cheapest stations</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stations</span><span class="p">.</span><span class="n">sort_values</span><span class="p">(</span><span class="n">by</span><span class="o">=</span><span class="s">"가격"</span><span class="p">).</span><span class="n">head</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>상호</th>
      <th>주소</th>
      <th>가격</th>
      <th>셀프</th>
      <th>상표</th>
      <th>구</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>52</th>
      <td>구도일주유소 두꺼비</td>
      <td>서울 서대문구 성산로 312</td>
      <td>1501.0</td>
      <td>Y</td>
      <td>S-OIL</td>
      <td>서대문구</td>
    </tr>
    <tr>
      <th>67</th>
      <td>만남의광장주유소</td>
      <td>서울 서초구 양재대로12길 73-71</td>
      <td>1504.0</td>
      <td>Y</td>
      <td>알뜰(ex)</td>
      <td>서초구</td>
    </tr>
    <tr>
      <th>241</th>
      <td>타이거주유소</td>
      <td>서울 은평구 수색로 188 (증산동)</td>
      <td>1504.0</td>
      <td>Y</td>
      <td>SK에너지</td>
      <td>은평구</td>
    </tr>
    <tr>
      <th>167</th>
      <td>플라트(주)서호주유소</td>
      <td>서울 양천구 남부순환로 317</td>
      <td>1508.0</td>
      <td>N</td>
      <td>GS칼텍스</td>
      <td>양천구</td>
    </tr>
    <tr>
      <th>287</th>
      <td>이케이에너지(주) 강서주유소</td>
      <td>서울 강서구 화곡로 273 (화곡동)</td>
      <td>1508.0</td>
      <td>Y</td>
      <td>현대오일뱅크</td>
      <td>강서구</td>
    </tr>
    <tr>
      <th>288</th>
      <td>화곡역주유소</td>
      <td>서울 강서구 강서로 154 (화곡동)</td>
      <td>1508.0</td>
      <td>Y</td>
      <td>알뜰주유소</td>
      <td>강서구</td>
    </tr>
    <tr>
      <th>289</th>
      <td>뉴신정주유소</td>
      <td>서울 강서구 곰달래로 207 (화곡동)</td>
      <td>1508.0</td>
      <td>N</td>
      <td>알뜰주유소</td>
      <td>강서구</td>
    </tr>
    <tr>
      <th>166</th>
      <td>현대주유소</td>
      <td>서울 양천구 남부순환로 372 (신월동)</td>
      <td>1508.0</td>
      <td>Y</td>
      <td>S-OIL</td>
      <td>양천구</td>
    </tr>
    <tr>
      <th>168</th>
      <td>양천구주유소</td>
      <td>서울 양천구 국회대로 275 (목동)</td>
      <td>1510.0</td>
      <td>Y</td>
      <td>알뜰주유소</td>
      <td>양천구</td>
    </tr>
    <tr>
      <th>290</th>
      <td>목화주유소</td>
      <td>서울 강서구 국회대로 251 (화곡동)</td>
      <td>1510.0</td>
      <td>Y</td>
      <td>알뜰주유소</td>
      <td>강서구</td>
    </tr>
  </tbody>
</table>
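
<p>A minimal sketch of the same sorting idea on a tiny, hypothetical subset of the table (the frame <code>stations_demo</code> is an assumption made for illustration; its prices are copied from the outputs above). It shows that <code>sort_values(...).head(n)</code> and <code>nsmallest(n, ...)</code> select the same rows here, and that <code>ascending=False</code> gives the most expensive stations instead.</p>

```python
import pandas as pd

# Hypothetical miniature of the stations table; values copied from the outputs above.
stations_demo = pd.DataFrame(
    {
        "상호": ["구도일주유소 두꺼비", "만남의광장주유소", "양천구주유소", "한석주유소"],
        "가격": [1501.0, 1504.0, 1510.0, 2290.0],
        "구": ["서대문구", "서초구", "양천구", "용산구"],
    }
)

# Ascending sort (the default) followed by head(n) keeps the n cheapest rows.
cheapest = stations_demo.sort_values(by="가격").head(2)

# nsmallest(n, column) expresses the same selection in one call.
same_rows = stations_demo.nsmallest(2, "가격")

# ascending=False flips the order, so head(n) keeps the most expensive rows.
most_expensive = stations_demo.sort_values(by="가격", ascending=False).head(1)

print(cheapest)
print(most_expensive)
```

<p>When several stations share a price, the order among the tied rows depends on the sort algorithm; passing <code>kind="stable"</code> to <code>sort_values</code> keeps tied rows in their original order.</p>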

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="n">gu_data</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">pivot_table</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">stations</span><span class="p">,</span> <span class="n">index</span><span class="o">=</span><span class="s">"구"</span><span class="p">,</span> <span class="n">values</span><span class="o">=</span><span class="s">"가격"</span><span class="p">,</span> <span class="n">aggfunc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">)</span>
<span class="n">gu_data</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>가격</th>
    </tr>
    <tr>
      <th>구</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>강남구</th>
      <td>1884.454545</td>
    </tr>
    <tr>
      <th>강동구</th>
      <td>1669.642857</td>
    </tr>
    <tr>
      <th>강북구</th>
      <td>1546.583333</td>
    </tr>
    <tr>
      <th>강서구</th>
      <td>1588.181818</td>
    </tr>
    <tr>
      <th>관악구</th>
      <td>1642.214286</td>
    </tr>
  </tbody>
</table>
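
<p>As a rough illustration of what the pivot table computes, the sketch below builds a small hypothetical frame (<code>demo</code>, an assumption; it is not the full dataset) and shows that <code>pd.pivot_table</code> with a mean aggregation matches a plain <code>groupby</code> mean. It passes the string <code>"mean"</code> as <code>aggfunc</code>, which recent pandas versions accept without the warning that <code>np.mean</code> can trigger.</p>

```python
import pandas as pd

# Hypothetical four-row sample; the real `stations` frame has many more rows per district.
demo = pd.DataFrame(
    {
        "구": ["강남구", "강남구", "중구", "중구"],
        "가격": [2298.0, 2290.0, 2499.0, 1501.0],
    }
)

# One row per district, holding the mean price of that district.
by_pivot = pd.pivot_table(data=demo, index="구", values="가격", aggfunc="mean")

# The same aggregation written as a groupby.
by_group = demo.groupby("구")[["가격"]].mean()

print(by_pivot)
```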

<hr />
<h4 id="지도-시각화">Map Visualization</h4>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">folium</span>

<span class="n">geo_path</span> <span class="o">=</span> <span class="s">"../data/02. skorea_municipalities_geo_simple.json"</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">geo_path</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">"utf-8"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">geo_str</span> <span class="o">=</span> <span class="n">json</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>

<span class="c1"># The "stamen toner" tiles may be unavailable in recent folium releases; pick another tile set if the map does not render
</span><span class="n">my_map</span> <span class="o">=</span> <span class="n">folium</span><span class="p">.</span><span class="n">Map</span><span class="p">(</span><span class="n">location</span><span class="o">=</span><span class="p">[</span><span class="mf">37.5502</span><span class="p">,</span> <span class="mf">126.982</span><span class="p">],</span> <span class="n">zoom_start</span><span class="o">=</span><span class="mf">10.5</span><span class="p">,</span> <span class="n">tiles</span><span class="o">=</span><span class="s">"stamen toner"</span><span class="p">)</span>
<span class="c1"># Map.choropleth() is deprecated and absent from recent folium releases; folium.Choropleth(...).add_to(my_map) replaces it
</span><span class="n">folium</span><span class="p">.</span><span class="n">Choropleth</span><span class="p">(</span>
    <span class="n">geo_data</span><span class="o">=</span><span class="n">geo_str</span><span class="p">,</span>
    <span class="n">data</span><span class="o">=</span><span class="n">gu_data</span><span class="p">,</span>
    <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="n">gu_data</span><span class="p">.</span><span class="n">index</span><span class="p">,</span> <span class="s">"가격"</span><span class="p">],</span>
    <span class="n">key_on</span><span class="o">=</span><span class="s">"feature.id"</span><span class="p">,</span>
    <span class="n">fill_color</span><span class="o">=</span><span class="s">"PuRd"</span>
<span class="p">).</span><span class="n">add_to</span><span class="p">(</span><span class="n">my_map</span><span class="p">)</span>
<span class="n">my_map</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/3a47a105-34ce-4ef5-a92d-64c548892025/image.png" alt="" /></p>]]></content><author><name>yy2-hi</name></author><category term="DataAnalysis" /><summary type="html"><![CDATA[Selenium Basic https://www.selenium.dev/documentation/]]></summary></entry><entry><title type="html">Project 7 - Analysis of Moleskine Data Collected from the Naver API</title><link href="https://yy2-hi.github.io/dataanalysis/moleskinanalysis/" rel="alternate" type="text/html" title="Project 7 - Analysis of Moleskine Data Collected from the Naver API" /><published>2024-08-25T00:00:00+09:00</published><updated>2024-08-25T00:00:00+09:00</updated><id>https://yy2-hi.github.io/dataanalysis/moleskinanalysis</id><content type="html" xml:base="https://yy2-hi.github.io/dataanalysis/moleskinanalysis/"><![CDATA[<h2 id="1-네이버-api-사용-등록">1. Registering to Use the Naver API</h2>
<ul>
  <li>Naver Developers center</li>
  <li>https://developers.naver.com/main/</li>
  <li>Application
    <ul>
      <li>Register an application</li>
      <li>Application name</li>
      <li>APIs used
        <ul>
          <li>Search</li>
        </ul>
      </li>
      <li>Add an environment
        <ul>
          <li>WEB settings</li>
          <li>http://localhost</li>
        </ul>
      </li>
      <li>Client ID : **</li>
      <li>Client Secret: **</li>
      <li>https://developers.naver.com/apps/#/myapps/E_N2j6ER9uWLIDb2BEEc/overview</li>
    </ul>
  </li>
</ul>

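<h2 id="2-네이버-검색-api-사용">2. Using the Naver Search API</h2>
<ul>
  <li>urllib: a package for handling requests to and responses from a server over the HTTP protocol</li>
  <li>urllib.request: the module that opens URLs and sends the client's requests</li>
  <li>urllib.parse: the module that parses and encodes URL strings
    <h4 id="개발-가이드">Developer Guide</h4>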
  </li>
  <li>https://developers.naver.com/docs/serviceapi/search/blog/blog.md#python</li>
</ul>
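
<p>The roles of <code>urllib.parse</code> and <code>urllib.request</code> can be sketched as follows. The helper <code>build_search_request</code> and the placeholder credentials are assumptions made for illustration; the endpoint prefix and header names are the ones used in the examples in this post. The request is only built here, not sent.</p>

```python
import urllib.parse
import urllib.request

BASE_URL = "https://openapi.naver.com/v1/search/"


def build_search_request(category, query, client_id, client_secret):
    """Build (but do not send) a request for one search category (hypothetical helper)."""
    # quote() percent-encodes the UTF-8 bytes of the query so it is safe inside a URL.
    url = BASE_URL + category + "?query=" + urllib.parse.quote(query)
    request = urllib.request.Request(url)
    request.add_header("X-Naver-Client-Id", client_id)
    request.add_header("X-Naver-Client-Secret", client_secret)
    return request


# Placeholder credentials: replace them with the values issued for your application.
req = build_search_request("blog", "파이썬", "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(req.full_url)
```

<p>Swapping <code>"blog"</code> for <code>"book"</code>, <code>"cafearticle"</code>, or <code>"shop"</code> gives the other search URLs used in this post without repeating the request setup.</p>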

<h3 id="검색-블로그blog">Search: Blog (blog)</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Naver Search API example - blog search
</span><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">urllib.parse</span>
<span class="kn">import</span> <span class="nn">urllib.request</span>
<span class="n">client_id</span> <span class="o">=</span> <span class="s">"E_N2j6ER9uWLIDb2BEEc"</span>
<span class="n">client_secret</span> <span class="o">=</span> <span class="s">"OwQUT_S108"</span>
<span class="n">encText</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">parse</span><span class="p">.</span><span class="n">quote</span><span class="p">(</span><span class="s">"파이썬"</span><span class="p">)</span>
<span class="n">url</span> <span class="o">=</span> <span class="s">"https://openapi.naver.com/v1/search/blog?query="</span> <span class="o">+</span> <span class="n">encText</span> <span class="c1"># JSON result
# url = "https://openapi.naver.com/v1/search/blog.xml?query=" + encText # XML result
</span><span class="n">request</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">Request</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Id"</span><span class="p">,</span><span class="n">client_id</span><span class="p">)</span>
<span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Secret"</span><span class="p">,</span><span class="n">client_secret</span><span class="p">)</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="n">rescode</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">getcode</span><span class="p">()</span>
<span class="k">if</span><span class="p">(</span><span class="n">rescode</span><span class="o">==</span><span class="mi">200</span><span class="p">):</span>
    <span class="n">response_body</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">read</span><span class="p">()</span>
    <span class="k">print</span><span class="p">(</span><span class="n">response_body</span><span class="p">.</span><span class="n">decode</span><span class="p">(</span><span class="s">'utf-8'</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Error Code:"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">rescode</span><span class="p">))</span>
    
<span class="o">=&gt;</span>

<span class="s">"title"</span><span class="p">:</span><span class="s">"&lt;b&gt;파이썬&lt;\/b&gt;학원 초보자를 위한 공부과정!"</span><span class="p">,</span>
			<span class="s">"link"</span><span class="p">:</span><span class="s">"https:\/\/blog.naver.com\/chzhvkdll\/222931446581"</span><span class="p">,</span>
			<span class="s">"description"</span><span class="p">:</span><span class="s">"그래서 제가 &lt;b&gt;파이썬&lt;\/b&gt;학원을 다닌 계기과 장단점, 고민에 대해 써보도록 하겠습니다 :) 모두가 같을 수는... &lt;b&gt;파이썬&lt;\/b&gt;학원 수강 신청한 건 대학 후배 중에 갑자기 전공 무관하게 그쪽으로 턴해서 공부하고 취직했단... "</span><span class="p">,</span>
			<span class="s">"bloggername"</span><span class="p">:</span><span class="s">"에피"</span><span class="p">,</span>
			<span class="s">"bloggerlink"</span><span class="p">:</span><span class="s">"blog.naver.com\/chzhvkdll"</span><span class="p">,</span>
			<span class="s">"postdate"</span><span class="p">:</span><span class="s">"20221118"</span>
            								<span class="p">.</span>
                                            <span class="p">.</span>
                                            <span class="p">.</span>
</code></pre></div></div>
<hr />
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">response</span><span class="p">,</span> <span class="n">response</span><span class="p">.</span><span class="n">getcode</span><span class="p">(),</span> <span class="n">response</span><span class="p">.</span><span class="n">code</span><span class="p">,</span> <span class="n">response</span><span class="p">.</span><span class="n">status</span>

<span class="o">=&gt;</span>

<span class="p">(</span><span class="o">&lt;</span><span class="n">http</span><span class="p">.</span><span class="n">client</span><span class="p">.</span><span class="n">HTTPResponse</span> <span class="n">at</span> <span class="mh">0x25e7c01c190</span><span class="o">&gt;</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">200</span><span class="p">)</span>
</code></pre></div></div>
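
<p>One detail about the status code: <code>urllib.request.urlopen</code> raises <code>urllib.error.HTTPError</code> for 4xx and 5xx responses instead of returning them, so a status check after the call rarely sees anything other than 200. The sketch below, with the hypothetical helpers <code>describe_failure</code> and <code>fetch_text</code>, handles the error where it is actually raised. The code is an integer, so it is converted with <code>str()</code> before being joined to the message.</p>

```python
import urllib.error
import urllib.request


def describe_failure(error):
    """Turn an HTTPError into a printable message; error.code is an int."""
    return "Error Code: " + str(error.code)


def fetch_text(request):
    """Return the decoded body, or None when the server answers with an error status."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses instead of returning them.
        print(describe_failure(error))
        return None
```

<p>For example, a request sent with a wrong Client Secret would surface in the <code>except</code> branch, and <code>fetch_text</code> would print the code and return <code>None</code>.</p>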

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># To read the body as text, decode the bytes as UTF-8
</span><span class="k">print</span><span class="p">(</span><span class="n">response_body</span><span class="p">.</span><span class="n">decode</span><span class="p">(</span><span class="s">"utf-8"</span><span class="p">))</span>

<span class="o">=&gt;</span>

<span class="p">{</span>
	<span class="s">"lastBuildDate"</span><span class="p">:</span><span class="s">"Thu, 02 Feb 2023 18:25:40 +0900"</span><span class="p">,</span>
	<span class="s">"total"</span><span class="p">:</span><span class="mi">384417</span><span class="p">,</span>
	<span class="s">"start"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span>
	<span class="s">"display"</span><span class="p">:</span><span class="mi">10</span><span class="p">,</span>
	<span class="s">"items"</span><span class="p">:[</span>
		<span class="p">{</span>
			<span class="s">"title"</span><span class="p">:</span><span class="s">"&lt;b&gt;파이썬&lt;\/b&gt;학원 초보자를 위한 공부과정!"</span><span class="p">,</span>
			<span class="s">"link"</span><span class="p">:</span><span class="s">"https:\/\/blog.naver.com\/chzhvkdll\/222931446581"</span><span class="p">,</span>
			<span class="s">"description"</span><span class="p">:</span><span class="s">"그래서 제가 &lt;b&gt;파이썬&lt;\/b&gt;학원을 다닌 계기과 장단점, 고민에 대해 써보도록 하겠습니다 :) 모두가 같을 수는... &lt;b&gt;파이썬&lt;\/b&gt;학원 수강 신청한 건 대학 후배 중에 갑자기 전공 무관하게 그쪽으로 턴해서 공부하고 취직했단... "</span><span class="p">,</span>
			<span class="s">"bloggername"</span><span class="p">:</span><span class="s">"에피"</span><span class="p">,</span>
			<span class="s">"bloggerlink"</span><span class="p">:</span><span class="s">"blog.naver.com\/chzhvkdll"</span><span class="p">,</span>
			<span class="s">"postdate"</span><span class="p">:</span><span class="s">"20221118"</span>
		<span class="p">},</span>
        									<span class="p">.</span>
                                            <span class="p">.</span>
                                            <span class="p">.</span>
</code></pre></div></div>
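
<p>The decoded body is JSON text, so it can be turned into Python objects with <code>json.loads</code>. The sketch below uses a trimmed sample shaped like the response above (one item, a few fields); the variable names are assumptions for illustration. It also shows that the escaped slashes (<code>\/</code>) in the raw text become plain slashes after parsing.</p>

```python
import json

# Trimmed sample shaped like the blog response above; a raw string keeps the \/ escapes intact.
sample_body = r"""
{
  "total": 384417,
  "start": 1,
  "display": 10,
  "items": [
    {
      "link": "https:\/\/blog.naver.com\/chzhvkdll\/222931446581",
      "bloggername": "에피",
      "postdate": "20221118"
    }
  ]
}
""".encode("utf-8")

# Decode the bytes to text, then parse the text into dicts and lists.
result = json.loads(sample_body.decode("utf-8"))

links = [item["link"] for item in result["items"]]
posts = [(item["bloggername"], item["postdate"]) for item in result["items"]]

# Assuming the API's start/display paging, the next page would begin here.
next_start = result["start"] + result["display"]

print(links, posts, next_start)
```

<p>From here the <code>items</code> list can be passed to <code>pandas.DataFrame</code> for analysis.</p>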

<hr />
<h3 id="검색-책book">Search: Book (book)</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Naver Search API example - book search
</span><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">urllib.parse</span>
<span class="kn">import</span> <span class="nn">urllib.request</span>
<span class="n">client_id</span> <span class="o">=</span> <span class="s">"E_N2j6ER9uWLIDb2BEEc"</span>
<span class="n">client_secret</span> <span class="o">=</span> <span class="s">"OwQUT_S108"</span>
<span class="n">encText</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">parse</span><span class="p">.</span><span class="n">quote</span><span class="p">(</span><span class="s">"파이썬"</span><span class="p">)</span>
<span class="n">url</span> <span class="o">=</span> <span class="s">"https://openapi.naver.com/v1/search/book?query="</span> <span class="o">+</span> <span class="n">encText</span> <span class="c1"># JSON result
# url = "https://openapi.naver.com/v1/search/book.xml?query=" + encText # XML result
</span><span class="n">request</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">Request</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Id"</span><span class="p">,</span><span class="n">client_id</span><span class="p">)</span>
<span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Secret"</span><span class="p">,</span><span class="n">client_secret</span><span class="p">)</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="n">rescode</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">getcode</span><span class="p">()</span>
<span class="k">if</span><span class="p">(</span><span class="n">rescode</span><span class="o">==</span><span class="mi">200</span><span class="p">):</span>
    <span class="n">response_body</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">read</span><span class="p">()</span>
    <span class="k">print</span><span class="p">(</span><span class="n">response_body</span><span class="p">.</span><span class="n">decode</span><span class="p">(</span><span class="s">'utf-8'</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Error Code:"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">rescode</span><span class="p">))</span>
    
<span class="o">=&gt;</span>

<span class="p">{</span>
	<span class="s">"lastBuildDate"</span><span class="p">:</span><span class="s">"Thu, 02 Feb 2023 18:25:40 +0900"</span><span class="p">,</span>
	<span class="s">"total"</span><span class="p">:</span><span class="mi">826</span><span class="p">,</span>
	<span class="s">"start"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span>
	<span class="s">"display"</span><span class="p">:</span><span class="mi">10</span><span class="p">,</span>
	<span class="s">"items"</span><span class="p">:[</span>
		<span class="p">{</span>
			<span class="s">"title"</span><span class="p">:</span><span class="s">"혼자 공부하는 파이썬 (1:1 과외하듯 배우는 프로그래밍 자습서)"</span><span class="p">,</span>
			<span class="s">"link"</span><span class="p">:</span><span class="s">"https:\/\/search.shopping.naver.com\/book\/catalog\/32507605957"</span><span class="p">,</span>
			<span class="s">"image"</span><span class="p">:</span><span class="s">"https:\/\/shopping-phinf.pstatic.net\/main_3250760\/32507605957.20221019133018.jpg"</span><span class="p">,</span>
			<span class="s">"author"</span><span class="p">:</span><span class="s">"윤인성"</span><span class="p">,</span>
			<span class="s">"discount"</span><span class="p">:</span><span class="s">"19800"</span><span class="p">,</span>
			<span class="s">"publisher"</span><span class="p">:</span><span class="s">"한빛미디어"</span><span class="p">,</span>
			<span class="s">"pubdate"</span><span class="p">:</span><span class="s">"20220601"</span><span class="p">,</span>
			<span class="s">"isbn"</span><span class="p">:</span><span class="s">"9791162245651"</span><span class="p">,</span>
            							<span class="p">.</span>
                                        <span class="p">.</span>
                                        <span class="p">.</span>
</code></pre></div></div>

<hr />

<h3 id="검색-영화movie">Search: Movie (movie)</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Naver Search API example - movie search
</span><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">urllib.parse</span>
<span class="kn">import</span> <span class="nn">urllib.request</span>
<span class="n">client_id</span> <span class="o">=</span> <span class="s">"E_N2j6ER9uWLIDb2BEEc"</span>
<span class="n">client_secret</span> <span class="o">=</span> <span class="s">"OwQUT_S108"</span>
<span class="n">encText</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">parse</span><span class="p">.</span><span class="n">quote</span><span class="p">(</span><span class="s">"파이썬"</span><span class="p">)</span>
<span class="n">url</span> <span class="o">=</span> <span class="s">"https://openapi.naver.com/v1/search/movie?query="</span> <span class="o">+</span> <span class="n">encText</span> <span class="c1"># JSON result
# url = "https://openapi.naver.com/v1/search/movie.xml?query=" + encText # XML result
</span><span class="n">request</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">Request</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Id"</span><span class="p">,</span><span class="n">client_id</span><span class="p">)</span>
<span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Secret"</span><span class="p">,</span><span class="n">client_secret</span><span class="p">)</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="n">rescode</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">getcode</span><span class="p">()</span>
<span class="k">if</span><span class="p">(</span><span class="n">rescode</span><span class="o">==</span><span class="mi">200</span><span class="p">):</span>
    <span class="n">response_body</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">read</span><span class="p">()</span>
    <span class="k">print</span><span class="p">(</span><span class="n">response_body</span><span class="p">.</span><span class="n">decode</span><span class="p">(</span><span class="s">'utf-8'</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Error Code:"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">rescode</span><span class="p">))</span>
    
<span class="o">=&gt;</span>

<span class="p">{</span>
	<span class="s">"lastBuildDate"</span><span class="p">:</span><span class="s">"Thu, 02 Feb 2023 18:25:40 +0900"</span><span class="p">,</span>
	<span class="s">"total"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span>
	<span class="s">"start"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span>
	<span class="s">"display"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span>
	<span class="s">"items"</span><span class="p">:[</span>
		<span class="p">{</span>
			<span class="s">"title"</span><span class="p">:</span><span class="s">"&lt;b&gt;파이썬&lt;\/b&gt; 앤 가드"</span><span class="p">,</span>
			<span class="s">"link"</span><span class="p">:</span><span class="s">"https:\/\/movie.naver.com\/movie\/bi\/mi\/basic.nhn?code=152070"</span><span class="p">,</span>
			<span class="s">"image"</span><span class="p">:</span><span class="s">"https:\/\/ssl.pstatic.net\/imgmovie\/mdi\/mit110\/1520\/152070_P01_145336.jpg"</span><span class="p">,</span>
			<span class="s">"subtitle"</span><span class="p">:</span><span class="s">"PYTHON AND GUARD"</span><span class="p">,</span>
			<span class="s">"pubDate"</span><span class="p">:</span><span class="s">"2015"</span><span class="p">,</span>
			<span class="s">"director"</span><span class="p">:</span><span class="s">"안톤 디아코프|"</span><span class="p">,</span>
			<span class="s">"actor"</span><span class="p">:</span><span class="s">""</span><span class="p">,</span>
			<span class="s">"userRating"</span><span class="p">:</span><span class="s">"0.00"</span>
		<span class="p">}</span>
        									<span class="p">.</span>
                                            <span class="p">.</span>
                                            <span class="p">.</span>
</code></pre></div></div>
<hr />

<h3 id="검색-카페cafearticle">Search: Cafe (cafearticle)</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Naver Search API example - cafe search
</span><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">urllib.parse</span>
<span class="kn">import</span> <span class="nn">urllib.request</span>
<span class="n">client_id</span> <span class="o">=</span> <span class="s">"E_N2j6ER9uWLIDb2BEEc"</span>
<span class="n">client_secret</span> <span class="o">=</span> <span class="s">"OwQUT_S108"</span>
<span class="n">encText</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">parse</span><span class="p">.</span><span class="n">quote</span><span class="p">(</span><span class="s">"파이썬"</span><span class="p">)</span>
<span class="n">url</span> <span class="o">=</span> <span class="s">"https://openapi.naver.com/v1/search/cafearticle?query="</span> <span class="o">+</span> <span class="n">encText</span> <span class="c1"># JSON result
# url = "https://openapi.naver.com/v1/search/cafearticle.xml?query=" + encText # XML result
</span><span class="n">request</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">Request</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Id"</span><span class="p">,</span><span class="n">client_id</span><span class="p">)</span>
<span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Secret"</span><span class="p">,</span><span class="n">client_secret</span><span class="p">)</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="n">rescode</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">getcode</span><span class="p">()</span>
<span class="k">if</span><span class="p">(</span><span class="n">rescode</span><span class="o">==</span><span class="mi">200</span><span class="p">):</span>
    <span class="n">response_body</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">read</span><span class="p">()</span>
    <span class="k">print</span><span class="p">(</span><span class="n">response_body</span><span class="p">.</span><span class="n">decode</span><span class="p">(</span><span class="s">'utf-8'</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Error Code:"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">rescode</span><span class="p">))</span>
    
<span class="o">=&gt;</span>

<span class="p">{</span>
	<span class="s">"lastBuildDate"</span><span class="p">:</span><span class="s">"Thu, 02 Feb 2023 18:25:41 +0900"</span><span class="p">,</span>
	<span class="s">"total"</span><span class="p">:</span><span class="mi">156167</span><span class="p">,</span>
	<span class="s">"start"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span>
	<span class="s">"display"</span><span class="p">:</span><span class="mi">10</span><span class="p">,</span>
	<span class="s">"items"</span><span class="p">:[</span>
		<span class="p">{</span>
			<span class="s">"title"</span><span class="p">:</span><span class="s">"&lt;b&gt;파이썬&lt;\/b&gt; vs C언어"</span><span class="p">,</span>
			<span class="s">"link"</span><span class="p">:</span><span class="s">"http:\/\/cafe.naver.com\/mathall\/2564974"</span><span class="p">,</span>
			<span class="s">"description"</span><span class="p">:</span><span class="s">"고수님들의 현명한 답변 기다립니다 &lt;b&gt;파이썬&lt;\/b&gt;을 가르친다는 학원은 저희 집에서 차로 15분 거리라서 제가... 제가 라이딩이 힘든 건 아니니 그건 감안하고 &lt;b&gt;파이썬&lt;\/b&gt; vs C언어 중 어느 학원을 선택해야 아이한테 도움이 될까요?"</span><span class="p">,</span>
			<span class="s">"cafename"</span><span class="p">:</span><span class="s">"[상위1%카페] 대한민국 상위1% 교육정..."</span><span class="p">,</span>
			<span class="s">"cafeurl"</span><span class="p">:</span><span class="s">"https:\/\/cafe.naver.com\/mathall"</span>
		<span class="p">},</span>
        								<span class="p">.</span>
                                        <span class="p">.</span>
                                        <span class="p">.</span>
</code></pre></div></div>
<hr />
<h3 id="검색-쇼핑shop">Search: Shopping (shop)</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Naver Search API example - shopping search
</span><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">urllib.parse</span>
<span class="kn">import</span> <span class="nn">urllib.request</span>
<span class="n">client_id</span> <span class="o">=</span> <span class="s">"E_N2j6ER9uWLIDb2BEEc"</span>
<span class="n">client_secret</span> <span class="o">=</span> <span class="s">"OwQUT_S108"</span>
<span class="n">encText</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">parse</span><span class="p">.</span><span class="n">quote</span><span class="p">(</span><span class="s">"파이썬"</span><span class="p">)</span>
<span class="n">url</span> <span class="o">=</span> <span class="s">"https://openapi.naver.com/v1/search/shop?query="</span> <span class="o">+</span> <span class="n">encText</span> <span class="c1"># JSON result
# url = "https://openapi.naver.com/v1/search/shop.xml?query=" + encText # XML result
</span><span class="n">request</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">Request</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Id"</span><span class="p">,</span><span class="n">client_id</span><span class="p">)</span>
<span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Secret"</span><span class="p">,</span><span class="n">client_secret</span><span class="p">)</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="n">rescode</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">getcode</span><span class="p">()</span>
<span class="k">if</span><span class="p">(</span><span class="n">rescode</span><span class="o">==</span><span class="mi">200</span><span class="p">):</span>
    <span class="n">response_body</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">read</span><span class="p">()</span>
    <span class="k">print</span><span class="p">(</span><span class="n">response_body</span><span class="p">.</span><span class="n">decode</span><span class="p">(</span><span class="s">'utf-8'</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Error Code:"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">rescode</span><span class="p">))</span>
    
<span class="o">=&gt;</span>

<span class="p">{</span>
	<span class="s">"lastBuildDate"</span><span class="p">:</span><span class="s">"Thu, 02 Feb 2023 18:25:41 +0900"</span><span class="p">,</span>
	<span class="s">"total"</span><span class="p">:</span><span class="mi">148244</span><span class="p">,</span>
	<span class="s">"start"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span>
	<span class="s">"display"</span><span class="p">:</span><span class="mi">10</span><span class="p">,</span>
	<span class="s">"items"</span><span class="p">:[</span>
		<span class="p">{</span>
			<span class="s">"title"</span><span class="p">:</span><span class="s">"잘모이 셀리나 리얼 &lt;b&gt;파이톤&lt;\/b&gt; 뉴 빅 토트백 ZA-4022"</span><span class="p">,</span>
			<span class="s">"link"</span><span class="p">:</span><span class="s">"https:\/\/search.shopping.naver.com\/gate.nhn?id=35683981989"</span><span class="p">,</span>
			<span class="s">"image"</span><span class="p">:</span><span class="s">"https:\/\/shopping-phinf.pstatic.net\/main_3568398\/35683981989.20221107081401.jpg"</span><span class="p">,</span>
			<span class="s">"lprice"</span><span class="p">:</span><span class="s">"165530"</span><span class="p">,</span>
			<span class="s">"hprice"</span><span class="p">:</span><span class="s">""</span><span class="p">,</span>
			<span class="s">"mallName"</span><span class="p">:</span><span class="s">"네이버"</span><span class="p">,</span>
			<span class="s">"productId"</span><span class="p">:</span><span class="s">"35683981989"</span><span class="p">,</span>
			<span class="s">"productType"</span><span class="p">:</span><span class="s">"1"</span><span class="p">,</span>
			<span class="s">"brand"</span><span class="p">:</span><span class="s">"잘모이"</span><span class="p">,</span>
			<span class="s">"maker"</span><span class="p">:</span><span class="s">""</span><span class="p">,</span>
			<span class="s">"category1"</span><span class="p">:</span><span class="s">"패션잡화"</span><span class="p">,</span>
			<span class="s">"category2"</span><span class="p">:</span><span class="s">"여성가방"</span><span class="p">,</span>
			<span class="s">"category3"</span><span class="p">:</span><span class="s">"토트백"</span><span class="p">,</span>
			<span class="s">"category4"</span><span class="p">:</span><span class="s">""</span>
		<span class="p">},</span>
        								<span class="p">.</span>
                                        <span class="p">.</span>
                                        <span class="p">.</span>
</code></pre></div></div>

<hr />

<h3 id="검색-백과사전encyc">Search: Encyclopedia (encyc)</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Naver Search API example - encyclopedia search
</span><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">urllib.request</span>
<span class="n">client_id</span> <span class="o">=</span> <span class="s">"E_N2j6ER9uWLIDb2BEEc"</span>
<span class="n">client_secret</span> <span class="o">=</span> <span class="s">"OwQUT_S108"</span>
<span class="n">encText</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">parse</span><span class="p">.</span><span class="n">quote</span><span class="p">(</span><span class="s">"파이썬"</span><span class="p">)</span>
<span class="n">url</span> <span class="o">=</span> <span class="s">"https://openapi.naver.com/v1/search/encyc?query="</span> <span class="o">+</span> <span class="n">encText</span> <span class="c1"># JSON result
# url = "https://openapi.naver.com/v1/search/blog.xml?query=" + encText # XML result
</span><span class="n">request</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">Request</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Id"</span><span class="p">,</span><span class="n">client_id</span><span class="p">)</span>
<span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Secret"</span><span class="p">,</span><span class="n">client_secret</span><span class="p">)</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="n">rescode</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">getcode</span><span class="p">()</span>
<span class="k">if</span><span class="p">(</span><span class="n">rescode</span><span class="o">==</span><span class="mi">200</span><span class="p">):</span>
    <span class="n">response_body</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">read</span><span class="p">()</span>
    <span class="k">print</span><span class="p">(</span><span class="n">response_body</span><span class="p">.</span><span class="n">decode</span><span class="p">(</span><span class="s">'utf-8'</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Error Code:"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">rescode</span><span class="p">))</span>
    
<span class="o">=&gt;</span>

<span class="p">{</span>
	<span class="s">"lastBuildDate"</span><span class="p">:</span><span class="s">"Thu, 02 Feb 2023 18:25:41 +0900"</span><span class="p">,</span>
	<span class="s">"total"</span><span class="p">:</span><span class="mi">522</span><span class="p">,</span>
	<span class="s">"start"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span>
	<span class="s">"display"</span><span class="p">:</span><span class="mi">10</span><span class="p">,</span>
	<span class="s">"items"</span><span class="p">:[</span>
		<span class="p">{</span>
			<span class="s">"title"</span><span class="p">:</span><span class="s">"&lt;b&gt;파이썬&lt;\/b&gt;"</span><span class="p">,</span>
			<span class="s">"link"</span><span class="p">:</span><span class="s">"https:\/\/terms.naver.com\/entry.naver?docId=3580815&amp;cid=59088&amp;categoryId=59096"</span><span class="p">,</span>
			<span class="s">"description"</span><span class="p">:</span><span class="s">"‘&lt;b&gt;파이썬&lt;\/b&gt;’이다. 간결한 문법으로 입문자가 이해하기 쉽고, 다양한 분야에 활용할 수 있기 때문이다. 이 외에도 &lt;b&gt;파이썬&lt;\/b&gt;은 머신러닝, 그래픽, 웹 개발 등 여러 업계에서 선호하는 언어로 꾸준히... "</span><span class="p">,</span>
			<span class="s">"thumbnail"</span><span class="p">:</span><span class="s">"http:\/\/openapi-dbscthumb.phinf.naver.net\/4749_000_1\/20170118193349632_0CHSSS5Y6.png\/01_16.png?type=m160_160"</span>
		<span class="p">},</span>
        						<span class="p">.</span>
                                <span class="p">.</span>
                                <span class="p">.</span>
</code></pre></div></div>
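<p>The <code>if(rescode==200)</code> check above rarely reaches its <code>else</code> branch, because <code>urllib.request.urlopen</code> raises <code>urllib.error.HTTPError</code> for error statuses such as 401 or 429 instead of returning a response. A minimal sketch of handling that case follows. The name <code>fetch_text</code> and its <code>opener</code> parameter are illustrative assumptions, not part of the original example.</p>

```python
import urllib.error
import urllib.request


def fetch_text(request, opener=urllib.request.urlopen):
    """Return the decoded body, or None when the server answers with an HTTP error."""
    try:
        # urlopen raises HTTPError for error statuses, so the failure path lives here
        with opener(request) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # err.code is an int, so format it instead of concatenating with "+"
        print(f"Error Code: {err.code}")
        return None
```

<p>Passing the opener as a parameter is only a convenience for trying the function without a network call. In normal use the default is enough.</p>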

<hr />

<h2 id="3-상품-검색">3. Product Search</h2>
<ul>
  <li>"몰스킨"</li>
</ul>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import os
import sys
import urllib.request
client_id = "E_N2j6ER9uWLIDb2BEEc"
client_secret = "OwQUT_S108"
encText = urllib.parse.quote("몰스킨")
url = "https://openapi.naver.com/v1/search/shop?query=" + encText # JSON result
# url = "https://openapi.naver.com/v1/search/blog.xml?query=" + encText # XML result
request = urllib.request.Request(url)
request.add_header("X-Naver-Client-Id",client_id)
request.add_header("X-Naver-Client-Secret",client_secret)
response = urllib.request.urlopen(request)
rescode = response.getcode()
if(rescode==200):
    response_body = response.read()
    print(response_body.decode('utf-8'))
else:
    print("Error Code:" + str(rescode))

=&gt;

{
	"lastBuildDate":"Thu, 02 Feb 2023 18:25:41 +0900",
	"total":43102,
	"start":1,
	"display":10,
	"items":[
		{
			"title":"&lt;b&gt;몰스킨&lt;\/b&gt; 2023 다이어리 위클리 소프트커버",
			"link":"https:\/\/search.shopping.naver.com\/gate.nhn?id=84662525433",
			"image":"https:\/\/shopping-phinf.pstatic.net\/main_8466252\/84662525433.2.jpg",
			"lprice":"22950",
			"hprice":"",
			"mallName":"베스트펜",
			"productId":"84662525433",
			"productType":"2",
			"brand":"몰스킨",
			"maker":"",
			"category1":"생활\/건강",
			"category2":"문구\/사무용품",
			"category3":"다이어리\/플래너",
			"category4":"다이어리"
		},
        						.
                                .
                                .
</code></pre></div></div>

<hr />

<h3 id="gen_search_url">gen_search_url()</h3>
<p><span style="color: #FAFAD2">encText = urllib.parse.quote("몰스킨")<br />
url = "https://openapi.naver.com/v1/search/shop?query=" + encText # JSON result</span></p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def gen_search_url(api_node, search_text, start_num, disp_num):
    base = "https://openapi.naver.com/v1/search"
    node = "/" + api_node + ".json"
    param_query = "?query=" + urllib.parse.quote(search_text)
    param_start = "&amp;start=" + str(start_num)
    param_disp = "&amp;display=" + str(disp_num)
    
    return base + node + param_query + param_start + param_disp
    
gen_search_url("shop", "TEST", 10, 3)

=&gt;

'https://openapi.naver.com/v1/search/shop.json?query=TEST&amp;start=10&amp;display=3'
</code></pre></div></div>
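<p>The query string assembled by hand in <code>gen_search_url()</code> could also be built with <code>urllib.parse.urlencode</code>, which percent-encodes each value and joins the pairs. The sketch below is one possible variant. The name <code>gen_search_url_encoded</code> is hypothetical and used only for illustration.</p>

```python
from urllib.parse import urlencode


def gen_search_url_encoded(api_node, search_text, start_num, disp_num):
    # Hypothetical variant of gen_search_url(); the name is an illustration only
    base = "https://openapi.naver.com/v1/search"
    # urlencode converts the values to strings and percent-encodes them
    params = urlencode({"query": search_text, "start": start_num, "display": disp_num})
    return f"{base}/{api_node}.json?{params}"


print(gen_search_url_encoded("shop", "TEST", 10, 3))
# https://openapi.naver.com/v1/search/shop.json?query=TEST&start=10&display=3
```

<p>One difference to keep in mind: by default <code>urlencode</code> encodes a space as <code>+</code>, while <code>urllib.parse.quote</code> produces <code>%20</code>.</p>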

<hr />

<h3 id="get_result_onpage">get_result_onpage()</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">datetime</span>

<span class="k">def</span> <span class="nf">get_result_onpage</span><span class="p">(</span><span class="n">url</span><span class="p">):</span>
    <span class="n">request</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">Request</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Id"</span><span class="p">,</span><span class="n">client_id</span><span class="p">)</span>
    <span class="n">request</span><span class="p">.</span><span class="n">add_header</span><span class="p">(</span><span class="s">"X-Naver-Client-Secret"</span><span class="p">,</span><span class="n">client_secret</span><span class="p">)</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="n">request</span><span class="p">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"[%s] Url Request Success"</span> <span class="o">%</span> <span class="n">datetime</span><span class="p">.</span><span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">())</span>
    <span class="k">return</span> <span class="n">json</span><span class="p">.</span><span class="n">loads</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="n">read</span><span class="p">().</span><span class="n">decode</span><span class="p">(</span><span class="s">"utf-8"</span><span class="p">))</span>
    
<span class="n">url</span> <span class="o">=</span> <span class="n">gen_search_url</span><span class="p">(</span><span class="s">"shop"</span><span class="p">,</span> <span class="s">"몰스킨"</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">one_result</span> <span class="o">=</span> <span class="n">get_result_onpage</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>

<span class="o">=&gt;</span>

<span class="p">[</span><span class="mi">2023</span><span class="o">-</span><span class="mi">02</span><span class="o">-</span><span class="mi">02</span> <span class="mi">18</span><span class="p">:</span><span class="mi">25</span><span class="p">:</span><span class="mf">42.249704</span><span class="p">]</span> <span class="n">Url</span> <span class="n">Request</span> <span class="n">Success</span>
</code></pre></div></div>
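<p><code>get_result_onpage()</code> reads <code>client_id</code> and <code>client_secret</code> from global variables and attaches them with <code>add_header</code>. As a small sketch, the same headers can be passed through the <code>headers</code> argument of <code>urllib.request.Request</code>. The helper name <code>build_request</code> and the placeholder credentials <code>"my-id"</code> and <code>"my-secret"</code> are assumptions made for this example.</p>

```python
import urllib.request


def build_request(url, client_id, client_secret):
    # Hypothetical helper: credentials are passed in rather than read from globals
    headers = {
        "X-Naver-Client-Id": client_id,
        "X-Naver-Client-Secret": client_secret,
    }
    return urllib.request.Request(url, headers=headers)


# Placeholder values; no network call is made here
req = build_request(
    "https://openapi.naver.com/v1/search/shop.json?query=TEST", "my-id", "my-secret"
)
print(req.get_method())
# GET
```

<p>Taking the credentials as arguments also makes it easier to load them from somewhere other than the notebook, so the real keys need not appear in a published post.</p>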

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">one_result</span>

<span class="o">=&gt;</span>

<span class="p">{</span><span class="s">'lastBuildDate'</span><span class="p">:</span> <span class="s">'Thu, 02 Feb 2023 18:25:42 +0900'</span><span class="p">,</span>
 <span class="s">'total'</span><span class="p">:</span> <span class="mi">43102</span><span class="p">,</span>
 <span class="s">'start'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
 <span class="s">'display'</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
 <span class="s">'items'</span><span class="p">:</span> <span class="p">[{</span><span class="s">'title'</span><span class="p">:</span> <span class="s">'&lt;b&gt;몰스킨&lt;/b&gt; 2023 다이어리 위클리 소프트커버'</span><span class="p">,</span>
   <span class="s">'link'</span><span class="p">:</span> <span class="s">'https://search.shopping.naver.com/gate.nhn?id=84662525433'</span><span class="p">,</span>
   <span class="s">'image'</span><span class="p">:</span> <span class="s">'https://shopping-phinf.pstatic.net/main_8466252/84662525433.2.jpg'</span><span class="p">,</span>
   <span class="s">'lprice'</span><span class="p">:</span> <span class="s">'22950'</span><span class="p">,</span>
   <span class="s">'hprice'</span><span class="p">:</span> <span class="s">''</span><span class="p">,</span>
   <span class="s">'mallName'</span><span class="p">:</span> <span class="s">'베스트펜'</span><span class="p">,</span>
   <span class="s">'productId'</span><span class="p">:</span> <span class="s">'84662525433'</span><span class="p">,</span>
   <span class="s">'productType'</span><span class="p">:</span> <span class="s">'2'</span><span class="p">,</span>
   <span class="s">'brand'</span><span class="p">:</span> <span class="s">'몰스킨'</span><span class="p">,</span>
   <span class="s">'maker'</span><span class="p">:</span> <span class="s">''</span><span class="p">,</span>
   <span class="s">'category1'</span><span class="p">:</span> <span class="s">'생활/건강'</span><span class="p">,</span>
   <span class="s">'category2'</span><span class="p">:</span> <span class="s">'문구/사무용품'</span><span class="p">,</span>
   <span class="s">'category3'</span><span class="p">:</span> <span class="s">'다이어리/플래너'</span><span class="p">,</span>
   <span class="s">'category4'</span><span class="p">:</span> <span class="s">'다이어리'</span><span class="p">},</span>
   								<span class="p">.</span>
                                <span class="p">.</span>
                                <span class="p">.</span>
</code></pre></div></div>
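<p>The raw response body printed earlier shows links as <code>https:\/\/...</code>, while <code>one_result</code> shows plain <code>https://...</code>. The difference comes from <code>json.loads</code>: <code>\/</code> is a valid JSON escape for <code>/</code>, and parsing resolves it. The short sketch below uses one field copied from the output above.</p>

```python
import json

# One field copied from the raw response body (a raw string keeps the backslashes)
raw = r'{"link": "https:\/\/search.shopping.naver.com\/gate.nhn?id=84662525433"}'

parsed = json.loads(raw)
print(parsed["link"])
# https://search.shopping.naver.com/gate.nhn?id=84662525433
```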

<hr />

<h3 id="get_fields">get_fields()</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>

<span class="k">def</span> <span class="nf">get_fields</span><span class="p">(</span><span class="n">json_data</span><span class="p">):</span>
    <span class="n">title</span> <span class="o">=</span> <span class="p">[</span><span class="n">each</span><span class="p">[</span><span class="s">"title"</span><span class="p">]</span> <span class="k">for</span> <span class="n">each</span> <span class="ow">in</span> <span class="n">json_data</span><span class="p">[</span><span class="s">"items"</span><span class="p">]]</span>
    <span class="n">link</span> <span class="o">=</span> <span class="p">[</span><span class="n">each</span><span class="p">[</span><span class="s">"link"</span><span class="p">]</span> <span class="k">for</span> <span class="n">each</span> <span class="ow">in</span> <span class="n">json_data</span><span class="p">[</span><span class="s">"items"</span><span class="p">]]</span>
    <span class="n">lprice</span> <span class="o">=</span> <span class="p">[</span><span class="n">each</span><span class="p">[</span><span class="s">"lprice"</span><span class="p">]</span> <span class="k">for</span> <span class="n">each</span> <span class="ow">in</span> <span class="n">json_data</span><span class="p">[</span><span class="s">"items"</span><span class="p">]]</span>
    <span class="n">mall_name</span> <span class="o">=</span> <span class="p">[</span><span class="n">each</span><span class="p">[</span><span class="s">"mallName"</span><span class="p">]</span> <span class="k">for</span> <span class="n">each</span> <span class="ow">in</span> <span class="n">json_data</span><span class="p">[</span><span class="s">"items"</span><span class="p">]]</span>
    
    <span class="n">result_pd</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span>
        <span class="s">"title"</span><span class="p">:</span> <span class="n">title</span><span class="p">,</span>
        <span class="s">"link"</span><span class="p">:</span> <span class="n">link</span><span class="p">,</span>
        <span class="s">"lprice"</span><span class="p">:</span> <span class="n">lprice</span><span class="p">,</span>
        <span class="s">"mall"</span><span class="p">:</span> <span class="n">mall_name</span><span class="p">,</span>
    <span class="p">},</span> <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s">"title"</span><span class="p">,</span> <span class="s">"lprice"</span><span class="p">,</span> <span class="s">"link"</span><span class="p">,</span> <span class="s">"mall"</span><span class="p">])</span>
    <span class="k">return</span> <span class="n">result_pd</span>
    
<span class="n">get_fields</span><span class="p">(</span><span class="n">one_result</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>title</th>
      <th>lprice</th>
      <th>link</th>
      <th>mall</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>&lt;b&gt;몰스킨&lt;/b&gt; 2023 다이어리 위클리 소프트커버</td>
      <td>22950</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>베스트펜</td>
    </tr>
    <tr>
      <th>1</th>
      <td>&lt;b&gt;몰스킨&lt;/b&gt; 2023 데일리 12개월 다이어리 L</td>
      <td>38030</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>네이버</td>
    </tr>
    <tr>
      <th>2</th>
      <td>&lt;b&gt;몰스킨&lt;/b&gt; 노트 가죽 하드커버 감성 고급 업무용 이쁜 심플</td>
      <td>24000</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>베스트펜</td>
    </tr>
    <tr>
      <th>3</th>
      <td>&lt;b&gt;몰스킨&lt;/b&gt; 2023다이어리 데일리 하드커버 라지블루 다이어리노트</td>
      <td>31000</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>네이버</td>
    </tr>
    <tr>
      <th>4</th>
      <td>[&lt;b&gt;몰스킨&lt;/b&gt;] 2023년 클래식 다이어리(12개월) (데일리, 위클리, 먼슬리)</td>
      <td>27000</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>몰스킨공식온라인스토어</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="delete_tag">delete_tag()</h3>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">delete_tag</span><span class="p">(</span><span class="n">input_str</span><span class="p">):</span>
    <span class="n">input_str</span> <span class="o">=</span> <span class="n">input_str</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">"&lt;b&gt;"</span><span class="p">,</span> <span class="s">""</span><span class="p">)</span>
    <span class="n">input_str</span> <span class="o">=</span> <span class="n">input_str</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">"&lt;/b&gt;"</span><span class="p">,</span> <span class="s">""</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">input_str</span>
    
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>

<span class="k">def</span> <span class="nf">get_fields</span><span class="p">(</span><span class="n">json_data</span><span class="p">):</span>
    <span class="n">title</span> <span class="o">=</span> <span class="p">[</span><span class="n">delete_tag</span><span class="p">(</span><span class="n">each</span><span class="p">[</span><span class="s">"title"</span><span class="p">])</span> <span class="k">for</span> <span class="n">each</span> <span class="ow">in</span> <span class="n">json_data</span><span class="p">[</span><span class="s">"items"</span><span class="p">]]</span>
    <span class="n">link</span> <span class="o">=</span> <span class="p">[</span><span class="n">each</span><span class="p">[</span><span class="s">"link"</span><span class="p">]</span> <span class="k">for</span> <span class="n">each</span> <span class="ow">in</span> <span class="n">json_data</span><span class="p">[</span><span class="s">"items"</span><span class="p">]]</span>
    <span class="n">lprice</span> <span class="o">=</span> <span class="p">[</span><span class="n">each</span><span class="p">[</span><span class="s">"lprice"</span><span class="p">]</span> <span class="k">for</span> <span class="n">each</span> <span class="ow">in</span> <span class="n">json_data</span><span class="p">[</span><span class="s">"items"</span><span class="p">]]</span>
    <span class="n">mall_name</span> <span class="o">=</span> <span class="p">[</span><span class="n">each</span><span class="p">[</span><span class="s">"mallName"</span><span class="p">]</span> <span class="k">for</span> <span class="n">each</span> <span class="ow">in</span> <span class="n">json_data</span><span class="p">[</span><span class="s">"items"</span><span class="p">]]</span>
    
    <span class="n">result_pd</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span>
        <span class="s">"title"</span><span class="p">:</span> <span class="n">title</span><span class="p">,</span>
        <span class="s">"link"</span><span class="p">:</span> <span class="n">link</span><span class="p">,</span>
        <span class="s">"lprice"</span><span class="p">:</span> <span class="n">lprice</span><span class="p">,</span>
        <span class="s">"mall"</span><span class="p">:</span> <span class="n">mall_name</span><span class="p">,</span>
    <span class="p">},</span> <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s">"title"</span><span class="p">,</span> <span class="s">"lprice"</span><span class="p">,</span> <span class="s">"link"</span><span class="p">,</span> <span class="s">"mall"</span><span class="p">])</span>
    <span class="k">return</span> <span class="n">result_pd</span>
    
<span class="n">get_fields</span><span class="p">(</span><span class="n">one_result</span><span class="p">)</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>title</th>
      <th>lprice</th>
      <th>link</th>
      <th>mall</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>몰스킨 2023 다이어리 위클리 소프트커버</td>
      <td>22950</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>베스트펜</td>
    </tr>
    <tr>
      <th>1</th>
      <td>몰스킨 2023 데일리 12개월 다이어리 L</td>
      <td>38030</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>네이버</td>
    </tr>
    <tr>
      <th>2</th>
      <td>몰스킨 노트 가죽 하드커버 감성 고급 업무용 이쁜 심플</td>
      <td>24000</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>베스트펜</td>
    </tr>
    <tr>
      <th>3</th>
      <td>몰스킨 2023다이어리 데일리 하드커버 라지블루 다이어리노트</td>
      <td>31000</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>네이버</td>
    </tr>
    <tr>
      <th>4</th>
      <td>[몰스킨] 2023년 클래식 다이어리(12개월) (데일리, 위클리, 먼슬리)</td>
      <td>27000</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>몰스킨공식온라인스토어</td>
    </tr>
  </tbody>
</table>
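<p><code>delete_tag()</code> removes exactly two strings, the opening and closing bold tags. If other markup or HTML entities ever show up in a title (an assumption, since the sample output only contains bold tags), a regular expression plus <code>html.unescape</code> is one way to generalize it. The name <code>strip_markup</code> is hypothetical.</p>

```python
import html
import re

# Matches anything shaped like a tag, for example <b> or </b>
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_markup(text):
    # Hypothetical helper: drop tag-like spans, then decode entities such as &amp;
    return html.unescape(TAG_PATTERN.sub("", text))


print(strip_markup("<b>몰스킨</b> 2023 다이어리 위클리 소프트커버"))
# 몰스킨 2023 다이어리 위클리 소프트커버
```

<p>The pattern is deliberately simple and would also eat text that merely looks like a tag, so the two explicit <code>replace</code> calls remain the safer choice when only bold tags are expected.</p>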

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">url</span> <span class="o">=</span> <span class="n">gen_search_url</span><span class="p">(</span><span class="s">"shop"</span><span class="p">,</span> <span class="s">"몰스킨"</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">json_result</span> <span class="o">=</span> <span class="n">get_result_onpage</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">pd_result</span> <span class="o">=</span> <span class="n">get_fields</span><span class="p">(</span><span class="n">json_result</span><span class="p">)</span>

<span class="o">=&gt;</span>

<span class="p">[</span><span class="mi">2023</span><span class="o">-</span><span class="mi">02</span><span class="o">-</span><span class="mi">02</span> <span class="mi">18</span><span class="p">:</span><span class="mi">48</span><span class="p">:</span><span class="mf">57.156929</span><span class="p">]</span> <span class="n">Url</span> <span class="n">Request</span> <span class="n">Success</span>
</code></pre></div></div>

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pd_result</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>title</th>
      <th>lprice</th>
      <th>link</th>
      <th>mall</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>몰스킨 2023 다이어리 위클리 소프트커버</td>
      <td>22950</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>베스트펜</td>
    </tr>
    <tr>
      <th>1</th>
      <td>몰스킨 2023 데일리 12개월 다이어리 L</td>
      <td>38030</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>네이버</td>
    </tr>
    <tr>
      <th>2</th>
      <td>몰스킨 노트 가죽 하드커버 감성 고급 업무용 이쁜 심플</td>
      <td>24000</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>베스트펜</td>
    </tr>
    <tr>
      <th>3</th>
      <td>몰스킨 2023다이어리 데일리 하드커버 라지블루 다이어리노트</td>
      <td>31000</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>네이버</td>
    </tr>
    <tr>
      <th>4</th>
      <td>[몰스킨] 2023년 클래식 다이어리(12개월) (데일리, 위클리, 먼슬리)</td>
      <td>27000</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>몰스킨공식온라인스토어</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="actmain">actMain()</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">result_mol</span> <span class="o">=</span> <span class="p">[]</span>

<span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1000</span><span class="p">,</span> <span class="mi">100</span><span class="p">):</span>
    <span class="n">url</span> <span class="o">=</span> <span class="n">gen_search_url</span><span class="p">(</span><span class="s">"shop"</span><span class="p">,</span> <span class="s">"몰스킨"</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
    <span class="n">json_result</span> <span class="o">=</span> <span class="n">get_result_onpage</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="n">pd_result</span> <span class="o">=</span> <span class="n">get_fields</span><span class="p">(</span><span class="n">json_result</span><span class="p">)</span>

    <span class="n">result_mol</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">pd_result</span><span class="p">)</span>

<span class="n">result_mol</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">concat</span><span class="p">(</span><span class="n">result_mol</span><span class="p">)</span>

<span class="n">result_mol</span><span class="p">.</span><span class="n">info</span><span class="p">()</span>

<span class="o">=&gt;</span>

<span class="o">&lt;</span><span class="k">class</span> <span class="err">'</span><span class="nc">pandas</span><span class="p">.</span><span class="n">core</span><span class="p">.</span><span class="n">frame</span><span class="p">.</span><span class="n">DataFrame</span><span class="s">'&gt;
Int64Index: 1000 entries, 0 to 99
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   title   1000 non-null   object
 1   lprice  1000 non-null   object
 2   link    1000 non-null   object
 3   mall    1000 non-null   object
dtypes: object(4)
memory usage: 39.1+ KB
</span></code></pre></div></div>
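<p>The loop above walks the result set with <code>range(1, 1000, 100)</code>. The sketch below spells out which <code>start</code> values that produces and how many rows they add up to. The constant names are illustrative, and the upper bound of 1000 for <code>start</code> is an assumption about the API limit rather than something stated in this post.</p>

```python
DISPLAY = 100      # rows requested per call, as in the loop above
LAST_START = 1000  # assumption: start values above 1000 are not accepted

# Each value is the 1-based position of the first row on a page
starts = list(range(1, LAST_START, DISPLAY))
print(starts)
# [1, 101, 201, 301, 401, 501, 601, 701, 801, 901]
print(len(starts) * DISPLAY)
# 1000
```

<p>Ten pages of 100 rows match the 1000 entries reported by <code>info()</code>. Each page's DataFrame keeps its own 0 to 99 index, which is why the concatenated index reads "0 to 99" until it is reset.</p>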

<hr />
<h4 id="인덱스-재정렬-object---float">Reset the index, object -&gt; float</h4>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">result_mol</span><span class="p">.</span><span class="n">reset_index</span><span class="p">(</span><span class="n">drop</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">result_mol</span><span class="p">[</span><span class="s">"lprice"</span><span class="p">]</span> <span class="o">=</span> <span class="n">result_mol</span><span class="p">[</span><span class="s">"lprice"</span><span class="p">].</span><span class="n">astype</span><span class="p">(</span><span class="s">"float"</span><span class="p">)</span>
<span class="n">result_mol</span><span class="p">.</span><span class="n">info</span><span class="p">()</span>

<span class="o">=&gt;</span>

<span class="o">&lt;</span><span class="k">class</span> <span class="err">'</span><span class="nc">pandas</span><span class="p">.</span><span class="n">core</span><span class="p">.</span><span class="n">frame</span><span class="p">.</span><span class="n">DataFrame</span><span class="s">'&gt;
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   title   1000 non-null   object 
 1   lprice  1000 non-null   float64
 2   link    1000 non-null   object 
 3   mall    1000 non-null   object 
dtypes: float64(1), object(3)
memory usage: 31.4+ KB
</span></code></pre></div></div>
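<p>The conversion with <code>astype("float")</code> works here because every <code>lprice</code> value is a numeric string. Other price fields can be empty, as <code>hprice</code> is in the sample output, and an empty string cannot be converted to a float. A minimal sketch of a tolerant conversion follows. The helper <code>to_price</code> is hypothetical.</p>

```python
def to_price(value):
    # Hypothetical helper: empty strings such as the sample's "hprice" become None
    return float(value) if value else None


print(to_price("22950"))
# 22950.0
print(to_price(""))
# None
```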

<hr />

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">result_mol</span><span class="p">.</span><span class="n">tail</span><span class="p">()</span>
</code></pre></div></div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align:right">
      <th></th>
      <th>title</th>
      <th>lprice</th>
      <th>link</th>
      <th>mall</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>995</th>
      <td>몰스킨 까이에 룰드 라지 사이즈 옵션1</td>
      <td>20000</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>네이버</td>
    </tr>
    <tr>
      <th>996</th>
      <td>갤럭시 갤럭시 몰스킨 다잉 팬츠 GA1821U22P</td>
      <td>116630</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>네이버</td>
    </tr>
    <tr>
      <th>997</th>
      <td>몰스킨 Moleskine 클래식 2023 데일리 플래너 하드 커버 라지 12 x 스칼렛</td>
      <td>29800</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>네이버</td>
    </tr>
    <tr>
      <th>998</th>
      <td>몰스킨 기프트박스 트래블 - Travel Journal+Luggage Tags</td>
      <td>73700</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>몰스킨스토어</td>
    </tr>
    <tr>
      <th>999</th>
      <td>올젠 몰스킨 워싱 스트레치 자켓 편한 착장 고급스러운 연출 ZOC3KG1312</td>
      <td>230910</td>
      <td>https://search.shopping.naver.com/gate.nhn?id=...</td>
      <td>네이버</td>
    </tr>
  </tbody>
</table>

<hr />

<h3 id="to_excel">to_excel()</h3>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">writer</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">ExcelWriter</span><span class="p">(</span><span class="s">"../data/06_molskin_diary_in_naver_shop.xlsx"</span><span class="p">,</span> <span class="n">engine</span><span class="o">=</span><span class="s">"xlsxwriter"</span><span class="p">)</span>
<span class="n">result_mol</span><span class="p">.</span><span class="n">to_excel</span><span class="p">(</span><span class="n">writer</span><span class="p">,</span> <span class="n">sheet_name</span><span class="o">=</span><span class="s">"Sheet1"</span><span class="p">)</span>

<span class="n">workbook</span> <span class="o">=</span> <span class="n">writer</span><span class="p">.</span><span class="n">book</span>
<span class="n">worksheet</span> <span class="o">=</span> <span class="n">writer</span><span class="p">.</span><span class="n">sheets</span><span class="p">[</span><span class="s">"Sheet1"</span><span class="p">]</span>
<span class="n">worksheet</span><span class="p">.</span><span class="n">set_column</span><span class="p">(</span><span class="s">"A:A"</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
<span class="n">worksheet</span><span class="p">.</span><span class="n">set_column</span><span class="p">(</span><span class="s">"B:B"</span><span class="p">,</span> <span class="mi">80</span><span class="p">)</span>
<span class="n">worksheet</span><span class="p">.</span><span class="n">set_column</span><span class="p">(</span><span class="s">"C:C"</span><span class="p">,</span> <span class="mi">7</span><span class="p">)</span>
<span class="n">worksheet</span><span class="p">.</span><span class="n">set_column</span><span class="p">(</span><span class="s">"D:D"</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
<span class="n">worksheet</span><span class="p">.</span><span class="n">set_column</span><span class="p">(</span><span class="s">"E:E"</span><span class="p">,</span> <span class="mi">40</span><span class="p">)</span>
<span class="n">worksheet</span><span class="p">.</span><span class="n">set_column</span><span class="p">(</span><span class="s">"F:F"</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>

<span class="n">worksheet</span><span class="p">.</span><span class="n">conditional_format</span><span class="p">(</span><span class="s">"C2:C1001"</span><span class="p">,</span> <span class="p">{</span><span class="s">"type"</span><span class="p">:</span> <span class="s">"3_color_scale"</span><span class="p">})</span>
<span class="n">writer</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/18b6cf77-5d17-439b-8ffe-bbd4e7b61eee/image.png" alt="" /></p>
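<p>The widths passed to <code>set_column</code> above are hard-coded. A rough sketch of deriving them from the data instead, assuming a width of "longest text plus a little padding" is acceptable; the helper <code>column_widths</code> and the <code>sample</code> frame are hypothetical and not part of the original notebook.</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

# Hypothetical sample standing in for result_mol.
sample = pd.DataFrame({
    "title": ["Moleskine Classic Notebook", "Planner"],
    "lprice": [20000.0, 29800.0],
    "mall": ["Naver", "Moleskine Store"],
})

def column_widths(df, padding=2):
    """Return {column name: width}, where width is the longest cell or header text plus padding."""
    widths = {}
    for name in df.columns:
        longest_cell = int(df[name].astype(str).str.len().max())
        widths[name] = max(longest_cell, len(str(name))) + padding
    return widths

widths = column_widths(sample)
print(widths)
</code></pre></div></div>

<p>Each width could then be applied with <code>worksheet.set_column(col, col, width)</code> using zero-based column numbers, keeping in mind that <code>to_excel</code> writes the index into the first column by default.</p>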

<hr />

<h3 id="시각화">Visualization</h3>
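<p>The count plot in this section orders its bars with <code>value_counts().index</code>. As a small sketch of what that index contains, using a hypothetical <code>malls</code> series rather than the real <code>result_mol["mall"]</code> column:</p>

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

# Hypothetical sample standing in for result_mol["mall"].
malls = pd.Series(["Naver", "Moleskine Store", "Naver", "Naver", "Moleskine Store", "Other"])

# value_counts() sorts by frequency in descending order,
# so its index can be used directly as the bar order.
counts = malls.value_counts()
order = list(counts.index)
print(order)
</code></pre></div></div>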

<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">set_matplotlib_hangul</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>

<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">sns</span><span class="p">.</span><span class="n">countplot</span><span class="p">(</span>
    <span class="n">x</span><span class="o">=</span><span class="s">"mall"</span><span class="p">,</span>
    <span class="n">data</span><span class="o">=</span><span class="n">result_mol</span><span class="p">,</span> 
    <span class="n">palette</span><span class="o">=</span><span class="s">"RdYlGn"</span><span class="p">,</span>
    <span class="n">order</span><span class="o">=</span><span class="n">result_mol</span><span class="p">[</span><span class="s">"mall"</span><span class="p">].</span><span class="n">value_counts</span><span class="p">().</span><span class="n">index</span>
<span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">rotation</span><span class="o">=</span><span class="mi">90</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://velog.velcdn.com/images/yy2hi/post/125d018e-9fdf-4a9f-8aa9-6dc4b5d18adc/image.png" alt="" /></p>]]></content><author><name>yy2-hi</name></author><category term="DataAnalysis" /><summary type="html"><![CDATA[1. Registering for the Naver API: Naver Developers https://developers.naver.com/main/ Application: register an application, application name, API to use: Search, add environment: WEB, setting: http://localhost Client ID : ** Client Secret: ** https://developers.naver.com/apps/#/myapps/E_N2j6ER9uWLIDb2BEEc/overview 2. Using the Naver Search API urllib: a module for handling server requests/responses over the HTTP protocol urllib.request: a module that handles client requests urllib.parse: parsing of URL addresses Development guide https://developers.naver.com/docs/serviceapi/search/blog/blog.md#python]]></summary></entry></feed>