[Big Data Analysis Engineer] Practical Exam Preparation (1)

RainIron 2021. 6. 11. 14:24

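※ Example Task Type 1

Convert the qsec column of the mtcars dataset (data/mtcars.csv) with min-max scaling, then find the number of records whose scaled value is greater than 0.5.

* Dataset location: data/mtcars.csv

# Use print() to display output
# e.g. print(df.head())

# Answer submission example
# print(record count)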

import pandas as pd
import numpy as np

df = pd.read_csv('data/mtcars.csv')
max_value = df['qsec'].max()
min_value = df['qsec'].min()

# Min-max scaling: (x - min) / (max - min) maps qsec onto [0, 1]
scaled = (df['qsec'] - min_value) / (max_value - min_value)
# Count the records whose scaled value exceeds 0.5
print(scaled[scaled > 0.5].count())
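
As a cross-check on the manual formula, the same scaling can be done with scikit-learn's MinMaxScaler. The following is a minimal sketch on hypothetical toy values (not the real mtcars data), assuming scikit-learn is available in the exam environment; both approaches should give the same count.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy values standing in for the qsec column
df = pd.DataFrame({'qsec': [14.5, 16.0, 17.5, 19.0, 22.9]})

# Manual min-max scaling: (x - min) / (max - min)
qsec_min, qsec_max = df['qsec'].min(), df['qsec'].max()
manual = (df['qsec'] - qsec_min) / (qsec_max - qsec_min)

# Same transformation via scikit-learn; fit_transform expects 2-D input
scaled = MinMaxScaler().fit_transform(df[['qsec']]).ravel()

# Count the values above 0.5 with each approach
count_manual = int((manual > 0.5).sum())
count_sklearn = int((scaled > 0.5).sum())
print(count_manual, count_sklearn)
```

Summing a boolean mask counts the True entries, which is equivalent to filtering and then calling count().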

※ Example Task Type 2

1. Import

# Use print() to display output
# e.g. print(df.head())

# Answer submission example
# Create <exam number>.csv
# DataFrame.to_csv("0000.csv", index=False)

import pandas as pd
import sklearn

from sklearn.preprocessing import LabelEncoder

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Evaluation metric library
from sklearn.metrics import roc_auc_score

2. Make DataFrame & Data Preprocessing

X_train_path = 'data/X_train.csv'
X_test_path = 'data/X_test.csv'
y_train_path = 'data/y_train.csv'

# Submit code that creates the <exam number>.csv file.
y_test_path = 'data/3049285.csv'

# DataFrame Read 
x_train = pd.read_csv(X_train_path)
x_train = x_train.drop(['cust_id'], axis = 1)

x_test = pd.read_csv(X_test_path)
x_test_id = x_test.iloc[:, 0]
x_test = x_test.drop(['cust_id'], axis = 1)

y_train = pd.read_csv(y_train_path)
y_train = y_train.iloc[:, -1]

# ======================
# Data Preprocessing(1): encode the columns whose values are Korean text
# '주구매상품' (main product purchased), '주구매지점' (main store) => use LabelEncoder
# LabelEncoder maps each category to an integer starting from 0. ex) a - 0, b - 1, c - 2, d - 3
# Fit one encoder per column on train + test combined so both sets share the same mapping
for col in ['주구매상품', '주구매지점']:
    encoder = LabelEncoder().fit(pd.concat([x_train[col], x_test[col]]))
    x_train[col] = encoder.transform(x_train[col])
    x_test[col] = encoder.transform(x_test[col])

# =======================
# Data Preprocessing(2): handle missing values
# Use fillna(); a missing '환불금액' (refund amount) is filled with 0
x_train.loc[:, '환불금액'] = x_train.loc[:, '환불금액'].fillna(0)
x_test.loc[:, '환불금액'] = x_test.loc[:, '환불금액'].fillna(0)

# ==== Preprocessing complete ====
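
One point worth checking in the preprocessing step is that the train and test sets end up with the same category-to-integer mapping. The sketch below uses hypothetical toy data (the column names 'store' and 'refund' are stand-ins, not the real columns) to show how fitting a separate LabelEncoder on each set can assign different codes to the same category, and one way to avoid that by fitting on both sets together. It also counts missing values before filling them.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; 'store' and 'refund' stand in for the real columns
train = pd.DataFrame({'store': ['a', 'b', 'c'], 'refund': [100.0, np.nan, 30.0]})
test = pd.DataFrame({'store': ['b', 'c'], 'refund': [np.nan, 50.0]})

# Fitting a separate encoder on the test set yields a different mapping
separate_test = LabelEncoder().fit_transform(test['store'])  # b -> 0, c -> 1

# Fitting once on train + test keeps the codes consistent across both sets
encoder = LabelEncoder().fit(pd.concat([train['store'], test['store']]))
train['store'] = encoder.transform(train['store'])  # a -> 0, b -> 1, c -> 2
test['store'] = encoder.transform(test['store'])    # b -> 1, c -> 2

# Count missing values per column, then fill missing refunds with 0
missing_before = train.isnull().sum()
train['refund'] = train['refund'].fillna(0)
test['refund'] = test['refund'].fillna(0)

print(list(separate_test), list(test['store']))
print(missing_before['refund'], train['refund'].isnull().sum())
```

With separate encoders, 'b' would be 1 in the training set but 0 in the test set, so a model trained on one mapping would read the test rows incorrectly.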

3. Logistic Regression

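# 1. LogisticRegression()
model1 = LogisticRegression()
model1.fit(x_train, y_train)
# Note: both scores below are computed on the training data, so they are optimistic
print('Logistic Regression score: ', model1.score(x_train, y_train))

# The last column of predict_proba holds the predicted probability of the last class (label 1)
result_df = pd.DataFrame(model1.predict_proba(x_test))

final_df = pd.concat([x_test_id, result_df.iloc[:, -1]], axis = 1)
final_df.rename(columns = {1: 'gender'}, inplace = True)
print('Logistic Regression roc score: ', roc_auc_score(y_train, model1.predict_proba(x_train)[:, 1]))

# Write the submission file to the <exam number>.csv path defined above
final_df.to_csv(y_test_path, index = False)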
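
Because y_test is not provided, the ROC AUC above can only be measured on the training data. One common way to get a less optimistic estimate is to hold out part of the labelled data for validation. The following is a minimal sketch on synthetic data from make_classification (an assumption for illustration, not the exam data); the split ratio and max_iter value are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (an assumption, not the exam data)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out 20% of the labelled rows for validation
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_tr, y_tr)

# predict_proba returns one column per class; column 1 is P(class == 1)
train_auc = roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1])
val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print('train AUC:', train_auc, 'validation AUC:', val_auc)
```

The validation AUC is the better guide when comparing models or preprocessing choices, since the training AUC rewards overfitting.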
