3주차

CTE1. 고객별의 첫 구매와 마지막 구매 날짜 등을 메타로 만든 테이블

CTE2. 고객별 주문 요약한 테이블

CTE3. CLV, 고객 충성도, 세그먼트 관련해서 필요

CTE4. 재구매율, 장바구니, 신규고객 등등

with first_last_orders as (
	select
		customerNumber,
        min(orderDate) as first_order_date,
        max(orderDate) as last_order_date
    from orders
    group by customerNumber
),
purchase_summary as (
	select
		o.customerNumber,
        count(o.orderNumber) as total_orders,
        sum(od.priceEach * od.quantityOrdered) as total_sales,
        avg(od.priceEach * od.quantityOrdered) as AOV,
        count(o.orderNumber) / nullif(timestampdiff(Month, Min(o.orderDate), curdate()),0) as purcahse_frequency 
    from orders as o
    join orderdetails as od on o.orderNumber = od.orderNumber
    group by o.customerNumber
),
customer_loyalty as (
	select
		o.customerNumber,
    case when count(o.orderNumber) >1 then 1 else 0 end repeat_customer,-- 재구매
		sum(od.quantityOrdered * od.priceEach) * count(o.orderDate) / count(distinct year(o.orderDate)) as CLV,-- 실제 구매한 금액 고객별로 존재 월단위 특정 기간 단위 간단하게 나눠서 고객생애가치 
		case when max(o.orderDate) >= date_sub(curdate(), interval 6 month) then 1 else 0 end active_last_6_months -- 활성화 고객 여부
	from orders o 
	join orderdetails as od on o.orderNumber = od.orderNumber
	group by o.customerNumber
),
new_customers as (
	select
    o.customerNumber,
    case when min(o.orderDate) >= date_sub(curdate(), interval 6 month) then 1 else 0 end new_customer,-- 최근 6개월 내에 첫 구매 했느냐?!
    count(o.orderNumber)> 1 as repeat_purchase_rate,
    sum(od.quantityOrdered) / count(o.orderNumber) as avg_basket_size
  from orders as o 
	join orderdetails as od
	group by o.customerNumber
)
SQL
복사

•

nullif(timestampdiff(month, min(orderDate), curdate()),0)

→ 첫 주문날짜와 현재 날짜 사이의 월 단위 차이를 계산

→ nullif(계산한 값, 0) 으로, 위의 월 단위 차이가 0이면 즉, 첫 주문이 이번 달에 이루어진 경우는 Null 반환 → 0으로 나눴을 때 발생하는 오류를 방지하기 위해 진행

•

CLV(고객 생애 가치)

◦

고객 총 주문금액(매출) X 고객 총 주문 횟수 X 고객이 주문한 고유 연도 수

◦

select
		c.customerNumber,
    c.customerName,
    c.country,
    flo.first_order_date,
    flo.last_order_date,
    nc.avg_basket_size
from customers c
left join first_last_orders flo on c.customerNumber = flo.customerNumber
left join purchase_summary ps on c.customerNumber = ps.customerNumber
left join customer_loyalty cl on c.customerNumber = cl.customerNumber
left join new_customers nc on c.customerNumber = nc.customerNumber;
SQL
복사

파이썬에서 GA 데이터셋 다루기

GA 데이터셋(csv) 을 파이썬으로 읽어들이면 아래와 같이 불러와진다.

import pandas as pd

df = pd.read_csv('ga.csv')
df
SQL
복사

struct 구조 이해하기

GA 데이터셋에 있는 구조인데,

•

key : value 형태의 여러 쌍을 포함한다. json 기준 = {”browser” : ”Firefox”, “OS” : “Android”}

array 구조 이해하기

여러 개의 값을 포함하는 리스트 형태의 데이터, 단순 값으로도 들어갈 수 있지만, struct 안으로 들어갈 수 도 있음

ex. ‘items’ : [{’id’ : 111, {’id’ : 112}}]

struct나 array나 json 형태로 구성되어있기 때문에 이 구조를 풀려면
json 라이브러리를 활용해야 한다. 

## 예를 들어, 아래와 같은 ga 데이터셋 예시가 있다고 하자
data = {
		'user_id':[1,2],
		'device':[
				{'browser':'Chrome', 'os':'Windows'}, # struct
				{'browser':'Safari', 'os':'MacOS'},
		],
		'interests':[
				['sports','music'], # array
				['movie','travel']
		],
		'purchase':[
				[{'item':'apple', 'price':1.5}, {'item':'watch', 'price':1.0}], # array of struct
				[{'item':'banana', 'price':1.3}]
				
		]

}
Python
복사

df = pd.DataFrame(data)
Python
복사

# struct 컬럼을 파싱하는 방법 : pd.Series
device_df = df['device'].apply(pd.Series)
df = df.join(device_df, how='left')
df.drop('device', axis=1, inplace=True)
Python
복사

이전

이후

# array 컬럼 파싱하는 방법 : explode 함수
# 'interests' 컬럼

df = df.explode('interests')
Python
복사

# array of struct 컬럼 파싱하기
# 1. 일단 array 파싱
df_pr = df.explode('purchase')
df_pr
Python
복사

# 2. struct 파싱
pr_details = df_pr['purchase'].apply(pd.Series)
pr_details
Python
복사

# 이번에는 concat 으로 병합
result = pd.concat([df_pr.drop(columns=['purchase']), pr_details.add_prefix('purchase_')], axis=1)
result
Python
복사