Commit cc39dbf0 by chenyuanjie

fix

parent 96bba449
"""
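author: CT
description: Keepa data aggregation, a one-stop Hive → Hive + Doris pipeline
Steps:
1) Read the current-day partition of Hive ods_keepa_asin_detail and parse each field out of the last_detail JSON.
   Derive keepa_launch_time = min(listed_since, tracking_since), formatted as yyyy-MM-dd HH:mm:ss.
   The weight field is deprecated and set to NULL (Doris no longer keeps it).
2) Union with the historical Hive dim_keepa_asin_info and deduplicate by asin, keeping the row with the latest updated_time.
3) Write the result to the current-day partition of Hive dim_keepa_asin_info, then drop every historical partition with date_info < today.
4) Write only the current day's new data (not the history) to Doris dwd.dwd_keepa_asin_detail.
   The Doris table uses UNIQUE KEY(site_name, asin) with sequence_col=updated_time, so Doris automatically keeps the latest row per key.
Example run: spark-submit dim_keepa_asin_info.py us 2026-05-15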
"""
import os
import sys
......
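The docstring states the pipeline's rules, but the PySpark body is elided in this view. To illustrate those rules, here is a plain-Python sketch that applies them to in-memory rows. It is not the script's implementation. The function names (`keepa_launch_time`, `build_row`, `keep_latest`, `partitions_to_drop`) are hypothetical, and the sketch makes several assumptions:

- `listed_since` and `tracking_since` hold Keepa time minutes (minutes since 2011-01-01 00:00 UTC).
- The output time zone is UTC.
- The JSON keys in `last_detail` match the column names.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Sequence

# ASSUMPTION: listed_since / tracking_since are "Keepa time minutes",
# i.e. minutes elapsed since 2011-01-01 00:00 UTC. Values <= 0 or missing
# are treated as "unknown".
KEEPA_EPOCH = datetime(2011, 1, 1, tzinfo=timezone.utc)


def keepa_launch_time(listed_since: Optional[int],
                      tracking_since: Optional[int]) -> Optional[str]:
    """Step 1: min(listed_since, tracking_since) as yyyy-MM-dd HH:mm:ss (UTC assumed)."""
    known = [v for v in (listed_since, tracking_since) if v is not None and v > 0]
    if not known:
        return None
    return (KEEPA_EPOCH + timedelta(minutes=min(known))).strftime("%Y-%m-%d %H:%M:%S")


def build_row(last_detail_json: str) -> dict:
    """Step 1: parse one last_detail JSON string (key names are hypothetical)."""
    row = json.loads(last_detail_json)
    row["keepa_launch_time"] = keepa_launch_time(row.get("listed_since"),
                                                 row.get("tracking_since"))
    row["weight"] = None  # deprecated field: always NULL
    return row


def keep_latest(rows: Iterable[dict], key: Sequence[str] = ("asin",),
                seq: str = "updated_time") -> List[dict]:
    """Steps 2 and 4: per key, keep the row with the largest seq value.

    updated_time strings in yyyy-MM-dd HH:mm:ss compare correctly as text.
    On ties, the row seen last wins (a choice made for this sketch only).
    """
    latest = {}
    for row in rows:
        k = tuple(row[c] for c in key)
        if k not in latest or row[seq] >= latest[k][seq]:
            latest[k] = row
    return list(latest.values())


def partitions_to_drop(date_infos: Iterable[str], today: str) -> List[str]:
    """Step 3: date_info partitions strictly older than today (yyyy-MM-dd)."""
    return sorted(d for d in date_infos if d < today)


if __name__ == "__main__":
    history = [{"asin": "A1", "updated_time": "2026-05-14 10:00:00", "src": "history"},
               {"asin": "B2", "updated_time": "2026-05-10 08:00:00", "src": "history"}]
    todays_rows = [{"asin": "A1", "updated_time": "2026-05-15 09:00:00", "src": "today"}]
    print(keep_latest(history + todays_rows))
    print(partitions_to_drop(["2026-05-13", "2026-05-15", "2026-05-14"], "2026-05-15"))
```

In a Spark job, one common way to express the same keep-latest rule is `row_number()` over a `Window` partitioned by `asin` and ordered by `updated_time` descending, keeping row 1. On the Doris side, no explicit dedup is needed. The sequence column already lets the row with the larger `updated_time` win for each `(site_name, asin)`.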