abel_cjy / Amazon-Selection-Data
Commit cc39dbf0 authored May 18, 2026 by chenyuanjie
fix
parent 96bba449
Showing 1 changed file with 13 additions and 0 deletions
Pyspark_job/dim/dim_keepa_asin_info.py
"""
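author: CT
description: Keepa data aggregation: one-stop Hive → Hive + Doris
Steps:
1) Read the current-day partition of Hive ods_keepa_asin_detail and parse each field of the last_detail JSON
   Derive keepa_launch_time = min(listed_since, tracking_since), converted to yyyy-MM-dd HH:mm:ss
   The weight field is deprecated and set to NULL (no longer kept on the Doris side)
2) Union with the historical Hive dim_keepa_asin_info and deduplicate by asin, keeping the row with the latest updated_time
3) Write to Hive dim_keepa_asin_info (current-day partition) and drop all historical partitions with date_info < today
4) Write the current day's new data (excluding history) to Doris dwd.dwd_keepa_asin_detail
   Doris UNIQUE KEY(site_name, asin) + sequence_col=updated_time automatically keeps the latest row
Example: spark-submit dim_keepa_asin_info.py us 2026-05-15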
"""
import os
import sys
...
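
The rest of the file is elided on this page, so the following is only a minimal plain-Python sketch of the logic the docstring describes in steps 1 and 2, not the job's actual PySpark code. The helper names `derive_launch_time` and `dedupe_latest` are hypothetical. The sketch also assumes that `listed_since` and `tracking_since` are Unix epoch seconds and that missing values are simply skipped; the source does not state the real time base, so the conversion would need adjusting to match it.

```python
from datetime import datetime, timezone


def derive_launch_time(listed_since, tracking_since):
    """Return min(listed_since, tracking_since) as 'yyyy-MM-dd HH:mm:ss' (UTC).

    ASSUMPTION: both inputs are Unix epoch seconds or None; the real fields
    may use a different time base.
    """
    candidates = [t for t in (listed_since, tracking_since) if t is not None]
    if not candidates:
        return None
    earliest = min(candidates)
    return datetime.fromtimestamp(earliest, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S"
    )


def dedupe_latest(rows):
    """Keep one row per asin: the one with the greatest updated_time.

    ASSUMPTION: updated_time is a 'yyyy-MM-dd HH:mm:ss' string, so plain
    string comparison orders rows chronologically.
    """
    latest = {}
    for row in rows:
        kept = latest.get(row["asin"])
        if kept is None or row["updated_time"] > kept["updated_time"]:
            latest[row["asin"]] = row
    return list(latest.values())


# Hypothetical sample data: one historical row and two rows from today.
history = [{"asin": "B000000001", "updated_time": "2026-05-14 08:00:00"}]
today = [
    {"asin": "B000000001", "updated_time": "2026-05-15 08:00:00"},
    {"asin": "B000000002", "updated_time": "2026-05-15 09:00:00"},
]
merged = dedupe_latest(history + today)
```

In the Spark job itself the same deduplication would more likely be expressed as a window over `asin` ordered by `updated_time` descending, keeping the first row; the dictionary version here just makes the rule easy to check by hand.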