abel_cjy / Amazon-Selection-Data
Commit 153fd5ec, authored Jun 18, 2026 by hejiangming
no message
parent edd42c8d
Showing 1 changed file with 4 additions and 3 deletions
Pyspark_job/dws/dws_aba_word_freq_cate.py (+4, -3)
@@ -668,9 +668,10 @@ class DwsAbaWordFreqCate(Templates):
 # Why dedup before avg: rank is the ABA rank at the search_term level (one value per search term, independent of category),
 # and the dedup above already deduplicates by (category, base, search_term), so a repeated word ('shoe shoe rack') does not count the same search term twice.
 # Same scope as min_rank (best rank within the category): both only look at search terms in this category that pass this scope's whitelist filter.
-# Example: base='towel' is associated with 3 deduplicated search terms in a category, rank=[5, 50, 200] → avg_rank=85.0
+# Example: base='towel' is associated with 3 deduplicated search terms in a category, rank=[5, 50, 200] → avg_rank=85 (rounded to the nearest integer)
 # Same definition as avg_rank in the ad hoc report spark_hjm/月词频计算/word_head.py (which is site-wide, not per category).
-F.round(F.avg('rank'), 2).alias('avg_rank'),
+# Round to an integer: the result is exported downstream to the PG cluster, and the average rank does not need decimals (round to 0 places here; the select below then casts to int)
+F.round(F.avg('rank'), 0).alias('avg_rank'),
 F.sum(F.when(F.col('date_info_first') >= self.year_ago, 1).otherwise(0)).alias('new_st_num'),
 F.expr('min_by(search_term, rank)').alias('top_aba_example'))
@@ -745,7 +746,7 @@ class DwsAbaWordFreqCate(Templates):
 F.col('word_heat'),
 F.col('relate_st_num').cast('int').alias('relate_st_num'),
 F.col('min_rank').cast('int').alias('min_rank'),
-F.col('avg_rank'),  # average rank within the category (keeps 2 decimal places, not rounded to an integer)
+F.col('avg_rank').cast('int').alias('avg_rank'),  # average rank within the category (rounded to the nearest integer)
 F.col('new_st_num').cast('int').alias('new_st_num'),
 F.col('word_heat_last_year'),
 F.col('word_heat_change_rate'),
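The two hunks work as a pair: the aggregation rounds to 0 places and the later select casts to int. A hedged plain-Python sketch of why the order matters is below. It assumes that the Spark `round` call uses half-up rounding (as the PySpark documentation describes) and that casting a double to int truncates; the helper names `round_half_up` and `to_int_truncating` are hypothetical and only mimic that behaviour.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places=0):
    """Round half-up to `places` decimals (assumed to mirror the job's round call)."""
    quantum = Decimal(1).scaleb(-places)  # places=0 -> 1, places=2 -> 0.01
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


def to_int_truncating(value):
    """Drop the fractional part, as an int cast is assumed to do."""
    return int(value)


avg = 85.67  # hypothetical average rank

rounded_then_cast = to_int_truncating(round_half_up(avg, 0))
cast_only = to_int_truncating(avg)

print(rounded_then_cast)  # 86
print(cast_only)          # 85
```

Under these assumptions, the cast alone would turn 85.67 into 85, so rounding first is what makes the exported integer the nearest one. Python's built-in `round` is deliberately not used here because it rounds halves to the nearest even number (`round(86.5)` gives 86), which differs from half-up.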