abel_cjy / Amazon-Selection-Data
Commit a3a44cc8
authored May 22, 2026 by chenyuanjie
Export data from the PG cluster table that still needs a keepa call
parent 0a04ad9f
Showing 2 changed files with 9 additions and 9 deletions
Pyspark_job/dwt/dwt_asin_sync.py +4 -4
Pyspark_job/script/export_asin_without_keepa.py +5 -5
Pyspark_job/dwt/dwt_asin_sync.py
@@ -200,10 +200,10 @@ class DwtAsinSync(Templates):
         df_keepa = self.spark.sql(f"""
             select asin from dim_keepa_asin_info
             where site_name = '{self.site_name}'
-              and package_length >= 0
-              and package_width >= 0
-              and package_height >= 0
-              and weight >= 0
+              and package_length > 0
+              and package_width > 0
+              and package_height > 0
+              and (package_weight > 0 or item_weight > 0)
         """).repartition(40, 'asin')
         df = df.join(df_keepa, on='asin', how='left_anti').cache()
         print(f"Row count after excluding keepa: {df.count()}")
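To make the effect of the changed filter concrete, here is a minimal pure-Python sketch of the two predicates. It is not the project's PySpark code: the helper names (`is_valid_old`, `is_valid_new`) and the sample rows are hypothetical, and `None` is assumed to stand in for SQL NULL, which never satisfies a `WHERE` comparison.

```python
from typing import Mapping, Optional

Row = Mapping[str, Optional[float]]
DIMS = ("package_length", "package_width", "package_height")


def _cmp(value: Optional[float], strict: bool) -> bool:
    """SQL-style test against 0: a NULL (None) operand never passes."""
    if value is None:
        return False
    return value > 0 if strict else value >= 0


def is_valid_old(row: Row) -> bool:
    """Filter before the commit: dimensions and `weight` all >= 0."""
    return all(_cmp(row.get(c), strict=False) for c in DIMS + ("weight",))


def is_valid_new(row: Row) -> bool:
    """Filter after the commit: dimensions > 0 and at least one weight > 0."""
    dims_ok = all(_cmp(row.get(c), strict=True) for c in DIMS)
    weight_ok = (_cmp(row.get("package_weight"), strict=True)
                 or _cmp(row.get("item_weight"), strict=True))
    return dims_ok and weight_ok


# Hypothetical rows: one with all-zero measurements, one fully populated.
zero_row = {"package_length": 0, "package_width": 0, "package_height": 0,
            "weight": 0, "package_weight": 0, "item_weight": 0}
full_row = {"package_length": 10, "package_width": 5, "package_height": 2,
            "weight": 120, "package_weight": None, "item_weight": 120}

print(is_valid_old(zero_row), is_valid_new(zero_row))
print(is_valid_old(full_row), is_valid_new(full_row))
```

Under these assumptions, a row whose measurements are all zero counted as valid keepa data before the commit and no longer does afterwards, so its ASIN is no longer excluded by the anti join.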
Pyspark_job/script/export_asin_without_keepa.py
@@ -244,15 +244,15 @@ class ExportAsinWithoutKeepa(object):
         print(f"Row count after filtering: {df.count()}")
         # Exclude ASINs that already have valid keepa data in dim_keepa_asin_info
-        # If any of package_length/width/height/weight is < 0, treat the data as abnormal and do not exclude it (it must be re-fetched)
+        # Valid means: length/width/height >= 0, and either package_weight or item_weight > 0 (consistent with the export_need_profit_rate selection rule)
         print("8. Exclude ASINs that already have keepa data (dim_keepa_asin_info)")
         df_keepa = self.spark.sql(f"""
             select asin from dim_keepa_asin_info
             where site_name = '{self.site_name}'
-              and package_length >= 0
-              and package_width >= 0
-              and package_height >= 0
-              and weight >= 0
+              and package_length > 0
+              and package_width > 0
+              and package_height > 0
+              and (package_weight > 0 or item_weight > 0)
         """).repartition(40, 'asin')
         df = df.join(df_keepa, on='asin', how='left_anti').cache()
         print(f"Row count after excluding keepa: {df.count()}")
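Both hunks end with a `left_anti` join on `asin`, which keeps only the left-side rows that have no match on the right. The sketch below illustrates that behaviour in plain Python rather than PySpark; the function name `left_anti` and the placeholder ASIN strings are assumptions made for illustration only.

```python
from typing import Iterable, List


def left_anti(candidates: Iterable[str], valid_keepa_asins: Iterable[str]) -> List[str]:
    """Keep candidates with no match in valid_keepa_asins, preserving order."""
    matched = set(valid_keepa_asins)
    return [asin for asin in candidates if asin not in matched]


# Placeholder identifiers, not real ASINs.
candidates = ["ASIN_A", "ASIN_B", "ASIN_C"]
valid_keepa_asins = ["ASIN_A"]  # rows that passed the SQL filter

remaining = left_anti(candidates, valid_keepa_asins)
print(remaining)  # ['ASIN_B', 'ASIN_C']
```

Which ASINs reach the right-hand side is decided entirely by the SQL predicate, so tightening it enlarges the exported set. The comment added in `export_asin_without_keepa.py` describes the dimensions as `>= 0` while the SQL tests `> 0`; the SQL is what determines the result.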