comments | difficulty | edit_url | tags | |
---|---|---|---|---|
true |
简单 |
|
表: Person
+-------------+---------+ | Column Name | Type | +-------------+---------+ | id | int | | email | varchar | +-------------+---------+ id 是该表的主键列(具有唯一值的列)。 该表的每一行包含一封电子邮件。电子邮件将不包含大写字母。
编写解决方案 删除 所有重复的电子邮件,只保留一个具有最小 id
的唯一电子邮件。
(对于 SQL 用户,请注意你应该编写一个 DELETE
语句而不是 SELECT
语句。)
(对于 Pandas 用户,请注意你应该直接修改 Person
表。)
运行脚本后,显示的答案是 Person
表。驱动程序将首先编译并运行您的代码片段,然后再显示 Person
表。Person
表的最终顺序 无关紧要 。
返回结果格式如下示例所示。
示例 1:
输入: Person 表: +----+------------------+ | id | email | +----+------------------+ | 1 | john@example.com | | 2 | bob@example.com | | 3 | john@example.com | +----+------------------+ 输出: +----+------------------+ | id | email | +----+------------------+ | 1 | john@example.com | | 2 | bob@example.com | +----+------------------+ 解释: john@example.com重复两次。我们保留最小的Id = 1。
import pandas as pd
# Modify Person in place
def delete_duplicate_emails(person: pd.DataFrame) -> None:
# Sort the rows based on id (Ascending order)
person.sort_values(by="id", ascending=True, inplace=True)
# Drop the duplicates based on email.
person.drop_duplicates(subset="email", keep="first", inplace=True)
# Write your MySQL query statement below
DELETE FROM Person
WHERE id NOT IN (SELECT MIN(id) FROM (SELECT * FROM Person) AS p GROUP BY email);
# Write your MySQL query statement below
DELETE FROM Person
WHERE
id NOT IN (
SELECT id
FROM
(
SELECT
id,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY id
) AS rk
FROM Person
) AS p
WHERE rk = 1
);
DELETE p2
FROM
person AS p1
JOIN person AS p2 ON p1.email = p2.email
WHERE
p1.id < p2.id;