
Python loop to remove duplicates + original items from a list?

teightdev • 3 years ago • 1247 views

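So I have a CSV file that I want to import, and I want to skip both the duplicate rows and the original rows, based on the user number in the first column. I'm using the StringIO module. The way I'm doing it now is incorrect: it does skip the duplicate rows, but I believe it still imports the original row. What is the best way to skip both the duplicate rows and the original rows when importing from a CSV?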

    from io import StringIO

    def csv_import(stream):
        ostream = StringIO()
        headers = stream.readline()
        ostream.write(headers)

        seen_user_numbers = {}

        for row in stream:
            list_row = row.split(',')
            user_number = list_row[0]

            if user_number in seen_user_numbers:
                seen_user_numbers.pop(user_number)
                continue

            seen_user_numbers[user_number] = True
            ostream.write(row)

        ostream.seek(0)
        return ostream
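
To make the symptom concrete, here is a minimal sketch that runs a condensed copy of the function above on made-up data. The name `streaming_dedupe` and the sample rows are assumptions for illustration only. It shows that the first row for user `1` has already been written by the time its duplicate is seen, and that because of the `pop`, a third row for the same user is written again.

```python
from io import StringIO

def streaming_dedupe(stream):
    """Condensed copy of the question's single-pass logic (hypothetical name)."""
    out = StringIO()
    out.write(stream.readline())  # copy the header line
    seen = {}
    for row in stream:
        user_number = row.split(',')[0]
        if user_number in seen:
            # The duplicate is skipped, but the first row was already written.
            seen.pop(user_number)
            continue
        seen[user_number] = True
        out.write(row)
    out.seek(0)
    return out

# Made-up sample: user 1 appears three times, user 2 once.
data = "user,name\n1,ann\n2,bob\n1,ann2\n1,ann3\n"
result = streaming_dedupe(StringIO(data)).getvalue()
print(result)
# user,name
# 1,ann
# 2,bob
# 1,ann3
```

Only the row for user `2` should survive, so a single streaming pass that writes as it goes cannot produce the desired output.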
   
Article URL: http://www.python88.com/topic/132035
 
Alain T. • 3 years ago

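Because you cannot know whether a row should be included until you reach the end of the input file, you need to keep all rows that have not been excluded in memory, and only then write them to the output.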

You can do this with a dictionary:

    from io import StringIO

    def csv_import(stream):
        ostream = StringIO()
        headers = stream.readline()
        ostream.write(headers)

        # Maps user number -> row; None marks a user number seen more than once
        outputLines = dict()

        for row in stream:
            list_row = row.split(',')
            user_number = list_row[0]

            if user_number in outputLines:
                outputLines[user_number] = None  # duplicate: exclude every row for this user
            else:
                outputLines[user_number] = row   # first occurrence: keep for now

        # Write only the rows whose user number appeared exactly once
        for row in filter(None, outputLines.values()):
            ostream.write(row)

        ostream.seek(0)
        return ostream
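
As a variation on the same idea, the sketch below counts user numbers first and then writes only the rows whose number occurs exactly once. It swaps in the standard `csv` module and `collections.Counter` instead of `split(',')` and a dictionary of rows, which is my assumption about what may be convenient if fields can contain quoted commas. The function name `csv_import_unique` and the sample data are hypothetical.

```python
import csv
from collections import Counter
from io import StringIO

def csv_import_unique(stream):
    """Keep only rows whose first-column value occurs exactly once."""
    reader = csv.reader(stream)
    headers = next(reader, None)            # None if the input is empty
    rows = [row for row in reader if row]   # hold all rows in memory, skip blank lines
    counts = Counter(row[0] for row in rows)

    ostream = StringIO()
    writer = csv.writer(ostream, lineterminator="\n")
    if headers is not None:
        writer.writerow(headers)
    # A row survives only if its user number was seen exactly once.
    writer.writerows(row for row in rows if counts[row[0]] == 1)
    ostream.seek(0)
    return ostream

# Made-up sample: user 1 is duplicated, users 2 and 3 are unique.
sample = StringIO('user,name\n1,ann\n2,bob\n1,ann2\n3,"cy, jr"\n')
output = csv_import_unique(sample).getvalue()
print(output)
# user,name
# 2,bob
# 3,"cy, jr"
```

Like the dictionary version, this still has to read the whole input before writing anything, so memory use grows with the size of the file.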