I have a csv_1 that, simplified, has the following structure:
A,B,C
1,34,55
2,45,54
3,77,90
4,89,98
A second file, csv_2, that, simplified, has the following structure:
a,b
1,Y
4,Y
I'm trying to write a third file, csv_3, containing all rows of csv_1 except those whose value in column A appears in column a of csv_2. That is, csv_3 in this case would be like this:
A,B,C
2,45,54
3,77,90
I am trying this:
import csv
with open("csv_1.csv", 'r', encoding = 'utf8') as f1,\
     open("csv_2.csv", "r", encoding = 'utf8') as f2,\
     open("csv_3.csv", "w", encoding = 'utf8') as f3:
    reader1 = csv.DictReader(f1, dialect='unix', delimiter=",",
                             quotechar='"', quoting=csv.QUOTE_MINIMAL)
    reader2 = csv.DictReader(f2, dialect='unix', delimiter=",",
                             quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer = csv.DictWriter(f3, dialect='unix', delimiter=",", quotechar='"',
                            fieldnames=("A","B","C"),
                            quoting=csv.QUOTE_MINIMAL)
    writer.writerow()
    for row1 in reader1:
        if row1["A"] not in reader2:
            writer.writerow(row1)
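csv.DictReader returns an iterator, so you can't search it directly with in: the first membership test consumes the reader, and it yields whole rows (dictionaries), not the values of a single column. You must read the column from the second file and store it in some data structure, preferably a set, in order to perform the search.
To write the header when you use csv.DictWriter, you must use the writeheader method; calling writerow() with no arguments raises a TypeError.
The code should be something like this: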
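A minimal sketch of that approach, assuming the file names from the question and that the key column of csv_2 is a. The name exclude is only illustrative, and the first lines merely recreate the sample data so the snippet runs on its own:

```python
import csv

# Recreate the sample files from the question so the sketch is self-contained.
with open("csv_1.csv", "w", encoding="utf8", newline="") as f:
    f.write("A,B,C\n1,34,55\n2,45,54\n3,77,90\n4,89,98\n")
with open("csv_2.csv", "w", encoding="utf8", newline="") as f:
    f.write("a,b\n1,Y\n4,Y\n")

with open("csv_1.csv", "r", encoding="utf8", newline="") as f1, \
     open("csv_2.csv", "r", encoding="utf8", newline="") as f2, \
     open("csv_3.csv", "w", encoding="utf8", newline="") as f3:
    reader1 = csv.DictReader(f1, dialect="unix", quoting=csv.QUOTE_MINIMAL)
    reader2 = csv.DictReader(f2, dialect="unix", quoting=csv.QUOTE_MINIMAL)
    writer = csv.DictWriter(f3, fieldnames=("A", "B", "C"),
                            dialect="unix", quoting=csv.QUOTE_MINIMAL)

    # Read column "a" of csv_2 once into a set (illustrative name: exclude).
    exclude = {row2["a"] for row2 in reader2}

    writer.writeheader()  # writes A,B,C
    for row1 in reader1:
        # Values are strings, so "1" is compared with "1", not with the int 1.
        if row1["A"] not in exclude:
            writer.writerow(row1)
```

Building the set first means csv_2 is read only once, and each lookup no longer depends on the position of the reader.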
You can also use writerows and a generator.
If we want to write only some columns of the input file to the output file, we can use the extrasaction argument with the value 'ignore'. For example, for the example above, if we just want to get the columns A and C from csv_1, pass fieldnames=("A", "C") together with extrasaction='ignore' to csv.DictWriter. With that we get:
A,C
2,54
3,90
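Both variations can be sketched together, again assuming the file names from the question (exclude is an illustrative name). writerows consumes a generator of the rows to keep, and extrasaction='ignore' makes the writer drop column B instead of raising a ValueError for the extra key:

```python
import csv

# Recreate the sample files from the question so the sketch is self-contained.
with open("csv_1.csv", "w", encoding="utf8", newline="") as f:
    f.write("A,B,C\n1,34,55\n2,45,54\n3,77,90\n4,89,98\n")
with open("csv_2.csv", "w", encoding="utf8", newline="") as f:
    f.write("a,b\n1,Y\n4,Y\n")

# Collect the keys to exclude from column "a" of csv_2.
with open("csv_2.csv", "r", encoding="utf8", newline="") as f2:
    exclude = {row["a"] for row in csv.DictReader(f2)}

with open("csv_1.csv", "r", encoding="utf8", newline="") as f1, \
     open("csv_3.csv", "w", encoding="utf8", newline="") as f3:
    reader1 = csv.DictReader(f1)
    writer = csv.DictWriter(f3, fieldnames=("A", "C"),
                            extrasaction="ignore",  # silently skip column B
                            dialect="unix", quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    # The generator yields only the rows whose A value is not excluded.
    writer.writerows(row for row in reader1 if row["A"] not in exclude)
```

After running it, csv_3.csv holds the header A,C followed by the rows 2,54 and 3,90.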