You have 2 lists and you want to know:
- Items in list 1 that are also in list 2
- Items in list 1 that are not in list 2
In my work, that might be lists of a few thousand URLs to compare.
You can use comm, FileMerge, etc., but comm expects both lists to be sorted first, and visual diff tools like FileMerge tend to work only when the 2 lists are reasonably similar. They're no good when you have thousands of items and limited overlap between the lists.
You can use sqlite3. I do. Import each list into a table and write simple queries. But that’s several steps and not everyone is fluent in SQL.
You could put it into Excel or Numbers and use VLOOKUP or similar, but slooowww.
So we use grep. Let’s assume 1 item per line and each item is the whole line.
Items in file1 that are also in file2:
grep -x -F -f file2 file1
Items in file1 that are not in file2:
grep -x -v -F -f file2 file1
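If you do this a lot, the two commands can be wrapped in small shell functions. This is only a sketch: the function names common_items and missing_items are made up for illustration, and the sample data is written to a throwaway directory so the script runs on its own.

```shell
#!/bin/sh
# Hypothetical helper names; the grep switches are the ones used above.

# Print items from list $1 that also appear in list $2.
common_items() {
    grep -x -F -f "$2" "$1"
}

# Print items from list $1 that do not appear in list $2.
missing_items() {
    grep -x -v -F -f "$2" "$1"
}

# Sample data in a temporary directory (one item per line).
tmpdir=$(mktemp -d)
printf '%s\n' item1 item2 item4 item5 > "$tmpdir/file1"
printf '%s\n' item1 item3 item4 item6 > "$tmpdir/file2"

common_items  "$tmpdir/file1" "$tmpdir/file2"   # in both lists
missing_items "$tmpdir/file1" "$tmpdir/file2"   # only in file1
missing_items "$tmpdir/file2" "$tmpdir/file1"   # only in file2 (arguments swapped)

rm -r "$tmpdir"
```

Swapping the arguments, as in the last call, gives the items that are in list 2 but not in list 1.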
What the switches mean:
-x means match the entire line
-F means perform a literal match instead of a regex match
-f means read the patterns to match from the specified file, one per line
-v means invert the match, so only the lines that do not match are printed
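To see why -x and -F matter for something like URLs, here is a small sketch comparing a loose match with a strict one. The URLs and file names are invented for the example.

```shell
#!/bin/sh
# Made-up URLs showing what -F and -x each guard against.
tmpdir=$(mktemp -d)
printf '%s\n' 'example.com/a' > "$tmpdir/patterns"
printf '%s\n' 'example.com/a' 'exampleXcom/a' 'example.com/ab' > "$tmpdir/list"

# Without -F the "." is a regex wildcard, so exampleXcom/a matches too.
# Without -x a partial match is enough, so example.com/ab matches too.
loose=$(grep -f "$tmpdir/patterns" "$tmpdir/list")

# With -x and -F only the identical line matches.
strict=$(grep -x -F -f "$tmpdir/patterns" "$tmpdir/list")

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"

rm -r "$tmpdir"
```

The loose version reports all three lines in the list; the strict version reports only example.com/a.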
It’s fast and cool and you don’t need to sort the lists.
andrew$ cat file1
item1
item2
item4
item5
andrew$ cat file2
item1
item3
item4
item6
andrew$ grep -x -F -f file2 file1
item1
item4
andrew$ grep -x -v -F -f file2 file1
item2
item5