You have 2 lists and you want to know:

  • Items in list 1 that are also in list 2
  • Items in list 1 that are not in list 2

In my work, that might be lists of a few thousand URLs to compare.

You can use diff, sdiff, FileMerge, etc., but they tend to work only when the 2 lists are reasonably similar, and comm needs both lists sorted first. They’re no good when you have thousands of items and limited overlap between the lists.

You can use sqlite3. I do. Import each list into a table and write simple queries. But that’s several steps and not everyone is fluent in SQL.

You could put it into Excel or Numbers and use VLOOKUP or similar, but slooowww.

So we use grep. Let’s assume 1 item per line and each item is the whole line.

Items in file1 that are also in file2:

grep -x -F -f file2 file1

Items in file1 that are not in file2:

grep -x -v -F -f file2 file1

What the switches mean:

-x means match the entire line

-F means perform a literal match instead of a regex match

-f means read the patterns to match from the specified file, one per line
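
To see why -x and -F both matter for URLs, here is a small sketch. The URLs and the scratch directory are made up for illustration; the point is only to show what changes when each switch is dropped.

```shell
# Hypothetical sample data in a scratch directory (not from the original post).
tmp=$(mktemp -d)
printf '%s\n' \
  'http://example.com/a.html' \
  'http://example.com/a.html?page=2' \
  'http://example.com/aXhtml' > "$tmp/file1"
printf '%s\n' 'http://example.com/a.html' > "$tmp/file2"

# Whole-line literal match: only the identical URL is reported.
exact=$(grep -x -F -f "$tmp/file2" "$tmp/file1")

# Without -x the pattern may match part of a line,
# so the ?page=2 URL is reported as well.
no_x=$(grep -F -f "$tmp/file2" "$tmp/file1")

# Without -F the pattern is a regex and "." matches any character,
# so the aXhtml URL is reported as well.
no_F=$(grep -x -f "$tmp/file2" "$tmp/file1")

printf 'exact:\n%s\n\nwithout -x:\n%s\n\nwithout -F:\n%s\n' "$exact" "$no_x" "$no_F"
rm -r "$tmp"
```

Dropping either switch quietly adds false matches, which is easy to miss in a list of a few thousand lines.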

It’s fast and cool and you don’t need to sort the lists.
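
One thing that can trip up the whole-line match: if a list was saved with Windows line endings (say, exported from a spreadsheet), every pattern ends in an invisible carriage return and nothing matches. A minimal sketch, assuming that is the problem; the file names here are hypothetical.

```shell
# Hypothetical files: file2_crlf has Windows (CRLF) line endings.
tmp=$(mktemp -d)
printf 'item1\nitem2\n' > "$tmp/file1"
printf 'item1\r\nitem3\r\n' > "$tmp/file2_crlf"

# The trailing carriage return is part of each pattern, so -x finds nothing.
# (grep exits non-zero when there are no matches, hence the "|| true".)
before=$(grep -x -F -f "$tmp/file2_crlf" "$tmp/file1" || true)

# Strip the carriage returns, then compare again.
tr -d '\r' < "$tmp/file2_crlf" > "$tmp/file2"
after=$(grep -x -F -f "$tmp/file2" "$tmp/file1")

printf 'before: [%s]\nafter: [%s]\n' "$before" "$after"
rm -r "$tmp"
```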

Example:

andrew$ cat file1
item1
item2
item4
item5

andrew$ cat file2
item1
item3
item4
item6

andrew$ grep -x -F -f file2 file1
item1
item4

andrew$ grep -x -v -F -f file2 file1
item2
item5
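
Two small variations on the same example, as a sketch: swap the file arguments to go the other way (items in file2 that are not in file1), and add -c if you only want a count. The scratch directory is just there so the snippet runs on its own.

```shell
# Recreate the two example lists in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' item1 item2 item4 item5 > "$tmp/file1"
printf '%s\n' item1 item3 item4 item6 > "$tmp/file2"

# Swap the arguments: items in file2 that are not in file1.
only_in_file2=$(grep -x -v -F -f "$tmp/file1" "$tmp/file2")

# -c prints the number of matching lines instead of the lines themselves.
common_count=$(grep -c -x -F -f "$tmp/file2" "$tmp/file1")

printf 'only in file2:\n%s\n\nin both: %s\n' "$only_in_file2" "$common_count"
rm -r "$tmp"
```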