How to require all checked conditions are met by a result

Problem You are developing a consultant search application where by a user looking for a consultant can check all the items in the list they require a consultant to have. You have 2 tables. consultant and consultant_qual. The consultant_qual has primary key formed by 2 fields consultant_id and qual_id. qual_id contains a constrained list with things like 'OpenLayers', 'PostGIS', 'Python', 'C#' etc.

How do you devise a query that given a list of checked options by the user, only returns consultants with not 1 but all of those qualifications?

Solutions

When looking at a problem like this, I am always amazed at the number of ways you can solve this in SQL. Some very convoluted and slow, some very convoluted looking and fast, and some pretty fast and pretty simple. How can that be when SQL is a declarative language? It all boils down to what angle you look at the problem or rather how you restate the problem and also the fact that like in other languages there are a myriad of ways to say the same thing and some databases and dataset patterns do better with one than another.

To demonstrate:

  1. The divide and conquer approach: Treat this as an n step repeating problem and only solutions that are solutions to each step will be the solutions to the final problem.
  2. The lets just do this in one gulp approach: The number of checked options by the user must equal the number of qualifications that a consultant satisfies that are in the list of checked options by the user.
Divide and Conquer Solutions
-- This will work in databases that support ANSI INTERSECT (e.g. PostgreSQL and SQL Server 2005/2008)
--but not in MySQL, SQL Server 2000 for example
SELECT C.consultant_id, C.consultant_name
    FROM consultant AS C 
        INNER JOIN consultant_qual AS Q ON C.consultant_id = Q.consultant_id
    WHERE Q.qual_id = 'OpenLayers'
INTERSECT
SELECT C.consultant_id, C.consultant_name
    FROM consultant AS C 
        INNER JOIN consultant_qual AS Q ON C.consultant_id = Q.consultant_id
    WHERE Q.qual_id = 'PostGIS';
    
    
--This will work in any database that supports EXISTS (which is a lot)
-- and uses several correlated subqueries (that is just variant -- I can count at least one more)
SELECT C.consultant_id, C.consultant_name
    FROM consultant AS C 
    WHERE EXISTS(SELECT Q.consultant_id 
                    FROM consultant_qual As Q 
                        WHERE Q.qual_id = 'OpenLayers' AND C.consultant_id = Q.consultant_id)
        AND EXISTS(SELECT Q.consultant_id 
                    FROM consultant_qual As Q 
                        WHERE Q.qual_id = 'PostGIS' AND C.consultant_id = Q.consultant_id);
                        

--This will work in any database that supports IN (which is even more than EXISTS)
-- and uses several non-correlated subqueries
SELECT C.consultant_id, C.consultant_name
    FROM consultant AS C 
    WHERE C.consultant_id IN(SELECT Q.consultant_id 
                    FROM consultant_qual As Q 
                        WHERE Q.qual_id = 'OpenLayers')
        AND C.consultant_id IN (SELECT Q.consultant_id 
                    FROM consultant_qual As Q 
                        WHERE Q.qual_id = 'PostGIS');
                        
-- This will work in any database that supports subselects in JOINS
-- which is a lot
SELECT C.consultant_id, C.consultant_name
    FROM consultant AS C 
        INNER JOIN
            (SELECT Q.consultant_id 
                FROM consultant_qual As Q 
                        WHERE Q.qual_id = 'OpenLayers') AS Q1
                        ON C.consultant_id = Q1.consultant_id 
        INNER JOIN
            (SELECT Q.consultant_id 
                FROM consultant_qual As Q 
                        WHERE Q.qual_id = 'PostGIS') AS Q2
                        ON C.consultant_id = Q2.consultant_id ;
                        

One Gulp solutions

The nice thing about the gulp solutions is that regardless of what the user checks, your query is more or less the same and always has the same number of sub queries. There are lots of permutations we can think of here too, but these are just 3 to demonstrate.

-- This will work in pretty much any relational database I can think of
SELECT C.consultant_id, C.consultant_name
    FROM consultant AS C INNER JOIN consultant_qual As Q 
            ON C.consultant_id = Q.consultant_id
    WHERE  Q.qual_id IN('OpenLayers', 'PostGIS')
    GROUP BY C.consultant_id, C.consultant_name
    HAVING COUNT(Q.qual_id) = 2;

                
-- This will work in pretty much any relational database I can think of
-- If it supports subselects in WHERE it will work
SELECT C.consultant_id, C.consultant_name
    FROM consultant AS C 
    WHERE 2 = (SELECT COUNT(Q.qual_id) 
            FROM consultant_qual As Q 
            WHERE Q.consultant_id = C.consultant_id 
                AND Q.consultant_qual IN('OpenLayers', 'PostGIS')) ;
                
--This on first glance looks redundant, 
-- but takes advantage of the fact that the first condition is easy for most relational dbs to optimize
-- and the second will never be run if the first check fails

SELECT C.consultant_id, C.consultant_name
    FROM consultant AS C 
    WHERE EXISTS(SELECT Q.qual_id FROM consultant_qual As Q 
            WHERE Q.consultant_qual IN('OpenLayers', 'PostGIS')
                AND Q.consultant_id = C.consultant_id) AND 
                (2 = (SELECT COUNT(Q.qual_id) 
            FROM consultant_qual As Q 
            WHERE Q.consultant_id = C.consultant_id 
                AND Q.qual_id IN('OpenLayers', 'PostGIS') ) ) ;