Welcome!

Hi there! Hopefully, you've come for one or more of my pearls of wisdom, or at least a good laugh :)

Hit the main page for the latest from me, or the archives for the older (but not necessarily better) stuff.

Monday 4 July 2011

Project Euler, Problem 2

Moving straight on to the second of the problems posed by Project Euler:

"By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms."

Again, my mind started thinking in terms of the sets that are involved in the question:
  1. The set of numbers that make us the Fibonacci sequence.
  2. The set of numbers less than or equal to four million.
  3. The set of even numbers.
The problem is looking for the sum of numbers that form the intersection between these sets.

In my last post, we covered using the modulus operator to determine if one number is divisible by another, so that's a straight-forward idea for us to implement.

Applying a condition such as the limit of four million given to us is similarly simple.

But how do we calculate the Fibonacci sequence in T-SQL?

Fibonacci? Shmibonacci!
Let's first look at the Fibonacci sequence. The first few terms of the sequence are:
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89...

Hopefully, you can see that each term after the first two is simply the sum of the previous two terms in the sequence, i.e. Fib(n) = Fib(n - 1) + Fib(n - 2).

Such sequences are sometimes referred to as recursive sequences because of this self-referencing way of calculating the next value in the sequence.

So, the question now becomes: "How do I perform recursion in T-SQL?" Well, just like in the first problem, you could use a cursor or WHILE loop to perform the recursion in a similar fashion to the iteration that we did in that problem. But again, I feel that there is a better solutiona available to us.

When discussing the first Euler problem, I introduced the Common Table Expression, or CTE, as a way of quickly defining sub-queries that can be re-used in a main query. However, I did not mention a really fun aspect of them: self-referencing. Not only can you define multiple CTEs, and reference another, prior CTE in a later CTE, but you can also reference the same CTE within itself if you wish.

Here's a piece of typical syntax:
WITH Recursion AS
(
 SELECT  table_1.a,
    table_1.b
 FROM  table_1
 UNION ALL
 SELECT  table_1.a,
    table_1.b
 FROM  table_1
 INNER JOIN Recursion
   ON Recursion.b = table_1.a
 WHERE  table_1.b IS NOT NULL
)
SELECT *
FROM Recursion
There are a few important things to note with this approach:
  1. Note the two sub-queries involved. The top half is known as an anchor member and the lower half is the recursive member.
  2. You can have multiple anchor members in a recursive CTE, but only one recursive member.
  3. All of the set-based operators (UNION, UNION ALL, EXCEPT and INTERSECT) can be used to separate anchor members from each other.
  4. Only the UNION ALL set-based operator can be used to separate the last anchor member from the recursive member.
  5. By definition, the recursive member is the only member allowed to reference the CTE it is part of, and even then it can only be referenced once; you cannot use aliasing to make two or more references to the CTE within a recursive member.
  6. To prevent infinite loops caused by queries that have no inherent stopping point, recursive CTEs by default will only recurse to 100 levels deep. Whilst this may be enough for most purposes, sometimes you just need to go deeper! You can use the MAXRECURSION query hint to override this depth limit. It has a maximum of 32767 or, if you are very, very confident that you won't cause an infinite recursion, you can disable the safety feature by setting MAXRECURSION to 0.
So, with this knowledge of recursion using CTEs, how do we solve this problem? This was my attempt:

WITH FibonacciGenerator AS
 (
  SELECT 1  AS n,
    0  AS [F(n-1)],
    1  AS [F(n)]
  UNION ALL
  SELECT n + 1,
    [F(n-1)].[F(n)],
    [F(n-1)].[F(n-1)] + [F(n-1)].[F(n)]
  FROM FibonacciGenerator AS [F(n-1)]
  WHERE [F(n-1)].[F(n-1)] + [F(n-1)].[F(n)] < 4000000
 )
 SELECT SUM(FibonacciGenerator.[F(n)]) AS Result
 FROM FibonacciGenerator
 WHERE FibonacciGenerator.[F(n)] % 2 = 0;
where n is the term in the sequence, F(n) is the Fibonacci value for n, and F(n-1) is the previous Fibonacci term to that one.

It renders the result of 4613732 in 1ms on my PC.