Select Random Row Using Stored Procedure

The ASPSmith Articles, ASPAlliance.com

Select Random Row Using Stored Procedure

By Steven Smith
http://www.aspalliance.com/stevesmith/articles/randomselect.asp
[Example/ClassicASP]
[Example/VB.Net]
[Example/C#]

Updated 12 September 2005:
A very simply way to do this is:

select top 1 * from Customers order by newid()
This will work in many situations and is performs well.

In building the advertising and quick tips portions of ASPAlliance.com, I found a need to pull random records from a database query. I did a little bit of searching, and found this article from 4Guys, Getting A Random Record Using A Stored Procedure. This looked promising, but as I tried to use this technique, I discovered that (a) it didn't work and (b) once I got it to work it was incredibly inefficient. So after an hour or so of playing with it on my own, I developed two alternative methods of accomplishing this task, each of which works and is faster than the method described in the 4Guys article. And to prove that my code actually works, we'll let you run some examples that prove it.

Looking at the 4Guys article as a model, here is what their stored procedure does:

  • Create a temp table
  • Copy your entire query's results into this table
  • Generate a random number (using this formula: Round(((@nRecordCount - 2) * Rand() + 1), 0))
  • Grab the row with the index column corresponding to that number
Ok, so let's analyze what is wrong with this picture. First of all, why do we need to copy the ENTIRE query into a temp table. Imagine if this is an article database full of TEXT and IMAGE fields, with a million rows. Do you really want to copy several hundred megabytes of data into a temp table every time this stored procedure is called??? And what about the random number algorithm? I ran into a bit of a problem in my initial test of their stored procedure, using my table with two rows in it. For some reason, it always "randomly" returned the first row. Let me think, if the @nRecordCount = 2 for the query, what will (2-2)*Rand()+1 always return? That's right, 1. In fact, for any query, the 4Guys algorithm will never return the last record. There is one last problem with their method which didn't manifest itself until I wrote this article. While my stored procedure based on theirs worked fine in query analyzer, it failed whenever I tried to open a recordset with it. The reason is because the 4Guys procedure doesn't have "SET NOCOUNT ON", which is required for ADO to interpret the stored procedure correctly via OLEDB. This one isn't really their fault, though, because the author may not have been using this connection method. Read about this little "feature" here.

Now that I've thoroughly trashed their article (which I only do because it deserved it), let's do this the right way. First, let's try using their basic method, but with the correct random number algorithm and only copying the primary key of our query into the temp table:

CREATE PROCEDURE dbo.sp_GetRandomQuickTip
AS
    BEGIN
		declare @num_recs int
		declare @rand_num int

		SET NOCOUNT ON 
		-- Create a temporary table with the primary key of the table to return
		-- and identity column to use for the random lookup
		CREATE TABLE #TempTable
		(
		  quick_tip_id int,
		  idNum int identity(1,1)
		)

		-- Fill temp table with primary keys
		INSERT INTO #TempTable
		Select quick_tip_id FROM t_quick_tip

		-- Get the number of records in our temp table
		set @num_recs = @@rowcount
		--Select @num_recs = count(*) From #TempTable

		-- Select a random number between 1 and the number
		-- of records in our table
		Set @rand_num = Round(((@num_recs - 1) * Rand() + 1), 0)

		-- Select the record from the table, joined with the temp table
		select qt.quick_tip_id, qt.columnist_id, qt.name, 
		  qt.start_date, qt.end_date, qt.create_date, 
		  qt.disable, qt.description
		from t_quick_tip qt
		Inner Join #TempTable t
		On t.quick_tip_id = qt.quick_tip_id
		Where qt.quick_tip_id = t.quick_tip_id
		And t.idNum = @rand_num
    END

This works pretty well. Note that our algorithm uses @num_recs-1 instead of -2, which will work for any size of query. There's still a bit of overhead in this method, though, which I didn't like. The fact that we're using a temp table at all and filling it with potentially millions of rows just doesn't sound right. So I played with the code a bit more and came up with the twin of this function, sp_GetRandomQuickTip2, which also returns a random Quick Tip from my database, but does it in a very different manner:

CREATE PROCEDURE dbo.sp_GetRandomQuickTip2
AS
    BEGIN
		declare @quick_tip_id int
		declare @rand_num int
		declare @num_recs int

		-- Get Random Number from number of records
		select @num_recs = count(*) from t_quick_tip
		-- Select a random number between 1 and the number
		-- of records in our table
		
		-- Rounding error corrected by Finn Gundersen; old version commented below
		-- set @rand_num = Round(((@num_recs - 1) * Rand() + 1), 0)
        set @rand_num = Round(@num_recs * Rand() + 1, 0, 1)
        
		-- Create a local, static, read-only, scrollable cursor
		-- Needs scrollable to use fetch absolute, otherwise we
		-- would use forward-only
		declare #mycursor cursor scroll static read_only for 
		    select quick_tip_id from t_quick_tip
		open #mycursor
		fetch absolute @rand_num from #mycursor into @quick_tip_id
		close #mycursor
		deallocate #mycursor

		-- Select the columns from the table joined for the key 
		-- selected from the cursor
		select qt.quick_tip_id, qt.columnist_id, qt.name, 
		  qt.start_date, qt.end_date, qt.create_date, 
		  qt.disable, qt.description
		from t_quick_tip qt
		where qt.quick_tip_id = @quick_tip_id
    END

This example is a bit more advanced from a SQL perspective, in that it makes use of a cursor. A full discussion of cursors is beyond the scope of this article, but basically you can use one to loop a query just like you can use a recordset in ASP, or to jump to a particular position in a query. (Thanks to Starkman and Jonathan for help optimizing this procedure) And again, for performance's sake, I only use a query with the primary key in it for my loop query, and then I use that primary key to return the whole result at the end of the procedure. Note that my random number algorithm remains unchanged.

In my un-scientific tests of both methods, I found that the second technique tended to outperform the first one by 20 to 30%. However, both techniques vary in time to process based on how large a random number is generated, because more records need to be traversed for higher numbers. So in order to do an apples-to-apples comparison, you would want to hard-code the random number to something, and test both methods with @rand_num = 1, with @rand_num = @num_recs/2 (halfway point), and @rand_num = @num_recs. I leave this as an exercise for the reader. Trust me, the second method is a bit faster, and both techniques are faster than the original 4Guys method.

Ok, so I have an example that is really simple. It consists of a button that returns a quick tip link for each method, and displays the elapsed time. Here is the code, in ASP:

   <% OPTION EXPLICIT %>
   <!-- #INCLUDE VIRTUAL="/stevesmith/include/articleformat.asp" -->
   <!-- #INCLUDE VIRTUAL="/stevesmith/include/asptimer.asp" -->
   <%
   'Declare Variables
   Dim start_time
   Dim end_time
   Dim fSubmit
   Dim conn
10   Dim sql
11   Dim rs
12   Dim arrResult1
13   Dim arrResult2
14   Dim strTime1
15   Dim strTime2
16   
17   'What's this?
18   'See http://www.aspalliance.com/stevesmith/articles/selfsubmittutorial.asp
19   fSubmit = Request("btnSubmit") = "Go"
20   
21   Set conn = Server.CreateObject("ADODB.Connection")
22   Set rs = Server.CreateObject("ADODB.Recordset")
23   
24   Call ArticleHeader("Random Select With Stored Procedure Example","","")
25   
26   conn.Open Application("aspdb_connectionstring")
27   sql = "sp_GetRandomQuickTip"
28   start_time = milliDif()
29   rs.Open sql, conn
30   end_time = milliDif()
31   strTime1 = "Time to process: " & (end_time-start_time)/1000 & "s"
32   If Not rs.EOF Then
33    arrResult1 = rs.GetRows
34   End If
35   rs.Close
36   
37   sql = "sp_GetRandomQuickTip2"
38   start_time = milliDif()
39   rs.Open sql, conn
40   end_time = milliDif()
41   strTime2 = "Time to process: " & (end_time-start_time)/1000 & "s"
42   If Not rs.EOF Then
43    arrResult2 = rs.GetRows
44   End If
45   rs.Close
46   conn.Close
47   
48   set rs = Nothing
49   set conn = Nothing
50   
51   %>

52   <a href="/stevesmith/articles/randomselect.asp">Return To Article</a>
53   <form name="form1" action="<%=request.servervariables("url")%>" method="post">
54   <table width="100%">
55   <tr>
56    <td valign="top" colspan="2">
57    <input type="submit" name="btnSubmit" value="Go">
58    </td>
59   </tr>
60   <tr>
61    <td valign="top">
62    <%=strTime1%>
63    </td>
64    <td valign="top">
65    <%=strTime2%>
66    </td>
67   </tr>
68   <tr>
69    <td valign="top" width="50%">
70    <b><%=arrResult1(2,0)%></b><br>
71    <%=arrResult1(7,0)%>
72    </td>
73    <td valign="top" width="50%">
74    <b><%=arrResult2(2,0)%></b><br>
75    <%=arrResult2(7,0)%>
76    </td>
77   </tr>
78   </table>
79   </form>
80   <%
81   Call ArticleFooter()
82   %>

The articleformat include file simply adds the headers and footers to the page, and has no impact on the example. The asptimer include file is based on a LearnASP.com article located here. Here is its source code, which is literally just taken from Charles Carroll's original code:

   <SCRIPT LANGUAGE=JScript RUNAT=Server>
   function y2k(number) {
    return (number < 1000) ? number + 1900 : number;
    }
   function milliDif() {
    var d = new Date();
    return d.getTime()
    }
   
10   function elapsedpretty(parm1)
11   {
12    var elapsedsecs = 0
13    var elapsedmins = 0
14   
15    elapsedsecs=Math.floor(parm1/1000)
16    parm1=parm1%1000
17   
18    elapsedmins=Math.floor(elapsedsecs/60)
19    elapsedsecs=elapsedsecs%60
20   
21   
22   elapsedpretty=elapsedmins + " minute"
23   if(elapsedmins!=1)
24    elapsedpretty=elapsedpretty+"s"
25   
26   elapsedpretty = elapsedpretty+" " + elapsedsecs+" second"
27   if(elapsedsecs!=1)
28    elapsedpretty=elapsedpretty+"s"
29   
30   elapsedpretty = elapsedpretty+ " "+parm1+" millisecond"
31   if(parm1!=1)
32    elapsedpretty=elapsedpretty+"s"
33   
34    return elapsedpretty;
35   }
36   </script>
Ok, now let's see how it would be done in ASP.NET. Just to make this article more interesting, I'll list the code in all three standard languages of .NET: VB, C#, and JScript.


//coming soon
C# VB JScript

I'll add the ASP.NET version of the example shortly. Note that right now there are less than ten rows in the t_quick_tip table, so the times you'll get are really more of a factor of how heavily loaded the database is at any given instant than of how fast the stored procedures are executing. With larger tables on a server with no other users, you could use this technique to properly compare the two methods, though. In any event, they both work.

Followup (February 2002):

One reader, Charles Cherry, did some testing using this technique, with the following results:

I ran your example on a small table (12 records). Each time I selected a record, I incremented a counter field on that record called "Impressions". After running the stored procedure ten thousand times, I graphed the resulting Impressions field for each of the twelve records.

The middle ten records were randomized quite nicely, with a fairly even distribution. However, the first and the last record had only been hit about half of the times the other records had been.

Here are the numbers:

# Impressions
1  441
2  919
3  875
4  892
5  975
6  896
7  846
8  858
9  962
10 933
11 921
12 482

I'm not a math whiz, nor a SQL Server expert, but it seems like there is
either something wrong with the way you are selecting the random number,
or possibly the way in which SQL Server's RAND function works.


I haven't had a chance to further investigate this myself yet, but I thought it worth mentioning. If anyone else can offer some more insight into this behavior, or confirm it on a separate system, I'd appreciate it.

Update: 18 Feb 2002:
Your use of Round makes all the difference. The first row is selected only
if the value in your expression is 0-0.5 and the last row if it is between
ROWCOUNT-0.5 and ROWCOUNT. All other rows have a full identity interval,
for instance row N from N-0.5 to N+0.5. In a uniform random distribution
this means that the first and last rows are selected only half as often as
they should.

There are several solutions to this, one of which is to use truncation
instead of rounding. Change your formula to:

Round(@num_recs * Rand() + 1, 0, 1)

--
Finn Espen Gundersen

Thanks, Finn! Please note this change (which I've made above and left the old version in a comment), everyone.