I found a tutorial on www.asterdata.com on how to use R with Aster Data. The main body of the mapper looks like this:
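# Assumes stdin (an open connection to standard input) and DELIMITER are defined earlier in the script
while (TRUE)
{
    # Read one input row: a stock id (character) and an open price (numeric)
    input_list <- scan(stdin, what=list(stock_id=" ", open_price=0), nlines=1, quiet=TRUE)
    id <- input_list[["stock_id"]]
    open_price <- input_list[["open_price"]]
    # scan() returns zero-length fields at end of input
    if (length(id) == 0)
        break
    input <- open_price
    score <- score_function(input)
    # Output original tuple with attached score
    result <- c(id, score)
    write(result, stdout(), sep=DELIMITER, ncolumns=length(result))
}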
In this example the R code scans the input (stdin) line by line and calculates a score for each line. Now suppose I want to score 5 loan products and I have a sample of size n for each product, so 5 x n lines altogether. If n = 200, I can set nlines = 200, and if the input is sorted by product, each iteration of the loop processes exactly one product. Normally, though, the data is not that symmetric, and n differs from product to product.
How should I write the SQL-MR query so that the stream function is called 5 times, once for each product?
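To make the unequal-sample case concrete, here is a minimal sketch of the R side only. It assumes the input carries a hypothetical product_id as its first column, followed by hypothetical loan_id and amount columns; none of these names come from the tutorial. The score_function below is a placeholder. The sketch reads everything from a connection and groups rows by product, so it does not depend on nlines or on the rows being sorted. It does not address how SQL-MR distributes rows to the script.

```r
# Placeholder scoring function (an assumption); the real score_function goes here.
score_function <- function(x) mean(x)

# Read all rows from a connection and compute one score per product.
# Column names product_id, loan_id and amount are hypothetical.
score_by_product <- function(con, delimiter = "\t") {
  rows <- read.table(con, sep = delimiter, header = FALSE,
                     col.names = c("product_id", "loan_id", "amount"),
                     colClasses = c("character", "character", "numeric"),
                     stringsAsFactors = FALSE)
  # split() groups by product_id, so group sizes may differ
  # and the rows do not need to be sorted by product.
  groups <- split(rows$amount, rows$product_id)
  scores <- vapply(groups, score_function, numeric(1))
  data.frame(product_id = names(scores), score = unname(scores),
             stringsAsFactors = FALSE)
}

# Usage with an in-memory connection standing in for stdin:
# product A has 2 rows, product B has 3 rows.
con <- textConnection("A\t1\t100\nB\t2\t50\nA\t3\t300\nB\t4\t70\nB\t5\t90")
result <- score_by_product(con)
close(con)
print(result)
```

In a real mapper the connection would presumably be file("stdin"), and the result could be written back with write.table(result, stdout(), sep = DELIMITER, quote = FALSE, row.names = FALSE, col.names = FALSE). Note that read.table raises an error on completely empty input, so that case would need its own check.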
Thanks,
Attila