22S:172 Summer 2005 Midterm This exam requires the use of two data files, which may be downloaded from the course web page under "Datasets." They are customers.dat invoices.dat The "customers.dat" file contains information on some of the customers of Hank's Hardware. The variables are: customer id number customer name customer type ( C = commercial; I = individual ) customer contact date This is what the file looks like. 001 Johnson Floral C 01/02/2002 002 Wang Jian I 01/15/2002 003 TrueSale Jewelry C 06/25/2003 004 Salo Farms D 09/18/2003 005 Jackson Michael I 12/15/2003 006 Armstrong Lance I 01/13/2004 007 Askov Lighting C 12/14/2004 The "invoices.dat" file contains summary information on invoices processed by Hank's Hardware. The variables are: invoice number customer id number invoice date invoice total This is what this file looks like. 0001 002 05/15/2004 35.00 0002 002 06/12/2004 25.50 0003 001 07/15/2004 5.00 0004 002 12/15/2004 72.50 0005 003 01/25/2005 10.00 0006 005 03/15/2005 52.75 0007 002 04/12/2005 25.00 0008 001 07/12/2000 16.00 Here are your tasks: 1. Write a SAS data step that will read the customers file. Hint: The dates in the file are in mmddyy10. format. 2. Write a SAS procedure to produce the following table of counts of the number of each unique value of customer type. custtype Frequency --------------------- C 3 D 1 I 3 3. Write a SAS data step that will read the invoices file. 4. Create a new dataset by merging the customers file with the invoices file by the customer id. Include only customers who have invoices. A proc print of the merged dataset should look like this (you don't have to use the same variable names as I did, but you should get the same values): Obs custid custname custtype custdate invid invdate amount 1 001 Johnson Floral C 01/02/2002 0003 07/15/2004 5.00 2 001 Johnson Floral C 01/02/2002 0008 07/12/2000 16.00 3 002 Wang Jian I 01/15/2002 0001 05/15/2004 35.00 4 002 Wang Jian I 01/15/2002 0002 06/12/2004 25.50 5 002 Wang Jian I 01/15/2002 0004 12/15/2004 72.50 6 002 Wang Jian I 01/15/2002 0007 04/12/2005 25.00 7 003 TrueSale Jewelry C 06/25/2003 0005 01/25/2005 10.00 8 005 Jackson Michael I 12/15/2003 0006 03/15/2005 52.75 5. Use proc print to display any invoices in which the invoice date is earlier in time than the customer contact date. You should find only one such observation: Obs custid custname custtype custdate invid invdate amount 2 001 Johnson Floral C 01/02/2002 0008 07/12/2000 16.00 6. Produce a table showing the total of all invoices for each customer. Do not include the invoice with the anomalous date (the one listed in part 5). One possible table format would be the following. Others are fine as long as they include the customer id and the sum and have just one row for each customer. custid Obs N Sum ------------------------------------ 001 1 1 5.0000000 002 4 4 158.0000000 003 1 1 10.0000000 005 1 1 52.7500000 ------------------------------------ 7. Send email to me at kcowles@stat.uiowa.edu Copy in your entire SAS program and the output from steps 2, 4, 5, and 6.