Create a Document in your PADJO2015 folder named: PADJO2015-MIDTERM-WSJ-MEDICARE.
Every problem should have its own page devoted to it. Just like in the Midterm Babysteps assignment, e.g. a separate page for problem 2A, and another page for problem 2B, etc.
Please do this assignment on your own.
If you don’t have a relatively new computer. Or you have less than 20GB of free disk space…you should do this exercise on the journalism lab computers.
Download the medicare_providers_2012.sqlite.zip, which I’ve indexed for your convenience. Be warned: it is nearly 700MB zipped and expands to more than 2 gigabytes. Again, probably prudent to use the journalism lab computers for this.
For each problem, write the SQLite query that will return the given result.
Your query must return the same results – including the same column headers and format.
Work on your own on this one. You can do it. Start by reading the companion tutorial on the subject.
The first part of this midterm is simply writing the SQL queries that ostensibly back to the hard numbers and assertions in the first article in the WSJ's Medicare Unmasked series.
The second part of this midterm is to read one of the other stories in the WSJ series – or a story by any news organization about the same dataset. Then pick a (non-trivial) fact that you believe can be found in the Medicare database, and the query to replicate it (i.e. this part is much shorter than the first part).
Download the zipped SQLite database (2GB+ when unzipped): medicare_providers_2012.sqlite.zip
In the image above, I've taken the PDF that the WSJ submitted for its Pulitzer-winning entry and highlighted in purple all of the assertions and facts that can be found directly in the 2012 Medicare payment database. While a significant portion of the article can be fact-checked via the raw data, as you re-read the WSJ article, take note of the things that require reporting and research independent of the database.
Here's the excerpt pertaining to Dr. Weaver:
Ronald S. Weaver isn’t a cardiologist. Yet 98% of the $2.3 million that the Los Angeles doctor’s practice received from Medicare in 2012 was for a cardiac procedure, according to recently released government data. The government data show that out of the thousands of cardiology providers who treated Medicare patients in 2012, just 239 billed for the procedure, and they used it on fewer than 5% of their patients on average.
The 141 cardiologists at the Cleveland Clinic, renowned for heart care, performed it on just six patients last year. Dr. Weaver’s clinic administered it to 99.5% of his Medicare patients — 615 in all — billing the federal health-insurance program for the elderly and disabled 16,619 times, according to the data.
Hello, you don't have to actually do any problems based on this excerpt. It's all in the tutorial. These first couple of paragraphs – and the SQL needed to fact-check them – will introduce you to most of the concepts needed to do the rest of this test.
More about Dr. Weaver, except focusing on how his total payment received compares to the rest of Medicare providers:
More than 2,300 providers earned $500,000 or more from Medicare in 2012 from a single procedure or service, according to a Wall Street Journal analysis of Medicare physician-payment data made public for the first time in April. A few of those providers, including Dr. Weaver, collected more from the single procedures than anyone else who billed for them — by very large margins. The data release was prompted by a Journal legal effort to make the information public.
Warning: on a relatively speedy laptop, the subquery can take as much as 40 seconds. Rather than have you worry about whether the query you wrote is taking a long time for the right reasons, I've provided the nested query for you as a warmup gift:
SELECT SUM(average_Medicare_payment_amt * line_srvc_cnt) AS total_hcpcs_medicare_payment_amt FROM payments WHERE `nppes_entity_code` = 'I' GROUP BY npi, hcpcs_code
Again, running that query will take as long as 40 seconds, even on a SSD-based Macbook Pro. It will also return 8.5 million rows.
So you need to write the main query that aggregates the above query – i.e. a
SELECT COUNT statement and
WHERE condition – to get the "2,300+ providers" statistic:
(Note: the derivation of
total_hcpcs_medicare_payment_amt is used in many of the queries for this story. Feel free to re-use it)
SELECT COUNT(*) FROM( SELECT SUM(average_Medicare_payment_amt * line_srvc_cnt) AS total_hcpcs_medicare_payment_amt FROM payments WHERE `nppes_entity_code` = 'I' GROUP BY npi, hcpcs_code) AS my_sub_query WHERE total_hcpcs_medicare_payment_amt > 500000;
This is a loaded statement, and one of the key assertions for the entire article. All of the providers profiled by the WSJ are seen as outliers in what they earn from Medicare, though for different reasons and metrics of comparison.
Dr. Weaver is the lead anecdote in the story for three apparent reasons:
So there are apparently a "few" other providers who fit criteria #1 and #2. But how many of these providers are receiving an unusually high amount of Medicare payments for a procedure unusual for the providers' specialties? In fact, what does the WSJ consider to be "very large margins" – and how unusual is the margin that Dr. Weaver has compared to other providers?
At the very least, we would have to write a query that would find the top 2 providers per procedure, then calculate the average margin. This (as far as I know) is not easy in SQLite – and maybe slightly easier in more complicated SQL flavors.
For now, let's just get the other top providers for the specific procedure that Dr. Weaver is supposedly an outlier for so that we can see how large a "very large margin" is.
Here's the query to get the HCPCS code for Dr. Weaver's preferred procedure (we learned in the previous section that his NPI is
SELECT hcpcs_code FROM payments WHERE npi = 1669465928 ORDER BY (average_Medicare_payment_amt * line_srvc_cnt) DESC LIMIT 1;
Now use that
hcpcs_code to find all the doctors that have also administered this specific procedure, and how they compare to Dr. Weaver in terms of the total Medicare payment received.
Note: technically there's no reason to include the doctor names, which means there's no reason to join
providers. But I'm making you do it for this answer just for practice…notice that I've used aliases to shorten up the column names, e.g.
Looks like Dr. Weaver receives about 3 times as much for this specific procedure as the second-highest earner. I've included
provider_type since the WSJ asserts that it is notable that Dr. Weaver is not a cardiologist.
Here's a hint of the answer – which should have an
SELECT nppes_provider_first_name AS first_name, nppes_provider_last_org_name AS last_name, payments.provider_type, ROUND(average_Medicare_payment_amt * line_srvc_cnt) AS total_hcpcs_medicare_payment_amt FROM payments
SELECT nppes_provider_first_name AS first_name, nppes_provider_last_org_name AS last_name, payments.provider_type, ROUND(average_Medicare_payment_amt * line_srvc_cnt) AS total_hcpcs_medicare_payment_amt FROM payments INNER JOIN providers ON providers.npi = payments.npi WHERE hcpcs_code = 'G0166' GROUP BY payments.npi, hcpcs_code ORDER BY total_hcpcs_medicare_payment_amt DESC LIMIT 5;
Among the doctors whose billings stand out is Evangelos G. Geraniotis, a urologist in Hyannis, Mass. Dr. Geraniotis received $2.1 million from Medicare in 2012, the most of any member of his specialty.
Nearly $1 million of that sum came from a procedure not considered routine in a urological practice. Known as a “cystoscopy and fulguration,” it involves threading a scope up the male urethra to burn potentially cancerous lesions inside the bladder. According to his Medicare billings, Dr. Geraniotis performed two variations of the procedure 1,757 times in 2012.
Of the 8,791 providers whose specialty is listed in the Medicare data as urology, 973 billed for the procedure, doing so an average of 38 times. The urologist who billed for the second-most performed the procedure less than one-third as often as Dr. Geraniotis did, the data show.
…Dr. Geraniotis said the more than $500 he received from Medicare each time he billed for the procedure played no role in his medical judgment and, by performing the procedure in his office, he keeps patients out of the hospital.
providers table and confirm that someone with the name of "Evangelos Geraniotis" is indeed a urologist in Massachusetts.
Write the minimal
WHERE condition to get exactly one result row. Here's the
SELECT...FROM clause I use:
SELECT npi, nppes_provider_first_name AS first_name, nppes_provider_last_org_name AS last_name, provider_type, nppes_provider_city AS city, nppes_provider_state AS state FROM providers WHERE ...
SELECT npi, nppes_provider_first_name AS first_name, nppes_provider_last_org_name AS last_name, provider_type, nppes_provider_city AS city, nppes_provider_state AS state FROM providers WHERE first_name LIKE 'EVAN%' AND last_name = 'GERANIOTIS' AND state = 'MA';
If you try to filter by
'cystoscopy and fulguration', you'll come up empty. I think the best strategy is to just list the top few of Dr. Geraniotis's billed procedures (he has more than 40 in the 2012 dataset), and see if anything like
'cystoscopy and fulguration' shows up.
For your convenience, here's the query that will help you find the actual answer query:
SELECT hcpcs_code, hcpcs_description, ROUND(SUM(average_Medicare_payment_amt * line_srvc_cnt)) AS total_hcpcs_medicare_payment_amt FROM payments WHERE npi = 1174500953 GROUP BY hcpcs_code, hcpcs_description ORDER BY total_hcpcs_medicare_payment_amt DESC LIMIT 5;
And here's its result:
|52214||Cystoscopy and treatment||753334.0|
|99214||Office/outpatient visit est||273531.0|
|52214||Cystoscopy and treatment||229452.0|
|J9217||Leuprolide acetate suspnsion||186560.0|
|88112||Cytopath cell enhance tech||103240.0|
It seems like HCPCS codes
52214 are what we need.
For this exercise, write the query that collects these two line items from
payments and adds their total payment together for Dr. Geraniotis – yes, in other words, use a sQL query to confirm that
753334 + 273531 is close to the "nearly $1 million" asserted by the WSJ.
You need to use
SUM, but not necessarily a
You don't even need a
JOIN or nested query here.
SELECT ROUND(SUM(average_Medicare_payment_amt * line_srvc_cnt)) AS total_hcpcs_medicare_payment_amt FROM payments WHERE npi = 1174500953 AND hcpcs_code IN('52214', '52224');
Pretty much the same query from the previous question, except a summation of a different field:
SELECT COUNT(*), SUM(line_srvc_cnt) FROM payments WHERE npi = 1174500953 AND hcpcs_code IN('52214', '52224');
Count the number of providers that are urologists:
'Urology' catch all relevant urologists, i.e. are there any providers who are labeled
'Urologist' or some other variation?
SELECT provider_type, COUNT(*) FROM providers WHERE provider_type LIKE '%Urolog%' AND nppes_entity_code = 'I' GROUP BY provider_type ORDER BY provider_type;
Sure seems like there's only one label for all urologists:
SELECT COUNT(*) FROM providers WHERE provider_type LIKE 'Urology' AND nppes_entity_code = 'I';
Just another type of counting
(the column header should be a major hint)
SELECT COUNT(DISTINCT npi) FROM payments WHERE provider_type ='Urology' AND nppes_entity_code = 'I' AND hcpcs_code IN('52214', '52224');
The WSJ asserts that this procedure is not routine. This query finds out how they quantify "not routine" – i.e. how much is
973 more than the average number of times that this procedure is done by other doctors?
Hint: You'll probably need a nested query for this, with the outer query looking something like:
SELECT AVG(procedure_count) ...
SELECT AVG(procedure_count) FROM( SELECT SUM(line_srvc_cnt) AS procedure_count FROM payments WHERE provider_type ='Urology' AND nppes_entity_code = 'I' AND hcpcs_code IN('52214', '52224') AND npi != '1174500953' GROUP BY npi);
We just need to generate a list of the top 2 urologists: the first is ostensibly Dr. Geraniotis, of course.
533 is less than one-third of 1757:
SELECT npi, SUM(line_srvc_cnt) AS procedure_count FROM payments WHERE provider_type ='Urology' AND nppes_entity_code = 'I' AND hcpcs_code IN('52214', '52224') GROUP BY npi ORDER BY procedure_count DESC LIMIT 2;
Don't even need to do an aggregate for this query.
SELECT hcpcs_code, average_Medicare_payment_amt FROM payments WHERE npi = 1174500953 AND hcpcs_code IN('52214', '52224');
In Port St. Lucie, Fla., Gary L. Marder, a dermatologist, specializes in treating melanoma with radiation. Dr. Marder’s website, which features photos of smiling elderly couples, says he has cured more than 100,000 skin cancers.
Medicare paid Dr. Marder $3.7 million in 2012 — $2.41 million of which came from a radiation treatment billed by just two other doctors in the data, which doesn’t include hospital billings. Neither of them came close to billing as much for it as Dr. Marder.
…Under Medicare guidelines, the lower-voltage machine pictured on Dr. Marder’s website was reimbursed at a rate of about $22 per treatment in 2012, radiation oncologists say. Dr. Marder received an average of $154 per treatment by billing under the code for the higher-voltage machine.
…Dr. Marder billed for the procedure, using the more lucrative code, 15,610 times in 2012, and performed the procedure on 94 patients, according to the Medicare data. That works out to 166 treatments per patient, on average.
Look at the previous queries we've done to pinpoint a provider by name.
|1730117003||GARY||MARDER||Radiation Oncology||PORT SAINT LUCIE||FL|
SELECT npi, nppes_provider_first_name AS first_name, nppes_provider_last_org_name AS last_name, provider_type, nppes_provider_city AS city, nppes_provider_state AS state FROM providers WHERE first_name = 'GARY' AND last_name = 'MARDER' AND state = 'FL';
You can find this in the
providers table without referring to
SELECT npi, nppes_provider_first_name AS first_name, nppes_provider_last_org_name AS last_name, ROUND(total_medicare_payment_amt) FROM providers WHERE npi = '1730117003';
What exactly is that "radiation treatment"? Who knows. So write a query using Dr. Marder's
npi to list his procedures and what Medicare paid him. Whatever is at the top is probably this radiation treatment.
|77402||Radiation treatment delivery||2406840.0|
It's actually worth listing the top 10 procedures by total payment for Dr. Marder, just to get some context. But if you want to limit it to the result solely based on the WSJ's assertion, you can use a
WHERE condition like this:
WHERE total_hcpcs_medicare_payment_amt BETWEEN 2405000 AND 2414990 AND ...
SELECT hcpcs_code, hcpcs_description, ROUND(SUM(average_Medicare_payment_amt * line_srvc_cnt)) AS total_hcpcs_medicare_payment_amt FROM payments WHERE npi = 1730117003 GROUP BY hcpcs_code, hcpcs_description ORDER BY total_hcpcs_medicare_payment_amt DESC LIMIT 10;
GROUP BY and
SUM aggregate not needed, as there is exactly one row for each combination of doctor and
In the previous answer, you found the
hcpcs_code for the "radiation treatment" that is bringing Dr. Marder the big Medicare payment. Now use it to write a query to find all the other doctors that have ever billed for that exact same procedure.
According to the WSJ, you should end up with a list of 3 doctors (if you include Dr. Marder)
That's right, I want you to write a query that
INNER JOINs against
providers, just for fun, because we really don't need the doctor names…though getting their
provider_type is useful.
SELECT providers.npi, nppes_provider_first_name AS first_name, nppes_provider_last_org_name AS last_name, providers.provider_type, ROUND(SUM(average_Medicare_payment_amt * line_srvc_cnt)) AS total_hcpcs_medicare_payment_amt FROM payments INNER JOIN providers ON providers.npi = payments.npi WHERE hcpcs_code = 77402 GROUP BY providers.npi ORDER BY total_hcpcs_medicare_payment_amt DESC;
GROUP BY and
SUM aggregate not needed, as there is exactly one row for each combination of doctor and
OK, so this is tricky because what is the HCPCS code for this "lower-voltage machine"? It might be fun to do a archive.org search for Dr. Marder's website. But you can do that on your own. I'm going to try to derive this with just SQL and the hard number of "$22 per treatment".
It's worth noting the phrase "Under Medicare guidelines"…I interpret that to mean that the $22 per treatment figure comes from
average_Medicare_allowed_amt, rather than what we've been using so far,
However, I include both fields for comparison's sake:
|21.137257454092||16.5379122457126||77401||Radiation treatment delivery|
|14.1102372002831||11.2056990984501||77417||Radiology port film(s)|
|44.6459502983889||35.3444672554544||77421||Stereoscopic x-ray guidance|
|50.0||32.392444444||77499||Radiation therapy management|
|100.25534435||79.502910581093||77431||Radiation therapy management|
So the code for this lower-voltage machine seems to be
77401. Note that the
hcpcs_description is still the vague
'Radiation treatment delivery' – but how were we supposed to know that?
Check out my sorting condition – because we really don't know how close to
$22 the actual value is:
SELECT AVG(average_Medicare_allowed_amt) AS avg_avg_allowed /* you fill out the rest */ ORDER BY ABS(avg_avg_allowed - 22) ASC LIMIT 5;
SELECT AVG(average_Medicare_allowed_amt) AS avg_avg_allowed, AVG(average_Medicare_payment_amt) AS avg_avg_payment, hcpcs_code, hcpcs_description FROM payments WHERE hcpcs_code LIKE '774%' GROUP BY hcpcs_code, hcpcs_description ORDER BY ABS(avg_avg_allowed - 22) ASC LIMIT 5;
OK, back to using
SELECT npi, hcpcs_code, line_srvc_cnt, bene_unique_cnt, average_Medicare_payment_amt FROM payments WHERE npi = 1730117003 AND hcpcs_code = 77402;
The practice of James E. Beale, an orthopedic surgeon in the Detroit area, received $3.7 million from Medicare in 2012, more than any other member of his specialty, according to the data.
Dr. Beale’s practice accomplished that despite not performing a single surgery on a Medicare patient. His chief Medicare revenue source was “manual therapy techniques,” which the coding manual used by Medicare to set reimbursements describes as a massage or manipulation of various regions of the body, lasting 15 minutes.
Dr. Beale’s practice billed Medicare for it 107,670 times and received $2.3 million. By contrast, the average doctor or physical therapist in the data who billed for the technique performed it 520 times and was reimbursed less than $11,000 for it.
…The Medicare payment data show that Dr. Beale’s practice performed the 15-minute massage an average of 149 times per patient for average Medicare billings per patient of $3,155.
Does "Detroit area" mean that the listed city will be
'Detroit'? Maybe, maybe not. Let's just use
nppes_provider_state = 'MI' to be safe:
SELECT npi, nppes_provider_first_name AS first_name, nppes_provider_last_org_name AS last_name, provider_type, nppes_provider_city AS city, nppes_provider_state AS state, ROUND(total_medicare_payment_amt) AS total_payment FROM providers WHERE first_name = 'JAMES' AND last_name = 'BEALE' AND state = 'MI';
The previous answer has Dr. Beale's
npi and his
provider_type. So, write the query that shows that he's at the top of his profession in Medicare payments.
SELECT npi, ROUND(total_medicare_payment_amt) AS total_payment FROM providers WHERE provider_type = 'Orthopedic Surgery' ORDER BY total_payment DESC LIMIT 5;
Hmmm…how to filter for non-surgeries? It's just safer to list all of his possible operations. Write a query that filters
payments for rows belonging to Dr. Beale's
|99215||Office/outpatient visit est||182795.0|
|99205||Office/outpatient visit new||45158.0|
|99214||Office/outpatient visit est||36999.0|
|95904||Sense nerve conduction test||12292.0|
|95903||Motor nerve conduction test||11610.0|
|93306||Tte w/doppler complete||9122.0|
|99348||Home visit est patient||8914.0|
|G0181||Home health care supervision||8790.0|
|99213||Office/outpatient visit est||8449.0|
|93923||Upr/lxtr art stdy 3+ lvls||4563.0|
|99343||Home visit new patient||4146.0|
|99349||Home visit est patient||3638.0|
|97016||Vasopneumatic device therapy||3286.0|
|99204||Office/outpatient visit new||2666.0|
|G0180||MD certification HHA patient||1365.0|
|99212||Office/outpatient visit est||1361.0|
|99406||Behav chng smoking 3-10 min||854.0|
None of the descriptions have the word "surgery" in them, so I guess that checks out?
SELECT hcpcs_code, hcpcs_description, ROUND(SUM(average_Medicare_payment_amt * line_srvc_cnt)) AS total_hcpcs_medicare_payment_amt FROM payments WHERE npi = 1316934409 GROUP BY hcpcs_code, hcpcs_description ORDER BY total_hcpcs_medicare_payment_amt DESC;
GROUP BY and
SUM aggregate not needed, as there is exactly one row for each combination of doctor and
SELECT npi, hcpcs_code, hcpcs_description, line_srvc_cnt, ROUND(average_Medicare_payment_amt * line_srvc_cnt) AS total_hcpcs_medicare_payment_amt FROM payments WHERE npi = 1316934409 AND hcpcs_code = '97140';
SELECT ROUND(AVG(line_srvc_cnt)) AS avg_service_count, ROUND(AVG(average_Medicare_payment_amt * line_srvc_cnt)) AS avg_total_payment FROM payments WHERE hcpcs_code = '97140';
SELECT npi, hcpcs_code, ROUND(line_srvc_cnt / bene_unique_cnt) AS services_per_patient, ROUND(average_Medicare_payment_amt * line_srvc_cnt / bene_unique_cnt ) AS payment_per_patient FROM payments WHERE /* fill out the conditions for yourself */
SELECT npi, hcpcs_code, ROUND(line_srvc_cnt / bene_unique_cnt) AS services_per_patient, ROUND(average_Medicare_payment_amt * line_srvc_cnt / bene_unique_cnt ) AS payment_per_patient FROM payments WHERE npi = 1316934409 AND hcpcs_code = '97140';
This is the last paragraph in the story. There's a whole lot of paragraphs that precede it that are based on reporting, interviewing, and documentation external to the Medicare database. You should read it.
The government data show the lab collected nearly $1 million from Medicare in 2012. It billed the program for medical tests on 626 patients, roughly the same number as were treated with EECP at Dr. Weaver’s clinic.
Basically, look up the clinic by name:
|1578566808||GCC IMAGING, LLC||1049161.0||930349.0||626.0|
SELECT npi, nppes_provider_last_org_name AS last_name, ROUND(total_medicare_allowed_amt) AS total_allowed_amt, ROUND(total_medicare_payment_amt) AS total_payment_amt, total_unique_benes AS total_patients FROM providers WHERE nppes_provider_last_org_name LIKE 'GCC%';
And whew! – we're done! That was a lot of database queries – none of them particularly complicated in terms of pure SQL. But they all require a significant amount of attention to detail and awareness of the domain. Hopefully you got a good SQL workout and a better appreciation of the nuances of the angles to the WSJ's investigation.
Honestly, the most time-consuming part of this problem is just reading the story. You've already written about 20 queries about Medicare data. Now you just have to come up with one – but also do the work of identifying a part in a story that depended directly on the Medicare database.
You don't have to pick a WSJ story. Various news outlets joined the Medicare data party after the WSJ was successful in fighting for the release of the data. For example, here's a couple of contemporary New York Times articles:
As with the previous problems, create a new page for this problem, and include the following:
Please pick a non-trivial assertion, e.g. not "Dr. John Smith is a cardiologist who lives in Iowa". It should include at least either a
JOIN or a nested-query. I include a couple examples below.
Word of warning: Be sure that the data-based fact/assertion that you attempt to reproduce via the provided 2012 Medicare dataset is actually derived from the 2012 Medicare dataset. Some of the stories in the WSJ investigation refer to different datasets to which they have special access, and for which no SQL mastery can actually replicate without the source data.
For example, this Bloomberg story finds that "Medicare paid at least 3,900 individual health-care providers at least $1 million in 2013". However, that comes from the 2013 Medicare Dataset. I guess you could download that data and make your own database if you really want to, but it's not necessary for this exercise.
(But obviously, the fact that Medicare will keep releasing these massive datasets on an annual basis means that the WSJ series, amazing as it is, is just the beginning of important stories and analyses that can be done with this data)
Here's a couple of examples of what I would like to see you do:
The excerpt below contains lots of interesting assertions. But I will only show the query for the first one:
In 2012, 100 doctors received a total of $610 million, ranging from a Florida ophthalmologist who was paid $21 million by Medicare to dozens of doctors, eye and cancer specialists chief among them, who received more than $4 million each that year. While more money by far is spent for routine office visits than any other single expenditure, one of the most heavily reimbursed procedures — costing a total of $1 billion for 143,000 patients — is for a single treatment for an eye disorder common in the elderly.
SELECT ROUND(SUM(total_medicare_payment_amt)) FROM ( SELECT total_medicare_payment_amt FROM providers WHERE nppes_entity_code = 'I' ORDER BY total_medicare_payment_amt DESC LIMIT 100) AS my_nested_query;
This is a multi-faceted – and thus, a multi-step problem. It's a nice real-world example of how you can make-do when you are highly ignorant of the intricacies of the healthcare system.
Doctors Bill Big For Tarnished Drug, another installment in the WSJ's "Medicare Unmasked" series.
One Florida oncology group stands out for how much it bills Medicare for the pricey drug. Medicare paid U.S. oncologists $128 million in 2012 to administer Procrit, federal data show. One-sixth of that money went to oncologists in the group, Florida Cancer Specialists. Of the 20 oncologists whom Medicare paid most for Procrit, 11 belonged to the Florida group.
By this point, you've looked through enough of the Medicare database to know that while individual doctor identities are included, group or clinic affiliation is not shown. In other words, querying for the term
'Florida Cancer Specialists' won't find the data needed to back the WSJ's assertion.
So I'll just aim to show that 11 of the top 20 oncologists receiving Medicare payments for administering Procrit are based in Florida. That's a start.
(sidenote: so how did the WSJ link doctors to Florida Cancer Specialists? Well, you can always start with the group's homepage and doing an old-fashioned look-see.)
But if you query for
hcpcs_description LIKE '%procrit%', you'll find zero results in the
payments database. Knowing what to look for requires understanding the variations in how procedures and drugs are categorized via HCPCS…which is something beyond my knowledge level.
So I Googled for "HCPCS code for procrit medicare":
It took me longer than I care to admit that the very first cms.gov URL contained what I needed. Even though "Procrit" isn't mentioned in the document, the name of its active ingredient is: Epoetin alfa, better known in sports media via the acronym EPO and its reputation in blood doping, which should explain some of the WSJ's investigative team's particular interest.
For our attempt to find the proper query to back the assertion, we just need to search for
'epoetin' instead of
First, let's list all the possible HCPCS codes that involve epoetin:
SELECT DISTINCT hcpcs_description, hcpcs_code FROM payments WHERE hcpcs_description LIKE '%epoetin%';
|Epoetin alfa, non-esrd||J0885|
|Darbepoetin alfa, non-esrd||J0881|
|Epoetin alfa 1000 units ESRD||J0886|
Now, another query to find which of these drugs correspond to this assertion:
Medicare paid U.S. oncologists $128 million in 2012 to administer Procrit
SELECT hcpcs_description, hcpcs_code, ROUND(SUM(average_Medicare_payment_amt * line_srvc_cnt)) AS total_hcpcs_medicare_payment_amt FROM payments WHERE hcpcs_code IN ("J0881", "J0886", "J0885") AND nppes_entity_code = 'I' AND provider_type LIKE '%Oncology%' GROUP BY hcpcs_code;
|Darbepoetin alfa, non-esrd||J0881||128129354.0|
|Epoetin alfa, non-esrd||J0885||127936974.0|
|Epoetin alfa 1000 units ESRD||J0886||31236.0|
So it could be either
J0885. The easiest (and laziest) way to try to figure out the proper code would be to make an ad-hoc query based on this assertion:
…Of the 20 oncologists whom Medicare paid most for Procrit, 11 belonged to the Florida group.
This is going to be one gigantic query, so let me break it down ins teps:
First, create a query on
payments that finds
provider_type of Oncology, the corresponding
npi, and calculates the total Medicare payment received for either one of the codes
J0885. Then do an
INNER JOIN with
providers based on
npi, then sort the list by the total payment, limit by
Oh yeah, include the
SELECT nppes_provider_state, (average_Medicare_payment_amt * line_srvc_cnt) AS total_hcpcs_medicare_payment_amt FROM payments INNER JOIN providers ON payments.npi = providers.npi WHERE payments.hcpcs_code = "J0881" AND payments.nppes_entity_code = 'I' GROUP BY payments.npi, payments.hcpcs_code ORDER BY total_hcpcs_medicare_payment_amt DESC LIMIT 20;
Then, do a query on the above query results, but do a
COUNT for when
nppes_provider_state corresponds to Florida. And again, do this whole thing twice, once for
J0881 amd once for
The query that returns a
11 is the one that has the HCPCS code that the WSJ is using (we think).
The code that gets the answer that we want is
J0885; try the query yourself:
SELECT COUNT(*) FROM ( SELECT nppes_provider_state, (average_Medicare_payment_amt * line_srvc_cnt) AS total_hcpcs_medicare_payment_amt FROM payments INNER JOIN providers ON payments.npi = providers.npi WHERE payments.hcpcs_code = "J0885" AND payments.nppes_entity_code = 'I' GROUP BY payments.npi, payments.hcpcs_code ORDER BY total_hcpcs_medicare_payment_amt DESC LIMIT 20) AS top_20_providers WHERE nppes_provider_state = 'FL';
OK, all of this work would qualify for a great answer to this part of the midterm. It seems heavily contrived, but it's quite reasonable if you're trying to learn from the WSJ's work and you have no idea what Procrit is, or what the point of targeting these Florida doctors is (though maybe you should read the entire story?)
It's worth doing one more query: finding the
npi numbers for the 11 Florida doctors and then re-joining against
providers just to take a look at all of the other fields to see if they have any relevance to the WSJ story:
SELECT providers.* FROM ( SELECT providers.npi AS the_npi, nppes_provider_state AS the_state, (average_Medicare_payment_amt * line_srvc_cnt) AS total_hcpcs_medicare_payment_amt FROM payments INNER JOIN providers ON payments.npi = providers.npi WHERE payments.hcpcs_code = "J0885" AND payments.nppes_entity_code = 'I' GROUP BY payments.npi, payments.hcpcs_code ORDER BY total_hcpcs_medicare_payment_amt DESC LIMIT 20) AS top_20_providers INNER JOIN providers ON the_npi = providers.npi WHERE the_state = 'FL';
You'll find that none of the fields refer to their affiliation with Florida Cancer Specialists. But you can find their names on the FCS website. Which again, reaffirms my point that investigations aren't created from queries of a single database inside a vacuum, even when that database contains several gigabytes of data.