Create a Document in your PADJO2015 folder named: PADJO2015-MIDTERM-WSJ-MEDICARE.
Every problem should have its own page devoted to it. Just like in the Midterm Babysteps assignment, e.g. a separate page for problem 2A, and another page for problem 2B, etc.
Please do this assignment on your own.
If you don’t have a relatively new computer. Or you have less than 20GB of free disk space…you should do this exercise on the journalism lab computers.
Download the medicare_providers_2012.sqlite.zip, which I’ve indexed for your convenience. Be warned: it is nearly 700MB zipped and expands to more than 2 gigabytes. Again, probably prudent to use the journalism lab computers for this.
For each problem, write the SQLite query that will return the given result.
Your query must return the same results – including the same column headers and format.
Work on your own on this one. You can do it. Start by reading the companion tutorial on the subject.
The first part of this midterm is simply writing the SQL queries that ostensibly back to the hard numbers and assertions in the first article in the WSJ's Medicare Unmasked series.
The second part of this midterm is to read one of the other stories in the WSJ series – or a story by any news organization about the same dataset. Then pick a (non-trivial) fact that you believe can be found in the Medicare database, and the query to replicate it (i.e. this part is much shorter than the first part).
Download the zipped SQLite database (2GB+ when unzipped): medicare_providers_2012.sqlite.zip
In the image above, I've taken the PDF that the WSJ submitted for its Pulitzer-winning entry and highlighted in purple all of the assertions and facts that can be found directly in the 2012 Medicare payment database. While a significant portion of the article can be fact-checked via the raw data, as you re-read the WSJ article, take note of the things that require reporting and research independent of the database.
Here's the excerpt pertaining to Dr. Weaver:
Ronald S. Weaver isn’t a cardiologist. Yet 98% of the $2.3 million that the Los Angeles doctor’s practice received from Medicare in 2012 was for a cardiac procedure, according to recently released government data. The government data show that out of the thousands of cardiology providers who treated Medicare patients in 2012, just 239 billed for the procedure, and they used it on fewer than 5% of their patients on average.
The 141 cardiologists at the Cleveland Clinic, renowned for heart care, performed it on just six patients last year. Dr. Weaver’s clinic administered it to 99.5% of his Medicare patients — 615 in all — billing the federal health-insurance program for the elderly and disabled 16,619 times, according to the data.
Hello, you don't have to actually do any problems based on this excerpt. It's all in the tutorial. These first couple of paragraphs – and the SQL needed to fact-check them – will introduce you to most of the concepts needed to do the rest of this test.
More about Dr. Weaver, except focusing on how his total payment received compares to the rest of Medicare providers:
More than 2,300 providers earned $500,000 or more from Medicare in 2012 from a single procedure or service, according to a Wall Street Journal analysis of Medicare physician-payment data made public for the first time in April. A few of those providers, including Dr. Weaver, collected more from the single procedures than anyone else who billed for them — by very large margins. The data release was prompted by a Journal legal effort to make the information public.
Warning: on a relatively speedy laptop, the subquery can take as much as 40 seconds. Rather than have you worry about whether the query you wrote is taking a long time for the right reasons, I've provided the nested query for you as a warmup gift:
SELECT
SUM(average_Medicare_payment_amt * line_srvc_cnt)
AS total_hcpcs_medicare_payment_amt
FROM payments
WHERE `nppes_entity_code` = 'I'
GROUP BY npi, hcpcs_code
Again, running that query will take as long as 40 seconds, even on a SSD-based Macbook Pro. It will also return 8.5 million rows.
So you need to write the main query that aggregates the above query – i.e. a SELECT COUNT
statement and WHERE
condition – to get the "2,300+ providers" statistic:
COUNT(*) |
---|
2377 |
(Note: the derivation of total_hcpcs_medicare_payment_amt
is used in many of the queries for this story. Feel free to re-use it)
SELECT COUNT(*)
FROM(
SELECT
SUM(average_Medicare_payment_amt * line_srvc_cnt)
AS total_hcpcs_medicare_payment_amt
FROM payments
WHERE `nppes_entity_code` = 'I'
GROUP BY npi, hcpcs_code) AS my_sub_query
WHERE total_hcpcs_medicare_payment_amt > 500000;
This is a loaded statement, and one of the key assertions for the entire article. All of the providers profiled by the WSJ are seen as outliers in what they earn from Medicare, though for different reasons and metrics of comparison.
Dr. Weaver is the lead anecdote in the story for three apparent reasons:
So there are apparently a "few" other providers who fit criteria #1 and #2. But how many of these providers are receiving an unusually high amount of Medicare payments for a procedure unusual for the providers' specialties? In fact, what does the WSJ consider to be "very large margins" – and how unusual is the margin that Dr. Weaver has compared to other providers?
At the very least, we would have to write a query that would find the top 2 providers per procedure, then calculate the average margin. This (as far as I know) is not easy in SQLite – and maybe slightly easier in more complicated SQL flavors.
For now, let's just get the other top providers for the specific procedure that Dr. Weaver is supposedly an outlier for so that we can see how large a "very large margin" is.
Here's the query to get the HCPCS code for Dr. Weaver's preferred procedure (we learned in the previous section that his NPI is 1669465928
):
SELECT hcpcs_code
FROM payments
WHERE npi = 1669465928
ORDER BY
(average_Medicare_payment_amt * line_srvc_cnt) DESC
LIMIT 1;
Now use that hcpcs_code
to find all the doctors that have also administered this specific procedure, and how they compare to Dr. Weaver in terms of the total Medicare payment received.
Note: technically there's no reason to include the doctor names, which means there's no reason to join payments
and providers
. But I'm making you do it for this answer just for practice…notice that I've used aliases to shorten up the column names, e.g. first_name
for nppes_provider_first_name
:
first_name | last_name | provider_type | total_hcpcs_medicare_payment_amt |
---|---|---|---|
RONALD | WEAVER | Internal Medicine | 2274250.0 |
SANG | KIM | Internal Medicine | 755989.0 |
JOSE | CACERES | Cardiology | 705895.0 |
RAJEN | MANIAR | Cardiology | 604547.0 |
BACK | KIM | Cardiology | 350389.0 |
Looks like Dr. Weaver receives about 3 times as much for this specific procedure as the second-highest earner. I've included provider_type
since the WSJ asserts that it is notable that Dr. Weaver is not a cardiologist.
Here's a hint of the answer – which should have an INNER JOIN
SELECT
nppes_provider_first_name AS first_name,
nppes_provider_last_org_name AS last_name,
payments.provider_type,
ROUND(average_Medicare_payment_amt * line_srvc_cnt)
AS total_hcpcs_medicare_payment_amt
FROM payments
SELECT
nppes_provider_first_name AS first_name,
nppes_provider_last_org_name AS last_name,
payments.provider_type,
ROUND(average_Medicare_payment_amt * line_srvc_cnt)
AS total_hcpcs_medicare_payment_amt
FROM payments
INNER JOIN providers
ON providers.npi = payments.npi
WHERE hcpcs_code = 'G0166'
GROUP BY payments.npi, hcpcs_code
ORDER BY total_hcpcs_medicare_payment_amt DESC
LIMIT 5;
Among the doctors whose billings stand out is Evangelos G. Geraniotis, a urologist in Hyannis, Mass. Dr. Geraniotis received $2.1 million from Medicare in 2012, the most of any member of his specialty.
Nearly $1 million of that sum came from a procedure not considered routine in a urological practice. Known as a “cystoscopy and fulguration,” it involves threading a scope up the male urethra to burn potentially cancerous lesions inside the bladder. According to his Medicare billings, Dr. Geraniotis performed two variations of the procedure 1,757 times in 2012.
Of the 8,791 providers whose specialty is listed in the Medicare data as urology, 973 billed for the procedure, doing so an average of 38 times. The urologist who billed for the second-most performed the procedure less than one-third as often as Dr. Geraniotis did, the data show.
…Dr. Geraniotis said the more than $500 he received from Medicare each time he billed for the procedure played no role in his medical judgment and, by performing the procedure in his office, he keeps patients out of the hospital.
Query the providers
table and confirm that someone with the name of "Evangelos Geraniotis" is indeed a urologist in Massachusetts.
npi | first_name | last_name | provider_type | city | state |
---|---|---|---|---|---|
1174500953 | EVANGELOS | GERANIOTIS | Urology | HYANNIS | MA |
Write the minimal WHERE
condition to get exactly one result row. Here's the SELECT...FROM
clause I use:
SELECT npi,
nppes_provider_first_name AS first_name,
nppes_provider_last_org_name AS last_name,
provider_type,
nppes_provider_city AS city,
nppes_provider_state AS state
FROM providers
WHERE ...
SELECT npi,
nppes_provider_first_name AS first_name,
nppes_provider_last_org_name AS last_name,
provider_type,
nppes_provider_city AS city,
nppes_provider_state AS state
FROM providers
WHERE
first_name LIKE 'EVAN%'
AND last_name = 'GERANIOTIS'
AND state = 'MA';
If you try to filter by hcpcs_description
containing 'cystoscopy and fulguration'
, you'll come up empty. I think the best strategy is to just list the top few of Dr. Geraniotis's billed procedures (he has more than 40 in the 2012 dataset), and see if anything like 'cystoscopy and fulguration'
shows up.
For your convenience, here's the query that will help you find the actual answer query:
SELECT
hcpcs_code, hcpcs_description,
ROUND(SUM(average_Medicare_payment_amt * line_srvc_cnt))
AS total_hcpcs_medicare_payment_amt
FROM payments
WHERE
npi = 1174500953
GROUP BY
hcpcs_code, hcpcs_description
ORDER BY
total_hcpcs_medicare_payment_amt DESC
LIMIT 5;
And here's its result:
hcpcs_code | hcpcs_description | total_hcpcs_medicare_payment_amt |
---|---|---|
52214 | Cystoscopy and treatment | 753334.0 |
99214 | Office/outpatient visit est | 273531.0 |
52214 | Cystoscopy and treatment | 229452.0 |
J9217 | Leuprolide acetate suspnsion | 186560.0 |
88112 | Cytopath cell enhance tech | 103240.0 |
It seems like HCPCS codes 52214
and 52214
are what we need.
For this exercise, write the query that collects these two line items from payments
and adds their total payment together for Dr. Geraniotis – yes, in other words, use a sQL query to confirm that 753334 + 273531
is close to the "nearly $1 million" asserted by the WSJ.
You need to use SUM
, but not necessarily a GROUP BY
.
total_hcpcs_medicare_payment_amt |
---|
982786.0 |
You don't even need a JOIN
or nested query here.
SELECT
ROUND(SUM(average_Medicare_payment_amt * line_srvc_cnt))
AS total_hcpcs_medicare_payment_amt
FROM payments
WHERE
npi = 1174500953
AND hcpcs_code IN('52214', '52224');
Pretty much the same query from the previous question, except a summation of a different field:
COUNT(*) | SUM(line_srvc_cnt) |
---|---|
2 | 1757.0 |
SELECT
COUNT(*),
SUM(line_srvc_cnt)
FROM payments
WHERE
npi = 1174500953
AND hcpcs_code IN('52214', '52224');
Count the number of providers that are urologists:
COUNT(*) |
---|
8791 |
But does 'Urology'
catch all relevant urologists, i.e. are there any providers who are labeled 'Urologist'
or some other variation?
SELECT provider_type, COUNT(*)
FROM providers
WHERE provider_type LIKE '%Urolog%'
AND nppes_entity_code = 'I'
GROUP BY provider_type
ORDER BY provider_type;
Sure seems like there's only one label for all urologists:
provider_type | COUNT(*) |
---|---|
Neurology | 12421 |
Urology | 8791 |
SELECT COUNT(*)
FROM providers
WHERE provider_type LIKE 'Urology'
AND nppes_entity_code = 'I';
Just another type of counting
(the column header should be a major hint)
COUNT(DISTINCT npi) |
---|
973 |
SELECT COUNT(DISTINCT npi)
FROM payments
WHERE provider_type ='Urology'
AND nppes_entity_code = 'I'
AND hcpcs_code IN('52214', '52224');
The WSJ asserts that this procedure is not routine. This query finds out how they quantify "not routine" – i.e. how much is 973
more than the average number of times that this procedure is done by other doctors?
Hint: You'll probably need a nested query for this, with the outer query looking something like:
SELECT AVG(procedure_count)
...
AVG(procedure_count) |
---|
38.3802672147996 |
SELECT AVG(procedure_count)
FROM(
SELECT
SUM(line_srvc_cnt) AS procedure_count
FROM payments
WHERE provider_type ='Urology'
AND nppes_entity_code = 'I'
AND hcpcs_code IN('52214', '52224')
AND npi != '1174500953'
GROUP BY npi);
We just need to generate a list of the top 2 urologists: the first is ostensibly Dr. Geraniotis, of course.
533 is less than one-third of 1757:
npi | procedure_count |
---|---|
1174500953 | 1757.0 |
1326058363 | 533.0 |
SELECT
npi,
SUM(line_srvc_cnt) AS procedure_count
FROM payments
WHERE provider_type ='Urology'
AND nppes_entity_code = 'I'
AND hcpcs_code IN('52214', '52224')
GROUP BY npi
ORDER BY procedure_count DESC
LIMIT 2;
hcpcs_code | average_Medicare_payment_amt |
---|---|
52214 | 532.37238979 |
52224 | 568.12487934 |
Don't even need to do an aggregate for this query.
SELECT hcpcs_code,
average_Medicare_payment_amt
FROM payments
WHERE
npi = 1174500953
AND hcpcs_code IN('52214', '52224');
In Port St. Lucie, Fla., Gary L. Marder, a dermatologist, specializes in treating melanoma with radiation. Dr. Marder’s website, which features photos of smiling elderly couples, says he has cured more than 100,000 skin cancers.
Medicare paid Dr. Marder $3.7 million in 2012 — $2.41 million of which came from a radiation treatment billed by just two other doctors in the data, which doesn’t include hospital billings. Neither of them came close to billing as much for it as Dr. Marder.
…Under Medicare guidelines, the lower-voltage machine pictured on Dr. Marder’s website was reimbursed at a rate of about $22 per treatment in 2012, radiation oncologists say. Dr. Marder received an average of $154 per treatment by billing under the code for the higher-voltage machine.
…Dr. Marder billed for the procedure, using the more lucrative code, 15,610 times in 2012, and performed the procedure on 94 patients, according to the Medicare data. That works out to 166 treatments per patient, on average.
Look at the previous queries we've done to pinpoint a provider by name.
npi | first_name | last_name | provider_type | city | state |
---|---|---|---|---|---|
1730117003 | GARY | MARDER | Radiation Oncology | PORT SAINT LUCIE | FL |
SELECT npi,
nppes_provider_first_name AS first_name,
nppes_provider_last_org_name AS last_name,
provider_type,
nppes_provider_city AS city,
nppes_provider_state AS state
FROM providers
WHERE
first_name = 'GARY'
AND last_name = 'MARDER'
AND state = 'FL';
You can find this in the providers
table without referring to payments
:
npi | first_name | last_name | ROUND(total_medicare_payment_amt) |
---|---|---|---|
1730117003 | GARY | MARDER | 3655683.0 |
SELECT npi,
nppes_provider_first_name AS first_name,
nppes_provider_last_org_name AS last_name,
ROUND(total_medicare_payment_amt)
FROM providers
WHERE npi = '1730117003';
What exactly is that "radiation treatment"? Who knows. So write a query using Dr. Marder's npi
to list his procedures and what Medicare paid him. Whatever is at the top is probably this radiation treatment.
hcpcs_code | hcpcs_description | total_hcpcs_medicare_payment_amt |
---|---|---|
77402 | Radiation treatment delivery | 2406840.0 |
It's actually worth listing the top 10 procedures by total payment for Dr. Marder, just to get some context. But if you want to limit it to the result solely based on the WSJ's assertion, you can use a WHERE
condition like this:
WHERE total_hcpcs_medicare_payment_amt BETWEEN 2405000 AND 2414990
AND ...
SELECT
hcpcs_code, hcpcs_description,
ROUND(SUM(average_Medicare_payment_amt * line_srvc_cnt))
AS total_hcpcs_medicare_payment_amt
FROM payments
WHERE
npi = 1730117003
GROUP BY
hcpcs_code, hcpcs_description
ORDER BY
total_hcpcs_medicare_payment_amt DESC
LIMIT 10;
Note: GROUP BY
and SUM
aggregate not needed, as there is exactly one row for each combination of doctor and hcpcs_code
.
In the previous answer, you found the hcpcs_code
for the "radiation treatment" that is bringing Dr. Marder the big Medicare payment. Now use it to write a query to find all the other doctors that have ever billed for that exact same procedure.
According to the WSJ, you should end up with a list of 3 doctors (if you include Dr. Marder)
npi | first_name | last_name | provider_type | total_hcpcs_medicare_payment_amt |
---|---|---|---|---|
1730117003 | GARY | MARDER | Radiation Oncology | 2406840.0 |
1962453795 | CRAIG | ABBOTT | Dermatology | 1039210.0 |
1447272026 | ROBERT | NORMAN | Dermatology | 51250.0 |
That's right, I want you to write a query that INNER JOIN
s against providers
, just for fun, because we really don't need the doctor names…though getting their provider_type
is useful.
SELECT
providers.npi,
nppes_provider_first_name AS first_name,
nppes_provider_last_org_name AS last_name,
providers.provider_type,
ROUND(SUM(average_Medicare_payment_amt * line_srvc_cnt))
AS total_hcpcs_medicare_payment_amt
FROM payments
INNER JOIN providers
ON providers.npi = payments.npi
WHERE
hcpcs_code = 77402
GROUP BY
providers.npi
ORDER BY
total_hcpcs_medicare_payment_amt DESC;
Note: GROUP BY
and SUM
aggregate not needed, as there is exactly one row for each combination of doctor and hcpcs_code
.
OK, so this is tricky because what is the HCPCS code for this "lower-voltage machine"? It might be fun to do a archive.org search for Dr. Marder's website. But you can do that on your own. I'm going to try to derive this with just SQL and the hard number of "$22 per treatment".
It's worth noting the phrase "Under Medicare guidelines"…I interpret that to mean that the $22 per treatment figure comes from average_Medicare_allowed_amt
, rather than what we've been using so far, average_Medicare_payment_amt
.
However, I include both fields for comparison's sake:
avg_avg_allowed | avg_avg_payment | hcpcs_code | hcpcs_description |
---|---|---|---|
21.137257454092 | 16.5379122457126 | 77401 | Radiation treatment delivery |
14.1102372002831 | 11.2056990984501 | 77417 | Radiology port film(s) |
44.6459502983889 | 35.3444672554544 | 77421 | Stereoscopic x-ray guidance |
50.0 | 32.392444444 | 77499 | Radiation therapy management |
100.25534435 | 79.502910581093 | 77431 | Radiation therapy management |
So the code for this lower-voltage machine seems to be 77401
. Note that the hcpcs_description
is still the vague 'Radiation treatment delivery'
– but how were we supposed to know that?
Check out my sorting condition – because we really don't know how close to $22
the actual value is:
SELECT
AVG(average_Medicare_allowed_amt) AS avg_avg_allowed
/* you fill out the rest */
ORDER BY ABS(avg_avg_allowed - 22) ASC
LIMIT 5;
SELECT
AVG(average_Medicare_allowed_amt) AS avg_avg_allowed,
AVG(average_Medicare_payment_amt) AS avg_avg_payment,
hcpcs_code, hcpcs_description
FROM payments
WHERE hcpcs_code LIKE '774%'
GROUP BY hcpcs_code, hcpcs_description
ORDER BY ABS(avg_avg_allowed - 22) ASC
LIMIT 5;
OK, back to using hcpcs_code
of 77402
npi | hcpcs_code | line_srvc_cnt | bene_unique_cnt | average_Medicare_payment_amt |
---|---|---|---|---|
1730117003 | 77402 | 15610.0 | 94 | 154.0 |
SELECT npi, hcpcs_code,
line_srvc_cnt,
bene_unique_cnt,
average_Medicare_payment_amt
FROM payments
WHERE npi = 1730117003
AND hcpcs_code = 77402;
The practice of James E. Beale, an orthopedic surgeon in the Detroit area, received $3.7 million from Medicare in 2012, more than any other member of his specialty, according to the data.
Dr. Beale’s practice accomplished that despite not performing a single surgery on a Medicare patient. His chief Medicare revenue source was “manual therapy techniques,” which the coding manual used by Medicare to set reimbursements describes as a massage or manipulation of various regions of the body, lasting 15 minutes.
Dr. Beale’s practice billed Medicare for it 107,670 times and received $2.3 million. By contrast, the average doctor or physical therapist in the data who billed for the technique performed it 520 times and was reimbursed less than $11,000 for it.
…The Medicare payment data show that Dr. Beale’s practice performed the 15-minute massage an average of 149 times per patient for average Medicare billings per patient of $3,155.
Does "Detroit area" mean that the listed city will be 'Detroit'
? Maybe, maybe not. Let's just use nppes_provider_state = 'MI'
to be safe:
npi | first_name | last_name | provider_type | city | state | total_payment |
---|---|---|---|---|---|---|
1316934409 | JAMES | BEALE | Orthopedic Surgery | WARREN | MI | 3705702.0 |
SELECT npi,
nppes_provider_first_name AS first_name,
nppes_provider_last_org_name AS last_name,
provider_type,
nppes_provider_city AS city,
nppes_provider_state AS state,
ROUND(total_medicare_payment_amt) AS total_payment
FROM providers
WHERE
first_name = 'JAMES'
AND last_name = 'BEALE'
AND state = 'MI';
The previous answer has Dr. Beale's npi
and his provider_type
. So, write the query that shows that he's at the top of his profession in Medicare payments.
npi | total_payment |
---|---|
1316934409 | 3705702.0 |
1811183676 | 2302127.0 |
1295921054 | 2089683.0 |
1639185978 | 1934375.0 |
1376594069 | 1528257.0 |
SELECT npi,
ROUND(total_medicare_payment_amt) AS total_payment
FROM providers
WHERE
provider_type = 'Orthopedic Surgery'
ORDER BY total_payment DESC
LIMIT 5;
Hmmm…how to filter for non-surgeries? It's just safer to list all of his possible operations. Write a query that filters payments
for rows belonging to Dr. Beale's npi
.
hcpcs_code | hcpcs_description | total_hcpcs_medicare_payment_amt |
---|---|---|
97140 | Manual therapy | 2280800.0 |
97110 | Therapeutic exercises | 845309.0 |
97032 | Electrical stimulation | 203449.0 |
99215 | Office/outpatient visit est | 182795.0 |
99205 | Office/outpatient visit new | 45158.0 |
99214 | Office/outpatient visit est | 36999.0 |
97124 | Massage therapy | 19355.0 |
95904 | Sense nerve conduction test | 12292.0 |
95903 | Motor nerve conduction test | 11610.0 |
93306 | Tte w/doppler complete | 9122.0 |
99348 | Home visit est patient | 8914.0 |
G0181 | Home health care supervision | 8790.0 |
99213 | Office/outpatient visit est | 8449.0 |
93970 | Extremity study | 5287.0 |
93923 | Upr/lxtr art stdy 3+ lvls | 4563.0 |
99343 | Home visit new patient | 4146.0 |
99349 | Home visit est patient | 3638.0 |
93880 | Extracranial study | 3321.0 |
97016 | Vasopneumatic device therapy | 3286.0 |
99204 | Office/outpatient visit new | 2666.0 |
95934 | H-reflex test | 1885.0 |
G0180 | MD certification HHA patient | 1365.0 |
99212 | Office/outpatient visit est | 1361.0 |
99406 | Behav chng smoking 3-10 min | 854.0 |
93000 | Electrocardiogram complete | 181.0 |
93005 | Electrocardiogram tracing | 106.0 |
None of the descriptions have the word "surgery" in them, so I guess that checks out?
SELECT
hcpcs_code, hcpcs_description,
ROUND(SUM(average_Medicare_payment_amt * line_srvc_cnt))
AS total_hcpcs_medicare_payment_amt
FROM payments
WHERE
npi = 1316934409
GROUP BY
hcpcs_code, hcpcs_description
ORDER BY
total_hcpcs_medicare_payment_amt DESC;
Note: GROUP BY
and SUM
aggregate not needed, as there is exactly one row for each combination of doctor and hcpcs_code
.
npi | hcpcs_code | hcpcs_description | line_srvc_cnt | total_hcpcs_medicare_payment_amt |
---|---|---|---|---|
1316934409 | 97140 | Manual therapy | 107670.0 | 2280800.0 |
SELECT
npi, hcpcs_code, hcpcs_description,
line_srvc_cnt,
ROUND(average_Medicare_payment_amt * line_srvc_cnt)
AS total_hcpcs_medicare_payment_amt
FROM payments
WHERE
npi = 1316934409
AND hcpcs_code = '97140';
avg_service_count | avg_total_payment |
---|---|
520.0 | 10885.0 |
SELECT
ROUND(AVG(line_srvc_cnt)) AS avg_service_count,
ROUND(AVG(average_Medicare_payment_amt * line_srvc_cnt))
AS avg_total_payment
FROM payments
WHERE
hcpcs_code = '97140';
npi | hcpcs_code | services_per_patient | payment_per_patient |
---|---|---|---|
1316934409 | 97140 | 149.0 | 3155.0 |
SELECT
npi, hcpcs_code,
ROUND(line_srvc_cnt / bene_unique_cnt)
AS services_per_patient,
ROUND(average_Medicare_payment_amt * line_srvc_cnt / bene_unique_cnt )
AS payment_per_patient
FROM payments
WHERE /* fill out the conditions for yourself */
SELECT
npi, hcpcs_code,
ROUND(line_srvc_cnt / bene_unique_cnt)
AS services_per_patient,
ROUND(average_Medicare_payment_amt * line_srvc_cnt / bene_unique_cnt )
AS payment_per_patient
FROM payments
WHERE
npi = 1316934409
AND hcpcs_code = '97140';
This is the last paragraph in the story. There's a whole lot of paragraphs that precede it that are based on reporting, interviewing, and documentation external to the Medicare database. You should read it.
The government data show the lab collected nearly $1 million from Medicare in 2012. It billed the program for medical tests on 626 patients, roughly the same number as were treated with EECP at Dr. Weaver’s clinic.
Basically, look up the clinic by name:
npi | last_name | total_allowed_amt | total_payment_amt | total_patients |
---|---|---|---|---|
1578566808 | GCC IMAGING, LLC | 1049161.0 | 930349.0 | 626.0 |
SELECT npi,
nppes_provider_last_org_name AS last_name,
ROUND(total_medicare_allowed_amt) AS total_allowed_amt,
ROUND(total_medicare_payment_amt) AS total_payment_amt,
total_unique_benes AS total_patients
FROM providers
WHERE nppes_provider_last_org_name LIKE 'GCC%';
And whew! – we're done! That was a lot of database queries – none of them particularly complicated in terms of pure SQL. But they all require a significant amount of attention to detail and awareness of the domain. Hopefully you got a good SQL workout and a better appreciation of the nuances of the angles to the WSJ's investigation.
Honestly, the most time-consuming part of this problem is just reading the story. You've already written about 20 queries about Medicare data. Now you just have to come up with one – but also do the work of identifying a part in a story that depended directly on the Medicare database.
You don't have to pick a WSJ story. Various news outlets joined the Medicare data party after the WSJ was successful in fighting for the release of the data. For example, here's a couple of contemporary New York Times articles:
As with the previous problems, create a new page for this problem, and include the following:
Please pick a non-trivial assertion, e.g. not "Dr. John Smith is a cardiologist who lives in Iowa". It should include at least either a JOIN
or a nested-query. I include a couple examples below.
Word of warning: Be sure that the data-based fact/assertion that you attempt to reproduce via the provided 2012 Medicare dataset is actually derived from the 2012 Medicare dataset. Some of the stories in the WSJ investigation refer to different datasets to which they have special access, and for which no SQL mastery can actually replicate without the source data.
For example, this Bloomberg story finds that "Medicare paid at least 3,900 individual health-care providers at least $1 million in 2013". However, that comes from the 2013 Medicare Dataset. I guess you could download that data and make your own database if you really want to, but it's not necessary for this exercise.
(But obviously, the fact that Medicare will keep releasing these massive datasets on an annual basis means that the WSJ series, amazing as it is, is just the beginning of important stories and analyses that can be done with this data)
Here's a couple of examples of what I would like to see you do:
Sliver of Medicare Doctors Get Big Share of Payouts
The excerpt below contains lots of interesting assertions. But I will only show the query for the first one:
In 2012, 100 doctors received a total of $610 million, ranging from a Florida ophthalmologist who was paid $21 million by Medicare to dozens of doctors, eye and cancer specialists chief among them, who received more than $4 million each that year. While more money by far is spent for routine office visits than any other single expenditure, one of the most heavily reimbursed procedures — costing a total of $1 billion for 143,000 patients — is for a single treatment for an eye disorder common in the elderly.
SELECT ROUND(SUM(total_medicare_payment_amt))
FROM (
SELECT total_medicare_payment_amt
FROM providers
WHERE nppes_entity_code = 'I'
ORDER BY total_medicare_payment_amt DESC
LIMIT 100)
AS my_nested_query;
ROUND(SUM(total_medicare_payment_amt)) |
---|
610287430.0 |
This is a multi-faceted – and thus, a multi-step problem. It's a nice real-world example of how you can make-do when you are highly ignorant of the intricacies of the healthcare system.
Doctors Bill Big For Tarnished Drug, another installment in the WSJ's "Medicare Unmasked" series.
One Florida oncology group stands out for how much it bills Medicare for the pricey drug. Medicare paid U.S. oncologists $128 million in 2012 to administer Procrit, federal data show. One-sixth of that money went to oncologists in the group, Florida Cancer Specialists. Of the 20 oncologists whom Medicare paid most for Procrit, 11 belonged to the Florida group.
By this point, you've looked through enough of the Medicare database to know that while individual doctor identities are included, group or clinic affiliation is not shown. In other words, querying for the term 'Florida Cancer Specialists'
won't find the data needed to back the WSJ's assertion.
So I'll just aim to show that 11 of the top 20 oncologists receiving Medicare payments for administering Procrit are based in Florida. That's a start.
(sidenote: so how did the WSJ link doctors to Florida Cancer Specialists? Well, you can always start with the group's homepage and doing an old-fashioned look-see.)
But if you query for hcpcs_description LIKE '%procrit%'
, you'll find zero results in the payments
database. Knowing what to look for requires understanding the variations in how procedures and drugs are categorized via HCPCS…which is something beyond my knowledge level.
So I Googled for "HCPCS code for procrit medicare":
It took me longer than I care to admit that the very first cms.gov URL contained what I needed. Even though "Procrit" isn't mentioned in the document, the name of its active ingredient is: Epoetin alfa, better known in sports media via the acronym EPO and its reputation in blood doping, which should explain some of the WSJ's investigative team's particular interest.
For our attempt to find the proper query to back the assertion, we just need to search for 'epoetin'
instead of 'Procrit'
.
First, let's list all the possible HCPCS codes that involve epoetin:
SELECT
DISTINCT hcpcs_description,
hcpcs_code
FROM payments
WHERE hcpcs_description LIKE '%epoetin%';
The result:
hcpcs_description | hcpcs_code |
---|---|
Epoetin alfa, non-esrd | J0885 |
Darbepoetin alfa, non-esrd | J0881 |
Epoetin alfa 1000 units ESRD | J0886 |
Now, another query to find which of these drugs correspond to this assertion:
Medicare paid U.S. oncologists $128 million in 2012 to administer Procrit
SELECT hcpcs_description, hcpcs_code,
ROUND(SUM(average_Medicare_payment_amt * line_srvc_cnt))
AS total_hcpcs_medicare_payment_amt
FROM payments
WHERE hcpcs_code IN ("J0881", "J0886", "J0885")
AND nppes_entity_code = 'I'
AND provider_type LIKE '%Oncology%'
GROUP BY hcpcs_code;
The result:
hcpcs_description | hcpcs_code | total_hcpcs_medicare_payment_amt |
---|---|---|
Darbepoetin alfa, non-esrd | J0881 | 128129354.0 |
Epoetin alfa, non-esrd | J0885 | 127936974.0 |
Epoetin alfa 1000 units ESRD | J0886 | 31236.0 |
So it could be either J0881
or J0885
. The easiest (and laziest) way to try to figure out the proper code would be to make an ad-hoc query based on this assertion:
…Of the 20 oncologists whom Medicare paid most for Procrit, 11 belonged to the Florida group.
This is going to be one gigantic query, so let me break it down ins teps:
First, create a query on payments
that finds provider_type
of Oncology, the corresponding npi
, and calculates the total Medicare payment received for either one of the codes J0881
and J0885
. Then do an INNER JOIN
with providers
based on npi
, then sort the list by the total payment, limit by 20
Oh yeah, include the nppes_provider_state
from providers
:
SELECT
nppes_provider_state,
(average_Medicare_payment_amt * line_srvc_cnt)
AS total_hcpcs_medicare_payment_amt
FROM payments
INNER JOIN providers
ON payments.npi = providers.npi
WHERE
payments.hcpcs_code = "J0881"
AND payments.nppes_entity_code = 'I'
GROUP BY
payments.npi, payments.hcpcs_code
ORDER BY
total_hcpcs_medicare_payment_amt DESC
LIMIT 20;
Then, do a query on the above query results, but do a COUNT
for when nppes_provider_state
corresponds to Florida. And again, do this whole thing twice, once for J0881
amd once for J0885
.
The query that returns a COUNT
of 11
is the one that has the HCPCS code that the WSJ is using (we think).
The code that gets the answer that we want is J0885
; try the query yourself:
SELECT COUNT(*)
FROM (
SELECT
nppes_provider_state,
(average_Medicare_payment_amt * line_srvc_cnt)
AS total_hcpcs_medicare_payment_amt
FROM payments
INNER JOIN providers
ON payments.npi = providers.npi
WHERE
payments.hcpcs_code = "J0885"
AND payments.nppes_entity_code = 'I'
GROUP BY
payments.npi, payments.hcpcs_code
ORDER BY
total_hcpcs_medicare_payment_amt DESC
LIMIT 20)
AS top_20_providers
WHERE nppes_provider_state = 'FL';
COUNT(*) |
---|
11 |
OK, all of this work would qualify for a great answer to this part of the midterm. It seems heavily contrived, but it's quite reasonable if you're trying to learn from the WSJ's work and you have no idea what Procrit is, or what the point of targeting these Florida doctors is (though maybe you should read the entire story?)
It's worth doing one more query: finding the npi
numbers for the 11 Florida doctors and then re-joining against providers
just to take a look at all of the other fields to see if they have any relevance to the WSJ story:
SELECT
providers.*
FROM (
SELECT
providers.npi AS the_npi,
nppes_provider_state AS the_state,
(average_Medicare_payment_amt * line_srvc_cnt)
AS total_hcpcs_medicare_payment_amt
FROM payments
INNER JOIN providers
ON payments.npi = providers.npi
WHERE
payments.hcpcs_code = "J0885"
AND payments.nppes_entity_code = 'I'
GROUP BY
payments.npi, payments.hcpcs_code
ORDER BY
total_hcpcs_medicare_payment_amt DESC
LIMIT 20)
AS top_20_providers
INNER JOIN providers
ON the_npi = providers.npi
WHERE the_state = 'FL';
You'll find that none of the fields refer to their affiliation with Florida Cancer Specialists. But you can find their names on the FCS website. Which again, reaffirms my point that investigations aren't created from queries of a single database inside a vacuum, even when that database contains several gigabytes of data.