How Is Web Scraping Used to Extract Healthcare Data?
Health information comes in several types and dimensions. A web scraping tool is used for obtaining data from difficult-to-reach locations.
Scraping the Bottom of the Data
Web scraping is an important technique for collecting a wide range of data. It is the process of automatically extracting specific information from a website. We create a program, or "bot," that crawls the pages of the website and extracts data in a usable format.
The resulting datasets are used in several sectors. For example, a bot can be developed that retrieves stock prices for a specific date, monitors the daily temperature, or counts the commuters on the London Underground. Data scraping makes it possible to add more features to existing data and build better datasets quickly.
The healthcare industry has found potential in data scraping services because it holds huge amounts of data that need to be processed. Examples include collecting data on genetic variants from the internet or building datasets of medication side effects.
The Non-Technical Overview
This overview uses the Python Selenium module as an example. The following scenario shows how it works:
Generate code that obtains automated prescribing data and retrieves details of the drug from the NHS website.
Prescribed Medication is the code.
Begin by importing the libraries:
# import libraries
import pandas as pd
import numpy as np

# !pip install selenium
from selenium import webdriver
Then the data can be imported in the required format.
The example data is the list of medications prescribed to each patient in the National Health and Nutrition Examination Survey (NHANES). It is a cross-sectional health dataset that studies the fitness and nutritional condition of samples from the US population and is open to the public.
# import drug chart
df = pd.read_csv('drug_chart.csv')
It's worth noting that the medications are stored as a single comma-separated string per patient. Splitting that string into a list makes it easier to handle the prescribed drugs for each patient.
# prescription list formation
def prescription_list(row):
    """Return a list of all the prescription medication an individual is prescribed."""
    # Missing values are not strings, so pass them through as NaN
    if not isinstance(row['Prescriptions'], str):
        return np.nan
    return row['Prescriptions'].split(", ")

df['prescription_list'] = df.apply(prescription_list, axis=1)

# display
df.head()
Selenium is a Python web scraping library that works by driving an automated Google Chrome browser, which can then be controlled programmatically. Start by creating a driver:
# Selenium 3 style call; Selenium 4 expects the chromedriver path wrapped in a Service object
driver = webdriver.Chrome("/usr/local/bin/chromedriver")
You can now instruct the driver to open a specific page. Here, it opens the NHS website page for metformin, an anti-diabetic medicine.
driver.get("https://www.nhs.uk/medicines/metformin/")
Selenium has now opened the metformin page in Google Chrome. To select the part of the page you want to extract, right-click it, inspect the element, and copy its XPath. The XPath is a path into the page's HTML that points to the section you want. Reading that section programmatically is what is meant by scraping a website.
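To see how an XPath expression picks out one part of a page without opening a browser, here is a minimal sketch using only the Python standard library. The SAMPLE_PAGE markup and the text_at helper are hypothetical and much simpler than the real NHS page. ElementTree supports only a subset of XPath, so the path starts with ".//" rather than "//".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a medicine page (not the real NHS markup).
SAMPLE_PAGE = """
<html>
  <body>
    <section id="about-metformin">
      <h2>About metformin</h2>
      <div><p>Metformin is a medicine used to treat type 2 diabetes.</p></div>
    </section>
    <section id="key-facts">
      <div><p>Placeholder key facts.</p></div>
    </section>
  </body>
</html>
"""


def text_at(page: str, xpath: str) -> str:
    """Return the text of the first element matching the XPath."""
    root = ET.fromstring(page)
    element = root.find(xpath)
    if element is None:
        raise LookupError(f"no element matches {xpath!r}")
    # itertext() walks the element and all of its children;
    # split/join collapses the whitespace left over from the markup.
    return " ".join(" ".join(element.itertext()).split())


about = text_at(SAMPLE_PAGE, ".//*[@id='about-metformin']/div")
print(about)
```

The expression reads as: find any element whose id is "about-metformin", then take its div child. Selenium evaluates the same kind of expression against the live page in the browser.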
For example, suppose you would like to retrieve the information about metformin from the NHS website. That information can be reached by following this path:
"""//*[@id="about-metformin"]/div"""
The following code returns the text of that section:
# scrape
metformin = driver.find_element("xpath", """//*[@id="about-metformin"]/div""")
metformin.text

"Metformin is a medicine used to treat type 2 diabetes, and to help prevent type 2 diabetes if you're at high risk of developing it.\nMetformin is used when treating polycystic ovary syndrome (PCOS), although it's not officially approved for PCOS.\nType 2 diabetes is an illness where the body does not make enough insulin, or the insulin that it makes does not work properly. This can cause high blood sugar levels (hyperglycemia).\nPCOS is a condition that affects how the ovaries work.\nMetformin lowers your blood sugar levels by improving the way your body handles insulin.\nIt's usually prescribed for diabetes when diet and exercise alone have not been enough to control your blood sugar levels.\nFor women with PCOS, metformin lowers insulin and blood sugar levels, and can also stimulate ovulation.\nMetformin is available on prescription as tablets and as a liquid that you drink."
The returned text can then be cleaned up with regex or modified with string replacement functions.
metformin.text.replace("\n", " ")
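As a sketch of the regex option mentioned above, the snippet below collapses newlines, tabs, and repeated spaces in one pass. The clean_text helper and the sample string are hypothetical and only illustrate the idea.

```python
import re


def clean_text(raw: str) -> str:
    """Collapse every run of whitespace (newlines, tabs, spaces) into one space."""
    return re.sub(r"\s+", " ", raw).strip()


# Hypothetical scraped text with the kind of line breaks Selenium's .text returns.
sample = "Metformin is a medicine used to treat type 2 diabetes.\nPCOS is a condition that affects how the ovaries work.\n"
print(clean_text(sample))
```

Compared with a plain replace("\n", " "), the regex also removes the trailing space and handles consecutive blank lines.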
You can now use this approach to extract data from several sections of the NHS website. The nhs_details function returns the drug details for any prescribed medication in the database.
It is important to know that Selenium assumes the page structure of the website stays the same. If the website's HTML structure changes, the XPath lookup will fail. The function deals with this by making multiple attempts, using try and except blocks with alternative element ids and URLs to locate the drug information. These alternatives were found by trial and error, which also helps in understanding the basic HTML structure of the NHS website.
def nhs_details(drug):
    drug = drug.lower()
    try:
        # First attempt: the standard medicine page
        driver.get(f"https://www.nhs.uk/medicines/{drug}/")
        section_1 = driver.find_element("xpath", f"""//*[@id="about-{drug}"]/div""")
        section_1_text = section_1.text.replace("\n", " ")
        section_2 = driver.find_element("xpath", """//*[@id="key-facts"]/div""")
        section_2_text = section_2.text.replace("\n", " ")
        try:
            section_3 = driver.find_element("xpath", f"""//*[@id="who-can-and-cannot-take-{drug}"]/div""")
            section_3_text = section_3.text.replace("\n", " ")
        except Exception:
            # Some pages use "cant" instead of "cannot" in the section id
            section_3 = driver.find_element("xpath", f"""//*[@id="who-can-and-cant-take-{drug}"]/div""")
            section_3_text = section_3.text.replace("\n", " ")
        return(section_1_text, section_2_text, section_3_text)
    except Exception:
        # Second attempt: some medicines only have a "-for-adults" page
        driver.get(f"https://www.nhs.uk/medicines/{drug}-for-adults/")
        section_1 = driver.find_element("xpath", f"""//*[@id="about-{drug}-for-adults"]/div""")
        section_1_text = section_1.text.replace("\n", " ")
        section_2 = driver.find_element("xpath", """//*[@id="key-facts"]/div""")
        section_2_text = section_2.text.replace("\n", " ")
        section_3 = driver.find_element("xpath", f"""//*[@id="who-can-and-cannot-take-{drug}"]/div""")
        section_3_text = section_3.text.replace("\n", " ")
        return(section_1_text, section_2_text, section_3_text)
The following are the components of the code:
- lower() standardizes the input before the web driver requests the page.
- f-strings allow putting any medication into the URL.
- The XPath lookup returns the element of interest from the page's HTML as a Selenium WebElement object, not as JSON.
- The element's text is extracted and cleaned up by replacing newline characters with spaces.
nhs_details('SITAGLIPTIN')

('Sitagliptin is a medicine used to treat type 2 diabetes. Type 2 diabetes is an illness where the body does not make enough insulin, or the insulin that it makes does not work properly. This can cause high blood sugar levels (hyperglycemia). Sitagliptin is prescribed for people who still have high blood sugar, even though they have a sensible diet and exercise regularly. Sitagliptin is only available on prescription. It comes as tablets that you swallow. It also comes as tablets containing a mixture of sitagliptin and metformin. Metformin is another drug used to treat diabetes.', "Sitagliptin works by increasing the amount of insulin that your body makes. Insulin is the hormone that controls sugar levels in your blood. You take sitagliptin once a day. The most common side effect of sitagliptin is headaches. This medicine does not usually make you put on weight. Sitagliptin is also called by the brand name Januvia. When combined with metformin it's called Janumet.", "Sitagliptin can be taken by adults (aged 18 years and older). Sitagliptin is not suitable for some people. To make sure it's safe for you, tell your doctor if you: have had an allergic reaction to sitagliptin or any other medicines in the past have problems with your pancreas have gallstones, or have very high levels of triglycerides (a type of fat) in your blood are a heavy drinker or dependent on alcohol have (or have previously had) any problems with your kidneys are pregnant or breastfeeding, or trying to get pregnant This medicine is not used to treat type 1 diabetes (when your body does not produce insulin).")
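The nested fallbacks in nhs_details can also be written as a list of candidate XPaths that are tried in order. The sketch below shows one possible way to do that and is not the article's implementation. The names first_match, section_3_candidates, and fake_find are hypothetical, and fake_find stands in for a Selenium lookup so the example runs without a browser.

```python
from typing import Callable, Iterable, List


def first_match(find: Callable[[str], str], candidates: Iterable[str]) -> str:
    """Return the result of the first candidate XPath that `find` accepts."""
    errors = []
    for xpath in candidates:
        try:
            return find(xpath)
        except Exception as exc:  # a real scraper could catch NoSuchElementException
            errors.append(f"{xpath}: {exc!r}")
    raise LookupError("no candidate matched: " + "; ".join(errors))


def section_3_candidates(drug: str) -> List[str]:
    """The two section ids that nhs_details tries, in the same order."""
    return [
        f'//*[@id="who-can-and-cannot-take-{drug}"]/div',
        f'//*[@id="who-can-and-cant-take-{drug}"]/div',
    ]


def fake_find(xpath: str) -> str:
    """Stand-in for a browser lookup: only the 'cant' spelling exists here."""
    known = {
        '//*[@id="who-can-and-cant-take-metformin"]/div': "placeholder section text",
    }
    return known[xpath]  # raises KeyError when the XPath is unknown


print(first_match(fake_find, section_3_candidates("metformin")))
```

Keeping the candidates in a list means a newly discovered page variant becomes one more entry rather than another nested try/except block.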
Next, build a function that returns the NHS website details for each prescribed drug in the NHANES extract.
Accessing NHS drug information
('Amlodipine is a medicine used to treat high blood pressure (hypertension). If you have high blood pressure, taking amlodipine helps prevent future heart disease, heart attacks, and strokes. Amlodipine is also used to prevent chest pain caused by heart disease (angina). This medicine is only available on prescription. It comes as tablets or as a liquid to swallow.', "Amlodipine lowers your blood pressure and makes it easier for your heart to pump blood around your body. It's usual to take amlodipine once a day. You can take it at any time of day, but try to make sure it's around the same time each day. The most common side effects include headache, flushing, feeling tired and swollen ankles. These usually improve after a few days. Amlodipine can be called amlodipine besilate, amlodipine maleate or amlodipine mesilate. This is because the medicine contains another chemical to make it easier for your body to take up and use it. It doesn't matter what your amlodipine is called. They all work as well as each other. Amlodipine is also called by the brand names Istin and Amlostin.", "Amlodipine can be taken by adults and children aged 6 years and over. Amlodipine is not suitable for some people. To make sure amlodipine is safe for you, tell your doctor if you: have had an allergic reaction to amlodipine or any other medicines in the past are trying to get pregnant, are already pregnant or you're breastfeeding have liver or kidney disease have heart failure or you have recently had a heart attack")

Prescription medication: LOSARTAN
Accessing NHS drug information
("Losartan is a medicine widely used to treat high blood pressure and heart failure and to protect your kidneys if you have both kidney disease and diabetes. Losartan helps to prevent future strokes, heart attacks, and kidney problems. It also improves your survival if you're taking it for heart failure or after a heart attack. This medicine is only available on prescription. It comes as tablets.", "Losartan lowers your blood pressure and makes it easier for your heart to pump blood around your body. It's often used as a second-choice treatment if you had to stop taking another blood pressure-lowering medicine because it gave you a dry, irritating cough. If you have diarrhea and vomiting from a stomach bug or illness while taking losartan, tell your doctor. You may need to stop taking it until you feel better. The main side effects of losartan are dizziness and fatigue, but they're usually mild and short-lived. Losartan is not normally recommended in pregnancy or while breastfeeding. Talk to your doctor if you're trying to get pregnant, you're already pregnant or you're breastfeeding. Losartan is also called by the brand name Cozaar.", "Losartan can be taken by adults aged 18 years and over. Children aged 6 years and older can take it, but only to treat high blood pressure. Your doctor may prescribe losartan if you've tried taking similar blood pressure-lowering medicines such as ramipril and lisinopril in the past but had to stop taking them because of side effects such as a dry, irritating cough. Losartan isn't suitable for some people. To make sure losartan is safe for you, tell your doctor if you: have had an allergic reaction to losartan or other medicines in the past have diabetes have heart, liver, or kidney problems recently had a kidney transplant have had diarrhea or vomiting have been on a low salt diet have low blood pressure is trying to get pregnant, are already pregnant or you are breastfeeding")

Prescription medication: SIMVASTATIN
Accessing NHS drug information
("Simvastatin belongs to a group of medicines called statins. It's used to lower cholesterol if you've been diagnosed with high blood cholesterol. It's also taken to prevent heart disease, including heart attacks and strokes. Your doctor may prescribe simvastatin if you have a family history of heart disease, or a long-term health condition such as rheumatoid arthritis, or type 1 or type 2 diabetes. The medicine is available on prescription as tablets. You can also buy a low-strength 10mg tablet from a pharmacy.", "Simvastatin seems to be a very safe medicine. It's unusual to have any side effects. Keep taking simvastatin even if you feel well, as you will still be getting the benefits. Most people with high cholesterol don't have any symptoms. Do not take simvastatin if you're pregnant, trying to get pregnant, or breastfeeding. Do not drink grapefruit juice while you're taking simvastatin. It doesn't mix well with this medicine. Simvastatin is also called Zocor and Simvador.", "Simvastatin can be taken by adults and children over the age of 10 years. Simvastatin isn't suitable for some people. Tell your doctor if you: have had an allergic reaction to simvastatin or any other medicines in the past have liver or kidney problems are trying to get pregnant, think you might be pregnant, you're already pregnant, or you're breastfeeding have severe lung disease regularly drink large amounts of alcohol have an underactive thyroid have, or have had, a muscle disorder (including fibromyalgia)")
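The article does not show the function that produced the output above, so the following is only a guess at its shape. lookup_prescriptions and fake_fetch are hypothetical names. In real use, the fetch argument would be nhs_details, and the drug list would come from the prescription_list column. Results are cached per drug so each page is visited only once.

```python
from typing import Callable, Dict, Iterable, Tuple

Details = Tuple[str, str, str]


def lookup_prescriptions(
    drugs: Iterable[str], fetch: Callable[[str], Details]
) -> Dict[str, Details]:
    """Fetch details once per distinct drug and return them keyed by lowercase name."""
    results: Dict[str, Details] = {}
    for drug in drugs:
        key = drug.lower()
        if key in results:
            continue  # already scraped, skip the repeat visit
        print(f"Prescription medication: {drug}")
        print("Accessing NHS drug information")
        try:
            results[key] = fetch(drug)
        except Exception as exc:
            # Keep going when one drug has no matching page
            print(f"No details found for {drug}: {exc!r}")
    return results


def fake_fetch(drug: str) -> Details:
    """Stand-in for nhs_details so the sketch runs without a browser."""
    return (f"about {drug.lower()}", "key facts", "who can take it")


details = lookup_prescriptions(["LOSARTAN", "SIMVASTATIN", "LOSARTAN"], fake_fetch)
print(sorted(details))
```

Passing the fetch function as an argument keeps the loop testable without a browser and lets the scraper be swapped out if the site changes.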
In conclusion, web scraping tools are very useful for converting the large amount of data available in the healthcare industry into a readable and usable format.