Reading a database into Pandas

Lately I had to connect to a Windows Server database quickly to perform some fast data analysis. I don’t know any faster way to do that but in Python and run it through Pandas. Today I will share how to connect to a database (in my case using pymssql) and read the data you want into a Pandas DataFrame. Enjoy!

First of all, we need to install pymssql. This can be done as easily as pip install pymssql when using pip or conda install pymssql when using Anaconda Python packages. If you want to use any other database package, you must know that Pandas supports SQLAlchemy connectable and DBAPI2 connections.

Once we imported our module, we can now use it in code!

# Import data
import pymssql
import pandas as pd

We can throw our database strings into variables. Mind that the best way would be to read those from ENV variables, but for the purpose of this blog post, that would be an overkill.

# Database access
DB_HOST       = 'Input database location'
DB_PASSWORD   = 'Input user password'
DB_USER       = 'Input user name'

Now, let’s create a connection to our database using the variables:

conn = pymssql.connect(

Now its the time to do some SQL work. You need to write whatever you want to get out of the database. In my case I’ll do a lazy example on getting all info about a Note from the database (Don’t try that at home… actually anywhere).

sql = 'SELECT * FROM Notes'

Last step! Getting the data into a Pandas DataFrame. This is as simple as:

result_df = pd.read_sql(sql , conn)

And voilla! You have your data exactly where you want it. But wait… there is more! What if you have a shitty server and you can’t just get a couple of million records in one go? Let’s try breaking this into chunks!

chunksize = 500  # Make it as big as you want
chunks = []  # A list to store chunk DataFrames

# Mind the chunksize parameter
for chunk in pd.read_sql(sql , conn, chunksize=chunksize):

# Connect the chunks
full_df = pd.concat(chunks)

Good job, now you know how to connect to a database and read the data straight into Pandas!

Tell me what you think!