Index de l'article

Advanced read/write in Excel

Formulas

Write formula

No problem to create Excel formulas with Pandas, but use these tips below.

  • As you will use simple and double quote in your formulas, frame your string with triple quotes.
  • Translate in english all Excel keywords (IF, CHAR, SUBSTITUTE...).
  • Use a simple comma to separate your parameters (english...).
  • Re-sort your dataframe and reset your index before the formula:
df.sort_values(by=['Your Id'], ascending=True, inplace=True)
df = df.reset_index(drop=True)
  • Start the index with a special value (to start with 2, as usual in Excel):
df.index = df.index + 2
  • Then use the index as a string in your formula:
'''...''' + df.index.astype(str) + '''...'''
  • Do not add other sorts after the formulas creation, it causes index issues and impacts your formulas.

 

Read formula

Sometime you want read formula as string. If the Excel file was created by Excel himself, no problem, just use data_only=True:

wb = openpyxl.load_workbook('My file.xlsx', data_only = True)
ws = wb['My tab']
df = pd.DataFrame(ws.values)
print(tabulate(df.head(5), headers='keys', tablefmt='psql', showindex=False))
But if the file, and the formulas, was created with Python, so the Excel cache do not know your formulas and... oups!
So just open and close your file before to use data_only=True:
import xlwings
...
excel_app = xlwings.App(visible=False)
excel_book = excel_app.books.open('My file.xlsx')
excel_book.save()
excel_book.close()
excel_app.quit()

Formating

Set font in blue, bold and centered:

import openpyxl
from openpyxl.styles import Font, PatternFill, Fill, Alignment
 
myFile = 'C:/Path/test.xlsx'
 
myDF.to_excel(myFile, index=False, sheet_name='Tab 1')
 
workbook = openpyxl.load_workbook(myFile)
worksheet = workbook['Sheet1']
 
font = Font(color='1469BE', bold=True)
alignment = Alignment(horizontal='center')
column = 'B'
 
for cell in worksheet[column][1:]:
    cell.font = font
    cell.alignment = alignment
 
workbook.save(myFile)

Conditional formatting

If a number is equal to, in just one cell

Here we add in one cell (B3) a rule to fill it in blue if it equals to 2:

import openpyxl
from openpyxl import styles, formatting
 
myFile = 'C:/Path/test.xlsx'
 
myDF.to_excel(myFile, index=False, sheet_name='Tab 1')
 
workbook = openpyxl.load_workbook(myFile)
worksheet = workbook['Sheet1']
 
myBlue = styles.PatternFill(start_color='0088cc', end_color='0088cc', fill_type='solid')
worksheet.conditional_formatting.add('B3', formatting.rule.CellIsRule(operator='equal', formula=['2'], fill=myBlue))
 
workbook.save(myFile)

If a number is greater than, in whole column

Same as above, we just change the operator with greaterThan and we target the whole column B as Excel did, with  $1:$1048576:

import openpyxl
from openpyxl import styles, formatting
 
myFile = 'C:/Path/test.xlsx'
 
myDF.to_excel(myFile, index=False, sheet_name='Tab 1')
 
workbook = openpyxl.load_workbook(myFile)
worksheet = workbook['Sheet1']
 
myBlue = styles.PatternFill(start_color='0088cc', end_color='0088cc', fill_type='solid')
worksheet.conditional_formatting.add('B1:B1048576', formatting.rule.CellIsRule(operator='greaterThan', formula=['2'], fill=myBlue))
 
workbook.save(myFile)

Operators

The above method supports several operators:

{'beginsWith', 'endsWith', 'notBetween', 'notContains', 'lessThan', 'greaterThanOrEqual', 'notEqual', 'between', 'lessThanOrEqual', 'containsText', 'greaterThan', 'equal'}

If contains a string

For match with strings it is a little specific, with redundant code and building the column_slot ourself, using the size of our dataframe +1 :

Here for example we fill in red the cells from A column when contain the string CSS.

import pandas as pd
import openpyxl
from openpyxl import styles, formatting
from openpyxl.styles.differential import DifferentialStyle
from openpyxl.formatting.rule import Rule
 
myFile = 'C:/Path/test.xlsx'
 
myDF.to_excel(myFile, index=False, sheet_name='Tab 1')
 
workbook = openpyxl.load_workbook(myFile)
worksheet = workbook['Sheet1']
 
myRed = styles.PatternFill(start_color='ffc7ce', end_color='ffc7ce', fill_type='solid')
 
column_slot = 'A2:A'+str(myDF.shape[0]+1)
 
styleRed = DifferentialStyle(fill=myRed)
rule = Rule(type='containsText', operator='containsText', text='css', dxf=styleRed)
rule.formula = ['SEARCH("css",' + column_slot + ")"]
worksheet.conditional_formatting.add(column_slot, rule)
 
workbook.save(myFile)

Fix Excel percentage

When your data comes from an Excel file and contains percentages, something like dtype (search it in Miscellaneous chapter) will be non-efficient, because Excel stores percentages as numbers (indices between 0 and 1). The displaying with % is just a view. Actually 35% will be stored as 0.3 for example.

In addition, it is possible that these percentages are melted with true strings or numbers in the same columns, and sometimes with or without decimal ... Argghhh 🥵

But just relax:

import re
...
df[MyField] = df[MyField].apply(lambda x: str(round(float(x) * 100, 2)).rstrip('0').rstrip('.') + ' %' if re.search(r'^0\.\d+$|^0$|^1$',x) else x)
Attention
If the Excel file is wrong builded, melting true numbers between 0 and 1 with percentages, they will be converted. So maybe add red alerts to manage these special cases.

Search by lines

See the chapter Search by lines.

Build hyperlink

df['id_commune'] = '=HYPERLINK("https://my_site.com/blog?id=' + df['my_id'].astype(str) + '", "' + df['my_article'].astype(str) + '")'

 

Liens ou pièces jointes
Télécharger ce fichier (France-Departements-Deformation.zip)France-Departements-Deformation.zip[France-Departements-Deformation]335 Ko
Télécharger ce fichier (simple_countries.zip)simple_countries.zip[simple_countries]1880 Ko