Making tables from Stata is one of the most common coding tasks in applied economics. For most researchers, it is also one on which much time is wasted: questions about better ways of automating the formatting of nice tables from Stata often crop up on social media. Reproducibility in economics also crucially depends on streamlining this process.
Take the example of the reproducibility checks DIME has recently implemented before working papers are published. This is a bit of a fancy group: all code is typically hosted on GitHub, all data are registered in the Microdata Library, and RAs get a lot of training on reproducible coding practices. Yet, out of 22 recently reviewed papers, most failures to replicate had to do with the workflow used to export tables.
In this blog, we propose workflows to minimize the pain and increase the gains. We share the framework DIME Analytics has developed to help research teams with the task of coding tables, discuss two distinct stages to the problem, and link to Stata code for getting the job done.
When incorporating tables into papers, it is common practice to copy-and-paste results from csv and Excel files, or the Stata window, and then format them in Word. Some setups are more manual than others, but the road to sharing results that do not reproduce is short: all you need to do is forget to copy one table, or one line of one table, after updating your data or specification. Additionally, heavily formatting tables after they are exported often makes it harder to confirm that the results exported are the same as the ones shown in the paper. This all needs to stop!!!
There are lots of reasons to export tables somewhere other than the Stata results window, but they don't all justify the same approach. You might be exploring regression results with various specifications, and not want to read them one-by-one. You might be preparing a report or paper for submission or publication. Your journal might require tables inline in Word. (Really.) Depending on what you are doing now and what you might need to do in the future, there are some questions that should help you triage before implementing code:
- Do I need this output to be immediately shareable without post-processing?
- Is this output ready for publication, or just for discovery and exploration?
- Do I need to be able to adjust number formatting and rounding later?
- Will I need to adjust table layout and formatting later?
- What will be the required workflow when I re-produce this table?
- What will happen to the table if I alter models, parameters, or other core components?
- Am I likely to alter models, parameters, or other core components?
Different use cases have different answers to these questions, but most projects fall into one of two broad development stages.
The way you move from Stage One to Stage Two will depend on the output software you plan to use. Here, we will describe how to automate LaTeX tables; we also talk about working in Excel here. LaTeX gives more flexibility with auto-updating tables in reports and presentations; however, there is some fixed cost in learning the formatting language. Excel remains popular (and is preferred by some journals for house styling), but applying formatting in Excel remains a largely manual process, which slows down replication runs.
In all cases, we recommend a simple file structure to help keep organized. Every table has its own file. When tables are in Stage One, these outputs should be named informatively: names like main-regression.tex, robustness-checks.xlsx, and balance-tables.tex are great. During Stage Two, these should change to structural names: table-01.tex or table-A05_robustness.xlsx are acceptable (note the use of dashes, leading zeros, and underscores to organize semantic content: learn more about naming things). This ensures that you and your code reviewers can always find things and understand how they are connected both to the code and to the final product. Use Git wherever possible to track and store past and alternative specifications, and how they affect your results.
Last month, the two most-downloaded packages from SSC were estout and outreg2, which are used to export tables. Both can create simple tables in LaTeX, although they will not always look the nicest without formatting. Exporting results to individual .tex files for each table and importing them with \input into a master .tex document is the easiest way to create outputs when you are still making changes to the results. The greatest advantage of all this is that you only need to recompile the master document to see all the new results at once, without any copy-pasting or opening multiple files.
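As a rough illustration of that workflow, here is a minimal sketch using esttab (part of the estout package). The auto dataset, the two regressions, and the file name main-regression.tex are our own placeholders, not part of any particular project.

```stata
* Minimal sketch; dataset, models, and file name are illustrative assumptions.
* Requires the estout package: ssc install estout
sysuse auto, clear

* Store the specifications we want side by side
eststo clear
eststo: regress price mpg
eststo: regress price mpg weight foreign

* One file per table; -replace- overwrites the file on every run
esttab using "main-regression.tex", replace booktabs label se
```

The master document would then pull the table in with a line such as \input{main-regression.tex}. Because the booktabs option makes esttab write \toprule, \midrule and \bottomrule, the master document needs to load the booktabs LaTeX package.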
As for actually doing this, the estout package, by Ben Jann, has lots of options. You can get it to do basically anything you want! The default table is pretty simple, and the documentation is huge, but we've prepared a few go-to examples that solve the most common formatting needs for a LaTeX table. The esttab command also allows you to export nicely formatted tables to Word, Excel, csv and HTML, but the options vary from one format to the other.
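To give a flavor of what these options look like, the sketch below adjusts rounding, significance stars, column titles, and summary statistics, then writes the same results to csv and Word-readable RTF. It is not one of the go-to examples mentioned above; the models, labels, and file names are assumptions chosen only for illustration.

```stata
* Sketch only; models, titles, and file names are illustrative assumptions.
* Requires the estout package: ssc install estout
sysuse auto, clear

eststo clear
eststo m1: regress price mpg
eststo m2: regress price mpg weight foreign

* LaTeX output with explicit rounding, stars, column titles, and statistics
esttab m1 m2 using "main-regression.tex", replace booktabs label ///
    b(3) se(3) star(* 0.10 ** 0.05 *** 0.01)                     ///
    mtitles("Bivariate" "Controls") nonotes                      ///
    stats(N r2, fmt(%9.0fc %9.3f) labels("Observations" "R-squared"))

* Same stored results in other formats; each format has its own option set
esttab m1 m2 using "main-regression.csv", replace csv label b(3) se(3)
esttab m1 m2 using "main-regression.rtf", replace rtf label b(3) se(3)
```

Keeping the stored estimates (m1, m2) separate from the export calls means a change in specification flows through to every output format on the next run.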
If you're trying to create a very specific table format, the easiest way to do it in a replicable manner is to write the complete LaTeX code for the table. This means saving any number that should be displayed as a local, and hardcoding the LaTeX code for the table. But instead of writing the numbers themselves, you just call the locals that were previously saved. filewrite allows you to write the LaTeX code in a do-file, then have Stata write the text file with the table, and save it as a .tex file. You can find an example of how to use it here.
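The same idea can be sketched with Stata's built-in file commands. Note that this is a swapped-in technique, not filewrite itself, and not the linked example; the dataset, local names, and file name are hypothetical.

```stata
* Sketch only: uses Stata's built-in -file- commands rather than filewrite.
* Dataset, local names, and output file name are illustrative assumptions.
sysuse auto, clear

* Save every number the table will display as a local
summarize price
local mean_price : display %9.2f r(mean)
local n_obs = r(N)

* Hardcode the LaTeX layout and call the locals instead of typing numbers
tempname fh
file open `fh' using "summary-price.tex", write replace text
file write `fh' "\begin{tabular}{lc}" _n
file write `fh' "\hline" _n
file write `fh' "Mean price & `mean_price' \\" _n
file write `fh' "Observations & `n_obs' \\" _n
file write `fh' "\hline" _n
file write `fh' "\end{tabular}" _n
file close `fh'
```

One thing to watch with this approach: Stata treats a backslash placed directly before a local or global macro as an instruction not to expand it, so the sketch keeps LaTeX backslashes away from the start of any macro reference.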
The two commands above are our go-to solutions for exporting tables to LaTeX. However, there are a few other options out there. outreg2 also exports tables to LaTeX format, but we've found it harder to use, and harder to find resources for, than estout. stata-tex is another option for custom tables, but takes some more setting up with Excel and Python. Finally, you can write a whole HTML, Word or PDF document using different options for Stata markdown, entirely within Stata. Discussing these options would take yet another blog post, but you can check out the dynamic documents, markstat, and texdoc documentation for more information.
Whichever software and packages you decide to use, automating your table creation workflow will likely save you time, as long as you do it at the right moment. It will also greatly reduce the risk of circulating, submitting or publishing manuscripts with out-of-date results. And if we may plant another seed for debate, this, rather than aesthetics, is also the main reason to prefer LaTeX over Word for applied work.