15 min read

A Walk Though of Accessing Financial Statements with XBRL in R - Part 1

Introduction

As financial professionals and analytic software lovers, the ability to efficiently load a large number of financial statements, and conduct an analysis has always been a key objective. In previous posts, Redwall Analytics worked with a 15-year time series of municipal Comprehensive Annual Financial Reports (CAFR) for 15 Fairfield County, CT towns Fairfield County Town Level Spending and Liabilities Gallop Since 2001. We also studied the worrisome long-term, time series of unfunded liabilities for all 169 Connecticut municipalities Connecticut City Unfunded Pension And OPEB Liabilities. We are probably most proud of our work replicating the complicated one-year (2016) spreadsheet analysis by Marc Joffe (Reason Foundation) and Mark Fitch (Yankee Institute) over the long-term Replicating Yankee Institute Risk Score Over 15 Years.

However, all of these were conducted using downloaded .csv data from the State’s website, not on the kind of structured XBRL data now required in real-time for public companies on the SEC’s Edgar website. Despite several attempts, XBRL had remained elusive up until now, but with daily practice, tasks previously beyond reach, suddenly become achievable. A good deal of credit and much of the code to extract XBRL data from Edgar comes Aaron Mumala’s 2018 blog post here: Accessing Financial Data from the SEC - Part 2. In addition, Micah Waldstein’s Parsing Functions in edgarWebR, as well as his edgarWebR package were essential. Lastly, none of this would have been possible without Darko Bergant’s finstr and Roberto Bertolusso’s XBRL packages. To summarize the workflow to be described below, filing data is located with edgarWeb, the XBRL Instance Documents are downloaded and parsed into a list of data.frames with the XBRL package’s xbrlDoAll function, and finally, statements are organized into traditional Income Statements and Balance Sheets using the finstr package.

This will be the first in a three-part series. First, a quick walk-through to try to fill in some gaps from Aaron Mumala’s blog post and current documentation to help others up the learning curve with XBRL. Even better, we would welcome comments clarifying solutions to unresolved issues or even more efficient ways of extracting the same data than we have used. In the second part, we will use another free source of financial statement data (financialmodelingprep.com API) to pull 10-year’s of income statements for ~700 publically-listed pharma companies and explore R&D spending. Lastly, we will look to mine financial statement items for warning signs using metrics from a 2003 piece in the CFA Conference Proceeding.

10-K Filing List via edgarWebR

Below, we choose to query all past Apple 10-K filings up until last week wrapped in the try() function, so that our lapply of company_filings (from edgarWeb) won’t be halted (even if a particular filing can’t be successfully loaded). 28 results are available along with the web link (href) to each set of filings. Although filings prior to 2008 are available, these are in html format, and unfortunately, can’t be parsed with finstr and XBRL. A later effort may be made to retrieve these by scraping and parsing the html, but this is an advanced operation.

# This chunk was run previously and filing_list was saved for the purposes of this blog post

# Call company_filings from edgarWeb package
filing_list <- 
    try(company_filings(as.character('AAPL'),
                        ownership = FALSE, 
                        type = '10-K', 
                        before = "2020207",
                        count = 40, 
                        page = 1),
        TRUE)

# Show full list
filing_list

List of Annual Index Filing hrefs via edgarWebR

ticker <- 'AAPL'
type <- '10-K'

#parse this listing to find only 10-K filings...e.g. not 10-K/A and 10-K405
href_list <- filing_list[filing_list$type==type,]$href

# Show the first five index links
href_list[1:5]
[1] "https://www.sec.gov/Archives/edgar/data/320193/000032019319000119/0000320193-19-000119-index.htm"
[2] "https://www.sec.gov/Archives/edgar/data/320193/000032019318000145/0000320193-18-000145-index.htm"
[3] "https://www.sec.gov/Archives/edgar/data/320193/000032019317000070/0000320193-17-000070-index.htm"
[4] "https://www.sec.gov/Archives/edgar/data/320193/000162828016020309/0001628280-16-020309-index.htm"
[5] "https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/0001193125-15-356351-index.htm"

Individual Filing Package Components via edgarWebR

Here we use edgarWeb’s filing_details function to see the available 19 documents for the 2017 index filing package using the 3rd element of the href_list above (2017). The first element here is the html 10-K, and there are links to many other elements from the filing including: html exhibits, .jpg’s, the complete submission as a text file, the XBRL schema and the all important XBRL Instance Document. If we copied the html url 10-K in our browser, it would show the 10-K, however, not in the XBRL form we are looking for.

# 2nd Element is 2018
filing_2018 <- 
  filing_details(href_list[[2]])$documents

# Just lines with "XBRL" and "10-K"
filing_2018[str_detect(filing_2018$description,"XBRL|10-K"),]
   seq                                            description
1    1                                                   10-K
12   9                                 XBRL INSTANCE DOCUMENT
13  10                XBRL TAXONOMY EXTENSION SCHEMA DOCUMENT
14  11  XBRL TAXONOMY EXTENSION CALCULATION LINKBASE DOCUMENT
15  12   XBRL TAXONOMY EXTENSION DEFINITION LINKBASE DOCUMENT
16  13        XBRL TAXONOMY EXTENSION LABEL LINKBASE DOCUMENT
17  14 XBRL TAXONOMY EXTENSION PRESENTATION LINKBASE DOCUMENT
                document
1   a10-k20189292018.htm
12     aapl-20180929.xml
13     aapl-20180929.xsd
14 aapl-20180929_cal.xml
15 aapl-20180929_def.xml
16 aapl-20180929_lab.xml
17 aapl-20180929_pre.xml
                                                                                      href
1   https://www.sec.gov/Archives/edgar/data/320193/000032019318000145/a10-k20189292018.htm
12     https://www.sec.gov/Archives/edgar/data/320193/000032019318000145/aapl-20180929.xml
13     https://www.sec.gov/Archives/edgar/data/320193/000032019318000145/aapl-20180929.xsd
14 https://www.sec.gov/Archives/edgar/data/320193/000032019318000145/aapl-20180929_cal.xml
15 https://www.sec.gov/Archives/edgar/data/320193/000032019318000145/aapl-20180929_def.xml
16 https://www.sec.gov/Archives/edgar/data/320193/000032019318000145/aapl-20180929_lab.xml
17 https://www.sec.gov/Archives/edgar/data/320193/000032019318000145/aapl-20180929_pre.xml
         type    size
1        10-K 2405889
12 EX-101.INS 3226268
13 EX-101.SCH   58783
14 EX-101.CAL  119640
15 EX-101.DEF  284952
16 EX-101.LAB  883295
17 EX-101.PRE  511226

Extract Available XBRL Instance Documents from Edgar via edgarWebR and XBRL Package Custom Function

In this case, we are going with only 10-K’s, so we filter above again by type equal “10-K”, and extract the 24 href’s available. Next below, we use xbrlDoAll from the XBRL package to extract the financial statements using the links from the href_list we made above. Note that we again use try() because the function would fail on some of the documents. We are not going to run the code here on the blog, because it takes a few minutes to run for Apple. We previously saved and reloaded the object above.

We have not shown here, but XBRL parsing for US companies is governed by the schemas from the US GAAP reported Taxonomy, which are updated annually. These can be found here on the XBRL US website XBRL Taxonomy. By downloading the relevant .xsd file, removing the .xml suffix at the end and moving it to in the “xbrl.Cache” file created when you run XBRL, these problems were resolved for XBRL Instance Documents. Redwall spent quite a bit of time figuring out that we needed currency, dei, exch, country and currency files among others. We did this manually and by trial and error, but surely there is a better way, but a more efficient solution might be found here on Stack Overflow XBRL Package Error in File

# This chunk was run previously and stmt_list was saved for the purposes of this blog post

# Need to use project where xbrl.cache is stored for this chunk
setwd("/Users/davidlucey/Desktop/David/Projects/xbrl_investment")

# Run xbrlDoAll and store in stmt_list
stmt_list <-  
  
    # apply function to each element of href_list
    lapply(href_list, function(href) {
      
    # Get filing details for the particular annual href index
    filing.details <- filing_details(href)$documents
    
    # Further get XBRL Instance Document for year href
    doc <- filing.details[str_detect(filing.details$description,"XBRL INSTANCE DOCUMENT"),]$href
    
    # Extract XBRL of using specified href
    try(xbrlDoAll(doc))
    })

As we have set it up in the code above, we loop through all Apple’s 10-K hrefs parsing with the XBRL package. This gives a list object for each annual reporting package, each in turn containing 9-10 data.frames (as shown below). Most of the data.frames relate to the taxonomy, governed by the .xsb files for that year. Starting in 2008, 10-Ks began being filed as “XBRL Instance Documents”, and these work seamlessly with the workflow described above. Around 2018, 10-K’s shifted to “Extended XBRL Instance Document” format, which we could not build into financial statements with finstr.

# Show one of extracted XBRL statements
summary(stmt_list[[3]])
             Length Class      Mode
element       8     data.frame list
role          5     data.frame list
calculation  11     data.frame list
context      13     data.frame list
unit          4     data.frame list
fact          9     data.frame list
definition   11     data.frame list
label         5     data.frame list
presentation 11     data.frame list

In the future, we would like to better understand the XBRL data structure, but again, it wasn’t easy to find much simple documentation on the subject. This post from Darko Bergant gives an excellent diagram of the structural hierarchy of the XBRL object Exploring XBRL files with R. He goes on in the same post: All values are kept in the fact table (in the fact field, precisely). The element table defines what are these values (the XBRL concepts, e.g. “assets”, “liabilities”, “net income” etc.). The context table defines the periods and other dimensions for which the values are reported.

Below, we show the first 50 lines of the “fact” list of the 2017 XBRL package. As noted above, data here is nested in levels, with the top level being the Balance Sheet, Income Statement, etc (again see Exploring XBRL files with R). It is possible to drill down to lower level items. Mr. Bergant also explains here how to for example extract a lower level item such as warranty information which wouldn’t be broken out at the top level How to get all the elements contained in the original XBRL?.

# Drill down on financial statement "fact" for individual year in stmt_list
stmt_list[[3]]$fact[1:50,c(2:4)]
                                                                                                                                                              contextId
1                                                                                                                                                           FD2015Q4YTD
2                                                                   FD2015Q4YTD_us-gaap_StatementEquityComponentsAxis_us-gaap_AccumulatedOtherComprehensiveIncomeMember
3                                                           FD2015Q4YTD_us-gaap_StatementEquityComponentsAxis_us-gaap_CommonStockIncludingAdditionalPaidInCapitalMember
4                                                                                      FD2015Q4YTD_us-gaap_StatementEquityComponentsAxis_us-gaap_RetainedEarningsMember
5                                                                                                                                                           FD2016Q4YTD
6                                                                   FD2016Q4YTD_us-gaap_StatementEquityComponentsAxis_us-gaap_AccumulatedOtherComprehensiveIncomeMember
7                                                           FD2016Q4YTD_us-gaap_StatementEquityComponentsAxis_us-gaap_CommonStockIncludingAdditionalPaidInCapitalMember
8                                                                                      FD2016Q4YTD_us-gaap_StatementEquityComponentsAxis_us-gaap_RetainedEarningsMember
9                                                                                                                                                           FD2017Q4YTD
10                                                                  FD2017Q4YTD_us-gaap_StatementEquityComponentsAxis_us-gaap_AccumulatedOtherComprehensiveIncomeMember
11                                                          FD2017Q4YTD_us-gaap_StatementEquityComponentsAxis_us-gaap_CommonStockIncludingAdditionalPaidInCapitalMember
12                                                                                     FD2017Q4YTD_us-gaap_StatementEquityComponentsAxis_us-gaap_RetainedEarningsMember
13                                                                                                                                                             FI2017Q4
14                                                                                                                                                             FI2016Q4
15                                                                                                                                                             FI2017Q4
16                                                                                           FI2016Q4_us-gaap_BalanceSheetLocationAxis_us-gaap_AccruedLiabilitiesMember
17                                                                                           FI2017Q4_us-gaap_BalanceSheetLocationAxis_us-gaap_AccruedLiabilitiesMember
18 FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_FourthQuarter2017CanadianDollarDenominatedDebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate2.513NotesMember
19                          FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_FourthQuarter2017DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate1.500NotesMember
20                          FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_FourthQuarter2017DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate2.100NotesMember
21                          FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_FourthQuarter2017DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate2.900NotesMember
22                          FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_FourthQuarter2017DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate3.750NotesMember
23              FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_SecondQuarter2017DebtIssuanceof1.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate4.300NotesMember
24             FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_SecondQuarter2017DebtIssuanceof10.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate1.550NotesMember
25             FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_SecondQuarter2017DebtIssuanceof10.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate1.900NotesMember
26             FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_SecondQuarter2017DebtIssuanceof10.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate2.500NotesMember
27             FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_SecondQuarter2017DebtIssuanceof10.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate3.000NotesMember
28             FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_SecondQuarter2017DebtIssuanceof10.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate3.350NotesMember
29             FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_SecondQuarter2017DebtIssuanceof10.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate4.250NotesMember
30        FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_SecondQuarter2017DebtIssuanceof10.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FloatingRateNotesDue2019Member
31        FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_SecondQuarter2017DebtIssuanceof10.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FloatingRateNotesDue2020Member
32        FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_SecondQuarter2017DebtIssuanceof10.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FloatingRateNotesDue2022Member
33               FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_ThirdQuarter2017DebtIssuanceof1.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate3.000NotesMember
34               FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_ThirdQuarter2017DebtIssuanceof7.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate1.800NotesMember
35               FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_ThirdQuarter2017DebtIssuanceof7.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate2.300NotesMember
36               FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_ThirdQuarter2017DebtIssuanceof7.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate2.850NotesMember
37               FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_ThirdQuarter2017DebtIssuanceof7.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate3.200NotesMember
38          FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_ThirdQuarter2017DebtIssuanceof7.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FloatingRateNotesDue2020Member
39          FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_ThirdQuarter2017DebtIssuanceof7.0BillionMember_us-gaap_LongtermDebtTypeAxis_aapl_FloatingRateNotesDue2022Member
40            FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_ThirdQuarter2017EuroDenominatedDebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate0.875NotesMember
41            FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_ThirdQuarter2017EuroDenominatedDebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRate1.375NotesMember
42                      FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_A2013DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRateRangefrom1.000to3.850NotesMember
43                                        FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_A2013DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FloatingRateNotesMember
44                      FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_A2014DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRateRangefrom2.100to4.450NotesMember
45                                        FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_A2014DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FloatingRateNotesMember
46                      FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_A2015DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRateRangefrom0.350to4.375NotesMember
47                                        FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_A2015DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FloatingRateNotesMember
48                      FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_A2016DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRateRangefrom1.100to4.650NotesMember
49                                        FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_A2016DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FloatingRateNotesMember
50                      FD2017Q4YTD_us-gaap_DebtInstrumentAxis_aapl_A2013DebtIssuanceMember_us-gaap_LongtermDebtTypeAxis_aapl_FixedRateRangefrom1.000to3.850NotesMember
   unitId         fact
1     usd    748000000
2     usd            0
3     usd    748000000
4     usd            0
5     usd    379000000
6     usd            0
7     usd    379000000
8     usd            0
9     usd    620000000
10    usd            0
11    usd    620000000
12    usd            0
13    usd 166000000000
14    usd 216000000000
15    usd 252300000000
16    usd    163000000
17    usd     35000000
18   <NA>         2024
19   <NA>         2019
20   <NA>         2022
21   <NA>         2027
22   <NA>         2047
23   <NA>         2047
24   <NA>         2019
25   <NA>         2020
26   <NA>         2022
27   <NA>         2024
28   <NA>         2027
29   <NA>         2047
30   <NA>         2019
31   <NA>         2020
32   <NA>         2022
33   <NA>         2027
34   <NA>         2020
35   <NA>         2022
36   <NA>         2024
37   <NA>         2027
38   <NA>         2020
39   <NA>         2022
40   <NA>         2025
41   <NA>         2029
42   <NA>         2043
43   <NA>         2018
44   <NA>         2044
45   <NA>         2019
46   <NA>         2045
47   <NA>         2020
48   <NA>         2046
49   <NA>         2021
50   <NA>         2018

Apply XBRL_get_statements via finstr Package to transform XBRL into Financial Statements

Next, we run finstr’s xbrl_get_statements on our stmt_list objects (ie: the XBRL Instance Documents) to convert the parsed XBRL objects into financial statements.

# This chunk was run previously and result_list was saved for the purposes of this blog post

result_list <- 
    sapply(stmt_list, function(stmt) {
      try(xbrl_get_statements(stmt))
      })

As shown above, finstr’s xbrl_get_statements is unsuccessful on the first item (2019), which is the new “Extended XBRL Instance Document”, because of a problem with duplicate keys after the spreading the data.frame. It seems that finstr hasn’t been updated since 2017, so it likely has something to do with the problem. We asked for guidance on Stack Overflow Finstr Get XBRL Statement Error Parsing XBRL Instance Documents, but so far no luck. Also, it fails with the pre-2008 documents as expected, which were only available as html.

Catching Errors in Successfully Parsed Statements

There are many issues with XBRL including that companies classify items in differing ways, use different names for the same items and sometimes the there are errors in the calculation hierarchy. Classification differences will be challenging, but finstr has the check_statement function to show where there are calculation inconsistencies.

# Run check_statement on 2nd element of result_list (2018)
for(list in result_list[2]) {
   print(lapply(list, check_statement))
}
$ConsolidatedBalanceSheets
Number of errors:  0 
Number of elements in errors:  0 

$ConsolidatedStatementsOfCashFlows
Number of errors:  6 
Number of elements in errors:  1 

$ConsolidatedStatementsOfComprehensiveIncome
Number of errors:  0 
Number of elements in errors:  0 

$ConsolidatedStatementsOfOperations
Number of errors:  0 
Number of elements in errors:  0 

Above we can see that there are six errors in the “CashAndCashEquivalentsPeriodIncreaseDecrease” element_id in the 2017 Statement of Cash Flows. We can then drill down to see what those are. In this case, it is not a mismatch between the original data and the amount calculated as a check. The check is just not there (NA). We need to do further research to fully understand this function.

check <- check_statement(result_list[[2]]$ConsolidatedStatementsOfCashFlows, element_id = "CashAndCashEquivalentsPeriodIncreaseDecrease")

# Calculated is NA
check$calculated
[1] NA NA NA

Looking at Apple’s 2017 Balance Sheet Disclosure

Here we show the balance sheets for the first available document which is as of September 2018 (2019 does exist, but we were not able to parse that thus far). We could do the same for the Income Statement or the Statement of Cash Flows.

# Drill down to annual balance sheet from result_list
result_list[[2]]$ConsolidatedBalanceSheets
Financial statement: 2 observations from 2017-09-30 to 2018-09-29 
 Element                                             2018-09-29 2017-09-30
 Assets =                                            365725     375319    
 + AssetsCurrent =                                   131339     128645    
   + CashAndCashEquivalentsAtCarryingValue            25913      20289    
   + AvailableForSaleSecuritiesCurrent                40388      53892    
   + AccountsReceivableNetCurrent                     23186      17874    
   + InventoryNet                                      3956       4855    
   + NontradeReceivablesCurrent                       25809      17799    
   + OtherAssetsCurrent                               12087      13936    
 + AssetsNoncurrent =                                234386     246674    
   + AvailableForSaleSecuritiesNoncurrent            170799     194714    
   + PropertyPlantAndEquipmentNet                     41304      33783    
   + OtherAssetsNoncurrent                            22283      18177    
 LiabilitiesAndStockholdersEquity =                  365725     375319    
 + Liabilities =                                     258578     241272    
   + LiabilitiesCurrent =                            116866     100814    
     + AccountsPayableCurrent                         55888      44242    
     + OtherLiabilitiesCurrent                        32687      30551    
     + DeferredRevenueCurrent                          7543       7548    
     + CommercialPaper                                11964      11977    
     + LongTermDebtCurrent                             8784       6496    
   + LiabilitiesNoncurrent =                         141712     140458    
     + DeferredRevenueNoncurrent                       2797       2836    
     + LongTermDebtNoncurrent                         93735      97207    
     + OtherLiabilitiesNoncurrent                     45180      40415    
 + CommitmentsAndContingencies                            0          0    
 + StockholdersEquity =                              107147     134047    
   + CommonStocksIncludingAdditionalPaidInCapital     40201      35867    
   + RetainedEarningsAccumulatedDeficit               70400      98330    
   + AccumulatedOtherComprehensiveIncomeLossNetOfTax  -3454       -150    

Our results_list is nested, so not so easy to work with so we use the purrr package to flatten it into 35 financial statement objects including all of the balance sheets, income statements and cash flows for the period. The same first balance sheet we have already looked at is shown, but now at the top level so that the underlying data is easier to analyze and plot.

# Flatten nested lists down to one list of all financial statements for the 10 year period
fs <- purrr::flatten(result_list)

# Now the balance sheet is at the top level of fs instead of nested
fs[[2]]
Financial statement: 2 observations from 2017-09-30 to 2018-09-29 
 Element                                             2018-09-29 2017-09-30
 Assets =                                            365725     375319    
 + AssetsCurrent =                                   131339     128645    
   + CashAndCashEquivalentsAtCarryingValue            25913      20289    
   + AvailableForSaleSecuritiesCurrent                40388      53892    
   + AccountsReceivableNetCurrent                     23186      17874    
   + InventoryNet                                      3956       4855    
   + NontradeReceivablesCurrent                       25809      17799    
   + OtherAssetsCurrent                               12087      13936    
 + AssetsNoncurrent =                                234386     246674    
   + AvailableForSaleSecuritiesNoncurrent            170799     194714    
   + PropertyPlantAndEquipmentNet                     41304      33783    
   + OtherAssetsNoncurrent                            22283      18177    
 LiabilitiesAndStockholdersEquity =                  365725     375319    
 + Liabilities =                                     258578     241272    
   + LiabilitiesCurrent =                            116866     100814    
     + AccountsPayableCurrent                         55888      44242    
     + OtherLiabilitiesCurrent                        32687      30551    
     + DeferredRevenueCurrent                          7543       7548    
     + CommercialPaper                                11964      11977    
     + LongTermDebtCurrent                             8784       6496    
   + LiabilitiesNoncurrent =                         141712     140458    
     + DeferredRevenueNoncurrent                       2797       2836    
     + LongTermDebtNoncurrent                         93735      97207    
     + OtherLiabilitiesNoncurrent                     45180      40415    
 + CommitmentsAndContingencies                            0          0    
 + StockholdersEquity =                              107147     134047    
   + CommonStocksIncludingAdditionalPaidInCapital     40201      35867    
   + RetainedEarningsAccumulatedDeficit               70400      98330    
   + AccumulatedOtherComprehensiveIncomeLossNetOfTax  -3454       -150    

Next, we can graph the evolution of “Current” items over the 10-year period. Note that in 2017, Apple started calling them “Consolidated Balance Sheets”. Prior to that, they were named “Statements of Financial Position Classified”. Hence, we had to regex match using both “Balance” and “Position” to get the relevant documents for the full period. This clearly cumbersome and difficult to do at scale on a larger number of companies. We can see that both current assets and liabilities have tripled over the period.

# Regex match list items matching "Balance" and "Position" and rbind into bs (balance sheet)
bs <-
  rbindlist(fs[str_detect(names(fs), "Balance|Position")], fill = TRUE)

# Drop any duplicates with same endDate
bs <- unique(bs, by = "endDate")

# Function to scale y axis label
scaleFUN <- function(x) paste("$",x/1000000000,"Billion")

current <- names(bs)[str_detect(names(bs), "Current")]

# Tidy and ggplot from within data.table bs object coloring variables
bs[, melt(.SD, id.vars = "endDate", measure.vars=current)][
  ][!is.na(value)][
    ][, ggplot(.SD, 
               aes(as.Date(endDate),
                   as.numeric(value),
                   color = variable)) +
        geom_line() +
        scale_y_continuous(labels = scaleFUN) +
        labs(title = "Apple Current Items over Time",
             caption = "Source: SEC Edgar") +
        ylab("Amount") +
        xlab("Year") +
        theme_bw()]

We are curious about R&D spending over time, so here we took the income statements which again used changing names (Statements of Income and Statements of Operations) over time. It looks like Apple reduced its R&D spending sharply after the iPhone launch, but has ramped it up again to launch the wearables business. We will use this strategy to extract specific balance sheet and income statement items in Part 3 to mine for warning signs.

# Regex match list items matching "StatementOFIncome" and StatementsOfOperations", and rbind into is (income statement)
is <-
  rbindlist(fs[str_detect(names(fs),"StatementOfIncome$|StatementsOfOperations$")], fill =
              TRUE)

# Drop NA rows
is <- 
  is[!is.na(SalesRevenueNet)]

# Drop previous year's which are dupes
is <-
  unique(is, by = "endDate")

# Mutate R&D to sales ratio variable and select columns needed
is <- 
  is[, rd_sales :=
       ResearchAndDevelopmentExpense / SalesRevenueNet][
       ][, .(endDate, rd_sales)]

# Tidy and ggplot within data.table is object
is[, melt(.SD, id.vars = "endDate", measure.vars=c("rd_sales"))][
  , ggplot(.SD, aes(as.Date(endDate), as.numeric(value), color = variable)) +
    geom_line() +
    scale_y_continuous(labels = scales::percent) +
    labs(title = "Apple R&D-to-Sales Ratio has More than Doubled since 2012",
         caption = "SEC Edgar") +
    ylab("Amount") +
    xlab("Year") +
    theme_bw()]

Conclusion

After spending the few years solving many problems in R, the volume of discussion about XBRL still seems surprisingly sparse for such a big potential use case. In addition, most of development at least in R seemed to stop cold in 2017. It will be interesting to learn if this is because most the people who look at financial statements generally don’t use XBRL in analytic software yet, or because it is too inefficient to get clean data in a usable form. In the end, Aaron Mumala suggested that downloading and parsing XBRL from Edgar was probably still not ready for prime time.

Admittedly, the challenges discovered in this exercise because of changing statement names, financial statement items, data formats, we wondered if the SEC Edgar XBRL disclosures are suitable for a large scale analysis. Many providers charge for data parsing and cleaning, and it may be worth paying for, especially if real money will be involved. In Part 2 of this series, we will look at using a free outside provider of financial statement data (Financial Modeling Prep) and see what we find.