FINAL PROJECT: Abstract and Reader's Response > A collections framework for high-performance R computing

Abstract:
Despite its popularity among data scientists, statisticians, and biologists, R is a fairly slow and memory-inefficient language. This is due largely to its unusual combination of language features and lack of built-in support for standard computer science data structures. In recent years, several authors have developed R packages that help users write more efficient programs. We aim to contribute to this effort with our package “RCollections.” RCollections is a unified framework of container classes that allows users to easily and efficiently store, manipulate, and retrieve data. Classes provided by the package exist in an S3 inheritance hierarchy and include staples such as “set,” “map,” “bag,” “stack,” “queue,” and “graph.” Containers can be used individually or combined to create more complex data structures as required. Under the hood, the package is written in C++ and tightly integrated with the “foreach” and “iterators” packages. Consequently, operations are fast and can be parallelized to take advantage of multicore processors. In this paper, we begin with a review of standard computer science data structures and discuss the asymptotic efficiency of algorithms typically associated with those data structures. Next, we give an overview of the package’s design and describe how the package fits within the existing R ecosystem. Then, we briefly summarize low-level implementation details and run several benchmark performance tests. Finally, we conclude with several specific examples of package use. (226 words)

Reader’s profile:
The reader is skeptical of the package because there are already several packages on CRAN that implement similar data structures.

Reader’s response:
After reading this document, I do think there is a place for RCollections on CRAN. The packages “hash,” “sets,” “rstackdeque,” and “network” already implement hash maps, sets, stacks and queues, and graphs, respectively. RCollections implements these structures as well, but it does so in a unique way. First, RCollections allows users to combine containers to create specialized data structures, which is novel. Moreover, the inheritance hierarchy and consistent programming interface of the package make it easy to switch between different containers with minimal fuss. Additionally, the package is written in C++ and allows for easy parallel computation, so it is fast. Finally, the package has some specific features that the others do not.

Thesis:
R is a slow language due to its unusual language features and lack of built-in support for standard data structures. We seek to help address this problem by developing a package that provides users with easy access to high-performance C++ containers. Moreover, we seek to make this package accessible to R users of all skill levels.

Voice:
Third person plural (while seeking to minimize use of “we” and “us”) throughout the document, as is standard in the field.

Citation:
In-text author-date citations, as is standard in the field.
May 5, 2017 | Unregistered Commenter TB
Edit: For voice, I meant first person plural.
May 5, 2017 | Unregistered Commenter TB
T -- good plan. Generally, having more than one way to approach data tasks can be good. However, do you wish to address this reader in this way:

Make a case for the efficiencies or applications your approach offers over the packages on CRAN now.
May 7, 2017 | Registered Commenter Marybeth Shea