Thesis Outline and Additional Papers

Jason Cairns

2022-08-29

See research for previous writing

  1. Front Matter
    1. Title
    2. Abstract
    3. ToC, LoF, LoT
  2. Introduction
    1. Broad introduction to topic
    2. Problem Statement
      • Proposal
      • Project Overview
      • Means of development
      • Emphasise developer level
      • Match to what has been produced (work backwards a little)
    3. Beneficiaries of this research
    4. History & context
      • big data work group
  3. Literature Review
    1. Overview: review method, limitations, scope
    2. Literature, grouped by similarities
      1. Packages
      2. Algorithms
  4. Brief theory chapter
    1. Motivation for the aspects covered in the following chapter, with objectives that the next chapter links to each aspect
    2. Defining a distributed statistical modelling system for R
      • What does it mean to be distributed?
      • Properties that distributed object/chunk/computation etc. must possess
      • What can’t be done
      • Direction: Statistical Modelling
      • High-level (like mathematical definition)
      • Response
  5. Illustrative Problem
    1. Aspect: Object System
    2. Aspect: Computation
    3. Aspect: Concurrency
    4. Aspect: Reference
  6. Experiments in implementation of System
    1. RServe-based System
      • Current and Proposed Information Structures
      • Experiment: Eager Distributed Object Supplement
      • Experiment: Eager Distributed Object Precursory Report
      • Experiment: Distributed Decision Tree
    2. Redis-based System
      • Initial Distributed Object Experimentation with a Message Queue Communication System
      • Report on Current Chunk Architecture
      • Chunk ID Origination and Client-Server Communication
      • Message Queues for Communication in a Distributed Object System
      • Initial Chunk Experimentation with a Message Queue Object System
      • Inter-node communication with Redis
    3. distObj System
      • distObj System Initialisation and Input
      • distObj Non-Assigned Data Return
      • Description of distObj Client-Server Call Process
  7. Outline of System Implementation
    1. Overview
    2. orcv
    3. chunknet
    4. largescaler
    5. largescalemodelr
  8. System Capabilities
    1. Application
      • ADMM
      • lm, glm, admm, xgboost, boosting
      • What is more efficient, what is less (data movement)
    2. Theoretical comparison with other Systems
    3. Benchmarks
      • e.g. Against Spark table, foreach etc.
      • Validity and justification of benchmarks
    4. Extensions to the system
      • e.g. shuffle, index as in-system extensions
      • Open/Closed principle
      • Extensions serving to validate the system
  9. Recommendations
    1. Comparison between expectation and reality
    2. Future Work
  10. Appendices
    1. Documentation
    2. Source Code
    3. Bibliography