ADR-002: CouchDB as Primary Database

Decision Analysis and Resolution: SW360 Primary Database

Created by: SW360 Architecture Team
Original Decision: 2014
Reformatted: April 2026
Status: Accepted
Estimated read time: 12 minutes


Table of Contents

  1. Background
  2. Goal
  3. Key Principles
  4. Key Inputs, Assumptions and Restrictions
  5. Options Analysis
  6. Criteria for Making a Decision
  7. Final Decision
  8. Contributors
  9. Evolution & Updates

Background

SW360 manages software components, releases, projects, and licenses for open source license compliance. The data model is characterized by:

  • Complex, evolving structures: Components have different fields than licenses; new fields are frequently added
  • Binary attachments: Source code archives, clearing reports, license texts
  • Full-text search requirements: Search across all document fields
  • Enterprise deployments: Need horizontal scaling and high availability
  • Replication needs: Disaster recovery and geographic distribution

Why This Decision Matters: The database choice fundamentally affects every aspect of SW360—from development productivity to operational complexity to feature capabilities.


Goal

The goal of this decision analysis is to:

  1. Select a primary database that supports SW360’s document-centric data model
  2. Enable storage of binary attachments alongside metadata
  3. Provide full-text search capabilities
  4. Support enterprise deployment requirements (HA, replication, scaling)
  5. Minimize operational complexity

Key Principles

#PrincipleDescription
1Schema FlexibilitySupport varying document structures without migrations
2Attachment Co-locationStore binaries with their metadata
3Simple IntegrationAvoid complex drivers and ORM layers
4Operational SimplicityEasy backup, replication, and disaster recovery
5Production ReadinessBattle-tested with enterprise deployments

Key Inputs, Assumptions and Restrictions

TypeDescription
InputSW360 data model has ~20 document types with varying schemas
InputAttachments range from KB to hundreds of MB (source archives)
InputFull-text search is critical for license compliance workflows
AssumptionWrite volume is moderate; read-heavy workload
AssumptionStrong consistency within a document is sufficient
RestrictionMust be open source with enterprise adoption
RestrictionOperations team prefers HTTP-based APIs for monitoring

Options Analysis

Option 1 - PostgreSQL/MySQL

Summary

Use a traditional relational database with JSONB columns for flexible schema support. This leverages established SQL expertise while adding semi-structured data capabilities.

Conceptual View

┌─────────────────────────────────────────────┐
│               PostgreSQL                     │
├─────────────────────────────────────────────┤
│  components (id, type, data JSONB)          │
│  releases (id, component_id, data JSONB)    │
│  attachments (id, blob BYTEA, metadata)     │
│  ... foreign keys, indexes ...              │
└─────────────────────────────────────────────┘

Impact / Changes Required

  • Design hybrid schema with JSONB for flexible fields
  • Implement separate attachment storage (filesystem or blob)
  • Add full-text search via PostgreSQL FTS or external Elasticsearch
  • Build custom replication for HA

SWOT Analysis

CategoryAnalysis
Strengths1. ACID transactions guarantee consistency
2. Mature ecosystem, well-understood by operations
3. Powerful query capabilities (joins, aggregations)
4. PostgreSQL JSONB supports semi-structured data
5. Large community and tooling
Weaknesses1. Schema changes require migrations
2. Attachment storage requires separate solution
3. Horizontal scaling more complex (sharding)
4. ORM layers add complexity for nested documents
5. Master-slave replication, not multi-master
Opportunities1. Can add extensions (PostGIS, pg_trgm)
2. Foreign data wrappers for integration
Threats1. Schema evolution friction slows development
2. Attachment handling creates operational complexity
3. Full-text search requires additional setup

Option 2 - MongoDB

Summary

Use MongoDB as a document-oriented database with native JSON storage, flexible schemas, and GridFS for large file storage.

Conceptual View

graph TB
    subgraph MongoDB
        direction TB
        C1[Collections:<br/>components, releases,<br/>projects, licenses]
        C2[GridFS: attachments]
    end

Impact / Changes Required

  • Design document collections with embedded relationships
  • Configure GridFS for attachment storage
  • Set up replica sets for high availability
  • Integrate MongoDB text search

SWOT Analysis

CategoryAnalysis
Strengths1. Native JSON document storage
2. Flexible schema with embedded documents
3. GridFS for large file storage
4. Built-in text search
5. Horizontal scaling via sharding
Weaknesses1. Less mature than PostgreSQL (in 2014)
2. GridFS separates files from documents
3. No multi-document transactions (2014)
4. Memory-mapped storage can consume RAM
5. Requires MongoDB driver expertise
Opportunities1. Growing ecosystem and community
2. Cloud-hosted options (Atlas)
Threats1. License changes (SSPL) may affect deployment
2. Operational complexity for large deployments
3. Vendor lock-in concerns

Option 3 - Apache CouchDB

Summary

Use Apache CouchDB, a document-oriented database with HTTP API, native attachment storage, and built-in replication capabilities.

Conceptual View

graph TB
    subgraph CouchDB["Apache CouchDB"]
        direction TB
        DB1["sw360db/<br/>component:abc123 + attachments<br/>release:def456 + attachments<br/>license:ghi789"]
        DB2["sw360users/"]
        DB3["sw360attachments/"]
        F["Features:<br/>HTTP REST API | MVCC | Replication | Attachments"]
    end

Impact / Changes Required

  • Design documents by type with type discriminator field
  • Attach binaries directly to documents
  • Configure views (MapReduce) for queries
  • Set up replication for disaster recovery

SWOT Analysis

CategoryAnalysis
Strengths1. Native HTTP REST API—debug with curl/browser
2. Attachments stored directly on documents
3. Built-in master-master replication
4. MVCC prevents concurrent update conflicts
5. Flexible schema—no migrations needed
6. Apache license, truly open source
Weaknesses1. No SQL joins—manual denormalization needed
2. MapReduce views require JavaScript knowledge
3. Views can consume significant memory
4. Eventual consistency model
5. Smaller community than PostgreSQL/MongoDB
Opportunities1. CouchDB-Lucene for full-text search
2. Nouveau (CouchDB 3.4+) integrates search natively
3. PouchDB for offline-first applications
4. IBM Cloudant provides managed service
Threats1. Limited adoption compared to alternatives
2. Developer mindset shift from SQL
3. View performance tuning can be complex

Option 4 - Elasticsearch

Summary

Use Elasticsearch as the primary database, leveraging its distributed search and document storage capabilities.

Conceptual View

graph TB
    subgraph ES["Elasticsearch"]
        direction TB
        I1["Indices:<br/>sw360-components<br/>sw360-releases<br/>sw360-licenses"]
        I2["(attachments require external storage)"]
    end

Impact / Changes Required

  • Design indices with mappings
  • Implement external attachment storage
  • Handle document versioning manually
  • Manage cluster operations

SWOT Analysis

CategoryAnalysis
Strengths1. Exceptional search capabilities
2. Distributed by design
3. Powerful aggregations
4. Near real-time indexing
Weaknesses1. Not designed as primary database
2. No native attachment storage
3. Document updates are expensive (re-index)
4. No transactions or constraints
5. Cluster management complexity
Opportunities1. Combined search and storage
Threats1. License changes (Elastic License)
2. Data loss risks without careful configuration
3. Resource-intensive for primary storage
4. Not battle-tested as primary DB

Criteria for Making a Decision

T-Shirt Sizing Scale

T-Shirt SizeNumeric ValueMeaning
XS1.0Worst for this aspect
S2.5Poor
S-M3.75Below Average
M5.0Average
M-L6.25Above Average
L7.5Good
L-XL8.75Very Good
XL10.0Best for this aspect

Weighted Evaluation Matrix

CriteriaDescriptionWeightPostgreSQLMongoDBCouchDBElasticsearch
RatingScoreRatingScoreRatingScoreRatingScore
Schema FlexibilityAdd fields without migrations9M45.0L-XL78.75XL90.0L67.5
Attachment StorageStore binaries with documents9S22.5M45.0XL90.0XS9.0
Replication/HABuilt-in disaster recovery8M-L50.0L60.0XL80.0L60.0
Query FlexibilityComplex queries, aggregations7XL70.0L52.5M35.0L-XL61.25
Full-Text SearchSearch across all content7M35.0M-L43.75M-L43.75XL70.0
HTTP APIDebug with curl/browser6S15.0S-M22.5XL60.0L45.0
Production ReadinessEnterprise deployments8XL80.0M-L50.0L60.0M-L50.0
MVCC/Conflict HandlingConcurrent update safety6L45.0M30.0XL60.0M30.0
Developer ExperienceLearning curve, debugging7L-XL61.25L52.5L52.5M-L43.75
Open Source LicenseTrue open source, no lock-in6XL60.0M-L37.5XL60.0M30.0
Operational SimplicityBackup, monitoring, tuning7L52.5M-L43.75L52.5M35.0
TOTAL536.25516.25683.75501.5

Score Summary

RankOptionTotal ScoreRecommendation
🥇 1Apache CouchDB683.75SELECTED
🥈 2PostgreSQL536.25Good alternative for SQL teams
🥉 3MongoDB516.25Close competitor
4Elasticsearch501.5❌ Not suited as primary DB

Final Decision

Selected Option: Apache CouchDB

Rationale

Apache CouchDB was selected as the primary database for SW360 based on:

  1. Highest Weighted Score (683.75) - Clear winner across combined criteria

  2. Attachment Storage (XL) - Only option with native document-level attachment storage:

    {
      "_id": "release:abc123",
      "_attachments": {
        "source.tar.gz": { "content_type": "application/gzip", "length": 15728640 },
        "clearing-report.pdf": { "content_type": "application/pdf", "length": 524288 }
      },
      "name": "Apache Commons",
      "version": "3.12"
    }
    
  3. Schema Flexibility (XL) - Documents can have different structures:

    • Components have vendor references
    • Licenses have full text and obligations
    • No migration scripts needed for new fields
  4. Built-in Replication (XL) - Master-master replication out of the box:

    curl -X POST http://localhost:5984/_replicate \
      -d '{"source": "sw360db", "target": "http://backup:5984/sw360db", "continuous": true}'
    
  5. HTTP REST API (XL) - Debug with standard tools:

    # Get a document
    curl http://localhost:5984/sw360db/component:abc123
    
    # Search with Mango
    curl -X POST http://localhost:5984/sw360db/_find \
      -d '{"selector": {"type": "component", "componentType": "OSS"}}'
    

Implementation Notes

Database Structure:

CouchDB Instance
├── sw360db           # Main application data
│   ├── components    # type: "component"
│   ├── releases      # type: "release"
│   ├── projects      # type: "project"
│   ├── licenses      # type: "license"
│   └── attachments   # stored on documents
├── sw360users        # User data
└── sw360attachments  # Attachment metadata index

Access Patterns:

  • By ID: Direct document retrieval
  • By View: Pre-computed MapReduce indexes
  • By Mango: Ad-hoc JSON queries

Review Triggers

This decision should be revisited if:

  • CouchDB development stagnates or project becomes unmaintained
  • Multi-document transactions become critical requirement
  • Horizontal scaling beyond master-master replication needed
  • Full-text search performance becomes limiting factor

Contributors

NameRoleContribution
SW360 Architecture TeamDecision MakersRequirements analysis, criteria weighting
Operations TeamStakeholdersHA/DR requirements, operational input
Development TeamImplementersFeasibility assessment, POC

Evolution & Updates

Search Engine Changes

PeriodSearch EngineNotes
2014-2023CouchDB-LuceneExternal plugin, requires Java
2024+NouveauNative CouchDB 3.4+ feature

SDK Migration

PeriodClient LibraryNotes
2014-2023EktorpDeprecated, unmaintained since 2016
2024+IBM Cloudant SDKSee ADR-006

Consequences Summary

Positive

  • ✅ Flexible data model—add fields without migrations
  • ✅ Attachment handling—binaries co-located with metadata
  • ✅ HTTP API—simple debugging with curl/browser
  • ✅ Built-in replication—easy backup and disaster recovery
  • ✅ MVCC—automatic conflict detection
  • ✅ Apache License—true open source

Negative

  • ⚠️ No SQL joins—related data fetched separately or denormalized
  • ⚠️ Eventual consistency—not suitable for strict transaction requirements
  • ⚠️ Query limitations—complex queries need MapReduce views
  • ⚠️ Memory usage—views can consume significant memory
  • ⚠️ Learning curve—different paradigm from SQL databases

Technical Debt Created

  • MapReduce views require JavaScript maintenance
  • External search integration needed (CouchDB-Lucene → Nouveau)
  • Ektorp → Cloudant SDK migration required (completed 2024)

Revision History

VersionDateAuthorChanges
1.02014Architecture TeamInitial decision
2.0April 2026Bibhuti Bhusan DashReformatted to DAR/SWOT template