Cassandra

Cassandra architecture

An article from DZone on Cassandra architecture


Runs on a cluster of nodes
Distributed
Large volumes of data
High availability
High write throughput
High read throughput
No master nodes
No single point of failure

How does Cassandra partition its data?

Above taken from

How does Cassandra partition data among its cluster of nodes?

Docs on data replication in Cassandra

Architecture in brief, from the docs

A sequentially written commit log on each node captures write activity to ensure data durability. Data is then indexed and written to an in-memory structure, called a memtable, which resembles a write-back cache. Each time the memory structure is full, the data is written to disk in an SSTables data file. All writes are automatically partitioned and replicated throughout the cluster. Cassandra periodically consolidates SSTables using a process called compaction, discarding obsolete data marked for deletion with a tombstone. To ensure all data across the cluster stays consistent, various repair mechanisms are employed.
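The write path described above can be sketched as a toy, single-node model. This is only an illustration under simplifying assumptions: the class name `ToyNode`, the `memtable_limit` threshold, and the use of plain dicts as "SSTables" are all hypothetical, and real Cassandra keeps tombstones for a grace period before compaction drops them.

```python
from dataclasses import dataclass, field

# Marker stored in place of a value when a key is deleted.
TOMBSTONE = object()


@dataclass
class ToyNode:
    """Hypothetical single-node model of the commit log / memtable / SSTable flow."""
    memtable_limit: int = 2                        # flush threshold (made up for the demo)
    commit_log: list = field(default_factory=list)
    memtable: dict = field(default_factory=dict)
    sstables: list = field(default_factory=list)   # oldest first, never modified in place

    def write(self, key, value):
        self.commit_log.append((key, value))       # 1. sequential append for durability
        self.memtable[key] = value                 # 2. update the in-memory structure
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                           # 3. memtable full: write an SSTable

    def delete(self, key):
        # A delete is just a write of a tombstone marker.
        self.write(key, TOMBSTONE)

    def flush(self):
        # Persist the memtable as a sorted, immutable table and start fresh.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()                    # flushed data no longer needs the log

    def read(self, key):
        # Newest data wins: memtable first, then SSTables from newest to oldest.
        for table in [self.memtable, *reversed(self.sstables)]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge all SSTables (newer values overwrite older ones), drop tombstoned keys.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.sstables = [dict(sorted(live.items()))]
```

Writing four entries with `memtable_limit=2` produces two SSTables; `compact()` then merges them into one and discards the deleted key.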

Partitioning of data on multiple nodes based on a key is explained here
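As a rough illustration of key-based partitioning, here is a minimal token-ring sketch. It makes several assumptions: `TokenRing`, `token_for`, and `vnodes_per_node` are hypothetical names, and MD5 is used only as a stand-in hash because it ships with Python (Cassandra's default partitioner uses Murmur3).

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a string to a 64-bit token. MD5 is a stand-in, not Cassandra's hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Hypothetical ring: each node owns several tokens (vnodes)."""

    def __init__(self, nodes, vnodes_per_node=8):
        pairs = []
        for node in nodes:
            for i in range(vnodes_per_node):
                # Derive each vnode's token from the node name (an assumption
                # for this sketch; real clusters assign tokens differently).
                pairs.append((token_for(f"{node}#{i}"), node))
        pairs.sort()
        self.tokens = [token for token, _ in pairs]
        self.owners = [node for _, node in pairs]

    def owner(self, partition_key: str) -> str:
        # The first vnode token >= the key's token owns the key;
        # past the last token, wrap around to the start of the ring.
        idx = bisect.bisect_left(self.tokens, token_for(partition_key))
        return self.owners[idx % len(self.owners)]
```

For example, `TokenRing(["n1", "n2", "n3"]).owner("user:42")` always returns the same node for the same key, and many keys spread across all three nodes.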

Cassandra replicas, data center, cluster

Partitioning vs replication in Cassandra

What are consistency levels in Cassandra: LOCAL_QUORUM

What are keyspaces in Cassandra?

Cluster, Data center, Node


A Cluster is a collection of Data Centers.

A Data Center is a collection of Racks.

A Rack is a collection of Servers.

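A Server contains 256 virtual nodes (or vnodes) by default in older releases; the count is set by num_tokens in cassandra.yaml, and Cassandra 4.0 lowered the default to 16.

A vnode is the data storage layer within a server: each vnode corresponds to one token range that the server owns.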

How does Cassandra replicate data across data centers?

Setting up multiple data centers for Cassandra

This article clarifies a few things

If you have two data centers, each one basically holds a complete copy of the data. If you set a replication factor of, say, 2 for each data center, each data center keeps 2 copies of the data, so the cluster holds 4 copies in total.
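The arithmetic above can be checked with a small sketch. The helper names and the keyspace and data center names (`demo`, `dc1`, `dc2`) are hypothetical; the generated statement follows the standard CQL form for NetworkTopologyStrategy, where each data center gets its own replication factor. The quorum helper uses the usual formula of half the replication factor, rounded down, plus one; LOCAL_QUORUM applies it to the local data center only.

```python
def create_keyspace_cql(keyspace: str, rf_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement with one replication factor per data center."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


def total_copies(rf_per_dc: dict) -> int:
    """Copies of each row across the whole cluster: the sum of per-DC factors."""
    return sum(rf_per_dc.values())


def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


rf = {"dc1": 2, "dc2": 2}          # hypothetical data center names
print(create_keyspace_cql("demo", rf))
print(total_copies(rf))            # 2 copies in each of 2 data centers
print(quorum(rf["dc1"]))           # replicas needed in dc1 for LOCAL_QUORUM
```

Note that with a replication factor of 2, a local quorum needs both local replicas, which is one reason a factor of 3 per data center is a common choice.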

What is a seed node in Cassandra?

What are snitches in Cassandra?

You can read up on snitches here

A snitch determines which datacenters and racks nodes belong to. It informs Cassandra about the network topology so that requests are routed efficiently, and it allows Cassandra to distribute replicas by grouping machines into datacenters and racks. Specifically, the replication strategy places the replicas based on the information provided by the snitch. All nodes must return to the same rack and datacenter. Cassandra does its best not to have more than one replica on the same rack (which is not necessarily a physical location).
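A simplified sketch of how snitch information could drive replica placement follows. It is not Cassandra's actual algorithm: the function name `place_replicas`, the `topology` mapping (standing in for what a snitch reports), and the node and rack names are all assumptions made for illustration. The idea is to walk the ring from the node that owns the partition and prefer nodes on racks that do not yet hold a replica.

```python
def place_replicas(ring, topology, datacenter, rf):
    """Pick up to `rf` replica nodes in one data center, preferring distinct racks.

    ring:     node names in ring order, starting at the partition's owner
    topology: node -> (datacenter, rack), the information a snitch would supply
    """
    chosen, racks_used, skipped = [], set(), []
    for node in ring:
        dc, rack = topology[node]
        if dc != datacenter:
            continue                  # replicas for this DC come only from this DC
        if rack in racks_used:
            skipped.append(node)      # same rack as an earlier replica: hold back
            continue
        chosen.append(node)
        racks_used.add(rack)
        if len(chosen) == rf:
            return chosen
    # Fewer distinct racks than replicas: fall back to nodes on already-used racks.
    return chosen + skipped[: rf - len(chosen)]


topology = {
    "a": ("dc1", "r1"),
    "b": ("dc1", "r1"),
    "c": ("dc1", "r2"),
    "d": ("dc2", "r1"),
}
print(place_replicas(["a", "b", "c", "d"], topology, "dc1", 2))
```

With two replicas in `dc1`, node `b` is passed over because it shares rack `r1` with `a`, so the replicas land on `a` and `c`.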

SimpleStrategy and NetworkTopologyStrategy differences

A good read on Cassandra architecture from an external perspective

This seems like a down-to-earth article on replication

Another article