[SOLR-10233] Add support for different replica types in Solr - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 7.0
Component/s: SolrCloud
Labels:
None

Description

For the majority of the cases, current SolrCloud's distributed indexing is great. There is a subset of use cases for which the legacy Master/Slave replication may fit better:

Don’t require NRT
LIR can become an issue, prefer availability of reads vs consistency or NRT
High number of searches (requiring many search nodes)

~~SOLR-9835~~ is adding replicas that don’t do indexing, just update their transaction log. This Jira is to extend that idea and provide the following replica types:

Realtime: Writes updates to transaction log and indexes locally. Replicas of type “realtime” support NRT (soft commits) and RTG. Any realtime replica can become a leader. This is the only type supported in SolrCloud at this time and will be the default.
Append: Writes to transaction log, but not to index, uses replication. Any append replica can become leader (by first applying all local transaction log elements). If a replica is of type append but is also the leader, it will behave as a realtime. This is exactly what ~~SOLR-9835~~ is proposing (non-live replicas)
Passive: Doesn’t index or writes to transaction log. Just replicates from realtime or append replicas. Passive replicas can’t become shard leaders (i.e., if there are only passive replicas in the collection at some point, updates will fail same as if there is no leaders, queries continue to work), so they don’t even participate in elections.

When the leader replica of the shard receives an update, it will distribute it to all realtime and append replicas, the same as it does today. It won't distribute to passive replicas.

By using a combination of append and passive replicas, one can achieve an equivalent of the legacy Master/Slave architecture in SolrCloud mode with most of its benefits, including high availability of writes.

API (v1 style)

/admin/collections?action=CREATE…&realtimeReplicas=X&appendReplicas=Y&passiveReplicas=Z
/admin/collections?action=ADDREPLICA…&type=[realtime/append/passive]

“replicationFactor=” will translate to “realtime=“ for back compatibility
if passive > 0, append or realtime need to be >= 1 (can’t be all passives)

Placement Strategies

By using replica placement rules, one should be able to dedicate nodes to search-only and write-only workloads. For example:

shard:*,replica:*,type:passive,fleet:slaves

where “type” is a new condition supported by the rule engine, and “fleet:slaves” is a regular tag. Note that rules are only applied when the replicas are created, so a later change in tags won't affect existing replicas. Also, rules are per collection, so each collection could contain it's own different rules.
Note that on the server side Solr also needs to know how to distribute the shard requests (maybe ShardHandler?) if we want to hit only a subset of replicas (i.e. *passive *replicas only, or similar rules)

SolrJ

SolrCloud client could be smart to prefer passive replicas for search requests when available (and if configured to do so). Passive replicas can’t respond RTG requests, so those should go to realtime replicas.

Cluster/Collection state

{"gettingstarted":{
  "replicationFactor":"1",
  "router":{"name":"compositeId"},
  "maxShardsPerNode":"2",
  "autoAddReplicas":"false",
  "shards":{
    "shard1":{
      "range":"80000000-ffffffff",
      "state":"active",
      "replicas":{
        "core_node5":{
          "core":"gettingstarted_shard1_replica1",
          "base_url":"http://127.0.0.1:8983/solr",
          "node_name":"127.0.0.1:8983_solr",
          "state":"active",
          "leader":"true",
          **"type": "realtime"**},
        "core_node10":{
          "core":"gettingstarted_shard1_replica2",
          "base_url":"http://127.0.0.1:7574/solr",
          "node_name":"127.0.0.1:7574_solr",
          "state":"active",
          **"type": "passive"**}},
      }},
    "shard2":{
      ...

Back compatibility

We should be able to support back compatibility by assuming replicas without a “type” property are realtime replicas.

Failure Scenarios for passive replicas

Replica-Leader partition

In SolrCloud today, in this scenario the replica would be placed in LIR. With passive replicas, replicas may not be able to replicate from some time (and fall behind with the index) but queries can still be served. Once the connection is re-established the replication will continue.

Replica ZooKeeper partition

Passive replica will leave the cluster. “Smart clients” and other replicas (e.g. for distributed search) won’t find it and won’t query on it. Direct search requests to the replica may still succeed.

Passive replica dies (or is unreachable)

Replica won’t be query-able. On restart, replica will recover from the leader, following the same flow as realtime replicas: set state to DOWN, then RECOVERING, and finally ACTIVE. Passive replicas will use a different RecoveryStrategy implementation, that omits preparerecovery, and peer sync attempt, it will jump to replication . If the leader didn't change, or if the other replicas are of type “append”, replication should be incremental. Once the first replication is done, passive replica will declare itself active and start serving traffic.

Leader dies

Passive replica won’t be able to replicate. The cluster won’t take updates until a new leader is elected. Once a new leader is elected, updates will be back to normal. Passive replicas will remain active and serving query traffic during the “write outage”. Once the new leader is elected the replication will restart (maybe from a different node)

Leader ZooKeeper partition

Same as today. Leader will abandon leadership and a new replica will be elected as leader.

Q&A

Can I use a combination of passive + realtime?

You could. The problem is that, since realtime generate their own index, any change of leadership could trigger a full replication from all the passive replicas. The biggest benefits of append replicas is that they share the same index files, which means that even if the leader changes, the number of segments to replicate will remain low. For that reason, using append replicas is recommended when using passive.

Can I use passive + append + realtime?

The issue with mixing realtime replicas with append replicas is that if a different realtime replica becomes the leader, the whole purpose of using append replicas is defeated, since they will all have to replicate the full index.

What happens if replication from passives fail?

TBD: In general we want those replicas to continue serving search traffic, but we may want to have a way to say “If can’t replicate after X hours put yourself in recovery” or something similar.
varunthacker suggested that we include in the response time since the last successful replication, and then the client can choose what to do with the results (in a multi-shard request, this date would be the oldest of all shards).

Do passive replicas need to replicate from the leader only?

This is not necessary. Passive replicas can replicate from any realtime or append replicas, although this would add some extra waiting time for the last updates. Replicating from a realtime replica may not be a good idea, see the question “Can I use a combination of passive + realtime?”

What if I need NRT?

Then you can’t query append or passive replicas. You should use all realtime replicas

Will new passive replicas start receiving traffic immediately after added?

passive replicas will have the same states as realtime/append replicas, they’ll join the cluster as “DOWN” and be moved to “RECOVERY” until they can replicate from the leader. Then they’ll start the replication process and become “ACTIVE”, at this point they’ll start responding queries. They'll use a different RecoveryStrategy that skips peer sync and buffering of docs, and just replicates.

What if a passive replica receives an update?

This will work the same as today with non-leader replicas, it will just forward the update to the correct leader.

What is the difference between using active + passive with legacy master/slave?

These are just some I can think of:

You now need ZooKeeper to run in SolrCloud mode
High availability for writes, as long as you have more than 1 active replica
Shard management by Solr at index time and query time.
Full support for Collections and Collections API
SolrCloudClient support

I'd like to get some thoughts on this proposal.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

11431.consoleText.txt
23/May/17 14:58
836 kB
Steven Rowe
SOLR-10233.patch
03/Apr/17 23:51
200 kB
Tomas Eduardo Fernandez Lobbe
SOLR-10233.patch
31/Mar/17 20:47
209 kB
Tomas Eduardo Fernandez Lobbe
SOLR-10233.patch
31/Mar/17 19:19
204 kB
Tomas Eduardo Fernandez Lobbe
SOLR-10233.patch
08/Mar/17 21:12
123 kB
Tomas Eduardo Fernandez Lobbe
SOLR-10233.patch
07/Mar/17 00:06
123 kB
Tomas Eduardo Fernandez Lobbe

Issue Links

fixes

SOLR-10368 Randomize TestCollectionAPI to use both replication schemes

Resolved

is related to

SOLR-15297 Creating a collection with TLOG replicas fails with placement rules

Open

SOLR-11656 TLOG replication doesn't work properly after rebalancing leaders.

Closed

relates to

SOLR-9835 Create another replication mode for SolrCloud

Resolved

SOLR-10752 replicationFactor default should be 0 if tlogReplicas is specified when creating a collection

Resolved

SOLR-10772 Add support for replica types in CLI

Open

SOLR-10774 ReplicationHandler should fail when using "ReplicateFromLeader" (TLOG/PULL replica types) if the node is no longer the leader

Open

SOLR-10775 When using PULL replicas, it would be nice for the client to be able to know how old the current index is

Open

SOLR-10773 Add support for replica types in V2 API

Resolved

links to

GitHub Pull Request #196

(4 relates to, 1 links to)

Add support for different replica types in Solr

Details

Description

API (v1 style)

Placement Strategies

SolrJ

Cluster/Collection state

Back compatibility

Failure Scenarios for passive replicas

Replica-Leader partition

Replica ZooKeeper partition

Passive replica dies (or is unreachable)

Leader dies

Leader ZooKeeper partition

Q&A

Can I use a combination of passive + realtime?

Can I use passive + append + realtime?

What happens if replication from passives fail?

Do passive replicas need to replicate from the leader only?

What if I need NRT?

Will new passive replicas start receiving traffic immediately after added?

What if a passive replica receives an update?

What is the difference between using active + passive with legacy master/slave?

Attachments

Attachments

Issue Links

Activity

People

Dates