algorithm/algorithm/HashedArrayTree.java at master · tonghe1900/algorithm

History

391 lines (349 loc) · 15.7 KB

Raw

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

303

304

305

306

307

308

309

310

311

312

313

314

315

316

317

318

319

320

321

322

323

324

325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340

341

342

343

344

345

346

347

348

349

350

351

352

353

354

355

356

357

358

359

360

361

362

363

364

365

366

367

368

369

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384

385

386

387

388

389

390

391

/***************************************************************************

* File: HashedArrayTree.java

* Author: Keith Schwarz (htiek@cs.stanford.edu)

* An implementation of the List abstraction backed by a hashed array tree

* (HAT), a data structure supporting amortized O(1) lookup, append, and

* last-element removal. In this sense it is akin to a standard dynamic

* array implementation. However, a hashed array tree also has the advantage

* that its memory overhead is only O(sqrt(n)) rather than the typical O(n)

* found in dynamic arrays and their variants.

* Internally, the hashed array tree is implemented as an array of pointers

* that optionally point to an array of elements. The topmost array and each

* element always have the same size, which is always a power of two. In

* this sense, the hashed array tree is essentially a two-dimensional array

* of elements. However, the advantage of the hashed array tree is that the

* topmost array pointers are all initially null and only filled in when space

* is needed. This means that the maximum overhead of the structure is the

* size of the topmost array, plus the number of unused elements in the current

* block. Here is one sample HAT:

* [ ] [ ] [ ] [ ]

* | | |

* v v v

* [0] [4] [8]

* [1] [5] [9]

* [2] [6] [ ]

* [3] [7] [ ]

* Here, the topmost array of pointers has three pointers in use, each of which

* point to an array of the corresponding number of elements.

* Whenever an element is added to a hashed array tree, one of three cases

* must hold:

* 1. There is extra space at the end of the final subarray (for example, in

* the top picture). In that case, the element is added to that position.

* 2. The final subarray is full, but space for another subarray exists in the

* topmost array. In that case, a new array is allocated and the element

* is added to that array.

* 3. The final subarray is full and no new arrays remain open. In that case,

* since the topmost array has size 2^n and each array has size 2^n, there

* must be a total of 2^(2n) elements in the hashed array tree. We

* next double the size of the topmost array to 2^(n + 1), then allocate

* 2^(n - 1) subarrays of size 2^(n + 1) elements for a total of 2^(2n)

* elements' worth of space. The elements from the old HAT are then copied

* over, a new array is allocated for the new element, and it is added as

* the first element of that array.

* Let's now talk about the performance and memory usage of this structure.

* First, we note that we can perform lookups in O(1), assuming that the

* machine is transdichotomous (meaning that a single machine word can hold

* the size of any array). This can be done by breaking the input index in

* half, then using the first half to choose which array to look into (the

* "hashed" part of HAT) and the second half to choose which index to select.

* This trick is similar to the trick used to represent a two-dimensional

* array using a linear structure.

* Next, let's think about how much time it takes to do an append operation.

* Each append when space remains takes O(1) to look up the proper position

* in the hashed array tree for writing, but appends can be much more expensive

* when the HAT needs to be resized. Fortunately, this does not happen very

* often. Whenever the HAT doubles in size, its capacity grows from 2^(2n) to

* 2^(2n + 2), meaning that four times as many elements must be inserted before

* the next copy operation. If we define a potential function as twice the

* number of filled-in elements in the HAT above half capacity (i.e. the

* number of elements in the arrays in the second half), then we can prove an

* amortized O(1) time for append. Consider any series of appends. If the

* append does not expand the HAT, then it takes O(1) time and increases the

* potential by 1/2. If the append does expand the HAT, and the HAT's topmost

* array has size 2^n, then there must be 2^(n-1) elements in the latter half

* of the HAT, so the potential is 2^(2n). The time required to move each

* of the 2^(2n) elements is 2^(2n), and in the new HAT there are no elements

* in the latter half of the array. Consequently, the new potential is zero.

* The actual time required to perform the append is thus 2^2n + O(1), and

* the decrease in potential is -2^2n, so the amortized cost is O(1) as

* expected.

* Last, let's talk about the cost to do a remove. This is similar to the

* append case - we delete the last element of the last array, removing the

* array from the topmost array if it becomes empty. We also compact the

* HAT if it becomes too sparse by shrinking from a HAT of size 2^(2n + 2) to

* a HAT of size 2^(2n) if the HAT becomes one-eighth full. A similar

* potential method can be used to show that this operation runs in amortized

* O(1).

* Finally, let's consider the memory overhead of the HAT. For any HAT of

* topmost array size 8 or more, since the HAT is always at least one-eighth

* full, there must be at least one full array. This array has size equal to

* the size of the topmost array (call this k), and so if every array were to

* be filled in to capacity, there would be a total of k arrays of size k,

* for a total of k^2 elements. Of this capacity, we know that at least an

* eighth of them are filled in, so k^2 must be at most 8n, and so

* k = O(sqrt(n)). To finish the analysis, the overhead of the structure is

* at most the overhead of this top-level array, plus potentially k - 1

* unused elements in some array. This is a total of O(k) = O(sqrt(n))

* overhead, which is what we originally desired.

import java.util.*; // For AbstractList

@SuppressWarnings("unchecked")

public final class HashedArrayTree<T> extends AbstractList<T> {

/* To simplify the implementation, we enforce that the size of the topmost

* array never drops below 2. This prevents weirdness when we try to

* allocate 2^(n-1) arrays during a doubling and find that n = 0.

private static final int kMinArraySize = 2;

/* The topmost array of elements; initially of size two. */

private T[][] mArrays = (T[][]) new Object[kMinArraySize][];

/* Number of elements, initially zero since the HAT is created empty. */

private int mSize = 0;

/* A constant containing lg2 of the topmost array size. This enables some

* cute bit-twiddling tricks to improve efficiency.

private int mLgSize = 1;

/**

* Returns the number of elements in the HashedArrayTree.

* @return The number of elements in the HashedArrayTree.

@Override

public int size() {

return mSize;

}

/**

* Adds a new element to the HashedArrayTree.

* @param elem The element to add.

* @return true

@Override

public boolean add(T elem) {

/* First, check if we're completely out of space. If so, do a resize

* to ensure we do indeed have room.

if (size() == mArrays.length * mArrays.length)

grow();

/* Compute the (arr, index) pair for the next position. The next

* position is at the location indicated by size(), but we know that

* space exists from the previous call.

final int offset = computeOffset(size());

final int index = computeIndex(size());

/* Check if an array exists here. If not, make one up. */

if (mArrays[offset] == null)

mArrays[offset] = (T[]) new Object[mArrays.length];

/* Write the element to its location. */

mArrays[offset][index] = elem;

/* Update the element count. */

++mSize;

/* Per the Collections contract, return true to signal a successful

* add.

return true;

}

/**

* Sets the element at the specified position to the indicated value.

* If the index is out of bounds, throws an IndexOutOfBounds exception.

* @param index The index at which to set the value.

* @param elem The element to store at that position.

* @return The value initially at that location.

* @throws IndexOutOfBoundsException If index is invalid.

@Override

public T set(int index, T elem) {

/* Find out where to look. */

final int offset = computeOffset(index);

final int arrIndex = computeIndex(index);

/* Cache the value there and write the new one. */

T result = mArrays[offset][arrIndex];

mArrays[offset][arrIndex] = elem;

/* Hand back the old value. */

return result;

}

/**

* Returns the value of the element at the specified position.

* @param index The index at which to query.

* @return The value of the element at that position.

* @throws IndexOutOfBoundsException If the index is invalid.

@Override

public T get(int index) {

/* Check that this is a valid index. */

if (index < 0 || index >= size())

throw new IndexOutOfBoundsException("Index " + index + ", size " + size());

/* Look up the element. */

return mArrays[computeOffset(index)][computeIndex(index)];

}

/**

* Adds the specified element at the position just before the specified

* index.

* @param index The index just before which to insert.

* @param elem The value to insert

* @throws IndexOutOfBoundsException if the index is invalid.

@Override

public void add(int index, T elem) {

/* Confirm the validity of the index. */

if (index < 0 || index >= size())

throw new IndexOutOfBoundsException("Index " + index + ", size " + size());

/* Add a dummy element to ensure that everything resizes correctly.

* There's no reason to repeat the logic.

add(null);

/* Next, we need to shuffle down every element that appears after

* the inserted element. We'll do this using our own public interface.

for (int i = size(); i > index; ++i)

set(i, get(i - 1));

/* Finally, write the element. */

set(index, elem);

}

/**

* Removes the element at the specified position from the HashedArrayTree.

* @param index The index of the element to remove.

* @return The value of the element at that position.

* @throws IndexOutOfBoundsException If the index is invalid.

@Override

public T remove(int index) {

/* Cache the value at the indicated position; this also does the bounds

* check.

T result = get(index);

/* Use a naive shuffle-down algorithm to reposition elements after

* the removed one.

for (int i = index + 1; i < size(); ++i)

set(i - 1, get(i));

/* Clobber the last element to play nicely with the garbage collector. */

set(size() - 1, null);

/* Decrement our size. */

--mSize;

/* If we are now at 1/8 total capacity, shrink the structure. */

if (size() * 8 <= mArrays.length * mArrays.length)

shrink();

/* Otherwise, if the size is now an even multiple of the array size,

* we can drop the very last array. This is the array whose offset

* is one after the end of the elements.

else if (size() % mArrays.length == 0)

mArrays[computeOffset(size())] = null;

return result;

}

/**

* Given an index, returns the offset into the master array at which the

* element with that index can be found.

* @return The index into the topmost array where the given element can

* be found.

private int computeOffset(int index) {

/* This can be computed by dividing the index by the index by the

* topmost array. However, if we want to be very clever, we can do

* this efficiently by bit-shifting downard by the lg2 of the size

* of the topmost array.

return index >> mLgSize;

}

/**

* Given an index, returns the offset into the appropriate subarray in

* which the element with that index can be found.

* @return The index into the subarray array where the given element can

* be found.

private int computeIndex(int index) {

/* This can be computed by modding the index by the index by the

* topmost array. But we can do this more efficiently with a different

* tactic. Since the array size is a perfect power of two, it must

* look like this:

* 00..010..0

* Subtracting one yields

* 00..001..1

* ANDing this with the index produces the value we're looking for.

return index & (mArrays.length - 1);

}

/**

* Grows the internal representation by doubling the size of the topmost

* array and copying the appropriate number of elements over.

private void grow() {

/* Double the size of the topmost array. */

T[][] newArrays = (T[][]) new Object[mArrays.length * 2][];

/* The new arrays each have size 2^(n + 1). We need 2^(n - 1) of them

* to hold the old elements. Allocate those here and copy everything

* over.

for (int i = 0; i < mArrays.length; i += 2) {

/* Allocate the array. */

newArrays[i / 2] = (T[]) new Object[newArrays.length];

/* Use System.arraycopy to move everything over. */

System.arraycopy(mArrays[i], 0, newArrays[i / 2], 0, mArrays.length);

System.arraycopy(mArrays[i + 1], 0, newArrays[i / 2], mArrays.length, mArrays.length);

/* Null out the old arrays to be nice to the GC during this

* potentially stressful time.

mArrays[i] = mArrays[i + 1] = null;

}

/* Switch out this new array for the old. */

mArrays = newArrays;

/* Bump up lg2 of the size. */

++mLgSize;

}

/**

* Decreases the size of the HAT by shrinking into a better fit.

private void shrink() {

/* If the size of the topmost array is at its minimum, don't do

* anything. This doesn't change the asymptotic memory usage because

* we only do this for small arrays.

if (mArrays.length == kMinArraySize) return;

/* Otherwise, we currently have 2^(2n) / 8 = 2^(2n - 3) elements.

* We're about to shrink into a grid of 2^(2n - 2) elements, and so

* we'll fill in half of the elements.

T[][] newArrays = (T[][]) new Object[mArrays.length / 2][];

/* Copy everything over. We'll need half as many arrays as before. */

for (int i = 0; i < newArrays.length / 2; ++i) {

/* Create the arrays. */

newArrays[i] = (T[]) new Object[newArrays.length];

/* Move everything into it. If this is an odd array, it comes

* from the upper half of the old array; otherwise it comes from

* the lower half.

System.arraycopy(mArrays[i / 2], (i % 2 == 0)? 0 : newArrays.length,

newArrays[i], 0, newArrays.length);

/* Play nice with the GC. If this is an odd-numbered array, we

* just copied over everything we needed and can clear out the

* old array.

if (i % 2 == 1)

mArrays[i / 2] = null;

}

/* Copy the arrays over. */

mArrays = newArrays;

/* Drop the lg2 of the size. */

--mLgSize;

}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FilesExpand file tree

HashedArrayTree.java

Latest commit

History

HashedArrayTree.java

File metadata and controls